UBot Underground


I need a regex to extract URLs from strings that look like this:

<h3 class="r"><a href="http://thaitopsites.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHylaxMZzDrzrnNGQB4aFI39nOWBQ','','0CCoQFjAA','','',event)">thaitopsites.com/</a></h3>

This is from scraping Google. How can I isolate just the URL text that sits right before the closing </a> tag?


(?=ABC)      - Positive lookahead. Matches a group after your main expression without including it in the result.
 
(?!ABC)      - Negative lookahead. Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded).
 
(?<=ABC)     - Positive lookbehind. Matches a group before your main expression without including it in the result.
 
(?<!ABC)     - Negative lookbehind. Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded).
 
(?<=ABC).*?(?=ABC) - Extracts the text between specified groups.

 

 

Here ABC is a placeholder: swap in the text that surrounds what you want. That text has to be there for a match, but it is left out of the result, as if it were not there.
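To make the four lookarounds concrete, here is a small sketch using Python's `re` module. Python is an assumption on my part (the thread does not name a regex engine), and the sample string and variable names are made up for illustration. Note that Python only accepts fixed-width lookbehinds.

```python
import re

# Hypothetical sample text, chosen only to illustrate each lookaround.
text = "price: 100 USD, cost: 200 EUR"

# Positive lookahead: digits that are followed by " USD".
usd = re.findall(r"\d+(?= USD)", text)

# Negative lookahead: digits NOT followed by " USD".
# The extra \d alternative stops the engine from backtracking to a partial number.
not_usd = re.findall(r"\d+(?!\d| USD)", text)

# Positive lookbehind: digits that are preceded by "cost: ".
after_cost = re.findall(r"(?<=cost: )\d+", text)

# Negative lookbehind: whole numbers NOT preceded by "cost: ".
not_after_cost = re.findall(r"(?<!cost: )\b\d+", text)

# Lookbehind + lazy match + lookahead: the text between two markers.
between = re.search(r"(?<=price: ).*?(?= USD)", text).group(0)

print(usd, not_usd, after_cost, not_after_cost, between)
```

In every case the surrounding text has to be present (or absent) for the match to succeed, but only the digits end up in the result.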

 

(?<=event\)">).*?(?=</a></h3>)

I escaped the ) after event, and the lookbehind and lookahead hold the strings that surround the link text.
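As a quick check, here is the same pattern run against the sample line from the question. This is a sketch in Python's `re` module, which is an assumption (the thread does not say which engine is used); the variable names are hypothetical.

```python
import re

# The sample line from the question, copied verbatim.
html = r"""<h3 class="r"><a href="http://thaitopsites.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHylaxMZzDrzrnNGQB4aFI39nOWBQ','','0CCoQFjAA','','',event)">thaitopsites.com/</a></h3>"""

# Lookbehind anchors on the end of the opening <a ...> tag,
# lookahead anchors on the closing tags; only the text between is returned.
pattern = re.compile(r'(?<=event\)">).*?(?=</a></h3>)')

match = pattern.search(html)
link_text = match.group(0) if match else None
print(link_text)  # thaitopsites.com/
```

The lazy `.*?` matters once a page has several results: it stops at the first `</a></h3>` instead of running on to the last one, so `pattern.findall(page)` should give one entry per result.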

 

http://regexhero.net/videos/

Great tutorials, short and sweet.

 

HTH,

TC
