Chris M Posted July 24, 2013
I need a regex to pull URLs out of a string that looks like this: <h3 class="r"><a href="http://thaitopsites.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHylaxMZzDrzrnNGQB4aFI39nOWBQ','','0CCoQFjAA','','',event)">thaitopsites.com/</a></h3> This is from scraping Google. How can I isolate just the URL text at the end, right before the closing </a> tag?
Code Docta (Nick C.) Posted July 24, 2013
(?=ABC) - Positive lookahead. Matches a group after your main expression without including it in the result.
(?!ABC) - Negative lookahead. Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded).
(?<=ABC) - Positive lookbehind. Matches a group before your main expression without including it in the result.
(?<!ABC) - Negative lookbehind. Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded).
(?<=ABC).*?(?=ABC) - Extracts the text between the specified groups.
ABC is the part you change to suit your input. In other words, the match behaves as if ABC were not there.
For your string:
(?<=event\)">).*?(?=</a></h3>)
I escaped the ) after event. The lookbehind and lookahead groups hold the strings that surround the text you want.
http://regexhero.net/videos/ has great tutorials, short and sweet.
HTH, TC
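For anyone who wants to try the pattern outside a regex tester, here is a minimal sketch in Python. Python is an assumption on my part, since the thread does not say which tool or language is in use, and the variable names are made up for illustration. The sample string is the one from the question.

```python
import re

# Sample markup copied from the question above.
html = """<h3 class="r"><a href="http://thaitopsites.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHylaxMZzDrzrnNGQB4aFI39nOWBQ','','0CCoQFjAA','','',event)">thaitopsites.com/</a></h3>"""

# The lookbehind requires the literal text event)"> just before the match,
# and the lookahead requires </a></h3> just after it.
# Neither surrounding string is included in the matched text.
pattern = re.compile(r'(?<=event\)">).*?(?=</a></h3>)')

match = pattern.search(html)
link_text = match.group(0) if match else None
print(link_text)  # thaitopsites.com/
```

Python's re module only accepts fixed-width lookbehinds, which is fine here because event)"> is a fixed string.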
Code Docta (Nick C.) Posted July 24, 2013
The surrounding strings obviously must be unique to the URL you want.
Chris M (Author) Posted July 24, 2013
Thanks man! I used (?<=event\)">).*?(?=/</a></h3>), adapted from your example, and it's perfect now.
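A rough sketch of how that variant behaves across several results, again assuming Python. The function name extract_hosts and the second result (example.com) are made up for illustration. Moving the / into the lookahead drops the trailing slash from each match.

```python
import re

# Variant from the post above: the trailing "/" sits in the lookahead,
# so it is required after the match but left out of the result.
RESULT_TEXT = re.compile(r'(?<=event\)">).*?(?=/</a></h3>)')

def extract_hosts(html):
    """Return every link text that sits between event)"> and /</a></h3>."""
    return RESULT_TEXT.findall(html)

# First line is the sample from the question; the second is a hypothetical result.
sample = "\n".join([
    """<h3 class="r"><a href="http://thaitopsites.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHylaxMZzDrzrnNGQB4aFI39nOWBQ','','0CCoQFjAA','','',event)">thaitopsites.com/</a></h3>""",
    """<h3 class="r"><a href="http://example.com/" onmousedown="return rwt(this,'','','','2','x','','y','','',event)">example.com/</a></h3>""",
])

print(extract_hosts(sample))  # ['thaitopsites.com', 'example.com']
```

One thing to watch: a result whose link text does not end in / will not match the way you expect with this variant, so the original lookahead (?=</a></h3>) is the safer choice if the results are mixed.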
peleus Posted July 27, 2013
Nice to see this working already. I'm going to try this out too, as I'm looking to set up something similar.