jonchiehlau Posted April 11, 2013

I have a list of scraped anchor tags and I want to extract the URLs from them. The list looks like:

<a href="https://www.facebook.com/isu.iv?ref=br_rs">ISU InterVarsity</a>
<a href="https://www.facebook.com/pages/WSU-InterVarsity/149571488395581?ref=br_rs">WSU InterVarsity</a>
<a href="https://www.facebook.com/pages/Lindenwood-InterVarsity/133851456707688?ref=br_rs">Lindenwood InterVarsity</a>
<a href="https://www.facebook.com/pages/Intervarsity-Memes/171459952966826?ref=br_rs">Intervarsity Memes</a>

I want to extract links like:

https://www.facebook.com/pages/Lindenwood-InterVarsity/133851456707688?ref=br_rs

I have been trying to do this with regex for the past day and can't get it.
a2mateit Posted April 11, 2013

Why don't you scrape only the href attribute in the first place? That would cut out the need for regex altogether.
jonchiehlau (Author) Posted April 11, 2013

To get the correct links, I needed to go up one level, select by <div>, and scrape its innerHTML. Normally I would just scrape the href.
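For readers who end up with a <div>'s innerHTML as described above and can post-process it outside the scraper, here is a minimal sketch of pulling out the href values with an HTML parser instead of regex. It assumes Python and its standard-library `html.parser` module, neither of which the thread mentions; `HrefCollector` and `extract_hrefs` are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(inner_html):
    """Return the href values of all anchors in a fragment of HTML."""
    parser = HrefCollector()
    parser.feed(inner_html)
    parser.close()
    return parser.hrefs


# Two of the anchors from the first post, joined as they might appear in innerHTML.
sample = (
    '<a href="https://www.facebook.com/isu.iv?ref=br_rs">ISU InterVarsity</a> '
    '<a href="https://www.facebook.com/pages/WSU-InterVarsity/149571488395581?ref=br_rs">WSU InterVarsity</a>'
)
print(extract_hrefs(sample))
```

A parser does not care whether the anchors sit on one line or several, which is the main reason to consider it over a pattern match.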
zdot Posted April 11, 2013

Try this:

(?<=<a href=\").*(?=\">)
jonchiehlau (Author) Posted April 12, 2013

Worked like a charm, zdot! Thanks!
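One thing worth knowing about zdot's pattern: `.*` is greedy, so it behaves as intended when each anchor sits on its own line, but it can swallow everything between the first and last anchor if several share a line. The sketch below tries this out in Python's `re` module, which is an assumption (the thread does not say which regex engine is in use). The `LAZY` variant with `.*?` is a suggested alternative, not something posted in the thread.

```python
import re

# zdot's pattern, copied as posted.
GREEDY = re.compile(r'(?<=<a href=\").*(?=\">)')
# Hypothetical variant: the same pattern with a lazy quantifier.
LAZY = re.compile(r'(?<=<a href=\").*?(?=\">)')

urls = [
    "https://www.facebook.com/isu.iv?ref=br_rs",
    "https://www.facebook.com/pages/WSU-InterVarsity/149571488395581?ref=br_rs",
]
anchors = [
    f'<a href="{urls[0]}">ISU InterVarsity</a>',
    f'<a href="{urls[1]}">WSU InterVarsity</a>',
]

one_per_line = "\n".join(anchors)  # one anchor per line, as in the first post
single_line = " ".join(anchors)    # the same anchors run together on one line

# "." does not match a newline by default, so each line yields its own URL.
print(GREEDY.findall(one_per_line))
# On a single line the greedy match runs from the first URL to the last one.
print(GREEDY.findall(single_line))
# The lazy variant stops at the first closing quote after each href.
print(LAZY.findall(single_line))
```

If the scraped list is always one anchor per line, the original pattern is enough; the lazy form is just a safer default when the layout of the input is not guaranteed.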