alexb · Posted January 23, 2014

I am trying to scrape profile links, but my results are getting "contaminated" with a second kind of profile link. Example:

/profile/user210/ (the link I want to scrape)
/profile/user210/Reviews/ (the links I want to avoid)

The following code doesn't help me avoid the Reviews links. What might I be doing wrong? Thank you.

    if($not($contains($url, "Reviews"))) {
        then {
            add list to list(%url, $scrape attribute(<href=w"/profile/*/">, "fullhref"), "Delete", "Global")
        }
    }
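One likely reason the `if` does not help is that `$url` reads the address of the page currently loaded in the browser, not each scraped link, so the "Reviews" test is evaluated once for the whole page rather than once per href. The idea of testing every link individually can be sketched outside UBot. This minimal Python sketch assumes the hrefs have already been scraped into a list; the function name and the `example.com` URLs are hypothetical.

```python
# Minimal sketch: keep only the links whose href does not contain "Reviews".
# The sample hrefs below are hypothetical stand-ins for what a scrape might return.


def filter_profile_links(hrefs: list[str]) -> list[str]:
    """Return the hrefs that do not contain the substring "Reviews", preserving order."""
    # The test is applied to each scraped link, not to the URL of the page itself.
    return [href for href in hrefs if "Reviews" not in href]


if __name__ == "__main__":
    scraped = [
        "http://example.com/profile/user210/",
        "http://example.com/profile/user210/Reviews/",
        "http://example.com/profile/user211/",
    ]
    print(filter_profile_links(scraped))
    # ['http://example.com/profile/user210/', 'http://example.com/profile/user211/']
```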
kev123 · Posted January 23, 2014

Just to be clear, which of these are you trying to do?

1. Scrape the profiles on /profile/user210/ (this page only) and /profile/user210/Reviews/, or
2. Scrape profiles from a page that has a mix of the links below:
/profile/user210/ (and you only want this one)
/profile/user210/Reviews/ (the ones to avoid)
UBotDev · Posted January 23, 2014

You can try the "$element offset" function with the offset set to 0, which should return the topmost match. A safer approach would be to write a more specific attribute selector, or even to use a regular expression (regex).
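The difference between the wildcard selector, a stricter regex, and "offset 0" can be sketched in Python. This assumes, without confirmation from UBot's documentation, that the `*` in `w"/profile/*/"` behaves like a shell-style wildcard that can also span slashes (modelled here with `fnmatch`); the regex pattern and function names are hypothetical illustrations, not UBot's API.

```python
import fnmatch
import re
from typing import List, Optional


def loose_matches(hrefs: List[str]) -> List[str]:
    # Assumed analogue of <href=w"/profile/*/">: fnmatch's "*" also matches "/",
    # so "/profile/user210/Reviews/" is accepted as well.
    return [h for h in hrefs if fnmatch.fnmatchcase(h, "/profile/*/")]


# Hypothetical stricter pattern: exactly one path segment (no "/") between
# "/profile/" and the final "/".
PROFILE_RE = re.compile(r"/profile/[^/]+/")


def strict_matches(hrefs: List[str]) -> List[str]:
    # fullmatch() requires the whole href to fit the pattern, so ".../Reviews/" fails.
    return [h for h in hrefs if PROFILE_RE.fullmatch(h)]


def first_match(matches: List[str]) -> Optional[str]:
    # Rough analogue of an element offset of 0: keep only the top-most match.
    return matches[0] if matches else None


if __name__ == "__main__":
    hrefs = ["/profile/user210/", "/profile/user210/Reviews/"]
    print(loose_matches(hrefs))               # ['/profile/user210/', '/profile/user210/Reviews/']
    print(strict_matches(hrefs))              # ['/profile/user210/']
    print(first_match(loose_matches(hrefs)))  # /profile/user210/
```

The offset approach only returns the right link when the plain profile link appears before the Reviews link on the page, which is why the stricter selector or regex is the safer route.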
alexb (Author) · Posted January 23, 2014

I'll try the element offset to see if this works. Kev, I want to avoid anything with Reviews in the URL. Thanks!
alexb (Author) · Posted January 23, 2014

Yes, Element Offset worked like a charm! Much appreciated!