wouldjaball, posted May 22, 2013:
Hi all, I'm trying to scrape Google search results for the following search string:
KEYWORD "comments" site:youtube.com/user
but I can't find a good attribute to scrape. I'd like to scrape 4 pages of results. Any ideas on what I can use as the element to scrape? I'm trying to scrape the channel name from the search results, for example www.youtube.com/user/optimalkeyword. Thanks for your help!
north_star, posted May 22, 2013 (edited May 22, 2013):
I think this should work: <cite>*</cite>
And this is the node command:
add list to list(%URL scrapped, $scrape attribute(<outerhtml=w"<cite>*</cite>">, "innertext"), "Delete", "Global")
wouldjaball, posted May 22, 2013:
I'm not sure how that would work. Can you please explain?
north_star, posted May 22, 2013:
Try this, it will surely help you understand it: sample.ubot
wouldjaball, posted May 22, 2013:
North_Star, thanks for all of your help, but when I run that sample, I'm not getting any results added to the list. You can see here that the debugger is not showing anything in the list: http://gyazo.com/84f6408d6f7cc4c3ef83f7b97b33b860
north_star, posted May 22, 2013 (edited May 22, 2013):
Don't change anything. Just use the code I gave you above, and don't change the add list to list node. You asked me how to use it, and the main idea is still the same as in this command:
add list to list(%URL scrapped, $scrape attribute(<outerhtml=w"<cite>*</cite>">, "innertext"), "Delete", "Global")
ChrisDH, posted May 25, 2013:
Yeah, this is working great to scrape all of the URLs from the page, but it is picking up the paid ads as well. Do you have any ideas on how to scrape only the organic results? It seems they were really spoiling us all that time when they used the class I.....
EDIT: I managed to do this with regex to get only the organic results.
ewideweb, posted May 27, 2013:
Quoting ChrisDH: "EDIT: I managed to do this with regex to get only the organic results."
Care to share how you did that? I am not getting past scraping all the hrefs on the page, and I can't seem to select only the organic search results.
brusacco, posted October 2, 2013:
Works for me:
add list to list(%results, $find regular expression($document text, "(?<=<h3 class=\"r\"><a href=\")(.*?)(?=\" onmousedown)"), "Delete", "Global")
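For anyone who wants to check what that pattern captures outside UBot, here is a rough Python sketch of the same regular expression. It assumes the result markup described in the post (an `<h3 class="r">` wrapping a link with an `onmousedown` attribute); the sample HTML, the `organic_links` helper, and the ad markup are made up for illustration and are not taken from a real results page.

```python
import re

# Same pattern as in the post, written as a Python raw string.
# The lookbehind anchors on the organic-result heading, and the
# lookahead stops at the onmousedown attribute after the href.
ORGANIC_LINK = re.compile(r'(?<=<h3 class="r"><a href=")(.*?)(?=" onmousedown)')


def organic_links(html: str) -> list[str]:
    """Return the href values that follow an <h3 class="r"> heading."""
    return [match.group(1) for match in ORGANIC_LINK.finditer(html)]


# Hypothetical sample markup: one organic result, one ad, one <cite>.
sample_html = (
    '<h3 class="r"><a href="http://www.youtube.com/user/optimalkeyword"'
    ' onmousedown="x">Title</a></h3>'
    '<div class="ads"><a href="http://ads.example.com/">Ad</a></div>'
    '<cite>www.youtube.com/user/optimalkeyword</cite>'
)

print(organic_links(sample_html))
```

Because the ad link in the sample is not preceded by the `<h3 class="r">` heading, it is skipped, which is presumably why this approach avoids the paid results. If the page markup differs from what the pattern expects, it will simply return nothing.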
brusacco, posted October 2, 2013:
The <cite>*</cite> one sometimes returns breadcrumbs and all kinds of odd URLs (trimmed, etc.).
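One possible way to deal with those odd values, sketched in Python rather than UBot: keep only scraped strings that look like a complete youtube.com/user/NAME address, then pull out the channel name the original question asked for. The `channel_names` helper, the sample strings, and the allowed username characters (letters, digits, underscore, hyphen) are all assumptions for illustration.

```python
import re

# Accept only a complete channel URL, with optional scheme and "www.".
# The username character set is an assumption, not a documented rule.
CHANNEL_URL = re.compile(
    r'(?:https?://)?(?:www\.)?youtube\.com/user/([A-Za-z0-9_-]+)/?',
    re.IGNORECASE,
)


def channel_names(scraped: list[str]) -> list[str]:
    """Return unique channel names, skipping breadcrumbs and trimmed URLs."""
    names: list[str] = []
    for text in scraped:
        match = CHANNEL_URL.fullmatch(text.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Hypothetical examples of what a <cite> scrape might return.
samples = [
    "www.youtube.com/user/optimalkeyword",
    "www.youtube.com › user › someone",            # breadcrumb style
    "www.youtube.com/user/optimalkey...",          # trimmed URL
    "https://www.youtube.com/user/optimalkeyword/",  # duplicate channel
    "www.youtube.com/watch?v=abc",                 # not a channel page
]

print(channel_names(samples))
```

Using a full match instead of a search means a trimmed value ending in "..." is rejected outright rather than yielding a partial channel name.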