Scraping Google Results

wouldjaball · May 22, 2013

Hi all,

I'm trying to scrape the following search string: KEYWORD "comments" site:youtube.com/user

from Google search results, but I can't find a good attribute to scrape. I'd like to scrape 4 pages of results. Any ideas on what I can use as the element to scrape?

I'm trying to scrape the channel name from the search results: www.youtube.com/user/optimalkeyword

Thanks for your help!
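For the four pages, here is a minimal sketch in Python rather than UBot. It assumes Google's `start` query parameter is a zero-based result offset that advances in steps of 10; the helper name `build_page_urls` and the page size are illustrative assumptions, not something confirmed in this thread.

```python
from urllib.parse import urlencode

def build_page_urls(query, pages=4, per_page=10):
    """Return one search URL per results page (hypothetical helper)."""
    base = "https://www.google.com/search"
    urls = []
    for page in range(pages):
        # Assumption: `start` is the offset of the first result on the page.
        params = urlencode({"q": query, "start": page * per_page})
        urls.append(base + "?" + params)
    return urls

urls = build_page_urls('KEYWORD "comments" site:youtube.com/user')
for url in urls:
    print(url)
```

Each URL could then be loaded in turn and scraped with whatever selector ends up working.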

north_star · May 22, 2013

I think this should work:

<cite>*</cite>

And this is the node command:

add list to list(%URL scrapped, $scrape attribute(<outerhtml=w"<cite>*</cite>">, "innertext"), "Delete", "Global")

Edited May 22, 2013 by north_star

wouldjaball · May 22, 2013

I'm not sure how that would work, can you please explain?

north_star · May 22, 2013

Try this; it should help you understand it:

sample.ubot

wouldjaball · May 22, 2013

north_star, thanks for all of your help, but when I run that sample, I'm not seeing any results added to the list.

You can see here that the debugger isn't showing anything in the list: http://gyazo.com/84f6408d6f7cc4c3ef83f7b97b33b860

north_star · May 22, 2013

Don't change anything. Just use the code I gave you above, and leave the add list to list node as it is.

You asked me how to use it, but the main idea is still the same: use this command below.

add list to list(%URL scrapped, $scrape attribute(<outerhtml=w"<cite>*</cite>">, "innertext"), "Delete", "Global")

Edited May 22, 2013 by north_star

ChrisDH · May 25, 2013

Yeah, this is working great to scrape all of the URLs from the page, but it is picking up the paid ads as well. Do you have any ideas on how to scrape only the organic results?

It seems they were really spoiling us all that time when they used the class I.....

EDIT: I managed to get only the organic results with regex.

ewideweb · May 27, 2013

Care to share how you did that? I am not getting past scraping all the hrefs on the page, and I can't seem to select only the organic search results.

brusacco · October 2, 2013

Works for me ...

add list to list(%results, $find regular expression($document text, "(?<=<h3 class=\"r\"><a href=\")(.*?)(?=\" onmousedown)"), "Delete", "Global")
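To see what that pattern captures outside UBot, here is a minimal Python sketch of the same lookbehind/lookahead regex. The `sample_html` string is made up for illustration and only imitates the `<h3 class="r">` and `onmousedown` markup the pattern expects; the function name `extract_organic_urls` is hypothetical.

```python
import re

# Same pattern as the UBot command: keep only the href that sits between
# '<h3 class="r"><a href="' and '" onmousedown'.
ORGANIC_LINK = re.compile(r'(?<=<h3 class="r"><a href=")(.*?)(?=" onmousedown)')

# Made-up markup: one ad link without the h3 wrapper, two organic results.
sample_html = (
    '<div class="ads"><a href="http://ads.example.com/x">Ad</a></div>'
    '<h3 class="r"><a href="http://www.youtube.com/user/optimalkeyword"'
    ' onmousedown="return rwt(this)">First</a></h3>'
    '<h3 class="r"><a href="http://www.youtube.com/user/another"'
    ' onmousedown="return rwt(this)">Second</a></h3>'
)

def extract_organic_urls(html):
    """Return every href matched by the organic-result pattern."""
    return ORGANIC_LINK.findall(html)

print(extract_organic_urls(sample_html))
```

The ad link is skipped because it is not preceded by `<h3 class="r">`. The pattern depends entirely on that markup, so it would need adjusting if the page structure differs.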

brusacco · October 2, 2013

The

<cite>*</cite>

one sometimes returns breadcrumbs and all kinds of odd URLs (trimmed, etc.).
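If the `<cite>` scrape is used anyway, one way to drop those odd entries is to keep only strings that look like a complete channel URL. This is a rough Python sketch, not UBot code; the allowed character set for channel names and the sample strings are assumptions made for illustration.

```python
import re

# Assumption: channel names use only letters, digits, underscores and hyphens.
CHANNEL_URL = re.compile(
    r'^(?:https?://)?(?:www\.)?youtube\.com/user/([A-Za-z0-9_-]+)/?$'
)

def channel_names(scraped):
    """Return unique channel names from scraped cite text, in first-seen order."""
    names = []
    for text in scraped:
        match = CHANNEL_URL.match(text.strip())
        # Breadcrumb-style and trimmed entries fail the match and are skipped.
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names

# Made-up examples of what a cite scrape might return.
scraped = [
    "www.youtube.com/user/optimalkeyword",
    "www.youtube.com › user › optimalkeyword",
    "www.youtube.com/user/optimalkey...",
    "https://www.youtube.com/user/optimalkeyword/",
]
print(channel_names(scraped))
```

Trimmed entries are discarded rather than repaired, since the missing part of the name cannot be recovered from the text alone.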
