Hard Time Scraping Google Search Data

troyinmogi · December 18, 2010

Hi,

I'm fairly new here, so forgive me if I'm making some newbie mistakes...

Was having a hard time even getting the Google Search button to click, but after the Choose By Attribute I put a 2 second delay to allow Google time to display that "Press Enter to Search" message and that did the trick. Problem one solved...

Now, I'm trying to issue a "site:mydomain.com" command and scrape the results...

I want to #1 scrape the returned results count at the top just under the search box and #2 scrape the list of URLs (right now just trying to get it working for page #1, but ideally I want it to paginate through all results and scrape them all).

I can't get #1 or #2 to work...

I've followed the tutorial for Scrape_Chosen_Attribute with the Ezine example and even duplicated its success. I'm using that type of approach, but am having no luck whatsoever...

Any help would be greatly appreciated.

Also, when scraping just a single item of data, is there a way to scrape it straight to a variable? Or do you have to use a list even if it is a list of one?

Thanks

MiriamMB · December 18, 2010

Choose the results count, put a wildcard in place of the number and the seconds, and then set the scrape to a variable. If you want to watch the result in the UI, you can watch the variable with the UI stat monitor.

If you want to scrape to a variable, just set a variable with the set command and place the scrape chosen attribute or the page scrape within the set command under content.
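For anyone who wants to see the wildcard idea outside of UBot, here is a rough Python sketch. It is not UBot code. The wording "About * results (* seconds)" and the sample text are assumptions about what the page shows, and the function names are made up for illustration.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Turn a wildcard pattern (* matches anything) into a regex,
    with one capture group for each *."""
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return "(.*?)".join(parts)

def scrape_result_stats(page_text: str):
    """Return (result_count, seconds) as a single value, or None if not found."""
    # Assumed wording of the stats line; adjust to what the page really shows.
    regex = wildcard_to_regex("About * results (* seconds)")
    match = re.search(regex, page_text)
    if match is None:
        return None
    count = int(match.group(1).replace(",", ""))  # "1,230" -> 1230
    seconds = float(match.group(2))
    return count, seconds

# Hypothetical page text, only for demonstration.
sample = "Web Images About 1,230 results (0.21 seconds) Advanced search"
result_stats = scrape_result_stats(sample)
print(result_stats)
```

The scraped value lands directly in one variable (`result_stats`) instead of a list of one, which is the same idea as putting the scrape inside the set command.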

I have attached an image:

MiriamMB · December 18, 2010

As for scraping Google, I was having a hard time, so I tried a regular expression and it seems to work. You might want to read up on regular expressions a bit, though, so you know how to move things around in case you hit a block. Hopefully someone can show you an easier and simpler way to get the links you want. Here is my script:

regex scraping google.ubot

Abs* · December 21, 2010

Hi, I've attached a Google scraper. Not sure if it will help, but I hope it does.

One issue I'm facing, though, is that a few of my customers who also have a Google scraper are not able to scrape the SERPs with it.

Any ideas?

thanks

GoogleScraper.ubot

Guerrilla · December 21, 2010

To scrape the links on a Google SERP, select this outer HTML wildcard:

<A class=l *>*</A>

To scrape the page links to subsequent results, use this outer HTML wildcard:

<A class=fl href="/search?q=*>*</A>

PS. Set the advanced search options so 100 results display per page, and turn Google Instant off.
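If it helps to see what those two wildcards select, here is a minimal Python sketch of the same matching done with regular expressions. It is only an illustration, not UBot code: the sample HTML is invented, the function names are hypothetical, and it assumes the href attribute is quoted the way it appears in the second wildcard.

```python
import re
from html import unescape

# Rough regex equivalents of the two outer HTML wildcards.
# "class=l " (with the trailing space) will not match "class=fl".
RESULT_LINK = re.compile(
    r'<A class=l [^>]*?href="([^"]*)"[^>]*>(.*?)</A>',
    re.IGNORECASE | re.DOTALL,
)
PAGE_LINK = re.compile(
    r'<A class=fl href="(/search\?q=[^"]*)"[^>]*>(.*?)</A>',
    re.IGNORECASE | re.DOTALL,
)

def scrape_result_urls(outer_html: str) -> list:
    """Collect the href of every result link on the page."""
    return [unescape(m.group(1)) for m in RESULT_LINK.finditer(outer_html)]

def scrape_page_links(outer_html: str) -> list:
    """Collect the relative links to the other result pages."""
    return [unescape(m.group(1)) for m in PAGE_LINK.finditer(outer_html)]

# Invented sample markup, only for demonstration.
sample_html = (
    '<LI><A class=l href="http://mydomain.com/" onmousedown="x()">Home</A></LI>'
    '<LI><A class=l href="http://mydomain.com/about">About <EM>us</EM></A></LI>'
    '<A class=fl href="/search?q=site:mydomain.com&amp;start=100">2</A>'
)

print(scrape_result_urls(sample_html))
print(scrape_page_links(sample_html))
```

To paginate, the idea would be to scrape the result URLs on the current page, then visit each scraped page link in turn and repeat.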

Abs* · December 21, 2010

Hi, yes, I was using advanced search before, but many users were not turning off Google Instant, so I changed it.

The thing that baffles me is that I have used the same wildcard you noted in the example above, and it works great for me and many others. I was troubleshooting with a user in India today and for some reason it just wouldn't scrape the results.

Could you test the script and see if it works for you?

Thanks

abs

UBotBuddy · December 21, 2010

Just so you know (if you didn't) Google changes the result pages more than any other pages. This causes more bot failures for me than anything else. IMHO Google only believes in automation on their side of the fence and not ours.

"Do as I say and Not as I do!" Google's mantra.

Abs* · December 21, 2010

lol - I agree

Do you think that different IP ranges use different IDs, etc.?

I just can't figure out why it hasn't caused an issue for me for over 4 months and still continues to work like a charm, yet it's not working for a few users.

Also, Buddy, I have a quick question if you don't mind.

When I'm coding with IF commands, in the past I have been doing it like the following:

If (choose by attribute xxx) Then click selected

I noticed that after the upgrade some systems are not working well with this, but when I coded it like

If (choose by attribute xxx) Then (choose by attribute xxx) click selected

it solved the issue.

I have coded the majority of my bots according to the first example. I need to get an update out and I'm terrified that I will be hit with thousands of support queries.

Do you know if this has been noted before, or can I assume it was a glitch?

UBotBuddy · December 21, 2010

abs,

I cannot say, as I don't code the way you do. I typically search and then choose by attribute.

IF (SEARCH) THEN (Choose by attribute & Click) ELSE (Choose by attribute etc)

Abs* · December 21, 2010

Hmm, OK. Well, it was working like a charm, and still does for me, and it also meant I didn't have to duplicate the choose by attribute inside the IF.

The problem I have is that my bot is big and keeps crashing while I try to change it.

I mean, it's taken me about an hour just to remove the branding and add a splash page.

thanks

UBotBuddy · December 21, 2010

Wow!

That's a big bot if it takes that long.

Abs* · December 21, 2010

Well, it's not that it's taking long to save. It's just that every time I try to save, it crashes, and saving has become hit and miss now.

The bot is autobacklinkbomb.com

thanks

Guerrilla · December 21, 2010

If you can show me the HTML source code that doesn't scrape, I can see if I can work it out.

It should be as simple as setting up an if statement that detects what format the results page is in.
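As a rough illustration of that if statement, here is a Python sketch that checks which result format a page uses and then scrapes with the matching pattern. The layout names and both patterns are hypothetical stand-ins, not confirmed Google markup; the real ones would have to come from the HTML source that fails to scrape.

```python
import re

# Hypothetical layouts: each name maps to a pattern that captures a result URL.
LAYOUT_PATTERNS = {
    "class_l": re.compile(r'<a class=l [^>]*?href="([^"]+)"', re.IGNORECASE),
    "h3_r": re.compile(r'<h3 class="?r"?>\s*<a [^>]*?href="([^"]+)"', re.IGNORECASE),
}

def detect_layout(html: str):
    """Return the name of the first layout whose pattern is found, else None."""
    for name, pattern in LAYOUT_PATTERNS.items():
        if pattern.search(html):
            return name
    return None

def scrape_links(html: str):
    """Detect the layout, then scrape the links with that layout's pattern."""
    layout = detect_layout(html)
    if layout is None:
        return None, []
    return layout, LAYOUT_PATTERNS[layout].findall(html)

# Invented sample pages, only for demonstration.
page_a = '<A class=l href="http://a.example/">A</A>'
page_b = '<h3 class="r"><a href="http://b.example/">B</a></h3>'

print(scrape_links(page_a))
print(scrape_links(page_b))
```

Supporting another results format would then mean adding one more entry to the dictionary rather than changing the scraping logic.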

meter · December 21, 2010

I've coded a Google scraper over the past couple of days. I noticed there are 3 different kinds of SERP pages it returns, depending on which datacenter you query. Get all 3 down and your scraper should start working fine.

Abs* · December 24, 2010

Hi, any idea what the 3 different queries are, or how I can access the different datacenters to check?

The current code that I'm using is the following:

thanks

UBotBuddy · December 24, 2010

Also, as someone mentioned in another thread, I also agree that Google probably has other varying Datacenter code for their search engine rendering. That's probably why mimicking other people's versions is so difficult.

Guerrilla · December 24, 2010

You could try the "google global" Firefox plugin for getting different countries' results pages. That might help you see some variations.
