UBot Underground

Hard Time Scraping Google Search Data



Hi,

I'm fairly new here, so forgive me if I'm making some newbie mistakes...

I was having a hard time even getting the Google Search button to click, but after the Choose By Attribute I added a 2 second delay to give Google time to display that "Press Enter to Search" message, and that did the trick. Problem one solved...

Now I'm trying to issue a "site:mydomain.com" command and scrape the results...

I want to (#1) scrape the returned results count at the top, just under the search box, and (#2) scrape the list of URLs. Right now I'm just trying to get it working for page 1, but ideally I want it to paginate through all results and scrape them all.

I can't get #1 or #2 to work...

I've followed the tutorial for Scrape_Chosen_Attribute with the Ezine example and even duplicated its success. I'm using that type of approach, but am having no luck whatsoever...

Any help would be greatly appreciated.

Also, when scraping just a single item of data, is there a way to scrape it straight to a variable? Or do you have to use a list even if it is a list of one?

Thanks


Choose the results count line, put a wildcard in place of the number and the seconds, and then set the scrape to a variable. If you want to watch the result in the UI, you can watch the variable with the UI stat monitor.

To scrape to a variable, set a variable with the set command and place the scrape chosen attribute or the page scrape inside the set command, under content.
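
As a rough illustration of the same wildcard idea outside UBot, the Python sketch below pulls the count and the timing out of a results line and stores them straight in variables. The sample text, the pattern, and the `parse_result_stats` name are all assumptions made for this example; Google's actual wording and markup vary.

```python
import re

# Assumed shape of the stats line, e.g. "About 1,230 results (0.21 seconds)".
# The number and the seconds are the two "wildcard" parts.
STATS_PATTERN = re.compile(r"(\d[\d,.]*)\s+results?\s*\(([\d.,]+)\s+seconds\)")

def parse_result_stats(text):
    """Return (result_count, seconds), or None if no stats line is found."""
    match = STATS_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting the count.
    count = int(re.sub(r"[^\d]", "", match.group(1)))
    seconds = float(match.group(2).replace(",", "."))
    return count, seconds

# A single scraped item goes straight into plain variables, no list needed.
stats = parse_result_stats("About 1,230 results (0.21 seconds)")
print(stats)
```

If the pattern does not match, the function returns `None` rather than raising, which makes it easy to notice when the page layout has changed.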

 

I have attached an image: search results scrape.jpg


As for scraping Google, I was having a hard time too, so I tried a regular expression and it seems to work. You might want to read up on regular expressions a bit, though, so you know how to adjust things in case you hit a block. Hopefully someone can show you an easier and simpler way to get the links you want. Here is my script:

 

regex scraping google.ubot


Hi - I've attached a Google scraper. Not sure if it will help, but I hope it does.

One issue I'm facing, though, is that a few of my customers who also have a Google scraper are not able to scrape the SERPs with it.

Any idea why?

Thanks

GoogleScraper.ubot


To scrape the links on a Google SERP, select this outer HTML wildcard:

<A class=l *>*</A>

To scrape the page links to subsequent results, use this outer HTML wildcard:

<A class=fl href="/search?q=*>*</A>

PS. Set the advanced search options so 100 results display per page, and turn Google Instant off.
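
For anyone who wants to see what such an outer HTML wildcard does, here is a minimal Python sketch that treats each `*` as "any run of characters" and collects the `href` of every matching anchor. The helper names (`wildcard_to_regex`, `scrape_result_urls`) and the sample markup are made up for illustration; this is not how UBot implements wildcards internally.

```python
import re

def wildcard_to_regex(wildcard):
    """Turn a '*' wildcard into a regex where each '*' matches any characters."""
    parts = [re.escape(part) for part in wildcard.split("*")]
    return re.compile(".*?".join(parts), re.IGNORECASE | re.DOTALL)

# The result-link wildcard quoted above.
RESULT_LINK = wildcard_to_regex("<A class=l *>*</A>")
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)

def scrape_result_urls(html):
    """Return the href of every anchor whose outer HTML matches the wildcard."""
    urls = []
    for anchor in RESULT_LINK.finditer(html):
        href = HREF.search(anchor.group(0))
        if href:
            urls.append(href.group(1))
    return urls

# Made-up sample markup: two result links and one pagination link (class=fl).
sample = (
    '<A class=l href="http://mydomain.com/">Home</A>'
    '<A class=fl href="/search?q=site:mydomain.com&start=10">2</A>'
    '<A class=l href="http://mydomain.com/about">About</A>'
)
print(scrape_result_urls(sample))
```

The pagination anchor is skipped because its class is `fl`, not `l`; passing the second wildcard to `wildcard_to_regex` would pick up those page links instead.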


Hi - yes, I was using advanced search before, but many users were not turning off Google Instant, so I changed it.

The thing that baffles me is that I have used the same wildcard you noted in the example above, and it works great for me and many others. I was troubleshooting with a user in India today, and for some reason it just wouldn't scrape the results.

Could you test the script and see if it works for you?

Thanks

abs


Just so you know (if you didn't) Google changes the result pages more than any other pages. This causes more bot failures for me than anything else. IMHO Google only believes in automation on their side of the fence and not ours.

 

"Do as I say and Not as I do!" Google's mantra.


lol - I agree

Do you think that different IP ranges get different IDs etc.?

I just can't figure out why it hasn't caused an issue for me in over 4 months and still works like a charm, yet for a few users it's not working.

Also, Buddy - I have a quick question, if you don't mind.

When I'm coding with IF commands, in the past I have been doing the following:

If (choose by attribute xxx) Then click selected

I noticed that after the upgrade some systems are not working well with this, but when I coded it like

If (choose by attribute xxx) Then (choose by attribute xxx) click selected

it solved the issue.

I have coded the majority of my bots according to the first example. I need to get an update out, and I'm terrified that I will be hit with thousands of support queries.

Do you know if this has been noted before, or can I assume it was a glitch?


Hmm, OK - well, it was working like a charm, and still does for me, and it also let me avoid duplicating the if choose by attribute.

The problem I have is that my bot is big and keeps crashing while I try to change it.

I mean, it's taken me about an hour just to remove the branding and add a splash page.

Thanks


Well, it's not that it's taking long to save - it's just that every time I try to save, it crashes, and saving has become hit and miss now.

The bot is autobacklinkbomb.com

Thanks


I've coded a Google scraper over the past couple of days. I noticed there are 3 different kinds of SERP pages it returns, depending on which datacenter you query. Get all 3 down and your scraper should start working fine :)
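
One way to cope with several page layouts is to try a list of patterns in order and use the first one that finds links. The Python sketch below shows that fallback idea only. The three layouts are not described in this thread, so just the first pattern is based on a wildcard quoted here; the other two are hypothetical placeholders, as is the `scrape_any_layout` name.

```python
import re

# Candidate patterns, tried in order. Only the first mirrors the
# "<A class=l *>" wildcard from this thread; the rest are hypothetical.
CANDIDATE_PATTERNS = [
    re.compile(r'<a class=l [^>]*href="([^"]+)"', re.IGNORECASE),
    re.compile(r'<h3 class="?r"?><a [^>]*href="([^"]+)"', re.IGNORECASE),  # hypothetical
    re.compile(r'<a class="result" [^>]*href="([^"]+)"', re.IGNORECASE),  # hypothetical
]

def scrape_any_layout(html):
    """Return (pattern_index, urls) for the first pattern that finds links."""
    for index, pattern in enumerate(CANDIDATE_PATTERNS):
        urls = pattern.findall(html)
        if urls:
            return index, urls
    # No pattern matched: the layout is one we have not catalogued yet.
    return None, []
```

Logging the returned index per user would also show which layout a given machine is being served, which may help when a bot works for some users but not others.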


Hi - any idea what the 3 different queries are, or how I can access the different datacenters to check?

The current code that I'm using is the following:

<A class=l*</A>

Thanks


Also, as someone mentioned in another thread, I agree that Google probably has varying datacenter code for their search engine rendering. That's probably why mimicking other people's versions is so difficult.
