UBot Underground

Delay settings to query Google without getting blocked?



Hello Guys,

 

I made a bot that compiles some data for different domains, and I need to query Google for basic competitor data such as the number of competitors with the keyword in the URL, in the title, etc.

 

I'm using 10 private proxies, rotating to a new one for each query and clearing cookies before each query.

 

So basically I'm doing something like this (rough code sketch after the steps):

- take proxy 1

- clear cookies

- get allintitle for the keyword

- take proxy 2

- clear cookies

- get inurl for the keyword

- take proxy 3...etc

...until proxy 10

- then take proxy 1 again
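
Roughly, in Python (a minimal sketch, assuming the requests library and made-up placeholder proxy addresses, not my actual UBot script):

```python
import requests

# Placeholder list of 10 private proxies (host:port values are made up).
PROXIES = [f"http://127.0.0.1:{8000 + i}" for i in range(10)]

def google_search(query, proxy):
    # A fresh Session per query means no cookies carry over,
    # which stands in for the "clear cookies" step above.
    with requests.Session() as session:
        resp = session.get(
            "https://www.google.com/search",
            params={"q": query},
            proxies={"http": proxy, "https": proxy},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.text  # parse the result count out of this HTML

keyword = "example keyword"
queries = ['allintitle:"%s"' % keyword, "inurl:%s" % keyword] * 5
for i, query in enumerate(queries):
    proxy = PROXIES[i % len(PROXIES)]  # proxy 1, 2, ... 10, then 1 again
    html = google_search(query, proxy)
```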

 

However, I'm still getting blocked by Google after a minute or two. :angry:

 

What kind of delay do you use to avoid getting blocked?

 

Thanks a lot,

 

DjProg


That's the main secret of good Google scraping, so I don't think anybody who knows will share :)

I can say something that may not be super useful but that I can share: set a random delay between 40 seconds and 2 minutes, combined with rotation, and you shouldn't get blocked so fast.
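
In Python terms (just a sketch of the delay idea, using the standard random and time modules):

```python
import random
import time

# Sleep a random 40 s - 2 min between queries, then rotate to the next proxy.
time.sleep(random.uniform(40, 120))
```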


That's the main secret of good Google scraping, so I don't think anybody who knows will share :)

 

LOL!

 

Anyway, I saw a post about using HTTP GET instead of the UBot browser, which seems to be quite bad at hiding its footprint. I have a few ideas on how to do this, so I might try it if I find the time.

 

My theory (as I'm not selling bots, I don't mind sharing :rolleyes: ) is that I should be able to get almost everything I need using the open-source wget command-line program. :P
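
For example (an untested sketch: it shells out to wget, assuming it's installed and that urls.txt holds one search URL per line; the user-agent string is just a placeholder):

```python
import subprocess

# wget's --wait/--random-wait flags space the requests out with jitter.
subprocess.run(
    [
        "wget",
        "--user-agent=Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
        "--wait=5",
        "--random-wait",
        "-i", "urls.txt",
    ],
    check=True,
)
```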

 

Anyway, thanks for the tips. I'll give them a try if I don't get this wget thing to work.

 

Cheers,


Random delays work for me as well. I generally do the same using $rand, with 30 seconds on the low end and 60-120 seconds on the high end.
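
In Python terms (a sketch; the point is that the upper bound itself is randomized, not just the delay):

```python
import random
import time

high = random.uniform(60, 120)        # high end varies between 60 and 120 s
time.sleep(random.uniform(30, high))  # low end fixed at 30 s
```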

 

Google, I think, is missing a prime opportunity to make money: give the IM industry an API for performing searches without the nasty stamping they're currently doing. I would pay for it.


Google and Blogger (owned by Google) are increasingly using user-agent checks, among other things. UBot REALLY needs a way of changing the user agent.
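
For comparison, with plain HTTP it's trivial (a sketch using Python's requests; the user-agent string is a placeholder):

```python
import requests

# Send the query with a spoofed User-Agent header.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"}
resp = requests.get("https://www.google.com/search",
                    params={"q": "example"},
                    headers=headers, timeout=30)
```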

 

Andy


I use 5-second delays when using the Hrefer software, and that works well running 200 or so threads with a large proxy list.

 

I haven't scraped Google en masse with UBot yet, but I would guess about 5 seconds between queries is a good rule of thumb.

 

A "randomize user agent variables between proxies" option in the option screen would be ideal if you are running lots of instances of the bot.


For those who are having problems but whose bots don't specifically require Google: Bing has an equally good advanced search. For example, if you wanted to get all the dog grooming articles from EzineArticles via Bing, you would just search for "dog grooming site:ezinearticles.com".
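
Building that query programmatically is straightforward (a sketch using Python's standard urllib):

```python
from urllib.parse import urlencode

# Bing's site: operator scopes the search to one domain.
url = "https://www.bing.com/search?" + urlencode(
    {"q": "dog grooming site:ezinearticles.com"}
)
print(url)  # https://www.bing.com/search?q=dog+grooming+site%3Aezinearticles.com
```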

