DjProg Posted January 4, 2011

Hello guys,

I made a bot that compiles data for different domains, and I need to query Google for some basic competitor data, such as the number of competitors with the keyword in the URL, in the title, etc. I'm using 10 private proxies, rotating to the next one for each query and clearing cookies before each query. Basically I'm doing something like this:

- take proxy 1
- clear cookies
- get the allintitle: count for the keyword
- take proxy 2
- clear cookies
- get the inurl: count for the keyword
- take proxy 3
- ...and so on until proxy 10, then back to proxy 1

However, I'm still getting blocked by Google after a minute or two. What kind of delay do you use to avoid getting blocked?

Thanks a lot,
DjProg
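A rough Python sketch of the rotation loop described above (the proxy addresses are placeholders and the actual request is stubbed out; only the round-robin and cookie-reset logic is shown):

```python
import itertools

# Hypothetical list of 10 private proxies; substitute your own.
PROXIES = ["p%d.example.com:8080" % i for i in range(1, 11)]

proxy_pool = itertools.cycle(PROXIES)  # wraps back to proxy 1 after proxy 10

def run_query(keyword, operator):
    """One query step: take the next proxy and start with a fresh cookie jar."""
    proxy = next(proxy_pool)
    cookies = {}  # empty dict = cleared cookies for this request
    # ...send the "%s:%s" % (operator, keyword) query through `proxy` here...
    return proxy
```

`itertools.cycle` handles the "until proxy 10, then back to proxy 1" wrap-around automatically, so the calling loop never needs to track an index.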
iglow Posted January 4, 2011

That's the main secret of good Google scraping, so I don't think anybody who knows will share it. I can say something that may not be super useful but that I can share: set a random delay between 40 seconds and 2 minutes, combined with the rotation, and you shouldn't get blocked so fast.
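iglow's random 40-second-to-2-minute delay is easy to do with Python's standard library; a minimal helper (the function name is mine):

```python
import random
import time

def random_delay(low=40.0, high=120.0):
    """Sleep a random interval between `low` and `high` seconds between queries."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```

Calling `random_delay()` after each proxy rotation step gives an irregular query rhythm instead of a fixed, machine-like interval.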
DjProg (Author) Posted January 4, 2011

Quoting iglow: "That's the main secret of good Google scraping, so I don't think anybody who knows will share it."

LOL! Anyway, I saw a post about using HTTP GET instead of the UBot browser, which seems to be quite bad at hiding its footprints. I have a few ideas on how to do this, so I might try it if I find the time. My theory (since I'm not selling bots, I don't mind sharing) is that I should be able to get almost everything I need using the open-source WGET command-line program. Anyway, thanks for the tips; I'll give the delays a try if I don't get this WGET thing to work.

Cheers,
UBotBuddy Posted January 4, 2011

Random delays work for me as well. I generally do the same using $rand, with 30 seconds on the low end and 60-120 seconds on the high end. Google, I think, is missing a prime opportunity to make money: give the IM industry an API for performing searches without the nasty stamping they are currently doing. I would pay for it.
meter Posted January 4, 2011

Hey DjProg, WGET works great; just remember to spoof your user agent correctly, otherwise Google will block you outright.

-meter
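With WGET itself, the flag for this is `--user-agent` (short form `-U`). The same idea in Python's standard library looks like this (the UA string is just an example; any current browser's User-Agent value works):

```python
import urllib.request

# Example browser User-Agent string; copy a real one from your own browser.
UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.10 "
      "(KHTML, like Gecko) Chrome/8.0.552.237 Safari/534.10")

def fetch(url):
    """Fetch a page with a spoofed User-Agent; the equivalent of: wget -U "<UA>" <url>"""
    req = urllib.request.Request(url, headers={"User-Agent": UA})
    return urllib.request.urlopen(req).read()
```

Without the header, urllib announces itself as `Python-urllib/x.y`, which is exactly the kind of footprint meter is warning about.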
Net66 Posted January 4, 2011

Google and Blogger (owned by Google) are increasingly using user-agent checks, among other things. UBot REALLY needs a way of changing the user agent.

Andy
Guerrilla Posted January 5, 2011

I use 5-second delays with the Hrefer software, and that works well running 200 or so threads with a large proxy list. I haven't scraped Google en masse with UBot yet, but I would guess about 5 seconds between queries is a good rule of thumb. A "randomize user agent between proxies" option in the options screen would be ideal if you are running lots of instances of the bot.
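Until such an option exists, Guerrilla's "randomize user agent between proxies" idea can be sketched outside UBot. The UA strings below are illustrative examples, and the mapping function is my own: it pins each proxy to one User-Agent so a given IP always looks like the same browser.

```python
import zlib

# Illustrative User-Agent strings for a few different browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20110101 Firefox/4.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.10 "
    "(KHTML, like Gecko) Chrome/8.0.552.237 Safari/534.10",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.7.62 Version/11.00",
]

def user_agent_for(proxy):
    """Map each proxy to a fixed User-Agent via a stable checksum of its address."""
    return USER_AGENTS[zlib.crc32(proxy.encode()) % len(USER_AGENTS)]
```

Using a checksum instead of `random.choice` means the pairing survives restarts, so a proxy does not appear to switch browsers between sessions.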
Net66 Posted January 5, 2011

For those having problems whose bots do not specifically require Google: Bing has an equally good advanced search. For example, if you want all the dog grooming articles from EzineArticles via Bing, you just search for "dog grooming site:ezinearticles.com".
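That `site:` query translates directly into a Bing search URL; a small helper to build one (the function name is mine):

```python
import urllib.parse

def bing_site_search_url(keyword, site):
    """Build a Bing search URL combining a keyword with the site: operator."""
    query = "%s site:%s" % (keyword, site)
    return "http://www.bing.com/search?" + urllib.parse.urlencode({"q": query})
```

For Net66's example, `bing_site_search_url("dog grooming", "ezinearticles.com")` produces a URL whose query string carries `dog grooming site:ezinearticles.com` in encoded form.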