allcapone1912 Posted April 27, 2016

I am working on a script that will scrape info from whitepages.com. Currently my script works with ExBrowser, but it is very slow (I use proxies). I tried using the HTTP GET function to fetch the info instead, but that usually gets the IP blocked after 2-3 requests (with the browser, the IP is only blocked after about 200). My code:

plugin command("HTTP post.dll", "http set headers", "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
plugin command("HTTP post.dll", "http set headers", "Accept-Language", "en-us,en;q=0.5")
set(#yp, $plugin function("HTTP post.dll", "$http get", "http://www.whitepages.com/phone/1-541-942-8510", $plugin function("HTTP post.dll", "$http useragent string", "Random"), "http://www.whitepages.com/", #proxy, 5), "Global")

So, am I doing something wrong? Should I add more HTTP headers?
Bot-Factory Posted April 27, 2016

Hello. Please take a look at the ExBrowser function $ExBrowser Load HTML Page. It will load the page via HTTP GET directly as well.

The problem with direct HTTP GET requests is that no JavaScript is executed, and websites sometimes use JavaScript to set cookies. If you then make a direct HTTP request without a real browser, those cookies are missing, because the JavaScript code was never run.

So here's what you can do with ExBrowser: open Chrome or Firefox and navigate to the page. Do one search there. Now all cookies are created and properly loaded. Then use the $ExBrowser Load HTML Page function to pull more data. That command takes over all the cookies from the browser session. "Load HTML Page" still just executes an HTTP GET request, but because you loaded the site in a real browser first, all the necessary cookies are already there and can be reused for the direct request. This helps if the website uses JavaScript and cookies to detect bots. It will also automatically use the same user agent as your browser.

You still might have to slow down your bot.
If you execute too many requests in a short period of time, that will get you blocked as well.

Let me know if this helps.

Cheers
Dan