UBot Underground

Recommended Posts

I am working on a script that will scrape info from whitepages.com.

Currently my script works with ExBrowser, but it's very slow (I use proxies).

 

I've tried using the HTTP GET function to get the info, but it usually blocks the IP after 2-3 requests (if I use the browser, the IP gets blocked only after 200 requests).

 

My code:

plugin command("HTTP post.dll", "http set headers", "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
plugin command("HTTP post.dll", "http set headers", "Accept-Language", "en-us,en;q=0.5")
set(#yp,$plugin function("HTTP post.dll", "$http get", "http://www.whitepages.com/phone/1-541-942-8510", $plugin function("HTTP post.dll", "$http useragent string", "Random"), "http://www.whitepages.com/", #proxy, 5),"Global")

So, am I doing something wrong?

Should I add more HTTP headers?


Hello.

 

Please take a look at the ExBrowser function:

$ExBrowser Load HTML Page

 

It will load the page via a direct HTTP GET as well.

 

The problem with direct HTTP GET requests is that no JavaScript is executed. Websites sometimes use JavaScript to set cookies, so if you then execute a direct HTTP request without a real browser, those cookies will be missing.

 

So here's what you can do with ExBrowser.

 

Open Chrome or Firefox and navigate to the page. Do one search there. Now all the cookies are created and properly loaded.

 

Now try to use the $ExBrowser Load HTML Page function to pull more data.

That function takes over all the cookies from the browser session. "Load HTML Page" will still just execute an HTTP GET request, but because you loaded the site in a real browser first, all the necessary cookies are already there and can be used for the HTTP GET request as well.
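
Here's a minimal sketch of that flow in UBot script. Note this is an assumption-heavy example: the DLL name and the exact parameter lists for the Launcher and Navigate commands may differ in your plugin version, and the URLs are just the ones from your post:

plugin command("ExBrowser.dll", "ExBrowser Launcher", "Chrome", "", "")
plugin command("ExBrowser.dll", "ExBrowser Navigate", "http://www.whitepages.com/")
comment("Do one search in the browser here so the site's JavaScript sets its cookies")
comment("Then pull further pages via HTTP GET, reusing the browser session's cookies and useragent")
set(#yp, $plugin function("ExBrowser.dll", "$ExBrowser Load HTML Page", "http://www.whitepages.com/phone/1-541-942-8510"), "Global")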

 

This helps IF the website uses JavaScript and cookies to detect bots.

It will also automatically use the same useragent as your browser.

 

You still might have to slow down your bot. If you execute too many requests in a short period of time, that will get you blocked as well.
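
As a rough illustration, you can add a random delay between requests; the 10-30 second range here is just a guess, so tune it to what the site tolerates:

comment("Wait a random 10-30 seconds between requests to stay under the rate limit")
wait($rand(10, 30))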

 

Let me know if this helps.

 

Cheers

Dan
