UBot Underground

Scrape Emails With Http Post Plugin



I just bought the HTTP POST plugin and am trying to update my old script, which I used for scraping emails.

In the old script everything worked fine, but with HTTP POST there are some problems.

One of them is emails that are hidden by JavaScript:

add item to list(%http second url,$plugin function("HTTP post.dll", "$http get", "http://www.jc-design.com/contact-us.html", $plugin function("HTTP post.dll", "$http useragent string", "Random"), "http://google.com", "", 10),"Delete","Global")
set(#http second url,%http second url,"Global")
add list to list(%emails,$find regular expression(#http second url,"(?i)\\b[!#$%&\'*+./0-9=?_`a-z\{|\}~^-]+@[.0-9a-z-]+\\.[a-z]\{2,6\}\\b"),"Delete","Global")

For this URL, HTTP POST doesn't scrape the email.

Can someone give me an idea how to scrape it?

I am not an expert in JavaScript...


Just an FYI, there's no need to add it to a list; you can put the GET request straight into the variable.

set(#http second url,$plugin function("HTTP post.dll", "$http get", "http://www.jc-design.com/contact-us.html", $plugin function("HTTP post.dll", "$http useragent string", "Random"), "http://google.com", "", 10),"Global")
add list to list(%emails,$find regular expression(#http second url,"(?i)\\b[!#$%&\'*+./0-9=?_`a-z\{|\}~^-]+@[.0-9a-z-]+\\.[a-z]\{2,6\}\\b"),"Delete","Global")

As for the JS part, it appears that JavaScript is building the line with the email address. I'm not sure why, but you can see this line in the response to the GET request:

<br /><script type="text/javascript">insertEmailAddress('','webinfo','jc-design.com','');</script>

Now, if you had to scrape many pages with this same kind of thing, it would be possible to scrape out that info, because the script just joins webinfo and jc-design.com with an @ symbol.

 

Give this a shot:

clear list(%emails)
set(#http second url,$plugin function("HTTP post.dll", "$http get", "http://www.jc-design.com/contact-us.html", $plugin function("HTTP post.dll", "$http useragent string", "Random"), "http://google.com", "", 10),"Global")
set(#email_line,$plugin function("HTTP post.dll", "$xpath parser", #http second url, "//div[@id=\'main-wrapper\']/div/script", "InnerHtml", "HTML"),"Global")
set(#email_line_cleanup,$replace($replace(#email_line,"insertEmailAddress(\'\',",$nothing),",\'\');",$nothing),"Global")
clear list(%split_email)
add list to list(%split_email,$list from text($replace(#email_line_cleanup,"\'",$nothing),","),"Delete","Global")
add item to list(%emails,"{$list item(%split_email,0)}@{$list item(%split_email,1)}","Don\'t Delete","Global")
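
If many pages use the same trick, the logic can be generalised so it does not depend on one page's XPath. Here is a rough sketch of the idea in Python rather than UBot script, just to show the pattern matching. It assumes every page calls insertEmailAddress with the user name as the second argument and the domain as the third, which is only a guess based on the single line shown above. The names extract_emails, PLAIN_EMAIL and JS_EMAIL are made up for this example.

```python
import re

# Ordinary address pattern, roughly the same regex as in the scripts above.
PLAIN_EMAIL = re.compile(
    r"(?i)\b[!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,6}\b"
)

# Assumption: calls look like insertEmailAddress('', 'user', 'domain', ...),
# with the user in the 2nd argument and the domain in the 3rd.
JS_EMAIL = re.compile(
    r"insertEmailAddress\(\s*'[^']*'\s*,\s*'([^']+)'\s*,\s*'([^']+)'"
)


def extract_emails(html):
    """Return plain and JavaScript-assembled addresses found in html."""
    found = PLAIN_EMAIL.findall(html)
    # Rebuild each hidden address by joining user and domain with "@".
    found += [f"{user}@{domain}" for user, domain in JS_EMAIL.findall(html)]
    # Drop duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(found))


sample = (
    '<br /><script type="text/javascript">'
    "insertEmailAddress('','webinfo','jc-design.com','');</script>"
)
print(extract_emails(sample))  # ['webinfo@jc-design.com']
```

The plain regex alone finds nothing in that sample because the page source never contains an @ sign; the second pattern is what recovers the address. The same two-pattern idea should carry over to a $find regular expression call in UBot, but I have not tested that.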

 


Thanks for your reply and for both pieces of advice.

I just bought the HTTP plugin and am not totally familiar with it yet, so I used a useless add-to-list step by mistake.

Also, your code for scraping the email is great. I just hope there will be many URLs like this one so I can get all the possible emails.
