UBot Underground

how to scrape an email from a page



Hello,

I want to know how to scrape an email from a page.

I watched tutorial 11 and still don't understand.

For example, I have a list of URLs. Each URL is visited, and now I need to scrape the email inside the text.

 

example:

 

name: john smith

 

 

 

I tried using the page scrape function, and this is what it scraped manually:

 

 

'<DIV style="LINE-HEIGHT: 19px; MARGIN-LEFT: 20px"><B>john smith</B><BR>21 years, canada. soc number 1013606283<BR>birth: 15.08.1989<BR>canada canada<BR>phone. (1) 123456789 / (1) 3112048446<BR><A href="?v=b&cs=wh&to=johnsmith5@hotmail.com" target=_blank>'

 

 

How do I scrape only the email from all those visited URLs?
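
One way to pull only the address out of scraped HTML like the snippet above is a regular expression. The following is a minimal Python sketch, not UBot code. The names `EMAIL_RE` and `extract_emails` are made up for illustration, and the pattern is a deliberately simple assumption that will not cover every valid address.

```python
import re

# Illustrative pattern (an assumption, not a full address validator):
# local part, "@", one or more domain labels, then a TLD of 2+ letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def extract_emails(html):
    """Return the unique email-like strings in html, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(html):
        if match not in seen:
            seen.append(match)
    return seen

# The HTML that the page scrape returned in the post above.
scraped = (
    '<DIV style="LINE-HEIGHT: 19px; MARGIN-LEFT: 20px"><B>john smith</B><BR>'
    '21 years, canada. soc number 1013606283<BR>birth: 15.08.1989<BR>'
    'canada canada<BR>phone. (1) 123456789 / (1) 3112048446<BR>'
    '<A href="?v=b&cs=wh&to=johnsmith5@hotmail.com" target=_blank>'
)

print(extract_emails(scraped))  # ['johnsmith5@hotmail.com']
```

Calling `extract_emails` once for the HTML of each visited page, and collecting the results, would cover the whole URL list.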


I've got Standard.

What is a regex?

What is this?

 

[a-zA-Z0-9\._\-]{3,}(@|AT|\s(at|AT)\s|\s*[\[\(\{]\s*(at|AT)\s*[\]\}\)]\s*)[a-zA-Z]{3,}(\.|DOT|\s(dot|DOT)\s|\s*[\[\(\{]\s*(dot|DOT)\s*[\]\}\)]\s*)[a-zA-Z]{2,}((\.|DOT|\s(dot|DOT)\s|\s*[\[\(\{]\s*(dot|DOT)\s*[\]\}\)]\s*)[a-zA-Z]{2,})?$

 

 

There's no tutorial about regular expressions, so I don't know how to use them.

OK, so you mean that...

 

[a-zA-Z0-9\._\-]{3,}(@|AT|\s(at|AT)\s|\s*[\[\(\{]\s*(at|AT)\s*[\]\}\)]\s*)[a-zA-Z]{3,}(\.|DOT|\s(dot|DOT)\s|\s*[\[\(\{]\s*(dot|DOT)\s*[\]\}\)]\s*)[a-zA-Z]{2,}((\.|DOT|\s(dot|DOT)\s|\s*[\[\(\{]\s*(dot|DOT)\s*[\]\}\)]\s*)[a-zA-Z]{2,})?$

 

With that search string, you can actually scrape any email in the body content of any page visited?


Regex is basically a way of matching (or searching) text that can cover any form that text can take. Here's Wikipedia's definition:

 

http://en.wikipedia.org/wiki/Regex

 

If that sounds complicated, don't worry: most people find it complicated at first, and you get the hang of it with practice. Think of it like this: when you use search in Notepad, you might look for the word "cow", and it would find "cow" in the sentence "The big brown cow chewed happily on some grass".

 

Now let's say that instead of finding the exact word "cow", you want to find text that takes a particular form. For example, you want to find all the words that begin with "c" and have "o" as the second letter. A regular expression (or regex, same meaning) lets you do this. You can write a regex that matches "cow" but does not match "chewed" or any other word not starting with "co". Or let's say you want to find all text that is an email: the regex quoted above is one way to do that.
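
The "cow" example can be tried in a few lines of Python. This is only a sketch. The pattern `\bco\w*` is one possible way to express "a word starting with co", not the only one.

```python
import re

sentence = "The big brown cow chewed happily on some grass"

# \b  = word boundary, so the match must start at the beginning of a word
# co  = the literal letters "c" then "o"
# \w* = any further word characters (letters, digits, underscore)
words_starting_with_co = re.findall(r"\bco\w*", sentence)

print(words_starting_with_co)  # ['cow']
```

"chewed" is skipped because its second letter is "h", so the literal "co" never matches at the start of that word.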

 

Here are two sites that have been very helpful to me in learning regex:

 

http://www.regular-expressions.info/ - this has quick lessons and more detailed explanations of everything.

 

http://rubular.com/ - this is where you can test out regex - you enter what you want to match and it shows you what is matched as you write your regular expression.

 

I read the quickstart at the first site, then went to the second when I needed to use regex, and came up with what I needed. Actually writing regex is the best way to learn it, I found.



That's correct. I provided that regex string for you because it is as close to a universal string as you will find. Try it on as many pages as you like, and you should have no problems scraping emails. Enjoy.

 

John

 

 



BTW, love how your regex scrapes email [AT] server [DOT] com and other ways of hiding emails JohnB :D



Thanks! :) It tries to cover them all!
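
To illustrate the hidden forms mentioned above, such as `email [AT] server [DOT] com`, the Python sketch below first rewrites bracketed AT/DOT tokens into `@` and `.`, then applies a plain email pattern. It is not the regex from this thread. The functions `deobfuscate` and `find_emails` and all three patterns are assumptions made for illustration. Unbracketed forms like `name at server dot com` are left out on purpose, since rewriting every " at " in ordinary text would likely produce false matches.

```python
import re

# Bracketed tokens such as "[AT]", "(at)", "{AT}", with optional spaces around.
AT_RE = re.compile(r"\s*[\[({]\s*(?:at|AT)\s*[\])}]\s*")
DOT_RE = re.compile(r"\s*[\[({]\s*(?:dot|DOT)\s*[\])}]\s*")

# Plain address: local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def deobfuscate(text):
    """Replace bracketed AT/DOT tokens with '@' and '.'."""
    text = AT_RE.sub("@", text)
    text = DOT_RE.sub(".", text)
    return text

def find_emails(text):
    """Return email-like strings found after de-obfuscating text."""
    return EMAIL_RE.findall(deobfuscate(text))

print(find_emails("write to email [AT] server [DOT] com today"))
# ['email@server.com']
```

Normalizing first keeps the final pattern short, at the cost of a second pass over the text.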

 

 

 

  • 1 year later...

Is there a UBot 4 version of this? I tried using find regular expression with the regex, but I'm not sure what the text is supposed to be. Should it be the document, or should I scrape the page? It doesn't seem to grab it for some reason.

  • 1 month later...
  • 2 years later...

Does anyone have a new regex for scraping all types of emails from a website?

 

The code provided is not working, and I can't open the scrape emails.ubot file because it is not a valid 4.0 file.

 

Please help.

  • 1 month later...

Bump lol

 

The regexes I've found everywhere aren't working for emails like this: Braham.Candice-GranpapaEnterprises@email.com

 

Probably because of the uppercase letters, dots, and hyphen. If someone could make a regex for this, I'd be extremely grateful. I didn't know anything about regex as of this morning, and I've been battling with it all day.


OK, it looks like that email might be too hard to get. How about a regex to scrape emails in this format:

 

Firstname.Companyname@randomdomain.com

Braham.GranpapaEnterprises@email.com

 

Examples are given above. Kindly note the uppercase letters.
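
Addresses like these have uppercase letters, dots, and sometimes a hyphen before the `@`. A pattern whose character class allows all of those will match them. The Python sketch below is an illustration under assumptions, not a UBot script: `EMAIL_RE` is a hypothetical pattern that accepts letters of either case, digits, and `._%+-` in the local part. The regex posted earlier in this thread already allows those characters too, so its trailing `$` may be the more likely reason for misses, since `$` only permits a match that ends at the end of the text (or of a line, in multiline mode).

```python
import re

# Hypothetical pattern: mixed-case local part with dots and hyphens,
# "@", one or more domain labels, then a TLD of 2+ letters.
# There is no "$" at the end, so matches can occur anywhere in the text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

page_text = """
Contact: Braham.Candice-GranpapaEnterprises@email.com (sales)
Support: Braham.GranpapaEnterprises@email.com
"""

found = EMAIL_RE.findall(page_text)
for address in found:
    print(address)
# Braham.Candice-GranpapaEnterprises@email.com
# Braham.GranpapaEnterprises@email.com
```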
