UBot Underground

Regex to scrape email from page



As you can see below, I needed a regex for phone numbers and I got the code, but I learned nothing.

 

Because now I want emails and I'm facing the same problem:

http://rubular.com/r/bMk074667i

In this link you can see that the pattern clearly matches the emails in that generated source.

But when I use it in UBot with this code:

    set(#rmail, $find regular expression($scrape attribute(<tagname="html">, "innertext"), "[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"), "Global")
 

I get nothing inside UBot!

 

What is going on? Is it a bug or what?


The regex looks OK. Are you sure the email is displayed as text? I think it appears only in the page's code, which is why you don't get it via innertext.

 

Also, having the page you are working on, or at least the HTML, would help to answer your question...


Actually, it doesn't matter which HTML attribute you scrape (innerhtml or outerhtml); it works here without any problems and returns 3 emails:

    set(#rmail, $find regular expression($scrape attribute(<tagname="html">, "innerhtml"), "[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"), "Global")

...so your first code didn't work because the emails are not displayed as text.
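
To illustrate the difference outside UBot, here is a rough Python sketch. It assumes a made-up page where the address exists only inside a `mailto:` link (the actual page from the thread was not posted), and it uses a small `HTMLParser` subclass as a stand-in for what innertext roughly returns. The pattern is the same one as above, written with single backslashes.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the thread, without UBot's doubled backslashes.
EMAIL_RE = re.compile(r"[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}")

# Hypothetical page: the address appears only inside an href attribute.
SAMPLE_HTML = (
    '<html><body><a href="mailto:info@example.com">Contact us</a></body></html>'
)


class TextOnly(HTMLParser):
    """Collects only the text between tags (an approximation of innertext)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the text content of the markup, with tags and attributes dropped."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Searching the visible text finds nothing; searching the markup finds the address.
emails_in_text = EMAIL_RE.findall(visible_text(SAMPLE_HTML))
emails_in_markup = EMAIL_RE.findall(SAMPLE_HTML)
```

If the address were also written out as link text, both searches would find it, so whether innertext is enough depends on how the page is built.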


Yeah, it looks like it works. So it is best to scrape outerhtml, and it should grab all emails. Now the tricky part:

How do I scrape all types of emails? You know, some people type AT instead of @, and some write (.)com instead of .com.

 

Do I search for one regex that grabs them all, or should I scrape each type of email individually?



It doesn't really matter, as long as you get what you want. The only difference is that a single regex command could execute faster, but it would also be harder to read and maintain...
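
One possible middle ground, sketched here in Python rather than UBot: rewrite the obfuscated forms into plain ones first, then run the single email pattern from earlier in the thread. The rule list and the `find_emails` helper are made up for illustration and cover only a few styles (AT, (at), [at], DOT, (dot), (.)); real pages will need more rules.

```python
import re

# Same email pattern as earlier in the thread, with single backslashes.
EMAIL_RE = re.compile(r"[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}")

# Each rule rewrites one obfuscation style into its plain form.
# These rules are illustrative assumptions, not a complete list.
NORMALIZERS = [
    # "(at)" or "[at]", any case, with optional spaces around it
    (re.compile(r"\s*[\(\[]\s*at\s*[\)\]]\s*", re.IGNORECASE), "@"),
    # uppercase "AT" standing alone between spaces
    (re.compile(r"\s+AT\s+"), "@"),
    # "(dot)", "[dot]", "(.)" or "[.]", with optional spaces around it
    (re.compile(r"\s*[\(\[]\s*(?:dot|\.)\s*[\)\]]\s*", re.IGNORECASE), "."),
    # uppercase "DOT" standing alone between spaces
    (re.compile(r"\s+DOT\s+"), "."),
]


def find_emails(text):
    """Normalize known obfuscations, then apply the one email pattern."""
    for pattern, replacement in NORMALIZERS:
        text = pattern.sub(replacement, text)
    return EMAIL_RE.findall(text)


found = find_emails("write to john AT example(.)com or jane@example.org")
```

Keeping each style as its own small rule is easier to read than one giant regex, at the cost of a few extra passes over the text. Be aware that the uppercase AT rule can produce false positives on ordinary sentences (for example "MEET AT NOON"), so the results are worth a sanity check.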
