Regex to scrape email from page

mamica · November 7, 2014

As you can see below, I needed a regex for phone numbers and I got the code, but I learned nothing.

Because now I want emails and I'm facing the same problem:

In this link you can see that this code clearly matches the emails in the generated source.

But when I use it in UBot like this:

    set(#rmail, $find regular expression($scrape attribute(<tagname="html">, "innertext"), "[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"), "Global")

I get nothing inside UBot!

What is going on? Is it a bug or what?

UBotDev · November 7, 2014

The regex looks OK. Are you sure the email is displayed as text? I think it's only part of the page's code, which is why you don't get it via innertext.

Also, having the page you are working on, or at least its HTML, would help to answer your question...

mamica · November 8, 2014

From this url: http://www.bespokehotels.com/contact

I have tried scraping innerhtml and outerhtml, but it's not working. Is there a way to check all forms? Sometimes emails are not in the HTML code and are just plain text.

UBotDev · November 8, 2014

Actually it doesn't matter which one you scrape (inner or outer); it works here without any problems (returns 3 emails):

    set(#rmail, $find regular expression($scrape attribute(<tagname="html">, "innerhtml"), "[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"), "Global")

...so your first code didn't work because emails are not displayed as text.
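To illustrate the innertext vs. innerhtml difference outside UBot, here is a rough Python sketch. It is not UBot code: the sample HTML, the `TextOnly` helper and the `visible_text` function are made up for the example, and the page in question may be structured differently. The regex is the same one as above, just without the UBot escaping.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the thread, written as a plain Python regex.
EMAIL_RE = re.compile(r"[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}")

# Hypothetical page fragment: one address sits only in an href attribute,
# the other is visible text.
SAMPLE_HTML = """
<p>Contact: <a href="mailto:info@example.com">Email us</a></p>
<p>Or write to sales@example.com</p>
"""


class TextOnly(HTMLParser):
    """Collects only text nodes, roughly what 'innertext' would give."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Searching the text misses the address hidden in the mailto: link.
emails_from_text = EMAIL_RE.findall(visible_text(SAMPLE_HTML))

# Searching the raw HTML sees both.
emails_from_html = EMAIL_RE.findall(SAMPLE_HTML)

print(emails_from_text)
print(emails_from_html)
```

So if an address only lives inside an attribute such as a mailto: link, searching the text will never find it, while searching the HTML source will.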

mamica · November 8, 2014

Yeah, it looks like it works. So it is best to scrape outerhtml, and it should grab all emails. Now the tricky part:

How do I scrape all types of emails? You know, some people type AT instead of @, and some write (.)com instead of .com.

Do I search for one regex that grabs them all, or should I scrape each type of email individually?

UBotDev · November 8, 2014

It actually doesn't matter, as long as you get what you want. The only difference is that a single regex command could execute faster, but it would also be harder to read and maintain.
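One readable middle ground, sketched here in Python rather than UBot, is to first rewrite the obfuscated forms back to `@` and `.` with a few small patterns, then run the one email regex on the result. The patterns `AT_RE` and `DOT_RE` and the helper `find_emails` are assumptions made up for this example and only cover the variants mentioned above plus bracketed `[at]`/`[dot]`. The bare `AT` rule in particular can produce false positives on ordinary upper-case text, so treat it as a starting point.

```python
import re

# Same email pattern as earlier in the thread, as a plain Python regex.
EMAIL_RE = re.compile(r"[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}")

# Assumed obfuscation variants: [at], (at) in any case, or a bare upper-case AT.
AT_RE = re.compile(r"\s*(?:(?i:\[at\]|\(at\))|\bAT\b)\s*")

# Assumed variants for the dot: [dot], (dot) in any case, or (.)
DOT_RE = re.compile(r"\s*(?:(?i:\[dot\]|\(dot\))|\(\.\))\s*")


def find_emails(text):
    """Normalize obfuscated separators, then match with the single regex."""
    normalized = AT_RE.sub("@", text)
    normalized = DOT_RE.sub(".", normalized)
    return EMAIL_RE.findall(normalized)


sample = (
    "Write to john AT example.com, jane@example(.)com "
    "or bob [at] example [dot] org."
)
found = find_emails(sample)
print(found)
```

Each obfuscation style stays in its own short pattern, so adding a new variant means touching one line instead of growing a single large regex.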
