mamica, posted November 7, 2014:
As you can see down there, I needed a regex for phone numbers and I got the code, but I learned nothing from it, because now I want emails and I'm facing the same problem: http://rubular.com/r/bMk074667i
At this link you can see that the regex clearly matches the emails in that generated source. But when I use it in UBot with this code:
set(#rmail, $find regular expression($scrape attribute(<tagname="html">, "innertext"), "[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"), "Global")
I get nothing inside UBot! What is going on? Is it a bug or what?
UBotDev, posted November 7, 2014:
The regex looks OK. Are you sure the email is displayed as text? I think it is part of the page's code, which is why you don't get it via innertext. Also, having the page you are working on, or at least its HTML, would help to answer your question.
mamica, posted November 8, 2014:
It is from this URL: http://www.bespokehotels.com/contact
I have tried scraping innerhtml and outerhtml, but it is not working. Is there a way to check all forms? Sometimes emails are not in the HTML code and are just plain text.
UBotDev, posted November 8, 2014:
Actually, it doesn't matter which one you scrape (inner or outer). It works here without any problems (returns 3 emails):
set(#rmail, $find regular expression($scrape attribute(<tagname="html">, "innerhtml"), "[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"), "Global")
So your first code didn't work because the emails are not displayed as text.
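To illustrate the point about innertext versus innerhtml, here is a minimal sketch in Python rather than UBot. It assumes the address appears only inside a mailto link's href attribute, which is a guess about how such pages are often built, not a copy of the page from the thread. The sample markup and the names `SAMPLE_HTML`, `TextOnly`, and `visible_text` are hypothetical, and the text collector is only a rough stand-in for what innertext returns.

```python
import re
from html.parser import HTMLParser

# The thread's pattern with UBot's string escaping removed.
EMAIL_RE = re.compile(r"[\w\-][\w\-.]+@[\w\-][\w\-.]+[a-zA-Z]{1,4}")

# Hypothetical markup: the address lives only in the href attribute,
# while the visible link text says "Email us".
SAMPLE_HTML = '<p>Contact: <a href="mailto:info@example.com">Email us</a></p>'


class TextOnly(HTMLParser):
    """Collects text nodes only, a rough stand-in for innertext."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the concatenated text nodes of an HTML fragment."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Searching the visible text finds nothing; searching the markup finds the address.
text_matches = EMAIL_RE.findall(visible_text(SAMPLE_HTML))
html_matches = EMAIL_RE.findall(SAMPLE_HTML)
print("text:", text_matches)
print("html:", html_matches)
```

If a page shows its addresses as visible text instead, both searches would find them, so scraping the markup is the broader of the two options.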
mamica, posted November 8, 2014:
Yeah, it looks like it works. So it is best to scrape outerhtml, and it should grab all the emails. Now the tricky part: how do I scrape all types of emails? You know, some people type AT instead of @, and some put (.)com instead of .com. Should I look for one regex that grabs them all, or should I scrape each type of email individually?
UBotDev, posted November 8, 2014:
It actually doesn't matter, as long as you get what you want. The only difference is that a single regex command could execute faster, but it would also be harder to read and maintain.
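The trade-off between one regex and several can be sketched as follows, again in Python rather than UBot. The patterns are assumptions built on the thread's regex: they handle only an uppercase AT surrounded by spaces and a literal (.) before the final part of the domain, and they capture the pieces so each match can be rewritten as a normal address. All names (`COMBINED_RE`, `SEPARATE_RES`, `normalise`, and so on) are hypothetical.

```python
import re

# One combined pattern: "@" or " AT ", then "." or "(.)" before the ending.
COMBINED_RE = re.compile(
    r"([\w\-][\w\-.]+)"   # local part
    r"(?:@|\s+AT\s+)"     # "@" or the word AT with spaces around it
    r"([\w\-][\w\-.]*)"   # domain up to the last separator
    r"(?:\.|\(\.\))"      # "." or "(.)"
    r"([a-zA-Z]{1,4})"    # ending such as com or org
)

# One pattern per variant: simpler to read, but several passes over the text.
SEPARATE_RES = [
    re.compile(r"([\w\-][\w\-.]+)@([\w\-][\w\-.]*)\.([a-zA-Z]{1,4})"),
    re.compile(r"([\w\-][\w\-.]+)\s+AT\s+([\w\-][\w\-.]*)\.([a-zA-Z]{1,4})"),
    re.compile(r"([\w\-][\w\-.]+)@([\w\-][\w\-.]*)\(\.\)([a-zA-Z]{1,4})"),
]


def normalise(match):
    """Rebuild a plain address from the three captured pieces."""
    local, domain, ending = match.groups()
    return f"{local}@{domain}.{ending}"


def find_combined(text):
    """Single pass with the combined pattern."""
    return [normalise(m) for m in COMBINED_RE.finditer(text)]


def find_separately(text):
    """One pass per variant pattern, results appended in pattern order."""
    found = []
    for pattern in SEPARATE_RES:
        found.extend(normalise(m) for m in pattern.finditer(text))
    return found


SAMPLE = "Write to sales@example.com, john AT example.com or info@example(.)org."
print(find_combined(SAMPLE))
print(find_separately(SAMPLE))
```

Note that the separate patterns as written would miss an address that uses both tricks at once, such as john AT example(.)com, unless a fourth pattern is added, while the combined pattern covers that case. Matching a lowercase "at" is left out on purpose here, since it would also match ordinary sentences.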