SmileyBot 13 Posted January 26, 2016
Hey guys, regex always does my head in. Can someone please help? I want to scrape whois data from: whois.ausregistry.com.au/whois/whois_local.jsp
I want a regex that finds only the first email address occurrence. This regex works in Regex Builder but not in the bot:
^[^.!?]*[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}
I also want to match the second email address occurrence, but I have not got past the first.. lol. Thanks
set(#rmail,$find regular expression($scrape attribute(<tagname="html">,"innerhtml"),"^[^.!?]*[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"),"Global")
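One likely reason the pattern behaves differently on a full page: the leading `^[^.!?]*` has to start at the very beginning of the text and cannot step over a `.`, `!` or `?` before the email starts. Page HTML usually contains one of those long before the first address. The sketch below shows this in Python's `re` module rather than UBot, on the assumption that both engines treat these constructs the same way; the sample strings and addresses are made up for illustration.

```python
import re

# The email part of the pattern from the question, without UBot's extra escaping.
EMAIL = r"[\w\-][\w\-.]+@[\w\-][\w\-.]+[a-zA-Z]{1,4}"

# The question's pattern: anchored at the start, then any run of characters
# that are not '.', '!' or '?', then the email.
ANCHORED = re.compile(r"^[^.!?]*" + EMAIL)
UNANCHORED = re.compile(EMAIL)

# Hypothetical page source: the '!' in the doctype stops the prefix at once.
html = "<!DOCTYPE html><p>Contact: admin@example.com.au</p>"
anchored_hit = ANCHORED.search(html)      # no match on this input
unanchored_hit = UNANCHORED.search(html)  # finds the address

# On short text with no punctuation before the address the anchored pattern
# does match, but the match also swallows the leading text.
prefixed_hit = ANCHORED.search("Contact admin@example.com.au")

print(anchored_hit)
print(unanchored_hit.group())
print(prefixed_hit.group())
```

That would explain why a short test string in a regex tester can pass while the scraped innerhtml returns nothing.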
Kreatus (Ubot Ninja) 422 Posted January 26, 2016
Do it like this:
set(#firstEmail,$list item($find regular expression($document text,"^[^.!?]*[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"),0),"Global")
SmileyBot 13 Posted January 26, 2016 Author
Wow... I never thought of doing it like that. Thanks mate, but it doesn't seem to work.
Kreatus (Ubot Ninja) 422 Posted January 26, 2016
Just change $document text to the text you want the regex to match against.
SmileyBot 13 Posted January 26, 2016 Author
Thanks for the help. I got the result I wanted. It seems like the incorrect way of doing things, but either way I got a result. I'm not even going to try to regex the second email address.
SOLVED
First I set a regex to scrape the page and get both emails from it:
set(#rmail,$find regular expression($scrape attribute(<tagname="html">,"innerhtml"),"[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"),"Global")
Then I set a second regex to pick just the first result out of the first step's results:
set(#firstEmail,$list item($find regular expression(#rmail,"^[^.!?]*([\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\})"),0),"Global")
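For readers who also want the second address: once the unanchored pattern has collected every email on the page, the first and second can be picked by index, with no second regex needed. Here is a minimal sketch of that idea in Python (not UBot script). The helper name `nth_email` and the sample markup are hypothetical, and the real whois page may be laid out differently.

```python
import re

# Email pattern from the thread, without UBot's extra escaping.
EMAIL_RE = re.compile(r"[\w\-][\w\-.]+@[\w\-][\w\-.]+[a-zA-Z]{1,4}")


def nth_email(text, n):
    """Return the n-th (0-based) email found in text, or None if there are fewer."""
    matches = EMAIL_RE.findall(text)
    return matches[n] if n < len(matches) else None


# Made-up stand-in for the scraped page source.
sample = (
    "<td>Registrant: a.person@example.com.au</td>"
    "<td>Tech: tech-admin@example.net</td>"
)

first = nth_email(sample, 0)
second = nth_email(sample, 1)
third = nth_email(sample, 2)  # only two addresses in the sample

print(first)
print(second)
print(third)
```

In UBot terms this corresponds to keeping the match list from the first step and reading positions 0 and 1 with $list item, though that part is untested here.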
Kreatus (Ubot Ninja) 422 Posted January 26, 2016
Yeah. It depends on how you scrape the target text to match.
vinnyuk 1 Posted October 29, 2016
Quoting Kreatus: Do it like this
set(#firstEmail,$list item($find regular expression($document text,"^[^.!?]*[\\w\\-][\\w\\-\\.]+@[\\w\\-][\\w\\-\\.]+[a-zA-Z]\{1,4\}"),0),"Global")
This is very smart. It has also helped me out with another piece of regex where I only wanted the first instance.