UBOTEM, posted July 19, 2013:

Hello community,

This is my first post and I am new to the forums. I have studied uBot for a little while, but am still fairly new.

I'm having trouble scraping emails from a webpage. I have tried most of the regex codes in EditPad, and the only one that seemed to work and highlight the email addresses was this:

\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

But when I use this code in uBot, I don't seem to be able to scrape anything. I also tried it with lookaround assertions, like this:

(?<=.)\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b(?=.)

And in brackets, like this:

(\([A-Z0-9._%-])+@([A-Z0-9.-]+)\.([A-Z]{2,4})(\

Does all regex work the same in different applications? My codes pick up emails in regex editors, but I can't seem to get them working in uBot :(

It is a little annoying and frustrating, as I have learnt regex quite quickly and have produced many other scrapes using it, and I'm quite sure I'm using uBot correctly. I have attached a screenshot of the uBot node for people to see. Surely the code I have set up should collect all email addresses that match my regex and add them to the list?

Any help would be appreciated. I had also read this post:
http://www.ubotstudio.com/forum/index.php?/topic/6482-regex-code-for-email-addresses/
None of the regex codes there worked for me, but that topic was mainly about how to catch email addresses with spaces before and after the @ symbol.

My regex is improving, and I'm scraping more 'tightly', without having to clean results as I did when I first started playing with it. But this one is bugging me, so I thought I would ask the forum.

Thanks in advance for any help I get :~)
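On the question of whether regex works the same in different applications: regex engines do differ in their defaults, and one thing worth checking with this particular pattern is case sensitivity. Its character classes only list A-Z, so it depends on case-insensitive matching to catch lowercase addresses. The Python sketch below illustrates the effect. It is not uBot code, the sample text and the `find_emails` helper are made up for illustration, and whether this was the cause in this thread is only an assumption.

```python
import re

# First pattern from the post above, unchanged.
PATTERN = r"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"


def find_emails(text, ignore_case=True):
    """Return all matches of PATTERN in text (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(PATTERN, text, flags)


# Made-up sample text, not taken from the original page.
sample = "Contact john.doe@example.com or SALES@EXAMPLE.ORG for details."

print(find_emails(sample))                     # ['john.doe@example.com', 'SALES@EXAMPLE.ORG']
print(find_emails(sample, ignore_case=False))  # ['SALES@EXAMPLE.ORG']
```

If a tool matches case-sensitively, rewriting the classes as [A-Za-z0-9._%-] and [A-Za-z] gives the same result without relying on a flag.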
UBOTEM, posted July 19, 2013:

Whey!!! All fixed. I restarted uBot and it is now working, so it must have been a bug of some sort.
UBOTEM, posted July 19, 2013:

Btw, I'm now using the one from the cheat sheet:

(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})

I may revert to the others, but will see how this performs first.
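For anyone judging how the cheat-sheet pattern performs, a quick check outside uBot can show where it falls short. The Python sketch below runs it against a few made-up addresses; the `scrape` helper and the sample strings are illustrative assumptions, and uBot's own engine may behave differently in other respects.

```python
import re

# Cheat-sheet pattern quoted in the post above, unchanged.
CHEAT_SHEET = r"(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})"


def scrape(text):
    """Return every address the cheat-sheet pattern finds (hypothetical helper)."""
    return re.findall(CHEAT_SHEET, text)


# Made-up sample addresses.
print(scrape("bob@example.com"))       # ['bob@example.com']
print(scrape("john.doe@example.com"))  # ['doe@example.com'] (\w does not match the dot)
print(scrape("info@my-site.com"))      # [] (hyphen is not allowed in the domain part)
```

So the pattern handles simple addresses, but truncates local parts that contain a dot and skips domains that contain hyphens or digits.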
Code Docta (Nick C.), posted July 19, 2013:

This should help you along your regex path: http://regexhero.net/tester/

Look around, they have cool stuff there. Also look at the HTTP plugin by Aymen, which has some patterns already coded into its nodes.

HTH,
TC
kev123, posted July 19, 2013:

Are you looking to grab Java-protected and encoded emails, or just normal ones?
UBOTEM, posted September 4, 2013:

@kev123 Just normal email addresses. I seem to have made my own regex code that is working well for grabbing most emails (it works better than the one from the regex cheat sheet).
theninjamanz, posted September 4, 2013:

STRIP TO URL:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

STRIP TO EMAIL:
[\.\-_A-Za-z0-9]+?@[\.\-A-Za-z0-9]+?[\.A-Za-z0-9]{2,}

Enjoy
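One behaviour to be aware of with the STRIP TO EMAIL pattern: its final character class includes the dot, so a full stop directly after an address gets captured with it. The Python sketch below shows this and one possible cleanup step. The `scrape_emails` helper, the sample text, and the trailing-dot cleanup are illustrative assumptions, not part of the original post or of uBot.

```python
import re

# STRIP TO EMAIL pattern from the post above, unchanged.
EMAIL = r"[\.\-_A-Za-z0-9]+?@[\.\-A-Za-z0-9]+?[\.A-Za-z0-9]{2,}"


def scrape_emails(text):
    """Find addresses, then drop trailing dots picked up from sentence ends."""
    raw = re.findall(EMAIL, text)
    return [match.rstrip(".") for match in raw]


# Made-up sample text.
text = "Mail john.doe@example.com. Or try info@my-site.co.uk, thanks."

print(re.findall(EMAIL, text))  # ['john.doe@example.com.', 'info@my-site.co.uk']
print(scrape_emails(text))      # ['john.doe@example.com', 'info@my-site.co.uk']
```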
UBOTEM, posted March 25, 2015:

I'm building a Regular Expressions & XPath Expressions guide/cheat sheet. Once it is done, I will share it with the community!