Biks 9 Posted March 4, 2016
Has anyone ever come up with a regex string that grabs obscured emails like these?

abookgeek [at] gmail [dot] com
actinupwb (at) gmail (dot) com
daisyjdebruin(at)gmail(dot)com
ctrost (at) hotmail (dot) com

The (at) and [at] part seems pretty consistent. It seems I could just grab any characters or spaces outward from there in both directions and manually clean them up later.

I'm trying to find out whether somebody wants to be contacted; then I'll go back, review their website, and start a legit conversation.

I'm guessing it would look like:
find (at) OR [at]
to the right, grab everything up to 'com' OR 'org' OR 'net'
to the left, just grab everything out to 20 characters or so
(unless someone can think of a better way)
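The heuristic above (anchor on (at) or [at], take about 20 characters to the left and everything up to com/org/net on the right) can be sketched with Python's standard re module. This is a rough illustration under the post's own assumptions: the 20-character window and the com/org/net list are guesses, `rough_candidates` is a hypothetical helper name, and the snippets it returns still need the manual cleanup the post mentions.

```python
import re

# Rough version of the heuristic from the post. Assumed values, not rules:
# a 20-character left window and only com/org/net as endings on the right.
ROUGH = re.compile(
    r"[\w.+\- ]{1,20}"          # up to 20 word chars, dots, plus signs, hyphens or spaces to the left
    r"[\[(]at[\])]"             # the "[at]" or "(at)" anchor
    r".*?\b(?:com|org|net)\b"   # lazily take everything up to the first com/org/net on the same line
)


def rough_candidates(text: str) -> list[str]:
    """Return raw snippets for manual review; they are not cleaned-up addresses."""
    return [m.group(0).strip() for m in ROUGH.finditer(text)]


print(rough_candidates("abookgeek [at] gmail [dot] com, ctrost (at) hotmail (dot) com"))
```

Because the left window accepts spaces, a preceding word can end up in a snippet (e.g. "Email abookgeek [at] ..."), which is exactly the part left for manual cleanup.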
HelloInsomnia 1103 Posted March 4, 2016
This should catch those but not regular emails. Let me know if it works; I can't open UBot at the moment.

[a-zA-Z0-9\+\-_]+[\s\[\(]{1,2}at[\s\]\)]{1,2}[a-zA-Z0-9\-]+[\s\[\(]{1,2}dot[\s\]\)]{1,2}[a-zA-Z]{2,4}
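For testing outside UBot, the same pattern can be run through Python's re module. In this sketch the only change to the posted regex is three capture groups (user, domain, TLD) so each match can be rebuilt into a normal address; `find_emails` is a hypothetical helper name.

```python
import re

# HelloInsomnia's regex, with capture groups added around the user, domain and TLD parts.
OBFUSCATED = re.compile(
    r"([a-zA-Z0-9\+\-_]+)[\s\[\(]{1,2}at[\s\]\)]{1,2}"
    r"([a-zA-Z0-9\-]+)[\s\[\(]{1,2}dot[\s\]\)]{1,2}([a-zA-Z]{2,4})"
)


def find_emails(text: str) -> list[str]:
    """Return de-obfuscated addresses, e.g. 'ctrost (at) hotmail (dot) com' -> 'ctrost@hotmail.com'."""
    return [f"{user}@{domain}.{tld}" for user, domain, tld in OBFUSCATED.findall(text)]


samples = [
    "abookgeek [at] gmail [dot] com",
    "actinupwb (at) gmail (dot) com",
    "daisyjdebruin(at)gmail(dot)com",
    "ctrost (at) hotmail (dot) com",
]
for s in samples:
    print(find_emails(s))
```

One thing to watch: `[a-zA-Z]{2,4}` has no trailing boundary, so a longer ending is silently cut short (`x (at) y (dot) community` comes out as `x@y.comm`).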
Biks 9 Posted March 4, 2016
Yea baby! It works.

I found some people doing this:
grownupbookreviews AT gmail.com

The regex above expects the word "dot" (bracketed or spaced) between the domain and the com, so gmail.com doesn't match. I wonder what you can do with just an 'AT'.
Biks 9 Posted March 4, 2016
OK, while we're on the subject, how about finding SUBMIT buttons? A submit button implies there's a form on the page and that they want to be contacted through it. Here are some examples:

<input class="contact-form-button contact-form-button-submit" id="ContactForm1_contact-form-submit" value="Submit" type="button">
<input name="submit" value="Submit" id="ss-submit" class="jfk-button jfk-button-action " type="submit">
<input class="formdefaultbut" id="id123-button-send" onclick=" this.style.display='none'; insertPleaseWaitDiv(this,'Please wait...'); " style="background-color: #c80042; padding: 3px 10px;" value="Send Email" type="submit">
<input name="submit" value="Submit" id="ss-submit" type="submit">
<input class="class123-button" id="id123-button-send" value="Send email" type="submit">

I'm guessing the regex would look like this:
match <input, then any characters/spaces up to id=", then any characters/spaces up to type="submit"
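The description above can be turned into a Python regex, with one deliberate change: attribute order in HTML is not fixed, so this sketch does not require id= to come before type=; it only looks for an <input> tag whose attributes include type="submit". `find_submit_inputs` is a hypothetical name, and regex matching on HTML is fragile (it breaks, for instance, if an attribute value contains a >).

```python
import re

# Any <input ...> tag containing type="submit", type='submit' or type=submit,
# wherever the other attributes appear. Case-insensitive.
SUBMIT_INPUT = re.compile(r"""<input\b[^>]*\btype\s*=\s*["']?submit\b[^>]*>""", re.IGNORECASE)


def find_submit_inputs(html: str) -> list[str]:
    """Return the full text of every <input> tag whose type is submit."""
    return SUBMIT_INPUT.findall(html)


page = '<input name="submit" value="Submit" id="ss-submit" type="submit">'
print(find_submit_inputs(page))
```

On the five examples above this finds four tags. The first one is missed because it uses type="button", so catching it would need a looser check, for instance on value="Submit".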
deliter 203 Posted March 4, 2016
Those input fields are probably inside <form> tags, so:

if($exists(<tagname="form">)) {
    then {
        alert("do something")
    }
}

The following always works on the HTML from your example, but it might not be very effective in practice:

load html("<input class=\"contact-form-button contact-form-button-submit\" id=\"ContactForm1_contact-form-submit\" value=\"Submit\" type=\"button\"> <input name=\"submit\" value=\"Submit\" id=\"ss-submit\" class=\"jfk-button jfk-button-action \" type=\"submit\"> <input class=\"formdefaultbut\" id=\"id123-button-send\" onclick=\" this.style.display=\'none\'; insertPleaseWaitDiv(this,\'Please wait...\'); \" style=\"background-color: #c80042; padding: 3px 10px;\" value=\"Send Email\" type=\"submit\"> <input name=\"submit\" value=\"Submit\" id=\"ss-submit\" type=\"submit\"> <input class=\"class123-button\" id=\"id123-button-send\" value=\"Send email\" type=\"submit\">")
wait for browser event("Everything Loaded","")
if($exists(<tagname="input">) OR $exists(<tagname="button">)) {
    then {
        add list to list(%html,$scrape attribute(<tagname="input">,"outerhtml"),"Delete","Global")
        add list to list(%html,$scrape attribute(<tagname="button">,"outerhtml"),"Delete","Global")
        if($comparison($find regular expression(%html,"(?i)submit"),">",$nothing)) {
            then {
                alert("found")
            }
        }
    }
}
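The two checks above (is there a <form>, and does any input or button mention "submit") can also be done by parsing the page instead of using a browser. This sketch swaps in Python's standard html.parser module for UBot's browser commands; `SubmitFinder` and `scan` are hypothetical names, and unlike the UBot script, which searches each tag's whole outerHTML, it only looks at attribute values, so a bare <button>Submit</button> is not counted.

```python
from html.parser import HTMLParser


class SubmitFinder(HTMLParser):
    """Notes whether a page has a <form> and collects input/button tags that mention "submit"."""

    def __init__(self) -> None:
        super().__init__()
        self.has_form = False
        self.submit_tags: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; valueless attributes arrive as None.
        if tag == "form":
            self.has_form = True
        elif tag in ("input", "button"):
            attr_map = {name: value or "" for name, value in attrs}
            # Case-insensitive check of attribute values only (not the tag's inner text).
            if any("submit" in value.lower() for value in attr_map.values()):
                self.submit_tags.append(attr_map)


def scan(html: str) -> tuple[bool, int]:
    """Return (page has a <form>, number of input/button tags mentioning "submit")."""
    parser = SubmitFinder()
    parser.feed(html)
    parser.close()
    return parser.has_form, len(parser.submit_tags)


print(scan('<form action="/contact"><button type="submit">Send</button></form>'))
```

Because class and id values count, the first example from the previous post (type="button", but with "submit" in its class) is picked up as well; whether that is desirable depends on how many false positives you can tolerate.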
Biks 9 Posted March 5, 2016
A trick I've been using recently is to let ScrapeBox do the searching for me. Man, I can find out whether 200-300 URLs have an email on them within 20 seconds (using 25 proxies). I was hoping I could figure out the regex for my search string. I didn't think about merely looking for a FORM tag; I'll give that a go.
HelloInsomnia 1103 Posted March 5, 2016
For addresses like grownupbookreviews AT gmail.com, try this:

[a-zA-Z0-9\+\-_]+([\s\[\(]{1,2}|)[aAtT]+([\s\]\)]{1,2}|)[a-zA-Z0-9\-]+([\s\[\(]{1,2}[dDoOtT]+[\s\]\)]{1,2}|\.)[a-zA-Z]{2,4}
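The loosened pattern does match grownupbookreviews AT gmail.com, but because both separator groups are optional and `[aAtT]+` accepts any run of a/A/t/T, it also matches ordinary domain names such as gmail.com. This Python sketch runs the posted pattern next to a stricter alternative. The stricter pattern is an assumption, not something from the thread: "at" must be bracketed or surrounded by whitespace (any case), and "dot" must be bracketed, spaced, or a real period. `strict_emails` is a hypothetical helper.

```python
import re

# HelloInsomnia's loosened pattern, copied from the post.
LOOSE = re.compile(
    r"[a-zA-Z0-9\+\-_]+([\s\[\(]{1,2}|)[aAtT]+([\s\]\)]{1,2}|)"
    r"[a-zA-Z0-9\-]+([\s\[\(]{1,2}[dDoOtT]+[\s\]\)]{1,2}|\.)[a-zA-Z]{2,4}"
)

# Assumed stricter variant (not from the thread): "at" must be "(at)", "[at]" or " at ";
# "dot" must be "(dot)", "[dot]", " dot " or a literal ".". Case-insensitive.
STRICT = re.compile(
    r"\b([a-z0-9+_.-]+)"                              # user part
    r"(?:\s*[\[(]\s*at\s*[\])]\s*|\s+at\s+)"          # obfuscated "@"
    r"([a-z0-9-]+)"                                   # domain
    r"(?:\s*[\[(]\s*dot\s*[\])]\s*|\s+dot\s+|\.)"     # obfuscated or real "."
    r"([a-z]{2,4})\b",                                # TLD, must end at a word boundary
    re.IGNORECASE,
)


def strict_emails(text: str) -> list[str]:
    """Return lowercased addresses rebuilt from STRICT matches."""
    return [f"{u}@{d}.{t}".lower() for u, d, t in STRICT.findall(text)]


for sample in ["grownupbookreviews AT gmail.com", "ctrost (at) hotmail (dot) com", "gmail.com"]:
    loose_hit = LOOSE.search(sample) is not None
    print(f"{sample!r}: loose={loose_hit}, strict={strict_emails(sample)}")
```

The stricter pattern can still misfire on ordinary prose such as "meet me at home dot com", so its results still deserve the manual review planned at the start of the thread.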