UBot Underground

Help: Scraping Obscured Emails -> (At) Gmail (Dot) Com



Anyone ever come up with a regex string that grabs obscured emails like these?

 

abookgeek [at] gmail [dot] com
actinupwb (at) gmail (dot) com
daisyjdebruin(at)gmail(dot)com
ctrost (at) hotmail (dot) com

 

The (at) and [at] part seems pretty consistent. It seems I can just grab any characters or spaces on either side of it and clean the results up manually later. I'm trying to find out whether somebody wants to be contacted; then I'll go back, review their website, and start a legit conversation.

 

I'm guessing it would look like:

 

find (at) OR [at]

to the right, grab everything to 'com OR org OR net'

to the left, just grab everything out to 20 characters or so. (unless someone could think of a better way)
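For anyone who wants to try that outline outside UBot, here is a rough Python sketch of it. The names `AT_MARKER`, `TLD`, and `rough_candidates`, the 20-character left window, and the 60-character right limit are all assumptions made for illustration; the output is meant to be cleaned up by hand afterwards.

```python
import re

# Assumed marker: "(at)" or "[at]", any letter case, optional spaces inside.
AT_MARKER = re.compile(r"[\(\[]\s*at\s*[\)\]]", re.IGNORECASE)
# Assumed right-hand stop: the first standalone "com", "org" or "net".
TLD = re.compile(r"\b(?:com|org|net)\b", re.IGNORECASE)


def rough_candidates(text, left_width=20, right_limit=60):
    """Return rough snippets around each (at)/[at] marker for manual cleanup."""
    snippets = []
    for marker in AT_MARKER.finditer(text):
        # To the left: grab up to left_width characters before the marker.
        left = text[max(0, marker.start() - left_width):marker.start()]
        # To the right: look a limited distance ahead for com/org/net.
        right_window = text[marker.end():marker.end() + right_limit]
        tld = TLD.search(right_window)
        if tld is None:
            continue  # no com/org/net nearby, so skip this marker
        snippet = left + marker.group() + right_window[:tld.end()]
        snippets.append(snippet.strip())
    return snippets
```

The left side will usually drag in some extra words, which matches the "clean it up later" plan.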


This should get those but not regular emails. Let me know if it works; I can't open UBot at the moment.

[a-zA-Z0-9\+\-_]+[\s\[\(]{1,2}at[\s\]\)]{1,2}[a-zA-Z0-9\-]+[\s\[\(]{1,2}dot[\s\]\)]{1,2}[a-zA-Z]{2,4}
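If you want to check the pattern without UBot, a quick Python sketch like the one below should do. The regex is the one from the post, unchanged; the helper names `OBSCURED`, `normalize`, and `find_obscured` are made up here, and the rewrite into a normal address is an extra step not in the original post.

```python
import re

# The pattern from the post above, split over two lines but otherwise unchanged.
OBSCURED = re.compile(
    r"[a-zA-Z0-9\+\-_]+[\s\[\(]{1,2}at[\s\]\)]{1,2}"
    r"[a-zA-Z0-9\-]+[\s\[\(]{1,2}dot[\s\]\)]{1,2}[a-zA-Z]{2,4}"
)


def normalize(match_text):
    """Turn 'name [at] host [dot] tld' into 'name@host.tld'."""
    # Replace the first bracketed/spaced "at" with "@", then the first "dot" with ".".
    cleaned = re.sub(r"[\s\[\(]{1,2}at[\s\]\)]{1,2}", "@", match_text, count=1)
    return re.sub(r"[\s\[\(]{1,2}dot[\s\]\)]{1,2}", ".", cleaned, count=1)


def find_obscured(text):
    """Return every obscured address in text, rewritten as a normal address."""
    return [normalize(m.group()) for m in OBSCURED.finditer(text)]
```

Note that the local part has no `.` in its character class, so something like first.last would only be captured from "last" onward.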

Yea baby! It works.

 

I found some people doing this: grownupbookreviews AT gmail.com

 

The regex above expects a DOT before the com. I wonder what you can do when there is only an 'AT'.


OK, while we're on the subject - how about finding SUBMIT buttons? This implies that there's a form on the page, and they want to be contacted through that.

 

Here are some examples:

 

<input class="contact-form-button contact-form-button-submit" id="ContactForm1_contact-form-submit" value="Submit" type="button">
<input name="submit" value="Submit" id="ss-submit" class="jfk-button jfk-button-action " type="submit">
<input class="formdefaultbut" id="id123-button-send" onclick="  this.style.display='none'; insertPleaseWaitDiv(this,'Please wait...');  " style="background-color: #c80042; padding: 3px 10px;" value="Send Email" type="submit">
<input name="submit" value="Submit" id="ss-submit" type="submit">
<input class="class123-button" id="id123-button-send" value="Send email" type="submit">

 

I'm guessing the regex would look like this:

 

Match <input, then any characters/spaces up to id=", then any characters/spaces up to type="submit"
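One way that idea could look as a regex, sketched in Python for easy testing. It drops the id=" step, since the type attribute alone identifies a submit button, and `SUBMIT_INPUT` and `find_submit_inputs` are names invented for this example.

```python
import re

# Any <input ...> tag whose attributes include type="submit" (either quote style).
# [^>]* keeps the match inside a single tag.
SUBMIT_INPUT = re.compile(
    r"<input\b[^>]*\btype\s*=\s*[\"']submit[\"'][^>]*>",
    re.IGNORECASE,
)


def find_submit_inputs(html):
    """Return the full text of every <input> tag with type="submit"."""
    return SUBMIT_INPUT.findall(html)
```

The first sample above uses type="button" with value="Submit", so this pattern skips it; checking the value attribute as well would be needed to catch that one.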


Those input fields are probably inside <form> tags, so you could just check for a form:

if($exists(<tagname="form">)) {
    then {
        alert("do something")
    }
}

Run against just the HTML of your examples, this should always work, but it might not be very effective in practice:

load html("<input class=\"contact-form-button contact-form-button-submit\" id=\"ContactForm1_contact-form-submit\" value=\"Submit\" type=\"button\">
 <input name=\"submit\" value=\"Submit\" id=\"ss-submit\" class=\"jfk-button jfk-button-action \" type=\"submit\">
 <input class=\"formdefaultbut\" id=\"id123-button-send\" onclick=\"  this.style.display=\'none\'; insertPleaseWaitDiv(this,\'Please wait...\');  \" style=\"background-color: #c80042; padding: 3px 10px;\" value=\"Send Email\" type=\"submit\">
 <input name=\"submit\" value=\"Submit\" id=\"ss-submit\" type=\"submit\">
 <input class=\"class123-button\" id=\"id123-button-send\" value=\"Send email\" type=\"submit\">")
wait for browser event("Everything Loaded","")
if($exists(<tagname="input">) OR $exists(<tagname="button">)) {
    then {
        add list to list(%html,$scrape attribute(<tagname="input">,"outerhtml"),"Delete","Global")
        add list to list(%html,$scrape attribute(<tagname="button">,"outerhtml"),"Delete","Global")
        if($comparison($find regular expression(%html,"(?i)submit"),">",$nothing)) {
            then {
                alert("found")
            }
        }
    }
}
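The same check can be sketched outside UBot with Python's standard `html.parser`. This is only an approximation of the script above: it looks for "submit" in attribute values rather than in the whole outer HTML, so a bare `<button>Submit</button>` would be missed. `ContactFormDetector` and `looks_contactable` are hypothetical names.

```python
from html.parser import HTMLParser


class ContactFormDetector(HTMLParser):
    """Records whether the page has a <form> tag or a submit-like control."""

    def __init__(self):
        super().__init__()
        self.has_form = False
        self.submit_controls = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; treat them as empty strings.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "form":
            self.has_form = True
        elif tag in ("input", "button"):
            # Case-insensitive "submit" check on the attribute values only.
            if any("submit" in value.lower() for value in attributes.values()):
                self.submit_controls.append(attributes)


def looks_contactable(html):
    """True if the HTML contains a form tag or a submit-like input/button."""
    detector = ContactFormDetector()
    detector.feed(html)
    detector.close()
    return detector.has_form or bool(detector.submit_controls)
```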


A recent trick I've been using is to let Scrapebox do the searching for me. Man, I can find out whether 200-300 URLs have an email in them within 20 seconds (off of 25 proxies). I was hoping I could figure out the regex for my search string.

 

Didn't think about merely looking for a FORM tag. I'll give that a go.


For the ones that use just an 'AT' followed by a normal gmail.com, try this:

[a-zA-Z0-9\+\-_]+[\s\[\(]{1,2}(?i:at)[\s\]\)]{1,2}[a-zA-Z0-9\-]+([\s\[\(]{1,2}(?i:dot)[\s\]\)]{1,2}|\.)[a-zA-Z]{2,4}
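To see how an AT-plus-plain-dot pattern behaves outside UBot, here is a rough Python sketch. It spells `at` and `dot` as case-insensitive literals (a character class like `[aAtT]+` would also accept runs such as "tata") and requires a space or bracket on each side of them. The names `LOOSE` and `find_loose` are made up for illustration.

```python
import re

LOOSE = re.compile(
    r"([a-zA-Z0-9+\-_]+)"                         # local part
    r"[\s\[\(]{1,2}(?i:at)[\s\]\)]{1,2}"          # " at ", "(at)", "[AT]", ...
    r"([a-zA-Z0-9\-]+)"                           # domain name
    r"(?:[\s\[\(]{1,2}(?i:dot)[\s\]\)]{1,2}|\.)"  # " dot ", "(DOT)" or a plain "."
    r"([a-zA-Z]{2,4})"                            # top-level domain
)


def find_loose(text):
    """Return matches rebuilt as local@domain.tld."""
    return ["{}@{}.{}".format(*m.groups()) for m in LOOSE.finditer(text)]
```

Ordinary prose such as "look at example.com" also fits this shape, so the results still need a manual pass.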