UBot Underground

Need Gmail Address Extractor



Hey all,

 

I have an inbox of about 20,000 emails now.

 

I would like a bot to go in, extract the email addresses, and save them to a text file.

 

I know you're supposed to switch Gmail into basic HTML mode.

 

But what is the right way to go about scripting this? Scrape page? Choose by attribute?

 

Does anyone have a sample script I could work from? I've got myself really confused trying to do it, and I'm just trying to picture in my head what the script should look like. I've made plenty of bots now, but I don't think any of them did any scraping.

 

Thanks for any replies,

Gilesy


I'm not sure if Gmail allows you to connect to the server using POP. But if so, you are looking for the connect to mail server command, followed by the create table from email command.

 

This can be done pretty easily once you create the table. I will make an example in a bit and post it; I'm just extremely busy at the moment.


About a year ago I did this with UBot 3.5. If you have 3.5 I can pass you the script. My script did it while reading emails with a page scrape: left boundary <H3><FONT color=#00681c><B>, right boundary > </TD>, then it replaced </B> </FONT></H3>< with a comma and added the result to a list. I think with UBot 4 it will be a piece of cake to do.
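
If it helps to see the idea outside UBot, here is a rough Python sketch of the same left-boundary / right-boundary scrape and replace. The boundary strings are the ones from my post above; the HTML fragment is only a made-up example of what the old basic-HTML message page looked like:

import re

# Hypothetical fragment of an old Gmail basic-HTML message page, shaped to
# match the boundaries described above.
html = '<TD><H3><FONT color=#00681c><B>John Doe</B> </FONT></H3><john.doe@example.com > </TD>'

senders = []
# Take everything between the left boundary and the right boundary...
for chunk in re.findall(r'<H3><FONT color=#00681c><B>(.*?)> </TD>', html):
    # ...then replace the markup in the middle with a comma and add it to the list.
    senders.append(chunk.replace('</B> </FONT></H3><', ', ').strip())

print(senders)  # ['John Doe, john.doe@example.com']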


Here you are, I created the bot for you (and for everybody else with the same need to scrape the email addresses in Gmail). I have tried to make it really generic, so although it is tested with Swedish as the locale, it should work for other languages as well. The bot saves the list of scraped email addresses every 20th time an address is scraped; you can change that frequency through the variable #saveInterval.

 

 

 

 

ui text box("Email", #email)
ui text box("Password", #password)
ui save file("File with emails", #emailFile)
ui stat monitor("Num of scraped email addresses", #i)
set user agent("Internet Explorer 9")
set(#i, 0, "Global")
navigate("http://www.google.se/", "Wait")
set(#saveInterval, 20, "Global")
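comment("Open Gmail from the Google start page and log in")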
click(<innertext="Gmail
">, "Left Click", "No")
wait for browser event("Everything Loaded", "")
type text(<name="Email">, #email, "Standard")
type text(<name="Passwd">, #password, "Standard")
click(<outerhtml="<input type=\"submit\" class=\"g-button g-button-submit\" name=\"signIn\" id=\"signIn\" value=\"Sign in\">">, "Left Click", "No")
wait for element(<href=w"https://mail.google.com/mail/h/*/?logout*">, "", "Appear")
set(#baseURL, $url, "Global")
clear list(%emails)
add list to list(%emails, $scrape attribute(<outerhtml=w"<a href=\"?*v=c&*th=*\">*</a>">, "fullhref"), "Delete", "Global")
set(#mailsPerPage, 50, "Global")
set(#nextEmailCount, #mailsPerPage, "Global")
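comment("Page through the rest of the inbox, #mailsPerPage mails at a time, collecting every message URL into %emails")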
loop while($exists(<outerhtml=w"<a href=\"?&st={#nextEmailCount}\">*</a>">)) {
   click(<outerhtml=w"<a href=\"?&st={#nextEmailCount}\">*</a>">, "Left Click", "No")
   wait for browser event("Everything Loaded", "")
   add list to list(%emails, $scrape attribute(<outerhtml=w"<a href=\"?*v=c&*th=*\">*</a>">, "fullhref"), "Delete", "Global")
   set(#nextEmailCount, $eval($add(#nextEmailCount, #mailsPerPage)), "Global")
}
set(#i, 0, "Global")
clear list(%emailAddress)
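comment("Open each message, scrape the sender's email address, and save the list every #saveInterval addresses")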
loop($list total(%emails)) {
   set(#tmpURL, $list item(%emails, #i), "Global")
   navigate(#tmpURL, "Wait")
   wait for browser event("Everything Loaded", "")
   add list to list(%emailAddress, $scrape attribute(<outerhtml=w"<td> <a href=\"?&redir=?*&a=st&at=*&m=*\">* <*@*> </td>">, "innertext"), "Delete", "Global")
   increment(#i)
   if($comparison($eval("{#i} % {#saveInterval}"), "=", 0)) {
       then {
           save to file(#emailFile, %emailAddress)
       }
   }
   wait(1)
}
save to file(#emailFile, %emailAddress)
click(<href=w"https://mail.google.com/mail/*?logout*">, "Left Click", "No")

 

 

If the output is not 100% what you wanted, I am sure you can fiddle with the scraping details.

 

One more thing: the bot assumes that your inbox is listed 50 mails at a time (per page when you list the inbox). This is a setting in Gmail, so you can change it manually in the settings too.

So you can either 1) scrape and calculate the number of mails shown per page, 2) change the value of the variable #mailsPerPage in the bot, or 3) change the number of mails per page in the Gmail settings.

 

There is a one-second delay between each scrape of the email addresses. I tried to push the bot to its limits and never got any captchas, but you never know. You can of course try to remove the delay and see what happens.

 

 

Have fun! :)



Yes, Gmail does allow POP, but you have to go into your Gmail account settings and enable it. Then in UBot you need to use POP3 with SSL, with pop.gmail.com as the server and port 995.
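
For anyone who wants to see the POP idea outside UBot, here is a rough Python sketch using those same settings. The account, password and output file name are just placeholders, and Gmail may want an application-specific password:

import poplib
import re
from email import message_from_bytes

# POP3 over SSL, using the server and port mentioned above.
mailbox = poplib.POP3_SSL("pop.gmail.com", 995)
mailbox.user("your.account@gmail.com")   # placeholder credentials
mailbox.pass_("your-password")

addresses = set()
message_count = len(mailbox.list()[1])
for msg_number in range(1, message_count + 1):
    # retr() downloads the whole message; top(msg_number, 0) is lighter if the server supports it.
    raw_message = b"\r\n".join(mailbox.retr(msg_number)[1])
    message = message_from_bytes(raw_message)
    # Pull anything that looks like an email address out of the From header.
    addresses.update(re.findall(r"[\w.+-]+@[\w.-]+\.\w+", message.get("From", "")))
mailbox.quit()

with open("emails.txt", "w") as handle:
    handle.write("\n".join(sorted(addresses)))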

 

John

 

 

 

  • 2 months later...

I actually never realised I got all these replies until now! Thank you all so very much :)

 

Can anyone kindly explain how I can use the above code in UBot? Is there any way to save it or paste it in? (I'm rusty.)

 

I had another stab at making this this evening, and it was pretty difficult; I can't find the addresses anywhere on the page, not even in basic HTML view.

 

 

(P.S. Thanks again to Lilly for sorting out my forum account; I didn't have permission to make posts!)
