UBot Underground

Google URL Scraper


I could've sworn there was a simple Google search URL scraper here (I actually thought there were a few), but now I can't find it. Far be it from me to reinvent the wheel, so would someone kindly point me to it?

I need to modify it to scrape image URLs, and the one I put together is not collecting all of the URLs properly.

Thanks!

ui text box("Keyword", #keyword)
navigate("http://images.google.com/", "Wait")
wait(3)
type text(<name="q">, #keyword, "Standard")
click(<name="btnG">, "Left Click", "No")
wait(3)
add list to list(%urls, $scrape attribute(<class="rg_i">, "fullsrc"), "Delete", "Global")


Hey Duane.

I've done so many variations of these that I don't know which is which anymore. I believe this one scrapes the green URLs (the ones in the cite tag).

ui text box("Keyword: ", #kw)
clear list(%urls)
ui stat monitor("Total Scraped", $list total(%urls))
reset account("Any")
navigate("http://www.google.com", "Wait")
type text(<name="q">, #kw, "Standard")
click(<name="btng">, "Left Click", "No")
wait(2)
loop(10) {
    add list to list(%urls, $scrape attribute(<class="f kv">, "outertext"), "Delete", "Global")
    click(<id="pnnext">, "Left Click", "No")
    wait(4)
}
save to file("{$special folder("Desktop")}/urls.txt", %urls)

John

But I'm not positive!


I don't think that works for images, John.

=========== I failed to reply =====

I suppose Duane is trying to scrape URLs from Google Images, where the links are wrapped in JavaScript, and you are showing a sample for the text-search scraper.

Cheers

navigate("http://images.google.com/", "Wait")

Edited by mariah

Thanks John! That's what I was originally looking for!

I thought it might help as a model for scraping http://images.google.com/ for image URLs.

The results I'm getting are wacky, though. I don't know where rows 1-14 are coming from (data:image...), and there are huge gaps in the data that I can't seem to scrape.

ui text box("Keyword: ", #kw)
ui stat monitor("Total Scraped", $list total(%results))
reset account("Any")
clear list(%results)
clear table(&final)
navigate("http://images.google.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<login button>, "Left Click", "No")
wait(3)
click(<id="smb">, "Left Click", "No")
wait(10)
add list to list(%results, $scrape attribute(<class="rg_i">, "src"), "Don't Delete", "Global")
save to file("{$special folder("Application")}/portraits.csv", &final)
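
One possible explanation for the data:image rows and the gaps, offered as a guess and not a diagnosis: image search pages often inline the first thumbnails as base64 data: URIs in src, and leave src empty on images that have not loaded yet. The Python sketch below (not UBot script; the sample values and the helper name classify_src are invented for illustration) sorts scraped src values into those buckets so the real URLs can be separated out.

```python
from collections import Counter


def classify_src(value: str) -> str:
    """Label one scraped src value (hypothetical helper)."""
    value = value.strip()
    if not value:
        return "blank"    # empty or whitespace-only attribute
    if value.lower().startswith("data:"):
        return "inline"   # base64 thumbnail embedded in the page
    if value.lower().startswith(("http://", "https://")):
        return "url"      # an actual image address
    return "other"


# Made-up sample standing in for what a src scrape might return.
scraped = [
    "data:image/jpeg;base64,/9j/4AAQ",
    "",
    "http://t0.gstatic.com/images?q=tbn:example",
    "   ",
    "https://example.com/photo.jpg",
]

counts = Counter(classify_src(s) for s in scraped)
urls = [s for s in scraped if classify_src(s) == "url"]
print(dict(counts))
print(urls)
```

If the scraped list looks like this, keeping only the "url" bucket would drop both the data:image rows and the blanks in one pass.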

 


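You can lose all the blank entries by deleting duplicates ("Delete" instead of "Don't Delete" in add list to list). This version also scrapes "fullsrc" instead of "src"...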

 

ui text box("Keyword: ", #kw)
ui stat monitor("Total Scraped", $list total(%results))
reset account("Any")
clear list(%results)
clear table(&final)
navigate("http://images.google.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<login button>, "Left Click", "No")
wait(3)
click(<id="smb">, "Left Click", "No")
wait(10)
add list to list(%results, $scrape attribute(<class="rg_i">, "fullsrc"), "Delete", "Global")
save to file("{$special folder("Application")}/portraits.csv", &final)

John
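
To see why deleting duplicates gets rid of most blank rows: every empty entry is identical, so deduplication collapses them into at most one. A minimal Python sketch of that effect (not UBot script; the list contents are invented and dedupe is a hypothetical helper name, so UBot's own behavior may differ in detail):

```python
def dedupe(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


# Invented sample: three blanks and one repeated URL.
scraped = [
    "",
    "http://a.example/1.jpg",
    "",
    "",
    "http://a.example/2.jpg",
    "http://a.example/1.jpg",
]

unique = dedupe(scraped)
# A single blank survives deduplication; drop it explicitly if it matters.
non_blank = [item for item in unique if item]
print(unique)
print(non_blank)
```

Note that one blank entry can still remain after deduplication, so an explicit filter may be needed as well.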


Here... this should work for you:

 

ui text box("Keyword: ", #kw)
ui stat monitor("Total Scraped", $list total(%results2))
reset account("Any")
clear list(%results)
clear list(%results2)
clear table(&final)
navigate("http://images.google.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<login button>, "Left Click", "No")
wait(3)
click(<id="smb">, "Left Click", "No")
wait(10)
add list to list(%results, $scrape attribute(<class="rg_i">, "fullsrc"), "Delete", "Global")
set(#position, 0, "Global")
loop($list total(%results)) {
    if($contains($list item(%results, #position), "gstatic.com")) {
        then {
            add item to list(%results2, $list item(%results, #position), "Delete", "Global")
        }
        else {
        }
    }
    increment(#position)
}
add list to table as column(&final, 0, 0, %results2)
save to file("{$special folder("Desktop")}/portraits.csv", &final)

John

  • 4 weeks later...

BTW... this worked perfectly... thanks again!
