How to scrape the whole URL from a Google SERP?

runsoftware · March 24, 2014

I mean, sometimes there are links that don't show completely, like www.youtube.com/watch?v=bpbpmsJDb...

How can the bot know what's after the "..."?

clear list(%scraped_urls)
ui text box("Search Term", #search_term)
navigate("https://www.google.com/ncr", "Wait")
type text(<name="q">, #search_term, "Standard")
click(<name="btnK">, "Left Click", "No")
wait(3)
add list to list(%scraped_urls, $scrape attribute(<tagname="cite">, "innertext"), "Delete", "Global")
save to file("C:\\Users\\blabla\\Desktop\\ubot\\scraped.txt", %scraped_urls)

Edited March 24, 2014 by KardoseR

the_way · March 24, 2014

You only have a 3-second wait for the results? You should use wait for element, and raise the default wait-for-element timeout to 30 seconds.

runsoftware · March 25, 2014

Once the SERP page is loaded, it's loaded. I guess they shorten the displayed link for design purposes.

My scraper is scraping successfully, but when it comes to something like this:

http://i.imgur.com/r81saSa.png

it will just get "www.youtube.com/watch?v=bpbpmsJDb..."

Edward_2 · March 25, 2014

The way to scrape Google is with regex: pull the full URL out of each result link's href instead of the shortened text in the cite tag.

add list to list(%results, $find regular expression($scrape attribute(<class="r">, "innerhtml"), "(?<=href\\=\\\")http.*?(?=\\\")"), "Delete", "Global")
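For anyone who wants to see what that pattern does outside UBot, here is a minimal sketch in Python. It uses the same lookbehind/lookahead idea with UBot's extra string escaping removed. The sample markup, the example.com address, and the name `extract_hrefs` are made up for illustration; they are not taken from a real Google page.

```python
import re

# Same idea as the UBot pattern: match text that starts with "http",
# is preceded by href=" and runs up to (not including) the closing quote.
HREF_PATTERN = re.compile(r'(?<=href=")http.*?(?=")')


def extract_hrefs(html):
    """Return every absolute http(s) URL found inside an href="..." attribute."""
    return HREF_PATTERN.findall(html)


# Hypothetical markup: the link carries the full URL in href,
# while the visible cite text is shortened with "...".
sample_html = (
    '<h3 class="r"><a href="http://www.example.com/watch?v=FULL_VIDEO_ID">'
    'Some title</a></h3>'
    '<cite>www.example.com/watch?v=FULL...</cite>'
)

print(extract_hrefs(sample_html))
```

The shortened cite text is never matched because it is not inside an href attribute, which is why this approach returns the complete URL.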

runsoftware · March 25, 2014

Worked perfectly, mate. Now on to learning some regex.

Thanks.

Edited March 25, 2014 by KardoseR

Edward_2 · March 25, 2014

You're welcome.

zenos · November 7, 2014

Hello guys, Google seems to have changed something since Edward_2 posted that results scrape.

Do you have an idea how to do it now?

The result with your regex now looks like this:

http://www.google.fr/url?url=http://ubotstudio.com/&rct=j&q=&esrc=s&sa=U&ei=CNdcVO7eH4K-PJjigPgN&ved=0CBUQFjAA&usg=AFQjCNF87tinCUw36AI_UEt2BMtciLT16w

Edited November 7, 2014 by zenos
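One possible way to handle the wrapped link, sketched in Python rather than UBot script: the real address appears to sit in the `url` query parameter of the `/url` redirect, so it can be read back out with a URL parser. The function name `unwrap_google_redirect` is hypothetical, and the fallback to a `q` parameter is an assumption about other redirect variants, not something shown in this thread.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_google_redirect(link):
    """Return the target of a /url?url=... redirect link, or the link unchanged."""
    parts = urlparse(link)
    if parts.path != "/url":
        # Not a redirect wrapper, so there is nothing to unwrap.
        return link
    params = parse_qs(parts.query)
    # "url" is the parameter seen in this thread; "q" is an assumed fallback.
    for key in ("url", "q"):
        if params.get(key):
            return params[key][0]
    return link


wrapped = (
    "http://www.google.fr/url?url=http://ubotstudio.com/&rct=j&q=&esrc=s&sa=U"
    "&ei=CNdcVO7eH4K-PJjigPgN&ved=0CBUQFjAA&usg=AFQjCNF87tinCUw36AI_UEt2BMtciLT16w"
)

print(unwrap_google_redirect(wrapped))
```

The same step could probably be done inside UBot with a second regex that captures the text between `url=` and the next `&`, but that variant is untested here.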
