runsoftware (Posted March 24, 2014, edited)

How can I scrape the whole URL from a Google SERP? Sometimes the links are not shown completely, like www.youtube.com/watch?v=bpbpmsJDb... How can the bot know what comes after the "..."?

clear list(%scraped_urls)
ui text box("Search Term", #search_term)
navigate("https://www.google.com/ncr", "Wait")
type text(<name="q">, #search_term, "Standard")
click(<name="btnK">, "Left Click", "No")
wait(3)
add list to list(%scraped_urls, $scrape attribute(<tagname="cite">, "innertext"), "Delete", "Global")
save to file("C:\\Users\\blabla\\Desktop\\ubot\\scraped.txt", %scraped_urls)

Edited March 24, 2014 by KardoseR
the_way (Posted March 24, 2014)

You only have a 3-second wait for the results? You should use wait for element instead, and raise the default wait-for-element timeout to 30 seconds.
runsoftware (Posted March 25, 2014)

the_way wrote: "you only have a 3 second wait for the result? you should use wait for element, and change the default waiting for element time to 30 seconds."

Bro, once the SERP page is loaded, it's loaded. I guess Google shortens the displayed URLs for design reasons. My scraper runs successfully, but when it reaches a result like this one: http://i.imgur.com/r81saSa.png it only gets "www.youtube.com/watch?v=bpbpmsJDb...", because the cite element holds just the shortened display text.
Edward_2 (Posted March 25, 2014)

The only way to scrape Google is with regex. Scrape the innerhtml of the result titles and pull the full link out of the href attribute instead of reading the shortened text:

add list to list(%results, $find regular expression($scrape attribute(<class="r">, "innerhtml"), "(?<=href\\=\\\")http.*?(?=\\\")"), "Delete", "Global")
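For anyone who wants to see what that pattern does outside UBot, here is a minimal sketch in Python (not UBot script). It assumes the scraped innerhtml contains an anchor tag with a double-quoted href; the sample HTML, the function name `extract_hrefs`, and the example URL are all hypothetical and only illustrate the lookbehind/lookahead idea.

```python
import re
from typing import List

# Same idea as the UBot pattern with the UBot string escaping removed:
#   (?<=href=")  lookbehind: the match must be preceded by href="
#   http.*?      the URL itself, matched lazily
#   (?=")        lookahead: stop right before the closing quote
HREF_PATTERN = re.compile(r'(?<=href=")http.*?(?=")')


def extract_hrefs(inner_html: str) -> List[str]:
    """Return every absolute http(s) URL found in a double-quoted href."""
    return HREF_PATTERN.findall(inner_html)


# Hypothetical fragment standing in for the innerhtml of a class="r" element.
sample_html = (
    '<h3 class="r"><a href="http://www.example.com/watch?v=HYPOTHETICAL_ID">'
    'Example result</a></h3>'
)

print(extract_hrefs(sample_html))
```

Because the quotes sit inside the lookarounds, the match is the bare URL, and relative links (which do not start with http) are skipped.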
runsoftware (Posted March 25, 2014, edited)

Edward_2 wrote: "The only way to scrape G is with Regex."

Worked perfectly, m8. Now on to learning some regex. Thanks!

Edited March 25, 2014 by KardoseR
Edward_2 (Posted March 25, 2014)

You're welcome.
zenos (Posted November 7, 2014, edited)

Hello guys, something has changed with the Google results scrape you gave us, Edward_2. Do you have an idea how to do it now? With your regex, each result now comes back wrapped in a Google redirect link, like this:

http://www.google.fr/url?url=http://ubotstudio.com/&rct=j&q=&esrc=s&sa=U&ei=CNdcVO7eH4K-PJjigPgN&ved=0CBUQFjAA&usg=AFQjCNF87tinCUw36AI_UEt2BMtciLT16w

Edited November 7, 2014 by zenos
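One possible way to handle links like that is to read the target back out of the redirect's query string. The sketch below is in Python (not UBot script) and only handles the shape shown in the post above, where the real address sits in the `url` parameter of a `/url` path. The function name `unwrap_google_redirect` is hypothetical, and the fallback to a `q` parameter is an assumption about other redirect variants, not something confirmed in this thread.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_google_redirect(link: str) -> str:
    """Return the target of a /url?url=... redirect, or the link unchanged."""
    parsed = urlparse(link)
    if parsed.path != "/url":
        # Not a redirect wrapper in the shape we know about: leave it alone.
        return link
    # parse_qs drops blank values by default, so an empty "q=" is ignored.
    params = parse_qs(parsed.query)
    for key in ("url", "q"):  # "q" is an assumed alternative parameter name
        values = params.get(key)
        if values:
            return values[0]
    return link


wrapped = (
    "http://www.google.fr/url?url=http://ubotstudio.com/&rct=j&q=&esrc=s&sa=U"
    "&ei=CNdcVO7eH4K-PJjigPgN&ved=0CBUQFjAA&usg=AFQjCNF87tinCUw36AI_UEt2BMtciLT16w"
)
print(unwrap_google_redirect(wrapped))
```

If the target address carries its own unencoded `&` parameters, the query parser will cut it at the first `&`, so the results are worth spot-checking before they are saved.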