Randy Role (posted October 29, 2012):
Hey guys, I'm unable to scrape Google results when several results are grouped together. Here's an example: http://content.screencast.com/users/riktubrs/folders/Jing/media/6af49237-40ed-415f-a204-bb4b5c8bc6a3/2012-10-29_1530.png
As you can see in the screenshot, I am unable to scrape these 4 results. Any ideas how to scrape them as well? If I get 14 results as 'Page 1' in Google, it's fine. Thanks
Kev (posted October 31, 2012):
Can you give me the search term you used? I'll see if I can get it for you.
HelloInsomnia (posted October 31, 2012):
The term is: answers now
Randy Role (posted November 1, 2012):
What HelloInsomnia said. I searched for 'answers now'. Thanks for looking into it.
Kev (posted November 6, 2012):
OK, I've got a convoluted way of getting this to work, but I'd like some more keywords to test it with, please. Cheers
Randy Role (posted November 7, 2012):
Here are two example phrases: "Calculus Help Please" and "roger that mean". Let me know how it goes! I'm still stuck on that. Thanks!
a2mateit (posted November 8, 2012):
Here is a crude way of doing it:

    add list to list(%google results, $scrape attribute(<onmousedown=w"return *">, "href"), "Delete", "Global")

The only problem is that it also scrapes the link to the cached version of each page. I'm sure there is a way to regex out that part of the result.
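For readers who don't use UBot, here is a rough Python sketch of what that wildcard selector appears to do: collect the `href` of every tag whose `onmousedown` attribute matches `return *`. It is only an illustration using the standard library, not UBot code; the class and function names and the sample HTML are made up for the example.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser


class OnMouseDownLinks(HTMLParser):
    """Collect href values from tags whose onmousedown matches a wildcard."""

    def __init__(self, pattern="return *"):
        super().__init__()
        self.pattern = pattern
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        handler = attributes.get("onmousedown")
        href = attributes.get("href")
        # Keep the link only if both attributes exist and the handler matches.
        if handler and href and fnmatchcase(handler, self.pattern):
            self.hrefs.append(href)


def scrape_hrefs(html, pattern="return *"):
    """Return matching hrefs in document order (hypothetical helper)."""
    parser = OnMouseDownLinks(pattern)
    parser.feed(html)
    return parser.hrefs


# Made-up markup standing in for a results page.
SAMPLE_HTML = """
<a href="http://example.com/a" onmousedown="return rwt(this)">A</a>
<a href="http://webcache.googleusercontent.com/x" onmousedown="return rwt(this)">Cached</a>
<a href="http://example.com/nav">No handler</a>
"""

links = scrape_hrefs(SAMPLE_HTML)
```

As in the post, the cached-page link is picked up too because it carries the same kind of `onmousedown` handler, so it still has to be filtered out afterwards.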
Randy Role (posted November 8, 2012):
Works like a charm. I removed all the cache links with regex $contains. Thanks!
a2mateit (posted November 8, 2012):
Glad I could help. If it's not too much to ask, could you PM me the regex you used? I'm still learning regex and it would help me understand it a little better. If not, I understand.
Thanks, Justin
Randy Role (posted November 8, 2012):
I actually found an easier way of removing them than regex, using $contains. The loop walks the list from the last item down to the first, so removing an item never shifts one that hasn't been checked yet:

    set(#serp position, $list total(%serp), "Global")
    loop($list total(%serp)) {
        decrement(#serp position)
        if($contains($list item(%serp, #serp position), "http://webcache.googleusercontent.com")) {
            then {
                remove from list(%serp, #serp position)
            }
        }
    }

I hope it helps.
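The same backward walk, sketched in Python for anyone porting the idea. This is an illustration rather than a translation of the UBot runtime; `remove_cache_links` and the sample URLs are hypothetical.

```python
CACHE_PREFIX = "http://webcache.googleusercontent.com"


def remove_cache_links(links):
    """Delete cached-page links in place, walking from the last index to 0."""
    for position in range(len(links) - 1, -1, -1):
        # Deleting at `position` only shifts items that were already checked.
        if CACHE_PREFIX in links[position]:
            del links[position]
    return links


serp = [
    "http://webcache.googleusercontent.com/search?q=cache:one",
    "http://example.com/one",
    "http://webcache.googleusercontent.com/search?q=cache:two",
    "http://example.com/two",
]
remove_cache_links(serp)
```

Walking forwards instead would skip the item that slides into the slot just vacated by a removal, which is why the counter runs downwards.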
Kev (posted November 8, 2012):
OK, I may as well share this solution anyhow. It took me quite a while to figure out, and I'm sure there are far simpler ways. However, I only wanted to scrape the indented results and nothing else, so I needed to look at each of the ten results individually to see whether it had indented results or not.

    clear cookies
    clear all lists()
    ui stat monitor("Total Indented Results:", $list total(%mainresults))
    ui text box("Keyword", #keyword)
    define clear all lists {
        clear list(%mainresults)
        clear list(%results)
        clear list(%results1)
        clear list(%results2)
        clear list(%results3)
        clear list(%results4)
        clear list(%results5)
        clear list(%results6)
        clear list(%results7)
        clear list(%results8)
        clear list(%results9)
        clear list(%results10)
    }
    ui button("Scrape") {
        clear all lists()
        navigate("http://www.google.ie/#hl=en&output=search&sclient=psy-ab&q=test", "Wait")
        wait for browser event("Page Loaded", "")
        change attribute(<name="q">, "value", #keyword)
        click(<name="btnG">, "Left Click", "No")
        wait for browser event("Page Loaded", "")
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'1\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results1, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'1\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results1, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'2\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results2, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'2\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results2, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'3\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results3, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'3\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results3, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'4\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results4, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'4\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results4, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'5\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results5, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'5\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results5, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'6\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results6, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'6\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results6, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'7\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results7, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'7\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results7, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'8\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results8, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'8\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results8, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'9\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results9, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'9\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results9, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'10\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 6) {
            then {
                remove from list(%results, 0)
                remove from list(%results, 0)
                add list to list(%results10, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%results, $scrape attribute(<outerhtml=w"<a href=\"*\" onmousedown=\"return rwt(this,\'\',\'\',\'\',\'10\',\'*\',\'\',\'*\',null,event)\">*</a>">, "href"), "Don\'t Delete", "Global")
        if($list total(%results) = 3) {
            then {
                remove from list(%results, 0)
                add list to list(%results10, %results, "Delete", "Global")
                clear list(%results)
            }
            else {
                clear list(%results)
            }
        }
        add list to list(%mainresults, %results1, "Delete", "Global")
        add list to list(%mainresults, %results2, "Delete", "Global")
        add list to list(%mainresults, %results3, "Delete", "Global")
        add list to list(%mainresults, %results4, "Delete", "Global")
        add list to list(%mainresults, %results5, "Delete", "Global")
        add list to list(%mainresults, %results6, "Delete", "Global")
        add list to list(%mainresults, %results7, "Delete", "Global")
        add list to list(%mainresults, %results8, "Delete", "Global")
        add list to list(%mainresults, %results9, "Delete", "Global")
        add list to list(%mainresults, %results10, "Delete", "Global")
    }

Just enter a keyword, then press Scrape. It will give you the total number of indented results. Sometimes there are more than 4 results on a page, which is where the issue arose for me.
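The ten near-identical blocks in the script above all apply one rule per result position: if that position matched 6 links, skip the first 2; if it matched 3, skip the first 1; otherwise keep nothing. A minimal Python sketch of that rule follows. It is not UBot code, and it assumes the links have already been extracted as hypothetical `(rank, href)` pairs; `indented_results` and `SKIP_BY_GROUP_SIZE` are names invented for the example.

```python
from collections import defaultdict

# Leading links to skip for each group size, mirroring the two counts
# hard-coded in the script above (6 links: skip 2, 3 links: skip 1).
SKIP_BY_GROUP_SIZE = {6: 2, 3: 1}


def indented_results(ranked_links):
    """Return the links kept by the 6/3 rule, grouped by result position."""
    groups = defaultdict(list)
    for rank, href in ranked_links:
        groups[rank].append(href)

    collected = []
    for rank in sorted(groups):
        hrefs = groups[rank]
        skip = SKIP_BY_GROUP_SIZE.get(len(hrefs))
        if skip is not None:
            collected.extend(hrefs[skip:])
    # Drop duplicates while keeping order, similar in spirit to "Delete".
    return list(dict.fromkeys(collected))


# Made-up data: position 1 has six links, position 2 one, position 3 three.
sample = [(1, "m1"), (1, "m1b"), (1, "i1"), (1, "i2"), (1, "i3"), (1, "i4"),
          (2, "x"),
          (3, "m3"), (3, "j1"), (3, "j2")]
kept = indented_results(sample)
```

Keeping the counts in a table makes it easier to add another group size, which may be relevant to the case mentioned above where a page shows more than 4 results in a group.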
gabel (posted November 8, 2012):
This is what I came up with quickly:

    clear list(%googleresults)
    ui text box("Keyword", #keyword)
    ui stat monitor("Links", $list total(%googleresults))
    navigate("http://www.google.co.uk/#hl=en&sclient=psy-ab&q={#keyword}", "Wait")
    set(#scrapepages, $replace regular expression($scrape attribute(<onmousedown=w"return *">, "href"), "http://webcache\\.googleusercontent\\.com.*", ""), "Global")
    add list to list(%googleresults, $list from text(#scrapepages, $new line), "Delete", "Global")
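The regex step in that snippet can be sketched in Python as follows: blank out every line that starts a cached-page URL, then split the text back into a list. This is an illustration only; `strip_cache_links` is a hypothetical helper, and unlike the UBot version it explicitly drops the empty lines the substitution leaves behind.

```python
import re

# Same pattern as in the post; "." does not match a newline by default,
# so ".*" stops at the end of each cached-link line.
CACHE_LINE = re.compile(r"http://webcache\.googleusercontent\.com.*")


def strip_cache_links(scraped_text):
    """Remove cached-page URLs from newline-separated text, return a list."""
    cleaned = CACHE_LINE.sub("", scraped_text)
    links = []
    for line in cleaned.splitlines():
        line = line.strip()
        # Skip blanks left by the substitution and any repeated link.
        if line and line not in links:
            links.append(line)
    return links


scraped = "\n".join([
    "http://example.com/one",
    "http://webcache.googleusercontent.com/search?q=cache:one",
    "http://example.com/two",
    "http://example.com/one",
])
result = strip_cache_links(scraped)
```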