Pazman 32 Posted April 6, 2015
Hey guys/gals,
I haven't been able to figure out how to scrape the search results from Google correctly. I keep getting URLs containing "..." because it's obviously not getting the whole URL, e.g.: http://adsabs.cnn.com/abs/1987ApJ...321..280T
I am using this scrape code:
add list to list(%scrape url, $list from text($find regular expression($scrape attribute(<class="r">, "outerhtml"), "(?<=<h3 class=\"r\"><a href=\").*?(?=\" onm)"), $new line), "Delete", "Global")
Any ideas? I'd appreciate any help I can get on this. BTW, I know it's probably something small, but man, I'll tell you, sometimes the more you look at a problem, the harder it gets... LOL
Thanks again!
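For readers unfamiliar with the lookbehind/lookahead pattern in the post above, here is a minimal Python sketch (not UBot) of what it is meant to capture. The sample markup, the `example.com` URL, and the names `SAMPLE_HTML` and `extract_hrefs` are assumptions made up for illustration; the markup is only modeled on the pattern in the post and is not taken from a real Google page.

```python
import re

# Hypothetical sample markup, modeled on the regex in the post above.
SAMPLE_HTML = (
    '<h3 class="r"><a href="http://example.com/a/very/long/path/page.html" '
    'onmousedown="return track(this)">Example result</a></h3>'
    '<cite>example.com/a/.../page.h...</cite>'
)

# Same idea as the UBot pattern: match whatever sits between
# '<h3 class="r"><a href="' and the following '" onm'.
HREF_PATTERN = re.compile(r'(?<=<h3 class="r"><a href=").*?(?=" onm)')


def extract_hrefs(html):
    """Return every href value that the lookaround pattern matches."""
    return HREF_PATTERN.findall(html)


print(extract_hrefs(SAMPLE_HTML))
```

On this sample the pattern returns the full href, not the shortened text inside the cite tag. It only matches when the markup has exactly this shape, so any change to the attribute order or class name would make it return nothing.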
Code Docta (Nick C.) 638 Posted April 7, 2015
Put this into Google:
site:ubotstudio.com scraping google
I guarantee you'll find several answers on this forum. I know I have answered this a few times, and others have as well. Google is a better search than the forum search, and now you know how to search well.
CD
Pazman 32 Posted April 7, 2015 Author
I did. If it's out there, I missed it, or I already tried what I saw and that did not work either. Thanks, Code Docta.
Still looking, everyone. Any ideas?
Code Docta (Nick C.) 638 Posted April 7, 2015
Well then, try this. Sites may change, so we shall adapt:
add list to list(%scrape url, $list from text($scrape attribute(<tagname="cite">, "innertext"), $new line), "Delete", "Global")
alert($scrape attribute(<tagname="cite">, "innertext"))
Pazman 32 Posted April 7, 2015 Author
Nope, same results. I keep getting this "...":
http://money.cnn.com/.../user/agg/.../tablet.h...
Thanks for trying.
itexspert 47 Posted April 7, 2015
Quoting Code Docta: "sites may change soooo... we shall adapt"
Haha... Yes, we are Borg. Resistance is futile!
Code Docta (Nick C.) 638 Posted April 7, 2015
Quoting Pazman: Nope, same results. Keep getting this "..." http://money.cnn.com/.../user/agg/.../tablet.h...
Must be something wrong on your end, man. It works here, even in Python using XPath. Download the "large data" plugin and I will do a script using XPath here in a bit.
Code Docta (Nick C.) 638 Posted April 7, 2015
That code works on a Google search page only, not on CNN. Every site has different code. Make sure you wait until everything is loaded as well.
giganut 109 Posted April 7, 2015
Here is the current working class to scrape:
clear list(%urls)
add list to list(%urls, $scrape attribute(<class="_Rm">, "innertext"), "Delete", "Global")
https://drive.google.com/file/d/0B0hSg60eXoShRHB0UG1yeUU0dkE/view?usp=sharing
Code Docta (Nick C.) 638 Posted April 7, 2015
This should work on any Google results page:
plugin command("Bigtable.dll", "Clear all large list")
ui text box("Search Term", #UI_search_term)
navigate("https://www.google.com/search?q={$replace regular expression(#UI_search_term, "\\s", "+")}&ie=utf-8&oe=utf-8", "Wait")
wait for element(<innertext="Help">, 10, "Appear")
plugin command("Bigtable.dll", "large List from Xpath", "text and urls", $document text, "//h3[@class=\'r\']/a/@href", "replace")
alert($plugin function("Bigtable.dll", "Large list return", "text and urls"))
plugin command("Bigtable.dll", "large List from Regex", "urls", $plugin function("Bigtable.dll", "Large list return", "text and urls"), "(?<=href=\").*?(?=\")", "replace")
alert($plugin function("Bigtable.dll", "Large list return", "urls"))
plugin command("Bigtable.dll", "large List from Regex", "h3 text", $plugin function("Bigtable.dll", "Large list return", "text and urls"), "(?<=>).*?(?=</a>)", "replace")
alert($plugin function("Bigtable.dll", "Large list return", "h3 text"))
If that doesn't work, I don't know what to tell ya. The Large Data plugin is free.
CD
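The XPath in the script above, //h3[@class='r']/a/@href, can be tried outside UBot as well. Below is a rough Python sketch using only the standard library. The sample markup, the `example.*` URLs, and the names `SAMPLE_XHTML` and `result_hrefs` are hypothetical. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed XML, so the sketch selects the `a` elements and reads `href` from them instead of using `/@href`. A real results page would likely need an HTML-aware parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample standing in for a results page.
SAMPLE_XHTML = """
<div>
  <div class="g">
    <h3 class="r"><a href="http://example.com/first/long/path/page.html">First</a></h3>
    <cite>example.com/first/.../page.h...</cite>
  </div>
  <div class="g">
    <h3 class="r"><a href="http://example.org/second">Second</a></h3>
    <cite>example.org/second</cite>
  </div>
  <h3 class="other"><a href="http://example.net/not-a-result">Skip</a></h3>
</div>
"""


def result_hrefs(markup):
    """Return the href of every <a> directly under an <h3 class="r">."""
    root = ET.fromstring(markup)
    # ElementTree cannot select attributes with /@href, so select the
    # <a> elements and read the attribute from each one.
    return [a.get("href") for a in root.findall(".//h3[@class='r']/a")]


print(result_hrefs(SAMPLE_XHTML))
```

The link under the h3 with class "other" is skipped, and the shortened text in the cite tags is never touched, because only the href attribute is read.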
Pazman 32 Posted April 7, 2015 Author
giganut, thanks for that; however, I ran into the same issue. If there is a long URL, Google truncates it with "...".
Code Docta: F*@#in BINGO!!! Awesome job! Works brilliantly! Thank you!
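The difference between the approaches in this thread comes down to visible text versus the link attribute. Here is a small Python sketch of that, under assumed markup: the sample HTML, the `example.com` URL, and the names `CiteAndHrefParser` and `split_display_and_target` are made up for illustration and are not taken from a real results page.

```python
from html.parser import HTMLParser

# Hypothetical sample: the cite text is shortened, the href is complete.
SAMPLE_HTML = (
    '<h3 class="r"><a href="http://example.com/news/2015/04/07/'
    'some-long-article.html">Some article</a></h3>'
    '<cite>example.com/news/.../some-long-art...</cite>'
)


class CiteAndHrefParser(HTMLParser):
    """Collect the visible text of <cite> tags and the href of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.cite_texts = []
        self._in_cite = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        elif tag == "cite":
            self._in_cite = True
            self.cite_texts.append("")

    def handle_endtag(self, tag):
        if tag == "cite":
            self._in_cite = False

    def handle_data(self, data):
        # Only text inside a <cite> counts as the displayed URL.
        if self._in_cite:
            self.cite_texts[-1] += data


def split_display_and_target(html):
    """Return (displayed cite texts, link targets) found in the markup."""
    parser = CiteAndHrefParser()
    parser.feed(html)
    parser.close()
    return parser.cite_texts, parser.hrefs


displayed, targets = split_display_and_target(SAMPLE_HTML)
print(displayed)
print(targets)
```

Scraping the inner text of the cite element gives the shortened display string, while reading the href gives the complete address, which matches what was reported above.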