mco5044, posted October 31, 2014

Hi,

I'm new to UBot, but I went through the training videos and have a good background in HTML and in using XPath for scraping. I'm trying to scrape Google result URLs, but I can't find any classes or ids that work. The only thing that repeats on each listing is an onmousedown JavaScript handler, and its value is different for every result.

I'd really appreciate any help here.

Thanks,
Mike
UBotDev, posted October 31, 2014

The simplest way to solve this is to use the * wildcard in the parts of the attribute values that differ between elements. For more advanced cases you can even use regex.
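Outside UBot, the same wildcard idea can be sketched in a few lines of Python. This is a minimal illustration, not how UBot implements matching: the link attributes below are hypothetical sample values, `fnmatch.fnmatchcase` stands in for UBot's `*` wildcard, and the regex function shows the "more advanced" alternative.

```python
import fnmatch
import re

# Hypothetical sample: attributes of <a> elements from a results page.
# The onmousedown values differ per result, so no single exact value matches them all.
links = [
    {"href": "http://example.com/a", "onmousedown": "return rwt(this,'1','AFQj1')"},
    {"href": "http://example.org/b", "onmousedown": "return rwt(this,'2','AFQj2')"},
    {"href": "/preferences", "onmousedown": ""},
]


def match_wildcard(elements, attr, pattern):
    """Keep elements whose attribute matches a shell-style pattern (* = any run of characters)."""
    return [e for e in elements if fnmatch.fnmatchcase(e.get(attr, ""), pattern)]


def match_regex(elements, attr, pattern):
    """Keep elements whose attribute fully matches a regular expression."""
    rx = re.compile(pattern)
    return [e for e in elements if rx.fullmatch(e.get(attr, ""))]


# The fixed prefix stays literal; only the part that changes per result is a wildcard.
by_wildcard = match_wildcard(links, "onmousedown", "return rwt(*")
by_regex = match_regex(links, "onmousedown", r"return rwt\(.*\)")

for e in by_wildcard:
    print(e["href"])
```

Both filters keep the two result links and drop the navigation link with an empty handler; the regex form only becomes worth it when the varying part needs a tighter pattern than "anything".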
Code Docta (Nick C.), posted November 1, 2014

I used regex here, and it still works:

clear list(%keywords)
add list to list(%keywords, $list from text("purple
green
yellow
blue
red", $new line), "Delete", "Global")
clear list(%scrape url)
set user agent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0")
loop($list total(%keywords)) {
    set(#KW_next_item, $next list item(%keywords), "Global")
    clear cookies
    navigate("https://www.google.com/", "Wait")
    wait for element(<innertext="Google Search">, 10, "Appear")
    type text(<name="q">, "{#KW_next_item} balloons", "Standard")
    click(<name="btnK">, "Left Click", "No")
    wait($rand(3, 10))
    wait for element(<innertext="Help">, 10, "Appear")
    add list to list(%scrape url, $list from text($find regular expression($scrape attribute(<class="r">, "outerhtml"), "(?<=<h3 class=\"r\"><a href=\").*?(?=\" onm)"), $new line), "Delete", "Global")
}
ui stat monitor("urls: {$list total(%scrape url)}", "")

(from this post: http://www.ubotstudio.com/forum/index.php?/topic/16637-newbie-question-regarding-google-search-result-scraping/&do=findComment&comment=99876)

Here is an HTTP version using XPath:

set(#get love, $plugin function("HTTP post.dll", "$http get", "https://www.google.com/search?q=love", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0", "", "", ""), "Global")
set(#html parse inner text, $plugin function("HTTP post.dll", "$html parser", #get love, "h3", "class", "r", "InnerText"), "Global")
set(#html parse h3, $plugin function("HTTP post.dll", "$html parser", #get love, "h3", "class", "r", "OuterHtml"), "Global")
set(#html parse href, $plugin function("HTTP post.dll", "$xpath parser", #html parse h3, "h3/a", "href", "HTML"), "Global")
set(#both, "{#html parse href},{#html parse inner text}", "Global")

CD
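For anyone who wants to try the same two extraction strategies outside UBot, here is a minimal Python sketch. The sample HTML is hypothetical and only mimics the `<h3 class="r"><a href="..." onmousedown="...">` shape that the regex above assumes. Python's standard `html.parser` stands in for the HTTP post plugin's `$html parser` and `$xpath parser` functions, whose internals aren't shown here. Both approaches depend on that exact markup, so they stop matching if the page structure changes.

```python
import re
from html.parser import HTMLParser

# Hypothetical snippet shaped like the result markup the UBot regex targets.
SAMPLE = (
    '<h3 class="r"><a href="http://example.com/purple" onmousedown="return rwt(this)">Purple balloons</a></h3>'
    '<h3 class="r"><a href="http://example.org/red" onmousedown="return rwt(this)">Red <b>balloons</b></a></h3>'
)

# Same idea as the UBot regex: fixed-width lookbehind for the opening tags,
# lazy match for the URL, lookahead for the start of the onmousedown handler.
URL_RX = re.compile(r'(?<=<h3 class="r"><a href=").*?(?=" onm)')


def urls_by_regex(html):
    """Return every href captured between the lookbehind and the lookahead."""
    return URL_RX.findall(html)


class ResultParser(HTMLParser):
    """Collect (href, link text) for <a> elements inside <h3 class="r">, like the XPath h3/a."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.href = None
        self.text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h3" and "r" in classes:
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        # Only collect text while inside a result link (nested tags like <b> are flattened).
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.results.append((self.href, "".join(self.text).strip()))
            self.href = None
        elif tag == "h3":
            self.in_h3 = False


def urls_by_parser(html):
    """Return "href,inner text" strings, mirroring the #both variable above."""
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return [f"{href},{text}" for href, text in parser.results]


print(urls_by_regex(SAMPLE))
print(urls_by_parser(SAMPLE))
```

The regex version is quick but brittle: it relies on `href` being the first attribute and on the handler starting with `onm` right after it. The parser version only relies on the `h3.r > a` structure, which is closer to what the XPath `h3/a` expression expresses, and it pairs each URL with its title instead of keeping two separate lists.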