UBot Underground

Scraping Google Listings No Class



Hi,

 

I'm new to UBot, but I've gone through the training videos and have a good background in HTML and in using XPath for scraping.

 

I'm trying to scrape Google result URLs, but for some reason I can't find any classes or IDs that work. The only thing that repeats on each listing is an onmouseclick JavaScript handler, and it's different for every one.

 

Really appreciate any help here.

 

Thanks,

 

Mike


I used a regex here, and it still works:

clear list(%keywords)
add list to list(%keywords, $list from text("purple
green
yellow
blue
red", $new line), "Delete", "Global")
clear list(%scrape url)
set user agent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0")
loop($list total(%keywords)) {
set(#KW_next_item, $next list item(%keywords), "Global")
clear cookies
navigate("https://www.google.com/", "Wait")
wait for element(<innertext="Google Search">, 10, "Appear")
type text(<name="q">, "{#KW_next_item} balloons", "Standard")
click(<name="btnK">, "Left Click", "No")
wait($rand(3, 10))
wait for element(<innertext="Help">, 10, "Appear")
add list to list(%scrape url, $list from text($find regular expression($scrape attribute(<class="r">, "outerhtml"), "(?<=<h3 class=\"r\"><a href=\").*?(?=\" onm)"), $new line), "Delete", "Global")
}
ui stat monitor("urls: {$list total(%scrape url)}", "")


The code is from this post:

 

http://www.ubotstudio.com/forum/index.php?/topic/16637-newbie-question-regarding-google-search-result-scraping/&do=findComment&comment=99876
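To see what the regex in the script above actually captures, here is a minimal sketch in Python rather than UBot. It assumes the result markup has the shape the script expects (an h3 with class "r" directly followed by an anchor whose href is followed by an attribute starting with "onm"). The sample HTML, the example.com URLs and the extract_urls helper are made up for illustration; they are not real Google output.

```python
import re

# Same pattern as in the UBot script: a lookbehind for the opening tags,
# a lazy match for the URL, and a lookahead for the start of the
# following onm... attribute.
PATTERN = re.compile(r'(?<=<h3 class="r"><a href=").*?(?=" onm)')

# Hypothetical sample markup, shaped the way the script assumes.
SAMPLE_HTML = (
    '<h3 class="r"><a href="http://example.com/red" '
    'onmousedown="return false">Red balloons</a></h3>\n'
    '<h3 class="r"><a href="http://example.org/blue" '
    'onmousedown="return false">Blue balloons</a></h3>'
)


def extract_urls(html):
    """Return every href that sits between the two anchors of the pattern."""
    return PATTERN.findall(html)


if __name__ == "__main__":
    for url in extract_urls(SAMPLE_HTML):
        print(url)
```

The pattern depends on the exact attribute order and quoting, so if the page markup changes (a different class name, or another attribute between href and the event handler), it will silently return nothing.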

 


Or HTTP with XPath, using the HTTP post plugin:

set(#get love, $plugin function("HTTP post.dll", "$http get", "https://www.google.com/search?q=love", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0", "", "", ""), "Global")
set(#html parse inner text, $plugin function("HTTP post.dll", "$html parser", #get love, "h3", "class", "r", "InnerText"), "Global")
set(#html parse h3, $plugin function("HTTP post.dll", "$html parser", #get love, "h3", "class", "r", "OuterHtml"), "Global")
set(#html parse href, $plugin function("HTTP post.dll", "$xpath parser", #html parse h3, "h3/a", "href", "HTML"), "Global")
set(#both, "{#html parse href},{#html parse inner text}", "Global")
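The same idea (select each h3 with class "r", take the href of its link and the heading text) can be sketched outside UBot with Python's standard library. This is only an illustration under assumptions: the sample markup and the scrape_links helper are hypothetical, and ElementTree needs well-formed XML, so a real search page would require a proper HTML parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample shaped like the markup the plugin calls target.
SAMPLE_XHTML = """<div>
  <h3 class="r"><a href="http://example.com/love">Love (example)</a></h3>
  <h3 class="other"><a href="http://example.com/skip">Skipped</a></h3>
  <h3 class="r"><a href="http://example.org/songs">Love <b>songs</b></a></h3>
</div>"""


def scrape_links(xhtml):
    """Return (href, inner text) pairs for every h3 with class "r"."""
    root = ET.fromstring(xhtml)
    results = []
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    for h3 in root.findall(".//h3[@class='r']"):
        link = h3.find("a")
        if link is None:
            continue  # heading without a link: nothing to scrape
        text = "".join(h3.itertext()).strip()
        results.append((link.get("href"), text))
    return results


if __name__ == "__main__":
    for href, text in scrape_links(SAMPLE_XHTML):
        # Mirrors the "{href},{inner text}" pairing built in #both.
        print("{},{}".format(href, text))
```

Selecting by element and attribute this way does not depend on attribute order inside the anchor tag, which makes it less brittle than the regex approach.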

CD

