UBot Underground

Scrape URLs of Google Search Results



Hello, I've been struggling to pull the URLs from a Google search results page through the browser for keywords like "insurance", which bring up a ton of ads and things like maps. I have not been able to find a way to grab the URLs using iMacros because I cannot target the links properly: the number of ads keeps changing, or picture results appear. I was recommended to try UBot; however, I am skeptical that UBot has the capacity to do what I'm looking for. If you can convince me it can, and provide some guidance on how to filter/target just the search result URLs, I will definitely give this program a shot.


Should be perfectly doable I believe, though I haven't made a bot to do this personally. With UBot's ability to flexibly scrape by attribute and parse a larger subset of scraped data to extract just what you need, I would think this to be a fairly simple task.

 

Anyone else already done this?

 

Jonathan


@ Goat >> What specifically are you trying to extract? The URLs from the paid ads on the right-hand side of Google, or the actual search results returned to you by Google?

 

Walk us through EXACTLY what you're trying to do. For instance:

 

Go to google.com

Search for "insert keyword"

Scrape URLs from search results, or scrape URLs from paid ads, etc.


I'm trying to scrape only Google's organic results and then check the page rank for the top 10 sites. The problem I'm running into is that when I scrape Google, some URLs are shortened.

 

Example:

http://www.techrescueme.com/image1.jpg

 

Here's what I'm getting: inventors.about.com/od/.../a/sewing_machine.htm. Without the full URL I can't get the correct page rank.

 

Here's how I'm scraping:

http://www.techrescueme.com/image2.jpg

 

Any ideas on how I can get the full URLs? I've tried scraping using other attributes but I've had no luck so far. Any help deeply appreciated.

 

Thanks

 

 

PS sorry to hijack thread :>(



Try this instead:

[Attached image: post-11-12620600151147_thumb.jpg]
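
For anyone following along outside UBot: judging by the reply that follows, the fix is to scrape each link's href attribute rather than its visible text, since the text shown on the page can be abbreviated with "..." while the href holds the full address. Here is a minimal Python sketch of that idea using only the standard library. The sample markup and the `LinkCollector` class are made up for illustration and are not Google's actual HTML or anything from UBot.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical markup: the visible text is abbreviated, the href is complete.
sample_html = (
    '<a href="http://example.com/articles/history/sewing_machine.htm">'
    "example.com/.../sewing_machine.htm</a>"
)

collector = LinkCollector()
collector.feed(sample_html)
for text, href in collector.links:
    print("text:", text)
    print("href:", href)
```

The text line comes back shortened while the href line carries the whole URL, which is the one you would want to feed into a page rank check.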


I didn't know you could change the $scrape by attribute to href. Learn something new every day.

 

The problem with this is that it will scrape every URL, and I only need the organic results. How can I limit the scrape to only the organic results?

 

 

Thanks for the help.



That should still work to gather only the organic results. The ads use "<A id=an1". Also, I just use $scrape instead of $scrape attrib; I'm not sure if that's any faster.
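
To illustrate the point about ad links carrying their own id, here is a rough Python sketch that collects hrefs and skips any anchor whose id looks like an ad. It rests on assumptions: the thread only confirms the id "an1", so treating every "an" plus digits id as an ad is a guess, the sample markup is invented, and Google's real markup has very likely changed since this was posted.

```python
import re
from html.parser import HTMLParser

# Assumption: ad anchors have ids like "an1", "an2", ...
# The thread itself only confirms "an1".
AD_ID = re.compile(r"an\d+", re.IGNORECASE)


class OrganicLinkCollector(HTMLParser):
    """Collect hrefs from <a> tags, skipping anchors with an ad-style id."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        anchor_id = attributes.get("id") or ""
        # Keep the link only if it has an href and its id is not ad-like.
        if href and not AD_ID.fullmatch(anchor_id):
            self.hrefs.append(href)


# Hypothetical results page with two ads mixed in among two organic links.
sample_html = """
<A id=an1 href="http://ads.example.com/click?x=1">Sponsored insurance</A>
<a href="http://example.org/insurance-guide">Insurance guide</a>
<A id=an2 href="http://ads.example.com/click?x=2">Another ad</A>
<a href="http://example.net/compare">Compare quotes</a>
"""

collector = OrganicLinkCollector()
collector.feed(sample_html)
print(collector.hrefs)
```

Filtering on an attribute of the link itself means it doesn't matter how many ads show up on a given search, which was the original complaint with targeting links by position.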
