UBot Underground

Scraping Google links :: Removing the <b> tags?



Hey everyone, here is my issue:

 

I'm trying to scrape Google Blog Search: http://blogsearch.google.com/. First, I navigate to that page, type in the kind of blogs I'm looking for, and run the search. Then I want to scrape the URLs of all 10 results returned. The problem is that the only URL I'm able to scrape for each result is the green one that appears below it. Within this URL, Google inserts <b> and </b> tags to bold the words that match your search. Because I then want to visit each URL, I can't have these tags in my scraped results. I'm drawing a blank here and would really appreciate any input you guys can give me. If I need to make this question more detailed with screenshots, please let me know and I will update it.

 

Thanks for your time,

 

Jeff


Or if anyone knows a dependable source where I can scrape an exhaustive list of active WordPress blogs for any given keyword or niche... that would be baller as well.


I'm looking at the search pages now and the only thing I see is this:

 

all X blogs>> (where X is the number returned)

 

Is that what you are referring to?

 

John

 

 

Try this:

 

blogsearch_scrape.ubot
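
The attached script isn't reproduced in this thread, so here is a rough sketch of the general idea for anyone reading along. It is written in Python rather than UBot and is not necessarily what the attachment does. It assumes each scraped result is a plain string such as `www.example.com/<b>knitting</b>-blog/`; the function names `strip_bold_tags` and `to_visitable_url` and the sample URL are made up for illustration.

```python
import re

# Matches opening and closing bold tags only, e.g. <b>, </b>, <B>.
# The \b after "b" keeps it from matching other tags such as <br> or <body>.
BOLD_TAG = re.compile(r"</?b\b[^>]*>", re.IGNORECASE)


def strip_bold_tags(scraped: str) -> str:
    """Remove <b> and </b> markup from a scraped URL, keeping the text between them."""
    return BOLD_TAG.sub("", scraped).strip()


def to_visitable_url(scraped: str) -> str:
    """Clean a scraped display URL and add a scheme if it has none.

    Assumption: a display URL without a scheme can be reached over http.
    """
    cleaned = strip_bold_tags(scraped)
    if not re.match(r"^https?://", cleaned, re.IGNORECASE):
        cleaned = "http://" + cleaned
    return cleaned


if __name__ == "__main__":
    # Hypothetical scraped value, for illustration only.
    print(to_visitable_url("www.example.com/<b>knitting</b>-blog/"))
```

One caveat: the green display URL can be shortened for long addresses, in which case removing the tags still won't give you a working link. If you can get at it, the href of the result's title link is usually the safer thing to scrape.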

 

You. Are. My. Hero.

 

Thank you so much!
