UBot Underground


I am working on a fairly straightforward bot that will scrape Google results. I have several methods in mind for getting to a results page in the browser, and I wonder which would be best.

 

My intent is to sell this bot.

 

I'm only scraping the titles and URLs, not going into the results.

 

The search query itself is fairly complex (long). I had to trim the original down, as Google has a 32-word limit on queries.
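A minimal sketch of enforcing that limit in code before a query is sent. It assumes Google counts whitespace-separated words and that the site: operator added later counts as one word; both are assumptions, and `trim_query` and `reserved_words` are hypothetical names.

```python
MAX_QUERY_WORDS = 32  # the limit mentioned in the post


def trim_query(query, reserved_words=1, limit=MAX_QUERY_WORDS):
    """Keep at most `limit - reserved_words` whitespace-separated words.

    `reserved_words` leaves room for terms appended afterwards,
    such as a site: operator (assumed to count as one word).
    """
    words = query.split()
    keep = max(limit - reserved_words, 0)
    return " ".join(words[:keep])
```

Trimming from the end is the simplest choice; a long query may need its least important terms removed by hand instead.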

 

I want to run the query against a fixed number of sites using Google's site: operator, so the query is run multiple times.

 

I want the Google search settings to include:

Safe Search = On

Google Instant = Off (required for more than 10 results)

Results per page = 100 (to improve scrape speed?)

 

Also, after the results are returned, I go under Search tools and change 'Any time' to 'Past month'.
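The settings above can also be expressed as URL parameters, so the bot builds one results URL per site instead of clicking through the settings pages. The sketch below assumes the parameter names commonly seen in Google results URLs (`q`, `num`, `safe=active`, `tbs=qdr:m`); these are not a documented, stable API, so check that each one still takes effect. Google Instant has no parameter here, and `build_search_url` and `build_all_urls` are hypothetical names.

```python
from urllib.parse import urlencode

# Assumed parameters, taken from what results-page URLs typically contain:
#   q    = query text, including the site: operator
#   num  = results per page
#   safe = "active" turns Safe Search on
#   tbs  = "qdr:m" limits results to the past month
BASE_URL = "https://www.google.com/search"


def build_search_url(query, site, num=100, safe=True, past_month=True):
    """Return a results-page URL for one query restricted to one site."""
    params = {"q": f"{query} site:{site}", "num": num}
    if safe:
        params["safe"] = "active"
    if past_month:
        params["tbs"] = "qdr:m"
    return f"{BASE_URL}?{urlencode(params)}"


def build_all_urls(query, sites):
    """Build one URL per site, since the query is run once for each site."""
    return [build_search_url(query, site) for site in sites]
```

For example, `build_all_urls("blue widgets", ["example.com", "example.org"])` returns two URLs that differ only in the site: term.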

 

Now, for the question: which is best?

 

1. Apply all of these settings in the bot each time it runs.

2. Do the search and modifications in a browser, save the URL of the Google results page (which includes all of the criteria and modifications), and hard-code this 'results' URL into the bot.

3. Get the results URL as in option 2, but put a link to it on a page on my website, which the bot scrapes when it runs. This allows me to update the query without updating the bot.
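One way option 3 could look, as a sketch only: rather than a finished results URL, the page on the website holds the query text and the site list as JSON, and the bot reads it at startup. The address `https://example.com/bot-query.json`, the `query` and `sites` keys, and the function names are all hypothetical.

```python
import json
from urllib.request import urlopen

CONFIG_URL = "https://example.com/bot-query.json"  # hypothetical location


def parse_config(text):
    """Parse and validate the JSON config; return (query, sites)."""
    data = json.loads(text)
    query = data["query"]
    sites = data["sites"]
    if not isinstance(query, str) or not query.strip():
        raise ValueError("config needs a non-empty 'query' string")
    if not isinstance(sites, list) or not sites:
        raise ValueError("config needs a non-empty 'sites' list")
    return query.strip(), [str(site) for site in sites]


def load_config(url=CONFIG_URL, fallback=None, timeout=10):
    """Fetch the config; use `fallback` if the fetch or parse fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return parse_config(response.read().decode("utf-8"))
    except (OSError, ValueError, KeyError):
        if fallback is None:
            raise
        return fallback
```

Passing a built-in default as `fallback` keeps a sold copy of the bot working if the website is unreachable, which combines options 2 and 3.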

 

For options 2 and 3, I don't know whether this saved URL is permanent or whether it will stop working after a few days or weeks.

 

Any thoughts or similar experiences would be greatly appreciated.
