UBot Underground

Google SERPS scraper


Recommended Posts

Finally got there with my first bot. It works pretty well; it might crash out now and again, but so be it. It could do with a tidy-up and better use of scripts/subs, but it works and I'm sick of it now :D. Learnt a lot and overcame some nasty issues, so all good.

 

NOTES

1) Make sure you set your search results to 100 per page before firing the bot, as it won't check beyond page 10. To get a thousand URLs you need 100 results on each of the 10 pages.

- To change this setting, go to Google.com > click Search Settings > change the drop-down box to 100 > click Save Settings.

 

Future work is to have the bot work with only 10 results showing, then have it update the list of Google page URLs as they appear.
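For anyone curious about the arithmetic behind step 1, here is a rough Python sketch of how the result-page URLs line up: with 100 results per page, page N starts at offset N * 100, so 10 pages cover 1,000 URLs. The `num` and `start` query parameters are my assumption about how Google's result pages are addressed, not something taken from the bot's scripts:

```python
# Sketch of the pagination maths in step 1: 100 results per page times
# 10 pages = 1,000 URLs, with each page offset by start = page * 100.
from urllib.parse import urlencode

def serp_page_urls(query, results_per_page=100, pages=10):
    """Build the search-result page URLs for one query."""
    urls = []
    for page in range(pages):
        params = urlencode({
            "q": query,
            "num": results_per_page,           # results shown per page
            "start": page * results_per_page,  # offset of the first result
        })
        urls.append("https://www.google.com/search?" + params)
    return urls

pages = serp_page_urls("ubot studio")
print(len(pages))  # -> 10
print(pages[1])
```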

 

2) You'll need my modified string lib (original by Jim, who deserves a thanks here); the file is attached. You will need to put this in your DOCUMENTS folder.

 

3) You need to add a file named cleanUrls.txt to the root of your DOCUMENTS folder.

3b) Also add googlepages.txt.
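Steps 3 and 3b boil down to creating two empty text files before the first run. A minimal Python sketch, assuming the standard user Documents folder (the path lookup is my assumption; only the file names come from the post):

```python
# Sketch of steps 3 and 3b: create empty cleanUrls.txt and
# googlepages.txt in the Documents folder if they don't exist yet.
import os

def ensure_bot_files(documents=None):
    """Create the two helper files the bot expects, if missing."""
    if documents is None:
        documents = os.path.join(os.path.expanduser("~"), "Documents")
    os.makedirs(documents, exist_ok=True)
    for name in ("cleanUrls.txt", "googlepages.txt"):
        path = os.path.join(documents, name)
        if not os.path.exists(path):
            open(path, "w").close()  # an empty file is enough
    return documents
```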

 

4) Enter your search term in the UI box at the top of the bot.

 

5) This is important: every time you run this bot, you MUST delete the contents of the cleanUrls.txt file, otherwise your newly scraped URLs are just appended to that one file. If that is OK with you, obviously don't delete the contents :D
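If you want to automate step 5 rather than emptying the file by hand, a tiny sketch of the idea (the function name is mine, not part of the bot):

```python
# Sketch of step 5: the bot appends to cleanUrls.txt, so truncate the
# file before a fresh run unless you deliberately want to accumulate
# URLs across runs.
def reset_results(path, keep_previous=False):
    """Truncate the results file unless previous runs should be kept."""
    if not keep_previous:
        open(path, "w").close()  # truncates the file to zero bytes
```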

 

 

<edit> Thanks to greencat for showing me how to escape using the $replace feature.

 

<edit2> Reuploaded the ubot file, as I have fixed a bug where it wouldn't always loop correctly if Google tried to omit some results. If it doesn't loop correctly, just run it again and it will do it properly the second time :) Will try and get it working correctly first time tomorrow; off to bed now.

 

<edit 3> 21:15, 5th Feb: reuploaded the bot; it works pretty sweet now. Runs all the way through, no crashing, no needing to run it twice.

ulib_stringsCustom.ubot

serpscraper.ubot


Just another note: when doing a fresh scrape, you probably want to delete the contents of googlepages.txt.

 

Beginning to think I posted the bot a bit early, as in the light of a new morning I'm realising lots of things could be better. But I didn't want to end up constantly working on it, so I'd rather post it up as-is and get any input, advice, etc.


Just reuploaded bot (see edit3 in original post).

 

No need for googlepages.txt any more.

Should do 1000 URLs without requiring a double run.

 

Want to get it scraping search terms from a text file; 100,000 URLs in a single run, here we come :) The wife is kicking off that it's Friday night, so this will have to wait.

 

Next bot, WP Commenter :)


Thanks for contributing this bot :) However, even the latest version isn't working fully for me. It only retrieves the first 100 results, then throws up an error. It looks like it's trying to click the next button but failing.

 

Sounds like it isn't scraping the Google page links correctly, so when it tries to load page 2 the URL is wrong.

 

Unfortunately, I have no idea why this is. Does this happen for every search term, or just a specific one?


It happens for every search term. I usually use Scrapebox for my URL harvesting, but I thought it would be nice to integrate something like this into a few of my bots.

 

I have fixed an issue where it would crash if your search term didn't return more than one page of results.

 

As you say it happens for every term you try, I would have thought some terms were returning more than a page's worth.

 

I am in the UK; not sure if Google lays out its source code differently there, so the scrape isn't working.

 

Do you mind giving me a search term that definitely doesn't work for you, so I can test it here?

 

 

Have just finished making some modifications.

It now reads your search terms from a file chosen with a UI file selector, with another UI file selector for where to save the URLs.

 

Just pulled 6,145 URLs :)

 

 

The main script to run the new bot is MAIN START POINT, but as this is the last script in the list, it will need selecting manually before running.
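The new batch workflow described above (terms file in, URLs file out) can be sketched in a few lines of Python. This is purely illustrative: `run_batch` and `scrape_serp` are hypothetical names standing in for the bot's scripts, not part of the ubot file:

```python
# Sketch of the batch workflow: read one search term per line, scrape
# each term, and append every URL found to the output file.
def run_batch(terms_path, output_path, scrape_serp):
    """Run scrape_serp(term) for each term in the file; return URL count."""
    with open(terms_path) as f:
        terms = [line.strip() for line in f if line.strip()]
    total = 0
    with open(output_path, "a") as out:  # append, as the bot does
        for term in terms:
            for url in scrape_serp(term):
                out.write(url + "\n")
                total += 1
    return total
```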

serpscraper.ubot

  • 3 weeks later...

Thanks a lot for the bot some_guy. I have an issue I hope you have a solution for: the bot doesn't seem to work beyond the first page. It manages to scrape 100 URLs, but after that the bot stops, making it seem as if the scraping is over; when I check the text file it's only 100 URLs. Any help?



 

Sounds as if it isn't picking up the Google page links.

 

If you can understand the scripting, put a pause node after it scrapes the Google pages, then check the list to see if it has picked up the links to the following pages. It could be that the code I used to scrape the Google pages isn't working for you because Google has switched up the code. This is just a guess, as the scraper still works for me.
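The check suggested above (did the scrape actually pick up the links to pages 2 onwards?) can also be tried outside UBot on a saved copy of the results page. A rough sketch; the `href` pattern with a `start=` offset is my assumption about Google's markup, which changes often:

```python
# Sketch of the debugging check: extract pagination hrefs (links whose
# query string carries a start= offset) from saved results-page HTML.
# If this returns an empty list, the page-link scrape is the problem.
import re

def page_links(html):
    """Return hrefs of result pages that carry a start= offset."""
    return re.findall(r'href="(/search\?[^"]*\bstart=\d+[^"]*)"', html)

sample = '<a href="/search?q=test&amp;start=100">2</a><a href="/foo">x</a>'
print(page_links(sample))  # -> ['/search?q=test&amp;start=100']
```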

 

Does this problem occur for every search term?

 

If I get some free time over the next few days, I will try and improve it. But feel free to dive into the code, fix it, and post it up :)

  • 1 month later...

thanks for this =D

 

I was going to make one, but now: many hours saved =P

 

 

*edit* I am getting that "error including bot" as well.

 

*edit* Nothing is happening for me after running the bot... I see it go through searching my search terms and that's it. The URL list file has 0 URLs listed after the script has finished running.
