UBot Underground

Best Practices For Automating Scraping Bot



Hello fellow Ubotters,

 

I would like to get some advice on best practices for how to automate the running of a bot which has a large number of pages to scrape.  First I will give a little bit of background, and then hopefully someone can give me a few good ideas to implement.

 

There is one site that I would like to scrape, and I need to visit a series of unique URLs on that site.  On each pass through the loop, I write the URL into a separate table so that I can keep track of which ones have been done and which ones still need to be done.  Perhaps an example will help to demonstrate my situation.

 

Let's say my list has 100,000 URLs to loop through and check for a certain result.  Only about 50% of the URLs will generate data that needs to be scraped.  After looping through the first 10,000 URLs, I might have scraped data from only 5,000 of them, but I keep track of all 10,000 URLs that have been navigated to in a separate file.  If the bot happens to crash after the first 10,000 rows, I can start the bot again manually; it will check the separate file, see that 10,000 rows have already been completed, and continue with the master list starting from row 10,001.
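For readers who want to see the resume logic spelled out, here is a minimal sketch in Python rather than UBot script. It is an illustration only, not how the bot above is actually built. The names `load_done`, `scrape_all`, and `process` are made up for the example, and it skips URLs by looking them up in the progress file instead of counting rows.

```python
from pathlib import Path


def load_done(progress_path):
    """Return the set of URLs already recorded in the progress file."""
    path = Path(progress_path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def scrape_all(urls, progress_path, process):
    """Visit every URL not yet recorded, logging each one once it is handled.

    `process` is a hypothetical callback that navigates to the URL and
    scrapes it if there is anything to scrape.
    """
    done = load_done(progress_path)
    with open(progress_path, "a", encoding="utf-8") as log:
        for url in urls:
            if url in done:
                continue  # handled in an earlier run
            process(url)
            # Record the URL only after it was handled, and flush right away
            # so a crash loses at most the URL that was in progress.
            log.write(url + "\n")
            log.flush()
            done.add(url)
```

Because the progress file is appended to and flushed after every URL, running `scrape_all` again after a crash picks up at the first URL that was not recorded.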

 

My question is this:  How can I get the bot to start again automatically if/when it crashes?  Should I be using the "Run on Schedule" feature?  Is there anything else I should be aware of when trying to automate my bot in this way?
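One generic way to restart automatically is a small watchdog that launches the bot and relaunches it whenever it exits with an error. The sketch below is in Python and rests on assumptions: that the bot is compiled to an executable, and that a crash ends the process with a non-zero exit code. A bot that hangs without exiting would not be caught. `run_until_success` and the command passed to it are hypothetical names for this example, not UBot features.

```python
import subprocess
import time


def run_until_success(cmd, max_restarts=5, delay_seconds=0.0):
    """Run `cmd` and rerun it while it exits with a non-zero code.

    Returns the number of restarts that were needed. Raises RuntimeError
    once `max_restarts` restarts have been used up without a clean exit.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts "
                f"(last exit code {result.returncode})"
            )
        restarts += 1
        time.sleep(delay_seconds)  # short pause before relaunching
```

Combined with a progress file like the one described above, each relaunch continues where the previous run stopped instead of starting over.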

 

Thanks for any pointers you can offer. 


What do you mean by crashes? As in the typical browser crash? Or does your bot hit something like a 404 "page not found" error and then try to look for links on the broken page, which causes it to crash?

If the former, I think there have already been some superb fixes for this. If the latter, simply find the 404 error on the page, scrape it, and if the bot matches the 404, add the URL to a broken link list or whatever and keep going with the link list.
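The "broken link list" idea can be sketched like this in Python. This is only an illustration under assumptions: it checks the HTTP status code rather than scraping the text of the error page, and `fetch_status` and `split_broken` are made-up names for the example.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url` (for example 200 or 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the error.
        return err.code


def split_broken(urls, get_status=fetch_status):
    """Sort URLs into (ok, broken) lists instead of stopping on a bad page."""
    ok, broken = [], []
    for url in urls:
        try:
            status = get_status(url)
        except OSError:
            # Connection failures and timeouts are treated as broken here.
            status = None
        if status == 200:
            ok.append(url)
        else:
            broken.append(url)
    return ok, broken
```

The loop never raises on a bad page, so one dead URL ends up in the broken list and the run moves on to the next one.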


The Advanced Ubot 2 plugin has an "app crash restart" command:

 

http://www.ubotstudio.com/forum/index.php?/topic/17500-sell-plugin-advanced-ubot-2/

 

Dan

Thanks Dan.  That Advanced Ubot 2 plugin looks interesting.  I will buy that and play with it to see if it can solve my issue.  

