APTS (posted February 4, 2015)

Hello fellow Ubotters,

I would like some advice on best practices for automating a bot that has a large number of pages to scrape. First I will give a little background, and then hopefully someone can suggest a few good ideas to implement.

There is one site that I would like to scrape, and I need to pass a series of unique URLs to it. With each loop, I write the URL into a separate table so that I can keep track of which ones have been done and which ones still need to be done.

Perhaps an example will help. Let's say my list has 100,000 URLs to loop through and check for a certain result. Only about 50% of them will generate data that needs to be scraped. After looping through the first 10,000 URLs, I might have scraped data from only 5,000, but I keep track of all 10,000 URLs that have been navigated to in a separate file. If the bot happens to crash after the first 10,000 rows, I can start it again manually; it will check the separate file, see that 10,000 rows have already been completed, and continue with the master list from row 10,001.

My question is this: how can I get the bot to start again automatically if or when it crashes? Should I be using the "Run on Schedule" feature? Is there anything else I should be aware of when trying to automate my bot in this way?

Thanks for any pointers you can offer.
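The resume logic described in this post can be sketched as follows. This is a minimal illustration in Python, not UBot script, and it is not the poster's actual bot. The names `load_done`, `run`, and `check`, and the one-URL-per-line progress file, are assumptions made for the example.

```python
import os

def load_done(path):
    """Return the set of URLs already recorded in the progress file."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def run(urls, progress_path, check):
    """Visit every URL not yet in the progress file; return scraped results.

    `check` is a hypothetical scraper callback: it returns the scraped data
    for a URL, or None when the page has nothing worth keeping.
    """
    done = load_done(progress_path)
    results = []
    with open(progress_path, "a", encoding="utf-8") as progress:
        for url in urls:
            if url in done:
                continue  # already handled in an earlier run
            data = check(url)
            if data is not None:
                results.append((url, data))
            # Record the URL whether or not it produced data,
            # and flush so the record survives a crash.
            progress.write(url + "\n")
            progress.flush()
    return results
```

Because every visited URL is appended and flushed immediately, restarting after a crash skips the completed rows without needing to remember a row number.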
Bot-Factory (posted February 4, 2015)

The Advanced Ubot 2 Plugin has an "app crash restart" command: http://www.ubotstudio.com/forum/index.php?/topic/17500-sell-plugin-advanced-ubot-2/

Dan
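For readers without the plugin, the general idea of restarting on a crash can be sketched as an external watchdog. This is a Python sketch under stated assumptions and says nothing about how the plugin's command works internally: it assumes the bot can be launched as a command line, and that a crash shows up as a non-zero exit code. The name `supervise` and its parameters are hypothetical.

```python
import subprocess
import time

def supervise(cmd, max_restarts=5, delay=0.0):
    """Run cmd; relaunch it whenever it exits with a non-zero status.

    Returns the number of launches needed to get a clean exit, or raises
    RuntimeError after the initial launch plus max_restarts relaunches
    have all failed.
    """
    launches = 0
    while launches <= max_restarts:
        launches += 1
        code = subprocess.call(cmd)  # blocks until the process exits
        if code == 0:
            return launches
        time.sleep(delay)  # brief pause before relaunching
    raise RuntimeError(f"gave up after {launches} launches")
```

A watchdog like this only notices a process that exits. A bot that hangs without exiting would need a timeout or heartbeat check on top of it.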
deliter (posted February 4, 2015)

What do you mean by crashes? Do you mean the typical browser crash, or does your bot hit something like a 404 "not found" page and then crash when it tries to look for links on the broken page?

If the former, I think there have already been some superb fixes for this. If the latter, simply find a 404 error page, scrape it, and whenever the bot matches the 404, add the URL to a broken-link list or similar and keep going with the link list.
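The second suggestion, logging broken pages and carrying on, might look like this in Python. It is a sketch only: `fetch` is a hypothetical callback assumed to return a `(status, body)` pair, and `scrape_all` is a name made up for the example.

```python
def scrape_all(urls, fetch):
    """Fetch each URL, collecting good pages and broken links separately.

    `fetch(url)` is assumed to return (status_code, body). A 404 or an
    exception puts the URL on the broken list instead of stopping the loop.
    """
    scraped = {}
    broken = []
    for url in urls:
        try:
            status, body = fetch(url)
        except Exception as exc:
            broken.append((url, repr(exc)))  # network error, timeout, etc.
            continue
        if status == 404:
            broken.append((url, "404"))
            continue
        scraped[url] = body
    return scraped, broken
```

The point of the design is that one bad page costs a single entry on the broken list and nothing more.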
APTS (author, posted February 4, 2015)

Thanks Dan. That Advanced Ubot 2 plugin looks interesting. I will buy that and play with it to see if it can solve my issue.
itexspert (posted February 6, 2015)

If you are into scraping, these are very good examples:

http://itbots.net/forum/index.php/topic/19-free-superpages-scraper-v10-scrapelearn-apply-for-beginners/

or

http://itbots.net/forum/index.php/topic/31-sell-source-code-yellowbook-scraper-v10-with-video-explaining-every-functionfor-beginners-moderate-users/
Bot-Factory (posted February 6, 2015)

Those links are not working. Looks like people need a login to that site?

Dan
itexspert (posted February 6, 2015)

Sorry, that was a different forum. Anyway, the links are:

http://www.ubotstudio.com/forum/index.php?/topic/17546-free-superpages-scraper-v10-scrapelearn-apply-for-beginners-with-source-code/

or

http://www.ubotstudio.com/forum/index.php?/topic/17549-sell-source-code-yellowbook-scraper-v10-with-video-explaining-every-functionfor-beginners-moderate-users/

I am very good with scraping; maybe you can learn!