UBot Underground

Jammin' and not in a Bob Marley way :)



Currently, I have a project to scrape the keywords from around 2,000,000 websites. Not an overnight project as you will appreciate.

 

I'm using stripped-down XP SP3 boxes with 2GB RAM and only .NET 3.5, the filth that is IE 8 and NOD32 Antivirus installed. When you're scraping websites in volume, the antivirus works overtime, believe me.

 

IE8 has all images, ActiveX, Flash, JavaScript and any other unwanted junk disabled. Many sites are overloaded with this crud, which only slows the whole process down.

 

I run 4 bots on each machine. I've attached the bot so that those who are interested can see if I've done anything really stupid with it.

 

The CSV is delimited with the tilde character, as there are a huge number of sites out there with badly crafted keyword tags littered with additional quotation marks. Those stray quotes make transferring a normally quoted CSV into a MySQL database for analysis a real pain.
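
For anyone picking up the tilde-delimited file outside UBot, here is a minimal Python sketch of reading it so that stray quotation marks are treated as ordinary characters. The two-column layout (URL, then keywords) and the sample rows are assumptions for illustration, not the actual output of the attached bot.

```python
import csv
import io

def read_tilde_rows(handle):
    """Yield rows from a tilde-delimited file, treating quotes as plain text."""
    # QUOTE_NONE stops the parser from giving quotation marks any special meaning.
    reader = csv.reader(handle, delimiter="~", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row

# Hypothetical sample: a URL, then a keyword string with badly placed quotes.
sample = io.StringIO(
    'http://example.com~widgets, "blue widgets, cheap "widgets"\n'
    'http://example.org~"gadgets"", tools\n'
)
rows = list(read_tilde_rows(sample))
for url, keywords in rows:
    print(url, "->", keywords)
```

The rows could then be handed to a parameterised INSERT through whichever MySQL driver is available, which avoids having to escape the quotes by hand.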

 

Anyway, it all plays nicely together until each bot has worked through around 1,500 sites, and then the whole system jams solid and has to be rebooted. Grrr...

 

Can anyone please shed any light into my darkness as to why this is happening?

 

As a side note, I would really like it if Seth could add a feature to insert records directly into a MySQL/SQL Server database. That would be uber cool. There's no hurry for the next four hours, Seth ;)

meta_keywords_improved.ubot


I had a look at your bot. You are doing the whole lot in one go so it will be hard to find what causes the crash.

 

Change the logic so it saves after every 50 results; that could help pinpoint where it's failing.

 

For instance:

 

If (#count == 50)
{
  Save to file
  Set #count = 0
}

 

Use two placeholders when saving to the file: one for the data already in the file and one for the new data you're adding. That way each save appends the 50 new results to what is already there.
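
Outside UBot, the same save-every-50 idea might look roughly like the Python sketch below. It is only an illustration of the batching logic: `scrape_all`, `flush`, `fake_scrape` and the file name are made-up names, and the real scraping step is replaced by a stand-in.

```python
import os
import tempfile

BATCH_SIZE = 50  # save after this many results, as suggested above

def flush(path, pending):
    """Append the pending lines to the file, then empty the buffer."""
    if not pending:
        return
    with open(path, "a", encoding="utf-8") as out:
        out.write("\n".join(pending) + "\n")
    pending.clear()

def scrape_all(urls, scrape_one, path, batch_size=BATCH_SIZE):
    """Scrape each URL and write results to disk in batches."""
    pending = []
    for url in urls:
        pending.append(f"{url}~{scrape_one(url)}")
        if len(pending) >= batch_size:
            flush(path, pending)
    flush(path, pending)  # write the final partial batch

def fake_scrape(url):
    """Stand-in for the real keyword scrape (hypothetical)."""
    return "keyword one, keyword two"

out_path = os.path.join(tempfile.mkdtemp(), "keywords.csv")
urls = [f"http://site{i}.example" for i in range(120)]
scrape_all(urls, fake_scrape, out_path)

with open(out_path, encoding="utf-8") as f:
    saved = f.read().splitlines()
print(len(saved), "rows saved")
```

Because the file is opened in append mode, a crash loses at most the current unsaved batch, and the last URL written shows roughly where the run stopped.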


Many thanks for taking the time to view my bot and make a suggestion.

I'll give it a whizz.
