UBot Underground

HUGE speed boost tip for scraping with the basic UBOT package.



Check out this video to see how fast I scraped my last project (3-4 complete page scrapes per second at times). Video attached.

 

I wanted to see if I could scrape faster by using this command.

set(#data, $read file("http://www.example.com"), "Global")

 

That takes the contents of a URL and loads it into a variable.
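For anyone who wants to see the same idea outside of UBot, here is a rough Python sketch of "read the URL into a variable". This is only an illustration, not what UBot runs internally; the function name `fetch_page` and the 10 second timeout are my own assumptions.

```python
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Return the raw source at `url` as text, without rendering the page.

    `fetch_page` is a hypothetical helper name used for illustration only.
    """
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `data = fetch_page("http://www.example.com")` would then play the same role as `#data` in the command above.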

 

Then you just use regex to find what you need. My script has about a dozen lists and multiple nested regex, replace, and trim functions, and it's still blazing fast on a virtual machine.

 

Here is an example of the code used to get the title of the off-road trail I am scraping.

 

add item to list(%title, $replace($replace($find regular expression(#data, "<meta name=\"TrailName\" content=\"[^\".]*\" />"), "<meta name=\"TrailName\" content=\"", $nothing), "\" />", $nothing), "Don\'t Delete", "Global")

 

That looks tricky in code view, but paste it into your UBot Studio and it will make more sense to you.
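If the nesting is hard to follow, the logic is: find the whole meta tag with a regex, then strip off everything before and after the content value. Here is a rough Python sketch of that same logic, using a capture group instead of the two replaces. It is only an illustration, not UBot code. The name `extract_trail_name` is hypothetical, and the pattern uses `[^"]*` for the value, which is slightly looser than the `[^\".]*` in the command above.

```python
import re
from typing import Optional

# Matches the TrailName meta tag and captures only its content value.
# Assumption: the tag is written exactly as in the UBot example above.
TRAIL_NAME = re.compile(r'<meta name="TrailName" content="([^"]*)" />')


def extract_trail_name(html: str) -> Optional[str]:
    """Return the TrailName value from raw page source, or None if absent."""
    match = TRAIL_NAME.search(html)
    return match.group(1).strip() if match else None
```

The capture group does the job of both `$replace` calls, and returning `None` makes a missing tag easy to spot before anything gets added to a list.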

 

I hope this helps. This is how I will scrape everything in the future.

Speed.mov


I think if you are just trying to scrape the meta description, it'll be easier to use:

 

set(#data, "{$title},{$meta description},{$meta keywords}", "Global")

 

Just my .02c

 

Praney



I wasn't scraping keywords or descriptions. These were all meta tags, but not the keyword or description tag.

  • 2 months later...

This thread was a tremendous help tonight, thank you! My scraper was virtually unusable even with sockets and multithreading until I tried your tip, and now it is blazing fast.
