UBot Underground

Using Ubot - Best Practices?



Hi all,

 

I've built my first bot, which is ready to crawl just under 1,000 pages. I've dropped a "wait 3 seconds" command into the loop so as not to hammer the server, but I'm not using a proxy or anything. The scrape will take around an hour to complete, which is fine with me, but I'm wondering whether my crawl rate is still too high. Or should I be using a list of proxies and scraping 5-10 pages per second, to get the job done quicker while still avoiding red flags with the server admins?

 

Just looking for any tips from the community on what you might call best practices.

 

Any tips or plugins you find essential for making bot creation even simpler?

 

Is there anything I should be looking into that isn't covered in the (excellent) tutorial videos and would improve my workflow? E.g. "define" was in the title of one of the tutorial videos but wasn't actually covered in it, so I haven't looked into that yet, though I'm guessing I should soon. I'm using regex a lot but keep seeing mentions of XPath. Is that a preferred alternative to pattern matching?

 

All tips for a newbie welcome B)
Thanks
Will


Hey Will,

 

It's a good thing you are not hammering the server. Even with proxies, throttling your requests is good practice.
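For anyone who wants to see the idea outside UBot, here is a rough Python sketch of the same "wait between requests" loop. The names `crawl`, `fetch` and `estimated_minutes` are made up for illustration, and `fetch` stands in for whatever actually loads a page. It is only meant to show the throttling logic and the arithmetic behind the run time, not how UBot does it internally.

```python
import time


def crawl(urls, fetch, delay_seconds=3.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is a hypothetical callable that loads one page; `sleep` is
    injectable so the loop can be exercised without really waiting.
    """
    results = []
    for index, url in enumerate(urls):
        results.append(fetch(url))
        # No need to wait after the final page.
        if index < len(urls) - 1:
            sleep(delay_seconds)
    return results


def estimated_minutes(page_count, delay_seconds, load_seconds=0.0):
    """Rough run time: every page costs its delay plus its load time."""
    return page_count * (delay_seconds + load_seconds) / 60


if __name__ == "__main__":
    # The waits alone for 1000 pages at 3 seconds each, ignoring load time.
    print(estimated_minutes(1000, 3))
```

With a 3 second wait, the delays alone come to 50 minutes for 1,000 pages, which lines up with the roughly one hour mentioned in the first post once page load time is added.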

 

Proxies are a good idea, but since your IP hasn't been banned, you haven't triggered a bot filter yet. If other people in your house use the same web service, though, it's not worth risking a ban on the IP you all share.

 

You will want to change your headers as well. With a stealth browser (39 or 49), use "set headers". With the legacy browser (21), use "set user agent".

You may also want to set the Referer header, but it's probably not needed here.

 

comment("Referer")
set header("Referer","yahoo.com")
comment("User Agent")
set header("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0")

http://techpatterns.com/downloads/firefox/useragentswitcher.xml

The URL above is a list of user agents. :)
You can use the $spin function or $random list item to pick a different one on each loop. You can use the same principle with proxies.
I use $list from file inside $random list item, like this...

alert($random list item($list from file("{$special folder("Application")}\\file path\\file name.txt")))

The nice thing about $special folder with "Application" is that it returns whatever folder your yourbot.ubot or yourbot.exe is in, and you can append whatever extra path you need to it.

Put $special folder with "Application" in an alert command and you can see the full path.
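If it helps to see the "random item from a file" idea in another language, here is a small Python sketch of it. The function names `load_lines` and `pick_headers` are hypothetical, and the file is assumed to hold one user agent per line. It is an illustration of the pattern, not a description of how $list from file or $random list item work under the hood.

```python
import random
from pathlib import Path


def load_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_headers(user_agents, referer=None, rng=random):
    """Build request headers with a randomly chosen User-Agent.

    `rng` can be the `random` module or a seeded `random.Random` instance.
    """
    headers = {"User-Agent": rng.choice(user_agents)}
    if referer:
        headers["Referer"] = referer
    return headers
```

Calling `pick_headers` once per loop gives a different user agent on most iterations. The same `load_lines` plus `rng.choice` combination would work for a file of proxies.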

 

Regex and XPath are both powerful skills. Dan has a free XPath plugin I can recommend.

 

XPath is the easier way to get most stuff, and sometimes you will need regex after XPath.

Then maybe $trim to get rid of the whitespace.
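To make that XPath, then regex, then trim order concrete, here is a minimal Python sketch using only the standard library. The sample markup, the `price` class and the `extract_prices` name are invented for the example. Note that `xml.etree.ElementTree` only supports a limited XPath subset and needs well-formed markup, so a real page would likely need a proper HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample markup; real pages are rarely this tidy.
SAMPLE = """<html><body>
  <div class="item"><span class="price">  Price: $19.99 </span></div>
  <div class="item"><span class="price">  Price: $5.00 </span></div>
</body></html>"""


def extract_prices(markup):
    """Select nodes with XPath, narrow with a regex, then trim whitespace."""
    root = ET.fromstring(markup)
    prices = []
    # Step 1: XPath picks out the elements we care about.
    for node in root.findall(".//span[@class='price']"):
        # Step 2: a regex keeps only the part after the label.
        match = re.search(r"Price:(.*)", node.text or "")
        if match:
            # Step 3: trim the leftover whitespace.
            prices.append(match.group(1).strip())
    return prices


if __name__ == "__main__":
    print(extract_prices(SAMPLE))  # ['$19.99', '$5.00']
```

The XPath does the structural work of finding the right elements, so the regex only has to deal with a short string instead of the whole page.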


I do have a "defines" tutorial, but it is mostly aimed at much larger bots and at refactoring your code.

[Tut]-"defines"- The How To Use Them And Why You Should Always Use Them! - Tutorials, Tips and Tricks - UBot Underground

 

Good luck and stay server friendly!
Nick

