UBot Underground

[Sell] Website Crawler - Crawl Websites And Extract That Succulent Data



Plugins Required: HTTP Post (Paid) / Advanced File (Free) / File Management (Free) / Threads Counter (Free) / Large Data (Free) / Local Dictionary (Free)

 

Made and tested in Ubot Studio 4

 

This Is Source Code: You Have Full Resale Rights

 

 

http://i.imgur.com/93PDr.jpg

 

  • Dev and Pro version both come with every purchase
  • Source Code video included

This website crawler has been in development for months. I wanted my next product to be big and this is one of the best products I’ve ever made if not the best.

 

There are so many uses for this product. I built the website crawler “engine” so that it would be a beast that can crawl tens of thousands of webpages looking for information. What is that information? That is the question. This can easily be modified to scrape whatever you wish off of the web pages. It is so easy that I will even write the code for you (as long as it is a reasonable request).

 

Some ideas off the top of my head that you can scrape:

  • External domains from authority websites
  • Emails
  • Phone numbers
  • Any kind of videos to make a video site
  • Any kind of images to make a large image site
  • Links that contain certain words
  • PDF files to make a document website
  • And more, the list is endless!

Here are some of the features you can see in the program:

  • Dev version comes with a simply coded HTML design – easy to modify
  • Multi-threaded
  • Extension filter to filter out pages you do not want to crawl
  • Custom user agent to tell the websites what your crawler is about
  • Ability to retry failed requests x number of times
  • Ability to crawl multiple websites in one run (may be taken away in the future)
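For illustration only, here is a rough Python sketch of two of the features above: the extension filter and the retry-failed-requests option. The function names (`should_crawl`, `fetch_with_retries`) and the extension list are made up for this sketch and are not taken from the bot's source.

```python
from urllib.parse import urlparse

# Assumed example set of extensions to filter out; the real bot lets you configure this.
BLOCKED_EXTENSIONS = {".jpg", ".png", ".gif", ".css", ".js", ".zip"}

def should_crawl(url):
    """Return True if the URL's path does not end in a blocked extension."""
    path = urlparse(url).path.lower()
    return not any(path.endswith(ext) for ext in BLOCKED_EXTENSIONS)

def fetch_with_retries(fetch, url, max_retries=3):
    """Retry a failed request up to max_retries times, as the feature list describes."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception as err:  # a real crawler would catch narrower errors
            last_error = err
    raise last_error
```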

Here are some of the “behind the scenes” features:

  • Intelligent filtering system that only acquires internal links to keep the program running smoothly
  • Unique ways of acquiring the links, both internal and external (took me lots of head scratching to figure this out)
  • Elegant solutions for the filtering and link acquisition
  • Comments where they make sense
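As a rough idea of what the internal/external link separation described above involves (this is a hypothetical Python sketch, not the bot's actual method, which the author keeps under wraps): links are resolved to absolute URLs and bucketed by whether their host matches the page being crawled.

```python
from urllib.parse import urlparse, urljoin

def split_links(base_url, hrefs):
    """Separate links into internal and external relative to the page's host."""
    base_host = urlparse(base_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # also resolves relative links
        host = urlparse(absolute).netloc.lower()
        (internal if host == base_host else external).append(absolute)
    return internal, external
```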

Watch the video for more information.

 

 

$99. Price may rise at any time.

 

http://i.imgur.com/93PDr.jpg


Just grabbed a copy. Thanks Nick.

 

Can you add in this code to download and use a random user agent?

download file("http://techpatterns.com/downloads/firefox/useragentswitcher.xml", "{$special folder("Desktop")}\\useragentswitcher.xml")
wait(1)
clear list(%Useragents)
add list to list(%Useragents, $list from text($find regular expression($read file("{$special folder("Desktop")}\\useragentswitcher.xml"), "(?<=useragent=\").*?(?=\")"), "
"), "Delete", "Global")
save to file("{$special folder("Desktop")}\\Useragents.txt", %Useragents)
set(#user_agent, $random list item($list from file("{$special folder("Desktop")}\\Useragents.txt")), "Global")
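For anyone reading this outside of uBot, the snippet above boils down to roughly this Python equivalent: download the switcher XML, pull out every `useragent="..."` attribute with the same lookaround-style pattern, and pick one at random. The attribute format is assumed from the regex in the uBot code.

```python
import random
import re

def extract_user_agents(xml_text):
    """Return every user-agent string found in the switcher XML."""
    return re.findall(r'useragent="(.*?)"', xml_text)

def random_user_agent(xml_text):
    """Pick a random user agent, or None if the XML yields nothing."""
    agents = extract_user_agents(xml_text)
    return random.choice(agents) if agents else None
```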

 

Just grabbed a copy. Thanks Nick.

 

Can you add in this code to download and use a random user agent?

 

Thanks! I'm going to send you a PM soon!


Just wanted to pop in here and give Nick some love with this software.

 

First off Nick creates really good useful tools. Second, his support is fantastic!

 

This is a tool with a LOT of different uses. Buy this now and the uses are endless.

 

Great job Nick!


Hey Nick,

 

I am not sure what I can do with this 'engine', but I tried to grab a copy and the coupon is no longer working.

Can you explain how to edit the code to scrape custom data? Is it by regex or XPath? Thanks


A new coupon code has been added since the old one was not working for people. I have tested the new one 3 times now so it should work for everybody.

 

Please use the new code: gimme15off2

 

If you're still having trouble with it please send me a message and I will make sure you get the correct price.


Update:

 

- Fixed: major bug that would cause the program to crash when a PDF link (without the .pdf extension) was called
 
- Default script to scrape external domains now uses generic variable and list names to avoid confusion if you edit it to your own liking
 
- Improved default script (scrape external domains) - it now processes the domains by trimming them
to their hostname and removing duplicates (there may be subdomains still though)
 
- Ability to save data while the program is running
 

 

- Added a few more status statements; if the bot gets stuck, they may help with detecting future bugs
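The domain-trimming improvement in this changelog can be pictured with a small Python sketch (the function name `unique_hostnames` is invented here; the real bot does this in uBot script): each scraped URL is reduced to its hostname and duplicates are dropped, though, as noted above, subdomains remain distinct hosts.

```python
from urllib.parse import urlparse

def unique_hostnames(urls):
    """Trim each URL to its hostname and drop duplicates, keeping first-seen order."""
    seen, result = set(), []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host and host not in seen:
            seen.add(host)
            result.append(host)  # subdomains like blog.a.com stay separate from a.com
    return result
```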

I've got 2 questions: how documented is the source code, and regarding grabbing videos from sites, is that just the URL or can we snatch the file?

 

The source code has comments where I felt they were needed - so not every node is commented. I'm not happy with the source code video at the moment - due to my location it's the best I could do. After the next update there is a good chance that the source code is going to look a lot different. For these reasons I will be shooting new videos for that, as early as this weekend and as late as next weekend (travelling plus holiday stuff). I have also been asked to better document how to edit the source code to add in your own scraping code, so that will be available soon as well.

 

I don't quite understand the second question. But I can say that when you land on the page you should be able to scrape anything that is in the HTML via regex or the xpath parser. If you have any examples of what you want to scrape I can check it out for you and provide you with the proper code if it is possible.
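To make the "scrape anything in the HTML via regex" point concrete, here is an illustrative example in Python rather than uBot script; the email pattern is a deliberately simple assumption, not the bot's own.

```python
import re

# Simple, assumption-laden email pattern for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrape_emails(html):
    """Return unique email-looking strings found in a page's HTML, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))
```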


If I have understood correctly, this source code is the bot 'engine' that enables scraping thousands of pages without the bot getting overloaded, but when it comes to scraping the specific data I need, I have to write my own regex or XPath. Am I right?

 

I have your regex builder, so I think regex is not a big issue, but how do I get the XPath? Can you please share how to do it?

 


 


If I understand correctly, this source code is the bot 'engine' that enables scraping thousands of pages, but when it comes to scraping the specific data I need, I have to write my own regex or XPath. Am I right?

There are parts of the source code that can be edited to make it scrape whatever you want from the page via regex or xpath. I am willing to edit it for you as stated in the sales thread - basically as long as it is possible and a reasonable request I'll do it.

 

I'm going to send you a PM and we can talk about it.


Hey Nick,

 

Just started looking at this today and have an issue showing up. See this vid for details...

 

http://screencast.com/t/rttTtSoLt

 

The explanation video is very detailed making the code easy to understand. This bot will save me a ton of time.

 

Some additions to include...

  • Build in the functionality to allow relative-path pages to be scraped, e.g. ../page1.html
  • I didn't spot this in the code but... add the ability to skip/filter URLs that point to an anchor on a page, e.g. ../product1.html#picture1 or http://example/page1.html#picture1
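Both requests above amount to URL normalization, sketched here in Python under the assumption that each link is resolved against the URL of the page it was found on (`normalize_link` is a name invented for this sketch):

```python
from urllib.parse import urljoin, urldefrag

def normalize_link(page_url, href):
    """Resolve a relative link against its page and strip any #fragment anchor."""
    absolute = urljoin(page_url, href)      # handles ../page1.html style paths
    clean, _fragment = urldefrag(absolute)  # drops #picture1 etc. so the page isn't crawled twice
    return clean
```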

Thanks,

Pete


I'm trying to buy this with my account on your website.

I reset my forgotten password, but when I try to log in with the changed password (clicked from the received email: [iM Autobots, Software For Internet Marketers] Password Reset), the login fails.

 

Please let me know if you can reset my password so I can buy it with my account.

Thanks


I am going to be replying to PMs and emails today - I got sent a lot on the 22nd - sorry for not replying to everyone straight away, but I wanted to finish sorting out the next update; it is nearly done now.

 

The next update is mostly focused on reducing memory usage. In several 1-hour tests comparing the old version against this new one, memory usage was reduced 20-30% after an hour (in shorter tests the difference is less noticeable, so hopefully longer runs will save even more; this makes sense because many of the lists have been replaced by a plugin which handles them better).

 

A part of me wants to tinker with it forever until it is perfect, but I'd rather release the update now - the reason being that a lot of code has changed. So in the update after this one the code will probably change much less, which means I can hopefully get the source code video redone at that point.


Update:

 

Okay, so I kind of pushed this out a little quicker than I wanted to, but with that in mind, this is going to be part 1 of a series of updates. This update is all about memory, and the next update, and maybe the one after that, will also mainly focus on memory.

 

Because a lot has changed (mostly with lists) please let me know of any bugs and I'll try to get them fixed quickly.

 

There are two more plugins required now (both are free):

 

http://www.ubotstudio.com/forum/index.php?/topic/15327-free-local-dictionary-plugin-local-variables-issue-workaround/

http://www.ubotstudio.com/forum/index.php?/topic/16308-free-plugin-large-data/

 

To download the update login at: http://imautobots.com/wp-login.php

 

Then go back to the homepage and click on "Purchase History"

