UBot Underground

Recommended Posts

Hi, I bought UBot Standard Edition back in 2014, I think. Even though UBot is supposed to be for non-programmers, it always seemed too complicated to me, so I have never really used it. But I would like to try again, because I need to replace a scraper program. I still have that program's setup files and license key, but since its developer has gone out of business it can't phone home to verify the license, and the computers I had it installed on have died.

 

I have been watching some of the how-to videos on YouTube, but I still don't have a clue what to do. I'm hoping someone can give me a rough outline of which functions (is that the right word?) I need to work with. Here is what it needs to do:

 

First, I have a list of URLs in a text file, one per line. In this case they are redirects, so I need to save the original URL, get the URL it redirects to, and save that as well.
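
A minimal UBot sketch of that first step, assuming the built-in $list from file and $url functions are available (the file path and list names are only placeholders):

clear list(%urls)
comment("assumes the built-in $list from file and $url functions; C:\\urls.txt is a placeholder path")
add list to list(%urls,$list from file("C:\\urls.txt"),"Delete","Global")
clear list(%redirects)
set list position(%urls,0)
loop($list total(%urls)) {
    comment("load the original URL and let the redirect happen")
    set(#original,$next list item(%urls),"Global")
    navigate(#original,"Wait")
    wait for browser event("Page Loaded","")
    comment("$url should now hold the address the browser ended up on")
    add item to list(%redirects,"{#original},{$url}","Don\'t Delete","Global")
}
save to file("{$special folder("Application")}\\redirects.csv",%redirects)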

 

Next, I need to pull things out of the page source. The old scraper program (Happy Harvester) would grab the text between x and y. Each thing you wanted to save was added as a "rule": for example, the text between <title> and </title>, or the text between "<a href=" and ">Contact</a>" (which would give you the URL of their contact page, if it existed).
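
That "text between x and y" idea maps fairly directly onto UBot's $page scrape function, which takes the text before and the text after and returns what sits between them on the loaded page. A rough sketch using the examples above:

comment("grab the text between two markers on the loaded page, like the between-x-and-y rules in Happy Harvester")
set(#title,$page scrape("<title>","</title>"),"Global")
set(#contacturl,$page scrape("<a href=",">Contact</a>"),"Global")
comment("#contacturl will still include the quote marks around the href value")

Note that $page scrape works against the loaded page; the replies below also show how to pull the raw source into a variable with $document text and run a regex over it instead.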

 

The program would then save all of this info to a CSV file.
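
The CSV part can be handled the same way the script in the reply below does it: build each row as a comma-separated string, collect the rows in a list, and write the list out with save to file (one list item per line). A rough sketch, reusing the placeholder variables from the sketches above:

clear list(%results)
comment("inside the per-URL loop: build one comma-separated row per page")
add item to list(%results,"{#original},{$url},{#title},{#contacturl}","Don\'t Delete","Global")
comment("after the loop: write every row out, one list item per line")
save to file("{$special folder("Application")}\\results.csv",%results)

If a field can itself contain a comma it would need to be wrapped in quotes, but this mirrors how the reply below builds its rows.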

 

I've seen the UBot page-scraping functions, but they seem to work on the live page rather than on the page source.

 

I'm not asking for a totally detailed how-to, but I'm hoping someone can tell me to use "this" for my list of URLs, "this" to save the two URLs, "this" to get at the page source, and "this" to save the various texts between x and y. Just a rough outline. Then I can hopefully watch the videos and read the tutorials to figure out the rest.

 

Really appreciate any help! 



clear list(%urls)
add list to list(%urls,$list from text("http://www.yahoo.com
http://www.bing.com",$new line),"Delete","Global")
clear list(%title)
set list position(%urls,0)
loop($list total(%urls)) {
    set(#curr,$list item(%urls,$list position(%urls)),"Global")
    navigate(#curr,"Wait")
    wait for browser event("Page Loaded","")
    divider
    comment("scrape text in between")
    set(#pagehtml,$document text,"Global")
    set(#title,$plugin function("File Management.dll", "$Find Regex First", #pagehtml, "(?<=<title>).*(?=</title>)"),"Global")
    set(#keyword,$plugin function("File Management.dll", "$Find Regex First", #pagehtml, "(?<=\\<meta name\\=\\\"keywords\\\" content\\=\\\").*?(?=\\\"\\>)"),"Global")
    alert("title: {#title}
head: {#keyword}")
    add item to list(%title,"{#title},{#keyword}","Don\'t Delete","Global")
    divider
    set(#next,$next list item(%urls),"Global")
}
save to file("{$special folder("Application")}\\test.txt",%title)

 


To get the page source:

set(#pagehtml,$document text,"Global")

or

set(#pagehtml,$page scrape("<html","</html>"),"Global")

 


To find the text value in between x and y:

<license>abc</license>
(?<=<license>).*(?=</license>)
answer: abc
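
A rough sketch of using that pattern from inside a script, reusing the $document text and File Management plugin calls from the first reply above:

comment("pull the text between <license> and </license> out of the raw source")
set(#pagehtml,$document text,"Global")
set(#license,$plugin function("File Management.dll", "$Find Regex First", #pagehtml, "(?<=<license>).*(?=</license>)"),"Global")
alert("license: {#license}")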

Some other useful regex patterns:

To match an email address:

([a-z0-9][-a-z0-9_\+\.]*[a-z0-9])@([a-z0-9][-a-z0-9\.]*[a-z0-9]\.(arpa|root|aero|biz|me|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)|([0-9]{1,3}\.{3}[0-9]{1,3}))

To match a URL (this particular pattern is pinned to the ibilik domain):

(?:https?:\/\/)?(ibilik\.)([a-zA-Z\.]{2,6})([\/\w\.-]*)*\/?

To match a phone number:

(\S*\d+\S*){8,16}
(\S*[\d ]\S*){8,16}
\(?\d+\)?[-.\s]?\d+[-.\s]?\d+

Example input: Cynthia 012.345.6789 ( contact/WhatsApp ) washing machine
Result: 012.345.6789