DjProg
Content Count: 86
Posts posted by DjProg
-
Well, for now I'm dumping my table to CSV on each loop; this way, if UBot goes crazy, I'll still know at which loop it crashed and still have the results from before the crash.
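For reference, the dump-per-loop idea can be sketched in Python (file name and columns are illustrative; UBot's own save-to-file syntax differs):

```python
import csv
import os

def append_rows(path, rows):
    # Append this loop's rows immediately, so a later crash only
    # loses the current loop's data, not the whole table.
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["url", "result"])  # illustrative columns
        writer.writerows(rows)

# One call per scraping loop:
append_rows("scrape_dump.csv", [("http://example.com", "ok")])
append_rows("scrape_dump.csv", [("http://example.org", "ok")])
```

Appending in small increments also sidesteps holding the entire table in memory for one big save at the end.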
-
Well, by looking at the debugger it's not even that much: 28,000 rows and 8 columns...
What's wrong with UBot this time?
=> IS THERE A WAY TO SAVE MY DEBUGGER DATA?? Because clearly the data is still there...
-
Hello guys,
Another problem for no apparent reason: a System.OutOfMemoryException was thrown... when doing a Save to File on a table of at most 50K lines!!
Clearly this shouldn't happen, and when I check the system monitor, I can confirm that there is a TON of memory available.
That's the system monitor when running the node...
http://screencast.com/t/UEXhuP58
Needless to say that there is plenty of room to save a d*mn text file.
Any idea? I lost 3+ hours of scraping due to this bug. I'm trying not to sound too upset, but I can tell you I AM!!
Cheers,
-
-
Update: thanks again Dan, the bloody v39 UBot browser IS 100% the culprit... I changed to v21 and it works flawlessly...
The only downside is that I can't change the HTTP browser headers with this old version.
-
Thanks Dan, I'll try this and report back.
I don't think my multithreading is the culprit, as even my old (unthreaded) bot gets the white browser screen of death after visiting maybe 40 URLs.
By the way, I'm using 5.9.18; I hope it's not like a couple of years ago, when the current version was buggy as hell and I had to go back to v4 (?)
-
...until there is simply not a single working browser in any thread!
I'm wondering what the issue is here.
=> Could it be that the "navigate" function with "wait" enabled crashes if somehow the page doesn't load? (Of course I've visited the suspicious pages and they seem to work fine.) I'd expect it to time out instead of ending in a "forever loading browser of death", but maybe that's the cause (?).
As it's a scraper, I don't see much that could go wrong, especially nothing that should crash the browsers.
Any idea? I'm pretty sure it's a very common problem.
Cheers,
-
-
Thanks Dan.
I had to use 1 second instead of 0.5, but it's working now (well, that only fixed the multithreading issue I had... now I have the browser hanging / not responding after visiting a few dozen sites: "the white browser forever-loading wheel of death". I've opened a ticket for this, as it doesn't seem normal at all).
Cheers,
-
Tested again with 3 threads but a bigger list of test URLs:
And it's complete havoc:
2016-04-09 14:16:46 [LOG] End crawling>>> http://yahoo.com
2016-04-09 14:16:47 [LOG] End crawling>>> http://yahoo.com
2016-04-09 14:16:49 [LOG] End crawling>>> http://yahoo.com
2016-04-09 14:16:55 [LOG] End crawling>>> http://bing.com
2016-04-09 14:17:04 [LOG] End crawling>>> http://www.booking.com/
2016-04-09 14:17:04 [LOG] End crawling>>> http://ebay.com
2016-04-09 14:17:16 [LOG] End crawling>>> https://www.airbnb.com
2016-04-09 14:17:16 [LOG] End crawling>>> https://login.live.com/
2016-04-09 14:17:17 [LOG] End crawling>>> https://login.live.com/
Tons of missed URLs, with duplicates in their place... and it's not specifically at the "start" of the multithreading.
I would bet I'm not the first one to see this kind of behavior; any tips?
Thanks,
-
Hello guys,
I have a somewhat funky behavior when running my multithread script here :
reset browser
clear cookies
ui drop down("Max threads","1,2,3,4,5,6,7,8,9,10",#max_threads)
ui block text("URLs to crawl",#ui_URLs)
clear list(%urls)
add list to list(%urls,$list from text(#ui_URLs,$new line),"Delete","Global")
set(#url_crawling_position,"-1","Global")
set(#used_threads,0,"Global")
loop($list total(%urls)) {
    loop while($comparison(#used_threads,">= Greater than or equal to",#max_threads)) {
        wait(1)
    }
    loop_process()
}
define loop_process {
    increment(#used_threads)
    increment(#url_crawling_position)
    scraping_procedure()
}
define scraping_procedure {
    thread {
        in new browser {
            set(#navigate_url,$list item(%urls,#url_crawling_position),"Local")
            navigate(#navigate_url,"Wait")
            wait(5)
            decrement(#used_threads)
            log("End crawling>>> {#navigate_url}")
        }
    }
}
If I set the Max threads at 3...
And run with this set of test URLs :
It'll "skip" google.com and amazon.com and crawl yahoo.com 3 times instead... as you can see in my log:
2016-04-09 13:57:32 [LOG] End crawling>>> http://yahoo.com
2016-04-09 13:57:34 [LOG] End crawling>>> http://yahoo.com
2016-04-09 13:57:38 [LOG] End crawling>>> http://yahoo.com
2016-04-09 13:57:42 [LOG] End crawling>>> http://bing.com
2016-04-09 13:57:54 [LOG] End crawling>>> http://ebay.com
2016-04-09 13:57:54 [LOG] End crawling>>> http://www.booking.com/
2016-04-09 13:58:03 [LOG] End crawling>>> https://www.airbnb.com
Any idea where I screwed up? I really don't see how this can happen, as I'm incrementing url_crawling_position BEFORE the thread and the navigate... so it shouldn't have the same value 3 times.
Thanks a lot,
Cheers,
-
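For what it's worth, the triple-yahoo log is consistent with a classic shared-variable race: the global position is read *inside* each thread, by which time the main loop may already have advanced it, so all three threads see the latest value. A minimal Python sketch of the fix, binding the value before the thread starts (illustrative names, not UBot syntax):

```python
import threading

urls = ["http://google.com", "http://amazon.com", "http://yahoo.com"]
results = []
lock = threading.Lock()

def crawl(url):
    # The URL was captured as an argument *before* the thread started,
    # so later changes to any shared counter cannot affect it.
    with lock:
        results.append(url)

threads = []
for url in urls:
    t = threading.Thread(target=crawl, args=(url,))  # value bound here
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

The UBot equivalent of "binding before spawn" would be copying the list item into a local variable outside the thread block, rather than indexing the list with the shared global inside it.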
Thanks !
I forgot to say but I'm adding the scraped attributes to a List.
So after adding to the list, would I need to loop through my list to replace the dirty innerHTML with the cleaned, regexed text? Or is there a more elegant solution?
Cheers,
-
Hello guys,
What is the best way to "clean" an innerHTML scraped attribute ?
Basically I'm scraping an innerHTML containing an empty, inline-styled div, from which I need to extract the inline-styled background-image URL...
<div class="blablah" style="height:120px;background-image:url(http://somewhere.com/image.jpeg)"></div>
I scraped the innerHTML of the parent div of blablah because otherwise I didn't get what I needed, but now I need to clean it up a bit.
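As an illustration of the cleanup step, the background-image URL can be pulled out with a regex (shown in Python; the same pattern should work in any regex tool, and it assumes the unquoted url(...) form above):

```python
import re

html = ('<div class="blablah" style="height:120px;'
        'background-image:url(http://somewhere.com/image.jpeg)"></div>')

# Capture everything between "url(" and the closing ")".
match = re.search(r'background-image:\s*url\(([^)]+)\)', html)
url = match.group(1) if match else None
print(url)  # http://somewhere.com/image.jpeg
```

Applied over a whole list, this would turn each dirty innerHTML item into just the URL.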
Any tip is welcome!
Thanks a lot,
Cheers,
-
Hello,
Unfortunately this simply doesn't work at all. Somehow the click-dialog command doesn't work with this modal window; it simply doesn't seem to do anything: the modal window appears and stays there...
Any other idea?
Cheers,
-
Hello guys,
For the last 2 days I've kept getting tons of licensing server issues like this:
http://screencast.com/t/qQmAJ37jCza
Obviously, every time the licensing server is down, the support is down too.
VERY annoying...
What's the ETA to fix this? I can't believe the system isn't redundant.
Cheers,
-
Hello guys,
I'm having a hard time trying to save a CSV that's dynamically generated by the site I'm using (can't post it, you need a paid account to see the behavior).
Basically the site has a link:
<span ng-click="exportCsv()" class="pagination-results-export-csv" ng-show="rows.length > 0">Export Results as CSV</span>
which triggers an "exportCsv()" function, which creates the CSV file and then "pushes" it to the browser (once it's generated, which takes 3-30 seconds) via a file-download dialog window (a bit like automatic downloads, if you see what I mean).
Unfortunately using the click dialog button doesn't work:
click(<innertext="Export Results as CSV">,"Left Click","No")
wait(20)
plugin command("WindowsCommands.dll", "click dialog button", "Enregistrer sous", "Enregistrer")
I also can't run the "click dialog" node alone from UBot, as it seems the dialog won't let me set the focus anywhere else...
Any idea?
Thanks !!
Cheers,
-
Awesome thanks !!
So simple I never thought about it, lol !!!
-
Hello all,
I'm looking for a way to get the complete rendered page text (i.e. what the visitor sees, NOT the HTML... so I want the HTML rendered!).
Any idea?
I think I've tried this many times without success, but maybe with the new functions it can work (perhaps using the new "Windows" commands?).
Thanks a lot,
Cheers,
-
Try to create a folder first using the "Create Folder" command before downloading.
Thanks... gosh, am I dumb, I never saw this Create Folder function before...
Cheers,
-
Something like this?
add list to list(%randomization, $list from text("ubot software studio", " "), "Delete", "Global")
set(#random, "{$random list item(%randomization)} {$random list item(%randomization)} {$random list item(%randomization)}", "Global")
load html(#random)
Hi Kreatus,
I'm not sure I understand exactly how I should interpret this code.
Is it version 4?
To be honest I'm still using version 3.5.something, as the last time I tried version 4 there were too many issues.
If you have a visual interpretation of your code, it would really help!
Thanks a lot,
Cheers,
-
Hello again
I am having a really hard time trying to find a way to generate word permutations...
By this I mean I have a string:
ubot studio software
And I'm trying to generate all unique permutations (at word level), so that at the end I have:
ubot software studio
studio ubot software
studio software ubot
...etc
How can I do such a thing with UBot? I really have no clue how to proceed; maybe the solution involves javascript, but I still don't know how...
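For illustration, the word-level permutations are easy to enumerate in Python (UBot itself has no built-in for this, so the logic would have to be ported or run externally):

```python
from itertools import permutations

words = "ubot studio software".split()

# All unique orderings of the words, joined back into phrases.
phrases = sorted(" ".join(p) for p in set(permutations(words)))
for phrase in phrases:
    print(phrase)
```

Three words give 3! = 6 phrases; the set() guards against duplicates if the input ever contains a repeated word.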
Any suggestion ?
Cheers,
-
Hello Guys,
Quick question: can "download file" actually create the target folder?
It seems like it's not supported, as this script:
http://screencast.com/t/9EnYQ1vr
...doesn't create a new folder or save the files (I check beforehand that the folder-name variable isn't empty, of course).
So is it not supported, or is it just me doing something weird?
Thanks a lot,
Cheers,
-
Ok, I didn't have any issues with any of this...
After the pause is the code for the registration portion, however, you may need to nav back to the home page...you can alter it as you need to, but it all works...
John
Thanks a lot, I'll try it tonight.
I'm really dumb, I did not see that it was a video you posted.
I'll tell you if it all works !
Cheers,
-
This is what I see when I try to register. What buttons should I be pressing?
http://screencast.com/t/YalMZtntHOp
John
Hi John,
You just enter something after the "http:" (it's the future subdomain URL of the blog I'm trying to create).
Then you click on the big blue button labelled "créer un blog" ("create a blog"):
http://screencast.com/t/X0cXMdrhJf
Then comes a first popup like this (my outsourcer will fill in the captcha manually). On this one you need to enter the captcha and click "Etape Suivante" ("Next Step"):
http://screencast.com/t/620EuDbl
Then finally you'll see the damn registration form I'm trying to fill:
http://screencast.com/t/QzLjmiJ9e
Thanks a lot for looking !
Cheers,
-
Hello Guys,
I'm having a hell of a hard time with what I think is a full javascript registration window.
I'm using Ubot Pro 3.5.
The website is :
The first step is to fill in the future subdomain where the blog will be hosted and click the button, and that's easy... but after that it's another story.
I don't want to fully automate this, so the first "pop-up" (the one with the captcha) will be filled manually.
However, after that one is filled, I would like to enter all the email/account data.
The whole idea is to set a precise delay to let my outsourcers enter the captcha and click the button, and after this delay, get focus on this damn javascript window to fill in all the fields.
I tried lots of things but I really can't get it to work!!!
I hope someone gets inspired, as my poor javascript knowledge seems to have reached its limits!
Thanks a lot
Cheers,
Multithread Scrapper... Browsers In Threads Crash One After The Other...
in Scripting
Posted
I would bet that even a simple multithreaded "browse 200 sites" bot will crash with v39... and from all the reports on the forum, it's not a usable version.
Still, I'll try to find which version of my bot had this issue, i.e. at what point I switched to v21.