DjProg
Content Count: 86
Posts posted by DjProg
-
Well, for now I'm dumping my table to CSV on each loop; this way, if UBot goes crazy, I'll still know at which loop it crashed and still have the results from before the crash.
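For reference, the dump-per-loop idea can be sketched in Python (file name and columns are illustrative; UBot's own save-to-file syntax differs):

```python
import csv
import os

def append_rows(path, rows):
    # Append this loop's rows immediately, so a later crash only
    # loses the current loop's data, not the whole table.
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["url", "result"])  # illustrative columns
        writer.writerows(rows)

# One call per scraping loop:
append_rows("scrape_dump.csv", [("http://example.com", "ok")])
append_rows("scrape_dump.csv", [("http://example.org", "ok")])
```

Appending in small increments also sidesteps holding the entire table in memory for one big save at the end.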
-
Well, by looking at the debugger it's not even that much: 28,000 rows and 8 columns...
What's wrong with UBot this time?
=> IS THERE A WAY TO SAVE MY DEBUGGER DATA?? Because clearly the data is still there...
-
Hello guys,
Another problem for no apparent reason: a System.OutOfMemoryException was thrown... when doing a Save to File on a table of at most 50K lines!!
Clearly this shouldn't happen, and when I check the system monitor, I can confirm that there is a TON of memory available.
That's the system monitor when running the node...
http://screencast.com/t/UEXhuP58
Needless to say that there is plenty of room to save a d*mn text file.
Any idea? I lost 3+ hours of scraping due to this bug. I'm trying not to sound too upset, but I can tell you I AM!!
Cheers,
-
-
Update: thanks again Dan, the bloody v39 UBot browser IS 100% the culprit... I changed to v21 and it works flawlessly...
The only downside is that I can't change the HTTP browser headers with this old version.
-
Thanks Dan, I'll try this and report back.
I don't think my multithreading is the culprit, as even my old (unthreaded) bot gets the white browser screen of death after visiting maybe 40 URLs.
By the way, I'm using 5.9.18; I hope it's not like a couple of years ago, when the current version was buggy as hell and I had to go back to v4 (?)
-
...until there is simply not a single working browser in any thread!
I'm wondering what the issue is here.
=> Could it be that the "navigate" function with "wait" enabled crashes if somehow the page doesn't load? (Of course I've visited the suspicious pages and they seem to work fine.) I'd expect it to time out instead of ending in a "forever loading browser of death", but maybe that's the cause (?).
As it's a scraper, I don't see much that could go wrong, especially nothing that should crash the browsers.
Any idea? I'm pretty sure it's a very common problem.
Cheers,
-
-
Thanks Dan.
I had to use 1 second instead of 0.5, but it's working now (well, that only fixed the multithreading issue I had... now I have the browser hanging / not responding after visiting a few dozen sites: "the white browser forever-loading wheel of death". I've opened a ticket for this, as it doesn't seem normal at all).
Cheers,
-
Tested again with 3 threads but a bigger list of test URLs:
And it's complete havoc:
2016-04-09 14:16:46 [LOG] End crawling>>> http://yahoo.com
2016-04-09 14:16:47 [LOG] End crawling>>> http://yahoo.com
2016-04-09 14:16:49 [LOG] End crawling>>> http://yahoo.com
2016-04-09 14:16:55 [LOG] End crawling>>> http://bing.com
2016-04-09 14:17:04 [LOG] End crawling>>> http://www.booking.com/
2016-04-09 14:17:04 [LOG] End crawling>>> http://ebay.com
2016-04-09 14:17:16 [LOG] End crawling>>> https://www.airbnb.com
2016-04-09 14:17:16 [LOG] End crawling>>> https://login.live.com/
2016-04-09 14:17:17 [LOG] End crawling>>> https://login.live.com/
Tons of missed URLs, with duplicates in their place... and it's not specifically at the "start" of the multithreading.
I would bet I'm not the first one to see this kind of behavior; any tips?
Thanks,
-
Hello guys,
I have a somewhat funky behavior when running my multithread script here :
reset browser
clear cookies
ui drop down("Max threads","1,2,3,4,5,6,7,8,9,10",#max_threads)
ui block text("URLs to crawl",#ui_URLs)
clear list(%urls)
add list to list(%urls,$list from text(#ui_URLs,$new line),"Delete","Global")
set(#url_crawling_position,"-1","Global")
set(#used_threads,0,"Global")
loop($list total(%urls)) {
    loop while($comparison(#used_threads,">= Greater than or equal to",#max_threads)) {
        wait(1)
    }
    loop_process()
}
define loop_process {
    increment(#used_threads)
    increment(#url_crawling_position)
    scraping_procedure()
}
define scraping_procedure {
    thread {
        in new browser {
            set(#navigate_url,$list item(%urls,#url_crawling_position),"Local")
            navigate(#navigate_url,"Wait")
            wait(5)
            decrement(#used_threads)
            log("End crawling>>> {#navigate_url}")
        }
    }
}
If I set the Max threads at 3...
And run with this set of test URLs :
It'll "skip" google.com and amazon.com and crawl yahoo.com 3 times instead... as you can see in my log:
2016-04-09 13:57:32 [LOG] End crawling>>> http://yahoo.com
2016-04-09 13:57:34 [LOG] End crawling>>> http://yahoo.com
2016-04-09 13:57:38 [LOG] End crawling>>> http://yahoo.com
2016-04-09 13:57:42 [LOG] End crawling>>> http://bing.com
2016-04-09 13:57:54 [LOG] End crawling>>> http://ebay.com
2016-04-09 13:57:54 [LOG] End crawling>>> http://www.booking.com/
2016-04-09 13:58:03 [LOG] End crawling>>> https://www.airbnb.com
Any idea where I screwed up? I really don't see how this can happen, as I'm incrementing url_crawling_position BEFORE the thread and the navigate... so it shouldn't have the same value 3 times.
Thanks a lot,
Cheers,
-
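For what it's worth, the triple-yahoo log is consistent with a classic shared-variable race: the global position is read *inside* each thread, by which time the main loop may already have advanced it, so all three threads see the latest value. A minimal Python sketch of the fix, binding the value before the thread starts (illustrative names, not UBot syntax):

```python
import threading

urls = ["http://google.com", "http://amazon.com", "http://yahoo.com"]
results = []
lock = threading.Lock()

def crawl(url):
    # The URL was captured as an argument *before* the thread started,
    # so later changes to any shared counter cannot affect it.
    with lock:
        results.append(url)

threads = []
for url in urls:
    t = threading.Thread(target=crawl, args=(url,))  # value bound here
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

The UBot equivalent of "binding before spawn" would be copying the list item into a local variable outside the thread block, rather than indexing the list with the shared global inside it.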
Thanks !
I forgot to say but I'm adding the scraped attributes to a List.
So after adding to the list, would I need to loop through my list to replace the dirty innerHTML with the cleaned, regexed text? Or is there a more elegant solution?
Cheers,
-
Hello guys,
What is the best way to "clean" an innerHTML scraped attribute ?
Basically I'm scraping an innerHTML containing an empty, inline-styled div, from which I need to extract the inline-styled background-image URL...
<div class="blablah" style="height:120px;background-image:url(http://somewhere.com/image.jpeg)"></div>
I scraped the innerHTML of the parent div of blablah because otherwise I didn't get what I needed, but now I need to clean it up a bit.
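As an illustration of the cleanup step, the background-image URL can be pulled out with a regex (shown in Python; the same pattern should work in any regex tool, and it assumes the unquoted url(...) form above):

```python
import re

html = ('<div class="blablah" style="height:120px;'
        'background-image:url(http://somewhere.com/image.jpeg)"></div>')

# Capture everything between "url(" and the closing ")".
match = re.search(r'background-image:\s*url\(([^)]+)\)', html)
url = match.group(1) if match else None
print(url)  # http://somewhere.com/image.jpeg
```

Applied over a whole list, this would turn each dirty innerHTML item into just the URL.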
Any tip is welcome!
Thanks a lot,
Cheers,
-
Hello,
Unfortunately this simply doesn't work at all. Somehow the click-dialog command doesn't work with this modal window; it simply doesn't seem to do anything: the modal window appears and stays there...
Any other idea?
Cheers,
-
Hello guys,
For the last 2 days I've kept getting tons of licensing server issues like this:
http://screencast.com/t/qQmAJ37jCza
Obviously, every time the licensing server is down, the support is down too.
VERY annoying...
What's the ETA to fix this? I can't believe the system isn't redundant.
Cheers,
-
Hello guys,
I'm having a hard time trying to save a CSV that's dynamically generated by the site I'm using (can't post it, you need a paid account to see the behavior).
Basically the site has a link:
<span ng-click="exportCsv()" class="pagination-results-export-csv" ng-show="rows.length > 0">Export Results as CSV</span>
which triggers an "exportCsv()" function, which creates the CSV file and then "pushes" it to the browser (once it's generated, which takes 3-30 seconds) via a file-download dialog window (a bit like automatic downloads, if you see what I mean).
Unfortunately using the click dialog button doesn't work:
click(<innertext="Export Results as CSV">,"Left Click","No")
wait(20)
plugin command("WindowsCommands.dll", "click dialog button", "Enregistrer sous", "Enregistrer")
I also can't run the "click dialog" node alone from UBot, as it seems the dialog won't let me set the focus anywhere else...
Any idea?
Thanks !!
Cheers,
-
Awesome thanks !!
So simple I never thought about it, lol !!!
-
Hello all,
I'm looking for a way to get the complete rendered page text (i.e. what the visitor sees, NOT the HTML... so I want the HTML rendered!).
Any idea?
I think I've tried this many times without success, but maybe with the new functions it can work (perhaps using the new "Windows" commands?).
Thanks a lot,
Cheers,
-
Try to create a folder first using the "Create Folder" command before downloading.
Thanks... gosh, am I dumb, I never saw this Create Folder function before...
Cheers,
-
Something like this?
add list to list(%randomization, $list from text("ubot software studio", " "), "Delete", "Global")
set(#random, "{$random list item(%randomization)} {$random list item(%randomization)} {$random list item(%randomization)}", "Global")
load html(#random)
Hi Kreatus,
I'm not sure I understand exactly how I should interpret this code.
Is it version 4?
To be honest I'm still using version 3.5.something, as the last time I tried version 4 there were too many issues.
If you have a visual interpretation of your code, it would really help!
Thanks a lot,
Cheers,
-
Hello again
I am having a really hard time trying to find a way to generate word permutations...
By this I mean I have a string:
ubot studio software
And I'm trying to generate all unique permutations (at word level), so that at the end I have:
ubot software studio
studio ubot software
studio software ubot
...etc
How can I do such a thing with UBot? I really have no clue how to proceed; maybe the solution involves javascript, but I still don't know how...
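For illustration, the word-level permutations are easy to enumerate in Python (UBot itself has no built-in for this, so the logic would have to be ported or run externally):

```python
from itertools import permutations

words = "ubot studio software".split()

# All unique orderings of the words, joined back into phrases.
phrases = sorted(" ".join(p) for p in set(permutations(words)))
for phrase in phrases:
    print(phrase)
```

Three words give 3! = 6 phrases; the set() guards against duplicates if the input ever contains a repeated word.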
Any suggestion ?
Cheers,
-
Hello Guys,
Quick question: can "download file" actually create the target folder?
It seems like it's not supported, as this script:
http://screencast.com/t/9EnYQ1vr
...doesn't create a new folder or save the files (I check beforehand that the folder-name variable isn't empty, of course).
So is it not supported, or is it just me doing something weird?
Thanks a lot,
Cheers,
-
Ok, I didn't have any issues with any of this...
After the pause is the code for the registration portion, however, you may need to nav back to the home page...you can alter it as you need to, but it all works...
John
Thanks a lot, I'll try it tonight.
I'm really dumb, I did not see that it was a video you posted.
I'll tell you if it all works !
Cheers,
-
This is what I see when I try to register. What buttons should I be pressing?
http://screencast.com/t/YalMZtntHOp
John
Hi John,
You just enter something after the "http:" (it's the future subdomain URL of the blog I'm trying to create).
Then you click on the big blue button labelled "créer un blog" ("create a blog"):
http://screencast.com/t/X0cXMdrhJf
Then comes a first popup like this (my outsourcer will fill in the captcha manually). On this one you need to enter the captcha and click "Etape Suivante" ("Next Step"):
http://screencast.com/t/620EuDbl
Then finally you'll see the damn registration form I'm trying to fill:
http://screencast.com/t/QzLjmiJ9e
Thanks a lot for looking !
Cheers,
-
Hello Guys,
I'm having a hell of a hard time with what I think is a full javascript registration window.
I'm using Ubot Pro 3.5.
The website is :
The first step is to fill in the future subdomain where the blog will be hosted and click the button, and that's easy... but after that it's another story.
I don't want to fully automate this, so the first "pop-up" (the one with the captcha) will be filled manually.
However, after that one is filled, I would like to enter all the email/account data.
The whole idea is to set a precise delay to let my outsourcers enter the captcha and click the button, and after this delay, get focus on this damn javascript window to fill in all the fields.
I tried lots of things but I really can't get it to work!!!
I hope someone gets inspired, as my poor javascript knowledge seems to have reached its limits!
Thanks a lot
Cheers,
Multithread Scrapper... Browsers In Threads Crash One After The Other...
in Scripting
Posted
I would bet that even a simple multithreaded "browse 200 sites" bot will crash with v39... and from all the reports on the forum, it's not a usable version.
Still, I'll try to find which version of my bot had this issue, i.e. at what point I switched to v21.