UBot Underground

DjProg

Fellow UBotter
  • Content Count: 86

Posts posted by DjProg

  1. Hello guys,

     

    Another problem for no apparent reason: a System.OutOfMemoryException is thrown when doing a Save to File on a table of at most 50K lines!!

     

    Clearly this shouldn't happen, and when I check the system monitor, I can confirm that there is a TON of memory available.

     

    Here's the system monitor while the node is running:

     

    http://screencast.com/t/UEXhuP58

     

    Needless to say, there is plenty of room to save a d*mn text file.

     

    Any idea? I lost 3+ hours of scraping due to this bug. I'm trying not to look too upset, but I can tell you I AM!!
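
    In the meantime, here's the workaround sketch I'm thinking about (untested; it assumes the "append to file" command works this way, and the file path and #row are placeholders): append every row to disk as soon as it's scraped, so a failed save at the end can't wipe the whole run:

    comment("Placeholder sketch: write each row out as it arrives instead of")
    comment("holding the whole table in memory until the final save to file.")
    append to file("C:\\scrape_backup.csv","{#row}{$new line}","End")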

     

    Cheers,

...until there is simply not a single working browser in any thread!

     

    I'm wondering what the issue is here.

     

    => Could it be that the "navigate" function with "wait" enabled crashes if somehow the page doesn't load? (Of course I've visited the suspicious pages and they seem to work fine.) I'd expect it to time out instead of resulting in a "forever-loading browser of death", but maybe that's the cause (?).
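
    If that's the cause, a sketch of a workaround (assuming a "wait for browser event" command with a timeout parameter is available in your build): navigate without blocking, then cap the wait explicitly so a hung page can't freeze the thread forever:

    comment("30 is a placeholder timeout in seconds.")
    navigate(#navigate_url,"Don't Wait")
    wait for browser event("Everything Loaded",30)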

     

    As it's a scraper, I don't see much that could go wrong, and especially nothing that should crash the browsers.

     

    Any idea? I'm pretty sure it's a very common problem.

     

    Cheers,

  3. Thanks Dan.

     

    I've had to use 1 second instead of 0.5, but it's working now (well, that only fixes the multithread issue I had... now the browser hangs / stops responding after visiting a few dozen sites, "the white forever-loading wheel of death". I've opened a ticket for this as it doesn't seem normal at all).

     

    Cheers,

    Tested again with 3 threads but a bigger list of test URLs:

     

     
    And it's complete havoc:
     
    2016-04-09 14:16:46 [LOG] End crawling>>> http://yahoo.com
    2016-04-09 14:16:47 [LOG] End crawling>>> http://yahoo.com
    2016-04-09 14:16:49 [LOG] End crawling>>> http://yahoo.com
    2016-04-09 14:16:55 [LOG] End crawling>>> http://bing.com
    2016-04-09 14:17:04 [LOG] End crawling>>> http://www.booking.com/
    2016-04-09 14:17:04 [LOG] End crawling>>> http://ebay.com
    2016-04-09 14:17:16 [LOG] End crawling>>> https://www.airbnb.com
    2016-04-09 14:17:16 [LOG] End crawling>>> https://login.live.com/
    2016-04-09 14:17:17 [LOG] End crawling>>> https://login.live.com/
     
    Tons of missed URLs, with duplicates crawled in their place... and it's not specifically at the "start" of the multithreading.
     
    I would bet I'm not the first one to see this kind of behavior. Any tips?
     
    Thanks,
  5. Hello guys,

     

    I get somewhat funky behavior when running my multithread script here:

    reset browser
    clear cookies
    ui drop down("Max threads","1,2,3,4,5,6,7,8,9,10",#max_threads)
    ui block text("URLs to crawl",#ui_URLs)
    clear list(%urls)
    add list to list(%urls,$list from text(#ui_URLs,$new line),"Delete","Global")
    set(#url_crawling_position,"-1","Global")
    set(#used_threads,0,"Global")
    loop($list total(%urls)) {
        loop while($comparison(#used_threads,">= Greater than or equal to",#max_threads)) {
            wait(1)
        }
        loop_process()
    }
    define loop_process {
        increment(#used_threads)
        increment(#url_crawling_position)
        scraping_procedure()
    }
    define scraping_procedure {
        thread {
            in new browser {
                set(#navigate_url,$list item(%urls,#url_crawling_position),"Local")
                navigate(#navigate_url,"Wait")
                wait(5)
                decrement(#used_threads)
                log("End crawling>>> {#navigate_url}")
            }
        }
    }
    
    

    If I set Max threads to 3...

     

    And run with this set of test URLs:

     

     
    It'll "skip" google.com and amazon.com to crawl 3 times yahoo.com instead... as you can see in my log:
     
    2016-04-09 13:57:32 [LOG] End crawling>>> http://yahoo.com
    2016-04-09 13:57:34 [LOG] End crawling>>> http://yahoo.com
    2016-04-09 13:57:38 [LOG] End crawling>>> http://yahoo.com
    2016-04-09 13:57:42 [LOG] End crawling>>> http://bing.com
    2016-04-09 13:57:54 [LOG] End crawling>>> http://ebay.com
    2016-04-09 13:57:54 [LOG] End crawling>>> http://www.booking.com/
    2016-04-09 13:58:03 [LOG] End crawling>>> https://www.airbnb.com
     
    Any idea where I screwed up? I really don't see how this can happen, as I'm incrementing url_crawling_position BEFORE the thread and the navigate... so it shouldn't have the same value 3 times.
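     
    My current guess: the thread body only reads #url_crawling_position after the thread actually starts, and by then the main loop has already incremented it twice more. Here's the fix I'm going to try (a sketch, untested; it assumes a local variable set inside a define is scoped per call, so each thread keeps its own copy):
     
    define scraping_procedure {
        comment("Capture the URL in the define's local scope BEFORE spawning the")
        comment("thread, so later increments of #url_crawling_position can't change it.")
        set(#navigate_url,$list item(%urls,#url_crawling_position),"Local")
        thread {
            in new browser {
                navigate(#navigate_url,"Wait")
                wait(5)
                decrement(#used_threads)
                log("End crawling>>> {#navigate_url}")
            }
        }
    }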
     
    Thanks a lot,
     
    Cheers,
  6. Hello guys,

     

    What is the best way to "clean" a scraped innerHTML attribute?

     

    Basically I'm scraping innerHTML containing an empty, inline-styled div, from which I need to extract the inline-styled background-image URL...

     

    <div class="blablah" style="height:120px;background-image:url(http://somewhere.com/image.jpeg)"></div>

     

    I scraped the innerHTML of the parent div of "blablah" because otherwise I didn't get what I needed, but now I need to clean it up a bit.
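
    A sketch of what I have in mind (assuming $find regular expression accepts a .NET-style regex; #scraped_html is a placeholder for whatever variable holds the scraped innerHTML):

    comment("Lookarounds keep the match to the URL itself, nothing else.")
    set(#image_url,$find regular expression(#scraped_html,"(?<=background-image:url\\().*?(?=\\))"),"Global")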

     

    Any tip is welcome!

     

    Thanks a lot,

     

    Cheers,

  7. Hello guys,

     

    I'm having a hard time trying to save a CSV that's dynamically generated by the site I'm using (I can't post it, you need a paid account to see the behavior).

     

    Basically the site has a link:

    <span ng-click="exportCsv()" class="pagination-results-export-csv" ng-show="rows.length > 0">Export Results as CSV</span>
    

    which triggers an "exportCsv()" function that creates the CSV file and then "pushes" it to the browser (once it's generated, which takes 3-30 seconds) via a file download dialog window (a bit like automatic downloads, if you see what I mean).

     

    Unfortunately using the click dialog button doesn't work:

    click(<innertext="Export Results as CSV">,"Left Click","No")
    wait(20)
    plugin command("WindowsCommands.dll", "click dialog button", "Enregistrer sous", "Enregistrer")
    
    

    I also can't run the "click dialog" node on its own from UBot, as it seems the dialog won't let me set the focus anywhere else...
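
    One idea I want to try (a sketch, untested: it assumes a thread started before the click keeps running while the dialog blocks the main script; the 35-second wait is a placeholder to cover the 3-30 second generation time, and "Enregistrer sous" is the French "Save As" dialog):

    thread {
        comment("Armed before the click; should fire once the dialog is up.")
        wait(35)
        plugin command("WindowsCommands.dll", "click dialog button", "Enregistrer sous", "Enregistrer")
    }
    click(<innertext="Export Results as CSV">,"Left Click","No")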

     

    Any idea?

     

    Thanks !!

     

    Cheers,

  8. Hello all,

     

    I'm looking for a way to get the complete rendered page text (i.e. what the visitor sees, NOT the HTML... so I want the HTML rendered!).

     

    Any idea ?

     

    I think I've tried many times without success, but maybe with the new functions it can work (maybe using the new "Windows" commands?).
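
    Something like this is what I have in mind (a sketch, assuming the innertext attribute of the body tag returns the rendered, visible text; untested in my version):

    set(#visible_text,$scrape attribute(<tagname="body">,"innertext"),"Global")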

     

    Thanks a lot,

     

    Cheers,

  9. Something like this?

     

    comment("Splits the phrase into a word list, then joins three random picks.")
    add list to list(%randomization, $list from text("ubot software studio", " "), "Delete", "Global")
    comment("Note: $random list item can pick the same word more than once.")
    set(#random, "{$random list item(%randomization)} {$random list item(%randomization)} {$random list item(%randomization)}", "Global")
    comment("Displays the result in the browser pane.")
    load html(#random)
    

     

    Hi Kreatus,

     

    I'm not sure I understand exactly how I should interpret this code.

     

    Is it version 4?

     

    To be honest, I'm still using version 3.5.something, as the last time I tried version 4 there were too many issues.

     

    If you have the visual interpretation of your code, it would really help :rolleyes:

     

    Thanks a lot,

     

    Cheers,

  10. Hello again ;)

     

    I am having a really hard time trying to find a way to generate word permutations...

     

    By this I mean I have a string:

     

    ubot studio software

     

    And I'm trying to generate all unique permutations (at the word level), so that at the end I have:

     

    ubot software studio

    studio ubot software

    studio software ubot

    ...etc

     

    How can I do such a thing with UBot? I really have no clue how to proceed; maybe the solution would be JavaScript, but I still don't know how... :huh:
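
    The closest I can imagine is brute force for the three-word case (a sketch in v4-style syntax, untested: it assumes $both and the "!= Does not equal" comparison label exist in your version, and the "Delete" flag drops duplicates if a word repeats):

    clear list(%words)
    add list to list(%words,$list from text("ubot studio software"," "),"Delete","Global")
    clear list(%permutations)
    set(#i,0,"Global")
    loop($list total(%words)) {
        set(#j,0,"Global")
        loop($list total(%words)) {
            set(#k,0,"Global")
            loop($list total(%words)) {
                comment("Keep only index triples where all three positions differ.")
                if($both($comparison(#i,"!= Does not equal",#j),$both($comparison(#j,"!= Does not equal",#k),$comparison(#i,"!= Does not equal",#k)))) {
                    then {
                        add item to list(%permutations,"{$list item(%words,#i)} {$list item(%words,#j)} {$list item(%words,#k)}","Delete","Global")
                    }
                }
                increment(#k)
            }
            increment(#j)
        }
        increment(#i)
    }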

     

    Any suggestion?

     

    Cheers,

  11. This is what I see when I try to register. What buttons should I be pressing?

     

    http://screencast.com/t/YalMZtntHOp

     

     

    John

     

     

    Hi John,

     

    You just enter something after the "http:" (it's the future subdomain URL of the blog I'm trying to create).

     

    Then you click on the big blue button labelled "créer un blog" ("create a blog"):

    http://screencast.com/t/X0cXMdrhJf

     

    Then comes a first popup like this (my outsourcer will fill in the captcha manually). On this one you need to enter the captcha and click "Etape Suivante" ("next step"):

    http://screencast.com/t/620EuDbl

     

    Then finally you'll see the damn registration form I'm trying to fill:

    http://screencast.com/t/QzLjmiJ9e

     

    Thanks a lot for looking! :)

     

    Cheers,

  12. Hello Guys,

     

    I'm having a hell of a hard time with what I think is a full-JavaScript registration window. <_<

     

    I'm using UBot Pro 3.5.

     

    The website is :

    http://www.eklablog.com/

     

    The first step is to fill in the future subdomain where the blog will be hosted and click the button, and that's easy... but after that it's another story.

     

    I don't want to fully automate this, so the first "pop-up" (the one with the captcha) will be filled manually.

     

    However, after that one is filled, I would like to enter all the email/account data.

     

    The whole idea is to set a precise delay to let my outsourcers enter the captcha and click the button, and after this delay get focus on this damn JavaScript window to fill in all the fields.
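
    Roughly what I have in mind, in v4-style syntax (a sketch only: I'm on 3.5, the 60-second wait is a placeholder, and the <name="..."> selectors and #email/#password variables are hypothetical stand-ins for the real form fields):

    comment("Give the outsourcer a fixed window to solve the captcha and click through.")
    wait(60)
    type text(<name="email">,#email,"Standard")
    type text(<name="password">,#password,"Standard")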

     

    I've tried lots of things but I really can't get it to work!!! :angry:

     

    I hope someone gets inspired, as my poor JavaScript knowledge seems to have reached its limits!

     

    Thanks a lot :)

     

    Cheers,
