UBot Underground

Posts posted by awesome sauce

  1. I was wondering how you would combine multiple data sets to ensure that the data actually matches? For example, say you were scraping retail websites for prices and wanted to make a price comparison website. How would you combine the data so that "Hasbro Hulk Action Figure" and "6-inch Hulk Figure" match if they are the same product, just named differently on the different websites?

     

    I don't really need specific code; I'm just wondering how one would go about combining multiple data sets without doing it manually.
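One common approach to this problem (often called record linkage or entity resolution) is fuzzy string matching on normalized product names, ideally backed by hard identifiers like UPC/EAN codes when the sites expose them. A minimal sketch using Python's standard-library difflib; the names, the 0.4 threshold, and the function names are illustrative, not a production recipe:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough similarity of two product names in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_products(names_a, names_b, threshold=0.4):
    """Pair each name in names_a with its best match in names_b,
    keeping only pairs that clear the threshold."""
    matches = {}
    for a in names_a:
        best = max(names_b, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            matches[a] = best
    return matches
```

With the example names, similarity("Hasbro Hulk Action Figure", "6-inch Hulk Figure") comes out around 0.6, so the two listings pair up; real pipelines usually add token-based scoring and manual review of borderline pairs.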

  2. Alright, so I changed the list so it doesn't delete duplicates. However, for some reason everything is being duplicated... a lot. From fewer than 52 items, the following code returned 1,144 emails (many of them duplicated, and the blank lines still aren't there).

    define Clean Business Data {
        clear list(%email_address_clean)
        loop($list total(%email_address)) {
            add list to list(%email_address_clean,$list from text($replace(%email_address,"mailto:",$nothing),$new line),"Don\'t Delete","Global")
        }
    }
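A likely cause: `add list to list` already processes the entire %email_address list in one call, so wrapping it in a loop re-adds the whole cleaned list once per iteration and the counts multiply. The same logic sketched in Python (the sample data is illustrative):

```python
emails = ["mailto:a@example.com", "", "mailto:b@example.com"]

# What the looped script effectively does: every pass re-splits and
# re-adds the ENTIRE list, so N items looped N times gives ~N*N rows.
looped = []
for _ in range(len(emails)):
    looped.extend(e.replace("mailto:", "") for e in emails)

# What was intended: clean the whole list exactly once, with no loop.
cleaned = [e.replace("mailto:", "") for e in emails]
```

Dropping the loop and running the cleanup once should bring the count back down to the original list size.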
    
  3. That helped me remove mailto:; however, it's also removing the blank values from my first list.

     

    Example:

    %email_address has 118 values, some of which are filled with $nothing because there isn't an email for that entry.

     

    %email_address_clean then only has 44 items and has removed all the lines with $nothing. 

     

    Is it possible to use replace but keep the $nothing values intact? (Then I should have 118 items in my list, even though I only have 44 actual email addresses.)

     

    Here is the code I'm using to remove mailto:

    define Clean Business Data {
        loop($list total(%email_address)) {
            add list to list(%email_address_clean,$list from text($replace(%email_address,"mailto:",$nothing),$new line),"Delete","Global")
        }
    }
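The "Delete" option on `add list to list` de-duplicates as it adds, and every $nothing row is a duplicate of the first blank one, so they collapse down to a single entry. Keeping duplicates ("Don't Delete") preserves the blanks and keeps the cleaned list aligned row-for-row with the original. A Python sketch of the difference (sample data illustrative):

```python
emails = ["mailto:a@example.com", "", "mailto:b@example.com", ""]

# De-duplicating while adding (like the "Delete" option): every blank
# row after the first is a duplicate, so the blanks disappear.
deduped = list(dict.fromkeys(e.replace("mailto:", "") for e in emails))

# Keeping every row (like "Don't Delete"): blanks survive, and the
# cleaned list stays aligned row-for-row with the original.
cleaned = [e.replace("mailto:", "") for e in emails]
```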
    
  4. I'm scraping a website where the scraped email addresses have 'mailto:' at the start. What I would like to do is remove 'mailto:' so I'm left with just the address. The email addresses are currently stored in a list.

     

    Here's the code I have, but it's not doing anything. I can't quite figure it out because I can't use replace with the if statement. Any suggestions?

    define Clean Business Data {
        loop($list total(%email_address)) {
            set(#email_row,$next list item(%email_address),"Global")
            set(#email_row_clean,$replace(#email_row,"mailto:",""),"Global")
        }
    }
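One thing to note in the loop above: #email_row_clean is computed but never added to a list or written anywhere, so the cleaned values are discarded on each pass. A Python sketch of the same bug and the minimal fix (sample data illustrative):

```python
emails = ["mailto:a@example.com", "mailto:b@example.com"]

# The posted loop, translated: the cleaned value is computed on each
# pass but never stored anywhere, so nothing visible happens.
for row in emails:
    row_clean = row.replace("mailto:", "")  # thrown away each iteration

# Minimal fix: collect each cleaned value into a result list.
cleaned = []
for row in emails:
    cleaned.append(row.replace("mailto:", ""))
```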
    
  5. I hate to have to make a post about this, but it looks like it's the only way I can solve my problems with this software. 

     

    Basically every script I make ends up throwing errors. I don't know what the errors mean: there is nothing in the documentation about them, nothing on Google, and support apparently can't help either.

     

    So I made a ticket about a script that keeps failing in the exact same spot. I provided the script. Buddy at support asked me for the keyword and zip I used to get the error to come up every time. I gave that; he then ran the script a few times, came back in the ticket, said he couldn't get an error to pop up, and provided the following screenshot.

     

    http://screencast.com/t/5DeWpIOv

     

    The screenshot is incomplete. He didn't even run the whole script. If you just look at the data the script is supposed to scrape, you can clearly see that the script didn't fully complete, contrary to what Buddy said. There should be ~290 results in the business_page_urls_cleaned list, not fewer than 100. After I explained this to Buddy, he said this:

     

    "I pasted your script in my UBot and the put your data into the textboxes and then I clicked Run. I did that twice today and it ran and stopped on its own. I did nothing else. So for me it looks like it ran okay.

    As long as there is no error then it looks good to me.

    I will run it a couple of more times but unless a problem pops up for me to see then I must conclude its okay. Here in Support we are not in a position to Debug."

     

    I even made Buddy a video to show it failing: https://youtu.be/Lv_9kgXKySY

     

    Great. I understand UBot support can't provide debugging help. However, I believe UBot should know what causes the errors, especially if they don't provide public documentation about them online. I get these same errors on EVERY script I make; if support doesn't even know the source of the problem, how am I supposed to use the software?

     

    I'm getting very tired of Buddy's responses to every ticket I make. He's completely unhelpful and just tries to push you away. I'm sorry, but UBot also charges for support... and there is no way I would ever pay for this "support".

     

    Here is the script. It always fails on about the 7th or 8th loop of Scrape Business Data.

    script removed
    
  6. To add a little more about my scripts failing after about five test runs:

     

    It seems that the more times I 'test' a script in UBot Studio, the more likely it is to fail. For example, I modified the script in my last post and ran it a couple of times. The first run returns 268 items (and is lightning fast), but on any run after that, the script misses items, freezes, or fails to complete (and is very slow).

     

    Why does UBot do this?

  7. "Anything you can tell us about the particulars of the crashing would be VERY helpful!"

     

    Sure. I can tell you that the software is unstable. No matter what simple scripts I make on any website, they crash, fail, and throw errors. 

     

    It seems to me that it's either something with a loop or a list, but I don't know.

     

    Here's another script I made that fails halfway through.

    clear list(%business_page_urls)
    ui text box("Business Name or Keyword:",#keyword)
    ui text box("City or Zip Code:",#location)
    navigate("http://www.yellowpages.com/","Wait")
    wait for browser event("Everything Loaded","")
    type text(<name="search_terms">,#keyword,"Standard")
    type text(<name="geo_location_terms">,#location,"Standard")
    click(<value="Search">,"Left Click","No")
    wait for browser event("Everything Loaded","")
    define Scrape URLs {
        wait for browser event("Everything Loaded","")
        wait(.2)
        if($exists(<rel="nofollow">)) {
            then {
                change attribute(<rel="nofollow">,"innertext","")
            }
            else {
            }
        }
        add list to list(%business_page_urls,$scrape attribute(<rel="nofollow">,"fullhref"),"Don\'t Delete","Global")
        if($exists(<class=w"next *">)) {
            then {
                click(<class=w"next *">,"Left Click","No")
                wait for browser event("Everything Loaded","")
            }
            else {
                stop script
                alert("Got to the end and no pages remain.")
            }
        }
    }
    loop while($exists(<class=w"next *">)) {
        Scrape URLs()
    }
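The control flow of Scrape URLs — scrape the current page, click "next" if it exists, otherwise stop — can be checked without a browser. A Python simulation of that loop (the page data is made up):

```python
# Each entry stands in for one results page: (links on the page,
# whether a "next" button exists).
pages = [
    (["/biz/1", "/biz/2"], True),
    (["/biz/3"], True),
    (["/biz/4", "/biz/5"], False),  # last page: no "next" button
]

def scrape_all(pages):
    urls, i = [], 0
    while True:
        links, has_next = pages[i]
        urls.extend(links)   # add list to list for the current page
        if not has_next:     # no "next" button: stop the script
            break
        i += 1               # click "next", wait for the page load
    return urls
```

One side note on the original script: `alert` is placed after `stop script` in the else branch, so the "Got to the end" message likely never shows; swapping the two lines would surface it.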
    
    

    After testing any script I have made so far ~5 times in UBot Studio, the program itself fails on everything (the script doesn't complete, errors pop up that didn't appear when it ran successfully before, it freezes, etc.). The only solution I have found is restarting the program completely. I don't think this should happen with $1000 software.

     

    I would really like to find a way to actually use this software productively, but I keep running into problems that make it impossible.

     

    UPDATE: I was finally able to get the script to complete, but I still have the problem of needing to close the software before I run any script. 

  8. Maybe it's just the server I'm running it on. Does anyone else here run UBot on AWS? If so, how's your performance?

     

    I just rebooted my server again and started the script. On about the ninth item to scrape, UBot froze. I had the performance monitor up, and it appears my processor is getting maxed out.

     

    This page indicates that I may not even have enough RAM to run UBot. However, looking around the net, it appears other people run UBot with less than the suggested 2GB of RAM. I'm not sure what version they're running, though. Is anyone running version 5 with ~1GB of RAM?

     

    Should I upgrade my server?

  9. I've been following the tutorial below to make what I would consider a very basic bot:

     

     

    However, I can't get the bot to complete without crashing or throwing errors. Here's one error I've gotten:

     

    Error converting value True to type 'System.Collection.Generic.List'1[system.String]'.Path", line, position 4

     

    The script is only trying to grab 36 items. Here's my version of the script:

     

    ui stat monitor("Movie Titles:",$list total(%movie titles))

    ui stat monitor("Thumbnails:",$list total(%thumbnailurls))
    ui stat monitor("Full Size:",$list total(%largeimageurl))
    define Scrape Data {
        clear table(&movieposters)
        clear list(%movie titles)
        clear list(%thumbnailurls)
        clear list(%largeimageurl)
        clear list(%movieposterurls)
        wait for browser event("Page Loaded","")
        add list to list(%thumbnailurls,$scrape attribute(<class="thmbd galImage">,"src"),"Delete","Global")
        add list to table as column(&movieposters,0,1,%thumbnailurls)
        add list to list(%movieposterurls,$scrape attribute(<class="nuln shortenedTitle">,"fullhref"),"Delete","Global")
        loop($list total(%movieposterurls)) {
            navigate($next list item(%movieposterurls),"Wait")
            wait for browser event("Everything Loaded","")
            add item to list(%movie titles,$scrape attribute(<tagname="h1">,"innertext"),"Don\'t Delete","Global")
            add list to table as column(&movieposters,0,0,%movie titles)
            add item to list(%largeimageurl,$scrape attribute(<class="mainImage shadow">,"src"),"Don\'t Delete","Global")
        }
        add list to table as column(&movieposters,0,2,%largeimageurl)
        save to file("{$special folder("Desktop")}/movieposters.csv",&movieposters)
    }
    Scrape Data()
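One shape mismatch worth checking in the script above: `$scrape attribute` can return multiple values when several elements match, while `add item to list` adds a single item (whether this explains the exact error quoted is a guess). The Python analogue of that mismatch:

```python
# $scrape attribute can return a list when several elements match.
scraped = ["The Matrix", "Blade Runner"]

# Adding it as ONE item (like `add item to list`) nests the list:
nested = []
nested.append(scraped)   # [["The Matrix", "Blade Runner"]] - wrong shape

# Adding its elements (like `add list to list`) keeps the shape flat:
flat = []
flat.extend(scraped)     # ["The Matrix", "Blade Runner"]
```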

     

      

    Each time I run the script it crashes, which then makes UBot Studio extremely slow. I can't use the program after attempting to run the script without pulling my hair out, so I have to spend 10 minutes trying to get the program to close so I can reopen it. Sometimes I even have to restart the server to get UBot back to a usable state.

     

    I'm running UBot on AWS with 1 CPU and 1GB RAM. 

     

    What can I do to (a) have this script run to completion and (b) stop UBot from being so slow all the time?

  10. Hi,

     

    I'm trying to scrape a website that shows a lot of ads around the listings; I'm only trying to scrape the organic listings. The site uses different classes for the ads and the organic listings.

     

    So what I am trying to do is select the "organic" class, then select all of the "business" classes that are all inside of the "organic" class and save them to a list. Is this possible with UBot?

     

    P.S. The "business" class exists in the ad classes as well, so I think I need to do it this way.
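The general technique here is scoped selection: locate the "organic" container first, then search for "business" only within it, so the ad blocks never enter the list. A Python sketch of that idea using standard-library XML parsing on made-up markup (a real page would need a proper HTML parser, and UBot's own sub-element scraping may do this directly):

```python
import xml.etree.ElementTree as ET

# Made-up markup: "business" blocks exist under both the ads container
# and the organic container, just like the site described above.
page = """
<div>
  <div class="ads"><div class="business">Sponsored Co</div></div>
  <div class="organic">
    <div class="business">Joe's Pizza</div>
    <div class="business">Main St Diner</div>
  </div>
</div>
"""

root = ET.fromstring(page)

# Scope the search: find the "organic" container first, then collect
# only the "business" elements inside it.
organic = next(el for el in root.iter("div") if el.get("class") == "organic")
businesses = [el.text for el in organic.iter("div") if el.get("class") == "business"]
```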

  11. Hi guys,

     

    I just bought Ubot the other day and am working on my first project. I would like to scrape the first page of google for a query and get all of the links.

     

    This is what I have so far for the scraping part of the program. For some reason I can't extract the URL from class r, even though, looking at the source code, it seems this should work.

    clear list(%scraped urls)
    add list to list(%scraped urls,$scrape attribute($element offset(<class="r">,0),"fullhref"),"Don\'t Delete","Global")
    Hopefully it's just a noob mistake with a simple fix.