UBot Underground

Need Some Help With The Loop Function



Hi,

 

I'm new here, so sorry for just asking for help in the first instance; hopefully I'll be able to offer advice back myself once I get into the software.

 

I am trying to scrape the following data from this URL, move to the next page if it exists, and place it all into a CSV.

 

http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240

 

Lot number

Title

URL

 

I then need to move to the next page if one exists so I am able to grab all the details for the one sale.

 

So far I am able to scrape all the data from the one page but not move to the next. I have placed the items both inside the loop and outside, but I must be missing something!

 

Using the if command on the next button, does it matter that it's got an offset?

 

Here is my code. It's working fine for the first page, with a minor issue in that the lot list is scraping two additional elements, but I can delete these later to make all the lists the same length.

Is how I have done it the best way?

 

Many thanks

Andrew

 

navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240","Wait")
clear table(&saleroomscrape)
clear list(%scape)
clear list(%title)
clear list(%Lot number)
add list to list(%Lot number,$scrape attribute(<innertext=w"Lot *">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,0,%Lot number)
add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,1,%title)
add list to list(%url,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"fullhref"),"Delete","Global")
add list to table as column(&saleroomscrape,0,2,%url)
save to file("C:\\Users\\Andrew\\Desktop\\saleroom.csv",&saleroomscrape)
loop(15) {
    if($exists($element offset(<class="next">,1))) {
        then {
            wait(5)
        }
        else {
            stop script
        }
    }
}

Link to post
Share on other sites

At the bottom of the page, there is a dropdown box that shows the number of pages that are available. The value attribute of each option in that dropdown has the url for that particular page.

 

What I would do is use the following code to grab those urls and put them in a list, then you can loop through the list and navigate to each page to grab the data.

add list to list(%pages,$scrape attribute(<value=w"/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter*">,"value"),"Delete","Global")
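
From there, a minimal sketch of the loop half (my assumption: the scraped values are site-relative URLs, so the domain has to be prepended when navigating):

loop($list total(%pages)) {
    navigate("http://www.the-saleroom.com{$next list item(%pages)}","Wait")
    comment("run the scrape commands for the current page here")
}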

Hope this helps.

Link to post
Share on other sites

Don't use the offset; if anything on the page changes, the offset probably won't be there.

 

navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&","Wait")
if($exists(<class="next">)) {
    then {
        click(<class="next">,"Left Click","No")
    }
    else {
        alert("not there")
    }
}

Link to post
Share on other sites

Managed to get it to loop through all the pages - great!

However it's only pulling the first set of page data, so I must be doing something wrong after it loops.

 

clear list(%scape)
clear list(%title)
clear list(%Lot number)
navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240","Wait")
clear table(&saleroomscrape)
add list to list(%Lot number,$scrape attribute(<innertext=w"Lot *">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,0,%Lot number)
add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,1,%title)
add list to list(%url,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"fullhref"),"Delete","Global")
add list to table as column(&saleroomscrape,0,2,%url)
loop(23) {
    if($exists(<class="next">)) {
        then {
            click(<class="next">,"Left Click","No")
        }
        else {
            alert("not there")
        }
    }
}
save to file("C:\\Users\\Andrew\\Desktop\\saleroom.csv",&saleroomscrape)
 

Link to post
Share on other sites

Many thanks Stanf, that really helped.

I knew the software was able to do what I wanted, hence why I purchased it; I'm just trying to bend my mind around it.

 

This is what I ended up with -

 

Something seems odd!

 

It's pulling 348 lot numbers and URLs but only adding the first 237 descriptions (this is when I set it to 250 per page), so it's only pulling the first page of descriptions. This seems strange; any idea why it's doing that, as the lot numbers and URLs are all being pulled?

 

The mylist.txt is holding all the URLs; is this correct?

 

I am using a dedicated server with 16GB of memory and a pipe connection, so the memory drain isn't too much of an issue for me, but I understand what you say. It's cycling through in seconds, which is fantastic.

 

Sorry for being a pain, but once I get this one right I will have the basis to then do the same on the other sites I need to pull from.

 

Kind regards and thanks

Andrew

 

 

navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?pageSize=240&page=1","Wait")
divider
comment("put your list clears and table clearing here")
clear list(%scape)
clear list(%title)
clear list(%Lot number)
clear table(&saleroomscrape)
divider
set(#loop until told to stop,"yes","Global")
divider
loop while($comparison(#loop until told to stop,"= Equals","yes")) {
    comment("grab your data")
    add list to list(%Lot number,$scrape attribute(<innertext=w"Lot *">,"innertext"),"Delete","Global")
    add list to table as column(&saleroomscrape,0,0,%Lot number)
    add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Delete","Global")
    add list to table as column(&saleroomscrape,0,1,%title)
    add list to list(%url,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"fullhref"),"Delete","Global")
    add list to table as column(&saleroomscrape,0,2,%url)
    add list to list(%the data,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-*">,"fullhref"),"Delete","Global")
    comment("building list while looping tends to suck up memmory")
    comment("thats why i saved the data")
    comment("notice the new_line in the append to file")
    append to file("{$special folder("Desktop")}\\mylist.txt","{$new line}{%the data}","End")
    comment("remember to clear your data")
    clear list(%the data)
    if($exists(<class="next">)) {
        then {
            click(<class="next">,"Left Click","No")
            wait for browser event("Everything Loaded","")
        }
        else {
            set(#loop until told to stop,"stop","Global")
            alert("done")
        }
    }
}

Link to post
Share on other sites

This worked for me 

 

 run()
define run {
    navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240","Wait")
    wait for browser event("Everything Loaded","")
    wait(2)
    set(#row,0,"Global")
    set(#pageination,$exists(<class="next">),"Global")
    clear table(&data)
    loop while($comparison(#pageination,"= Equals","true")) {
        if($comparison(#pageination,"= Equals","true")) {
            then {
                build_table()
            }
            else {
            }
        }
        click($element offset(<class="next">,1),"Left Click","No")
        wait for browser event("Everything Loaded","")
        wait(2)
        set(#pageination,$exists(<class="next">),"Global")
    }
}
divider
define build_table {
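    comment("scrape the current page and append its rows to &data")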
    set list position(%Lot number,0)
    set list position(%title,0)
    set list position(%url,0)
    clear list(%Lot number)
    clear list(%title)
    clear list(%url)
    add list to list(%Lot number,$scrape attribute(<class="number">,"innertext"),"Delete","Global")
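    comment("pull the title text and the lot url out of the main cell's html with regular expressions")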
    add list to list(%title,$find regular expression($scrape attribute(<class="main">,"outerhtml"),"(?<=\\<p\\>).*?(?=\\<\\/p\\>)"),"Don\'t Delete","Global")
    add list to list(%url,$find regular expression($scrape attribute(<class="main">,"outerhtml"),"(?<=\\<a\\ href\\=\\\").*?(?=\\\"\\>More)"),"Delete","Global")
    loop($list total(%Lot number)) {
        set table cell(&data,#row,0,$next list item(%Lot number))
        set table cell(&data,#row,1,$next list item(%title))
        set table cell(&data,#row,2,$next list item(%url))
        increment(#row)
    }
}

Link to post
Share on other sites

Hi Bill,

 

I don't seem to be able to paste your code in; it pastes as a single line. Do you have this as a bot file I can open?

 

I have had a UBot crash and the last one I did above has vanished, haha, so I will recreate it. Maybe the delay until fully loaded is the issue; maybe because it's going so quickly, the titles are not loading in time. I will try and slow down my version, but if you have it working and in a code form I can open, that would be great.

 

Thanks both of you for your help, I really appreciate this. Also Bill, your version enables me to see another way of doing it, which is also great.

 

Kind regards

Andrew

Link to post
Share on other sites

Try changing the Delete to Don't Delete and see what you get.
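
For context, that option is the third parameter of add list to list: "Delete" drops duplicate entries as they are added, so identical titles across lots would shrink the title list while the unique lot numbers and URLs keep growing. A minimal sketch of the change, reusing the title scrape from the code above:

comment("\"Don't Delete\" keeps duplicate entries, so the title list stays aligned with the lot numbers")
add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Don\'t Delete","Global")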

I crashed UBot and lost the file, so I need to rebuild it. I think it could be just not loading that data in time, so I will try a slow-down function and see, as there is no reason for it not to collect the additional data. Many thanks for all your help; it's really helped me bend my mind around it. I will let you know. Kind regards, Andrew

Link to post
Share on other sites


No, still not pulling the next batch of titles in. Really strange, as the URLs and the lot numbers are fine, so I see no reason why it's not pulling them.
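
A quick way to see which list stops growing is to report the totals after each page's scrape; a minimal debugging sketch using only commands already in the thread:

comment("debugging sketch: report the three list totals to see which one stops growing")
alert("lots: {$list total(%Lot number)}  titles: {$list total(%title)}  urls: {$list total(%url)}")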

Link to post
Share on other sites


Bill,

Managed to get the code to paste in line by line now; I needed to take the code from here, make it plain text, and then add it to code view. However it just says "there is an error in the code, fix before you switch to node view", so I am not sure what the issue is.

Link to post
Share on other sites

OK, managed to get it to work by just setting the loop to run x times depending on the number of additional pages, i.e. 9 if there are 9, and just getting it to click the next button. Works great. Again I needed to slow it down, as it was running faster than the page would load, so it was missing data.
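
For reference, a minimal sketch of that fixed-count approach (assuming 9 additional pages; the per-page scrape goes where the comment sits):

loop(9) {
    comment("scrape this page's data here")
    click(<class="next">,"Left Click","No")
    wait for browser event("Everything Loaded","")
    wait(2)
}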

 

All is good

 

Many thanks for your help; it focused my mind on how it could work, so I have a better understanding now.

 

Andrew

Link to post
Share on other sites
