a-harvey Posted January 31, 2016

Hi,

New here, so sorry for just asking for help in my first post; hopefully I'll be able to offer advice back myself once I get into the software.

I am trying to scrape the following data from this URL, move to the next page if it exists, and place it into a CSV:

http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240

Lot number
Title
URL

I then need to move to the next page, if one exists, so I can grab all the details for the one sale. So far I am able to scrape all the data from the one page but not move to the next. I have placed the items both inside the loop and outside, but I must be missing something! Using the "if next button exists" approach, does it matter that it's got an offset?

Here is my code. It's working fine for the first page, with a minor issue in that the lot scrape is picking up two additional elements, but I can delete these later to make the columns all the same length. Is how I have done it the best way?
Many thanks,
Andrew

navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240","Wait")
clear table(&saleroomscrape)
clear list(%scape)
clear list(%title)
clear list(%Lot number)
add list to list(%Lot number,$scrape attribute(<innertext=w"Lot *">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,0,%Lot number)
add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,1,%title)
add list to list(%url,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"fullhref"),"Delete","Global")
add list to table as column(&saleroomscrape,0,2,%url)
save to file("C:\\Users\\Andrew\\Desktop\\saleroom.csv",&saleroomscrape)
loop(15) {
  if($exists($element offset(<class="next">,1))) {
    wait(5)
    then {
    }
    else {
      stop script
    }
  }
}
nichewebstrategies Posted January 31, 2016

At the bottom of the page there is a dropdown box that shows the pages that are available. The value attribute of each option in that dropdown holds the URL for that particular page. What I would do is use the following code to grab those URLs and put them in a list; then you can loop through the list and navigate to each page to grab the data.

add list to list(%pages,$scrape attribute(<value=w"/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter*">,"value"),"Delete","Global")

Hope this helps.
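For readers outside UBot, the same idea can be sketched in Python: pull every matching option value out of the pager dropdown and turn it into an absolute URL. The markup below is a hypothetical stand-in for the dropdown described above; the live page's attribute values may differ.

```python
import re

# Hypothetical stand-in for the pager dropdown's HTML.
html = """
<select class="pager">
  <option value="/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&amp;pageSize=240">1</option>
  <option value="/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=2&amp;pageSize=240">2</option>
</select>
"""

def page_urls(markup, base="http://www.the-saleroom.com"):
    # Grab every option value matching the catalogue search-filter path,
    # mirroring the wildcard match in the ubot scrape-attribute call.
    values = re.findall(
        r'value="([^"]*catalogue-id-sr1810075/search-filter[^"]*)"', markup)
    # Un-escape the &amp; entity and prepend the site root.
    return [base + v.replace("&amp;", "&") for v in values]
```

You would then loop over `page_urls(html)` and fetch each page in turn, exactly as the reply suggests doing with the scraped list in UBot.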
stanf Posted January 31, 2016

Don't use the offset; if anything on the page changes, the offset probably won't be there.

navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&","Wait")
if($exists(<class="next">)) {
  then {
    click(<class="next">,"Left Click","No")
  }
  else {
    alert("not there")
  }
}
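The control flow stanf suggests (keep clicking Next while the button exists, stop when it is gone) reduces to a simple loop. A minimal Python sketch, with a plain list of page payloads standing in for the browser and the pagination check:

```python
def scrape_all(pages):
    """pages: list of per-page payload lists; a 'next' link
    exists while more pages remain."""
    results = []
    index = 0
    while True:
        # Scrape the current page first, every time through the loop.
        results.extend(pages[index])
        # Equivalent of $exists(<class="next">) on the live page.
        has_next = index < len(pages) - 1
        if has_next:
            index += 1          # equivalent of clicking the Next button
        else:
            break               # "not there": we are on the last page
    return results
```

The key point is that the scrape happens on every iteration, before the next-button check, so the last page is not skipped.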
a-harvey (Author) Posted January 31, 2016

Thank you both, I can understand both options. stanf, where do I need to add this code: before grabbing the data for each page, or at the end of my code?

Kind regards,
Andrew
a-harvey (Author) Posted January 31, 2016

Managed to get it to loop through all the pages, which is great. However, it's only pulling the first set of page data, so I must be doing something wrong after it loops.

clear list(%scape)
clear list(%title)
clear list(%Lot number)
navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240","Wait")
clear table(&saleroomscrape)
add list to list(%Lot number,$scrape attribute(<innertext=w"Lot *">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,0,%Lot number)
add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Delete","Global")
add list to table as column(&saleroomscrape,0,1,%title)
add list to list(%url,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"fullhref"),"Delete","Global")
add list to table as column(&saleroomscrape,0,2,%url)
loop(23) {
  if($exists(<class="next">)) {
    then {
      click(<class="next">,"Left Click","No")
    }
    else {
      alert("not there")
    }
  }
}
save to file("C:\\Users\\Andrew\\Desktop\\saleroom.csv",&saleroomscrape)
a-harvey (Author) Posted January 31, 2016

I set the loop to 23 as this is most likely the highest number of iterations it will need; when the count wasn't set, it didn't loop at all.
stanf Posted January 31, 2016

Take a look: for_a_harvy.ubot
a-harvey (Author) Posted February 1, 2016

Many thanks Stanf, that really helped. I knew the software was able to do what I wanted, which is why I purchased it; I'm just trying to bend my mind around it.

This is what I ended up with. Something seems odd! It's pulling 348 lot numbers and URLs but only adding the first 237 descriptions (this is when I set it to 250 per page), so it's only pulling the first page of descriptions. This seems strange; any idea why it's doing that when the lot numbers and URLs are all being pulled? The mylist.txt is holding all the URLs, is this correct?

I am using a dedicated server with 16GB of memory and a fast pipe, so the memory drain isn't too much of an issue for me, but I understand what you say. It's cycling through in seconds, which is fantastic. Sorry for being a pain, but once I get this one right I will have the basis to do the same on the other sites I need to pull from.

Kind regards and thanks,
Andrew

navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?pageSize=240&page=1","Wait")
divider
comment("put your list clears and table clearing here")
clear list(%scape)
clear list(%title)
clear list(%Lot number)
clear table(&saleroomscrape)
divider
set(#loop until told to stop,"yes","Global")
divider
loop while($comparison(#loop until told to stop,"= Equals","yes")) {
  comment("grab your data")
  add list to list(%Lot number,$scrape attribute(<innertext=w"Lot *">,"innertext"),"Delete","Global")
  add list to table as column(&saleroomscrape,0,0,%Lot number)
  add list to list(%title,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"innertext"),"Delete","Global")
  add list to table as column(&saleroomscrape,0,1,%title)
  add list to list(%url,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/lot*">,"fullhref"),"Delete","Global")
  add list to table as column(&saleroomscrape,0,2,%url)
  add list to list(%the data,$scrape attribute(<href=w"http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-*">,"fullhref"),"Delete","Global")
  comment("building lists while looping tends to suck up memory")
  comment("that's why I saved the data")
  comment("notice the new_line in the append to file")
  append to file("{$special folder("Desktop")}\\mylist.txt","{$new line}{%the data}","End")
  comment("remember to clear your data")
  clear list(%the data)
  if($exists(<class="next">)) {
    then {
      click(<class="next">,"Left Click","No")
      wait for browser event("Everything Loaded","")
    }
    else {
      set(#loop until told to stop,"stop","Global")
      alert("done")
    }
  }
}
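The append-then-clear pattern in the script above (write each page's scrape to disk, then empty the in-memory list so it doesn't grow across iterations) is worth isolating. A small Python sketch of the same idea, using a temporary directory in place of the Desktop mylist.txt:

```python
import os
import tempfile

def flush_to_file(path, items):
    # Append this page's scraped values, one per line, then
    # return an empty list so the in-memory buffer stays small.
    with open(path, "a", encoding="utf-8") as fh:
        for item in items:
            fh.write(item + "\n")
    return []

# Hypothetical usage with two "pages" of scraped URLs.
path = os.path.join(tempfile.mkdtemp(), "mylist.txt")
buffer = ["lot-1", "lot-2"]
buffer = flush_to_file(path, buffer)     # buffer is cleared after writing
buffer = flush_to_file(path, ["lot-3"])  # later pages append to the same file
```

Opening in append mode (`"a"`) is what makes each page's data accumulate on disk rather than overwrite the previous page, matching the "End" argument in the UBot append-to-file call.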
stanf Posted February 1, 2016

Try changing the "Delete" option to "Don't Delete" and see what you get.
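The point of this suggestion: with "Delete", values that already exist in the list are dropped as new items are merged in, so repeated titles across pages silently disappear; "Don't Delete" keeps every scraped value. A hedged Python sketch of the difference (the function name is illustrative, not a UBot API):

```python
def add_list_to_list(target, items, dedupe=True):
    # dedupe=True mimics ubot's "Delete" option: skip values
    # already present in the target list. dedupe=False mimics
    # "Don't Delete": keep every item, duplicates included.
    for item in items:
        if dedupe and item in target:
            continue
        target.append(item)
    return target
```

If many lots share the same description, "Delete" would explain a titles list that is shorter than the lot-number list.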
Bill Posted February 1, 2016

This worked for me:

run()
define run {
  navigate("http://www.the-saleroom.com/en-gb/auction-catalogues/1818-auctioneers/catalogue-id-sr1810075/search-filter?page=1&pageSize=240","Wait")
  wait for browser event("Everything Loaded","")
  wait(2)
  set(#row,0,"Global")
  set(#pageination,$exists(<class="next">),"Global")
  clear table(&data)
  loop while($comparison(#pageination,"= Equals","true")) {
    if($comparison(#pageination,"= Equals","true")) {
      then {
        build_table()
      }
      else {
      }
    }
    click($element offset(<class="next">,1),"Left Click","No")
    wait for browser event("Everything Loaded","")
    wait(2)
    set(#pageination,$exists(<class="next">),"Global")
  }
}
divider
define build_table {
  set list position(%Lot number,0)
  set list position(%title,0)
  set list position(%url,0)
  clear list(%Lot number)
  clear list(%title)
  clear list(%url)
  add list to list(%Lot number,$scrape attribute(<class="number">,"innertext"),"Delete","Global")
  add list to list(%title,$find regular expression($scrape attribute(<class="main">,"outerhtml"),"(?<=\\<p\\>).*?(?=\\<\\/p\\>)"),"Don\'t Delete","Global")
  add list to list(%url,$find regular expression($scrape attribute(<class="main">,"outerhtml"),"(?<=\\<a\\ href\\=\\\").*?(?=\\\"\\>More)"),"Delete","Global")
  loop($list total(%Lot number)) {
    set table cell(&data,#row,0,$next list item(%Lot number))
    set table cell(&data,#row,1,$next list item(%title))
    set table cell(&data,#row,2,$next list item(%url))
    increment(#row)
  }
}
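Bill's regexes use lookbehind/lookahead assertions to pull the title from between the `<p>` tags and the href out of the anchor without capturing the tags themselves. The same patterns work in Python's `re` module; the HTML snippet below is a hypothetical stand-in for one `<class="main">` block's outerhtml:

```python
import re

# Hypothetical stand-in for one lot's markup; the real page may differ.
html = '<div class="main"><a href="/lot-101">More</a><p>Victorian oak desk</p></div>'

# (?<=<p>)...(?=</p>) matches the text between the paragraph tags.
title = re.search(r"(?<=<p>).*?(?=</p>)", html).group(0)

# (?<=<a href=")...(?=">More) matches the href value of the "More" link.
href = re.search(r'(?<=<a href=").*?(?=">More)', html).group(0)
```

The `.*?` is non-greedy, so each match stops at the first closing delimiter rather than running to the last one, which matters when several lots appear in the same scraped string.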
a-harvey (Author) Posted February 1, 2016

Hi Bill,

I don't seem to be able to paste your code in; it pastes as a single line. Do you have this as a bot file I can open? I have had a UBot crash and the last version I posted above has vanished, so I will recreate it. Maybe the delay until fully loaded is the issue; perhaps because it's going so quickly, the titles are not loading in time. I will try slowing down my version, but if you have it working and in a form I can open, that would be great.

Thanks both of you for your help, I really appreciate this. Also Bill, your version lets me see another way of doing it, which is also great.

Kind regards,
Andrew
a-harvey (Author) Posted February 1, 2016

"Try changing the delete to don't delete and see what you get"

I crashed UBot and lost the file, so I need to rebuild it. I think it could just be not loading that data in time, so I will try a slow-down function and see, as there is no reason for it not to collect the additional data. Many thanks for all your help; it's really helped me bend my mind around it. I will let you know.

Kind regards,
Andrew
a-harvey (Author) Posted February 1, 2016

No, it's still not pulling the next batch of titles in. Really strange, as the URLs and the lot numbers are fine, so I see no reason why it's not pulling them.
a-harvey (Author) Posted February 2, 2016

Bill, I managed to get your code to sit on separate lines now; I needed to take the code from here, convert it to plain text, and then add it in code view. However, it just says there is an error in the code ("fix before you switch to node view"), so I'm not sure what the issue is.
a-harvey (Author) Posted February 4, 2016

OK, I managed to get it to work by just setting the loop to run x times depending on the number of additional pages (i.e. 9 if there are 9) and getting it to click the next button. Works great. Again, I needed to slow it down, as it was running faster than the page would load and so missing data. All is good.

Many thanks for your help; it focused my mind on how this could work, so I have a better understanding now.

Andrew