Fibi · December 31, 2018

Hi guys, I am fairly new to Ubot Studio and I am building my first scraper. It extracts large lists of data and writes them to a CSV. The website being scraped has more than 5,000 elements. My problem: sometimes an element I scrape is nil/empty, and instead of an empty value being returned, the next valid value is inserted in its place. That shifts the data out of alignment and makes it useless. Any tips on how to insert a blank value instead of just skipping it?

clear all data
ui text box("no. of pages to scrape",#pages)
set(#counter,1,"Global")
loop while($comparison(#pages,"!= Does not equal",#counter)) {
    navigate("https://www.digistore24.com/de/home/marketplace/auto?page={#counter}","Wait")
    increment(#counter)
    set user agent("Chrome")
    add list to list(%product,$scrape attribute(<class="col-lg-12 col-md-12 col-sm-12 col-xs-12 helBold">,"innertext"),"Don't Delete","Global")
    add list to list(%description,$scrape attribute(<class="productText col-lg-7 col-lg-offset-0 col-xs-12 col-xs-offset-1 pull-left no-pad">,"innertext"),"Don't Delete","Global")
    add list to list(%verkaufsseite,$scrape attribute(<(class="underline" AND innertext="Verkaufsseite")>,"href"),"Don't Delete","Global")
    add list to list(%supportseite,$scrape attribute(<(class="underline" AND innertext="Affiliate-Support-Seite")>,"href"),"Don't Delete","Global")
    add list to list(%verkaufspreis durchschnittlich,$page scrape("<span class=\"shadow\">Verkaufspreis: durchschn. <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%provision,$page scrape("<span class=\"shadow\">Provision: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%verdienst pro verkauf,$page scrape("<span class=\"shadow\">Verdienst/Verkauf**: ca. <strong>","</strong> netto</span>"),"Don't Delete","Global")
    add list to list(%verdienst pro cartbesucher,$page scrape("<span class=\"shadow\">Verdienst/Cartbesucher**: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%vendor,$page scrape("<span class=\"shadow\">Vendor: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%erstellt,$page scrape("<span class=\"shadow\">Erstellt: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%bezahlarten,$page scrape("<span class=\"shadow\">Bezahlarten: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%verkaufsrang,$page scrape("<span class=\"shadow\">Verkaufsrang: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%cartconversion,$page scrape("<span class=\"shadow\">Cart Conversion**: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%stornoquote,$page scrape("<span class=\"shadow\">Stornoquote**: <strong>","</strong></span>"),"Don't Delete","Global")
}

Thank you!

PS: Feedback on the overall approach is also welcome. My current plan is to loop through the scraping part first and then save everything into a table ONCE at the end. But maybe it would be better to append each individual scrape to the table and save as I go. The lists will eventually exceed 5,000 lines, and I am not sure Ubot can hold everything in memory without crashing!
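On the PS question, the general idea of appending each page's rows as you go (so memory use stays flat no matter how many pages you scrape) can be sketched in Python pseudocode. This is not Ubot script; the file handling, column names, and helper are illustrative assumptions, not part of the original script:

```python
import csv
import io

# Illustrative subset of the columns scraped above.
COLUMNS = ["product", "vendor", "provision"]

def append_rows(fh, rows):
    # Append one page's worth of rows to an already-open CSV handle.
    writer = csv.DictWriter(fh, fieldnames=COLUMNS)
    for row in rows:
        # Missing fields are written as "" so columns stay aligned.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})

# Stand-in for open("results.csv", "a", newline=""); a real script would
# write the header once, then append after each scraped page.
buf = io.StringIO()
csv.DictWriter(buf, fieldnames=COLUMNS).writeheader()
append_rows(buf, [{"product": "A", "vendor": "Alice", "provision": "50%"}])
append_rows(buf, [{"product": "B", "vendor": "Bob"}])  # provision missing
print(buf.getvalue())
```

Writing incrementally also means a crash halfway through loses only the current page, not all 5,000+ rows.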
HelloInsomnia · December 31, 2018

Any time a page lists results that each contain multiple things to be scraped, it's a good idea to first scrape the container of each item into a list. Then loop through that list of containers and pull the info out of each one individually. To do that, you either need to load the container HTML in the browser (the slowest way, but it should still work) or use regex or XPath to extract the piece you want. This guarantees you get valid data for every item, whereas scraping each field directly into its own list gives you misaligned data because of the null issue.
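The container approach above can be sketched in Python with plain regex (not Ubot script; the "product-card" container marker and helper names are hypothetical, but the field markup matches the spans scraped in the original script):

```python
import re

def extract_field(container_html, label):
    # Pull '<span class="shadow">Label: <strong>value</strong>' out of ONE
    # container; return "" when the field is absent (the nil case).
    m = re.search(
        r'<span class="shadow">' + re.escape(label) + r':\s*<strong>(.*?)</strong>',
        container_html,
        re.S,
    )
    return m.group(1).strip() if m else ""

def parse_products(page_html):
    # Hypothetical container marker: split the page into one chunk per
    # product, then extract every field from within that chunk only.
    containers = re.findall(r'<div class="product-card">(.*?)</div>', page_html, re.S)
    return [
        {"vendor": extract_field(c, "Vendor"),
         "provision": extract_field(c, "Provision")}
        for c in containers
    ]

sample = (
    '<div class="product-card">'
    '<span class="shadow">Vendor: <strong>Alice</strong></span>'
    '<span class="shadow">Provision: <strong>50%</strong></span>'
    '</div>'
    '<div class="product-card">'
    '<span class="shadow">Vendor: <strong>Bob</strong></span>'
    '</div>'  # Provision missing here: stays "" instead of shifting values
)

rows = parse_products(sample)
print(rows)
```

Because each field is looked up inside its own container, a product with a missing field gets a blank for that column rather than stealing the next product's value, which is exactly the misalignment the original script suffers from.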
Fibi · January 6, 2019 (author)

Just saw your message, thank you for the feedback. I will try to incorporate the changes into my script and will get back to you as soon as I have an update.