UBot Underground

Scraper: Return A "blank" String Instead Of Leaving It Out?



Hi guys,

 

I am fairly new to UBot Studio and am currently building my first scraper. I am extracting big lists of data and writing them into a CSV.

 

The website to be scraped has more than 5,000 elements.

 

I face the following problem: sometimes an element I scrape is nil/empty. Instead of returning an empty value, the scraper inserts the next valid value.

This misaligns the data and makes it useless.

 

Any tips on how to insert a blank entry instead of just skipping the element?

 



clear all data
ui text box("no. of pages to scrape",#pages)
set user agent("Chrome")
set(#counter,1,"Global")
loop while($comparison(#counter,"<= Less than or equal to",#pages)) {
    navigate("https://www.digistore24.com/de/home/marketplace/auto?page={#counter}","Wait")
    add list to list(%product,$scrape attribute(<class="col-lg-12 col-md-12 col-sm-12 col-xs-12 helBold">,"innertext"),"Don't Delete","Global")
    add list to list(%description,$scrape attribute(<class="productText col-lg-7 col-lg-offset-0 col-xs-12 col-xs-offset-1 pull-left no-pad">,"innertext"),"Don't Delete","Global")
    add list to list(%verkaufsseite,$scrape attribute(<(class="underline" AND innertext="Verkaufsseite")>,"href"),"Don't Delete","Global")
    add list to list(%supportseite,$scrape attribute(<(class="underline" AND innertext="Affiliate-Support-Seite")>,"href"),"Don't Delete","Global")
    add list to list(%verkaufspreis durchschnittlich,$page scrape("<span class=\"shadow\">Verkaufspreis: durchschn. <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%provision,$page scrape("<span class=\"shadow\">Provision: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%verdienst pro verkauf,$page scrape("<span class=\"shadow\">Verdienst/Verkauf**: ca. <strong>","</strong> netto</span>"),"Don't Delete","Global")
    add list to list(%verdienst pro cartbesucher,$page scrape("<span class=\"shadow\">Verdienst/Cartbesucher**: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%vendor,$page scrape("<span class=\"shadow\">Vendor: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%erstellt,$page scrape("<span class=\"shadow\">Erstellt: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%bezahlarten,$page scrape("<span class=\"shadow\">Bezahlarten: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%verkaufsrang,$page scrape("<span class=\"shadow\">Verkaufsrang: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%cartconversion,$page scrape("<span class=\"shadow\">Cart Conversion**: <strong>","</strong></span>"),"Don't Delete","Global")
    add list to list(%stornoquote,$page scrape("<span class=\"shadow\">Storno­quote**: <strong>","</strong></span>"),"Don't Delete","Global")
    increment(#counter)
}


 

Thank you!

 

PS: Feedback on the script is also welcome. My plan was to loop through the scraping part first and then save everything into a table ONCE at the end. But maybe it is better to append each scrape to a table and save as I go. The lists will eventually exceed 5,000 lines, and I am not sure whether UBot can save all of that without crashing. :) A sketch of what I mean follows below.
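Roughly what I have in mind for the save step (just a sketch: the desktop path, the column names, and the semicolon separator are my assumptions, and quoting for separators inside the scraped text is ignored here):

comment("Build all CSV lines in memory, then write the file once at the end")
set(#row,0,"Global")
clear list(%csv)
add item to list(%csv,"product;description;vendor","Don't Delete","Global")
loop($list total(%product)) {
    add item to list(%csv,"{$list item(%product,#row)};{$list item(%description,#row)};{$list item(%vendor,#row)}","Don't Delete","Global")
    increment(#row)
}
save to file("{$special folder("Desktop")}\\digistore.csv",$list to text(%csv,$new line))

I picked semicolons because the scraped text is likely to contain commas.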


Any time you have a list of results where each item contains multiple things to be scraped, it's always a good idea to scrape the container of each item into a list first. Then loop through that list of containers and pull the info out of each one individually. To do that, you will either need to load the container HTML in the browser (the slowest way, but it should still work) or use regex or XPath to get the thing you want to scrape. Doing this ensures you get valid data; scraping each field directly into its own list gives you misaligned data due to the null issue.
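A minimal sketch of the idea, assuming each listing is wrapped in a container with a class like "product-row" (a placeholder; use whatever class actually wraps one row on Digistore24, and adjust the regex per field):

comment("Grab the raw HTML of each listing container first")
clear list(%containers)
add list to list(%containers,$scrape attribute(<class="product-row">,"outerhtml"),"Don't Delete","Global")
loop($list total(%containers)) {
    set(#html,$next list item(%containers),"Global")
    clear list(%match)
    add list to list(%match,$find regular expression(#html,"(?<=Vendor: <strong>).*?(?=</strong>)"),"Delete","Global")
    if($comparison($list total(%match),"> Greater than",0)) {
        then {
            add item to list(%vendor,$list item(%match,0),"Don't Delete","Global")
        }
        else {
            comment("No match in this container - push a blank so columns stay aligned")
            add item to list(%vendor,"","Don't Delete","Global")
        }
    }
}

The else branch is what keeps your columns aligned: every container adds exactly one row to every list, blank or not. Repeat the regex/if block for each field you scrape.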


Just saw your message. Thank you for the feedback. I will try to incorporate the changes into my script and report back as soon as I have an update.

