ed08724, posted July 16, 2016
Just started and have been through the videos. Here is my problem. I am trying to scrape data from a page with multiple records on it, but the records are not in HTML tables. It is member data, and when a member did not provide a piece of info, the complete element is missing. For example, for a member that did not provide an email address there is no mailto: to scrape between; this entire element is missing: <a href="mailto:johndoe@gmail.com">. So if I scrape name, userid, and email into 3 lists to make a table, the lists won't match up: the name list might have 10 records but the email list only 6, so there is no way to tell which email address corresponds to which name. How do I handle this?
HelloInsomnia, posted July 16, 2016
In these situations I try to find the container for each bit of information and scrape that first. If there is a div or something which contains all the member info, then you can go from there and pull out the information in a variety of ways. Since it's HTML, you can use XPath to get the info out (there is a free plugin for that here: http://www.bot-factory.com/ubotstudio-xpath-plugin/), or you can use regex. If you're new, both of these will seem difficult, but if you post the HTML here somebody will help you extract what you need. So to recap: scrape the parent container of each member, get that HTML, and use XPath/regex to extract the info you need. Then add it into a table.
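To make the recap above concrete, here is a minimal sketch of the container-first idea, written in Python rather than UBot script only because it is easy to run anywhere. The markup (a `div` with class `member`, `span` elements for name and userid) and every name in the code are assumptions made up for illustration; the real page will need its own patterns.

```python
import re

# Hypothetical markup: the real page's tags and class names will differ.
SAMPLE_HTML = """
<div class="member"><span class="name">John Doe</span><span class="userid">jd01</span><a href="mailto:johndoe@gmail.com">Email</a></div>
<div class="member"><span class="name">Jane Roe</span><span class="userid">jr02</span></div>
"""

# One match per member container (assumes the containers are not nested).
CONTAINER = re.compile(r'<div class="member">(.*?)</div>', re.DOTALL)

# One pattern per field, applied inside a single container at a time.
FIELDS = {
    "name": re.compile(r'<span class="name">(.*?)</span>'),
    "userid": re.compile(r'<span class="userid">(.*?)</span>'),
    "email": re.compile(r'href="mailto:([^"]+)"'),
}


def scrape_members(html):
    """Return one dict per member; a field missing from the HTML becomes None."""
    rows = []
    for block in CONTAINER.findall(html):
        row = {}
        for field, pattern in FIELDS.items():
            match = pattern.search(block)
            row[field] = match.group(1) if match else None
        rows.append(row)
    return rows


rows = scrape_members(SAMPLE_HTML)
```

Because each field is looked up inside one member's container, a missing email only leaves a gap in that member's row and cannot shift the values of the members after it.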
pash, posted July 16, 2016
If you have sample code, post it; it will be easier to understand and check.
ed08724, posted July 16, 2016
HelloInsomnia: Thanks. There is a named div I can scrape. Do I just put all the HTML in the email list and then go through and clean up the list, or do I skip lists entirely and just iterate through each record, putting each row's data directly into a table after processing it?
ed08724, posted July 16, 2016
pash: I don't have any code yet, as I have to figure out how to do it first.
pash, posted July 16, 2016
Do you have a sample site or HTML code?
HelloInsomnia, posted July 16, 2016
You put each div into a variable. From there you use XPath or regex to scrape the information you need and place it into a table. (Again, post one example here; you can redact personal info by replacing it with dummy info, and we will give you the correct XPath or regex from there.)
Code Docta (Nick C.), posted July 17, 2016
Hi,
Here is what your logic may look like.

clear list(%scraped data)
clear list(%urls)
comment("you can use $list from file instead")
add list to list(%urls,$list from text("http://network.ubotstudio.com/forum/index.php/user/29014-ed08724/
http://network.ubotstudio.com/forum/index.php/user/5096-abbas/",$new line),"Delete","Global")
loop($list total(%urls)) {
    set(#urls NLI,$next list item(%urls),"Global")
    comment("you can put the $next list item function in navigation if you like. but if you need to store it or use it somewhere else put it in the variable like it is")
    navigate(#urls NLI,"Wait")
    wait for element(<innertext="License">,15,"Appear")
    set(#group,$scrape attribute(<style="color:grey;">,"innertext"),"Global")
    comment("If there is nothing scraped set group as null otherwise it wont overwrite what was scraped")
    if($comparison(#group,"= Equals",$nothing)) {
        then {
            set(#group,"NULL","Global")
        }
        comment("if you dont use it u dont need the else node")
        else {
        }
    }
    comment("pretend scraped email and gender")
    set(#email,"jack@jill.com","Global")
    set(#gender,"Male","Global")
    comment("you can drag in as many variables as you like")
    add item to list(%scraped data,"{#group},{#email},{#gender}","Don\'t Delete","Global")
}
comment("can save as .txt too")
save to file("{$special folder("Desktop")}\\scraped-data.csv",%scraped data)

Basically, you check to see if there's something there by comparing it in an IF (conditional statement). If the $comparison is true, the THEN node runs next; if it is false, it goes to the ELSE node. If nothing was scraped, it sets a placeholder value of NULL (or whatever you like). If you don't use one, your data table will be all messed up.
Hope this helps,
CD
Attachment: basic loop-scrape if not there.ubot
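For anyone who wants to see the placeholder idea outside of UBot, here is a rough Python sketch of the same logic. It is not a translation of the script above: the sample records, the `fill_missing` helper, and the in-memory CSV are all made up for illustration, and "nothing scraped" is assumed to show up as either `None` or an empty string.

```python
import csv
import io

PLACEHOLDER = "NULL"  # any marker works, as long as it is used consistently


def fill_missing(value, placeholder=PLACEHOLDER):
    """Return the scraped value, or the placeholder when nothing was scraped."""
    return value if value else placeholder


# Pretend results of three page visits; None or "" stands for "nothing scraped".
scraped = [
    ("Members", "jack@jill.com", "Male"),
    ("", "jane@example.com", None),
    (None, None, "Female"),
]

# Write to an in-memory buffer so the sketch has no side effects on disk.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for record in scraped:
    writer.writerow([fill_missing(field) for field in record])

csv_text = buffer.getvalue()
print(csv_text)
```

Every row ends up with exactly three columns, so the group, email, and gender values stay lined up even when a page had nothing to scrape for one of them.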
ed08724, posted July 17, 2016
Thanks, will take a look at that.