UBot Underground

Multiple Lists Can't Sync



I just started and have been through the videos.

Here is my problem.

I am trying to scrape data from a page with multiple records on it, but the records are not in HTML tables.

It is member data, but when a member did not provide a piece of info, the complete element is missing.

For example, for a member who did not provide an email address, there is no mailto: to scrape between.
This entire element is missing: <a href="mailto:johndoe@gmail.com">

So if I scrape name, userid, and email into 3 lists to make a table, the lists won't match up: the name list might have 10 records but the email list only 6, so there is no way to tell which email address corresponds to which name.

How do I handle this?

 


In these situations I try to find the container for each bit of information and scrape that first. If there is a div or something that contains all the member info, you can go from there and pull out the information in a variety of ways. Since it's HTML, you can use XPath to get the info out (there is a free plugin for that here: http://www.bot-factory.com/ubotstudio-xpath-plugin/), or you can use regex. If you're new, both of these will seem difficult, but if you post the HTML here somebody will help you extract what you need.

 

So to recap: scrape the parent container of each member, get that HTML, and use XPath/regex to extract the info you need. Then add it to a table.
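
The recap can be sketched outside UBot to show why scraping the container first keeps the fields aligned. This is a minimal Python illustration, not UBot script; the class names "member", "name", and "userid" and the sample markup are assumptions made up for the example, since the real page's HTML was not posted.

```python
import re

# Hypothetical page fragment: the class names and markup are assumptions
# for illustration, not the real site's HTML.
html = """
<div class="member"><span class="name">John Doe</span><span class="userid">jd01</span><a href="mailto:johndoe@gmail.com">Email</a></div>
<div class="member"><span class="name">Jane Roe</span><span class="userid">jr02</span></div>
"""

def first_match(pattern, text, default=""):
    """Return the first capture group, or a default when the field is absent."""
    found = re.search(pattern, text)
    return found.group(1) if found else default

# Step 1: scrape each member's container as one block of HTML.
containers = re.findall(r'<div class="member">(.*?)</div>', html, re.DOTALL)

# Step 2: pull every field out of the same block, so the fields stay aligned
# even when one of them (here the email link) is missing.
rows = []
for block in containers:
    rows.append((
        first_match(r'<span class="name">(.*?)</span>', block),
        first_match(r'<span class="userid">(.*?)</span>', block),
        first_match(r'href="mailto:([^"]+)"', block),
    ))

for row in rows:
    print(row)
```

Because each row is built from a single container, a missing email leaves an empty cell in that row instead of shifting every later email up by one.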


HelloInsomnia: Thanks.

There is a named div I can scrape. Do I just put all the HTML in the email list and then go through and clean up the list, or do I skip lists entirely and just iterate through each record, putting each row's data directly into a table after processing it?


You put each div into a variable. From there you use XPath or regex to scrape the information you need and place it into a table. (Again, post one example here, redacting personal info by replacing it with dummy info, and we will give you the correct XPath or regex.)
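
As a rough illustration of the XPath route, here is a small Python sketch using the standard library's limited XPath support. It is not UBot script or the plugin's syntax; the markup, the id "members", the class "member", and the b/i tags are all hypothetical, and the sample is deliberately well-formed because this parser rejects the sloppy HTML that real pages often contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup: tag and attribute names are assumptions,
# not the real page's structure.
page = """<div id="members">
  <div class="member"><b>John Doe</b><i>jd01</i><a href="mailto:johndoe@gmail.com">Email</a></div>
  <div class="member"><b>Jane Roe</b><i>jr02</i></div>
</div>"""

root = ET.fromstring(page)
table = []

# Each matched div plays the role of "the variable" holding one member.
for member in root.findall(".//div[@class='member']"):
    name = member.findtext("b", default="")
    userid = member.findtext("i", default="")

    # The email link may be absent, so look it up and check before using it.
    link = member.find("a[@href]")
    email = ""
    if link is not None and link.get("href", "").startswith("mailto:"):
        email = link.get("href")[len("mailto:"):]

    # One finished row per member goes straight into the table.
    table.append([name, userid, email])

for row in table:
    print(row)
```

Processing one container at a time and writing the row immediately is what removes the need for three separate lists.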


Hi,

 

Here is what your logic may look like.

clear list(%scraped data)
clear list(%urls)
comment("you can use $list from file instead")
add list to list(%urls,$list from text("http://network.ubotstudio.com/forum/index.php/user/29014-ed08724/
http://network.ubotstudio.com/forum/index.php/user/5096-abbas/",$new line),"Delete","Global")
loop($list total(%urls)) {
    set(#urls NLI,$next list item(%urls),"Global")
    comment("you can put the $next list item function
in navigation if you like.
but if you need to store it or use it somewhere
else put it in the variable like it is")
    navigate(#urls NLI,"Wait")
    wait for element(<innertext="License">,15,"Appear")
    set(#group,$scrape attribute(<style="color:grey;">,"innertext"),"Global")
    comment("If there is nothing scraped set group as
null
otherwise it wont overwrite what was scraped")
    if($comparison(#group,"= Equals",$nothing)) {
        then {
            set(#group,"NULL","Global")
        }
        comment("if you do not use it you do not need the else node")
        else {
        }
    }
    comment("pretend scraped email and gender")
    set(#email,"jack@jill.com","Global")
    set(#gender,"Male","Global")
    comment("you can drag in as many variables as you like")
    add item to list(%scraped data,"{#group},{#email},{#gender}","Don\'t Delete","Global")
}
comment("can save as .txt too")
save to file("{$special folder("Desktop")}\\scraped-data.csv",%scraped data)

Basically, you check to see if there's something there by comparing it in an IF (conditional statement). If the $comparison is true, the THEN node runs; if it is false, the ELSE node runs.

 

Now, if there is nothing there, it will set a placeholder value of NULL (or whatever you like). If you don't use one, your data table will be all messed up.
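
The same placeholder idea, sketched in Python rather than UBot purely to show the effect on the output file. The sample values and the helper name fill are made up for the illustration; the second record pretends the group scrape came back empty.

```python
import csv
import io

PLACEHOLDER = "NULL"  # any marker works, as long as it is used consistently

def fill(value):
    """Swap an empty or missing scrape result for the placeholder."""
    return value if value else PLACEHOLDER

# Pretend scrape results, one tuple per page; the second page had no group.
scraped = [("Members", "jack@jill.com", "Male"), ("", "jack@jill.com", "Male")]

# Write to an in-memory buffer here; a real run would open a .csv file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for group, email, gender in scraped:
    writer.writerow([fill(group), fill(email), fill(gender)])

csv_text = buffer.getvalue()
print(csv_text)
```

Every row ends up with the same number of columns, so a missing value shows up as NULL in its own cell rather than pulling the other values out of place.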

Hope this helps,
CD

basic loop-scrape if not there.ubot

