UBot Underground

Why Does Table Scraping Miss Some Items?



I am trying to scrape a table of URLs on expireddomains.net. The table lists one URL per row, with a host of parameters for each URL (trust flow, backlinks and so on) arranged in columns.

 

I have found the elements I'm interested in, and am scraping the table. The problem is that it's hit and miss: I can scrape each of the URLs reliably, but the contents of some of the other cells get missed for no obvious reason. So if there are 10 URLs (1 per row) in a table, I might get 8 of the associated backlinks and 7 of the other parameters. The items that get missed are not in the same rows, and the results are consistent: if I run the script several times, I get the same results. I've switched browsers in UBot, with no difference.

 

There seems to be neither rhyme nor reason to which elements are missed. I opened the same page in my Firefox browser and inspected the elements in the table, and they're all as they should be. None of the cells in the table are empty. I also put pauses between each add table to table command, in case UBot was trying to do too much too quickly.

 

So why might UBot be missing elements out? It's not as if the elements are incorrect - they work some of the time, and the table is well structured as far as I can see.

 

Thanks

Steve


Your code is probably wrong. Please post it so someone can take a look. If I had to guess, you have probably set your list to delete duplicates, but that's only a guess without seeing your code. That said, UBot's child and sibling selectors are a disgrace, and you'd be better off using the add-on I made, which would probably handle this table in one line of code. In the meantime, here's every parameter on that table scraping fine:

navigate("https://www.expireddomains.net/backorder-expired-domains/", "Wait")
wait for browser event("Everything Loaded", "")
add list to list(%name, $scrape attribute(<class="field_domain">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%BL, $scrape attribute(<class="field_bl">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_domainpop, $scrape attribute(<class="field_domainpop">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_abirth, $scrape attribute(<class="field_abirth">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_aentries, $scrape attribute(<class="field_aentries">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_similarweb, $scrape attribute(<class="field_similarweb">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_similarweb_countrycode, $scrape attribute(<class="field_similarweb_countrycode">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_dmoz, $scrape attribute(<class="field_dmoz">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_statuscom, $scrape attribute(<class="field_statuscom">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_statusnet, $scrape attribute(<class="field_statusnet">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_statusorg, $scrape attribute(<class="field_statusorg">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_statusde, $scrape attribute(<class="field_statusde">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_statustld_registered, $scrape attribute(<class="field_statustld_registered">, "textcontent"), "Don\'t Delete", "Global")
add list to list(%field_enddate, $scrape attribute(<class="field_enddate">, "textcontent"), "Don\'t Delete", "Global")
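
Why a delete-duplicates setting would produce exactly the symptoms described: domain names are unique per row, so that list always comes out complete, while columns such as backlinks often repeat the same value in several rows, and every repeat gets dropped. The result is also the same on every run, because it depends only on the data. The sketch below is plain Python, not UBot code; the helper add_list_to_list and the sample values are made up for illustration, and it assumes the "Delete" option simply skips values already in the list.

```python
def add_list_to_list(target, items, delete_duplicates):
    """Append items to target, optionally skipping values already present.

    Hypothetical stand-in for a list command with a delete-duplicates
    option; it is not UBot's actual implementation.
    """
    for item in items:
        if delete_duplicates and item in target:
            continue  # a repeated value is silently dropped
        target.append(item)
    return target


# Made-up sample data: three table rows.
domains = ["alpha.com", "beta.net", "gamma.org"]  # unique in every row
backlinks = ["12", "0", "12"]                     # same value in rows 1 and 3

names_deduped = add_list_to_list([], domains, delete_duplicates=True)
bl_deduped = add_list_to_list([], backlinks, delete_duplicates=True)
bl_full = add_list_to_list([], backlinks, delete_duplicates=False)

# Lengths of the three lists: domains survive, deduped backlinks lose a row.
print(len(names_deduped), len(bl_deduped), len(bl_full))
```

With duplicates kept, every column list has one entry per row, so the lists stay aligned and can be matched up by position.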


@deliter - yes, you guessed right! Sorry, such a rookie mistake. Anyway - that fixed it perfectly.

Brilliant forum - thank you for all your help!
