UBot Underground

How to scrape data to a table?



In many cases I would really REALLY like to be able to use the table function of UBot, but I can't really figure it out.

 

How does one take data on screen, and scrape and insert it into a table for later use? I need to do this sort of thing quite often.

 

Thanks in advance for any tutorial on this sort of topic!

 

Jonathan


Unfortunately that tutorial does not cover the one specific thing I'm asking about here... how to SCRAPE a table. I have a table on screen from a website, and I want to turn it into a useable &table in UBot so I can begin to walk the columns & rows to get the data I need.

 

How can we do that?

 

Jonathan


I think we're getting slightly confused here. An HTML table on a web page does not translate directly to the table command in UBot. The UBot table stores data in an array-like structure and can load and save that data as CSV files. It could of course be used to store data from an HTML table, but that may not be the most efficient way.

 

If you wish to scrape an HTML table, you will need to use the same methods you would use to scrape any other HTML element, e.g. $page scrape, add to list, etc.


Yes but that's exactly what I want to do - take data formatted in a table-like structure on a web page, and put it into a &table construct in UBot for easier handling and manipulation.

 

I mean, wouldn't that be a primary point of having tables in UBot? If not to more easily work with table formatted data on a web page, then I don't see it as being tremendously useful. Yes sometimes I have a CSV file I'm starting from and that's fine & dandy... but more often I get some sort of table formatted results on a page, and I want to work with each of the columns and rows... doing so with lists is a gigantic pain in the ass, especially when we actually have the perfect solution already in UBot with the table function.

 

What am I missing here? Are you saying there is NO way to translate an html table into a &table??

 

Jonathan


Are you saying there is NO way to translate an html table into a &table??

Not without doing some work first. As mentioned above, you could do something like a $page scrape, $add to list, strip the HTML, then loop through the list (containing the HTML table data), adding the list items to the UBot &table.
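The scrape, strip, and loop steps above can be sketched outside UBot to show the shape of the work. This is only an illustration in Python using the standard library's `html.parser`, not UBot script; the class name `TableGridParser`, the helper `html_table_to_grid`, and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text that sits inside a cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Joining the fragments drops inline markup such as <b>.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_grid(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, standing in for a scraped page.
sample = """
<table id="results">
  <tr><th>Heading 1</th><th>Heading 2</th><th>Heading 3</th></tr>
  <tr><td>Item 1</td><td>Item 1</td><td>Item 1</td></tr>
  <tr><td>Item 2</td><td><b>Item</b> 2</td><td>Item 2</td></tr>
</table>
"""
grid = html_table_to_grid(sample)
```

Each entry of `grid` is one row and each row is a list of cells, which is the same row/column shape a table-style structure would hold.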

I agree with Jonathan here. I recently found an HTML table that I was able to scrape to a list. It scraped in perfect format with one exception: the first row (the entire row) of the HTML table translated into the FIRST list item, and so on, so 8 lines on the list made up the whole HTML table.

 

You see the problem here? I can't use the scraped data even though it scraped to its near perfect original form. If I am not making sense here is an example of what I mean:

 

 

Html table:

 

Heading 1 Heading 2 Heading 3

Item 1 Item 1 Item 1

Item 2 Item 2 Item 2

Item 3 Item 3 Item 3

 

 

Scraped List:

 

Heading 1Heading 2Heading 3

Item 1Item 1Item 1

Item 2Item 2Item 2

Item 3Item 3Item 3

 

If I were to save that to a csv file then obviously all 3 headings would be in one cell, etc.
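One possible way around the merged cells, sketched in Python rather than UBot script: if each list item can be scraped with its markup still in place (an assumption, since the list above was already stripped), the row can be split on its cell tags before the HTML is removed, and then written out as CSV. The helper names `split_row` and `rows_to_csv` and the sample rows are hypothetical.

```python
import csv
import io
import re
from html import unescape

# Matches the contents of one <td> or <th> cell (not <tr> or <thead>).
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)
# Matches any remaining tag inside a cell.
TAG_RE = re.compile(r"<[^>]+>")


def split_row(row_html):
    """Return the cell texts of one scraped <tr> string."""
    cells = []
    for inner in CELL_RE.findall(row_html):
        text = unescape(TAG_RE.sub("", inner))
        cells.append(" ".join(text.split()))  # collapse stray whitespace
    return cells


def rows_to_csv(row_html_list):
    """Build CSV text with one line per scraped row, one field per cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row_html in row_html_list:
        writer.writerow(split_row(row_html))
    return buffer.getvalue()


# Hypothetical list items, scraped with the row markup left in.
scraped_rows = [
    "<tr><th>Heading 1</th><th>Heading 2</th><th>Heading 3</th></tr>",
    "<tr><td>Item 1</td><td>Item 1</td><td>Item 1</td></tr>",
]
csv_text = rows_to_csv(scraped_rows)
```

Splitting before stripping is the key design choice here: once the tags are gone, nothing marks where one cell ends and the next begins.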

 

So back to Jonathan's question. Is this the only option for scraping a table as it stands?

 

By the way, it's not always present in the markup, but if you are trying to scrape a table I would suggest looking for the id attribute on the "table" element, as it gives you all the data in the table without having to strip anything from it (notwithstanding the resulting list, which is very difficult to work with).

 

John


Great point John about the Table attribute, I hadn't thought about that before. But yes, the resulting parsing required is annoying to say the least.

 

I really thought this was precisely what the &table function was intended to solve... which explains my frustration at trying to use it.

 

So does anyone have a simple way to take data from a page and stick it into a table without having to iterate through a bunch of loops and strips to format everything properly? I know I can do that but man what an ugly approach.

 

Also what about data that is visually formatted like a table, but doesn't actually use a <table> to do it? Like more modern sites where they do everything in CSS and DIVs?
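For the DIV-based case, one rough approach is to treat one class of element as a "row" and another as a "cell". The sketch below is Python, not UBot script, and it assumes the page marks its rows and cells with the class names `row` and `cell`; those names, the class `DivGridParser`, and the sample markup are hypothetical, and a real site would need its own class names substituted.

```python
from html.parser import HTMLParser


class DivGridParser(HTMLParser):
    """Group the text of 'cell' divs by their enclosing 'row' div."""

    def __init__(self, row_class="row", cell_class="cell"):
        super().__init__()
        self.row_class = row_class
        self.cell_class = cell_class
        self.rows = []
        self._stack = []    # one entry per open <div>: "row", "cell" or None
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        kind = None
        if self.row_class in classes and self._row is None:
            kind, self._row = "row", []
        elif (self.cell_class in classes and self._row is not None
              and self._cell is None):
            kind, self._cell = "cell", []
        # Unrelated divs are pushed as None so closing tags pair up correctly.
        self._stack.append(kind)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        kind = self._stack.pop()
        if kind == "cell":
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif kind == "row":
            self.rows.append(self._row)
            self._row = None


# Hypothetical CSS-styled "table" with no <table> element at all.
sample = """
<div class="results">
  <div class="row header"><div class="cell">Name</div><div class="cell">Price</div></div>
  <div class="row"><div class="cell">Widget</div><div class="cell"><div class="amount">9.99</div> USD</div></div>
</div>
"""
parser = DivGridParser()
parser.feed(sample)
parser.close()
div_grid = parser.rows
```

The stack is there because divs nest: a cell may contain further divs, and the parser has to know which closing tag actually ends the cell.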

 

Jonathan


It's funny Jonathan because I have a bunch of bots that got put on the backburner some months ago because I didn't have the knowledge to execute the proper commands. Now I have the knowledge, but not the motivation to go back and loop and if and not and evaluate and replace and loop some more, etc, etc, just to get some data organized.

 

Here's what I would love to see... $Scrape to Table or $Save to Table (I am not certain, but I am assuming that a "table" is basically a grid structure with cells)


EXACTLY!!!

 

Scrape to table. That is precisely what I'm looking for. Pretty easy if you're dealing with an actual HTML table... not so sure about table "looking" data that isn't in a <table>.

 

Jonathan



That's where $scrape to table cell would be really handy!
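To be clear, $scrape to table cell is only a wish in this thread, not an existing command. As a rough picture of what cell-level placement could look like, here is a small Python sketch (not UBot script) that writes a value at a given row and column and grows the grid as needed. The function name `set_cell` and the sample values are hypothetical.

```python
def set_cell(table, row, col, value, fill=""):
    """Write value at (row, col), growing the list-of-lists grid as needed."""
    # Add empty rows until the target row exists.
    while len(table) <= row:
        table.append([])
    # Pad every row to a common width so the grid stays rectangular.
    width = max(col + 1, max((len(r) for r in table), default=0))
    for r in table:
        r.extend([fill] * (width - len(r)))
    table[row][col] = value
    return table


# Values scraped one at a time can be dropped straight into place.
table = []
set_cell(table, 0, 0, "Name")
set_cell(table, 0, 1, "Price")
set_cell(table, 2, 1, "9.99")
```

This is the piece that helps with table "looking" data: each value is scraped on its own and placed by position, so the page does not need a real <table>.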

 

 

 


Oh yes that's right! Indeed it would... wow that would solve a LOT of issues it sounds like we've both been encountering!

 

OK I formally propose that $scrape to table and $scrape to table cell, be added to UBot. All opposed? Motion carries! w00t!

 

Jonathan

