UBot Underground

[solved] Scrape list separated by new line



How do I scrape a list whose items are separated only by new lines? I tried using add list to list and add item to list, but every time I saved the result I ended up with text that had no line breaks or spaces.

 

Example of what I want to scrape:

 

<pre>
item1
item2
item3
</pre>


What are you scraping and from where? All scraped content, if scraped properly, will end up as a list (each item on its own line).

 

John

 

Thanks, and sorry for the late reply. The problem was that I always tried to store the scraped items in a list or a variable first. Once I saved the scraped content directly to a file, everything worked fine.

 

Just in case anyone is interested, I tried to scrape the content of a code-tag in a vBulletin-Board:

 

save to file("{$special folder("Desktop")}\\testList.txt", $scrape attribute(<outerhtml=w"<pre class=\"alt2\" dir=\"ltr\" style=\"*\">*</pre>">, "innertext"))
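For readers more at home in a general-purpose language, the same idea can be sketched outside UBot. This is a Python analogy only, not UBot syntax; `pre_text` is a hypothetical stand-in for the inner text of the scraped `<pre>` block, using the sample items from the first post.

```python
# Hypothetical inner text of the scraped <pre> block: one item per line.
pre_text = "item1\nitem2\nitem3\n"

# splitlines() breaks the text on newline characters and drops the line endings;
# blank lines are skipped so they do not become empty items.
items = [line.strip() for line in pre_text.splitlines() if line.strip()]

# Joining with "\n" before writing keeps one item per line in the saved text.
file_body = "\n".join(items)

print(items)
```

The point of the sketch is that the newline is the separator: as long as the text is split on it (or written out untouched), each item stays on its own line.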

  • 6 months later...

What are you scraping and from where? All scraped content, if scraped properly, will end up as a list (each item on its own line).

 

John

Hi John,

Hoping you can help.

 

My issue is that the list is not inserting line breaks at all.

When I scrape a page and use add item to list, it returns a list with all items on one line.

 

So there is no separator, and the whole list is treated as one item (which means there is no way to put the list into a table, and no way to select an item from the list and fill in a field).

 

http://www.feedage.com/html2rss/html2rss.php?id=8387590 http://www.feedage.com/html2rss/html2rss.php?id=8387591 http://www.feedage.com/html2rss/html2rss.php?id=8387592 http://www.feedage.com/html2rss/html2rss.php?id=8387593 http://www.feedage.com/html2rss/html2rss.php?id=8387594

 

This is a list of six RSS feeds that I scraped.

They are all on the same line...

 

No idea how to solve it.

 

Cheers.

Walt


Add item to list will always add everything as a single item. You need to use add list to list instead.
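The difference is similar to appending versus extending a list in a general-purpose language. The sketch below is a Python analogy, assuming UBot behaves as described above; it is not UBot syntax, and the `scraped` values are hypothetical placeholders.

```python
# Hypothetical scrape result: two separate values.
scraped = ["http://example.com/a", "http://example.com/b"]

# Like "add item to list": the whole result goes in as one combined entry.
as_item = []
as_item.append("\n".join(scraped))

# Like "add list to list": each scraped value becomes its own entry.
as_list = []
as_list.extend(scraped)

print(len(as_item), len(as_list))  # prints: 1 2
```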

 

 

John

Thanks John and Kreatus,

I appreciate the reply.

 

So, when I am scraping from a page, can I still use add list to list?

 

This is the source from the page I am trying to scrape:

<br>

<table border="1"><tbody><tr><td>#</td><td>URL</td><td>RSS URL</td></tr>

<tr><td>0</td><td>http://knowforex.com/forexbrokers/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387590">http://www.feedage.com/html2rss/html2rss.php?id=8387590</a></td></tr>

<tr><td>1</td><td>http://knowforex.com/forextradingsystem/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387591">http://www.feedage.com/html2rss/html2rss.php?id=8387591</a></td></tr>

<tr><td>2</td><td>http://knowforex.com/forextradingstrategies/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387592">http://www.feedage.com/html2rss/html2rss.php?id=8387592</a></td></tr>

<tr><td>3</td><td>http://knowforex.com/forextradingsignals/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387593">http://www.feedage.com/html2rss/html2rss.php?id=8387593</a></td></tr>

<tr><td>4</td><td>http://knowforex.com/forextradingtraining/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387594">http://www.feedage.com/html2rss/html2rss.php?id=8387594</a></td></tr>

</tbody></table>

<br>

</div>

</div>

 

 

 

I am looking to create a list from just the href values, for example:

http://www.feedage.com/html2rss/html2rss.php?id=8387590

 

 

What do you think?

 

Cheers.

Walt


Try this:

 

clear list(%feeds)

load html("<table border=\"1\"><tbody><tr><td>#</td><td>URL</td><td>RSS URL</td></tr>

<tr><td>0</td><td>http://knowforex.com/forexbrokers/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387590\">http://www.feedage.com/html2rss/html2rss.php?id=8387590</a></td></tr>

<tr><td>1</td><td>http://knowforex.com/forextradingsystem/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387591\">http://www.feedage.com/html2rss/html2rss.php?id=8387591</a></td></tr>

<tr><td>2</td><td>http://knowforex.com/forextradingstrategies/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387592\">http://www.feedage.com/html2rss/html2rss.php?id=8387592</a></td></tr>

<tr><td>3</td><td>http://knowforex.com/forextradingsignals/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387593\">http://www.feedage.com/html2rss/html2rss.php?id=8387593</a></td></tr>

<tr><td>4</td><td>http://knowforex.com/forextradingtraining/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387594\">http://www.feedage.com/html2rss/html2rss.php?id=8387594</a></td></tr>

</tbody></table>")

add list to list(%feeds, $scrape attribute(<href=w"http://www.feedage.com/html2rss/html2rss.php?id=*">, "innertext"), "Delete", "Global")
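For comparison, here is a rough Python sketch of the same extraction using only the standard library. It is an illustration, not a translation of the UBot commands: `FeedLinkParser`, `extract_feeds`, and `sample_html` are hypothetical names, and the sample HTML is a shortened copy of the table above. Because the HTML is passed in as an argument, the same function would work on a page whose table changes.

```python
from html.parser import HTMLParser

# Prefix that plays the role of the wildcard pattern "...html2rss.php?id=*".
FEED_PREFIX = "http://www.feedage.com/html2rss/html2rss.php?id="


class FeedLinkParser(HTMLParser):
    """Collects href values of <a> tags that start with FEED_PREFIX."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(FEED_PREFIX):
                self.feeds.append(value)


def extract_feeds(html):
    """Returns the matching feed URLs, one list entry per link."""
    parser = FeedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.feeds


# Shortened copy of the table from the post (first two rows only).
sample_html = (
    '<table border="1"><tbody>'
    '<tr><td>0</td><td>http://knowforex.com/forexbrokers/</td>'
    '<td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387590">'
    'http://www.feedage.com/html2rss/html2rss.php?id=8387590</a></td></tr>'
    '<tr><td>1</td><td>http://knowforex.com/forextradingsystem/</td>'
    '<td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387591">'
    'http://www.feedage.com/html2rss/html2rss.php?id=8387591</a></td></tr>'
    '</tbody></table>'
)

feeds = extract_feeds(sample_html)
print("\n".join(feeds))  # one feed URL per line
```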

 

 

John

 

 

Thanks John!!

I just learned something HUGE about the load html command.

Because the original table will be dynamic, the first thing I did was scrape the page into a variable to capture the initial HTML. I then used your commands above, passing that variable to load html in place of the hard-coded HTML, and it works beautifully.

Right! Off to compile and schedule.

Thanks again!

 

Cheers.

Walt
