braun 0 Posted October 10, 2011
How do I scrape a list whose items are separated only by new lines? I tried using "add list to list" and "add item to list", but every time I saved the result I ended up with text that had no line breaks or spaces. Example of what I want to scrape:
<pre>
item1
item2
item3
</pre>

JohnB 255 Posted October 10, 2011
What are you scraping and from where? All scraped content, if scraped properly, will end up as a list (each item on its own line).
John

braun 0 Posted October 13, 2011
Thanks, and sorry for the late reply. The problem was that I always tried to store the items in a "list" or a variable first. Once I saved the scraped content directly to a file, everything worked fine. In case anyone is interested, I was scraping the content of a code tag on a vBulletin board:
save to file("{$special folder("Desktop")}\\testList.txt", $scrape attribute(<outerhtml=w"<pre class=\"alt2\" dir=\"ltr\" style=\"*\">*</pre>">, "innertext"))

walterbayliss 0 Posted April 26, 2012
Hi John, hoping you can help. My issue is that the list is not inserting line breaks at all. A page scrape followed by "add item to list" returns a list with all items on one line. There is then no separator, so it treats the whole list as one item (so there is no way to put the list into a table, and no way to select an item from the list and fill in a field).
http://www.feedage.com/html2rss/html2rss.php?id=8387590 http://www.feedage.com/html2rss/html2rss.php?id=8387591 http://www.feedage.com/html2rss/html2rss.php?id=8387592 http://www.feedage.com/html2rss/html2rss.php?id=8387593 http://www.feedage.com/html2rss/html2rss.php?id=8387594
This is a list that I scraped of six RSS feeds. They are all on the same line... No idea how to solve it.
Cheers.
Walt

Kreatus (Ubot Ninja) 422 Posted April 26, 2012
Hi, try using "add list to list" instead of "add item to list". It should solve the problem.

JohnB 255 Posted April 26, 2012
"Add item to list" will always add everything as a single item. You need to use "add list to list".
John
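
For anyone who wants to see why the distinction matters, here is a minimal sketch in Python (not UBot script) of the behaviour described above. It assumes the scrape hands back one string with one result per line; `add_item` and `add_list` are hypothetical stand-ins for the two commands, not how UBot actually implements them.

```python
# Hypothetical stand-ins for "add item to list" and "add list to list".
# Assumption: a scrape returns a single string with one result per line.

def add_item(target, scraped):
    """Append the whole scraped string as ONE item."""
    target.append(scraped)
    return target

def add_list(target, scraped):
    """Split the scraped string on line breaks and append each non-empty line."""
    target.extend(line for line in scraped.splitlines() if line.strip())
    return target

scraped = "item1\nitem2\nitem3"
as_item = add_item([], scraped)  # one item that happens to contain line breaks
as_list = add_list([], scraped)  # three separate items

print(len(as_item), len(as_list))
```

With the first helper the "list" has a single entry, so there is nothing to index into or put in a table row by row; with the second, each line becomes its own entry.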

walterbayliss 0 Posted April 26, 2012
Thanks John and Kreatus, I appreciate the reply. So, when I am scraping from a page, can I still use "add list to list"? This is the source from the page I am trying to scrape:
<br><table border="1"><tbody><tr><td>#</td><td>URL</td><td>RSS URL</td></tr><tr><td>0</td><td>http://knowforex.com/forexbrokers/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387590">http://www.feedage.com/html2rss/html2rss.php?id=8387590</a></td></tr><tr><td>1</td><td>http://knowforex.com/forextradingsystem/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387591">http://www.feedage.com/html2rss/html2rss.php?id=8387591</a></td></tr><tr><td>2</td><td>http://knowforex.com/forextradingstrategies/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387592">http://www.feedage.com/html2rss/html2rss.php?id=8387592</a></td></tr><tr><td>3</td><td>http://knowforex.com/forextradingsignals/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387593">http://www.feedage.com/html2rss/html2rss.php?id=8387593</a></td></tr><tr><td>4</td><td>http://knowforex.com/forextradingtraining/</td><td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387594">http://www.feedage.com/html2rss/html2rss.php?id=8387594</a></td></tr></tbody></table><br></div></div>
I am looking to create a list from just the HREF values, e.g. http://www.feedage.com/html2rss/html2rss.php?id=8387590
What do you think?
Cheers.
Walt

JohnB 255 Posted April 26, 2012
Try this:
clear list(%feeds)
load html("<table border=\"1\"><tbody><tr><td>#</td><td>URL</td><td>RSS URL</td></tr><tr><td>0</td><td>http://knowforex.com/forexbrokers/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387590\">http://www.feedage.com/html2rss/html2rss.php?id=8387590</a></td></tr><tr><td>1</td><td>http://knowforex.com/forextradingsystem/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387591\">http://www.feedage.com/html2rss/html2rss.php?id=8387591</a></td></tr><tr><td>2</td><td>http://knowforex.com/forextradingstrategies/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387592\">http://www.feedage.com/html2rss/html2rss.php?id=8387592</a></td></tr><tr><td>3</td><td>http://knowforex.com/forextradingsignals/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387593\">http://www.feedage.com/html2rss/html2rss.php?id=8387593</a></td></tr><tr><td>4</td><td>http://knowforex.com/forextradingtraining/</td><td><a href=\"http://www.feedage.com/html2rss/html2rss.php?id=8387594\">http://www.feedage.com/html2rss/html2rss.php?id=8387594</a></td></tr></tbody></table>")
add list to list(%feeds, $scrape attribute(<href=w"http://www.feedage.com/html2rss/html2rss.php?id=*">, "innertext"), "Delete", "Global")
John
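
As a rough cross-check outside UBot, the same idea (load some HTML, keep only the links whose href matches a wildcard, one list item per link) can be sketched in Python with the standard library. The `HrefCollector` class and the shortened two-row table are illustrative assumptions, not part of the script above. Note that this sketch collects the href attribute itself, whereas the UBot line scrapes the inner text of the matching links; on this page the two are identical.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect href values of <a> tags that match a wildcard pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # In fnmatch patterns "?" matches any single character,
        # which includes the literal "?" in these URLs.
        if href and fnmatchcase(href, self.pattern):
            self.links.append(href)

# Shortened sample: the first two rows of the table from the thread.
html = (
    '<table border="1"><tbody>'
    '<tr><td>0</td><td>http://knowforex.com/forexbrokers/</td>'
    '<td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387590">'
    'http://www.feedage.com/html2rss/html2rss.php?id=8387590</a></td></tr>'
    '<tr><td>1</td><td>http://knowforex.com/forextradingsystem/</td>'
    '<td><a href="http://www.feedage.com/html2rss/html2rss.php?id=8387591">'
    'http://www.feedage.com/html2rss/html2rss.php?id=8387591</a></td></tr>'
    '</tbody></table>'
)

collector = HrefCollector("http://www.feedage.com/html2rss/html2rss.php?id=*")
collector.feed(html)

for link in collector.links:
    print(link)
```

The plain-text URLs in the second column are ignored because they are not inside an `<a>` tag, so only the feed links end up in the list.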

walterbayliss 0 Posted April 26, 2012
Thanks John!! I just learned something HUGE with the load html command. Because the original table will be dynamic, the first thing I did was create a variable from the page scrape to hold the initial HTML, and then use your commands above (substituting the variable to re-create the HTML), and it works beautifully.
RIGHT! Off to compile and schedule. Thanks again!
Cheers.
Walt

JohnB 255 Posted April 26, 2012
Awesome... I am glad I was able to help! :)
John