Bot-Factory (posted October 17, 2010): Hi. I'm trying to scrape text from a very long web page (17 pages with 12,200 words on it). I have to select the whole page to download it, but without the header and footer code. The problem happens when I try to define the page scrape parameters: the window is very slow and hangs for a couple of seconds whenever I scroll up or down. It looks like there is simply too much data for the page scrape parameters window. Here's the URL: http://dl.dropbox.com/u/10322/x.html And here's my complicated bot :-) http://dl.dropbox.com/u/10322/support.ubot The page scrape parameters in the bot are wrong, but I'm not able to select the correct left and right parameters. That just doesn't work. Thanks in advance for your help, Dan
Praney Behl (posted October 17, 2010): Issue fixed. I have attached a modified sample script that scrapes the data and downloads it to a text file. Give it a try and let me know how it goes. Cheers! Praney (Attachment: Mod_sample.ubot)
Bot-Factory (posted October 18, 2010): Wow, that's working fine. It would be great if you could explain a little how you found this solution. I watched all the scrape tutorials, but how did you arrive at choose by attribute, outerhtml, <P>*</p>? Thanks in advance, Dan
Praney Behl (posted October 18, 2010): Hi Dan, the thing with scrape page or scrape attribute is that you need to get hold of an attribute from the HTML code on the page. If you look at the script, I used choose by attribute and selected the outerhtml attribute. I found that all the text you need is contained in <p>...</p> tags, so I used a wildcard and scraped them all. Cheers! Praney
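For readers who want to see the idea outside uBot, here is a minimal Python sketch of the same approach: match every whole <p>...</p> element (the element's outer HTML) and keep only its text, which leaves the header and footer behind. This is not the contents of Mod_sample.ubot and not uBot syntax. `SAMPLE_HTML`, `scrape_paragraphs`, and the regular expressions are hypothetical names made up for illustration, and the sample markup only stands in for the real page.

```python
import re
from html import unescape

# Hypothetical stand-in for the page; the real x.html is not reproduced here.
SAMPLE_HTML = """
<html><body>
<div id="header">Site header</div>
<p>First paragraph.</p>
<p class="body">Second <b>paragraph</b> &amp; more.</p>
<div id="footer">Site footer</div>
</body></html>
"""

# Rough analogue of the wildcard <p>*</p>: one match per whole <p> element,
# i.e. the element's outer HTML including its own tags.
P_ELEMENT = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)
# Matches any single tag so it can be stripped from a matched element.
TAG = re.compile(r"<[^>]+>")


def scrape_paragraphs(html):
    """Return the visible text of every <p>...</p> element in html."""
    texts = []
    for outer_html in P_ELEMENT.findall(html):
        # Drop the tags, decode entities such as &amp;, trim whitespace.
        text = unescape(TAG.sub("", outer_html)).strip()
        if text:
            texts.append(text)
    return texts


paragraphs = scrape_paragraphs(SAMPLE_HTML)
for paragraph in paragraphs:
    print(paragraph)
```

Matching on the <p> elements rather than on left and right page markers means the pattern never has to describe the header or footer at all. A regular expression is only a rough substitute for a real HTML parser, so treat this as a sketch for simple, well-formed pages.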
Bot-Factory (posted October 19, 2010): Thanks a lot, Praney. Is there a tutorial or documentation that explains all the attributes like innerhtml / outerhtml and all the other stuff? That would be really helpful. Thanks, Dan
IRobot (posted October 19, 2010): Hi Dan, have you seen http://www.ubotstudio.com/tutorials.aspx - Tutorials 10/11?