UBot Underground

Page Scrape Issue



Hi.

 

I'm trying to scrape text from a very long web page (17 pages, about 12,200 words).

I need to select the whole page to download it, but without the header and footer code.

 

The problem happens when I try to define the Page Scrape Parameters. It's very slow and it hangs for a couple of seconds when I scroll up and down.

 

It looks like it's just too much data for the page scrape parameters window.

 

Here's the URL:

http://dl.dropbox.com/u/10322/x.html

 

And here's my complicated bot :-)

http://dl.dropbox.com/u/10322/support.ubot

 

The page scrape parameters in the bot are wrong, but I'm not able to select the correct left and right parameters. The selection just doesn't work.

 

 

Thanks in advance for your help

Dan


Issue fixed. I have attached a sample modified script for scraping and downloading the data to a text file.

 

Give it a try and let me know how it went.

 

Cheers!

 

Praney

 

 

Wow. That's working fine. It would be great if you could explain a little how you found this solution.

I watched all the scrape tutorials, but how did you come up with these settings: choose by attribute, outerhtml, <P>*</p>?

 

Would be great if you could explain that a little bit?

 

Thanks in advance

Dan


Hi Dan,

 

The thing with scrape page or scrape attribute is that you need to get hold of an attribute from the HTML code of the page. If you look at the script, I used choose by attribute and selected outerhtml. I found that all the text you need is contained in <p>...</p> tags, so I used a wildcard (<p>*</p>) to match and scrape them all.
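
For anyone who wants to see the same idea outside UBot, here is a rough sketch in Python. It is not the attached script, only an illustration under my own assumptions: the names ParagraphScraper and scrape_paragraphs and the sample HTML are made up for this example. It keeps the text inside every <p>...</p> element and ignores everything else, such as header and footer blocks.

```python
from html.parser import HTMLParser


class ParagraphScraper(HTMLParser):
    """Collects the text found inside every <p>...</p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts, in page order
        self._in_p = False     # True while we are inside a <p>
        self._buffer = []      # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> both arrive as "p".
        if tag == "p":
            self._flush()      # an unclosed <p> ends where the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def _flush(self):
        # Store the current paragraph (if any) with whitespace collapsed.
        if self._in_p:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False


def scrape_paragraphs(html):
    """Return the text of all <p> elements in the given HTML string."""
    parser = ParagraphScraper()
    parser.feed(html)
    parser.close()
    parser._flush()            # keep a trailing paragraph that was never closed
    return parser.paragraphs


# Hypothetical sample page: only the two paragraphs should survive.
sample = """
<div id="header">Site menu</div>
<P>First paragraph.</p>
<p>Second <b>paragraph</b>
with a line break.</p>
<div id="footer">Copyright</div>
"""

paragraphs = scrape_paragraphs(sample)
```

Joining the result with "\n".join(paragraphs) gives plain text that can be written to a file. A real parser is used here instead of a literal <p>*</p> wildcard because it also copes with tags that carry attributes or are never closed.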

 

Cheers!

 

Praney


Thanks a lot Praney,

 

Is there a tutorial or documentation that explains all the attributes, such as innerhtml and outerhtml? That would be really helpful.

 

 

Thanks

Dan
