UBot Underground

Scraping archive.org



Hey Guys,

 

I've got a big problem. My WordPress database crashed and now I have to rebuild my site.

 

I thought about scraping archive.org to recover it. For example, I'd like to scrape the content of

 

http://web.archive.org/web/20100211154758/http://www.guteshandy.de/prepaid-vergleich.html

 

I tried:

 

set(#content, $page scrape("<div class=\"sale-title\">", "<!--<div class=\"body-right\">"), "Global")

save to file("C:\\content.txt", #content)

 

to scrape the HTML code between the HTML tag <div class="sale-title"> and the commented-out tag <!--<div class="body-right">

 

but the content.txt file is empty :-( I'm going crazy because I can't find a solution :-(((((


It looks like there might be an issue with $page scrape that I can look into.

 

In the meantime, you can use $scrape attribute to scrape the outerhtml of the <div class="body-middle"> element.

 

Here's the code I used that seems to work fine for me:

 

set(#content, $scrape attribute(<class="body-middle">, "outerhtml"), "Global")
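
Putting that together with the save step from the original post, the whole thing might look roughly like this. It's only a sketch built from the two commands already shown in this thread, and it assumes the archived page is already loaded in the browser when it runs:

```
set(#content, $scrape attribute(<class="body-middle">, "outerhtml"), "Global")
save to file("C:\\content.txt", #content)
```

If content.txt still comes out empty, checking whether #content holds anything before the save would help narrow down whether the scrape or the file write is the problem.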

