UBot Underground

Scraping Both Href And Innerhtml From Links



Good morning,

I'm trying to spider my own site to get a list of all internal links. For each link I want both the href and the innertext.

So far I have this:

navigate("http://www.xxxxxxxxx.com/locations/this-page/","Wait")
add list to list(%my innertext list,$scrape attribute(<href=w"/locations/*">,"innertext"),"Delete","Global")
alert(%my innertext list)
add list to list(%my href list,$scrape attribute(<href=w"/locations/*">,"href"),"Delete","Global")
alert(%my href list)

This does what I expect: it gives me one list of innertext values and a separate list of hrefs.

What I want is more of a table (array) where each "row" holds both the href and the innertext of one link.
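The table described here can be pictured as a list of rows, one per link. The following is a minimal Python sketch of that idea, not UBot code. `pair_links` is a hypothetical helper that pairs the two scraped lists by position. This only works if item i in one list and item i in the other came from the same `<a>` element. As far as I know, the `"Delete"` argument in the `add list to list` calls above removes duplicates from each list independently. If two links share the same text but point to different pages, the lists can drift out of step. The length check below catches that case only when it changes the list lengths.

```python
def pair_links(hrefs, texts):
    """Combine two parallel lists into rows of (href, innertext).

    Assumes hrefs[i] and texts[i] came from the same <a> element,
    i.e. both lists were scraped in page order without deduplication.
    """
    if len(hrefs) != len(texts):
        raise ValueError(
            f"lists are out of step: {len(hrefs)} hrefs vs {len(texts)} texts"
        )
    # zip walks both lists in lockstep, producing one row per link
    return [(href, text) for href, text in zip(hrefs, texts)]


hrefs = ["/locations/location-one", "/locations/location-two"]
texts = ["This is the first text", "This is the second text"]

for href, text in pair_links(hrefs, texts):
    print(href, "|", text)
```

In UBot terms, the equivalent would be to scrape without removing duplicates, or to deduplicate only after the pairs are built.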

I'm not sure whether I should:

1. Loop over the two lists and add the values to a table (array) type structure, or

2. Scrape the outerhtml instead and loop over it, which gives a list like this:

<a href="/locations/location-one">This is the first text</a>
<a href="/locations/location-two">This is the second text</a>
<a href="/locations/location-three">This is the third text</a>

... and then re-parse each item. If re-parsing is the way to go, how would I do that, since as far as I know I can't use $scrape attribute on an item in a list?
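Option 2 is workable because each outerhtml item is a small HTML fragment. An HTML parser can pull the href and the text out of the same element, so each pair lands in one row by construction. The following is a Python sketch of that idea using the standard library's `html.parser`, not a UBot script. `LinkParser` is a hypothetical name, and the sample list mirrors the three links in the question.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, innertext) pairs from <a> tags in HTML fragments."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # only keep text while we are inside an <a> that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append((self._href, "".join(self._text).strip()))
            self._href = None


outerhtml_list = [
    '<a href="/locations/location-one">This is the first text</a>',
    '<a href="/locations/location-two">This is the second text</a>',
    '<a href="/locations/location-three">This is the third text</a>',
]

parser = LinkParser()
for item in outerhtml_list:
    parser.feed(item)
parser.close()

for href, text in parser.rows:
    print(href, "|", text)
```

Parsing the fragment instead of splitting strings or using a regex also copes with extra attributes or nested tags such as `<span>` inside the link.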


Thanks!
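Here is another way. It builds a two-column table with the link in column 0 and the title in column 1 of each row (%links holds the hrefs and %titles the innertext):

set(#counter,0,"Global")
loop($list total(%links)) {
    set table cell(&result,#counter,0,$next list item(%links))
    set table cell(&result,#counter,1,$next list item(%titles))
    increment(#counter)
}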
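The important detail in that loop is that the href and the text go into different columns of the same row. Below is a Python sketch of the same row/column bookkeeping. `set_table_cell` is a hypothetical helper that only imitates the idea of UBot's `set table cell`; it is not its real implementation. The sketch shows why the two values need different column indexes: writing both to column 0 would overwrite the href with the title.

```python
def set_table_cell(table, row, col, value):
    """Rough stand-in for a table-cell setter: grow the 2D list as needed, then write."""
    while len(table) <= row:
        table.append([])
    while len(table[row]) <= col:
        table[row].append("")
    table[row][col] = value


links = ["/locations/location-one", "/locations/location-two"]
titles = ["This is the first text", "This is the second text"]

result = []
for counter in range(len(links)):
    set_table_cell(result, counter, 0, links[counter])   # column 0: href
    set_table_cell(result, counter, 1, titles[counter])  # column 1: innertext

print(result)
```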