UBot Underground

A small explanation needed with scrape attribute



Hi ubotters,

 

In my experience with ubot, selecting an element to grab the URL of the item in question has always worked for me.

Recently, though, I'm finding that element selection only lets me grab things like the URL of the image, rather than a link to the page that the image is on. For example, go to this page: http://vimeo.com/channels/aework/videos

Hover the mouse over any of those videos and you can see the path to the page it links to at the bottom of your browser. Now do the same in ubot and do a scrape attribute. It will only scrape the image URL.

So when I do a scrape attribute, there's no way for me to grab that URL path. I've taken to looking inside the source code of that page, but honestly, I don't know what I'm supposed to be looking for.

So, my question really is this: is there a simple way for me to obtain these paths to the correct pages rather than scraping the image URL?

 

I think I will need to look outside the element selector for this, but this is the bit I have a problem with. What should I be looking for in the source code? Is it something to do with <class>? If so, what?

Sometimes I see <class id = xyz

However, when I get a solution from ubotters, I often see that the result is something like <class= and I can't see that anywhere inside the source code, so I figure that you guys must know something that will make this easier for me.

 

 

Looking forward to your help! By the way, this page http://vimeo.com/channels/aework/videos is a perfect example. The same goes for a Pinterest page http://pinterest.com/search/?q=apples (thanks to k1 (Kevin) for helping me with that).

 

 

 

Cheers,

 

Kev


Here's code that will work on Vimeo. Just paste it into ubot.

 

add list to list(%channels, $scrape attribute(<href=w"/channels/*/*">, "fullhref"), "Delete", "Global")
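For anyone wondering what that line is doing, here is a rough sketch of the idea in Python rather than ubot script. It assumes the w"..." wildcard behaves like ordinary shell-style globbing and that "fullhref" means the href resolved against the page address; both are my assumptions, not confirmed ubot behaviour. The sample href values and the full_hrefs helper are made up for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urljoin

# Address of the page being scraped (taken from the question above).
PAGE_URL = "http://vimeo.com/channels/aework/videos"

# Hypothetical href values, invented for illustration only.
SAMPLE_HREFS = [
    "/channels/aework/12345678",
    "/channels/aework/videos",
    "/about",
    "/channels/aework",
]


def full_hrefs(hrefs, pattern, base_url):
    """Keep hrefs that match the wildcard pattern, resolved to absolute URLs."""
    return [urljoin(base_url, h) for h in hrefs if fnmatchcase(h, pattern)]


# "*" stands for any run of characters, so only hrefs with at least
# two path segments after /channels/ survive the filter.
channels = full_hrefs(SAMPLE_HREFS, "/channels/*/*", PAGE_URL)
print(channels)
```

The point is that the selector targets the link's href attribute directly instead of the image sitting inside the link, which is why it returns page addresses rather than image addresses.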

 

Here's another one using regex, to get only the main channel URLs:

add list to list(%channels, $scrape attribute(<href=r"channels/\\w+\\/[0-9]\{4,10\}">, "fullhref"), "Delete", "Global")
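If the regex looks cryptic, this is roughly what it filters for, sketched in Python. I'm assuming the doubled backslashes in the line above are just escaping and that the intended pattern is channels/\w+/[0-9]{4,10}, i.e. a channel name followed by a 4 to 10 digit id. The sample hrefs are invented.

```python
import re

# Assumed intent of the pattern above: "channels/", a word-character
# channel name, a slash, then a run of 4 to 10 digits.
VIDEO_LINK = re.compile(r"channels/\w+/[0-9]{4,10}")

# Hypothetical href values, invented for illustration only.
SAMPLE_HREFS = [
    "/channels/aework/12345678",  # channel name + numeric id: kept
    "/channels/aework/videos",    # no numeric id: dropped
    "/channels/aework/123",       # only three digits: dropped
    "/about",                     # not a channel link: dropped
]

video_links = [h for h in SAMPLE_HREFS if VIDEO_LINK.search(h)]
print(video_links)
```

Compared with the wildcard version, the regex is stricter about what comes after the channel name, so navigation links under /channels/ are left out.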

 

You need to expand your selection on scrape attribute to get the details you want.


 

 


 

Hi Kreatus,

 

This is the bit I don't understand. What do you mean by expanding my selection?



Hi Kev,

What I mean by that is you need to expand your scrape attribute selection so that it covers more than the image itself, which gives you more attribute options. Check this screenshot: post-440-0-43963200-1338817805_thumb.png
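In case the screenshot doesn't load, here is the same idea sketched in Python. On pages like this the thumbnail image usually sits inside a link, so the image only carries the picture address while the link wrapped around it carries the page address. The markup below is invented for illustration and the real page source will differ; ThumbnailLinkParser is just a made-up helper name.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for illustration; the real page will differ.
SAMPLE_HTML = """
<ol>
  <li><a href="/channels/aework/11111111"><img src="http://example.com/thumb1.jpg"></a></li>
  <li><a href="/channels/aework/22222222"><img src="http://example.com/thumb2.jpg"></a></li>
</ol>
"""


class ThumbnailLinkParser(HTMLParser):
    """Pair each image's src with the href of the link that wraps it."""

    def __init__(self):
        super().__init__()
        self.current_href = None  # href of the <a> we are currently inside
        self.pairs = []           # (image src, enclosing link href)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.current_href = attrs.get("href")
        elif tag == "img" and self.current_href is not None:
            self.pairs.append((attrs.get("src"), self.current_href))

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


parser = ThumbnailLinkParser()
parser.feed(SAMPLE_HTML)
pairs = parser.pairs
for src, href in pairs:
    print(src, "->", href)
```

Selecting only the image gives you the first value of each pair. Widening the selection to the surrounding link is what makes the second value, the href, available.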

