UBot Underground

Newb: How to make it scrape/visit links in certain cases only



Hi,

 

I'm not sure how to do this, because I've only just started learning this program and it looks pretty advanced. Anyway, here is the problem I'm up against. I'm working on a bot that will go to sidereel and search for episodes that I have entered into a txt file. Everything is OK up to this point, but from there it gets hard...

 

I need the bot to check whether megavideo.com is present on the page (this I can do with an if function: if it does not find megavideo.com on the result page, it continues with the next line from the text file), but if it does find it, how can I make it open 2-3 links related to megavideo? I have tried it so many ways, but I have no clue...

 

There is no useful outerhtml or any other similarity to sort the links by. What I would like is for the bot to gather 2-3 megavideo.com links for each episode... I just can't get it to work...

 

What information would you need to help me with this problem?
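
For anyone with the same problem, the filtering step described in this post can be sketched outside UBot. The Python below is only an illustration of the logic, not UBot script: it collects every link on a page, keeps those whose host is megavideo.com, and stops after the first three. The names `LinkCollector`, `megavideo_links`, and `sample_page`, as well as the sample URLs, are made up for the example, and the real sidereel markup will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in page order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def megavideo_links(html, limit=3, domain="megavideo.com"):
    """Return up to `limit` unique links hosted on `domain` or a subdomain."""
    collector = LinkCollector()
    collector.feed(html)
    picked = []
    for href in collector.links:
        if len(picked) >= limit:
            break
        host = urlparse(href).netloc.lower()
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and href not in picked:
            picked.append(href)
    return picked


# Hypothetical result page, invented for this example.
sample_page = """
<a href="http://www.example.com/watch/1">Other host</a>
<a href="http://www.megavideo.com/?v=AAAA1111">Link 1</a>
<a href="/local/page">Internal link</a>
<a href="http://www.megavideo.com/?v=BBBB2222">Link 2</a>
"""

links = megavideo_links(sample_page)
if links:
    print(links)
else:
    print("No megavideo.com links, continue with the next episode")
```

An empty result plays the role of the "if" check from the post: nothing found means moving on to the next line of the text file. If the links on the page are redirects that do not contain megavideo.com in the href, the same loop could filter on the link text instead.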


Yeah, but what should the sub consist of? How do I teach the bot to click only megavideo.com links, or to save only megavideo.com links...

 

The problem is that when I look at the outerhtml, or any other possible option, there is nothing related to megavideo.com.

 

EDIT: OK, I got it to scrape all URLs (that's a start, but the list it saves still has some links I don't want).

 

EDIT2: Another problem. OK, I got it to collect only megavideo.com links now and save them to a list. It then loads the list, navigates to those links, and saves the embed code. All of that is working, but...

 

somehow the text is not Unicode, which ruins the whole point...

 

An example:

 

This is how it scrapes

<object width="640" height="361"><param name="movie" value="http://www.megavideo.com/v/VYHY8IR507253d05b6306e73445c230b9692b4fc"></param><param name="allowFullScreen" value="true"></param><embed src="http://www.megavideo.com/v/VYHY8IR507253d05b6306e73445c230b9692b4fc" type="application/x-shockwave-flash" allowfullscreen="true" width="640" height="361"></embed></object>

 

This is how it should look

<object width="640" height="361"><param name="movie" value="http://www.megavideo.com/v/9HCN3ZLI07253d05b6306e73445c230b9692b4fc"></param><param name="allowFullScreen" value="true"></param><embed src="http://www.megavideo.com/v/9HCN3ZLI07253d05b6306e73445c230b9692b4fc" type="application/x-shockwave-flash" allowfullscreen="true" width="640" height="361"></embed></object>

 

All other codes look clean everywhere, but not this part...
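
To see exactly where the two embed codes above differ, a small comparison helps. The Python sketch below is only a diagnostic, not a fix: it pulls the video ID out of each snippet and checks what they share. The helper names `video_id` and `shared_tail_length` are invented for this example.

```python
import re

scraped = '<object width="640" height="361"><param name="movie" value="http://www.megavideo.com/v/VYHY8IR507253d05b6306e73445c230b9692b4fc"></param><param name="allowFullScreen" value="true"></param><embed src="http://www.megavideo.com/v/VYHY8IR507253d05b6306e73445c230b9692b4fc" type="application/x-shockwave-flash" allowfullscreen="true" width="640" height="361"></embed></object>'
expected = '<object width="640" height="361"><param name="movie" value="http://www.megavideo.com/v/9HCN3ZLI07253d05b6306e73445c230b9692b4fc"></param><param name="allowFullScreen" value="true"></param><embed src="http://www.megavideo.com/v/9HCN3ZLI07253d05b6306e73445c230b9692b4fc" type="application/x-shockwave-flash" allowfullscreen="true" width="640" height="361"></embed></object>'


def video_id(embed_code):
    """Return the ID that follows megavideo.com/v/, or None if absent."""
    match = re.search(r"megavideo\.com/v/([0-9A-Za-z]+)", embed_code)
    return match.group(1) if match else None


def shared_tail_length(a, b):
    """Count how many characters two strings have in common at the end."""
    count = 0
    while count < min(len(a), len(b)) and a[-1 - count] == b[-1 - count]:
        count += 1
    return count


got, want = video_id(scraped), video_id(expected)
tail = shared_tail_length(got, want)

# Markup with the ID removed: tells us whether anything else differs.
same_markup = scraped.replace(got, "") == expected.replace(want, "")

got_prefix, want_prefix = got[:-tail], want[:-tail]
shared_tail = got[-tail:]

print("markup identical apart from the ID:", same_markup)
print("differing prefix:", got_prefix, "vs", want_prefix)
print("shared tail:", shared_tail)
print("scraped text is plain ASCII:", scraped.isascii())
```

Both snippets are plain ASCII and differ only in the first eight characters of the video ID. That may point to the page handing out a different ID than the one expected rather than to a character-encoding problem, although nothing in this thread confirms the cause.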


I want to do something like what you're doing too... I want it to check my inbox for a particular email and, if it's not there, return to the top of the script until the email is present... My question is: how do you tell it to find something that's not there?


eestisiin, on 25 January 2010 - 06:16 AM, said:

(this I can do with an if function: if it does not find megavideo.com on the result page, it continues with the next line from the text file), but if it does find it, how can I make it open 2-3 links related to megavideo?

 

This is what I would like to know... I have identified the attribute, but how do I tell it to run a sub if this attribute isn't found?
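
The "run a sub when it isn't found" pattern asked about here, and the inbox loop from the earlier post, usually come down to testing for presence and branching on the negative result. The Python below is a rough sketch of that idea, not UBot script. The function `wait_until_found`, the fake inbox, and the subject line are all invented for the example.

```python
import time


def wait_until_found(check, on_missing, attempts=5, delay=1.0):
    """Call check() up to `attempts` times.

    Each time check() is false, run on_missing(attempt) (the "sub"),
    wait `delay` seconds, and try again. Returns True once check()
    succeeds, or False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        on_missing(attempt)
        time.sleep(delay)
    return False


# Fake inbox that only receives the wanted email on the third reload.
polls = iter([["Welcome"], ["Welcome"], ["Welcome", "Confirm your account"]])
inbox = []
misses = []


def email_present():
    # "Not found" is simply this check returning False.
    return "Confirm your account" in inbox


def reload_inbox(attempt):
    # Stands in for the sub that runs when the email is missing.
    misses.append(attempt)
    inbox[:] = next(polls)


# delay=0 keeps the demo fast; a real bot would wait between checks.
found = wait_until_found(email_present, reload_inbox, attempts=5, delay=0)
print("found:", found, "after", len(misses), "misses")
```

Capping the number of attempts keeps the bot from looping forever when the email or attribute never shows up.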

