UBot Underground


Hello,

 

My name is Ricardo. I am trying to learn how to scrape URLs and am having a hard time figuring out how to get a list of URLs off of Google's search results page. I am trying to find the "class" so I can select all the URLs on the page. This is the element I am looking at:

 

<a href="

" onmousedown="return rwt(this,'','','','8','AFQjCNET0Cgo85zpZkk1bf16E0ddjYO3Xw','','0CFkQtwIwBw','','',event)">Getting Started With <em>Ubot Studio</em> Drag And Drop - YouTube</a>

 

 

Not working no matter what I try.  Any help is greatly appreciated.  Thanks in advance.


Wow, a little rude... but thanks nonetheless. I was in the process of looking around after I posted. I hope others around here are a little friendlier to a noob. I will remember to be when I make it to advanced member. Cheers.

 

We're not all harsh, don't worry :). That's just like saying RTFM... hate it.


Before anyone posts thinking I haven't looked in ample places to fix this issue: I have, I promise. I DON'T WANT ANY HANDOUTS, just some advice and maybe some examples. If you only want to post to bash, please keep it moving. Thanks.

 

Quote: "I would rather roam lost than reach my destination with a car full of idiots!" - Ricardo McCarty 2013



 

Haha, I wasn't bashing you, I was actually sticking up for you!

 

 

And for your problem, try scraping the href and innertext :)
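For anyone reading outside of UBot, here is a rough Python sketch of the same idea: walk the anchors and keep each one's href and inner text. The `LinkCollector` class, the `collect_links` helper, and the example.com URL are made up for illustration (the real href was not included in the original post), so treat this as a sketch rather than the way UBot does it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, inner text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Nested tags such as <em> still deliver their text here.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    """Hypothetical helper: return all (href, text) pairs found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Placeholder markup modelled on the anchor from the first post.
SAMPLE = (
    '<h3 class="r"><a href="http://example.com/video">'
    "Getting Started With <em>Ubot Studio</em> Drag And Drop - YouTube"
    "</a></h3>"
)

if __name__ == "__main__":
    for href, text in collect_links(SAMPLE):
        print(href, "|", text)
```

The href gives you the URL and the inner text gives you the result title, so you get both columns in one pass.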


Here you go:

 

http://i.imgur.com/TOxVKoW.png

 

This is the regular expression:

 

(?<=href\=\")http\:\/\/.*?(?=\")

 

This was just the first thing I came up with, so I am not saying it is the best way. It first scrapes the elements with class = r, which are the h3 tags, and grabs their innerhtml, which contains the <a href= part. The regular expression then grabs everything that starts with http:// and has href=" before it and " after it. I hope this makes sense.
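If it helps to see the regular expression on its own, here is a small Python sketch that runs the same pattern over some inner HTML. The `extract_urls` helper and the example.com links are placeholders I made up; in UBot the inner HTML would come from scraping class = r first. Note that as written the pattern only matches http:// links, so an https:// href is skipped.

```python
import re

# Same pattern as in the post: a URL starting with http://,
# preceded by href=" and followed by the closing quote.
HREF_URL = re.compile(r'(?<=href\=\")http\:\/\/.*?(?=\")')


def extract_urls(inner_html_blocks):
    """Hypothetical helper: apply the pattern to each scraped innerhtml string."""
    urls = []
    for block in inner_html_blocks:
        urls.extend(HREF_URL.findall(block))
    return urls


# Placeholder inner HTML, standing in for what scraping the h3 tags returns.
SCRAPED = [
    '<a href="http://example.com/one" onmousedown="return rwt(this)">One</a>',
    '<a href="http://example.com/two">Two</a>',
    '<a href="https://example.com/secure">Secure</a>',  # not matched: https
]

if __name__ == "__main__":
    for url in extract_urls(SCRAPED):
        print(url)
```

The lazy `.*?` stops at the first closing quote, which is why the onmousedown attribute does not end up in the result.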
