UBot Underground


Hi all,

I'm trying to scrape the Google search results for the following query: KEYWORD "comments" site:youtube.com/user

I can't find a good attribute to scrape, though. I'd like to scrape 4 pages of results. Any ideas on what I can use as the element to scrape?
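For anyone following along outside UBot, here is a rough Python sketch of how the 4 result pages for that query could be addressed. It assumes Google's `q` and `start` URL parameters with 10 results per page, and the helper name `google_page_urls` is made up for illustration. It only builds the URLs; it does not fetch anything.

```python
from urllib.parse import urlencode


def google_page_urls(keyword, pages=4, per_page=10):
    """Build one search URL per result page for the query in the post.

    Assumption: Google paginates with a `start` offset (0, 10, 20, ...).
    """
    query = f'{keyword} "comments" site:youtube.com/user'
    urls = []
    for page in range(pages):
        params = {"q": query, "start": page * per_page}
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in google_page_urls("KEYWORD"):
        print(url)
```

In a bot you would navigate to each of these URLs in a loop and run the scrape once per page.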

What I want from each search result is the channel name, e.g. from a result like: www.youtube.com/user/optimalkeyword
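Once the result URLs are scraped, pulling the channel name out of them is plain string handling. A minimal Python sketch follows, assuming every wanted result has the form `youtube.com/user/<name>`; the function name `channel_name` is hypothetical.

```python
from urllib.parse import urlsplit


def channel_name(url):
    """Return the part after /user/ in a YouTube URL, or None if absent."""
    # Scraped result text often has no scheme, so add one before parsing.
    if "://" not in url:
        url = "http://" + url
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "user":
        return parts[1]
    return None


if __name__ == "__main__":
    print(channel_name("www.youtube.com/user/optimalkeyword"))
```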

 

Thanks for your help!


I think this should work. Use this as the element to scrape:

<cite>*</cite>

And this is the node command:

add list to list(%URL scrapped, $scrape attribute(<outerhtml=w"<cite>*</cite>">, "innertext"), "Delete", "Global")
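To show what that command is doing, here is a rough Python equivalent (not UBot code): collect the inner text of every `<cite>` element and drop duplicates, which is my reading of the "Delete" option. The names `CiteScraper` and `scrape_cites` are made up for this sketch.

```python
from html.parser import HTMLParser


class CiteScraper(HTMLParser):
    """Collects the text inside each <cite>...</cite> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a <cite>
        self.buffer = []    # text pieces of the current <cite>
        self.results = []   # finished inner texts, in page order

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        # Nested tags such as <b> are skipped, but their text is kept.
        if self.depth:
            self.buffer.append(data)


def scrape_cites(html):
    parser = CiteScraper()
    parser.feed(html)
    parser.close()
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(parser.results))
```

Like the UBot version, this picks up every `<cite>` on the page, so any ad blocks that use the same tag are included too.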
Edited by north_star

You don't need to change anything. Just use the code I gave you above and leave the add list to list node as it is.

You asked me how to use it, but the main idea is still the same. It is this command:

add list to list(%URL scrapped, $scrape attribute(<outerhtml=w"<cite>*</cite>">, "innertext"), "Delete", "Global")
Edited by north_star

Yeah, this works great for scraping all of the URLs from the page, but it is picking up the paid ads as well. Do you have any ideas on how to scrape only the organic results?

It seems they were really spoiling us all that time when they used the class I.....

EDIT: I managed to do this with regex to get only the organic results.



Care to share how you did that? I can't get past scraping all the hrefs on the page, and I can't seem to select only the organic search results.

  • 4 months later...

Works for me ...

add list to list(%results, $find regular expression($document text, "(?<=<h3 class=\"r\"><a href=\")(.*?)(?=\" onmousedown)"), "Delete", "Global")
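The same pattern can be tried outside UBot. Below is a small Python sketch of that regex, assuming the page markup still matches what the post describes (organic links inside `<h3 class="r">`, followed by an `onmousedown` attribute). Google's markup changes over time, so treat the pattern as an example rather than something guaranteed to match today. The sample HTML and the name `organic_links` are invented for illustration.

```python
import re

# Lookbehind anchors on the organic-result heading; lookahead stops at the
# onmousedown attribute, so only the href value itself is captured.
ORGANIC_LINK = re.compile(r'(?<=<h3 class="r"><a href=")(.*?)(?=" onmousedown)')


def organic_links(document_text):
    """Return organic result URLs in page order, without duplicates."""
    return list(dict.fromkeys(ORGANIC_LINK.findall(document_text)))


if __name__ == "__main__":
    sample = (
        '<li><h3><a href="http://ad.example/x" onmousedown="t()">Ad</a></h3></li>'
        '<h3 class="r"><a href="http://www.youtube.com/user/optimalkeyword" '
        'onmousedown="t()">Result</a></h3>'
    )
    print(organic_links(sample))
```

The ad link in the sample is skipped because its `<h3>` has no `class="r"`, which is how this approach separates organic results from paid ones.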