UBot Underground

Trying To Build A Simple YT Scraper And Running Into A Few Issues



I'm trying to build a simple YT title and URL scraper and am running into a few issues.

1.) I'm not sure I'm using the best attribute to scrape my data. The last two bots I made died the next day even though I used wildcards. I think it may be good now, but I guess I'll find out tomorrow :)

2.) I'm running into an issue where the titles and URLs stop matching up at some point. My guess is that the scraper occasionally fails to find the page attribute, but it could just be something in my script. I suspect one culprit might be that I'm using "Don't Delete" for titles but "Delete" for URLs: titles can legitimately repeat, and with "Don't Delete" the URL list was spitting out double the number of records for some reason (which could also be an error in my script).
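Here's a tiny illustration (made-up data, throwaway lists %demo titles and %demo urls) of what I think the mixed duplicate settings do to the pairing. "Don't Delete" keeps every scraped item, while "Delete" silently drops duplicates from one list only:

clear list(%demo titles)
clear list(%demo urls)
add list to list(%demo titles,$list from text("Mix,Live,Mix,Cover",","),"Don\'t Delete","Global")
add list to list(%demo urls,$list from text("/watch?v=1,/watch?v=2,/watch?v=1,/watch?v=3",","),"Delete","Global")
comment("%demo titles keeps all 4 items, but \"Delete\" drops the repeated /watch?v=1 from %demo urls, leaving 3 - every pair after position 2 is now off by one")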


What checks could I add to prevent the titles and URLs from mismatching, and what would be the best way to handle it? I'm not really sure how to tackle this one.
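One check that seems doable (a rough, untested sketch using $list total and $comparison, with my list names): compare the two counts after each page's scrape and stop before the save if they've drifted apart:

comment("Bail out before writing the table if the lists no longer line up")
if($comparison($list total(%title),"=",$list total(%fullurl))) {
    then {
    }
    else {
        alert("Mismatch: {$list total(%title)} titles vs {$list total(%fullurl)} urls")
        stop script
    }
}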


If you'd like to reproduce my error, go to youtube.com, search for something, then run the script (don't forget to change the output directory). It *might* run fine for the first 2 or 3 pages, but if you let it run through about 5-10, that's usually where something screws up, and the titles and URLs stop matching properly from the point of the initial failure onward.

To reproduce the double records, simply change the %fullurl list to "Don't Delete".


Ready to get my bot ripped apart :o


Here is my code. Thank you in advance!

clear all data
define Scrapes Titles and URLs {
    comment("Scrape the video titles and relative hrefs from the results page")
    add list to list(%title,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"title"),"Don\'t Delete","Global")
    add list to list(%url,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"href"),"Don\'t Delete","Global")
    comment("Prepend the domain to each relative href")
    with each(%url,#url) {
        add item to list(%fullurl,"http://www.youtube.com{#url}","Delete","Global")
    }
    add list to table as column(&TitlesURLs,0,0,%title)
    add list to table as column(&TitlesURLs,0,1,%fullurl)
    save to file("DIRECTORYGOESHERE\\testcsv.csv",&TitlesURLs)
}
comment("Page through the results until the next button disappears")
loop while($exists(<data-link-type="next">)) {
    Scrapes Titles and URLs()
    click(<data-link-type="next">,"Left Click","No")
    wait for browser event("Everything Loaded","")
}
Edited by nubot

Here's a sample that runs the scrape once:

clear all data
navigate("https://www.youtube.com/results?search_query=Gaming+Music","Wait")
wait for browser event("Everything Loaded","")
wait(1)
comment("fullhref returns the absolute URL, so no domain needs to be prepended")
add list to list(%title,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"title"),"Don\'t Delete","Global")
add list to list(%url,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"fullhref"),"Don\'t Delete","Global")
add list to table as column(&TitlesURLs,0,0,%title)
add list to table as column(&TitlesURLs,0,1,%url)
save to file("DIRECTORYGOESHERE\\testcsv.csv",&TitlesURLs)
if($exists(<data-link-type="next">)) {
    then {
        click(<data-link-type="next">,"Left Click","No")
        wait for browser event("Everything Loaded","")
        wait(1)
    }
    else {
    }
}

For the URL, use "fullhref" instead of "href".
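If you want to fold that back into nubot's paging loop, something like this sketch (untested) should keep the two columns paired: "Don't Delete" on both lists so positions line up, "fullhref" so the with each loop isn't needed, and one extra scrape after the loop, since the loop exits on the last page before scraping it:

clear all data
define Scrape Page {
    add list to list(%title,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"title"),"Don\'t Delete","Global")
    add list to list(%url,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"fullhref"),"Don\'t Delete","Global")
}
navigate("https://www.youtube.com/results?search_query=Gaming+Music","Wait")
wait for browser event("Everything Loaded","")
loop while($exists(<data-link-type="next">)) {
    Scrape Page()
    click(<data-link-type="next">,"Left Click","No")
    wait for browser event("Everything Loaded","")
    wait(1)
}
comment("Scrape the final page - the loop exits before reaching it")
Scrape Page()
add list to table as column(&TitlesURLs,0,0,%title)
add list to table as column(&TitlesURLs,0,1,%url)
save to file("DIRECTORYGOESHERE\\testcsv.csv",&TitlesURLs)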

