nubot Posted March 11, 2016 (edited)

I'm trying to build a simple YouTube title and URL scraper and am running into a few issues.

1.) I'm not sure I'm using the best attribute to scrape against. The last two bots I made stopped working the next day even though I used wildcards. I think this one may be good now, but I guess I'll find out tomorrow.

2.) At some point the titles and URLs stop matching up. My guess is that a page attribute can't be found, though it could just be something in my script. One suspect is that I'm using "Don't Delete" for the title list but "Delete" for the full-URL list. I did this because titles can legitimately repeat, while with "Don't Delete" the URL list was spitting out double the number of records for some reason (which could also be an error in my script). What checks could I add to prevent the titles and URLs from mismatching? What would be the best way to handle this? Not really sure how to tackle this one.

If you'd like to reproduce my error, go to youtube.com, search for something, then run the script (don't forget to change the output directory). It might run fine for the first 2 or 3 pages, but if you let it run through about 5-10, that's usually where something screws up and the titles and URLs stop matching properly from the point of the initial failure onward. To reproduce the double records, simply change the %fullurl list to "Don't Delete".

Ready to get my bot ripped apart. Here is my code. Thank you in advance!
clear all data
define Scrapes Titles and URLs {
    add list to list(%title,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"title"),"Don\'t Delete","Global")
    add list to list(%url,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"href"),"Don\'t Delete","Global")
    with each(%url,#url) {
        add item to list(%fullurl,"http://www.youtube.com{#url}","Delete","Global")
    }
    add list to table as column(&TitlesURLs,0,0,%title)
    add list to table as column(&TitlesURLs,0,1,%fullurl)
    save to file("DIRECTORYGOESHERE\\testcsv.csv",&TitlesURLs)
}
loop while($exists(<data-link-type="next">)) {
    Scrapes Titles and URLs()
    click(<data-link-type="next">,"Left Click","No")
    wait for browser event("Everything Loaded","")
}

Edited March 11, 2016 by nubot
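This isn't UBot code, but a quick Python sketch (using only the standard library, with made-up sample HTML) of why the mismatch likely happens: scraping titles and hrefs as two independent lists means that whenever one anchor is missing its title attribute, the title list comes up one entry short and every later row pairs the wrong title with the wrong URL. Extracting both fields from the same element keeps each record paired.

```python
# Hypothetical illustration: two parallel scrapes vs. one paired scrape.
from html.parser import HTMLParser

SAMPLE = """
<a href="/watch?v=aaa" class="yt-uix-tile-link" title="Video A"></a>
<a href="/watch?v=bbb" class="yt-uix-tile-link"></a>
<a href="/watch?v=ccc" class="yt-uix-tile-link" title="Video C"></a>
"""

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        # Pull title and href from the SAME element so they stay paired;
        # a missing title becomes a blank instead of shifting the list.
        self.rows.append((a.get("title", ""), a.get("href", "")))

p = LinkParser()
p.feed(SAMPLE)

# Two independent scrapes: lengths differ, so the columns misalign.
titles = [t for t, _ in p.rows if t]
urls = [u for _, u in p.rows]
print(len(titles), len(urls))  # 2 3 -> misaligned from the second row on

# Paired scrape: one record per element, blanks preserved.
for title, url in p.rows:
    print(title or "(no title)", "https://www.youtube.com" + url)
```

The same idea in UBot terms would be scraping one list of whole result elements and splitting each entry into title and href, rather than running two separate $scrape attribute calls.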
pash Posted March 11, 2016

Sample scrape (one pass):

clear all data
navigate("https://www.youtube.com/results?search_query=Gaming+Music","Wait")
wait for browser event("Everything Loaded","")
wait(1)
add list to list(%title,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"title"),"Don\'t Delete","Global")
add list to list(%url,$scrape attribute(<outerhtml=w"<a href=\"/watch?v=*\" class=\"yt-uix-sessionlink yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2*\" title=*">,"fullhref"),"Don\'t Delete","Global")
add list to table as column(&TitlesURLs,0,0,%title)
add list to table as column(&TitlesURLs,0,1,%url)
save to file("DIRECTORYGOESHERE\\testcsv.csv",&TitlesURLs)
if($exists(<data-link-type="next">)) {
    then {
        click(<data-link-type="next">,"Left Click","No")
        wait for browser event("Everything Loaded","")
        wait(1)
    }
    else {
    }
}

For the URL, use "fullhref" instead of "href".
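One more point on the double-records issue: if you must deduplicate, do it at the record level rather than per-list. Deleting duplicates from the URL list but not the title list (or vice versa) shifts one column relative to the other. A minimal Python sketch of record-level dedup, with made-up sample data:

```python
# Deduplicate (title, url) pairs together so the columns stay aligned.
rows = [
    ("Video A", "https://www.youtube.com/watch?v=aaa"),
    ("Video A", "https://www.youtube.com/watch?v=aaa"),  # true duplicate record
    ("Video A", "https://www.youtube.com/watch?v=bbb"),  # same title, new URL: keep it
]

seen = set()
deduped = []
for row in rows:
    if row not in seen:
        seen.add(row)
        deduped.append(row)

for title, url in deduped:
    print(title, url)
```

Only the exact repeat is dropped; a repeated title with a different URL survives, which is the behavior the "Don't Delete" title list was trying to preserve.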
nubot (Author) Posted March 11, 2016

Thanks, works great! Now I have to work on some filtering...