Asentrix (author), posted May 25, 2015:

I need help scraping Twitch usernames from game directories. As far as I can see, there's no single element that holds the usernames within a game directory. The only way is to scrape the channel links ("/example") and add them to a list. The only problem is that when you scrape the href using a wildcard, you also pick up all the generic, unwanted links. I was hoping a regex expert could help me filter the hrefs on the page.

The page: http://www.twitch.tv/directory (any game category works)
HelloInsomnia, posted May 25, 2015:

You can use this:

add list to list(%usernames, $scrape attribute(<href=w"/*/profile">, "href"), "Delete", "Global")

Then just use the $replace function to get rid of the bits you don't need, like / and /profile.
Asentrix (author), posted May 25, 2015:

That doesn't seem to work :l Also, I don't think the replace function can remove wildcards, can it? E.g. /profile/random-string

Thanks for the help!
HelloInsomnia, posted May 25, 2015:

If you want to use regex, you can do this:

add list to list(%usernames, $find regular expression($document text, "(?<=\\/).*?(?=\\/profile)"), "Delete", "Global")
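For anyone who wants to try the filtering idea outside UBot, here is a minimal Python sketch of the same approach. It is not the poster's method. It assumes the channel links look like href="/<username>/profile", optionally followed by a variable part such as /profile/random-string, as the thread suggests. The sample markup, `PROFILE_HREF`, and `extract_usernames` are made up for illustration, and the real directory page may be structured differently. Because a lazy `.*?` after a lookbehind on `/` can begin at any earlier slash on the same line, this sketch limits the capture to a single path segment instead.

```python
import re

# Hypothetical sample markup; the real Twitch directory page may differ.
SAMPLE_HTML = '''
<a href="/directory">Browse</a>
<a href="/streamer_one/profile">streamer_one</a>
<a href="/streamer_two/profile/abc123">streamer_two</a>
<a href="http://www.twitch.tv/jobs">Jobs</a>
<a href="/streamer_one/profile">streamer_one again</a>
'''

# Capture exactly one path segment that is directly followed by /profile.
# [^/"]+ cannot cross a slash or the closing quote, so generic links
# such as /directory or absolute URLs are skipped.
# (?:/[^"]*)? tolerates a variable suffix like /profile/random-string.
PROFILE_HREF = re.compile(r'href="/([^/"]+)/profile(?:/[^"]*)?"')


def extract_usernames(html):
    """Return the unique usernames found in html, in first-seen order."""
    seen = []
    for name in PROFILE_HREF.findall(html):
        if name not in seen:
            seen.append(name)
    return seen


usernames = extract_usernames(SAMPLE_HTML)
print(usernames)
```

Capturing the username with a group means no separate replace step is needed to strip the leading / or the trailing /profile part.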