Learjet, posted January 4, 2016:

I've been trying to get this working for about 6 hours without any luck. I'm trying to get the links from the Google News RSS feed here:

https://news.google.com/news?cf=all&hl=en&ned=us&q=Red+Skelton&output=rss

The links look like this:

<link>http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNGdVPVljckRj3DjAnoe4B1Bs8I6Ow&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779022175028&ei=u9yJVvi_MerrwQGb67GQBA&url=http://www.buffalonews.com/life-arts/book-reviews/book-review-limping-on-water-by-phil-beuth-with-kc-schulberg-20160103</link>

However, I only need this part:

http://www.buffalonews.com/life-arts/book-reviews/book-review-limping-on-water-by-phil-beuth-with-kc-schulberg-20160103

The UBot browser tries to parse the feed as HTML, but it's an RSS feed, so I can't get it to scrape correctly.

Thanks for your patience with new folks :-)

Peace,
EJ
pash, posted January 4, 2016:

Try:

alert($find regular expression("<link>http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNGdVPVljckRj3DjAnoe4B1Bs8I6Ow&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779022175028&ei=u9yJVvi_MerrwQGb67GQBA&url=http://www.buffalonews.com/life-arts/book-reviews/book-review-limping-on-water-by-phil-beuth-with-kc-schulberg-20160103</link>","(?<=url=).*?(?=<\\/link>)"))
Learjet (author), posted January 6, 2016:

Hi Pash,

I got the file downloaded and the regex figured out, but I'm having a hard time figuring out how to scrape from the .txt file that I created with the RSS code in it.

Here's the feed URL for Google News RSS in case someone needs it:

https://news.google.com/news?cf=all&hl=en&ned=us&q=KEYWORD&output=rss

Or, for a multi-word keyword:

https://news.google.com/news?cf=all&hl=en&ned=us&q=YOUR+KEYWORD&output=rss

Here's the regex to extract the links:

(?<=\&url=).*?(?=<\/link>)

Thanks in advance for your help!

Peace,
Z
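For anyone who wants to check the pattern outside UBot, here is a minimal Python sketch that applies the same lookbehind/lookahead regex to the sample `<link>` element from the first post. The names `LINK_PATTERN` and `extract_targets` are made up for this illustration, and the sketch assumes the text contains a literal `&url=` (not an escaped `&amp;url=`).

```python
import re

# Same idea as the regex posted above: match whatever sits between
# "&url=" and the closing </link> tag, without including either.
LINK_PATTERN = re.compile(r"(?<=&url=).*?(?=</link>)")

# Sample <link> element copied from the first post in this thread.
SAMPLE = (
    "<link>http://news.google.com/news/url?sa=t&fd=R&ct2=us"
    "&usg=AFQjCNGdVPVljckRj3DjAnoe4B1Bs8I6Ow"
    "&clid=c3a7d30bb8a4878e06b80cf16b898331"
    "&cid=52779022175028&ei=u9yJVvi_MerrwQGb67GQBA"
    "&url=http://www.buffalonews.com/life-arts/book-reviews/"
    "book-review-limping-on-water-by-phil-beuth-with-kc-schulberg-20160103"
    "</link>"
)


def extract_targets(rss_text):
    """Return every destination URL found after '&url=' in rss_text."""
    return LINK_PATTERN.findall(rss_text)


if __name__ == "__main__":
    for target in extract_targets(SAMPLE):
        print(target)
```

The non-greedy `.*?` matters when the text holds several `<link>` elements: each match stops at the nearest `</link>` instead of running on to the last one.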
ds062692, posted January 6, 2016:

You don't have to download it. You can just use the read file command, something like this:

add list to list(%links,$find regular expression($read file("https://news.google.com/news?cf=all&hl=en&ned=us&q=Red+Skelton&output=rss"),"(?<=\\&url=).*?(?=<\\/link>)"),"Delete","Global")
pash, posted January 6, 2016:

Try:

set(#Html,$read file("https://news.google.com/news?cf=all&hl=en&ned=us&q=KEYWORD&output=rss"),"Global")
set(#Html,$replace(#Html,"&lt;","<"),"Global")
set(#Html,$replace(#Html,"&gt;",">"),"Global")
set(#Html,$replace(#Html,"&quot;","\""),"Global")
set(#Html,$replace(#Html,"&#39;","\'"),"Global")
set(#Html,$replace(#Html,"&apos;","\'"),"Global")
loop(10) {
    set(#Html,$replace(#Html,"&amp;","&"),"Global")
}
load html(#Html)
wait(2)
add list to list(%Links,$find regular expression(#Html,"(?<=&url=).*?(?=(</link>|\"))"),"Delete","Global")
load html(" {$replace(%Links," ","<br>")}")
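The decode-then-match approach above can also be sketched in Python, as a rough illustration rather than a translation of the UBot script. It assumes the raw feed escapes `&` as `&amp;` (possibly more than once), which is what the repeated replace in the loop appears to guard against. The function name `decode_and_extract` and the example.com feed snippet are hypothetical.

```python
import html
import re

# Match the text between "&url=" and either </link> or a double quote,
# mirroring the pattern used in the post above.
TARGET_PATTERN = re.compile(r'(?<=&url=).*?(?=</link>|")')


def decode_and_extract(raw_feed, max_passes=10):
    """Unescape HTML entities (repeatedly, in case of double escaping),
    then return the destination URLs that follow '&url='."""
    decoded = raw_feed
    for _ in range(max_passes):  # bounded, like the loop(10) above
        unescaped = html.unescape(decoded)
        if unescaped == decoded:
            break  # nothing left to decode
        decoded = unescaped
    return TARGET_PATTERN.findall(decoded)


if __name__ == "__main__":
    # Hypothetical feed fragment with escaped ampersands.
    raw = (
        "<item><link>http://news.google.com/news/url?sa=t&amp;fd=R"
        "&amp;url=http://example.com/story</link></item>"
    )
    print(decode_and_extract(raw))
```

Decoding before matching means the pattern only has to handle the plain `&url=` form, at the cost of also decoding any entities that were meant to stay escaped in the rest of the feed.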
Learjet (author), posted January 6, 2016:

ds062692, thanks so much, that's perfect! Thanks to you too, pash!

My head is swimming. There's so much to learn that it seems a bit overwhelming right now, but I'm getting it slowly...

Peace,
EJ