Brutal 164 Posted September 3, 2013
Hi guys, I'm having trouble figuring out how to do this and would really appreciate any help that anyone could offer. I am scraping a list of URLs, and I want to remove (or replace with nothing) all Google-related URLs and leave all non-Google URLs. Any ideas? Here's a snippet of the list I'm working with...
http://www.google.com/
http://www.google.com/search?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&tbm=isch&source=og&sa=N&tab=wi
http://maps.google.com/maps?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&sa=N&tab=wl
https://play.google.com/?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&sa=N&tab=w8
http://www.youtube.com/results?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&sa=N&tab=w1
http://translate.google.com/?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&sa=N&tab=wT
http://www.google.com/search?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&tbo=u&tbm=bks&source=og&sa=N&tab=wp
http://www.google.com/search?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&tbo=u&tbm=shop&source=og&sa=N&tab=wf
http://www.google.com/finance?hl=en&q=xOn3HOP739w&bav=on.2,or.r_qf.&bvm=pv.xjs.s.en_US.M4-36_38X9A.O&um=1&ie=UTF-8&sa=N&tab=we
pftg4 102 Posted September 3, 2013
Regex would be the way to go.
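For anyone wondering what such a regex might look like, here is a rough sketch in Python rather than UBot script. It assumes "Google-related" means the host is google.com or one of its subdomains; the name `drop_google` and the sample list are made up for illustration. The pattern is anchored to the host part, so a non-Google URL that merely mentions google.com in its query string is kept.

```python
import re

# Assumption: a URL counts as "Google-related" when its host is google.com
# or any subdomain of google.com (www., maps., play., translate., ...).
GOOGLE_HOST = re.compile(
    r"^https?://(?:[^/?#]+\.)?google\.com(?:[:/?#]|$)",
    re.IGNORECASE,
)

def drop_google(urls):
    """Return only the URLs whose host is not google.com or a subdomain of it."""
    return [u for u in urls if not GOOGLE_HOST.match(u)]

# Hypothetical sample, shortened from the kind of list posted above.
sample = [
    "http://www.google.com/",
    "http://maps.google.com/maps?hl=en",
    "https://play.google.com/?hl=en",
    "http://www.youtube.com/results?hl=en",
    "http://example.com/page?ref=google.com",
]

print(drop_google(sample))
```

Note that the youtube.com link survives this filter, because its host does not contain google.com; whether that is wanted depends on what counts as Google-related for the list in question.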
Brutal 164 (Author) Posted September 3, 2013
pftg4 - I hear ya man, but I still struggle with even very basic regex... The regex it would take to do something like this is well above my skill level. I appreciate your input man.
pftg4 102 Posted September 3, 2013
Hey, they all have google in them.
Brutal 164 (Author) Posted September 4, 2013
Oh - then I just didn't copy/paste in enough of the list. Out of the 56 links, there are likely only around 15 that are non-Google.
pftg4 102 Posted September 4, 2013
OK then, let's see, um...
Brutal 164 (Author) Posted September 4, 2013
add list to list(%scrapegooglelinks, $scrape attribute(<outerhtml=w"*{#MyKeyword}*">, "fullhref"), "Delete", "Global")
Basically, because I couldn't find a better way, I just set it up to scrape all URLs in the Google SERPs for a keyword, and then was/am trying to take out all of the Google stuff so that I'm left with just the 'regular' links.
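The "scrape everything, then strip the Google links" idea can also be done without regex by comparing each URL's hostname against a short blocklist. The following is only a sketch in Python, not UBot script. The names `BLOCKED_DOMAINS`, `is_blocked` and `keep_non_google` are hypothetical, and putting youtube.com on the blocklist is an assumption based on it appearing among the Google navigation links in the snippet.

```python
from urllib.parse import urlsplit

# Assumption: these are the domains treated as "Google stuff".
# youtube.com is included only because it shows up in Google's navigation links.
BLOCKED_DOMAINS = ("google.com", "youtube.com")

def is_blocked(url, domains=BLOCKED_DOMAINS):
    """True if the URL's host equals a blocked domain or is a subdomain of one."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)

def keep_non_google(urls):
    """Return the URLs that are not on a blocked domain, in their original order."""
    return [u for u in urls if not is_blocked(u)]

# Hypothetical scraped list standing in for the real one.
scraped = [
    "http://www.google.com/",
    "https://play.google.com/?hl=en",
    "http://www.youtube.com/results?hl=en",
    "http://example.com/some-result",
    "http://example.org/?q=google.com",
]

print(keep_non_google(scraped))
```

Matching on the parsed hostname rather than on the whole string keeps results such as the last sample URL, which only mentions google.com in its query string.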
pftg4 102 Posted September 4, 2013
http://ubotstudio.com/blog/2011/11/19/871/
Read this, it will help with your selecting, and then you should have no more problems. Hope this helps (it will).
pftg4
Brutal 164 (Author) Posted September 4, 2013
Thanks pftg4 - Very appreciated!