turovieal, posted November 6, 2013

Guys, can anyone help me with this?

Here is a sample of the HTML:

htmlblabla12343refikririrnrhtmlllllsskksjksjsjsjdjijfiejfiejfiefjeijfeijfiejfiejfiejfeijfjeicminiaooasoasoaoaa
htmlblabla www.autotrader.com/p/for-sale/new-peugeot-diesel-white/1037414207"wegegegehtml
htmlblabla www.autotrader.com/p/for-sale/renault-clio-blue-for-repair/1037768565"ergergerg4g4ghtml
htmlblabla www.autotrader.com/p/for-sale/wanted-renault-megane-yellow/1037768565"ergegegeehtml
htmlblabladfvervrvrtbrefikririrnrhtmlllllsskksjksjsjsjdjijfiejfiejfiefjeijfeijfiejfiejfiejfeijfjeifrgbrtoasoasoaoaa

I use this regex to scrape all links from the website's HTML:

(www[.]autotrader[.]com\/p\/)[^"]+

It lets me scrape only links starting with www.autotrader.com/p/, plus all characters after that up to the next ". It works perfectly, with one BUT: please have a look at those 3 links. The bottom one has the word "wanted" in it, and the middle one has "for-repair".

What is the regex to scrape only the links that contain neither "wanted" nor "repair"? As a result, only link number one should be scraped to the list in uBot. I hope you know what I mean. Please help.
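For reference, here is a minimal sketch of what the pattern in the post above matches on the three sample links. It uses Python's re module purely for illustration; that choice, the one-link-per-line layout of the sample, and the variable names are assumptions, and uBot's own regex engine may differ in details.

```python
import re

# Sample links from the post above, one per line (layout assumed).
html = '''htmlblabla www.autotrader.com/p/for-sale/new-peugeot-diesel-white/1037414207"wegegegehtml
htmlblabla www.autotrader.com/p/for-sale/renault-clio-blue-for-repair/1037768565"ergergerg4g4ghtml
htmlblabla www.autotrader.com/p/for-sale/wanted-renault-megane-yellow/1037768565"ergegegeehtml'''

# The pattern exactly as posted: a fixed prefix, then everything up to the next double quote.
pattern = re.compile(r'(www[.]autotrader[.]com\/p\/)[^"]+')

# group(0) is the whole match; the capturing group alone would only hold the prefix.
links = [m.group(0) for m in pattern.finditer(html)]

for link in links:
    print(link)
```

All three links come back, including the "for-repair" and "wanted" ones, which is the behaviour the question wants to change.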
LazyBotter, posted November 7, 2013

www.(.*)(?=")
k1lv9h, posted November 7, 2013

Hi,

Regex to select URLs with the word "new" in them:

(www[.]autotrader[.]com\/p\/.*new[^"]+)

Regex to select URLs without the words "repair" and "wanted":

(?!.*repair|.*wanted)(www[.]autotrader[.]com\/p\/[^"]+)

Kevin
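A quick way to check the negative-lookahead pattern from this reply is to run it over the sample links. The sketch below uses Python's re module, which is an assumption for illustration (uBot's engine may behave differently), and the helper `extract` is a hypothetical name. It also tries a variant that is not from the thread: the lookahead is moved after the prefix and limited with [^"]*, so it only inspects the URL itself rather than the rest of the line.

```python
import re

# Sample links from the first post, one per line (layout assumed).
html = '''htmlblabla www.autotrader.com/p/for-sale/new-peugeot-diesel-white/1037414207"wegegegehtml
htmlblabla www.autotrader.com/p/for-sale/renault-clio-blue-for-repair/1037768565"ergergerg4g4ghtml
htmlblabla www.autotrader.com/p/for-sale/wanted-renault-megane-yellow/1037768565"ergegegeehtml'''

# Pattern as posted: the lookahead scans from the start of the URL to the end of the line.
line_wide = re.compile(r'(?!.*repair|.*wanted)(www[.]autotrader[.]com\/p\/[^"]+)')

# Variant (assumption, not from the thread): the lookahead stops at the closing quote.
url_only = re.compile(r'www[.]autotrader[.]com/p/(?![^"]*(?:repair|wanted))[^"]+')

def extract(pattern, text):
    """Return every full match of pattern in text (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]

# Same sample with the line breaks removed, as in minified HTML.
flat_html = html.replace('\n', ' ')

multi_line = extract(line_wide, html)       # one link per line
single_line = extract(line_wide, flat_html) # all links on one line
bounded = extract(url_only, flat_html)      # bounded lookahead, one line

print(multi_line)
print(single_line)
print(bounded)
```

With one link per line, the posted pattern keeps only the "new-peugeot" link. When the same links sit on a single line, the `.*` in the lookahead reaches the later "repair" and "wanted" and rejects every link, while the bounded variant still keeps the first one.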