Kreatus (Ubot Ninja) 422 Posted July 6, 2011
I have a large list of URLs in a .txt file and I need to remove duplicate DOMAINS (and the entire corresponding URL for each duplicate) while leaving behind the first occurrence of each domain.

http://www.exampleurl.com/something.php
http://exampleurl.com/somethingelse.htm
http://exampleurl2.com/another-url
http://www.exampleurl2.com/a-url.htm
http://exampleurl2.com/yet-another-url.html
http://exampleurl.com/
http://www.exampleurl3.com/here_is_a_url
http://www.exampleurl5.com/something

Whatever the solution is, the output file, using the above as the input, should be this:

http://www.exampleurl.com/something.php
http://exampleurl2.com/another-url
http://www.exampleurl3.com/here_is_a_url
http://www.exampleurl5.com/something

Notice that there are no duplicate domains now, and the first occurrence of each domain was left behind.
sales 1 Posted October 16, 2011
This regex worked for me in uBot 4 Pro. It deletes duplicate URLs, whether or not the duplicate rows are adjacent:

^(.*)(?:\\r?\\n|\\r)(?=[\\s\\S]*^\\1$)
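To see what that pattern does outside uBot, here is a minimal Python sketch. It assumes the pattern is meant with single backslashes and multiline matching (so `^` and `$` anchor at every line); the function name `drop_repeated_lines` is made up for the illustration. Under those assumptions it removes a line only when the exact same line appears again later, so it keeps the last copy of a repeated line rather than the first, and it does not compare domains.

```python
import re

# The pattern from the post, as a Python raw string (assumption: single
# backslashes, with ^ and $ matching at line boundaries via re.MULTILINE).
DUPLICATE_LINE = re.compile(r"^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$)", re.MULTILINE)


def drop_repeated_lines(text: str) -> str:
    """Delete every line that occurs again, verbatim, further down the text."""
    return DUPLICATE_LINE.sub("", text)


# Sample list from the first post: every line is a different URL.
SAMPLE = "\n".join([
    "http://www.exampleurl.com/something.php",
    "http://exampleurl.com/somethingelse.htm",
    "http://exampleurl2.com/another-url",
    "http://www.exampleurl2.com/a-url.htm",
    "http://exampleurl2.com/yet-another-url.html",
    "http://exampleurl.com/",
    "http://www.exampleurl3.com/here_is_a_url",
    "http://www.exampleurl5.com/something",
])

# Exact duplicates are removed; the later copy of each line survives.
print(drop_repeated_lines("a\nb\na\nc\nb"))

# No two sample lines are identical, so the list comes back unchanged.
print(drop_repeated_lines(SAMPLE) == SAMPLE)
```

So for the domain problem in this thread, a whole-line regex like this one is not enough on its own: the domain has to be extracted and compared separately.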
xcgsoho 0 Posted September 14, 2012
set(#url, "http://www.exampleurl.com/something.php
http://exampleurl.com/somethingelse.htm
http://exampleurl2.com/another-url
http://www.exampleurl2.com/a-url.htm
http://exampleurl2.com/yet-another-url.html
http://exampleurl.com/
http://www.exampleurl3.com/here_is_a_url
http://www.exampleurl5.com/something", "Global")
set(#urls, $find regular expression(#url, "^(.*)(?:\\\\r?\\\\n|\\\\r)(?=[\\\\s\\\\S]*^\\\\1$) "), "Global")
alert(#urls)

It is wrong.
Pete 121 Posted September 14, 2012
You are going to have to loop through them, I think:

set(#List, "http://www.exampleurl.com/something.php
http://exampleurl.com/somethingelse.htm
http://exampleurl2.com/another-url
http://www.exampleurl2.com/a-url.htm
http://exampleurl2.com/yet-another-url.html
http://exampleurl.com/
http://www.exampleurl3.com/here_is_a_url
http://www.exampleurl5.com/something", "Global")
add list to list(%RawUrls, $list from text(#List, $new line), "Delete", "Global")
add item to list(%Clean Url List, $next list item(%RawUrls), "Delete", "Global")
loop($list total(%RawUrls)) {
    set(#Current Url, $find regular expression($next list item(%RawUrls), "[a-z-A-Z0-9]\{1,99\}\\.((com|org|net|eu|pt|uk|es|br|co|cz|fn)|\\.(uk|vu|cz|en|br|es))"), "Global")
    set(#Compair, %Clean Url List, "Global")
    if($contains(#Compair, #Current Url)) {
        then {
        }
        else {
            add item to list(%Clean Url List, $list item(%RawUrls, $subtract($list position(%RawUrls), 1)), "Delete", "Global")
        }
    }
}
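For comparison, the same loop-and-compare idea can be sketched in Python. This is not a translation of the uBot script above, only an illustration of the technique: walk the list once, work out each URL's domain, and keep the URL only if that domain has not been seen yet. The helper names `domain_key` and `first_url_per_domain` are invented for the example, and treating "www.example.com" and "example.com" as the same domain is an assumption taken from the expected output in the first post. Other subdomains are treated as distinct, and lines without a scheme such as http:// all end up with an empty key.

```python
from urllib.parse import urlsplit


def domain_key(url: str) -> str:
    """Return the lowercased host with a leading 'www.' removed."""
    host = urlsplit(url.strip()).hostname or ""
    if host.startswith("www."):
        host = host[4:]  # assumption: www.x.com and x.com count as one domain
    return host


def first_url_per_domain(urls):
    """Keep only the first URL seen for each domain, preserving input order."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the input file
        key = domain_key(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


# Sample list from the first post.
SAMPLE = [
    "http://www.exampleurl.com/something.php",
    "http://exampleurl.com/somethingelse.htm",
    "http://exampleurl2.com/another-url",
    "http://www.exampleurl2.com/a-url.htm",
    "http://exampleurl2.com/yet-another-url.html",
    "http://exampleurl.com/",
    "http://www.exampleurl3.com/here_is_a_url",
    "http://www.exampleurl5.com/something",
]

for url in first_url_per_domain(SAMPLE):
    print(url)
```

Using a set for the domains already seen keeps each lookup cheap even on a large list, and parsing the host avoids having to enumerate top-level domains in a regex.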
zenos 13 Posted February 5, 2013
Hello, really good script, but I get an error when I run it. Can you help me fix the error, please? http://tdiv.free.fr/Captureerrorbot.JPG
VaultBoss 310 Posted February 5, 2013
You should be aware that list indexes (used for counting items) are 0-based; as such, a list with 3 items has them indexed in these positions:

0 -- for first list item
1 -- for second list item
2 -- for third list item

The error you presented most probably comes from looping the list indexes starting from 1 instead of 0, so that the last item you want to get from the list isn't there anymore (your $next list item returns an index outside the bounds of the list).

Make sure to loop the list within its boundaries. Use $list item instead of $next list item, together with an index counter, so that the list position is set explicitly. With $next list item you also have to make sure you reset the list position to 0 before starting the loop, which may be another cause of your failure.

Hope this helps you...
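The off-by-one described above is easy to reproduce in any language with 0-based lists. The following Python sketch is only an analogy to the uBot behaviour: the function names are invented, and Python signals the problem with an IndexError rather than the uBot error in the screenshot.

```python
items = ["first", "second", "third"]


def walk_zero_based(values):
    """Visit every item with an explicit counter that stays within bounds."""
    collected = []
    index = 0  # start at 0, the position of the first item
    while index < len(values):  # stop before index reaches the item count
        collected.append((index, values[index]))
        index += 1
    return collected


def walk_one_based(values):
    """Deliberately wrong: starts at 1, so the last read is out of bounds."""
    collected = []
    for index in range(1, len(values) + 1):
        collected.append(values[index])  # fails when index == len(values)
    return collected


print(walk_zero_based(items))

try:
    walk_one_based(items)
except IndexError as error:
    print("out of bounds:", error)
```

The one-based version also silently skips the first item before it fails, which is the other half of the same mistake.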
zenos 13 Posted February 8, 2013 (edited)
You rock! Thanks a lot, man, it works!
VaultBoss 310 Posted February 8, 2013
You're welcome - feel free to hit the ✔ LIKE THIS button in the bottom-right corner of any post by anyone who helps you on the forum. Cheers!