Legend 181 Posted January 20, 2012
Hey gang, I need a regex to trim a URL string down to its root, removing everything: subdomains, http, trailing slash, folders, etc. Ex: all of the below should return abc.com
http://www.abc.com
123.abc.com/
www.abc.com/123.html
http://www.abc.com/123/456
Is this doable? Anyone have such a creature in their magic bag of tricks? TIA
LoWrIdErTJ - BotGuru 904 Posted January 20, 2012
http://regexlib.com is a great repository of regex statements. The best thing I found to do is to make sure the URLs have some sort of consistency: if some have http, make the others have it as well (that makes it easier when building specific regex statements). I haven't been able to find a regex solution that will pull just the domain without getting really in depth. However, here is one that will pull subdomain, domain, and TLD:
\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b
Your problem will be the inconsistencies, as in whether subdomains exist or not. www is treated by the regex as a subdomain.
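To see what that pattern returns on the URLs from the question, here is a minimal Python sketch. It is not UBot script; `extract_host` and `samples` are hypothetical names used only for illustration, and the input is lowercased because the pattern only lists lowercase letters.

```python
import re

# Pattern quoted in the post above: one or more dot-terminated labels, then a TLD.
HOST_PATTERN = re.compile(r"\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b")

def extract_host(url):
    """Return the first host-like match in url, or None (hypothetical helper)."""
    match = HOST_PATTERN.search(url.lower())
    return match.group(0) if match else None

# The four example URLs from the original question.
samples = [
    "http://www.abc.com",
    "123.abc.com/",
    "www.abc.com/123.html",
    "http://www.abc.com/123/456",
]

hosts = [extract_host(u) for u in samples]
print(hosts)  # ['www.abc.com', '123.abc.com', 'www.abc.com', 'www.abc.com']
```

Note that the subdomain (www, 123) is kept, as the post says. Only the first match per URL is taken here, because the pattern would also match a file name such as 123.html later in the string.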
odeesuba 24 Posted January 20, 2012
Try to expand on this:
load html("http://www.abc.com 123.abc.com/ www.abc.com/123.html http://www.abc.com/123/456")
add list to list(%Domains, $find regular expression($document text, "[0-9A-Za-z]+\\.(com|net|org)"), "Delete", "Global")
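For anyone who wants to check the pattern outside UBot, the same idea can be sketched in Python. This is only an approximation of the script above: `find_root_domains` is a hypothetical helper, and it assumes the "Delete" argument in the UBot command drops duplicate list entries.

```python
import re

# Same pattern as the UBot script; the group is non-capturing so that
# findall returns whole matches rather than just the TLD.
ROOT_PATTERN = re.compile(r"[0-9A-Za-z]+\.(?:com|net|org)")

def find_root_domains(text):
    """Return unique name.tld matches in order of first appearance (hypothetical helper)."""
    matches = ROOT_PATTERN.findall(text)
    # dict.fromkeys keeps the first occurrence of each match and drops repeats.
    return list(dict.fromkeys(matches))

text = ("http://www.abc.com 123.abc.com/ www.abc.com/123.html "
        "http://www.abc.com/123/456")

domains = find_root_domains(text)
print(domains)  # ['abc.com']
```

The pattern only knows the three listed TLDs, and a hyphenated name such as my-site.com would come back as site.com, so the character class and TLD list likely need expanding for real data.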
Legend 181 Posted January 21, 2012
Ah... very nice! That is something I can definitely work with! Thanks guys!!