UBot Underground


Hey gang,

 

I need a regex to trim a URL string down to its root domain, removing everything else: subdomains, http, trailing slash, folders, etc.

 

 

Example: all of the below should return abc.com

 

http://www.abc.com

123.abc.com/

www.abc.com/123.html

http://www.abc.com/123/456

 

Is this doable? Anyone have such a creature in their magic bag of tricks?

 

TIA


http://regexlib.com is a great repository of regex patterns.

 

The best thing I've found to do is make sure the URLs are consistent. If some have http, give the others http as well (it makes it easier to build specific regex statements).

 

I haven't been able to find a regex solution (without getting really in depth) that will pull just the domain.

 

However, here is one that will pull the subdomain, domain, and TLD:

\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b
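To see what that pattern returns on the sample URLs from the question, here is a minimal Python sketch. Python is used only for illustration (the thread itself is about UBot), and the helper name `first_host` is made up for this example. Note that the pattern is lowercase-only as written, so mixed-case input would need a case-insensitive flag.

```python
import re

# The pattern from the post above, unchanged.
HOST_RE = re.compile(r"\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b")

samples = [
    "http://www.abc.com",
    "123.abc.com/",
    "www.abc.com/123.html",
    "http://www.abc.com/123/456",
]

def first_host(text):
    """Return the first subdomain.domain.tld match in text, or None."""
    match = HOST_RE.search(text)
    return match.group(0) if match else None

for url in samples:
    # The subdomain is still attached, e.g. www.abc.com or 123.abc.com.
    print(url, "->", first_host(url))
```

Only the first match is taken on purpose: a file name such as 123.html also fits the pattern, so collecting every match would pick that up as well.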

 

 

Your problem will be the inconsistencies, such as whether a subdomain is present or not. The regex treats www as a subdomain like any other.
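One way to deal with those inconsistencies is to normalize each URL first and then drop the subdomain labels. The sketch below does that in Python with the standard library instead of a regex. The function name `root_domain` is hypothetical, and keeping only the last two labels is a simplifying assumption: it works for the abc.com examples but would be wrong for multi-part suffixes such as co.uk.

```python
from urllib.parse import urlsplit

def root_domain(url):
    """Return the last two labels of the host (naive: assumes a one-part TLD)."""
    url = url.strip()
    # Make the input consistent: add a scheme when it is missing,
    # otherwise urlsplit would treat the host as part of the path.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Keep only "domain.tld"; www, 123 and any other subdomain are dropped.
    return ".".join(labels[-2:])

for url in ["http://www.abc.com", "123.abc.com/",
            "www.abc.com/123.html", "http://www.abc.com/123/456"]:
    print(url, "->", root_domain(url))
```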


Try expanding on this:

 

load html("http://www.abc.com 
123.abc.com/
www.abc.com/123.html
http://www.abc.com/123/456")
add list to list(%Domains, $find regular expression($document text, "[0-9A-Za-z]+\\.(com|net|org)"), "Delete", "Global")
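For anyone who wants to try the same idea outside UBot, here is a rough Python equivalent of the pattern above. It is only a sketch: the variable names are made up, the alternation is written as a non-capturing group so that `findall` returns whole matches, and the TLD list is limited to the three from the post. Because the character class has no hyphen, a host like my-site.com would come back as site.com, and a longer TLD that starts with com, net, or org would be cut short.

```python
import re

# Same idea as the UBot pattern: one label followed by a known TLD.
DOMAIN_RE = re.compile(r"[0-9A-Za-z]+\.(?:com|net|org)")

text = """http://www.abc.com
123.abc.com/
www.abc.com/123.html
http://www.abc.com/123/456"""

# Every match in the text, in order of appearance.
matches = DOMAIN_RE.findall(text)

# Drop duplicates while keeping the original order.
domains = list(dict.fromkeys(matches))

print(matches)
print(domains)
```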


Ah... very nice! That is something I can definitely work with!

 

Thanks guys!!
