UBot Underground


Hey everyone. After a discussion yesterday in another thread I was encouraged to start a topic regarding regular expressions. I think it's a good idea! This thread will serve as "Everything Regex" and is open to everything regex related.

 

Let me just say: I am not a true expert in regex, but I definitely know enough to get by. This thread can be used to ask questions, post tips and tricks, and share anything else that will help the community learn what they need to take advantage of the great powers known as "Regex"!

 

So I will start this off by providing a regex string to scrape email addresses. The original string was created by Kreatus (no pun intended) and I made a small modification to address the problem he was experiencing with it. (Here is the original thread: http://ubotstudio.co...h__1#entry31204)

 

Here is the string:

 

(\w+(\s|)@(\s|)[a-zA-Z_]+?\.[a-zA-Z]{2,3})

 

(You can see it in action here: http://www.rubular.com/r/nidQpOizwC )

 

Now let's break it down into its parts.

 

1) \w+ matches one or more word characters (letters, digits, and the underscore)

 

2) (\s|) matches a single whitespace character OR nothing (the vertical pipe indicates a choice between the two). Because nothing follows the pipe, the second alternative matches the empty string, which makes the whitespace optional: a space does not need to be found for the match to continue.

 

3) @ refers to the literal symbol "@"

 

4) (\s|) [see #2]

 

5) [a-zA-Z_]+? matches one or more characters from a-z, A-Z, or the underscore, as few times as possible (the trailing ? makes the quantifier lazy)

 

6) \. refers to the literal character "."

 

7) [a-zA-Z]{2,3} - see #5, minus the underscore. The difference is that this time it matches only 2 or 3 characters instead of an unlimited number

 

 

The modification was the (\s|) in #2 and #4, since some sites list email addresses with spaces: "name @ site.com"
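
For anyone who wants to try the string outside of UBot or Rubular, here is a minimal Python sketch of it in action. The pattern is copied unchanged from above; the helper names find_emails and strip_spaces are just made up for this example, and Python's re module is only one of many engines you could test it in.

```python
import re

# The pattern from the post, copied unchanged.
EMAIL_RE = re.compile(r"(\w+(\s|)@(\s|)[a-zA-Z_]+?\.[a-zA-Z]{2,3})")


def find_emails(text):
    """Return every substring of text that the pattern matches.

    Hypothetical helper for this example. finditer + group(1) is used
    because the pattern has three capture groups, so findall would
    return tuples instead of plain strings.
    """
    return [m.group(1) for m in EMAIL_RE.finditer(text)]


def strip_spaces(address):
    """Hypothetical helper: drop the optional whitespace around the @."""
    return re.sub(r"\s", "", address)


sample = "Contact john@site.com or name @ site.com for details."
found = find_emails(sample)
print(found)
print([strip_spaces(a) for a in found])
```

Both the normal address and the spaced-out one are picked up, which is what the (\s|) groups were added for. One thing to keep in mind: because part #7 only allows 2 or 3 letters, an address like x@site.info comes back as x@site.inf, so treat this as a starting point rather than a complete email matcher.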

 

So that is the first entry...please feel free to add your two cents or ask any questions along the way!

 

 

John

 

 

 

 

PS You should be aware that learning regex will exponentially increase your Ubotting abilities and power! You can go to a great site to learn more (Thanks Lilly!)...http://www.rubular.com/

 

And I highly recommend you watch the videos posted below. Seriously!


I was going to copy this here, but the whole page actually has a lot of good information regarding what regex is...It is the definition page in Wikipedia and it's pretty straightforward and easy to understand...have a look:

 

http://en.wikipedia.org/wiki/Regular_expression

 

John


I want to add this:

 

If you have never heard of the software Regex Buddy 3, I am here to tell you it's amazing. It can be used in conjunction with Edit Pad Pro, but truthfully I only use it by itself. So here's the thing...when you build a regex string in Regex Buddy you can export it with explanations...I do exactly that and I place them on my website. The cool thing is you can highlight an explanation point and it highlights the part of the string it refers to, and vice-versa. If you go to http://learnubot.com you can see the 4 examples I have put up.

 

The point of this post is that if you want a regex string for the purpose of learning how it's structured, just let me know and I will create one for you with explanations. Regex Buddy 3 is kind of like UBot. It won't build it for you, but it provides everything you need to do it yourself (I stand corrected...it actually has a library of pre-made strings you can use also).


Is there any documentation on the replace regex command? I can only find info on the find regex one.

 

Thanks.
