If you aren’t at all familiar with regular expressions and the phrase is giving you chills, relax: it’s easier than you think. Before you dive into the rest of the lessons, here’s a simple example:
(\w+)@(\w+)\.(\w+)(\.\w+)*
I know that doesn’t seem simple, but that expression will find most email addresses. Here’s how.
Let’s start at the beginning.
(\w+) simply means “find a word of one or more characters.” The \w says “Go out and find every instance of anything that is a letter, number, or underscore,” and the + means the match must be at least one character long but can be longer. Putting this expression on either side of the @ symbol means: find a word of one or more characters, then the @ symbol, then another word, followed by a literal period (the backslash in \. tells the expression to match an actual “.”). In other words, find anything that looks like “word@domain.”.
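To see this first piece in action, here is a minimal sketch using Python’s re module. Python is an assumption made only for illustration (this lesson itself is about UBot Studio), and the sample sentence and address are made up.

```python
import re

sample = "Write to bob_42@example.com today!"  # made-up sample text

# \w+ matches every run of letters, digits, or underscores.
words = re.findall(r"\w+", sample)
print(words)

# A word, the @ symbol, another word, then a literal period.
partial = re.search(r"(\w+)@(\w+)\.", sample)
print(partial.group(0))  # the whole matched text
print(partial.groups())  # the two words captured by the parentheses
```

Note that the match stops right after the period, because nothing in the expression so far describes what follows it.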
The next portion, (\w+), you’ve already seen. With that addition, our regular expression is looking for anything along the lines of “email@example.com” (or “name@example.org”, etc.).
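As a rough sketch of the expression at this stage, again assuming Python’s re module purely for illustration and using made-up addresses, the three groups can be tried like this:

```python
import re

# word, @, word, literal period, word
pattern = re.compile(r"(\w+)@(\w+)\.(\w+)")

text = "Contact email@example.com or sales@example.org for help."
found = [m.group(0) for m in pattern.finditer(text)]
print(found)

# An address with a two-part ending is only matched up to the first ending.
uk = pattern.search("admin@example.co.uk")
print(uk.group(0))
```

The second search shows the limitation of stopping here: the match ends after “co” and leaves “.uk” behind.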
But what about .co.uk addresses?
(\.\w+)* says that we’re looking for a period followed by one or more word characters. But what does the * after the closing parenthesis mean?
* means that the preceding metacharacter, literal, or group can occur zero or more times. As an example, \w\d* would match a word character followed by zero or more digits. In our example, we use parentheses to group together a series of metacharacters, so the * applies to the whole group. So, you can interpret (\.\w+)* as ‘match a period followed by one or more word characters, and match that combination zero or more times’. The point is that not all email addresses have a two-part ending like .co.uk, but some do. With the (\.\w+)* group at the end, the expression finds addresses that end in .com as well as those that end in .co.uk, .net.au, etc.
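The effect of * on a single metacharacter and on a whole group can be checked with a short sketch. It assumes Python’s re module for illustration only, and the addresses are invented for the example.

```python
import re

# \w\d* : one word character followed by zero or more digits.
one_char = re.fullmatch(r"\w\d*", "a") is not None       # zero digits is fine
with_digits = re.fullmatch(r"\w\d*", "a123") is not None  # several digits
two_letters = re.fullmatch(r"\w\d*", "ab") is not None    # "b" is not a digit

# The full expression: the trailing group may repeat zero or more times.
email = re.compile(r"(\w+)@(\w+)\.(\w+)(\.\w+)*")
text = "user@example.com, admin@example.co.uk, info@example.net.au"
found = [m.group(0) for m in email.finditer(text)]
print(one_char, with_digits, two_letters)
print(found)
```

Here the first address uses the trailing group zero times, while the other two use it once each.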
The UBot Studio RegEx Builder includes common expressions like this one, to make this sort of thing easy for everyone. But for now, let’s continue on to the lessons to get a more in-depth understanding of RegEx.