Regular Expressions (REGEX)
Regular Expressions (REGEX) are special text sequences that describe patterns to be found in text. With Regular Expressions, you can find substrings in a text field that match the desired pattern.
In the pattern, you can specify literal characters (for example, a or b), or wildcards that each stand for a class of possible characters. The most commonly used wildcard types include:
- Word character - \w to indicate an alphanumeric character or an underscore (\W for the opposite)
- Digit character - \d to indicate a numeric character (\D for the opposite)
- White space character - \s to indicate any white space including space, tab, form-feed, etc. (\S for the opposite)
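As a minimal sketch of these wildcards, assuming Python's built-in re module (the sample text and variable names are made up for illustration, and other regex engines may differ in small details):

```python
import re

# Hypothetical sample text; \t is a tab character.
text = "Order_42 shipped\ton 2024-12-19"

words = re.findall(r"\w+", text)      # runs of letters, digits, or underscores
numbers = re.findall(r"\d+", text)    # runs of digits
spaces = re.findall(r"\s", text)      # each white space character
non_word = re.findall(r"\W", text)    # each character that is NOT a word character
non_space = re.findall(r"\S+", text)  # runs of non-white-space characters

print(words)      # ['Order_42', 'shipped', 'on', '2024', '12', '19']
print(numbers)    # ['42', '2024', '12', '19']
print(spaces)     # [' ', '\t', ' ']
print(non_word)   # [' ', '\t', ' ', '-', '-']
print(non_space)  # ['Order_42', 'shipped', 'on', '2024-12-19']
```

Note that the underscore counts as a word character, so Order_42 is returned as a single word.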
Instead of wildcards, you can use brackets [ ] to define a character set, which matches any single character listed inside the brackets. For example:
- [a-z] matches any lowercase character from a to z
- [A-Z] matches any uppercase character from A to Z
- [abc] matches any one of the characters a, b, or c
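The three sets above can be tried out with a short sketch, again assuming Python's re module and a made-up input string:

```python
import re

# Hypothetical sample text.
text = "Cab 42 zebra"

lowercase = re.findall(r"[a-z]", text)  # any single lowercase letter
uppercase = re.findall(r"[A-Z]", text)  # any single uppercase letter
abc_only = re.findall(r"[abc]", text)   # only the characters a, b, or c

print(lowercase)  # ['a', 'b', 'z', 'e', 'b', 'r', 'a']
print(uppercase)  # ['C']
print(abc_only)   # ['a', 'b', 'b', 'a']
```

Each set matches exactly one character at a time, which is why the results are lists of single characters.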
You can specify how many times a character, wildcard, or character set appears by placing an exact count within curly brackets { } (for example, \d{5} for exactly five digits), or by adding an occurrence modifier:
- + for one or more times
- * for zero or more times
- ? for zero or one time (i.e. optional)
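To see how the occurrence modifiers differ, the following sketch (assuming Python's re module; the word list is purely illustrative) checks which whole strings each pattern accepts:

```python
import re

# Hypothetical candidates that differ only in the number of "u" characters.
samples = ["color", "colour", "colouur"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

optional_u = accepted(r"colou?r")       # zero or one u
any_u = accepted(r"colou*r")            # zero or more u
at_least_one_u = accepted(r"colou+r")   # one or more u
exactly_two_u = accepted(r"colou{2}r")  # exactly two u

print(optional_u)      # ['color', 'colour']
print(any_u)           # ['color', 'colour', 'colouur']
print(at_least_one_u)  # ['colour', 'colouur']
print(exactly_two_u)   # ['colouur']
```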
You can specify where in the target text the pattern must appear by using positional characters in the regular expression. The most commonly used ones are:
- ^ for start of the string (or line)
- $ for end of the string (or line)
- \b for word boundary
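Whether ^ and $ refer to the whole string or to each line depends on the engine and its options. The sketch below assumes Python's re module, where the re.MULTILINE flag switches to per-line behaviour; the helper name starts and the sample text are made up for illustration:

```python
import re

# Hypothetical two-line text; "cat" occurs at offsets 0, 7 (inside "concat"), 11 and 15.
text = "cat concat cat\ncat"

def starts(pattern, flags=0):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

everywhere = starts(r"cat")                    # no positional constraint
string_start = starts(r"^cat")                 # start of the string only
line_start = starts(r"^cat", re.MULTILINE)     # start of every line
string_end = starts(r"cat$")                   # end of the string only
line_end = starts(r"cat$", re.MULTILINE)       # end of every line
whole_word = starts(r"\bcat\b")                # "cat" as a separate word

print(everywhere)    # [0, 7, 11, 15]
print(string_start)  # [0]
print(line_start)    # [0, 15]
print(string_end)    # [15]
print(line_end)      # [11, 15]
print(whole_word)    # [0, 11, 15]
```

The word boundary \b is what excludes the "cat" inside "concat".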
You can use the pipe character | to match either of two alternative expressions, and you can use parentheses ( ) to define groups in an expression.
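A short sketch of alternation and grouping, assuming Python's re module (the input strings and variable names are illustrative only):

```python
import re

# Alternation inside a group: accept either "a" or "e" between "gr" and "y".
spellings = [m.group(0) for m in re.finditer(r"gr(a|e)y", "gray grey griy")]
print(spellings)  # ['gray', 'grey']

# Groups also let you pull out parts of a match separately.
match = re.search(r"(\d+)(ml|cl|l|oz)", "Bottle: 100cl")
amount, unit = match.groups()
print(amount, unit)  # 100 cl
```

Without the parentheses, gra|ey would mean "gra" or "ey", because the pipe applies to everything on either side of it within the current group.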
Many free online resources are available to learn more about Regular Expressions; the following two are particularly informative:
- http://regexone.com - offers an interactive tutorial for trying out exercises as you learn
- http://regexstorm.net/tester - provides an interactive testing environment that allows you to write a regular expression and a test input string, and then see what substrings match the typed expression

Meta-characters
Character | Meaning |
\ | Marks the next character as special, or escapes it so that it is treated as a literal |
^ | Line start |
$ | Line end |
. | Any character except newline |
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match 0 or 1 times (or minimal matching when placed after another repetition modifier) |
| | Alternative |
( ) | Grouping |
[ ] | Set of characters |
{ } | Repetition modifier |

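Two of the meta-characters above are easy to misread: the backslash used as an escape, and ? used for minimal matching. A minimal sketch, assuming Python's re module and made-up sample strings:

```python
import re

# An unescaped dot matches any character; an escaped dot matches only a literal dot.
any_char = re.findall(r"3.14", "3.14 3x14")
literal_dot = re.findall(r"3\.14", "3.14 3x14")
print(any_char)     # ['3.14', '3x14']
print(literal_dot)  # ['3.14']

# By default + matches as much as possible; adding ? makes it match as little as possible.
markup = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", markup)
minimal = re.findall(r"<.+?>", markup)
print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(minimal)  # ['<b>', '</b>', '<i>', '</i>']
```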

Repetition
Combination | Meaning |
a* | Zero or more a’s |
a+ | One or more a’s |
a? | Optional a |
a{m} | Exactly m a’s |
a{m,} | At least m a’s |
a{m,n} | At least m but at most n a’s |

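The three curly-bracket forms can be compared on a list of numbers. This sketch assumes Python's re module; the word boundaries \b are added so that each number is considered as a whole, and the sample text is made up:

```python
import re

# Hypothetical text containing numbers of 2, 3, 4 and 5 digits.
text = "10 200 3000 40000"

exactly_three = re.findall(r"\b\d{3}\b", text)     # {m}: exactly 3 digits
three_or_more = re.findall(r"\b\d{3,}\b", text)    # {m,}: at least 3 digits
two_to_three = re.findall(r"\b\d{2,3}\b", text)    # {m,n}: between 2 and 3 digits

print(exactly_three)  # ['200']
print(three_or_more)  # ['200', '3000', '40000']
print(two_to_three)   # ['10', '200']
```

Without the \b boundaries, \d{3} would also match the first three digits of 3000 and 40000.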

Matching Special Single Characters
Combination | Meaning |
\t | Tab character |
\n | New line character |
\r | Carriage return character |
\unnnn | Matches the Unicode character represented by the hexadecimal code nnnn |

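These escapes are often used to split delimited text. The sketch below assumes Python's re module, which accepts \t, \n, \r and \unnnn inside a pattern; the sample strings are illustrative:

```python
import re

# Split a tab-separated row into fields.
fields = re.split(r"\t", "id\tname\tprice")
print(fields)  # ['id', 'name', 'price']

# Split on Windows (\r\n), Unix (\n), or old Mac (\r) line endings.
lines = re.split(r"\r\n|\n|\r", "first\r\nsecond\nthird\rfourth")
print(lines)  # ['first', 'second', 'third', 'fourth']

# \u00e9 is the Unicode code point of the letter e with an acute accent.
accented = re.findall(r"\u00e9", "caf\u00e9 r\u00e9sum\u00e9")
print(len(accented))  # 3
```

In the line-ending pattern, \r\n is listed first so that a Windows line ending is consumed as one separator rather than two.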

Boundaries
Combination | Meaning |
\b | Must match at a word boundary |
\B | Must NOT match at a word boundary |
\A | Must match at the string beginning |
\Z | Must match at the string end |

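The difference between \A and \Z and the line anchors ^ and $ only shows up on multi-line text. A minimal sketch, assuming Python's re module with the re.MULTILINE flag (the exact treatment of \Z around a trailing newline varies between engines, so treat this as illustrative):

```python
import re

# Hypothetical two-line text.
text = "first line\nsecond line"

line_starts = re.findall(r"^\w+", text, re.MULTILINE)    # first word of every line
string_start = re.findall(r"\A\w+", text, re.MULTILINE)  # first word of the string only
line_ends = re.findall(r"\w+$", text, re.MULTILINE)      # last word of every line
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)    # last word of the string only

print(line_starts)   # ['first', 'second']
print(string_start)  # ['first']
print(line_ends)     # ['line', 'line']
print(string_end)    # ['line']

# \B requires that there is NO word boundary: "cat" must continue a word.
inside_word = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]
print(inside_word)  # [7]
```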

Wildcards
Combination | Meaning |
\w | Any word character (alphanumeric or underscore) |
\W | Any non-word character |
\s | Any whitespace character (space, tab, etc.) |
\S | Any non-whitespace character |
\d | Any digit (character 0 to 9) |
\D | Any non-digit |


Character Sets
Combination | Meaning |
[characters] | Any of the characters in the sequence |
[x-y] | Any of the characters from x to y, in ASCII code order |
[\-] | The hyphen character |
[^characters] | Any character except the ones listed after ^ |

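Negated sets and the escaped hyphen can be sketched as follows, assuming Python's re module and made-up sample strings:

```python
import re

# [\-] matches a literal hyphen.
hyphens = re.findall(r"[\-]", "555-0100-22")
print(hyphens)  # ['-', '-']

# A set can mix a range with an escaped hyphen.
phone = re.findall(r"[0-9\-]+", "call 555-0100 now")
print(phone)  # ['555-0100']

# [^...] matches any character NOT listed; here, everything except digits is removed.
digits_only = re.sub(r"[^0-9]", "", "555-0100 ext. 22")
print(digits_only)  # 555010022

# Everything except vowels and spaces.
consonants = re.findall(r"[^aeiou ]", "regex rules")
print(consonants)  # ['r', 'g', 'x', 'r', 'l', 's']
```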

Objective | Input String | Regular Expression | Matched Strings |
Matching all words (i.e. ignoring white space and special characters) | Chicago IL ? 60608 | \w+ | Chicago, IL, 60608 |
Matching the first word (i.e. ignoring white space and special characters) | Chicago IL ? 60608 | ^\w+ | Chicago |
Matching the last word (i.e. ignoring white space and special characters) | Chicago IL ? 60608 | \w+$ | 60608 |
Matching all words with alphabetic characters only | Chicago IL ? 60608 | [a-zA-Z]+ | Chicago, IL |
Matching 2 upper case characters | Chicago IL ? 60608 | [A-Z]{2} | IL |
Matching 2 isolated characters (upper or lower case) | Chicago iL ? 60608 | \b[a-zA-Z]{2}\b | iL |
Matching a sequence of digits of a fixed length (generic postcodes, including US postal codes) | Chicago IL ? 60608 | \d{5} (or [0-9]{5}) | 60608 |
Matching a decimal number with a comma or decimal point (latitude and longitude) | Latitude:51.5161025 Longitude: 51,5161025 | \d+(\.|,)\d+ | 51.5161025 and 51,5161025 |
Matching a mix of fixed and variable characters (product dimensions) | Product A H:10L:0.8D:3.4 | H:\d+\.?\d*L:\d+\.?\d*D:\d+\.?\d* | H:10L:0.8D:3.4 |
Matching character sequences (units of measure) | 1000ml or 1l or 100cl or 33.814oz | ml|l|cl|oz | ml, l, cl, oz |

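The examples in this table can be checked with a short script. This is a sketch that assumes Python's re module; the helper name matched_strings is hypothetical, and other engines should give the same results for these simple patterns:

```python
import re

def matched_strings(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]

address = "Chicago IL ? 60608"

# (regular expression, input string) pairs taken from the table.
examples = [
    (r"\w+", address),
    (r"^\w+", address),
    (r"\w+$", address),
    (r"[a-zA-Z]+", address),
    (r"[A-Z]{2}", address),
    (r"\b[a-zA-Z]{2}\b", "Chicago iL ? 60608"),
    (r"\d{5}", address),
    (r"\d+(\.|,)\d+", "Latitude:51.5161025 Longitude: 51,5161025"),
    (r"H:\d+\.?\d*L:\d+\.?\d*D:\d+\.?\d*", "Product A H:10L:0.8D:3.4"),
    (r"ml|l|cl|oz", "1000ml or 1l or 100cl or 33.814oz"),
]

results = [matched_strings(pattern, text) for pattern, text in examples]

for (pattern, _), found in zip(examples, results):
    print(f"{pattern:40} {found}")
```

Keep in mind that the units pattern ml|l|cl|oz is not anchored to a number, so it would also match the letter l inside ordinary words such as "bottle"; prefixing it with \d is one way to tighten it.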
Last modified: Thursday December 19, 2024