Regular Expressions (REGEX)

Regular Expressions (REGEX) are special text sequences that describe patterns to be found in text. With Regular Expressions, you can find substrings in a text field that match the desired pattern.

In the pattern, you can use literal characters (for example, a or b) or wildcards that each stand for a class of possible characters. The most commonly used wildcard types include:

  • Word character - \w to indicate an alphanumeric character or an underscore (\W for the opposite)
  • Digit character - \d to indicate a numeric character (\D for the opposite)
  • White space character - \s to indicate any white space including space, tab, form-feed, etc. (\S for the opposite)
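
As a rough illustration of the wildcards above, here is a minimal sketch in Python. It assumes Python's built-in re module, which is not necessarily the engine used by the product this page documents, and the sample strings are made up for the example.

```python
import re

# \w matches one letter, digit, or underscore; \W matches anything else.
word_chars = re.findall(r"\w", "a_1-!")
non_word_chars = re.findall(r"\W", "a_1-!")

# \d matches one digit; \D matches one non-digit.
digits = re.findall(r"\d", "A1b22")
non_digits = re.findall(r"\D", "A1b22")

# \s matches one white-space character (space, tab, newline, ...);
# \S matches one character that is not white space.
spaces = re.findall(r"\s", "a b\tc\n")
non_spaces = re.findall(r"\S", "a b\tc\n")

print(word_chars, non_word_chars)
print(digits, non_digits)
print(spaces, non_spaces)
```

Each wildcard matches exactly one character per occurrence, which is why findall returns single-character strings here.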

Instead of wildcards, you can use square brackets [ ] to define a character set, which matches any single character listed inside the brackets. For example:

  • [a-z] matches any single lowercase letter from a to z
  • [A-Z] matches any single uppercase letter from A to Z
  • [abc] matches any one of the characters a, b, or c
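
The character sets above can be tried out with a short sketch like the following. It assumes Python's built-in re module, and the sample strings are hypothetical.

```python
import re

sample = "Cab 42"

# [a-z] matches one lowercase letter at a time.
lowercase = re.findall(r"[a-z]", sample)

# [A-Z] matches one uppercase letter at a time.
uppercase = re.findall(r"[A-Z]", sample)

# [abc] matches a single a, b, or c; other letters are skipped.
abc_only = re.findall(r"[abc]", "cabbage")

print(lowercase, uppercase, abc_only)
```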

You can specify how many times a character, wildcard, or character set must appear by placing an exact count within curly brackets { } (for example, \d{4} for exactly four digits), or by adding an occurrence modifier:

  • + for one or more times
  • * for zero or more times
  • ? for zero or one time (i.e. optional)
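
To make the counts and occurrence modifiers concrete, here is a small sketch that assumes Python's built-in re module. The input strings are invented for the example.

```python
import re

# {4} requires exactly four digits in a row.
years = re.findall(r"\d{4}", "Built 1999, renovated 2024, unit 12")

# + requires one or more digits.
digit_runs = re.findall(r"\d+", "a1b22c333")

# * allows zero or more b characters after the a.
a_then_bs = re.findall(r"ab*", "a ab abbb")

# ? makes the u optional, so both spellings match.
colors = re.findall(r"colou?r", "color colour colr")

print(years, digit_runs, a_then_bs, colors)
```

Note that a modifier applies only to the item immediately before it: in colou?r, only the u is optional.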

You can specify where in the target text the pattern must appear by using positional characters in the regular expression. The most commonly used ones are:

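  • ^ for start of the string (or of a line, depending on the engine's multiline setting)
  • $ for end of the string (or of a line, depending on the engine's multiline setting)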
  • \b for word boundary
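
The positional characters above behave as in this sketch, which assumes Python's built-in re module. In that module, ^ and $ match at line boundaries only when the re.MULTILINE flag is set; other engines may expose this option differently.

```python
import re

text = "cat concat\ncat nap"

# ^ anchors the match to the start of the string.
at_string_start = re.findall(r"^cat", text)

# With re.MULTILINE, ^ also matches at the start of each line.
at_line_starts = re.findall(r"^cat", text, flags=re.MULTILINE)

# $ anchors the match to the end of the string.
at_string_end = re.findall(r"nap$", text)

# \b restricts the match to whole words, so "concat" is skipped.
whole_words = re.findall(r"\bcat\b", text)
anywhere = re.findall(r"cat", text)

print(len(at_string_start), len(at_line_starts), len(whole_words), len(anywhere))
```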

You can use the pipe character | to match either of two alternative expressions, and you can use parentheses ( ) to group parts of an expression.
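
Alternation and grouping can be sketched as follows, assuming Python's built-in re module. The patterns and input strings are hypothetical examples.

```python
import re

# (cat|dog) matches either word; the trailing s? makes a plural optional.
animal_pattern = re.compile(r"(cat|dog)s?")
animal_match = animal_pattern.search("I have two dogs")
whole_match = animal_match.group(0)   # the full matched text
animal = animal_match.group(1)        # only the text inside the parentheses

# Each pair of parentheses captures one part of the date.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Modified on 2024-12-19")
year, month, day = date_match.groups()

print(whole_match, animal, year, month, day)
```

Grouping serves two purposes here: it limits the scope of the | operator, and it lets you retrieve the matched part separately.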

Many free online resources are available to learn more about Regular Expressions; the following two are particularly informative:

  • http://regexone.com - offers an interactive tutorial for trying out exercises as you learn
  • http://regexstorm.net/tester - provides an interactive testing environment that allows you to write a regular expression and a test input string, and then see what substrings match the typed expression

Last modified: Thursday December 19, 2024
