D'Arcy J.M. Cain

Regular Expressions

Many programs use what are known as Regular Expressions to specify search criteria. Regular Expressions are similar to the wildcards used in Unix and DOS but are more general.

Different programs have slightly different versions of Regular Expressions, so there may be some differences between what is described here and what your application expects. RTFM and caveat emptor. Furthermore, I have tried to keep this as simple as possible. The idea is not to be a full tutorial on Regular Expressions but rather a basic introduction to allow you to start using them. Further reading is suggested to see the full power of Regular Expressions.

First of all, Regular Expression is commonly shortened to RE and we will use that convention here. An RE is a string of characters, some with special meaning. The characters with special meaning can be escaped with a backslash (\) and two backslashes reduce to a single one. A character not in the special set simply matches itself. For example, the string "hello" matches "hello" as a trivial case. Note that it also matches "XhelloX" but not "Hello". The power of REs comes from the use of the special characters. Here are the special characters and their meanings. Examples follow.

. Matches any single character except newline.
* Matches zero or more of the preceding character.
+ Matches one or more of the preceding character.
^ Only special at the start of an RE. This matches the beginning of a string, thus allowing you to "anchor" the match at the beginning.
$ Like '^' but anchors to the end of the string and is only special as the last character of an RE.
[string] This pattern matches any single character within the string. The '^' character has a special meaning if it is the first character in the brackets. It reverses the meaning so that this pattern matches any single character not in the string. You can also have ranges within square brackets as shown in the examples below. The characters '$', '*' and '.' are not special within square brackets.
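
To make the rules above a little more concrete, here is a minimal sketch using Python's re module. Python is my choice for illustration only; it is just one RE dialect and yours may differ in details. The helper name matches() is made up for this example. It uses re.search, which looks for the RE anywhere in the string, since that is the sense of "matches" used here.

```python
import re

def matches(pattern, text):
    """Return True if the RE matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# Ordinary characters simply match themselves, anywhere in the string.
print(matches("hello", "hello"))     # True
print(matches("hello", "XhelloX"))   # True
print(matches("hello", "Hello"))     # False, the case differs

# A backslash removes the special meaning of the next character.
print(matches(r"a\.b", "a.b"))       # True, literal dot
print(matches(r"a\.b", "axb"))       # False
print(matches(r"\\", "C:\\DOS"))     # True, two backslashes match one

# Square brackets match any single character from the set.
print(matches("[abc]", "xbx"))       # True
print(matches("[^abc]", "abc"))      # False, nothing outside the set
print(matches("[0-9]", "abc5"))      # True, a range

# '$', '*' and '.' are ordinary characters inside square brackets.
print(matches("[.*$]", "a*b"))       # True
print(matches("[.*$]", "ab"))        # False
```

The raw string prefix (r"...") is a Python detail that keeps Python itself from interpreting the backslashes before the RE engine sees them.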

Examples

RE Explanation
a.b This RE matches any string which contains the letter 'a' followed by any character followed by the letter 'b', for example "aab", "aaab", "azb" and "aaabbb".
a*b This RE matches any string which contains zero or more instances of the letter 'a' followed by the letter 'b' such as "aaab", "b" and "zb".
a+b This RE matches any string which contains one or more instances of the letter 'a' followed by the letter 'b' such as "aaab", "ab" and "zabz".
^abc This will match any string that starts with the substring "abc" such as "abc" or "abcdef".
^a.b$ This will match any string consisting of exactly three characters with the first being 'a' and the last being 'b'.
^a.*b$ This will match any string that starts with the letter 'a' and ends with the letter 'b' such as "ab", "aaabbb" or "axb".
^abc.*[0-9]$ Matches a string starting with "abc" and ending in a digit.
^[^a-z][a-z]\$ This matches a string where the first character is NOT a lower case letter, the second IS a lower case letter and the third is a dollar sign.
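
If you want to try the examples yourself, the following is a rough sketch that runs each RE from the table through Python's re module. Again, Python is only my choice for illustration and your application's dialect may behave differently. The names EXAMPLES and find_surprises() are invented for this sketch, and the strings in the "should not match" column, along with a few extra matching strings for the later rows, are my own additions to the table.

```python
import re

# Each entry: (RE, strings that should match, strings that should not).
EXAMPLES = [
    ("a.b", ["aab", "aaab", "azb", "aaabbb"], ["ab", "a\nb"]),
    ("a*b", ["aaab", "b", "zb"], ["aaa"]),
    ("a+b", ["aaab", "ab", "zabz"], ["b", "zb"]),
    ("^abc", ["abc", "abcdef"], ["xabc"]),
    ("^a.b$", ["axb", "a-b"], ["ab", "axxb"]),
    ("^a.*b$", ["ab", "aaabbb", "axb"], ["abc", "bab"]),
    ("^abc.*[0-9]$", ["abc1", "abcdef42"], ["abc", "xabc1"]),
    (r"^[^a-z][a-z]\$", ["Xa$", "9z$100"], ["aa$", "XA$", "Xa"]),
]

def find_surprises(examples):
    """Return (RE, string) pairs whose result differs from what is expected."""
    surprises = []
    for pattern, should_match, should_not_match in examples:
        for text in should_match:
            if re.search(pattern, text) is None:
                surprises.append((pattern, text))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                surprises.append((pattern, text))
    return surprises

print(find_surprises(EXAMPLES))  # prints [] when every example behaves as described
```

Note that the last RE has no anchor at the end, because the '$' is escaped, so "9z$100" matches even though more characters follow the dollar sign.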