Distributed Networks

Unix Shell Scripts
Lesson 8 Regular expressions
Objective Regular expressions define patterns

Use regular expressions to define patterns in command parameters

As you use UNIX commands, you will frequently want to refer to a set of files rather than to a single file. To do this, you use a wildcard character. For example, using the asterisk wildcard character, the expression *doc can be used in a command to include all filenames ending with “doc.” Filename wildcards are expanded by the shell and are a simplified relative of the UNIX feature called regular expressions, which commands such as grep apply to text.
Regular expressions allow complex patterns to be expressed using special characters. These patterns are used in many commands to refer to filenames, text within files, or other pieces of data. By learning to use regular expressions, you can concisely express exactly which file or data you want to work on.
The following five basic codes can be used in a regular expression:
  1. * An asterisk represents zero or more characters in a filename pattern; it does not matter which characters. In a regular expression (as used by grep), it means zero or more repetitions of the preceding character, so .* matches any run of characters.
  2. . A period represents exactly one character in a regular expression. This can be any character. (In a filename pattern, a question mark plays this role and a period is literal.)
  3. [ ] Characters within brackets are treated as a list; any single character in the list matches at that position in the string being searched.
  4. ^ A caret ties a pattern to the beginning of a line (used for text within files, not for filenames).
  5. $ A dollar sign ties a pattern to the end of a line (used for text within files, not for filenames).
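The asterisk is the code most often misread, so it may help to check its behavior by hand. The sketch below assumes a POSIX grep and uses made-up sample lines; the variable names are illustrative only. It compares b* (zero or more b characters) with .* (any run of characters).

```shell
# Hypothetical sample lines written to a scratch file.
sample=$(mktemp)
printf '%s\n' 'ac' 'abc' 'abbbc' 'adc' > "$sample"

# In grep, * repeats the character before it: a, then zero or more b, then c.
star_matches=$(grep '^ab*c$' "$sample")

# .* is "any character, repeated": a, then anything, then c.
dotstar_matches=$(grep '^a.*c$' "$sample")

echo "ab*c matches:"; echo "$star_matches"      # ac, abc, abbbc
echo "a.*c matches:"; echo "$dotstar_matches"   # ac, abc, abbbc, adc

rm -f "$sample"
```

The line adc matches only the second pattern, because b* cannot stand in for the letter d.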

The examples below should help you to use each of these special characters.
Suppose you want to refer to all filenames that end with the file extension tgz. The following pattern will list them: ls *.tgz
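A minimal sketch of such a listing, assuming the extension is written as .tgz. The filenames are invented for illustration, and the work happens in a scratch directory so no real files are touched.

```shell
# Create a scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
touch "$workdir/backup.tgz" "$workdir/site.tgz" "$workdir/notes.txt"

# The shell expands *.tgz into the matching names before ls runs.
tgz_files=$(cd "$workdir" && ls *.tgz)
echo "$tgz_files"   # backup.tgz and site.tgz, one per line

rm -rf "$workdir"
```

Because the shell performs the expansion, ls never sees the asterisk itself; it simply receives the two matching filenames as arguments.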
A period is helpful when you know how many characters are part of the pattern you are searching for, but you do not know which ones will be used. For example, you want to search for the word “complimentary,” but you think it may be spelled as complementary or even complementery in some places. This regular expression will find all three words: compl.ment.ry
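To see the period at work, the pattern can be tried with grep. This is only a sketch under the assumption that the text lives in a file; the sample sentences are invented, and a fourth line is included to show a near miss.

```shell
# Hypothetical sample text: three spellings plus one word that should not match.
sample=$(mktemp)
printf '%s\n' \
  'a complimentary breakfast' \
  'a complementary color' \
  'a complementery typo' \
  'a compliant device' > "$sample"

# -c prints the number of matching lines instead of the lines themselves.
match_count=$(grep -c 'compl.ment.ry' "$sample")
echo "$match_count"   # 3

rm -f "$sample"
```

The word compliant is skipped because each period must be filled by exactly one character and the literal text ment must follow the first one.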
Suppose you need to match a filename that starts with the word Chapter, but may begin with an uppercase or lowercase “C.” The following filename pattern will match filenames beginning with either Chapter or chapter: [cC]hapter*
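The bracket pattern can be tried against a few files. The sketch below uses invented filenames in a scratch directory and lets the shell expand the pattern for ls.

```shell
# Create a scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
touch "$workdir/Chapter1.txt" "$workdir/chapter2.txt" "$workdir/Appendix.txt"

# [cC] accepts either case for the first letter; the trailing * accepts the rest of the name.
chapter_files=$(cd "$workdir" && ls [cC]hapter*)
echo "$chapter_files"   # Chapter1.txt and chapter2.txt, one per line

rm -rf "$workdir"
```

Appendix.txt is left out because its first letter is not in the bracket list.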
Or suppose you needed to search a file for all the lines that begin with the word “Thus.” The following pattern would do the trick (with uppercase and lowercase included for good measure): ^[Tt]hus
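As a quick check of the beginning-of-line anchor, the pattern can be passed to grep. The sample sentences here are made up; the third one contains the word but not at the start of the line.

```shell
# Hypothetical sample text.
sample=$(mktemp)
printf '%s\n' \
  'Thus the proof is complete.' \
  'thus we continue.' \
  'And thus it ends.' > "$sample"

# ^ requires the match to start at the first character of the line.
thus_lines=$(grep '^[Tt]hus' "$sample")
echo "$thus_lines"   # prints the first two lines only

rm -f "$sample"
```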
A dollar sign ties a pattern to the end of a line of text. For example, if you are searching a list of records with a state code at the end of each line, you could use this regular expression to search for lines ending with the Vermont state code, VT: VT$
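A small sketch of the end-of-line anchor, assuming a record file whose last field is the state code. The names and addresses are invented, and one record deliberately contains VT in the middle of the line.

```shell
# Hypothetical records with the state code as the final field.
records=$(mktemp)
printf '%s\n' \
  'Ann Lee, Burlington, VT' \
  'Bob Roy, VT Route 100, Albany, NY' \
  'Cal Dow, Montpelier, VT' > "$records"

# $ requires VT to be the last two characters on the line.
vt_lines=$(grep 'VT$' "$records")
echo "$vt_lines"   # prints the Ann Lee and Cal Dow records

rm -f "$records"
```

Note that a trailing space or carriage return after the state code would prevent the match, since VT would no longer be the very end of the line.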
Often you will want to “escape” a special character within a regular expression to indicate that it should be interpreted literally and not used as a special pattern indicator. For example, to search for the character “*”, you must include \* in your search expression. The backslash indicates that the following character, the asterisk, should be interpreted literally, and not as the wildcard character that represents zero or more characters.
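The escape can be sketched with grep as follows. The sample lines are invented. Single quotes are used so that the shell passes the backslash and asterisk to grep untouched; the -F option, which treats the whole pattern as literal text, is shown as an alternative.

```shell
# Hypothetical sample text: only the first line contains an asterisk.
sample=$(mktemp)
printf '%s\n' 'price: 5 * 3' 'price: 15' > "$sample"

# \* matches a literal asterisk instead of acting as a repetition operator.
star_lines=$(grep '\*' "$sample")

# -F disables all special characters, so no backslash is needed.
fixed_lines=$(grep -F '*' "$sample")

echo "$star_lines"    # price: 5 * 3
echo "$fixed_lines"   # price: 5 * 3

rm -f "$sample"
```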
Obviously, regular expressions can become quite complex and even difficult to interpret at first. But they are used in many situations, from simple ls and grep commands to scripting with awk, Perl, and even C programming.

Regular Expression - Exercise
Click the Exercise link below to practice what you have learned in the UNIX Lab.
In the next lesson, we wrap up this module.

Regular Expression - Quiz
Click on the Quiz link below to test yourself on using regular expressions.