|Lesson 8||Regular expressions |
|Objective|| Regular expressions define patterns |
Use regular expressions to define patterns in command parameters
As you use UNIX commands, you will frequently want to refer to a set of files rather than to a single file.
To do this, you use a wildcard character. For example, using the asterisk wildcard character, the expression *doc can be used in a command to include
all filenames ending with doc. Shell wildcards like this are a simpler relative of the UNIX feature called regular expressions, and the two share several special characters.
Regular expressions allow complex patterns to be expressed using special characters. These patterns are used in many commands to
refer to filenames, text within files, or other pieces of data. By learning to use regular expressions, you can concisely express
exactly which file or data you want to work on.
The following five basic codes can be used in most regular expressions:
* An asterisk in a shell filename pattern represents zero or more characters; it does not matter which characters. In a regular expression used to search text, it means zero or more repetitions of the preceding character, so .* matches any run of characters.
. A period represents exactly one character. This can be any character. (In a shell filename pattern, the question mark plays this role and a period is taken literally.)
[ ] Characters within brackets are treated as a list; any single character in the list can match at that position in the
string that is being searched
^ A caret ties a pattern to the beginning of a line (used for text within files, not for filenames)
$ A dollar sign ties a pattern to the end of a line (used for text within files, not for filenames)
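To see how these codes behave when searching text, you can try them in a short script. The sketch below uses Python's re module, which is an assumption of this example and not part of the lesson; the helper name matches and the sample strings are made up for illustration. Note that in this text-search setting the asterisk repeats the character before it.

```python
import re

def matches(pattern, text):
    """Return True if the regular expression is found anywhere in text."""
    return re.search(pattern, text) is not None

# *  : zero or more repetitions of the preceding character
print(matches(r"ab*c", "ac"), matches(r"ab*c", "abbbc"))
# .  : exactly one character, whichever it is
print(matches(r"a.c", "abc"), matches(r"a.c", "ac"))
# [] : any single character from the list
print(matches(r"gr[ae]y", "grey"), matches(r"gr[ae]y", "groy"))
# ^  : pattern must sit at the beginning of the line
print(matches(r"^The", "The end"), matches(r"^The", "At The end"))
# $  : pattern must sit at the end of the line
print(matches(r"end$", "The end"), matches(r"end$", "The end."))
```

Each line prints a match followed by a non-match, except the first, where both strings match because zero occurrences of b are allowed.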
The examples below should help you to use each of these special characters.
Suppose you want to refer to all filenames that end with the file extension
The following pattern will list them:
A period is helpful when you know how many characters are part of the pattern you are searching for, but you do not know which ones will be used.
For example, suppose you want to search for the word complimentary, but you think it may be spelled as complementary or even complementery in some places.
This regular expression will find all three words: compl.ment.ry
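You can check this pattern against the three spellings with a few lines of code. This is a minimal sketch using Python's re module (an assumption of this example); the word list, including the extra non-matching word, is made up.

```python
import re

# Each period stands in for exactly one unknown letter.
pattern = re.compile(r"compl.ment.ry")

words = ["complimentary", "complementary", "complementery", "compliment"]
found = [w for w in words if pattern.search(w)]

print(found)  # the three spellings; "compliment" lacks the trailing "ry"
```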
Suppose you need to match a filename that starts with the word Chapter, but may begin with an upper or lowercase C.
The following pattern, used as a shell filename pattern, will match filenames beginning with either Chapter or chapter: [cC]hapter*
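The filename matching described here follows shell wildcard rules, where the trailing asterisk stands for any remaining characters. As a rough sketch, Python's fnmatch module applies the same style of matching to a list of names; the filenames below are hypothetical.

```python
import fnmatch

# Hypothetical directory listing.
names = ["Chapter1.txt", "chapter2.txt", "chapters", "Appendix.txt", "MyChapter"]

# fnmatchcase compares the whole name against the pattern, case-sensitively.
selected = [n for n in names if fnmatch.fnmatchcase(n, "[cC]hapter*")]

print(selected)
```

MyChapter is left out because the pattern must match from the first character of the name.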
Or suppose you needed to search a file for all the lines that begin with the word Thus.
The following pattern would do the trick (with uppercase and lowercase included for good measure): ^[Tt]hus
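To try this on several lines at once, you could filter them one by one. The sketch below assumes Python's re module and uses made-up sample text.

```python
import re

text = """Thus the proof is complete.
We conclude thus.
thus spoke the author.
Thusly also matches, since the line starts with Thus.
"""

# Keep only the lines where the pattern is anchored at the start.
hits = [line for line in text.splitlines() if re.search(r"^[Tt]hus", line)]

for line in hits:
    print(line)
```

The second line is skipped because thus appears at the end, not the beginning. The fourth line shows that the pattern does not require Thus to be a whole word.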
A dollar sign ties a pattern to the end of a line of text. For example, if you are searching a list of records with a state code at the end of each line,
you could use this regular expression to search for lines ending with the Vermont state code, VT:
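One pattern that fits this description is VT$, which is an assumption here based on the rule for the dollar sign. The sketch below tries it with Python's re module (also an assumption); the records are invented sample data.

```python
import re

# Invented records with a state code at the end of each line.
records = [
    "Smith, 12 Main St, Burlington, VT",
    "Jones, 9 Elm St, Richmond, VA",
    "VT Holdings, 4 Oak Ave, Albany, NY",
]

# VT must be the last two characters of the line to match.
vermont = [r for r in records if re.search(r"VT$", r)]

print(vermont)
```

The third record contains VT, but not at the end of the line, so it is not selected.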
Often you will want to escape a special character within a regular expression to indicate that it should be
interpreted literally and not used as a special pattern indicator.
For example, to search for the character *, you
must include \* in your search expression. The backslash indicates that the following character, the asterisk, should be
interpreted literally, and not as the wildcard character that represents zero or more characters.
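The effect of the backslash can be seen in a short experiment. This sketch assumes Python's re module; the sample strings are made up. The function re.escape is a convenience that adds the backslash for you.

```python
import re

line = "Total: 5 * 3"

# \* matches a literal asterisk instead of acting as a repetition code.
star = re.search(r"\*", line)
position = star.start()
print(line[position])

# Count every literal asterisk in a string.
count = len(re.findall(r"\*", "a*b**c"))
print(count)

# re.escape builds the escaped form of a special character.
escaped = re.escape("*")
print(escaped)
```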
Obviously, regular expressions can become quite complex and even difficult to interpret at first. But they are used in many
situations, from simple grep commands to scripting with awk, Perl, and even C programming.
Regular Expression - Exercise
Click the Exercise link below to practice what you have learned in the UNIX Lab.
Regular Expression - Exercise
In the next lesson, we wrap up this module.