Using grep with regular expressions
This module covers regular expression syntax, which is used for performing wildcard searches of the contents of files.
If you have used a word processing application on another operating system, you are probably familiar with the concept of regular expressions. Many UNIX programs that manipulate or search for text can take advantage of regular expressions.
This module covers regular expressions as used by grep, a program commonly used to search the contents of files.
By the end of this module, you will be able to:
- Describe what regular expressions are.
- Identify how the use of quotes affects the interpretation of regular expressions.
- Create regular expressions using various metacharacters.
- Use the backslash (\) to disable the special meaning of a character.
In the next lesson, you will learn about regular expressions.
For users already familiar with the concept of regular expression metacharacters, this section may be bypassed. However, this preliminary material is crucial to understanding the variety of ways in which grep, sed, and awk are used to display and manipulate data.
What is a regular expression?
A regular expression is just a pattern of characters used to match the same characters in a search. In many programs, such as sed and awk, a regular expression is enclosed in forward slashes: /love/ is a regular expression delimited by forward slashes, and the pattern love will be matched any time the same pattern is found in the line being searched. With grep, the pattern is instead passed as a command-line argument, usually in quotes, without the slashes. What makes regular expressions interesting is that they can be controlled by special metacharacters.
If you are new to the idea of regular expressions, let us look at an example that will help you understand what this whole concept is about.
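As a minimal sketch of the idea, the commands below search a small sample file for the pattern love, first as a plain pattern and then with the ^ metacharacter, which anchors the match to the start of the line. The file contents and the variable names (sample, matches, count, anchored) are made up for illustration.

```shell
# Create a small sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' \
  'I love UNIX' \
  'gloves are warm' \
  'Nothing here' \
  'Love is capitalized' \
  'love at the start' > "$sample"

# Print every line that contains the pattern "love".
# Single quotes keep the shell from interpreting the pattern.
matches=$(grep 'love' "$sample")
printf '%s\n' "$matches"

# Count the matching lines instead of printing them.
# "gloves" matches because it contains "love"; "Love" does not,
# because grep is case-sensitive by default.
count=$(grep -c 'love' "$sample")
printf 'lines containing love: %s\n' "$count"

# The ^ metacharacter restricts the match to the start of a line.
anchored=$(grep '^love' "$sample")
printf '%s\n' "$anchored"

rm -f "$sample"
```

Note that the pattern matches anywhere in the line, including inside a longer word, unless a metacharacter narrows it.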
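You must understand what the following predefined character classes do. They are Perl-style shorthands: GNU grep accepts \w, \W, \s, \S, \b, and \B as extensions, while \d and \D require grep -P. The classes are: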
. Any character (may or may not match line terminators, depending on the dot-all flag, written (?s) in Perl and Java)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
\b A word boundary
\B A non-word boundary
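A few of these classes can be tried directly with grep. The sketch below assumes GNU grep, which accepts \b and \s as extensions. For digits it uses the bracket expression [0-9] rather than \d, since \d is only available with grep -P. The sample lines and variable names are hypothetical.

```shell
# Sample input (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' \
  'cat 42' \
  'concatenate' \
  'the cat sat' \
  'no_digits' > "$sample"

# \b matches at a word boundary, so this finds "cat" only as a whole word.
# "concatenate" is skipped because its "cat" sits inside a longer word.
whole_word=$(grep -c '\bcat\b' "$sample")
printf 'lines with the word cat: %s\n' "$whole_word"

# [0-9] is the bracket-expression equivalent of \d.
with_digit=$(grep '[0-9]' "$sample")
printf 'lines with a digit: %s\n' "$with_digit"

# \s matches a whitespace character (GNU grep extension).
with_space=$(grep -c '\s' "$sample")
printf 'lines with whitespace: %s\n' "$with_space"

rm -f "$sample"
```

The patterns are single-quoted so that the shell passes the backslashes through to grep unchanged.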