grep
Prepared by : Mahesh Yadav Gaddam Editor: Pavan Devarakonda
The grep command
This article on the grep command may be helpful for system administrators (Middleware Admins, DBA, WLA, WAS Admin), developers, and anyone else who wants to learn more about one of the handiest UNIX commands. The name grep comes from the search command of ed (the UNIX text editor): it stands for “global regular expression print”, or g/re/p.
Sometimes, while working with the powerful grep command, it feels like a 'Trishul' in the hands of a system administrator. Let's build a better understanding of the grep command in this article.
This was such a useful command that it was written as a standalone utility
Two other variants, egrep and fgrep, complete the grep family
grep is the answer to the moments where you know you want the file that contains a specific phrase but you can’t remember its name
The grep command Family Differences
grep - uses regular expressions for pattern matching
fgrep - fixed-string grep (sometimes read as "fast grep"); fgrep searches one or more files for lines that contain the specified text string. Like the rest of the grep family, it is easy to handle in scripts because the exit status is 0 if any lines match, 1 if none match, and 2 for errors. fgrep is often faster than a normal grep search, but less flexible: it can only find fixed text and has no support for regular expressions.
egrep - extended grep, uses a more powerful set of regular expressions. Traditional egrep does not support backreferences (some implementations, such as GNU grep, accept them as an extension), and it was historically often the fastest member of the grep family
agrep – approximate grep; not standard
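As a rough sketch of the difference between regular-expression and fixed-string matching, the commands below run the same pattern three ways. They use `grep -F` and `grep -E`, the POSIX spellings of fgrep and egrep (recent GNU grep releases print an obsolescence warning for the older names). The temporary file and its three lines are invented for illustration.

```shell
# Hypothetical sample data: three lines, one containing a literal "a.c".
sample=$(mktemp)
printf 'abc\na.c\naxc\n' > "$sample"

# grep: "." means "any single character", so all three lines match.
regex_hits=$(grep -c 'a.c' "$sample")

# grep -F (fgrep): the pattern is fixed text, so only the line "a.c" matches.
fixed_hits=$(grep -F -c 'a.c' "$sample")

# grep -E (egrep): extended syntax such as alternation is available.
ere_hits=$(grep -E -c 'ab|ax' "$sample")

echo "regex=$regex_hits fixed=$fixed_hits ere=$ere_hits"   # regex=3 fixed=1 ere=2
rm -f "$sample"
```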
Syntax differences
Regular expression concepts we have seen so far are common to grep and egrep.
grep and egrep have slightly different syntax
grep: BREs (basic regular expressions)
egrep: EREs (extended regular expressions; the extra features are discussed below)
Major syntax differences:
grep: \( and \), \{ and \}
egrep: ( and ), { and }
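The syntax difference can be sketched with one pattern written both ways. The test string is made up, and `grep -E` stands in for egrep.

```shell
# Both patterns look for "ab" twice at the start of the line, followed by "x".
line='ababx'

# BRE (grep): grouping and interval operators need backslashes.
bre_match=$(printf '%s\n' "$line" | grep -c '^\(ab\)\{2\}x')

# ERE (grep -E, i.e. egrep): the same operators are written bare.
ere_match=$(printf '%s\n' "$line" | grep -E -c '^(ab){2}x')

# In a BRE, bare parentheses are ordinary characters and match themselves.
bre_literal=$(printf '%s\n' '(ab)' | grep -c '(ab)')

echo "bre=$bre_match ere=$ere_match literal=$bre_literal"   # bre=1 ere=1 literal=1
```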
Protecting Regex Meta characters
Since many of the special characters used in regexes also have special meaning to the shell, it’s a good idea to get in the habit of single-quoting your regexes
This will protect any special characters from being operated on by the shell
If you habitually do it, you won’t have to worry about when it is necessary
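To see what the shell does to an unquoted pattern, here is a small sketch run in a scratch directory. The directory, the file name a.txt and its contents are assumptions made for the demonstration.

```shell
# Hypothetical scratch directory holding one file whose name starts with "a".
scratch=$(mktemp -d)
printf 'apple\nbanana\n' > "$scratch/a.txt"

# Unquoted: the shell expands a* to the matching file name before the command runs.
unquoted=$(cd "$scratch" && printf '%s' a*)

# Single-quoted: the pattern reaches the command exactly as typed.
quoted=$(cd "$scratch" && printf '%s' 'a*')

echo "unquoted: $unquoted"   # unquoted: a.txt
echo "quoted:   $quoted"     # quoted:   a*

# With quotes, grep receives the regex itself; only "banana" contains a, zero or more n, a.
hits=$(grep -c 'an*a' "$scratch/a.txt")
echo "hits: $hits"           # hits: 1
rm -rf "$scratch"
```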
Escaping Special Characters
Even though we are single-quoting our regexes so the shell won’t interpret the special characters, some characters are special to grep itself (e.g. * and .)
To get literal characters, we escape the character with a \ (backslash)
Suppose we want to search for the character sequence a*b*
Unless we do something special, this will match zero or more ‘a’s followed by zero or more ‘b’s, not what we want
a\*b\* will fix this - now the asterisks are treated as regular characters
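A quick way to check this is to count matching lines with and without the backslashes. The sample lines are invented; the `grep -F` line is an alternative that avoids escaping altogether.

```shell
sample=$(mktemp)
printf 'a*b*\naabb\nxyz\n' > "$sample"

# Unescaped: "zero or more a, then zero or more b" also matches the empty
# string, so every line counts as a match.
unescaped=$(grep -c 'a*b*' "$sample")

# Escaped: only the line containing the literal text a*b* matches.
escaped=$(grep -c 'a\*b\*' "$sample")

# grep -F (fgrep) treats the whole pattern as literal text, no escaping needed.
fixed=$(grep -F -c 'a*b*' "$sample")

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"   # unescaped=3 escaped=1 fixed=1
rm -f "$sample"
```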
egrep: Alternation
Regex also provides an alternation character | for matching one or another sub-expression
(T|Fl)an will match ‘Tan’ or ‘Flan’
^(From|Subject): will match the From and Subject lines of a typical email message
It matches the beginning of a line followed by either the characters ‘From’ or ‘Subject’ followed by a ‘:’
Subexpressions are used to limit the scope of the alternation
At(ten|nine)tion then matches “Attention” or “Atninetion”, not “Atten” or “ninetion” as would happen without the parentheses - Atten|ninetion
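The patterns above can be tried against a small made-up file; `grep -E` is used in place of egrep, and the line counts make the effect of the parentheses visible.

```shell
sample=$(mktemp)
printf '%s\n' 'From: alice' 'Subject: hello' 'To: bob' \
    'Tan' 'Flan' 'Attention' 'Atten' 'ninetion' > "$sample"

# Header lines that start with From: or Subject:
headers=$(grep -E -c '^(From|Subject):' "$sample")

# Tan or Flan
tan_flan=$(grep -E -c '(T|Fl)an' "$sample")

# Grouped: the alternation is limited to the middle of the word.
grouped=$(grep -E -c 'At(ten|nine)tion' "$sample")

# Ungrouped: the alternation splits the whole pattern into "Atten" or "ninetion".
ungrouped=$(grep -E -c 'Atten|ninetion' "$sample")

echo "headers=$headers tan_flan=$tan_flan grouped=$grouped ungrouped=$ungrouped"
# headers=2 tan_flan=2 grouped=1 ungrouped=3
rm -f "$sample"
```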
egrep: Repetition Shorthands
The * (star) has already been seen to specify zero or more occurrences of the immediately preceding character
+ (plus) means “one or more”
abc+d will match ‘abcd’, ‘abccd’, or ‘abccccccd’ but will not match ‘abd’
Equivalent to {1,}
The ‘?’ (question mark) specifies an optional character, the single character that immediately precedes it
July? will match ‘Jul’ or ‘July’
Equivalent to {0,1}
Also equivalent to (Jul|July)
The *, ?, and + are known as quantifiers because they specify the quantity of a match
Quantifiers can also be used with subexpressions
(a*c)+ will match ‘c’, ‘ac’, ‘aac’ or ‘aacaacac’ but will not match ‘a’ or a blank line
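The quantifier examples can be verified with a small helper. The function name `matches` is hypothetical, introduced only for this sketch; it wraps `grep -E -q`, which reports a match through its exit status.

```shell
# matches PATTERN STRING: print "yes" if the ERE matches somewhere in STRING, else "no".
matches() {
    if printf '%s\n' "$2" | grep -E -q -- "$1"; then
        echo yes
    else
        echo no
    fi
}

matches 'abc+d'    'abccd'      # yes
matches 'abc{1,}d' 'abccd'      # yes (same meaning as abc+d)
matches 'abc+d'    'abd'        # no
matches 'July?'    'Jul'        # yes
matches '(a*c)+'   'aacaacac'   # yes
matches '(a*c)+'   'a'          # no
matches '(a*c)+'   ''           # no (blank line)
```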
grep: Back-references
Sometimes it is handy to be able to refer to a match that was made earlier in a regex
This is done using backreferences
\n is the backreference specifier, where n is a digit
It matches the same text that the nth parenthesized subexpression matched
For example, to find if the first word of a line is the same as the last:
^\([[:alpha:]]\{1,\}\) .* \1$
The \([[:alpha:]]\{1,\}\) matches 1 or more letters
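Here is a short sketch that runs the backreference pattern over a few invented lines, using plain grep since the pattern is a BRE.

```shell
pattern='^\([[:alpha:]]\{1,\}\) .* \1$'

sample=$(mktemp)
printf '%s\n' 'echo one two echo' 'alpha beta gamma' \
    'hello big hello' 'hello hello' > "$sample"

# Lines whose first and last words are identical.
same=$(grep "$pattern" "$sample")
printf '%s\n' "$same"
# echo one two echo
# hello big hello

same_count=$(grep -c "$pattern" "$sample")
rm -f "$sample"
```

As written, the pattern expects a space after the first word and another before the last one, so a two-word line such as "hello hello" is not reported.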
Practical Regex Examples
Variable names in C
[a-zA-Z_][a-zA-Z_0-9]*
You may need to search for a dollar amount with optional cents. The expression could be as follows:
\$[0-9]+(\.[0-9][0-9])?
Similarly, you may need to search for a time of day:
(1[012]|[1-9]):[0-5][0-9] (am|pm)
Sometimes you need to match HTML tags such as headers <h1> <H1> <h2> …
<[hH][1-4]>
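These expressions can be spot-checked with a helper that tests whole strings. The function name `check` and the test values are assumptions for this sketch; `-x` is added so that the entire string has to match, which the bare expressions above do not require.

```shell
# check PATTERN STRING: print "match" if the ERE matches the whole STRING.
check() {
    if printf '%s\n' "$2" | grep -E -x -q -- "$1"; then
        echo match
    else
        echo no-match
    fi
}

check '[a-zA-Z_][a-zA-Z_0-9]*'            'buf_len2'   # match
check '[a-zA-Z_][a-zA-Z_0-9]*'            '2fast'      # no-match
check '\$[0-9]+(\.[0-9][0-9])?'           '$19.99'     # match
check '\$[0-9]+(\.[0-9][0-9])?'           '$19.9'      # no-match
check '(1[012]|[1-9]):[0-5][0-9] (am|pm)' '12:45 pm'   # match
check '(1[012]|[1-9]):[0-5][0-9] (am|pm)' '13:00 pm'   # no-match
check '<[hH][1-4]>'                       '<H2>'       # match
check '<[hH][1-4]>'                       '<h5>'       # no-match
```

Without `-x`, grep reports a line as soon as any part of it matches, so "13:00 pm" would be accepted because it contains "3:00 pm".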
The grep Family
Syntax
grep [-hilnv] [-e expression] [filename]
egrep [-hilnv] [-e expression] [-f filename] [expression] [filename]
fgrep [-hilnxv] [-e string] [-f filename] [string] [filename]
-h Do not display filenames
-i Ignore case
-l List only filenames containing matching lines
-n Precede each matching line with its line number
-v Negate matches (print only the lines that do not match)
-x exact Match whole line only (fgrep only in older implementations; POSIX grep accepts it as well)
-e expression Specify expression as option
-f filename Take the regular expression (egrep) or a list of strings (fgrep) from filename
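The options can be tried out on a pair of throwaway files. The file names app.log, other.log and patterns.txt and their contents are made up for this sketch; `grep -F` stands in for fgrep.

```shell
work=$(mktemp -d)
printf 'Error: disk full\nall good\nerror: retry\n' > "$work/app.log"
printf 'nothing here\n' > "$work/other.log"
printf 'disk\nretry\n' > "$work/patterns.txt"

# -i ignores case, -n prefixes each matching line with its line number.
numbered=$(grep -i -n 'error' "$work/app.log")
printf '%s\n' "$numbered"
# 1:Error: disk full
# 3:error: retry

# -v prints the lines that do NOT match.
inverted=$(grep -i -v 'error' "$work/app.log")                   # all good

# -l lists only the names of files that contain a match.
listed=$(cd "$work" && grep -l 'error' app.log other.log)        # app.log

# -h drops the file name prefix when several files are searched.
no_names=$(cd "$work" && grep -h 'good' app.log other.log)       # all good

# -x accepts a line only if the whole line matches.
whole=$(grep -x -c 'all good' "$work/app.log")                   # 1

# -f reads the patterns from a file; -F treats them as fixed strings.
from_file=$(grep -F -c -f "$work/patterns.txt" "$work/app.log")  # 2

rm -rf "$work"
```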
The grep Examples: Fun with the Dictionary
/usr/dict/words contains about 25,000 words (on many current systems the word list lives at /usr/share/dict/words instead)
egrep hh /usr/dict/words
beachhead
highhanded
withheld
withhold
egrep as a simple spelling checker: Specify plausible alternatives you know
egrep "n(ie|ei)ther" /usr/dict/words
neither
How many words have 3 a’s one letter apart?
egrep a.a.a /usr/dict/words | wc -l
54
egrep u.u.u /usr/dict/words
cumulus
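If the dictionary file is not installed, the same searches can be tried on a tiny stand-in list. The seven words below are an assumption chosen for this sketch, so the counts differ from those of the real dictionary; `grep -E` replaces egrep.

```shell
# Hypothetical stand-in for /usr/dict/words, small enough to check by eye.
words=$(mktemp)
printf '%s\n' beachhead withhold neither banana bahamas cumulus apple > "$words"

# Words containing a double h.
hh_words=$(grep -E 'hh' "$words")
printf '%s\n' "$hh_words"
# beachhead
# withhold

# Spelling check: which of the two plausible spellings is in the list?
spelling=$(grep -E 'n(ie|ei)ther' "$words")            # neither

# How many words have three a's, each one letter apart? (banana, bahamas)
triple_a=$(grep -E 'a.a.a' "$words" | wc -l | tr -d ' ')
echo "$triple_a"                                       # 2

# Same question for u.
triple_u=$(grep -E 'u.u.u' "$words")                   # cumulus

rm -f "$words"
```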
Other Notes
Use /dev/null as an extra file name
With more than one file on the command line, grep prints the name of the file in front of each matching line
grep test bigfile
This is a test.
grep test /dev/null bigfile
bigfile:This is a test.
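The two runs above can be reproduced in a scratch directory; the file bigfile is recreated here with a single line for the purpose of the sketch.

```shell
work=$(mktemp -d)
printf 'This is a test.\n' > "$work/bigfile"

# One file: grep prints only the matching line.
one_file=$(cd "$work" && grep test bigfile)
echo "$one_file"      # This is a test.

# Two files (one of them /dev/null, which never matches): grep adds the file name.
with_null=$(cd "$work" && grep test /dev/null bigfile)
echo "$with_null"     # bigfile:This is a test.

rm -rf "$work"
```

The trick is most useful when a command hands grep one file at a time, for example from a loop, and the output should still say which file each line came from.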
The return code of grep is useful in scripts
grep fred filename > /dev/null && rm filename
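The exit statuses can be inspected directly, as in the sketch below. The file and its contents are invented, `-q` suppresses the output in place of the redirection to /dev/null, and the `&& ... || ...` form is just one way to capture the status without aborting a script that runs under `set -e`.

```shell
sample=$(mktemp)
printf 'fred was here\n' > "$sample"

# 0: at least one line matched.
grep -q 'fred' "$sample" && st_match=0 || st_match=$?

# 1: no line matched.
grep -q 'wilma' "$sample" && st_nomatch=0 || st_nomatch=$?

# Greater than 1: an error occurred (here, a file that does not exist).
grep -q 'fred' "$sample.no-such-file" 2>/dev/null && st_error=0 || st_error=$?

echo "match=$st_match no-match=$st_nomatch error=$st_error"

# The && one-liner above, spelled out as an if statement.
if grep -q 'fred' "$sample"; then
    rm -f "$sample"
fi
```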