Foundations of Computer Science, Department of Computer Science
egrep is used to select lines from a
file that contain strings matching a given regular expression. So,
you may use egrep to test your answer to a regular
expression exercise by first creating a file containing some subset of
all the strings defined over a given alphabet, one string per line,
then using egrep to select the lines that match your
regular expression. Looking through the resulting list will help you
decide whether your regular expression is correct. If you see strings
that are not part of the language you are trying to define, or if
strings that belong to the language are missing, your regular
expression is not correct.
Take some time now to read the Unix man page for
egrep.
To help you get started, here is a file named sigmaStar.cpp
that produces all strings with length 8 or less from the language
{a,b}*, ordered by length. You can compile it by doing:
    % g++ -o sigmaStar sigmaStar.cpp
and run it by doing:
    % sigmaStar | more

    a
    b
    aa
    ab
    ba
    bb
    aaa
    aab
    aba
    abb
    baa
    bab
    bba
    bbb
    aaaa
    aaab
and so on.
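If you want to see how such a generator might work, here is a minimal sketch. It is not the course's sigmaStar.cpp; the function name sigmaStar and its maxLen parameter are assumptions made for illustration. It builds the strings of each length by extending every string of the previous length with a and then with b, which gives the same ordering as the output above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not the course's sigmaStar.cpp): returns every string
// over {a,b} of length 0..maxLen, shortest first, and in alphabetical order
// within each length.
std::vector<std::string> sigmaStar(int maxLen) {
    assert(maxLen >= 0);
    std::vector<std::string> all{""};   // length 0: the empty string
    std::size_t levelStart = 0;         // index where the previous length begins
    for (int len = 1; len <= maxLen; ++len) {
        const std::size_t levelEnd = all.size();
        // Extend each string of length len-1 by 'a' and then by 'b'.
        for (std::size_t i = levelStart; i < levelEnd; ++i) {
            all.push_back(all[i] + 'a');
            all.push_back(all[i] + 'b');
        }
        levelStart = levelEnd;
    }
    return all;
}
```

Calling sigmaStar(8) and printing each element on its own line would give one string per line, starting with an empty line for the empty string.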
Now let's try to specify a regular expression for the set of all
strings containing an even number of symbols. In regular set
notation, this would be {aa,ab,ba,bb}*. In the regular
expression syntax of Unix, in particular, of egrep, this
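would be (aa|ab|ba|bb)*. However, egrep selects a line whenever some
substring of it matches, and every line contains a substring with an
even number of symbols (the empty string, if nothing else), so this
expression by itself selects every line. We want to select only the
lines for which the entire string is made up of an even number of symbols.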
We can specify this by telling egrep to apply the regular
expression to the entire line by adding the beginning of line
character, ^, and end of line character, $, to our regular expression
to get ^(aa|ab|ba|bb)*$.
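The effect of the anchors can also be seen outside of egrep. The sketch below uses std::regex_search from the C++ standard library, which, like egrep, succeeds if any substring matches. It assumes std::regex's default ECMAScript grammar, which for these simple patterns is written the same way as in egrep; the helper name selectsLine is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if some substring of `line` matches `pattern`,
// which mirrors how egrep decides whether to select a line.
// Assumption: std::regex's default (ECMAScript) grammar, which reads these
// simple patterns the same way egrep does.
bool selectsLine(const std::string& pattern, const std::string& line) {
    return std::regex_search(line, std::regex(pattern));
}
```

With this helper, selectsLine("(aa|ab|ba|bb)*", "aba") is true because the empty substring matches, while selectsLine("^(aa|ab|ba|bb)*$", "aba") is false because the whole line has odd length.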
Let's try it. We can run sigmaStar and redirect the result
into a file and apply egrep to it. Here are the steps and the result:
    % sigmaStar > output
    % egrep '^(aa|ab|ba|bb)*$' output | more

    aa
    ab
    ba
    bb
    aaaa
    aaab
    aaba
    aabb
    abaa
    abab
    abba
    abbb
    baaa
    baab
    baba
    babb
    bbaa
    bbab
    bbba
    bbbb
    aaaaaa
    aaaaab
and so on. Notice that the first one is a blank line, representing the null string, which does have even length.
Now try for the odd-length strings. Just start with the regular expression for even-length strings and append (a|b), which matches one more symbol from our alphabet.
    % egrep '^(aa|ab|ba|bb)*(a|b)$' output | more
    a
    b
    aaa
    aab
    aba
    abb
    baa
    bab
    bba
    bbb
    aaaaa
    aaaab
and so on.
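Looking through the list by eye works for short outputs. As a rough cross-check, the same comparison can be automated. The sketch below is only an illustration under stated assumptions: the function name disagreements and its parameters are hypothetical, and it uses std::regex with its default ECMAScript grammar, which reads these simple patterns the same way egrep does. It tries every string over {a,b} up to a given length and collects those on which the pattern and the intended length parity disagree.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the strings over {a,b} of length <= maxLen on
// which the pattern and the intended parity disagree. wantOdd selects the
// target language: odd-length strings if true, even-length strings if false.
std::vector<std::string> disagreements(const std::string& pattern,
                                       bool wantOdd, std::size_t maxLen) {
    const std::regex re(pattern);
    std::vector<std::string> bad;
    std::vector<std::string> level{""};   // all strings of the current length
    for (std::size_t len = 0; len <= maxLen; ++len) {
        std::vector<std::string> next;
        for (const std::string& s : level) {
            // Like egrep, succeed if any substring of s matches.
            const bool matched = std::regex_search(s, re);
            const bool inLanguage = (s.size() % 2 == 1) == wantOdd;
            if (matched != inLanguage) bad.push_back(s);
            next.push_back(s + 'a');
            next.push_back(s + 'b');
        }
        level = std::move(next);
    }
    return bad;
}
```

An empty result for disagreements("^(aa|ab|ba|bb)*(a|b)$", true, 8) means the odd-length expression agrees with the intended language on every string of length 8 or less. It does not prove the expression correct for longer strings.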
You can skip the step of creating a file with all the strings by
piping the output of sigmaStar directly into
egrep:
    % sigmaStar | egrep '^(aa|ab|ba|bb)*$' | more

    aa
    ab
    ba
    bb
    aaaa
    aaab
and so on.