Demo 2.ppt

CS 497C – Introduction to UNIX
Lecture 31: Filters Using Regular
Expressions – grep and sed
Chin-Chih Chang
chang@cs.twsu.edu
Substitution
• sed’s strongest feature is substitution,
achieved with its s (substitute) command.
• It has the following format:
[address]s/expression1/string2/flag
• This is how you replace the | with a colon:
$ sed 's/|/:/g' emp.lst | head -2
• To check whether substitution is performed,
you can use the cmp command as follows:
$ sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l
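A minimal sketch of the substitution and the cmp check. The two records below are hypothetical stand-ins for emp.lst, whose real contents are not shown here:

```shell
# Build a small stand-in for emp.lst (hypothetical records, not the course file).
workdir=$(mktemp -d)
printf '%s\n' '2233|a.k. shukla|g.m.' '9876|jai sharma|director' > "$workdir/emp.lst"

# Replace every | with a colon; the g flag covers all occurrences on each line.
converted=$(sed 's/|/:/g' "$workdir/emp.lst")
echo "$converted"

# cmp -l prints one line per differing byte, so wc -l counts the changed bytes.
# tr strips the padding that some wc implementations add.
changed=$(sed 's/|/:/g' "$workdir/emp.lst" | cmp -l - "$workdir/emp.lst" | wc -l | tr -d ' ')
echo "bytes changed: $changed"

rm -r "$workdir"
```

A count of zero would mean that sed found nothing to substitute.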
Substitution
• You can perform multiple substitutions with
one invocation of sed by pressing [Enter] at
the end of each instruction, and then close
the quote at the end:
$ sed 's/<I>/<EM>/g
> s/<B>/<STRONG>/g' form.html
• You can remove the spaces before each | as below
(here ^ acts as the delimiter of the s command):
$ sed 's^ *|^|^g' emp.lst | head -2
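The commands on this slide can be tried without form.html or emp.lst. The sketch below uses made-up input strings; the -e option is not covered on the slide and is shown only as an alternative way to give several instructions:

```shell
# Hypothetical HTML fragment used only for illustration.
html='<B>Note:</B> read the <I>manual</I>'

# Two s commands in one sed call, one per line inside the same quotes.
multi=$(printf '%s\n' "$html" | sed 's/<I>/<EM>/g
s/<B>/<STRONG>/g')
echo "$multi"

# The same two instructions passed with -e, which avoids the embedded newline.
with_e=$(printf '%s\n' "$html" | sed -e 's/<I>/<EM>/g' -e 's/<B>/<STRONG>/g')
echo "$with_e"

# Any character can delimit the s command; here ^ is used instead of /.
trimmed=$(printf '%s\n' '2233  |shukla   |g.m.' | sed 's^ *|^|^g')
echo "$trimmed"
```

Only the opening tags change, because the instructions match <I> and <B> but not </I> and </B>.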
Substitution
sed '/director/s/director/member/' emp.lst
sed '/director/s//member/' emp.lst
• The second command shows that sed
‘remembers’ the scanned pattern and lets you
refer to it with // (two forward slashes).
• The // representing an empty (or null)
regular expression is interpreted to mean
that the pattern to search for is the same as
the one just scanned. This is called the
remembered pattern.
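A short sketch of the remembered pattern, using two hypothetical records in place of emp.lst. Both commands should produce the same output:

```shell
# Hypothetical records standing in for emp.lst.
records='9876|jai sharma|director
2365|barun sengupta|manager'

# The address /director/ selects the line; the s command repeats the pattern.
explicit=$(printf '%s\n' "$records" | sed '/director/s/director/member/')

# Same effect: the empty // reuses the regular expression from the address.
remembered=$(printf '%s\n' "$records" | sed '/director/s//member/')
echo "$remembered"
```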
Substitution
• When a pattern in the source string also
occurs in the replaced string, you can use
the special character & to represent it.
sed 's/director/executive director/' emp.lst
sed 's/director/executive &/' emp.lst
• These two commands are equivalent. The &,
known as the repeated pattern, expands to
the entire matched string.
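The equivalence can be checked on a single hypothetical record. The last command is an addition to the slide: it shows that a backslash makes & literal in the replacement:

```shell
# Hypothetical record standing in for a line of emp.lst.
line='9876|jai sharma|director'

# Replacement string written out in full.
literal=$(printf '%s\n' "$line" | sed 's/director/executive director/')

# & stands for whatever the pattern matched, here "director".
with_amp=$(printf '%s\n' "$line" | sed 's/director/executive &/')
echo "$with_amp"

# \& inserts a plain ampersand instead of the matched text.
escaped=$(printf '%s\n' "$line" | sed 's/director/R\&D head/')
echo "$escaped"
```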
Regular Expressions
• The interval regular expression (IRE)
uses the escaped pair of curly braces {}
with a single or a pair of numbers between
them.
• We can use this sequence to display files
which have write permission set for group:
$ ls -l | grep "^.\{5\}w"
• The regular expression ^.\{5\}w matches
five characters (.\{5\}) at the beginning (^)
of the line, followed by the pattern (w).
Regular Expressions
• The \{5\} signifies that the previous
character (.) has to occur five times. The .
(dot) character is used to match any
character.
• The IRE has three forms:
– ch\{m\} – The metacharacter ch can occur
exactly m times.
– ch\{m,n\} – ch can occur between m and n
times.
– ch\{m,\} – ch can occur at least m times.
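The three forms can be compared on a few made-up strings. In this sketch the sample lines are hypothetical; each contains a run of one to five a characters between x and y:

```shell
# Hypothetical test strings: runs of 1 to 5 letters "a" between x and y.
samples='xay
xaay
xaaay
xaaaay
xaaaaay'

# ch\{m\}: exactly two a's between x and y.
exactly_two=$(printf '%s\n' "$samples" | grep 'xa\{2\}y')
echo "$exactly_two"

# ch\{m,n\}: between two and four a's; -c counts the matching lines.
two_to_four=$(printf '%s\n' "$samples" | grep -c 'xa\{2,4\}y')
echo "$two_to_four"

# ch\{m,\}: four or more a's.
four_or_more=$(printf '%s\n' "$samples" | grep -c 'xa\{4,\}y')
echo "$four_or_more"
```

The surrounding x and y matter: without them, a\{2\} would also match inside longer runs of a.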
Regular Expressions
• We can display the listing for those files that
have the write bit set either for group or
others:
$ ls -l | grep "^.\{5,8\}w"
• To locate the people born in 1945 in the
sample database, use sed as follows:
$ sed -n '/^.\{49\}45/p' emp.lst
• The tagged regular expression (TRE) uses
\( and \) to enclose a pattern.
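The column-counting searches on this slide can be sketched against fixed input so that the result does not depend on the current directory. The listing lines and the two dated records are hypothetical, and the offset 10 fits only this made-up layout (the slide's 49 belongs to the real emp.lst):

```shell
# Simulated ls -l lines for three hypothetical files.
listing='-rw-r--r-- 1 user staff 10 Jan  1 00:00 private.txt
-rw-rw-r-- 1 user staff 10 Jan  1 00:00 shared.txt
-rw-r--rw- 1 user staff 10 Jan  1 00:00 public.txt'

# Column 6 is the group write bit: five characters, then w.
group_w=$(printf '%s\n' "$listing" | grep '^.\{5\}w')
echo "$group_w"

# Five to eight leading characters let w fall on column 6 (group) or 9 (others).
group_or_other_w=$(printf '%s\n' "$listing" | grep -c '^.\{5,8\}w')
echo "$group_or_other_w"

# Fixed-column search with sed: print lines with 45 right after the 10th character.
born_45=$(printf '%s\n' 'id1|12/03/45' 'id2|45/03/52' | sed -n '/^.\{10\}45/p')
echo "$born_45"
```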
Regular Expressions
• Suppose you want to replace the words
John Wayne by Wayne, John. The sed
substitution instruction will then look like
this:
$ echo "John Wayne" | sed 's/\(John\) \(Wayne\)/\2, \1/'
• Because the TRE remembers a grouped
pattern, you can look for repeated
words like this:
$ grep "\([a-z][a-z][a-z]*\)  *\1" note
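Both uses of the TRE can be sketched as follows. The contents of note are hypothetical, and the pattern is written with two spaces before the * so that at least one space must separate the repeated words:

```shell
# Swap two tagged groups: \1 is "John", \2 is "Wayne".
swapped=$(echo 'John Wayne' | sed 's/\(John\) \(Wayne\)/\2, \1/')
echo "$swapped"

# Hypothetical contents for the file called note on the slide.
note='this is is a typo
this line is fine'

# \1 must repeat exactly what the group matched, after one or more spaces.
repeats=$(printf '%s\n' "$note" | grep '\([a-z][a-z][a-z]*\)  *\1')
echo "$repeats"
```

The pattern has no word boundaries, so it can also report lines where the end of one word equals the start of the next.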
Regular Expressions
• These are pattern matching options used by
grep, sed, and perl (Page 441):
– abc : match the character string “abc”.
– * : zero or more occurrences of previous
character.
– . : match any character except newline.
– .* : nothing or any number of characters.
– a? : match zero or one instance of “a” (extended
regular expression; grep needs -E).
– a* : match zero or more repetitions of “a”.
Regular Expressions
– [abcde] : match any character within the
brackets.
– [a-b] : match any character within the range a
to b.
– [^abcde] : match any character except those
within the brackets.
– [^a-b] : match any character except those in the
range a to b.
– ^ : match beginning of line, e.g., /^#/.
– ^$ : match empty lines (lines containing nothing).
Regular Expressions
– $ : match end of line, e.g., /money.$/.
– a\{2\} : match exactly two repetitions of “a”.
– a\{4,\} : match four or more repetitions of “a”.
– a\{2,4\} : match between two and four
repetitions of “a”.
– \(exp\): expression exp for later referencing
with \1, \2, etc.
– a|b : match a or b (extended regular expression;
grep needs -E).
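Of the forms listed, ? and | belong to extended regular expressions, while \{ \} and \( \) are the basic forms. A minimal sketch, assuming a POSIX grep and hypothetical word lists:

```shell
# Hypothetical word list used only for illustration.
words='cat
dog
bird
color
colour'

# a|b: alternation needs -E (extended regular expressions).
either=$(printf '%s\n' "$words" | grep -E 'cat|dog')
echo "$either"

# a?: zero or one occurrence of the preceding character, also with -E.
optional=$(printf '%s\n' "$words" | grep -E 'colou?r')
echo "$optional"

# ^$: count empty lines; this works in basic mode as well.
empty_count=$(printf 'a\n\nb\n' | grep -c '^$')
echo "$empty_count"
```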