
CST334
Tutorial Sheet 10
Chapter 9
Regular Expressions
Summary of Common Regular Expressions
Character    Name                     Meaning
.            dot                      any one character
[…]          character class          any character listed
[^…]         negated character class  any character not listed
^            caret                    position at start of line
$            dollar                   position at end of line
\<           backslash less-than      position at beginning of word
\>           backslash greater-than   position at end of word
?            question mark            matches zero or one occurrence
*            asterisk or star         matches zero or more occurrences
+            plus sign                matches one or more occurrences
\{m,n\}      repetition               matches m to n occurrences, or \{m\} for exactly m
|            bar, or                  matches either expression it separates
(…)          parenthesis              limits scope of | or encloses subexpressions for backreferencing
\1, \2, …    backreference            matches text previously matched within first, second, etc. set of parentheses
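A minimal sketch of how a few of these operators behave, assuming a grep that accepts -E (extended syntax). The sample words and the helper names samples and count are made up for illustration and are not part of the sheet.

```shell
# Hypothetical sample words, one per line (not taken from the sheet).
samples() {
    printf '%s\n' color colour ac abc abbc abbbbc cat dog
}

# count PATTERN: number of sample words that PATTERN matches in full.
# -E selects extended syntax (?, +, |, {m,n} written without backslashes),
# -x requires the whole line to match, -c prints the number of matching lines.
count() {
    samples | grep -Ecx -- "$1"
}

count 'colou?r'    # ?     : color, colour
count 'ab*c'       # *     : ac, abc, abbc, abbbbc
count 'ab+c'       # +     : abc, abbc, abbbbc
count 'ab{2,3}c'   # {m,n} : abbc only (abbbbc has four b's)
count 'cat|dog'    # |     : cat, dog
count 'c.t'        # .     : cat

# The same repetition in basic syntax (plain grep) needs the backslashes.
samples | grep -cx 'ab\{2,3\}c'
```

Each call prints how many of the sample words match; the comments list which words those are.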
Examples
Variable names in C
[a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents
\$[0-9]+(\.[0-9][0-9])?
Time of day
(1[012]|[1-9]):[0-5][0-9] (am|pm)
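The three example expressions can be tried against a few strings. The sketch below assumes grep -E and uses -x so that the whole string has to match; the test strings and the helper names matches and check are invented for illustration.

```shell
# matches PATTERN STRING: succeed if the whole STRING matches the extended regex.
# (hypothetical helper, not from the sheet)
matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# check PATTERN STRING: report the result of matches in words.
check() {
    if matches "$1" "$2"; then
        echo "match:    $2"
    else
        echo "no match: $2"
    fi
}

c_name='[a-zA-Z_][a-zA-Z_0-9]*'
dollars='\$[0-9]+(\.[0-9][0-9])?'
time_of_day='(1[012]|[1-9]):[0-5][0-9] (am|pm)'

check "$c_name" 'count'          # starts with a letter
check "$c_name" '_tmp1'          # starts with an underscore
check "$c_name" '9lives'         # starts with a digit, so no match

check "$dollars" '$12'           # cents are optional
check "$dollars" '$12.50'
check "$dollars" '$12.5'         # cents need exactly two digits, so no match

check "$time_of_day" '9:05 am'
check "$time_of_day" '12:30 pm'
check "$time_of_day" '13:00 pm'  # hour must be 1 to 12, so no match
```

Without -x, grep reports a match whenever the pattern fits any part of the line, so '9lives' would match the variable-name pattern through its substring 'lives'.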
Hands On
Copy directory regexp to your home directory and go into it
How many lines in entire file speech?
cat speech | wc -l
What words or patterns do these match?
a) grep 'w' speech
b) grep '^w' speech
c) grep 'we' speech
d) grep '\<we\>' speech
e) grep 'we*' speech
f) grep 'w..[lk]' speech
g) grep '\([a-z]\)\1' speech
h) egrep '\<the +the\>' speech
i) egrep '\<([a-zA-Z]+) +\1\>' speech
Test a regular expression using echo:
echo aa | grep '\([a-z]\)\1'
Also try out examples in Lecture28.ppt and Lecture31.ppt
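If the file speech is not at hand, the same patterns can be tried on a small stand-in. The sketch below assumes GNU grep (\< and \> are extensions, and grep -E is the current spelling of egrep); the four sample lines are made up and are not the contents of speech.

```shell
# Build a hypothetical four-line stand-in for the file speech.
sample=$(mktemp)
printf '%s\n' \
    'we shall fight on' \
    'the answer is known' \
    'it is the the end' \
    'two weeks off' > "$sample"

lines=$(wc -l < "$sample")                            # total number of lines

# grep -c prints how many lines contain a match.
n_w=$(grep -c 'w' "$sample")                          # any line with a w
n_start=$(grep -c '^w' "$sample")                     # w at start of line
n_we=$(grep -c 'we' "$sample")                        # we, also inside answer and weeks
n_word=$(grep -c '\<we\>' "$sample")                  # we as a whole word
n_star=$(grep -c 'we*' "$sample")                     # w then zero or more e, same lines as w
n_wlk=$(grep -c 'w..[lk]' "$sample")                  # w, two characters, then l or k (week)
n_dbl=$(grep -c '\([a-z]\)\1' "$sample")              # a doubled lowercase letter
n_the=$(grep -Ec '\<the +the\>' "$sample")            # the word the repeated
n_rep=$(grep -Ec '\<([a-zA-Z]+) +\1\>' "$sample")     # any word repeated

echo "lines=$lines w=$n_w ^w=$n_start we=$n_we word-we=$n_word we*=$n_star"
echo "w..[lk]=$n_wlk doubled=$n_dbl the-the=$n_the repeated-word=$n_rep"

rm -f "$sample"
```

Replacing -c with -o (a GNU grep option) prints the matched text itself, which makes it easier to see what each pattern picks out.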
Using sed to substitute an expression with another
sed 's/searchPattern/replacePattern/g' fileToProcess > fileToStoreResults
You don’t even need to use vi or emacs:
sed 's/war/struggle/g' 2003-02-28.txt > 2003-02-28mod.txt
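One thing to watch with a plain substitution is that it also rewrites the pattern inside longer words. The sketch below shows this on two made-up lines (not from 2003-02-28.txt) and, assuming GNU sed, uses the \< and \> word anchors from the table to restrict the change to the whole word.

```shell
# Hypothetical input; the real exercise uses 2003-02-28.txt.
src=$(mktemp)
printf '%s\n' 'the war ended' 'a warm welcome after the war' > "$src"

# Plain substitution: every occurrence of the letters war is replaced.
plain=$(sed 's/war/struggle/g' "$src")

# With word anchors (GNU sed extension): only the whole word war is replaced.
whole=$(sed 's/\<war\>/struggle/g' "$src")

printf 'plain:\n%s\n' "$plain"
printf 'whole words only:\n%s\n' "$whole"

rm -f "$src"
```

In the first result warm becomes strugglem; in the second it is left alone.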
This uses sed to modify a series of hyperlink pathnames so that they refer to new locations
(for example, if directory '~mpc01c..' and its subdirectories were moved to
'~tom/public_html/csis1S06/mpc01c..'):
sed 's/mpc01c../tom\/csis1S06\/&\/public_html/' WebPagesS06.htm > ArchWebS06.htm
Search regexp: mpc01c..
Replace pattern: tom\/csis1S06\/&\/public_html
& = matched pattern (like mpc01c13)
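To see what & does, the substitution can be run on a single line instead of a whole file. In this sketch the link, including the host name www.example.edu, is made up. The second command is an equivalent spelling that uses | as the delimiter of the s command, so the slashes in the replacement need no backslashes.

```shell
# Hypothetical hyperlink of the kind found in WebPagesS06.htm.
link='<a href="http://www.example.edu/~mpc01c13/index.html">'

# As on the sheet: / is the delimiter, so each / in the replacement is escaped.
# & stands for whatever the search regexp matched (here mpc01c13).
escaped=$(printf '%s\n' "$link" | sed 's/mpc01c../tom\/csis1S06\/&\/public_html/')

# Same substitution with | as the delimiter.
piped=$(printf '%s\n' "$link" | sed 's|mpc01c..|tom/csis1S06/&/public_html|')

printf '%s\n' "$escaped"
printf '%s\n' "$piped"
```

Both commands print the link with ~mpc01c13 rewritten to ~tom/csis1S06/mpc01c13/public_html.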
Another cool trick (not Reg Exp)
How to rename all *.html files to *.htm?
WRONG: mv *.html *.htm
RIGHT: rename .html .htm *.html
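The mv version fails because the shell expands both wildcards before mv runs: *.html becomes the list of existing files, *.htm usually matches nothing, and mv given more than two names expects the last one to be a directory. The rename syntax shown is the one from util-linux; some systems ship a Perl-based rename with different arguments. Where rename is unavailable or differs, a loop is one alternative. This is a sketch that works in a throwaway directory with made-up file names.

```shell
# Work in a scratch directory with hypothetical files.
dir=$(mktemp -d)
touch "$dir/a.html" "$dir/b.html" "$dir/notes.txt"

for f in "$dir"/*.html; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
    mv -- "$f" "${f%.html}.htm"      # ${f%.html} strips the trailing .html
done

listing=$(ls "$dir")
printf '%s\n' "$listing"

rm -rf "$dir"
```

The listing shows a.htm, b.htm and the untouched notes.txt.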