Uploaded by Adithya Rajagopalan

Regular Expressions
• A regular expression is a pattern template used to filter text
• A Linux utility matches the regular expression pattern against data as data flows into the
utility
• Examples of such utilities: sed, gawk, grep
• If the data matches the pattern, it’s accepted for processing
• If the data doesn’t match the pattern, it’s rejected
• The regular expression pattern makes use of wildcard characters to represent one or
more characters in the data stream
• A regular expression is implemented using a regular expression engine
• The engine interprets regular expression patterns and uses those patterns to match text
• The Linux world has two popular regular expression engines:
■ The POSIX Basic Regular Expression (BRE) engine
■ The POSIX Extended Regular Expression (ERE) engine
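The accept/reject behaviour and the difference between the two engines can be sketched with grep, which uses BRE by default and ERE with the -E option. The choice of grep and the sample lines are assumptions made for illustration only.

```shell
# Sample data stream (made up for illustration).
data='this is a test
no match here
another test line'

# grep prints only the lines that match the pattern; the rest are rejected.
accepted=$(printf '%s\n' "$data" | grep 'test')
printf '%s\n' "$accepted"

# In BRE a bare + is an ordinary character; in ERE (grep -E) it is a quantifier.
bre_count=$(printf 'aaa\na+\n' | grep -c 'a+')
ere_count=$(printf 'aaa\na+\n' | grep -E -c 'a+')
echo "BRE matches: $bre_count, ERE matches: $ere_count"
```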
Basic Regular expressions
Symbol    Description
.         matches any single character
^         matches the start of a line
$         matches the end of a line
*         matches the preceding character zero or more times
\         escapes a special character so that it is matched literally
()        groups regular expressions
?         matches the preceding character zero or one time
Interval Regular expressions
Expression   Description
{n}          matches the preceding character appearing exactly 'n' times
{n,m}        matches the preceding character appearing at least 'n' times but not more than 'm' times
{n,}         matches the preceding character only when it appears 'n' times or more
Extended regular expressions
Expression   Description
\+           matches one or more occurrences of the previous character
\?           matches zero or one occurrence of the previous character
Brace expansion
Defining BRE Patterns
• The most basic BRE pattern matches literal text characters in a data stream.
Plain text
Special characters
Anchor characters
Starting at the beginning
• The caret character (^) defines a pattern that starts at the beginning of a line of text
in the data stream
Looking for the ending
• The dollar sign ($) special character defines the end anchor
Combining anchors
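A short sketch of the two anchors, used separately and together. It assumes grep and sed as the matching utilities, and the sample lines are made up.

```shell
# Made-up sample text with one empty line in the middle.
sample=$(printf 'book store\n\nthe book\nbook\n')

# ^ anchors the pattern at the start of a line.
starts=$(printf '%s\n' "$sample" | grep -c '^book')

# $ anchors the pattern at the end of a line.
ends=$(printf '%s\n' "$sample" | grep -c 'book$')

# Both anchors together match only lines that consist of the pattern alone.
exact=$(printf '%s\n' "$sample" | grep -c '^book$')

# ^$ matches empty lines, so this sed command deletes them.
no_blanks=$(printf '%s\n' "$sample" | sed '/^$/d')

echo "start: $starts, end: $ends, exact: $exact"
printf '%s\n' "$no_blanks"
```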
The dot character
• The dot matches any single character except a newline character
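As an illustration, the pattern `.at` requires some character in front of "at". The sketch below assumes grep and uses made-up lines.

```shell
# '.at' needs one character before "at", so a line whose only "at"
# sits at the very start of the line is rejected.
matched=$(printf 'the cat\nthat hat\nat ten\n' | grep '.at')
printf '%s\n' "$matched"
```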
Character classes
ZIP code example
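One possible version of a ZIP code check, assuming a five-digit code and grep as the utility. The candidate values are invented.

```shell
# A five-digit ZIP code: exactly five characters from the class [0-9],
# anchored at both ends so longer or shorter values are rejected.
zip_pattern='^[0-9][0-9][0-9][0-9][0-9]$'

# Made-up candidate values, one per line.
valid=$(printf '60633\n46201\n223001\n4353\n2220a\n' | grep "$zip_pattern")
printf '%s\n' "$valid"
```

Without the two anchors, `223001` would also be accepted because it contains five consecutive digits.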
Special character classes
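The POSIX bracket classes such as `[[:digit:]]`, `[[:upper:]]` and `[[:punct:]]` can be tried out as follows. This is a sketch with made-up input, assuming grep and sed.

```shell
# [[:digit:]] matches any digit, [[:upper:]] any uppercase letter.
with_digit=$(printf 'abc\nabc123\nABC\n' | grep -c '[[:digit:]]')
with_upper=$(printf 'abc\nabc123\nABC\n' | grep -c '[[:upper:]]')

# [[:punct:]] matches punctuation; here sed replaces each one with a space.
cleaned=$(echo 'one,two;three.' | sed 's/[[:punct:]]/ /g')

echo "digit lines: $with_digit, upper lines: $with_upper, cleaned: $cleaned"
```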
The asterisk
• The asterisk indicates that the preceding character must appear zero or more times in the text
• Combining the dot with the asterisk provides a pattern that matches any number of any characters
• The asterisk can also be applied to a character class
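The three uses of the asterisk listed above can be sketched with grep (an assumption) and a few invented words.

```shell
# 'ie*k': zero or more "e" between "i" and "k".
star=$(printf 'ik\niek\nieeek\niak\n' | grep -c 'ie*k')

# '.*': any number of any characters between two fixed words.
dotstar=$(printf 'regular pattern\nregular expression pattern\npattern\n' |
  grep -c 'regular.*pattern')

# '[ae]*': the asterisk applied to a character class.
classstar=$(printf 'bt\nbat\nbeat\nbot\n' | grep -c 'b[ae]*t')

echo "$star $dotstar $classstar"
```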
Extended Regular Expressions
• The gawk program recognizes ERE patterns, but the sed editor doesn’t by default (GNU sed accepts them with the -E option)
The question mark
• The question mark indicates that the preceding character can appear zero or one time
• You can use the question mark symbol along with a character class
The plus sign
• The plus sign is another pattern symbol that’s similar to the asterisk
• The plus sign indicates that the preceding character can appear one or more times, but
must be present at least once
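A sketch of the question mark and the plus sign. It uses `grep -E` as the ERE engine rather than gawk, which is an assumption made to keep the example short. The words are invented.

```shell
# ?: the "e" may appear zero or one time.
q=$(printf 'bt\nbet\nbeet\n' | grep -E 'be?t')

# ? applied to a character class: at most one "a" or "e".
qc=$(printf 'bt\nbat\nbet\nbaet\n' | grep -E -c 'b[ae]?t')

# +: the "e" must appear at least once.
p=$(printf 'bt\nbet\nbeet\n' | grep -E 'be+t')

printf '%s\n' "$q" "$qc" "$p"
```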
Using braces
• Braces allow you to specify a limit on a repeatable regular expression
• This is often referred to as an interval
• You can express the interval in two formats:
■ m: The regular expression appears exactly m times.
■ m,n: The regular expression appears at least m times, but no more than n times.
• Older versions of gawk (before 4.0) don’t recognize regular expression intervals by default;
with those versions you must specify the --re-interval command line option. Current gawk
versions recognize intervals by default.
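Both interval formats can be tried with `grep -E`, which is assumed here as the ERE engine. The words are made up.

```shell
# e{1}: exactly one "e" between "b" and "t".
exact=$(printf 'bt\nbet\nbeet\nbeeet\n' | grep -E 'be{1}t')

# e{1,2}: at least one "e" but no more than two.
range=$(printf 'bt\nbet\nbeet\nbeeet\n' | grep -E 'be{1,2}t')

printf '%s\n' "$exact" "$range"
```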
The pipe symbol
• The pipe symbol allows you to specify two or more patterns that the regular expression
engine combines with a logical OR when examining the data stream
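A minimal sketch of the logical OR, assuming `grep -E` and made-up sentences. A line is accepted if either pattern matches.

```shell
# 'cat|dog' accepts a line that contains "cat" or "dog".
animals=$(printf 'The cat is asleep\nThe dog is asleep\nThe sheep is asleep\n' |
  grep -E 'cat|dog')
printf '%s\n' "$animals"
```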
Grouping expressions
• Regular expression patterns can also be grouped by using parentheses
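Once grouped, a whole expression behaves like a single character for the symbols that follow it. The sketch below assumes `grep -E` and uses invented words.

```shell
# (urday)? makes the whole group optional.
days=$(printf 'Sat\nSaturday\nSun\nSatur\n' | grep -E '^Sat(urday)?$')

# Groups combined with the pipe symbol: (c|b), then "a", then (b|t).
words=$(printf 'cat\nbab\ncab\ntab\n' | grep -E -c '^(c|b)a(b|t)$')

printf '%s\n' "$days" "$words"
```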
sed Editor Basics
• The s command substitutes new text for the text in a line
• Four types of substitution flags are available:
■ A number, indicating the pattern occurrence for which new text should be substituted
■ g, indicating that new text should be substituted for all occurrences of the existing text
■ p, indicating that the line should be printed when a substitution is made
■ w file, which means to write the results of the substitution to a file
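The four flags can be compared side by side. This is a sketch with made-up text; the temporary file created with mktemp is a hypothetical output location.

```shell
line='This is a test of the test script.'

# No flag: only the first occurrence on the line is replaced.
first=$(echo "$line" | sed 's/test/trial/')
# A number: replace only that occurrence (here the second).
second=$(echo "$line" | sed 's/test/trial/2')
# g: replace every occurrence.
all=$(echo "$line" | sed 's/test/trial/g')
# p together with -n: print only the lines where a substitution happened.
changed=$(printf 'This is a test line.\nThis is a different line.\n' |
  sed -n 's/test/trial/p')
# w file: write the substituted lines to a file (hypothetical temp file).
outfile=$(mktemp)
printf 'This is a test line.\nThis is a different line.\n' |
  sed "s/test/trial/w $outfile" > /dev/null
saved=$(cat "$outfile")
rm -f "$outfile"

printf '%s\n' "$first" "$second" "$all" "$changed" "$saved"
```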
Replacing characters
Using addresses
• There are two forms of line addressing in the sed editor:
■ A numeric range of lines
■ A text pattern that filters out a line
Addressing the numeric line
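Numeric addresses can be a single line number, a range, or a range that runs to the last line (`$`). A sketch with made-up lines:

```shell
data=$(printf 'line one dog\nline two dog\nline three dog\nline four dog\n')

# A single line number: change line 2 only.
one=$(printf '%s\n' "$data" | sed '2s/dog/cat/')
# A range: change lines 2 through 3.
range=$(printf '%s\n' "$data" | sed '2,3s/dog/cat/')
# From line 3 to the last line ($).
to_end=$(printf '%s\n' "$data" | sed '3,$s/dog/cat/')

printf '%s\n' "$range"
```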
Using text pattern filters
$ grep Samantha /etc/passwd
Samantha:x:502:502::/home/Samantha:/bin/bash
$
$ sed '/Samantha/s!/bin/bash!/bin/csh!' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
[...]
Christine:x:501:501:Christine B:/home/Christine:/bin/bash
Samantha:x:502:502::/home/Samantha:/bin/csh
Timothy:x:503:503::/home/Timothy:/bin/bash
$
Grouping commands
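Braces let several commands share one address. The sketch below applies two substitutions to line 2 only; the input text is made up.

```shell
# Both s commands inside the braces run only on line 2.
grouped=$(printf 'The quick brown fox\nThe quick brown fox\nThe quick brown fox\n' |
  sed '2{
s/fox/cat/
s/brown/green/
}')
printf '%s\n' "$grouped"
```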
Deleting lines
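The delete command (d) accepts the same kinds of addresses as the other commands. A sketch with invented lines:

```shell
data=$(printf 'line 1\nline 2\nline 3\nline 4\n')

# Delete by line number.
by_number=$(printf '%s\n' "$data" | sed '3d')
# Delete a range of lines.
by_range=$(printf '%s\n' "$data" | sed '2,3d')
# Delete every line that matches a text pattern.
by_pattern=$(printf '%s\n' "$data" | sed '/line 1/d')

printf '%s\n' "$by_number"
```

The deletion affects only the output of sed; the original data is left unchanged.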
Inserting and appending text
■ The insert command (i) adds a new line before the specified line.
■ The append command (a) adds a new line after the specified line.
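Both commands can be sketched as follows, using the form in which the new text follows a backslash and a newline. The input lines are made up.

```shell
# i: the new line appears before line 2.
inserted=$(printf 'line 1\nline 2\n' | sed '2i\
new line before 2')

# a: the new line appears after line 1.
appended=$(printf 'line 1\nline 2\n' | sed '1a\
new line after 1')

printf '%s\n' "$inserted" "$appended"
```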
Changing lines
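The change command (c) replaces the whole addressed line with new text. A minimal sketch with made-up input:

```shell
# c: line 2 is replaced as a whole.
changed=$(printf 'line 1\nline 2\nline 3\n' | sed '2c\
this line replaces line 2')
printf '%s\n' "$changed"
```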
Transforming characters
[address]y/inchars/outchars/
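The y command maps each character of inchars to the character at the same position in outchars, everywhere on the line. A sketch with invented text:

```shell
# 1 becomes 7, 2 becomes 8, 3 becomes 9, for every occurrence on the line.
mapped=$(echo 'line 1 of 3, line 2 of 3' | sed 'y/123/789/')
echo "$mapped"
```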
Printing revisited
■ The p command to print a text line
■ The equal sign (=) command to print line numbers
■ The l (lowercase L) command to list a line
Printing lines
Printing line numbers
Listing lines
The list command (l) allows you to print
both the text and nonprintable characters
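The three printing commands can be compared in one sketch. The -n option suppresses the normal output so that only the explicitly printed lines appear; the sample text is made up.

```shell
data=$(printf 'first line\nsecond line\nthird line\n')

# p: print only the addressed lines.
printed=$(printf '%s\n' "$data" | sed -n '2,3p')

# =: print the line number, then p prints the line itself.
numbered=$(printf '%s\n' "$data" | sed -n '/second/{
=
p
}')

# l: a tab is shown as \t and the end of the line is marked with $.
listed=$(printf 'a\tb\n' | sed -n 'l')

printf '%s\n' "$printed" "$numbered" "$listed"
```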
Using files with sed
[address]w filename
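A sketch of the write command, saving the first two lines of made-up input. The temporary file created with mktemp is a hypothetical target.

```shell
outfile=$(mktemp)   # hypothetical output file

# -n suppresses normal output; lines 1 and 2 are written to the file.
printf 'line 1\nline 2\nline 3\n' | sed -n "1,2w $outfile"

written=$(cat "$outfile")
rm -f "$outfile"
printf '%s\n' "$written"
```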
Reading data from a file
[address]r filename
• The read command (r) allows you to insert
data contained in a separate file.
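A sketch of the read command that inserts the contents of a file after line 2. The file name and its contents are hypothetical.

```shell
extra=$(mktemp)   # hypothetical file holding the text to insert
printf 'This is an added line.\n' > "$extra"

# r: the file contents are placed after the addressed line (line 2).
merged=$(printf 'line 1\nline 2\nline 3\n' | sed "2r $extra")

rm -f "$extra"
printf '%s\n' "$merged"
```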
Regular Expressions in Action
Counting directory files
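The notes give no detail for this example, so the following is only one possible sketch. It assumes the goal is to count the entries in each directory of a colon-separated list such as PATH; the function name `count_dir_files` is hypothetical. sed turns the colons into spaces so that the shell can loop over the directories.

```shell
count_dir_files() {
  # $1: colon-separated directory list (defaults to PATH).
  # Assumption: directory names contain no spaces.
  dirs=$(echo "${1:-$PATH}" | sed 's/:/ /g')
  for dir in $dirs; do
    [ -d "$dir" ] || continue
    count=$(ls "$dir" | wc -l | tr -d ' ')
    echo "$dir - $count"
  done
}

count_dir_files
```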
Validating a phone number
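The notes do not give the pattern, so this is a sketch under stated assumptions: a ten-digit North American number whose area code starts with 2 through 9, optional parentheses around the area code, and a space, dot or dash as separator. The function name `is_phone` is hypothetical, and `grep -E` is assumed as the ERE engine.

```shell
is_phone() {
  # \(? and \)? : optional literal parentheses around the area code
  # [ .-]       : a space, dot or dash as separator
  echo "$1" | grep -E -q '^\(?[2-9][0-9]{2}\)?[ .-]?[0-9]{3}[ .-][0-9]{4}$'
}

for n in '(800)555-1234' '800-555-1234' '000-555-1234'; do
  if is_phone "$n"; then echo "$n: valid"; else echo "$n: invalid"; fi
done
```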
Parsing an e-mail address
username@hostname
• The username value can use any alphanumeric character, along with several special
characters:
■ Dot
■ Dash
■ Plus sign
■ Underscore
• The server and domain names allow only alphanumeric characters, along with the special
characters:
■ Dot
■ Underscore
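The rules above translate into two bracket expressions joined by the @ sign. This is a minimal sketch that follows only the rules listed here; it does not check the top-level domain, and real host names can contain characters that these rules leave out. The function name `is_email` and the sample addresses are hypothetical.

```shell
is_email() {
  # username: alphanumerics plus dot, underscore, plus sign and dash
  # hostname: alphanumerics plus dot and underscore
  echo "$1" | grep -E -q '^[[:alnum:]._+-]+@[[:alnum:]._]+$'
}

for a in 'first.last+tag@mail_1.example' 'no-at-sign' 'user@bad-host'; do
  if is_email "$a"; then echo "$a: valid"; else echo "$a: invalid"; fi
done
```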