Uploaded by Adithya Rajagopalan

Regular Expressions in Linux: A Comprehensive Guide

Regular Expressions
• pattern template to filter text
• A Linux utility matches the regular expression pattern against data as that data flows into the utility
• cp, ls, chmod, pwd
• If the data matches the pattern, it’s accepted for processing
• If the data doesn’t match the pattern, it’s rejected
• The regular expression pattern makes use of wildcard characters to represent one or
more characters in the data stream
• A regular expression is implemented using a regular expression engine
• The engine interprets regular expression patterns and uses those patterns to match text
• The Linux world has two popular regular expression engines:
■ The POSIX Basic Regular Expression (BRE) engine
■ The POSIX Extended Regular Expression (ERE) engine
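As a rough illustration of how the two engines differ, the sketch below feeds the same two sample lines (invented for this example) to grep, which uses BRE by default and ERE with the -E option. In BRE the plus sign is an ordinary character; in ERE it means "one or more".

```shell
# Two invented sample lines.
data=$(printf 'a+b\naab\n')

# BRE (grep default): "+" is a literal plus sign.
bre_match=$(printf '%s\n' "$data" | grep 'a+b')

# ERE (grep -E): "+" means one or more of the preceding character.
ere_match=$(printf '%s\n' "$data" | grep -E 'a+b')

echo "BRE accepted: $bre_match"
echo "ERE accepted: $ere_match"
```

Each engine accepts one line and rejects the other, which is the accept/reject filtering described above.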
Basic Regular expressions
.       matches any single character
^       matches the start of a string
$       matches the end of a string
*       matches zero or more occurrences of the preceding character
\       escapes a special character so it is treated literally
\( \)   groups regular expressions (written ( ) in ERE)
Matches up exactly one
Interval Regular expressions
{n}     matches the preceding character appearing exactly 'n' times
{n,m}   matches the preceding character appearing at least 'n' times but not more than 'm'
{n,}    matches the preceding character only when it appears 'n' times or more
Extended regular expressions
+       matches one or more occurrences of the previous character
?       matches zero or one occurrence of the previous character
Brace expansion
Defining BRE Patterns
• The most basic BRE pattern matches text characters in a data stream
Plain text
Special characters
Anchor characters
Starting at the beginning
• The caret character (^) defines a pattern that starts at the beginning of a line of text in the data stream
Looking for the ending
• The dollar sign ($) special character defines the end anchor
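A minimal sketch of both anchors, using sed -n with the p command to print only matching lines. The sample lines are invented for illustration.

```shell
# Invented sample lines.
data=$(printf 'this is a test\nputting this in the middle\nthis is a good book\nbooks are great\n')

# ^ anchors the pattern to the start of the line.
starts=$(printf '%s\n' "$data" | sed -n '/^this/p')

# $ anchors the pattern to the end of the line.
ends=$(printf '%s\n' "$data" | sed -n '/book$/p')

echo "$starts"
echo "$ends"
```

The line with "this" in the middle and the line where "book" is followed by more text are both rejected.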
Combining anchors
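Two common ways to combine the anchors, sketched with invented sample data: using both around a text pattern so the whole line must match, and using them with nothing in between to find blank lines.

```shell
# Invented sample data; the second line is blank.
data=$(printf 'first line\n\nthis is a test\nthis is a test of anchors\n')

# ^...$ : the line must contain exactly this text and nothing else.
exact=$(printf '%s\n' "$data" | sed -n '/^this is a test$/p')

# ^$ : nothing between start and end, so it matches blank lines; d deletes them.
no_blank=$(printf '%s\n' "$data" | sed '/^$/d')

echo "$exact"
echo "$no_blank"
```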
The dot character
• The dot (.) matches any single character except a newline character
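A short sketch of the dot, with invented sample lines. The pattern .at needs some character in front of "at", so a line that only has "at" at its very start does not match.

```shell
# Invented sample lines.
data=$(printf 'the cat is sleeping\nthat is a hat\nat ten o clock\n')

# .at : any single character followed by "at".
dot=$(printf '%s\n' "$data" | sed -n '/.at/p')

echo "$dot"
```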
Character classes
ZIP code example
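A character class lists, inside square brackets, the characters allowed at one position. As a hedged sketch, the pattern below assumes a ZIP code is exactly five digits; the sample values are invented.

```shell
# Invented sample values, one per line.
data=$(printf '60633\n46201\n223001\n4353\n22203\n')

# Five [0-9] classes between ^ and $ : exactly five digits on the line.
zips=$(printf '%s\n' "$data" | sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p')

echo "$zips"
```

Without the two anchors, the six-digit value would also be accepted, because five of its digits match the pattern.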
Special character classes
The asterisk
• The asterisk (*) means the preceding character must appear zero or more times in the text
• Combining the dot with the asterisk (.*) provides a pattern that matches any number of any characters
• The asterisk can also be applied to a character class
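The three uses of the asterisk listed above can be sketched as follows (sample lines invented for illustration):

```shell
# Invented sample lines.
data=$(printf 'ik\niek\nieek\nregular expression pattern\nbt\nbat\nbet\nbit\n')

# e* : zero or more "e" characters between "i" and "k".
star=$(printf '%s\n' "$data" | sed -n '/ie*k/p')

# .* : any number of any characters between two words.
dot_star=$(printf '%s\n' "$data" | sed -n '/regular.*pattern/p')

# [ae]* : zero or more "a" or "e" characters between "b" and "t".
class_star=$(printf '%s\n' "$data" | sed -n '/b[ae]*t/p')

echo "$star"
echo "$dot_star"
echo "$class_star"
```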
Extended Regular Expressions
• The gawk program recognizes the ERE patterns, but the sed editor uses BRE patterns by default (GNU sed accepts ERE patterns with the -E option)
The question mark
• The question mark indicates that the preceding character can appear zero or one time
• You can also use the question mark symbol along with a character class
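A small sketch of the question mark, alone and with a character class. It uses grep -E as a stand-in ERE engine (an assumption made here for portability; the notes above use gawk). The sample words are invented.

```shell
# Invented sample words.
data=$(printf 'bt\nbet\nbeet\nbat\nbot\n')

# e? : "e" may appear zero times or once.
optional_e=$(printf '%s\n' "$data" | grep -E 'be?t')

# [ae]? : one "a" or one "e" may appear, or neither.
optional_class=$(printf '%s\n' "$data" | grep -E 'b[ae]?t')

echo "$optional_e"
echo "$optional_class"
```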
The plus sign
• The plus sign is another pattern symbol that’s similar to the asterisk
• The plus sign indicates that the preceding character can appear one or more times, but
must be present at least once
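To make the difference from the asterisk concrete, the sketch below runs both symbols over the same invented words, again using grep -E as an assumed stand-in for an ERE engine.

```shell
# Invented sample words.
data=$(printf 'bt\nbet\nbeet\nbeeet\n')

# e+ : at least one "e" is required, so "bt" is rejected.
plus=$(printf '%s\n' "$data" | grep -E 'be+t')

# e* : zero "e" characters is fine, so "bt" is accepted too.
star=$(printf '%s\n' "$data" | grep -E 'be*t')

echo "$plus"
echo "$star"
```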
Using braces
• Braces allow you to specify a limit on how many times a regular expression can repeat
• This is often referred to as an interval
• You can express the interval in two formats:
■ m: The regular expression appears exactly m times.
■ m,n: The regular expression appears at least m times, but no more than n times.
• Older versions of the gawk program (before 4.0) don’t recognize regular expression intervals by default, so you must specify the --re-interval command line option for them. Current gawk versions recognize intervals without it.
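Both interval formats can be sketched as below. The example assumes grep -E as the ERE engine, which recognizes intervals without any extra option; the sample words are invented.

```shell
# Invented sample words.
data=$(printf 'bt\nbet\nbeet\nbeeet\n')

# {1} : the "e" must appear exactly once.
exactly_one=$(printf '%s\n' "$data" | grep -E 'be{1}t')

# {1,2} : the "e" must appear at least once, but no more than twice.
one_or_two=$(printf '%s\n' "$data" | grep -E 'be{1,2}t')

echo "$exactly_one"
echo "$one_or_two"
```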
The pipe symbol
• The pipe symbol (|) allows you to specify two or more patterns that the regular expression engine combines with a logical OR when examining the data stream
Grouping expressions
• Regular expression patterns can also be grouped by using parentheses
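A hedged sketch of the pipe symbol and of grouping, using grep -E as an assumed ERE engine and invented sample lines. Grouping lets a symbol such as the question mark apply to a whole sequence of characters instead of a single one.

```shell
# Invented sample lines.
data=$(printf 'The cat is asleep\nThe dog is asleep\nThe sheep is asleep\nSat\nSaturday\nSatur\n')

# cat|dog : accept a line if either pattern matches.
either=$(printf '%s\n' "$data" | grep -E 'cat|dog')

# (urday)? : the grouped text "urday" is optional as a unit.
grouped=$(printf '%s\n' "$data" | grep -E '^Sat(urday)?$')

echo "$either"
echo "$grouped"
```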
sed Editor Basics
• The s command substitutes new text for the text in a line
• Four types of substitution flags are available:
■ A number, indicating the pattern occurrence for which new text should be substituted
■ g, indicating that new text should be substituted for all occurrences of the existing text
■ p, indicating that the line should be printed when a substitution was made
■ w file, which means to write the results of the substitution to a file
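The four flags can be sketched as follows. The sample text is invented, and the output file is a temporary file created only for this illustration.

```shell
line='This is a test of the test script.'

# No flag: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/test/trial/')

# Number flag: replace only the second occurrence.
second=$(printf '%s\n' "$line" | sed 's/test/trial/2')

# g flag: replace every occurrence.
all=$(printf '%s\n' "$line" | sed 's/test/trial/g')

data=$(printf 'This is a test line.\nThis is a different line.\n')

# p flag with -n: print only the lines where a substitution was made.
changed=$(printf '%s\n' "$data" | sed -n 's/test/trial/p')

# w flag: write the substituted lines to a file (temporary file, for illustration).
out=$(mktemp)
printf '%s\n' "$data" | sed -n "s/test/trial/w $out"
saved=$(cat "$out")
rm -f "$out"

printf '%s\n' "$first" "$second" "$all" "$changed" "$saved"
```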
Replacing characters
Using addresses
• There are two forms of line addressing in the sed editor:
■ A numeric range of lines
■ A text pattern that selects the lines to act on
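Both forms of addressing, sketched on invented sample lines. The address goes in front of the command and limits which lines the command touches.

```shell
# Invented sample lines.
data=$(printf 'The quick brown fox\nThe lazy dog\nThe sleepy dog\nThe hungry cat\n')

# Single line number: substitute on line 2 only.
line2=$(printf '%s\n' "$data" | sed '2s/dog/cat/')

# Numeric range: substitute on lines 2 through 3.
range=$(printf '%s\n' "$data" | sed '2,3s/dog/cat/')

# Text pattern: substitute only on lines that contain "sleepy".
pattern=$(printf '%s\n' "$data" | sed '/sleepy/s/dog/cat/')

echo "$line2"
echo "$range"
echo "$pattern"
```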
Addressing the numeric line
$ grep Samantha /etc/passwd
$ sed '/Samantha/s!/bin/bash!/bin/csh!' /etc/passwd
Christine:x:501:501:Christine B:/home/Christine:/bin/bash
Grouping commands
Deleting lines
Inserting and appending text
■ The insert command (i) adds a new line before the specified line.
■ The append command (a) adds a new line after the specified line.
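A minimal sketch of the insert and append commands on invented data. It uses the form where the new text goes on its own line after the backslash, which should work across sed implementations.

```shell
# Invented sample lines.
data=$(printf 'line 1\nline 2\nline 3\n')

# i : add a new line before line 2.
inserted=$(printf '%s\n' "$data" | sed '2i\
new line before 2')

# a : add a new line after line 2.
appended=$(printf '%s\n' "$data" | sed '2a\
new line after 2')

echo "$inserted"
echo "$appended"
```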
Changing lines
Printing revisited
■ The p command to print a text line
■ The equal sign (=) command to print line numbers
■ The l (lowercase L) command to list a line
Printing lines
Printing line numbers
Listing lines
• The list command (l) allows you to print both the text and nonprintable characters
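The three printing commands can be sketched together. The sample data is invented and its second line contains a tab character, so the list command has something nonprintable to show.

```shell
# Invented sample data; line 2 contains a tab.
data=$(printf 'one\ntwo\tcolumns\nthree\n')

# p : print line 2 (the -n option suppresses the normal output).
printed=$(printf '%s\n' "$data" | sed -n '2p')

# = : print the line number of the line that matches "three".
numbered=$(printf '%s\n' "$data" | sed -n '/three/=')

# l : list line 2, showing the tab as \t and the end of the line as $.
listed=$(printf '%s\n' "$data" | sed -n '2l')

echo "$printed"
echo "$numbered"
echo "$listed"
```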
Using files with sed
[address]w filename
Reading data from a file
[address]r filename
• The read command (r) allows you to insert data contained in a separate file.
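A hedged sketch of the w and r commands. The file names and contents are invented, and everything lives in a temporary directory that is removed at the end.

```shell
# Temporary working directory and invented files, for illustration only.
workdir=$(mktemp -d)
printf 'line 1\nline 2\nline 3\n' > "$workdir/data.txt"
printf 'added from file\n' > "$workdir/extra.txt"

# [address]w filename : write lines 1 through 2 to out.txt.
sed -n "1,2w $workdir/out.txt" "$workdir/data.txt"
written=$(cat "$workdir/out.txt")

# [address]r filename : insert the contents of extra.txt after line 2.
merged=$(sed "2r $workdir/extra.txt" "$workdir/data.txt")

rm -rf "$workdir"

echo "$written"
echo "$merged"
```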
Regular Expressions in Action
Counting directory files
Validating a phone number
Parsing an e-mail address
• The username value can use any alphanumeric character, along with several special characters:
■ Dot
■ Dash
■ Plus sign
■ Underscore
• The server and domain names allow only alphanumeric characters, along with these special characters:
■ Dot
■ Underscore
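The rules above can be turned into an ERE pattern roughly as follows. This is a sketch, not a complete e-mail validator: the function name is hypothetical, the rule that the address ends in a dot followed by 2 to 5 letters is an assumption not stated above, and the sample addresses are invented.

```shell
# Hypothetical helper. Username: alphanumerics plus . - + _
# Server/domain: alphanumerics plus . _
# Final part (2 to 5 letters after the last dot) is an assumption.
is_valid_email() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9_.]+\.[a-zA-Z]{2,5}$'
}

for addr in 'rich@here.now' 'rich.blum@here.now' 'rich@here.n' 'rich/blum@here.now'; do
  if is_valid_email "$addr"; then
    echo "accepted: $addr"
  else
    echo "rejected: $addr"
  fi
done
```

Inside a bracket expression the dot and plus sign are ordinary characters, and the dash is placed last so it is not read as a range.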