Regular Expressions
Regular Expressions are used for pattern matching.
$scalarName = "This is a line with pattern in it";
Matching:
• if ($scalarName =~ m/pattern/) { ... }
- evaluates to true (1) when the pattern is found in $scalarName
Substituting or Replacing:
• $line =~ s/patternA/patternB/;   # finds the first patternA in $line and replaces it with patternB
Translating:
• $scalarName =~ tr/A-Z/a-z/;   # translates upper-case letters to lower case
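The three operators can be tried together in one small script. This is a minimal sketch under `use strict`; the sample strings are illustrative. It also shows what each operator returns: `m//` returns true or false, `s///` returns the number of replacements, and `tr///` returns the number of characters it translated.

```perl
use strict;
use warnings;

my $scalarName = "This is a line with Pattern in it";

# m// is true (1) when the pattern occurs somewhere in the string
my $found = ($scalarName =~ m/Pattern/) ? 1 : 0;

# s/// replaces only the first occurrence and returns the number of replacements
my $line = "patternA and patternA";
my $replaced = ($line =~ s/patternA/patternB/);

# tr/// works character by character and returns how many characters it translated
my $translated = ($scalarName =~ tr/A-Z/a-z/);

print "found=$found replaced=$replaced translated=$translated\n";
print "$line\n";         # patternB and patternA
print "$scalarName\n";   # this is a line with pattern in it
```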
Regular Expressions Match only on scalars
#!/usr/local/bin/perl
$name = "Smith";
if ($name =~ m"it") { print "yes\n"; }
$name =~ s/S/s/;   # substitution
print "$name\n";   # smith
$pattern = 'abc';
$a = 'We start with abcdef and more abcdef';
$status = ($a =~ /abc/);
# A double-quoted string can also be used as the pattern
$status = ($a =~ "abc");
print "$status\n";   # 1
$browser = $ENV{'HTTP_USER_AGENT'};
if ($browser =~ m/Mozilla/) { ... }
$pattern = 'abc';
$a = 'We start with abcdef and more abcdef';
$b = 'bcdefjl';
$status = ($a =~ s/abc/ABC/);
print "$status\n";   # 1
if ($a =~ m/abc/) { print "match\n"; }      # the second abc is still there
if ($b !~ m/abc/) { print "No match\n"; }
if ($a =~ m/${pattern}/) {                  # variable interpolation
    print "match\n";
}
Regular expressions can be matched against the special variable $_ without using the =~ or !~ operators.
Example:
my @elements = ('a1', 'a2', 'a3');
foreach (@elements) { s/a/b/; }
# The special variable $_ holds the elements of the list,
# one at a time, through the iterations.
# Because $_ is an alias for each element, the substitution changes the array to b1, b2 and b3.
while (<FD>) { print if m/ERROR/; }   # prints each line that contains ERROR
$text = "abcdef";
$text =~ m/a/;     # true if $text contains the character a
$text !~ m/a/;     # false if $text contains the character a
$text =~ s/a/A/;   # substitutes A for the first a; true if a substitution happens
$text !~ s/a/A/;   # substitutes A for the first a; false if a substitution happens
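Here is a runnable sketch of the implicit-$_ forms and the =~ / !~ return values. The FD file handle is replaced by a hypothetical in-memory log, opened with Perl's `open` on a scalar reference, so that the script needs no real file. Copies are used so that each test starts from the same string.

```perl
use strict;
use warnings;

# foreach aliases $_ to each element, so s/// rewrites the array in place
my @elements = ('a1', 'a2', 'a3');
foreach (@elements) { s/a/b/; }

# An in-memory string stands in for the FD file handle (hypothetical log contents)
my $log = "ok line\nERROR: disk full\nanother ok line\n";
open(my $fd, '<', \$log) or die "cannot open in-memory log: $!";
my @errors;
while (<$fd>) { push @errors, $_ if m/ERROR/; }   # <$fd> and m// both use $_
close($fd);

my $text = "abcdef";
my $has_a   = ($text =~ m/a/) ? 1 : 0;     # 1
my $lacks_a = ($text !~ m/a/) ? 1 : 0;     # 0, the negated result
my $copy    = $text;
my $changed = ($copy =~ s/a/A/) ? 1 : 0;   # 1, and $copy is now "Abcdef"

print "@elements\n";
print @errors;
print "$has_a $lacks_a $changed $copy\n";
```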
• . (period) – matches any single character except a newline.
• Example:
• [ ] – matches any one character from the set or range of characters given
• Examples:
$level = "12345";
if ($level =~ /^[1-9]/) { ... }            # begins with a digit from 1 to 9
if ($level =~ /[^1-9]/) { print ":ok"; }   # negated class: true if $level contains any character that is NOT 1-9
$level = "123450";
if ($level =~ /[^1-9]/) { print "ok"; }    # what is the output?
# Matching on a word boundary
$line = "A rugged rug";
$line =~ s/\brug\b/RUG/;
print $line;   # A rugged RUG
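The period, the negated character class and the word boundary can be checked side by side. This sketch uses made-up test strings (including "12a45", which is not from the slides) to show that `.` skips newlines, that `[^1-9]` only needs one outside character, and that `\b` leaves "rugged" alone.

```perl
use strict;
use warnings;

# . matches any single character except a newline
my @dot = grep { /^c.t$/ } ('cat', 'cot', 'cart', "c\nt");

# A negated class is true when at least one character falls outside the class
my $digits_only = ("12345" =~ /[^1-9]/) ? "ok" : "no";
my $has_letter  = ("12a45" =~ /[^1-9]/) ? "ok" : "no";

# \b keeps "rugged" from being changed
my $line = "A rugged rug";
$line =~ s/\brug\b/RUG/;

print "@dot\n";                       # cat cot
print "$digits_only $has_letter\n";   # no ok
print "$line\n";                      # A rugged RUG
```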
Example: Grouping and quantifying
$line = "tootootoo";
if ($line =~ m/((too){2})/) { print $1, "\n"; }   # tootoo
my @vars = ('a1', 'a2', 'a3');
foreach (@vars) { s/a/b/; }
print "@vars\n";   # b1 b2 b3
• \d – matches a digit
• \D – matches a non-digit
• \w – matches a word character (an upper-case or lower-case letter, a digit, or an underscore)
• \s – matches a whitespace character (space, tab, newline, ...)
• \S – matches a non-whitespace character
• \b – matches at a word boundary, so the element must appear at the beginning or end of a word
• \B – matches where there is no word boundary, so the element must not appear at the beginning or end of a word
• ^ – beginning of the string
• $ – end of the string
• * – zero or more times
• + – one or more times
• ? – zero or one time
• {X} – match exactly X times
• {X,} – match X or more times
• {X,Y} – match X to Y times
• | (alternation) – matches either the pattern on its left or the pattern on its right
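One way to check the metacharacters and quantifiers in the list is a small table-driven script. This is a sketch: each pattern is kept as a single-quoted string and interpolated into `m//` at run time. The sample strings are illustrative.

```perl
use strict;
use warnings;

# Each pair is [pattern, string]; single quotes keep the backslashes intact
my @tests = (
    ['^\d+$',       '2024'],          # only digits
    ['\D',          '2024'],          # needs a non-digit
    ['^\w+$',       'snake_case1'],   # letters, digits and underscore
    ['\s',          'no_spaces'],     # needs whitespace
    ['\bcat\b',     'concat'],        # "cat" is not a whole word here
    ['\Bcat',       'concat'],        # "cat" inside a word
    ['^(too){2}$',  'tootoo'],        # exactly two repetitions
    ['^ab{2,}c$',   'abbbc'],         # two or more b's
    ['^colou?r$',   'color'],         # optional u
    ['^(cat|dog)$', 'dog'],           # alternation
);

my @matched;
for my $t (@tests) {
    my ($re, $str) = @$t;
    my $ok = ($str =~ /$re/) ? 1 : 0;
    push @matched, $ok;
    printf "%-12s %-12s %s\n", $re, $str, $ok ? "match" : "no match";
}
```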
Regular expressions match only on scalars.
Use grep to match against the elements of an array (more examples of grep later):
@array = ('one','two','ton');
$n = grep(m/on/, @array);   # 2: 'one' and 'ton' contain "on"
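The value grep returns depends on context. This minimal sketch uses the array from the slide: in scalar context grep returns the count of matching elements, and in list context it returns the elements themselves.

```perl
use strict;
use warnings;

my @array = ('one', 'two', 'ton');
my $n       = grep { m/on/ } @array;   # scalar context: how many elements matched
my @matches = grep { m/on/ } @array;   # list context: the matching elements themselves
print "$n: @matches\n";                # 2: one ton
```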
A regular expression finds the earliest (leftmost) possible match of a given pattern. By default, it matches or replaces only once.
Example:
$variable = "A crazy horse jumped over a crazy fox";
$pattern = "crazy fox";
$variable =~ m/$pattern/;   # matches the second "crazy", the one followed by "fox"
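• If only a partial match is found (the first "crazy" is followed by " horse", not " fox"), the match fails at that position. The engine backtracks and tries again from the next position in the string, until it finds "crazy fox".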
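The leftmost-match rule can be observed through the offset where the successful match starts. This sketch uses the slide's strings and Perl's `@-` array: `$-[0]` is the start offset of the whole last match. `index` is only used to locate the first, partial "crazy" for comparison.

```perl
use strict;
use warnings;

my $variable = "A crazy horse jumped over a crazy fox";
my $pattern  = "crazy fox";

# The first "crazy" starts at offset 2, but "crazy h" is only a partial match
my $first_crazy = index($variable, "crazy");

my $start;
if ($variable =~ m/$pattern/) {
    $start = $-[0];   # @- holds start offsets; $-[0] is where the whole match begins
    print "matched '$&' at offset $start\n";
}
```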
Regular expressions are interpolated like double-quoted strings, so a pattern can contain any variable or escape sequence that a double-quoted string can.
Examples:
$name = "John Smith";
$line = " John Smith is the author of this book.";
if ($line =~ m/author of/) { print "Author\n"; }
if ($line =~ m/${name} is/) { print "match\n"; }   # matches John Smith is
• If the pattern contains special characters, such as the / delimiter, either escape them with backslashes or use a different delimiter, such as double quotes.
Example:
$path =~ m/usr\/local\/bin/;   # escaped slashes
$path =~ m"usr/local/bin";     # double quotes as the delimiter
• Use the function quotemeta() to backslash special characters automatically.
Example:
$pattern = "({";
$variable =~ m/$pattern/;     # is the same as saying
$variable =~ m/({/;           # this causes an error because ( and { are special characters
$pattern = quotemeta("({");   # makes the pattern \(\{
$variable =~ m/$pattern/;     # now matches the literal characters ({
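Here is a runnable sketch of the quotemeta fix. The target string "call f({x})" is a made-up example. The sketch also shows Perl's inline `\Q...\E` escape, which quotes the interpolated text in the same way without a separate variable.

```perl
use strict;
use warnings;

my $variable = "call f({x})";
my $pattern  = "({";

# m/$pattern/ would die at run time, because ( opens a group that is never closed.
my $quoted = quotemeta($pattern);                      # \(\{
my $found1 = ($variable =~ m/$quoted/)      ? 1 : 0;
my $found2 = ($variable =~ m/\Q$pattern\E/) ? 1 : 0;   # \Q...\E quotes inline
print "$quoted $found1 $found2\n";
```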
Evaluating a regular expression produces two things: a result status and back references.
Result status: indicates whether the regular expression matched your string (for m//) or how many replacements it made (for s///).
$line = " round and round the rugged rock the rascal ran";
$status = ($line =~ m"round");
print "$status\n";   # 1
$matches = ($line =~ s"round"Round");
print "Matches = $matches\n";   # 1
$line = " round and round the rugged rock the rascal ran";   # start again from the original string
$matches = ($line =~ s"round"Round"g);
print "Matches = $matches\n";   # 2
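Because s/// changes its target, the sketch below runs each substitution on a fresh copy of the slide's string. The pattern "square" is only an illustration of a failed substitution, which returns false (an empty string).

```perl
use strict;
use warnings;

my $original = " round and round the rugged rock the rascal ran";

my $status = ($original =~ m/round/) ? 1 : 0;   # m// reports only whether it matched

my $copy1 = $original;
my $once  = ($copy1 =~ s/round/Round/);         # 1 replacement
my $copy2 = $original;
my $all   = ($copy2 =~ s/round/Round/g);        # 2 replacements
my $copy3 = $original;
my $none  = ($copy3 =~ s/square/Square/);       # no match: false (empty string)

print "$status $once $all '$none'\n";           # 1 1 2 ''
```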
Back references let you save parts of a match for later use. Enclose the parts of the pattern you want to save in parentheses ().
Example:
$line = "round and round the rugged rock the rascal ran";
$matches = ($line =~ m"(round) the (rugged)");
print "Matches = $matches\n";   # 1
print "$1\n";                   # round
print "$2\n";                   # rugged
print "$&\n";                   # round the rugged ($& contains the matched string)
if ($line =~ m"(round) the (rugged)") {
    $first = $1;
    $second = $2;
}
Using back references in the replacement part of s///
Switching the words:
$line = "first minus second ";
$line =~ s"(first) minus (second)"$2 minus $1";
print $line;   # second minus first
Example:
$line = "A taxi and a fox";
if ($line =~ s/(a+x+)/--/) {
    # print "Yes", $1, "\n";
    print $line, "\n";            # A t--i and a fox
    print "Before: ", $`, "\n";   # Before: A t
    print "After: ", $', "\n";    # After: i and a fox
    print "Match: ", $&, "\n";    # Match: ax
}
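Here is a strict-mode sketch of both ideas on this slide. It swaps two words through `$1`/`$2` in the replacement, and it copies `` $` ``, `$&` and `$'` into lexical variables right after a successful m//. The `(my $copy = $line) =~ s///` form is a common Perl idiom that leaves the original untouched.

```perl
use strict;
use warnings;

# Switch two words with $1 and $2 in the replacement
my $line = "first minus second";
(my $swapped = $line) =~ s/(first) minus (second)/$2 minus $1/;

# $`, $& and $' hold the text before, inside and after the last successful match
my $taxi = "A taxi and a fox";
my ($before, $match, $after);
if ($taxi =~ m/(a+x+)/) {
    ($before, $match, $after) = ($`, $&, $');
}
print "$swapped\n";                # second minus first
print "$before|$match|$after\n";   # A t|ax|i and a fox
```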
• Write a regular expression to swap the first two words
• Match a line of 80 characters
$line = 'It is THIS and not THAT';
$line =~ /(TH..)/;
print "$1\n";      # THIS: the earliest match is the default behavior
$line =~ /(TH..).*(TH..)/;
print "$1 $2\n";   # THIS THAT
($one, $two) = ($line =~ /(TH..).*(TH..)/);   # list context returns the back references
print "1:$one 2:$two \n";                     # 1:THIS 2:THAT
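Here is a short sketch of list-context matching under `use strict`. A successful match returns its captures. Because matching is case-sensitive, a pattern such as `(Th..)` finds nothing in this string and returns an empty list, which leaves the receiving variable undefined.

```perl
use strict;
use warnings;

my $line = 'It is THIS and not THAT';

# In list context a match returns its back references ($1, $2, ...)
my ($one, $two) = $line =~ /(TH..).*(TH..)/;

# Matching is case-sensitive: "Th.." does not occur, so the list is empty and $none is undef
my ($none) = $line =~ /(Th..)/;

print "1:$one 2:$two\n";                          # 1:THIS 2:THAT
print defined $none ? "found\n" : "no match\n";   # no match
```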
What happens to the back references if the regular expression fails to match?
Example:
$text = "This is scary";
$text =~ m"(scari)";
print $1;   # the match fails, so $1 is not set; it keeps its value from the last successful match
# Make use of short-circuit evaluation, as shown below:
($text =~ m"(scari)") && ($found = $1);
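Here is a sketch of why the short-circuit form matters. A failed match leaves `$1` holding the capture from the previous successful match, so reading `$1` unconditionally can return stale data. The extra pattern `scar.` is illustrative.

```perl
use strict;
use warnings;

my $text = "This is scary";

$text =~ m/(This)/;                                     # succeeds: $1 is "This"
my $result = ($text =~ m/(scari)/) ? $1 : "no match";   # fails
my $stale  = $1;                                        # still "This" from the earlier match

my $found;
($text =~ m/(scar.)/) && ($found = $1);                 # assigned only on success: "scary"
print "$result $stale $found\n";                        # no match This scary
```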
Nesting backreferences
Some rules:
The earlier a back reference's opening parenthesis appears in an expression, the lower its back reference number.
Example:
$string = 'abracadabra';
$string =~ m"(a)(b)"; print "$1 $2\n"; #a b
The more general (outer) a back reference is, the lower its back reference number.
Example:
$string = "softly slowly surely subtly";
$string =~ m"((s...ly\s*)*)";
print "$1\n";   # softly slowly surely subtly
print "$2\n";   # subtly
Explanation:
The pattern "(s...ly\s*)*" matches multiple things:
first softly, then slowly, then surely and then subtly.
Because the group repeats, $2 keeps only its last match, subtly; the earlier matches are thrown out.
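Both numbering rules can be seen in list context. This sketch reuses the slide's string and adds a small nested example ("abc", chosen for illustration). Groups are numbered by the position of their opening parenthesis, and a repeated group keeps only its final iteration.

```perl
use strict;
use warnings;

my $string = "softly slowly surely subtly";
my ($all, $last) = $string =~ m/((s...ly\s*)*)/;   # $1 and $2

# Groups are numbered by the position of their opening parenthesis
my @groups = "abc" =~ m/((a)(b(c)))/;              # $1, $2, $3, $4

print "$all | $last\n";   # softly slowly surely subtly | subtly
print "@groups\n";        # abc a bc c
```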
• Back references can be used in the regular expression itself.
• If you put () around a group of characters, refer to the back references as $1, $2, etc. in the replacement (second) part of s///. Inside the pattern itself, in m// or the first part of s///, use \1, \2, etc.
Example:
$string = "sample examples";
if ($string =~ m"(amp..) ex\1") { print "Matches!\n"; }
Example:
• $text = 'bballball';
• $text =~ s"(b)\1(a..)\1\2"$1$2";   # Does this match the string in $text?
Steps in matching:
1. The first b in () matches the first b in $text and is saved in $1 and \1.
2. \1 matches the second b in the string.
3. (a..) matches the string all and is stored in $2 and \2.
4. \1 matches the next b.
5. \2 (contains all) matches the next and last three characters, all.
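The walkthrough can be confirmed by running the substitution: it matches, and the replacement `$1$2` leaves "ball". The second example, a doubled-word remover, is an illustrative extra that uses the same `\1` idea inside the pattern.

```perl
use strict;
use warnings;

my $text = 'bballball';
# \1 and \2 refer back inside the pattern; $1 and $2 are used in the replacement
my $n = ($text =~ s/(b)\1(a..)\1\2/$1$2/);

# The same idea finds a doubled word
(my $fixed = "this is is a test") =~ s/\b(\w+) \1\b/$1/;

print "$n $text\n";   # 1 ball
print "$fixed\n";     # this is a test
```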
There are six multiple match operators:
• * – zero or more
• + – one or more
• ? – zero or one time
• {X} – match exactly X times
• {X,} – match X or more times
• {X,Y} – match X to Y times
Example:
$line = 'This is the example of the greedy match pattern';
$line =~ m /This (.*)the/; print $1; #is the example of
$line = 'Somethings';
$line =~ m"(\w{2,3})"; print "$1\n"; #Som
By default, the multiple match operators gobble up the maximum number of characters in a string that still lets the pattern match.
Example 1: Find and replace the first "round" with "square" in the string.
$line = "About a round and round rock";
# .* tries to match the maximal number of "any" characters
$line =~ s/.*(round)/square/;
print $line;   # square rock
# With a question mark after a greedy quantifier, the smallest quantity is chosen
$line = "About a round and round rock";   # start again from the original string
$line =~ s/.*?(round)/square/;
print $line;   # square and round rock
*? – match zero, 1 or many times, but match the fewest number of times.
+? – match 1 or many times, but match the fewest number of times.
?? - match zero or 1 time, but match the fewest possible number of times.
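Here is a small sketch that contrasts a greedy quantifier with its non-greedy form. The HTML-like string and "colour" are made-up inputs. `.+` runs to the last `>`, `.+?` stops at the first one, and `??` prefers zero unless the rest of the pattern forces one.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';
my ($greedy) = $html =~ /<(.+)>/;    # .+ runs to the last >
my ($lazy)   = $html =~ /<(.+?)>/;   # .+? stops at the first >

my ($lazy_opt) = "colour" =~ /(colou??)/;    # ?? prefers zero: "colo"
my ($forced)   = "colour" =~ /(colou??r)/;   # the r forces the u: "colour"

print "$greedy\n$lazy\n$lazy_opt $forced\n";
```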
The /g modifier in substitution means that every match of the regular expression in the string is replaced.
$line = " round and round the rugged rock the rascal ran";
$matches = ($line =~ s"round"Round");
print "Matches = $matches\n";   # 1
$line = " round and round the rugged rock the rascal ran";   # start again from the original string
$matches = ($line =~ s"round"Round"g);
print "Matches = $matches\n";   # 2
Perl uses the /g modifier in a different way with match than it does with substitution.
Perl attaches an iterator to a match that uses /g. When you match once, Perl remembers where the match ended (the pos() function returns this position). Therefore, you can continue to match where you left off. When no further match is found, the iterator is reset.
Example:
$line = 'hello Susan hello Jane';
while ($line =~ m/hello (\w+)/g) { print "$1\n"; }
Output:
Susan
Jane
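Printing `pos()` inside the loop makes the iterator visible. In this sketch, which uses the slide's string, each successful /g match leaves `pos($line)` just past the match. After the final failed attempt, the position is reset to undef.

```perl
use strict;
use warnings;

my $line = 'hello Susan hello Jane';
my (@names, @positions);
while ($line =~ m/hello (\w+)/g) {
    push @names, $1;
    push @positions, pos($line);   # pos() is where the next /g match will start
}
# The final failed match resets the position
my $state = defined pos($line) ? "set" : "reset";
print "@names | @positions | $state\n";   # Susan Jane | 11 22 | reset
```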
• /g specifies global pattern matching, that is, matching as many times as possible within the string.
• How it behaves depends on the context. In list context, it returns a list of all the substrings matched by all the parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
• In scalar context, each execution of m//g finds the next match, returning TRUE if it matches, and FALSE if there is no further match.
Example:
$line = 'hello Susan hello Jane';
@matches = ($line =~ m/hello (\w+)/g);
print "@matches\n";   # Susan Jane
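Here is a sketch of the three list-context cases described above. With one group you get the captures, with no groups you get the whole matches, and with several groups all captures come back flattened into one list. The "a=1, b=2" string is an illustrative input.

```perl
use strict;
use warnings;

my $line = 'hello Susan hello Jane';
my @names = $line =~ m/hello (\w+)/g;      # one group: the captured names
my @whole = $line =~ m/hello \w+/g;        # no groups: each whole match
my @pairs = "a=1, b=2" =~ m/(\w)=(\d)/g;   # two groups: all captures, flattened

print join("|", @names), "\n";   # Susan|Jane
print join("|", @whole), "\n";   # hello Susan|hello Jane
print join("|", @pairs), "\n";   # a|1|b|2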
• Alternation (|) matches one of several patterns.
• Alternation always tries the first alternative in the parentheses first. If it does not match, the second alternative is tried, and so on.
Example:
$line = "starship";
$line =~ m"(.*|star)";   # matches starship: .* is tried first and succeeds
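Swapping the alternatives shows that their order decides the result at a given position. The third line, on the illustrative string "a starship", shows that the leftmost starting position still takes priority over alternative order.

```perl
use strict;
use warnings;

my $line = "starship";
my ($first)  = $line =~ m/(.*|star)/;   # .* is tried first and already succeeds
my ($second) = $line =~ m/(star|.*)/;   # with the order swapped, "star" wins

# The leftmost starting position still beats the order of the alternatives
my ($leftmost) = "a starship" =~ m/(ship|star)/;

print "$first $second $leftmost\n";     # starship star star
```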
• The grep function evaluates the BLOCK or EXPR for each element of LIST, locally setting the $_ variable equal to each element.
• BLOCK is one or more Perl statements delimited by curly brackets.
• LIST is an ordered set of values.
• EXPR is one or more variables, operators, literals, functions, or subroutine calls.
• grep returns a list of those elements for which the EXPR or BLOCK evaluates to TRUE.
• LIST can be a list or an array. In a scalar context, grep returns the number of times the expression was TRUE.
#!/usr/local/bin/perl -w
@numbers = (1,2,3,4,5,6,7,8,9);
@lessThanFive = grep ($_ < 5, @numbers); print @lessThanFive, "\n";
@tokens = ("A String", 234, "String 2", 111);
@ints = grep (m"^\d+$", @tokens); print @ints, "\n";
@words = qw(a silly fox jumped over a silly horse); print @words, "\n";
$howMany = grep /silly/, @words;
print $howMany;   # 2
@line = qw(jumped over a rail and hopped across a meadow);
@list = grep { s/ed/s/ if /^jump/ } @line;   # s/// also changes the matching elements of @line
print @list;   # jumps
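Here is a strict-mode sketch of grep with a substitution inside the block, applied to the slide's @line list. As perlfunc documents for grep, `$_` is an alias for each element, so the substitution also rewrites the original array and not only the returned list.

```perl
use strict;
use warnings;

my @line = qw(jumped over a rail and hopped across a meadow);

# Inside grep, $_ is an alias for each element, so s/// edits @line itself
my @list = grep { /^jump/ && s/ed/s/ } @line;

print "@list\n";   # jumps
print "@line\n";   # jumps over a rail and hopped across a meadow
```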
@numbers = (1,30,200,50,200,450,2000);
@greaterThan50 = grep { $_ > 50 } @numbers; print "@greaterThan50\n";
@colors = ();
$colors[1] = "yellow"; $colors[5] = "green";
$colors[10] = "blue";
@array = grep { defined $_ } @colors; print "@array\n";
What does this do?
@results = grep { $array[$_] =~ /PACKAGE/ } (0..$#array);
open(FILE, "<myfile") or die "can't open myfile: $!";
@lines = <FILE>;
print grep /xml|xslt/i, @lines;
@unique = grep { not $found{$_}++ } qw(a b a c d d e f g f h h);
print "@unique\n";   # a b c d e f g h
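Here is a runnable sketch of the last two examples. The file contents are hypothetical and stand in for "myfile" through an in-memory file handle, so nothing needs to exist on disk. The unique-element idiom works because `$found{$_}++` returns 0 (false) only the first time a value is seen.

```perl
use strict;
use warnings;

# Hypothetical contents standing in for "myfile", read through an in-memory file handle
my $contents = "config.XML\nnotes.txt\nstyle.xslt\nreadme\n";
open(my $fh, '<', \$contents) or die "can't open in-memory file: $!";
my @lines = <$fh>;
close($fh);

my @xml = grep { /xml|xslt/i } @lines;   # /i makes the match case-insensitive

# %found counts each value; "not $found{$_}++" is true only the first time a value appears
my %found;
my @unique = grep { not $found{$_}++ } qw(a b a c d d e f g f h h);

print @xml;          # config.XML and style.xslt
print "@unique\n";   # a b c d e f g h
```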