Regular Expressions


Perl-CGI 2


Regular Expressions

Regular Expressions are used for pattern matching.

$scalarName = "This is a line with pattern in it";

Matching:

• if ($scalarName =~ m"pattern") { ... }

- evaluates to true (1) if "pattern" occurs anywhere in $scalarName

Substituting or Replacing:

• $line =~ s"patternA"patternB"; # searches for the first patternA in $line and replaces it with patternB.

Translating

• $scalarName =~ tr/A-Z/a-z/; # translates every upper-case letter to lower case
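A minimal, self-contained sketch of the three operators applied to the sample string above. The names $found, $replaced and $lower are illustrative only; each result is computed on a copy so the original string stays unchanged.

```perl
use strict;
use warnings;

my $scalarName = "This is a line with pattern in it";

# m"..." is true (1) when the pattern occurs anywhere in the string
my $found = ($scalarName =~ m"pattern") ? 1 : 0;

# s"..."..." replaces only the first occurrence; applied to a copy
(my $replaced = $scalarName) =~ s"pattern"PATTERN";

# tr/// maps characters one for one: A-Z become a-z
(my $lower = $scalarName) =~ tr/A-Z/a-z/;

print "$found\n$replaced\n$lower\n";
```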


Regular Expressions

Regular Expressions Match only on scalars

#!/usr/local/bin/perl

$name = "Smith";
if ($name =~ m"it") { print "yes\n"; }

# substitution
$name =~ s/S/s/; print "$name\n";   # smith

$pattern = 'abc';

$a = 'We start with abcdef and more abcdef';

$status = ($a =~ /abc/);

# A double-quoted string can also be used as the pattern
$status = ($a =~ "abc"); print "$status\n"; # 1

$browser = $ENV{'HTTP_USER_AGENT'};

if ($browser =~ m/Mozilla/) { ... }


Regular Expressions

$pattern = 'abc';

$a = 'We start with abcdef and more abcdef';

$b = 'bcdefjl';

$status = ($a =~ s/abc/ABC/); print "$status\n";   # 1: only the first abc is replaced
if ($a =~ m/abc/) { print "match\n"; }             # the second abc is still there
if ($b !~ m/abc/) { print "No match\n"; }
if ($a =~ m/${pattern}/) {   # variable interpolation
    print "match\n";
}


Regular Expressions

Regular expressions can also be used without =~ or !~: a match or substitution with no bound string operates on the special variable $_.

Example: my @elements = ('a1', 'a2', 'a3'); foreach (@elements) { s/a/b/; }

#the special variable $_ will contain the elements from the list,

# one at a time through the iterations.

# The elements will be b1, b2 and b3 because of substitution.

while (<FD>) { print if m/ERROR/; } # prints each line that contains ERROR

$text = "abcdef";

$text =~ m/a/;   # does it match character a? returns true if it does

$text !~ m/a/;   # does it match character a? returns false if it does

$text =~ s/a/A/; # substitutes A for a and returns true if it happens

$text !~ s/a/A/; # substitutes A for a and returns false if it happens
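The while (<FD>) loop above assumes an already opened filehandle FD. A minimal sketch that runs without a real file, assuming a made-up log held in a string and read through an in-memory filehandle, shows both loops relying on the implicit $_:

```perl
use strict;
use warnings;

# Hypothetical log contents; an in-memory filehandle stands in for FD
my $log = "INFO start\nERROR disk full\nINFO retry\nERROR timeout\n";
open(my $fd, '<', \$log) or die "can't open in-memory handle: $!";

my @errors;
while (<$fd>) {                      # each line is assigned to $_
    push @errors, $_ if m/ERROR/;    # m/ERROR/ tests $_ implicitly
}
close($fd);

# s/// also defaults to $_; in foreach, $_ aliases each array element
my @elements = ('a1', 'a2', 'a3');
foreach (@elements) { s/a/b/; }

print @errors, "@elements\n";
```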


Regular Expressions

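• . (period) – matches any single character except a newline.

• Example: "cat" =~ /c.t/ is true, but "ct" =~ /c.t/ is false, because the period must match exactly one character.

• [] – matches any one character from the set or range of characters listed inside the brackets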

• Examples:

$level = "12345";
if ($level =~ /^[1-9]/) { ... }         # true: begins with a digit 1-9
if ($level =~ /[^1-9]/) { print "ok"; } # negated class: true only if $level contains a character that is NOT 1-9

$level = "123450"; if ($level =~ /[^1-9]/) { print "ok"; } # what is the output?

#Matching on the word boundary

$line = "A rugged rug";

$line =~ s/\brug\b/RUG/; print $line; # A rugged RUG
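A small sketch of the period and the word boundary side by side. The word list is made up for illustration; the anchors ^ and $ force the period to sit between exactly one c and one t.

```perl
use strict;
use warnings;

# . matches exactly one character (anything except a newline)
my @words = qw(cat cut ct coat);
my @three = grep { /^c.t$/ } @words;     # keeps cat and cut

# \b marks a word boundary, so "rug" inside "rugged" is not replaced
my $line = "A rugged rug";
$line =~ s/\brug\b/RUG/;

print "@three\n$line\n";
```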


Regular Expressions

Example: Grouping and quantifying

$line = "tootootoo"; if ($line =~ m/((too){2})/) { print $1, "\n"; # tootoo

}

# match against $_

Example:

my @vars = ('a1','a2','a3'); foreach (@vars) { s/a/b/; } print "@vars\n"; # b1 b2 b3


Common Wildcards

\d – matches a digit

\D – matches a non digit

\w – matches a word character (an upper-case or lower-case letter, a digit, or an underscore)

\s – matches a whitespace character (space, tab, newline, and so on)

\S – matches a non-whitespace character

\b – requires a word boundary: the element must appear at the beginning or end of a word

\B - requires the element not to appear at the beginning or end of a word

^ - beginning of the string

$ - end of the string (or just before a final newline)

* - zero or more

+ - one or more

? – zero or one time

{X} – match exactly X times

{X,} – match X or more times

{X,Y} – match X to Y times

• Alternation - | matches either the pattern on its left or the pattern on its right.
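A short sketch exercising several of these wildcards at once. The record and date strings are invented sample data, not from any real format.

```perl
use strict;
use warnings;

my $record = "id=42  name=Ann_Lee";

my @digits = $record =~ /(\d+)/g;                 # \d+ : one or more digits
my ($name) = $record =~ /name=(\w+)/;             # \w+ : letters, digits, underscore
my $gap    = ($record =~ /\d\s{2}\w/) ? 1 : 0;    # \s{2} : exactly two whitespace characters

# ^ and $ anchor both ends; {4} and {2} fix the digit counts
my $is_date = ("2024-05-17" =~ /^\d{4}-\d{2}-\d{2}$/) ? 1 : 0;

# | alternation inside a group, ? makes the trailing s optional
my $pet = ("dogs" =~ /^(cat|dog)s?$/) ? $1 : '';

print "@digits $name $gap $is_date $pet\n";
```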


Regular Expressions

Regular Expressions Match only on scalars

Use grep to match array elements (More Examples on grep later)

@array = ('one','two','ton');

$n = grep(m/on/, @array); # 2: 'one' and 'ton' contain "on"


Regular Expressions

A regular expression finds the earliest (leftmost) possible match of a given pattern. By default, it matches or replaces only once.

Example:

$variable = "A crazy horse jumped over a crazy fox";

$pattern = "crazy fox";

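• If a partial match fails (at the first "crazy", the text "crazy " is followed by "horse", not "fox"), the engine backtracks and retries from the next starting position in the string, until the whole pattern matches at the second "crazy".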
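A minimal sketch confirming where the match lands, using Perl's @- array ($-[0] is the offset at which the last successful match began), and showing that s/// without /g changes only the leftmost occurrence. The replacement word "calm" is illustrative.

```perl
use strict;
use warnings;

my $variable = "A crazy horse jumped over a crazy fox";
my $pattern  = "crazy fox";

my $start;
if ($variable =~ m/$pattern/) {
    $start = $-[0];    # offset where the whole match begins
}

# Without /g only the first (leftmost) "crazy" is replaced
(my $once = $variable) =~ s/crazy/calm/;

print "$start\n$once\n";
```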


Regular Expressions

A pattern is interpolated like a double-quoted string, so it can contain anything a double-quoted string can, including variables and escape sequences such as \n and \t.

Examples:

$name = "John Smith";

$line = " John Smith is the author of this book.";
if ($line =~ m/author of/) { print "Author\n"; }
if ($line =~ m/${name} is/) { print "match\n"; } # matches "John Smith is"

If the pattern contains the delimiter character (for example, the / in a path), either escape it with a backslash or use a different delimiter, such as double quotes.

Example:

$path =~ m"usr/local/bin"   # same as $path =~ m/usr\/local\/bin/

• Use the function quotemeta() to backslash special characters automatically (\Q...\E does the same inside a pattern).

Example:

$pattern = "({";

$variable =~ m/$pattern/;   # same as m/({/, which dies at run time: ( and { are special characters, and the ( opens a group that is never closed

$pattern = quotemeta("({"); # makes the pattern \(\{

$variable =~ m/$pattern/;   # now matches the literal characters ({
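A minimal sketch of the failure and both fixes. The target string is made up; eval is used only so the script survives the deliberately broken pattern.

```perl
use strict;
use warnings;

my $pattern  = "({";
my $variable = "call f({x}) now";

# Interpolating the raw string fails: "(" opens a group that is never closed
my $compiled = eval { $variable =~ m/$pattern/; 1 } ? 1 : 0;

my $quoted = quotemeta($pattern);                      # \(\{
my $found  = ($variable =~ m/$quoted/)      ? 1 : 0;
my $found2 = ($variable =~ m/\Q$pattern\E/) ? 1 : 0;   # \Q...\E quotes inline

print "$compiled $quoted $found $found2\n";
```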


Regular expressions

A Regular Expression creates two things in the process of being evaluated: result status and back references.

Result Status: tells you whether the regular expression matched your string. For a substitution it is the number of replacements made, which can be more than 1 with /g.

$line = " round and round the rugged rock the rascal ran";

$status = ($line =~ m"round"); print "$status\n"; #1

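$matches = ($line =~ s"round"Round"); print "Matches = $matches\n"; #1

$line = " round and round the rugged rock the rascal ran"; # reset: the line above already changed the first round

$matches = ($line =~ s"round"Round"g); print "Matches = $matches\n"; #2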
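A small sketch of the possible result statuses, run on fresh copies of the same line so each count is independent. The names $hit, $miss, $copy and $all are illustrative.

```perl
use strict;
use warnings;

my $text = " round and round the rugged rock the rascal ran";

my $hit  = ($text =~ m"round");     # 1: matched
my $miss = ($text =~ m"square");    # "": no match (false, but defined)

(my $copy = $text) =~ s"round"Round";                  # one replacement
my $count = (my $all = $text) =~ s"round"Round"g;      # /g: number of replacements

print "$hit [$miss] $count\n";
```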


Regular Expressions

Back references let you save parts of a match for later use. Enclose the part of the pattern you want to capture in (); the captured text is stored in $1, $2, and so on.

Example:

$line = "round and round the rugged rock the rascal ran";

$matches = ($line =~ m"(round) the (rugged)");
print "Matches = $matches\n"; # 1
print $1; # round
print $2; # rugged
print $&; # round the rugged ($& contains the whole matched string)

if ($line =~ m"(round) the (rugged)") {
    $first = $1;
    $second = $2;
}


Regular Expressions

Using the back references in the regular expressions

Switch the words

$line = "first minus second ";

$line =~ s"(first) minus (second)"$2 minus $1"; print $line; # second minus first

Example:

$line = "A taxi and a fox";
if ($line =~ s/(a+x+)/--/) {
    #print "Yes", $1, "\n";
    print $line, "\n";          # A t--i and a fox
    print "Before: ", $`, "\n"; # Before: A t
    print "After: ", $', "\n";  # After: i and a fox
    print "Match: ", $&, "\n";  # Match: ax
}


Exercise

• Write a regular expression to swap the first two words

• Match a line of 80 characters


Backreferences

$line = 'It is THIS and not THAT';

$line =~ /(TH..)/; print "$1\n"; # THIS: earliest match, the default behavior

$line =~ /(TH..).*(TH..)/; print "$1 $2\n"; # THIS THAT

($one, $two) = ($line =~ /(TH..).*(TH..)/); print "1:$one 2:$two\n"; # 1:THIS 2:THAT
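A minimal sketch of the list-context form: the match returns the captures directly, and a failed match returns an empty list (here the lower-case "h" variant, which cannot match because the string only contains upper-case "TH").

```perl
use strict;
use warnings;

my $line = 'It is THIS and not THAT';

# List context returns the captures directly
my ($one, $two) = $line =~ /(TH..).*(TH..)/;

# With a lower-case "h" nothing matches, so the list is empty
my @none = $line =~ /(Th..).*(TH..)/;

print "1:$one 2:$two\n", scalar(@none), "\n";
```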


Using back references

What happens to the back references if the regular expression fails to match?

Example:

$text = "This is scary";

$text =~ m"(scari)"; print $1; # the match fails, so $1 is not set: it keeps its value from the last successful match (undefined if there was none)

#Make use of short-circuit evaluation as shown below:

($text =~ m"(scari)") && ($found = $1); # $found is assigned only if the match succeeds
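A small sketch showing why the guard matters: a failed match leaves $1 holding the capture from an earlier successful match. The first match on "(This)" is added only to give $1 a stale value.

```perl
use strict;
use warnings;

my $text = "This is scary";

$text =~ m"(This)";     # succeeds: $1 is "This"
$text =~ m"(scari)";    # fails: $1 is NOT cleared
my $stale = $1;         # still "This"

# Copy $1 only when the match succeeds
my $found;
($text =~ m"(scari)") && ($found = $1);    # $found stays undef
my $found2;
if ($text =~ m"(scar)") { $found2 = $1; }  # "scar"

print "$stale ", defined $found ? $found : 'undef', " $found2\n";
```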


Using back references

Nesting backreferences

Some rules:

The earlier a backreference's opening parenthesis appears in an expression, the lower its backreference number.

Example:

$string = 'abracadabra';

$string =~ m"(a)(b)"; print "$1 $2\n"; #a b


Using back references

Nesting backreferences

Some rules:

The more general (enclosing) a backreference is, the lower its number: an outer group's opening parenthesis comes before those of the groups nested inside it.

Example:

$string = "softly slowly surely subtly";

$string =~ m"((s...ly\s*)*)";
print "$1\n"; # softly slowly surely subtly
print "$2\n"; # subtly

Explanation:

The pattern (s...ly\s*)* matches multiple things:

First softly, then slowly, then surely, and then subtly.

Because the group is repeated, each new match overwrites the previous one, so $2 keeps only the last match, subtly, while $1 (the outer group) holds the whole run.
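A minimal sketch of the numbering rule on a made-up date string: groups are numbered by the position of their opening parenthesis, so the enclosing year-month group is $1 and the groups nested inside it follow.

```perl
use strict;
use warnings;

my $date = "2024-05-17";

# Opening parentheses, left to right:
#   $1 = year-month (outer), $2 = year, $3 = month, $4 = day
my @parts = $date =~ /((\d{4})-(\d{2}))-(\d{2})/;

print join(" | ", @parts), "\n";
```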


Backreferences

• Back references can be used in the regular expression itself

• If you put () around a group of characters, refer to the captured text as $1, $2, etc. in the replacement part of s/.../.../, and as \1, \2, etc. inside m/.../ or the pattern part of s/.../.../.

Example:

$string = "sample examples";
if ($string =~ m"(amp..) ex\1") { # (amp..) captures "ample"; \1 must match "ample" again
    print "Matches!\n";
}
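A common use of \1 inside the pattern is removing doubled words. This is a sketch on an invented sentence; note \1 in the pattern and $1 in the replacement.

```perl
use strict;
use warnings;

my $text = "the the cat sat sat on the mat";

# \1 inside the pattern refers back to group 1; $1 is used in the replacement
(my $clean = $text) =~ s/\b(\w+)\s+\1\b/$1/g;

print "$clean\n";
```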


Backreferences

Example:

• $text = 'bballball';

• $text =~ s"(b)\1(a..)\1\2"$1$2"; #Does this match the string in $text?

Steps in matching:

1. The first b in () matches the first b in $text and is saved in $1 and \1.

2. \1 matches the second b in the string.

3. (a..) matches the string all and is stored in $2 and \2.

4. \1 matches the next b.

5. \2 (which contains all) matches the next and last three characters, all. The whole string matches, so $text is replaced by $1$2, that is, "ball".


Multiple-Match Operators

There are six multiple-match operators:

• * - zero or more

• + - one or more

• ? – zero or one time

• {X} – match exactly X times

• {X,} – match X or more times

• {X,Y} – match X to Y times

Example:
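A sketch that tries each of the six operators against the same made-up words, anchored with ^ and $ so the quantifier alone decides the result:

```perl
use strict;
use warnings;

my $s = "caaat";    # three a's between c and t

my @results = (
    ($s   =~ /^ca*t$/)     ? 1 : 0,   # *     zero or more: yes
    ("ct" =~ /^ca+t$/)     ? 1 : 0,   # +     one or more: no
    ("ct" =~ /^ca?t$/)     ? 1 : 0,   # ?     zero or one: yes
    ($s   =~ /^ca{3}t$/)   ? 1 : 0,   # {3}   exactly three: yes
    ($s   =~ /^ca{4,}t$/)  ? 1 : 0,   # {4,}  four or more: no
    ($s   =~ /^ca{1,2}t$/) ? 1 : 0,   # {1,2} one or two: no
);

print "@results\n";
```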


Greediness

Example:

$line = 'This is the example of the greedy match pattern';

$line =~ m/This (.*)the/; print $1; #is the example of (with a trailing space: .* runs on to the last "the")

$line = 'Somethings';

$line =~ m"(\w{2,3})"; print "$1\n"; #Som


Greediness

By default, the multiple-match operators gobble up as many characters as they can while still letting the pattern match.

Example 1: Find and replace the first "round" with "square" in the string.

$line = "About a round and round rock";

#Tries to match the maximal number of “any” characters represented by .*.

$line =~ s/.*(round)/square/; print $line; # square rock

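# A question mark after a quantifier makes it non-greedy: the smallest quantity that still matches is chosen.

$line = "About a round and round rock"; # reset: the substitution above already changed $line

$line =~ s/.*?(round)/square/; print $line; # square and round rock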
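Another way to see the difference is to capture what the quantifier consumed. A minimal sketch on a made-up HTML fragment:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

my ($greedy) = $html =~ /<(.+)>/;     # .+ runs to the LAST ">"
my ($lazy)   = $html =~ /<(.+?)>/;    # .+? stops at the FIRST ">"
my @tags     = $html =~ /<(.+?)>/g;   # every tag, one at a time

print "$greedy\n$lazy\n@tags\n";
```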


Curbing Greediness

*? – match zero, 1 or many times, but match the fewest number of times.

+? – match 1 or many times, but match the fewest number of times.

?? - match zero or 1 time, but match the fewest possible number of times.


Some modifiers

The /g modifier in substitution means that every single instance of a regular expression is replaced.

$line = " round and round the rugged rock the rascal ran";

$matches = ($line =~ s"round"Round"); print "Matches = $matches\n"; #1

$line = " round and round the rugged rock the rascal ran"; # reset: the line above already changed the first round

$matches = ($line =~ s"round"Round"g); print "Matches = $matches\n"; #2

Perl uses the /g modifier in a different way with match than it does with substitution.


Some modifiers


With a match, /g behaves like an iterator. In scalar context, each m//g remembers where the previous match ended (pos() returns that offset), so the next m//g continues where you left off. When no further match is found, the position is reset.

Example:

$line = 'hello Susan hello Jane'; while ($line =~ m/hello (\w+)/g) { print "$1\n";

}

Output:

Susan

Jane
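A small sketch that records pos() on each pass of the loop above, and checks that the failed match ending the loop resets the position:

```perl
use strict;
use warnings;

my $line = 'hello Susan hello Jane';
my (@names, @ends);

while ($line =~ m/hello (\w+)/g) {
    push @names, $1;
    push @ends, pos($line);   # offset just past the current match
}

# The failed match that ends the loop resets the position to undef
my $after = defined pos($line) ? pos($line) : 'undef';

print "@names | @ends | $after\n";
```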


Some modifiers

• With a match, /g specifies global pattern matching, that is, matching as many times as possible within the string.

• How it behaves depends on the context. In list context, it returns a list of all the substrings matched by all the parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

• In scalar context, each execution of m//g finds the next match, returning TRUE if it matches, and FALSE if there is no further match.

Example:

$line = 'hello Susan hello Jane';

@matches = ($line =~ m/hello (\w+)/g); print "@matches\n"; #Susan Jane
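A sketch contrasting the two list-context cases described above (with and without parentheses), plus the common idiom of assigning to an empty list to count matches:

```perl
use strict;
use warnings;

my $line = 'hello Susan hello Jane';

my @names = $line =~ m/hello (\w+)/g;   # with parentheses: the captures
my @words = $line =~ m/\w+/g;           # no parentheses: the whole matches

my $count = () = $line =~ m/hello/g;    # count matches via an empty list

print "@names\n@words\n$count\n";
```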


Alternation to match more than one set of characters

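• Alternation (|) lets one pattern match any of several alternatives.

• Alternation tries the alternatives from left to right: if the first one does not lead to a match, the second is tried, and so on. The first alternative that works wins, even if a later one would match more text.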

Example:

$line = "starship";

$line =~ m"(.*|star)"; # matches starship: .* is tried first and already matches the whole string
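A minimal sketch showing that the order of the alternatives decides what is captured, even when both alternatives could match:

```perl
use strict;
use warnings;

my $line = "starship";

my ($short) = $line =~ m"(star|starship)";   # "star" is tried first and succeeds
my ($long)  = $line =~ m"(starship|star)";   # "starship" is tried first and succeeds
my ($any)   = $line =~ m"(.*|star)";         # .* already matches the whole string

print "$short $long $any\n";
```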


Perl’s grep

• The grep function evaluates the BLOCK or EXPR for each element of LIST, locally setting the $_ variable equal to each element.

• BLOCK is one or more Perl statements delimited by curly brackets.

• LIST is an ordered set of values.

• EXPR is one or more variables, operators, literals, functions, or subroutine calls.

• grep returns a list of those elements for which the EXPR or BLOCK evaluates to TRUE.

• LIST can be a list or an array. In a scalar context, grep returns the number of times the expression was TRUE.
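A short sketch of the two calling forms, scalar context, and one detail worth knowing: because $_ is an alias to each element, changing $_ inside grep changes the original array. The word list is illustrative.

```perl
use strict;
use warnings;

my @words = qw(apple Banana cherry avocado);

my @a_words = grep { /^a/ } @words;           # BLOCK form
my @long    = grep(length($_) > 5, @words);   # EXPR form
my $count   = grep { /e/ } @words;            # scalar context: how many were true

# $_ aliases each element, so s/// inside grep modifies @copy itself
my @copy    = @words;
my @changed = grep { s/^a/A/ } @copy;

print "@a_words | @long | $count | @changed | @copy\n";
```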


grep -Some examples

#!/usr/local/bin/perl -w

@numbers = (1,2,3,4,5,6,7,8,9);

@lessThanFive = grep ($_ < 5, @numbers); print "@lessThanFive\n"; # 1 2 3 4

@tokens = ("A String", 234, "String 2", 111);

@ints = grep (m"^\d+$", @tokens); print "@ints\n"; # 234 111

@words = qw(a silly fox jumped over a silly horse); print "@words\n";


grep - Some Examples

$howMany = grep /silly/, @words; print $howMany;

@line = qw(jumped over a rail and hopped across a meadow);

@list = grep { s/ed/s/ if /^jump/ } @line; print "@list\n"; # jumps ($_ is an alias, so "jumped" in @line also becomes "jumps")


grep - Some Examples

@numbers = (1,30,200,50,200,450,2000);

@greaterThan50 = grep { $_ > 50 } @numbers; print "@greaterThan50\n"; # 200 200 450 2000

@colors = ();

$colors[1] = "yellow"; $colors[5] = "green";

$colors[10] = "blue";

@array = grep { defined $_ } @colors; print "@array\n"; # yellow green blue

What does this do?

@results = grep { $array[$_] =~ /PACKAGE/ } (0..$#array);


grep – Some Examples

open(FILE, '<', 'myfile') or die "can't open myfile: $!";

@lines = <FILE>; close(FILE); print grep /xml|xslt/i, @lines;

@unique = grep { not $found{$_}++ } qw(a b a c d d e f g f h h); print "@unique\n"; # a b c d e f g h
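The file example needs a real myfile. A minimal sketch that runs anywhere, assuming invented file contents read through an in-memory filehandle, combines the case-insensitive grep with the %found de-duplication idiom:

```perl
use strict;
use warnings;

# Hypothetical file contents held in memory instead of "myfile"
my $data = "config.XML\nnotes.txt\nstyle.xslt\nreadme.md\n";
open(my $fh, '<', \$data) or die "can't open data: $!";
my @lines = <$fh>;
close($fh);

my @hits = grep { /xml|xslt/i } @lines;    # /i: case-insensitive

# $found{$_}++ is false (0) the first time an item is seen, true afterwards
my %found;
my @unique = grep { not $found{$_}++ } qw(a b a c d d e f g f h h);

print @hits, "@unique\n";
```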
