Learning Perl - Chapter 8

Chapter 8: Matching with Regular Expressions

The last chapter gave a brief overview of regular expressions. This chapter shows some uses for regular expressions in Perl through pattern matching.

Matches with m//

The m// operator is Perl's matching operator. You will often see it written simply as //, because the m is implied when the delimiters are forward slashes. The regular expression goes between the delimiters; for example, m/fred/ would match "fred". As we saw with qw//, the delimiter can be changed to almost anything you'd like to use, for example m#regex#.

Match Modifiers

Match modifiers go at the end of the m// operator and change the behavior of Perl's regular expression matching.

Case Insensitive

The i modifier tells Perl to match the regular expression without regard to upper or lower case.

Example

    m/fred/i

This example would match fred, Fred, FRED, fRED, etc.

Matching any Character

As we saw in the regular expression chapter, the dot (.) metacharacter matches any character except a newline. The s modifier tells Perl to let the dot match the newline character as well.

Example

    m/Fred.*Wilma/s

This example would match the string "Fred likes to bowl.\nWilma also likes to bowl."

Adding Whitespace

The x modifier tells Perl to ignore any literal whitespace in your regular expression. This means actual spaces and tabs in the pattern, not the \s metacharacter. The modifier is useful for complex regular expressions that are easier to read with extra spaces, or when you would like to add comments to the pattern.

Example

    m/[0-9] [0-9] [0-9]/x

This example would match three decimal digits in a row.

Combining Matching Modifiers

Match modifiers can be combined by listing them together after the closing delimiter.

Example

    m/fred.*wilma/is

This example would match "Fred likes to bowl.\nWilma also likes to bowl."
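The modifiers described so far can be tried out with a short script. This is only a minimal sketch: the variable names and the "room 101" sample string are assumptions made for illustration, while the other strings and patterns come from the examples above.

```perl
use strict;
use warnings;

# Sample text from the examples above (variable names are illustrative only).
my $text = "Fred likes to bowl.\nWilma also likes to bowl.";

# i: match without regard to case.
my $case_insensitive = "FRED" =~ m/fred/i ? 1 : 0;

# Without s, the dot cannot cross the newline between the two sentences.
my $dot_without_s = $text =~ m/Fred.*Wilma/ ? 1 : 0;

# With s, the dot also matches the newline.
my $dot_with_s = $text =~ m/Fred.*Wilma/s ? 1 : 0;

# x: literal whitespace and comments inside the pattern are ignored.
my $three_digits = "room 101" =~ m/
    [0-9]   # first digit
    [0-9]   # second digit
    [0-9]   # third digit
/x ? 1 : 0;

# i and s combined on one pattern.
my $combined = $text =~ m/fred.*wilma/is ? 1 : 0;

print "i modifier:        $case_insensitive\n";   # 1
print "dot without s:     $dot_without_s\n";      # 0
print "dot with s:        $dot_with_s\n";         # 1
print "x modifier:        $three_digits\n";       # 1
print "i and s together:  $combined\n";           # 1
```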
The m/fred.*wilma/is example combines the i modifier for case insensitivity with the s modifier, which adds the newline character to the possible matches of the dot (.) metacharacter.

Choosing Character Interpretation

This section covers modifiers for character interpretation. One of them, also covered in the last chapter, is the a modifier, which makes the metacharacters in the regular expression match only their old ASCII values.

Modifier   Character Interpretation
a          ASCII
u          Unicode
l          Locale
aai        ASCII-only case folding

The table above lists the possible modifiers for character interpretation. The l modifier delegates character interpretation to the locale configured in the operating system. The last entry, aai, is the doubled a modifier (aa) combined with i, and it concerns case folding. Because finding the upper or lower case form of a character outside ASCII can be ambiguous, aai tells Perl to apply case folding for case-insensitive matching within ASCII only.

Anchors

By default, a regular expression will match a pattern anywhere it can in a string. If you want to match a pattern at a specific place in a string, such as the beginning or the end, you can use anchors to enforce that behavior.

Anchor   Function
^        Beginning of string (old Perl 4 anchor)
$        End of string (old Perl 4 anchor)
\A       Absolute beginning of string
\z       Absolute end of string
\Z       End of string, allowing an optional newline just before the end
\b       Word anchor, marking the beginning or end of a word

Example

    m#\Ahttp://#

This example uses the newer Perl 5 beginning-of-string anchor to match a string starting with http://. The string cannot have anything before the http://, including whitespace.

Example

    m#^http://#

This example is the same as the first, but uses the old Perl 4 anchor.

Example

    m/\.jpg\z/

This example will match a string ending in ".jpg". The pattern will not match if there is a newline character at the end of the string, so a chomp may be necessary before using it.
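The string anchors from the table above can be compared side by side in a small script. This is a sketch under stated assumptions: the URL and file name are made-up sample data, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my $url     = "http://www.example.com/";
my $padded  = " http://www.example.com/";   # leading space before http://
my $file    = "photo.jpg";
my $file_nl = "photo.jpg\n";                # e.g. a line of input that was not chomped

# \A and ^ both anchor at the beginning of the string here.
my $abs_start    = $url    =~ m#\Ahttp://# ? 1 : 0;
my $caret_start  = $url    =~ m#^http://#  ? 1 : 0;
my $padded_start = $padded =~ m#\Ahttp://# ? 1 : 0;

# \z insists on the very end of the string.
my $z_plain   = $file    =~ m/\.jpg\z/ ? 1 : 0;
my $z_newline = $file_nl =~ m/\.jpg\z/ ? 1 : 0;

# \Z tolerates one trailing newline.
my $Z_newline = $file_nl =~ m/\.jpg\Z/ ? 1 : 0;

print "\\A on plain URL:        $abs_start\n";      # 1
print "^ on plain URL:         $caret_start\n";    # 1
print "\\A with leading space:  $padded_start\n";   # 0
print "\\z without newline:     $z_plain\n";        # 1
print "\\z with newline:        $z_newline\n";      # 0
print "\\Z with newline:        $Z_newline\n";      # 1
```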
Example

    m/\.jpg\Z/

This example is the same as the \z example, except that no chomp is needed to account for a newline that may follow the ".jpg". The \Z anchor allows a newline to be present at the end of the string without it being specified in the regular expression.

Word Anchors

The \b word anchor in the table from the previous section allows you to match a pattern based on what Perl considers a word boundary, that is, the beginning or end of a word. Note that punctuation, apostrophes, and quotes count as word boundaries in Perl.

Example

    m/\bstone\b/

This example would match the string "That is a stone over there."

Example

    m/stone\b/

This example is similar to the previous one, but it would also match "That is Fred Flintstone over there!" where the previous one would not.

The Binding Operator

So far the book has matched patterns against the default scalar, $_. The binding operator (=~) allows us to match against other values.

Example

    $value =~ m/\bstone\b/

This example would apply the pattern \bstone\b to the value in the scalar $value.

Interpolating into Patterns

This section covers the ability to interpolate scalars into a pattern in Perl.

Example

    my $needle = "stone";
    $haystack =~ m/\b$needle\b/;

This example is similar to the one in the previous section; the effective regular expression is \bstone\b.

The Match Variables

Match variables are variables in which Perl stores pieces of a pattern match. The pieces Perl picks out are defined by parentheses ( ). The variables are named $1, $2, $3 ... $n, where n is the number of parenthesized groups in the pattern.

Example

    m/Fred Flint(stone) likes to (bowl)/

In this example, Perl would store the value "stone" in $1 and the value "bowl" in $2. Match variables are more useful in something like the following.

Example

    m/Fred Flintstone likes to (\w+)/

In this example, the word after "to" will be stored in the $1 variable.

The Persistence of Captures

This section warns that the $1, $2, $3 ... $n variables keep their values until a subsequent match is successful. So, if one successful match populates $1 and a later match fails, the value of $1 will still be the one from the first match.

Example

    my $string = "Fred Flintstone likes to bowl.";
    $string =~ m/Fred Flint(stone) likes to (bowl)./;
    $string =~ m/Fred Flint(rock) likes to (bowl)./;

In this example, since the first match is successful and the second is not, the value of $1 would remain "stone" and the value of $2 would remain "bowl". If you expected $1 to be "rock" after the second match, your code would not behave as expected.

Noncapturing Parentheses

Perl offers a way to use parentheses without capturing the value into one of the $1, $2, $3 ... $n variables: begin the group with ?:, as in (?:...). This feature is useful if you are updating a pattern in existing code and do not wish to go through the rest of the code to ensure that your $1, $2, $3 ... $n variables are still correct.

Example

    my $string = "Fred Flintstone likes to bowl.";
    $string =~ m/Fred Flint(?:stone) likes to (bowl)./;

In this example, $1 would be "bowl" and $2 would not be defined.

Named Captures

Perl 5.10 added a feature for naming the captures in your pattern. This feature uses the %+ hash to store the captured values under the keys named in the pattern.

Example

    m/Fred Flint(?<name>stone) likes to (?<sport>bowl)/

In this example, the key "name" ($+{'name'}) would be defined as "stone" and the key "sport" ($+{'sport'}) would be defined as "bowl".

The Automatic Match Variables

Perl can define three variables after each pattern match; see the table below for details. The caveat with these variables is that any matches done after any of them has been read will be significantly slower than normal, because Perl begins to maintain them once they have been used anywhere in a script.
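Returning to the capture sections above, numbered captures, their persistence after a failed match, noncapturing groups, and named captures can be checked with a short script. This is a minimal sketch, not code from the book: the variable names are illustrative, and the sample string spells "Flintstone" with a t so that the patterns here use Flint(...).

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

my $string = "Fred Flintstone likes to bowl.";

# A successful match fills $1 and $2.
my $first_ok = $string =~ m/Fred Flint(stone) likes to (bowl)/ ? 1 : 0;
my ($first, $second) = ($1, $2);

# A failed match leaves the earlier values in place.
my $second_ok = $string =~ m/Fred Flint(rock) likes to (bowl)/ ? 1 : 0;
my $stale = $1;    # still "stone", not "rock"

# (?:...) groups without capturing, so "bowl" becomes $1.
$string =~ m/Fred Flint(?:stone) likes to (bowl)/
    or die "unexpected: pattern did not match";
my $only = $1;

# Named captures are read from the %+ hash.
$string =~ m/Fred Flint(?<name>stone) likes to (?<sport>bowl)/
    or die "unexpected: pattern did not match";
my ($name, $sport) = ($+{name}, $+{sport});

print "first match ok:  $first_ok ($first, $second)\n";   # 1 (stone, bowl)
print "second match ok: $second_ok, \$1 is still $stale\n"; # 0, stone
print "noncapturing:    \$1 is $only\n";                   # bowl
print "named captures:  name=$name sport=$sport\n";        # stone, bowl
```

Checking the result of each match before reading $1 avoids acting on stale values.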
Variable   Data
$`         The part of the string before the match, if any
$&         The matched portion of the string
$'         The part of the string after the match, if any

These three variables, when combined, form the entire string the match was run against. Perl 5.10 introduced a new way to access this data that does not have the performance implications of the older variables; it is available when the match uses the p modifier. See the table below for the new equivalents.

Perl < 5.10   Perl >= 5.10
$`            ${^PREMATCH}
$&            ${^MATCH}
$'            ${^POSTMATCH}

General Quantifiers

In addition to the quantifiers we saw previously (?, *, +), Perl can specify the exact number of repetitions we expect. This is accomplished with the {MIN,MAX} notation after the pattern or character we wish to quantify. The MAX value is optional in two ways: for any number of repetitions at or above the minimum, use {MIN,}; for an exact number of repetitions, use {MIN} (omitting the comma).

Example

    m/a{5}/

This example would match exactly five a characters in a row, "aaaaa".

Example

    m/a{5,}/

This example would match "aaaaa", "aaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaa", etc., effectively matching as many occurrences of the character a as possible as long as there is a minimum of 5.

Example

    m/a{5,8}/

This example would match "aaaaa", "aaaaaa", "aaaaaaa", and "aaaaaaaa", effectively matching the character a where it occurs at least 5 times but no more than 8 times in a row.

Precedence

This section defines the precedence Perl uses when processing a regular expression, similar to PEMDAS in arithmetic. See the table on page 151 of the book for more details.

Examples of Precedence

This section contains examples of how precedence can affect the behavior of your regular expressions. The most common precedence issue has to do with the alternation metacharacter (|).

Example

    m/he is happy|sad about that/

In this example we would be matching "he is happy" or "sad about that", where we probably intended to match "he is happy about that" or "he is sad about that".
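Both the general quantifiers and the alternation pitfall above can be observed directly. The sketch below makes two assumptions for illustration: the quantifier patterns are wrapped in \A and \z so that the whole string has to consist of the counted characters, and the variable names are invented here.

```perl
use strict;
use warnings;

# Quantifiers, anchored so nothing else may surround the run of a's.
my $five_exact   = ("a" x 5)  =~ m/\Aa{5}\z/   ? 1 : 0;
my $six_exact    = ("a" x 6)  =~ m/\Aa{5}\z/   ? 1 : 0;
my $ten_at_least = ("a" x 10) =~ m/\Aa{5,}\z/  ? 1 : 0;
my $eight_range  = ("a" x 8)  =~ m/\Aa{5,8}\z/ ? 1 : 0;
my $nine_range   = ("a" x 9)  =~ m/\Aa{5,8}\z/ ? 1 : 0;

# Alternation has low precedence: the pattern means
# "he is happy" OR "sad about that".
my $fragment_matches = "sad about that" =~ m/he is happy|sad about that/ ? 1 : 0;

print "a{5} on 5 a's:     $five_exact\n";        # 1
print "a{5} on 6 a's:     $six_exact\n";         # 0
print "a{5,} on 10 a's:   $ten_at_least\n";      # 1
print "a{5,8} on 8 a's:   $eight_range\n";       # 1
print "a{5,8} on 9 a's:   $nine_range\n";        # 0
print "fragment matches:  $fragment_matches\n";  # 1, probably not intended
```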
To work around this precedence issue, you would use the following pattern.

Example

    m/he is (happy|sad) about that/

Since parentheses rank high in the precedence Perl applies to regular expressions, this pattern would have the intended behavior.

A Pattern Test Program

See page 152 of the book for this program. It provides a quick way to test regular expression behavior.
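The grouped alternation and the idea of a pattern tester can be sketched together. The code below is not the program from the book: show_match is a hypothetical helper written for this illustration, and it assumes Perl 5.10 or later because it relies on the p modifier and the ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} variables.

```perl
use strict;
use warnings;
use 5.010;    # the p modifier and ${^MATCH} need Perl 5.10 or later

# With parentheses, the alternation covers only "happy" or "sad".
my $grouped_fragment = "sad about that"
    =~ m/he is (happy|sad) about that/ ? 1 : 0;
my $grouped_sentence = "he is sad about that"
    =~ m/he is (happy|sad) about that/ ? 1 : 0;

# Hypothetical helper: reports which part of $text a pattern matched.
# $pattern is passed as a string and interpolated into the match.
sub show_match {
    my ($pattern, $text) = @_;
    if ($text =~ m/$pattern/p) {
        return "Matched: |" . ${^PREMATCH} . "<" . ${^MATCH} . ">"
             . ${^POSTMATCH} . "|";
    }
    return "No match: |$text|";
}

print "grouped, fragment only: $grouped_fragment\n";   # 0
print "grouped, full sentence: $grouped_sentence\n";   # 1
print show_match('\bstone\b', 'That is a stone over there.'), "\n";
print show_match('\bstone\b', 'Fred Flintstone'), "\n";
```

The angle brackets in the output mark the matched portion, which makes it easy to see where an unanchored pattern found its match.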
