Regular Expressions

Basics of Perl Regular Expressions (“regexp”)
Jon Radoff // jradoff@charter.net // Biophysics 101, Fall 2002

Simplistic use of a regular expression:

    $_ = "this is a test";
    if (/est/) {
        print "Match!\n";
    }

In the above code, /est/ is the regular expression. It succeeds because est is a substring of this is a test. The regular expression may also contain “meta-characters” that allow you to specify special rules about how you would like to match.

Meta-characters:

    \     Quote the next meta-character
    ^     Match the beginning of the line
    .     Match any character (except newline)
    $     Match the end of the line (or before newline at the end)
    |     Alternation
    ()    Grouping
    []    Character class

The most common meta-character in regular expressions is . which matches any single character except a newline. For example, if you used /te.t/ as the regular expression in the above code, it would succeed, because the s character counts as the “any character.” /foo|test/ would also succeed, because the | (read as “or”) matches any string that contains either foo or test. The [] operator lets you check for any one of a class of characters. For example, if you wanted to see if a codon was AGU or AGC, you could use either /AG[UC]/ or /AGU|AGC/.

“Quantifiers” may be added to the regular expression to control how many times the preceding character (or group) must occur.

Quantifiers:

    *        Match 0 or more times
    +        Match 1 or more times
    ?        Match 1 or 0 times
    {n}      Match exactly n times
    {n,}     Match at least n times
    {n,m}    Match at least n but not more than m times

Examples:

/this.*test/ would succeed for any string containing this and test separated by any number of arbitrary characters (including none).

/thi+s/ would succeed for a string containing th followed by one or more i characters followed by s.

Modifiers are appended to the end of a regular expression and apply special rules to your entire expression.

Modifiers:

    i    Do case-insensitive pattern matching
    g    Global (in substitutions, repeat the substitution for every match; see below)
    m    Treat string as multiple lines
    s    Treat string as single line; i.e., let . match newlines as well
    x    Allow whitespace and comments in your regular expression

Example: /[acgt]+/i checks if a string contains one or more valid DNA sequence characters of either case.

Using the caret (^) with a character class

In practice, it is often useful to check if a string contains anything except the characters of a particular class. The example above will still succeed even if the string also contains invalid DNA sequence characters, because it only needs to find one valid character somewhere in the string. Insert a caret (^) at the beginning of the class to make it match any character that is not in the class.

Example: /[^acgt]+/i checks if a string contains anything except valid DNA sequence characters of either case.

Substitutions with s///

In addition to matching strings, you may also use regular expressions to perform substitutions. Do this by prefixing the regular expression with s and following it with the replacement string and another /. Note that substitutions can be placed on a line of code by themselves (they do not need to be part of an assignment or a conditional statement).

Example:

    $_ = "this will be a test";
    s/will be/is/;
    print "$_\n";

will output:

    this is a test

By default, only the first match is replaced. To replace every match, append the g modifier.

Example:

    $_ = "Frodo Baggins and Bilbo Baggins are both hobbits.";
    s/ baggins//gi;
    print "$_\n";

will output:

    Frodo and Bilbo are both hobbits.
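
The negated character class and the g modifier can be combined in one short script. The following is only a sketch under stated assumptions: the subroutine names is_valid_dna and dna_to_rna and the two sample sequences are hypothetical, chosen for illustration, and the code uses Perl's =~ operator to apply a pattern to a named variable rather than to $_.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the sequence is non-empty and
# contains only the letters a, c, g, t (either case).
sub is_valid_dna {
    my ($seq) = @_;
    return 0 if $seq eq '';
    # [^acgt] matches any single character that is NOT a valid base,
    # so one match anywhere means the sequence is invalid.
    return $seq =~ /[^acgt]/i ? 0 : 1;
}

# Hypothetical helper: upper-case the sequence and replace every T with U.
sub dna_to_rna {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/T/U/g;    # g: replace every match, not just the first
    return $seq;
}

# Sample sequences are made up for this example.
for my $seq ("agcTTa", "agxt") {
    if (is_valid_dna($seq)) {
        print "$seq -> ", dna_to_rna($seq), "\n";
    } else {
        print "$seq contains invalid characters\n";
    }
}
```

Testing for a single invalid character with [^acgt] is what makes the check strict; /[acgt]+/i alone would accept agxt because it only needs one valid base somewhere in the string.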
