Regular Expressions
BKF03
Brian Ciccolo

Agenda
• Definition
• Uses – within Aspen and beyond
• Matching
• Replacing

What’s a Regular Expression?
In computing, regular expressions, also referred to as regex or regexp, provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
http://en.wikipedia.org/wiki/Regular_expression

Why Use a Regex?
• Validate data entry
  Example: Verify the format of a date field is mm/dd/yyyy
• Find/replace on steroids
  Example: Reformat phone numbers to (###) ###-####

Regex Use in Aspen
• Data validation
  o Date, time field input
  o Validation rules (new in 3.0 – see session TEC07)
• Find/replace on steroids
  o System Log filter
  o Field formatting

RegEx Examples Using Notepad++
• Select this option for our examples
• Select the proper Search Mode

Matching – The Basics
• Literals - plain old text
• Classes

  Example      Definition
  [abc]        a, b, or c
  [a-z]        Any lowercase letter
  [a-zA-Z]     Any lowercase or uppercase letter
  [0-9]        Any digit, 0 through 9
  [^a-zA-Z]    Not a letter (could be a digit or punctuation)

Matching – Predefined Classes

  Predefined Class   Definition
  .                  Any character (in most engines, any character except a newline)
  \d                 Any digit: [0-9]
  \D                 Any non-digit: [^0-9]
  \s                 A whitespace character (space, tab, newline)
  \S                 A non-whitespace character: [^\s]
  \w                 A word character: [a-zA-Z_0-9]
  \W                 A non-word character (e.g., punctuation or whitespace): [^\w]

Matching – Quantifiers

  Quantifier   Definition
  ?            Matches 0 or 1 time (Not supported by Notepad++)
  +            Matches 1 or more times
  *            Matches 0 or more times
  {n,m}        Matches at least n times but no more than m times (Not supported by Notepad++)

Matching – Greedy vs. Lazy
• Quantifiers are “greedy” by default – they match as many characters as possible
• Sometimes you want to match the fewest characters possible – enter “lazy” quantifiers

  Quantifier   Lazy Equivalent*
  ?            ??
  +            +?
  *            *?

  * Not supported by Notepad++

Replacing – Groups
• “Groups” in the regex can be used in the replacement value
• Delimited with parentheses in the regex
• Identified with \n where n is the nth group in the original expression
• \0 represents the entire match (not supported in Notepad++)

Reformatting Dates
• Change mm/dd/yyyy to yyyy-mm-dd
• Regex: (\d+)/(\d+)/(\d+)
• Replacement: \3-\1-\2

Step 2 – pad the single digits!
• Regex: -(\d)([-"])
• Replacement: -0\1\2

Reformatting Phone Numbers (v1)
• Wrap the area code in parentheses
• Regex: "(\d\d\d)-
• Replacement: "(\1) followed by a single space (the replacement ends with a space!)

Reformatting Phone Numbers (v2)
• Strip punctuation (numbers only)
• Regex: \((\d+)\) (\d+)-(\d+)
• Replacement: \1\2\3

Reformatting Social Security Numbers
• Format SSN as ###-##-####
• Do it in Aspen!
• Define a record in the Regular Expression Library table
• Set the regex on the Person ID field in the Data Dictionary

Define a Regular Expression
• Regex and format properties

Update the Data Dictionary
• Link to the regex

Verify the Results

Extras
• Wikipedia Entry
  http://en.wikipedia.org/wiki/Regular_expression
• Regular Expressions Cheat Sheet (V2)
  http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet
• Java regex support
  http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
• Notepad++ text editor and regex support
  http://notepad-plus.sourceforge.net
  http://notepad-plus.sourceforge.net/uk/regExpList.php

Thank you.
bciccolo@x2dev.com
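
Trying the examples in code
The matching and replacing patterns from the slides can also be tried outside Notepad++. What follows is a minimal sketch using Python's re module. Python is an assumption of this example (the deck itself works in Notepad++ and Aspen), and the function names and sample values are hypothetical. Python accepts the same \1, \2, \3 group references in the replacement as the slides use. One detail worth noticing: the padding regex consumes the hyphen or quote after the digit, so a single pass pads the month but not the day; the sketch simply runs that pass twice.

```python
import re

# Validation example from "Why Use a Regex?": the text must look like mm/dd/yyyy.
# This checks the shape only; it does not reject impossible dates such as 99/99/2010.
DATE_FORMAT = re.compile(r'\d\d/\d\d/\d\d\d\d')

# Reformatting Dates, step 1: mm/dd/yyyy -> yyyy-mm-dd.
DATE_PARTS = re.compile(r'(\d+)/(\d+)/(\d+)')

# Reformatting Dates, step 2: pad a single digit that follows a hyphen and is
# followed by another hyphen or a closing double quote.
PAD_DIGIT = re.compile(r'-(\d)([-"])')


def is_date_format(text):
    """Return True when the whole string has the mm/dd/yyyy shape."""
    return DATE_FORMAT.fullmatch(text) is not None


def reformat_dates(text):
    """Apply both date steps from the slides to quoted date values."""
    text = DATE_PARTS.sub(r'\3-\1-\2', text)
    # Padding the month consumes the hyphen in front of the day,
    # so a second pass is needed to pad the day as well.
    for _ in range(2):
        text = PAD_DIGIT.sub(r'-0\1\2', text)
    return text


def wrap_area_code(text):
    """Phone numbers (v1): "###-###-#### becomes "(###) ###-####."""
    # Note the trailing space at the end of the replacement.
    return re.sub(r'"(\d\d\d)-', r'"(\1) ', text)


def strip_phone_punctuation(text):
    """Phone numbers (v2): (###) ###-#### becomes ##########."""
    return re.sub(r'\((\d+)\) (\d+)-(\d+)', r'\1\2\3', text)


def quoted_values(text, lazy):
    """Greedy vs. lazy: find double-quoted values with .+ or .+?"""
    pattern = r'".+?"' if lazy else r'".+"'
    return re.findall(pattern, text)


if __name__ == '__main__':
    print(is_date_format('03/05/2010'))
    print(reformat_dates('"3/5/2010","12/25/2010"'))
    print(wrap_area_code('"617-555-1234"'))
    print(strip_phone_punctuation('(617) 555-1234'))
    print(quoted_values('"a","b"', lazy=False))
    print(quoted_values('"a","b"', lazy=True))
```

With the greedy pattern, the whole line "a","b" comes back as one match because .+ runs to the last quote; the lazy pattern stops at the first closing quote and returns each value separately.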