BKF03 - Regular Expressions

Brian Ciccolo
Agenda
• Definition
• Uses – within Aspen and beyond
• Matching
• Replacing
What’s a Regular Expression?
In computing, regular expressions, also referred to as regex
or regexp, provide a concise and flexible means for
matching strings of text, such as particular characters,
words, or patterns of characters. A regular expression is
written in a formal language that can be interpreted by a
regular expression processor, a program that either serves
as a parser generator or examines text and identifies parts
that match the provided specification.
http://en.wikipedia.org/wiki/Regular_expression
Why Use a Regex?
• Validate data entry
Example: Verify the format of a date field is mm/dd/yyyy
• Find/replace on steroids
Example: Reformat phone numbers to (###) ###-####
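The date-validation example can be sketched in a few lines of Python. This is only an illustration, not how Aspen does it: the names DATE_FORMAT and is_valid_date_format are hypothetical, and the pattern checks the shape of the text only, not whether the date exists on a calendar.

```python
import re

# Assumed pattern: two digits, slash, two digits, slash, four digits.
DATE_FORMAT = re.compile(r"\d\d/\d\d/\d\d\d\d")


def is_valid_date_format(text):
    """Return True when the whole string has the shape mm/dd/yyyy."""
    return DATE_FORMAT.fullmatch(text) is not None


print(is_valid_date_format("03/05/2010"))  # True
print(is_valid_date_format("3/5/2010"))    # False: single-digit month and day
```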
Regex Use in Aspen
• Data validation
o Date, time field input
o Validation rules (new in 3.0 – see session TEC07)
• Find/replace on steroids
o System Log filter
o Field formatting
RegEx Examples Using Notepad++
Select this option for our examples
Select the proper Search Mode (Regular expression)
Matching – The Basics
• Literals - plain old text
• Classes
Example        Definition
[abc]          a, b, or c
[a-z]          Any lowercase letter
[a-zA-Z]       Any lowercase or uppercase letter
[0-9]          Any digit, 0 through 9
[^a-zA-Z]      Not a letter (could be a digit or punctuation)
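To see the classes in action outside Notepad++, here is a small Python sketch. The sample string is made up for illustration; re.findall returns every non-overlapping match, so each class yields one character per hit.

```python
import re

sample = "Room 12b, Wing C!"  # made-up text for illustration

one_of_abc = re.findall(r"[abc]", sample)       # a, b, or c
lowercase = re.findall(r"[a-z]", sample)        # any lowercase letter
letters = re.findall(r"[a-zA-Z]", sample)       # any letter
digits = re.findall(r"[0-9]", sample)           # any digit
non_letters = re.findall(r"[^a-zA-Z]", sample)  # anything that is not a letter

print(one_of_abc)   # ['b']
print(digits)       # ['1', '2']
print(non_letters)  # [' ', '1', '2', ',', ' ', ' ', '!']
```

Note that the negated class also picks up the spaces, not just digits and punctuation.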
Matching – Predefined Classes
Predefined Class    Definition
.                   Any character (in most engines, any character except a newline)
\d                  Any digit: [0-9]
\D                  Any non-digit: [^0-9]
\s                  A whitespace character (space, tab, newline)
\S                  A non-whitespace character: [^\s]
\w                  A word character: [a-zA-Z_0-9]
\W                  A non-word character (e.g., punctuation or whitespace): [^\w]
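The predefined classes behave the same way in Python's re module, which makes for a quick experiment. The sample text below is invented. One detail worth seeing for yourself: \W matches whitespace as well as punctuation, and by default the dot does not match a newline.

```python
import re

sample = "ID_7 due\t3pm"  # made-up text containing a space and a tab

digits = re.findall(r"\d", sample)        # single digits
whitespace = re.findall(r"\s", sample)    # the space and the tab
words = re.findall(r"\w+", sample)        # runs of word characters
non_word = re.findall(r"\W", sample)      # here: only the whitespace
any_char = re.findall(r".", "a\nb")       # the newline is skipped

print(digits)    # ['7', '3']
print(words)     # ['ID_7', 'due', '3pm']
print(non_word)  # [' ', '\t']
print(any_char)  # ['a', 'b']
```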
Matching – Quantifiers
Quantifier    Definition
?             Matches 0 or 1 time (Not supported by Notepad++)
+             Matches 1 or more times
*             Matches 0 or more times
{n,m}         Matches at least n times but no more than m times (Not supported by Notepad++)
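All four quantifiers can be tried in Python, whose re module supports each of them. The strings are invented for illustration; the \b in the last pattern is a word boundary, added here so that a longer run of digits is not matched in part.

```python
import re

# ? : the "u" may appear 0 or 1 time
optional_u = [w for w in ["color", "colour", "colouur"]
              if re.fullmatch(r"colou?r", w)]

# + : one or more digits in a row
digit_runs = re.findall(r"\d+", "a1b22c333")

# * : an "a" followed by zero or more "b"s
a_then_bs = re.findall(r"ab*", "a ab abbb")

# {n,m} : at least 2 but no more than 3 digits
two_or_three = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")

print(optional_u)    # ['color', 'colour']
print(digit_runs)    # ['1', '22', '333']
print(a_then_bs)     # ['a', 'ab', 'abbb']
print(two_or_three)  # ['22', '333']
```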
Matching – Greedy vs. Lazy
• Quantifiers are “greedy” by default –
they match as many characters as possible
• Sometimes you want to match the fewest
characters possible – enter “lazy” quantifiers
Quantifier    Lazy Equivalent*
?             ??
+             +?
*             *?
* Not supported by Notepad++
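The difference is easiest to see side by side. In this Python sketch (the markup string is just sample text), the greedy pattern runs from the first "<" to the last ">", while the lazy one stops at the first ">" it can.

```python
import re

markup = "<b>bold</b> and <i>italic</i>"  # sample text for illustration

greedy = re.findall(r"<.+>", markup)   # + grabs as much as possible
lazy = re.findall(r"<.+?>", markup)    # +? grabs as little as possible

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```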
Replacing – Groups
• “Groups” in the regex can be used in the
replacement value
• Delimited with parentheses in the regex
• Identified with \n where n is the nth
group in the original expression
• \0 represents the entire match
(not supported in Notepad++)
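Here is a minimal Python sketch of groups in a replacement. The name-swapping example is made up for illustration. Python uses the same \1, \2 notation for groups, but spells the entire match as \g<0> rather than \0.

```python
import re

# Two groups: last name, then first name
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Ciccolo, Brian")

# \g<0> is Python's way of writing "the entire match"
bracketed = re.sub(r"\w+", r"[\g<0>]", "ab cd")

print(swapped)    # Brian Ciccolo
print(bracketed)  # [ab] [cd]
```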
Reformatting Dates
• Change mm/dd/yyyy to yyyy-mm-dd
• Regex: (\d+)/(\d+)/(\d+)
• Replacement: \3-\1-\2
Step 2 – pad the single digits!
• Regex: -(\d)([-"])
• Replacement: -0\1\2
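Both steps can be sketched in Python, which accepts the same regexes and replacements. The function name reformat_date is hypothetical, and the sample assumes the dates sit inside double quotes, as the step 2 regex implies. One thing to watch: matches cannot overlap, so a single pass of step 2 pads the month but not the day when both are single digits (the dash between them is used up by the first match). The sketch simply repeats step 2 until nothing changes.

```python
import re


def reformat_date(text):
    """Convert mm/dd/yyyy to yyyy-mm-dd, padding single digits."""
    # Step 1: reorder the three groups
    text = re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\1-\2", text)
    # Step 2: pad single digits; repeat because matches cannot overlap
    while True:
        padded = re.sub(r'-(\d)([-"])', r"-0\1\2", text)
        if padded == text:
            return text
        text = padded


print(reformat_date('"3/5/2010"'))    # "2010-03-05"
print(reformat_date('"12/25/2009"'))  # "2009-12-25"
```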
Reformatting Phone Numbers (v1)
• Wrap the area code in parentheses
• Regex: "(\d\d\d)-
• Replacement: "(\1)
Note: the replacement ends with a space, so the area code is separated from the rest of the number.
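The same replacement in Python looks like this. The phone number is made up, and the sketch assumes each number starts right after a double quote, as the regex requires. The trailing space in the replacement is easy to miss, so it is called out in a comment.

```python
import re

line = '"617-555-1234"'  # made-up number inside double quotes

# The replacement is: quote, (area code), then ONE trailing space
wrapped = re.sub(r'"(\d\d\d)-', r'"(\1) ', line)

print(wrapped)  # "(617) 555-1234"
```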
Reformatting Phone Numbers (v2)
• Strip punctuation (numbers only)
• Regex: \((\d+)\) (\d+)-(\d+)
• Replacement: \1\2\3
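A quick Python check of the v2 regex, using an invented number. The parentheses around the area code are escaped with backslashes so they are matched literally, while the unescaped parentheses define the three groups.

```python
import re

formatted = "(617) 555-1234"  # made-up number in the v1 output format

# \( and \) are literal parentheses; the unescaped ones are groups
digits_only = re.sub(r"\((\d+)\) (\d+)-(\d+)", r"\1\2\3", formatted)

print(digits_only)  # 6175551234
```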
Reformatting Social Security Numbers
• Format SSN as ###-##-####
• Do it in Aspen!
• Define a record in the Regular Expression
Library table
• Set the regex on the Person ID field in the
Data Dictionary
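Outside Aspen, the same formatting idea can be sketched in Python. This is an assumption-laden illustration, not the contents of the Regular Expression Library record: the function name format_ssn, the regex, and the replacement are all hypothetical, and the input is assumed to be nine unformatted digits.

```python
import re


def format_ssn(value):
    """Format nine digits as ###-##-####; leave anything else unchanged."""
    # Hypothetical regex: three groups of 3, 2, and 4 digits
    return re.sub(r"^(\d{3})(\d{2})(\d{4})$", r"\1-\2-\3", value)


print(format_ssn("123456789"))  # 123-45-6789
print(format_ssn("12345"))      # 12345 (no match, so unchanged)
```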
Define a Regular Expression
Regex and format properties
Update the Data Dictionary
Link to the regex
Verify the Results
Extras
• Wikipedia Entry
http://en.wikipedia.org/wiki/Regular_expression
• Regular Expressions Cheat Sheet (V2)
http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet
• Java regex support
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
• Notepad++ text editor and regex support
http://notepad-plus.sourceforge.net
http://notepad-plus.sourceforge.net/uk/regExpList.php
Thank you.
bciccolo@x2dev.com