CS 211 Regular Expressions 2-1

Today’s Lecture
• Review Chapter 4
• Go over exercises

Processing Input
• If we know how to read in a line of input, what else might we want to do with it?
  • Analyze it in some way, based on some pattern
  • Extract certain values out of it, based on some pattern
• We can create regular expressions to identify patterns, and then use them to extract the relevant information from the input.
• A regular expression represents a pattern
  • It can be used to "match" a particular string
    → With Scanner’s findInLine() method
  • Java represents a regular expression with a String literal

Regular Expressions: see Appendix H in the text.

Special Symbols: Repetition
symbol    meaning
.         any single character
*         zero or more of the previous thing
+         one or more of the previous thing
?         zero or one of the previous thing
(any non-special char matches itself)

grouping pattern    meaning
(pattern)           parentheses group things
a|b                 matches pattern a or pattern b, exactly

Special Symbols: "character classes"
pattern    meaning
[chars]    any single char between the []'s
[a-z]      any single char from a to z
Many more character classes can be found at:
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

Special Symbols: Pre-defined groups
pattern    equivalent       meaning
\d         [0-9]            any single digit char
\D         [^0-9]           any single non-digit char
\s         [ \t\n\f\r]      any whitespace char *
\S         [^ \t\n\f\r]     any non-whitespace char *
\w         [a-zA-Z0-9_]     any identifier char (any "word" char)
\W         [^a-zA-Z0-9_]    any non-identifier char
* note: there is a space char in this. Other whitespace chars are included too, but their Unicode representations were omitted here.

Special Symbols: everything else
pattern    meaning
\★         represents ★ instead of its special meaning †
(any non-special char matches itself)
The backslash is used to escape any special character, so that we can match the character itself.
a*         matches zero or more a's
a\*        matches an a followed by a star
\b         "matches" the gap between characters, instead of a particular character.
           \bhe\b would match within "if he is"
           → but wouldn't match within "if she is" or "anthem".
† here, ★ could be [, ], *, +, ?, {, }, and so on. It's a placeholder for the special symbols; ★ itself would not show up in a regular expression.

Representing Regular Expressions in Java
• We use a String literal to represent a regular expression in Java.
  • This means that " must be escaped: \"
  • This also means that \ must be escaped: \\ (represents a single \)
• Suggested conversion: write the regular expression on paper, then carefully represent each character correctly inside the String, one at a time:

regular expression    Java String representation    an example matching String (without the surrounding quotes)
\(\d\)                "\\(\\d\\)"                   (4)
I "hate" airquotes    "I \"hate\" airquotes"        I "hate" airquotes
\\d means digits      "\\\\d means digits"          \d means digits
abc\n123              "abc\\n123"                   abc, then an actual newline, then 123

Let’s go over the exercises

Questions?
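For reference while working the exercises, here is a minimal sketch that exercises the symbols from these slides in Java. The class name RegexDemo and the helpers firstMatch and containsMatch are hypothetical, made up for illustration; Scanner.findInLine(), Pattern.compile(), Matcher.find() and String.matches() are standard Java library methods. Note the difference between them: findInLine() and find() look for the pattern anywhere in the text, while String.matches() succeeds only if the whole string matches.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class RegexDemo {
    // Returns the first part of line that matches regex, or null if none does.
    // Scanner.findInLine searches the rest of the current line, ignoring delimiters.
    public static String firstMatch(String line, String regex) {
        Scanner sc = new Scanner(line);
        String found = sc.findInLine(regex);
        sc.close();
        return found;
    }

    // True if regex occurs anywhere in text. String.matches, by contrast,
    // requires the WHOLE string to match.
    public static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Repetition, character classes, grouping, and pre-defined groups
        System.out.println(firstMatch("Room 211, floor 2", "\\d+"));   // 211
        System.out.println(firstMatch("CS 211 rocks", "[a-z]+"));      // rocks
        System.out.println(firstMatch("I like cats", "(cat|dog)s?"));  // cats
        System.out.println(firstMatch("total: 42", "\\w+:\\s\\d+"));  // total: 42
        System.out.println(firstMatch("no digits here", "\\d"));       // null

        // Escaping a special symbol: a* versus a\*
        System.out.println("aaa".matches("a*"));    // true
        System.out.println("a*".matches("a\\*"));   // true
        System.out.println("aaa".matches("a\\*"));  // false

        // \b matches the gap at a word boundary, not a character
        String he = "\\bhe\\b";
        System.out.println(containsMatch(he, "if he is"));   // true
        System.out.println(containsMatch(he, "if she is"));  // false
        System.out.println(containsMatch(he, "anthem"));     // false

        // Rows of the conversion table: {Java String literal for the regex, text}
        String[][] rows = {
            {"\\(\\d\\)", "(4)"},
            {"I \"hate\" airquotes", "I \"hate\" airquotes"},
            {"\\\\d means digits", "\\d means digits"},
            {"abc\\n123", "abc\n123"}  // the regex \n matches a real newline char
        };
        for (String[] row : rows) {
            System.out.println(row[1].matches(row[0]));  // true for every row
        }
    }
}
```

Printing row[0] instead of the boolean shows what the regex engine actually receives: each \\ in the source becomes a single \ in the String.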