10g Regular Expressions

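Syntax (shown for REGEXP_LIKE, the function used in the example at the end):
REGEXP_LIKE (source_char, pattern [, match_parameter])
source_char – any character string with a datatype of CHAR, VARCHAR2, NCHAR,
NVARCHAR2, CLOB, or NCLOB
pattern – the regular expression to match (max 512 bytes)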
Operator   Description
\          The backslash character can have four different meanings depending on the
           context. It can:
               Stand for itself
               Quote the next character
               Introduce an operator
               Do nothing
*          Matches zero or more occurrences
+          Matches one or more occurrences
?          Matches zero or one occurrence
|          Alternation operator for specifying alternative matches
^          Matches the beginning of a string
$          Matches the end of a string
.          Matches any character in the supported character set except NULL
[ ]        Bracket expression for specifying a matching list that should match any one
           of the expressions represented in the list
[^ ]       A non-matching list expression specifies a list that matches any character
           except for the expressions represented in the list
()         Grouping expression, treated as a single sub-expression
{m}        Matches exactly m times
{m,}       Matches at least m times
{m,n}      Matches at least m times but no more than n times
\n         The back-reference expression (n is a digit between 1 and 9) matches the nth
           sub-expression enclosed between '(' and ')' preceding the \n
[..]       Specifies one collation element, and can be a multi-character element (for
           example, [.ch.] in Spanish)
[: :]      Specifies character classes (for example, [:alpha:]); it matches any character
           within the character class (see table below)
[==]       Specifies equivalence classes. For example, [=a=] matches all characters
           having base letter 'a'
(Explanations taken, in part, from Oracle on-line help)
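As a rough illustration of the interval and back-reference operators listed above, the following sketch runs two patterns against three made-up strings. The sample strings and the column aliases are assumptions chosen only for this illustration; DUAL is Oracle's built-in one-row table.

```sql
-- Sketch only: the sample strings are invented for illustration.
SELECT txt,
       -- {2,3}: 'a', then two or three 'b', then 'c', and nothing else
       CASE WHEN REGEXP_LIKE(txt, '^ab{2,3}c$') THEN 'Y' ELSE 'N' END AS interval_match,
       -- (.)\1: any character immediately followed by the same character
       CASE WHEN REGEXP_LIKE(txt, '(.)\1')      THEN 'Y' ELSE 'N' END AS doubled_char
FROM  (SELECT 'abbc'   AS txt FROM DUAL UNION ALL   -- interval Y, doubled Y
       SELECT 'abc'    FROM DUAL UNION ALL          -- interval N, doubled N
       SELECT 'abbbbc' FROM DUAL);                  -- interval N, doubled Y
```

Wrapping REGEXP_LIKE in a CASE expression is just one convenient way to see the result of a condition as a column; in a real query the condition would normally sit in the WHERE clause.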
Character classes
CHARACTER CLASS SYNTAX   MEANING
[:alnum:]                All alphanumeric characters
[:alpha:]                All alphabetic characters
[:blank:]                All blank space characters
[:cntrl:]                All control characters (nonprinting)
[:digit:]                All numeric digits
[:graph:]                All [:punct:], [:upper:], [:lower:], and [:digit:] characters
[:lower:]                All lowercase alphabetic characters
[:print:]                All printable characters
[:punct:]                All punctuation characters
[:space:]                All space characters (nonprinting), such as carriage return,
                         newline, vertical tab, and form feed
[:upper:]                All uppercase alphabetic characters
[:xdigit:]               All valid hexadecimal characters
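A character class is only valid inside a bracket expression, so it is written with double brackets, for example [[:alpha:]]. The sketch below tries two class-based patterns on made-up strings; the strings and aliases are assumptions for illustration, and the 'c' parameter is passed so that the result does not depend on the session's NLS_SORT setting.

```sql
-- Sketch only: the sample strings are invented for illustration.
SELECT txt,
       -- one uppercase letter followed by one or more lowercase letters
       CASE WHEN REGEXP_LIKE(txt, '^[[:upper:]][[:lower:]]+$', 'c')
            THEN 'Y' ELSE 'N' END AS capitalized_word,
       -- hexadecimal characters only (0-9, a-f, A-F)
       CASE WHEN REGEXP_LIKE(txt, '^[[:xdigit:]]+$')
            THEN 'Y' ELSE 'N' END AS hex_only
FROM  (SELECT 'Oracle' AS txt FROM DUAL UNION ALL   -- capitalized Y, hex N
       SELECT 'CAFE01' FROM DUAL UNION ALL          -- capitalized N, hex Y
       SELECT 'a b'    FROM DUAL);                  -- capitalized N, hex N
```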
match_parameter: lets you change the default matching behavior
Value   Description
i       specifies case-insensitive matching
c       specifies case-sensitive matching
n       allows the period (.), which is the match-any-character wildcard character,
        to match the newline character – if you omit this parameter, the period does
        not match the newline character
m       treats the source string as multiple lines (Oracle interprets ^ and $ as the
        start and end, respectively, of any line anywhere in the source string, rather
        than only at the start or end of the entire source string – if you omit this
        parameter, Oracle treats the source string as a single line)
x       ignores whitespace characters (by default, whitespace characters match
        themselves)
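The effect of the optional third argument can be sketched with a single query against DUAL. The literals and column aliases are made up for illustration; CHR(10) is the newline character.

```sql
-- Sketch only: the literals are invented for illustration.
SELECT
  -- 'i': case is ignored, so 'oracle' matches 'ORACLE'
  CASE WHEN REGEXP_LIKE('ORACLE', 'oracle', 'i') THEN 'Y' ELSE 'N' END AS with_i,     -- Y
  -- 'c': case matters, so the same pattern does not match
  CASE WHEN REGEXP_LIKE('ORACLE', 'oracle', 'c') THEN 'Y' ELSE 'N' END AS with_c,     -- N
  -- 'm': ^ and $ match at line boundaries inside the string
  CASE WHEN REGEXP_LIKE('one' || CHR(10) || 'two', '^two$', 'm')
       THEN 'Y' ELSE 'N' END AS with_m,                                               -- Y
  -- without 'm': ^ and $ only match at the start and end of the whole string
  CASE WHEN REGEXP_LIKE('one' || CHR(10) || 'two', '^two$')
       THEN 'Y' ELSE 'N' END AS without_m,                                            -- N
  -- 'n': the period is allowed to match the newline character
  CASE WHEN REGEXP_LIKE('a' || CHR(10) || 'b', 'a.b', 'n')
       THEN 'Y' ELSE 'N' END AS with_n                                                -- Y
FROM DUAL;
```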
Example:
Assume you have a table with an area code / phone number combination field. Select the
records that have the exact format (123)123-4567.
-- The column alias is kept under 30 characters, the identifier length limit in 10g
SELECT  Areacode_Phone "Valid Phone Numbers"
FROM    Customer_Table
WHERE   REGEXP_LIKE (Areacode_Phone, '^\([0-9]{3}\)[[:digit:]]{3}-[0-9]{4}$');
Note:
^ anchors the pattern to the start of the string
The open and close parentheses need the quoting character (\) in front of them because they are
“special” characters (i.e. unquoted, they form a grouping expression)
[0-9] and [[:digit:]] both look for any digit between 0 and 9 (two ways of doing exactly the same
thing)
{3} looks for exactly three occurrences of the preceding expression (here, a digit)
$ anchors the pattern to the end of the string
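To see how strict the two anchors make this pattern, it can be tried on a few hand-written values without touching a real table. The values below are made up for illustration and are not taken from Customer_Table.

```sql
-- Sketch only: the phone values are invented for illustration.
SELECT phone,
       CASE WHEN REGEXP_LIKE(phone, '^\([0-9]{3}\)[[:digit:]]{3}-[0-9]{4}$')
            THEN 'valid' ELSE 'invalid' END AS status
FROM  (SELECT '(123)123-4567'  AS phone FROM DUAL UNION ALL  -- valid: exact format
       SELECT '(123) 123-4567' FROM DUAL UNION ALL           -- invalid: space after ')'
       SELECT '123-123-4567'   FROM DUAL UNION ALL           -- invalid: no parentheses
       SELECT 'x(123)123-4567' FROM DUAL);                   -- invalid: extra leading character
```

Without the ^ and $ anchors, the last value would also be accepted, because REGEXP_LIKE only requires the pattern to match somewhere in the string.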