Uploaded by Daniel Zhu

Regex Cheat Sheet

advertisement
4/25/22, 10:21 PM
Regex Cheat Sheet
\s
Most engines: "whitespace character": space, tab, newline, carriage return, vertical tab a\sb\sc
ab
c
ab
a\sb\sc
c
\D\D\D
ABC
\W\W\W\W\W *-+=)
\S\S\S\S
Yoyo
\s
.NET, Python 3, JavaScript: "whitespace character": any Unicode separator
\D
\W
\S
One character that is not a digit as defined by your engine's \d
One character that is not a word character as defined by your engine's \w
One character that is not a whitespace character as defined by your engine's \s
(direct link)
Quantifiers
Quantifier
Legend
+
One or more
{3}
Exactly three times
{2,4}
Two to four times
{3,}
Three or more times
*
Zero or more times
?
Once or none
Example
Sample Match
Version \w-\w+ Version A-b1_1
\D{3}
ABC
\d{2,4}
156
\w{3,}
regex_tutorial
A*B*C*
AAACC
plurals?
plural
(direct link)
More Characters
Character
Legend
.
Any character except line break
.
Any character except line break
\.
A period (special character: needs to be escaped by a \)
\
Escapes a special character
\
Escapes a special character
Example
a.c
.*
a\.c
\.\*\+\? \$\^\/\\
\[\{\(\)\}\]
Sample Match
abc
whatever, man.
a.c
.*+? $^/\
[{()}]
(direct link)
Logic
Logic
|
(…)
\1
\2
(?: … )
Legend
Alternation / OR operand
Capturing group
Contents of Group 1
Contents of Group 2
Non-capturing group
Example
Sample Match
22|33
33
A(nt|pple)
Apple (captures "pple")
r(\w)g\1x
regex
(\d\d)\+(\d\d)=\2\+\1 12+65=65+12
A(?:nt|pple)
Apple
(direct link)
More White-Space
Character
Legend
\t
\r
\n
Tab
Carriage return character
Line feed character
\r\n
Line separator on Windows
\N
\h
\H
\v
Perl, PCRE (C, PHP, R…): one character that is not a line break
Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator
One character that is not a horizontal whitespace
.NET, JavaScript, Python, Ruby: vertical tab
Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed,
paragraph or line separator
Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace
Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)
\v
\V
\R
https://www.rexegg.com/regex-quickstart.html
Sample
Match
T\t\w{2} T ab
see below
see below
AB
AB\r\nCD
CD
\N+
ABC
Example
2/6
4/25/22, 10:21 PM
Regex Cheat Sheet
(direct link)
More Quantifiers
Quantifier
Legend
+
The + (one or more) is "greedy"
?
Makes quantifiers "lazy"
*
The * (zero or more) is "greedy"
?
Makes quantifiers "lazy"
{2,4}
Two to four times, "greedy"
?
Makes quantifiers "lazy"
Example Sample Match
\d+
12345
\d+?
1 in 12345
A*
AAA
A*?
empty in AAA
\w{2,4} abcd
\w{2,4}? ab in abcd
(direct link)
Character Classes
Character
Legend
[…]
One of the characters in the brackets
[…]
One of the characters in the brackets
Range indicator
[x-y]
One of the characters in the range from x to y
[…]
One of the characters in the brackets
[x-y]
One of the characters in the range from x to y
[^x]
One character that is not x
Example
[AEIOU]
T[ao]p
[a-z]
[A-Z]+
[AB1-5w-z]
[ -~]+
[^a-z]{3}
[^x-y]
One of the characters not in the range from x to y
[^ -~]+
[\d\D]
One character that is a digit or a non-digit
[\d\D]+
[\x41]
Matches the character at hexadecimal position 41 in the ASCII
table, i.e. A
[\x41-\x45]
{3}
Sample Match
One uppercase vowel
Tap or Top
One lowercase letter
GREAT
One of either: A,B,1,2,3,4,5,w,x,y,z
Characters in the printable section of the ASCII table.
A1!
Characters that are not in the printable section of the ASCII
table.
Any characters, including new lines, which the regular dot doesn't match
ABE
(direct link)
Anchors and Boundaries
Anchor
Legend
^
Start of string or start of line depending on multiline mode. (But when [^inside brackets], it means "not")
$
End of string or end of line depending on multiline mode. Many engine-dependent subtleties.
Beginning of string
\A
(all major engines except JS)
Very end of the string
\z
Not available in Python and JS
End of string or (except Python) before final line break
\Z
Not available in JS
Beginning of String or End of Previous Match
\G
.NET, Java, PCRE (C, PHP, R…), Perl, Ruby
Word boundary
\b
Most engines: position where one side only is an ASCII letter, digit or underscore
Word boundary
\b
.NET, Java, Python 3, Ruby: position where one side only is a Unicode letter, digit or underscore
\B
Not a word boundary
Example
^abc .*
.*? the end$
\Aabc[\d\D]*
Sample Match
abc (line start)
this is the end
abc (string...
...start)
the end\z
this is...\n...the end
the end\Z
this is...\n...the end\n
Bob.*\bcat\b
Bob ate the cat
Bob.*\b\кошка\b Bob ate the кошка
c.*\Bcat\B.*
copycats
(direct link)
POSIX Classes
Character
Legend
[:alpha:] PCRE (C, PHP, R…): ASCII letters A-Z and a-z
[:alpha:] Ruby 2: Unicode letter or ideogram
[:alnum:] PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z
[:alnum:] Ruby 2: Unicode digit, letter or ideogram
https://www.rexegg.com/regex-quickstart.html
Example
[8[:alpha:]]+
[[:alpha:]\d]+
[[:alnum:]]{10}
[[:alnum:]]{10}
Sample Match
WellDone88
кошка99
ABCDE12345
кошка90210
3/6
4/25/22, 10:21 PM
Regex Cheat Sheet
[:punct:]
PCRE (C, PHP, R…): ASCII punctuation mark
[[:punct:]]+
?!.,:;
[:punct:]
Ruby: Unicode punctuation mark
[[:punct:]]+
‽,:〽⁆
(direct link)
Inline Modifiers
None of these are supported in JavaScript. In Ruby, beware of (?s) and (?m).
Modifier
(?i)
(?s)
Legend
Example
Case-insensitive mode
(?i)Monday
(except JavaScript)
DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n). Also
(?s)From A.*to Z
known as "single-line mode" because the dot treats the entire input as a single line
(?m)
Multiline mode
(except Ruby and JS) ^ and $ match at the beginning and end of every line
(?m)
In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks (?m)From A.*to Z
(?x)
Free-Spacing Mode mode
(except JavaScript). Also known as comment mode or whitespace mode
(?n)
.NET, PCRE 10.30+: named capture only
(?d)
Java: Unix linebreaks only
(?^)
PCRE 10.32+: unset modifiers
Sample
Match
monDAY
From A
to Z
1
2
3
From A
to Z
(?m)1\r\n^2$\r\n^3$
(?x) # this is a
# comment
abc # write on multiple
abc d
# lines
[ ]d # spaces must be
# in brackets
Turns all (parentheses) into non-capture
groups. To capture, use named groups.
The dot and the ^ and $ anchors are only
affected by \n
Unsets ismnx modifiers
(direct link)
Lookarounds
Lookaround
Legend
(?=…)
Positive lookahead
(?<=…)
Positive lookbehind
(?!…)
Negative lookahead
(?<!…)
Negative lookbehind
Example
Sample Match
(?=\d{10})\d{5} 01234 in 0123456789
(?<=\d)cat
cat in 1cat
(?!theatre)the\w+ theme
\w{3}(?<!mon)ster Munster
(direct link)
Character Class Operations
Class
Operation
Legend
Example
[…-[…]]
.NET: character class subtraction. One character that is in those on the left,
[a-z-[aeiou]]
but not in the subtracted class.
[…-[…]]
.NET: character class subtraction.
[…&&
[…]]
[…&&
[…]]
[…&&
[^…]]
[…&&
[^…]]
Java, Ruby 2+: character class intersection. One character that is both in
those on the left and in the && class.
Java, Ruby 2+: character class intersection.
Java, Ruby 2+: character class subtraction is obtained by intersecting a
class with a negated class
Java, Ruby 2+: character class subtraction
Sample Match
Any lowercase consonant
An Arabic character that is not a non-digit,
i.e., an Arabic digit
An non-whitespace character that is a non[\S&&[\D]]
digit.
[\S&&[\D]&&[^a-zA- An non-whitespace character that a nonZ]]
digit and not a letter.
An English lowercase letter that is not a
[a-z&&[^aeiou]]
vowel.
[\p{InArabic}&&
An Arabic character that is not a letter or a
[^\p{L}\p{N}]]
number
[\p{IsArabic}-[\D]]
(direct link)
Other Syntax
https://www.rexegg.com/regex-quickstart.html
4/6
Download