Polyglot CheatSheet - RegEx
Updated: 2020-06-29
. : any character
^ : beginning of a line or string
$ : end of a line or string
| : or
? : match 0 or 1 times
+ : match at least once
* : match 0 or multiple times
() : group
[...] : one of the characters
[^...] : anything but the characters listed
\w : alphanumeric character plus _, equivalent to [A-Za-z0-9_]
\W : non-alphanumeric character excluding _, equivalent to [^A-Za-z0-9_]
\s : whitespace
\S : anything BUT whitespace
\d : digit, equivalent to [0-9]
\D : non-digit, equivalent to [^0-9]
\A : matches the beginning of a string (but not an internal line)
\z : matches the end of a string (but not an internal line)
\b : word boundary
{M,N} : minimum M matches and maximum N matches
{M,} : match at least M times
{0,N} : match at most N times
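A few of the tokens above combined, as a minimal sketch in JavaScript. The sample string and variable names are made up for illustration only.

```javascript
// Hypothetical sample string, chosen only for illustration.
const sample = "order 42 shipped 2020-06-29";

// \d{4}-\d{2}-\d{2} : exactly 4, 2 and 2 digits separated by hyphens
const dateMatch = sample.match(/\d{4}-\d{2}-\d{2}/);

// \b\d{1,3}\b : a standalone run of 1 to 3 digits (word boundary on both sides)
const shortNumber = sample.match(/\b\d{1,3}\b/);

// ^[a-z]+ : one or more lowercase letters at the start of the string
const firstWord = sample.match(/^[a-z]+/);

console.log(dateMatch[0]);   // 2020-06-29
console.log(shortNumber[0]); // 42
console.log(firstWord[0]);   // order
```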
Greedy vs Lazy
.* : match as long as possible
.*? : match as short as possible
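A small sketch of the difference, in JavaScript. The input string is a made-up example.

```javascript
// Hypothetical input with two tagged spans.
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the LAST </b> it can still match.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? stops at the FIRST </b>.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // <b>one</b> and <b>two</b>
console.log(lazy);   // <b>one</b>
```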
Basic vs Extended Regex
The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, braces (‘{}’), and ‘|’.
- basic regular expressions: these characters must be escaped (e.g. \+) to behave as special characters
- extended regular expressions: these characters are special by default and must be escaped to match a literal character
- sed
  sed : basic
  sed -r : extended
- grep
  grep : basic
  egrep or grep -E : extended
Regular expressions: PCRE vs. ERE/BRE
# Conciseness matters:
\d === [0-9]
\w === word-constituent
\W === non-word-constituent
\b === zero-width word-boundary (like \< and \>)
\s === whitespace
\S === non-whitespace
Get to know regular expressions well (esp. EREs and PCREs).
Which of these do you prefer?
- /'.?'/
- /'''/
(e.g., write a C-comment-remover: easy with PCRE's .*?)
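A rough sketch of the comment-remover idea, written in JavaScript (not PCRE itself, but its regex engine supports the same lazy .*? quantifier). The function name stripBlockComments is hypothetical. This naive version only handles /* ... */ comments and does not account for comment markers inside string literals.

```javascript
// Naive sketch: remove /* ... */ block comments from C-like source text.
// [\s\S] is used instead of . so that the match can span newlines.
function stripBlockComments(source) {
  return source.replace(/\/\*[\s\S]*?\*\//g, "");
}

// Hypothetical input with two comments, one spanning a line break.
const code = "int a; /* first */ int b; /* second\n line */ int c;";

// Lazy: each comment is removed on its own, "int b;" survives.
const stripped = stripBlockComments(code);

// Greedy, for contrast: everything from the first /* to the last */ is removed.
const overStripped = code.replace(/\/\*[\s\S]*\*\//g, "");

console.log(stripped);
console.log(overStripped);
```

With the greedy variant, the declaration of b between the two comments is swallowed too, which is why the lazy form is the convenient one here.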
Javascript
Literal vs. Constructor
- Literal: re = /.../g
- Constructor: re = new RegExp("...")
  - can use string concat: re = new RegExp("..." + some_variable + "...")
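When a variable is concatenated into the constructor, any regex metacharacters it contains are interpreted as part of the pattern. A minimal sketch of one way to guard against that follows; escapeRegExp and countOccurrences are hypothetical helper names, not built-ins. Note also that backslashes must be doubled inside a string passed to the constructor (e.g. "\\d" for \d).

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count literal occurrences of `literal` in `text`.
function countOccurrences(text, literal) {
  const re = new RegExp(escapeRegExp(literal), "g");
  const found = text.match(re); // null when nothing matches
  return found === null ? 0 : found.length;
}

console.log(countOccurrences("1+1=2, 2+1=3", "+1")); // 2
console.log(countOccurrences("a.b.c", "."));         // 2
```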
Local vs. Global
re = /.../ : str.match(re) will return the FIRST match followed by its captures
re = /.../g : str.match(re) will return a list of all matches but NOT captures
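A short sketch of both behaviours on a made-up input string:

```javascript
// Hypothetical input containing two parenthesised codes.
const text = "China (CN), Japan (JP)";

// Without g: full first match, then its capture groups, plus index/input.
const first = text.match(/\((\w+)\)/);

// With g: every full match, no capture groups.
const all = text.match(/\((\w+)\)/g);

console.log(first[0], first[1], first.index); // (CN) CN 6
console.log(all);                             // [ '(CN)', '(JP)' ]
```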
match vs. exec
match : as stated above
exec : returns one match with its captures per call; call exec repeatedly (on a regex with the g flag) to walk through all matches
Example
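var match;
// re must carry the g flag; otherwise lastIndex never advances and the loop never ends
while ((match = re.exec(str)) !== null) { console.log(match[0], match.index); }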
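A self-contained version of the loop, as a sketch. The pattern and input are made-up examples.

```javascript
// The g flag makes exec resume from re.lastIndex on every call.
const re = /\((\w+)\)/g;
const str = "China (CN), Japan (JP)"; // hypothetical input

const codes = [];
let match;
while ((match = re.exec(str)) !== null) {
  // match[0] is the full match, match[1] the first capture group
  codes.push(match[1]);
}

console.log(codes); // [ 'CN', 'JP' ]
```

Unlike str.match with the g flag, this gives access to the captures of every match.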
Javascript
- str.search
- str.match
- str.replace
Example: split country name and country code in strings like "China (CN)"
> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'
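The same split wrapped in a function, plus the other two string methods listed above, as a sketch. splitCountry is a hypothetical name, and the pattern assumes the code sits in the last pair of parentheses at the end of the string.

```javascript
// Hypothetical helper: "China (CN)" -> { name: "China", code: "CN" }.
// Returns null when the string does not end in a parenthesised code.
function splitCountry(label) {
  const m = label.match(/^(.*?)\s*\(([^)]*)\)\s*$/);
  if (m === null) {
    return null;
  }
  return { name: m[1].trim(), code: m[2] };
}

const parsed = splitCountry("China (CN)");
console.log(parsed.name, parsed.code); // China CN

// str.replace: drop the parenthesised part and keep only the name.
const nameOnly = "China (CN)".replace(/\s*\(.*?\)/, "");
console.log(nameOnly); // China

// str.search: index of the first match, or -1 if there is none.
const parenIndex = "China (CN)".search(/\(/);
console.log(parenIndex); // 6
```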