Polyglot CheatSheet - RegEx
Last Updated: 2022-04-25
Syntax
|
: or
()
: group
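A minimal sketch of how alternation and grouping interact, written in JavaScript; the sample words are made up for illustration.

```javascript
// Without a group, | splits the whole pattern: "gray" OR "grey".
const whole = /gray|grey/;
// With a group, | only applies inside the parentheses:
// "gr", then "a" or "e", then "y".
const grouped = /gr(a|e)y/;

console.log(whole.test("grey"));       // true
console.log(grouped.test("gray"));     // true
console.log(grouped.exec("grey")[1]);  // "e" (the group also captures)
```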
Characters
.
: any character (in most flavors, except a line break by default)
\w
: alphanumeric character plus _, equivalent to [A-Za-z0-9_]
\W
: non-alphanumeric character excluding _, equivalent to [^A-Za-z0-9_]
\s
: whitespace
\S
: anything BUT whitespace
\d
: digit, equivalent to [0-9]
\D
: non-digit, equivalent to [^0-9]
[...]
: one of the characters listed
[^...]
: anything but the characters listed
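As a rough illustration of the character classes above, the following JavaScript sketch runs a few of them against a made-up sample string (the string and variable names are assumptions for the example only).

```javascript
const sample = "id_42: a-b c";

// \w+ matches runs of letters, digits and underscores.
const words = sample.match(/\w+/g);      // ["id_42", "a", "b", "c"]
// \d+ matches runs of digits.
const digits = sample.match(/\d+/g);     // ["42"]
// [^\w\s] matches anything that is neither a word character nor whitespace.
const punct = sample.match(/[^\w\s]/g);  // [":", "-"]

console.log(words, digits, punct);
```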
Anchors
^
: beginning of a line or string
$
: end of a line or string
\b
: zero-width word boundary (matches a position, like the caret and the dollar sign)
\A
: beginning of a string (but not of an internal line)
\z
: end of a string (but not of an internal line)
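A small JavaScript sketch of the anchors in action, using a made-up two-line string. JavaScript's RegExp does not implement \A and \z, so the sketch sticks to ^, $ and \b and uses the m flag to switch between string and line anchoring.

```javascript
const text = "cat\nconcat cat";

// Without the m flag, ^ only matches at the start of the whole string.
const startOnly = text.match(/^cat/g);      // ["cat"]
// With the m flag, $ also matches just before each line break.
const lineEnds = text.match(/cat$/gm);      // ["cat", "cat"]
// \b keeps "cat" from matching inside "concat".
const wholeWords = text.match(/\bcat\b/g);  // ["cat", "cat"]
// For comparison, the unanchored pattern finds all three occurrences.
const anywhere = text.match(/cat/g);        // ["cat", "cat", "cat"]

console.log(startOnly.length, lineEnds.length, wholeWords.length, anywhere.length);
```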
Repetition Operators
?
: match 0 or 1 times
+
: match at least once
*
: match 0 or more times
{M,N}
: match at least M and at most N times
{M,}
: match at least M times
{0,N}
: match at most N times
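The repetition operators can be tried out with a few anchored patterns; this is only a sketch in JavaScript, and the test words are invented for the example.

```javascript
const optional = /^colou?r$/;   // ?     : "u" may appear 0 or 1 times
const oneOrMore = /^go+d$/;     // +     : at least one "o"
const zeroOrMore = /^go*d$/;    // *     : any number of "o", including none
const bounded = /^\d{2,4}$/;    // {M,N} : between 2 and 4 digits

console.log(optional.test("color"), optional.test("colour"));  // true true
console.log(oneOrMore.test("gd"), oneOrMore.test("good"));     // false true
console.log(zeroOrMore.test("gd"));                            // true
console.log(bounded.test("7"), bounded.test("2022"), bounded.test("20220")); // false true false
```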
Greedy vs Lazy
.*
: match as long as possible
.*?
: match as short as possible
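To make the difference concrete, here is a short JavaScript sketch that applies a greedy and a lazy pattern to the same made-up snippet of markup.

```javascript
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs on to the last "</b>" in the string.
const greedy = html.match(/<b>.*<\/b>/)[0];  // "<b>one</b> and <b>two</b>"
// Lazy: .*? stops at the first "</b>" it can reach.
const lazy = html.match(/<b>.*?<\/b>/)[0];   // "<b>one</b>"

console.log(greedy);
console.log(lazy);
```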
BRE vs ERE vs PCRE
The main difference between basic and extended regular expressions is the behavior of a few characters: ?, +, parentheses (()), and braces ({}).
- basic regular expressions (BRE): these characters must be escaped (e.g. \( and \)) to behave as special characters; unescaped, they match literally.
- extended regular expressions (ERE): these characters are special by default and must be escaped to match a literal character.
- Perl Compatible Regular Expressions (PCRE): much more powerful and flexible than BRE and ERE.
Multiple flavors may be supported by the tools:
- sed
  sed
  : basic
  sed -E
  : extended
- grep
  grep
  : basic
  egrep or grep -E
  : extended
JavaScript
str.search
str.match
str.matchAll
str.replace
Example: split country name and country code in strings like "China (CN)"
> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'
Match all:
const regex = /.*/g; // matchAll requires the g flag
const matches = content.matchAll(regex);
for (const match of matches) {
  // match[0] is the matched string
  // match[1] is the first capture, etc
}
Literal vs. Constructor
- Literal:
re = /.../g
- Constructor:
re = new RegExp("...")
- can use string concat:
re = new RegExp("..." + some_variable + "...")
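When a variable is concatenated into the constructor, any regex-special characters in it are interpreted as pattern syntax. The sketch below shows one common way to guard against that; escapeRegExp and makeCodeMatcher are hypothetical helper names introduced for this example, not built-in functions.

```javascript
// Hypothetical helper: escape characters that are special in a regex
// so that the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a pattern that matches "(<code>)" literally.
// Backslashes are doubled because the pattern is written as a string.
function makeCodeMatcher(code) {
  return new RegExp("\\(" + escapeRegExp(code) + "\\)");
}

console.log(makeCodeMatcher("CN").test("China (CN)"));  // true
// Without escaping, "C." would mean "C plus any character" and match "(CN)".
console.log(makeCodeMatcher("C.").test("China (CN)"));  // false
```

The literal form is usually easier to read when the pattern is fixed; the constructor is mainly useful when part of the pattern is only known at runtime.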
Local vs. Global
re = /.../
: str.match(re) returns the FIRST match together with its captures.
re = /.../g
: str.match(re) returns a list of all matches but NOT their captures.
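A quick JavaScript sketch of the difference, calling str.match with the same pattern with and without the g flag; the input string is made up for the example.

```javascript
const str = "a=1, b=2";

// No g flag: the first match plus its captures.
const first = str.match(/(\w)=(\d)/);
console.log(first[0], first[1], first[2]);  // a=1 a 1

// g flag: every full match, but no captures.
const all = str.match(/(\w)=(\d)/g);
console.log(all);                           // [ 'a=1', 'b=2' ]
```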
match vs. exec
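str.match()
: as stated above.
regex.exec()
: returns one match per call, with its captures and more detailed info (such as the index); with the g flag, call exec multiple times to step through all matches.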
Example
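var match;
// re needs the g flag, otherwise exec keeps returning the first match and the loop never ends
while ((match = re.exec(str)) !== null) {
  // match[0] is the matched string, match[1] is the first capture, etc
}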
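A fuller sketch of the exec loop, collecting the captures and the index of every match. collectPairs is a hypothetical function name, and the key=digit input format is an assumption made for the example.

```javascript
// Hypothetical helper: return [key, value, index] for every "key=digit" pair.
function collectPairs(str) {
  // The g flag makes exec resume after the previous match on each call.
  const re = /(\w)=(\d)/g;
  const pairs = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    pairs.push([match[1], match[2], match.index]);
  }
  return pairs;
}

console.log(collectPairs("a=1, b=2"));  // [ [ 'a', '1', 0 ], [ 'b', '2', 5 ] ]
```

Unlike str.match with the g flag, this keeps the captures of every match, not just the full matched strings.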