Cheatsheet - RegEx

Syntax

|: or
(): group

Characters

.: any character (dot matches everything except newlines)
\w: alphanumeric character plus _, equivalent to [A-Za-z0-9_]
\W: non-alphanumeric character excluding _, equivalent to [^A-Za-z0-9_]
\s: whitespace
\S: anything BUT whitespace
\d: digit, equivalent to [0-9]
\D: non-digit, equivalent to [^0-9]
[...]: one of the characters
[^...]: anything but the characters listed

Anchors

^: beginning of a line or string
$: end of a line or string
\b: zero-width word-boundary (like the caret and the dollar sign)
\A: Matches the beginning of a string (but not an internal line).
\z: Matches the end of a string (but not an internal line).

Repetition Operators

?: match 0 or 1 times
+: match at least once
*: match 0 or multiple times
{M,N}: minimum M matches and maximum N matches
- {M,}: match at least M times
- {0,N}: match at most N times

Greedy vs Lazy

.*: match as long as possible
.*?: match as short as possible

BRE vs ERE vs PCRE

The only difference between basic and extended regular expressions is in the behavior of a few characters: ?, +, parentheses (()), and braces ({}).

basic regular expressions (BRE): should be escaped to behave as special characters
extended regular expressions (ERE) : should be escaped to match a literal character.
Perl Compatible Regular Expressions (PCRE): much more powerful and flexible than BRE and ERE.

Multiple flavors may be supported by the tools:

sed
- sed: basic
- sed -E: extended
grep
- grep: basic
- egrep or grep -E

JavaScript

str.search
str.match
str.matchAll
str.replace

Example: split country name and country code in strings like "China (CN)"

> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'

Match all:

const regex = /.*/g;
const matches = content.matchAll(regex);
for (let match of matches) {
  // match[0] is the matched string
  // match[1] is the first capture, etc
}

Literal vs. Constructor

Literal: re = /.../g
Constructor: re = new RegExp("...")
- can use string concat: re = new RegExp("..." + some_variable + "...")

Local vs. Global

re = /.../: re.match(str) will return a list of captures of the FIRST match.
re = /.../g: re.match(str) will return a list of matches but NOT captures.

match vs. exec

str.match(): as stated above.
regex.exec(): return captures, more detailed info; exec multiple times.

Example

var match;
while ((match = re.exec(str)) !== null) {}

Python

match, search and findall:

re.match(): only match at the beginning of the string, returns a match object.
re.search(): locate a match anywhere in string, returns a match object.
re.findall(): find all occurrences, returns a list of strings.

>>> type(re.search("foo", "foobarfoo"))
<class '_sre.SRE_Match'>
>>> type(re.match("foo", "foobarfoo"))
<class '_sre.SRE_Match'>

re.match()/re.search()

re.match() and re.search() return a match object:

>>> match = re.search("f(.*?),", "foo,faa,fuu,bar")
>>> match.groups()
('oo',)

match.group(0) returns the string snippet that matches the pattern:

>>> match.group(0)
'foo,'

other group captures the ones in ():

>>> match.group(1)
'oo'

re.findall()

re.findall() returns a list, extract value using []:

>>> match = re.findall("f(.*?),", "foo,faa,fuu,bar")
>>> match
['oo', 'aa', 'uu']
>>> match[0]
'oo'

Compiled Patterns

pattern = re.compile(pattern_string)
result = pattern.match(string)

is equivalent to

result = re.match(pattern_string, string)

re.compile() returns a SRE_Pattern object:

>>> type(re.compile("pattern"))
<class '_sre.SRE_Pattern'>