
Polyglot CheatSheet - RegEx

Last Updated: 2022-04-25

Syntax

  • |: alternation (match either the left or the right side)
  • (...): group; also captures the text it matched
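A small JavaScript sketch (JavaScript being the language used later in this cheat sheet) of how a group limits the reach of `|` and captures its match. The patterns and strings are illustrative only.

```javascript
// Alternation inside a group: the parentheses limit how far | reaches
const colorRe = /gr(a|e)y/;
console.log(colorRe.test("gray")); // true
console.log(colorRe.test("grey")); // true
console.log(colorRe.test("groy")); // false

// The same group also captures the text it matched
const greyMatch = "grey".match(colorRe);
console.log(greyMatch[1]); // e

// Without a group, | splits the whole pattern: /^cat|dog$/ means "^cat" OR "dog$"
console.log(/^cat|dog$/.test("catfish"));   // true, because ^cat matches
console.log(/^(cat|dog)$/.test("catfish")); // false
```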

Characters

  • .: any character except a line break
  • \w: alphanumeric character plus _, equivalent to [A-Za-z0-9_]
  • \W: any character that is not a letter, digit, or _, equivalent to [^A-Za-z0-9_]
  • \s: whitespace
  • \S: anything BUT whitespace
  • \d: digit, equivalent to [0-9]
  • \D: non-digit, equivalent to [^0-9]
  • [...]: any one of the listed characters
  • [^...]: anything but the characters listed
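The character classes above, sketched in JavaScript on a made-up string. The last two lines show the `s` (dotAll) flag, which is how JavaScript lets `.` match a line break too.

```javascript
const order = "Order #42 shipped_on 2022-04-25";

// \d+ : runs of digits
console.log(order.match(/\d+/g));      // ['42', '2022', '04', '25']

// \w+ : runs of letters, digits and _
console.log(order.match(/\w+/g));      // ['Order', '42', 'shipped_on', '2022', '04', '25']

// [^...] : anything except the listed characters (here: not a digit and not whitespace)
console.log(order.match(/[^\d\s]+/g)); // ['Order', '#', 'shipped_on', '-', '-']

// . skips line breaks unless the s flag is set
console.log(/a.b/.test("a\nb"));       // false
console.log(/a.b/s.test("a\nb"));      // true
```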

Anchors

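  • ^: beginning of the string, or of every line in multiline mode
  • $: end of the string, or of every line in multiline mode
  • \b: word boundary; zero-width like ^ and $ (it matches a position between a word and a non-word character, not a character itself)
  • \A: beginning of the string only, never of an internal line (not supported in JavaScript)
  • \z: end of the string only, never of an internal line (not supported in JavaScript)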
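A JavaScript sketch of the anchors. JavaScript has no \A or \z; there, ^ and $ without the m flag only match at the very start and end of the string, and the m flag turns them into per-line anchors. The sample text is made up.

```javascript
const twoLines = "first line\nsecond line";

// Without m: ^ only matches at the start of the whole string
console.log(twoLines.match(/^\w+/g));  // ['first']

// With m: ^ and $ also match at the start and end of every line
console.log(twoLines.match(/^\w+/gm)); // ['first', 'second']
console.log(twoLines.match(/\w+$/gm)); // ['line', 'line']

// \b is zero-width: "cat" inside "concatenate" has word characters on both sides
console.log(/\bcat\b/.test("concatenate")); // false
console.log(/\bcat\b/.test("the cat sat")); // true
```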

Repetition Operators

  • ?: match 0 or 1 times
  • +: match at least once
  • *: match 0 or more times
  • {M,N}: match at least M and at most N times
    • {M,}: match at least M times
    • {0,N}: match at most N times
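Each quantifier applies to the single item right before it. A JavaScript sketch; `firstMatch` is a hypothetical helper defined only for this demo.

```javascript
// Hypothetical demo helper: returns the first match, or null if nothing matches
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(firstMatch(/colou?r/, "color"));  // color   (u? : zero or one "u")
console.log(firstMatch(/go+gle/, "gooogle")); // gooogle (o+ : one or more "o")
console.log(firstMatch(/ab*/, "a"));          // a       (b* : zero or more "b")
console.log(firstMatch(/\d{2,3}/, "12345"));  // 123     ({2,3} : 2 to 3 digits, as many as allowed)
console.log(firstMatch(/\d{2,}/, "12345"));   // 12345   ({2,} : at least 2 digits)
console.log(/^\d{0,2}$/.test("123"));         // false   ({0,2} : at most 2 digits)
```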

Greedy vs Lazy

  • .* (greedy): match as much as possible
  • .*? (lazy): match as little as possible
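The difference shows up clearly on text with several closing delimiters. A JavaScript sketch on a made-up HTML string; the same trailing `?` also makes +, ? and {M,N} lazy (+?, ??, {M,N}?).

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs on and then backs off only as far as the LAST ">"
console.log(html.match(/<.*>/)[0]);  // <b>bold</b> and <i>italic</i>

// Lazy: .*? stops at the FIRST ">" that lets the match succeed
console.log(html.match(/<.*?>/)[0]); // <b>

// Combined with g, the lazy pattern finds every tag separately
console.log(html.match(/<.*?>/g));   // ['<b>', '</b>', '<i>', '</i>']
```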

BRE vs ERE vs PCRE

In GNU grep and sed, the only difference between basic and extended regular expressions is the behavior of a few characters: ?, +, |, parentheses (()), and braces ({}).

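  • basic regular expressions (BRE): these characters must be escaped (\(\), \{\}, and the GNU extensions \?, \+, \|) to act as special characters; unescaped, they match literally.
  • extended regular expressions (ERE): these characters are special by default and must be escaped to match literally.
  • Perl Compatible Regular Expressions (PCRE): adds features BRE and ERE lack, such as lazy quantifiers (*?), lookahead and lookbehind, non-capturing groups ((?:...)), and \d.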

Many tools support more than one flavor:

  • sed
    • sed: basic
    • sed -E: extended
  • grep
    • grep: basic
    • egrep or grep -E: extended
    • grep -P: Perl-compatible (GNU grep)

JavaScript

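  • str.search(re): index of the first match, or -1
  • str.match(re): the first match with its captures, or all matches with the g flag
  • str.matchAll(re): iterator over all matches, each with its captures (re must have the g flag)
  • str.replace(re, replacement): replaces the first match, or every match with the g flag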
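`search` and `replace` are not covered by the examples that follow, so here is a short JavaScript sketch of both, reusing the "China (CN)" style of input. `$1`/`$2` in the replacement string and the replacer function are standard `String.prototype.replace` features.

```javascript
const pairsText = "China (CN), Japan (JP)";

// search(): index of the first match, or -1 if there is none
console.log(pairsText.search(/\(/)); // 6
console.log(pairsText.search(/\d/)); // -1

// replace(): without g only the first match is replaced; $1, $2 insert the captures
console.log(pairsText.replace(/(\w+) \((\w+)\)/, "$2=$1"));  // CN=China, Japan (JP)
console.log(pairsText.replace(/(\w+) \((\w+)\)/g, "$2=$1")); // CN=China, JP=Japan

// The replacement can also be a function called with the whole match and its captures
console.log(pairsText.replace(/\((\w+)\)/g, (_whole, code) => `[${code.toLowerCase()}]`));
// China [cn], Japan [jp]
```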

Example: split country name and country code in strings like "China (CN)"

> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'

Match all:

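const content = "China (CN), Japan (JP)";
const regex = /(\w+) \((\w+)\)/g;  // matchAll() throws a TypeError if the g flag is missing
const matches = content.matchAll(regex);
for (const match of matches) {
  // match[0] is the matched string
  // match[1] is the first capture, match[2] the second, etc.
}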

Literal vs. Constructor

  • Literal: re = /.../g
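  • Constructor: re = new RegExp("...", "g"); inside the string, backslashes must be doubled ("\\d" for \d)
    • useful when part of the pattern comes from a variable: re = new RegExp("..." + some_variable + "...")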
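A JavaScript sketch of both forms. When a variable is concatenated into a pattern, any regex metacharacters in it ($, ., etc.) become special; `escapeRegExp` below is a hypothetical helper (not a built-in) that escapes them so the variable is matched literally.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in text
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The same pattern written both ways: in a string, each backslash is doubled
const literalRe = /\d+\.\d+/g;
const stringRe = new RegExp("\\d+\\.\\d+", "g");
console.log(literalRe.source === stringRe.source); // true
console.log("v1.2 and v3.45".match(stringRe));      // ['1.2', '3.45']

// Building a pattern at runtime: unescaped, "$" would be an anchor and "." any character
const price = "$9.99";
const priceRe = new RegExp("Total: " + escapeRegExp(price));
console.log(priceRe.test("Total: $9.99")); // true
console.log(priceRe.test("Total: $9X99")); // false
```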

Local vs. Global

  • re = /.../: str.match(re) returns the FIRST match followed by its captures, or null if nothing matches.
  • re = /.../g: str.match(re) returns a list of all matches but NOT their captures, or null if nothing matches.
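The same pattern with and without the g flag, sketched in JavaScript on a made-up string:

```javascript
const countries = "China (CN), Japan (JP)";

// Without g: the FIRST match, then its captures (plus index and input properties)
const firstResult = countries.match(/(\w+) \((\w+)\)/);
console.log(firstResult[0], firstResult[1], firstResult[2], firstResult.index); // China (CN) China CN 0

// With g: every full match, but the captures are dropped
console.log(countries.match(/(\w+) \((\w+)\)/g)); // ['China (CN)', 'Japan (JP)']

// In both modes, no match at all gives null, not an empty array
console.log(countries.match(/\d+/g)); // null
```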

match vs. exec

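  • str.match(re): as stated above.
  • re.exec(str): returns one match with its captures and extra info (index, input); with the g flag, each call continues from re.lastIndex, so calling it repeatedly walks through every match.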

Example

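let match;  // re must have the g flag; without it, exec() always restarts at 0 and this loop never ends
while ((match = re.exec(str)) !== null) {
  // match[0] is the full match, match[1] the first capture; match.index is where it starts
}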
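A self-contained version of the exec() loop, assuming the same "Name (CODE)" input as the earlier example. Unlike a global str.match(), it keeps the captures of every match; str.matchAll() gives the same result in a for...of loop.

```javascript
const pairRe = /(\w+) \((\w+)\)/g; // g lets each exec() call resume at pairRe.lastIndex
const input = "China (CN), Japan (JP)";

const pairs = [];
let m;
while ((m = pairRe.exec(input)) !== null) {
  // m[0] is the full match, m[1] and m[2] the captures, m.index the start position
  pairs.push({ name: m[1], code: m[2], at: m.index });
}
console.log(pairs.map((p) => `${p.code}:${p.name}@${p.at}`).join(", ")); // CN:China@0, JP:Japan@12
console.log(pairRe.lastIndex); // 0, reset once exec() returns null
```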