Polyglot CheatSheet - RegEx

Updated: 2018-06-18
  • .: any character
  • ^: beginning of a line or string
  • $: end of a line or string
  • |: or
  • ?: match 0 or 1 times
  • +: match at least once
  • *: match 0 or more times
  • (): group
  • [...]: one of the characters
  • [^...]: anything but the characters listed
  • \w: alphanumeric character plus _, equivalent to [A-Za-z0-9_]
  • \W: non-alphanumeric character excluding _, equivalent to [^A-Za-z0-9_]
  • \s: whitespace
  • \S: anything BUT whitespace
  • \d: digit, equivalent to [0-9]
  • \D: non-digit, equivalent to [^0-9]
  • \A: Matches the beginning of a string (but not an internal line).
  • \z: Matches the end of a string (but not an internal line).
  • \b: word boundary
  • {M,N}: minimum M matches and maximum N matches

    • {M,}: match at least M times
    • {0,N}: match at most N times
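
A minimal sketch of a few of the tokens above, written in JavaScript (the language used later in this sheet). The sample strings are made up for illustration. Note that JavaScript has no \A or \z; without the m flag, ^ and $ anchor to the whole string instead.

```javascript
// Each entry tests one token from the list; inputs are illustrative only.
const checks = {
  anyChar: /^a.c$/.test("abc"),            // . matches any single character
  optional: /^colou?r$/.test("color"),     // ? makes the "u" optional
  oneOrMore: /^go+gle$/.test("gooogle"),   // + needs at least one "o"
  zeroOrMore: /^go*gle$/.test("ggle"),     // * also accepts zero "o"s
  alternation: /^(cat|dog)$/.test("dog"),  // | picks one alternative
  negatedClass: /^[^0-9]+$/.test("abc"),   // [^...] excludes the listed characters
  counted: /^\d{2,4}$/.test("12345"),      // {2,4} rejects five digits
  wordBoundary: /\bcat\b/.test("concat"),  // "cat" is not a whole word here
};

console.log(checks);
```

The last two entries evaluate to false on purpose: they show a quantifier bound and a word boundary rejecting an input.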

Greedy vs Lazy

  • .*: match as long as possible
  • .*?: match as short as possible
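
A small sketch of the difference, in JavaScript; the sample string is an assumption chosen so the two patterns disagree.

```javascript
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the LAST </b> it can reach.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? stops at the FIRST </b> that lets the pattern succeed.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // "<b>one</b> and <b>two</b>"
console.log(lazy);   // "<b>one</b>"
```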

Basic vs Extended Regex

In GNU grep and sed, basic and extended regular expressions differ only in the behavior of a few characters: ‘?’, ‘+’, ‘|’, parentheses, and braces (‘{}’).

  • basic regular expressions: these characters must be escaped (e.g. \+ or \( \)) to behave as special characters; unescaped, they match literally

  • extended regular expressions: these characters are special as written and must be escaped to match a literal character.

  • sed

    • sed: basic
    • sed -E (or sed -r in GNU sed): extended
  • grep

    • grep: basic
    • egrep or grep -E: extended

PCRE vs. ERE/BRE

Conciseness matters:

\d === [0-9]
\w === word-constituent
\W === non-word-constituent
\b === zero-width word-boundary (like \< and \>)
\s === whitespace
\S === non-whitespace

Get to know regular expressions well, especially EREs and PCREs. Which of these do you prefer?

  • /'.*?'/
  • /'[^']*'/

For example, writing a C-comment remover is easy with PCRE's .*?
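
The C-comment remover mentioned above can be sketched with a lazy quantifier. This is an illustration, not a full C lexer: stripBlockComments is a hypothetical name, the sample source is made up, and comment markers inside string literals are not handled. It uses [\s\S] rather than . because . does not match newlines in JavaScript unless the s flag is set.

```javascript
// Remove /* ... */ comments; the lazy *? stops at the first closing */.
function stripBlockComments(source) {
  return source.replace(/\/\*[\s\S]*?\*\//g, "");
}

const cSource = "int a; /* first */ int b; /* second\n spans lines */ int c;";

const stripped = stripBlockComments(cSource);
console.log(stripped); // "int a;  int b;  int c;"

// For contrast: the greedy form swallows everything between the first /* and the last */.
const overStripped = cSource.replace(/\/\*[\s\S]*\*\//g, "");
console.log(overStripped); // "int a;  int c;"
```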

JavaScript

Literal vs. Constructor

  • Literal: re = /.../g
  • Constructor: re = new RegExp("...")

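    • can build the pattern by string concatenation: re = new RegExp("..." + some_variable + "...")
    • backslashes must be doubled inside the string: new RegExp("\\d+") is the same as /\d+/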
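
A sketch of building a pattern from a variable with the constructor. One caveat worth showing: any regex metacharacters in the variable stay special unless they are escaped first. escapeRegExp and literalPrefix are hand-written helpers with hypothetical names, and the inputs are made up.

```javascript
// Backslash-escape every regex metacharacter so the text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a regex that matches strings starting with the given literal text.
function literalPrefix(prefix) {
  return new RegExp("^" + escapeRegExp(prefix));
}

const escapedRe = literalPrefix("1.5");
const unescapedRe = new RegExp("^" + "1.5"); // here "." still means "any character"

console.log(escapedRe.test("1.5 kg"));   // true
console.log(escapedRe.test("125 kg"));   // false
console.log(unescapedRe.test("125 kg")); // true, probably not what was intended
```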

Local vs. Global

  • re = /.../: str.match(re) returns the FIRST match together with its captures (or null if nothing matches)
  • re = /.../g: str.match(re) returns a list of all matches but NOT their captures
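
A short sketch of the two behaviors; the sample text and pattern are assumptions for illustration.

```javascript
const text = "x=1, y=2";

// Without g: the first match only, with capture groups and index.
const first = text.match(/(\w)=(\d)/);
console.log(first[0], first[1], first[2], first.index); // x=1 x 1 0

// With g: every full match, but no capture groups.
const all = text.match(/(\w)=(\d)/g);
console.log(all); // [ 'x=1', 'y=2' ]

// No match at all: null in both cases.
const none = text.match(/z=(\d)/);
console.log(none); // null
```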

match vs. exec

  • match: str.match(re), as stated above
  • exec: re.exec(str) returns one match with its captures; with the g flag, call it repeatedly to step through all matches

Example

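var match; // re needs the g flag, otherwise this loop never ends
while ((match = re.exec(str)) !== null) {
  console.log(match[0], match.index); // full match and its position
}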
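
A self-contained version of the loop, as a sketch: the pattern and input are made up. It collects captures from every match, which str.match with the g flag cannot do. The last lines show String.prototype.matchAll (ES2020) as an alternative, assuming a runtime that supports it.

```javascript
const pairRe = /(\w+)=(\d+)/g; // g flag: exec resumes from pairRe.lastIndex
const input = "width=80, height=24";

const pairs = [];
let m;
while ((m = pairRe.exec(input)) !== null) {
  // m[1] and m[2] are the capture groups of the current match.
  pairs.push({ key: m[1], value: Number(m[2]), index: m.index });
}
console.log(pairs);

// Same idea with matchAll, which also requires the g flag.
const keys = Array.from(input.matchAll(/(\w+)=(\d+)/g), (hit) => hit[1]);
console.log(keys); // [ 'width', 'height' ]
```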

String methods

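  • str.search(re): index of the first match, or -1 if there is none
  • str.match(re): the match result (see Local vs. Global)
  • str.replace(re, replacement): a new string with the match (all matches with the g flag) replaced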
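
A quick sketch of the three methods on one made-up input string:

```javascript
const line = "2018-06-18 updated";

// search: position of the first match, -1 when nothing matches.
const letterAt = line.search(/[a-z]/);
const missing = line.search(/[A-Z]/);

// match with g: all matched substrings.
const numbers = line.match(/\d+/g);

// replace: $1, $2, $3 refer to the capture groups.
const reordered = line.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");

console.log(letterAt, missing); // 11 -1
console.log(numbers);           // [ '2018', '06', '18' ]
console.log(reordered);         // "18/06/2018 updated"
```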

Example: split country name and country code in strings like "China (CN)"

> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'
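
The same split can be sketched as a single pattern with two capture groups. splitCountry is a hypothetical helper name, and the assumption is that the code is always the last parenthesized part of the string.

```javascript
// Split "Name (CODE)" into its two parts; returns null when the shape does not fit.
function splitCountry(label) {
  const m = label.match(/^(.*?)\s*\(([^)]*)\)\s*$/);
  if (m === null) return null;
  return { name: m[1], code: m[2] };
}

console.log(splitCountry("China (CN)")); // { name: 'China', code: 'CN' }
console.log(splitCountry("China"));      // null
```

Because the pattern is anchored with $, a name that itself contains parentheses still yields the final group as the code.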