ReGuLaR ExPre$$iOnS

How to use regular expressions in R


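This is a quick reference guide to using `regular expressions` in R. The tables show the regex syntax itself. Inside an R string literal every backslash must be doubled, so the regex \s is typed as "\\s".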

Metacharacters

Metacharacter  Name                     Matches
.              dot                      any one character
[...]          character class          any character listed
[^...]         negated character class  any character not listed
^              caret                    the position at the start of the line
$              dollar                   the position at the end of the line
\<             backslash less-than      the position at the start of a word
\>             backslash greater-than   the position at the end of a word
|              or, bar                  either expression it separates
(...)          parentheses              limits the scope of |; also groups and captures
\s             whitespace               any whitespace character; \s+ matches a run of one or more (gsub() with "" removes them)
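A minimal sketch of the table above using base R's `grepl()` and `gsub()` with the default regex engine. The sample strings and variable names are made up for illustration; each comment shows the value the call returns.

```r
words <- c("cat", "cot", "cut", "coat")

# "." matches any one character: c, then any character, then t
dot_hits <- grepl("^c.t$", words)            # TRUE TRUE TRUE FALSE

# [...] matches one listed character; [^...] matches one character not listed
class_hits   <- grepl("^c[ao]t$", words)     # TRUE TRUE FALSE FALSE
negated_hits <- grepl("^c[^ao]t$", words)    # FALSE FALSE TRUE FALSE

# ^ anchors the match to the start of the string
starts_c <- grepl("^c", c("cat", "scat"))    # TRUE FALSE

# | matches either alternative; the parentheses limit its scope
pets <- grepl("^(cat|dog)$", c("cat", "dog", "cow"))  # TRUE TRUE FALSE

# \s+ matches one or more whitespace characters; gsub() replaces every run with ""
squashed <- gsub("\\s+", "", "  a  b\tc ")   # "abc"
```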

Metacharacters and repetition

Metacharacter  Minimum required  Maximum to try  Meaning
?              none              1               one allowed; none required ("one optional")
*              none              no limit        unlimited allowed; none required ("any amount OK")
+              1                 no limit        unlimited allowed; one required ("at least one")
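The quantifiers can be tried the same way. This sketch assumes base R's `grepl()`, `regexpr()`, and `regmatches()`; the patterns are anchored with ^ and $ so each quantifier has to account for the whole string. The last line shows the "Maximum to try" column in action: `+` is greedy and takes as much as it can.

```r
# ?: the u is optional (zero or one)
optional <- grepl("^colou?r$", c("color", "colour", "colouur"))   # TRUE TRUE FALSE

# *: any number of b's, including none
any_amount <- grepl("^ab*c$", c("ac", "abc", "abbbc"))            # TRUE TRUE TRUE

# +: at least one b is required
at_least_one <- grepl("^ab+c$", c("ac", "abc", "abbbc"))          # FALSE TRUE TRUE

# Quantifiers are greedy: a+ grabs every a it can
greedy <- regmatches("aaa", regexpr("a+", "aaa"))                 # "aaa"
```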

Position Metacharacters

Metacharacter  Name                              Matches
^              caret                             Matches the position at the start of the line
$              dollar                            Matches the position at the end of the line
\<             word boundary: beginning of word  Matches the position at the start of a word
\>             word boundary: end of word        Matches the position at the end of a word
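A short sketch of the word-boundary metacharacters, assuming R's default (TRE) engine, which documents \< and \> as matching the empty string at the start and end of a word. With `perl = TRUE` these are not word boundaries, and \b is the usual substitute, shown on the last line for comparison.

```r
x <- c("cat", "concatenate", "the cat sat", "cats")

# \< and \> work with the default (TRE) engine
whole_word <- grepl("\\<cat\\>", x)               # TRUE FALSE TRUE FALSE

# Only the start-of-word boundary: words beginning with "cat"
word_start <- grepl("\\<cat", x)                  # TRUE FALSE TRUE TRUE

# With perl = TRUE, \b marks a word boundary on either side
pcre_word <- grepl("\\bcat\\b", x, perl = TRUE)   # TRUE FALSE TRUE FALSE
```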

Other Metacharacters

Metacharacter  Name               Matches
|              alternation (bar)  either expression it separates, e.g. cat|dog
(...)          parentheses        limits the scope of alternation, provides grouping for quantifiers, and "captures" text for backreferences
\1, \2, ...    backreference      text previously matched within the first, second, etc., set of parentheses
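A sketch of alternation, grouping, and backreferences in base R with the default engine. The strings are illustrative. Backreferences are written with a doubled backslash ("\\1") both in the pattern and in the replacement argument of `sub()`.

```r
# | inside parentheses: alternation is limited to "cat" vs "dog"
plural_pets <- grepl("^(cat|dog)s$", c("cats", "dogs", "cat"))   # TRUE TRUE FALSE

# Parentheses also group for a quantifier: "ha" repeated one or more times
laughs <- grepl("^(ha)+$", c("ha", "hahaha", "hah"))             # TRUE TRUE FALSE

# \1 in the pattern: a word character immediately repeated
doubled <- grepl("(\\w)\\1", c("book", "bake"))                  # TRUE FALSE

# \1 and \2 in the replacement: swap two captured words
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")        # "world hello"
```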

Escape sequences and shorthand classes

Metacharacter  Description                Meaning
\t             tab                        a tab character
\n             newline                    a newline character
\r             carriage return            a carriage-return character
\s             whitespace                 any "whitespace" character (space, tab, newline, formfeed, and such)
\S             not whitespace             anything that is not whitespace
\w             [a-zA-Z0-9_]               useful as in \w+ to ostensibly match a word
\W             anything not [a-zA-Z0-9_]  anything that is not a letter, digit, or underscore
\d             [0-9]                      i.e., a digit
\D             anything not \d            i.e., [^0-9]
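A sketch of the shorthand escapes using base R's `strsplit()`, `gsub()`, `gregexpr()`, and `regmatches()`. The input strings are hypothetical. The "\t" and "\n" inside them are R's own string escapes for a tab and a newline, while the patterns use doubled backslashes so the regex engine receives \s, \D, \w, and \t.

```r
# Split on any run of whitespace: space, tab (\t), or newline (\n)
parts <- strsplit("a b\tc\nd", "\\s+")[[1]]          # "a" "b" "c" "d"

# \D is any non-digit: deleting them leaves only the digits
digits <- gsub("\\D", "", "Tel: 555-0123")           # "5550123"

# \w+ extracts runs of word characters; punctuation and spaces are skipped
x <- "Hello, R_user 42!"
tokens <- regmatches(x, gregexpr("\\w+", x))[[1]]    # "Hello" "R_user" "42"

# \t in a pattern matches a literal tab character
has_tab <- grepl("\\t", c("a\tb", "a b"))            # TRUE FALSE
```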

Lookaround

Type                 Regex       Succeeds if the enclosed subexpression...
Positive lookbehind  (?<=.....)  can match to the left
Negative lookbehind  (?<!.....)  cannot match to the left
Positive lookahead   (?=.....)   can match to the right
Negative lookahead   (?!.....)   cannot match to the right
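In R, lookaround is available only through the PCRE engine, so each call in this sketch passes `perl = TRUE`; the default engine does not support these constructs. The strings are illustrative. A lookaround only checks its subexpression: the text it inspects is not part of the match, which is why the "$" is missing from the extracted prices.

```r
# Positive lookbehind: digits preceded by "$" (the "$" itself is not matched)
x <- "coffee $3, lunch $12"
prices <- regmatches(x, gregexpr("(?<=\\$)\\d+", x, perl = TRUE))[[1]]   # "3" "12"

# Negative lookbehind: "happy" not preceded by "un"
not_unhappy <- grepl("(?<!un)happy", c("happy", "unhappy"), perl = TRUE) # TRUE FALSE

# Positive lookahead: digits followed by "%"
percent <- grepl("\\d+(?=%)", c("50%", "50"), perl = TRUE)               # TRUE FALSE

# Negative lookahead: "foo" not followed by "bar"
not_foobar <- grepl("foo(?!bar)", c("foobaz", "foobar"), perl = TRUE)    # TRUE FALSE
```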