Gonzalez Analytics

ReGuLaR ExPre$$iOnS

How to use regular expressions in R

This is a quick reference guide to using regular expressions in R.

Metacharacters

Metacharacter	Name	Matches
.	dot	any one character
[...]	character class	any character listed
[^...]	negated character class	any character not listed
^	caret	the position at the start of the line
$	dollar	the position at the end of the line
\<	backslash less-than	the position at the start of a word
\>	backslash greater-than	the position at the end of a word
|	or, bar	matches either expression it separates
(...)	parentheses	used to limit the scope of |, plus additional uses
\\s	whitespace	matches one whitespace character, so gsub("\\s", "", x) removes every whitespace; \\s+ matches a run of one or more
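A minimal sketch of several of these metacharacters using base R's grepl() and gsub(), which use R's default (TRE) engine. The sample strings are made up for illustration; note that every backslash is doubled inside an R string.

```r
x <- c("cat", "cot", "coat", "dog")

# "." matches any one character
grepl("^c.t$", x)         # TRUE  TRUE FALSE FALSE

# [...] matches any character listed
grepl("^c[ao]t$", x)      # TRUE  TRUE FALSE FALSE

# [^...] matches any character NOT listed
grepl("^c[^a]t$", x)      # FALSE TRUE FALSE FALSE

# | matches either side; the parentheses limit it to "cat" or "dog"
grepl("^(cat|dog)$", x)   # TRUE FALSE FALSE TRUE

# \\s+ matches a run of whitespace; gsub() collapses each run to one space
gsub("\\s+", " ", "too   many \t spaces")   # "too many spaces"

# \\s with an empty replacement removes whitespace entirely
gsub("\\s", "", "a b\tc")                    # "abc"
```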

Metacharacters and repetition

Metacharacter	Minimum Required	Maximum to Try	Meaning
?	none	1	one allowed; none required ("one optional")
*	none	no limit	unlimited allowed; none required ("any amount OK")
+	1	no limit	unlimited allowed; one required ("at least one")
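The three quantifiers can be compared on one hypothetical spelling variant. Each quantifier applies only to the item immediately before it, here the letter "u".

```r
x <- c("color", "colour", "colouur")

grepl("^colou?r$", x)   # ? : zero or one "u"        -> TRUE TRUE FALSE
grepl("^colou*r$", x)   # * : any number of "u"      -> TRUE TRUE TRUE
grepl("^colou+r$", x)   # + : at least one "u"       -> FALSE TRUE TRUE
```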

Position Metacharacters

Metacharacter	Name	Matches
^	caret	Matches the position at the start of the line
$	dollar	Matches the position at the end of the line
\<	word boundary: beginning of word	Matches the position at the start of a word
\>	word boundary: end of word	Matches the position at the end of a word
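Position metacharacters match locations rather than characters. A minimal sketch on made-up strings: \\< and \\> are escapes of R's default (TRE) engine, while \\b (either edge of a word) is understood by both the default engine and perl = TRUE.

```r
x <- c("cat", "concat", "cat food", "catalog")

grepl("^cat", x)        # starts with "cat":     TRUE FALSE TRUE TRUE
grepl("cat$", x)        # ends with "cat":       TRUE TRUE FALSE FALSE
grepl("\\<cat\\>", x)   # "cat" as a whole word: TRUE FALSE TRUE FALSE

# \\b marks either edge of a word; it also works with perl = TRUE
grepl("\\bcat\\b", x, perl = TRUE)   # TRUE FALSE TRUE FALSE
```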

Other Metacharacters

Metacharacter	Name	Matches
|	alternation (bar): either/or	Matches either expression it separates
(...)	parentheses	Limits the scope of alternation, provides grouping for quantifiers, and "captures" text for backreferences
\1,\2,...	backreference	Matches text previously matched within the first, second, etc., set of parentheses
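A short sketch of alternation scope and backreferences with base R's grepl() and sub(); the strings are invented for illustration. In an R string, the backreference \1 is written "\\1".

```r
words <- c("gray", "grey", "grab", "key")

# Without parentheses, | splits the whole pattern: "^gra" OR "ey$"
grepl("^gra|ey$", words)      # TRUE TRUE TRUE TRUE

# With parentheses, the alternation is limited to the middle letter
grepl("^gr(a|e)y$", words)    # TRUE TRUE FALSE FALSE

# \\1 and \\2 in the replacement refer to the first and second captured groups
sub("^(\\w+) (\\w+)$", "\\2, \\1", "Ada Lovelace")   # "Lovelace, Ada"

# A backreference inside the pattern: the same word twice in a row
grepl("^(\\w+) \\1$", c("bye bye", "bye now"))        # TRUE FALSE
```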

Shorthand character classes and escapes

Metacharacter	Description	Meaning
\t	tab	a tab character
\n	newline	a newline character
\r	carriage-return	a carriage-return character
\s	whitespace	matches any "whitespace" character (space, tab, newline, formfeed, and such)
\S	not a whitespace \s	matches anything that is not a whitespace
\w	[a-zA-Z0-9_]	useful as in \w+ to ostensibly match a word
\W	anything not [a-zA-Z0-9_]	anything that is not a letter, digit, or underscore
\d	[0-9]	i.e., a digit
\D	anything not \d	i.e., [^0-9]
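A minimal sketch of the shorthand classes using regmatches() with gregexpr() to pull out every match, and gsub() to replace. The sample string is made up; "\t" inside it is a literal tab character.

```r
s <- "Order 66 shipped\tto room_12"

regmatches(s, gregexpr("\\d+", s))[[1]]   # "66" "12"
regmatches(s, gregexpr("\\w+", s))[[1]]   # "Order" "66" "shipped" "to" "room_12"
gsub("\\D", "", s)                         # "6612"  (every non-digit dropped)
gsub("\\s", "_", s)                        # "Order_66_shipped_to_room_12"
```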

Lookaround

Type	Regex	Successful if the enclosed subexpression...
Positive Lookbehind	(?<=.....)	can match to the left
Negative Lookbehind	(?<!.....)	cannot match to the left
Positive Lookahead	(?=.....)	can match to the right
Negative Lookahead	(?!.....)	cannot match to the right
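R's default (TRE) engine does not support lookaround, so these patterns need perl = TRUE. A minimal sketch on made-up strings showing each of the four forms:

```r
prices <- "apple $3, pear $12, plum 5"

# Positive lookbehind: digits that have "$" immediately to their left
regmatches(prices, gregexpr("(?<=\\$)\\d+", prices, perl = TRUE))[[1]]     # "3" "12"

# Negative lookbehind: digits with neither "$" nor another digit to their left
regmatches(prices, gregexpr("(?<![$\\d])\\d+", prices, perl = TRUE))[[1]]  # "5"

files <- c("report.csv", "data.txt", "notes.csv.bak")

# Positive lookahead: the name that has ".csv" and then end of string to its right
regmatches(files, regexpr("^\\w+(?=\\.csv$)", files, perl = TRUE))         # "report"

# Negative lookahead: names that do NOT start with "temp_"
grepl("^(?!temp_)\\w+$", c("temp_1", "final_1"), perl = TRUE)              # FALSE TRUE
```

The lookaround itself consumes no characters, which is why only the digits or the file stem end up in the match.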
