This is a quick reference for using regular expressions in R. The tables show regex syntax. Inside an R string literal each backslash must be doubled, so `\s` is written `"\\s"`.

| Metacharacter | Name | Matches |
|---|---|---|
| `.` | dot | any one character |
| `[...]` | character class | any one character listed |
| `[^...]` | negated character class | any one character not listed |
| `^` | caret | the position at the start of the line |
| `$` | dollar | the position at the end of the line |
| `\<` | backslash less-than | the position at the start of a word |
| `\>` | backslash greater-than | the position at the end of a word |
| `\|` | or, bar | either expression it separates |
| `(...)` | parentheses | groups a subexpression; limits the scope of `\|`, plus additional uses |
| `\s` | whitespace | any whitespace character; `\s+` matches a run of one or more, which `gsub("\\s+", "", x)` removes |
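
The overview table can be tried directly in base R. This is a minimal sketch, assuming only base functions (`grepl()` to test for a match, `gsub()` to replace) and made-up example strings. The `\<` and `\>` word-boundary escapes belong to R's default engine (`perl = FALSE`), so they are used without `perl = TRUE` here.

```r
# Made-up example strings, chosen only to illustrate the overview table.
words <- c("cat", "cot", "cut", "coat", "scat")

# . : any one character between "c" and "t"
dot_hits <- grepl("c.t", words)              # TRUE TRUE TRUE FALSE TRUE

# [...] : one character from the list; [^...] : one character NOT in the list
class_hits   <- grepl("c[ao]t", words)       # TRUE TRUE FALSE FALSE TRUE
negated_hits <- grepl("c[^ao]t", words)      # FALSE FALSE TRUE FALSE FALSE

# ^ and $ : anchor the match to the start and end of the string
anchored <- grepl("^c.t$", words)            # TRUE TRUE TRUE FALSE FALSE

# \< and \> (written "\\<" and "\\>" in R) : start and end of a word
phrases <- c("cat nap", "concatenate", "the cat")
whole_word <- grepl("\\<cat\\>", phrases)    # TRUE FALSE TRUE

# | inside (...) : the parentheses limit the alternation to "cat" or "dog"
alternation <- grepl("^(cat|dog)$", c("cat", "dog", "catdog"))  # TRUE TRUE FALSE

# \s+ finds runs of whitespace; gsub() is what actually replaces or removes them
squeezed <- gsub("\\s+", " ", "too   many\tspaces")  # "too many spaces"
stripped <- gsub("\\s+", "", "a b  c")               # "abc"
```
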

| Metacharacter | Minimum Required | Maximum to Try | Meaning |
|---|---|---|---|
| `?` | none | 1 | one allowed; none required ("one optional") |
| `*` | none | no limit | unlimited allowed; none required ("any amount OK") |
| `+` | 1 | no limit | unlimited allowed; one required ("at least one") |
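
A short sketch of the three quantifiers, assuming made-up spellings of "color" so that each quantifier acts on the single letter "u". The last line uses `regexpr()` with `regmatches()` to show the "Maximum to Try" column in practice: a quantifier takes as much as it can.

```r
# Hypothetical spellings used to exercise each quantifier on the letter "u".
x <- c("color", "colour", "colouur")

q_opt  <- grepl("^colou?r$", x)  # ? : zero or one "u"  -> TRUE TRUE FALSE
q_star <- grepl("^colou*r$", x)  # * : zero or more     -> TRUE TRUE TRUE
q_plus <- grepl("^colou+r$", x)  # + : one or more      -> FALSE TRUE TRUE

# Quantifiers try for the maximum first, so "a+" takes all three a's
longest <- regmatches("baaad", regexpr("a+", "baaad"))  # "aaa"
```
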

| Metacharacter | Name | Matches |
|---|---|---|
| `^` | caret | Matches the position at the start of the line |
| `$` | dollar | Matches the position at the end of the line |
| `\<` | word boundary: beginning of word | Matches the position at the start of a word |
| `\>` | word boundary: end of word | Matches the position at the end of a word |
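
Anchors match positions rather than characters. A minimal base-R sketch makes this visible: `sub()` can insert text at `^` or `$`, and `gregexpr()` with `regmatches()` extracts words bounded by `\<` and `\>`. The sample sentence is invented for illustration, and the word-boundary patterns assume R's default engine.

```r
s <- "one two three"  # invented sample sentence

# ^ and $ are positions, not characters: sub() can insert text there
start_mark <- sub("^", "[", s)  # "[one two three"
end_mark   <- sub("$", "]", s)  # "one two three]"

# \< and \> are word-boundary positions (default engine, not perl = TRUE)
starts_with_t <- regmatches(s, gregexpr("\\<t[a-z]*", s))[[1]]     # "two" "three"
ends_with_e   <- regmatches(s, gregexpr("\\<[a-z]*e\\>", s))[[1]]  # "one" "three"

# By contrast, "e$" only succeeds at the very end of the whole string
e_at_end <- grepl("e$", c("one two", "two one"))  # FALSE TRUE
```
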

| Metacharacter | Name | Matches |
|---|---|---|
| `\|` | alternation (bar) | Matches either expression it separates, e.g. `cat\|dog` |
| `(...)` | parentheses | Limits the scope of alternation, provides grouping for quantifiers, and "captures" text for backreferences |
| `\1`, `\2`, ... | backreference | Matches the text previously matched within the first, second, etc., set of parentheses |
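
A sketch of alternation, grouping, and backreferences in base R, using invented strings. In a replacement string passed to `sub()` or `gsub()`, a backreference is written `"\\1"`. The last example uses one inside the pattern itself and, because it checks no word boundaries, it is only a rough doubled-word test.

```r
# Parentheses limit alternation: "gray" and "grey" match, "griy" does not
spellings <- grepl("^gr(a|e)y$", c("gray", "grey", "griy"))  # TRUE TRUE FALSE

# Parentheses group for quantifiers: (ha)+ repeats the whole two-letter unit
laugh <- regmatches("hahaha!", regexpr("(ha)+", "hahaha!"))  # "hahaha"

# Captures feed backreferences in the replacement: "\\2 \\1" swaps two words
swapped <- sub("^(\\w+) (\\w+)$", "\\2 \\1", "hello world")  # "world hello"

# Backreference inside the pattern: a word-character run repeated after a space
doubled <- grepl("(\\w+) \\1", c("it is the the end", "no repeats here"))  # TRUE FALSE
```
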

| Metacharacter | Description | Meaning |
|---|---|---|
| `\t` | tab | a tab character |
| `\n` | newline | a newline character |
| `\r` | carriage return | a carriage-return character |
| `\s` | whitespace | any "whitespace" character (space, tab, newline, form feed, and such) |
| `\S` | not whitespace | anything that `\s` does not match |
| `\w` | `[a-zA-Z0-9_]` | useful as in `\w+` to ostensibly match a word |
| `\W` | anything not `[a-zA-Z0-9_]` | any character that is not a letter, digit, or underscore |
| `\d` | `[0-9]` | i.e., a digit |
| `\D` | anything not `\d` | i.e., `[^0-9]` |
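
The shorthand classes in action, as a hedged sketch on one made-up string. Every backslash is doubled inside the R string literals, and all of these escapes work with R's default engine.

```r
# A made-up string mixing letters, digits, punctuation, a tab and a newline
txt <- "Order 66\tshipped:\n2 boxes"

has_tab     <- grepl("\\t", txt)  # TRUE
has_newline <- grepl("\\n", txt)  # TRUE
has_cr      <- grepl("\\r", txt)  # FALSE

digit_runs <- regmatches(txt, gregexpr("\\d+", txt))[[1]]  # "66" "2"
word_runs  <- regmatches(txt, gregexpr("\\w+", txt))[[1]]  # "Order" "66" "shipped" "2" "boxes"

one_line  <- gsub("\\s+", " ", txt)           # "Order 66 shipped: 2 boxes"
only_nums <- gsub("\\D", "", "tel: 555-0199")  # "5550199"
masked    <- gsub("\\S", "x", "ab c")          # "xx x"
snake     <- gsub("\\W+", "_", "a-b c")        # "a_b_c"
```
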

| Type | Syntax | Succeeds if the enclosed subexpression... |
|---|---|---|
| Positive lookbehind | `(?<=...)` | can match immediately to the left |
| Negative lookbehind | `(?<!...)` | cannot match immediately to the left |
| Positive lookahead | `(?=...)` | can match immediately to the right |
| Negative lookahead | `(?!...)` | cannot match immediately to the right |
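
R's default regex engine does not support lookarounds, so base functions need `perl = TRUE` for them. Here is a minimal sketch on invented strings, with one example per row of the table above. In each case the lookaround is checked, but its text is not part of the returned match.

```r
# Lookarounds need perl = TRUE in base R functions.
prices <- "apple $3, pear $12, 7 plums"  # invented sample text

# Positive lookbehind: digits preceded by "$"; the "$" itself is not returned
after_dollar <- regmatches(prices, gregexpr("(?<=\\$)\\d+", prices, perl = TRUE))[[1]]  # "3" "12"

# Negative lookbehind: digit runs whose first digit follows neither "$" nor a digit
bare_number <- regmatches(prices, gregexpr("(?<![$\\d])\\d+", prices, perl = TRUE))[[1]]  # "7"

# Positive lookahead: a word immediately followed by " plums"
before_plums <- regmatches(prices, gregexpr("\\w+(?= plums)", prices, perl = TRUE))[[1]]  # "7"

# Negative lookahead: "apple" not immediately followed by "sauce"
fruits <- c("apple", "applesauce", "apples")
plain_apple <- grepl("apple(?!sauce)", fruits, perl = TRUE)  # TRUE FALSE TRUE
```
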