CapturePoint can use regular expressions (regex) to improve the accuracy of classification and index data. This reference guide is for users who want to refine their classification and index rules further. Regular expressions are optional, but they are often a helpful addition.
Download a copy of this regular expressions reference sheet: Download
Regular Expressions Table
Note: CapturePoint Regular Expressions can be written in multiple ways to achieve the same results.
Regex String | Result |
\d |
Matches any digit. |
\D |
Matches any character except a digit. |
\s |
Matches any whitespace character. |
\S |
Matches any character except a whitespace character. |
\w |
Matches any letter, digit, or underscore. |
\W |
Matches any character other than a letter, digit, or underscore. |
. |
Matches any character except for a line break. This can also be used as a placeholder. |
\b |
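Matches a word boundary: the position between a word character (letter, digit, or underscore) and a non-word character, or the start or end of the string next to a word character. It matches a position, not a character. |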
\B |
Matches any position that is not a word boundary, for example between two letters inside a word. |
[A-Z] |
Matches any single uppercase letter A-Z. Add a quantifier such as + or {3} to match more than one. |
[a-z] |
Matches any single lowercase letter a-z. |
[a-zA-Z] |
Matches any single uppercase or lowercase letter. |
[A-Z0-9] |
Matches any single uppercase letter or digit. |
^ |
Matches the start of the string. |
[^] |
Matches any single character that is not listed inside the square brackets. For example, [^\s] matches any character other than whitespace. |
| |
The pipe symbol means ‘or’. To find 1 through 3 or 7 through 9, write [1-3]|[7-9]. Keep the pipe outside the square brackets; inside them it matches a literal | character. |
{1} |
Matches the preceding element exactly the number of times written inside the curly brackets. For example, \d{1} matches exactly 1 digit. |
{2,} |
This will look for the preceding element at least 2 times. |
{3,8} |
This will look for the preceding element a minimum of 3 times and at most 8 times. |
? |
Matches the previous element zero or one times. This is greedy: it matches once if it can. |
?? |
Matches previous elements zero or one times, but as few as possible. |
* |
Matches the previous element zero or more times, with no upper limit. It is greedy: it matches as many times as possible. |
*? |
Matches previous elements zero or more times, but as few as possible. |
+ |
Matches previous element one or more times. |
+? |
Matches previous elements one or more times, but as few as possible. |
[ ] |
Matches any single character that is listed inside the square brackets. |
[ , ] |
Matches any single character listed inside the square brackets. A comma inside the brackets is not a separator; it is matched as a literal comma. For example, [a,b] matches a, b, or a comma. |
[0-9] |
Matches any single digit 0-9. Add a quantifier such as + to match a number of any length. |
\A |
Matches the start of the string. Unlike ^, it is unaffected by the multiline option. |
\z |
Matches the end of the input without exception. |
\Z |
Matches the end of the string or the point before the final \n at the end of the input. This is unaffected by a multiline option. |
\G |
Matches the point that the previous match ended. Used to find contiguous matches. |
\f |
Form feed |
\n |
Newline |
\r |
Carriage Return |
\t |
Tab |
[\b] |
Matches a backspace. It must be enclosed in these square brackets to have this meaning. |
$ |
Matches the end of the string or the point before a final \n at the end of the input. |
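A few rows in the table are easier to see in action. The sketch below uses Python's re module as a stand-in engine (an assumption; CapturePoint's own engine may behave differently in places) to show that a bracketed class matches one character at a time, how greedy and lazy quantifiers differ, where the pipe belongs, and that \b matches a position.

```python
import re

# A character class matches ONE character; a quantifier sets the length.
one_letter = re.search(r"[A-Z]", "ABC").group()
letter_run = re.search(r"[A-Z]+", "ABC").group()

# Greedy + takes as many digits as it can; lazy +? takes as few as it can.
greedy = re.search(r"\d+", "12345").group()
lazy = re.search(r"\d+?", "12345").group()

# "Or" goes between two classes, not inside one.
one_to_three_or_seven_to_nine = re.findall(r"[1-3]|[7-9]", "0123456789")

# \b matches a position, so only the stand-alone word is found.
whole_words = re.findall(r"\bcat\b", "cat concat cats")

print(one_letter, letter_run)         # A ABC
print(greedy, lazy)                   # 12345 1
print(one_to_three_or_seven_to_nine)  # ['1', '2', '3', '7', '8', '9']
print(whole_words)                    # ['cat']
```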
Regex Desired Outcomes
Find Purchase Order, but Not Capture it and then Capture any Numbers, Letters, Dashes, or Dots until the End of String
What I Want | Regex String | Input Text | Result | What This Does |
Find Purchase Order, but don’t capture it and then capture any numbers, letters, dashes, or dots until the end of string. |
(?<=Purchase\sOrder:\s*)[0-9A-Z\-\.]+(?=\s|$) |
Purchase Order: 98752-854 |
98752-854 |
Confirms that "Purchase Order:" comes before the value without capturing it, then captures digits, uppercase letters, dashes, and dots up to the next whitespace or the end of the string. |
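To try this idea outside CapturePoint, note that some engines, including Python's re module, reject a lookbehind that contains \s* because its width is not fixed. The sketch below swaps in a different technique, a capture group, to get the same "find the label but keep only the value" result. The function name extract_po is hypothetical.

```python
import re

# The label is matched but sits outside the parentheses, so only the
# value inside the capture group is returned.
PO_PATTERN = re.compile(r"Purchase\sOrder:\s*([0-9A-Z\-.]+)(?=\s|$)")

def extract_po(text):
    """Return the purchase order value, or None if the label is absent."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_po("Purchase Order: 98752-854"))       # 98752-854
print(extract_po("Purchase Order: 98752-854 rush"))  # 98752-854
print(extract_po("PO 98752-854"))                    # None
```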
Find the Exact Format of 3 Uppercase Letters Followed by a Dash and then 3 Numbers and Ignore anything Else in Front of or Behind This
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Finds the exact format of 3 Uppercase letters followed by a dash and then 3 numbers and ignores anything else in front of or behind this. |
[A-Z]{3}-\d{3} |
Order ABC-123 placed successfully. |
ABC-123 |
Finds the exact format of 3 uppercase letters followed by a dash and then 3 numbers, and ignores anything that precedes or follows it. |
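One thing worth checking with this pattern is that it will also match inside a longer code. The sketch below, using Python's re module as a stand-in engine, shows the pattern as written and a stricter variant with \b on each side. The stricter variant is an illustration only, not part of the original guide.

```python
import re

LOOSE = re.compile(r"[A-Z]{3}-\d{3}")
STRICT = re.compile(r"\b[A-Z]{3}-\d{3}\b")  # hypothetical stricter variant

def find_code(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group() if match else None

print(find_code(LOOSE, "Order ABC-123 placed successfully."))  # ABC-123
print(find_code(LOOSE, "Order XABC-1234 placed."))             # ABC-123
print(find_code(STRICT, "Order XABC-1234 placed."))            # None
```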
Captures only Digits and Dashes and Ignores all else and this Will Run One or More Times until the End of the Line
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Captures only digits and dashes and ignores all else and this will run one or more times until the end of the line. |
[\d-]+ |
My Cell # is 555-555-1212. |
555-555-1212 |
Captures the first run of one or more digits and dashes. The match stops at the first character that is neither a digit nor a dash. |
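A quick way to see where this pattern stops is to run it on a number written with spaces instead of dashes. This is a minimal sketch using Python's re module as a stand-in engine; the helper name first_digit_dash_run is hypothetical.

```python
import re

def first_digit_dash_run(text):
    """Return the first unbroken run of digits and dashes, or None."""
    match = re.search(r"[\d-]+", text)
    return match.group() if match else None

print(first_digit_dash_run("My Cell # is 555-555-1212."))  # 555-555-1212
print(first_digit_dash_run("Call 555 1212 today"))          # 555
```

The second call returns only 555 because the space ends the run.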
Capture First 4 Digits and then Ignore any Spaces, a Dash, any More Spaces, and then any Text Written after That
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Capture first 4 digits and then ignore any spaces, a dash, any more spaces, and then any text written after that. |
\d{4}(?=\s*\-\s*[a-zA-Z]*) |
7373 - Location: San Francisco |
7373 |
Captures 4 digits, then uses a lookahead to confirm that they are followed by optional spaces, a dash, optional spaces, and optional letters. None of the text in the lookahead is captured. |
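The lookahead in this pattern expects a plain hyphen. Scanned or word-processed text sometimes contains an en dash instead, and the pattern will not match it. The sketch below shows both cases with Python's re module as a stand-in engine; the helper name first_four is hypothetical.

```python
import re

PATTERN = re.compile(r"\d{4}(?=\s*\-\s*[a-zA-Z]*)")

def first_four(text):
    """Return the 4 digits that precede a hyphen, or None."""
    match = PATTERN.search(text)
    return match.group() if match else None

print(first_four("7373 - Location: San Francisco"))       # 7373
print(first_four("7373 \u2013 Location: San Francisco"))  # None (en dash)
```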
Capture Everything Before the Dash and Ignore Anything After That
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Capture everything before the dash and ignore anything after that. |
.+(?=\-) |
*A12-345678B* |
*A12 |
Captures any characters other than a line break up to, but not including, a dash, and ignores anything after that. Because .+ is greedy, the match runs to the last dash when the line contains more than one. |
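When the input contains more than one dash, greedy and lazy versions of this pattern give different results. A small sketch, using Python's re module as a stand-in engine:

```python
import re

greedy = re.compile(r".+(?=\-)")   # runs to the LAST dash
lazy = re.compile(r".+?(?=\-)")    # stops at the FIRST dash

print(greedy.search("*A12-345678B*").group())  # *A12
print(greedy.search("A12-345-678").group())    # A12-345
print(lazy.search("A12-345-678").group())      # A12
```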
Find, but Don’t Capture the Asterisk at the Beginning of the Line, then Grab Anything until you Reach the Asterisk at the End of the Line, but Don’t Capture that One Either
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Find, but don’t capture the asterisk at the beginning of the line, then grab anything until you reach the asterisk at the end of the line, but don’t capture that one either. |
(?<=\*\s*).+(?=\s*\*) |
*A19400-2* |
A19400-2 |
Validate but don’t capture an asterisk at the beginning of the string, then capture any character other than a line break until you reach another asterisk. |
Find the Word PAGE, page, or Page but not Capture it and then Only Capture the Number 1
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Find the word PAGE, page or Page but don’t capture it and then only capture the number 1. |
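(?<=PAGE\s*)[1]+|(?<=page\s*)[1]+|(?<=Page\s*)[1]+ |
Page 1 of 11 |
1 |
Confirms that PAGE, page, or Page, plus any spaces, comes directly before the number without capturing it, then captures the 1 that follows. No wildcard sits between the word and the 1; a greedy .* there would stretch the match to the last 1 in the line. |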
To Find the 1st Page in a Series of Pages that Look Similar
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
To find the 1st page in a series of pages that look similar. |
\b1\b |
Page 1 of 11 |
1 |
Matches a 1 that stands alone as its own word: a word boundary, the digit 1, and then another word boundary. It captures only that 1; the 11 later in the line does not match. |
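The word boundaries are what keep this pattern from firing on page 11 or page 21. A minimal sketch, using Python's re module as a stand-in engine; the helper name standalone_ones is hypothetical.

```python
import re

def standalone_ones(text):
    """Return every 1 that stands alone as its own word."""
    return re.findall(r"\b1\b", text)

print(standalone_ones("Page 1 of 11"))   # ['1']
print(standalone_ones("Page 11 of 11"))  # []
print(standalone_ones("Page 21 of 30"))  # []
```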
Remove Anything that isn’t a Number at the Beginning of the Line. Then Grab 6 Numbers and Remove Anything until the End of the Line
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Remove Anything that isn’t a Number at the Beginning of the Line. Then grab 6 numbers and remove anything until the end of the line. |
(?<=^|\D+)\d{6}(?=\D+|$) |
abc-d123456efgh |
123456 |
Requires a non-digit (or the start of the string) directly before and a non-digit (or the end of the string) directly after, then captures exactly 6 digits between them. The surrounding text is not captured, and a run of 7 or more digits does not match. |
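The same "exactly 6 digits, with no digit on either side" rule can be tried in Python, whose re module does not accept \D+ inside a lookbehind. The sketch below swaps in negative lookarounds, (?<!\d) and (?!\d), which express the same condition. The helper name six_digits is hypothetical.

```python
import re

# "Not preceded by a digit" and "not followed by a digit" are the same
# conditions as "preceded/followed by a non-digit or a string edge".
SIX = re.compile(r"(?<!\d)\d{6}(?!\d)")

def six_digits(text):
    """Return a stand-alone run of exactly 6 digits, or None."""
    match = SIX.search(text)
    return match.group() if match else None

print(six_digits("abc-d123456efgh"))   # 123456
print(six_digits("abc-d1234567efgh"))  # None (7 digits in a row)
```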
Find INVOICE any Number of Spaces Before #, but Not Capture it and then Capture any Numbers, Letters, Dashes, or Dots until the end of String
What I Want | CapturePoint Regex String | Input Text | Result | What This Does |
Find INVOICE any number of spaces before #, but don’t capture it and then capture any numbers, letters, dashes, or dots until the end of string. |
(?<=INVOICE\s*#\s*)[0-9a-zA-Z\-\.]+(?=\s|$) |
INVOICE # 1A2b3Cc |
1A2b3Cc |
Validate INVOICE and any number of spaces before #, but don’t capture it and then capture any numbers, letters, dashes, or dots until the end of string. |
To Only Find Numbers Without a Limit
What I Want | Regex String | Input Text | Result | What This Does |
To only find numbers without a limit. |
\d+ |
abc123456def |
123456 |
Captures a run of one or more consecutive digits of any length and ignores everything else. The + requires at least one digit, which avoids the empty matches that \d* allows. |
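The difference between * and + matters here because * is allowed to match nothing at all. In the sketch below, which uses Python's re module as a stand-in engine, the first search succeeds immediately with an empty match at the letter a, while the second moves on to the digits.

```python
import re

text = "abc123456def"

star = re.search(r"\d*", text).group()  # zero digits is a valid match
plus = re.search(r"\d+", text).group()  # needs at least one digit

print(repr(star))  # ''
print(repr(plus))  # '123456'
```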
Capture the 8 numbers after Sale No., or Sale:
What I Want | Regex String | Input Text | Result | What This Does |
Capture the 8 numbers after Sale No., or Sale: |
(?<=Sale\s*No\.\s*)\d{8}|(?<=Sale\s*:\s*)\d{8}|(?<=SN:\s*)\d{8} |
Sale No. 12345678 |
12345678 |
Confirms that Sale No., Sale :, or SN: comes before the number without capturing it, and then captures the next 8 digits. |
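Several alternative labels can share one pattern by grouping them with the pipe. The sketch below uses Python's re module as a stand-in engine and a capture group in place of the lookbehinds, since Python does not accept \s* inside a lookbehind. The helper name sale_number is hypothetical.

```python
import re

# (?: ... ) groups the three labels without capturing them;
# the parentheses around \d{8} capture the value.
SALE = re.compile(r"(?:Sale\s*No\.|Sale\s*:|SN:)\s*(\d{8})")

def sale_number(text):
    """Return the 8 digits after any accepted label, or None."""
    match = SALE.search(text)
    return match.group(1) if match else None

print(sale_number("Sale No. 12345678"))  # 12345678
print(sale_number("Sale: 87654321"))     # 87654321
print(sale_number("SN: 11112222"))       # 11112222
print(sale_number("Sale No. 1234"))      # None
```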
Capture any Numbers or Letters until the End of the Line after TRACKING#
What I Want | Regex String | Input Text | Result | What This Does |
Capture any numbers or letters until the end of the line after TRACKING# |
(?<=TRACKING\S*\s+)[0-9A-Z]+(?=\s|$) |
TRACKING# Z239943B029238A |
Z239943B029238A |
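Confirms that TRACKING# and the spaces after it come before the value without capturing them, then captures the digits and uppercase letters that follow, up to the next whitespace or the end of the string. |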
Find SSN : But Not Capture it and then Capture any Numbers and Dashes in the Format of 123-45-6789.
What I Want | Regex String | Input Text | Result | What This Does |
Find SSN : but don’t capture it and then capture any numbers and dashes in the format of 123-45-6789. |
(?<=SSN\s*:\s*)\d{3}\-\d{2}\-\d{4} |
SSN : 123-45-6789 |
123-45-6789 |
Validate SSN :, but don’t capture it. Then grab only the numbers and dashes in the proper SSN format of 123-45-6789. |
Verify that the Data is in this Format MMM D(D) YYYY But Not Change it
What I Want | Regex String | Input Text | Result | What This Does |
Verify that the data is in this format MMM D(D) YYYY but don’t change it. |
[a-zA-Z]{3}\s+\d{1,2},?\s+\d{4} |
Feb 14, 2010 |
Feb 14, 2010 |
Validates that the date is in the format MMM D(D) YYYY, with an optional comma after the day, but does not change the format. |
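Whether this pattern validates a value or merely finds a date-like piece inside it depends on whether the whole value must match. The sketch below uses Python's re module as a stand-in engine: fullmatch requires the entire string to fit, while search accepts a fragment. The helper name is_mmm_d_yyyy is hypothetical.

```python
import re

DATE = re.compile(r"[a-zA-Z]{3}\s+\d{1,2},?\s+\d{4}")

def is_mmm_d_yyyy(text):
    """True only if the ENTIRE text is in MMM D(D) YYYY form."""
    return DATE.fullmatch(text) is not None

print(is_mmm_d_yyyy("Feb 14, 2010"))       # True
print(is_mmm_d_yyyy("Feb 4 2010"))         # True
print(is_mmm_d_yyyy("February 14, 2010"))  # False

# An unanchored search still finds a fragment of the long month name.
print(DATE.search("February 14, 2010").group())  # ary 14, 2010
```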
To Have Only a Two Digit Date Format
What I Want | Regex String | Input Text | Result | What This Does |
To have only a two digit date format. |
\d{2}\s*/\s*\d{2}\s*/\s*\d{2} |
12/25/11 |
12/25/11 |
Only accepts dates that use two digits each for the month, day, and year. |
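A four-digit year deserves a quick check with this pattern, because an unanchored search can still match its first two digits. A minimal sketch, using Python's re module as a stand-in engine; the helper name is_two_digit_date is hypothetical.

```python
import re

TWO_DIGIT = re.compile(r"\d{2}\s*/\s*\d{2}\s*/\s*\d{2}")

def is_two_digit_date(text):
    """True only if the ENTIRE text is NN/NN/NN (spaces around / allowed)."""
    return TWO_DIGIT.fullmatch(text) is not None

print(is_two_digit_date("12/25/11"))    # True
print(is_two_digit_date("12/25/2011"))  # False
print(is_two_digit_date("1/5/11"))      # False

# Without anchoring, the first two digits of the year are enough.
print(TWO_DIGIT.search("12/25/2011").group())  # 12/25/20
```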
Start at the Beginning of the Line, Ignore Everything until a Space is Found and Then Capture Anything After That Until the End of the Line.
What I Want | Regex String | Input Text | Result | What This Does |
Start at the beginning of the line, ignore everything until a space is found and then capture anything after that until the end of the line. |
(?<=\s).* |
President George Washington |
George Washington |
Starting at the beginning of the string, ignore everything until a space is found and then capture anything after that until the end of the string. |
Start at the Back of the Line and then Move Backward to Grab Anything One or More Times Until Hitting a Space
What I Want | Regex String | Input Text | Result | What This Does |
Start at the back of the line and then move backward to grab anything one or more times until hitting a space. |
\S+$ |
George Washington |
Washington |
Goes to the end of the string and then moves backward to grab anything one or more times until it hits a space. |
Start at the Beginning of the Line and Then Move Forward to Grab Anything One or More Times Until it Hits a Space.
What I Want | Regex String | Input Text | Result | What This Does |
Start at the beginning of the line and then move forward to grab anything one or more times until it hits a space. |
^\S+ |
George Washington |
George |
Starts at the front of the string and then moves forward to grab anything one or more times until it hits a space. |
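The last three patterns all split a line at its spaces, and they can be compared side by side. This is a minimal sketch using Python's re module as a stand-in engine; the variable names are only for illustration.

```python
import re

# Everything after the first space.
after_first_space = re.search(r"(?<=\s).*", "President George Washington").group()

# The last run of non-space characters.
last_word = re.search(r"\S+$", "George Washington").group()

# The first run of non-space characters.
first_word = re.search(r"^\S+", "George Washington").group()

print(after_first_space)  # George Washington
print(last_word)          # Washington
print(first_word)         # George
```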
To Only Find Letters Without a Limit
What I Want | Regex String | Input Text | Result | What This Does |
To only find letters without a limit. |
\D+ |
abc123456def |
abcdef |
Captures each run of characters that are not digits (here abc and def) and ignores the digits. Note that \D also matches spaces and punctuation; use [a-zA-Z]+ to match letters only. |
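Because \D means "anything that is not a digit", it picks up more than letters. The sketch below uses Python's re module as a stand-in engine to compare it with a letters-only class. Joining the separate matches into one value is shown here as an assumption about how the result is assembled, not as documented CapturePoint behavior.

```python
import re

non_digit_runs = re.findall(r"\D+", "abc123456def")
joined = "".join(non_digit_runs)  # assumption: matches are concatenated

# \D also keeps punctuation and spaces; [a-zA-Z] keeps letters only.
non_digits_mixed = re.findall(r"\D+", "ab-1 c")
letters_only = re.findall(r"[a-zA-Z]+", "ab-1 c")

print(non_digit_runs)    # ['abc', 'def']
print(joined)            # abcdef
print(non_digits_mixed)  # ['ab-', ' c']
print(letters_only)      # ['ab', 'c']
```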