Regex (Regular Expressions) in CapturePoint Guide

CapturePoint can use Regular Expressions, or Regex, to increase the accuracy of classifications and index data. This reference guide is for users who want to further refine their classification and index rules. Regular Expressions are not required, but they are often a helpful addition.

Download a copy of this regular expressions reference sheet: Download

Regular Expressions Table

Note: CapturePoint Regular Expressions can be written in multiple ways to achieve the same results.

Regex String	Result
\d	Matches any digit.
\D	Matches any character except a digit.
\s	Matches any whitespace character.
\S	Matches any character except a whitespace character.
\w	Matches any letter, digit, or underscore.
\W	Matches any character other than a letter, digit, or underscore.
.	Matches any character except for a line break. This can also be used as a placeholder.
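\b	Matches a word boundary: the zero-width position between a word character (\w) and a non-word character (\W), or at the start or end of the string. It does not consume a space.
\B	Matches any position that is not a word boundary, such as between two letters inside a word.
[A-Z]	Matches a single uppercase letter A-Z. Add a quantifier such as + or {3} to match more than one.
[a-z]	Matches a single lowercase letter a-z. Add a quantifier to match more than one.
[a-zA-Z]	Matches a single uppercase or lowercase letter. Add a quantifier to match more than one.
[A-Z0-9]	Matches a single uppercase letter or digit. Add a quantifier to match more than one.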
^	Matches the start of the string.
[^]	Matches any single character that is not written in the square brackets. For example, [^\s] matches any character other than whitespace.
|	The pipe symbol means ‘or’, as in boolean logic. For example, [1-3]|[7-9] finds a digit from 1 through 3 or from 7 through 9. Inside square brackets the pipe is a literal character, so keep it outside the brackets, or combine the ranges as [1-37-9].
{1}	Matches the preceding element exactly the number of times written inside the curly brackets. With {1}, it matches the element exactly once.
{2,}	This will look for the preceding element at least 2 times.
{3,8}	This will look for the preceding element a minimum of 3 times and at most 8 times.
?	Matches the previous element zero or one time. This is greedy: it takes the element whenever it can.
??	Matches the previous element zero or one time, but as few times as possible. This is lazy.
*	Matches the previous element zero or more times, with no upper limit. It is greedy, so it takes as many as possible.
*?	Matches the previous element zero or more times, but as few times as possible.
+	Matches the previous element one or more times. It is greedy, so it takes as many as possible.
+?	Matches the previous element one or more times, but as few times as possible.
[ ]	Matches any single character that is written in the square brackets.
[ , ]	Matches any single character that is written in the square brackets. A comma inside the brackets is a literal comma, not a separator, so [a,b] matches a, b, or a comma.
[0-9]	Matches a single digit 0-9. Add a quantifier such as + or {6} to match more than one.
\A	Matches the start of the string. Unlike ^, it is unaffected by a multiline option.
\z	Matches the end of the input without exception.
\Z	Matches the end of the string or the point before the final \n at the end of the input. This is unaffected by a multiline option.
\G	Matches the point that the previous match ended. Used to find contiguous matches.
\f	Form feed
\n	Newline
\r	Carriage Return
\t	Tab
[\b]	Matches a backspace. It must be enclosed in these square brackets to have this meaning.
$	Matches the end of the string or the point before a final \n at the end of the input.
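
A few of the rows above are easy to misread, so here is a small sketch of how they behave. It uses Python's re module purely for illustration. This guide does not state which regex engine CapturePoint runs, so treat the exact behavior as an assumption to confirm inside CapturePoint.

```python
import re

text = "Order ABC-123 shipped"

# A character class on its own matches exactly one character.
single = re.search(r"[A-Z]", text).group()          # "O"

# A quantifier is what extends the match to several characters.
run = re.search(r"[A-Z]{3}", text).group()          # "ABC"

# Greedy vs. lazy: + takes as much as it can, +? as little as it can.
greedy = re.search(r"\d+", text).group()            # "123"
lazy = re.search(r"\d+?", text).group()             # "1"

# \b is a zero-width position, not a space character,
# so the standalone 1 matches and the 1s inside 11 do not.
whole_word = re.findall(r"\b1\b", "Page 1 of 11")   # ["1"]

print(single, run, greedy, lazy, whole_word)
```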

Regex Desired Outcomes

Find Purchase Order, but Not Capture it and then Capture any Numbers, Letters, Dashes, or Dots until the End of String

What I Want	Regex String	Input Text	Result	What This Does
Find Purchase Order:, but don’t capture it, and then capture any numbers, uppercase letters, dashes, or dots until the next whitespace or the end of the string.	(?<=Purchase\sOrder:\s*)[0-9A-Z\-\.]+(?=\s|$)	Purchase Order: 98752-854	98752-854	Validates Purchase Order: without capturing it, then captures digits, uppercase letters, dashes, and dots until the next whitespace or the end of the string.
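
Outside CapturePoint, not every regex engine accepts a quantifier such as \s* inside a look-behind. Python's re module, for example, only allows fixed-width look-behinds. A minimal sketch of the same extraction in Python, using a capturing group in place of the look-behind, could look like this. The function name extract_po is hypothetical.

```python
import re

# The label is matched but only the group in parentheses is returned.
PO_PATTERN = re.compile(r"Purchase\sOrder:\s*([0-9A-Z\-.]+)(?=\s|$)")

def extract_po(text):
    """Return the purchase order value, or None when the label is absent."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_po("Purchase Order: 98752-854"))
print(extract_po("Purchase Order:   AB.12 received"))
print(extract_po("Order 98752-854"))
```

The capturing group gives the same "find it, but don't capture it" effect: the label must be present, but only the value after it is returned.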

Find the Exact Format of 3 Uppercase Letters Followed by a Dash and then 3 Numbers and Ignore anything Else in Front of or Behind This

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Finds the exact format of 3 uppercase letters followed by a dash and then 3 numbers and ignores anything else in front of or behind this.	[A-Z]{3}-\d{3}	Order ABC-123 placed successfully.	ABC-123	Finds the exact format of 3 uppercase letters followed by a dash and then 3 numbers and ignores anything that precedes or follows it.

Captures only Digits and Dashes and Ignores all else and this Will Run One or More Times until the End of the Line

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Captures only digits and dashes and ignores all else. This runs one or more times until a character that is neither a digit nor a dash is reached.	[\d-]+	My Cell # is 555-555-1212.	555-555-1212	Captures the first run of one or more digits and dashes and stops at the first character that is neither, such as the period at the end of the input.

Capture First 4 Digits and then Ignore any Spaces, a Dash, any More Spaces, and then any Text Written after That

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Capture first 4 digits and then ignore a space, a dash, another space, and then any text written after that.	\d{4}(?=\s\-\s[a-zA-Z]*)	7373 - Location: San Francisco	7373	Captures the first 4 numbers that are followed by the specific format of space, dash, space and then text, and ignores all of that. The dash in the input must be a plain hyphen to match the \- in the regex.
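
Dashes deserve care with this pattern, because word processors and OCR output often contain an en dash rather than a plain hyphen. The sketch below, written with Python's re module as an illustration only, shows the difference. EITHER_DASH and first_match are hypothetical names, and the en dash variant is an assumption about what your documents may contain.

```python
import re

# Pattern from the table: four digits followed by space, hyphen, space.
HYPHEN_ONLY = re.compile(r"\d{4}(?=\s\-\s[a-zA-Z]*)")
# Hypothetical variant that accepts a hyphen or an en dash (U+2013).
EITHER_DASH = re.compile(r"\d{4}(?=\s[-\u2013]\s)")

def first_match(pattern, text):
    """Return the first match as text, or None when nothing matches."""
    match = pattern.search(text)
    return match.group() if match else None

hyphen_text = "7373 - Location: San Francisco"
en_dash_text = "7373 \u2013 Location: San Francisco"

print(first_match(HYPHEN_ONLY, hyphen_text))
print(first_match(HYPHEN_ONLY, en_dash_text))
print(first_match(EITHER_DASH, en_dash_text))
```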

Capture Everything Before the Dash and Ignore Anything After That

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Capture everything before the dash and ignore anything after that.	.+(?=\-)	A12-345678B	A12	Captures any character other than a line break up to the dash and ignores anything after it. Because .+ is greedy, an input with several dashes is captured up to the last dash.
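
When the input can contain more than one dash, the greedy and lazy forms of this pattern give different results. A short sketch using Python's re module as an illustration, with a made-up second input:

```python
import re

one_dash = "A12-345678B"
two_dashes = "A12-345-678B"   # hypothetical input with a second dash

greedy = r".+(?=-)"    # runs as far as it can, up to the last dash
lazy = r".+?(?=-)"     # stops as soon as it can, at the first dash

greedy_one = re.search(greedy, one_dash).group()
greedy_two = re.search(greedy, two_dashes).group()
lazy_two = re.search(lazy, two_dashes).group()

print(greedy_one)   # A12
print(greedy_two)   # A12-345
print(lazy_two)     # A12
```

If only the text before the first dash is wanted, the lazy form is the safer choice.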

Find, but Don’t Capture the Asterisk at the Beginning of the Line, then Grab Anything until you Reach the Asterisk at the End of the Line, but Don’t Capture that One Either

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Find, but don’t capture the asterisk at the beginning of the line, then grab anything until you reach the asterisk at the end of the line, but don’t capture that one either.	(?<=\*\s).+(?=\s\*)	* A19400-2 *	A19400-2	Validate but don’t capture an asterisk at the beginning of the string, then capture any character other than a line break until you reach another asterisk.

Find the Word PAGE, page, or Page but not Capture it and then Only Capture the Number 1

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Find the word PAGE, page, or Page but don’t capture it and then only capture the number 1.	(?<=PAGE\s)1\b|(?<=page\s)1\b|(?<=Page\s)1\b	Page 1 of 11	1	Finds the word PAGE, page, or Page without capturing it, and then captures the number 1 only when it stands alone, so a value such as 11 is not matched.

To Find the 1st Page in a Series of Pages that Look Similar

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
To find the 1st page in a series of pages that look similar.	\b1\b	Page 1 of 11	1	Matches a 1 that has a word boundary on both sides. It grabs only the standalone 1 and not the 1s inside 11.

Remove Anything that isn’t a Number at the Beginning of the Line. Then Grab 6 Numbers and Remove Anything until the End of the Line

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Skip anything that isn’t a number at the beginning of the line. Then grab 6 numbers and ignore anything until the end of the line.	(?<=^|\D+)\d{6}(?=\D+|$)	abc-d123456efgh	123456	Grabs exactly 6 digits that come right after the start of the string or a non-digit, and that are followed by a non-digit or the end of the string. A run of 7 or more digits does not match.
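
The same "exactly six digits, not part of a longer number" rule can be sketched with negative look-arounds, which is a different technique from the look-behind alternation in the table. This version uses Python's re module as an illustration. The name six_digits is hypothetical.

```python
import re

# (?<!\d) : not preceded by a digit (start of string also qualifies)
# (?!\d)  : not followed by a digit (end of string also qualifies)
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")

def six_digits(text):
    """Return the first standalone run of exactly six digits, or None."""
    match = SIX_DIGITS.search(text)
    return match.group() if match else None

print(six_digits("abc-d123456efgh"))    # 123456
print(six_digits("abc-d1234567efgh"))   # None, the run has seven digits
print(six_digits("123456"))             # 123456
```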

Find INVOICE any Number of Spaces Before #, but Not Capture it and then Capture any Numbers, Letters, Dashes, or Dots until the end of String

What I Want	CapturePoint Regex String	Input Text	Result	What This Does
Find INVOICE any number of spaces before #, but don’t capture it and then capture any numbers, letters, dashes, or dots until the next whitespace or the end of the string.	(?<=INVOICE\s*#\s*)[0-9a-zA-Z\-\.]+(?=\s|$)	INVOICE # 1A2b3Cc	1A2b3Cc	Validates INVOICE, any number of spaces, and #, without capturing them, and then captures any numbers, letters, dashes, or dots until the next whitespace or the end of the string.

To Only Find Numbers Without a Limit

What I Want	Regex String	Input Text	Result	What This Does
To only find numbers without a limit.	\d+	abc123456def	123456	Captures the first run of one or more digits, with no upper limit on its length, and ignores anything else.
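
The choice between * and + matters for this kind of rule, because * is allowed to match nothing at all. A short sketch using Python's re module as an illustration:

```python
import re

text = "abc123456def"

# * accepts zero digits, so the first match is the empty string
# at the very start of the text, before the letter "a".
star = re.search(r"\d*", text).group()

# + needs at least one digit, so the search moves on to the digits.
plus = re.search(r"\d+", text).group()

print(repr(star))   # ''
print(repr(plus))   # '123456'
```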

Capture the 8 numbers after Sale No., or Sale:

What I Want	Regex String	Input Text	Result	What This Does
Capture the 8 numbers after Sale No., Sale :, or SN:	(?<=Sale\sNo\.\s)\d{8}|(?<=Sale\s:\s)\d{8}|(?<=SN:\s*)\d{8}	Sale No. 12345678	12345678	Validates Sale No., Sale :, or SN:, without capturing it, and then captures the next 8 digits. The dot in No. is escaped as \. so that it matches only a literal period.
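
A sketch of the first two alternatives using Python's re module, as an illustration only. It leaves out the SN: alternative, because the \s* inside that look-behind makes it variable-width, which Python's re does not accept. The dot after No is escaped here so that it matches only a literal period.

```python
import re

SALE_PATTERN = re.compile(r"(?<=Sale\sNo\.\s)\d{8}|(?<=Sale\s:\s)\d{8}")

def sale_number(text):
    """Return the 8 digits that follow an accepted label, or None."""
    match = SALE_PATTERN.search(text)
    return match.group() if match else None

print(sale_number("Sale No. 12345678"))   # 12345678
print(sale_number("Sale : 87654321"))     # 87654321
print(sale_number("Sale Nox 12345678"))   # None, "x" is not a period
```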

Capture any Numbers or Letters until the End of the Line after TRACKING#

What I Want	Regex String	Input Text	Result	What This Does
Capture any numbers or uppercase letters until the end of the line after TRACKING#	(?<=TRACKING\S*\s+)[0-9A-Z]+(?=\s|$)	TRACKING# Z239943B029238A	Z239943B029238A	Validates TRACKING#, skips the spaces after it, and then captures digits and uppercase letters until the next whitespace or the end of the string.

Find SSN : But Not Capture it and then Capture any Numbers and Dashes in the Format of 123-45-6789.

What I Want	Regex String	Input Text	Result	What This Does
Find SSN : but don’t capture it and then capture any numbers and dashes in the format of 123-45-6789.	(?<=SSN\s:\s)\d{3}\-\d{2}\-\d{4}	SSN : 123-45-6789	123-45-6789	Validate SSN :, but don’t capture it. Then grab only the numbers and dashes in the proper SSN format of 123-45-6789.

Verify that the Data is in this Format MMM D(D) YYYY But Not Change it

What I Want	Regex String	Input Text	Result	What This Does
Verify that the data is in this format MMM D(D) YYYY but don’t change it.	[a-zA-Z]{3}\s+\d{1,2},?\s+\d{4}	Feb 14, 2010	Feb 14, 2010	Validates that the date is in this format of MMM DD YYYY but does not change the format.
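
A validation rule like this checks the shape of the text, not whether the date is real. The sketch below uses Python's re module as an illustration, with fullmatch so that the whole input must fit the pattern. The function name is_valid_date is hypothetical.

```python
import re

DATE_PATTERN = re.compile(r"[a-zA-Z]{3}\s+\d{1,2},?\s+\d{4}")

def is_valid_date(text):
    """True when the whole text has the shape MMM D(D) YYYY."""
    return DATE_PATTERN.fullmatch(text) is not None

print(is_valid_date("Feb 14, 2010"))        # True
print(is_valid_date("Feb 4 2010"))          # True, comma is optional
print(is_valid_date("February 14, 2010"))   # False, month is not 3 letters
print(is_valid_date("Xyz 99, 0000"))        # True, shape only
```

The last call shows the limit of the approach: any three letters and any digits of the right length pass, so a separate check is needed if the date itself must be valid.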

To Have Only a Two Digit Date Format

What I Want	Regex String	Input Text	Result	What This Does
To have only a two digit date format.	\d{2}\s*/\s*\d{2}\s*/\s*\d{2}	12/25/11	12/25/11	Only takes a date with two digits each for the month, day, and year. Spaces around the slashes are optional.

Start at the Beginning of the Line, Ignore Everything until a Space is Found and Then Capture Anything After That Until the End of the Line.

What I Want	Regex String	Input Text	Result	What This Does
Start at the beginning of the line, ignore everything until a space is found and then capture anything after that until the end of the line.	(?<=\s).*	President George Washington	George Washington	Starting at the beginning of the string, ignore everything until a space is found and then capture anything after that until the end of the string.

Start at the Back of the Line and then Move Backward to Grab Anything One or More Times Until Hitting a Space

What I Want	Regex String	Input Text	Result	What This Does
Start at the back of the line and then move backward to grab anything one or more times until hitting a space.	\S+$	George Washington	Washington	Goes to the end of the string and then moves backward to grab anything one or more times until it hits a space.

Start at the Beginning of the Line and Then Moves Forward to Grab Anything One or More Times Until it Hits a Space.

What I Want	Regex String	Input Text	Result	What This Does
Start at the beginning of the line and then moves forward to grab anything one or more times until it hits a space.	^\S+	George Washington	George	Starts at the front of the string and then moves forward to grab anything one or more times until it hits a space.
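
The last three patterns work together as a simple way to split a line on spaces. A short sketch using Python's re module as an illustration:

```python
import re

name = "George Washington"
titled = "President George Washington"

# ^\S+ : non-space characters anchored at the start of the string
first = re.search(r"^\S+", name).group()

# \S+$ : non-space characters anchored at the end of the string
last = re.search(r"\S+$", name).group()

# (?<=\s).* : everything after the first whitespace character
rest = re.search(r"(?<=\s).*", titled).group()

print(first)   # George
print(last)    # Washington
print(rest)    # George Washington
```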

To Only Find Letters Without a Limit

What I Want	Regex String	Input Text	Result	What This Does
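To only find letters without a limit.	[a-zA-Z]+	abc123456def	abc, def	Captures each run of one or more letters and ignores anything else. Each run is a separate match, because a single match cannot skip over the digits in between. Note that \D matches any non-digit, including spaces and punctuation, not only letters.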
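
Letters that are separated by digits come back as separate matches, so joining them is an extra step outside the regex. A sketch using Python's re module as an illustration. How CapturePoint combines multiple matches is not covered in this guide, so the join step is an assumption about what you may want.

```python
import re

text = "abc123456def"

# Each run of letters is its own match.
runs = re.findall(r"[a-zA-Z]+", text)

# Joining the runs is done after matching, not by the pattern.
joined = "".join(runs)

# \D is broader than "letters": it also accepts dashes and spaces.
non_digits = re.findall(r"\D+", "ab-c 12")

print(runs)         # ['abc', 'def']
print(joined)       # abcdef
print(non_digits)   # ['ab-c ']
```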
