CapturePoint supports regular expressions (regex) to increase the accuracy of classification and index data. This reference guide is for users who want to further refine classification and index rules. Regular expressions are not required, but they are sometimes a helpful addition.

Download a copy of this regular expressions reference sheet: Download

Regular Expressions Table

Note:  CapturePoint Regular Expressions can be written in multiple ways to achieve the same results.

 

Regex String Result

\d

Matches any digit.

\D

Matches any character except a digit.

\s

Matches any whitespace character.

\S

Matches any character except a whitespace character.

\w

Matches any letter, digit, or underscore.

\W

Matches any character other than a letter, digit, or underscore.

.

Matches any single character except a line break, so it can stand in for one unknown character.

\b

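Matches a word boundary: the zero-width position between a word character (\w) and a non-word character (\W), or at the start or end of the string next to a word character. It does not match the space itself.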

\B

Matches any position that is not a word boundary, such as between two letters inside a word.

[A-Z]

Matches one uppercase letter, A-Z. Add a quantifier such as + or {3} to match more than one.

[a-z]

Matches one lowercase letter, a-z. Add a quantifier such as + or {3} to match more than one.

[a-zA-Z]

Matches one uppercase or lowercase letter. Add a quantifier to match more than one.

[A-Z0-9]

Matches one uppercase letter or one digit. Add a quantifier to match more than one.

^

Matches the start of the string, or the start of each line when a multiline option is on.

[^]

Matches any single character that is not listed in the square brackets. For example, [^\s] matches any character other than whitespace.

|

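The pipe symbol means ‘or’, as in Boolean logic. To find 1 through 3 or 7 through 9, write [1-3]|[7-9]. Do not put the pipe inside square brackets, where it is matched as a literal | character; [1-37-9] is an equivalent single character class.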

{1}

Matches the preceding element exactly the number of times written inside the curly brackets. For example, \d{1} matches exactly 1 digit.

{2,}

This will look for the preceding element at least 2 times.

{3,8}

This will look for the preceding element a minimum of 3 times and at most 8 times.

?

Matches the previous element zero or one time, preferring one (greedy). This makes the element optional.

??

Matches previous elements zero or one times, but as few as possible.

*

Matches the previous element zero or more times, with no upper limit. It takes as many as possible, so it is considered greedy.

*?

Matches previous elements zero or more times, but as few as possible.

+

Matches previous element one or more times.

+?

Matches previous elements one or more times, but as few as possible.

[ ]

Matches any single character that is listed in the square brackets.

[ , ]

A comma inside square brackets is matched as a literal comma, not used as a separator. For example, [a,b] matches a, b, or a comma; list the characters with no separator, as in [ab].

[0-9]

Matches one digit, 0-9. Add a quantifier such as + or {4} to match more than one.

\A

Matches the start of the string. Unlike ^, it is unaffected by a multiline option.

\z

Matches the end of the input without exception.

\Z

Matches the end of the string or the point before the final \n at the end of the input.  This is unaffected by a multiline option.

\G

Matches the point that the previous match ended. Used to find contiguous matches.

\f

Form feed

\n

Newline

\r

Carriage Return

\t

Tab

[\b]

Matches a backspace.  It must be enclosed in these square brackets to have this meaning.

$

Matches the end of the string or the point before a final \n at the end of the input.
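A few of the table entries are easier to see side by side. The sketch below uses Python's built-in re module purely for illustration; it is an assumption that CapturePoint's engine behaves the same way for these basic constructs, and the variable names are hypothetical.

```python
import re

# A character class matches ONE character; a quantifier extends it.
single = re.search(r"[A-Z]", "ABC").group()
run = re.search(r"[A-Z]+", "ABC").group()

# Greedy (+) takes as much as possible; lazy (+?) takes as little as possible.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

# \b is a zero-width position, so no space ends up in the match.
word = re.search(r"\bcat\b", "a cat sat").group()

# Inside square brackets, | is a literal pipe, not "or".
pipe_in_class = re.findall(r"[1-3|7-9]", "2|5|8")
alternation = re.findall(r"[1-3]|[7-9]", "2|5|8")

print(single, run, greedy, lazy, word, pipe_in_class, alternation)
```

Here the character class with a pipe inside it also picks up the two | characters from the input, while the alternation returns only the digits 2 and 8.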

 


Regex Desired Outcomes

Find Purchase Order, but Not Capture it and then Capture any Numbers, Letters, Dashes, or Dots until the End of String

What I Want: Find Purchase Order, but don’t capture it, and then capture any numbers, uppercase letters, dashes, or dots until the next space or the end of the string.

Regex String: (?<=Purchase\sOrder:\s*)[0-9A-Z\-\.]+(?=\s|$)

Input Text: Purchase Order:  98752-854

Result: 98752-854

What This Does: Requires Purchase Order: before the match without capturing it, then captures a run of digits, uppercase letters, dashes, or dots that ends at whitespace or the end of the string.
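To try this kind of label-then-value extraction outside CapturePoint, the following is a minimal sketch in Python. Python's re module does not accept a variable-length lookbehind such as (?<=Purchase\sOrder:\s*), so this version matches the label normally and returns only a capture group. The helper name extract_po is hypothetical, and the swap from lookbehind to capture group is an assumption made for portability, not CapturePoint's own method.

```python
import re

# The label is matched but only group 1 (the value) is returned.
PO_PATTERN = re.compile(r"Purchase\sOrder:\s*([0-9A-Z.-]+)(?=\s|$)")


def extract_po(text):
    """Return the purchase order value that follows the label, or None."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_po("Purchase Order:  98752-854"))
```

A line without the label, such as "PO: 123", returns None rather than a partial value.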

Find the Exact Format of 3 Uppercase Letters Followed by a Dash and then 3 Numbers and Ignore anything Else in Front of or Behind This

What I Want: Find the exact format of 3 uppercase letters followed by a dash and then 3 numbers, and ignore anything else in front of or behind it.

Regex String: [A-Z]{3}-\d{3}

Input Text: Order ABC-123 placed successfully.

Result: ABC-123

What This Does: Matches exactly 3 uppercase letters, a dash, and 3 digits, and ignores anything that precedes or follows.

Captures only Digits and Dashes and Ignores all else and this Will Run One or More Times until the End of the Line

What I Want: Capture a run of one or more digits and dashes and ignore everything else.

Regex String: [\d-]+

Input Text: My Cell # is 555-555-1212.

Result: 555-555-1212

What This Does: Matches the first unbroken run of digits and dashes. The + repeats the character class one or more times, so the match stops at the first character that is neither a digit nor a dash (here, the final period).

Capture First 4 Digits and then Ignore any Spaces, a Dash, any More Spaces, and then any Text Written after That

What I Want: Capture the first 4 digits and then ignore any spaces, a dash, any more spaces, and any text written after that.

Regex String: \d{4}(?=\s*\-\s*[a-zA-Z]*)

Input Text: 7373 - Location: San Francisco

Result: 7373

What This Does: Captures 4 digits, but only when they are followed by optional spaces, a hyphen, optional spaces, and optional letters. The lookahead checks that text without capturing it. The dash must be a plain hyphen (-); an en dash (–) in the input will not match.
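Dash characters are a common source of silent mismatches with a pattern like this one, because scanned or word-processed text often contains an en dash where a hyphen is expected. The sketch below, written with Python's re module as an illustrative assumption, runs the same lookahead pattern against both kinds of dash. The variable names are hypothetical.

```python
import re

pattern = re.compile(r"\d{4}(?=\s*-\s*[a-zA-Z]*)")

hyphen_input = "7373 - Location: San Francisco"
en_dash_input = "7373 \u2013 Location: San Francisco"  # U+2013 EN DASH

hyphen_match = pattern.search(hyphen_input)
en_dash_match = pattern.search(en_dash_input)

print(hyphen_match.group() if hyphen_match else None)
print(en_dash_match.group() if en_dash_match else None)
```

If both dash styles can occur in the source documents, a character class such as [-\u2013] in place of the single hyphen is one way to accept either.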

Capture Everything Before the Dash and Ignore Anything After That

What I Want: Capture everything before the dash and ignore anything after that.

Regex String: .+(?=\-)

Input Text: *A12-345678B*

Result: *A12

What This Does: Captures any characters other than a line break up to, but not including, a dash. Because .+ is greedy, input that contains several dashes is captured up to the last dash.
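The effect of greedy matching here only shows up when the input has more than one dash. This small Python sketch (an illustration using the standard re module, with hypothetical variable names) compares the greedy pattern with its lazy counterpart.

```python
import re

one_dash = re.search(r".+(?=-)", "*A12-345678B*").group()

# With two dashes, greedy .+ runs to the LAST dash...
greedy_two = re.search(r".+(?=-)", "A12-345-678").group()
# ...while lazy .+? stops at the FIRST dash.
lazy_two = re.search(r".+?(?=-)", "A12-345-678").group()

print(one_dash, greedy_two, lazy_two)
```

Choosing between .+ and .+? therefore depends on whether the value of interest ends at the first or the last dash.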

Find, but Don’t Capture the Asterisk at the Beginning of the Line, then Grab Anything until you Reach the Asterisk at the End of the Line, but Don’t Capture that One Either

What I Want: Find, but don’t capture, the asterisk at the beginning of the line, then grab everything up to the asterisk at the end of the line, without capturing that one either.

Regex String: (?<=\*\s*).+(?=\s*\*)

Input Text: *A19400-2*

Result: A19400-2

What This Does: Requires, but does not capture, an asterisk before the match, then captures any characters other than a line break up to the last asterisk, which is also not captured.

Find the Word PAGE, page, or Page but not Capture it and then Only Capture the Number 1

What I Want: Find the word PAGE, page, or Page, but don’t capture it, and then capture only the number 1.

Regex String: (?<=(?:PAGE|page|Page)\s*)1\b

Input Text: Page 1 of 11

Result: 1

What This Does: Requires PAGE, page, or Page and any spaces before the match without capturing them, then captures a 1 only when it is a complete number. The trailing \b keeps the pattern from matching the first digit of 11.
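When writing a page-number rule like this, a greedy .* placed before the digit is worth avoiding, because it can run past the first 1 to the last 1 on the line. The Python sketch below illustrates that with a fixed-width lookbehind, then shows a capture-group version of the rule. Python's re module cannot use \s* inside a lookbehind, so the capture group is a substitute technique, and the helper name first_page_number is hypothetical.

```python
import re

text = "Page 1 of 11"

# Greedy .* followed by [1]+ stretches to the last 1 on the line.
overreach = re.search(r"(?<=Page).*[1]+", text).group()

# Capture-group alternative: the label is matched but not returned.
PAGE_ONE = re.compile(r"(?:PAGE|page|Page)\s*(1)\b")


def first_page_number(line):
    """Return "1" only when the page label is followed by a standalone 1."""
    match = PAGE_ONE.search(line)
    return match.group(1) if match else None


print(repr(overreach), first_page_number(text))
```

The word boundary after the 1 is what keeps "Page 11 of 11" from being treated as a first page.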

To Find the 1st Page in a Series of Pages that Look Similar

What I Want: Find the 1st page in a series of pages that look similar.

Regex String: \b1\b

Input Text: Page 1 of 11

Result: 1

What This Does: Matches a 1 that has a word boundary on both sides, so it captures the standalone 1 and skips the 1s inside 11.
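The difference the word boundaries make can be checked by listing where each pattern matches. This is a small Python illustration using the standard re module; the variable names are hypothetical.

```python
import re

text = "Page 1 of 11"

# Start offsets of every match, with and without word boundaries.
standalone = [m.start() for m in re.finditer(r"\b1\b", text)]
any_one = [m.start() for m in re.finditer(r"1", text)]

print(standalone, any_one)
```

Only the 1 at offset 5 is bounded on both sides; the two digits of 11 at offsets 10 and 11 each touch another digit.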

Remove Anything that isn’t a Number at the Beginning of the Line. Then Grab 6 Numbers and Remove Anything until the End of the Line

What I Want: Skip anything that isn’t a number at the beginning of the line, then grab 6 numbers and ignore everything after them until the end of the line.

Regex String: (?<=^|\D+)\d{6}(?=\D+|$)

Input Text: abc-d123456efgh

Result: 123456

What This Does: Captures exactly 6 digits that are preceded by the start of the string or a non-digit and followed by a non-digit or the end of the string. Nothing is removed from the input; the surrounding text is simply left out of the match. A run of 7 or more digits, such as 1234567, does not match.
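The "exactly 6 digits" requirement can be expressed in Python with negative lookarounds, which state the same condition (no digit immediately before or after) in a form the standard re module accepts. This is a sketch under that assumption; find_six_digits is a hypothetical helper name.

```python
import re

# No digit directly before, six digits, no digit directly after.
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def find_six_digits(text):
    """Return the first run of exactly six digits, or None."""
    match = SIX_DIGITS.search(text)
    return match.group() if match else None


print(find_six_digits("abc-d123456efgh"))
print(find_six_digits("abc-d1234567efgh"))
```

The second call returns None because the digits form a run of seven, so no position has exactly six digits with non-digits on both sides.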

Find INVOICE any Number of Spaces Before #, but Not Capture it and then Capture any Numbers, Letters, Dashes, or Dots until the end of String

What I Want: Find INVOICE, any number of spaces, and #, but don’t capture them, and then capture any numbers, letters, dashes, or dots until the end of string.

Regex String: (?<=INVOICE\s*#\s*)[0-9a-zA-Z\-\.]+(?=\s|$)

Input Text: INVOICE # 1A2b3Cc

Result: 1A2b3Cc

What This Does: Requires INVOICE, optional spaces, and # before the match without capturing them, then captures a run of digits, letters, dashes, or dots that ends at whitespace or the end of the string.

To Only Find Numbers Without a Limit

What I Want: To only find numbers without a limit.

Regex String: \d+

Input Text: abc123456def

Result: 123456

What This Does: Captures a run of one or more digits of any length and ignores everything else. Use + rather than * here: \d* can also match zero digits, which produces an empty match at the start of this input.
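The practical difference between * and + for this rule is easiest to see on the first match. The following Python sketch uses the standard re module as an illustrative assumption; how CapturePoint combines multiple matches is not covered here, and the variable names are hypothetical.

```python
import re

text = "abc123456def"

# \d* is satisfied by zero digits, so the first match (at "a") is empty.
star_first = re.search(r"\d*", text).group()
# \d+ needs at least one digit, so the first match is the digit run.
plus_first = re.search(r"\d+", text).group()

# Joining every \d* match still yields the digits, plus empty strings.
star_joined = "".join(re.findall(r"\d*", text))

print(repr(star_first), plus_first, star_joined)
```

With \d+ every match is meaningful, which avoids having to filter out empty results.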

Capture the 8 numbers after Sale No., or Sale:

What I Want: Capture the 8 numbers after Sale No., Sale:, or SN:

Regex String: (?<=Sale\s*No\.\s*)\d{8}|(?<=Sale\s*:\s*)\d{8}|(?<=SN:\s*)\d{8}

Input Text: Sale No. 12345678

Result: 12345678

What This Does: Requires Sale No., Sale :, or SN: before the match without capturing it, and then captures the next 8 digits. The period in No. is escaped as \. so that it matches only a literal period.
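An unescaped period in a label such as No. matches any character, which can let misspelled labels through. The Python sketch below compares the two forms; it uses capture groups instead of lookbehind because Python's re module requires fixed-width lookbehind, and the names are hypothetical.

```python
import re

unescaped = re.compile(r"Sale\s*No.\s*(\d{8})")   # . matches any character
escaped = re.compile(r"Sale\s*No\.\s*(\d{8})")    # \. matches only a period

good = "Sale No. 12345678"
typo = "Sale Nox 12345678"

unescaped_on_typo = unescaped.search(typo)
escaped_on_typo = escaped.search(typo)
escaped_on_good = escaped.search(good)

print(bool(unescaped_on_typo), bool(escaped_on_typo), escaped_on_good.group(1))
```

Only the escaped form rejects the mistyped label while still accepting the intended one.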

Capture any Numbers or Letters until the End of the Line after TRACKING#

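What I Want: Capture any numbers or uppercase letters after TRACKING# until the next space or the end of the line.

Regex String: (?<=TRACKING\S*\s+)[0-9A-Z]+(?=\s|$)

Input Text: TRACKING# Z239943B029238A

Result: Z239943B029238A

What This Does: Requires TRACKING, any non-space characters such as #, and at least one space before the match without capturing them, then captures a run of digits and uppercase letters that ends at whitespace or the end of the string.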

Find SSN : But Not Capture it and then Capture any Numbers and Dashes in the Format of 123-45-6789.

What I Want: Find SSN : but don’t capture it, and then capture numbers and dashes in the format 123-45-6789.

Regex String: (?<=SSN\s*:\s*)\d{3}\-\d{2}\-\d{4}

Input Text: SSN : 123-45-6789

Result: 123-45-6789

What This Does: Requires SSN and a colon, with optional spaces around the colon, before the match without capturing them. Then captures only digits and dashes in the SSN format 123-45-6789.

Verify that the Data is in this Format MMM D(D) YYYY But Not Change it

What I Want: Verify that the data is in the format MMM D(D) YYYY, but don’t change it.

Regex String: [a-zA-Z]{3}\s+\d{1,2},?\s+\d{4}

Input Text: Feb 14, 2010

Result: Feb 14, 2010

What This Does: Checks that the date has the shape MMM D(D) YYYY, with an optional comma after the day, and returns it unchanged. Any 3 letters are accepted as the month, so the pattern checks the shape of the date, not whether it is a real date.
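To see what this date pattern does and does not check, it can be run against a few inputs. The sketch below uses Python's standard re module for illustration, and the variable names are hypothetical.

```python
import re

DATE = re.compile(r"[a-zA-Z]{3}\s+\d{1,2},?\s+\d{4}")

with_comma = DATE.search("Feb 14, 2010").group()
one_digit_day = DATE.search("Feb 4 2010").group()
# Shape-only check: three letters and a day of 99 still pass.
not_a_real_date = DATE.search("Xyz 99 2010").group()

print(with_comma, "|", one_digit_day, "|", not_a_real_date)
```

If real calendar validity matters, a separate date-parsing step after the regex match is one option.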

To Have Only a Two Digit Date Format

What I Want: To accept only a two-digit date format.

Regex String: \d{2}\s*/\s*\d{2}\s*/\s*\d{2}

Input Text: 12/25/11

Result: 12/25/11

What This Does: Accepts only dates written with two digits each for month, day, and year, with optional spaces around the slashes.
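One thing to check with an unanchored pattern like this is how it treats a date with a four-digit year. The Python sketch below (standard re module, hypothetical variable names) shows that the pattern matches the first eight characters of such a date, and that adding word boundaries is one possible way to reject it. The \b variant is a suggestion for illustration, not part of the original rule.

```python
import re

loose = re.compile(r"\d{2}\s*/\s*\d{2}\s*/\s*\d{2}")
strict = re.compile(r"\b\d{2}\s*/\s*\d{2}\s*/\s*\d{2}\b")

short_year = loose.search("12/25/11").group()
# Without boundaries, the first two digits of 2011 satisfy the year part.
partial = loose.search("12/25/2011").group()
strict_long = strict.search("12/25/2011")
strict_short = strict.search("12/25/11").group()

print(short_year, partial, strict_long, strict_short)
```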

Start at the Beginning of the Line, Ignore Everything until a Space is Found and Then Capture Anything After That Until the End of the Line.

What I Want: Start at the beginning of the line, ignore everything until a space is found, and then capture anything after that until the end of the line.

Regex String: (?<=\s).*

Input Text: President George Washington

Result: George Washington

What This Does: Skips everything up to and including the first whitespace character, then captures the rest of the line.

Start at the Back of the Line and then Move Backward to Grab Anything One or More Times Until Hitting a Space

What I Want: Start at the back of the line and then move backward to grab anything one or more times until hitting a space.

Regex String: \S+$

Input Text: George Washington

Result: Washington

What This Does: Captures the last run of one or more non-space characters, anchored at the end of the string.

Start at the Beginning of the Line and Then Moves Forward to Grab Anything One or More Times Until it Hits a Space.

What I Want: Start at the beginning of the line and then move forward to grab anything one or more times until hitting a space.

Regex String: ^\S+

Input Text: George Washington

Result: George

What This Does: Captures the first run of one or more non-space characters, anchored at the start of the string.
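The three whitespace-based patterns, (?<=\s).*, \S+$, and ^\S+, can be compared on a single input. This Python sketch uses the standard re module for illustration; all three patterns are accepted by it unchanged, and the variable names are hypothetical.

```python
import re

text = "President George Washington"

after_first_space = re.search(r"(?<=\s).*", text).group()
last_word = re.search(r"\S+$", text).group()
first_word = re.search(r"^\S+", text).group()

print(after_first_space, "|", last_word, "|", first_word)
```

On this input the first pattern drops the title, the second returns the surname, and the third returns the title alone.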

To Only Find Letters Without a Limit

What I Want: To only find letters without a limit.

Regex String: [a-zA-Z]+

Input Text: abc123456def

Result: abcdef (from the two matches abc and def)

What This Does: Captures runs of one or more letters and ignores everything else. \D is not used here because it matches any non-digit character, including spaces and punctuation.
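The difference between "not a digit" and "a letter" only appears once the input contains punctuation or spaces. The Python sketch below, an illustration with the standard re module and hypothetical variable names, runs both character classes over such an input.

```python
import re

text = "abc-123 def"

# \D+ keeps the dash and the space because they are non-digits.
non_digits = re.findall(r"\D+", text)
# [a-zA-Z]+ keeps letters only.
letters = re.findall(r"[a-zA-Z]+", text)

print(non_digits, letters, "".join(letters))
```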
