by Svetlana Cheusheva, updated on
In this tutorial, we'll have an in-depth look at how to use regex to match strings in Excel.
When you need to find a certain value in a range of cells, you'd use the MATCH or XMATCH function. When looking for a specific string in a cell, the FIND and SEARCH functions come in handy. And how do you know if a cell contains information that matches a given pattern? Obviously, by using regular expressions. But out of the box Excel does not support regexes! No worries, we'll force it to :)
As it's pretty clear from the heading, in order to use regular expressions in Excel, you need to create your own function. Luckily, Excel's VBA has an inbuilt RegExp object, which you can use in your code like shown below:
Paste the code in the VBA editor, and your new RegExpMatch function is ready for use. If you are not very experienced with VBA, this guide may be helpful: How to insert VBA code in Excel.
Note. After inserting the code, remember to save your file as a macro-enabled workbook (.xlsm).
The RegExpMatch function checks whether any part of the source string matches a regular expression. The result is a Boolean value: TRUE if at least one match is found, FALSE otherwise.
Our custom function has 3 arguments - the first two are required and the last one is optional:
Where:
The function works in all versions of Excel 365, Excel 2021, Excel 2019, Excel 2016, Excel 2013 and Excel 2010.
Before we get to practical calculations, please take notice of the following points that clarify some technicalities:
Below, you'll find a few regex match examples that were created for demonstration purposes. We cannot guarantee that our patterns will work faultlessly with a wider range of input data in your real worksheets. Before putting in production, be sure to test and adjust our samples patterns according to your needs.
When all the strings you want to match have the same pattern, regular expressions are an ideal solution.
Supposing you have a range of cells (A5:A9) containing various details about some items. You wish to know which cells have SKUs. Assuming that each SKU consists of 2 capital letters, a hyphen, and 3 digits, you can match them using the following expression.
Pattern: \b[A-Z]{2}-\d{3}\b
Where [A-Z]{2} means any 2 uppercase letters from A to Z and \d{3} means any 3 digits from 0 to 9. The \b character denotes a word boundary, meaning an SKU is a separate word, and not part of a bigger string such as 23-MAR-2022.
With the pattern established, we can move on to writing a formula. Essentially, using a custom function is no different from a native one. As soon as you start typing a formula, the function's name will appear in the list suggested by Excel's AutoComplete. However, there are a couple of nuances in Dynamic Array Excel (Microsoft 365 and Excel 2021) and traditional Excel (2019 and older versions).
To match a string in a single cell, refer to that cell in the first argument. The second argument is supposed to contain a regular expression.
=RegExpMatch(A5, "\b[A-Z]{2}-\d{3}\b")
The pattern can also be kept in a predefined cell, which is locked with an absolute reference ($A$2):
=RegExpMatch(A5, $A$2)
After entering the formula in the first cell, you can drag it down to all other rows.
This method works beautifully in all Excel versions.
To match multiple strings with a single formula, include a range reference in the first argument:
=RegExpMatch(A5:A9, "\b[A-Z]{2}-\d{3}\b")
In Excel 365 and Excel 2021 that support dynamic arrays, it works this way - you type the formula in the first cell, hit Enter, and the formula automatically spills into the below cells.
In Excel 2019 and earlier, it only works as a traditional CSE array formula, which is entered in a range of cells and completed by pressing the Ctrl + Shift + Enter keys together.
To match any single digit from 0 to 9, use the \d character in the regex. Depending on your particular task, add a suitable quantifier or create a more complex pattern.
To match any number of any length, put the + quantifier right after the /d character, which says to look for numbers containing 1 or more digits.
Pattern: \d+
=RegExpMatch(A5:A9, "\d+")
If your goal is to match numeric values containing a certain number of digits, then use \d together with an appropriate quantifier.
For example, to match invoice numbers consisting of exactly 7 digits, you'd use \d{7}. However, please keep in mind that it will match 7 digits anywhere in the string including a 10-digit or 100-digit number. If this is not what you are looking for, put the word boundary \b on both sides.
Pattern: \b\d{7}\b
=RegExpMatch(A5:A9, "\b\d{7}\b")
Since phone numbers can be written in various formats, matching them requires a more sophisticated regular expression.
In the below dataset, we will be searching for 10-digit numbers that have 3 digits in the first 2 groups and 4 digits in the last group. The groups can be separated with a period, hyphen or space. The first group may or may not be enclosed in parentheses.
Pattern: (\(\d{3}\)|\d{3})[-\.\s]?\d{3}[-\.\s]?\d{4}\b
Breaking down this regular expression, here's what we get:
With the original string in A5 and the regular expression in A2, the formula takes this form:
=RegExpMatch(A5, $A$2)
… and works exactly as expected:
Notes:
To find strings that do NOT contain a certain character, you can use negated character classes [^ ] that match anything NOT in brackets. For example:
In a list of phone numbers, suppose you want to find those that do not have a country code. Keeping in mind that any international code includes the + sign, you can use the [^\+] character class to find strings that do not contain a plus sign. It is important to realize that the above expression matches any single character that is not +. Because a phone number can be anywhere in a string, not necessarily in the very beginning, the * quantifier is added to check each subsequent character. The start ^ and end $ anchors ensure that the entire string is processed. As the result, we get the below regular expression that says "don't match the + character in any position in the string".
Pattern: ^[^\+]*$
=RegExpMatch(A5, "^[^\+]*$")
Though there is no special regular expression syntax for not matching a specific string, you can emulate this behavior by using a negative lookahead.
Supposing you wish to find strings that do not contain the word "lemons". This regular expression will work a treat:
Pattern: ^((?!lemons).)*$
Obviously, some explanation is needed here. The negative lookahead (?!lemons) looks to the right to see if there's no word "lemons" ahead. If "lemons" is not there, then the dot matches any character except a line break. The above expression performs just one check, and the * quantifier repeats it zero or more times, from the start of the string anchored by ^ to the end of the string anchored by $.
To ignore the text case, we set the 3th argument to FALSE to make our function case-insensitive:
=RegExpMatch(A5, $A$2, FALSE)
Tips and notes:
In classic regular expressions, there is a special pattern for case-insensitive matching (?i), which is not supported in VBA RegExp. To overcome this limitation, our custom function accepts the 3rd optional argument named match_case. To do case-insensitive matching, simply set it to FALSE.
Let's say you wish to identify dates such as 1-Mar-22 or 01-MAR-2022. To match the dd-mmm-yyyy and d-mmm-yy formats, we are using the following regular expression.
Pattern: \b\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(\d{4}|\d{2})\b
Our expression searches for a group of 1 or 2 digits, followed by a hyphen, followed by any of the month abbreviations separated by | which enables the OR logic, followed by a group of 4 or 2 digits.
Why not use a simpler pattern such as \d{1,2}-[A-Za-z]{3}-\d{2,4}\b? To prevent false positive matches like 01-ABC-2020.
Enter the pattern in A2, and you'll get the following formula:
=RegExpMatch(A5, $A$2, FALSE)
As generally known, an email address consists of 4 parts: username, @ symbol, domain name (mail server) and top-level domain (such as .com, .edu, .org, etc.). To check the email address validity, we'll need to replicate the above structure using regular expressions.
Pattern: \b[\w\.\-]+@[A-Za-z0-9]+[A-Za-z0-9\.\-]*[A-Za-z0-9]+\.[A-Za-z]{2,24}\b
To better understand what's going on here, let's take a closer look at each part:
Note. The pattern assumes the domain name contains 2 or more alphanumeric characters.
With the original text in A5 and the pattern in A5, the formula takes this shape:
=RegExpMatch(A5, $A$2)
Or you could use a simpler regular expression for email validation with either a lowercase or uppercase character set:
Pattern: \b[\w\.\-]+@[a-z0-9]+[a-z0-9\.\-]*[a-z0-9]+\.[a-z]{2,24}\b
But make your formula case-insensitive:
=RegExpMatch(A5, $A$2, FALSE)
Due to the fact that inbuilt and custom functions go along nicely, there is nothing that would prevent you from using them together in a single formula.
To return or calculate something if a regular expression is matched and something else if it's not matched, embed the custom RegExpMatch function in the logical text of IF:
For example, if a string in A5 contains a valid email address, you can return "Yes"; otherwise "No".
=IF(RegExpMatch(A5, $A$2,), "Yes", "No")
Because native Excel functions don't support regular expressions, it's not possible to put a regex directly in the COUNTIS or COUNTIFS function. Luckily, you can emulate this functionality using our custom function.
Supposing you've used a regex to match phone numbers and output the results in column B. To find out how many cells contain phone numbers, you just need to count the TRUE values in B5:B9. And that can be easily done using the standard COUNTIF formula:
=COUNTIF(B5:B9, TRUE)
Do not want any extra columns in your worksheet? No problem. Keeping in mind that our custom function can process multiple cells at a time and Excel's SUM can add up values in an array, here's what you do:
=SUM(--RegExpMatch(A5:A9, $A$2))
The users of our Ultimate Suite can leverage four powerful Regex functions without adding any VBA code to their workbooks as they are smoothly integrated into Excel during the add-in installation. Our custom functions are processed by the standard .NET RegEx engine and support full-featured classic regular expressions.
Assuming you have the latest version of Ultimate Suite installed (2021.4 or later), you can create a Regex Match formula in two simple steps:
A moment later, the AblebitsRegexMatch function is inserted in a new column to the right of your data.
In the screenshot below, the function checks whether the strings in column A contain 7-digit numbers or not.
Tips:
For more information, please see AblebitsRegexMatch function.
That's how to do regular expression matching in Excel. I thank you for reading and look forward to seeing you on our blog next week!
Excel Regex Match examples (.xlsm file)
Ultimate Suite 14-day fully-functional version (.exe file)
Table of contents