RegEx in Excel: using regular expressions in formulas

Can never understand why regular expressions are not supported in Excel formulas? Now, they are :) With our custom functions, you can easily find, replace, extract and remove strings matching a specific pattern.

At first sight, Excel has everything you could ever need for text string manipulations. Hmm… what about regular expressions? Oops, there are no built-in Regex functions in Excel. But no one says we cannot create our own ones :)

What is regular expression?

A regular expression (aka regex or regexp) is a specially encoded sequence of characters that defines a search pattern. Using that pattern, you can find a matching character combination in a string or validate data input. If you are familiar with a wildcard notation, you can think of regexes as an advanced version of wildcards.

Regular expressions have their own syntax consisting of special characters, operators, and constructs. For example, [0-5] matches any single digit from 0 to 5.

Regular expressions are used in many programming languages including JavaScript and VBA. The latter has a special RegExp object, which we'll utilize to create our custom functions.

Does Excel support regex?

Regrettably, there are no inbuilt Regex functions in Excel. To be able to use regular expressions in your formulas, you'll have to create your own user-defined function (VBA or .NET based) or install third-party tools supporting regexes.

Excel Regex cheat sheet

Whether a regex pattern is very simple or extremely sophisticated, it is built using the common syntax. This tutorial does not aim to teach you regular expressions. For this, there are plenty of resources online, from free tutorials for beginners to premium courses for advanced users.

Below we provide a quick reference to the main RegEx patterns that will help you get a grasp of the basics. It may also work as your cheat sheet when studying further examples.

If you are comfortable with regular expressions, you can jump straight to the RegExp functions.

Characters

These are the most frequently used patterns to match certain characters.

Pattern Description Example Matches
. Wildcard character: matches any single character except a line break .ot dot, hot, pot, @ot
\d Digit character: any single digit from 0 to 9 \d In a1b, matches 1
\D Any character that is NOT a digit \D In a1b, matches a and b
\s Whitespace character: space, tab, new line and carriage return .\s. In 3 cents, matches 3 c
\S Any non-whitespace character \S+ In 30 cents, matches 30 and cents
\w Word character: any ASCII letter, digit or underscore \w+ In 5_cats***, matches 5_cats
\W Any character that is NOT an alphanumeric character or underscore \W+ In 5_cats***, matches ***
\t Tab
\n New line \n\d+ In the two-line string below, matches 10

5 cats
10 dogs

\ Escapes special meaning of a character, so you can search for it \.
 
 
\w+\.
Escapes a period so you can find the literal "." character in a string
 
Mr., Mrs., Prof.

Character classes

Using these patterns, you can match elements of different character sets.

Pattern Description Example Matches
[characters] Matches any single character in the brackets d[oi]g dog and dig
[^characters] Matches any single character NOT in the brackets d[^oi]g Matches dag, dug, d1g
Does not match dog and dig
[from–to] Matches any character in the range between the brackets [0-9]
[a-z]
[A-Z]
Any single digit from 0 to 9
Any single lowercase letter
Any single uppercase letter

Quantifiers

Quantifiers are special expressions that specify the number of characters to match. A quantifier always applies to the character before it.

Pattern Description Example Matches
* Zero or more occurrences 1a* 1, 1a, 1aa, 1aaa, etc.
+ One or more occurrences po+ In pot, matches po
In poor, matches poo
? Zero or one occurrence roa?d road, rod
*? Zero or more occurrences, but as fewer as possible 1a*? In 1a, 1aa and 1aaa, matches 1a
+? One or more occurrences, but as fewer as possible po+? In pot and poor, matches po
?? Zero or one occurrence, but as fewer as possible roa?? In road and rod, matches ro
{n} Matches the preceding pattern n times \d{3} Exactly 3 digits
{n,} Matches the preceding pattern n or more times \d{3,} 3 or more digits
{n,m} Matches the preceding pattern between n and m times \d{3,5} From 3 to 5 digits

Grouping

Grouping constructs are used to capture a substring from the source string, so you can perform some operation with it.

Syntax Description Example Matches
(pattern) Capturing group: captures a matching substring and assigns it an ordinal number (\d+) In 5 cats and 10 dogs, captures 5 (group 1) and 10 (group 2)
(?:pattern) Non-capturing group: matches a group but does not capture it (\d+)(?: dogs) In 5 cats and 10 dogs, captures 10
\1 Contents of group 1 (\d+)\+(\d+)=\2\+\1 5+10=10+5
\2 Contents of group 2

Anchors

Anchors specify a position in the input string where to look for a match.

Anchor Description Example Matches
^ Start of string

Note: [^inside brackets] means "not"

^\d+ Any number of digits at the start of the string.

In 5 cats and 10 dogs, matches 5

$ End of string \d+$ Any number of digits at the end of the string.

In 10 plus 5 gives 15, matches 15

\b Word boundary \bjoy\b Matches joy as a separate word, but not in enjoyable.
\B NOT a word boundary \Bjoy\B Matches joy in enjoyable, but not as a separate word.

Alternation (OR) construct

The alternation operand enables the OR logic, so you can match either this or that element.

Construct Description Example Matches
| Matches any single element separated by the vertical bar (s|sh)ells In she sells sea-shells, matches sells and shells

Look-arounds

Lookaround constructs are helpful when you want to match something that is or isn't followed or preceded by something else. These expressions are sometimes called "zero-width assertions" or "zero-width match" because they match a position rather than actual characters.

Note. In VBA RegEx flavor, lookbehinds are not supported.
Pattern Description Example Matches
(?=) Positive lookahead X(?=Y) Matches expression X when it is followed by Y (i.e. if there is Y ahead of X)
(?!) Negative lookahead X(?!Y) Matches expression X if it is NOT followed by Y
(?<=) Positive lookbehind (?<=Y)X Matches expression X when it is preceded by Y (i.e. if there is Y behind of X)
(?<!) Negative lookbehind (?<!Y)X Matches expression X when it is NOT preceded by Y

Now that you know the essentials, let's move on to the most interesting part - using regexes on real data to parse strings and find the required information. If you need more details about the syntax, the Microsoft guide on Regular Expression Language may prove helpful.

Custom RegEx functions for Excel

As already mentioned, Microsoft Excel has no built-in RegEx functions. To enable regular expressions, we've created three custom VBA functions. You can copy the codes from the below-linked pages or from our sample workbook, and then paste in your own Excel files.

How VBA RegExp functions work

This section explains the inner mechanics and may be interesting to those who want to know exactly what happens at the backend.

To start using regular expressions in VBA, you need to either activate the RegEx object reference library or use the CreateObject function. To save you the trouble of setting the reference in the VBA editor, we chose the latter approach.

The RegExp object has 4 properties:

  • Pattern - is the pattern to match in the input string.
  • Global - controls whether to find all matches in the input string or just the first one. In our functions, it is set to True to get all matches.
  • MultiLine - determines whether to match the pattern across line breaks in multi-line strings or only in the first line. In our codes, it is set to True to search in every line.
  • IgnoreCase - defines whether a regular expression is case-sensitive (default) or case-insensitive (set to True). In our case, that depends on how you configure the optional match_case parameter. By default, all the functions are case-sensitive.

VBA RegExp limitations

Excel VBA implements the essential regex patterns, but it does lack many advanced features available in .NET, Perl, Java, and other regex engines. For example, VBA RegExp does not support inline modifiers such as (?i) for case-insensitive matching or (?m) for multi-line mode, lookbehinds, POSIX classes, to name a few.

Excel Regex Match function

The RegExpMatch function searches an input string for text that matches a regular expression and returns TRUE if a match is found, FALSE otherwise.

RegExpMatch(text, pattern, [match_case])

Where:

  • Text (required) - one or more strings to search in.
  • Pattern (required) - the regular expression to match.
  • Match_case (optional) - match type. TRUE or omitted - case-sensitive; FALSE - case-insensitive

The function's code is here.

Example: how to use regular expressions to match strings

In the below dataset, suppose you want to identify the entries containing SKU codes.

Given that each SKU starts with 2 capital letters, followed by a hyphen, followed by 4 digits, you can match them using the following expression.

Pattern: \b[A-Z]{2}-\d{4}\b

Where [A-Z]{2} means any 2 uppercase letters from A to Z and \d{4} means any 4 digits from 0 to 9. A word boundary \b indicates that an SKU is a separate word and not part of a bigger string.

With the pattern established, start typing a formula like you normally do, and function's name will appear in the list suggested by Excel's AutoComplete:
Using a custom Regex function in Excel

Assuming the original string is in A5, the formula goes as follows:

=RegExpMatch(A5, "\b[A-Z]{2}-\d{3}\b")

For convenience, you can input the regular expression in a separate cell and use an absolute reference ($A$2) for the pattern argument. This ensures that the cell address will remain unchanged when you copy the formula to other cells:

=RegExpMatch(A5, $A$2)

To display your own text labels instead of TRUE and FALSE, nest RegExpMatch in the IF function and specify the desired texts in the value_if_true and value_if_false arguments:

=IF(RegExpMatch(A5, $A$2), "Yes", "No")
RegExpMatch formula to match substrings

For more formula examples, please see:

Excel Regex Extract function

The RegExpExtract function searches for substrings that match a regular expression and extracts all matches or specific match.

RegExpExtract(text, pattern, [instance_num], [match_case])

Where:

  • Text (required) - the text string to search in.
  • Pattern (required) - the regular expression to match.
  • Instance_num (optional) - a serial number that indicates which instance to extract. If omitted, returns all found matches (default).
  • Match_case (optional) - defines whether to match (TRUE or omitted) or ignore (FALSE) text case.

You can get the code of the function here.

Example: how to extract strings using regular expressions

Taking our example a little further, let's extract invoice numbers. For this, we'll be using a very simple regex that matches any 7-digit number:

Pattern: \b\d{7}\b

Put the pattern in A2 and you'll get the job done with this compact and elegant formula:

=RegExpExtract(A5, $A$2)

If a pattern is matched, the formula extracts an invoice number, if no match is found - nothing is returned.
RegExpExtract formula to extract substrings

For more examples, please see: How to extract strings in Excel using regex.

Excel Regex Replace function

The RegExpReplace function replaces the values matching a regex with the text you specify.

RegExpReplace(text, pattern, replacement, [instance_num], [match_case])

Where:

  • Text (required) - the text string to search in.
  • Pattern (required) - the regular expression to match.
  • Replacement (required) - the text to replace the matching substrings with.
  • Instance_num (optional) - the instance to replace. The default is "all matches".
  • Match_case (optional) - controls whether to match (TRUE or omitted) or ignore (FALSE) text case.

The code of the function is available here.

Example: how to replace or remove strings using regexes

Some of our records contain credit card numbers. This information is confidential, and you may want to replace it with something or delete altogether. Both tasks can be accomplished with the help of the RegExpReplace function. How? In a second scenario, we'll be replacing with an empty string.

In our sample table, all card numbers have 16 digits, which are written in 4 groups separated with spaces. To find them, we replicate the pattern using this regular expression:

Pattern: \b\d{4} \d{4} \d{4} \d{4}\b

For replacement, the following string is used:

Replacement: XXXX XXXX XXXX XXXX

And here's a complete formula to replace credit card numbers with insensitive information:

=RegExpReplace(A5, "\b\d{4} \d{4} \d{4} \d{4}\b", "XXXX XXXX XXXX XXXX")

With the regex and replacement text in separate cells (A2 and B2), the formula works equally well:
Excel formula to replace substrings matching a regex

In Excel, "removing" is a particular case of "replacing". To remove credit card numbers, just use an empty string ("") for the replacement argument:

=RegExpReplace(A5, "\b\d{4} \d{4} \d{4} \d{4}\b", "")
Excel formula to remove substrings matching a regex

Tip. To get rig of empty lines in the results, you can use another RegExpReplace function as shown in this example: How to remove blank lines using regex.

For more information, please see:

Regex Tools to match, extract, replace and remove substrings

The users of our Ultimate Suite can get all the power of regular expressions without inserting a single line of code in their workbooks. All necessary code is written by our developers and smoothy integrated in your Excel during installation.

Unlike the VBA functions discussed above, Ultimate Suite's functions are .NET based, which gives two main advantages:

  1. You can use regular expressions in normal .xlsx workbooks without adding any VBA code and having to save them as macro-enabled files.
  2. .NET Regex engine supports full-featured classic regular expressions, which lets you construct more sophisticated patterns.

How to use Regex in Excel

With the Ultimate Suite installed, using regular expressions in Excel is as simple as these two steps:

  1. On the Ablebits Data tab, in the Text group, click Regex Tools.
    Regex Tools for Excel

  2. On the Regex Tools pane, do the following:
    • Select the source data.
    • Enter your regex pattern.
    • Choose the desired option: Match, Extract, Remove or Replace.
    • To get the result as formula and not value, select the Insert as a formula check box.
    • Hit the action button.

    For instance, to remove credit card numbers from cells A2:A6, we configure these settings:
    Using Regex Tools to remove part of text

In a trice, an AblebitsRegex function will be inserted in a new column to the right of your original data. In our case, the formula is:

=AblebitsRegexRemove(A2, "\b\d{4} \d{4} \d{4} \d{4}\b")

Once the formula is there, you can edit, copy or move it like any native formula.
Excel Regex formula to remove substrings

How to insert a Regex formula directly in a cell

The AblebitsRegex functions can also be inserted directly in a cell without using the add-in's interface. Here's how:

  1. Click the fx button on the formula bar or Insert Function on the Formulas tab.
  2. In the Insert Function dialog box, select the AblebitsUDFs category, choose the function of interest, and click OK.
    Inserting the Ablebits Regex function in a cell

  3. Define the function's arguments like you normally do and click OK. Done!

For more information, please see Regex Tools for Excel.

That's how to use regular expressions to match, extract, replace and remove text in Excel cells. I thank you for reading and look forward to seeing you on our blog next week!

Available downloads

Excel Regex - formula examples (.xlsm file)
Ultimate Suite - trial version (.zip file)

You may also be interested in

2 comments to "RegEx in Excel: using regular expressions in formulas"

    • Hello!
      You can count the number of instances using the COUNTA function. Use this in the function's third argument RegExpExtract function to extract the last instance.

      =RegExpExtract(A5,$A$2,COUNTA(RegExpExtract(A5,$A$2)))

      A2 - the regular expression to match.
      A5 - text string
      I hope I answered your question. If you have any other questions, please don’t hesitate to ask.

Post a comment



Thank you for your comment! When posting a question, please be very clear and concise. This will help us provide a quick and relevant solution to your query. We cannot guarantee that we will answer every question, but we'll do our best :)