by Svetlana Cheusheva, updated on
The Regex Replace function is missing in your Excel? This tutorial shows how to quickly add it to your workbooks, so you could use regular expressions to replace text strings in Excel.
When it comes to changing one piece of text to another, Microsoft Excel provides a number of options to choose from including the Find and Replace tool and a couple of substitution functions. Why would one want to complicate things with regexes? Because Excel's standard features can only process an exact string that you specify. To find a string that matches some pattern and replace it with something else, regular expressions are indispensable.
As it's generally known, built-in Excel functions do not support regular expressions. To be able to use regexes in your formulas, you need to create your own function. Luckily, the RegExp object already exists in VBA, and we are using this object in the below code:
Tips:
The RegExpReplace function searches an input string for values that match a regular expression and replaces the found matches with the text you specify.
The function accepts 5 arguments, but only the first three are required.
Where:
The function works in all versions of Excel 365, Excel 2021, Excel 2019, Excel 2016, Excel 2013 and Excel 2010.
To ensure that your results will meet your expectations, let's take a closer look at the inner mechanics:
Assuming you've already inserted the RegExpReplace function in your workbook, let's get to more fascinating things - using regular expressions for advanced find and replace in Excel.
In the sample dataset below, supposing you want to hide some personal data such as social security numbers. Given that SSN is a nine-digit number in the format "000-00-0000", we are using the following regular expression to find it.
Pattern: \d{3}-\d{2}-\d{4}
For replacement, this string is used:
Replacement text: XXX-XX-XXXX
With the original string in A5, the complete formula takes this form:
=RegExpReplace(A5, "\d{3}-\d{2}-\d{4}", "XXX-XX-XXXX")
For convenience, you can enter the pattern and replacement text in separate cells and refer to those cells in your formula. If you plan to use the formula for multiple cells, remember to lock the cell addresses with the $ sign:
=RegExpReplace(A5, $A$2, $B$2)
To find any single digit from 0 to 9, use \d in your regular expression. To find specific digits, use an appropriate quantifier or construct a more sophisticated regex like shown in the below examples.
To replace absolutely all numbers in a string with some character or text, use the + quantifier, which says to search for numbers containing 1 or more digits.
Pattern: \d+
For example, to replace all numbers in cell A5 with an asterisk, use this formula:
=RegExpReplace(A5, "\d+", "*")
In the same dataset, suppose you wish to replace only the amounts of money and not all the numbers. To have it done, you search for the dollar sign followed by one or more digits \$\d+ - this part matches the dollar unit. After the main unit, there may or may not be a fractional unit. To match it, you look for zero or one period after which come from 0 to 2 digits \.?\d{0,2}. The word boundary \b in the end ensures that the matching value is not part of a bigger number.
Pattern: \$\d+\.?\d{0,2}\b
Serve this regular expression to our custom function and you'll get the follwoing result:
=RegExpReplace(A5, "\$\d+\.?\d{0,2}\b", "*")
In classic regular expressions, there is the global search flag /g that forces a regex to find all possible matches in a string. In VBA, this flag is not supported. Instead, the VBA RegExp object provides the Global property that defines whether to search for all occurrences or only the first one. In the code of our function, the Global property is set to True, meaning the pattern should be tested against all possible matches in a string.
So, what do you do to replace all occurrences matching a pattern? Nothing special. This behavior is implemented by default.
Let's say you manage an Excel file with personal data. Some information is confidential, so before sharing this file with your colleagues, you want to replace sensitive info such as social security numbers (SSN) and individual taxpayer identification numbers (ITIN) with "CONF".
Given that both SSN and ITIN have the same format, you can find them using the below regular expression.
Pattern: \d{3}-\d{2}-\d{4}
Since the 4th instance_num argument of our function defaults to TRUE, you can safely omit it:
=RegExpReplace(A5, "\d{3}-\d{2}-\d{4}", "CONF")
As the result, all substrings matching the regex are replaced with the specified text:
To replace just one occurrence matching a given pattern, define the corresponding number in the instance_num argument.
In the above example, supposing you wish to replace only social security numbers. In all cells, SSN is listed first, so we are replacing the 1st instance:
=RegExpReplace(A5, $A$2, $B$2, 1)
Where $A$2 is the pattern and $B$2 is the replacement text.
Have you heard about the recent addition to our Ultimate Suite? I mean four powerful Regex Tools introduced in the latest version. Compared to their VBA counterparts, the Ablebits Regex functions have two main advantages:
For more information, please see AblebitsRegexReplace function.
Below, you will find an example of Regex Replace that cannot be done with VBA.
Let's say you aim to replace text in square brackets with some character or string. The task can be accomplished with either a capturing group or positive look-arounds. The former is an easier way, so we'll stick to it.
Generally, to find a character that is "special" in regular expressions, you need to put a backslash right before it. In regexes, \ acts as an escape character that cancels the special meaning of the following character and turns it into a literal character. So, to find a bracket, you prefix it with a backslash: \[ to match an opening bracket and \] to match a closing bracket. Between the brackets, place a (capturing group). Depending on which quantifier you use in the group, different replacements will be made.
Greedy pattern: \[(.*)\]
This pattern performs a greedy search - consumes as much as possible. The .* expression matches as many characters as possible, and then tries to match ]. So, this pattern captures everything from the first opening bracket to the last closing bracket.
Lazy pattern: \[(.*?)\]
This pattern performs a lazy search - consumes as little as possible. The question mark ? forces .* to match as few characters as possible until it finds the next match in the pattern, which is ]. So, this pattern captures everything from the first opening bracket to the first closing bracket.
Negated class: \[([^\]]*)\]
One more way to match a substring in square brackets is to use a negation operator (^) inside the capturing group. From the first opening bracket, this pattern captures any characters other than a closing bracket, until it finds the first closing bracket. The result will be the same as with the lazy search.
Once you've decided on the regex, go "feed" it to the Ablebits Regex Replace function. Here's how:
At a moment's notice, the AblebitsRegexReplace function is inserted in a new column to the right of your original data.
=AblebitsRegexReplace(A2, "\[(.*)\]", "*")
The pattern and replacement text you specify go to the 2nd and 3rd arguments, respectively. Alternatively, you can keep them in separate cells, and supply the corresponding cell references. In our case, we enter the patterns in cells B2 and C2, type the replacement value in A2, then compare the results and choose the desired outcome:
Tips and notes:
That's how to find and replace text strings in Excel using regular expressions. I thank you for reading and look forward to seeing you on our blog next week!
Excel Regex Replace examples (.xlsm file)
Ultimate Suite 14-day fully-functional version (.exe file)
Table of contents