How to extract substrings in Excel using regular expressions (Regex)

In this tutorial, you'll learn how to use regular expressions in Excel to find and extract substrings matching a given pattern.

Microsoft Excel provides a number of functions to extract text from cells. Those functions can cope with most of string extraction challenges in your worksheets. Most, but not all. When the Text functions stumble, regular expressions come to rescue. Wait… Excel has no RegEx functions! True, no inbuilt functions. But there's nothing that would prevent you from using your own ones :)

Excel VBA Regex function to extract strings

To add a custom Regex Extract function to your Excel, paste the following code in the VBA editor. In order to enable regular expressions in VBA, we are using the built-in Microsoft RegExp object. The code is written by our Excel specialist Alex Frolov.

Public Function RegExpExtract(text As String, pattern As String, Optional instance_num As Integer = 0, Optional match_case As Boolean = True) Dim text_matches() As String Dim matches_index As Integer On Error GoTo ErrHandl RegExpExtract = "" Set regex = CreateObject("VBScript.RegExp") regex.pattern = pattern regex.Global = True regex.MultiLine = True If True = match_case Then regex.ignorecase = False Else regex.ignorecase = True End If Set matches = regex.Execute(text) If 0 < matches.Count Then If (0 = instance_num) Then ReDim text_matches(matches.Count - 1, 0) For matches_index = 0 To matches.Count - 1 text_matches(matches_index, 0) = matches.Item(matches_index) Next matches_index RegExpExtract = text_matches Else RegExpExtract = matches.Item(instance_num - 1) End If End If Exit Function ErrHandl: RegExpExtract = CVErr(xlErrValue) End Function

If you have little experience with VBA, a step-by-step user guide may prove helpful: How to insert VBA code in Excel.

Note. For the function to work, be sure to save your file as a macro-enabled workbook (.xlsm).

RegExpExtract syntax

The RegExpExtract function searches an input string for values that match a regular expression and extracts one or all matches.

The function has the following syntax:

RegExpExtract(text, pattern, [instance_num], [match_case])

Where:

  • Text (required) - the text string to search in.
  • Pattern (required) - the regular expression to match. When supplied directly in a formula, the pattern should be enclosed in double quotation marks.
  • Instance_num (optional) - a serial number that indicates which instance to extract. If omitted, returns all found matches (default).
  • Match_case (optional) - defines whether to match or ignore text case. If TRUE or omitted (default), case-sensitive matching is performed; if FALSE - case-insensitive.

The function works in all versions of Excel 365, Excel 2021, Excel 2019, Excel 2016, Excel 2013 and Excel 2010.

4 things you should know about RegExpExtract

To effectively use the function in your Excel, there are a few important things to take notice of:

  1. By default, the function returns all found matches into neighboring cells like shown in this example. To get a specific occurrence, supply a corresponding number to the instance_num argument.
  2. By default, the function is case-sensitive. For case-insensitive matching, set the match_case argument to FALSE. Because of the VBA limitations, the case-insensitive construct (?i) won't work.
  3. If a valid pattern is not found, the function returns nothing (empty string).
  4. If the pattern is invalid, a #VALUE! error occurs.

Before you start using this custom function in your worksheets, you need to understand what it is capable of, right? The below examples cover a few common use cases and explain why the behavior may differ in Dynamic Array Excel (Microsoft 365 and Excel 2021) and traditional Excel (2019 and older versions).

Note. Out regex examples are written for fairy simple data sets. We cannot guarantee they will work flawlessly in your real worksheets. Those who have experience with regex would agree that writing regular expressions is a never-ending road to perfection - almost always there's a way to make it more elegant or capable of handling a wider range of input data.

Regex to extract number from string

Following the basic maxim of teaching "from simple to complex", we'll start with a very plain case: extracting number from string.

The first thing for you to decide is which number to retrieve: first, last, specific occurrence or all numbers.

Extract first number

This is as simple as regex can get. Given that \d means any digit from 0 to 9, and + means one or more times, our regular expression takes this form:

Pattern: \d+

Set instance_num to 1 and you'll get the desired result:

=RegExpExtract(A5, "\d+", 1)

Where A5 is the original string.

For convenience, you can input the pattern in a predefined cell ($A$2 ) and lock its address with the $ sign:

=RegExpExtract(A5, $A$2, 1) Extracting number from string using regex

Get last number

To extract the last number in a string, here's the pattern to use:

Pattern: (\d+)(?!.*\d)

Translated into a human language, it says: find a number that is not followed (anywhere, not just immediately) by any other number. To express this, we are using a negative lookahead (?!.*\d), which means that to the right of the pattern there should be no other digit (\d) regardless of how many other characters are before it.

=RegExpExtract(A5, "(\d+)(?!.*\d)") Extracting the last number from string using regex

Tips:

  • To get a specific occurrence, use \d+ for pattern and an appropriate serial number for instance_num.
  • The formula to extract all numbers is discussed in the next example.

Regex to extract all matches

Pushing our example a little further, suppose you want to get all numbers from a string, not just one.

As you may remember, the number of extracted matches is controlled by the optional instance_num argument. The default is all matches, so you simply omit this parameter:

=RegExpExtract(A2, "\d+")

The formula works beautifully for a single cell, but the behavior differs in Dynamic Array Excel and non-dynamic versions.

Excel 365 and Excel 2021

Due to support for dynamic arrays, a regular formula automatically spills into as many cells as needed to display all calculated results. In terms of Excel, this is called a spilled range: Regex to extract all matches

Excel 2019 and lower

In pre-dynamic Excel, the above formula would return just one match. To get multiple matches, you need to make it an array formula. For this, select a range of cells, type the formula, and press Ctrl + Shift + Enter to complete it.

A downside of this approach is a bunch of #N/A errors appearing in "extra cells". Regrettably, nothing can be done about it (neither IFERROR nor IFNA can fix it, alas). Extracting all matches in Excel 2019 and earlier versions

Extract all matches in one cell

When processing a column of data, the above approach obviously won't work. In this case, an ideal solution would be returning all matches in a single cell. To have it done, serve the results of RegExpExtract to the TEXTJOIN function and separate them with any delimiter you like, say a comma and a space:

=TEXTJOIN(", ", TRUE, RegExpExtract(A5, "\d+")) Extracting all matches into one cell

Note. Because the TEXTJOIN function is only available in Excel for Microsoft 365, Excel 2021 and Excel 2019, the formula won't work in older versions.

Regex to extract text from string

Extracting text from an alphanumeric string is quite a challenging task in Excel. With regex, it becomes as easy as pie. Just use a negated class to match everything that is not a digit.

Pattern: [^\d]+

To get substrings in individual cells (spill range), the formula is:

=RegExpExtract(A5, "[^\d]+") Regex to extract text from string

To output all matches into one cell, nest the RegExpExtract function in TEXTJOIN like this:

=TEXTJOIN("", TRUE, RegExpExtract(A5, "[^\d]+")) Regex to extract all text strings into one cell

Regex to extract email address from string

To pull out an email address from a string containing a lot of different information, write a regular expression that replicates the email address structure.

Pattern: [\w\.\-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,24}

Breaking down this regex, here's what we get:

  • [\w\.\-]+ is a username that may include 1 or more alphanumeric characters, underscores, dots and hyphens.
  • @ symbol
  • [A-Za-z0-9\.\-]+ is a domain name consisting of: uppercase and lowercase letters, digits, hyphens and dots (in case of subdomains). Underscores are not allowed here, therefore 3 different character sets (such as A-Z a-z and 0-9) are used instead of \w that matches any letter, digit or underscore.
  • \.[A-Za-z]{2,24} is a top-level domain. Consists of a dot followed by uppercase and lowercase letters. Most of top-level domains are 3-letters long (e.g .com .org, .edu, etc.), but in theory it can contain from 2 to 24 letters (the longest registered TLD).

Assuming the string is in A5 and the pattern in A2, the formula to extract an email address is:

=RegExpExtract(A5, $A$2) Regex to extract email address from string

Regex to extract domain from email

When it comes to extracting email domain, the first thought that comes to mind is using a capturing group to find text that immediately follows the @ character.

Pattern: @([A-Za-z0-9\.\-]+\.[A-Za-z]{2,24})

Serve it to our RegExp function:

=RegExpExtract(A5, "@([A-Za-z0-9\.\-]+\.[A-Za-z]{2,24})")

And you'll get this result: Extracting a domain name from email using regular expressions

With classic regular expressions, anything outside a capturing group is not included in the extraction. No one knows why VBA RegEx works differently and captures "@" as well. To get rid of it, you can remove the first character from the result by replacing it with an empty string.

=REPLACE(RegExpExtract(A5, "@([a-z\d][a-z\d\-\.]*\.[a-z]{2,})", 1, FALSE), 1, 1, "") Improved formula to extract domain from email

Regular expression to extract phone numbers

Phone numbers can be written in many different ways, which is why it's next to impossible to come up with a solution working under all circumstances. Nevertheless, you can write down all the formats used in your dataset and try to match them.

For this example, we are going to create a regex that will extract phone numbers in any of these formats:

(123) 345-6789

(123) 345 6789

(123)3456789

123-345-6789

123.345.6789

123 345 6789

1233456789

Pattern: \(?\d{3}[-\. \)]*\d{3}[-\. ]?\d{4}\b

  • The first part \(?\d{3} matches zero or one opening parenthesis followed by three digits d{3}.
  • The [-\. \)]* part means any character in square brackets appearing 0 or more times: hyphen, period, space or closing parenthesis.
  • Next, we have three digits again d{3} followed by any hyphen, period or space [-\. ]? appearing 0 or 1 time.
  • After that, there is a group of four digits \d{4}.
  • Finally, there is a word boundary \b defining that a phone number we are looking for cannot be part of a bigger number.

The complete formula takes this shape:

=RegExpExtract(A5, "\(?\d{3}[-\. \)]*\d{3}[-\. ]?\d{4}\b") Regex to extract phone numbers

Please be aware that the above regex may return a few false-positive results such as 123) 456 7899 or (123 456 7899. The below version fixes these issues. However, this syntax only works in VBA RegExp functions, not in classic regular expressions.

Pattern: (\(\d{3}\)|\d{3})[-\. ]?\d{3}[-\. ]?\d{4}\b

Regex to extract date from string

A regular expression for extracting a date depends on the format in which the date appears within a string. For example:

To extract dates like 1/1/21 or 01/01/2021, the regex is: \d{1,2}\/\d{1,2}\/(\d{4}|\d{2})

It searches for a group of 1 or 2 digits d{1,2} followed by a slash, followed by another group of 1 or 2 digits, followed by a slash, followed by a group of 4 or 2 digits (\d{4}|\d{2}). Please notice that we look for 4-digit years first, and only then for 2-digit years. If we write it the other way around, only the first 2 digits will be picked out from all the years. This is because after the first condition is in the alternation OR construct is matched, the remaining conditions are not checked.

To retrieve dates like 1-Jan-21 or 01-Jan-2021, the pattern is: \d{1,2}-[A-Za-z]{3}-\d{2,4}

It searches for a group of 1 or 2 digits, followed by a hyphen, followed by a group of 3 uppercase or lowercase letters, followed by a hyphen, followed by a group of 4 or 2 digits.

After combining the two patterns together, we get the following regex:

Pattern: \b\d{1,2}[\/-](\d{1,2}|[A-Za-z]{3})[\/-](\d{4}|\d{2})\b

Where:

  • The first part is 1 or 2 digits: \d{1,2}
  • The second part is either 1 or 2 digits or 3 letters: (\d{1,2}|[A-Za-z]{3})
  • The third part is a group of 4 or 2 digits: (\d{4}|\d{2})
  • Delimiter is either a forward slash or hyphen: [\/-]
  • A word boundary \b is placed on both sides to make it clear that a date is a separate word, and not part of a bigger string.

As you can see in the image below, it successfully pulls out dates and leaves out substrings such as 11/22/333. However, it still returns false positive results. In our case, the substring 11-ABC-2222 in A9 technically matches the date format dd-mmm-yyyy and therefore is extracted. Regex to extract date from string

To eliminate false positives, you can replace the [A-Za-z]{3} part with a full list of 3-letter month abbreviations:

Pattern: \b\d{1,2}[\/-](\d{1,2}|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))[\/-](\d{4}|\d{2})\b

To ignore the letter case, we set the last argument of our custom function to FALSE:

=RegExpExtract(A5, $A$2, 1, FALSE)

And this time, we get a perfect result: Regular expression to extract date from string

Regex to extract time from string

To get time in the hh:mm or hh:mm:ss format, the following expression will work a treat.

Pattern: \b(0?[0-9]|1[0-2]):[0-5]\d(:[0-5]\d)?\s?(AM|PM)\b|\b([0-9]|[0-1]\d|2[0-3]):[0-5]\d(:[0-5]\d)?(?!:)

Breaking down this regex, you can see 2 parts separated by | that enables the OR logic. In other words, we search for a substring that matches one of the below expressions.

Expression 1: \b(0?[0-9]|1[0-2]):[0-5]\d(:[0-5]\d)?\s?(AM|PM)\b

Retrieves times with AM/PM.

Hour can be any number from 0 to 12. To get it, we use the OR construct ([0-9]|1[0-2]), where:

  • [0-9] matches any number from 0 to 9
  • 1[0-2] matches any number from 10 to 12

Minute [0-5]\d is any number from 00 to 59.

Second (:[0-5]\d)? is also any number from 00 to 59. The ? quantifier means zero or one occurrence since seconds may or may not be included in the time value.

Expression 2: \b([0-9]|[0-1]\d|2[0-3]):[0-5]\d(:[0-5]\d)?(?!:)

Extracts times without AM/PM.

The hour part can be any number from 0 to 32. To get it, a different OR construct ([0-9]|[0-1]\d|2[0-3]) is used, where:

  • [0-9] matches any number from 0 to 9
  • [0-1]\d matches any number from 00 to 19
  • 2[0-3] matches any number from 20 to 23

The minute and second parts are the same as in expression 1 above.

The negative lookahead (?!:) is added to skip strings such as 20:30:80.

As PM/AM can be either uppercase or lowercase, we make the function case-insensitive:

=RegExpExtract(A5, $A$2, 1, FALSE) Pull time from a string using regular expression

Hopefully, the above examples gave you some ideas on how to use regular expressions in your Excel worksheets. Unfortunately, not all features of classic regular expressions are supported in VBA. If your task cannot be accomplished with VBA RegExp, I encourage you to read the next part that discusses a lot more powerful .NET Regex functions.

Custom .NET based Regex function to extract text in Excel

Unlike VBA RegExp functions that can be written by any Excel user, .NET RegEx is the developer's realm. The Microsoft .NET Framework supports full-featured regular expression syntax compatible with Perl 5. This article won't teach you how to write such functions (I am not a programmer and don't have a slightest idea on how to do that :)

Four powerful functions processed by the standard .NET RegEx engine are already written by our developers and included in Ultimate Suite. Below, we'll demonstrate some practical uses of the function specially designed for extracting text in Excel.

Tip. For information about the .NET Regex syntax, please refer to .NET Regular Expression Language.

How to extract strings in Excel using regular expressions

Assuming you have the latest version of Ultimate Suite installed, extracting text using regular expressions boils down to these two steps:

  1. On the Ablebits Data tab, in the Text group, click Regex Tools. Regex Tools for Excel
  2. On the Regex Tools pane, select the source data, enter your Regex pattern, and choose the Extract option. To get the result as a custom function, not a value, select the Insert as a formula check box. When done, click the Extract button. Using the Regex Extract tool in Excel

The results will appear in a new column to the right of your original data: Regex Extract function for Excel

AblebitsRegexExtract syntax

Our custom function has the following syntax:

AblebitsRegexExtract(reference, regular_expression)

Where:

  • Reference (required) - a reference to the cell containing the source string.
  • Regular_expression (required) - the regex pattern to match.

Important note! The function only works on the machines with Ultimate Suite for Excel installed.

Usage notes

To make your learning curve smoother and your experience more enjoyable, please pay attention to these points:

  1. To create a formula, you can use our Regex Tools, or Excel's Insert function dialog, or type the full function name in a cell. Once the formula is inserted, you can manage it (edit, copy or move) like any native formula.
  2. The pattern you enter on the Regex Tools pane goes to the 2nd argument. It is also possible to keep a regular expression in a separate cell. In this case, just use a cell reference for the 2nd argument.
  3. The function extracts the first found match.
  4. By default, the function is case-sensitive. For case-insensitive matching, use the (?i) pattern.
  5. If a match is not found, an #N/A error is returned.

Regex to extract string between two characters

To get text between two characters, you can use either a capturing group or look-arounds.

Let's say you are looking to extract text between brackets. A capturing group is the easiest way.

Pattern 1: \[(.*?)\]

With a positive lookbehind and lookahead, the result will be exactly the same.

Pattern 2: (?<=\[)(.*?)(?=\])

Please pay attention that our capturing group (.*?) performs a lazy search for text between two brackets - from the first [ to the first ]. A capturing group without a question mark (.*) would do a greedy search and capture everything from the first [ to the last ].

With the pattern in A2, the formula goes as follows:

=AblebitsRegexExtract(A5, $A$2) Regex to extract text between two characters

How to get all matches

As already mentioned, the AblebitsRegexExtract function can only extract one match. To get all matches, you can use the VBA function we've discussed earlier. However, there is one caveat - VBA RegExp does not support capturing groups, so the above pattern will return the "boundary" characters as well, brackets in our case.

=TEXTJOIN(" ", TRUE, RegExpExtract(A5, $A$2)) VBA RegExp to extract text between two characters

To get rid of the brackets, SUBSTITUTE them with empty strings ("") using this formula:

=SUBSTITUTE(SUBSTITUTE(TEXTJOIN(", ", TRUE, RegExpExtract(A5, $A$2)), "]", ""),"[","")

For better readability, we are using a comma for delimiter. Improved formular to extract text between two characters

Regex to extract text between two strings

The approach we've worked out for pulling out text between two characters will also work for extracting text between two strings.

For example, to get everything between "test 1" and "test 2", use the following regular expression.

Pattern: test 1(.*?)test 2

The complete formula is:

=AblebitsRegexExtract(A5, "test 1(.*?)test 2") Regex to extract text between two strings

Regex to extract domain from URL

Even with regular expressions, extracting domain names from URLs is not a trivial task. The key element that does the trick is non-capturing groups. Depending on your ultimate goal, choose one of the below regexes.

To get a full domain name including subdomains

Pattern: (?:https?\:|^|\s)\/\/((?:[A-Za-z\d\-\.]{2,255}\.)?[A-Za-z\d\-]{1,63}\.[A-Za-z]{2,24})

To get a second-level domain without subdomains

Pattern: (?:https?\:|^|\s)\/\/(?:[A-Za-z\d\-\.]{2,255}\.)?([A-Za-z\d\-]{1,63}\.[A-Za-z]{2,24})

Now, let's see how these regular expressions work on an example of "https://www.mobile.ablebits.com" as a sample URL:

  • (?:https?\:|^|\s) - non-capturing group. It matches but does not capture a substring that is preceded by one of the following: https, http, start of string (^), a whitespace character (\s). The last two items are included to handle protocol-relative URLs like "//google.com".
  • \/\/ - two forward slashes (each is preceded by a backslash to escape the special meaning of the forward slash and interpret it literally).
  • (?:[A-Za-z\d\-\.]{2,255}\.)? - non-capturing group to identify third-level, fourth-level, etc. domains, if any (mobile in our sample URL). In the first pattern, it is placed within a bigger capturing group to get all such subdomains included into the extraction. A subdomain can be from 2 to 255 characters long, hence the {2,255} quantifier.
  • ([A-Za-z\d\-]{1,63}\.[A-Za-z]{2,24}) - capturing group to extract the second-level domain (ablebits) and the top-level domain (com). The max length of a second-level domain is 63 characters. The longest top-level domain currently in existence contains 24 characters.

Depending on which regular expression is entered in A2, the below formula will produce different results:

=AblebitsRegexExtract(A5, $A$2)

Regex to extract the full domain name with all subdomains: Regex to extract full domain name from URL

Regex to extract a second-level domain without subdomains: Regex to extract a second-level domain from URL

That's how to extract parts of text in Excel using regular expressions. I thank you for reading and look forward to seeing you on our blog next week!

Available downloads

Excel Regex Extract examples (.xlsm file)
Ultimate Suite trial version (.exe file)

33 comments

  1. Hi Ablebits,

    Thank you for your advise, but i try many times but still not work in my case.

    i have below set data,

    987789 x 9, 789789 x 4, 000987, 987765, 888999 x 3, the result is "18"

    because the 6 digits is a group item number, sometimes the item number have more than one set, so that mulitple a value. if the item number only 1 set, that count "1".

    The problem is the whole set information put in a "cell", so i don't know how to do.

    Please help again!

    Regards,

    1. Hi! If I understand your task correctly, you can't extract a number 1 from 000987 that isn't there.
      If this is not what you wanted, please describe the problem in more detail.

      1. thanks your quickly replied, my problem is that sometimes the item number more than 1 set

        Now currently one of case, i used this as belows to solve Case.1;

        =SUM(LEN(B24)-LEN(SUBSTITUTE(B24,{1,2,3,4,5,6,7,8,9,0},"")))/6

        Case.1 B24 cell have this data-> 048987, 048871, 048456, 124885, 124886 so i can get the total result is "5" (5 set number)

        But if like this as belows, i don't know how to get the total result is "18"

        Case.2 987789 x 9, 789789 x 4, 000987, 987765, 888999 x 3

        Hope my describe clearly for you and please provide some advise.

        Thank you so much!

          1. thanks again!

            each group number is represented to an item code, but in

            Case.2 987789 x 9, 789789 x 4, 000987, 987765, 888999 x 3,

            the result is came from like that - "9", "4", "1", "1", "3", so total sum is "18"

            because as above data input method to show how the item count,

            987789 have 9 sets
            789789 have 4 sets
            000987 only 1 set
            987765 only 1 set
            888999 have 3 sets

            But now in Case.2 i'd manually to add the total sum😞, limited in my ability, i can't find a way to solve it!

            Thank you so much!

              1. Thank you so much as your advise. i try again! Regards

  2. I would like to sort text strings as numbers.

    For example,
    Instead of the results looking like this:

    E1007L
    E1008L01
    E100KP01D182
    E100UD160
    E1010K

    I want it to look like:

    E100KP01D182
    E100UD160
    E1007L
    E1008L01
    E1010K

    I copied and pasted the VBA code and saved it as a macro.
    When I try to run the macro, the xlsm file I saved it as (Sort text as numbers.xlsm) never populates it in the macro dialog box.
    What am I doing wrong?

    1. Hi! In the article under which you have left your comment, there is no macro for the sorting of data, only user-defined functions. So I don't have any idea what kind of macro you're talking about. The result you get is according to the sorting rules.
      Maybe this article will be helpful: Insert and run VBA macros in Excel - step-by-step guide.
      If this is not what you wanted, please describe the problem in more detail.

  3. Hello Team I need your help regarding this:-
    Regex to extract text between two strings

    The approach we've worked out for pulling out text between two characters will also work for extracting text between two strings.

    For example, to get everything between "test 1" and "test 2", use the following regular expression.

    Pattern: test 1(.*?)test 2

    The complete formula is:

    =AblebitsRegexExtract(A5, "test 1(.*?)test 2")

    hOW We can get results without Ablebits reexextract tool with Regex Extractor

    STRINGS RESULTS
    V.1.1 test 1 successful test 2 failed successful
    V.1.2 test 1 no result test 2 failed no result
    V.1.2.2 test 1 errpr:1001 test 2 failed test 3 N/A error:1001
    V.1.3 test 1 sucessful test 2 successful successful
    V.2 test 1 sucessful #N/A

    Thanks a lot in advance

      1. Thanks a lot Sir,
        It is possible with a single formula please help.

        1. Hi! With a single formula, this problem can be solved using a custom user-defined function as described in the article above.
          I recommend paying attention to the Regex tool. You can find, extract, compare, delete, or replace strings that match the regular expression pattern you enter. You don't need to install any VBA code. Read the article above for more on this as well.

  4. Hello,

    thank you very much for this post, it helped me already a lot with my data.

    However, as I am using Excel 2016 I still have the problem of extract all matches. As you mentioned, I need to select an array and make it an array function. But because I am working with a column, I would need to use something like TEXTJOIN to in order for the results to be all displayed. With making it an array function, apparently I can't select multiple cells in 1 line, because I only returns the first match several times.

    Is there any other way with pre-dynamic Excel to modify the formula so that all the values are displayed in multiple adjacent cells in 1 row instead in 1 column?
    Or alternatively, is there any other function that would perform something similar than TEXTJOIN?

    Thank you very much in advance!

    1. Hi! I’m sorry, but your description doesn’t give me a complete understanding of your task. I don't know what formula you are talking about.
      To show array values in a row instead of a column, use the TRANSPOSE function.
      There is no alternative to the TEXTJOIN function in Excel2016. To merge multiple values in the same cell without losing data, try Merge Cells tool. It is available as a part of our Ultimate Suite for Excel that you can install in a trial mode and check how it works for free.

  5. Have installed the ultimate excel suite
    I am trying to do this:
    "Fabric_Cotton Silk, Fabric_Sico, Fabric_Handloom Silk, Handloom Silk, Party, Wedding"

    All of this is in one cell.
    Now I want to extract and join all tags that start with word "Fabric_"

    My formula:
    =TEXTJOIN(", ", TRUE, AblebitsRegexExtract(C11, "Fabric_(.*?),+"))

    But the code is extracting only the first occurrence of the word fabric
    Current output : Fabric_Cotton Silk,
    Required output: Fabric_Cotton Silk, Fabric_Sico, Fabric_Handloom Silk,

    Please help with the formula. Thank you

    1. Hello! The formula

      =TEXTJOIN(", ",FALSE,RegExpExtract(A1, "(?:^|,)?Fabric_(.+?)(?=,|$)"))

      will extract all occurrences of the substring "Fabric_" in the text "Fabric_Cotton Silk, Fabric_Sico, Fabric_Handloom Silk, Handloom Silk, Party, Wedding". The `(?:^|,)?` part of the formula is a non-capturing positive lookahead assertion that allows for the optional presence of a comma (,) or the end of the string before the substring "Fabric_". This assertion ensures that the substring "Fabric_" is not extracted if it is preceded by a comma. The `.+?` part of the formula matches any characters (.+) up to a comma or the end of the string, and the `(?=,|)` part of the formula matches a comma or the end of the string after the extracted substring.

  6. When I try to run the module in a spreadsheet, it works OK.

    When I put the function call into VB code, it crashes Excel.

    Any guidelines re calling the function in VB code ??

    1. Call the function into a String variable - txtTest as String
    Result = nothing happens

    Public Sub test()
    Dim txtTest as String
    Const txtRegexString As String = "[\w\.\-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,24}"

    txtTest = RegExpExtract("me@here.com", txtRegexString)
    MsgBox Str(txtTest)
    End Sub

    2. Call the function into an undefined variable - txtTest
    Result = crash Excel

    Public Sub test()
    Dim txtTest
    Const txtRegexString As String = "[\w\.\-]+@[A-Za-z0-9\.\-]+\.[A-Za-z]{2,24}"

    txtTest = RegExpExtract("me@here.com", txtRegexString)
    MsgBox Str(txtTest)
    End Sub

    1. Hi! Try to use instance_num=1 argument. Use this code:

      txtTest = RegExpExtract("me@here.com", txtRegexString, 1)
      MsgBox txtTest

      I hope my advice will help you solve your task.

  7. Hi Ablebits,

    Congrats and thank you for making it so simple. I am moving back from Google Spreadsheets to Excel, and it feels very annoying to lose those abilities !

    Cheers,
    ER

  8. Can I use this to parse a word document? How could I go about doing this?

    1. Transfer information from a Word document to an Excel worksheet. The Excel functions work only with Excel workbook data.

  9. Hello, how to use regex to extract all parameters from a list of URL's with different parameters comes with different values without missing any parameter and not have duplicates

    for example in this list

    https://www.website.com/?parm1=11
    https://www.website.com/?parm1=65&param2=22
    https://www.website.com/?parm1=35&param2=67&param3=87
    https://www.website.com/?parm1=3&param2=2&param3=37&param4=76
    https://www.website.com/?parm1=99&param2=66&param3=56&param4=46

    what is the regex formula can be used to make it extract
    https://www.website.com/?parm1=99&param2=66&param3=56&param4=46

    this URL contains all parameters without duplicates

  10. Hi,

    Can you help me as I am unable to re-use it in all opened workbooks even I have saved code in module section of PERSONAL.XLSB file and also getting errors as variable name not defined for regex and matches on below lines

    Set regex = CreateObject("VBScript.RegExp")
    Set matches = regex.Execute(text)

  11. Hi! This is all wonderful! My question is that I am looking for the VBA example function for "Extract text between two strings". The example given is for the .NET version, not VBA. Can you show me how you'd type it to work with the VBA?
    The string would look something like this:
    sWritten by: Author's NameIllustrated by: Illustrator's NamePublished by: Publishers Name

    I am trying to pull out just the name of the Publisher for example, so I need it to pull out the text in between "Published By:" and "".

    1. Hello!
      To extract the text between two delimiters you can use regular expressions and a custom VBA RegExpExtract function. It is described in detail above at the beginning of this article.
      Since the Publishers Name is at the end of the string, you can use the MID function.
      To return text after the last ":" character, use the formula:

      =MID(A1,SEARCH("#",SUBSTITUTE(A1,":","#", LEN(A1)-LEN(SUBSTITUTE(A1,":",""))))+1,100)

      I hope my advice will help you solve your task.

  12. Hi, thank you for this! It's amazing, exactly what I was looking for.

    Unfortunately, I am an absolute noob with VBA, so I don't understand why it doesn't work.

    Basically, I copied and pasted the code in a fresh module in VBA, but when I try to run the function, I have an error saying that "regex" is not defined..... although I see it in the code, it's the line that says ”Set regex = CreateObject,” isn't it?

    Any idea why it might not work?

    Thank you so much!

    1. Hi!
      The information you provided is not enough to understand your case and give you any advice. When does the error appear? What formula are you using? Did you save your file as a macro-enabled workbook(.xlsm)? Please describe in more detail.

      1. Hi again,

        Thank you for your reactivity. Yes, you are right, I forgot to mention the most important aspect. I am trying to use RegEx to extract a number from a string, using this formula: =RegExpExtract(A5, "\d+", 1). I copy&pasted the code given in the beginning of this article in VBA, then saved it.

        Then I wrote the formula as indicated (modified for my specific needs in the file), and got the error #VALUE!, which is for syntax, as far as I know. I then went back in the VBA to debug, and within the code it said that regex is not defined. I suspect that if I define it within the code, it should work. But I am confused, because to my (limited) knowledge, regex appears to be defined in the code you provided.

        I hope it is clearer now.

        1. And, yes, the file is saved as .xlsm :)

          1. Hi!
            I got the #VALUE! Error when I used incorrect quotes. But at the same time, I did not see any errors in the VBA code.
            You may be using a version of Excel earlier than 2010.
            Here is the article that may be helpful to you: How to enable macros in Excel

            1. Thank you, I got it to work. In fact, my VBA modules begin with "option explicit" by default. I have no idea what that is, but I deleted it, and now the formula works.

              And so I realized it is actually not the formula I needed, as I want to extract one digit from a larger number, and this formula only extracts a number from a line of text : )

              I'll keep looking. But I wanted to leave this here, in case anybody else can use it.

              Thank you for this blog and your dedication. It is very helpful.

              1. Hi Deea,

                To extract one digit from a larger number, you need a different regex. For example:

                To extract the 1st digit:

                =RegExpExtract(A5, "\d", 1)

                To extract the 2nd digit:

                =RegExpExtract(A5, "\d", 2)

                And so on.

Post a comment



Thank you for your comment!
When posting a question, please be very clear and concise. This will help us provide a quick and relevant solution to
your query. We cannot guarantee that we will answer every question, but we'll do our best :)