by Alexander Frolov, updated on
A few tips and pieces of advice will help you get domain names from a list of URLs using Excel formulas. Two variations of the formula let you extract the domain names with and without www. regardless of the URL protocol (http, https, ftp etc. are supported). The solution works in all modern versions of Excel, from 2010 through 2016.
If you are concerned with promoting your web-site (like I am) or doing SEO at the professional level promoting clients' web-sites for money, you often have to process and analyze huge lists of URLs: Google Analytics reports on traffic acquisition, Webmaster tools reports on new links, reports on backlinks to your competitors' web-sites (which do contain a great lot of interesting facts ;) ) and so on, and so forth.
To process such lists, from ten to a million links, Microsoft Excel makes an ideal tool. It is powerful, agile, extendable, and lets you send a report to your client directly from an Excel sheet.
"Why is it this range, from 10 to 1,000,000?" you may ask me. Because you definitely don't need a tool to process fewer than 10 links; and you will hardly need any if you have over a million inbound links. I'd wager that in this case you'd already had some custom software developed especially for you, with a business logic specifically tailored for your needs. And it would be me who would peruse your articles and not the other way round :)
When analyzing a list of URLs, you often need to perform the following tasks: get domain names for further processing, group URLs by domain, remove links from already processed domains, compare and merge two tables by domain names etc.
As an example, let's take a snippet of ablebits.com's backlinks report generated by Google Webmaster Tools.
Tip: I'd recommend using ahrefs.com to timely spot new links to your own site and your competitors' web-sites.
We have exported the data from a CSV file, which is why in terms of Excel our data are in a simple range. Press Ctrl + T to convert them to an Excel table because it is far more convenient to work with.
=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)
=IF(ISERROR(FIND("//www.",A2)), MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3), MID(A2,FIND(":",A2,4)+7,FIND("/",A2,9)-FIND(":",A2,4)-7))
The second formula may seem too long and complex, but only if you didn't see truly long formulas. It's not without reason that Microsoft has increased the maximum length of formulas up to 8192 characters in new versions of Excel :)
The good thing is that we don't have to use either an additional column or VBA macro. In fact, using VBA macros to automate your Excel tasks is not so difficult as it may seem, see a very good article - how to create and use VBA macros. But in this particular case, we do not actually need them, it's quicker and easier to go with a formula.
Note: Technically, www is the 3rd level domain, though with all normal web-sites www. is just an alias of the primary domain. In the early days of the Internet, you could say "double u, double u, double u our cool name dot com" on the phone or in a radio advert, and everyone perfectly understood and remembered where to look for you, of course unless your cool name was something like www.llanfairpwllgwyngyllgogerychwyrndrobwyll-llantysiliogogogoch.com :)
You need to leave all other domain names of the 3rd level, otherwise you would mess up links from different sites, e.g. with "co.uk" domain or from different accounts on blogspot.com etc.
Done! We have a column with extracted domain names.
In the next section you will learn how you can process a list of URLs based on the Domain column.
Tip: If you may need to edit the domain names manually at a later time or copy the results to another Excel worksheet, replace the formula results with values. To do this, proceed with the following steps:
Here you will find a few tips on further processing of the URL list, from on my own experience.
Excel has created an outline of your data on the left hand side of the screen. There are 3 levels of the outline and what you see now is the expanded view, or level 3 view. Click number 2 in the upper left hand corner to display the final data by domains, and then you can click the plus and minus signs (+ / -) in order to expand / collapse the details for each domain.
In our previous section we showed how to group URLs by domain. Instead of grouping, you can quickly color duplicate entries of the same domain name in your URLs.
For more details please see how to highlight duplicates in Excel.
You may have one or several separate Excel worksheets where you keep a list of domain names. Your tables may contain links that you don't want to work with, like spam or the domains you already processed. You may also need to keep a list of domains with interesting links and delete all other ones.
For example, my task is to color in red all domains that are in my spammer blacklist:
Not to waste much time, you can compare your tables to delete the unnecessary links. For full details, please read How to compare two Excel columns and delete duplicates
This is the most advanced way and the one I personally prefer.
Suppose, you have a separate Excel worksheet with reference data for each domain you ever worked with. This workbook keeps webmaster contacts for link exchange and the date when your website was mentioned in this domain. There can also be types/subtypes of websites and a separate column with your comments like on the screenshot below.
As soon as you get a new list of links you can match two tables by domain name and merge the information from the domain lookup table and your new URLs sheet in just two minutes.
As a result you will get the domain name as well as the website category and your comments. This will let you see the URLs from the list you need to delete and those you need to process.
Match two tables by domain name and merge data:
This nifty tool will match and merge two Excel 2013-2003 worksheets in a flash. You can use one or several columns as the unique identifier, update existing columns in the master worksheet or add new from the lookup table. Feel free to read more about Merge Tables Wizard on our website.
Just a few seconds - and you get all information about each domain name at a glance.
You can download Merge Tables Wizard for Excel, run it on your data and see how useful it can be.
If you are interested to get a free add-in for extracting domain names and subfolders of the root domain (.com, .edu, .us etc.) from the URL list, simply drop us a comment. When doing this, please specify your Excel version, e.g. Excel 2010 64-bit, and enter your email address in the corresponding field (do not worry, it won't be displayed publically). If we have a decent number of votes, we will create such and add-in and I let you know. Thank you in advance!
Table of contents