Is duplicate data in your worksheets causing you a headache? This tutorial will teach you how to quickly find, select, color or eliminate repeated entries in your dataset.
Whether you import data from an external source or collate it yourself, the duplication problem is the same - identical cells create a chaos in your spreadsheets, and you need to somehow deal with them. Since duplicates in Excel can take various forms, deduplication techniques may also vary. This tutorial brings into focus the most useful ones.
Note. This article shows how to search for duplicate cells in a range or list. If you are comparing two columns, then check out these solutions: How to find duplicates in 2 columns.
To highlight duplicate values in a column or range, you normally use Excel Conditional Formatting. In a simplest case, you can apply the predefined rule; in more sophisticated scenarios, you will have to create your own rule based on formula. The below examples illustrate both cases.
In this example, we will be using a preset rule available in all versions of Excel. As you can understand from the heading, this rule highlights all occurrences of a duplicate value, including the first one.
To apply the built-in rule for duplicates, perform these steps:
To mark duplicate values except 1st instances, the inbuilt rule cannot help, and you'll need to set up your own rule with a formula. The formula is quite tricky and requires adding a blank column to the left of your dataset (column A in this example).
To create a rule, these are the steps to perform:
=IF(COLUMNS($B2:B2)>1, COUNTIF(A$2:$B$7,B2),0) + COUNTIF(B$2:B2,B2)>1
Where B2 is the first cell in the first column, B7 is the last cell in the first column, and A2 is the cell in the blank column corresponding to the first row in your selected range. The detailed explanation of the formula is provided in a separate tutorial.
Tips and notes:
A lot more use cases and examples can be found in this tutorial: How to highlight duplicates in Excel.
To find duplicates including 1st occurrences, the generic formula is:
To spot duplicates excluding 1st occurrences, the general formula is:
As you can see, the formulas are very much alike, the difference is in how you define the source range.
To locate duplicate cells including first instances, you compare the target cell (A2) with all other cells in the range $A$2:$A$10 (notice that we lock the range with absolute references), and if more than one cell containing same value are found, label the target cell as "Duplicate".
=IF(COUNTIF($A$2:$A$10, A2)>1, "Duplicate", "")
This formula goes to B2, and then you copy it down to as many cells as there are items in the list.
To get duplicate cells without first instances, you compare the target cell (A2) only with the above cells, not with each other cell in the range. For this, build an expanding range reference like $A$2:$A2.
=IF(COUNTIF($A$2:$A2, $A2)>1, "Duplicate", "")
When copied to the below cells, the range reference expands by 1. So, the formula in B2 compares the value in A2 only against this cell itself. In B3, the range expands to $A$2:$A3, so the value in A3 is compared against the above cell as well, and so on.
For more formula examples, please see How to find duplicates in Excel.
As you probably know, all modem versions of Excel are equipped with the Remove Duplicate tool, which works with the following caveats:
To remove duplicate records, this is what you need to do:
In the below example, we want to check the first four columns for duplicates, so we select them. The Comments column is not really important and therefore is not selected.
Based on the values in the selected columns, Excel has found and removed 2 duplicate records (for Caden and Ethan). The first instances of these records are retained.
More use cases are covered in How to remove duplicate rows in Excel.
As shown in the first part of this tutorial, Microsoft Excel provides a few different features to deal with duplicates. The problem is you need to know where to look for them and how to leverage them for your specific tasks.
To make the lives of our Ultimate Suite users easier, we have created a special tool to handle duplicate cells an easy way. Precisely what can it do? Almost everything you can think of :)
Please let me introduce to you our recent addition to the Ablebits Duplicate Remover toolkit - Find Duplicate Cells add-in.
To find duplicates cells in your worksheet, carry out these steps:
In this example, we've chosen to color duplicate cells except 1st occurrences and got the following result:
Remember that cumbersome formula for conditional formatting to achieve the same effect? ;)
If you are analyzing structured data organized in a table, then use Duplicate Remover to search for duplicates based on values in one or more columns.
To find duplicates in 2 columns or 2 different tables, run the Compare Two Tables tool.
The good news is that all these tools are included in Ultimate Suite and you can try any of them in your worksheets right now - the download link is right below.
I thank you for reading and hope to see you on our blog next week!
Table of contents