Excel Data Cleanup: How to Add Dynamic Sequential Numbers & Find Duplicates
You've just finished a big data export, and now you need to get it ready for a clean upload into your database. But what happens when your system requires a unique sequential number for each related entry? And how do you ensure there are no duplicates hiding in your data?
This article will show you two quick and effective Excel tricks to clean and validate your data, preventing upload errors and saving you hours of manual work. We'll use a simple IF formula to add sequential numbers that restart for each unique record, and then we'll use a PivotTable to instantly find and flag any duplicate entries.
Be sure and watch the video for a visual walkthrough!
Free Download! Get the exact sample file used here in this tutorial to follow along. To download the file, Click the link, then go to File > Download > Microsoft Excel (.xlsx).
Step 1: Sort Your Data
Before we can create our sequential numbers, we need to group all related records together. The easiest way to do this is by sorting your data by the unique identifier.
Select all the data you want to work with.
Go to the Data tab in the ribbon.
Click Sort.
In the pop-up window, choose the column that contains your unique identifier. In our HR example, this is Employee ID.
Click OK.
Your data is now sorted, so all records for a single employee are grouped together.
Step 2: Create a Dynamic Sequential Numbering System
Now we'll use a simple formula to create a unique sequential number for each skill record per employee.
In the first cell of a new, blank column (e.g., cell A2), type the number 1 and press Enter. This will be the starting number for your first record.
In the cell directly below it (A3), type the following formula:
=IF(B3=B2,A2+1,1)
Here's what this formula does:
IF(B3=B2...
: It checks if the unique identifier in the current row (B3) is the same as the unique identifier in the row above it (B2)....A2+1...
: If they are the same (meaning it's the same employee), it takes the sequential number from the cell above it (A2) and adds 1....1)
: If they are different (meaning it's a new employee), the formula starts the count over at 1.
Once you press Enter, you'll see the formula has created the next sequential number.
Now, let's apply it to your entire data set.
Click on the cell with your new formula.
Double-click the small green square (autofill handle) at the bottom right corner of the cell.
Excel will automatically copy the formula all the way down your spreadsheet, numbering each record and restarting the count for every new employee.
To lock in your numbers: The last crucial step is to convert your formulas to values. This prevents the numbers from changing if you later sort or filter your data.
Select the entire new column with your sequential numbers.
Right-click and select Copy.
Right-click again and under "Paste Options," choose Values (the clipboard with the "123" icon).
Now, your sequential numbers are permanent, and your data is ready for upload!
Step 3: Check for Duplicate Entries
Before you upload, let's use a PivotTable to quickly find any accidental duplicate records.
Select all the data in your spreadsheet.
Go to the Insert tab and click PivotTable.
Choose to place the PivotTable in a New Worksheet and click OK.
In the PivotTable Fields pane on the right:
Drag your unique identifier (Employee ID) into the Rows area.
Drag the potential duplicate field (Skill) into the Rows area, placing it below the Employee ID.
Drag the unique identifier (Employee ID) into the Values area.
The PivotTable will default to "Sum," but we want to count the number of records.
Click the dropdown arrow on "Sum of Employee ID" in the "Values" area.
Select Value Field Settings.
Choose Count and click OK.
Finally, let's clean up the view to make it easy to spot duplicates.
Click anywhere in the PivotTable.
Go to the Design tab.
Click Subtotals and select Do Not Show Subtotals.
Your PivotTable now shows a count for every unique combination of Employee ID and Skill. A value of 1 is good, but a value greater than 1 indicates a duplicate.
Step 4: Highlight and Fix Your Duplicates
To make duplicates stand out, we'll use conditional formatting.
Select the column in your PivotTable that contains the counts (e.g., "Count of Employee ID").
Go to the Home tab and click Conditional Formatting.
Go to Highlight Cells Rules and select Greater Than....
In the pop-up, type 1 and click OK.
Any combination of employee and skill that appears more than once will now be highlighted.
You can now use this information to go back to your original data, filter for the highlighted employees, and clean up the duplicate entries.
By following these simple steps, you've successfully prepared your data for a clean and error-free upload, ensuring the integrity of your database.