Deduplication

To remove duplicate rows of data, use this dialog:

Clicking Add places a new entry in the grid, where you choose a column to include in the deduplication criteria. CSV Easy combines the data in all the columns you select to create a matching key. The row containing the first instance of each key is kept, and every subsequent row with the same key is removed.

In the screenshot above, First name and Last name are used as the matching key. With 6 records in the data set, duplicate rows would be removed as follows:

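When a row has a blank matching key, it is ignored by the deduplication rather than matched against other rows. Matching blank keys would be too dangerous: every row with empty values in the selected columns would be treated as a duplicate of every other such row, and all but one of them would be removed.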
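The following Python sketch illustrates the matching rules described above: the selected columns are combined into a key, the first row with each key is kept, later rows with the same key are dropped, and rows whose key is blank are left out of the matching. It is not CSV Easy's implementation. The function name `deduplicate`, the use of a tuple as the key (the page does not say exactly how CSV Easy combines the columns), and the choice to keep blank-key rows in the output are assumptions, and the sample data is made up.

```python
import csv
import io


def deduplicate(rows, key_columns):
    """Keep the first row for each matching key and drop later rows with the same key.

    Assumption: rows whose key is blank in every selected column are not
    matched at all, so they are always kept.
    """
    seen = set()
    kept = []
    for row in rows:
        # Combine the selected columns into one matching key.
        # A tuple keeps column boundaries, so "Ann Lee" + "" never equals "Ann" + "Lee".
        key = tuple(row.get(col) or "" for col in key_columns)
        if all(value == "" for value in key):
            # Blank key: leave the row out of the matching so blanks are not merged.
            kept.append(row)
            continue
        if key in seen:
            continue  # a later duplicate of an earlier key, so it is removed
        seen.add(key)
        kept.append(row)
    return kept


# Made-up sample data, matching on First name + Last name.
data = """First name,Last name,Email
Ann,Lee,ann@example.com
Bob,Ray,bob@example.com
Ann,Lee,ann.lee@example.com
,,x@example.com
,,y@example.com
"""
rows = list(csv.DictReader(io.StringIO(data)))
result = deduplicate(rows, ["First name", "Last name"])
for r in result:
    print(r["Email"])
```

Here the second "Ann Lee" row is removed, while both rows with blank names are kept because their key is blank.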

If you want to control which row is kept, you must order the file so that your master row appears before any of its duplicates. This can be done in a variety of ways, but commonly you will sort the data on some criterion such as a date column.
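As a sketch of the sorting approach, assuming a hypothetical `Updated` column that holds ISO dates: sorting newest first before a keep-first pass means the most recent row for each person is the one that survives. The column name and sample rows are illustrative only, not part of CSV Easy.

```python
from datetime import date

# Made-up sample rows; "Updated" is a hypothetical date column in ISO format.
rows = [
    {"First name": "Ann", "Last name": "Lee", "Updated": "2021-03-01"},
    {"First name": "Ann", "Last name": "Lee", "Updated": "2023-07-15"},
    {"First name": "Bob", "Last name": "Ray", "Updated": "2022-01-09"},
]

# Sort newest first so each person's most recent row precedes its duplicates.
# Python's sort is stable, so rows with equal dates keep their original order.
ordered = sorted(rows, key=lambda r: date.fromisoformat(r["Updated"]), reverse=True)

# Keep-first deduplication on (First name, Last name).
seen = set()
kept = []
for row in ordered:
    key = (row["First name"], row["Last name"])
    if key not in seen:
        seen.add(key)
        kept.append(row)

print([(r["First name"], r["Updated"]) for r in kept])
# [('Ann', '2023-07-15'), ('Bob', '2022-01-09')]
```

Sorting oldest first (without `reverse=True`) would instead keep the earliest record for each person.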

When the process has finished, a summary is displayed showing the record count before and after deduplication, and the percentage of duplicates removed.
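The summary figures are related by simple arithmetic. A minimal sketch, assuming the percentage is measured against the record count before deduplication (the page does not state the denominator); `dedup_summary` is a hypothetical helper:

```python
def dedup_summary(before, after):
    """Return (before, after, percent removed), with the percentage relative to `before`."""
    if before == 0:
        return before, after, 0.0  # empty input: nothing removed, avoid dividing by zero
    removed = before - after
    return before, after, removed / before * 100


print(dedup_summary(200, 150))  # (200, 150, 25.0)
```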
