unique

The unique command reduces rows that contain duplicate field values to a single row. The row chosen to represent the duplicates is the first one encountered in the input. Alternatively, you can output only the duplicates (see the -d flag below). Note that this command does not require sorted input, but it does read all data into memory, which may make it slow or unusable for very large datasets.
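
As a minimal sketch of the default behaviour (the file dupes.csv and its contents here are hypothetical, not part of the CSVfix examples), given an input file dupes.csv containing:

"apple","1"
"apple","1"
"pear","2"

the command:

csvfix unique dupes.csv

would produce:

"apple","1"
"pear","2"

with the two identical "apple" records reduced to the first one encountered.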

See also: sort

Flag        Req'd?  Description
-f fields   No      A comma-separated list of fields to test for uniqueness.
                    If not specified, the complete CSV record is tested.
-d          No      If specified, only duplicate rows are output. This is the
                    converse of the default behaviour, which is to output only
                    unique rows.

The following example lists rows from data/post.csv whose first field value occurs more than once:

csvfix unique -d -f 1 data/post.csv

which produces:

"London","NW"
"London","W"
"London","E"
"London","SE"
"London","SW"

You can use the unique command to merge two or more CSV files into one, discarding any duplicate rows:

csvfix unique -o merged.csv file1.csv file2.csv

This assumes that the two input files list their fields in the same order in each CSV record; otherwise rows holding identical data will not compare equal.
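
If the field orders differ, one file can be normalised first. As a sketch, assuming file2.csv holds the same two fields as file1.csv but in reverse order (the file names, including the intermediate file2_reordered.csv, are illustrative):

csvfix order -f 2,1 file2.csv -o file2_reordered.csv
csvfix unique -o merged.csv file1.csv file2_reordered.csv

Here the order command rewrites file2.csv with its fields swapped, so that identical data compares equal when unique merges the two files.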

