The validate command is used to validate CSV data against a number of validation rules. This command does not validate the basic CSV syntax (the input to it must be syntactically correct CSV) - it's intended to be used to validate business rules.
See also: check
Flag |
Req'd? |
Description |
-vf file |
Yes |
Specifies the file containing the validation rules. |
-om mode |
No |
Specifies the output mode. Possible values are: report, which displays the filename, line number and validation error message for each failure. This is the default pass, which displays all rows that pass validation
|
-ec |
No |
If this option is specified, the validate command returns a value of 2 to the host operating system on validation failure. Without this option, validation failure does not return an error value to the OS. |
The validation rules are contained in a text file. Here's an example:
# val_names.txt
#
# check that:
# each row in the file has at least three fields
# all fields contain some non-whitespace data
# the third field contains only the values "M" and "F"
required 1,2,3
notempty 1,2,3
values 3 "M" "F"
The format of the validation file is fairly straightforward. Lines starting with '#' are comments, other lines consists of two or more fields, separated by spaces. The fields are as follows:
Once you have a validation file you, you can use it to validate CSV data:
csvfix validate -vf val_names.txt data/names.csv
This will produce no output, because all the data in names.csv pass all the validation rules. However, if you try it with another file, like bad_names.csv:
csvfix validate -vf val_names.txt data/bad_names.csv
you get a list of rows and fields that fail the validation rules:
data/bad_names.csv (2): Jane,,F
field: 2 - field is empty
data/bad_names.csv (4): Flann,O'Brien,X
field: 3 - "X" is invalid value
data/bad_names.csv (5): George,Elliot
field: 3 - required field missing
data/bad_names.csv (6): Virginia,
field: 3 - required field missing
The following rules are currently available:
Rule |
Description |
required |
The field(s) must exist in the CSV, though they may be empty. For all other rules, it is not required that the field exists, so if a field must exist, it must be tested with this rule. |
notempty |
The field(s) must not contain only white space. |
fields |
Specifies a minimum and maximum number of fields in each row. This doers not require a field list: |
length |
The length of the field(s) must be between specified minimum and maximum values: |
numeric |
The field must contain a numeric value. Additionally, you can specify a number of ranges as parameters. For example |
values |
The field must contain one of a number of values. For example: |
notvalues |
As above, but the field must not contain the listed values. |
lookup |
Lookup one or more fields in a second CSV file (actually, you can use the same CSV file as the one you are validating, which is useful in some recondite circumstances). For example: |
date |
Check that field is a valid date. The format of the date is specified by a mask value that must be supplied. See the date_iso command for details of mask format. An optional date range, consisting of two dates in ISO format separated by a colon can also be provided. For example: |
The following example validates the cities.csv file against countries,csv. The validation file looks like this:
# val_country.txt
# lookup second field in cities.csv against
# the first field in countries.csv
required 1,2
lookup * 2:1 data/countries.csv
The command line to use it is:
csvfix validate -vf rules/val_country.txt data/cities
which produces the following output (because Greece is not in the countries.csv file):
data/cities.csv (6): Athens,GR
lookup of 'GR' in data/countries.csv failed
Created with the Personal Edition of HelpNDoc: Easy CHM and documentation editor