The from_xml command converts XML data into CSV records. Converting arbitrary XML into arbitrary CSV requires a Turing-complete programming language, and CSVfix does not attempt to do this. However, it can convert most XML data into useful CSV records, which can then be refined using other CSVfix commands.
To illustrate the use of the from_xml command, we will use the XML input file books.xml, which is in fact the output file produced by the to_xml command. Converting this XML data is simple; this command line:
csvfix from_xml -re 'character' books.xml
produces the following CSV:
"Charles","Dickens","Bleak House","Esther Sumerson","Drippy heroine"
"Charles","Dickens","Bleak House","Inspector Bucket","Prototype detective"
"Charles","Dickens","Great Expectations","Pip","Deluded ex-blacksmith"
"Charles","Dickens","Bleak House","Mr Vholes","Vampiric lawyer"
"Jane","Austen","Emma","Emma Woodhouse","Smug Surrey goddess"
"Jane","Austen","Pride & Prejudice","Elizabeth Bennet","Non-drippy heroine"
"Jane","Austen","Pride & Prejudice","Mr Darcy","Proud, wet-shirted landowner"
How does it work? The -re flag specifies the XML tag that marks the start of a new record, in this case 'character'. For each character tag, CSVfix outputs one record containing, by default, all the fields from the tag's parents, the fields of the tag itself, and the fields of any child tags. There is no way of specifying the field order with from_xml; to change it, pipe the output through the order command. If your XML contains multiple tags with the same name, you can narrow the tag matching by using a tag path fragment such as 'author@book@character'.
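The record-building idea described above can be sketched in a few lines of Python. This is only a conceptual illustration, not CSVfix's implementation. The XML layout in SAMPLE is hypothetical (the real structure of books.xml is not shown here). The helper names own_fields, matches, records and to_csv are invented for this example. The rule that a path fragment matches the end of a tag's path is an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the actual layout of books.xml may differ.
SAMPLE = """<library>
  <author first="Jane" last="Austen">
    <book title="Emma">
      <character name="Emma Woodhouse">Smug Surrey goddess</character>
    </book>
    <book title="Pride &amp; Prejudice">
      <character name="Elizabeth Bennet">Non-drippy heroine</character>
      <character name="Mr Darcy">Proud, wet-shirted landowner</character>
    </book>
  </author>
</library>"""


def own_fields(elem):
    """Attribute values of one element, followed by its text if non-blank."""
    fields = list(elem.attrib.values())
    text = (elem.text or "").strip()
    if text:
        fields.append(text)
    return fields


def matches(path, fragment):
    """True if the tag path ends with the '@'-separated fragment (assumed rule)."""
    wanted = fragment.split("@")
    return path[-len(wanted):] == wanted


def records(elem, fragment, path=(), inherited=()):
    """Yield one list of fields per element whose tag path matches the fragment."""
    path = list(path) + [elem.tag]
    fields = list(inherited) + own_fields(elem)
    if matches(path, fragment):
        # Record tag found: parent fields, own fields, then descendant fields.
        for descendant in list(elem.iter())[1:]:
            fields.extend(own_fields(descendant))
        yield fields
    else:
        # Not a record tag: keep descending, carrying the parent data along.
        for child in elem:
            yield from records(child, fragment, path, fields)


def to_csv(xml_text, fragment):
    """Render the matching records as fully quoted CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(records(ET.fromstring(xml_text), fragment))
    return out.getvalue()


print(to_csv(SAMPLE, "character"), end="")
```

With this sample input, passing 'author@book@character' instead of 'character' selects the same three records, since every character tag sits on that path. A fragment that matches no path, such as 'book@author', yields no records at all.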
See also: to_xml
Flag | Req'd? | Description |
-re tags | Yes | Specifies a comma-separated list of tags which will be used to mark the start of new records. The tags may be simple, for example 'name', or be path fragments using the '@' character as a separator, for example 'character@name'. |
-np | No | Do not output data from parent tags. |
-nc | No | Do not output data from child tags. |
-na | No | Do not output data from attributes. |
-ip | No | Insert the path of the tag that produced the output as the first CSV field in the output. |
-ml sep | No | Specify the separator string used within a CSV output field for multi-line text input data. Default is a single space. |