| # 2024-08-01 - XML2TSV | |
| I have had fun using json2tsv. It simplifies the task of using JSON | |
| data in AWK scripts. | |
| gopher://codemadness.org/1/phlog/json2tsv/ | |
| I wanted something like json2tsv but for XML. Conventional wisdom | |
| says that parsing XML with regex causes madness, and results in N+1 | |
| problems. You have been warned. | |
| I wrote separate scripts to be used in a pipeline. Breaking it down | |
| into multiple steps greatly simplifies the task of parsing XML in | |
| AWK. | |
| * cdata.awk converts the weird <![CDATA[]]> sections into a more | |
| "regular" format that xml2tsv.awk can parse. | |
| * xmlrem.awk removes the <!-- XML Comment --> sections. | |
| * xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing | |
| tab with \t, carriage return with \r, and newline with \n. | |
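The escaping in that last step can be sketched roughly as follows. This is not the code from xml2tsv.awk. The shell wrapper tsv_escape and the AWK function esc are hypothetical names used only for illustration, and the sketch assumes that only the three replacements listed above are applied (a literal backslash is left alone).

```shell
# Hypothetical wrapper, not part of the published scripts.
# Reads all of stdin as one value and prints it on a single escaped line.
tsv_escape() {
    awk '
    # esc() is an illustrative name; it applies the three replacements
    # named in the post and nothing else.
    function esc(s) {
        gsub(/\t/, "\\t", s)   # tab             -> backslash + t
        gsub(/\r/, "\\r", s)   # carriage return -> backslash + r
        gsub(/\n/, "\\n", s)   # newline         -> backslash + n
        return s
    }
    # Rejoin input lines with literal newlines so esc() can see them.
    { buf = (NR == 1 ? "" : buf "\n") $0 }
    END { print esc(buf) }
    '
}

# Demo: a value containing a tab, a carriage return, and a newline.
printf 'a\tb\r\nc' | tsv_escape
```

The demo prints the single line a\tb\r\nc, so the value fits in one TSV field and a downstream AWK script can split records on newline and fields on tab.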
| Usage: | |
| cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv | |
| I tested these scripts in gawk, mawk, and nawk, including my 16-bit | |
| DOS build of nawk. This is just a toy and I would not recommend | |
| using it on large data sets. | |
| Source code: | |
| gopher://tilde.pink/0/~bencollver/files/cdata.awk | |
| gopher://tilde.pink/0/~bencollver/files/xmlrem.awk | |
| gopher://tilde.pink/0/~bencollver/files/xml2tsv.awk | |
| tags: bencollver,retrocomputing,technical | |