# 2024-08-01 - XML2TSV
I have had fun using json2tsv. It simplifies the task of using JSON
data in AWK scripts.
gopher://codemadness.org/1/phlog/json2tsv/
I wanted something like json2tsv but for XML. Conventional wisdom
says that parsing XML with regex causes madness, and results in N+1
problems. You have been warned.
I wrote separate scripts to be used in a pipeline. Breaking it down
into multiple steps greatly simplifies the task of parsing XML in
AWK.
* cdata.awk converts the weird <![CDATA[]]> sections into a more
"regular" format that xml2tsv.awk can parse.
* xmlrem.awk removes the <!-- XML Comment --> sections.
* xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing
tab with \t, carriage return with \r, and newline with \n.
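The escaping in the last step can be sketched as follows. This is a minimal illustration, not the code from xml2tsv.awk: the names tsv_escape, replace_all, and escape are made up for this example. It uses index() and substr() rather than gsub() so that it does not depend on how a particular awk treats backslashes in replacement strings.

```sh
# Hypothetical helper, not the code from xml2tsv.awk.
# Reads text on stdin and prints it as one escaped, TSV-safe line.
tsv_escape() {
    awk '
    # Replace every literal occurrence of "from" in s with "to".
    function replace_all(s, from, to,    out, i) {
        out = ""
        while ((i = index(s, from)) > 0) {
            out = out substr(s, 1, i - 1) to
            s = substr(s, i + length(from))
        }
        return out s
    }
    # Turn each control character into a two-character escape.
    function escape(s) {
        s = replace_all(s, "\t", "\\t")
        s = replace_all(s, "\r", "\\r")
        s = replace_all(s, "\n", "\\n")
        return s
    }
    # Join the input lines back together with real newlines.
    { text = (NR == 1) ? $0 : text "\n" $0 }
    END { print escape(text) }
    '
}

printf 'a\tb\nc\n' | tsv_escape    # prints a\tb\nc
```

With the value flattened like this, a real tab can only ever mean "next field" and a real newline can only ever mean "next record".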
Usage:
cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv
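On the consuming side, an AWK script can split each line on tabs and reverse the escapes. This is a rough sketch, assuming only the three escapes listed above. tsv_unescape and unescape are hypothetical names. The column layout of file.tsv is not described here, so the loop just prints every field with its record and field number.

```sh
# Hypothetical consumer, assuming only \t, \r, and \n are escaped.
tsv_unescape() {
    awk '
    # Decode one field: \t, \r, \n become the real control characters.
    # Any other backslash sequence is passed through unchanged.
    function unescape(s,    out, i, c, r) {
        out = ""
        while ((i = index(s, "\\")) > 0) {
            c = substr(s, i + 1, 1)
            if (c == "t") r = "\t"
            else if (c == "r") r = "\r"
            else if (c == "n") r = "\n"
            else r = "\\" c
            out = out substr(s, 1, i - 1) r
            s = substr(s, i + 2)
        }
        return out s
    }
    BEGIN { FS = "\t" }
    {
        for (i = 1; i <= NF; i++)
            printf "%d.%d: %s\n", NR, i, unescape($i)
    }
    '
}

# Example: tsv_unescape <file.tsv
```

One caveat with this assumption: if the XML text itself contains a literal backslash followed by t, r, or n, this decoder cannot tell it apart from an escape.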
I tested this script in gawk, mawk, and nawk, including my 16-bit DOS
build of nawk. This is just a toy and I would not recommend using it
on large data sets.
Source code:
gopher://tilde.pink/0/~bencollver/files/cdata.awk
gopher://tilde.pink/0/~bencollver/files/xmlrem.awk
gopher://tilde.pink/0/~bencollver/files/xml2tsv.awk
* * *
Follow-up:
After writing this AWK script, I found xml2tsv on bitreich.
gopher://bitreich.org/1/releases/xml2tsv
tags: bencollver,retrocomputing,technical