2024-08-01 - XML2TSV
====================
I have had fun using json2tsv. It simplifies the task of using JSON
data in AWK scripts.
<gopher://codemadness.org/1/phlog/json2tsv/>
I wanted something like json2tsv but for XML. Conventional wisdom
says that parsing XML with regex causes madness, and results in N+1
problems. You have been warned.
I wrote separate scripts to be used in a pipeline. Breaking it down
into multiple steps greatly simplifies the task of parsing XML in
AWK.
* cdata.awk converts the weird <![CDATA[]]> sections into a more
"regular" format that xml2tsv.awk can parse.
* xmlrem.awk removes the <!-- XML Comment --> sections.
* xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing
tab with \t, carriage return with \r, and newline with \n.
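As a rough illustration of the escaping described in the last bullet, here is a minimal sketch. It is not the code from xml2tsv.awk. The tsv_escape name is made up for this example, and treating all of stdin as a single field value is an assumption made to keep the example short.

```sh
#!/bin/sh
# Hypothetical sketch, not taken from xml2tsv.awk.
# Reads one text value on stdin and prints it as a single TSV-safe field.
tsv_escape() {
    awk '
    # Re-join the input lines with real newlines so the whole input
    # is treated as one value.
    { buf = (NR == 1) ? $0 : (buf "\n" $0) }
    END {
        gsub(/\t/, "\\t", buf)   # tab             -> backslash, t
        gsub(/\r/, "\\r", buf)   # carriage return -> backslash, r
        gsub(/\n/, "\\n", buf)   # newline         -> backslash, n
        print buf
    }'
}

# Example: a value containing a tab and a line break.
printf 'one\ttwo\nthree\n' | tsv_escape
# prints: one\ttwo\nthree
```

With those three characters escaped, a field can never contain a raw tab or line break, so a consumer can safely split records on newline and fields on tab.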
Usage:
cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv
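Each stage in that pipeline is a plain stdin-to-stdout filter. To show what such a stage can look like, here is a minimal sketch of a comment stripper in the spirit of xmlrem.awk. It is not the author's code. The strip_comments name is hypothetical, and the sketch assumes that the "<!--" and "-->" markers are never split across lines.

```sh
#!/bin/sh
# Hypothetical sketch, not taken from xmlrem.awk.
# Copies stdin to stdout, dropping everything between <!-- and -->.
strip_comments() {
    awk '
    {
        line = $0
        out = ""
        while (line != "") {
            if (incomment) {
                # Inside a comment: skip ahead to the closing marker.
                i = index(line, "-->")
                if (i == 0) {
                    line = ""
                } else {
                    line = substr(line, i + 3)
                    incomment = 0
                }
            } else {
                # Outside a comment: keep text up to the next opener.
                i = index(line, "<!--")
                if (i == 0) {
                    out = out line
                    line = ""
                } else {
                    out = out substr(line, 1, i - 1)
                    line = substr(line, i + 4)
                    incomment = 1
                }
            }
        }
        print out
    }'
}

# Example: one inline comment and one comment spanning two lines.
printf '<a><!-- x -->b</a>\n<!-- multi\nline --><c/>\n' | strip_comments
```

The incomment flag carries over from one input line to the next, which is what lets the filter handle comments that span several lines. A line that lies entirely inside a comment is printed as an empty line, so the output keeps the same line count as the input.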
I tested this script in gawk, mawk, and nawk, including my 16-bit DOS
build of nawk. This is just a toy and I would not recommend using it
on large data sets.
Source code:
<gopher://tilde.pink/0/~bencollver/files/cdata.awk>
<gopher://tilde.pink/0/~bencollver/files/xmlrem.awk>
<gopher://tilde.pink/0/~bencollver/files/xml2tsv.awk>
* * *
Follow-up:
After I wrote this AWK script, I found xml2tsv on bitreich.
<gopher://bitreich.org/1/releases/xml2tsv>
tags: bencollver,retrocomputing,technical
Tags
====
bencollver
<gopher://tilde.pink/1/~bencollver/log/tag/bencollver/>
retrocomputing
<gopher://tilde.pink/1/~bencollver/log/tag/retrocomputing/>
technical
<gopher://tilde.pink/1/~bencollver/log/tag/technical/>