2024-08-01 - XML2TSV
====================

I have had fun using json2tsv.  It simplifies the task of using JSON
data in AWK scripts.

<gopher://codemadness.org/1/phlog/json2tsv/>

I wanted something like json2tsv but for XML.  Conventional wisdom
says that parsing XML with regex causes madness, and results in N+1
problems.  You have been warned.

I wrote separate scripts to be used in a pipeline.  Breaking it down
into multiple steps greatly simplifies the task of parsing XML in
AWK.

* cdata.awk converts the weird <![CDATA[]]> sections into a more
 "regular" format that xml2tsv.awk can parse.

* xmlrem.awk removes the <!-- XML Comment --> sections.

* xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing
 tab with \t, carriage return with \r, and newline with \n.
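
As an illustration of the escaping described above, here is a minimal
sketch.  It is not the code from xml2tsv.awk; the function name
tsv_escape and the sample string are made up for this example.  It
only handles the three characters listed above, and assumes nothing
about how backslashes already present in the data are treated.

```shell
# Hypothetical sketch of the escaping step, not the actual xml2tsv.awk.
escaped=$(awk '
# Replace tab, carriage return, and newline with two-character
# escapes so that a value always fits in one TSV field on one line.
function tsv_escape(s) {
    gsub(/\t/, "\\t", s)
    gsub(/\r/, "\\r", s)
    gsub(/\n/, "\\n", s)
    return s
}
BEGIN {
    # Sample value containing a tab, a carriage return, and a newline.
    printf "%s", tsv_escape("one\ttwo\r\nthree")
}')
printf '%s\n' "$escaped"
# prints: one\ttwo\r\nthree
```

After escaping, tab can safely serve as the field separator and
newline as the record separator.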

Usage:

   cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv
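
To use the resulting TSV in another AWK script, the escapes have to be
reversed.  Below is a rough sketch of that, assuming only the three
escapes listed above.  The sample line and the function name
tsv_unescape are hypothetical, and the real column layout of file.tsv
is not shown here, so this just unescapes the last field.

```shell
# Hypothetical sample row: a label, a tab, then an escaped value.
unescaped=$(printf 'note\tline one\\nline two\\tend\n' | awk -F '\t' '
# Turn the two-character escapes back into the real characters.
function tsv_unescape(s) {
    gsub(/\\t/, "\t", s)
    gsub(/\\r/, "\r", s)
    gsub(/\\n/, "\n", s)
    return s
}
{ print tsv_unescape($NF) }')
printf '%s\n' "$unescaped"
```

Note that a value which originally contained a literal backslash
followed by t, r, or n would also be changed by this sketch.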

I tested these scripts in gawk, mawk, and nawk, including my 16-bit DOS
build of nawk.  This is just a toy and i would not recommend using it
on large data sets.

Source code:

<gopher://tilde.pink/0/~bencollver/files/cdata.awk>
<gopher://tilde.pink/0/~bencollver/files/xmlrem.awk>
<gopher://tilde.pink/0/~bencollver/files/xml2tsv.awk>

* * *

Follow-up:

After writing these AWK scripts, i found xml2tsv on bitreich.

<gopher://bitreich.org/1/releases/xml2tsv>

tags: bencollver,retrocomputing,technical

Tags
====

bencollver
<gopher://tilde.pink/1/~bencollver/log/tag/bencollver/>
retrocomputing
<gopher://tilde.pink/1/~bencollver/log/tag/retrocomputing/>
technical
<gopher://tilde.pink/1/~bencollver/log/tag/technical/>