# 2024-08-01 - XML2TSV
I have had fun using json2tsv. It simplifies the task of using JSON
data in AWK scripts.
gopher://codemadness.org/1/phlog/json2tsv/
I wanted something like json2tsv but for XML. Conventional wisdom
says that parsing XML with regex causes madness and results in N+1
problems. You have been warned.
I wrote separate scripts to be used in a pipeline. Breaking it down
into multiple steps greatly simplifies the task of parsing XML in
AWK.
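To illustrate how small one such step can be, here is a minimal sketch of a comment-stripping stage in POSIX sh and AWK. It is not taken from the scripts described below, and the function name strip_comments is hypothetical. It reads the whole input into one buffer so that a comment may span several lines. It does not know about CDATA sections.

```sh
#!/bin/sh
# Hypothetical pipeline stage (not the author's code): remove
# <!-- ... --> comments from XML on stdin, even across line breaks.
strip_comments() {
    awk '
    # Rejoin all input lines so a comment can span several of them.
    { buf = (NR == 1) ? $0 : buf "\n" $0 }
    END {
        out = ""
        while ((start = index(buf, "<!--")) > 0) {
            out = out substr(buf, 1, start - 1)   # keep text before comment
            rest = substr(buf, start + 4)         # text after "<!--"
            stop = index(rest, "-->")
            if (stop == 0) { buf = ""; break }    # unterminated: drop tail
            buf = substr(rest, stop + 3)          # resume after "-->"
        }
        printf "%s\n", out buf
    }'
}

printf '<a>1<!-- note\nmore -->2</a>\n' | strip_comments
# prints: <a>12</a>
```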
* cdata.awk converts the weird <![CDATA[]]> sections into a more
"regular" format that xml2tsv.awk can parse.
* xmlrem.awk removes the <!-- XML Comment --> sections.
* xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing
tab with \t, carriage return with \r, and newline with \n.
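The escaping described for xml2tsv.awk can be sketched as a small AWK function. This is an illustration under assumptions, not code from xml2tsv.awk. The helper tsv_escape is hypothetical and performs only the three replacements listed above. It walks the string one character at a time with substr() instead of using gsub(), which sidesteps differences in how awk implementations treat backslashes in gsub() replacement strings.

```sh
#!/bin/sh
# Hypothetical helper (not taken from xml2tsv.awk): escape tab,
# carriage return and newline as the two-character sequences
# \t, \r and \n, so a value fits in one TSV field on one line.
tsv_escape() {
    awk '
    function esc(s,    out, i, c) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == "\t") out = out "\\t"
            else if (c == "\r") out = out "\\r"
            else if (c == "\n") out = out "\\n"
            else out = out c
        }
        return out
    }
    # Rejoin input lines so embedded newlines are escaped as well.
    { buf = (NR == 1) ? $0 : buf "\n" $0 }
    END { print esc(buf) }'
}

printf 'a\tb\nc\n' | tsv_escape
# prints: a\tb\nc
```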
Usage:
cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv
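A downstream AWK script can read the resulting file.tsv by splitting each line on tabs and reversing the escapes. The sketch below assumes that each line is a tab-separated record that uses only the \t, \r and \n escapes described above. The function name tsv_unescape_fields and its "field N: [...]" output are hypothetical, because the column layout of the xml2tsv.awk output is not described here. Since backslashes themselves are not listed as escaped, a literal backslash followed by t, r or n in the original data would also be decoded.

```sh
#!/bin/sh
# Hypothetical consumer: split TSV lines on tabs and turn the
# \t, \r and \n escapes back into the real characters.
tsv_unescape_fields() {
    awk '
    BEGIN { FS = "\t" }
    function unesc(s,    out, i, c, d) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            d = substr(s, i + 1, 1)
            if (c == "\\" && d == "t") { out = out "\t"; i++ }
            else if (c == "\\" && d == "r") { out = out "\r"; i++ }
            else if (c == "\\" && d == "n") { out = out "\n"; i++ }
            else out = out c
        }
        return out
    }
    {
        for (f = 1; f <= NF; f++)
            printf "field %d: [%s]\n", f, unesc($f)
    }'
}

# usage: tsv_unescape_fields <file.tsv
printf 'a\\tb\tc\n' | tsv_unescape_fields
```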
I tested these scripts in gawk, mawk, and nawk, including my 16-bit
DOS build of nawk. This is just a toy, and I would not recommend
using it on large data sets.
Source code:
gopher://tilde.pink/0/~bencollver/files/cdata.awk
gopher://tilde.pink/0/~bencollver/files/xmlrem.awk
gopher://tilde.pink/0/~bencollver/files/xml2tsv.awk
* * *
Follow-up:
After writing this AWK script, I found xml2tsv on bitreich.
gopher://bitreich.org/1/releases/xml2tsv
tags: bencollver,retrocomputing,technical