# 2024-11-02 - No YAML, No Recutils
Back around 2011 i wrote a private database, and at some point
migrated it to a file based format. Each record is a YAML file.
The data is presented in a vertical format with one line per field,
plus some multi-line blocks defined by indentation. Trivial to
edit in any text editor. I used Tcl and the yaml module from tcllib
to process the data.
Recently i found the No YAML web site and i decided it was time
for a change.
No YAML
I briefly considered CSV, TSV, and JSON. They are all mature,
standardized formats, but in my opinion they fall short when it
comes to editing in a vertical format in a plain text editor.
I looked at GNU Recutils, since several folks wrote about it in their
phlogs. Like my YAML files, recutils presents the data in a vertical
format with one line per field, plus it can do multi-line blocks via
line continuations. The format is fine for my purposes.
One of my requirements is that i want this to work on FreeDOS too.
Recutils requires filesystem support for ACL, which is too fancy for
DOS. The format is simple enough, but the source code is
surprisingly complex. It does a fraction of what sqlite3 does, and
in a less portable, less robust way.
I tried making my own format based on ASCII control codes. I could
use Control-^ (the RS character) as the record separator, Control-_
(the US character) as the unit separator AKA the field separator, and
Control-X (the CAN or Cancel character) to discard all text since the
beginning of the field. This format works in ed(1) and the calvin vi
clone on DOS. However, the control characters are a little ridiculous
to look at and to type. I did not want to foist such an eyesore on
my future self.
csvtofsv also uses ASCII control codes as delimiters
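
As an illustration of the control-code format described above, here
is a minimal sketch in Python. It is not the author's code. The
function name and the sample data are hypothetical, and it assumes
that empty records (for example after a trailing RS) are ignored and
that newlines get no special treatment.

```python
RS = "\x1e"   # Control-^, record separator
US = "\x1f"   # Control-_, unit (field) separator
CAN = "\x18"  # Control-X, cancel: discard the field text so far


def parse_control_format(data):
    """Split data into records, each a list of field strings."""
    records = []
    for raw_record in data.split(RS):
        if raw_record == "":
            continue  # assumption: skip empty records
        fields = []
        for raw_field in raw_record.split(US):
            # Keep only the text after the last CAN in this field.
            fields.append(raw_field.rsplit(CAN, 1)[-1])
        records.append(fields)
    return records


# Hypothetical sample: the second field of the first record was
# mistyped as "typo", cancelled, and retyped as "admin".
sample = "alice" + US + "typo" + CAN + "admin" + RS + "bob" + US + "user" + RS
print(parse_control_format(sample))
```

The parsing itself is easy. The hard part is reading and typing the
raw control characters in an editor.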
I tried the format used by the gopher lawn. Like my YAML files, the
gopher lawn database has one file per record and one line per field,
presented in a vertical format. The record separator is an empty
line. The field separator is the EOL (end of the line). Each field
has a name, a colon character, a space, and optionally a value. Any
line that begins with whitespace is a line continuation from the
previous field.
Gopher Lawn database
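
The rules above can be sketched in a few lines of Python. This is
an illustration only, not the gopher lawn's own tooling. The name
parse_lawn is hypothetical, a continuation is assumed to be joined
to the previous text with a single space, and a line containing
only whitespace is assumed to end the record.

```python
def parse_lawn(text):
    """Parse lawn-style text into a list of {field name: value} dicts."""
    records = []
    current = {}
    last_name = None
    for line in text.splitlines():
        if line.strip() == "":
            # An empty line ends the current record.
            if current:
                records.append(current)
            current = {}
            last_name = None
        elif line[0] in " \t":
            # Leading whitespace: continuation of the previous field.
            if last_name is not None:
                joined = current[last_name] + " " + line.strip()
                current[last_name] = joined.strip()
        else:
            # "name: value", where the value may be absent.
            name, _, value = line.partition(":")
            last_name = name
            current[name] = value.strip()
    if current:
        records.append(current)
    return records


# Hypothetical sample data with two records.
sample = (
    "name: Example\n"
    "note: First sentence.\n"
    "  Second sentence.\n"
    "\n"
    "name: Other\n"
    "note:\n"
)
for record in parse_lawn(sample):
    print(record)
```

Splitting on the first colon only means a value may itself contain
colons, as in a URL.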
This format is trivial to process in AWK. No special parser
required. I converted my private database to this format. Exporting
the whole database to CSV with AWK took 0.3 seconds, compared to 3
seconds for the Tcl & YAML version.
I added one feature: inline blocks of multi-line text. The block
format is the same as the line continuation format, except the initial
value is a backslash character.
For example, here is a line continuation.
fieldname: First sentence.
  Second sentence.
  Third sentence.
When this value is read, the EOL and indentation are removed.
That's why this can also be represented without continuation.
fieldname: First sentence. Second sentence. Third sentence.
Here is an inline block of multi-line text.
fieldname: \
  Line 1 of 3.
  Line 2 of 3.
  Line 3 of 3.
When this value is read, the indentation is removed, but the
EOL is preserved. The value contains multiple lines.
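
The difference between the two forms can be sketched as a small
Python function. This is an illustration, not the author's AWK
code. The name decode_field is hypothetical. It assumes the
physical lines of one field have already been collected, that
folded lines are joined with a single space, and that all leading
whitespace counts as indentation.

```python
def decode_field(lines):
    """Decode the physical lines of one field into (name, value)."""
    name, _, first = lines[0].partition(":")
    first = first.strip()
    if first == "\\":
        # Inline block: drop the backslash marker and the
        # indentation, but keep the line breaks.
        return name, "\n".join(line.lstrip() for line in lines[1:])
    # Line continuation: fold everything onto one line.
    rest = [line.strip() for line in lines[1:]]
    return name, " ".join([first] + rest).strip()


folded = decode_field([
    "fieldname: First sentence.",
    "  Second sentence.",
    "  Third sentence.",
])
block = decode_field([
    "fieldname: \\",
    "  Line 1 of 3.",
    "  Line 2 of 3.",
    "  Line 3 of 3.",
])
print(folded)
print(block)
```

The backslash is the only thing that distinguishes the two cases,
so a folded value that consists of a lone backslash on its first
line cannot be represented in this sketch.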
Time for me to shut up and show them the code. Below are two
small AWK scripts to convert from gopher lawn format to TSV
and back.
lawn2tsv.awk
tsv2lawn.awk
p.s.
In theory, if i wanted to migrate the data, i could use uncsv to
convert between TSV and CSV. GNU recutils can import and export
CSV.
I was told that the gopher lawn format resembles the Header Fields
format in email standards. See section 2.2 of RFC 5322.
gopher://gopher.32kb.net/0/rfc/rfc5322.txt
The vCard format is also similar. See section 6.10 of RFC 6350 for
Extended Properties and Parameters. I could have abused this format
but i think it is too complex for my purposes.
gopher://gopher.32kb.net/0/rfc/rfc6350.txt
tags: bencollver,technical,unix