# 2024-11-02 - No YAML, No Recutils
Back around 2011 i wrote a private database, and at some point
migrated it to a file-based format. Each record is a YAML file.
The data is presented in a vertical format with one line per field,
plus some multi-line blocks defined by indentation. Trivial to
edit in any text editor. I used Tcl and the yaml module from tcllib
to process the data.
Recently i found the No YAML web site and i decided it was time
for a change.
No YAML
I briefly considered CSV, TSV, and JSON. They are all mature,
standardized formats, but in my opinion they fall short when it
comes to editing in a vertical format in a plain text editor.
I looked at GNU Recutils, since several folks wrote about it in their
phlogs. Like my YAML files, recutils presents the data in a vertical
format with one line per field, plus it can do multi-line blocks via
line continuations. The format is fine for my purposes.
One of my requirements is that i want this to work on FreeDOS too.
Recutils requires filesystem support for ACL, which is too fancy for
DOS. The format is simple enough, but the source code is
surprisingly complex. It does a fraction of what sqlite3 does, and
in a less portable, less robust way.
I tried making my own format based on ASCII control codes. I could
use Control-^ (the RS character) as the record separator, Control-_
(the US character) as the unit separator AKA the field separator, and
Control-X (the CAN or Cancel character) to discard all text since the
beginning of the field. This format works in ed(1) and the calvin vi
clone on DOS. However, the control characters are a little ridiculous
to look at and to type. I did not want to foist such an eyesore on
my future self.
csvtofsv also uses ASCII control codes as delimiters
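To make the control-code idea concrete, here is a minimal Python
sketch of a reader for such a format. It is only an illustration, not
the format i actually tried: the function names are hypothetical, and
it assumes that CAN keeps only the text typed after the last Control-X
in a field and that newlines get no special treatment.

```python
RS = "\x1e"   # Control-^ : record separator
US = "\x1f"   # Control-_ : unit (field) separator
CAN = "\x18"  # Control-X : cancel, discards the field text so far


def parse_field(raw):
    """Apply CAN: keep only the text after the last CAN in the field."""
    return raw.rsplit(CAN, 1)[-1]


def parse_records(data):
    """Split a string into records, each a list of field values."""
    records = []
    for chunk in data.split(RS):
        if chunk == "":
            continue  # skip the empty piece after a trailing RS
        records.append([parse_field(f) for f in chunk.split(US)])
    return records


# Two records; the second field of the first record was retyped.
sample = "alice" + US + "typo" + CAN + "gopher" + RS + "bob" + US + "web" + RS
print(parse_records(sample))  # [['alice', 'gopher'], ['bob', 'web']]
```

The parsing is as small as it gets, which is the appeal. The cost is
on the human side, since none of these bytes are pleasant to read or
enter in an editor.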
I tried the format used by the gopher lawn. Like my YAML files, the
gopher lawn database has one file per record and one line per field,
presented in a vertical format. The record separator is an empty
line. The field separator is the EOL (end of the line). Each field
has a name, a colon character, a space, and optionally a value. Any
line that begins with whitespace is a line continuation from the
previous field.
Gopher Lawn database
This format is trivial to process in AWK. No special parser
required. I converted my private database to this format. Exporting
the whole AWK database to CSV took 0.3 seconds, compared to 3 seconds
in the Tcl & YAML version.
I added one feature: inline blocks of multi-line text. The block
format is the same as the line continuation format, except the initial
value is a backslash character.
For example, here is a line continuation.
    fieldname: First sentence.
      Second sentence.
      Third sentence.
When this value is read, the EOL and indentation are removed.
That's why this can also be represented without continuation.
    fieldname: First sentence. Second sentence. Third sentence.
Here is an inline block of multi-line text.
    fieldname: \
      Line 1 of 3.
      Line 2 of 3.
      Line 3 of 3.
When this value is read, the indentation is removed, but the
EOL is preserved. The value contains multiple lines.
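The reading rules above can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not a port of the
AWK scripts: parse_lawn and the field names "summary" and "notes" are
hypothetical, continuation lines are joined with a single space (as
the one-line example suggests), all leading whitespace counts as
indentation, and a repeated field name within a record overwrites the
earlier value.

```python
def parse_lawn(text):
    """Parse lawn-style text into a list of records (name -> value)."""
    records = []
    record = {}
    name = None    # field currently being read
    block = False  # True while the current field is an inline block
    for line in text.split("\n"):
        if line.strip() == "":
            # An empty line separates records.
            if record:
                records.append(record)
            record, name, block = {}, None, False
        elif line[0] in " \t":
            if name is None:
                continue  # stray continuation; skipped in this sketch
            part = line.strip()
            if block:
                # Inline block: indentation removed, EOL preserved.
                record[name] += ("\n" if record[name] else "") + part
            else:
                # Continuation: EOL and indentation removed.
                record[name] += (" " if record[name] else "") + part
        else:
            # "name: value", where the value is optional.
            name, _, value = line.partition(":")
            value = value.strip()
            block = (value == "\\")  # a lone backslash opens a block
            record[name] = "" if block else value
    if record:
        records.append(record)
    return records


sample = (
    "summary: First sentence.\n"
    "  Second sentence.\n"
    "  Third sentence.\n"
    "notes: \\\n"
    "  Line 1 of 3.\n"
    "  Line 2 of 3.\n"
    "  Line 3 of 3.\n"
    "\n"
    "summary: Another record\n"
)
for rec in parse_lawn(sample):
    print(rec)
```

The only state carried between lines is the current field name and
whether it is a block. That is why the format needs no special
parser.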
Time for me to shut up and show them the code. Below are two
small AWK scripts to convert from gopher lawn format to TSV
and back.
lawn2tsv.awk
tsv2lawn.awk
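For readers who do not want to open the scripts, here is a rough
Python sketch of the round trip between records and TSV. It is not a
translation of lawn2tsv.awk or tsv2lawn.awk. The function names are
hypothetical, and the escaping of tab, newline, and backslash as \t,
\n, and \\ is an assumption made for this example so that multi-line
values fit in one TSV cell.

```python
def escape(value):
    """Make a value safe for one TSV cell (assumed convention)."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))


def unescape(cell):
    """Reverse escape(), reading one backslash pair at a time."""
    out = []
    i = 0
    while i < len(cell):
        if cell[i] == "\\" and i + 1 < len(cell):
            nxt = cell[i + 1]
            out.append({"t": "\t", "n": "\n"}.get(nxt, nxt))
            i += 2
        else:
            out.append(cell[i])
            i += 1
    return "".join(out)


def records_to_tsv(records):
    """Write a header row of all field names, then one row per record."""
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    lines = ["\t".join(columns)]
    for rec in records:
        lines.append("\t".join(escape(rec.get(n, "")) for n in columns))
    return "\n".join(lines) + "\n"


def tsv_to_records(tsv):
    """Read TSV back into records, dropping empty cells."""
    lines = tsv.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # text ended with a newline
    columns = lines[0].split("\t")
    records = []
    for line in lines[1:]:
        rec = {}
        for name, cell in zip(columns, line.split("\t")):
            if cell != "":
                rec[name] = unescape(cell)
        records.append(rec)
    return records


records = [{"title": "A", "notes": "x\ny\tz"}, {"title": "B"}]
tsv = records_to_tsv(records)
print(tsv, end="")
print(tsv_to_records(tsv) == records)  # True
```

One consequence of this sketch is that a field with an empty value
and a missing field look the same after a round trip.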
p.s.
In theory, if i wanted to migrate the data, i could use uncsv to
convert between TSV and CSV. GNU recutils can import and export
CSV.
I was told that the gopher lawn format resembles the Header Fields
format in email standards. See section 2.2 of RFC 5322.
gopher://gopher.32kb.net/0/rfc/rfc5322.txt
The VCARD format is also similar. See section 6.10 of RFC 6350 for
Extended Properties and Parameters. I could have abused this format
but i think it is too complex for my purposes.
gopher://gopher.32kb.net/0/rfc/rfc6350.txt
tags: bencollver,technical,unix