| # 2024-11-02 - No YAML, No Recutils | |
| Back around 2011 i wrote a private database, and at some point | |
| migrated it to a file based format. Each record is a YAML file. | |
| The data is presented in a vertical format with one line per field, | |
| plus some multi-line blocks defined by indentation. Trivial to | |
| edit in any text editor. I used Tcl and the yaml module from tcllib | |
| to process the data. | |
| Recently i found the No YAML web site and i decided it was time | |
| for a change. | |
| No YAML | |
| I briefly considered CSV, TSV, and JSON. They are all mature, | |
| standardized formats, but in my opinion they fall short when it | |
| comes to editing in a vertical format in a plain text editor. | |
| I looked at GNU Recutils, since several folks wrote about it in their | |
| phlogs. Like my YAML files, recutils presents the data in a vertical | |
| format with one line per field, plus it can do multi-line blocks via | |
| line continuations. The format is fine for my purposes. | |
| One of my requirements is that i want this to work on FreeDOS too. | |
| Recutils requires filesystem support for ACL, which is too fancy for | |
| DOS. The format is simple enough, but the source code is | |
| surprisingly complex. It does a fraction of what sqlite3 does, and | |
| in a less portable, less robust way. | |
| I tried making my own format based on ASCII control codes. I could | |
| use Control-^ (the RS character) as the record separator, Control-_ | |
| (the US character) as the unit separator AKA the field separator, and | |
| Control-X (the CAN or Cancel character) to discard all text since the | |
| beginning of the field. This format works in ed(1) and the calvin vi | |
| clone on DOS. However, the control characters are a little ridiculous | |
| to look at and type in. I did not want to foist such an eye-sore on | |
| my future self. | |
| csvtofsv also uses ASCII control codes as delimiters | |
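| As a rough illustration of that control-code idea, here is a small | |
| Python sketch. It is not the code that was actually used. The | |
| function names are made up, and it assumes that a trailing record | |
| separator does not start an extra empty record and that no | |
| whitespace is trimmed from the fields. | |

```python
RS = "\x1e"   # Control-^ : record separator
US = "\x1f"   # Control-_ : unit (field) separator
CAN = "\x18"  # Control-X : cancel, discards the field text typed so far


def parse_fields(record):
    """Split one record into fields, honoring CAN inside each field."""
    fields = []
    for raw in record.split(US):
        # Everything up to and including the last CAN is discarded,
        # so only the text after it survives.
        fields.append(raw.rsplit(CAN, 1)[-1])
    return fields


def parse_records(text):
    """Split text into records; each record is a list of field strings."""
    # Assumption: empty chunks (e.g. after a trailing RS) are skipped.
    return [parse_fields(chunk) for chunk in text.split(RS) if chunk != ""]
```

| For example, parse_records("bob" + US + "typo" + CAN + "42" + RS) | |
| gives one record whose second field is just "42". | |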
| I tried another format based on something i found online. Like my | |
| YAML files, this format has one file per record and one line per | |
| field, presented in a vertical format. The record separator is an | |
| empty line. The field separator is the EOL (end of the line). Each | |
| field has a name, a colon character, a space, and optionally a value. | |
| Any line that begins with whitespace is a line continuation from the | |
| previous field. | |
| This format is trivial to process in AWK. No special parser | |
| required. I converted my private database to this format. Exporting | |
| the whole AWK database to CSV took 0.3 seconds, compared to 3 seconds | |
| in the Tcl & YAML version. | |
| I added one feature: inline blocks of multi-line text. The block | |
| format is the same as the line continuation format, except the initial | |
| value is a backslash character. | |
| For example, here is a line continuation. | |
| fieldname: First sentence. | |
|   Second sentence. | |
|   Third sentence. | |
| When this value is read, the EOL and indentation are removed. | |
| That's why this can also be represented without continuation. | |
| fieldname: First sentence. Second sentence. Third sentence. | |
| Here is an inline block of multi-line text. | |
| fieldname: \ | |
|   Line 1 of 3. | |
|   Line 2 of 3. | |
|   Line 3 of 3. | |
| When this value is read, the indentation is removed, but the | |
| EOL is preserved. The value contains multiple lines. | |
| Time for me to shut up and show them the code. Below are two | |
| small AWK scripts to convert from gopher lawn format to TSV | |
| and back. | |
| lawn2tsv.awk | |
| tsv2lawn.awk | |
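| The reading rules described above (empty line between records, | |
| leading whitespace for continuations, backslash for blocks) can be | |
| sketched in Python roughly as follows. This is an illustration, not | |
| a port of the AWK scripts. The name parse_lawn is hypothetical, and | |
| three details are assumptions: continuation lines are joined with a | |
| single space, a whitespace-only line counts as an empty line, and a | |
| repeated field name overwrites the earlier one. | |

```python
def parse_lawn(text):
    """Parse text into a list of records (dicts of field name -> value)."""
    records = []
    fields = []  # entries of [name, is_block, parts] for the current record

    def flush():
        # Close the current record, if it has any fields.
        if fields:
            records.append({
                # Blocks keep their line breaks; plain continuations are
                # joined with a single space (an assumption).
                name: ("\n" if is_block else " ").join(parts)
                for name, is_block, parts in fields
            })
            fields.clear()

    for line in text.splitlines():
        if not line.strip():
            # Empty (or whitespace-only) line: record separator.
            flush()
        elif line[0] in " \t":
            # Leading whitespace: continuation of the previous field.
            # A stray continuation with no field before it is ignored.
            if fields:
                fields[-1][2].append(line.strip())
        else:
            # "name: value", where the value is optional.
            name, _, value = line.partition(":")
            value = value.strip()
            is_block = value == "\\"
            parts = [] if is_block or not value else [value]
            fields.append([name, is_block, parts])

    flush()  # the last record may not be followed by an empty line
    return records
```

| Fed the two fieldname examples from earlier, separated by an empty | |
| line, this returns two records: one with a single-line value and | |
| one whose value contains three lines. | |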
| p.s. | |
| In theory, if i wanted to migrate the data, i could use uncsv to | |
| convert between TSV and CSV. GNU recutils can import and export | |
| CSV. | |
| I was told that the gopher lawn format resembles the Header Fields | |
| format in email standards. See section 2.2 of RFC 5322. | |
| gopher://gopher.32kb.net/0/rfc/rfc5322.txt | |
| The VCARD format is also similar. See section 6.10 of RFC 6350 for | |
| Extended Properties and Parameters. I could have abused this format | |
| but i think it is too complex for my purposes. | |
| gopher://gopher.32kb.net/0/rfc/rfc6350.txt | |
| tags: bencollver,technical,unix | |