* * * * *

                             Adventures in Utext

> There is one point on the ASCII (American Standard Code for Information
> Interchange) ↔︎ JS (JavaScript) spectrum that I haven’t seen, and it’s one
> that, as I use Unicode in more complex ways on Gwern.net and have learned
> how many obscure features or characters Unicode has, I increasingly think
> has been neglected: only UTF (Unicode Transformation Format)-8 text
> rendered by a monospace font. Not ASCII, not a weird subset of SGML
> (Standard Generalized Markup Language), not troff, not raw terminal codes,
> not bitmaps encoded in ASCII—just UTF-8. This document format does only
> what pure Unicode text can do—but does everything that pure Unicode can do,
> which turns out to be a lot. What if we take Unicode literally, but not
> seriously?
>
> Your typical plain text output strips all formatting. At the most
> ambitious, it might have a Unicode superscript or fraction. But we can do
> so much more!
>

“Utext: Rich Unicode Documents · Gwern.net [1]”

That was an interesting read (your mileage may vary).

To generate the gopher and Gemini versions of my blog, I parse the HTML
(HyperText Markup Language) [2] and generate either plain text (for gopher)
or Gemtext for Gemini. And I'm still not entirely happy with the output. For
emphasized text, I would translate that to “*emphasized*”, which is … okay, I
guess? And for [DELETED-deleted-DELETED] text, which was harder to deal with,
I ended up with “[DELETED-deleted-DELETED]” text.

There's no excuse for that.

But after reading about Utext, and Unicode's COMBINING SHORT STROKE OVERLAY
[3] and COMBINING LOW LINE [4], I thought I might try using those for some
typographical niceties that you don't normally get with plain text. And
that's when I learned that not all virtual terminals support all of Unicode
all that well. And wrapping text is … not that trivial anymore [5].
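
To make the idea concrete, here is a minimal sketch in Python. It is not the
code that generates this blog; the function names (overlay, display_width,
wrap) are made up for illustration. It assumes the combining mark is placed
after each visible character and that spaces are left bare, which is one
possible choice. The wrapping is a simple greedy word wrap that counts
columns instead of code points; it is not an implementation of the Unicode
line breaking algorithm [5].

```python
import unicodedata

STRIKE = "\u0335"     # COMBINING SHORT STROKE OVERLAY
UNDERLINE = "\u0332"  # COMBINING LOW LINE


def is_mark(ch):
    """True for nonspacing and enclosing combining marks."""
    return unicodedata.category(ch) in ("Mn", "Me")


def overlay(text, mark):
    """Append `mark` after every character that is not a space or a mark."""
    out = []
    for ch in text:
        out.append(ch)
        if not ch.isspace() and not is_mark(ch):
            out.append(mark)
    return "".join(out)


def display_width(text):
    """Rough column count: marks take 0 columns, wide CJK characters take 2."""
    width = 0
    for ch in text:
        if is_mark(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def wrap(text, columns):
    """Greedy word wrap that measures display columns, not code points."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and display_width(candidate) > columns:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    struck = overlay("one two three", STRIKE)
    print(len(struck), display_width(struck))
    for line in wrap(struck, 7):
        print(line)
```

The struck-through string holds nearly twice as many code points as it
occupies columns, which is why a wrapper that measures length in code points
breaks lines far too early. Whether the result is readable still depends on
the terminal and font.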

Ah well. For now, it seems to be working, but it remains to be seen if I like
the results.

Update on Friday, December 8th, 2023

I reverted this change due to issues [6].


[1] https://gwern.net/utext
[2] gopher://gopher.conman.org/0Phlog:2021/12/06.2
[3] https://en.wikipedia.org/wiki/Strikethrough#Unicode
[4] https://en.wikipedia.org/wiki/Underscore#Unicode
[5] https://www.unicode.org/reports/tr14/
[6] gopher://gopher.conman.org/0Phlog:2023/12/08.1

Email author at [email protected]