VF-1 updates and tips
---------------------

I'm very happy to have had both the time and motivation to get quite a
bit of good progress made on my gopher client VF-1 recently. This
post will mostly be an update on some of the new functionality. But
first, some usage tips! I noticed recently that tfurrows has been
keeping a list of tips[1] on his circumlunar gopherhole (where he
explores the bookmarking functionality more deeply than I ever have!),
and thought I might contribute a few of my own.

First of all, an embarrassing revelation - when I recently wrote my
allegedly definitive guide[2] to viewing "long stuff", I forgot about
one option for handling long menus! There are so many options even I
forget them. That said, this one is not one of my favourites; it was
an early hack to help people who were struggling in earlier days
before "less" worked on menus. When you use "ls" to list the current
menu selectors, you can give it the "-r" option (i.e. "ls -r") to view
the listing in the reverse of the usual order. Thus, items at the top
which would normally go flying off the top of your screen appear at
the bottom where you can always see them. The obvious downside, of
course, is that stuff is backward. I don't really recommend this
approach, but thought I'd mention it for completeness.

On to something a little more useful! You are probably already aware
that VF-1 lets you set any external command you like as a handler for
different kinds of content. By default, the handler for "text/plain"
is just good old "cat", which does nothing other than spit the text
onto your screen. If it overflows, you can run the "less" command to
look at it in your favourite pager. An alternative to this is to use
less as your text/plain handler, but feed it a few more options. For
the past few days I have been using "less -FXR %s" as my default plain
text handler. The -F option tells less to quit immediately if the
file is short enough to fit entirely on one screen, and the -X
option tells it not to clear the screen after exiting (as is the
default behaviour of more). What this does is basically turn less
into an automatic "cat if short, less if long" viewer. The -R is
just there so that ANSI colour codes don't get mangled (more on that
later). This means stuff never flies off the top of your screen, and
you never have to manually run less to read the top of something.
This results in a pretty seamless experience and I think I'll stick
with it.
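
For the curious, here is a minimal Python sketch of how a handler
template like the one above could be turned into a command. It is an
illustration only, not VF-1's actual code: the function name is made
up, and it assumes "%s" stands for the path of a temporary file
holding the fetched content.

```python
import shlex

def build_handler_argv(template, path):
    """Turn a handler template such as 'less -FXR %s' into an argv list.

    Hypothetical helper: '%s' is assumed to mark where the file path goes.
    """
    # Split the template first and substitute afterwards, so that a path
    # containing spaces still ends up as one single argument.
    return [path if part == "%s" else part for part in shlex.split(template)]

argv = build_handler_argv("less -FXR %s", "/tmp/some file.txt")
# argv can now be passed to subprocess.run(argv) to display the file.
```

Splitting before substituting avoids any need to quote the path.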

Okay, time for new features.

Starting with something very minor, the "text/plain" handler is now
used for both item types 0 and 1, whereas previously it only worked
for type 0. This change was inspired by Tomasino who, when learning
about handlers, immediately set his to lolcat - something I'd never
heard of. I encourage you to check it out, even if only briefly for,
well, the lols. Basically you can pipe text through it and it uses
ANSI colour codes to render that text into a GLORIOUS RAINBOW. We're
talking hundreds of colours, each character slightly different from
its neighbours. Tomasino was disappointed that this worked on
content but not on menus (which in his case means his entire phlog),
so the handler now applies to menus too and you can enjoy ubiquitous
rainbows in gopherspace. Tomasino was *also* disappointed that the
colours disappeared when he used the "less" command, because until
now that command ignored the text/plain handler and just fed the
content straight to less. Now, the "less" command runs your
text/plain handler and pipes the output of that to less (or rather,
less -R, to preserve colours), so you can get colours even when you
are lessing!
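
The "handler output piped into less -R" idea can be sketched in
Python roughly as follows. This is a sketch under assumptions, not
the code VF-1 uses: the function name and its arguments are
hypothetical, and the handler is assumed to write to standard output.

```python
import subprocess

def page_through_handler(handler_argv, pager_argv=("less", "-R")):
    """Run the handler and pipe its standard output into the pager.

    Returns the pager's exit status.
    """
    handler = subprocess.Popen(handler_argv, stdout=subprocess.PIPE)
    pager = subprocess.Popen(list(pager_argv), stdin=handler.stdout)
    # Close our copy of the pipe so the handler is notified if the
    # pager exits before reading everything.
    handler.stdout.close()
    pager.wait()
    handler.wait()
    return pager.returncode

# Example (not run here): page_through_handler(["lolcat", "/tmp/menu.txt"])
```

Note that some colourising tools switch colours off when their output
is not a terminal, so the handler may need an option to force them on.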

On to more fundamental changes. Tomasino has once again spurred me to
make some improvements, in his recent championing of better support
for the "+" item type which is used to specify redundant servers, i.e.
gopher servers which host a mirror of the content at the current
server. The RFC is pretty vague about exactly how this is supposed
to work. Most modern clients take a very minimal approach to
supporting this, and just list the mirror items like they would any
other link but do something minor to indicate "hey, this is a
mirror". I think the intent was probably for clients to do a bit
more with this. The RFC has various comments in it which make it
pretty clear (to me at least) that the target environment for gopher
was under-resourced university departments setting up servers on
whatever old and under-powered hardware they had lying around, and
spreading information over as many servers as possible to reduce
load. Early gopher servers were probably expected to fail regularly.
So VF-1 tries to handle + items in such a way as to reduce the pain
of dead servers. After seeing that content at server A is mirrored at
server B, if an attempt to fetch something from server A later
during the same VF-1 session results in any kind of network error,
VF-1 will automatically try to fetch the content from server B
instead. The usefulness of this in 2019 is arguably limited - for
one thing, modern gopher servers are probably extremely powerful and
extremely under-loaded compared to early servers, and for another
there is no caching of redundant servers, so if the "main" server
you attempt to visit is down, you have no way to learn what the
backups are. It's not perfect, but it's better than nothing, and
I'm proud that VF-1 actually makes an effort to *do* something with
this information.
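
The fallback behaviour described above can be sketched in a few
lines of Python. This is only an illustration of the idea, not
VF-1's real code: the class, its methods and the fetch_one callable
are all hypothetical names, and "any kind of network error" is
assumed to surface as an OSError (which covers refused connections,
timeouts and DNS failures in Python).

```python
class MirrorTable:
    """Remember mirrors seen during this session and retry through them."""

    def __init__(self):
        # Maps (host, port) of a server to a list of its known mirrors.
        self._mirrors = {}

    def note_mirror(self, origin, mirror):
        """Record that `mirror` hosts a copy of the content at `origin`."""
        known = self._mirrors.setdefault(origin, [])
        if mirror not in known:
            known.append(mirror)

    def fetch(self, origin, selector, fetch_one):
        """Try the origin first, then each mirror, on network errors.

        `fetch_one(server, selector)` is a caller-supplied function
        that does the actual network request.
        """
        last_error = None
        for server in [origin] + self._mirrors.get(origin, []):
            try:
                return fetch_one(server, selector)
            except OSError as err:
                last_error = err
        # Every candidate failed, so report the most recent failure.
        raise last_error
```

Because the table lives only in memory, it starts out empty in every
session, which is exactly the caching limitation mentioned above.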

Speaking of being proud, the other significant changes are related
to text decoding, and I suspect VF-1 might now be the best gopher
client in town for people who regularly visit content encoded in a
variety of non-UTF-8 forms. Tomasino had nothing to do with this
change, which was instead prompted by the latest user at
circumlunar.space, tengu[3], who had some initial problems serving
Russian text from his gopherhole there, whether using UTF-8 or
older Cyrillic encodings like KOI8-R or CP1251. With some digging,
it turned out that this was mostly the fault of Gophernicus, but
VF-1 could stand some improvement too.

In the earliest versions of VF-1, I assumed that all text coming
over the wire would be either ASCII or UTF-8 (which decode
identically) and left it at that. This worked fine for about a
week until someone on BBOARD reported that VF-1 died when trying
to read some news article over at floodgap's feeds. It turned out
that the article contained a name with an accented character in it,
which was encoded in ISO-8859-1. So, I did a bit of research and
learned that the 3 most commonly used encodings on the web are, in
order, UTF-8, ISO-8859-1 and CP1251. So, I updated VF-1 to try
these, in order, moving down the list each time one failed.

If you know anything about text encoding you'll recognise how
naive this was. Any text which is valid CP1251 is also valid
ISO-8859-1, so an attempt to decode as ISO-8859-1 will never fail.
It may result in gibberish, but it won't throw an exception, and
so CP1251 text will never be decoded properly. So, now VF-1
attempts to decode everything as UTF-8 first and, if that fails,
tries a single fallback encoding. That fallback defaults to
ISO-8859-1, but it is under direct and easy user control using the
"set" command, so you can do "set encoding cp1251" to change the
fallback. If you regularly deal with just one non-UTF-8 encoding,
you can of course stick this in your ~/.vf1rc file to make it
permanent.
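
Both the problem and the fix are easy to demonstrate in Python. The
sketch below is an illustration rather than VF-1's actual code; the
function name is made up, and the fallback decode is left strict, so
bytes that are invalid in the fallback encoding will still raise.

```python
def decode_with_fallback(data, fallback="iso-8859-1"):
    """Decode as UTF-8 first, then try one user-chosen fallback."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)

# Russian text encoded as CP1251 is not valid UTF-8...
raw = "Привет".encode("cp1251")

# ...but ISO-8859-1 accepts any byte, so this never raises. It just
# quietly produces the wrong characters.
mojibake = raw.decode("iso-8859-1")

# With the fallback set to cp1251 the text comes out intact.
text = decode_with_fallback(raw, fallback="cp1251")
```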

But wait, there's more. There is a very nice Python library
called chardet which attempts to automatically detect text
encodings making use of language statistics. You can decode
CP1251 as if it were ISO-8859-1 no problem, but you'll end up with
gibberish text whose n-gram distribution won't match any natural
language. Chardet uses this fact to guess encodings and in my
limited testing it seems to work quite well. Now, I am very proud
of the fact that VF-1 has no dependencies outside the Python
standard library and that all of the code is in one single file.
All of this makes it extremely easy to install, even in weird
environments where modern tools like pip are not available. I
don't ever want to change this, so VF-1 does *not* depend on
chardet. But, if you install it yourself, VF-1 will recognise that
it's there and adopt the alternative strategy of autodetecting
the encoding if UTF-8 fails, and will drop back to the
user-specified encoding only if chardet fails to identify an
encoding with confidence above 0.5. With chardet installed, I
was able to use VF-1 to cruise around some Russian gopher sites
tengu linked me to, and whether I encountered UTF-8, KOI8-R or
CP1251 encoding, it all Just Worked, which was tremendously
satisfying. VF-1+chardet seems bullet-proofly international,
which is fantastic.
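
The "use chardet if it happens to be installed" strategy can be
sketched like this. Again, this is an illustration and not VF-1's
actual code: the function name is hypothetical, and the detect
parameter is an extra added here only so the logic can be tried out
without chardet. The one real chardet call used is chardet.detect(),
which returns a dict with "encoding" and "confidence" keys.

```python
try:
    import chardet  # optional third-party library
except ImportError:
    chardet = None

def decode_smart(data, fallback="iso-8859-1", detect=None, threshold=0.5):
    """UTF-8 first, then a confident autodetected guess, then the fallback."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Use the real detector when chardet is importable and no
    # stand-in detector was supplied.
    if detect is None and chardet is not None:
        detect = chardet.detect
    if detect is not None:
        guess = detect(data)
        encoding = guess.get("encoding")
        confidence = guess.get("confidence") or 0.0
        if encoding and confidence > threshold:
            try:
                return data.decode(encoding)
            except (LookupError, UnicodeDecodeError):
                pass  # unusable guess, fall through to the fallback
    return data.decode(fallback)
```

The optional import at the top is what keeps chardet from becoming a
hard dependency: without it the function simply behaves like the
plain "UTF-8, then fallback" approach.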

As an aside, I was amused to note that the chardet FAQ[4] has the
following entry:

> Yippie! Screw the standards, I'll just auto-detect everything!

> Don't do that. Virtually every format and protocol contains
> a method for specifying character encoding.

The FAQ goes on to talk about HTTP, HTML, XML, etc. Out here on
the plain text frontier, of course, there ain't any such thing
(well, maybe the /caps.txt hack does something about this,
hmm...), so I don't feel bad at all about auto-detecting
everything. It's pretty much the only choice we have.

For the record, this is *not* something that I think is worth
extending gopher to work around. There is a much simpler
and nicer solution, which is simply to use UTF-8 for absolutely
all new content in gopherspace, so that there is no *need* to
explicitly specify the character encoding.

That's all that's new, aside from some tiny tidy ups and fixes.
There are a few other small things I'd like to tackle, but it's
starting to feel pretty complete for me.

[1] gopher://zaibatsu.circumlunar.space:70/1/~tfurrows/tips/
[2] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/looking-at-long-stuff-with-vf1.txt
[3] gopher://zaibatsu.circumlunar.space:70/1/~tengu/
[4] https://chardet.readthedocs.io/en/latest/faq.html