The individual archivist, and ghosts of Gophers past
----------------------------------------------------

Foreword: This post has been a *long* time coming. The ball was set
rolling by kvothe's departure from the phlogosphere in late July. New
ideas on the matter popped into my head more recently, prompting me to
finish it at last. So, it's not exactly fresh with regard to its
specific motivating example, but the issues are no less relevant. My
standard hyper-verbosity disclaimer applies!

The Zaibatsu has had, from very early days, a policy which allows
sundogs to request that their account be removed and all their content
immediately and permanently deleted. This is called "claiming your
civil right", which is part of the Schismatrix theme. The Orientation
Guide explains:

> This promise is not a gimmick to tie into the Schismatrix theme. It
> is a recognition that the ability to delete your accounts from
> online services is an important part of self-ownership of your
> digital identity. This is genuinely an important freedom and one
> which many modern online services do not offer, or deliberately make
> very difficult to access.

I have always been, and still am, proud that the Zaibatsu offers this
right so explicitly and unconditionally, and I have no plans to change
it. I really think this is an important thing.

And yet, it always breaks my heart a little when somebody actually
claims their right, and it's especially tough when a large amount of
high-quality gopherspace content disappears with them. As several
people phlogged about noticing, kvothe recently chose to leave
gopherspace, taking with him his wonderful, long-running and
Bongusta-aggregated phlog "The Dialtone", which he had migrated from
SDF to the Zaibatsu. I loved having kvothe as part of our community,
but of course fully respect his right to move on.

As I deleted his home directory, I thought to myself "Man, I wish
there was an archive.org equivalent for Gopherspace, so that this
great phlog wasn't lost forever". A minute later I thought "Wait...
that is *totally* inconsistent with the entire civil right
philosophy!". Ever since, I've been trying to reconcile these
conflicting feelings and figure out what I *actually* believe.

Far from objecting to archive.org's activities on the web, I've come
to think of it as a valuable public service. I suppose I tend to
assume - and I have no data on how warranted this assumption is - that
most of the webpages that I am grateful to find have been preserved by
archive.org have disappeared from their original homes on the internet
not through the deliberate will of the authors, but due to various
unintentional processes of digital decay: commercial web hosts go out
of business, people lose their access to webspace provided by an ISP
or university, people lose interest in a website and stop paying to
have it hosted without necessarily actively wanting it gone, or people
die and nobody they leave behind knows how to keep the site alive, or
perhaps even knows that the site exists! It seems clear to me that
there is no harm in publicly archiving pages which disappear in this
manner. Often the information preserved by doing so is of great
practical value, or historical interest, or both.

In the case of pages which *were* deliberately removed by their
author, things seem to get murkier. How does one balance the right of
the author to control the lifespan of their own work against the
various "greater goods" which are served by having stuff stick around
forever? It's worth noting that the possibility of "unpublishing"
something is a relatively recent development. There has never been a
way to unpublish books, songs or films after warehouses full of
physical books, tapes, discs or whatever have been manufactured. Because
of this I suspect there is an unusual lack of existing experience or
careful thought about the question. Now that we *can* unpublish
things, is it wrong to take away people's option to do so?

You might think that, having instituted the civil right policy at the
Zaibatsu, I've taken a strong stance on this. Actually, my decision
to put that policy in place was driven by my frustration at being
unable to delete accounts on websites. Often, that frustration is
born not from wanting to unpublish public material (which even
sites with no way to delete accounts will often let you do) but from
wanting to get myself out of the site's database, so my email address,
private messages, login times and IP addresses, browser fingerprints,
etc. aren't sitting around waiting to be sold to or stolen by
marketers, spammers or other ne'er-do-wells. I've never actually
given much deep thought to the question of the right to unpublish.

It seems that with regard to the web, at least, this philosophical
question has more or less been bulldozed by the sheer technical
possibility of something like archive.org existing - in much the same
way that a lot of questions surrounding the copyright of music were,
for a lot of people, bulldozed by the possibility of P2P filesharing.
At least within geek circles, archive.org is so well-known that it is
widely understood and generally accepted that an unavoidable part of
the act of publishing something online is that it may well be around
forever. Whether we like this or not, we have to live with it because
there is no way to prevent it - tools like robots.txt have never been,
and can never be, more than a "gentleman's agreement". As long as
there are computers with hard drives connected to the internet, stuff
might stick around forever, and it's naive to pretend otherwise.
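
To make the "gentleman's agreement" point concrete, here is a minimal
sketch using Python's standard `urllib.robotparser`. The robots.txt
contents, the `ExampleArchiver` user agent and the helper name
`polite_may_fetch` are all made up for illustration. The thing to
notice is where the check runs: inside the crawler, and only if the
crawler's author chose to write it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /phlog/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /phlog/
"""


def polite_may_fetch(robots_txt, user_agent, url):
    """Return True if a crawler that honours robots.txt would fetch url.

    This is purely a client-side courtesy check. Nothing on the server
    enforces the answer, and a crawler that never calls this function
    can fetch the URL regardless.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/phlog/post.txt",
                "http://example.org/about.txt"):
        print(url, polite_may_fetch(ROBOTS_TXT, "ExampleArchiver", url))
```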

Gopher is no exception here. The Zaibatsu's civil right policy is
meaningful in practice only because there is no equivalent of
archive.org for Gopher. But there is no such equivalent only because
nobody has yet bothered to build one. One may come, one day, and if
it does we'll be powerless to stop it. We might protest against its
coming mightily - I suspect, based on the things I've seen people
write about questions surrounding Gopher search engines, that such a
service would be pretty unpopular - but the people bringing it would
likely say to us "What? Why on Earth did you ever think this
wouldn't happen? How do you think the internet works?", and to some
extent it would be hard to argue against this. Just because something
can be done doesn't mean it should be done, but in the case of the
internet (perhaps technology more widely, too!) if something can be
done it almost certainly eventually will and so it's nothing more than
an exercise in denial to get deeply attached to its temporary absence.

It's hard not to get attached, though, because I think many people
will agree that the way Gopherspace functions right now feels really
nice. Heck, there is, or was, a phlog over at SDF with a tagline of
"Because Google probably doesn't index this", or something to that
effect. People clearly feel the need for an online space where they
can exist in the comfort of knowing that not everything they write is
immediately publicly searchable and preserved forever. How can you
not get attached to that?

Right now, Gopherspace is small enough, and tightly-knit enough, and
ideologically-driven enough, that a culture of rejecting this kind of
thing - making it taboo, if you like - could probably keep archiving
at bay for a while. The cultural preferences of Gopherspace
inhabitants already seem to keep at bay a lot of things which are
perfectly technically possible with the protocol, like serving a lot
of HTML. Even if we don't actually want to try to actively fight back
against the arrival of archiving or extensive indexing to Gopherspace,
I do think it's good to consciously appreciate and savour their
absence, for the time that we can.

What if we *do* want to actively fight back? Well, as I said, there's
ultimately little we can do because you just straight up can't prevent
these things from being done. But as a kind of soft resistance, there
might be value in adopting alternative solutions to the (real)
problems that an archive.org for Gopher would solve. I think that,
unlike on the web, we might *have* a viable alternative, which takes
advantage of Gopher's extreme simplicity.

Archiving a website has never been entirely straightforward. You
can't just save a single HTML file to disk and expect it to work like
the original. This may have worked in the very earliest days of the
web, but it wouldn't have been long before you had to also parse that
HTML file and look for included external resources, most likely
images, and download those, too (and then possibly transform the
downloaded HTML to change absolute URLs for external resources to
relative URLs which will work from the disk). When CSS arrived,
stylesheets became one more component you'd have to archive. Yes,
carefully designed websites will function well enough with images and
stylesheets missing, but that hasn't been true for the average website
for a long time. Today, archiving a website feels like a Herculean
technical challenge. External stylesheets, fonts and images are just
the beginning - modern sites completely fail without dozens of
externally hosted scripts, many of which may try to pull in any of
the above kind of resource from external sources whose URLs are not
even pre-determined before the site is executed ("viewed" is far too
simple a term for a modern website). It doesn't seem like it would
be hard at all to build a site which was impossible in principle to
meaningfully archive. Archive.org probably hates the modern web even
more than us Gopher-dwelling retrogrouches!
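
As a rough illustration of just the *first* of those steps, the sketch
below parses one HTML page and lists the images, scripts and
stylesheets it would need, resolved to absolute URLs. It uses only
Python's standard library; the names `ResourceFinder` and
`external_resources` are invented here. It deliberately ignores
everything that makes the modern case hard (fonts, CSS imports,
resources requested by scripts at run time), so treat it as the
"earliest days" version of the problem, not a working archiver.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collects URLs of images, scripts and stylesheets a page references."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in ("img", "script"):
            ref = attrs.get("src")
        elif tag == "link":
            # Only <link rel="stylesheet"> is treated as a page resource.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                ref = attrs.get("href")
        if ref:
            # Resolve relative references against the page's own URL.
            self.resources.append(urljoin(self.base_url, ref))


def external_resources(html, base_url):
    """Return the resource URLs found in html, in document order."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    return finder.resources


if __name__ == "__main__":
    page = (
        '<html><head><link rel="stylesheet" href="/style.css">'
        '<script src="https://cdn.example.net/app.js"></script></head>'
        '<body><img src="cat.png"><a href="other.html">next</a></body></html>'
    )
    for url in external_resources(page, "http://example.org/blog/post.html"):
        print(url)
```

Each URL found this way would still have to be downloaded, and the
saved HTML rewritten to point at the local copies.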

Notably, Gopher does *not* have this problem. Most items of Gopher
content consist, entirely, of a single text file. Saved to disk, this
single file, viewed offline 10 years later after the original server
has vanished, is in every way equivalent to its original hosted
version. We've got it better than the web, and it's actually easy to
underestimate just how much better off we've got it. Just how much
better off are we? I would submit that on a computer with even vaguely
modern specs, it would probably be possible to use a Gopher client
which *automatically and immediately* archived every single document
you visited, as you visited it, and maintained a searchable full text
index of those archives, without this being unduly taxing on processor
time or disk space. Imagine that!
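
To give a feel for how little machinery this needs, here is a minimal
sketch in Python using only the standard library. Everything named in
it (`fetch_gopher`, `PersonalArchive`, `visit`, the table layout) is
an assumption made up for illustration, not an existing client. The
"index" is a plain word-to-document table in SQLite, a simple stand-in
for a real full text engine, and the fetch assumes a plain text (type
0) item.

```python
import re
import socket
import sqlite3
import time


def fetch_gopher(host, selector, port=70, timeout=10.0):
    """Fetch one Gopher item: send the selector plus CRLF, read to EOF.

    Some servers end text items with a lone "." line; this sketch
    keeps whatever bytes arrive and does not strip it.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


class PersonalArchive:
    """Private copy of every document seen, plus a word-level index."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS docs ("
            "url TEXT PRIMARY KEY, fetched_at REAL, body TEXT)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS postings ("
            "term TEXT, url TEXT, PRIMARY KEY (term, url))"
        )

    def store(self, url, raw):
        """Save (or refresh) a document and re-index its words."""
        text = raw.decode("utf-8", errors="replace")
        terms = set(re.findall(r"\w+", text.lower()))
        with self.db:  # one transaction per document
            self.db.execute("DELETE FROM postings WHERE url = ?", (url,))
            self.db.execute(
                "INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
                (url, time.time(), text),
            )
            self.db.executemany(
                "INSERT INTO postings VALUES (?, ?)",
                [(term, url) for term in terms],
            )

    def search(self, query):
        """Return URLs of documents containing every word in query."""
        terms = set(re.findall(r"\w+", query.lower()))
        if not terms:
            return []
        marks = ",".join("?" * len(terms))
        rows = self.db.execute(
            f"SELECT url FROM postings WHERE term IN ({marks}) "
            "GROUP BY url HAVING COUNT(*) = ? ORDER BY url",
            (*terms, len(terms)),
        )
        return [row[0] for row in rows]

    def read(self, url):
        """Return the archived text for url, or None if never visited."""
        row = self.db.execute(
            "SELECT body FROM docs WHERE url = ?", (url,)
        ).fetchone()
        return row[0] if row else None


def visit(archive, host, selector, port=70):
    """What a client's 'open document' action might do: fetch, then archive."""
    raw = fetch_gopher(host, selector, port)
    archive.store(f"gopher://{host}:{port}/0{selector}", raw)
    return raw
```

A client built along these lines would call something like `visit()`
for every text document you open, `search("some half-remembered
words")` when you want to find it again, and `read(url)` once the
original has gone. Passing a file path instead of `":memory:"` keeps
the archive between sessions.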

This is quite a super power, and it enables everybody who surfs
Gopherspace to act as an "individual archivist", forever preserving
the things we see for our own personal reference later. If I'd been
using such a client, kvothe's Dialtone phlog would still be available
to me to re-read at my leisure after he claimed his civil right,
whilst being unavailable to any new readers. This seems to strike
quite a nice balance between the interests of content producers and
consumers. It's a human-scale solution which goes a very long way
toward obviating the need for anything like a public archive or search
index of all of Gopherspace. Obviously it can't replace a search
engine for solving the problem of finding resources you aren't already
aware of, but I would say that the vast majority of the times I've
wished for a full text Gopher search engine it's been because I wanted
to rediscover something that I remember reading a few weeks ago but
now can't recall where.

Like many people, I enjoy greatly the fact that modern Gopherspace is
small and intimate. It's a place by humans and for humans, where it's
still very possible to disappear and be forgotten. That's very
valuable! Search indexes and archiving services threaten this
feeling, and a lot of Gopherites are opposed to them for this reason.
At the same time, it's hard to deny that such "intrusions" into
Gopherspace solve real problems and could be incredibly useful. Deep
down I know that these things are probably inevitable, especially if
Gopherspace continues to grow rapidly. When they come I'll try to
accept them gracefully. But in the meantime I think that individual
archiving offers a solution to the most pressing problems such
services would solve, in a way which still retains the precious
feeling of a Gopherspace where we are *not* watched over by machines
of loving grace.

Well, except for the NSA machines which presumably log all plaintext
internet traffic.