Low budget P2P content distribution with git
--------------------------------------------

In recent months I've spent a lot less time than is typical thinking
about anything to do with computers and the internet, but there is
one train of thought I've been repeatedly pondering.  I had hoped to
write up a bunch of less technical stuff first (don't worry, that's
still coming - I'm kind of disappointed in myself that I've lapsed
into writing a massive computery, internety post so soon after coming
back to writing here.  Bad Solderpunk!  In penance I'm not going to
write any more for at least a month - stay tuned for cycling,
environmentalism and manga, though), but it seems like this
technical idea has become a little topical recently, so perhaps now
is actually a good time to get it out there.  Let
me be very clear from the outset that I'm just idea-sketching out
loud here.  This isn't a new project or anything: I'm not giving the
system I'm about to describe a name or committing to fleshing out the
details or anything like that.  That's not to say nothing will ever
come of this, I just want to make it clear from the outset that these
ideas are half-baked at best and I'm absolutely not committed to
jumping head first into wherever this train of thought leads...

Protocols like Gemini and Gopher are an effective salve against many
of the miseries inflicted by the modern web, but by no means do they
solve *all* the web's problems.  All three systems - the web, Gopher
and Gemini - share the same big picture architecture, namely that
the default pattern of usage is that content lives in exactly one
place: a server which is online 24/7, 365 days a year and accessible
from anywhere on Earth, and that
to consume this content you request a copy of it at the instant of
consumption, render it to the screen and then discard it (perhaps
after a relatively brief cache lifetime), leaving no persistent copy,
with the understanding that if you want to read something again next
week or month or year you'll just request a fresh copy and do all
this again.  Because all three protocols work this way, all three of
them share a long list of common shortcomings, mostly about losing
access to stuff you'd like to still have access to.  Online content
can become inaccessible to *you* in the short term if your internet
connection goes down.  It can become inaccessible to *anybody* in the short
term if the server goes down.  It can become inaccessible to large
groups of people in the *long term* due to the ease with which
authoritarian governments can block access to a single server.  It
can become inaccessible to *everybody* *forever* if the hosting
service disappears (think Geocities), or if the person running a
private server dies or is incapacitated and none of their friends or
family know which bills to pay to keep the thing up.  These problems
can be mitigated to some extent via load sharing, content delivery
networks, caching proxies, etc.  All these solutions involve setting
up yet *more* computers which are switched on and connected to the
net 24/7, which is expensive both financially and environmentally.
On a long enough timeline, the survival rate for all websites drops
to zero: find some mailing list archives from the late 90s or early
00s and try visiting all the URLs people shared in it.  More than 90%
of them won't work.  20 or 25 years is not an awfully long time span
for this kind of decay to happen in.

None of these observations are new or exciting, and there is no
shortage of projects attempting to address various of these
shortcomings in various domains.  You've maybe heard of DAT[1] and
IPFS[2] and SSB[3], and those are just the Johnny-come-latelies to
this sphere.  Freenet[4] has been around for over 20 years, and I
don't doubt that it has predecessors of its own.  What all of these
projects have in common is conceptual complexity.  They're
distributed, decentralised, peer-to-peer, content-addressed,
cryptographically authenticated, and more.  This isn't intended as
a criticism.  These projects have a much higher ratio of essential
complexity to "empty complexity" than something like a modern web
browser, because they're trying to solve substantially more
difficult problems, so some conceptual complexity is unavoidable.
But all of the projects above and their associated
ideas have met with fairly limited implementation by developers and
fairly limited uptake by users, and I think the high barrier to
entry represented by a lot of conceptual complexity, even if it is
essential, is probably a large part of the reason for this (that
and a healthy serving of apathy, no doubt).  I'm not trying to
say that the search for clever solutions to these problems is futile,
not at all.  I'm just laying out what seem to me to be the facts.

Completely solving the problems associated with an always-online,
purely client-server web is never going to be easy.  The wait for
something which works well enough and is user friendly enough to
facilitate serious uptake is going to be a long (though hopefully
worthwhile) one.  In the meantime, it's tempting to wonder whether or
not there is some kind of "80:20" solution to these problems which
gets at least some of us at least some of the way there - enough of
the way to be worthwhile - without a huge learning curve.  Lately
I've been thinking that maybe there is, and that maybe it's actually
not even all that hard.  In fact it's so incredibly simple that I'm
almost embarrassed to say it out loud, out of fear that if it were
*that* simple then people would *obviously* already be doing it, so
clearly I've missed something big due to not being smart enough.  Or
maybe some people well off my radar *are* doing this, and that's what
I've missed.  Anyway, are you ready for this huge idea?  Here it is.

Use git.

No, really, just use git.  Not the way you're possibly using it
already (like I am), as a kind of deployment mechanism, where you
write your posts locally, commit them to a repo, then push to a
remote copy of that repo only you have access to, triggering a hook
which checks out a copy of your work in
whatever directory your web/Gopher/Gemini server looks in (although,
if you're doing that, switching to using it the way I'm talking about
is a piece of cake).  I'm talking about using git for small internet
content the way people use it for source code, as an actual
distribution mechanism for ending up with a local copy of something
on your disk that you then use offline (by compiling it, interpreting
it, etc).  I'm talking about your text-centric online content being
nothing more than a public git repository.  If somebody wants to read
your posts, they clone your repo.  Then they've got your posts on
their disk, and they can read them from there. If they go offline, it
doesn't matter, because your stuff is on their disk.  They can read
it today, and tomorrow, and next year.  If your server goes offline,
it doesn't matter, because your stuff is on their disk.  If they like
your stuff and want to read more of it, then next time both they and
you are online, they are one `git pull` away from getting any updates
you've made since their original clone.  There is no need for Atom,
or RSS, or carefully formatted index pages with datestamps integrated
into link text.  When distributing by git, visiting a site and
subscribing to a site are one and the same thing.  No extra
technological concessions to the notion of "subscribability" are
needed.  Furthermore, when distributing by git, visiting a site and
making a complete offline archive of the site are one and the same
thing.  There's no need for slow, clumsy, error-prone and
admin-irritating loops of repeatedly fetching and parsing files using
tools like wget to discover the URL of every single resource in a
site.  You just grab the whole thing at once in a single network
transaction, no parsing required.  Git is actually better than
Atom/RSS and recursive wget combined!  An Atom or RSS feed usually
only has the 10 or 20 most recent updates in it, so if you're offline
for a long time you'll miss some stuff.  Git won't, you'll get every
commit made since your last pull.  And a recursive wget just leaves
you with an offline copy of an entire site as it was at one point in
time.  There's no way to get *just* the new stuff one month later -
sure, with HTTP(S) you can use headers like If-Modified-Since to
avoid fetching fresh copies of stuff that hasn't changed, but you
still need to make a request for every single page which *could*
have changed.
With git you just pull and that's it.
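
To make the basic workflow concrete, here's a minimal sketch of what
"visiting" and then "subscribing" might look like from the reader's
side (the repository URL and filename are invented for illustration):

  # "Visit" a site: grab the whole thing in one network transaction
  git clone https://example.org/~alice/phlog.git

  # Read it offline, whenever and however you like
  less phlog/2021-10-30-low-budget-p2p.gmi

  # Next week, next month, next year: fetch everything published
  # since your clone, again in one transaction
  cd phlog
  git pull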

I've barely scratched the surface here.  I'm going to keep going, but
first let's really quickly think about this from a network privacy
point of view.  Cloning or pulling a git repo involves making network
connections to *one* server, known in advance, and has no side
effects.  There are no cookies or anything cookie-esque to tie
subsequent requests together at a more fine-grained or persistent
level than the IP address.  This is much better than the web, and
exactly on par with Gemini and Gopher.  If you want to, you can do
git stuff over HTTPS or SSH, and that's normal and standard, so in
this respect we're better than Gopher where plaintext is the only
option.  But if you don't want to use crypto, or your computer can't
handle it, or you're using some futuristic internet overlay like
Yggdrasil so you get transport security without baking it into every
protocol, you can do a plaintext git:// clone.  So for some folks
this is better than Gemini, where it's TLS or bust.  But the
git-as-distribution-tool approach gives you something that none of
the web or Gopher or Gemini give you: it's one network transaction
for the *whole site*, and that's it.  A git admin knows that you (or
rather, your IP address) has cloned their repo and now has all their
posts.  But that's it.  They don't know which posts you read, and
which ones you don't.  They don't know which posts you read once and
which ones you read every day and which ones you only read in the
middle of cold, lonely nights.  There is nothing like a "click
stream" for them to analyse.  Even the boogeyman of "traffic
analysis", where the size and latency of opaque encrypted
transactions are used by third parties to reconstruct your path
through a public site, gains no traction here.  Your fine-grained
consumption habits are entirely invisible to everybody but you.
That's really neat!
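
Just to show that this transport flexibility really is stock git and
nothing more, the same repository can be fetched over whichever
transport suits you (hosts invented for illustration):

  git clone https://example.org/~alice/phlog.git      # TLS
  git clone alice@example.org:phlog.git               # SSH
  git clone git://example.org/phlog.git               # plaintext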

One more brief digression: I've described everything so far in
network terms (and will return to them shortly and stick with them
for the rest of this post).  But keep in mind, please, that there is
*nothing* network-centric about this idea.  We're all very used to
doing git clones and pulls over TCP/IP, but you can clone and pull
from the filesystem just fine.  Try it.  Git won't bat an eyelid.
That means you can clone and pull from USB sticks and SD cards, which
means this whole thing works just fine over sneakernet.  You don't
have to go "all in" on sneakernet, you can mix and match it with
networking in whatever proportion suits you, and transition slowly
from using mainly one to mainly the other on an as-needed basis.  I
think about sneakernet a lot these days, and I think anybody else
who's interested in sustainable/perma-/salvage computing ought to as
well.  I'll write more about this some other time.  Let me just say
for now that the fact that this git-for-distribution thing works
seamlessly via sneakernet is a big plus for me.
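
Here's what the sneakernet case looks like in practice - a sketch
only, with the mount point and paths invented:

  # Clone straight off a mounted USB stick - no network involved
  git clone /media/usb/alice-phlog ~/gitlogs/alice-phlog

  # Later, with the stick plugged in again, updates arrive the
  # same way they would over the network
  cd ~/gitlogs/alice-phlog
  git pull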

Okay, back to the main thrust: by visiting/archiving/subscribing to a
site via git we get even more than Atom/RSS and recursive wget
combined can offer, with less effort on the part of either producer
or consumer.  Jake.  But so far we're still talking about readers
fetching content from a single authoritative source operated by
authors, so we still have a lot of the usual centralisation problems.
This approach still puts a potentially heavy load on one
authoritative server, it still requires lots of long distance data
traffic, and if the author's server disappears forever *before* you
got a chance to clone the repo, you're out of luck.  Getting past
these hurdles in a web/Gopher/Gemini context isn't easy.  If I use
recursive wget to get a complete local copy of some website, then in
order to enable somebody else to use a recursive wget to get a
complete copy from *me* (because my server is closer, or more
reliable, or the original is gone) there's a lot more rigmarole
involved.  I'd need to set up a webserver and point it at my copy, and
there's no guarantee that alone is enough.  The site may not work
properly without suitable URL rewriting or redirecting rules or
similar configuration details in place on the server side.  I'd need
to reproduce those settings exactly, and the information required to
do so is *not* something I'd end up with as a consequence of doing
the original recursive wget.  So the whole procedure kind of only
works once, and can't reliably be chained, with an n-th party getting
a fully functioning copy from an (n-1)-th party's copy.  Even if
redirects/rewrites weren't in the picture and this chaining *was*
possible, there'd naturally be a big question of trust, as at any
stage along the chain the site could be modified by somebody other
than the original author and you'd be none the wiser.  But none of
these problems are there in the git version!  You can clone a clone
of a clone no worries, that's normal.  Everybody who "visits a site"
distributed by git has everything they need to *redistribute* the
site.  And git has built-in support for signing commits with GPG,
which can go a long way toward resolving the trust problem (public
keys can be distributed as part of the repository itself, which works
out alright as long as you can be confident you make your initial
clone from the genuine origin - not foolproof, but much better than
nothing).  All of this is just bog-standard git functionality, tried
and tested, nothing new or exciting, 100% ready to go and documented
in countless sources.  This stuff is exactly what makes git a
*distributed* version control system.  The new idea here is really
nothing more than using it to distribute writing to readers, instead
of source code to developers or users.
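
All of this is ordinary git usage; a rough sketch of what the
signing and redistribution side might look like (paths invented, and
assuming the reader already has the author's public key):

  # The author signs their commits:
  git commit -S -m "New post: low budget P2P"

  # A reader can check those signatures no matter whose clone they
  # happened to fetch from:
  git verify-commit HEAD
  git log --show-signature

  # And any reader can turn their own clone into a mirror that
  # others can clone from in turn:
  git clone --bare ~/gitlogs/alice-phlog /srv/git/alice-phlog.git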

It turns out we've *had* a decentralised, distributed, offline-first
system for P2P storage and delivery of text files for 16 years now!
It was just created for an application very different from
blogging/phlogging/gemlogging.  By the time git became an established
and familiar technology, the web was in the full blown grip of "web
2.0" fever, and static, non-interactive content that was 90% text was
consigned squarely to "the past".  This resulted, I think, in a
missed connection, which maybe we can finally make.  There's nothing
fundamentally wrong with interactivity, of course, nor with non-text
media, either.  But I don't need to tell anybody who is reading this
via Gopher or Gemini that there's a whole universe of material which
is interesting, or informative, or useful, or amusing, or uplifting,
or otherwise valuable even if it's "just text" and even if you read
it days or weeks or months or years after it was originally written.
That's not a unique property of source code.  It's true of our little
small internet world, too!  Git is just perfect for distributing
exactly this kind of writing.  You get delay-tolerant subscription
for free: Atom and RSS can go to the dustbin of history.  Constant
internet connectivity is not required, although it doesn't hurt.  You
can pull from all your repos four times a day every day if you live
all the time in an apartment with a permanent high-speed internet
connection.  If you're trying to spend less time online because you
think that's better for you in some way(s), you can connect once in
the morning, pull from all your repos and then disconnect and read
what you received at your leisure.  If you live on a boat and
sometimes go without internet access for weeks at a time, that works
just fine too.  If you are travelling without regular internet access
and you meet somebody on the way who follows some of the same repos
you do, whichever one of you pulled from upstream less recently can
pull from the one who did so more recently to get some updates on the
road - and then pull later from the official source once back in
civilisation, without this switching of sources causing any problems.
Stuff can continue to circulate for years after the original source
disappears, provided enough people were interested enough in the
first place to clone it and make their clones readable.  To be
honest, this feels to me like it could be an even better small
internet platform than Gopher or Gemini, at least for some kinds of
content (for others, perhaps not - I'll return to this later).
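
A sketch of that travelling scenario, using nothing but standard
remotes (the remote name, the path and the assumption that the
default branch is called "master" are all mine):

  # On the road: fetch newer commits from a fellow traveller's copy
  git remote add traveller /media/sdcard/alice-phlog
  git pull traveller master

  # Back in civilisation: pull from the official source again; git
  # doesn't care that some commits arrived via a different route
  git pull origin master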

Of course, this is nothing like a *real* solution to any of the nasty
problems of centralised client-server distribution.  You can update
your clone of a git repo from some source other than the original,
official repo, and have confidence that what you get is genuine
thanks to PGP, sure - if you know about that other source in advance.
But there's no magic means by which knowing only the URL of the
original repo you can automatically find the most up-to-date third
party copy or copies which are online now and close by to you in
network terms and pull from them instead.  That's the kind of hard
problem which makes real P2P systems complicated, and git does
nothing at all to solve it.  But we can 80:20 around this to some
extent.

I've been vague up until now about exactly how this works in a hands
on, daily use kind of way.  I'm not proposing we literally spend our
time doing git clones and git pulls manually by hand all the time
(although you *could* use this system that way, and that should be
seen as a feature, just like being able to access Gopherspace via
telnet).  We can build tools to streamline things.  This is largely
the reason, incidentally, for using git in particular and not
Mercurial or Fossil or whatever else might be hot these days.  Git is
ubiquitous and isn't likely to stop being so anytime soon.  It's been
ported everywhere - you can use git today on Plan 9 or Minix 3 or
whatever weird system floats your boat (are there still open source
descendants of Solaris out there?  If there are, I bet they have
git).  There are bindings to libgit in all major programming
languages, allowing you to automate this stuff.  All this work has
already been done, and these tools are going to be kept up to date
and ported further and documented better by people who don't know and
don't care about the small group of dorks using git as a plain text
content distribution system.  It's exactly the same philosophy behind
using TLS for Gemini and not something newer and better.  Tiny
guerilla computing projects can't afford to ignore the opportunity to
have the enemy manufacture our weapons for us.  So we build tools
based on git, because a lot of us already know how to build them, and
once they're built they'll be usable just about everywhere.  We can
throw together something which has the look and feel of a traditional
Atom/RSS-based feed reader, but it's powered by git under the hood,
it just looks at timestamped commits to figure out which files were
updated when.  And there's no reason we can't standardise on every
repo designed to be used in this way having (or *optionally* having)
a directory in the repo root with a well-known name which contains
simple .ini or .json or .yaml or whatever files (no doubt getting
everybody to agree on one of these would represent 99% of the work of
actually bringing this idea to fruition) that provide a little bit of
metadata in an easy-to-parse format.  These could provide some of the
feed metadata that you'd traditionally find in Atom/RSS, like a
repository's title, subtitle, author, contact details and license
information.  They could provide GPG public keys.  And they could be
used to advertise the URLs of clones of the repo, its "official
mirrors", and maybe where these clones are in the world and at what
times of day they are most likely to be online (ditto for the
original).  The git-aware app could register all those URLs as
additional "remotes" for the repo, and it could preferentially try to
pull from the nearest one most likely to be up when the user
hits "refresh", and if that remote was down, it could fall back to
the second best choice, and so on.  This involves some manual
coordination between authors and willing mirroring parties, and
introduces a kind of dichotomy between "official mirrors" and
"unofficial mirrors" which you'd need to learn about out of band and
tell your client about, but I suspect we can tackle this in the usual
grass-roots, small internet way and still end up somewhere better
than we are right now.  It's far from perfect, but it's also far from
awful.
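
None of this metadata format exists, of course, but just to make the
shape of the idea concrete, here's a purely hypothetical well-known
file plus the kind of query a git-backed "feed reader" could run
under the hood (every name and field below is invented):

  # Hypothetical metadata file, e.g. .gitfeed/feed.ini:
  #
  #   [feed]
  #   title   = Solderpunk's phlog
  #   author  = solderpunk
  #   license = CC BY-SA 4.0
  #
  #   [mirrors]
  #   mirror1 = https://example.org/git/solderpunk-phlog.git
  #   mirror2 = git://mirror.example.net/solderpunk-phlog.git

  # The "feed reader" view is then just a question of asking git
  # which files changed when:
  git log --since="2 weeks ago" --date=short \
      --pretty=format:"%ad %s" --name-only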

And we're *still* really just scratching the surface of what doing
this would enable.  To make it explicit, we're talking about a system
where every participant keeps a full copy of the full history of
every site they visit on their hard drive indefinitely.  This sounds
nuts at first.  It also sounds nuts that in this system there is no
way to fetch just a single post - if you want to read one post that
somebody has told you about, you have to clone the full repo
containing said post.  That's, in some sense, woefully inefficient!
These concerns diminish rapidly if we start thinking small.  I've
been phlogging on Gopher for over four years now.  Anybody who has
been following me all that time knows that I am *not* a succinct
writer.  I am relentlessly verbose.  And yet, my phlog directory is
1.7 megabytes.  Having to clone that whole lot to read one post
doesn't seem so horrible knowing that.  When visiting a single blog
post on the web today you could easily pull down a lot more than 1.7
MB of external fonts, style sheets, surveillance Javascript, flashy
background images and more.  Cloning my whole phlog repo to read one
post is less efficient than using Gopher to fetch just that one post,
but it's still more efficient than the status quo of the web.  Let's
suppose that I continue to phlog at the same exhausting level of
verbosity for fifty whole years in total.  That would bring me up to
just over 21 MB, which we can round up to 25 MB to make things
simpler.  Now, suppose you didn't want to just read *my* fifty years
of rambling, but you wanted to read the ramblings of *one hundred
people* who all wrote excessively for fifty years - arguably more
output than any person really has the time to read.  This would bring
us up to 2.5 GB.  That fits several times over on the smallest USB or
SD storage device you can buy.  Businesses literally give that much
storage away for free in the form of promotional key chains.  The
above calculations could be off by a factor of ten (git itself
obviously introduces some degree of storage overhead which I've
completely failed to address so far and, in truth, know almost
nothing about, but I'm pretty sure it's nothing like a factor of ten)
and the storage burden of 25 GB would still be underwhelming, even
for a 20 year old machine.  We really can live this way.  Text is
*small*.
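
If you want to sanity-check those numbers - including the git
overhead I've hand-waved about - on a real repository, git and the
usual tools will tell you directly (paths invented):

  # Working copy plus git's own bookkeeping, then just the latter:
  du -sh ~/gitlogs/alice-phlog
  du -sh ~/gitlogs/alice-phlog/.git

  # Git's own accounting of its object store, human-readable:
  git -C ~/gitlogs/alice-phlog count-objects -vH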

Having full local copies of everything ever written by anybody you've
ever read even a single small internet post from is a game changer in
and of itself.  Stuff like archive.org becomes at least partially
obsolete, because you have the full history of each site locally.
You can, to some extent, be your own search engine.  Obviously you
can't search your own disk to find stuff you've never previously
fetched, but you can easily find stuff you vaguely recall reading a
year ago, and if you've only just recently started following somebody
who has been writing for years, you can search their back catalogue.
You can ask your computer to find other posts you have on your disk
which are "similar to" some particular post, in terms of them both
using similar words or phrases which are otherwise rare.  All sorts
of machine learning, pattern recognition, recommendation engine type
stuff could be done, if you wanted, but it's something you could do
yourself entirely on your own machine with complete control and
transparency and perfect privacy.  If one of those metadata files in
a well-known location in every repo mentioned earlier was a kind of
machine-readable "git-roll" where authors could advertise the URLs of
other repos that they are reading, then you could even do a little
casual repo spidering (with a configurable maximum amount of disk
space and monthly bandwidth dedicated to this - possibly both set to
zero if you don't care for it).  This all sounds somewhat futuristic,
but indexing and searching and identifying fuzzy conceptual
connections between a couple of gigabytes worth of text files is not
exactly the computational cutting edge.  I'm starting to feel like in
some ways we have been denying ourselves super powers for years
simply by continuing to distribute our content in a fashion which
makes it really impractical to grab sites wholesale, even though the
bandwidth and disk space required to do this (for simple text files,
anyway) has long been easy to come by.
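
Even before any of that futuristic stuff, a lot of this is one
command away once the clones are sitting on your disk (the directory
layout here is invented):

  # Be your own search engine across everything you follow:
  grep -ril "sneakernet" ~/gitlogs/*/

  # Or search one author's entire history, including revisions of
  # posts that were later edited:
  cd ~/gitlogs/alice-phlog
  git grep -i "sneakernet" $(git rev-list --all)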

I've been unrelentingly positive about this whole prospect so far.
So many benefits to content distribution via git!  Aren't there any
problems?  Well, sure.  There are two big ones that I've identified
so far.  One is technical, the other is, uhh, sociological?  Or
something?  Let's deal with that one first.  The basic issue is that
stuff on the internet can become unavailable in two different ways.
Sometimes stuff disappears involuntarily - due to technical faults,
censorship, business failures, financial problems, etc.  But
sometimes stuff disappears because the author didn't want it up
anymore and willingly took it down, which feels like a reasonable
thing for authors to be able to do if they like.  We might, very
roughly, think of these as "bad disappearances" and "good
disappearances", respectively.  The problem is that it's not possible
to solve the bad disappearance problems without making good
disappearances impossible.  Publishing something via this git system
is in principle permanent and irreversible.  If just one person
clones or pulls from your repo before you take it down, other people
can pull/clone from them and there's nothing you can do to stop this
beyond asking nicely.  It's not just "taking stuff down" that becomes
infeasible.  If you change your mind about something you wrote ten
years ago and want to change it, you can do so - but everybody
"subscribed" to your repository will be notified of this fact and
will be able to see both the before and after versions.  This kind of
publishing is, by necessity, radically long-lasting and radically
transparent in a way that people aren't used to and many may not be
ready for.

Many will say that the internet is *already* like this and you can
never guarantee that anything you publish, via any protocol, won't be
redistributed forever.  This is exactly right.  It's the very nature
of a global network of general purpose computing devices, and we
should never fool ourselves into thinking that any technology can
prevent this.  Furthermore, this isn't unique to using git for
publication; it's going to crop up in *any* solution to the problems
described above.  Does that mean we should just forget about this
issue?  Maybe not.  Just because something is always possible in
principle doesn't mean that making it as quick and easy and
convenient as possible will be without consequence.  An internet
which never forgets is handy in a lot of ways and in a lot of fields
of endeavour.  It's also strongly mismatched with human social
psychology and norms.  The small internet crowd tends to place a lot
of emphasis on "human scale" computing and on personal connections,
so I think it's worth flagging this and encouraging people to
think about it.  But I do also think it's possible to overstate how
big of a deal this is.  Maybe I've already done that.  I dunno.

The other big problem, the technical one, is that of linking.  That
whole hypertext thing.  Let's consider a "gitlog", i.e. a
blog/phlog/gemlog-style resource which is published exclusively via a
public git repository, and is not hosted on any of the traditional
server-client request-per-page protocols ("gitlog" is a horrible name
for this thing because it will cause massive confusion and search
engine collision with the `git log` command, but I'll use it as a
placeholder for now).  Internal links within one gitlog are
straightforward (at least if it's in HTML or gemtext, both of which
support relative URLs), but how does the author of a post in this log
provide a link to an individual post in another gitlog?  An
unambiguous pointer to an individual gitlog post necessarily has two
parts: the URL of (any clone of) the repository, and a path relative
to the repository root indicating the file containing the post in
question.  I am not aware of any pre-existing URL scheme for
unambiguously conveying both these things at once, nor of any
pre-existing hypertext format which allows "two part" links.  It's
not remotely hard to imagine how to cook up either one, perfectly
straightforward in fact, but ugh, once we do that this stops being a
super minimal "just use this existing thing to distribute your
arbitrary existing text, with maybe a tiny bit of optional helper
metadata sprinkled in if you want" approach and becomes a whole
*thing* with its own unique format which you have to buy into.  I
really like that a lot of people are basically already 100% geared up
to distribute their smol content this way by just making the private
repository they already use for deployment publicly readable, super
quick and easy, no other change required.  Anything which stands in
the way of that feels like a bad idea.  But without a standalone
pure-gitlog linking solution, the whole system is limited to
bihosting scenarios, where git-based distribution kind of lurks
behind the scenes, and plain old gopher:// or gemini:// links are what we
actually include in our posts.  This is not great, but perhaps
something we can live with?  Maybe there's a convention where if your
Gopher or Gemini content is also available in this way, you configure
your Gopher/Gemini server to respond to requests for a certain
well-known endpoint with (i) a git repo URL and (ii) a regex or
some-such for transforming your gopher:// or gemini:// URLs into
paths relative to your repository root?  That would work, I think, as
a kind of easy machine-readable "gateway" from the Gopher/Gemini view
of things to the git view of things.  Maybe there are even better
ways?  I don't mean to suggest we can't somehow make this linking
thing at least roughly work, I'm just highlighting that this is the
most substantial issue I've thought of without an obvious and easy
solution.  I still think there's something worth pursuing inside all
this.  Hell, even if we give up on external linking, that's not the
end of the world.  The short, whimsical urban fantasy of Joneworlds[5]
is a genuine gem of modern Gopher/Geminispace, and the vast majority
of it is entirely self-contained and very little would be lost by
distributing it without any links at all.
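
Just to illustrate the kind of machine-readable gateway I have in
mind - pure speculation on my part, not a proposed format - such a
well-known endpoint might return something like:

  # Hypothetical request and response (everything here is invented):
  #
  #   gemini://example.org/~alice/.well-known/gitrepo
  #
  # returning, say:
  #
  #   repo:  https://example.org/git/alice-gemlog.git
  #   remap: s|^gemini://example\.org/~alice/|posts/|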

I think at last that this is all I have to say about this for now.  I
mean, there's more I could say, all sorts of little details, but I
think this is enough for now, to get the idea out there.  I am happy
to release this idea into the electronic wild and see what kind of
life, if any, it may take up in the minds of the denizens of the
small internet.  I look forward to hearing people's thoughts.

I've written all this up over the past week or so (took longer than I
thought it would!), but the core of the idea has been brewing for a
few months, and I've been influenced in various ways by some stuff
that I wrote and also some stuff I read over the past year or two.
I'm going to try to dump links to all of these influential things
below, but this probably won't be exhaustive, sorry.

I got started thinking about using git for content distribution
during the formation of Circumlunar Space's newish zine project,
Circumlunar Transmissions[6].  The zines are hosted via Gemini and
Gopher, but you can also clone a git repo, the idea being that people
can use this to easily host Gemini or Gopher mirrors.  At some point
I started wondering about the possibility of just skipping that last
part and distributing it entirely via git.  A zine is a *perfect* use
case for this.  No sane person expects a zine to be editable or
deletable after release.

I was motivated to write this stuff up *now*, not in another week or
a month, by a recent post by ploum[7] which asked "Could we imagine a
decentralised and delay-tolerant network simple enough so you could
implement it in a day?", and envisaged a system where folders of
PGP-signed Markdown documents are copied to the local disk and
browsed from there.  There are some extra ideas in there about using
the system for something like email, too.  If you ignore that part
and just focus on the Markdown distribution, well, I think git
basically already does this.

Not long at all after I read that post and decided I had better start
writing, I came across a short article[8], inspired by the off-grid
working habits of the 100 rabbits crew[9] (who produce code and art
while living on a 10 meter yacht, occasionally taking breaks to make
awe-inspiring and death-defying ocean crossings), which asserted that
"saving pages with wget is like low-budget p2p" (inspiring the title
of this post), and asked a bunch of provocative questions:

* What if the browser was local-first?
* What if websites showed up as files and folders on my computer?
* What if the browser saved a copy of everything I bookmarked?
* What if I had my own personal wayback machine?
* What if I had a little local Google that could search the full text
of everything I’ve ever saved?
* What if I could copy those website files and remix them? Add links.
Mark them up with highlights. Write margin notes.
* What if the whole web was built around copying/remixing/sharing?

Using git for distribution straightforwardly opens up all of these
possibilities.

In writing this up, I tapped into some older ideas.  The first is one
of my own: I made a phlog post about 2 years ago wherein I claimed
that "on a computer with even vaguely modern specs, it would probably
be possible to use a Gopher client which *automatically and
immediately* archived every single document you visited, as you
visited it, and maintained a searchable full text index of those
archives, without this being unduly taxing on processor time or disk
space"[10].  I discussed this in the context of the value of being able
to easily disappear, which is something we lose with git distribution.

I take the question of when, if ever, radically permanent online
writing makes sense as seriously as I do because of an even earlier
post made by Alex Schroeder[11], concerning the Secure Scuttlebutt
protocol.  Alex says "I don’t like systems where I cannot delete
things. I don’t need non-repudiation since I’m talking to people,
not signing contracts. Basically a “unforgeable append-only”
system is similar to a legal set of contracts and not at all like
conversations in real life".  He's well aware that it's impossible to
guarantee all participants in a distributed system will comply with a
request to delete something, but still thinks it's better to build
systems which at least *try* to let us undo.  I'm still not 100% sure
I agree, but I totally understand and respect the perspective.
Systems with a hard "no take backs" property shouldn't be designed or
used lightly - especially not in light-hearted social contexts, where
they seem an especially bad match.

About a year ago, Drew DeVault wrote[12] about better (open-source,
non-commercial, pro-privacy) approaches to search engines, in which
he floats the idea of search engines not crawling the entire web, but
limiting themselves to a list of "tier 1" domains which are
"authoritative or high-quality sources for their respective
specializations", as well as pages which link to tier 1 domains.  My
idea of doing just a little bit of exploratory spidering and indexing
of the git repos which are advertised in those repos you subscribe
to, ending up with the ability to search a small, socially-defined
corner of "gitspace", was inspired by this.  Search is a useful
feature even if you can't search *all* the things.

[1] https://www.datprotocol.com/
[2] https://ipfs.io/
[3] https://scuttlebot.io/more/protocols/secure-scuttlebutt.html
[4] https://freenetproject.org/
[5] gopher://republic.circumlunar.space:70/1/~joneworlds/
[6] gopher://republic.circumlunar.space:70/1/zine
[7] gemini://rawtext.club/~ploum/2021-10-10.gmi
[8] https://subconscious.substack.com/p/saving-copies-of-everything-is-like
[9] gemini://gemini.circumlunar.space/users/hundredrabbits/
[10] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-individual-archivist-and-ghosts-of-gophers-past.txt
[11] gopher://alexschroeder.ch:70/0page/2018-06-29_No_Take_Back
[12] gemini://drewdevault.com/2020/11/17/Better-than-DuckDuckGo.gmi