Zaibatsu upgrade, setuid woes
-----------------------------

Last week I upgraded the OS at the Zaibatsu (if any sundogs have seen
scary warnings from ssh/scp about changed keys, this is the reason
why!  It's safe to connect - email me if you are extra paranoid and
want to confirm the new fingerprint).  We had been running the stable
release of Debian 9 (Stretch) since the server was set up over two
years ago.  Stretch is pretty darn old now, and with Debian stable
typically shipping slightly old software from day one, we were well
behind the times.  Which, for the most part, I imagine was absolutely
fine by most people involved.  We're not exactly all about being at
the cutting edge here!

But with very old versions of programming languages installed it was
becoming increasingly common for new projects to not run smoothly,
and upgrades to security libraries like OpenSSL are always a good
thing.  So I've been ready to upgrade for a while, but have had to
wait for the VPS provider RamNode to be ready as well.  The Zaibatsu
is an OpenVZ VPS, meaning it shares a running Linux kernel with
several other VPSes (virtualisation magic keeps the processes,
filesystems etc. of one VPS invisible from the others).  Because we
don't have our own independent kernel, we can't upgrade whenever we
like (or fiddle with most sysctl settings, either).  We can only run
the kernels that RamNode provide.  However, RamNode semi-recently
updated their OpenVZ infrastructure to a non-legacy version, meaning
they were at last able to offer some more modern OS options, including
Debian 10 (Buster).  Even so, there's no possibility of a typical
"in place" upgrade via the standard `apt` tooling.  I basically had
to back the whole thing up, re-image the Zaibatsu with Buster,
reinstall all the software and restore our files.  The recent May Day
long weekend gave me a good opportunity to do this with enough time to
hopefully fix any issues if things really went south.

For the most part, things went pretty smoothly.  No data was lost, and
I prioritised getting the gopher and mail servers running again ASAP
so that the visible downtime to the rest of the world with regard to
our basic services was very short.  I'm glad I did this with a lot of
spare time available, though, because it took me what seemed like an
eternity to get our BBS working again.

The Circumlunar BBS works as follows (it's not proprietary software
with unknown inner workings like SDF's BBOARD is!): all posts are
stored in a board/thread/post directory structure in /var/bbs, which
is owned by a "bbs" user.  The content is world readable, so if you
really want you can poke around and read everything with cd, ls and
cat/less/more.  Or write your own read-only client, or search engine,
or whatever.  To post, you have to use one of the approved clients,
which are setuid and owned by the bbs user, enabling them to create
new files in /var/bbs.  These clients ensure that people's correct
usernames are attached to posts - if we just made /var/bbs
world-writable, people could impersonate other users, or vandalise
other users' posts.  Not that I think any sundogs would do that, but
at some point we hoped other pubnixes would pick this software up so
we didn't want to necessarily assume a small, high-trust society.

There's nothing super innovative about this - take a look in
/usr/games and you'll find many of those binaries are setgid and have
their group set to "games".  This is precisely to allow any user on a
multiuser unix system to update a shared high score file in a controlled
way.  Even though the games (and our BBS clients) are free software,
if a user cloned the repo and modified the game to give them free
points, without root access they can't make their new hax0red binary
owned by the "games" group, so they can't cheat their way to the top
of the scoreboard.

The main BBS client we use is written in Lua, because of its low
memory footprint (we have 128MB of RAM to share amongst all logged in
users!).  Now, running interpreted programs (or shell scripts) as
setuid or setgid is not straightforward.  Setuid programs which are
owned by the root user are potentially big security holes, and shell
scripts in particular are hard to secure because users can easily
influence their behaviour via environment variables, aliases, etc.
For these reasons most modern unixes ignore the setuid bit on
interpreted scripts, and this extends to scripts run by the Lua
interpreter.  The way around this is to write a small "wrapper" in a
compiled language like C, which does nothing but call the interpreter
you want with the required arguments.  The resulting binary wrapper
can be made setuid/setgid with no problems.
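
To make that concrete, here is a minimal sketch of the heart of such a
wrapper, assuming it hands a command line to the C library's
`system()`.  The function name and the command line are hypothetical;
the real paths aren't given in this post.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical command line; the real interpreter and script paths
 * are not given in this post. */
#define CLIENT_COMMAND "lua /usr/local/share/bbs/client.lua"

/* Run a command the way system() does: fork, then have the shell
 * execute it as "sh -c command".  Returns the command's exit status,
 * or -1 if it could not be run or did not exit normally. */
int run_via_shell(const char *command)
{
    int status = system(command);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The wrapper's `main()` would then amount to little more than
`return run_via_shell(CLIENT_COMMAND);`, with the compiled binary
owned by the low-privilege user and given the setuid bit.  Note that
this route always puts a shell between the wrapper and the
interpreter.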

People online will scream at you that you are going directly to
sysadmin hell if you attempt this, but they are being lazy: many, many
people seem to have forgotten that setuid binaries are not necessarily
owned by root.  If they are owned by a low-privilege user/group like
"games" or "bbs" then the risk is comparatively minimal and there's no
reason to freak out.  Most likely, in this day and age where using
unix as an actual, genuine multi-user system is seen more as quaint
historical re-enactment than serious computing, they're unable to
conceive of what the point would be in a non-root setuid program.
Heck, maybe they don't even have a /usr/games/ directory!!!

Anyway, we have just such a setuid binary wrapper for our BBS client,
and for whatever reason it was not working correctly after the
upgrade, leaving the BBS in a read-only state.  I went nuts trying to
figure out why it had stopped working, verifying that it was owned by
the correct user, that the setuid bit was set, etc.  Eventually I
discovered that between the versions of dash (Debian's default
/bin/sh) shipped with Stretch and with Buster, a change was made where
the first thing the shell does is check whether its real and effective
uid are equal and, if they aren't, drop privileges so that they are.
Because the binary wrapper used the C standard library's `system()`
function to launch the Lua interpreter, and because `system()` uses
/bin/sh to launch things, we were ending up with a non-privileged Lua.
This change to dash was made "for security reasons".  It really
pisses me off that nobody thought to implement this feature in such a
way that the shell checks whether its effective uid is 0 and only
drops privileges in such a case, realising that the threat from
non-root setuids is minor.
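
The difference between the two policies is tiny when written out.
The following is an illustration only, not dash's actual code: the
function names are made up, and it just contrasts "drop whenever the
ids differ" with "drop only when the effective uid is root".

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Behaviour described above: drop whenever the two ids differ. */
bool should_drop_strict(uid_t real_uid, uid_t effective_uid)
{
    return real_uid != effective_uid;
}

/* The alternative wished for above: only drop when the process is
 * running setuid root. */
bool should_drop_root_only(uid_t real_uid, uid_t effective_uid)
{
    return real_uid != effective_uid && effective_uid == 0;
}

/* Apply a policy to the current process.  Returns 0 if nothing
 * needed doing or the drop succeeded, -1 if setuid() failed.
 * For a non-root process setuid() only resets the effective uid;
 * a thorough drop would also clear the saved set-user-ID. */
int drop_if(bool (*policy)(uid_t, uid_t))
{
    uid_t real_uid = getuid();

    if (!policy(real_uid, geteuid()))
        return 0;
    return setuid(real_uid);
}
```

Under the root-only policy, a shell started from a wrapper owned by
"bbs" would keep its effective uid, while one started from a setuid
root binary would still be knocked back down to the calling user.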

At first I thought this would be an easy fix - I changed the wrapper
to use one of the `exec()` functions instead of `system()` so that
/bin/sh was not invoked.  This resulted in a privileged Lua interpreter!
However, because Lua has no built-in filesystem support, the client
relies crucially on launching standard unix tools like cp, mv, rm,
chown and chmod to maintain all the correct permissions in /var/bbs in
a secure way.  And Lua launches those tools...using `system()`, and
hence /bin/sh.  So the problem was only partially solved.
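
For reference, the shell-free launch looks roughly like this.  It's a
sketch using `execve()`; the function name is hypothetical, and the
fixed minimal environment is my own assumption, not something the real
wrapper is known to do.

```c
#include <stdio.h>
#include <unistd.h>

/* Replace the current process with `interpreter` running `script`.
 * No shell is involved, so the effective uid reaches the interpreter
 * untouched.  Only returns (with -1) if execve() itself fails. */
int exec_interpreter(const char *interpreter, const char *script)
{
    char *const argv[] = { (char *)interpreter, (char *)script, NULL };
    /* Assumption: pass a fixed, minimal environment rather than the
     * caller's, so users cannot steer the client via variables. */
    char *const envp[] = { "PATH=/usr/bin:/bin", NULL };

    execve(interpreter, argv, envp);
    perror("execve");
    return -1;
}
```

This only fixes the first hop.  Anything the interpreter later starts
through `system()` still passes through /bin/sh and loses its
privileges there.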

In the end, I hunted down and installed an older version of dash which
didn't drop privileges and everything immediately just worked.  It's
not an ideal solution, but the BBS is a central part of our community
and I wanted it up and running ASAP.  However, this situation does
pretty well rule out the notion of this system becoming widely
deployed at other pubnixes.  I will have to see if I can either get
sudo to function as an alternative binary wrapper (this is widely
advised online, but I have tried previously to apply it to our
situation and although I've forgotten the details I recall that there
was a very real and principled reason why it didn't work for us), or
if we can reimplement the client in a language which compiles to a
binary.  I'm not optimistic on that last part, though.  Nobody wants
to write something like this in C in this day and age.  However,
modern compiled languages tend to have extremely smart and
high-powered toolchains (in order to manage the dependency hell that
modern software development fashion seems to eagerly embrace) which
don't play well in a low-resource pubnix environment.  Rust wants
every single user to have a multi-hundred megabyte copy of the
toolchain in their home directory, and "Hello, world" has about 250
dependencies - one of which was always released only yesterday.  Go
can be installed system wide in the traditional sane manner, but
128MB of RAM is not enough to compile anything I've tried.  Sigh...

Aside from this saga, all other problems have been pretty minor.
Systemd adoption is much more widespread in Buster than it was in
Stretch, but I'm trying to see this as a learning opportunity rather
than a hassle.  We are no longer using xinetd to run some of the
things we previously did; they are now implemented as systemd
socket-activated services.  I still think the death of /etc/rc.local
as a dirt-simple place-of-last-resort to stick system
initialisation code is a tragedy, but I'll spare any further
complaining until a later post at least.  Basically I think we are
now right back to where we were, in terms of everything working, but
with a much more modern environment which isn't frighteningly close
to EOL.  Hopefully I don't have to do this again for another few
years!