----------------------------------------
The bad libraries dilemma
August 15th, 2021
----------------------------------------

Dear readers,

I hope there are some software developers among you,
possibly someone involved in open source.  I need a
piece of advice (you can mail me: dacav at
tilde.institute).

Let's start with some context.

In my spare time I like to work on personal projects,
which usually involve writing software.

As mentioned before (on this phlog, and in other
places), the current one is crossbow(1), a minimal feed
aggregator (RSS, Atom), written in C, for Unix
operating systems.

The project started simple about two years ago, when I
stumbled upon a similar program named rsstail(1).
rsstail(1) works roughly like tail(1): it polls a feed
and prints updates.

Since I wanted to change it a little, I checked out its
code, and found that the logic is quite simple.  Most
of the work is done by a library called libmrss, which
depends in turn on another library called libnxml.

Using libmrss is trivial, so it was easy for me to
build a first prototype of my software, which
gradually evolved into what I initially wanted from
rsstail(1).

One day I decided to try the AFL fuzzer to get some
insight into how [in]secure my software is.  AFL managed
to make it crash, but as it turned out, all the
problems it highlighted were in the dependencies.

The right thing to do was of course to fix the bugs,
and so I did.  The same happened on a couple of
subsequent occasions, when users of my software
notified me about crashes.

Over time I realised that I might have a serious
problem with libmrss and libnxml:

- The code is full of bugs of varying severity,
 ranging from memory leaks to accesses beyond array
 boundaries.  Apparently I have only scratched the
 surface.  This code has obviously never been tested
 thoroughly!

- The author is somewhat unresponsive: every time I
 submit a pull request I also have to nudge him via
 email.  At least he doesn't reject the patches,
 however…

- …he is not inclined to follow a proper release
 process.  After multiple requests he finally agreed
 to bump the version, yet he neglected to add a tag
 and to publish a proper distribution tarball[1].
 Ultimately this means that the various software
 distributions shipping these libraries will not see
 the patched versions.

To put it simply, while these dependencies allowed me
to bootstrap working software, at the current stage
they are mostly a source of frustration.  Basically
I've inherited lots of technical debt, and the person
responsible for it has clearly lost interest in his
projects.

I have considered bailing out: I would use a good XML
parsing library such as Mini-XML, and implement
handling of the RSS and Atom formats on top of it.  I
started to sketch something, but there's a long way
ahead.

On the other hand, improving the two libraries I'm
currently relying upon would be good for the community,
as they are already established, distributed, and used
by some other projects.

Given the situation with the author, however, the best
course of action would be to fork, and this would in
turn mean a lot of work: much more than what my spare
time allows me to do.  I should also mention that the
libraries do way too much, and it's not hard to guess
that the features I'm not using are buggy too.

What should I do?


---
[1] I now feel the urge to digress on the topic of
   distribution tarballs, and on the fact that
   GitHub/GitLab stupidly distribute source tarballs
   instead.  This is however a topic for another
   post.