===========
3 streams
===========

Recently I got a task at work: to run some network-bound jobs
"online" (that is, very quickly and in parallel), which were
previously queued and executed a few at a time. It seemed like a
potentially awkward task involving plenty of rewriting: the data
segments, heaps, and other bits of thousands of processes running in
parallel would cause a rather high and unnecessary overhead, so the
jobs had to be merged into a single process. (The overhead would be
lower for languages without garbage collection and with less bloated
libraries, but in this case it was about 1 MB per process.)

Fortunately, I had overestimated the complexity there. The programs
were using standard textual streams for IPC (and I had actually
considered the possibility of running them as daemons from the
beginning), so it was straightforward to abstract that a bit and
introduce a second mode for running them as daemons: UNIX domain
sockets and (user-level) threads. So now it's either a thread, a
socket, and logging, or a process, std{in,out}, and stderr. Both
cases are rather simple and standard, and individual programs don't
have to care which mode they'll run in. There's no strict need to
keep both modes, but individual processes with standard streams are
still a bit easier to invoke manually, and supporting both allows a
smooth transition. The overall approach resembles the transition from
CGI to FastCGI.
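
The two modes can be sketched roughly as follows. This is only an
illustration, not the actual code from work: the ``handle`` job, the
``LogStream`` adapter, and the "jobs" logger name are all hypothetical,
and ordinary OS threads stand in for the user-level threads mentioned
above. The point is that the job itself only ever sees three streams.

```python
import logging
import os
import socket
import sys
import threading


def handle(inp, out, log):
    """Hypothetical job: upper-case each input line, noting it in the log."""
    for line in inp:
        log.write("processing: " + line)
        out.write(line.upper())
    out.flush()
    log.flush()


class LogStream:
    """File-like adapter, so that a job "writes to stderr" via logging."""

    def __init__(self, logger):
        self.logger = logger

    def write(self, text):
        if text.strip():
            self.logger.info(text.rstrip("\n"))

    def flush(self):
        pass


def run_as_process():
    # Mode 1: a process, std{in,out}, and stderr.
    handle(sys.stdin, sys.stdout, sys.stderr)


def serve_connection(conn, log):
    # Mode 2: a thread per connection; the socket replaces std{in,out}.
    with conn, conn.makefile("r", encoding="utf-8") as inp, \
            conn.makefile("w", encoding="utf-8") as out:
        handle(inp, out, log)


def run_as_daemon(path):
    # Listen on a UNIX domain socket and start a thread per client.
    log = LogStream(logging.getLogger("jobs"))
    if os.path.exists(path):
        os.unlink(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen()
        while True:
            conn, _ = server.accept()
            threading.Thread(target=serve_connection, args=(conn, log),
                             daemon=True).start()


def main(argv):
    # With a socket path argument run as a daemon, else as a plain filter.
    if len(argv) > 1:
        run_as_daemon(argv[1])
    else:
        run_as_process()
```

Calling ``main(sys.argv)`` without arguments gives a plain filter that
can be tried from a shell, while passing a socket path starts the
daemon mode. In the daemon mode a client signals the end of its input
by shutting down the writing side of the connection, which plays the
role of EOF on stdin.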

I did underestimate another part, though: I am still rewriting and
refactoring the scheduler, which collects tasks from multiple sources
and applies some rules (which weren't precisely specified, aren't
necessarily useful, often try to work around warts of remote devices,
and weren't planned initially, hence the refactoring). I keep
wondering whether it can be made generic to some extent (and
implemented as a daemon or a library, akin to cron or systemd
timers), since it looks like a generic problem. But with all the
custom sources and rules it probably wouldn't make much sense: the
initial time-based scheduling itself is an easy and small part.

But boring bits aside, I am once again pleased with how nice and
generic the basic and common approach of using 3 streams for input,
output, and logging is, compared to the MQs and RPCs that seem to be
popular in enterprise software.


----

:Date: 2019-04-13