I need help to make good scientific software

Part of my job as a PhD student is to write data analysis software for [a
future space mission](https://www.lisamission.org/). Because the launch is
about 10 years into the future, there is still a lot of R&D being done, so
everything I am writing now is really a prototype. It may or may not be used in the
final pipeline.

Now, I really _want_ my contribution to this effort to survive the next decade.
I want it to be there, even if only in essence, in the pipeline that will run
the first analysis of the mission data. But whether my code will make the cut
depends on many things. Some of them can be grouped under the umbrella of
"software quality".

Software of good quality, in this context, is something that will happily work
on other machines, years or decades from now, with no hassle. That won't crash,
won't require a special environment to be set up for it, won't yell at the user
about dependency version incompatibilities. That will scale up and down to the
size of the problem at hand and the computing resources available. Something
that will just do its job efficiently, silently, and above all correctly.

In practice that's just an ideal. We physicists are not very good at making
robust, interoperable, _pleasant to use_ software. We don't get much training
in software; we mostly learn to write `for` loops that compute big formulas. But
I want to try.

## The plea for help

Now, given that time is finite and my thesis won't write itself, I need
practical advice on doing this (if you have some experience and are interested,
please [give me a hand!](mailto:[email protected])). Some specific
constraints are the following:

- I am pushing for the use of Julia in an environment dominated by Python and
C, but I am not actually experienced in Julia. I just think it is a wiser
engineering choice.
- I am writing signal processing and Monte Carlo routines.
- This has to run on standard Linux x86_64 clusters (Slurm, Kubernetes).
- My routines should be easy to call from Python and C for interoperability.
- We might want to throw GPUs at the problem.

## What I have thought about so far

From my little experience I know that writing unit tests by hand takes time and
can only catch the bugs you anticipate anyway; [inline snapshot
testing](https://ianthehenry.com/posts/my-kind-of-repl/) at least helps by
making that easier. Random test generation [seems
promising](https://danluu.com/testing/) and I will try out
[Supposition.jl](https://seelengrab.github.io/Supposition.jl) very soon. Formal
methods tools like [Alloy](https://alloytools.org/) and
[TLA+](https://lamport.azurewebsites.net/tla/tla.html) look fun, but I have way
too little time to learn them, and besides, they seem most useful for concurrent
algorithms, which are not what I am writing.
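To make the idea of random test generation concrete, here is a minimal hand-rolled sketch in Python using only the standard library. It is not how Supposition.jl works internally; libraries of that kind typically add smarter input generation and shrinking of counterexamples on top of this basic loop. The naive `dft`/`idft` pair is a hypothetical stand-in for a real signal processing routine, and the names `random_signal` and `check_properties` are made up for illustration. The point is that the test states properties (the inverse transform undoes the transform, and Parseval's theorem holds) rather than hand-picked input/output pairs.

```python
import cmath
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for the code under test)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def idft(X):
    """Naive inverse transform, normalised by 1/n."""
    n = len(X)
    return [
        sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
        for k in range(n)
    ]


def random_signal(rng, max_len=32):
    """Generate a complex signal of random length with entries in the unit square."""
    n = rng.randint(1, max_len)
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def check_properties(trials=200, seed=0, tol=1e-9):
    """Check two properties on many random inputs; return the number of trials run."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        x = random_signal(rng)
        X = dft(x)

        # Property 1: the inverse transform recovers the original signal.
        y = idft(X)
        assert all(abs(a - b) < tol for a, b in zip(x, y)), x

        # Property 2 (Parseval): energy is the same in both domains.
        energy_time = sum(abs(a) ** 2 for a in x)
        energy_freq = sum(abs(a) ** 2 for a in X) / len(x)
        assert abs(energy_time - energy_freq) < tol * max(1.0, energy_time), x
    return trials


if __name__ == "__main__":
    print(f"{check_properties()} random cases passed")
```

On failure, the assertion message carries the offending input, which together with the fixed seed makes the case easy to replay. A sign error in the exponent of `idft`, for example, would break the round-trip property on almost any input without anyone having to anticipate that specific bug.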

On the robustness and ease of installation side of things, I want to avoid
dependencies like the plague, unless they are very stable.

Fortunately, for now my only dependencies (other than Julia itself) are the C
library [FFTW](https://www.fftw.org/) and its [Julia
bindings](https://juliamath.github.io/FFTW.jl). Those look stable enough to me
that I shouldn't worry about using them. I am wondering if I should
[vendor](https://htmx.org/essays/vendoring/) them to make things easy for the
users and myself. Other Julia packages that I use (Revise, PyPlot) are only for
development, so I don't count them. Supposition.jl is an interesting case:
if I do end up using it, I will have test suites that depend on it. That
is still just a development dependency, but unlike Revise and PyPlot, its usage
ends up in the source tree. I have no idea what I should do then.

For interoperability with C, I know that [compilation of Julia code into C
libraries](https://julialang.github.io/PackageCompiler.jl/) is possible, but I
have never done it. [The other
direction](https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/)
should be easy.

For interoperability with Python, there is
[PythonCall](https://github.com/JuliaPy/PythonCall.jl) which I hope won't break
too much. But if we are ambitious, in the long term this is not needed: all
current Python code can be replaced by Julia. Not in the timeframe of my PhD,
though.

## Where that leaves us

This week I will try out this fancy _property-driven development_ thing. It
looks like it can be very effective, and fun. Hopefully this won't distract me
too much from the actual thesis goals.

Anyway, cheers!

tags: software