First, thanks to those who talked to me after [last
post](i-need-help-to-make-good-scientific-software.html). Your help is
appreciated! I've received tips on using Julia's environments and Manifest.toml
to get things working on other machines with little fuss; I'll definitely check
that out, although for now what I want is to make a library, not a standalone
program. It should be easy to call from Python and C, easy to vendor, and so
on. I was also pointed to Julia's Discourse forum; I'll ask there when I get
the time.
Now, on to the actual topic: fuzz testing / property-based testing,
Supposition.jl, and how they're helping me do research.
If you don't know what those are, here's a quick recap:
Writing unit tests for your code takes a long time, is boring, and mostly catches the
bugs you anticipate anyway, so it's not a great investment of your time. Fuzz
testing means that instead you have the computer randomly generate thousands of
tests, exploring many possible combinations of inputs that you couldn't think
of. Traditionally, a fuzzer will just check whether your program crashes, but
you can also ask it other kinds of questions. You can ask things like: is
the output of this function always a valid input to that function? Is FooError
the only possible exception raised by this method? Is this equation always
satisfied?
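
To make that concrete, here's what one of those questions looks like as code.
This is a toy property in plain Julia, nothing from my actual project: a
round-trip check saying that decoding an encoded byte vector must give back
the original bytes.

```julia
using Base64

# A property is just a predicate that should hold for *every* input.
# This one says base64decode undoes base64encode exactly: the output
# of one function is always a valid input to the other, and nothing
# is lost along the way.
function roundtrip_property(bytes::Vector{UInt8})
    encoded = base64encode(bytes)    # the function under test
    decoded = base64decode(encoded)  # should recover the input
    return decoded == bytes
end
```

A property-based testing library then throws thousands of generated inputs at
a predicate like this and complains whenever it returns false.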
Asking those questions is _checking properties_ of your code, and that's where
the name "property-based testing" comes from. In pratice, to do this, you have
to
1. Tell the computer how to generate inputs to your code
2. Tell it what properties the code should satisfy
3. Let it do its thing. It will find counterexamples and automatically shrink
them to give you the smallest, simplest one that reproduces the bug.
Finally, if you write the tests first and the code later, this is called
property-driven development (like TDD, test-driven development, but smarter).
And Supposition.jl is a Julia package designed for property-based testing.
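
To give you an idea, here's a minimal sketch of what those three steps look
like with it. It's adapted from the kind of example in its documentation; the
properties are made-up toys, not anything from my actual code.

```julia
using Supposition

# Step 1: tell the computer how to generate inputs.
intgen = Data.Integers{Int}()
vecgen = Data.Vectors(Data.Integers{Int8}(); max_size=20)

# Steps 2 and 3: state a property, then let @check do its thing.
# This one holds, so the check passes:
@check function reverse_roundtrip(v=vecgen)
    reverse(reverse(v)) == v
end

# This one is false for machine integers (overflow!), so the check
# fails and the failing input gets shrunk toward something minimal,
# like a = typemax(Int), b = 1:
@check function adding_nonnegative_never_decreases(a=intgen, b=intgen)
    b < 0 || a + b >= a
end
```

When a check fails, you get the shrunk counterexample printed, usually small
enough to debug by eye.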
I started using Supposition.jl this Tuesday (4 days ago).
I'm having a great time with it! It didn't take long to learn, and the API is
clean. And it has found so many useful counterexamples that when a test passes
I'm really confident that the property in question is satisfied.
It has found trivial and shallow bugs, where I just didn't understand Julia
well enough (for example, type instabilities). It has found important, subtle
bugs in my code, where the logic was wrong but only the right combination of
inputs would reveal it. And it has found bugs in the math itself: I'm
implementing stuff from a paper and it seems that some of its mathematical
claims are wrong, because Supposition.jl found counterexamples.
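
I won't reproduce the paper's claims here, but floating-point arithmetic gives
a classic toy example of the same flavor, an identity that's true on paper and
false on the machine:

```julia
using Supposition

floatgen = Data.Floats{Float64}()

# On paper, addition is associative. In Float64 it isn't. NaN inputs
# falsify this trivially, but even finite ones fail due to rounding:
# with a = 1.0 and b = c = 1e-16, (a + b) + c == 1.0 while
# a + (b + c) == 1.0000000000000002. Supposition.jl should find and
# shrink a counterexample like that.
@check function addition_associates(a=floatgen, b=floatgen, c=floatgen)
    (a + b) + c == a + (b + c)
end
```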
This is great because I would never have thought of those counterexamples in so
little time, or at all. It led me down a rabbit hole of math that was ultimately
very productive, because now I have a much deeper understanding of what I'm
doing: I know what the authors of the mistaken papers missed, how to fix it,
and even how to make it more practical. Overall, a great investment of my time as
a researcher.
And to emphasize: I didn't spend this week playing with Supposition.jl! I
learned everything I needed from about half an hour total of looking things
up in the documentation, and that was it. All my brainpower went to fixing
bugs and understanding my math, while the fuzzer was the one trying to find
clever counterexamples. I'm honestly impressed with how well it worked and how
easy it was. It finds bugs faster than you can fix them.
It looks like fuzz testing is useful not just for software but for research too.
I'm definitely going to put any general claims I make in a future publication
through a fuzzer before submitting it. It's a cheaper, faster version of having
your local mathematician check your stuff to find counterexamples; it's not as
rigorous, but damn does it catch edge cases.
A good physicist, like a good programmer, always takes the time to check that
their reasoning is correct, at least in the common case. But it's rare to have
the time to rigorously verify that every step is completely correct in every
possible case. Given that we are not going to use formal mathematics to prove
our stuff correct anyway, fuzz testing is the cheapest, most effective option
to find those mistakes. And I'm glad I'm using it.