Okay, what failed this time?

* * * * *

I'm running the regression tests for “Project: Wolowizard [1]” and about
halfway through (around the two-hour mark or so) the tests start failing.
Sometimes expected results just aren't showing up. I'm freaking out a bit
because of all the issues we've had in running these tests, only for them to
start failing in yet another way.

Now, a bit about how this all works—there are four computers involved; one
runs the tests, injecting messages towards a mini-cluster of two machines,
either of which (depending on which one gets the message) sends a message to
the fourth machine, which does a bunch of processing (which may involve
interaction with a simulated cell phone on the testing machine), then
responds back to the mini-cluster, which then responds back to the testing
machine.

Now, I can check the immediate results from the mini-cluster, but the actual
data I'm interested in is logged via syslog, so I have that data forwarded to
the testing machine and my code grovels through a log file for the actual
data I want. And it's that data (or part thereof) that apparently isn't being
logged, and thus, the tests are failing.

Now, it just so happens that the part of the test that's failing is the part
dealing with the mini-cluster, and it looks like about half the tests are
failing (hmm …).

I log into each of the two computers comprising the mini-cluster, and check
/etc/syslog.conf, on the off chance that it changed. Nope. I then explain the
problem to Bunny, standing (or rather, sitting) in as my cardboard programmer
[2] when it hits me—I should check to see if the program is running.

Rats. It is.

The tests are still failing, and my shoes began to squeak. [3]

Okay, just because syslogd is running doesn't necessarily mean it's running
correctly. So I run logger -p local1.info FOO on each machine and yes, one of
the machines is failing to forward the logs to the testing machine.

Ahah!

I restart syslogd on that system, and lo! The log entries are getting through
now.
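
The check above can be sketched as a small script: send a uniquely tagged
message through logger on each mini-cluster machine, then look for it in the
forwarded log on the testing machine. This is only a sketch under assumptions.
The host names, the log file path, the use of ssh, and the function names are
all hypothetical; only the logger -p local1.info invocation comes from the
steps described here.

```shell
#!/bin/sh
# Hypothetical names: adjust for the real setup.
HOSTS="${HOSTS:-cluster-a cluster-b}"       # the two mini-cluster machines (assumed names)
LOGFILE="${LOGFILE:-/var/log/local1.log}"   # where forwarded entries land (assumed path)

# Succeed if the marker string appears in the given log file.
marker_seen() {
    grep -F -q -- "$1" "$2"
}

# Send a tagged message from one host and report whether it arrived here.
check_host() {
    host=$1
    marker="FWDCHECK-$host-$$"
    ssh "$host" logger -p local1.info "$marker" || return 1
    sleep 2     # give syslogd a moment to forward the entry
    if marker_seen "$marker" "$LOGFILE"; then
        echo "$host: forwarding ok"
    else
        echo "$host: NOT forwarding"
        return 1
    fi
}

# Usage (run on the testing machine):
#   for h in $HOSTS; do check_host "$h"; done
```

Running something like this before kicking off a multi-hour test run would
catch a syslogd that is up but not forwarding.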

You know, I expect there to be issues with the stuff I'm testing; what I
don't expect is for the stuff we didn't write to have issues (the Protocol
Stack From Hell™ notwithstanding).

Okay, reset everything and start the regression test over again …

Update in the wee hours of the morning, Friday, January 20th, 2012

A bit over half-way through the regression tests, and the log files rotate.
Aaaaaaaaaah! Okay, reset all the data, and start from the last failed test.
That's easy, since I can specify which cases to run. That's hard, because I
have to specify nearly 100 cases. That's easy, since I can use the Unix
command seq to list them. That's hard, because the test cases aren't just
numbers, but things like “1.b.77” and “1.c.18”, and while the shell supports
command line expansion from a running program via the backtick (à la for i in
`seq 34 77`; do echo 1.b.$i; done) I need to nest two such operations (echo
`for i in `seq 34 77`;do echo 1.b.$i; done`) to specify the test cases from
the command line, and backticks don't nest like that without escaping. Okay, I can
create a temporary file that lists the test cases …
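
For what it's worth, here is one way such a list could be built. It is a
sketch, not what was actually run. The $(...) form of command substitution
nests without any escaping, and a small function can write one case per line
to a temporary file. The function name gen_cases and the variable names are
made up for illustration; the “1.b” prefix and the 34 to 77 range come from
the example above.

```shell
#!/bin/sh
# gen_cases PREFIX FIRST LAST: print one test-case id per line.
gen_cases() {
    for i in $(seq "$2" "$3"); do
        echo "$1.$i"
    done
}

# Nested command substitution: $(...) nests where plain backticks do not.
# The result is a single space-separated line of case ids.
cases=$(echo $(for i in $(seq 34 77); do echo "1.b.$i"; done))

# Or keep the list in a temporary file, one case per line.
listfile=$(mktemp)
gen_cases 1.b 34 77 > "$listfile"
```

Either $cases or the contents of $listfile can then be handed to whatever
runs the tests.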

[1] gopher://gopher.conman.org/0Phlog:2010/10/11.1
[2] http://c2.com/cgi/wiki?CardboardProgrammer
[3] http://www.lyricstime.com/2nu-ponderous-lyrics.html
