* * * * *

                   We could have skipped writing a program

A bit of background—“Project: Sippy-Cup [1]” uses data from a single column
from a database to do its job. It doesn't query the database directly since
we have a tight deadline, so there's a custom binary file that contains
around 100,000,000 records, each record having a unique key and a 32-bit
value. It doesn't matter what the key or the value is, just that this file
exists. So, with that out of the way …

I was at lunch today with some fellow cow-orkers. Talk turned towards a QA
(Quality Assurance) engineer who was tasked by my friend TS (a senior QA
engineer) to write a program to scan the data file used by “Project: Sippy-
Cup” and count each unique value. I had written such a program in Lua (which
worked by directly reading the binary file itself—easy enough since “Project:
Sippy-Cup” is in Lua and has to read the binary file). TS wrote one in Python
to do the work from a text dump of the binary file. The text output is just:

-----[ data ]-----
unique-key-1 = value
unique-key-2 = value
-----[ END OF LINE ]-----

It's not hard to parse, it's just that the text dump is 100,000,000 lines
long.
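As a rough illustration of how little parsing that format needs, here is a minimal shell sketch. It assumes each line is three blank-separated fields (a key with no embedded spaces, an "=", and the value). The helper name extract_values and the sample keys and values are made up for illustration and are not part of the original tooling.

```shell
#!/bin/sh
# extract_values: hypothetical helper, not part of the original tooling.
# awk splits each "key = value" line on runs of blanks, giving three
# fields: $1 is the key, $2 is the "=" sign, and $3 is the value.
extract_values()
{
  awk '{print $3}'
}

# Two made-up lines in the dump's format; prints 17, then 42.
printf '%s\n' 'unique-key-1 = 17' 'unique-key-2 = 42' | extract_values
```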

The QA engineer in question couldn't get his program to work.

It was only after lunch that I realized none of us had to write a program.
No, all it would have taken was running:

-----[ shell ]-----
GenericUnixPrompt> dump-proprietary-data -s Project-Sippy-Cup.data \
       | awk '{print $3}' \
       | sort \
       | uniq -c \
       | sort -rn \
       > /tmp/report.out
-----[ END OF LINE ]-----
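To see what that pipeline actually produces, here is a minimal sketch that wraps everything after the dump command in a shell function and feeds it a few lines of made-up sample data in the same "key = value" format. The function name count_values and the sample keys and values are hypothetical; the real input would come from dump-proprietary-data.

```shell
#!/bin/sh
# count_values: hypothetical wrapper around the pipeline above.
# Reads "key = value" lines on stdin and prints "count value",
# most frequent value first.
count_values()
{
  awk '{print $3}' \
    | sort \
    | uniq -c \
    | sort -rn
}

# Made-up sample input: 42 appears three times, 7 twice, 99 once,
# so the output lists 42, then 7, then 99, each preceded by its count.
printf '%s\n' \
  'unique-key-1 = 42' \
  'unique-key-2 = 7' \
  'unique-key-3 = 42' \
  'unique-key-4 = 42' \
  'unique-key-5 = 7' \
  'unique-key-6 = 99' \
  | count_values
```

The exact padding in front of each count depends on the local uniq implementation. On the real 100,000,000-line input, the first sort does most of the work; exporting LC_ALL=C before running the pipeline often speeds it up, since sort then compares raw bytes instead of applying locale collation. Another option, not part of the original pipeline, is to have awk tally the counts itself with awk '{n[$3]++} END {for (v in n) print n[v], v}' | sort -rn, so that only the distinct values need sorting.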

Sigh.

[1] gopher://gopher.conman.org/0Phlog:2014/03/05.1
