* * * * *
Yet even more stupid benchmarks
Yet another silly optimization problem [1]. This time, from a silly coding
challenge [2] to find the number of integers expressible with unique digits
(that is, no single digit repeats) in a base-10 representation up to the
value 10,000,000,000 (there are 8,877,690 such numbers, by the way).
The neatest and fastest solution was final program on this page [3], written
in C#. It generates only such numbers; it doesn't try to test each number.
Since I don't use C#, I decided to translate the code in C to play around
with it. Wasn't all that hard:
> #include <stdio.h>
> #include <stdlib.h>
>
> int total = 0;
> const int pre[(1 << 10) + 1] /* = { ... } */ ;
>
> void generate2(
> int maxlen,
> int currentlen,
> int availabledigits,
> int currentvalue
> )
> {
> int last = (currentlen == maxlen - 1);
> int x = availabledigits;
>
> while(x != 0)
> {
> int digit = pre[x ^ (x & (x - 1))];
> x &= (x - 1);
>
> if (digit == 0 && currentvalue == 0)
> continue;
>
> if (last)
> ++total;
> else
> generate2(
> maxlen,
> currentlen + 1,
> availabledigits & ~(1 << digit),
> (currentvalue * 10) + digit
> );
> }
> }
>
> int main(int argc,char *argv[])
> {
> int len;
>
> for (len = 1 ; len <= 10 ; len++)
> generate2(len,0,0xFFF >> 2,0);
>
> printf("total: %d\n",total);
> return EXIT_SUCCESS;
> }
>
I pregenerated the pre[] array since I wanted this to run as fast as
possible. The code used to generate the array:
> for (i = 0 ; i <= 10 ; i++)
> pre[1 << i] = i;
>
Anyway, once written and compiled (gcc -O4 -fomit-frame-pointer f.c) it ran
in about 0.2 seconds (average run) on a 2.6GHz (gigaHertz) machine. Fast, but
I could go faster by running it across the two CPU (Central Processing Unit)s
in the box. I was expecting about half the runtime, since this is easily
parallelizable.
It ran in about 0.16 seconds, a rather disappointing ¾ time. I commented out
the code in generate2() just to test the overhead of threading and
syncronization and that isn't a factor (program ran in 0.001 seconds).
Undaunted, I decided to try one of the quad-core boxes at The Office.
Reworked the code a bit to split the load between four CPUs as evenly as
possible, and ran some tests.
0.13 seconds on average. Still not quite half the speed.
Hmmm …
[1]
gopher://gopher.conman.org/0Phlog:2008/08/07.2
[2]
http://beust.com/weblog/archives/000491.html
[3]
http://www.indiangeek.net/2008/08/29/a-case-study-in-micro-optimization/
Email author at
[email protected]