* * * * *

          Some musings on bloated software and software performance

This blog post about bloated software [1] got me to thinking about some
recreational programming I recently engaged in.

A while ago I broke down and wrote a program to solve Jumbles [2]. It's a
very straightforward and simple program too:

> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <ctype.h>
>
> #define WORDS   "/usr/share/dict/words"
>
> /***********************************************/
>
> static int cmp(const void *l,const void *r)
> {
>   int cl;
>   int cr;
>
>   cl = *(const char *)l;
>   cr = *(const char *)r;
>
>   return cl - cr;
> }
>
> /**********************************************/
>
> int main(int argc,char *argv[])
> {
>   FILE   *fpin;
>   char   *p;
>   char    work  [BUFSIZ];
>   size_t  len;
>   char    buffer[BUFSIZ];
>   char    dwork [BUFSIZ];
>   size_t  dlen;
>
>   if (argc != 2)
>   {
>     fprintf(stderr,"usage: %s word\n",argv[0]);
>     return(EXIT_FAILURE);
>   }
>
>   strcpy(work,argv[1]);
>   len  = strlen(work);
>
>   for (p = work ; *p ; p++)
>     *p = toupper(*p);
>
>   qsort(work,len,1,cmp);
>   fpin = fopen(WORDS,"r");
>   if (fpin == NULL)
>   {
>     fprintf(stderr,"Huston, we have a problem ... \n");
>     return(EXIT_FAILURE);
>   }
>
>   while(fgets(buffer,BUFSIZ,fpin))
>   {
>     strcpy(dwork,buffer);
>
>     for (p = dwork ; *p ; p++)
>     {
>       if (*p == '\n')
>       {
>         *p = '\0';
>         break;
>       }
>       *p = toupper(*p);
>     }
>
>     dlen = p - dwork;
>     if (dlen != len) continue;
>     qsort(dwork,dlen,1,cmp);
>     if (strcmp(dwork,work) == 0)
>       printf("%*.*s\n",dlen,dlen,buffer);
>   }
>
>   fclose(fpin);
>   return(EXIT_SUCCESS);
> }
>

The whole trick to the program is to take the letters we're trying to
unscramble, say “gerrof”, convert them all to uppercase, “GERROF”, then sort
the letters, “EFGORR”. Then go through a list of words (in this case, the
file /usr/share/dict/words) and for each word in that list, go through the
same process, then compare the sorted sets of letters. If they match, that's
the answer (or one of several answers).

And as written, it can do about 6 words a second (164 seconds to process
1,000 words, but I'm re-reading the word list file each time). And that's
certainly fast enough for most people.

But I wrote another version. Just as simple—or rather, even simpler:

> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <ctype.h>
> #include <assert.h>
>
> #include "words.h"
>
> typedef unsigned long Letters;
>
> /*************************************/
>
> Letters has_letters(const char *s)
> {
>   Letters set = 0;
>
>   assert(s != NULL);
>
>   for ( ; *s ; s++)
>   {
>     if (
>          ((*s < 'A') || (*s > 'z'))
>          || ((*s >'Z') && (*s < 'a'))
>        )
>     {
>       set |= CHAR_OTHER;
>     }
>     else
>     {
>       set |= (1uL << (toupper(*s) - 'A'));
>     }
>   }
>
>   return set;
> }
>
> /********************************************/
>
> static int cmp(const void *l,const void *r)
> {
>   int cl;
>   int cr;
>
>   cl = *(const char *)l;
>   cr = *(const char *)r;
>
>   return cl - cr;
> }
>
> /********************************************/
>
> int main(int argc,char *argv[])
> {
>   char    work [BUFSIZ];
>   char    dwork[BUFSIZ];
>   char   *p;
>   size_t  len;
>   size_t  i;
>   size_t  j;
>   Letters set;
>
>   if (argc < 2)
>   {
>     fprintf(stderr,"usage: %s word ...\n",argv[0]);
>     return EXIT_FAILURE;
>   }
>
>   for (i = 1 ; i < argc ; i++)
>   {
>     strcpy(work,argv[i]);
>     len = strlen(work);
>
>     for (p = work ; *p ; p++)
>       *p = toupper(*p);
>
>     qsort(work,len,1,cmp);
>     set = has_letters(work);
>
>     for (j = 0 ; j < cswords ; j++)
>     {
>       if (set != cwords[j].letters) continue;
>       if (len != cwords[j].word.size) continue;
>       strcpy(dwork,cwords[j].word.text);
>       for (p = dwork ; *p ; p++)
>         *p = toupper(*p);
>       qsort(dwork,len,1,cmp);
>       if (strcmp(dwork,work) == 0)
>         printf("%s ",cwords[j].word.text);
>     }
>     printf("\n");
>   }
>
>   return EXIT_SUCCESS;
> }
>
>

It works pretty much the same way, except for two major differences:

 1. it tests potential words against a set of letters. In this case, the set
    is a 32-bit quantity with a bit set for each letter in the word (an “A”
    in the word will have bit 0 set, a “C” would have bit 2 set, etc, as you
    can see from the code above) and the biggie:
 2. instead of reading in the list of words each time, there's a large
    static array of all the words (source not shown here, since the source
    file for that is 62M (Megabytes) in size), which also includes the size
    and letter set of each word.

Yes, a static array of all the words (actually, it's a bit more than that
since each entry also contains parts of speech and synonyms, all courtesy of
The Moby Project [3]), which “bloats” the executable program by 15M. There is
no overhead of processing the data (in tests, it would take about five
seconds to read everything in, which doesn't sound like much, but I'm getting
to that) and even better—since the entire structure is constant, it can be
stored in the read-only section of the program executable, which helps avoid
excessive paging (since the operating system can discard any pages not
recently used instead of paging out to swap space; if it needs the pages it
can page them in directly from the executable file).

The end result of this massive “bloat” is that the second program can process
694 words per second! (or 1.44 seconds to process 1,000 words).

A hundred times faster.

That's quite a speed-up.

It's quite bloated too—an additional 15M worth.

So it's not quite as simple as saying that programs need to be less “bloated”
it also depends on how the program is written. This also shows how certain
classes of optimization can block off flexibility. I can easily add or remove
words from consideration, since the program reads in an external file. I
cannot, however, do that easily for the second program—it would require a
recompile.

Even though a program like the lastest version of Microsoft Word is
“bloated,” it can also do a ton of stuff that Microsoft Word from the mid 80s
couldn't do—like real time spell checking for instance. It could probably be
made to run faster, either through dropping certain features (which may not
be that bad of an idea) or certainly, profiling the code and speeding up the
hot spots (if there are any in a program of that size).

I actually suspect the major reason Micosoft Word is slow is due to paging.
Slap enough memory in a Windows box, and turn off paging and it would
probably run at a decent speed (once loaded, that is).

[1] http://blog.micropledge.com/2008/06/snappy-software/
[2] gopher://gopher.conman.org/0Phlog:2007/06/27.2
[3] http://en.wikipedia.org/wiki/Moby_Project

Email author at [email protected]