                      How To Steal Code
                             or
               Inventing The Wheel Only Once


                      Henry Spencer

                 Zoology Computer Systems
                      25 Harbord St.
                  University of Toronto
               Toronto, Ont. M5S1A1  Canada
         {allegra,ihnp4,decvax,utai}!utzoo!henry


                         ABSTRACT

         Much  is  said  about  ``standing  on   other
    people's  shoulders, not their toes'', but in fact
    the wheel is re-invented every day in  the  Unix/C
    community.   Worse, often it is re-invented badly,
    with bumps, corners, and cracks.  There  are  ways
    of  avoiding  this: some of them bad, some of them
    good, most of them  under-appreciated  and  under-
    used.



Introduction

``Everyone knows'' that the UNIX/C|- community  and  its
programmers are the very paragons of re-use of software.  In
some ways this is true.  Brian Kernighan [1] and others have
waxed  eloquent about how outstanding UNIX is as an environ-
ment for software re-use.  Pipes, the shell, and the  design
of programs as `filters' do much to encourage programmers to
build on others' work rather  than  starting  from  scratch.
Major  applications can be, and often are, written without a
line of C.  Of course, there are always people who insist on
doing  everything  themselves,  often citing `efficiency' as
the compelling reason why they can't possibly build  on  the
work  of  others (see [2] for some commentary on this).  But
surely these are the lamentable exceptions, rather than  the
rule?

Well, in a word, no.

At the level of shell programming, yes, software re-use is
widespread in the UNIX/C community.  Not quite as widespread
or as effective as it might be, but definitely common.  When
the time comes to write programs in C, however, the situation
changes.  It took a radical change in directory format to
make people use a library to read directories.  Many new
programs still contain hand-crafted code to analyze their
arguments, even though prefabricated help for this has been
available for years.  C programmers tend to think that
``re-using software'' means being able to take the source
for an existing program and edit it to produce the source
for a new one.  While that _is_ a useful technique, there are
better ways.

_________________________
|- UNIX is a trademark of Bell Laboratories.

                    February 21, 1989

Why does it matter that re-invention is rampant?  Apart from
the  obvious, that programmers have more work to do, I mean?
Well, extra work for  the  programmers  is  not  exactly  an
unmixed  blessing,  even  from  the  programmers' viewpoint!
Time spent re-inventing facilities that are already available
is time that is _not_ available to improve user interfaces,
or to make the program run faster, or to chase down
the proverbial Last Bug.  Or, to get really picky, to make
the code readable and clear so that our successors can
_understand_ it.

Even more seriously, re-invented wheels  are  often  square.
Every  time  that a line of code is re-typed is a new chance
for bugs to be introduced.  There will always be the
temptation to take shortcuts based on how the code will be
used, shortcuts that may turn around and bite the programmer when
the  program  is  modified or used for something unexpected.
An inferior  algorithm  may  be  used  because  it's  ``good
enough''  and  the  better  algorithms  are too difficult to
reproduce on the spur of the moment... but the definition of
``good enough'' may change later.  And unless the program is
well-commented [here we pause for laughter], the next person
who  works  on  it  will have to study the code at length to
dispel the suspicion that there is some  subtle  reason  for
the seeming re-invention.  Finally, to quote [2], _if you
re-invent the square wheel, you will not benefit when
somebody else rounds off the corners_.

In short, re-inventing the wheel ought to be a  rare  event,
occurring  only  for  the most compelling reasons.  Using an
existing wheel, or improving an  existing  one,  is  usually
superior  in  a variety of ways.  There is nothing dishonor-
able about stealing code* to make life easier and better.

Theft via the Editor

UNIX historically has flourished in environments in which
full sources for the system are available.  This led to the
most obvious and crudest way of stealing code: copy the
source of an existing program and edit it to do something
new.

_________________________
* Assuming no software licences, copyrights, patents,
etc. are violated!

This approach does have its advantages.  By its  nature,  it
is the most flexible method of stealing code.  It may be the
only viable approach when what is desired is some variant of
a complex algorithm that exists only within an existing
program; a good example was V7 dumpdir (which printed a table
of contents of a backup tape), visibly a modified copy of V7
restor (the only other program that understood the obscure
format of backup tapes).  And it certainly is easy.

On the other hand, this approach also has its problems.   It
creates  two subtly-different copies of the same code, which
have to be maintained separately.  Worse, they often have to
be maintained ``separately but simultaneously'', because the
new program inherits all the mistakes of the original.  Fix-
ing  the same bug repeatedly is so mind-deadening that there
is great temptation to fix it in only the  program  that  is
actually  giving  trouble... which means that when the other
gives trouble, re-doing the cure must  be  preceded  by  re-
doing  the  investigation  and diagnosis.  Still worse, such
non-simultaneous bug fixes cause the variants of the code to
diverge  steadily.   This  is  also true of improvements and
cleanup work.

A program created in this way may also be inferior, in  some
ways, to one created from scratch.  Often there will be ves-
tigial code left over from the program's evolutionary ances-
tors.   Apart from consuming resources (and possibly harbor-
ing bugs) without a  useful  purpose,  such  vestigial  code
greatly  complicates understanding the new program in isola-
tion.

There is also the  possibility  that  the  new  program  has
inherited  a poor algorithm from the old one.  This is actu-
ally a universal problem with stealing code, but it is espe-
cially  troublesome with this technique because the original
program probably was not built with  such  re-use  in  mind.
Even if its algorithms were good for _its_ intended purpose,
they may not be versatile enough to do a good job  in  their
new role.

One relatively clean form of theft via editing is  to  alter
the  original  program's  source  to generate either desired
program by conditional compilation.  This eliminates most of
the  problems.   Unfortunately,  it  does so only if the two
programs are sufficiently similar that they can  share  most
of  the source.  When they diverge significantly, the result
can be a maintenance  nightmare,  actually  worse  than  two
separate  sources.   Given  a close similarity, though, this
method can work well.
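
As a minimal sketch of the conditional-compilation approach, assuming
a hypothetical macro LISTONLY and hypothetical function names (none of
this is taken from the real dumpdir or restor sources), one source
file can yield either of two closely related programs:

```c
#include <stdio.h>

/*
 * One source, two programs.  LISTONLY is a hypothetical macro:
 *     cc -DLISTONLY -o lister tape.c     builds the listing program
 *     cc -o extractor tape.c             builds the extracting program
 * Everything outside the #ifdef sections is shared, so a bug fixed
 * there is fixed in both programs at once.
 */
#ifdef LISTONLY
#define PROGNAME "lister"
#else
#define PROGNAME "extractor"
#endif

/* program_name - which variant was compiled */
const char *
program_name(void)
{
	return PROGNAME;
}

/* describe_entry - format the report line for one archive entry */
int
describe_entry(const char *name, char *buf, size_t size)
{
#ifdef LISTONLY
	/* lister: the name alone is the whole job */
	return snprintf(buf, size, "%s", name);
#else
	/* extractor: report it (a real program would then copy the data) */
	return snprintf(buf, size, "x %s", name);
#endif
}
```

The more code that ends up inside the #ifdef sections, the closer
this comes to the maintenance nightmare described above.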






Theft via Libraries

The obvious way of using somebody else's code is to  call  a
library  function.  Here, UNIX has had some success stories.
Almost everybody uses the stdio library rather than inventing
their own buffered-I/O package.  (That may sound trivial
to those who never programmed on a V6 or earlier UNIX, but
in fact it's a great improvement on the earlier state of
affairs.)  The simpler sorts of string manipulations are
usually done with the strxxx functions rather than by
hand-coding them, although efficiency issues and the wide
diversity of requirements have limited these functions to less
complete success.  Nobody who knows about qsort bothers to
write his own sorting function.

However, these success stories are pleasant  islands  in  an
ocean of mud.  The fact is that UNIX's libraries are a
disgrace.  They are well enough implemented, and their design
flaws are seldom more than nuisances, but there aren't
_enough_ of them!  Ironically, UNIX's ``poor cousin'', the
Software  Tools  community  [3,4],  has  done much better at
this.  Faced with a wild diversity  of  different  operating
systems, they were forced to put much more emphasis on iden-
tifying clean abstractions for system services.

For example, the Software Tools version of ls runs
unchanged, _without_ conditional compilation, on dozens of
different operating systems [4].  By contrast, UNIX programs
that  read  directories invariably dealt with the raw system
data structures, until  Berkeley  turned  this  cozy  little
world  upside-down  with  a change to those data structures.
The Berkeley implementors were  wise  enough  to  provide  a
library  for  directory access, rather than just documenting
the new underlying structure.  However,  true  to  the  UNIX
pattern,  they  designed a library which quietly assumed (in
some of its naming conventions) that the  underlying  system
used _their_ structures!  This particular nettle has finally
been grasped firmly by the IEEE POSIX project  [5],  at  the
cost of yet another slightly-incompatible interface.

The adoption of the new directory libraries is  not  just  a
matter  of  convenience  and  portability:  in  general  the
libraries are faster than the hand-cooked code they replace.
Nevertheless, Berkeley's original announcement of the change
was greeted with a storm of outraged protest.

Directories, alas, are not an isolated example.  The UNIX/C
community simply hasn't made much of an effort to identify
common code and package it for re-use.  One of the two major
variants of UNIX still lacks a library function for binary
search, an algorithm which is notorious for both the
performance boost it can produce and the difficulty of coding
a fully-correct version from scratch.  No major variant of
UNIX has a library function for either one of the following
code fragments, both omnipresent (or at least, they _should_
be omnipresent [6]) in simple* programs that use the
relevant facilities:

       if ((f = fopen(filename, mode)) == NULL)
               print error message with filename, mode, and specific
               reason for failure, and then exit


       if ((p = malloc(amount)) == NULL)
               print error message and exit

These may sound utterly trivial,  but  in  fact  programmers
almost never produce as good an error message for fopen as
ten lines of library code can, and half the time the return
value from malloc isn't checked at all!

These examples illustrate a general principle, a side benefit
of stealing code: the way to encourage standardization|-
and quality is to make it easier to be careful and standard
than to be sloppy and non-standard.  On systems with library
functions for error-checked fopen and malloc, it is easier
to use the system functions (which take some care to do ``the
right thing'') than to kludge it yourself.  This makes
converts very quickly.
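
A minimal sketch of what such checked functions might look like
follows.  The names efopen and emalloc and their calling conventions
come from the footnote on this point; the bodies, the wording of the
messages, and the exit status are assumptions made for illustration,
not the actual library code.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * efopen - fopen that never returns NULL: on failure it reports the
 * filename, the mode, and the system's reason, and then exits.
 * (Illustrative body; only the name and interface are from the paper.)
 */
FILE *
efopen(const char *filename, const char *mode)
{
	FILE *f = fopen(filename, mode);

	if (f == NULL) {
		fprintf(stderr, "can't open `%s' with mode `%s': %s\n",
		    filename, mode, strerror(errno));
		exit(1);
	}
	return f;
}

/*
 * emalloc - malloc that never returns NULL.  A request for 0 bytes is
 * rounded up to 1, since malloc(0) is allowed to return NULL.
 */
void *
emalloc(size_t amount)
{
	void *p = malloc(amount != 0 ? amount : 1);

	if (p == NULL) {
		fprintf(stderr, "out of memory (wanted %lu bytes)\n",
		    (unsigned long)amount);
		exit(1);
	}
	return p;
}
```

A caller then writes in = efopen(name, "r") and simply carries on,
with no error check to forget.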

These are not isolated examples.  Studying the libraries  of
most  any  UNIX  system  will  yield  other ideas for useful
library functions (as well as a lot of silly  nonsense  that
UNIX  doesn't  need, usually!).  A few years of UNIX systems
programming also leads to  recognition  of  repeated  needs.
Does _your_* UNIX have library functions to:

_________________________
* I include the qualification ``simple'' because
complex programs often want to do more intelligent
error recovery than these code fragments suggest.
However, _most_ of the programs that use these functions
_don't_ need fancy error recovery, and the error
responses indicated are _better_ than the ones those
programs usually have now!
|- Speaking of encouraging standardization: we use the
names efopen and emalloc for the checked versions of
fopen and malloc, and arguments and returned values are
the same as the unchecked versions except that the
returned value is guaranteed non-NULL if the function
returns at all.
* As you might guess, my system has all of these.  Most
of them are trivial to write, or are available in
public-domain forms.

    o decide whether a filename is well-formed (contains no
      control characters, shell metacharacters, or white
      space, and is within any name-length limits your
      system sets)?

    o close all file descriptors except the standard ones?

    o compute a standard CRC (Cyclic Redundancy Check
      ``checksum'')?

    o operate on malloced unlimited-length strings?

    o do what access(2) does but using the effective
      userid?

    o expand metacharacters in a filename the same way the
      shell does?  (The simplest way to make sure that the
      two agree is to use popen and echo for anything
      complicated.)

    o convert integer baud rates to and from the speed
      codes used by your system's serial-line ioctls?

    o convert integer file modes to and from the rwx
      strings used|- to present such modes to humans?

    o do a binary search through a file the way look(1)
      does?
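
To show how small most of these are, here is a sketch of the
file-mode conversion from the list above.  The names modestr and
rwxmode and their interfaces are hypothetical, and only the nine
ordinary permission bits are handled (no setuid, setgid, or sticky
bit).

```c
/*
 * modestr - convert the nine permission bits of a file mode to the
 * rwx form.  buf must have room for 10 chars.  Returns buf.
 */
char *
modestr(unsigned int mode, char *buf)
{
	static const char letters[] = "rwxrwxrwx";
	int i;

	/* 0400 is owner-read; each shift moves one bit to the right */
	for (i = 0; i < 9; i++)
		buf[i] = (mode & (0400u >> i)) ? letters[i] : '-';
	buf[9] = '\0';
	return buf;
}

/*
 * rwxmode - the inverse: convert a nine-character rwx string to a
 * mode.  Returns -1 if the string is not well-formed.
 */
int
rwxmode(const char *s)
{
	static const char letters[] = "rwxrwxrwx";
	int mode = 0;
	int i;

	for (i = 0; i < 9; i++) {
		if (s[i] == letters[i])
			mode |= 0400 >> i;
		else if (s[i] != '-')
			return -1;	/* wrong letter, or string too short */
	}
	return (s[9] == '\0') ? mode : -1;
}
```

With these in a library, a program asking for confirmation can print
rw-r--r-- rather than 644 for the cost of one function call.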

The above are fairly trivial examples of the sort of  things
that _ought_ to be in UNIX libraries.  More sophisticated
libraries can also be useful,  especially  if  the  language
provides  better  support  for  them  than C does; C++ is an
example [7].  Even in C, though,  there  is  much  room  for
improvement.

Adding library functions does have its disadvantages.  The
interface to a library function is important, and getting it
right is hard.  Worse, once users have started using one
version of an interface, changing it is very difficult even
when hindsight clearly shows mistakes; the near-useless
return values of some of the common UNIX library functions
are obvious examples.  Satisfactory handling of error
conditions can be difficult.  (For example, the error-checking
malloc mentioned earlier is very handy for programmers, but
invoking it from a library function would be a serious
mistake, removing any possibility of more intelligent response
to that error.)  And there is the perennial headache of
trying to get others to adopt your pet function, so that
programs using it can be portable without having to drag the
source of the function around too.  For all this, though,
libraries are in many ways the most satisfactory way of
encouraging code theft.

_________________________
|- If you think only ls uses these, consider that rm and
some similar programs _ought_ to use rwx strings, not
octal modes, when requesting confirmation!

Alas, encouraging code theft does not  guarantee  it.   Even
widely-available library functions often are not used nearly
as much as they should be.  A conspicuous example is getopt,
for command-line argument parsing.  Getopt supplies only
quite modest help in parsing the command line, but the
standardization and consistency that its use produces are still
quite valuable; there are far too many pointless variations
in command syntax in the hand-cooked argument parsers in
most UNIX programs.  Public-domain implementations of getopt
have  been  available  for years, and AT&T has published (!)
the source for the System V implementation.  Yet people con-
tinue  to  write  their  own argument parsers.  There is one
valid reason for this, to be discussed in the next  section.
There are also a number of excuses, mostly the standard ones
for not using library functions:

    o ``It doesn't do quite what I want.''  _But often it is
      close enough to serve, and the combined benefits of
      code theft and standardization outweigh the minor
      mismatches._

    o ``Calling a library function is too inefficient.''
      _This is mostly heard from people who have never
      profiled their programs and hence have no reliable
      information about what their code's efficiency
      problems are [2]._

    o ``I didn't know about it.''  _Competent programmers
      know the contents of their toolboxes._

    o ``That whole concept is ugly, and should be
      redesigned.''  (Often said of getopt, since the usual
      UNIX single-letter-option syntax that getopt
      implements is widely criticized as user-hostile.)  _How
      likely is it that the rest of the world will go along
      with your redesign (assuming you ever finish it)?
      Consistency and a high-quality implementation are
      valuable even if the standard being implemented is
      suboptimal._

    o ``I would have done it differently.''  _The triumph of
      personal taste over professional programming._

Theft via Templates

_Templates_ are a major and much-neglected approach to code
sharing:    ``boilerplate''   programs   which   contain   a
carefully-written skeleton for some  moderately  stereotyped
task,  which  can  then  be adapted and filled in as needed.
This method has some of the vices of modifying existing pro-
grams,  but  the  template  can be designed for the purpose,
with attention to quality and versatility.





Templates can be particularly useful when library  functions
are  used  in a stereotyped way that is a little complicated
to write from scratch; getopt is an excellent example.  The
one really valid objection to getopt is that its invocation
is not trivial, and typing in the correct sequence from
scratch is a real test of memory.  The usual getopt manual
page contains a lengthy example which is essentially a
template for a getopt-using program.

When the first public-domain getopt appeared, it quickly
became clear that it would be convenient to have a template
for its use handy.  This template eventually grew to
incorporate a number of other things: a useful macro or two,
definition of main, opening of files in the standard UNIX
filter  fashion, checking for mistakes like opening a direc-
tory, filename and line-number tracking for error  messages,
and  some  odds  and  ends.  The full current version can be
found in the Appendix; actually it diverged  into  two  dis-
tinct versions when it became clear that some filters wanted
the illusion of a single input stream, while  others  wanted
to handle each input file individually (or didn't care).

The obvious objection to this line of development is  ``it's
more complicated than I need''.  In fact, it turns out to be
surprisingly convenient to have all this machinery
presupplied.  _It is much easier to alter or delete lines of
code than to add them._  If directories are legitimate input, just
delete  the  code  that  catches  them.  If no filenames are
allowed as input, or exactly one must be present, change one
line  of  code  to enforce the restriction and a few more to
deal with the arguments correctly.  If the arguments are not
filenames  at  all, just delete the bits of code that assume
they are.  And so forth.

The job  of  writing  an  ordinary  filter-like  program  is
reduced  to filling in two or three blanks* in the template,
and then writing the code that actually processes the  data.
Even  quick  improvisations  become  good-quality  programs,
doing things the standard way with all the proper amenities,
because even a quick improvisation is easier to do by
starting from the template.  _Templates are an unmixed
blessing; anyone who types a non-trivial program in from
scratch is wasting his time and his employer's money._

Templates are also useful for other stereotyped files, even
ones that are not usually thought of as programs.  Most
versions of UNIX have a simple template for manual pages hiding
somewhere (in V7 it was /usr/man/man0/xx).  Shell files that
want to analyze complex argument lists have the same getopt
problem as C programs, with the same solution.  There is
enough machinery in a ``production-grade'' make file to make
a template worthwhile, although this one tends to get
altered fairly heavily; our current one is in the Appendix.

_________________________
* All marked with the string `xxx' to make them easy
for a text editor to find.

Theft via Inclusion

Source inclusion (#include) provides a way of sharing both
data structures and executable code.  Header files (e.g.
stdio.h) in particular tend to be taken for granted.  Again,
those  who  haven't  been  around long enough to remember V6
UNIX may have trouble grasping what a revolution it was when
V7 introduced systematic use of header files!

However, even mundane header files could be rather more use-
ful  than  they normally are now.  Data structures in header
files are widely accepted, but there is somewhat less use of
them  to  declare the return types of functions.  One or two
common header files like stdio.h and math.h do this, but
programmers are still used to the idea that the type of
(e.g.) atol has to be typed in by hand.  Actually, all too
often the programmer says ``oh well, on my machine it works
out all right if I don't bother declaring atol'', and the
result is dirty and unportable code.  The X3J11 draft ANSI
standard for C addresses this by defining some more header
files and requiring their use for portable programs, so that
the header files can do all the work and do it _right_.
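
A small sketch of the difference, assuming a compiler that follows
the ANSI standard (where stdlib.h is the header that declares atol);
the function name parse_count is hypothetical:

```c
#include <stdlib.h>	/* supplies: long atol(const char *); */

/*
 * parse_count - convert a decimal argument to a long.
 * Because the header declares atol, the compiler knows it returns
 * long.  Without any declaration, an old compiler would assume int,
 * which happens to work only where int and long are the same size.
 */
long
parse_count(const char *arg)
{
	return atol(arg);
}
```

Nothing about atol is typed in by hand here; the header does the
work.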

In principle, source inclusion can be  used  for  more  than
just header files.  In practice, almost anything that can be
done with source inclusion can be  done,  and  usually  done
more  cleanly,  with  header files and libraries.  There are
occasional  specialized  exceptions,  such  as  using  macro
definitions  and source inclusion to fake parameterized data
types.
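
One way to fake a parameterized data type is sketched here.  To stay
in a single file it uses only a macro; in the source-inclusion
version the same text would live in a file of its own, #included once
per type after a few #defines set the parameters.  DEFINE_STACK and
every name it generates are hypothetical, and the ## operator assumes
an ANSI preprocessor.

```c
#include <stddef.h>

/*
 * DEFINE_STACK(NAME, TYPE, SIZE) generates a fixed-size stack type
 * called NAME holding up to SIZE items of TYPE, plus NAME_push and
 * NAME_pop.  Both functions return 0 on success, -1 on a full or
 * empty stack respectively.
 */
#define DEFINE_STACK(NAME, TYPE, SIZE)				\
	typedef struct {					\
		TYPE item[SIZE];				\
		size_t n;					\
	} NAME;							\
	static int NAME##_push(NAME *s, TYPE v)			\
	{							\
		if (s->n >= (SIZE))				\
			return -1;				\
		s->item[s->n++] = v;				\
		return 0;					\
	}							\
	static int NAME##_pop(NAME *s, TYPE *v)			\
	{							\
		if (s->n == 0)					\
			return -1;				\
		*v = s->item[--s->n];				\
		return 0;					\
	}

/* Two instantiations of the same code for different element types. */
DEFINE_STACK(intstack, int, 4)
DEFINE_STACK(strstack, const char *, 4)
```

The stack logic is written once; a fix to the macro reaches every
type generated from it.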

Theft via Invocation

Finally, it is often possible  to  steal  another  program's
code simply by invoking that program.  Invoking other
programs via system or popen for things that are easily done in
C is a common beginner's error.  More experienced
programmers can go too far the other way, however, insisting on
doing  everything  in  C,  even  when  a  leavening of other
methods would give better results.  The best way to  sort  a
large file is probably to invoke sort(1), not to do it
yourself.  Even invoking a shell file can be useful, although a
bit  odd-seeming  to most C programmers, when elaborate file
manipulation is needed and efficiency is not critical.
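
As a sketch of the sort(1) case, the hypothetical helper below hands
the sorting to sort(1) through popen and lets the caller read the
sorted lines back.  It assumes a POSIX system, and because the
filename is pasted into a shell command it refuses any name
containing characters outside a conservative set.

```c
#define _POSIX_C_SOURCE 200809L	/* for popen/pclose under strict ANSI modes */
#include <stdio.h>
#include <string.h>

/*
 * open_sorted - return a stream yielding the lines of the named file
 * in sorted order, or NULL on failure.  The sorting is done by
 * sort(1).  Close the stream with pclose(), not fclose().
 */
FILE *
open_sorted(const char *filename)
{
	static const char safe[] =
	    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
	    "0123456789._/-";
	char cmd[1024];

	/* refuse empty names, option look-alikes, and shell metacharacters */
	if (filename[0] == '\0' || filename[0] == '-' ||
	    strspn(filename, safe) != strlen(filename))
		return NULL;
	if (snprintf(cmd, sizeof cmd, "sort %s", filename) >= (int)sizeof cmd)
		return NULL;		/* name too long for the buffer */
	return popen(cmd, "r");
}
```

The caller reads from the stream with fgets as from any other file;
large inputs, temporary files, and merging are all sort's problem.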

Aside from invoking other programs at run time, it can also
be useful to invoke them at compile time.  Particularly when
dealing with large tables, it is often better to dynamically
generate the C code from some more compact and readable
notation.  Yacc and lex are familiar examples of this on a
large scale, but simple sed and awk programs can build
tables in more specialized, application-specific ways.
Whether this is really theft is debatable, but it's a
valuable technique all the same.  It can neatly bypass a lot of
objections that start with ``but C won't let me write...''.
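
The same idea can be sketched with a small C program standing in for
the sed or awk script: a generator, run at build time, that writes a
table as C source for the real program to #include.  The example
table is the one used by the common reflected CRC-32 (polynomial
0xEDB88320); the function names and the output layout are assumptions
made for illustration.

```c
#include <stdio.h>

/* crc_entry - compute entry n (0..255) of the CRC-32 lookup table */
unsigned long
crc_entry(unsigned int n)
{
	unsigned long c = n;
	int k;

	/* eight shift-and-conditionally-xor steps, one per bit of n */
	for (k = 0; k < 8; k++)
		c = (c & 1) ? (0xEDB88320UL ^ (c >> 1)) : (c >> 1);
	return c;
}

/* emit_table - write the whole table to out as a C array definition */
void
emit_table(FILE *out)
{
	unsigned int n;

	fprintf(out, "/* generated file, do not edit */\n");
	fprintf(out, "static const unsigned long crc_table[256] = {\n");
	for (n = 0; n < 256; n++)
		fprintf(out, "\t0x%08lxUL,%s", crc_entry(n),
		    (n % 4 == 3) ? "\n" : "");
	fprintf(out, "};\n");
}
```

A make rule would compile the generator, run it with emit_table(stdout)
redirected into a header, and let the real program include the
result; nobody types in 256 hexadecimal constants by hand.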

An Excess of Invention

With all these varied methods, why is code  theft  not  more
widespread?  Why are so many programs unnecessarily invented
from scratch?

The most obvious answer is the  hardest  to  counter:  theft
requires  that  there be something to steal.  Use of library
functions is impossible unless somebody sets up  a  library.
Designing  the interfaces for library functions is not easy.
Worse, doing it _well_ requires insight, which generally isn't
available  on demand.  The same is true, to varying degrees,
for the other forms of theft.

Despite its reputation as a hotbed of software re-use,  UNIX
is  actually  hostile  to some of these activities.  If UNIX
directories had been complex and obscure,  directory-reading
libraries would have been present from the beginning.  As it
is, it was simply _too easy_ to do things ``the hard way''.
There _still_ is no portable set of functions to perform the
dozen or so useful manipulations of terminal  modes  that  a
user  program  might  want  to  do, a major nuisance because
changing those modes ``in the raw''  is  simple  but  highly
unportable.

Finally, there is the Not Invented Here  syndrome,  and  its
relatives,  Not  Good  Enough  and Not Understood Here.  How
else to explain AT&T UNIX's persistent lack of the dbm
library for hashed databases (even though it was developed
at Bell Labs and hence is available to AT&T), and Berkeley
UNIX's persistent lack of the full set of strxxx functions
(even though a public-domain implementation has existed for
years)?  The X3J11 and POSIX efforts are making some
progress at developing a common nucleus of functionality, but
they  are aiming at a common subset of current systems, when
what is really wanted is a common superset.

Conclusion

In short, never build what you can  (legally)  steal!   Done
right, it yields better programs for less work.

References


[1] Brian W. Kernighan, The Unix System and Software
   Reusability, IEEE Transactions on Software Engineering, Vol
   SE-10, No. 5, Sept. 1984, pp. 513-8.

[2] Geoff Collyer and Henry Spencer, News Need Not Be Slow,
   Usenix Winter 1987 Technical Conference, pp. 181-190.

[3] Brian W. Kernighan and P.J. Plauger, Software Tools,
   Addison-Wesley, Reading, Mass. 1976.

[4] Mike O'Dell, UNIX: The World View, Usenix Winter 1987
   Technical Conference, pp. 35-45.

[5] IEEE, IEEE Trial-Use Standard 1003.1 (April 1986):
   Portable Operating System for Computer Environments, IEEE
   and Wiley-Interscience, New York, 1986.

[6] Ian Darwin and Geoff Collyer, Can't Happen or /*
   NOTREACHED */ or Real Programs Dump Core, Usenix Winter
   1985 Technical Conference, pp. 136-151.

[7] Bjarne Stroustrup, The C++ Programming Language,
   Addison-Wesley, Reading, Mass. 1986.

Appendix

Warning:  these templates  have  been  in  use  for  varying
lengths  of  time, and are not necessarily all entirely bug-
free.

C program, single stream of input

/*
 * name - purpose xxx
 *
 * $Log$
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

#define MAXSTR  500             /* For sizing strings -- DON'T use BUFSIZ! */
#define STREQ(a, b)     (*(a) == *(b) && strcmp((a), (b)) == 0)

#ifndef lint
static char RCSid[] = "$Header$";
#endif

int debug = 0;
char *progname;

char **argvp;                           /* scan pointer for nextfile() */
char *nullargv[] = { "-", NULL };       /* dummy argv for case of no args */
char *inname;                           /* filename for messages etc. */
long lineno;                            /* line number for messages etc. */
FILE *in = NULL;                        /* current input file */

extern void error(), exit();
#ifdef UTZOOERR
extern char *mkprogname();
#else
#define mkprogname(a)   (a)
#endif

char *nextfile();
void fail();

/*
 - main - parse arguments and handle options
 */
main(argc, argv)
int argc;
char *argv[];
{
        int c;
        int errflg = 0;
        extern int optind;
        extern char *optarg;
        void process();

        progname = mkprogname(argv[0]);





98        while ((c = getopt(argc, argv, "xxxd")) != EOF)
                switch (c) {
                case 'xxx':     /* xxx meaning of option */
                        xxx
                        break;
                case 'd':       /* Debugging. */
                        debug++;
                        break;
                case '?':
                default:
                        errflg++;
                        break;
                }
        if (errflg) {
                fprintf(stderr, "usage: %s ", progname);
                fprintf(stderr, "xxx [file] ...\n");
                exit(2);
        }

        if (optind >= argc)
                argvp = nullargv;
        else
                argvp = &argv[optind];
        inname = nextfile();
        if (inname != NULL)
                process();
        exit(0);
}

/*
 - getline - get next line (internal version of fgets)
 */
char *
getline(ptr, size)
char *ptr;
int size;
{
        register char *namep;

        while (fgets(ptr, size, in) == NULL) {
                namep = nextfile();
                if (namep == NULL)
                        return(NULL);
                inname = namep;         /* only after we know it's good */
        }
        lineno++;
        return(ptr);
}

/*
 - nextfile - switch files
 */
char *                          /* filename */
nextfile()
{
        register char *namep;
        struct stat statbuf;
        extern FILE *efopen();

        if (in != NULL)
                (void) fclose(in);

        namep = *argvp;
        if (namep == NULL)      /* no more files */
                return(NULL);
        argvp++;

        if (STREQ(namep, "-")) {
                in = stdin;
                namep = "stdin";
        } else {
                in = efopen(namep, "r");
                if (fstat(fileno(in), &statbuf) < 0)
                        error("can't fstat `%s'", namep);
                if ((statbuf.st_mode & S_IFMT) == S_IFDIR)
                        error("`%s' is directory!", namep);
        }

        lineno = 0;
        return(namep);
}

/*
 - fail - complain and die
 */
void
fail(s1, s2)
char *s1;
char *s2;
{
        fprintf(stderr, "%s: (file `%s', line %ld) ", progname, inname, lineno);
        fprintf(stderr, s1, s2);
        fprintf(stderr, "\n");
        exit(1);
}

/*
 - process - process input data
 */
void
process()
{
        char line[MAXSTR];

        while (getline(line, (int)sizeof(line)) != NULL) {
                xxx
        }
}
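
The template above is written in pre-ANSI C, and on current POSIX systems its getline can collide with the getline() that <stdio.h> now declares.  What follows is a minimal sketch, not part of the original template, of the same merged-input reader written with prototypes.  The name nextline, the start_input helper, and the plain fopen/perror/exit standing in for efopen are assumptions made for illustration; the directory check is left out for brevity, and stdin is deliberately not closed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **argvp;               /* remaining filenames, NULL-terminated */
FILE *in = NULL;            /* current input stream, NULL if none open */
const char *inname;         /* filename for messages etc. */
long lineno;                /* line number for messages etc. */

/* start_input - hypothetical helper: aim the reader at a list of names */
void
start_input(char **files)
{
    argvp = files;
    in = NULL;
    inname = NULL;
    lineno = 0;
}

/* nextfile - switch files; returns the new name, or NULL when none remain */
const char *
nextfile(void)
{
    const char *namep;

    if (in != NULL && in != stdin)  /* unlike the template, leave stdin open */
        (void) fclose(in);
    in = NULL;

    namep = *argvp;
    if (namep == NULL)      /* no more files */
        return NULL;
    argvp++;

    if (strcmp(namep, "-") == 0) {
        in = stdin;
        namep = "stdin";
    } else {
        in = fopen(namep, "r");
        if (in == NULL) {   /* stand-in for efopen: complain and die */
            perror(namep);
            exit(1);
        }
    }
    lineno = 0;
    return namep;
}

/* nextline - get next line, moving on to the next file at end of file */
char *
nextline(char *ptr, int size)
{
    const char *namep;

    while (in == NULL || fgets(ptr, size, in) == NULL) {
        namep = nextfile();
        if (namep == NULL)
            return NULL;
        inname = namep;     /* only after we know it's good */
    }
    lineno++;
    return ptr;
}
```

In a filter built this way, main would call start_input(&argv[optind]) once option parsing is done, and process would simply loop on nextline until it returns NULL.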



_C _p_r_o_g_r_a_m, _s_e_p_a_r_a_t_e _i_n_p_u_t _f_i_l_e_s

/*
 * name - purpose xxx
 *
 * $Log$
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

#define MAXSTR  500             /* For sizing strings -- DON'T use BUFSIZ! */
#define STREQ(a, b)     (*(a) == *(b) && strcmp((a), (b)) == 0)

#ifndef lint
static char RCSid[] = "$Header$";
#endif

int debug = 0;
char *progname;

char *inname;                           /* filename for messages etc. */
long lineno;                            /* line number for messages etc. */

extern void error(), exit();
#ifdef UTZOOERR
extern char *mkprogname();
#else
#define mkprogname(a)   (a)
#endif
void fail();

/*
 - main - parse arguments and handle options
 */
main(argc, argv)
int argc;
char *argv[];
{
        int c;
        int errflg = 0;
        FILE *in;
        struct stat statbuf;
        extern int optind;
        extern char *optarg;
        extern FILE *efopen();
        void process();

        progname = mkprogname(argv[0]);

        while ((c = getopt(argc, argv, "xxxd")) != EOF)
                switch (c) {
                case 'xxx':     /* xxx meaning of option */
                        xxx
                        break;
                case 'd':       /* Debugging. */
                        debug++;
                        break;
                case '?':
                default:
                        errflg++;
                        break;
                }
        if (errflg) {
                fprintf(stderr, "usage: %s ", progname);
                fprintf(stderr, "xxx [file] ...\n");
                exit(2);
        }

        if (optind >= argc)
                process(stdin, "stdin");
        else
                for (; optind < argc; optind++)
                        if (STREQ(argv[optind], "-"))
                                process(stdin, "-");
                        else {
                                in = efopen(argv[optind], "r");
                                if (fstat(fileno(in), &statbuf) < 0)
                                        error("can't fstat `%s'", argv[optind]);
                                if ((statbuf.st_mode & S_IFMT) == S_IFDIR)
                                        error("`%s' is directory!", argv[optind]);
                                process(in, argv[optind]);
                                (void) fclose(in);
                        }
        exit(0);
}

/*
 - process - process input file
 */
void
process(in, name)
FILE *in;
char *name;
{
        char line[MAXSTR];

        inname = name;
        lineno = 0;

        while (fgets(line, sizeof(line), in) != NULL) {
                lineno++;
                xxx
        }
}



/*
 - fail - complain and die
 */
void
fail(s1, s2)
char *s1;
char *s2;
{
        fprintf(stderr, "%s: (file `%s', line %ld) ", progname, inname, lineno);
        fprintf(stderr, s1, s2);
        fprintf(stderr, "\n");
        exit(1);
}
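
The template's fail() hands s1 to fprintf as a format with exactly one string argument.  On a compiler with <stdarg.h> a variadic version is a small step; the following is a sketch under that assumption, not the original code.  vfailure_message and failure_message are hypothetical names, introduced here so that the formatting can be exercised without exiting, and the 512-byte buffer is an arbitrary choice.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

const char *progname = "xxx";   /* set from argv[0] in a real program */
const char *inname = "stdin";   /* filename for messages etc. */
long lineno = 0;                /* line number for messages etc. */

/* vfailure_message - format "prog: (file `name', line n) text" into buf */
int
vfailure_message(char *buf, size_t size, const char *fmt, va_list ap)
{
    int n, m;

    n = snprintf(buf, size, "%s: (file `%s', line %ld) ",
                 progname, inname, lineno);
    if (n < 0 || (size_t)n >= size)
        return n;               /* error, or the prefix alone filled buf */
    m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    return (m < 0) ? m : n + m;
}

/* failure_message - variadic front end to vfailure_message */
int
failure_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfailure_message(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* fail - complain and die */
void
fail(const char *fmt, ...)
{
    char buf[512];
    va_list ap;

    va_start(ap, fmt);
    (void) vfailure_message(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", buf);
    exit(1);
}
```

A caller can then write fail("bad count `%s' (limit %d)", word, limit) instead of being held to a single string argument.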

_M_a_k_e _f_i_l_e

# Things you might want to put in ENV and LENV:
# -Dvoid=int            compiler lacks void
# -DCHARBITS=0377       compiler lacks unsigned char
# -DSTATIC=extern       compiler dislikes "static foo();" as forward decl.
# -DREGISTER=           machines with few registers for register variables
# -DUTZOOERR            have utzoo-compatible error() function and friends
ENV = -DSTATIC=extern -DREGISTER= -DUTZOOERR
LENV = -Dvoid=int -DCHARBITS=0377 -DREGISTER= -DUTZOOERR

# Things you might want to put in TEST:
# -DDEBUG               debugging hooks
# -I.                   header files in current directory
TEST = -DDEBUG

# Things you might want to put in PROF:
# -Dstatic='/* */'      make everything global so profiler can see it.
# -p                    profiler
PROF =

CFLAGS = -O $(ENV) $(TEST) $(PROF)
LINTFLAGS = $(LENV) $(TEST) -ha
LDFLAGS = -i

OBJ = xxx
LSRC = xxx
DTR = README dMakefile tests tests.good xxx.c

xxx:    xxx.o
        $(CC) $(CFLAGS) $(LDFLAGS) xxx.o -o xxx

xxx.o:  xxx.h

lint:   $(LSRC)
        lint $(LINTFLAGS) $(LSRC) | tee lint

r:      xxx tests tests.good    # Regression test.
        xxx <tests >tests.new
        diff -h tests.new tests.good && rm tests.new

# Prepare good output for regression test -- name isn't "tests.good"
# because human judgement is needed to decide when output is good.
good:   xxx tests
        xxx <tests >tests.good

dtr:    r $(DTR)
        makedtr $(DTR) >dtr

dMakefile:      Makefile
        sed '/^L*ENV *=/s/ *-DUTZOOERR//' Makefile >dMakefile

clean:
        rm -f *.o lint tests.new dMakefile dtr core mon.out xxx

















