         Indian Hill C Style and Coding Standards
           as amended for U of T Zoology UNIX|-


                       L.W. Cannon
                       R.A. Elliott
                      L.W. Kirchhoff
                       J.H. Miller
                       J.M. Milner
                        R.W. Mitze
                        E.P. Schan
                     N.O. Whittington

                        Bell Labs


                      Henry Spencer

                 Zoology Computer Systems
                  University of Toronto



                         ABSTRACT

         This document is an annotated  (by  the  last
    author)  version of the original paper of the same
    title.  It describes a set of coding standards and
    recommendations  which  are  local  standards  for
    officially-supported UNIX programs.  The scope  is
    coding style, not functional organization.



April 18, 1990










_________________________
|- UNIX is a trademark of Bell Laboratories.






















1.  Introduction

    This document is a result  of  a  committee  formed  at
Indian  Hill  to  establish a common set of coding standards
and recommendations for  the  Indian  Hill  community.   The
scope  of  this work is the coding style, not the functional
organization of programs.  The standards  in  this  document
are not specific to ESS programming only1.  We have tried to
combine previous work [1,6] on C style into a uniform set of
standards that should be appropriate for any  project  using
C2.

_________________________

1.   In  fact,  they're pretty good general standards.  ``To
    be  clear  is  professional;  not  to   be   clear   is
    unprofessional.''  -  Sir Ernest Gowers.  This document
    is  presented  unadulterated;   U  of   T   variations,
    comments, exceptions, etc. are presented in footnotes.

2.   Of   necessity,   these   standards  cannot  cover  all
    situations.  Experience and  informed  judgement  count
    for  much.   Inexperienced  programmers  who  encounter
    unusual situations should consult 1)  code  written  by
    experienced  C programmers following these rules, or 2)
    experienced C programmers.






2.  File Organization

    A file consists of  various  sections  that  should  be
separated by several blank lines.  Although there is no max-
imum length requirement for source files,  files  with  more
than about 1500 lines are cumbersome to deal with.  The edi-
tor may not have enough temp space to edit the file,  compi-
lations  will go slower, etc.  Since most of us use 300 baud
terminals, entire rows of asterisks, for example, should  be
discouraged3.  Also lines longer than  80  columns  are  not
handled  well by all terminals and should be avoided if pos-
sible4.

    The suggested order of sections for a file is  as  fol-
lows:

1.   Any header file includes should be the first  thing  in
    the file.

2.   Immediately after the includes5 should  be  a  prologue
    that tells what is in that file.  A description of  the
    purpose  of  the  objects in the files (whether they be
    functions, external data declarations  or  definitions,
    or  something  else)  is more useful than a list of the
    object names.

3.   Any typedefs and defines that apply to the  file  as  a
    whole are next.

4.   Next come the global (external) data declarations.   If
    a  set of defines applies to a particular piece of glo-
    bal data (such as a flags word), the defines should  be
    immediately after the data declaration6.

5.   The functions come last7.
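
The suggested order can be sketched as a small source file.  This
is only an illustration and not part of the original standard:
the file name, the names MAXBOATS, nboats and addboat, and the
use of an ANSI-style parameter list (the original predates it)
are all assumptions made for the example.

```c
#include <stdio.h>              /* 1. header file includes come first */

/*
 *      boats.c (hypothetical file name)
 *
 *      2. Prologue: keeps a count of the boats entered in a race.
 */

#define MAXBOATS        10      /* 3. defines that apply to the whole file */

static int nboats = 0;          /* 4. global data: boats entered so far */

/*
 *      addboat()
 *
 *      5. Functions come last.  Enter one more boat if there is room.
 */

int                             /* new count, or -1 if the race is full */
addboat(void)
{
        if (nboats >= MAXBOATS)
                return(-1);
        nboats++;
        return(nboats);
}
```

The first call to addboat() returns 1 and the second returns 2;
once MAXBOATS boats have been entered it returns -1.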
_________________________

3.   This is not a problem at U of T, or most other sensible
    places, but rows of asterisks are still annoying.

4.   Excessively long lines which result from deep indenting
    are often a symptom of poorly-organized code.

5.   A  common  variation, in both Bell code and ours, is to
    reverse the order of sections 1  and  2.   This  is  an
    acceptable practice.

6.   Such  defines should be indented to put the defines one
    level deeper than the first keyword of the  declaration
    to which they apply.

7.   They should be in some sort of meaningful order.   Top-
    down   is   generally  better  than  bottom-up,  and  a
    ``breadth-first''  approach  (functions  on  a  similar





2.1.  File Naming Conventions

    UNIX requires certain suffix conventions for  names  of
files to be processed by the cc command [5]8.  The following
suffixes are required:

o    C source file names must end in .c

o    Assembler source file names must end in .s

    In addition the following conventions  are  universally
followed:

o    Relocatable object file names end in .o

o    Include header file names end in .h 9 or .d

o    Ldp10 specification file names end in .b

o    Yacc source file names end in .y

o    Lex source file names end in .l

3.  Header Files

    Header files are files that are included in other files
prior  to  compilation  by  the  C  preprocessor.   Some are
defined at the system  level  like  stdio.h  which  must  be
included  by  any  program  using  the standard I/O library.
Header files are also used to contain data declarations  and
defines that are needed by more than one program11.   Header
_________________________
    level   of  abstraction  together)  is  preferred  over
    depth-first (functions  defined  as  soon  as  possible
    after  their  calls).  Considerable judgement is called
    for here.  If defining large  numbers  of  essentially-
    independent  utility  functions,  consider alphabetical
    order.

8.   In addition to the suffix conventions given here, it is
    conventional to use `Makefile' (not `makefile') for the
    control file for make and `README' for a summary of the
    contents of a directory or directory tree.

9.   Preferred.   An  alternate  convention  that   may   be
    preferable in multi-language environments is to use the
    same suffix as an ordinary source  file  but  with  two
    periods instead of one (e.g. ``foo..c'').

10.  No idea what this is.

11.  Don't use absolute pathnames for header files.  Use the
    <name> construction for getting them  from  a  standard
    place,   or   define   them  relative  to  the  current





files should be functionally organized,  i.e.,  declarations
for  separate subsystems should be in separate header files.
Also, if a set of declarations is likely to change when code
is  ported  from  one machine to another, those declarations
should be in a separate header file.

    Header files should not be nested.  Some  objects  like
typedefs  and  initialized  data  definitions cannot be seen
twice by the compiler in one compilation.  On non-UNIX  sys-
tems this is also true of uninitialized declarations without
the extern keyword12.  This can happen if include files  are
nested and will cause the compilation to fail.

4.  External Declarations

    External declarations should begin in column  1.   Each
declaration  should  be  on  a  separate  line.   A  comment
describing the role of the object being declared  should  be
included, with the exception that a list of defined constants
does not need comments if the constant names are sufficient
documentation.  The comments should be tabbed so that
they line up underneath each other13.  Use the tab character
(CTRL I if your terminal doesn't have a separate key) rather
than blanks.  For structure and union template declarations,
each  element  should  be  alone  on  a  line with a comment
describing it.  The opening brace ( { )  should  be  on  the
same line as the structure tag, and the closing brace should
be alone on a line in column 1, i.e.

struct boat {
       int wllength;   /* water line length in feet */
       int type;       /* see below */
       long sarea;     /* sail area in square feet */
};
/*
 * defines for boat.type14
 */
#define KETCH   1
#define YAWL    2
#define SLOOP   3
#define SQRIG   4
#define MOTOR   5
_________________________
    directory.  The -I option of the C compiler is the best
    way  to  handle  extensive  private libraries of header
    files;  it permits reorganizing the directory structure
    without having to alter source files.

12.  It should be noted that declaring variables in a header
    file  is often a poor idea.  Frequently it is a symptom
    of poor partitioning of code between files.

13.  So should the constant names and their defined values.






If an external variable  is  initialized15  the  equal  sign
should not be omitted16.

       int x = 1;
       char *msg = "message";
       struct boat winner = {
               40,     /* water line length */
               YAWL,
               600     /* sail area */
       };

17
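
The advice in footnotes 15 and 16 can be illustrated by a short
sketch.  It repeats the boat declarations so that it stands
alone.  The variable total_sarea is hypothetical, and writing
the long constants with an L suffix is one reading of
"explicitly long", not something the original spells out.

```c
struct boat {
        int wllength;           /* water line length in feet */
        int type;               /* see defines below */
        long sarea;             /* sail area in square feet */
};
#define YAWL    2

struct boat winner = {
        40,                     /* water line length */
        YAWL,
        600L                    /* sail area, explicitly long */
};

long total_sarea = 0L;          /* hypothetical; initial value matters */
```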

5.  Comments

    Comments that  describe  data  structures,  algorithms,
etc., should be in block comment form with the opening /* in
column one, a * in column 2  before  each  line  of  comment
text18, and the closing */ in columns 2-3.

_________________________

14.  These  defines  are  better   put   right   after   the
    declaration  of  type,  within  the struct declaration,
    with enough tabs after # to  indent  define  one  level
    more than the structure member declarations.

15.  Any variable whose initial value is important should be
    explicitly initialized, or at the very least should  be
    commented  to  indicate that C's default initialization
    to 0 is being relied on.

16.  The empty initializer, ``{}'', should  never  be  used.
    Structure initializations should be fully parenthesized
    with braces.  Constants used to initialize longs should
    be explicitly long.

17.  In any file which is part of a larger whole rather than
    a self-contained program, maximum use should be made of
    the static keyword  to  make  functions  and  variables
    local  to single files.  Variables in particular should
    be accessible from other files only  when  there  is  a
    clear  need that cannot be filled in another way.  Such
    usages should  be  commented  to  make  it  clear  that
    another  file's  variables  are being used; the comment
    should name the other file.

18.  Some   automated   program-analysis   packages   use  a
    different character in this position as  a  marker  for
    lines   with   specific   items   of  information.   In
    particular, a  line  with  a  `-'  here  in  a  comment
    preceding  a function is sometimes assumed to be a one-
    line summary of the function's purpose.







/*
 *      Here is a block comment.
 *      The comment text should be tabbed over19
 *      and the opening /* and closing star-slash
 *      should be alone on a line.
 */


    Note that grep ^.\* will catch all  block  comments  in
the  file.   In some cases, block comments inside a function
are appropriate, and they should be tabbed over to the  same
tab  setting as the code that they describe.  Short comments
may appear on a single line indented over to the tab setting
of the code that follows.

       if (argc > 1) {
               /* Get input file from command line. */
               if (freopen(argv[1], "r", stdin) == NULL)
                       error("can't open %s\n", argv[1]);
       }


    Very short comments may appear on the same line as  the
code  they describe, but should be tabbed over far enough to
separate them from the statements.  If more than  one  short
comment appears in a block of code they should all be tabbed
to the same tab setting.

       if (a == 2)
               return(TRUE);           /* special case */
       else
               return(isprime(a));     /* works only for odd a */


6.  Function Declarations

    Each function should be preceded  by  a  block  comment
prologue that gives the name and a short description of what
the function does20.  If the function returns a  value,  the
type  of  the  value  returned  should be alone on a line in
column 1 (do not default to int).  If the function does  not
return  a  value  then it should not be given a return type.
_________________________

19.  A common practice in both Bell and local code is to use
    a  space  rather  than  a  tab  after  the  *.  This is
    acceptable.

20.  Discussion of  non-trivial  design  decisions  is  also
    appropriate,  but avoid duplicating information that is
    present in (and clear from) the code.   It's  too  easy
    for such redundant information to get out of date.






If the value returned requires a long explanation, it should
be  given  in the prologue;  otherwise it can be on the same
line as the return type, tabbed over.  The function name and
formal  parameters  should  be  alone on a line beginning in
column 1.  Each parameter should be declared (do not default
to int), with a comment on a single line.  The opening brace
of the function body should also be alone on a  line  begin-
ning  in  column 1.  The function name, argument declaration
list, and opening brace  should  be  separated  by  a  blank
line21.  All local declarations and code within the function
body should be tabbed over at least one tab.

    If the function  uses  any  external  variables,  these
should  have  their  own  declarations  in the function body
using the extern keyword.  If the external  variable  is  an
array  the  array  bounds  must  be  repeated  in the extern
declaration.  There should also be extern  declarations  for
all functions called by a given function.  This is particularly
beneficial to someone picking up code written by
another.   If  a function returns a value of type other than
int, it is required by the compiler that such  functions  be
declared before they are used.  Having the extern declaration
in the calling function's declarations section avoids
all such problems22.

    In general each variable declaration  should  be  on  a
separate  line  with a comment describing the role played by
the variable in the function.  If the variable  is  external
or a parameter of type pointer which is changed by the func-
tion, that should be noted in the comment.   All  such  com-
ments for parameters and local variables should be tabbed so
that they line up underneath each other.   The  declarations
should  be  separated  from  the  function's statements by a
blank line.

    A local variable should not  be  redeclared  in  nested
blocks23.  Even  though  this  is  valid  C,  the  potential
_________________________

21.  Neither Bell nor local code  has  ever  included  these
    separating  blank  lines, and it is not clear that they
    add anything useful.  Leave them out.

22.  These  rules  tend  to  produce a lot of clutter.  Both
    Bell  and  local  practice  frequently   omit    extern
    declarations  for static variables and functions.  This
    is permitted.  Omission of  declarations  for  standard
    library  routines is also permissible, although if they
    are declared it is better to declare  them  within  the
    functions that use them rather than globally.

23.  In  fact,  avoid  any  local declarations that override
    declarations at higher levels.






confusion is enough that lint will complain  about  it  when
given the -h option.

6.1.  Examples


/*
 *      skyblue()
 *
 *      Determine if the sky is blue.
 */

int                     /* TRUE or FALSE */
skyblue()

{
       extern int hour;

       if (hour < MORNING || hour > EVENING)
               return(FALSE);  /* black */
       else
               return(TRUE);   /* blue */
}


/*
 *      tail(nodep)
 *
 *      Find the last element in the linked list
 *      pointed to by nodep and return a pointer to it.
 */

NODE *                  /* pointer to tail of list */
tail(nodep)

NODE *nodep;            /* pointer to head of list */

{
       register NODE *np;      /* current pointer advances to NULL */
       register NODE *lp;      /* last pointer follows np */

       np = lp = nodep;
       while ((np = np->next) != NULL)
               lp = np;
       return(lp);
}


7.  Compound Statements

    Compound statements are statements that  contain  lists
of  statements enclosed in braces.  The enclosed list should
be tabbed over one more than the tab position of the compound
statement itself.  The opening left brace should be at
the end of the line beginning the compound statement and the
closing  right brace should be alone on a line, tabbed under
the beginning of the compound statement.  Note that the left
brace  beginning a function body is the only occurrence of a
left brace which is alone on a line.

7.1.  Examples


       if (expr) {
               statement;
               statement;
       }

       if (expr) {
               statement;
               statement;
       } else {
               statement;
               statement;
       }

    Note that the right brace before the else and the right
brace  before  the while of a do-while statement (below) are
the only places where a right brace appears that is not
alone on a line.

       for (i = 0; i < MAX; i++) {
               statement;
               statement;
       }

       while (expr) {
               statement;
               statement;
       }

       do {
               statement;
               statement;
       } while (expr);

       switch (expr) {
       case ABC:
       case DEF:
               statement;
               break;
       case XYZ:
               statement;
               break;
       default:
               statement;
               break;          /* see footnote 24 */
       }





Note that when multiple  case  labels  are  used,  they  are
placed on separate lines.  The fall through feature of the C
switch statement should rarely if ever be used when code  is
executed before falling through to the next one.  If this is
done it must be commented for future maintenance.
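
A commented fall-through might look like the following sketch.
The function nreports and the three level names are hypothetical,
and the wording of the FALLTHROUGH comment is an assumption; the
text above asks only that the fall-through be commented.  An
ANSI-style parameter list is used so that the fragment compiles
with current compilers.

```c
#define QUIET   0       /* report nothing */
#define NORMAL  1       /* report errors only */
#define VERBOSE 2       /* report errors and progress */

/*
 *      nreports(level)
 *
 *      Count the kinds of report produced at a verbosity level.
 */

int                             /* number of kinds, or -1 if unknown */
nreports(int level)
{
        int n = 0;              /* running count */

        switch (level) {
        case VERBOSE:
                n++;            /* progress reports */
                /* FALLTHROUGH: VERBOSE also reports errors */
        case NORMAL:
                n++;            /* error reports */
                break;
        case QUIET:
                break;
        default:
                n = -1;         /* unknown level */
                break;
        }
        return(n);
}
```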

       if (strcmp(reply, "yes") == EQUAL) {
               statements for yes
               ...
       } else if (strcmp(reply, "no") == EQUAL) {
               statements for no
               ...
       } else if (strcmp(reply, "maybe") == EQUAL) {
               statements for maybe
               ...
       } else {
               statements for none of the above
               ...
       }

The last example is a generalized switch statement  and  the
tabbing  reflects  the switch between exactly one of several
alternatives rather than a nesting of statements.

8.  Expressions

8.1.  Operators

    The old versions of equal-ops =+, =-, =*,  etc.  should
not  be  used.   The  preferred use is +=, -=, *=, etc.  All
binary operators except . and -> should  be  separated  from
their operands by blanks25.  In addition, keywords that  are
followed  by  expressions in parentheses should be separated
from the left parenthesis by a blank26.  Blanks should  also
appear  after  commas in argument lists to help separate the
arguments visually.  On the other hand,  macros  with  argu-
ments and function calls should not have a blank between the
name and the left parenthesis.  In particular, the C prepro-
cessor requires the left parenthesis to be immediately after
_________________________

24.  This break is, strictly speaking, unnecessary,  but  it
    is required nonetheless because it prevents a fall-through
    error if another case is added later after  the
    last one.

25.  Some  judgement  is  called  for in the case of complex
    expressions, which may  be  clearer  if  the  ``inner''
    operators   are   not  surrounded  by  spaces  and  the
    ``outer'' ones are.

26.  Sizeof is an exception, see the discussion of  function
    calls.  Less logically, so is return.






the macro name or else the argument list will not be  recog-
nized.   Unary  operators should not be separated from their
single operand.  Since  C  has  some  unexpected  precedence
rules,  all  expressions involving mixed operators should be
fully parenthesized.

    Examples

       a += c + d;
       a = (a + b) / (c * d);
       strp->field = str.fl - ((x & MASK) >> DISP);
       while (*d++ = *s++)
               ;       /* EMPTY BODY */


8.2.  Naming Conventions

    Individual projects will no doubt have their own naming
conventions.  There are some general rules however.

o    An initial underscore should not be used for any
    user-created names27.  UNIX uses it for names that the user
    should   not  have  to  know  (like  the  standard  I/O
    library)28.

o    Macro names, typedef names, and define names should  be
    all in CAPS.

o    Variable names, structure tag names, and function names
    should be  in  lower  case29.   Some  macros  (such  as
    getchar  and  putchar) are in lower case since they may
    also exist as functions.  Care is needed when interchanging
    macros and functions since functions pass
    their parameters by value  whereas  macros  pass  their
    arguments by name substitution30.
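
The difference between passing by value and name substitution
can be seen in a small sketch.  The names SQUARE, BADSQ and
square are hypothetical; BADSQ is deliberately written without
parentheses around its parameter to show the precedence problem,
and is not a recommended form.

```c
#define SQUARE(x)       ((x) * (x))     /* each use of x parenthesized */
#define BADSQ(x)        (x * x)         /* unparenthesized, for contrast */

/*
 *      square(x)
 *
 *      Function form: the argument is evaluated once, by value.
 */

int                             /* x times x */
square(int x)
{
        return(x * x);
}
```

With the argument 1 + 2, both square and SQUARE give 9, while
BADSQ expands to (1 + 2 * 1 + 2) and gives 5.  An argument with
a side effect, such as i++, would be evaluated twice by either
macro but only once by the function.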
_________________________

27.  Trailing underscores should be avoided too.

28.  This convention is reserved for  system  purposes.   If
    you  must have your own private identifiers, begin them
    with a capital letter identifying the package to  which
    they belong.

29.  It  is  best  to  avoid names that differ only in case,
    like foo and  FOO.   The  potential  for  confusion  is
    considerable.

30.  This  difference also means that carefree use of macros
    requires care when they  are  defined.   Remember  that
    complex  expressions  can  be  used  as parameters, and
    operator-precedence  problems  can  arise  unless   all
    occurrences   of  parameters  in  the  definition  have
    parentheses around them.  There is little that  can  be





8.3.  Constants

    Numerical constants should  not  be  coded  directly31.
The  define  feature of the C preprocessor should be used to
assign a meaningful name.  This will also make it easier  to
administer  large  programs  since the constant value can be
changed uniformly by changing only the define.  The enumeration
data type is the preferred way to handle situations
where a variable takes on only a  discrete  set  of  values,
since additional type checking is available through lint.
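
As a sketch of the enumeration approach, the boat type from the
External Declarations section could be declared as an enum
instead of a set of defines.  The tag boattype and the function
hassails are hypothetical, TRUE and FALSE are defined locally so
that the fragment stands alone, and an ANSI-style parameter list
is assumed.

```c
#define FALSE   0
#define TRUE    1

enum boattype {
        KETCH = 1,
        YAWL,
        SLOOP,
        SQRIG,
        MOTOR
};

struct boat {
        int wllength;           /* water line length in feet */
        enum boattype type;     /* kind of rig */
        long sarea;             /* sail area in square feet */
};

/*
 *      hassails(bp)
 *
 *      Determine if a boat carries sails.
 */

int                             /* TRUE or FALSE */
hassails(struct boat *bp)
{
        if (bp->type == MOTOR)
                return(FALSE);
        else
                return(TRUE);
}
```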

    There are some cases where the constants 0  and  1  may
appear  as themselves instead of as defines.  For example if
a for loop indexes through an array, then

       for (i = 0; i < ARYBOUND; i++)

is reasonable while the code

       fptr = fopen(filename, "r");
       if (fptr == 0)
               error("can't open %s\n", filename);

is not.  In the last example the defined  constant  NULL  is
available  as part of the standard I/O library's header file
stdio.h and must be used in place of the 0.

9.  Portability

    The advantages of portable code are well  known.   This
section  gives  some  guidelines  for writing portable code,
where the definition of portable is taken  to  mean  that  a
source file contains portable code if it can be compiled and
executed on different machines with the only  source  change
being the inclusion of possibly different header files.  The
header files will contain defines and typedefs that may vary
from  machine  to  machine.   Reference  [1] contains useful
information on both style  and  portability.   Many  of  the
recommendations  in  this  document  originated in [1].  The
following is a list of pitfalls to be avoided and  recommen-
dations to be considered when designing portable code:

o    First,  one  must  recognize  that  some   things   are
    inherently  non-portable.   Examples  are  code to deal
    with particular hardware registers such as the  program
_________________________
    done  about  the  problems  caused  by  side effects in
    parameters except to avoid side effects in  expressions
    (a good idea anyway).

31.  At  the  very  least,  any   directly-coded   numerical
    constant  must have a comment explaining the derivation
    of the value.






    status word, and code that is  designed  to  support  a
    particular  piece  of  hardware such as an assembler or
    I/O driver.  Even in these cases there  are  many  rou-
    tines  and  data organizations that can be made machine
    independent.  It is suggested that source files be
    organized  so that the machine-independent code and the
    machine-dependent code are in separate files.  Then  if
    the  program  is  to be moved to a new machine, it is a
    much  easier  task  to  determine  what  needs  to   be
    changed32.  It  is  also  possible  that  code  in  the
    machine-independent  files  may have uses in other pro-
    grams as well.

o    Pay attention to word sizes.  The following sizes apply
    to  basic types in C for the machines that will be used
    most at IH33:

                   type    pdp11   3B   IBM
                   ________________________
                   char        8    8     8
                   short      16   16    16
                   int        16   32    32
                   long       32   32    32

    In general if the word size is important, short or long
    should  be used to get 16 or 32 bit items on any of the
    above machines34.  If a simple loop  counter  is  being
    used  where either 16 or 32 bits will do, then use int,
    since it will get the most efficient (natural) unit for
    the current machine35.
_________________________

32.  If you  #ifdef  dependencies,  make  sure  that  if  no
    machine is specified, the result is a syntax error, not
    a default machine!

33.  The 3B is a Bell Labs machine.  The VAX, not  shown  in
    the table, is similar to the 3B in these respects.  The
    68000 resembles either the pdp11 or the  3B,  depending
    on the particular compiler.

34.  Any unsigned type other than plain unsigned int  should
    be typedefed, as such types are highly compiler-dependent.
    This is also true of long and  short  types
    other  than  long  int  and  short int.  Large programs
    should  have  a  central  header  file  which  supplies
    typedefs  for  commonly-used  width-sensitive types, to
    make it easier to change them and  to  aid  in  finding
    width-sensitive code.

35.  Beware   of   making  assumptions  about  the  size  of
    pointers.  They are not always the same  size  as  int.
    Nor  are  all  pointers always the same size, or freely
    interconvertible.  Pointer-to-character is a particular





o    Word size also affects shifts and masks.  The code

            x &= 0177770

    will clear only the three rightmost bits of an int on a
    PDP11.   On  a  3B  it will also clear the entire upper
    halfword.  Use

            x &= ~07

    instead which works properly on all machines36.

o    Code that  takes  advantage  of  the  two's  complement
    representation  of  numbers on most machines should not
    be used.  Optimizations that replace arithmetic operations
    with equivalent shifting operations are particularly
    suspect.  You should weigh the time savings against
    the  potential for obscure and difficult bugs when your
    code is moved, say, from a 3B to a 1A.

o    Watch out for signed characters.  On the PDP-11, characters
    are sign extended when used in expressions,
    which is not the case on any other machine.  In particular,
    getchar is an integer-valued function (or macro)
    since the value of EOF for the standard I/O library  is
    -1,  which is not possible for a character on the 3B or
    IBM37.

o    The PDP-11 is unique among processors on which C exists
    in  that  the  bytes  are  numbered  from right to left
    within a word.  All other machines (3B, IBM,  Interdata
    8/32, Honeywell) number the bytes from left to right38.
    Hence  any code that depends on the left-right orienta-
    tion of bits in a word deserves special scrutiny.   Bit
    fields  within  structure members will only be portable
_________________________
    trouble  spot  on  machines which do not address to the
    byte.

36.  The or operator ( | ) does not have these problems, nor
    do   bitfields  (which,  unfortunately,  are  not  very
    portable due to defective compilers).

37.  Actually, this is not quite the real reason why getchar
    returns  int,  but  the  comment  is valid:  code which
    assumes either that characters are signed or that  they
    are  unsigned  is unportable.  It is best to completely
    avoid using char  to  hold  numbers.   Manipulation  of
    characters  as  if  they  were  numbers  is  also often
    unportable.

38.  Actually,  there  are  some more right-to-left machines
    now, but the comments still apply.






    so long as two separate fields are  never  concatenated
    and treated as a unit39.  [1,3]

o    Do not default the boolean test for non-zero, i.e.

            if (f() != FAIL)

    is better than

            if (f())

    even though FAIL may have the value 0 which is considered
    to mean false by C40.  This will help you out
    later when  somebody  decides  that  a  failure  return
    should be -1 instead of 0 41.

o    Be suspicious of numeric values appearing in the  code.
    Even  simple  values  like  0  or  1  could  be  better
    expressed using defines like FALSE and TRUE (see previous
    item)42.  Any other constants appearing in a program
    would be better expressed as a defined constant.
    This makes it easier to change and also easier to read.

o    Become familiar with  existing  library  functions  and
_________________________

39.  The same applies to variables  in  general.   Alignment
    considerations  and  loader  peculiarities make it very
    rash  to   assume   that   two   consecutively-declared
    variables are together in memory, or that a variable of
    one type is aligned appropriately to be used as another
    type.

40.  A particularly notorious case is using strcmp to test
    for string equality, where the result should never ever
    be defaulted.  The preferred approach is to define a
    macro STREQ:

            #define STREQ(a, b) (strcmp((a), (b)) == 0)


41.  An exception is commonly made for predicates, which are
    functions which meet the following restrictions:

+o    Has no other purpose than to return true or false.

+o    Returns 0 for false, 1 for true, nothing else.

+o    Is named so that the meaning of (say) a `true' return
    is absolutely obvious.  Call a predicate isvalid or
    valid, not checkvalid.

42.  Actually, YES and NO often read better.






    defines43.  You should not be writing your  own  string
    compare  routine, or making your own defines for system
    structures44.  Not only does this waste your time,  but
    it  prevents  your program from taking advantage of any
    microcode assists or other means of  improving  perfor-
    mance of system routines45.

+o    Use lint.  It is a valuable tool for finding
    machine-dependent constructs as well as other inconsistencies
    or program bugs that pass the compiler46.
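
    The advice above on explicit tests and defined constants
can be illustrated with a short sketch.  It is only a sketch:
the names OK, FAIL, MAXNAME, checkname and countgood are
hypothetical, invented for this example, and the code assumes
an ANSI C compiler rather than the older dialect used
elsewhere in this document.  STREQ is the macro suggested in
the notes.

```c
#include <string.h>

#define OK      0
#define FAIL    (-1)    /* hypothetical: imagine this used to be 0 */
#define MAXNAME 14      /* hypothetical limit on name length */
#define STREQ(a, b)     (strcmp((a), (b)) == 0)

/*
 * Check that a name is non-empty and within the length limit.
 * Returns OK or FAIL.
 */
static int
checkname(const char *name)
{
        if (STREQ(name, ""))
                return(FAIL);
        if (strlen(name) > (size_t)MAXNAME)
                return(FAIL);
        return(OK);
}

/*
 * Count the acceptable names among the first n entries of names.
 */
static int
countgood(const char *names[], int n)
{
        int i;
        int good = 0;

        for (i = 0; i < n; i++) {
                /* Explicit test; "if (checkname(names[i]))" would be wrong here. */
                if (checkname(names[i]) != FAIL)
                        good++;
        }
        return(good);
}
```

    Because the caller compares against FAIL by name, changing
the value of FAIL requires no change to countgood.  A
defaulted test would have silently inverted its meaning.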

10.  Lint

    Lint is a C program checker [2] that examines C source
files to detect and report type incompatibilities,
inconsistencies between function definitions and calls, potential
program bugs, etc.  It is expected that projects will
require programs to use lint as part of the official
acceptance procedure47.  In addition, work is going on in
department 5521 to modify lint so that it will check for adherence
to the standards in this document.

    It is still too early  to  say  exactly  which  of  the
_________________________

43.  But not too familiar.  The internal details of library
    facilities, as opposed to their external interfaces,
    are subject to change without warning.  They are also
    often quite unportable.

44.  Or, especially, writing your own code to control
    terminals.  Use the termcap package.

45.  It  also  makes  your  code  less readable, because the
    reader has to figure out whether you're doing something
    special  in  that  reimplemented  stuff  to justify its
    existence.  Furthermore,  it's  a  fruitful  source  of
    bugs.

46.  The use of lint on all programs is strongly
    recommended.  It is difficult to eliminate complaints
    about functions whose return value is not used (in the
    current version of C, at least), but most other
    messages from lint really do indicate something wrong.
    The  -h, -p, -a, -x, and -c options are worth learning.
    All of them will complain about some legitimate things,
    but  they will also pick up many botches.  Note that -p
    checks function-call type-consistency for only a subset
    of  Unix library routines, so programs should be linted
    both  with   and   without   this   option   for   best
    ``coverage''.

47.  Yes.






standards given here will be checked by lint.  In some cases,
such as whether a comment is misleading or incorrect, there
is little hope of mechanical checking.  In other cases, such
as checking that the opening brace of a function body is
alone on a line in column 1, the test has already been
added48.  Future bulletins will be used to announce new
additions to lint as they occur.

    It should be noted that the best way to use lint is not
as a barrier that must be overcome before official
acceptance of a program, but rather as a tool to use whenever
major changes or additions to the code have been made.  Lint
can find obscure bugs and ensure portability before problems
occur.

11.  Special Considerations

    This  section  contains  some  miscellaneous  do's  and
don'ts.

+o    Don't change syntax via macro substitution.   It  makes
    the program unintelligible to all but the perpetrator.

+o    There is a time and a  place  for  embedded  assignment
    statements49.  In some constructs there  is  no  better
    way  to  accomplish the results without making the code
    bulkier and less readable.  The while loop  in  section
    8.1 is one example of an appropriate place.  Another is
    the common code segment:

            while ((c = getchar()) != EOF) {
                    /* process the character */
            }

    Using embedded assignment statements  to  improve  run-
    time performance is also possible.  However, one should
    consider  the  tradeoff  between  increased  speed  and
    decreased  maintainability  that  results when embedded
    assignments are used in artificial places.   For  exam-
    ple, the code:

            a = b + c;
            d = a + r;

    should not be replaced by

_________________________

48.  Little of this is relevant at U of T.  The version of
    lint that we have lacks these mods.

49.  The ++ and -- operators count as assignment statements.
    So, for many purposes, do functions with side effects.







            d = (a = b + c) + r;

    even though the latter may save one cycle.   Note  that
    in  the  long  run  the time difference between the two
    will decrease as the optimizer  gains  maturity,  while
    the  difference in ease of maintenance will increase as
    the human memory of what's going on in the latter piece
    of code begins to fade50.

+o    There is also a time and  place  for  the  ternary  ? :
    operator  and  the  binary comma operator.  The logical
    expression   operand   before   the   ? :   should   be
    parenthesized:

            (x >= 0) ? x : -x

    Nested ? : operators can be confusing and should be
    avoided if possible.  There are some macros like
    getchar where they can be useful.  The comma operator
    can also be useful in for statements to provide
    multiple initializations or incrementations.

+o    Goto statements should be used sparingly, as in any
    well-structured code51.  The main place where they can
    be usefully employed is to break out of several levels
    of switch, for, and while nesting52, e.g.

            for (...)
                    for (...) {
                            ...
                            if (disaster)
                                    goto error;
                    }
            ...
    error:
            /* clean up the mess */

    When a goto is necessary the accompanying label  should
    be  alone  on a line and tabbed one tab position to the
_________________________

50.  Note also that  side  effects  within  expressions  can
    result  in code whose semantics are compiler-dependent,
    since C's order of evaluation is  explicitly  undefined
    in most places.  Compilers do differ.

51.  The continue statement is almost as bad.  Break is less
    troublesome.

52.  The need to do such a thing may indicate that the inner
    constructs   should  be  broken  out  into  a  separate
    function, with a success/failure return code.






    left of the associated code that follows.

+o    This committee recommends that programmers not rely  on
    automatic   beautifiers   for  the  following  reasons.
    First, the main person who benefits from  good  program
    style  is  the  programmer himself.  This is especially
    true in the early design of handwritten  algorithms  or
    pseudo-code.  Automatic beautifiers can only be applied
    to complete, syntactically correct programs  and  hence
    are  not available when the need for attention to white
    space and indentation is greatest.   It  is  also  felt
    that  programmers  can  do a better job of making clear
    the complete visual layout of a function or file,  with
    the  normal  attention  to detail of a careful program-
    mer53.  Sloppy programmers should learn to  be  careful
    programmers  instead of relying on a beautifier to make
    their code readable.  Finally, it is  felt  that  since
    beautifiers  are  non-trivial  programs that must parse
    the source, the burden of maintaining them in the  face
    of the continuing evolution of C is not worth the bene-
    fits gained by such a program.
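
    The goto item above can be made concrete with a small
sketch of a jump out of nested loops to a single clean-up
point.  It is an illustration under stated assumptions, not
a prescribed form:  the names sumtable, NROWS, NCOLS, OK and
FAIL are hypothetical, and the code assumes an ANSI C
compiler.  The label stands alone on its line, one tab
position to the left of the code that follows it.

```c
#include <stdlib.h>

#define OK      0
#define FAIL    (-1)
#define NROWS   3       /* hypothetical table dimensions */
#define NCOLS   4

/*
 * Sum a table, giving up at the first negative entry.
 * Returns OK or FAIL; on OK the total is stored through sump.
 */
static int
sumtable(int table[NROWS][NCOLS], long *sump)
{
        long *rowsum;           /* scratch space, freed on both paths */
        long total = 0;
        int i, j;

        rowsum = (long *)calloc(NROWS, sizeof(long));
        if (rowsum == NULL)
                return(FAIL);

        for (i = 0; i < NROWS; i++)
                for (j = 0; j < NCOLS; j++) {
                        if (table[i][j] < 0)
                                goto error;     /* leaves both loops at once */
                        rowsum[i] += table[i][j];
                }

        for (i = 0; i < NROWS; i++)
                total += rowsum[i];
        *sump = total;
        free(rowsum);
        return(OK);

error:
        free(rowsum);           /* clean up the mess */
        return(FAIL);
}
```

    As the notes suggest, the same effect could often be had
by moving the inner loop into its own function with a
success/failure return code; the goto form is shown only
because the text names it as the main legitimate use.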

12.  Project Dependent Standards

    Individual projects may wish to establish additional
standards beyond those given here.  The following issues are
some of those that should be addressed by each project
program administration group.

+o    What additional naming conventions should be  followed?
    In  particular, systematic prefix conventions for func-
    tional grouping of global data and also  for  structure
    or union member names can be useful.

+o    What kind of include file organization  is  appropriate
    for the project's particular data hierarchy?

+o    What procedures should be established for reviewing
    lint complaints?  A tolerance level needs to be
    established in concert with the lint options to prevent
    unimportant complaints from hiding complaints about
    real bugs or inconsistencies.

+o    If a project establishes its own archive libraries,  it
    should plan on supplying a lint library file [2] to the
    system administrators.  This will allow lint  to  check
    for compatible use of library functions.

_________________________

53.  In other words, some of the visual layout  is  dictated
    by  intent rather than syntax.  Beautifiers cannot read
    minds.






13.  Conclusion

    A set of standards has been presented for C programming
style.   One  of the most important points is the proper use
of white space and comments so that  the  structure  of  the
program  is  evident  from  the layout of the code.  Another
good idea to keep in mind when writing code is  that  it  is
likely  that  you or someone else will be asked to modify it
or make it run  on  a  different  machine  sometime  in  the
future.

    As with any standard, it must be followed if it is to
be useful.  The Indian Hill version of lint will enforce
those standards that are amenable to automatic checking.  If
you have trouble following any of these standards, don't just
ignore them.  Programmers at Indian Hill should bring their
problems  to  the  Software  Development  System  Group (Lee
Kirchhoff, contact) in department 5522.  Programmers outside
Indian  Hill  should contact the Processor Application Group
(Layne Cannon, contact) in department 5512 54.

_________________________

54.  At U of T Zoology, it's Henry Spencer in 336B.






                        References



[1]  B.A. Tague, "C Language Portability",  Sept  22,  1977.
    This  document issued by department 8234 contains three
    memos by R.C. Haight, A.L. Glasser, and T.L. Lyon deal-
    ing with style and portability.

[2]  S.C. Johnson, "Lint, a C  Program  Checker",  Technical
    Memorandum, 77-1273-14, September 16, 1977.

[3]  R.W. Mitze, "The 3B/PDP-11 Swabbing Problem",  Memoran-
    dum for File, 1273-770907.01MF, September 14, 1977.

[4]  R.A. Elliott and D.C.  Pfeffer,  "3B  Processor  Common
    Diagnostic  Standards- Version 1", Memorandum for File,
    5514-780330.01MF, March 30, 1978.

[5]  R.W. Mitze, "An Overview of C Compilation of UNIX  User
    Processes  on  the  3B",  Memorandum  for  File,  5521-
    780329.02MF, March 29, 1978.

[6]  B.W. Kernighan and D.M. Ritchie, The C Programming
    Language, Prentice-Hall 1978.




/*
* The C Style Summary Sheet                            Block comment,
* by Henry Spencer, U of T Zoology                     describes file.
*/

#include <errno.h>                                      Headers; don't nest.

typedef int     SEQNO;  /* ... */                       Global definitions.
#define STREQ(a, b)     (strcmp((a), (b)) == 0)

static char *foo = NULL;        /* ... */               Global declarations.
struct bar {                                            Static whenever poss.
       SEQNO alpha;    /* ... */
#               define  NOSEQNO 0
       int beta;       /* ... */                       Don't assume 16 bits.
};

/*
* Many unnecessary braces, to show where.              Functions.
*/
static int              /* what is returned */          Don't default int.
bletch(a)
int a;                  /* ... */                       Don't default int.
{
       int bar;                /* ... */
       extern int errno;       /* ..., changed here */
       extern char *index();

       if (foobar() != FAIL) {                 if (!isvalid()) {
               return(OK);                             errno = ERANGE;
       }                                       } else {
                                                       x = &y + z->field;
       while (x == (y & MASK)) {               }
               f += (x >= 0) ? x : -x;
       }                                       for (i = 0; i < BOUND; i++) {
                                                       /* lint -h[p]cax. */
       do {                                    }
               /* Avoid nesting ?: */
       } while (index(a, b) != NULL);          if (STREQ(x, "foo")) {
                                                       x |= 07;        /* 07 is... */
       switch (...) {                          } else if (STREQ(x, "bar")) {
       case ABC:                                       x &= ~077;      /* 077 is... */
       case DEF:                               } else if (STREQ(x, "ugh")) {
               printf("...", a, b);                    /* Avoid gotos */
               break;                          } else {
       case XYZ:                                       /* and continues. */
               x = y;                          }
               /* FALLTHROUGH */
       default:                                while ((c = getc()) != EOF)
               /* Limit imbedded =s. */                ;       /* NULLBODY */
               break;
       }
}


