This document is an annotated (by the last
author) version of the original paper of the same
title. It describes a set of coding standards and
recommendations which are local standards for
officially-supported UNIX programs. The scope is
coding style, not functional organization.
April 18, 1990
_________________________
|- UNIX is a trademark of Bell Laboratories.
This document is a result of a committee formed at
Indian Hill to establish a common set of coding standards
and recommendations for the Indian Hill community. The
scope of this work is the coding style, not the functional
organization of programs. The standards in this document
are not specific to ESS programming only1. We have tried to
combine previous work [1,6] on C style into a uniform set of
standards that should be appropriate for any project using
C2.
_________________________
1. In fact, they're pretty good general standards. ``To
be clear is professional; not to be clear is
unprofessional.'' - Sir Ernest Gowers. This document
is presented unadulterated; U of T variations,
comments, exceptions, etc. are presented in footnotes.
2. Of necessity, these standards cannot cover all
situations. Experience and informed judgement count
for much. Inexperienced programmers who encounter
unusual situations should consult 1) code written by
experienced C programmers following these rules, or 2)
experienced C programmers.
2. File Organization
A file consists of various sections that should be
separated by several blank lines. Although there is no max-
imum length requirement for source files, files with more
than about 1500 lines are cumbersome to deal with. The edi-
tor may not have enough temp space to edit the file, compi-
lations will go slower, etc. Since most of us use 300 baud
terminals, entire rows of asterisks, for example, should be
discouraged3. Also lines longer than 80 columns are not
handled well by all terminals and should be avoided if pos-
sible4.
The suggested order of sections for a file is as fol-
lows:
1. Any header file includes should be the first thing in
the file.
2. Immediately after the includes5 should be a prologue
that tells what is in that file. A description of the
purpose of the objects in the files (whether they be
functions, external data declarations or definitions,
or something else) is more useful than a list of the
object names.
3. Any typedefs and defines that apply to the file as a
whole are next.
4. Next come the global (external) data declarations. If
a set of defines applies to a particular piece of glo-
bal data (such as a flags word), the defines should be
immediately after the data declaration6.
5. The functions come last7.
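The suggested order can be sketched as one small source file. This is only an illustration under stated assumptions: the module, the names FLAGS, curflags, F_VERBOSE, F_DEBUG and setflag are hypothetical, and the function is written with a prototype-style parameter list so that it compiles with current compilers.

```c
#include <stdio.h>		/* 1. header file includes come first */

/*
 * 2. Prologue.  Hypothetical example module: keeps a word of
 * option flags and reports it whenever it changes.
 */

typedef unsigned int FLAGS;	/* 3. typedefs and defines for the whole file */

FLAGS curflags = 0;		/* 4. global data: option flags now in effect */
#define	F_VERBOSE	01	/* defines that apply only to curflags */
#define	F_DEBUG		02

/*
 * 5. Functions come last.
 *
 * setflag(f)
 *
 * Turn on the flag bits in f and report the result on stdout.
 */
FLAGS		/* new value of curflags */
setflag(FLAGS f)
{
	curflags |= f;
	printf("flags now 0%o\n", curflags);
	return(curflags);
}
```

The defines for the flags word sit directly under the data they describe, so a reader of curflags sees its legal bits without searching the file.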
_________________________
3. This is not a problem at U of T, or most other sensible
places, but rows of asterisks are still annoying.
4. Excessively long lines which result from deep indenting
are often a symptom of poorly-organized code.
5. A common variation, in both Bell code and ours, is to
reverse the order of sections 1 and 2. This is an
acceptable practice.
6. Such defines should be indented to put the defines one
level deeper than the first keyword of the declaration
to which they apply.
7. They should be in some sort of meaningful order. Top-down
is generally better than bottom-up, and a
``breadth-first'' approach (functions on a similar
level of abstraction together) is preferred over
depth-first (functions defined as soon as possible
after their calls). Considerable judgement is called
for here. If defining large numbers of essentially-independent
utility functions, consider alphabetical
order.

UNIX requires certain suffix conventions for names of
files to be processed by the cc command [5]8. The following
suffixes are required:
+o C source file names must end in .c
+o Assembler source file names must end in .s
In addition the following conventions are universally
followed:
+o Relocatable object file names end in .o
+o Include header file names end in .h 9 or .d
+o Ldp10 specification file names end in .b
+o Yacc source file names end in .y
+o Lex source file names end in .l
3. Header Files
Header files are files that are included in other files
prior to compilation by the C preprocessor. Some are
defined at the system level like stdio.h which must be
included by any program using the standard I/O library.
Header files are also used to contain data declarations and
defines that are needed by more than one program11. Header
_________________________
8. In addition to the suffix conventions given here, it is
conventional to use `Makefile' (not `makefile') for the
control file for _m_a_k_e and `README' for a summary of the
contents of a directory or directory tree.
9. Preferred. An alternate convention that may be
preferable in multi-language environments is to use the
same suffix as an ordinary source file but with two
periods instead of one (e.g. ``foo..c'').
10. No idea what this is.
11. Don't use absolute pathnames for header files. Use the
<name> construction for getting them from a standard
place, or define them relative to the current
files should be functionally organized, i.e., declarations
for separate subsystems should be in separate header files.
Also, if a set of declarations is likely to change when code
is ported from one machine to another, those declarations
should be in a separate header file.
Header files should not be nested. Some objects like
typedefs and initialized data definitions cannot be seen
twice by the compiler in one compilation. On non-UNIX systems
this is also true of uninitialized declarations without
the extern keyword12. This can happen if include files are
nested and will cause the compilation to fail.
4. External Declarations
External declarations should begin in column 1. Each
declaration should be on a separate line. A comment
describing the role of the object being declared should be
included, with the exception that a list of defined constants
does not need comments if the constant names are sufficient
documentation. The comments should be tabbed so that
they line up underneath each other13. Use the tab character
(CTRL I if your terminal doesn't have a separate key) rather
than blanks. For structure and union template declarations,
each element should be alone on a line with a comment
describing it. The opening brace ( { ) should be on the
same line as the structure tag, and the closing brace should
be alone on a line in column 1, i.e.
struct boat {
	int	wllength;	/* water line length in feet */
	int	type;		/* see below */
	long	sarea;		/* sail area in square feet */
};
/*
 *	defines for boat.type (see footnote 14)
 */
#define	KETCH	1
#define	YAWL	2
#define	SLOOP	3
#define	SQRIG	4
#define	MOTOR	5
_________________________
directory. The -I option of the C compiler is the best
way to handle extensive private libraries of header
files; it permits reorganizing the directory structure
without having to alter source files.
12. It should be noted that declaring variables in a header
file is often a poor idea. Frequently it is a symptom
of poor partitioning of code between files.
13. So should the constant names and their defined values.
If an external variable is initialized15 the equal sign
should not be omitted16.
int x = 1;
char *msg = "message";
struct boat winner = {
	40,	/* water line length */
	YAWL,
	600	/* sail area */
};
17
5. Comments
Comments that describe data structures, algorithms,
etc., should be in block comment form with the opening /* in
column one, a * in column 2 before each line of comment
text18, and the closing */ in columns 2-3.
_________________________
14. These defines are better put right after the
declaration of type, within the struct declaration,
with enough tabs after # to indent define one level
more than the structure member declarations.
15. Any variable whose initial value is important should be
explicitly initialized, or at the very least should be
commented to indicate that C's default initialization
to 0 is being relied on.
16. The empty initializer, ``{}'', should never be used.
Structure initializations should be fully parenthesized
with braces. Constants used to initialize longs should
be explicitly long.
17. In any file which is part of a larger whole rather than
a self-contained program, maximum use should be made of
the static keyword to make functions and variables
local to single files. Variables in particular should
be accessible from other files only when there is a
clear need that cannot be filled in another way. Such
usages should be commented to make it clear that
another file's variables are being used; the comment
should name the other file.
18. Some automated program-analysis packages use a
different character in this position as a marker for
lines with specific items of information. In
particular, a line with a `-' here in a comment
preceding a function is sometimes assumed to be a one-
line summary of the function's purpose.
/*
 *	Here is a block comment.
 *	The comment text should be tabbed over (see footnote 19)
 *	and the opening slash-star and closing star-slash
 *	should be alone on a line.
 */
Note that grep '^.\*' will catch all block comments in
the file. In some cases, block comments inside a function
are appropriate, and they should be tabbed over to the same
tab setting as the code that they describe. Short comments
may appear on a single line indented over to the tab setting
of the code that follows.
	if (argc > 1) {
		/* Get input file from command line. */
		if (freopen(argv[1], "r", stdin) == NULL)
			error("can't open %s\n", argv[1]);
	}
Very short comments may appear on the same line as the
code they describe, but should be tabbed over far enough to
separate them from the statements. If more than one short
comment appears in a block of code they should all be tabbed
to the same tab setting.
	if (a == 2)
		return(TRUE);		/* special case */
	else
		return(isprime(a));	/* works only for odd a */
6. Function Declarations
Each function should be preceded by a block comment
prologue that gives the name and a short description of what
the function does20. If the function returns a value, the
type of the value returned should be alone on a line in
column 1 (do not default to int). If the function does not
return a value then it should not be given a return type.
_________________________
19. A common practice in both Bell and local code is to use
a space rather than a tab after the *. This is
acceptable.
20. Discussion of non-trivial design decisions is also
appropriate, but avoid duplicating information that is
present in (and clear from) the code. It's too easy
for such redundant information to get out of date.
If the value returned requires a long explanation, it should
be given in the prologue; otherwise it can be on the same
line as the return type, tabbed over. The function name and
formal parameters should be alone on a line beginning in
column 1. Each parameter should be declared (do not default
to int), with a comment on a single line. The opening brace
of the function body should also be alone on a line begin-
ning in column 1. The function name, argument declaration
list, and opening brace should be separated by a blank
line21. All local declarations and code within the function
body should be tabbed over at least one tab.
If the function uses any external variables, these
should have their own declarations in the function body
using the extern keyword. If the external variable is an
array the array bounds must be repeated in the extern
declaration. There should also be extern declarations for
all functions called by a given function. This is particularly
beneficial to someone picking up code written by
another. If a function returns a value of type other than
int, it is required by the compiler that such functions be
declared before they are used. Having the extern declaration
in the calling function's declarations section avoids
all such problems22.
In general each variable declaration should be on a
separate line with a comment describing the role played by
the variable in the function. If the variable is external
or a parameter of type pointer which is changed by the func-
tion, that should be noted in the comment. All such com-
ments for parameters and local variables should be tabbed so
that they line up underneath each other. The declarations
should be separated from the function's statements by a
blank line.
A local variable should not be redeclared in nested
blocks23. Even though this is valid C, the potential
_________________________
21. Neither Bell nor local code has ever included these
separating blank lines, and it is not clear that they
add anything useful. Leave them out.
22. These rules tend to produce a lot of clutter. Both
Bell and local practice frequently omit extern
declarations for static variables and functions. This
is permitted. Omission of declarations for standard
library routines is also permissible, although if they
are declared it is better to declare them within the
functions that use them rather than globally.
23. In fact, avoid any local declarations that override
declarations at higher levels.
confusion is enough that lint will complain about it when
given the -h option.
6.1. Examples
/*
 *	skyblue()
 *
 *	Determine if the sky is blue.
 */
int			/* TRUE or FALSE */
skyblue()
{
	extern int hour;

	if (hour < MORNING || hour > EVENING)
		return(FALSE);	/* black */
	else
		return(TRUE);	/* blue */
}
/*
 *	tail(nodep)
 *
 *	Find the last element in the linked list
 *	pointed to by nodep and return a pointer to it.
 */
NODE *			/* pointer to tail of list */
tail(nodep)
NODE *nodep;		/* pointer to head of list */
{
	register NODE *np;	/* current pointer advances to NULL */
	register NODE *lp;	/* last pointer follows np */

	np = lp = nodep;
	while ((np = np->next) != NULL)
		lp = np;
	return(lp);
}
7. Compound Statements
Compound statements are statements that contain lists
of statements enclosed in braces. The enclosed list should
be tabbed over one more than the tab position of the com-
pound statement itself. The opening left brace should be at
the end of the line beginning the compound statement and the
closing right brace should be alone on a line, tabbed under
the beginning of the compound statement. Note that the left
brace beginning a function body is the only occurrence of a
left brace which is alone on a line.
Note that the right brace before the else and the right
brace before the while of a do-while statement (below) are
the only places where a right brace appears that is not
alone on a line.
for (i = 0; i < MAX; i++) {
	statement;
	statement;
}

while (expr) {
	statement;
	statement;
}

do {
	statement;
	statement;
} while (expr);

switch (expr) {
case ABC:
case DEF:
	statement;
	break;
case XYZ:
	statement;
	break;
default:
	statement;
	break;		/* see footnote 24 */
}
Note that when multiple case labels are used, they are
placed on separate lines. The fall through feature of the C
switch statement should rarely if ever be used when code is
executed before falling through to the next one. If this is
done it must be commented for future maintenance.
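A commented fall-through might look like the sketch below. The option letters, the flag names F_LOG and F_ECHO, and the function optbits are hypothetical, chosen only to show code that runs before control falls into the next case.

```c
#define	F_LOG	01	/* write a log entry */
#define	F_ECHO	02	/* echo to the terminal */

/*
 * optbits(c)
 *
 * Return the flag bits selected by option letter c.
 */
int		/* flag bits, 0 if c is not an option letter */
optbits(int c)
{
	int flags;	/* flag bits accumulated so far */

	flags = 0;
	switch (c) {
	case 'v':
		flags |= F_ECHO;
		/* FALLTHROUGH */
	case 'l':		/* verbose implies logging, so 'v' continues here */
		flags |= F_LOG;
		break;
	default:
		break;
	}
	return(flags);
}
```

Without the comment, a maintainer adding a case between 'v' and 'l' has no warning that the missing break was deliberate.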
if (strcmp(reply, "yes") == EQUAL) {
	statements for yes
	...
} else if (strcmp(reply, "no") == EQUAL) {
	statements for no
	...
} else if (strcmp(reply, "maybe") == EQUAL) {
	statements for maybe
	...
} else {
	statements for none of the above
	...
}
The last example is a generalized switch statement and the
tabbing reflects the switch between exactly one of several
alternatives rather than a nesting of statements.
8. Expressions
8.1. Operators
The old versions of equal-ops =+, =-, =*, etc. should
not be used. The preferred use is +=, -=, *=, etc. All
binary operators except . and -> should be separated from
their operands by blanks25. In addition, keywords that are
followed by expressions in parentheses should be separated
from the left parenthesis by a blank26. Blanks should also
appear after commas in argument lists to help separate the
arguments visually. On the other hand, macros with argu-
ments and function calls should not have a blank between the
name and the left parenthesis. In particular, the C prepro-
cessor requires the left parenthesis to be immediately after
_________________________
24. This break is, strictly speaking, unnecessary, but it
is required nonetheless because it prevents a fall-through
error if another case is added later after the
last one.
25. Some judgement is called for in the case of complex
expressions, which may be clearer if the ``inner''
operators are not surrounded by spaces and the
``outer'' ones are.
26. Sizeof is an exception, see the discussion of function
calls. Less logically, so is return.
the macro name or else the argument list will not be recog-
nized. Unary operators should not be separated from their
single operand. Since C has some unexpected precedence
rules, all expressions involving mixed operators should be
fully parenthesized.
Examples
a += c + d;
a = (a + b) / (c * d);
strp->field = str.fl - ((x & MASK) >> DISP);
while (*d++ = *s++)
; /* EMPTY BODY */
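One of the unexpected precedence rules mentioned above is that == binds more tightly than &. The sketch below, with a hypothetical MASK and hypothetical function names, contrasts the fully parenthesized test with the grouping the compiler applies when the inner parentheses are left out.

```c
#define	MASK	07	/* hypothetical: the low three bits */

/*
 * bitsclear(x)
 *
 * Fully parenthesized test: are all the MASK bits of x clear?
 */
int		/* 1 if clear, 0 otherwise */
bitsclear(unsigned int x)
{
	return((x & MASK) == 0);
}

/*
 * bitsclear_misparsed(x)
 *
 * What the compiler makes of "x & MASK == 0".  The comparison
 * is done first, so the expression groups as written here and
 * reduces to x & 0.
 */
int		/* always 0 */
bitsclear_misparsed(unsigned int x)
{
	return(x & (MASK == 0));
}
```

The second function never reports that the bits are clear, which is why the mixed & and == must be parenthesized rather than left to the default grouping.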
8.2. Naming Conventions
Individual projects will no doubt have their own naming
conventions. There are some general rules however.
+o An initial underscore should not be used for any user-
created names27. UNIX uses it for names that the user
should not have to know (like the standard I/O
library)28.
+o Macro names, typedef names, and define names should be
all in CAPS.
+o Variable names, structure tag names, and function names
should be in lower case29. Some macros (such as
getchar and putchar) are in lower case since they may
also exist as functions. Care is needed when inter-
changing macros and functions since functions pass
their parameters by value whereas macros pass their
arguments by name substitution30.
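The difference between passing by value and name substitution shows up as soon as an argument is an expression. In this sketch the names SQUARE_BAD, SQUARE and square are hypothetical; the first macro leaves its parameter unparenthesized to show the operator-precedence problem.

```c
#define	SQUARE_BAD(x)	x * x		/* parameter not parenthesized */
#define	SQUARE(x)	((x) * (x))	/* parameter and result parenthesized */

/*
 * square(x)
 *
 * The same operation as a function: x is evaluated once and
 * passed by value.
 */
int		/* x squared */
square(int x)
{
	return(x * x);
}
```

With the argument 1 + 2, SQUARE_BAD expands to 1 + 2 * 1 + 2, which is 5, while SQUARE and square both give 9. Parentheses do not help with side effects: an argument such as i++ would still be substituted twice by either macro, so only the function is safe there.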
_________________________
27. Trailing underscores should be avoided too.
28. This convention is reserved for system purposes. If
you must have your own private identifiers, begin them
with a capital letter identifying the package to which
they belong.
29. It is best to avoid names that differ only in case,
like foo and FOO. The potential for confusion is
considerable.
30. This difference also means that carefree use of macros
requires care when they are defined. Remember that
complex expressions can be used as parameters, and
operator-precedence problems can arise unless all
occurrences of parameters in the definition have
parentheses around them. There is little that can be
8.3. Constants
Numerical constants should not be coded directly31.
The define feature of the C preprocessor should be used to
assign a meaningful name. This will also make it easier to
administer large programs since the constant value can be
changed uniformly by changing only the define. The enumeration
data type is the preferred way to handle situations
where a variable takes on only a discrete set of values,
since additional type checking is available through lint.
There are some cases where the constants 0 and 1 may
appear as themselves instead of as defines. For example if
a for loop indexes through an array, then
for (i = 0; i < ARYBOUND; i++)
is reasonable while the code
fptr = fopen(filename, "r");
if (fptr == 0)
	error("can't open %s\n", filename);
is not. In the last example the defined constant NULL is
available as part of the standard I/O library's header file
stdio.h and must be used in place of the 0.
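As a sketch of the enumeration alternative, the boat types from section 4 could be declared as an enumerated type instead of a list of defines. The tag rig and the function hassails are hypothetical, and TRUE and FALSE are defined here only to keep the example self-contained.

```c
#define	TRUE	1
#define	FALSE	0

/* boat types as an enumeration rather than a list of defines */
enum rig {
	KETCH = 1,
	YAWL,
	SLOOP,
	SQRIG,
	MOTOR
};

/*
 * hassails(r)
 *
 * Determine if a boat of type r carries sails.
 */
int		/* TRUE or FALSE */
hassails(enum rig r)
{
	switch (r) {
	case KETCH:
	case YAWL:
	case SLOOP:
	case SQRIG:
		return(TRUE);
	case MOTOR:
	default:
		return(FALSE);
	}
}
```

Because the parameter is declared as enum rig rather than int, a checker has a chance to complain when some unrelated integer is passed where a boat type is expected.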
9. Portability
The advantages of portable code are well known. This
section gives some guidelines for writing portable code,
where the definition of portable is taken to mean that a
source file contains portable code if it can be compiled and
executed on different machines with the only source change
being the inclusion of possibly different header files. The
header files will contain defines and typedefs that may vary
from machine to machine. Reference [1] contains useful
information on both style and portability. Many of the
recommendations in this document originated in [1]. The
following is a list of pitfalls to be avoided and recommen-
dations to be considered when designing portable code:
+o First, one must recognize that some things are
inherently non-portable. Examples are code to deal
with particular hardware registers such as the program
_________________________
done about the problems caused by side effects in
parameters except to avoid side effects in expressions
(a good idea anyway).
31. At the very least, any directly-coded numerical
constant must have a comment explaining the derivation
of the value.
status word, and code that is designed to support a
particular piece of hardware such as an assembler or
I/O driver. Even in these cases there are many rou-
tines and data organizations that can be made machine
independent. It is suggested that source files be
organized so that the machine-independent code and the
machine-dependent code are in separate files. Then if
the program is to be moved to a new machine, it is a
much easier task to determine what needs to be
changed32. It is also possible that code in the
machine-independent files may have uses in other pro-
grams as well.
+o Pay attention to word sizes. The following sizes apply
to basic types in C for the machines that will be used
most at IH33:
type     pdp11    3B    IBM    (sizes in bits)
________________________________
char       8       8      8
short     16      16     16
int       16      32     32
long      32      32     32
In general if the word size is important, short or long
should be used to get 16 or 32 bit items on any of the
above machines34. If a simple loop counter is being
used where either 16 or 32 bits will do, then use int,
since it will get the most efficient (natural) unit for
the current machine35.
_________________________
32. If you #ifdef dependencies, make sure that if no
machine is specified, the result is a syntax error, not
a default machine!
33. The 3B is a Bell Labs machine. The VAX, not shown in
the table, is similar to the 3B in these respects. The
68000 resembles either the pdp11 or the 3B, depending
on the particular compiler.
34. Any unsigned type other than plain unsigned int should
be typedefed, as such types are highly compiler-dependent.
This is also true of long and short types
other than long int and short int. Large programs
should have a central header file which supplies
typedefs for commonly-used width-sensitive types, to
make it easier to change them and to aid in finding
width-sensitive code.
35. Beware of making assumptions about the size of
pointers. They are not always the same size as int.
Nor are all pointers always the same size, or freely
interconvertible. Pointer-to-character is a particular
+o Word size also affects shifts and masks. The code
	x &= 0177770
will clear only the three rightmost bits of an int on a
PDP11. On a 3B it will also clear the entire upper
halfword. Use
	x &= ~07
instead which works properly on all machines36.
+o Code that takes advantage of the two's complement
representation of numbers on most machines should not
be used. Optimizations that replace arithmetic operations
with equivalent shifting operations are particularly
suspect. You should weigh the time savings against
the potential for obscure and difficult bugs when your
code is moved, say, from a 3B to a 1A.
+o Watch out for signed characters. On the PDP-11, char-
acters are sign extended when used in expressions,
which is not the case on any other machine. In particular,
getchar is an integer-valued function (or macro)
since the value of EOF for the standard I/O library is
-1, which is not possible for a character on the 3B or
IBM37.
+o The PDP-11 is unique among processors on which C exists
in that the bytes are numbered from right to left
within a word. All other machines (3B, IBM, Interdata
8/32, Honeywell) number the bytes from left to right38.
Hence any code that depends on the left-right orienta-
tion of bits in a word deserves special scrutiny. Bit
fields within structure members will only be portable
_________________________
trouble spot on machines which do not address to the
byte.
36. The or operator ( | ) does not have these problems, nor
do bitfields (which, unfortunately, are not very
portable due to defective compilers).
37. Actually, this is not quite the real reason why getchar
returns int, but the comment is valid: code which
assumes either that characters are signed or that they
are unsigned is unportable. It is best to completely
avoid using char to hold numbers. Manipulation of
characters as if they were numbers is also often
unportable.
38. Actually, there are some more right-to-left machines
now, but the comments still apply.
so long as two separate fields are never concatenated
and treated as a unit39. [1,3]
+o Do not default the boolean test for non-zero, i.e.
	if (f() != FAIL)
is better than
	if (f())
even though FAIL may have the value 0 which is considered
to mean false by C40. This will help you out
later when somebody decides that a failure return
should be -1 instead of 0 41.
+o Be suspicious of numeric values appearing in the code.
Even simple values like 0 or 1 could be better
expressed using defines like FALSE and TRUE (see previous
item)42. Any other constants appearing in a program
would be better expressed as a defined constant.
This makes it easier to change and also easier to read.
+o Become familiar with existing library functions and
_________________________
39. The same applies to variables in general. Alignment
considerations and loader peculiarities make it very
rash to assume that two consecutively-declared
variables are together in memory, or that a variable of
one type is aligned appropriately to be used as another
type.
40. A particularly notorious case is using strcmp to test
for string equality, where the result should never ever
be defaulted. The preferred approach is to define a
macro STREQ:
#define STREQ(a, b) (strcmp((a), (b)) == 0)
41. An exception is commonly made for predicates, which are
functions which meet the following restrictions:
+o Has no other purpose than to return true or false.
+o Returns 0 for false, 1 for true, nothing else.
+o Is named so that the meaning of (say) a `true' return
is absolutely obvious. Call a predicate isvalid or
valid, not checkvalid.
42. Actually, YES and NO often read better.
defines43. You should not be writing your own string
compare routine, or making your own defines for system
structures44. Not only does this waste your time, but
it prevents your program from taking advantage of any
microcode assists or other means of improving perfor-
mance of system routines45.
+o Use lint. It is a valuable tool for finding machine-dependent
constructs as well as other inconsistencies
or program bugs that pass the compiler46.
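The shift-and-mask item in the list above can be checked directly. The following sketch uses hypothetical function names and unsigned long, which is at least 32 bits wide, to compare the complemented mask with the 16-bit octal constant.

```c
/*
 * clearlow3(x)
 *
 * Clear the three rightmost bits of x.  The mask is built by
 * complementing 07, so it is as wide as unsigned long happens
 * to be on the current machine.
 */
unsigned long	/* x with its low three bits cleared */
clearlow3(unsigned long x)
{
	return(x & ~07UL);
}

/*
 * clearlow3_16(x)
 *
 * The width-dependent form.  The constant 0177770 has only 16
 * significant bits, so every bit of x above bit 15 is cleared
 * along with the low three.
 */
unsigned long	/* low 16 bits of x with the low three cleared */
clearlow3_16(unsigned long x)
{
	return(x & 0177770UL);
}
```

Applied to 0x12345677, the first function yields 0x12345670 and the second yields 0x5670, which is the loss of the upper halfword described for the 3B.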
10. Lint
Lint is a C program checker [2] that examines C source
files to detect and report type incompatibilities,
inconsistencies between function definitions and calls, potential
program bugs, etc. It is expected that projects will
require programs to use lint as part of the official
acceptance procedure47. In addition, work is going on in
department 5521 to modify lint so that it will check for adherence
to the standards in this document.
It is still too early to say exactly which of the
_________________________
43. But not too familiar. The internal details of library
facilities, as opposed to their external interfaces,
are subject to change without warning. They are also
often quite unportable.
44. Or, especially, writing your own code to control
terminals. Use the termcap package.
45. It also makes your code less readable, because the
reader has to figure out whether you're doing something
special in that reimplemented stuff to justify its
existence. Furthermore, it's a fruitful source of
bugs.
46. The use of lint on all programs is strongly
recommended. It is difficult to eliminate complaints
about functions whose return value is not used (in the
current version of C, at least), but most other
messages from lint really do indicate something wrong.
The -h, -p, -a, -x, and -c options are worth learning.
All of them will complain about some legitimate things,
but they will also pick up many botches. Note that -p
checks function-call type-consistency for only a subset
of Unix library routines, so programs should be linted
both with and without this option for best
``coverage''.
47. Yes.
standards given here will be checked by lint. In some cases
such as whether a comment is misleading or incorrect there
is little hope of mechanical checking. In other cases such
as checking that the opening brace of a function body is
alone on a line in column 1, the test has already been
added48. Future bulletins will be used to announce new
additions to lint as they occur.
It should be noted that the best way to use lint is not
as a barrier that must be overcome before official
acceptance of a program, but rather as a tool to use whenever
major changes or additions to the code have been made. Lint
can find obscure bugs and ensure portability before problems
occur.
11. Special Considerations
This section contains some miscellaneous do's and
don'ts.
+o Don't change syntax via macro substitution. It makes
the program unintelligible to all but the perpetrator.
+o There is a time and a place for embedded assignment
statements49. In some constructs there is no better
way to accomplish the results without making the code
bulkier and less readable. The while loop in section
8.1 is one example of an appropriate place. Another is
the common code segment:
	while ((c = getchar()) != EOF) {
		process the character
	}
Using embedded assignment statements to improve run-
time performance is also possible. However, one should
consider the tradeoff between increased speed and
decreased maintainability that results when embedded
assignments are used in artificial places. For exam-
ple, the code:
a = b + c;
d = a + r;
should not be replaced by
_________________________
48. Little of this is relevant at U of T. The version of
lint that we have lacks these mods.
49. The ++ and -- operators count as assignment statements.
So, for many purposes, do functions with side effects.
d = (a = b + c) + r;
even though the latter may save one cycle. Note that
in the long run the time difference between the two
will decrease as the optimizer gains maturity, while
the difference in ease of maintenance will increase as
the human memory of what's going on in the latter piece
of code begins to fade50.
+o There is also a time and place for the ternary ? :
operator and the binary comma operator. The logical
expression operand before the ? : should be
parenthesized:
(x >= 0) ? x : -x
Nested ? : operators can be confusing and should be
avoided if possible. There are some macros like
getchar where they can be useful. The comma operator
can also be useful in for statements to provide multiple
initializations or incrementations.
+o Goto statements should be used sparingly as in any
well-structured code51. The main place where they can
be usefully employed is to break out of several levels
of switch, for, and while nesting52, e.g.
	for (...)
		for (...) {
			...
			if (disaster)
				goto error;
		}
	...
error:
	clean up the mess
When a goto is necessary the accompanying label should
be alone on a line and tabbed one tab position to the
_________________________
50. Note also that side effects within expressions can
result in code whose semantics are compiler-dependent,
since C's order of evaluation is explicitly undefined
in most places. Compilers do differ.
51. The continue statement is almost as bad. Break is less
troublesome.
52. The need to do such a thing may indicate that the inner
constructs should be broken out into a separate
function, with a success/failure return code.
left of the associated code that follows.
+o This committee recommends that programmers not rely on
automatic beautifiers for the following reasons.
First, the main person who benefits from good program
style is the programmer himself. This is especially
true in the early design of handwritten algorithms or
pseudo-code. Automatic beautifiers can only be applied
to complete, syntactically correct programs and hence
are not available when the need for attention to white
space and indentation is greatest. It is also felt
that programmers can do a better job of making clear
the complete visual layout of a function or file, with
the normal attention to detail of a careful program-
mer53. Sloppy programmers should learn to be careful
programmers instead of relying on a beautifier to make
their code readable. Finally, it is felt that since
beautifiers are non-trivial programs that must parse
the source, the burden of maintaining them in the face
of the continuing evolution of C is not worth the bene-
fits gained by such a program.
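The expression and control-flow recommendations in the list above
(embedded assignment, the parenthesized ? : operand, the comma
operator in for statements, and a goto out of nested loops) can be
seen together in one small sketch. It is an illustration only, not
part of the original standard; the function names count_blanks,
abs_sum, reverse, and check_grid, and the ROWS and COLS constants,
are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

#define ROWS	2	/* hypothetical grid size, for illustration */
#define COLS	3

/*
 * Count the blanks in s.  The embedded assignment fetches, stores,
 * and tests each character in one place, as in the getchar() loop.
 */
int
count_blanks(const char *s)
{
	int c;
	int n = 0;

	assert(s != NULL);
	while ((c = *s++) != '\0') {
		if (c == ' ')
			n++;
	}
	return n;
}

/*
 * Sum the absolute values of v[0..len-1].  The logical expression
 * before the ? : is parenthesized.
 */
int
abs_sum(const int *v, int len)
{
	int i;
	int total = 0;

	for (i = 0; i < len; i++)
		total += (v[i] >= 0) ? v[i] : -v[i];
	return total;
}

/*
 * Reverse v[0..len-1] in place.  The comma operator supplies two
 * initializations and two incrementations in one for statement.
 */
void
reverse(int *v, int len)
{
	int i, j, tmp;

	for (i = 0, j = len - 1; i < j; i++, j--) {
		tmp = v[i];
		v[i] = v[j];
		v[j] = tmp;
	}
}

/*
 * Return 0 if every cell of grid is non-negative, -1 otherwise.
 * The goto leaves both loops at once; its label is alone on a line,
 * one tab stop to the left of the code that follows.
 */
int
check_grid(int grid[ROWS][COLS])
{
	int r, c;

	for (r = 0; r < ROWS; r++) {
		for (c = 0; c < COLS; c++) {
			if (grid[r][c] < 0)
				goto error;
		}
	}
	return 0;

error:
	fprintf(stderr, "negative cell at row %d, column %d\n", r, c);
	return -1;
}
```

Because check_grid reports success or failure through its return
value, a caller can test that value instead of jumping out of the
nested loops itself, which is the alternative footnote 52 suggests.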
Individual projects may wish to establish additional
standards beyond those given here. The following issues are
some of those that should be addressed by each project program
administration group.
+o What additional naming conventions should be followed?
In particular, systematic prefix conventions for func-
tional grouping of global data and also for structure
or union member names can be useful.
+o What kind of include file organization is appropriate
for the project's particular data hierarchy?
+o What procedures should be established for reviewing
lint complaints? A tolerance level needs to be established
in concert with the lint options to prevent
unimportant complaints from hiding complaints about
real bugs or inconsistencies.
+o If a project establishes its own archive libraries, it
should plan on supplying a lint library file [2] to the
system administrators. This will allow lint to check
for compatible use of library functions.
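As one possible answer to the naming question in the list above, a
project might give every global of a subsystem a common prefix and
every member of a structure a shorter one. The sketch below assumes
a hypothetical "tty" subsystem; the names tty_line, tty_lines,
tty_nlines, tty_add, TTY_MAXLINES, and the tl_ and TL_ prefixes are
all invented for illustration and come from no real project.

```c
#include <assert.h>
#include <string.h>

#define TTY_MAXLINES	4	/* hypothetical table size */

#define TL_ECHO	01		/* tl_flags bit: echo input */
#define TL_RAW	02		/* tl_flags bit: no input processing */

struct tty_line {
	int	tl_speed;	/* line speed in baud */
	int	tl_flags;	/* TL_* bits */
	char	tl_name[16];	/* NUL-terminated line name */
};

/* Globals of the subsystem share the tty_ prefix. */
struct tty_line	tty_lines[TTY_MAXLINES];
int		tty_nlines = 0;

/*
 * Record a new line.  Returns its index in tty_lines,
 * or -1 if the table is full.
 */
int
tty_add(const char *name, int speed, int flags)
{
	struct tty_line *tl;

	assert(name != NULL);
	if (tty_nlines >= TTY_MAXLINES)
		return -1;
	tl = &tty_lines[tty_nlines];
	strncpy(tl->tl_name, name, sizeof(tl->tl_name) - 1);
	tl->tl_name[sizeof(tl->tl_name) - 1] = '\0';
	tl->tl_speed = speed;
	tl->tl_flags = flags;
	return tty_nlines++;
}
```

With such a scheme a reader who meets tl_flags or tty_nlines far
from its declaration can tell at once which structure or subsystem
it belongs to.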
_________________________
53. In other words, some of the visual layout is dictated
by intent rather than syntax. Beautifiers cannot read
minds.
13. Conclusion
A set of standards has been presented for C programming
style. One of the most important points is the proper use
of white space and comments so that the structure of the
program is evident from the layout of the code. Another
good idea to keep in mind when writing code is that it is
likely that you or someone else will be asked to modify it
or make it run on a different machine sometime in the
future.
As with any standard, it must be followed if it is to
be useful. The Indian Hill version of lint will enforce
those standards that are amenable to automatic checking. If
you have trouble following any of these standards, don't just
ignore them. Programmers at Indian Hill should bring their
problems to the Software Development System Group (Lee
Kirchhoff, contact) in department 5522. Programmers outside
Indian Hill should contact the Processor Application Group
(Layne Cannon, contact) in department 5512 54.
_________________________
54. At U of T Zoology, it's Henry Spencer in 336B.
References
[1] B.A. Tague, "C Language Portability", Sept 22, 1977.
    This document issued by department 8234 contains three
    memos by R.C. Haight, A.L. Glasser, and T.L. Lyon
    dealing with style and portability.
[2] S.C. Johnson, "Lint, a C Program Checker", Technical
    Memorandum, 77-1273-14, September 16, 1977.
[3] R.W. Mitze, "The 3B/PDP-11 Swabbing Problem", Memorandum
    for File, 1273-770907.01MF, September 14, 1977.
[4] R.A. Elliott and D.C. Pfeffer, "3B Processor Common
    Diagnostic Standards - Version 1", Memorandum for File,
    5514-780330.01MF, March 30, 1978.
[5] R.W. Mitze, "An Overview of C Compilation of UNIX User
    Processes on the 3B", Memorandum for File,
    5521-780329.02MF, March 29, 1978.
/*
 * The C Style Summary Sheet                    Block comment,
 * by Henry Spencer, U of T Zoology             describes file.
 */

#include <errno.h>                              Headers; don't nest.

typedef int SEQNO;      /* ... */               Global definitions.
#define STREQ(a, b)     (strcmp((a), (b)) == 0)

/*
 * Many unnecessary braces, to show where.      Functions.
 */
static int      /* what is returned */          Don't default int.
bletch(a)
int a;                  /* ... */               Don't default int.
{
        int bar;                /* ... */
        extern int errno;       /* ..., changed here */
        extern char *index();

        if (foobar() != FAIL) {                 if (!isvalid()) {
                return(OK);                             errno = ERANGE;
        }                                       } else {
                                                        x = &y + z->field;
        while (x == (y & MASK)) {               }
                f += (x >= 0) ? x : -x;
        }                                       for (i = 0; i < BOUND; i++) {
                                                        /* lint -h[p]cax. */
        do {                                    }
                /* Avoid nesting ?: */
        } while (index(a, b) != NULL);          if (STREQ(x, "foo")) {
                                                        x |= 07;        /* 07 is... */
        switch (...) {                          } else if (STREQ(x, "bar")) {
        case ABC:                                       x &= ~077;      /* 077 is... */
        case DEF:                               } else if (STREQ(x, "ugh")) {
                printf("...", a, b);                    /* Avoid gotos */
                break;                          } else {
        case XYZ:                                       /* and continues. */
                x = y;                          }
                /* FALLTHROUGH */
        default:                                while ((c = getc()) != EOF)
                /* Limit imbedded =s. */                ;       /* NULLBODY */
                break;
        }
}