C programming language coding guidelines

Eric Laroche
June 3 1998

Contents
Introduction
. . . . Aim of coding
. . . . Aim of this paper
C language definition
Internationalization
. . . . Coding character set
. . . . Coding language
Identifier names
. . . . Name styles
. . . . Name restrictions
. . . . Namespace
. . . . File names
Compiler warnings
. . . . Style warnings
Helper tools
. . . . Lint
. . . . Metrics
. . . . Assertions
. . . . Runtime checkers
Code complexity
. . . . Nested expressions
. . . . Redundancy
. . . . Determinism
Modularity
. . . . Interfaces
. . . . Header files
. . . . Resources
. . . . Code order
. . . . Conditional compiling
. . . . Code nesting
Scope
. . . . Scope of functions
. . . . Scope of variables
. . . . Scope of types
. . . . Scope of macros
Error prone constructs
. . . . Explicit casts
. . . . Type size
. . . . Array size
. . . . Buffer sizes
. . . . Macro parameters
. . . . Macro side effects
. . . . Sign extension
. . . . Error checking
. . . . Sequence points
. . . . Optimizer errors
Style
. . . . Numbers
. . . . Unsigned numbers
. . . . Longs, shorts
. . . . Floats
. . . . Parameter types
. . . . Variable arguments
. . . . Portability types
. . . . Standard library
. . . . NULL macro
. . . . Register
. . . . Auto
. . . . Goto
. . . . Multiple returns
. . . . Obscurities
. . . . Topics left out
Indentation
. . . . Tabulators
. . . . Braces
. . . . Labels
. . . . Blanks
Comments
. . . . Block comments
Conclusions
Footnotes
References
Index

Introduction

This paper gives a comprehensive insight into the author's guidelines for C programming. Many aspects that are needed to define these guidelines are thoroughly discussed. The paper provides much of the background information needed for decisions about dos and don'ts in C coding.

The paper shows the abstractions and considerations that lead to recommendations in C programming. It is the author's opinion that all the coding recommendations can be deduced from general coding aims.

Aim of coding

Obviously, one primary aim of coding is to implement functionality in the form of software components. First of all, implementations should fulfill the following goals:

error free design (which is sometimes only proven in the implementation phase)
error free algorithms
error free code architecture (what is known as bugfree¹)

Note that software defects can be a security problem².

Then, software engineering aims are

maintainability (ease of expanding or correcting code)
reusability

The importance³ of the software engineering issues mainly depends on the overall size of a project. If you intend to expand the project or to convert parts of it into a (reusable) software library, software engineering issues should be considered from the start.

We can say that the smaller the project, the more likely it is manageable without considering software engineering issues. However, if your design and coding experience allows, I recommend not neglecting software engineering issues even with the smallest projects, since

a project may (later) expand to a much larger size
a project's code fragments may become part of a software library

Aim of this paper

The intention of this paper is to summarize considerations made about fulfilling the above coding aims and making C code

less defective
more robust (against changes in code architecture)
more readable⁴ (for easier maintenance)

Apart from considerations and background information, some recommendations will be given⁵.

Design issues are not so much covered in this paper, since they are often independent of the programming language.

This paper won't compare the C programming language to other programming languages. Some aspects, like efficiency and economy of a programming language depend heavily on a programming environment's implementation⁶. Other aspects, like reliability (e.g. through pre- and postconditions) can be seen in the C language as part of the design rather than coding.

However, a few footnotes are included about better approaches of the C++ language in (non-object-oriented) language details.

C language definition

The C programming language definition to consider is ISO/IEC 9899:1990. Older names for this standard are ANSI-C or K&R2 (Kernighan/Ritchie, 2nd Edition 1988 [BK/DR,1988])⁷.

There is an older standard K&R1 (Kernighan/Ritchie, 1st Edition 1978), also known as pre-ANSI-C.

Use the newer standard if possible. It does rigid type checking (and implicit conversion if necessary) on function arguments (provided that function prototypes and appropriate header files are used).

Tools to convert from K&R1 to K&R2 or to add function prototypes are available (e.g. protoize).

Internationalization

A few early words about internationalization should be said to make clear what internationalization is not. Internationalization does not mean that you use your native language to name identifiers and write comments.

Coding character set

You should limit yourself to the ASCII character set in C source files. The printable ASCII characters are represented by the bytes hexadecimal 20 to 7e.

Non-ASCII characters (e.g. ISO-Latin above hexadecimal 7f) are syntactically incorrect when used for identifier names (even if some compilers may accept such identifiers).

Comments, strings and character literals should not contain non-ASCII characters.

Non-ASCII characters and control characters (hexadecimal 0 to 1f, with the exception of newline, carriage return, tabulator and formfeed) may confuse editors or may even lead to unexpected results on some compilers (e.g. ^Z misinterpreted as end-of-file, most significant bit stripped on non-ASCII characters, etc.). There is no means of notation of the encoding type of a source file (unlike with mail messages).

C offers nice ASCII-notations for non-ASCII characters, e.g. '\xe8'.
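A minimal sketch of this notation, assuming ISO-Latin-1 as the encoding the program's output is interpreted in (the name greeting is made up for illustration). A hexadecimal escape extends over all following hexadecimal digits, so the literal has to be split before a letter such as e:

```c
/* "Gruesse" with u-umlaut and sharp s, assuming ISO-Latin-1 at run time,
   written with ASCII source characters only.
   The literal is split because "\xdfe" would be read as one escape. */
static const char greeting[] = "Gr\xfc\xdf" "e";
```

The source file itself stays pure ASCII, while the compiled string contains the bytes 0xfc and 0xdf.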

Coding language

To keep the sources globally readable, English, the lingua franca of computer science, should be used exclusively. As with other parts of the C coding guidelines: the smaller the scope context, the less strict the rule.

Customizing your program's user interface to a local language (i.e. internationalization in the stricter sense) can be done with locales and e.g. the gettext() family of functions. These functions let you use compiled text databases for the different ISO language codes, while leaving your C sources still readable, i.e. (English) strings are still present in the code.

Note that internationalization also covers local formats for things like dates and money.

Identifier names

Identifier names are used to name functions, variables, types (typedefs, structs, unions, enums), type members, macros, etc.

There are of course many styles of choosing identifier names. One important rule is to be consistent about naming.

The smaller the scope of a name, the less important a good choice of the name is. E.g. it is less important to choose a descriptive and somewhat systematic name for a small scope index variable⁸ than it is for an exported library function.

Using capable development environments (the ones that let you jump to definition and declaration of a function) makes it less important to express type and package membership in a function name.

The aspect of which natural language to use for names has already been covered.

Name styles

There are several methods to get readable names:

use of underscore characters⁹ (e.g. get_name)
use of upper case initials (e.g. GetName)
use of natural language context (e.g. getname)

Some software packages use a kind of type prefix (e.g. uppercase initial for functions and lowercase type indicating initials for variables). E.g.: sz (zero terminated string), n/i (integer), p (pointer), psz (zero terminated string pointer), f/b (flag, boolean), c (character), fp (file pointer), pfn (function pointer), h (handle), w (word, unsigned int), fd (file descriptor).

Package prefixes (e.g. t_ in the TLI API) will be mentioned below.

Name restrictions

Case-insensitive linkers will have problems with external identifiers that differ only in case.
There is a restriction on how many characters of an external identifier name a linker must support.¹⁰
Identifiers that begin with _ or __ are reserved for the C implementation and may not be used.
Some additional prefixes are reserved and may not be used: str (string functions, string.h), E (operating system error numbers, errno.h).

Namespace

A program's identifier namespace isn't partitioned in the C programming language, unlike in Lisp, Java¹¹ or C++ 3rd Ed. [BS,1997].

Libraries tend to include package membership information (e.g. t_open, t_bind, etc. of the TLI network interface library) to avoid name collisions¹².

However, it will be impossible to find short unique package prefixes¹³.

File names

The naming of source files, header files, libraries, directories and projects is not a C programming language issue. However, some considerations are worthwhile.

Filesystems (in which the source files are stored) may or may not be case-sensitive¹⁴. In the latter case, one cannot save e.g. Main.c and main.c in the same directory.

The recommended character set for file names, valid on most systems, is rather small: letters, digits, .-_ and maybe ~@+%,, reasons being

some characters are used by filesystems (/\:;)
some characters have the meaning of quotes (in scripts or shells) ("'\`)
some characters are delimiters in shells (blank, tab)
some characters are meta characters in shells (<>()[]{}$=`;&|^*?#!)
some characters depend on a non-ASCII character set

A package prefix is worth considering for header files to avoid namespace problems (a bad example being external header files named error.h or types.h).

Compiler warnings

I demand that your sources compile without warnings at the highest compiler warning level.

Note that there may be differences in the number of warnings issued between a non-optimizing compile (cc -O0 ...) and an optimizing compile (cc -O3 ...). Of course, both compiles should be warning-free.

Further, extended syntax-check programs such as lint should be used routinely.

The time saved by using the compiler's hints is typically more than the time spent to keep the sources warning-less.

Some third party libraries or even compilation systems(!) may include header files that produce annoying warnings when compiled with the highest warning level. I recommend encapsulating the offending interface declarations in a code layer that meets the warning-less requirement.

If the source code is to be distributed and will be used on a variety of systems, it should be revised for warning-less compile on several compilers and/or computer/operating systems. You may also want to do cross-compiles.

Compiler warnings may be a hint of incompatibilities. E.g. the warning "possible bad alignment" may not be an error on CISC CPU architecture systems, but will be one on RISC systems.

Compiling C code as C++ code may reveal C/C++ incompatibilities¹⁵, especially with assignments from and to void*¹⁶.

Style warnings

Some compiler warnings reveal bad coding style rather than errors or incompatibilities. These bad styles should be avoided.

The warning "assignment in conditional expression" can be avoided by using while((p = next()) != 0) instead of while(p = next()).

The warning "conditional expression is constant" can be avoided by using for(;;) instead of while(1).

The "unreferenced parameter" warning is a tough one in the C language¹⁷. Suggestions were: casting to void (does not suppress the warning on all compilers) and self assignment (looks weird and makes me think of a "statement has no effect" warning).
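The cast-to-void suggestion can be sketched as follows. The callback type visit_fn and the function print_name are hypothetical names chosen for illustration. As noted above, the cast is a convention that many, but not all, compilers honor:

```c
#include <stdio.h>

/* Hypothetical callback type: the signature is fixed by the caller,
   so an implementation cannot simply drop a parameter it does not need. */
typedef int (*visit_fn)(const char *name, void *context);

/* Prints one name; returns 0 on success, -1 on output error. */
static int print_name(const char *name, void *context)
{
    (void)context; /* intentionally unused in this implementation */
    return printf("%s\n", name) < 0 ? -1 : 0;
}
```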

Asserts (assert.h) may produce "statement has no effect" warnings in the release version. This one seems to be tough.

The "comparison of signed and unsigned values" warning appears if signed and unsigned values are mixed in a less/greater expression. One approach is not to use unsigned values at all. strlen() may unfortunately require an int cast.

Helper tools

You should employ helper tools (such as lint) as much as possible to enhance your code. A further (however not automated) tool can be seen in code reviews by humans.

Lint

lint is a traditional Unix development tool, originally designed for K&R1.

It allows

to catch some of K&R1 pitfalls (insufficient type checking, missing declarations)
to check inter-module issues (inconsistent interface declarations, e.g. char p[] vs. extern char* p)
to find globally unused functions and variables
to find dead (unreachable) code (which is usually an error)
to check for false or missing arguments to printf() or scanf()
to check static arrays' bounds
to catch inconsistent function use (e.g. ignoring return values sometimes)

Metrics

Metrics are a means to quantify some code characteristics. These quantifications can be used to achieve a desired level of commenting or modularization, etc.

Simple metrics count e.g. the

number of source code lines
number of lines of code (LOC, leaving out empty lines and comments)
number of comments¹⁸
number of statements
ratio of comments per statement

One can do statistics on the distribution of these values over modules or projects, or check if the values are in certain ranges.

Advanced metrics can count function points, express code complexity (e.g. statements per function, nesting, number of cross-references, etc.).

Even more advanced metrics may try to track changes in micro architecture¹⁹, changes in interfaces and the like, as the project development goes on. This may be accomplished together with configuration management tools.

Assertions

Assertions are an alternative to runtime checker applications. They allow checking preconditions and postconditions of functions (and other constructs) at runtime.

Assertions can be used to check the internal program logic and a correct program flow.

Drawbacks are:

They're expensive to code.
Their use might not be systematic (because they are created manually).
The code may get harder to read.

Release versions typically do not contain the assertions. Beta versions may or may not contain the assertions.
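A minimal sketch of precondition and postcondition checks with the standard assert() macro. The function max_of is a made-up example. Compiling with NDEBUG defined removes the checks, which is how release versions typically drop them:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest element of a[0..n-1]. */
static int max_of(const int *a, size_t n)
{
    size_t i;
    int m;

    assert(a != 0); /* precondition: valid array */
    assert(n > 0);  /* precondition: not empty */

    m = a[0];
    for (i = 1; i < n; i++) {
        if (a[i] > m) m = a[i];
    }

    assert(m >= a[0]); /* postcondition (partial check only) */
    return m;
}
```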

Runtime checkers

Applications are available that instrument code with runtime checks (for array boundaries, dynamic data boundaries, missing memory deallocation, function/system calls with bad arguments, etc.).

The use of such tools may be time consuming since the program may run several times slower. The tool may not be able to distinguish between e.g. memory leaks (bad) and reusable buffers that won't be freed (a valid design decision).

Examples are: purify, electric fence (dynamic memory checks only).

These applications may be used in an automated test suite.

Code complexity

One of the aims of coding guidelines may be to keep code complexity as low as possible (complex meaning: hard to read, error prone).

Nested expressions

C is an orthogonal language. It allows chaining and nesting expressions. This raises the following problems:

Code can become unreadable.
Complicated expressions may require an operator precedence lookup, based on the assumption that one does not have all of the precedences memorized²⁰.
Source code debugging doesn't show intermediate data.
Expressions need to be wrapped to a second line or the line will be extraordinarily long.

Most of these points make code harder to read and debug.

You should not avoid using temporary variables in order to enhance performance. Temporary variables can be optimized away to lead to the same code that was expected from the complicated expression (if the compiler is powerful enough).

One drawback of nested expressions, as already mentioned, is the warning "assignment in conditional expression" with if(p = malloc(size)). The warning can be avoided by using parentheses or by separating statements: p = malloc(size); if(p). This may additionally make debugging easier.

Redundancy

Code fragments should not be repeated.

Redundant code is harder to maintain and increases the probability of introducing defects. Code with many redundancies is harder to read.

You might consider implementing a more general function instead²¹.

If performance is the problem, use macros or an optimizing compiler that inlines the functions (preferred).

Determinism

Write deterministic code. Stack variables and buffers should be initialized.

Searching for bugs that show non-deterministic symptoms is an unpleasant task at best.

You may consider assigning 0 to a pointer after freeing it²².
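Both recommendations can be sketched together. The struct record and the functions record_new and record_delete are hypothetical names used only for illustration:

```c
#include <stdlib.h>
#include <string.h>

struct record {
    int id;
    char name[32];
};

/* Returns a fully initialized record, or 0 if out of memory. */
static struct record *record_new(int id)
{
    struct record *r = malloc(sizeof *r);
    if (r == 0) return 0;
    memset(r, 0, sizeof *r); /* leave no indeterminate bytes behind */
    r->id = id;
    return r;
}

/* Frees *pr and resets the caller's pointer to 0, so that a later
   misuse does not silently operate on freed memory. */
static void record_delete(struct record **pr)
{
    free(*pr);
    *pr = 0;
}
```

Passing the address of the pointer lets the delete function do the reset itself, so callers cannot forget it.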

Modularity

Modularity is an important key for code maintainability and handling complexity. If you introduce a module abstraction layer on top of the functions, you can decrease problem complexity tremendously²³.

Interfaces

Interfaces are the declarations used by implementation and referrers. They consist of type definitions, function declarations (prototypes) and maybe global data declarations²⁴ and are located in header files.

No extern declarations may appear in C source files. They belong in commonly included header files.

I tend to design one header file for each C source file. If several modules are linked together to a library, a new external header file may be created that declares all extern functions. Alternatively, all of the extern functionality can be encapsulated in one source file²⁵. Its corresponding header then becomes the external header.

There are very few reasons to include static definitions in header files.

There is in my opinion no reason to include static function declarations in header files. They belong in the source file that implements the functions.

Header files

Header files that are internal to a software component should be included with #include "header.h", header files that are external to the component should be included with #include <header.h>. I tend to prefer the external include if a header file can be seen to declare stuff that's independent of the component, despite the fact that usually -I. compiler options are necessary to indicate where to find the header files.

Header files must be included by both the implementing module and the calling module(s). Note that the inclusion in the implementing module is not enforced by the compiler and the linker will link independently of actual function parameter types²⁶.

As mentioned above, it's wrong to declare static functions in a header file or to not define functions as static that are only used in one module.

Header files should use include guards to enable²⁷ (intended or unintended) multiple inclusion²⁸:

#ifndef header_h

#define header_h

#endif

Header files must not be included by an absolute path (e.g. /usr/include/header.h); paths can be specified in compiler options (-I/usr/include). Using relative paths is correct (e.g. sys/header.h), however using .. seems confusing (e.g. ../header.h).

Never use the extern keyword in source files, use it only in header files.

Resources

Resources are data-only modules. Examples are X11 bitmap files.

Resource data should carry the const modifier, so it can possibly be put into a read-only data section²⁹. This allows instances to share the resource memory and the operating system does not need to allocate virtual memory paging space for it.

Code order

There are two traditional styles of code order (i.e. order of functions in a source file). Pascal style defines a function before it is referenced. C style defines a function after it is referenced.³⁰

There are some aspects to consider about the order of the code in a source file:

Using Pascal style makes it unnecessary to track function interface changes in the prototypes.
A compiler may choose to inline functions only on high optimization levels and only if they have been defined before.
C style sources seem to be easier to read since going from the start of the file to the end is much more like the actual program flow compared to Pascal style.

Error handling should typically be done without delay, e.g.

fp = fopen(file, "r");

if(fp == 0) return -1;

process(fp);

is in my opinion preferable to

fp = fopen(file, "r");

if(fp != 0) process(fp);

else return -1;

It is more readable since it doesn't add one layer of indentation per condition and the 'process' part above may be large and tear the blocks quite apart.

Conditional compiling

The opinions about conditional compiling differ. Programmers either

don't use conditional compiling at all
conditionally include only a few macros and include directives
conditionally include large parts of code

due to the fact that conditional compiling seems hard to read to some.

Nested conditional compiles seem to confuse further.

However, suitable editors can display uncompiled parts of the code in comment font type³¹ and even fold it, to make sources more readable. There are tools available to remove the unused conditional parts³².

Alternatives to conditional compiling are including one of different header files (with the -I compiler option) or linking one of different libraries (with the -L compiler option) or both. In development, you may choose to create a set of subprojects and include and link one of them.

Conditional use of parts of code, header files, libraries or subprojects is often used to encapsulate platform specifics.

Code nesting

Block constructs should not be nested too deeply. Nested loops tend to get hard to read.

You may want to design a separate local function that implements the inner part of a loop, to avoid deep nesting. This may make it easier for a compiler to optimize the code³³.

I would say that no more than two levels of nesting should be used.

Designing small functions and using return on error cases instead of adding a layer of if/else further reduces nesting and makes code more readable (in my opinion).

Scope

The scope of an identifier (function, type, etc.) is the part of the code in which the identifier can be referenced. The C programming language offers

application scope
file scope
function scope
block scope

(ordered from broad to narrow).

Choosing the scope of a function, variable or type is one of the most important micro architecture instruments.

Generally, scope should be chosen as narrow as possible.

Scope of functions

Functions have file scope (static) or application scope (not static). Limit a function to file scope if possible. This affects the modularization of a software component (i.e. the way functions are grouped together to source files).

A narrow function scope encapsulates code.³⁴

Scope of variables

Variables can have all four kinds of scope: application scope, file scope, function scope, block scope. The variable scope should be as small as possible.³⁵ The opinions about using variable declarations in block scope differ.

Typically, there is no overhead (stack pointer operations) involved with block scope variables; space for the deepest possible block allocation gets reserved at function entry.

Application scope leads to global variables, which generally should be avoided. Use functions to access the data (getX() and setX()) instead³⁶.

Data in application scope or file scope (as well as static data in functions) makes the functions that access the data usable by only a single thread at a time³⁷.

A reference to a calling function's buffer should be used instead of global or static data to avoid multithreading problems and buffer overwrite problems (e.g. as with localtime()).
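The difference can be sketched with two hypothetical functions, label_static and label_r, which are not part of any library. The first one shares the overwrite problem of localtime(), the second one lets the caller own the buffer:

```c
#include <stdio.h>
#include <string.h>

/* Not reentrant: every call overwrites the same static buffer. */
static const char *label_static(int id)
{
    static char buf[32];
    sprintf(buf, "item-%d", id);
    return buf;
}

/* Reentrant: the caller owns the buffer and states its size.
   Returns buf, or 0 if the buffer is too small. */
static char *label_r(int id, char *buf, size_t size)
{
    char tmp[32]; /* "item-" plus any int of up to 64 bits fits */
    size_t len = (size_t)sprintf(tmp, "item-%d", id);

    if (len + 1 > size) return 0;
    memcpy(buf, tmp, len + 1);
    return buf;
}
```

With label_static, a result obtained earlier changes as soon as the function is called again. With label_r, two results can coexist in two caller buffers.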

Scope of types

Types should be limited to file scope or application scope in my opinion, since it is a rare case that only one method (function) acts on a type (that is then in function scope).

Don't redefine types in file scope, use common header files instead.

Scope of macros

Macros have file scope by nature.

However, macro-encapsulating code fragments such as

#undef m

#define m

#undef m

have been seen. They allow hiding macros from the rest of the source file. However, the general problems with macro names remain.

Limiting macro scope is a bit obscure.

Error prone constructs

Unlike some other languages, C retains a number of error prone constructs.

As an example, Java doesn't pose problems with explicit casts, array sizes, buffer sizes, macros and poses fewer problems with error checking (through the use of exceptions). Lisp e.g. doesn't pose problems with number ranges.

Explicit casts

Compilers do not generate errors or warnings on semantically false explicit casts. The explicit casts are accepted as is³⁸.

Use explicit casts as rarely as possible³⁹. It's good to think about whether an explicit cast is necessary and what the compiler will do with it.

The C language allows implicit conversions from T* to void* and vice versa⁴⁰. There is no explicit cast needed in C to convert from void*. malloc() is a sample of an often used function that returns void*.
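A short sketch of an allocation without a cast. The function name int_array_new is made up for illustration:

```c
#include <stdlib.h>

/* Allocates room for n ints. The void* returned by malloc() converts
   implicitly to int*, so no cast is written.
   Returns 0 if out of memory. */
static int *int_array_new(size_t n)
{
    int *p = malloc(n * sizeof *p);
    return p;
}
```

Writing sizeof *p rather than sizeof(int) keeps the size expression correct if the element type is changed later.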

Type size

Know what your integer size is. Is it 16 bit, 32 bit, 64 bit, 128 bit?

int size limits the range of numbers you can use⁴¹. Check if e.g. 31 bit (signed int) is enough for your problem domain.

Array size

There are several methods to refer to an array size (sample array being int a[32];):

32
SIZE (a macro used for the array definition)
sizeof(a)/sizeof(*a)

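The first one is the worst: it becomes invalid if the array definition is changed without tracking the other occurrences of the size. The second one is better, but the third, using sizeof, is the preferred one. Note that the sizeof form only works where a is visible as an array, not where it has decayed to a pointer (e.g. as a function parameter).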
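The sizeof method is often wrapped in a small macro. COUNT_OF, table and table_fill below are illustrative names, not an established interface:

```c
#include <stddef.h>

/* Number of elements of a true array (not of a pointer parameter). */
#define COUNT_OF(a) (sizeof(a) / sizeof(*(a)))

static int table[32];

/* Sets every element of table to value; the loop bound follows
   the array definition automatically. */
static void table_fill(int value)
{
    size_t i;
    for (i = 0; i < COUNT_OF(table); i++) table[i] = value;
}
```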

Buffer sizes

It is strongly discouraged to implement or use functions that require a buffer as an argument, without also requiring the buffer size.

This rule should be strictly followed if the input size is an external (and hence uncontrolled) property (e.g. a line length with gets()).

Buffer overflows can

corrupt adjacent data
corrupt the stack frame (if on the stack)
corrupt malloc internal data (if on the heap)

The first one is hard to find because it can subtly change the program logic. The last one is also hard to find, since the program often crashes at some later point in a call to malloc() or free(). In that case often only a malloc debug package helps.

An example of a corrupted stack frame is a program that crashed (on a little endian system), leaving a core⁴². Because of the overwritten stack, the debugger that was used to examine the core (sdb) was unable to display a backtrace and just displayed the message "cannot get_text from 0x63697245", which was confusing at first glance, but was a good hint that upon returning from the corrupting function, the program tried to jump to nil.

One problem is to guess how much buffer space a sprintf() will require. However, sprintf() allows specifying maximal lengths of spliced fields to limit the output string size (e.g. sprintf(buf, "...%.*s...", (int)(sizeof(buf)-1-...), p)). Note that the precision argument must be of type int, hence the cast.
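A minimal sketch of this technique, under the assumption that the fixed part of the format is a known string literal (here the made-up PREFIX) and that the buffer size fits into an int. The function name format_user is hypothetical:

```c
#include <stdio.h>

#define PREFIX "user="

/* Writes PREFIX followed by as much of name as fits into buf.
   The precision of %.*s caps the number of characters copied. */
static void format_user(char *buf, size_t size, const char *name)
{
    /* sizeof(PREFIX) already counts the terminating NUL */
    int room = (int)size - (int)sizeof(PREFIX);

    if (room < 0) {
        if (size > 0) buf[0] = '\0'; /* not even the prefix fits */
        return;
    }
    sprintf(buf, PREFIX "%.*s", room, name);
}
```

The output is silently truncated in this sketch. Whether truncation or an error return is appropriate depends on the application.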

Note that buffer overflows are security problems. Overwriting stack based buffers (while knowing the affected program and the system it runs on very well) can be used to insert manipulated function caller addresses and hence execute malicious code⁴³.

Macro parameters

Macro parameters must be protected to ensure operator precedence.

#define sqr(x) x*x

sqr(a+b)

expands to a+b*a+b, which is not the intended result, whereas

#define sqr(x) ((x)*(x))

sqr(a+b)

expands to ((a+b)*(a+b)) as intended. Note the protection of the parameters and the result.
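The two expansions can be compared directly. The macro and function names below are chosen for this sketch only:

```c
#define SQR_BAD(x) x*x         /* unprotected */
#define SQR(x)     ((x)*(x))   /* parameters and result protected */

/* With a = 2 and b = 3:
   SQR_BAD(a+b) expands to a+b*a+b       = 2 + 6 + 3 = 11
   SQR(a+b)     expands to ((a+b)*(a+b)) = 25 */
static int sqr_bad_sum(int a, int b) { return SQR_BAD(a+b); }
static int sqr_sum(int a, int b)     { return SQR(a+b); }
```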

Macro side effects

Repeated side effects are inevitable if a macro parameter appears more than once in the expansion, e.g.

#define max(a,b) ((a)>(b)?(a):(b))

k = max(i++,j++);

The task of a macro can be implemented in a function, if not in a performance critical part of the code. Looking at the compiler output will show whether function inlining (as compiler optimization step) produces the same result that would be expected from macro expansion.
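A sketch that contrasts the macro with a function. The names max_int, demo_macro and demo_function are illustrative:

```c
#define MAX(a,b) ((a)>(b)?(a):(b))

/* Function alternative: each argument is evaluated exactly once. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* With *i == 1 and *j == 5 the macro increments *j twice and
   returns 6; the function increments each once and returns 5. */
static int demo_macro(int *i, int *j)    { return MAX((*i)++, (*j)++); }
static int demo_function(int *i, int *j) { return max_int((*i)++, (*j)++); }
```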

Sign extension

Sign extension must be considered if a signed char is converted to a short or long or a signed short is converted to a long.

A typical sample is the sign extension from character to integer:

char* p;

printf("0x%02x", *p);

may print 0xffffffe4 rather than the intended 0xe4. The character needs a cast to unsigned char before the conversion:

printf("0x%02x", (unsigned char)*p);
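The same effect can be shown without printf(). The two helper functions are hypothetical. The value in the first comment assumes that plain char is signed and int has 32 bits, both of which are implementation-defined:

```c
/* Widens via the (possibly signed) char value: a byte such as 0xe4
   becomes 0xffffffe4 where char is signed and int has 32 bits. */
static unsigned int widen_naive(char c)
{
    return (unsigned int)c;
}

/* Widens via unsigned char: the result is always in 0..UCHAR_MAX. */
static unsigned int widen_safe(char c)
{
    return (unsigned char)c;
}
```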

Error checking

Missing error checks may lead to bugs.

Lint's warning "return value sometimes ignored" may help to identify offending code locations.

A classical C language programming error is not to check malloc() for a null pointer return⁴⁴.

Sequence points

A statement such as

*p++ = *p++ = 0;

invokes undefined behavior, i.e. is either

*p = 0;

p++;

*p = 0;

p++;

(intended) or

*p = 0;

p++;

(not intended) or something worse (even less intended), because C does not define a sequence point between the assignments. Note that the first step will (probably) be an assignment and the last an increment, but the order in between is not determined.

It would be nice if undefined behavior through missing sequence point definition were generally diagnosed by compilers⁴⁵.

Optimizer errors

Sometimes hard-to-track errors originate from errors in the compiler optimizer step. The optimizer may e.g. look at a variable as invariant and produce erroneous code.

There are two considerations:

Do most of the development without optimization.
Look at the assembly output if you suspect errors.

When (and if) switching to optimized release code, test cases must be run to check integrity.

Style

Numbers

Numbers (numeric constants) should not appear in the code. Explicitly used numbers should be limited to 0, 1 and -1.

Define the numbers as constants, macros⁴⁶ or enums outside the functions.

They should especially not appear in the code if they're meaningful for limitations or performance of an algorithm (e.g. if they limit some input size).

Exceptions are

buffers that start at some size and increase if needed
values that are encapsulated deep in some implementation of an algorithm

Use as few hardcoded values as possible. Don't use static sized tables of data, since they are almost never appropriate.

Don't generate any hidden dependencies among constants. Define constants by means of the constants they derive from.

Numeric constants are hard to understand if they're at the same time not commented and not composed of other named constants. Compilers are quite able to do arithmetic at compile time; use them.

Example: if you need a buffer to hold a string representation of an integer, define its size in terms of INT_MAX or sizeof(int). E.g.: sizeof(int)*5/2+3⁴⁷ (assuming 8 bits per byte).
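The size expression above can be given a name and used as follows. INT_TEXT_SIZE and int_text_length are made-up names. The sketch keeps the assumption of 8 bits per byte:

```c
#include <stdio.h>

/* Room for the decimal text of any int: at most 5/2 digits per byte
   (assuming 8 bits per byte), plus sign, rounding slack and the NUL. */
#define INT_TEXT_SIZE (sizeof(int)*5/2+3)

/* Returns the number of characters needed to print value. */
static size_t int_text_length(int value)
{
    char buf[INT_TEXT_SIZE];
    return (size_t)sprintf(buf, "%d", value);
}
```

If int grows from 32 to 64 bits, the buffer grows with it and no constant has to be tracked down by hand.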

Unsigned numbers

You may consider not using unsigned values at all in application programming⁴⁸.

Typically you will only gain one of 32 or 64 bits, which can often be neglected. Again: know your problem domain. If you need more than 31 bits in an application, you may want to switch to 63 bits or bignums.

The C language will also not indicate an exception if you subtract a larger unsigned number from a smaller unsigned number, so you can't make your programs more robust by means of using unsigned values.
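A small sketch of this behavior. The functions items_left and items_left_signed are hypothetical:

```c
#include <limits.h>

/* Unsigned subtraction wraps modulo UINT_MAX+1 instead of failing:
   used > capacity silently yields a huge positive value. */
static unsigned int items_left(unsigned int capacity, unsigned int used)
{
    return capacity - used;
}

/* Signed variant: an impossible state shows up as a negative number,
   which is easy to test for. */
static int items_left_signed(int capacity, int used)
{
    return capacity - used;
}
```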

Using signed and unsigned values leads to ambiguities when comparing or adding them.

Longs, shorts

Using longs is an issue on 16 bit systems (either if you develop for 16 bit systems directly or plan to port your products to them at some point)⁴⁹.

Traditionally, long and short (or unsigned long, unsigned short) were used (together with htonl(), ntohl(), htons(), ntohs()) in implementing low-level network protocols, such as UDP-based application protocols⁵⁰. The assumption was that C implementations define a long to be exactly 32 bits, which is however not defined by the C language standard.

Use ASCII representations of numbers, when you write them to file or network, in order to be system architecture independent (size, byteorder, padding)⁵¹.

Besides using htonl() and ASCII, there exist some architecture independent data representation libraries like XDR⁵². However, ASCII representations seem easier to debug, because they are human readable.

Using shorts may save significant space in large arrays. However, if the problem domain changes, shorts may become too small. Conversions from shorts to ints and back may also bring some computational overhead.

Know also that unexpected alignments may occur if you mix shorts and longs. Example: struct {short a; long b;}; will most probably be eight bytes in size, not six⁵³ (with a two byte short and a four byte long).
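The padding can be made visible with sizeof and offsetof. The struct tag mixed and the macro MIXED_PADDING are names invented for this sketch. The actual numbers are implementation-defined:

```c
#include <stddef.h>

struct mixed {
    short a;
    long b;
};

/* Padding inserted between a and b so that b is suitably aligned;
   the value depends on the implementation. */
#define MIXED_PADDING (offsetof(struct mixed, b) - sizeof(short))
```

Printing sizeof(struct mixed) and MIXED_PADDING on the target system shows what the compiler actually does.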

Floats

Avoid floating point numbers (double, float) if possible.

Reasons being

Integers may be better suited to the discrete nature of a problem.
Integer arithmetic is usually faster than floating point arithmetic (if that matters).
Not using floating point numbers results in smaller executables on systems that require floating point handling routines⁵⁴ and link them statically.

Many problems are solvable without using floats. E.g. a typical hashtable high-water-mark of 0.75 may be expressed by a ratio and handled by integer arithmetic: if(4*items > 3*size) ...
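The high-water-mark test can be wrapped into a small function. The name needs_resize() is hypothetical; the sketch assumes that items and size are small enough for 4*items and 3*size not to overflow.

```c
#include <assert.h>

/* Returns nonzero if more than 3/4 of the table is in use.
   Integer equivalent of items / size > 0.75. */
int needs_resize(long items, long size)
{
    assert(items >= 0 && size > 0);
    return 4 * items > 3 * size;
}
```

Unlike the floating point comparison, the integer form is exact: a table with 100 slots triggers at 76 items, not at 75.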

Avoid single precision float. Use double.

If you have to deal with single precision floats on file, then encapsulate the code that deserializes (reads them back).

If space counts, you may consider using normalized numbers that are adapted to the problem domain (e.g. shorts signifying thousandths).

Parameter types

Express arrays as pointers in the function parameters.

Use int main(int argc, char** argv) instead of int main(int argc, char* argv[]). The internal semantics of a parameter are that of a variable declared as char** argv, not char* argv[].
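The same reasoning applies to any array parameter. In the following sketch (the function name sum() is hypothetical) the parameter is written as a pointer, which makes it obvious that the element count has to be passed separately and that sizeof inside the function would not yield the array size.

```c
#include <assert.h>
#include <stddef.h>

/* values is a pointer, whatever the caller's array looks like;
   the number of elements must therefore be supplied as well. */
long sum(const int* values, size_t count)
{
    long total = 0;
    size_t i;

    assert(values != NULL || count == 0);
    for(i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller that owns the array can still compute the count with sizeof(data)/sizeof(data[0]).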

Variable arguments

Variable argument functions⁵⁵ don't let the compiler check number and type of the arguments. For this reason you may choose to use them rarely.

Some compilers (and e.g. lint) warn of wrong arguments supplied to the variable argument function families printf() and scanf(), which are part of the standard C library.
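A minimal sketch of a variable argument function shows where the checking is lost. The function sum_ints() is hypothetical; it relies entirely on the caller to pass exactly count arguments of type int.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums count arguments of type int. The compiler can verify neither
   the number nor the type of the arguments after count. */
long sum_ints(int count, ...)
{
    va_list args;
    long total = 0;
    int i;

    assert(count >= 0);
    va_start(args, count);
    for(i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

A call like sum_ints(3, 1, 2) or sum_ints(2, 1.5, 2) compiles without complaint but has undefined behavior.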

Portability types

If a simple type (e.g. some kind of identification number) is likely to change at some point (e.g. from short to long), then introduce a type synonym for this type using typedef.
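A small sketch, with the hypothetical names record_id and find_record():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identification number type. If short becomes too
   small, only this line needs to change (e.g. to long). */
typedef short record_id;

/* Returns the index of id in ids, or -1 if it is not present. */
long find_record(const record_id* ids, size_t count, record_id id)
{
    size_t i;

    assert(ids != NULL || count == 0);
    for(i = 0; i < count; i++)
        if(ids[i] == id)
            return (long)i;
    return -1;
}
```

Note that printf() and scanf() format specifiers are not covered by the typedef and still have to be adapted by hand when the underlying type changes.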

Standard library

Use the standard library functions where possible. They are portable and usually optimized.

Some standard library functions might even get inline expanded (memcpy()), so there's probably no performance problem.

You should use stuff that's offered. E.g. strerror() will tell a lot about the origin of an error reported by the operating system. Not using it will leave the user and support group clueless.
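A sketch of such an error report follows. The function open_for_reading() is hypothetical. The sketch assumes that fopen() sets errno on failure, which is the case on Unix-like systems but is not required by the C standard itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens a file for reading. On failure, reports the reason given by
   the operating system on stderr and returns NULL. */
FILE* open_for_reading(const char* name)
{
    FILE* stream;

    assert(name != NULL);
    stream = fopen(name, "r");
    if(stream == NULL)
        fprintf(stderr, "can't open %s: %s\n", name, strerror(errno));
    return stream;
}
```

The message then names the actual cause (e.g. a missing file or missing permissions) instead of a bare "can't open".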

Don't use gets() and the scanf() family, for safety (buffer overflow crashes or program corruption) and security reasons (buffer overflow exploits). Use fgets() instead of gets(), and fgets() followed by strtok(), atoi(), etc. instead of scanf().
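The fgets(), strtok(), atoi() combination could look like the following sketch. The input format (a name followed by a number) and the names parse_entry() and read_entry() are assumptions made for the illustration. Lines longer than the local buffer are not treated specially here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Splits a line of the form "<name> <number>" (modifies line).
   Returns 0 on success, -1 if a field is missing or the name
   does not fit into name_size bytes. */
int parse_entry(char* line, char* name, size_t name_size, int* number)
{
    char* token;

    assert(line != NULL && name != NULL && number != NULL);
    token = strtok(line, " \t\n");
    if(token == NULL || strlen(token) >= name_size)
        return -1;
    strcpy(name, token);
    token = strtok(NULL, " \t\n");
    if(token == NULL)
        return -1;
    *number = atoi(token);
    return 0;
}

/* Reads one line with fgets(), which never writes past the buffer,
   and parses it. Returns -1 on end of file or malformed input. */
int read_entry(FILE* stream, char* name, size_t name_size, int* number)
{
    char line[256];

    assert(stream != NULL);
    if(fgets(line, sizeof(line), stream) == NULL)
        return -1;
    return parse_entry(line, name, name_size, number);
}
```

Every copy into a caller-supplied buffer is preceded by a length check, which is what gets() and an unbounded scanf("%s") cannot offer.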

NULL macro

Nullpointer comparisons can be expressed by

if(!p) ...
if(p == 0) ...
if(p == NULL) ...

all three being perfectly valid in C⁵⁶.

Register

Don't use register.

One could assume that compilers know the CPU registers better than the C programmer does, since they are the interfaces to the register-using assembly languages.

Also, compilers are free to ignore the register keyword (and often will⁵⁷).

Auto

Don't use auto.

Rarely, auto was used to emphasize that a function variable explicitly needs to be automatic, e.g. in a recursive function in which some variables may be changed to static (to save stack space). However, the latter is bad practice since it is not multithreading safe⁵⁸.

Goto

Don't use goto. Gotos lead to a confusing program flow.

Most of the control flow problems can be solved by using additional layers of local functions (that need not imply overhead). Use return to jump out of them. Introducing function layers may enhance modularity and code encapsulation.

Also consider break and continue instead of goto.
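Leaving nested loops is a typical case where a goto is tempting. In the following sketch the loops are put into a function of their own (the hypothetical find_in_matrix()), and return leaves both loops at once.

```c
#include <assert.h>
#include <stddef.h>

/* Searches value in a matrix of rows x columns ints stored row by row.
   Returns 1 and stores the position if found, 0 otherwise. */
int find_in_matrix(const int* matrix, size_t rows, size_t columns,
    int value, size_t* row, size_t* column)
{
    size_t r;
    size_t c;

    assert(matrix != NULL && row != NULL && column != NULL);
    for(r = 0; r < rows; r++) {
        for(c = 0; c < columns; c++) {
            if(matrix[r * columns + c] == value) {
                *row = r;
                *column = c;
                return 1;
            }
        }
    }
    return 0;
}
```

Neither a goto nor an additional flag variable is needed, and the search has become a reusable unit.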

Multiple returns

Opinions differ about using multiple return statements in a function.

I see multiple returns as a good micro design construct. They allow function code to be less deeply nested.
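A sketch of the flat style that multiple returns allow (the function days_in_month() is a hypothetical example):

```c
#include <assert.h>

/* Returns the number of days of the given month (1 to 12),
   or 0 if month is out of range. leap_year must be 0 or 1. */
int days_in_month(int month, int leap_year)
{
    assert(leap_year == 0 || leap_year == 1);
    if(month < 1 || month > 12)
        return 0;
    if(month == 2)
        return 28 + leap_year;
    if(month == 4 || month == 6 || month == 9 || month == 11)
        return 30;
    return 31;
}
```

A single-return version would need either a result variable threaded through nested if/else branches or a lookup table.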

Obscurities

Avoid the use of the logical operators && and || as standalone statements.

Don't use

f() && g();

instead of

if(f()) g();

Don't overuse the comma operator.

Topics left out

The style topics intentionally left out in this paper are

the pros and cons of the ternary operator ?:⁵⁹
the pros and cons of polymorphic function arguments (e.g. declaring a parameter as a long and casting all possible arguments to it)
the use of a function parameter to hold separate information (e.g. putting two shorts into a long parameter)
the pros and cons of code optimization and when and where to deploy such⁶⁰
recommendations on how many lines of code per function and module⁶¹

Indentation

Personally I consider indentation style not so important, since

filters can be used to adjust source code⁶² (e.g. indent)
syntax coloring editors can be used to adjust source code (e.g. xemacs)
a programmer can often easily adapt to a style

However, independently of the indent style you use,

be consistent
try not to make assumptions about other editors' tabulator settings

Tabulators

Making no assumptions about tabulator settings restricts you to indenting either with tabulators only or with blanks only, but not both (since the results are displayed differently⁶³). Tabulator setting independence also forbids the use of tabulators anywhere other than at the left margin.

Not all editors can preserve tabs or blanks⁶⁴. Worse, sometimes only the indentation of changed lines in a source file is converted.

Tabulators can also become victims of branch merge tools (which are part of revision control software).

Braces

Opening and closing braces ({}) can appear either on a line of their own or on the preceding line. Closing braces placed right after the last statement (Lisp style) are rarely seen.

If the opening and/or closing braces are on a line of their own, they can be adjusted to the indent level of the outer block or to that of the inner block or (halfway) in between.

The use of the above styles can differ between code (functions) and data (structs, unions, array initializations) and can differ between top level code braces (functions) and function level code braces (do, else, for, if, switch, while).

Braces may or may not be omitted in control blocks if the block covers one or zero statements⁶⁵.

Labels

Switch labels (case, default) can either appear adjusted to the outer block indent level or to the inner (with or without adding one more indent level for the code in the switch statement⁶⁶) or in between.

Goto labels can be adjusted to the left margin (i.e. top level block), one indent level less than the next statement, or on the same level as the next statement. The first two styles are more readable.

Be consistent about placing the labels.

Blanks

Only a few C tokens (identifiers, operators, etc.) require a blank as delimiter between them (e.g. 'else if' and 'int i'). However, lots of blanks are typically used to make source code more readable.

Be consistent about using blanks between tokens.

Trailing blanks and tabulators only change the meaning of trailing backslashes (e.g. in C++ style singleline comments⁶⁷, where they are legal and carry meaning). Using trailing blanks and tabulators in this context is dangerous, since editors may be inaccurate in preserving them.

Comments

A programmer may decide to create only comments on lines of their own⁶⁸.

You might include some evident information in every C source file:

a short description of functionality
a revision date (maybe generated by version control)
an author
maybe a copyright
maybe the revision history (unless maybe managed by version control)

Don't comment the obvious.

Don't use comments as substitutes for speaking identifier names.

Comments can have the semantics of a tool directive (e.g. /*LINTED*/, /*ARGSUSED*/, /*NOTREACHED*/ or /*EMPTY*/ for Lint). They might or might not be used.

Block comments

As blocks represent a functional entity, they are candidates for comments. As constructs that span multiple lines, they should preferably be commented on comment-only lines.

You may consider putting an empty line between blocks to make them a bit more readable. You may generally consider consistent spacing.

In the C language (unlike the C++ language⁶⁹), a block has two parts: a variable definition part and a code part. These two parts should be separated by a blank line to make them more readable.

I would not recommend commenting the closing brace of a block. A suitable editor typically lets the user jump to the corresponding opening brace. The same applies to #endif.

Conclusions

C is a programming language that enforces very few restrictions, making the need for guidelines more evident than in other programming languages.

I hope to have given a thorough insight into most aspects⁷⁰ that are essential in establishing C programming language guidelines.

Footnotes

¹ I know, there are no bug-free programs; at least if they are large enough (like sendmail), they can't be bug-free.

² sendmail again. The security problem won't be treated any more in this paper.

³ A software engineering biased quote: If it doesn't work, it can be fixed. If it can't be maintained, it's scrap. (Well, it may not work and can't be maintained either...)

⁴ Readable for a human, not for the compiler.

⁵ However, I put no recommendations where the choice is totally yours, e.g. choosing one of the much discussed indent styles.

⁶ Samples of good efficiency of other programming languages are the ability of Lisp environments to compile their functions and the possibility to post-compile Java virtual machine bytecode into native binary code.

⁷ K&R2 [BK/DR,1988] is based on the ANSI C draft X3J11/88-001 and was published before the ANSI standard ANS X3.159-1989. This standard was adopted as ISO/IEC 9899:1990 and is also known as ANSI/ISO 9899-1990. Note that ANSI and ISO require periodic reviews of their standards.

⁸ However, I find names as n or i quite speaking for local index variables. At least if the loops are kept small, as they should (for good modularity).

⁹ Underscores may not appear as such in a view that underlines identifier names (e.g. identifiers displayed as HTML hyperlinks). They appear to be blanks.

¹⁰ Considering the (historic) limitation of external identifier names to only six characters seems not feasible today. Not many good identifier names would result if you were to use only six characters or had to avoid the reuse of the same six starting characters.

¹¹ The Java namespace model is based on (possibly large) hierarchical names, proposing internet domains as a base. Packages are named like com.sun.applet. This is a thought-out approach to get unique names.

¹² One may ask if it was usual to provide only lean interfaces (only few functions) in C libraries partly because of the missing package support, and hence the possible name clashes with many names.

¹³ One of the reasons C is not especially suited for large software bases.

¹⁴ Note that I make a distinction between case-sensitive and case-preserving. The latter doesn't allow you to save file names that only differ by case, although the case will be preserved.

¹⁵ If you intend to develop hybrid C/C++ code (not recommended).

¹⁶ Which will produce errors under C++. That's why casts are still seen with malloc() calls (in C/C++ hybrid code).

¹⁷ In C++, the name (but not the type) of the unused parameter can be left out. Nice solution.

¹⁸ Unlike the quantity, the quality of the comments can't easily be measured with metrics. However, it will be revealed by (human) code reviews.

¹⁹ Of course such things are more important if the software project is a large one.

²⁰ Be sure however to have an operator precedence table at hand all the time.

²¹ Although abstract and generic programming is not so much supported as it is in C++.

²² Functions may check pointers for 0 and issue understandable diagnostics (instead of crash, e.g. printf() writing <null>).

²³ The (object oriented) C++ class concept and inheritance decrease complexity even further.

²⁴ To be avoided.

²⁵ Note that if you want to link only the functions of your interface that are really used, and your linker can't throw unused functions out of an object file, you have to put each independent function in its own module and use an external header file.

²⁶ Not so in C++. What was designed to allow function overloading will also do type checks in the link phase.

²⁷ C does not allow type and macro redefinitions, even with the same definitions. Also, the include guards make it easier to allow circular header references (though not recommended).

²⁸ To provide an interface that is open to C and C++, you may also want to declare C linkage in a similar way:

#ifdef __cplusplus
extern "C" {
#endif
...
#ifdef __cplusplus
}
#endif

²⁹ By the linker respectively the loader.

³⁰ These styles obviously must be mixed when using indirect recursive functions that are implemented in the same module.

³¹ The editor needs to know about the defined macros then, obviously.

³² The C preprocessor usually can't do this, since it will also expand included header files and macros.

³³ Compilers/optimizers tend to get confused if too many variables are present and give up optimization.

³⁴ The C++ language offers additional encapsulation mechanisms with its class concept. By specifying members (data or functions) to be private (or protected), C++ can limit scope to certain (class) functions.

³⁵ One may argue whether to insert generic blocks (not part of a function, loop or similar) to do additional variable scope narrowing. It can encapsulate variables in large flat legacy code, but it does not seem to be a good design decision.

³⁶ However, using explicit access functions can be seen as a design flaw because modules should provide functionality, not plain data access.

³⁷ If a monitor is used, the static data will be no problem. However, still no reference on the static data can be returned to multiple threads.

³⁸ C++ introduced two less powerful casts to defuse the situation. (One to change a const specifier only and one to cast from one pointer type to another only.)

³⁹ [BS,1997] sees casts (though in C++) in most cases as an indication of a design error.

⁴⁰ C++ does not. This is one of the more obvious incompatibilities between C++ and C.

⁴¹ And C, unlike Lisp, does not come with a standard bignum (only resource-limited size) numeric type.

⁴² A process image (data and stack memory) that can be used for post mortem analysis of a program crash (on Unix).

⁴³ Especially dangerous with (network) services and even more if they run with high operating system privileges.

⁴⁴ C++ reports a failed memory allocation (operator new) with a bad_alloc exception, so the returned pointer need not be checked.

⁴⁵ Lint may give you the warning "evaluation order undefined: p".

⁴⁶ Unlike C++, C does not allow const int values to be used as array sizes. A macro definition must be used instead.

⁴⁷ 5/2 as upper approximation of 8*log10(2), with additional bytes for integer division truncation, sign and string termination. Don't forget to give this constant a meaningful name.

⁴⁸ I'm not talking of implementing something like virtual memory support as part of operating system code here.

⁴⁹ However, since the memory layout is probably different on a 16 bit system (can't allocate large data chunks, must partition them), int/long portability will not be the only problem.

⁵⁰ Of course also in operating system code; but in that context it is harder to demand that the code make no assumptions about the size of an int.

⁵¹ This applies also to floating point numbers.

⁵² XDR (external data representation) e.g. used by RPC (remote procedure call) e.g. used by NFS (network file system).

⁵³ The same happens with struct {long b; short a;}, to get the longs aligned in an array of such structs. The padding will in this case be at the end of the struct and not create a gap between the struct members.

⁵⁴ These floating point handling routines may be quite large on operating systems that support processors that lack a floating point unit. Floating point logic may not be implemented on the operating system level on such systems (small embedded systems kernels being a sample).

⁵⁵ Variable arguments are discouraged by C++. The C++ alternative is to provide different, overloaded functions. Calls to printf() are replaced by calls to ostream::operator<<().

⁵⁶ C++ tends to absolutely not use the NULL macro.

⁵⁷ Compilers must ignore at least some register keywords anyway if the programmer specifies too many.

⁵⁸ Also, truly recursive functions (e.g. qsort) can be designed to descend the smaller branch(es) of the recursion tree first and take the last branch iteratively. This will lead to only small stack space requirements.

⁵⁹ Users of functional programming languages may be pro-biased on this one.

⁶⁰ However, two quotes (couldn't resist): Don't optimize early. When speed and space are an issue, style guides are thrown out the window (however, I do not totally agree with the last one).

⁶¹ I think modules should rather represent logical units than be limited in any physical form (e.g. in the number of lines of code).

⁶² These indent-style filters could even be triggered by revision control software upon check-out and check-in to let programmers have their own style. However, this requires very consistent use of revision control tools, which may impose some unwanted overhead.

⁶³ An especially annoying style is an indentation with four blanks where every run of eight blanks (two indent levels) is converted to one tab. Viewed with a tabstop setting of four (instead of eight), it looks as if half of the indent levels (2, 4, ...) were left out.

⁶⁴ You may question the quality of such editors. They shouldn't be used for coding.

⁶⁵ Note that a closing brace may serve as an anchor for debuggers. You may not be able to jump out of an unbraced loop in source debugging mode.

⁶⁶ I do not recommend adding additional indent levels.

⁶⁷ Not part of the C language, but tolerated by many C compilers.

⁶⁸ C++ much more encourages mixed lines with code and comments because of its singleline comments construct.

⁶⁹ In C++ a variable definition can appear anywhere a code statement can. This allows always immediate initialization.

⁷⁰ Of course, there's always more.

References

[BK/DR,1988] The C Programming Language, B. W. Kernighan, D. M. Ritchie, Prentice Hall, 2nd Ed. 1988

[BS,1997] The C++ Programming Language, B. Stroustrup, Addison-Wesley, 3rd Ed. 1997

Index

aim - algorithm - alignment - ansi - application scope - architecture - array bound - array size - ascii - assert - assertion - author - auto - backslash - beta version - bignum - blank - block - block scope - brace - buffer - buffer size - buffer overflow - bug - case - cast - change - character set - code complexity - coding - coding style - comment - complexity - component - conditional compiling - const - constant - copyright - cross compiler - cross reference - debugging - declaration - defect - definition - dependency - design - determinism - directory - double - economy - editor - efficiency - encapsulation - enum - error - error checking - error handling - exception - explicit cast - expression - external header - external identifier - file - file name - file scope - filesystem - float - function call - function parameter - function point - function scope - functional entity - functionality - global variable - goto - header file - hidden dependency - identifier - identifier name - iec - ignore - implementation - implicit cast - include guard - incompatibility - inconsistency - indent - indent level - indentation - integer size - interface - internationalization - iso - iso-latin - label - language code - language definition - leak - library - lines of code - lint - locale - long - macro - main - maintainability - memory leak - metric - micro architecture - modularity - module - name - name collision - namespace - native language - nesting - network - normalized number - null - nullpointer - number - operator precedence - optimization - optimizer error - order - orthogonality - overflow - package - package prefix - performance - platform - portability - portability type - postcondition - precedence - precondition - prefix - preserving blanks - problem domain - program flow - project - project size - prototype - ratio - readability - recursion - redefinition - redundancy - reference - register - resource - return - return value - reusability - review - revision - revision date - revision history - robustness - runtime check - scope - security - sequence point - short - side effect - sign extension - singleline comment - sizeof - software engineering - spacing - speaking - stack frame - standard - statement - statistic - standard library - struct - style - syntax coloring - system call - tabulator - tabulator setting - testing suite - thread - token - type prefix - type size - undefined behavior - underscore - unreachable - unreferenced - unsigned - unused - user interface - variable argument - warning - warning level

Keywords: C programming language coding guidelines, C coding guidelines, C coding standards, C rules, C recommendations, coding in C

Eric Laroche, laroche@lrdev.com, Wed Jun 3 1998
URL: <URL:http://www.lrdev.com/lr/c/ccgl.html>
Original URL: <URL: http://www.access.ch/lr/c/ccgl.html>