CHAPTER 11:

PREPROCESSOR DIRECTIVES

C compilers incorporate a preprocessing phase that alters the source code in various ways before passing it on for compilation. In Small C, this facility provides four capabilities:

  1. macro processing
  2. inclusion of text from other files
  3. conditional compiling
  4. in-line assembly language

The preprocessor is controlled by directives which are not part of the C language proper. Each directive begins with a # character and is written on a line by itself. Only the preprocessor sees these directive lines since it deletes them from the code stream after processing them.

Depending on the compiler, the preprocessor may be a separate program or it may be integrated into the compiler itself. Small C has an integrated preprocessor that operates at the front end of its single pass algorithm.

Macro Processing

Directives of the form

		#define Name CharacterString...

define names which stand for arbitrary strings of text. After such a definition, the preprocessor replaces each occurrence of Name in the source text with CharacterString..., except within string constants and character constants. As Small C implements this facility, the term macro is misleading, since parameterized substitutions are not supported. That is, CharacterString... does not change from one substitution to another according to parameters provided with Name in the source text.

Small C accepts macro definitions only at the global level. Note that the term definition, as it relates to macros, does not carry the special meaning it has with declarations (Chapters 4-6).

The Name part of a macro definition must conform to the standard C naming conventions as described in Chapter 2. CharacterString... begins with the first printable character following Name and continues through the last printable character of the line or until a comment is reached.

If CharacterString... is missing, occurrences of Name are simply squeezed out of the text. Name matching is based on the whole name (up to 8 characters); part of a name will not match. Thus the directive

		#define ABC 10

will change

		i = ABC;

into

		i = 10;

but it will have no effect on

		i = ABCD;

It is customary to use uppercase letters for macro names to distinguish them from variable names.
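As a small illustration of whole-name matching and of the exemption for string constants, consider the following sketch. It is written as ordinary C so that it can be tried with any C compiler; the names sum, ABCD, and text are hypothetical, chosen only for this example.

```c
#define ABC 10

/* Inside a string constant, ABC is not replaced. */
char text[] = "ABC";

/* ABC is replaced by 10, but the longer name ABCD is left alone. */
int sum() {
    int ABCD;
    ABCD = 7;
    return ABC + ABCD;      /* compiled as: return 10 + ABCD; */
}
```

Here sum() yields 17, and text still holds the three letters A, B, and C.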

Replacement is also performed on subsequent #define directives, so that new symbols may be defined in terms of preceding ones.
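For instance, the size of a record might be built from two earlier definitions, as in this sketch. The names HEADER, BODY, RECORD, and the two functions are hypothetical. The parentheses around the replacement text are not required by the preprocessor; they simply keep the substituted expression intact when RECORD appears inside a larger expression.

```c
#define HEADER 2
#define BODY   6
#define RECORD (HEADER + BODY)   /* built from the two preceding names */

int record_size() {
    return RECORD;               /* the compiler sees (2 + 6) */
}

int two_records() {
    return 2 * RECORD;           /* 2 * (2 + 6); without the parentheses
                                    this would read 2 * 2 + 6 */
}
```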

The most common use of #define directives is to give meaningful names to constants; i.e., to define so-called manifest constants. However, we may replace a name with anything at all: a commonly occurring expression or sequence of statements, for instance. Some people are fond of writing

		#define FOREVER while(1) 

and then writing their infinite loops as

		FOREVER {...}  
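A loop written this way still needs some means of exit, such as a break or return statement. The following sketch shows the idea; LIMIT and count_up are hypothetical names used only for illustration.

```c
#define FOREVER while(1)
#define LIMIT   5

/* Counts upward until LIMIT is reached, then leaves the loop. */
int count_up() {
    int n;
    n = 0;
    FOREVER {
        if (n >= LIMIT) break;   /* the only way out of the loop */
        n++;
    }
    return n;
}
```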

Conditional Compiling

This preprocessing feature lets us designate parts of a program which may or may not be compiled depending on whether or not certain symbols have been defined. In this way it is possible to write into a program optional features which are chosen for inclusion or exclusion by simply adding or removing #define directives at the beginning of the program.

When the preprocessor encounters

		#ifdef Name 

it looks to see if the designated name has been defined. If not, it throws away the following source lines until it finds a matching

		#else 

or

		#endif 

directive. The #endif directive delimits the section of text controlled by #ifdef, and the #else directive permits us to split conditional text into true and false parts. The first part (#ifdef...#else) is compiled only if the designated name is defined, and the second (#else...#endif) only if it is not defined.

The converse of #ifdef is the

		#ifndef Name 

directive. This directive also takes matching #else and #endif directives. In this case, however, if the designated name is not defined, then the first (#ifndef...#else) or only (#ifndef...#endif) section of text is compiled; otherwise, the second (#else...#endif), if present, is compiled.
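To see both directives at work, consider this sketch, in which a single #define acts as a switch. TRACE and the two function names are hypothetical. Removing the #define line would reverse the value returned by each function.

```c
#define TRACE               /* hypothetical feature switch */

int trace_on() {
#ifdef TRACE
    return 1;               /* compiled because TRACE is defined */
#else
    return 0;               /* discarded by the preprocessor */
#endif
}

int trace_off() {
#ifndef TRACE
    return 1;               /* discarded by the preprocessor */
#else
    return 0;               /* compiled because TRACE is defined */
#endif
}
```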

These directives may be nested, and there is no limit on the depth of nesting. It is possible, for instance, to write something like

		#ifdef ABC
		...               /* ABC */
		#ifndef DEF
		...               /* ABC and not DEF */
		#else
		...               /* ABC and DEF */
		#endif
		...               /* ABC */
		#else
		...               /* not ABC */
		#ifdef HIJ
		...               /* not ABC but HIJ */
		#endif
		...               /* not ABC */
		#endif

where the ellipses represent conditionally compiled code, and the comments indicate the conditions under which the various sections of code are compiled.
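A concrete version of this skeleton, with each section reduced to a single return statement, might look like the sketch below. Here ABC and HIJ are defined and DEF is not; the function name which_section and the section numbers are assumptions made for the example.

```c
#define ABC
#define HIJ

int which_section() {
#ifdef ABC
#ifndef DEF
    return 1;               /* ABC and not DEF */
#else
    return 2;               /* ABC and DEF */
#endif
#else
#ifdef HIJ
    return 3;               /* not ABC but HIJ */
#endif
    return 4;               /* not ABC */
#endif
}
```

Since ABC is defined, only the first section survives and the function returns 1. HIJ is never consulted, because the whole section that tests it is discarded.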

Including Other Source Files

The preprocessor also recognizes directives to include source code from other files. The three directives

		#include "Filename"
		#include <Filename>
		#include Filename

cause a designated file to be read as input to the compiler. The preprocessor replaces these directives with the contents of the designated files. When the files are exhausted, normal processing resumes.

Filename follows the normal MS-DOS file specification format, including drive, path, filename, and extension. Full C requires either quotation marks or angle brackets around the name, but Small C is not so particular. Nevertheless, for better portability, we should write

		#include <stdio.h> 

to include the standard I/O header file (which contains standard definitions and is normally included in every C program) and

		#include "Filename" 

for other files.

Use of this directive allows us to draw upon a collection of common functions which can be included in many different programs. This reduces the effort needed to develop programs and promotes uniformity among them. However, since this method requires recompiling the included code in each program that uses it, its use is usually limited to header files of common macro definitions and global declarations. Procedural modules, on the other hand, are usually compiled separately, stored in libraries, and combined with programs at link time (Chapter 16).
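As a rough sketch of this division of labor, the fragment below shows the kind of text a shared header might hold (manifest constants and a global declaration), followed by program text that relies on it. The header name common.h and every name in it are hypothetical. To keep the sketch compilable as a single file, the header's contents are written in place, where a program would otherwise write #include "common.h". It uses sprintf from the standard C library.

```c
#include <stdio.h>          /* standard I/O header */

/* ---- text that might live in a hypothetical file common.h ---- */
#define MAXLINE 80
#define EOL     '\n'
extern int linecount;       /* global declaration shared by all modules */
/* ---- end of the hypothetical common.h ---- */

int linecount;              /* the one definition of the shared global */
char buffer[MAXLINE];

/* Builds a two-line string and counts its end-of-line characters. */
int count_lines() {
    int i;
    sprintf(buffer, "one%ctwo%c", EOL, EOL);
    linecount = 0;
    for (i = 0; buffer[i]; i++)
        if (buffer[i] == EOL) linecount++;
    return linecount;
}
```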

Assembly Language Code

One of the main reasons for using the C language is to achieve portability. But occasionally it is necessary to sacrifice portability to gain full access to the operating system or to the hardware in order to meet some interface requirement. If these instances are kept to a minimum and are not replicated in many different programs, the negative effect on portability may be acceptable.

To support this capability, Small C provides for assembly language instructions to be written into C programs anywhere a statement is valid. Since the compiler generates assembly language as output, when it encounters assembly language instructions in the input, it simply copies them directly to the output.

Two special directives delimit assembly language code. They are

		#asm 

and

		#endasm 

Everything from #asm to #endasm is assumed to be assembly language code and so is sent straight to the output of the compiler exactly as it appears in the input. Macro substitution is not performed.

Of course, to make use of this feature, we must know how the compiler uses the CPU registers, how functions are called, and how the operating system and hardware work. Small C code generation is covered in Chapter 19.
