CHAPTER 16:
COMPILING SMALL C PROGRAMS
The Program Translation Process

The output of the Small C compiler is designed to be compatible with the Microsoft and IBM assemblers. As of this writing, Small C does not have an assembler of its own, although one is being developed and may even be available as you read this. The new assembler is to be fully documented in a separate book. The following comments are intended to provide helpful information about the overall program translation process, regardless of the assembler used. Refer to the assembler's documentation for details about the assembly process. Also refer to the MS-DOS documentation for specific information about the linker.

As we noted, the Small C compiler generates assembly language output. This makes compiling a Small C program a three-step process. First, the program is passed through the compiler to produce the assembly code (usually in a file with an ASM extension). Next, it goes through an assembler to create a linkable object file (OBJ extension). Finally, it is translated by the linker into an executable file (EXE extension). Of course, the entire process may be captured in a batch file so that it becomes a single step for the user.

The entire process is illustrated in Figure 16-1, which shows the general case where a multi-part program is compiled and assembled in parts, then linked together with modules from several libraries.

Figure 16-1. Compiling Small C Programs

The ASM file produced by the compiler is an ordinary ASCII file, so it can be displayed on the screen, printed, or edited. The code generated by the compiler is plainly visible, making it easy to see what the compiler did.

The Assembler

The assembler primarily does three things. It translates the mnemonic operation codes to numeric codes that are meaningful to the CPU. It converts label references to actual addresses. And it casts the result into the OBJ file format that the linker requires. As far as we are concerned here, the assembler is just a tool that must be applied to the output of the compiler. Since the compiler never generates code that produces assembler errors, there is no need to consider error conditions. Refer to the assembler documentation for specific information about invoking the assembler.

The linker

DOS comes with a linker which is compatible with the one that comes with the Microsoft and IBM assemblers. And since the Small C assembler will use this linker, we can be more specific about this phase of the process.

As Figure 16-1 shows, the linker is the hub of the process; it brings everything together to produce the executable file. Besides concatenating together the various modules of the program, its main function is to resolve external references.

When something is declared extern in one module and is defined globally in another, the linker locates where it is defined (the entry point) and patches every reference with its actual address. This cannot be done by the assembler because when it assembles modules, it has no idea where external targets are defined or what addresses they will have when the modules are combined.

Two errors are likely to occur when linking. Some external references may not be resolvable because there is no definition in any of the modules. This condition produces a list of the offending names and the modules from which the references are made. The list is introduced by the message:

Unresolved Externals:

Also, two or more modules may contain global definitions with the same name, creating an ambiguity. The linker complains about this condition with the message:

Symbol Defined More Than Once:

This is followed by the name of an offending definition.
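The two error conditions can be made concrete with a toy model. The sketch below is written in standard C rather than the Small C subset, and it is purely illustrative. The `struct module` record, the `check_links()` function, and the message wording are hypothetical and do not describe how the DOS linker is actually built. Each module lists the global names it defines and the external names it references. A reference with no definition anywhere is unresolved, and a name defined in more than one module is ambiguous.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 8

/* Hypothetical toy object module: the global names it defines (entry
   points) and the external names it references. Lists end with NULL. */
struct module {
    const char *name;
    const char *defines[MAX_SYMS];
    const char *refers[MAX_SYMS];
};

/* Return 1 if module m defines sym, else 0. */
static int module_defines(const struct module *m, const char *sym)
{
    int i;
    for (i = 0; m->defines[i] != NULL; ++i)
        if (strcmp(m->defines[i], sym) == 0)
            return 1;
    return 0;
}

/* Return the number of modules that define sym. */
static int count_defs(const struct module *mods, int n, const char *sym)
{
    int i, count = 0;
    for (i = 0; i < n; ++i)
        count += module_defines(&mods[i], sym);
    return count;
}

/* Report both linker error conditions; return the number of errors. */
int check_links(const struct module *mods, int n)
{
    int i, j, k, first, errors = 0;

    /* A reference that no module defines is unresolved. */
    for (i = 0; i < n; ++i)
        for (j = 0; mods[i].refers[j] != NULL; ++j)
            if (count_defs(mods, n, mods[i].refers[j]) == 0) {
                printf("Unresolved external: %s (referenced in %s)\n",
                       mods[i].refers[j], mods[i].name);
                ++errors;
            }

    /* A name defined by more than one module is ambiguous. */
    for (i = 0; i < n; ++i)
        for (j = 0; mods[i].defines[j] != NULL; ++j) {
            const char *sym = mods[i].defines[j];
            first = 1;                 /* report each name only once */
            for (k = 0; k < i; ++k)
                if (module_defines(&mods[k], sym))
                    first = 0;
            if (first && count_defs(mods, n, sym) > 1) {
                printf("Symbol defined more than once: %s\n", sym);
                ++errors;
            }
        }
    return errors;
}
```

Consider a module MAIN that references `count` and `total`, linked with two modules that both define `count`. That combination produces one unresolved external (`total`) and one duplicate definition (`count`), so `check_links()` returns 2.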

The linker can be pointed to libraries (LIB extension) as well as object files. A library is a file that contains a collection of object modules that are copied into it by means of a library manager. The linker is designed to search libraries for modules with entry points that match unresolved references. When such a module is found, it is copied from the library into the program. This fixes the addresses of its entry points so the linker can then resolve references to them. With this arrangement, only those modules in the library which are actually needed are added to the program.

Notice in Figure 16-1 that more than one library can be scanned. When this is done, the linker scans the libraries in the order in which they are named to the linker. Once it has moved from one library to the next, it does not go back to the previous library. This means that when a module in the second library contains external references to a module in the first library, an unresolved reference will occur. This can always be avoided by specifying the standard library (Small C's CLIB.LIB) last. Other libraries may contain special purpose functions (the windowing functions of the Small-Windows library, for instance). Such functions are likely to call the standard functions. If the standard library is processed first, then these references to standard functions cannot be resolved. Therefore it is important to always specify the standard library last.

Even within a single library the order of the modules may be such that backward references between modules exist. This does, in fact, happen. But the linker is designed to resolve all references to modules in the current library before moving to the next one.
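The library search rules just described can be modeled in a few lines. Libraries are scanned in the order named, each one is rescanned until it supplies nothing new, and none is revisited. This is again a hypothetical sketch in standard C, not the linker's real algorithm. The names `struct libmod`, `struct library`, and `link_libraries()` are invented for illustration, and each library module is assumed to have a single entry point.

```c
#include <string.h>

#define MAX_NAMES 32

/* Hypothetical library module: one entry point plus the external
   names it calls (NULL-terminated). */
struct libmod {
    const char *entry;
    const char *calls[4];
};

struct library {
    const char *name;                 /* e.g. "CLIB.LIB" */
    const struct libmod *mods;
    int count;
};

/* A small fixed-size set of names. */
struct nameset {
    const char *names[MAX_NAMES];
    int n;
};

static int has(const struct nameset *s, const char *name)
{
    int i;
    for (i = 0; i < s->n; ++i)
        if (strcmp(s->names[i], name) == 0)
            return 1;
    return 0;
}

static void add(struct nameset *s, const char *name)
{
    if (!has(s, name) && s->n < MAX_NAMES)
        s->names[s->n++] = name;
}

/* Start from the program's own external references (refs, NULL-
   terminated) and search the libraries in the order given. Returns
   the number of references still unresolved at the end. */
int link_libraries(const char **refs, const struct library *libs, int nlibs)
{
    struct nameset resolved = {{0}, 0}, wanted = {{0}, 0};
    int lib, i, j, loaded, unresolved = 0;

    for (i = 0; refs[i] != NULL; ++i)
        add(&wanted, refs[i]);

    for (lib = 0; lib < nlibs; ++lib) {
        /* Rescan the current library until a pass loads nothing, so
           backward references within one library are resolved. */
        do {
            loaded = 0;
            for (i = 0; i < libs[lib].count; ++i) {
                const struct libmod *m = &libs[lib].mods[i];
                if (has(&wanted, m->entry) && !has(&resolved, m->entry)) {
                    add(&resolved, m->entry);    /* copy into program */
                    for (j = 0; m->calls[j] != NULL; ++j)
                        add(&wanted, m->calls[j]);
                    loaded = 1;
                }
            }
        } while (loaded);
        /* Once we move on, this library is never searched again. */
    }

    for (i = 0; i < wanted.n; ++i)
        if (!has(&resolved, wanted.names[i]))
            ++unresolved;
    return unresolved;
}
```

As a hypothetical case, suppose a windowing library's `window` module calls `printf`, and the standard library's `printf` calls `putc`. Naming the windowing library first and the standard library last leaves nothing unresolved. The reverse order leaves `printf` unresolved, which is why CLIB.LIB belongs last.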

The Compiler

The Small C compiler has the filename CC.EXE. It uses standard file redirection and command line arguments (Chapter 15) to determine where to obtain its input, where to send its output, and which run-time options to apply.

It employs a flexible algorithm for determining its input and output files. The purpose is to make it equally convenient for production compiles and special cases. By default, Small C obtains its input from stdin and sends its output to stdout. Therefore, when it is invoked by the command

		CC

it simply displays its signon message and waits for keyboard input. Its response to whatever it receives is shown immediately on the screen. This mode of operation, which makes studying the compiler's behavior very convenient, is unique to the Small C compiler. Whenever we wish to see what the compiler will generate for a given situation, all we have to do is execute it in this manner, query it with a program fragment, and observe its response. If we want a printed record of the session, we can redirect the output as with

		CC >PRN

We can also have it include the source lines with its output so it will be clear which source lines produced which assembly code. This is done with

		CC >PRN -L1

The letter L does not have to be uppercase. It is shown that way only to avoid being confused with the numeric digit one. More will be said about the listing switch below.

Of course, other possibilities exist for the default use of the standard input and output files. For instance, if we wish to see what the compiler does with an actual program, we could invoke it with

		CC <PROG.C -L1

which would compile the program to the screen together with its source code. We could then use control-S keystrokes to alternately pause and resume execution as the screen scrolls. Combining the last two examples would make the output go to the printer.

When we use the listing switch in this way, the source lines in the output are each preceded by a semicolon. The purpose is to make them appear as comments to the assembler. Therefore, mixing the source listing with the output does not create assembler errors. We might invoke the compiler with

		CC <PROG.C >PROG.ASM -L1

to create a file for the assembler--a file which could also be viewed on the screen or printed.
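The semicolon technique is simple enough to show directly. The following standard C sketch uses a hypothetical `listing_line()` helper to turn one source line, as read by `fgets()`, into an assembler comment. The text specifies only the leading semicolon, so the newline handling here is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy one source line into dst as an assembler
   comment by prefixing ';'. A trailing newline left by fgets() is
   dropped. Returns 0 on success, -1 if dst is too small. */
int listing_line(char *dst, size_t size, const char *src)
{
    size_t len = strlen(src);

    if (len > 0 && src[len - 1] == '\n')
        --len;                        /* drop the fgets() newline */
    if (len + 2 > size)               /* ';' + text + '\0' */
        return -1;
    dst[0] = ';';
    memcpy(dst + 1, src, len);
    dst[len + 1] = '\0';
    return 0;
}
```

The compiler would write such a line ahead of the code generated for it. The assembler then skips the listing line as a comment.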

Using the standard files gives the compiler a great deal of flexibility. But for production compiling we should not have to enter two redirection specifications complete with filename extensions. That would be a bit unwieldy. So, in those situations where we simply want to compile a program the simplest way possible, we can enter

		CC PROG

The compiler will take the non-switch argument PROG as a filename and assume that it should be used with different extensions for both input and output. It will assume an extension of C for the input and ASM for the output. So, in this case, it would input PROG.C and output PROG.ASM. No listing will be generated either on the screen, on the printer, or in the output. If more than one filename is given, the compiler will concatenate them on input and apply the first name to the output. Thus

		CC PROG PROG2 PROG3

will input PROG.C, PROG2.C, then PROG3.C as though they were a single input file. Output will go to PROG.ASM.
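The default naming rule can be sketched as follows. This is an illustrative standard C version, not the compiler's source. The names `make_name()`, `plan_compile()`, and the `DOS_NAME_LEN` limit are hypothetical, and the sketch assumes each argument is a bare name with no extension of its own.

```c
#include <stdio.h>
#include <string.h>

#define DOS_NAME_LEN 13   /* 8.3 name: 8 + '.' + 3 + '\0' (assumption) */

/* Append "." and ext to base. Returns 0, or -1 if dst is too small. */
int make_name(char *dst, size_t size, const char *base, const char *ext)
{
    int n = snprintf(dst, size, "%s.%s", base, ext);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical planner for "CC NAME...": every name gets .C for input,
   and the first name also supplies the .ASM output file. */
int plan_compile(const char *names[], int count,
                 char inputs[][DOS_NAME_LEN], char output[DOS_NAME_LEN])
{
    int i;

    if (count < 1)
        return -1;                  /* no names: stdin/stdout apply */
    for (i = 0; i < count; ++i)
        if (make_name(inputs[i], DOS_NAME_LEN, names[i], "C") != 0)
            return -1;
    return make_name(output, DOS_NAME_LEN, names[0], "ASM");
}
```

Called with PROG, PROG2, and PROG3, the sketch yields the inputs PROG.C, PROG2.C, and PROG3.C and the output PROG.ASM, matching the example above.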

Finally, we can use output redirection together with named files. This allows us to override the assumed output filename. For example,

		CC PROG PROG2 >PRN

would compile PROG.C and PROG2.C to the printer, and

		CC PROG PROG2 >OUT.A

would compile them to a file named OUT.A.

Note that redirection specifications do not have default filename extensions.

While the compiler is executing, it may be interrupted in either of two ways. It pauses on a control-S keystroke and continues on the next keystroke of any type. Also, a control-C keystroke aborts execution with an exit code of 2.

The switches which control compiler behavior are listed below:

The -M switch lets us monitor progress by having the compiler write each function header to the screen. This also helps isolate errors to the functions containing them.
The -A switch causes the alarm to sound whenever an error is reported.
The -P switch causes the compiler to pause after reporting each error. An ENTER (carriage return) keystroke resumes execution.
The -L# switch calls for a source listing. The # represents a single numeric digit that specifies the file descriptor of the standard file which is to receive the listing. Table 12-1 lists these values. Zero should not be used because it specifies an input file. As we saw above, 1 specifies the standard output file. In that case, since the assembly output may also go to the same file, a semicolon precedes each line of the listing, making it appear to the assembler as a comment. Specifying -L2 sends the listing to the standard error file, which is assigned to the screen and cannot be redirected.
The -NO switch specifies no optimizing. We can use this switch to see the raw, unoptimized code.
The null switch (-), or any unrecognizable switch, causes the compiler to abort after displaying the help line:

		USAGE: CC [FILE]... [-M] [-A] [-P] [-L#] [-NO]

This little bit of assistance is usually enough to remind us of what we need to know. The brackets designate their contents as optional, and the ellipsis means that more files can be specified.
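A switch parser along these lines shows how the options above might be recognized, with uppercase and lowercase treated alike. It is a hypothetical standard C sketch. The names `struct options`, `usage()`, and `parse_switch()` are invented. Rejecting -L0 is also an assumption, because the text says only that zero should not be used. A caller would treat arguments not beginning with a hyphen as filenames and, on a -1 return, print the usage line and abort.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical record of the compiler switches. */
struct options {
    int monitor;    /* -M: show each function header      */
    int alarm;      /* -A: sound alarm on each error      */
    int pause;      /* -P: pause after each error         */
    int listfd;     /* -L#: listing descriptor, -1 = none */
    int optimize;   /* 1 normally, 0 after -NO            */
};

/* Print the help line quoted in the text. */
void usage(void)
{
    fputs("USAGE: CC [FILE]... [-M] [-A] [-P] [-L#] [-NO]\n", stderr);
}

/* Parse one argument beginning with '-', ignoring case. Returns 0 if
   the switch was recognized, or -1 for the null switch "-" and any
   unrecognized switch. */
int parse_switch(const char *arg, struct options *opt)
{
    int c;

    if (arg[0] != '-' || arg[1] == '\0')
        return -1;                      /* null switch (or not a switch) */
    c = toupper((unsigned char)arg[1]);

    if (arg[2] == '\0') {               /* one-letter switches */
        switch (c) {
        case 'M': opt->monitor = 1; return 0;
        case 'A': opt->alarm = 1;   return 0;
        case 'P': opt->pause = 1;   return 0;
        default:  return -1;
        }
    }
    if (c == 'L' && isdigit((unsigned char)arg[2]) && arg[3] == '\0') {
        if (arg[2] == '0')
            return -1;                  /* assumption: reject stdin (0) */
        opt->listfd = arg[2] - '0';
        return 0;
    }
    if (c == 'N' && toupper((unsigned char)arg[2]) == 'O' && arg[3] == '\0') {
        opt->optimize = 0;              /* -NO: raw, unoptimized code */
        return 0;
    }
    return -1;
}
```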

Both uppercase and lowercase letters are acceptable in filenames and switches. Table 16-1 illustrates sample commands which invoke the compiler, and describes their effects.

Table 16-1: Invoking the Compiler
