CHAPTER 17:

COMPILING THE COMPILER

How to Compile the Compiler

Since the Small C compiler is really just another Small C program, we follow the steps in Chapter 16 to compile it just as we would any other program. The only difference is the number of source files and their names. Whereas Figure 16-1 illustrates the general procedure, Figure 17-1 is specific to the compiler itself.

Figure 17-1. Compiling the Compiler

The compiler is organized into four parts, each of which is compiled and assembled separately. The OBJ files are then combined by the linker with modules from the library CLIB.LIB to produce the new CC.EXE file. Each source file includes two header files, STDIO.H and CC.H. In addition, the first source file includes NOTICE.H, which contains the signon notice. Listing 17-1 shows a batch file that will perform the entire operation. This file uses the Microsoft assembler. It assumes that we have sufficient disk space and that the PATH environment variable is properly set so that the assembler and linker can be found by MS-DOS. If this is not so, then adjustments will have to be made accordingly.

Listing 17-1: Batch File to Compile the Compiler

The four parts of the compiler are divided functionally so that changes to one part will not normally require changes to the other parts. Having compiled the entire compiler once and having kept the OBJ files, it is then only necessary to compile and assemble the parts that are actually affected by a change. Notice that the batch file in Listing 17-1 deletes the ASM files, but retains the OBJ files for just that reason. Listing 17-2 shows a batch file for recompiling just one part of the compiler.

Listing 17-2: Batch File to Recompile One Part of the Compiler

Notice that the link step is optional in case two or three parts must be recompiled. We can perform the link step only with the last part. The %1 in this file stands for the first command line argument. If the batch file were named CCC.BAT, then to invoke it for part three of the compiler we would enter

		CCC 3

General Advice

CC.H contains #define statements that pertain to the compiler as a whole. Changes to this file usually require that the entire compiler be recompiled. However, by keeping track of which symbols in CC.H have changed, we can use a text editor to perform a global search on each of the source files to determine which ones must be recompiled.
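The global search described above can also be sketched in code. The fragment below is only an illustration, not part of the compiler: the function names mentions_symbol and is_ident_char are hypothetical, and the function works on text already held in memory rather than reading the source files itself. It reports whether a symbol appears as a whole identifier, so that a match buried inside a longer name does not trigger a needless recompile.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns nonzero if c may appear inside a C identifier. */
static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Hypothetical helper: returns 1 if sym occurs in text as a whole
 * identifier, else 0.  An occurrence embedded in a longer name
 * (for example ABC inside XABC or ABC2) does not count.
 */
int mentions_symbol(const char *text, const char *sym)
{
    size_t len;
    const char *p;

    assert(text != NULL && sym != NULL);
    len = strlen(sym);
    if (len == 0)
        return 0;

    p = text;
    while ((p = strstr(p, sym)) != NULL) {
        int left_ok  = (p == text) || !is_ident_char(p[-1]);
        int right_ok = !is_ident_char(p[len]);
        if (left_ok && right_ok)
            return 1;
        ++p;                /* keep looking past this partial match */
    }
    return 0;
}
```

Given the text of one source file and the name of a symbol that changed in CC.H, a nonzero result suggests that the file belongs on the list to recompile. A plain text search, as described above, serves the same purpose; the whole-identifier test merely trims false matches.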

The symbol DISOPT is contained in a #define directive in CC4.C where the code generation and optimizing logic resides. Currently the directive is contained in a comment so that it will be ignored by the compiler. However, if we remove the comment delimiters and recompile CC4.C, the new compiler will list on stdout the frequencies of the optimization cases it applies. This is handy when working with the optimizer; it helps us to decide whether or not a given optimization is likely to be worth its cost in performance and compiler size. There is no point in having the optimizer look for cases that seldom arise.
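The general technique of switching on a diagnostic by uncommenting a #define might look something like the following. This is a minimal sketch and not the code in CC4.C: only the symbol DISOPT comes from the text, while NCASES, freq, note_case, case_count, and dump_freq are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* #define DISOPT */        /* delete the comment delimiters to enable counting */

#define NCASES 4            /* hypothetical number of optimization cases */

static int freq[NCASES];    /* how many times each case was applied */

/* Called wherever case n is applied; counts only if DISOPT is defined. */
void note_case(int n)
{
    assert(n >= 0 && n < NCASES);
#ifdef DISOPT
    ++freq[n];
#endif
}

/* Returns the recorded count for case n (always 0 without DISOPT). */
int case_count(int n)
{
    assert(n >= 0 && n < NCASES);
    return freq[n];
}

/* Lists the frequencies on stdout; prints nothing without DISOPT. */
void dump_freq(void)
{
#ifdef DISOPT
    int i;
    for (i = 0; i < NCASES; ++i)
        printf("case %d applied %d times\n", i, freq[i]);
#endif
}
```

With the directive left inside its comment, the counting and listing code is skipped entirely, so the normal build pays nothing for it. Removing the comment delimiters and recompiling turns the counts on without any other change to the source.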

Notice in Figure 17-1 that the new executable compiler CC.EXE replaces the previous one. This has ramifications. What is the probability that the new compiler will work properly? Close to zero, no doubt. So we have just replaced our production compiler with something containing bugs. Failure to realize this, when we recompile the compiler to fix it, leads to interesting results. The buggy compiler may run without a complaint. And the new compiler may also link properly. But when we run the new compiler, look out. Chances are that it will go berserk--if our bug caused bad code to be generated, that is. If we are on our toes, we will recognize the symptoms and realize what happened. But the symptoms of many compiler bugs are not obvious, and some are not even predictable.

Of course, we have a copy of the original, unaltered compiler somewhere, do we not? Sure we do. So we can restore CC.EXE from that copy and proceed to recompile our fixed compiler properly. To avoid the loss of time, we will probably want to keep a reliable copy of the compiler handy and make sure to use it for each new compilation. Eventually, when we are convinced of the dependability of our revised compiler, we will want to begin using it as our production compiler so future versions of the compiler can incorporate new features that are supported by the revised compiler.

Now, suppose we add support for some new constructs. We place the new compiler in production status, and proceed to use the new constructs in future revisions of the compiler. By making the source code dependent on the enhanced compiler, we are committing to its reliability. If we should get down the road a way, and discover that our production compiler has problems, then falling back to the previous production compiler will not work because the old compiler will choke on the new constructs which we have put in the source files. At that point we will have to resort to some messy patching of the source files to make them acceptable to the old compiler. This will probably involve commenting out logic that implements enhancements and/or rewriting the new constructs in the old way. We must then fix the previously undiscovered bug, recompile, test, reinstall this as the production compiler, remove the temporary patches, and recompile again. Finally, we will be back where we were. The point is that time saved in testing may well be spent in even more disagreeable ways.

Conclusion

At this point, we know the Small C language, we know its repertoire of library functions, we know how to use the compiler, and we know how to compile new versions of the compiler. Now we come to the interesting part--what goes on inside the compiler.
