INTRODUCTION

The C programming language was developed at Bell Laboratories in the early seventies by Dennis Ritchie, who based his work on Ken Thompson's B language. C was designed to conveniently manipulate the same kinds of objects known to computer processors--bits, bytes, words, and addresses. For that reason, and because it is a structured language, it has become the language of choice for systems programming on minicomputers and microcomputers. It was originally developed for implementing the UNIX operating system, and today PC/MS-DOS is largely written in C.

C has other good applications too. It is well suited to text processing, engineering, and simulation problems. Although other languages have specific features which in many cases equip them better for particular tasks (e.g., the complex numbers of FORTRAN, the matrix operations of PL/I, and the sort verb, report writer, and edited moves of COBOL), C is nevertheless becoming a very popular language for a wide range of applications, and for good reason--programmers like it.

Users of C typically cite the following reasons: (1) C programs tend to be more portable; (2) C has a rich set of expression operators, making it unnecessary to resort to assembly language except in rare cases; (3) C programs are compact, but are not necessarily cryptic; (4) C compilers usually generate efficient object code; (5) C is a relaxed language, without unnecessarily awkward syntax.

For a description of the complete C language, I refer you to the original book on the subject, The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie[10]. Although numerous other books have appeared and a standard dialect of C is emerging, this remains the primary non-vendor source on the language.

In May of 1980 Dr. Dobb's Journal ran an article entitled "A Small C Compiler for the 8080s" in which Ron Cain presented a small compiler for a subset of the C language. The most interesting feature of the compiler besides its small size was the language in which it was written--the one it compiled. It was a self-compiler! (Although this is commonplace today, it was a fairly novel idea at the time.) With a simple, one-pass algorithm, his compiler generated assembly language for the 8080 processor. Being small, however, it had its limitations. It recognized only characters, integers, and single dimension arrays of either type. The only loop controlling device was the while statement. There were no Boolean operators, so the bitwise logical operators & (AND) and | (OR) were used instead. But even with these limitations, it was a very capable language and a delight to use, especially compared to assembly language.
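To see why the missing Boolean operators mattered, here is a minimal sketch in present-day C (not code from Ron Cain's compiler; the function names are made up for illustration). Bitwise & combines individual bits, so two nonzero operands can still yield zero. One workaround is to reduce each operand to 0 or 1 with a comparison before combining.

```c
/* Illustrative only: these helper names are hypothetical. */

/* Bitwise AND applied directly: nonzero operands with no bits in
   common give 0, which reads as "false". */
int naive_bitwise_and(int a, int b)
{
    return a & b;
}

/* Comparisons yield 0 or 1, so the bitwise AND of the two results
   matches the truth value of a logical AND. */
int both_nonzero_bitwise(int a, int b)
{
    return (a != 0) & (b != 0);
}

/* The Boolean operator, for comparison. */
int logical_and(int a, int b)
{
    return a && b;
}
```

With a = 1 and b = 2, naive_bitwise_and returns 0, while the other two functions return 1. Note also that & always evaluates both operands, whereas && skips the right operand when the left one is zero.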

Recognizing the need for improvements, Ron encouraged me to produce a second version, and in December of 1982 it also appeared in Dr. Dobb's Journal. The new compiler augmented Small C with (1) code optimizing, (2) data initializing, (3) conditional compiling, (4) the extern storage class, (5) the for, do/while, switch, and goto statements, (6) combination assignment operators, (7) Boolean operators, (8) the one's complement operator, (9) block local variables, and (10) various other features. Then in 1984 Ernest Payne and I developed and published a CP/M compatible run-time library for the compiler. It consisted of over 80 functions and included most of those in the UNIX C Standard I/O Library--the ones that pertained to the CP/M environment. This became version 2.1 and the subject of The Small C Handbook.

Within a year, Russ Nelson of Clarkson College had this compiler running under MS-DOS. And, through an agreement with them, I was able to base my own 8086 implementation on his work. Although I revised the compiler extensively, his primary contribution--the use of p-codes--has remained. The run-time library was thoroughly reworked, adapting the input/output functions to the DOS file handle facility. This is the DOS compiler that Dr. Dobb's Journal distributed as version 2.1.

On two occasions since, I revisited the compiler to overhaul its code generator. First, I replaced the long string of if...else... statements that translated p-codes to assembly code with a huge switch statement. Then I replaced that with an array of pointers to assembly code strings. Translating the p-codes then became just a matter of subscripting the array with the p-codes themselves.
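The table-lookup idea can be sketched roughly as follows. This is an illustration in modern C, not the compiler's actual table: the p-code names, the assembly strings, and the translate function are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical p-codes; Small C's real set is different and larger. */
enum pcode { P_ADD, P_SUB, P_NEG, P_COUNT };

/* One assembly string per p-code, stored at the index equal to the
   p-code's value. The strings are illustrative 8086 instructions. */
static const char *const code_table[P_COUNT] = {
    [P_ADD] = "ADD AX,BX\n",
    [P_SUB] = "SUB AX,BX\n",
    [P_NEG] = "NEG AX\n",
};

/* Translation is a single subscript operation. The range check is an
   addition for safety; it returns NULL for an unknown p-code. */
const char *translate(int p)
{
    if (p < 0 || p >= P_COUNT)
        return NULL;
    return code_table[p];
}
```

Compared with a chain of if...else... tests or a switch, adding a p-code here means adding one table entry, and lookup cost does not depend on how many p-codes exist.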

Finally, while sprucing up the compiler for this book, I rewrote the code optimizer from scratch. Whereas before it consisted of a string of if...else... statements that looked for specific sequences of p-codes and replaced them with others, it has now been generalized, reduced in size, and made to do more optimizing. The result is that Small C now generates code that is respectable compared to professional compilers. Another advantage is that it is now very easy to understand what the optimizer does and to add to its repertoire of tricks.
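One common way to generalize such an optimizer is to move the sequences out of the control flow and into a table of rules, so that a single loop handles every case. The sketch below assumes that approach; it is not the Small C optimizer itself, and the p-codes, the rules, and the optimize function are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical p-codes for a toy machine. */
enum pcode { P_PUSH, P_POP, P_NEG, P_LOAD1, P_ADD, P_INC };

/* A rule matches two adjacent p-codes and replaces them with one
   p-code, or with nothing when replacement is -1. */
struct rule {
    int first, second;
    int replacement;
};

/* Illustrative rules; adding an optimization means adding a row. */
static const struct rule rules[] = {
    { P_PUSH,  P_POP, -1    },  /* push then pop cancels out     */
    { P_NEG,   P_NEG, -1    },  /* double negation cancels out   */
    { P_LOAD1, P_ADD, P_INC },  /* add constant 1 -> increment   */
};

/* Rewrites code[0..n) in place and returns the new length. After each
   p-code is copied, the last two outputs are checked against the
   rules, repeatedly, so one rewrite can expose another. */
size_t optimize(int *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        code[out++] = code[i];
        int changed = 1;
        while (changed && out >= 2) {
            changed = 0;
            for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
                if (code[out - 2] == rules[r].first &&
                    code[out - 1] == rules[r].second) {
                    out -= 2;
                    if (rules[r].replacement >= 0)
                        code[out++] = rules[r].replacement;
                    changed = 1;
                    break;
                }
            }
        }
    }
    return out;
}
```

Given the sequence P_PUSH, P_NEG, P_NEG, P_POP, P_LOAD1, P_ADD, this pass first removes the double negation, which brings the push and pop together so they cancel too, and then folds the last pair, leaving the single p-code P_INC.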

See Appendix G for a complete list of differences between versions 2.1 and 2.2.

This version of Small C (2.2) is the subject of this book. It remains a subset compiler, but is now better organized and much more efficient. I have resisted the urge to develop Small C into a full C compiler for a number of reasons. First, it would take a lot of work. But, more importantly, it would move Small C out of its niche as a student compiler by obscuring its logic with additional complications. Being incomplete, Small C has plenty of room for improvement. Students can experiment with it, adding missing features and improving its algorithms. Chapter 28 lists many possibilities for developing the compiler further.

As you read along, I trust that you will enjoy learning a really neat language and experience the satisfaction of learning the mysteries of compiler operation.
