PART 1:

THE SMALL C LANGUAGE

The following chapters introduce the Small C language in a natural order. However, since each chapter deals with a narrow aspect of the language, they may also be read independently for reference purposes.

Conventions Followed in the Text

Although most of the concepts in the following chapters pertain to both Small C and full C, some relate only to Small C or only to full C. So, to avoid confusion, I refer to these languages in a systematic way. The term C, with no adjective, refers to both Small C and full C. Statements about Small C apply only to Small C, and statements about full C pertain only to the complete language.

Following customary practice, function names are distinguished from other names by a trailing pair of matching parentheses, as in main(). Because this notation follows C syntax, programmers instinctively think "it's a function" when they see it. Similarly, array names are followed by matching square brackets.

Occasionally you will see the ellipsis (...) used as an abbreviation for anything that might appear at some point. An example is if...else... which appears a few paragraphs below. Note that the ellipsis assumes special meaning in syntax definitions. Syntax definitions employ the following devices:

  1. Generic terms are italicized and begin with a capital letter. They specify the kind of item that must or may appear at some point in the syntax. Frequently two terms are combined to identify a single item.
  2. Symbols and special characters in boldface are required by the syntax. They must appear exactly as shown.
  3. The term String refers to a contiguous string of items.
  4. The term List refers to a series of items separated by commas and (possibly) white space.
  5. An ellipsis (...) means that entities of the same type may be repeated any number of times.
  6. A question mark at the end of a term identifies it as optional; it may or may not be needed.

The 8086 Family of Processors

Understanding any compiled language requires some knowledge of how data is represented in the processor that executes the compiled programs, so you will find numerous references to the 8086 processor. The Intel 8086 was the first of a family of 16-bit processors upon which IBM's personal computers, and compatible machines, are based. See Appendix B for a survey of the architecture of these processors.

A Sample Program

Before proceeding, let's get a feel for the C language by surveying a small program. Listing 1-1 is a program, called words, which takes each word from an input file and places it on a line by itself in an output file. A word, in this case, is any contiguous string of printable characters.

The first line of the program is a comment giving the name of the program and a brief description of its function. The second line instructs the compiler to include text from the file STDIO.H. The included text appears to the compiler exactly as though it had been written in place of the #include directive. The curious angle brackets enclosing STDIO.H give some compilers a clue about where to find the file (Chapter 11). The third and fourth lines define the symbols INSIDE and OUTSIDE to stand for 1 and 0 respectively. A preprocessor built into the compiler scans each line, replacing all such symbols with the values they represent. Such symbols are often called macros since each stands for substitution text that may be larger and more complex than its name.

The next two lines define variables: a character named ch and an integer named where which is given the initial value zero (represented by OUTSIDE).
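
Listing 1-1 itself is not reproduced in this text, so the following is only a sketch of how the six lines described so far might look. The wording of the comment and the trailing remarks are assumptions; the names STDIO.H (written here in lowercase), INSIDE, OUTSIDE, ch, and where come from the description above.

```c
/* words -- place each word on a line by itself (opening lines only) */
#include <stdio.h>

#define INSIDE  1   /* most recent character was within a word */
#define OUTSIDE 0   /* most recent character was a white character */

char ch;              /* the current input character */
int where = OUTSIDE;  /* after preprocessing this reads: int where = 0; */
```

Because the preprocessor substitutes text before compilation proper, the compiler never sees the names INSIDE and OUTSIDE, only the constants 1 and 0.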

The procedural part of the program consists of three functions: main(), white(), and black(). Execution begins in main() which contains calls to white() and black(). The while statement in main() controls repeated execution of the if...else... statement enclosed in braces. With each repetition it calls getchar() to obtain the next character from the input file. (Although getchar() is not defined in this program, it is nevertheless available for use because it exists in a library of functions that can be linked with the program.)

The character obtained is assigned to the character variable ch. If it is not equal (!=) to the value represented by EOF (defined in STDIO.H), then the if statement is performed; otherwise, control passes through the end of main() and back to the operating system.
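
The read-and-test loop described above is a common C idiom, and a small sketch may make it easier to recognize. The function count_chars() below is hypothetical and not part of the words program; it uses getc() on an explicit stream (getchar() is equivalent to getc(stdin)) so that it can be exercised on any file. It also declares ch as an int rather than a character, an assumption made here because standard C needs the wider type to keep EOF distinct from every valid character.

```c
#include <stdio.h>

/* Counts the characters in a stream using the read-and-test loop idiom. */
long count_chars(FILE *in)
{
    int ch;        /* int, not char, so EOF stays distinct from every byte */
    long n = 0;

    /* The inner parentheses matter: != binds more tightly than =, so
       without them ch would receive the result of the comparison. */
    while ((ch = getc(in)) != EOF)
        n++;

    return n;      /* the loop ends when getc() reports end of file */
}
```

The assignment and the comparison happen in a single expression, which is why the loop needs no separate statement to fetch the next character.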

With each iteration, the current character is checked to see if it is a white character--space (' '), newline ('\n'), or tab ('\t'). If so, white() is called. The white() function then checks to see if the previous character was within a word (where equals INSIDE). If so, the current character must be the first white character following a word; therefore, putchar() (another library function) is called to write a special newline character to the output file, terminating the current line and starting another one. It then sets where to OUTSIDE so that no more newlines will be written for that word. When the next black (printable) character is found, a call is made to black(), which writes the character to the output file and sets the variable where to INSIDE, indicating that the most recent character was within a word.

As this program executes, it has the effect of squeezing each contiguous run of white characters that follows a word into a single newline character, which has the effect of a carriage return, line feed sequence.
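
The logic walked through above can be sketched as follows. This is a reconstruction from the description, not the original Listing 1-1, and it departs from it in labeled ways: the work is wrapped in a hypothetical function words() that takes explicit input and output streams instead of living in main(), white() and black() receive their data as arguments, and ch is a local int rather than a global character.

```c
#include <stdio.h>

#define INSIDE  1   /* most recent character was within a word */
#define OUTSIDE 0   /* most recent character was a white character */

static int where = OUTSIDE;

/* Called for each white character: ends the line after a word, once. */
static void white(FILE *out)
{
    if (where == INSIDE)
        putc('\n', out);   /* first white character following a word */
    where = OUTSIDE;       /* suppress further newlines for this run */
}

/* Called for each black (printable) character: copies it through. */
static void black(int ch, FILE *out)
{
    putc(ch, out);
    where = INSIDE;
}

/* Hypothetical wrapper: writes each word of in on its own line of out. */
void words(FILE *in, FILE *out)
{
    int ch;

    where = OUTSIDE;       /* start each run outside any word */
    while ((ch = getc(in)) != EOF) {
        if (ch == ' ' || ch == '\n' || ch == '\t')
            white(out);
        else
            black(ch, out);
    }
}
```

Calling words(stdin, stdout) from main() would give the filter behavior described in the text. The single variable where is all the state the program needs, since the only thing it must remember between characters is whether the previous one was part of a word.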

Although this quick tour through a C program leaves many questions unanswered, it illustrates the general form of C programs, how variables and functions are defined, and how control flows through a program. The following chapters fully explain these concepts.
