CHAPTER 28:

FURTHER DEVELOPMENT

Programs are seldom perfect, and the Small C compiler is no exception. Many of its algorithms can be improved. Much of its code could be organized better and made more efficient. No doubt there are bugs lurking in the darkness. And, of course, many features of the C language have yet to be implemented. So there is no shortage of improvements that can be made. You may even want to revise it simply for the experience.

Whatever the reasons, you may find yourself using the compiler to create new versions of itself. And so, on the chance that you will, I have included this chapter of suggestions and helpful hints. Chapter 17 explained the procedure for compiling the compiler. Here we look at some of the things you might want to do to it and how you might do them.

Before proceeding, I must confess that the suggestions which follow are untried. Had I tested them, they would no doubt be in the compiler already. In fact, I did install about a quarter of the original recommendations, simply because, after investigating them, going ahead with the implementation was irresistible.

Furthermore, as every programmer knows, recommending changes is a bit risky. Plans that look good at first have a way of turning sour with implementation. It is hard to think of everything at the beginning. Some details come to light only when the changes are being made or when testing reveals them. I must admit to being especially prone to that kind of error.

So as you work through these projects, understand that you are breaking new ground. Along the way you will learn things about the compiler that I may not have prepared you for. And you will certainly be challenged to find new solutions--hopefully elegant ones. But that is what learning by experience is all about. No amount of talk can replace the lessons learned by actually doing something.

With that disclaimer, I offer the following suggestions for further development.

Test Programs

One project that should precede the others is the development of a suite of test programs for the Small C compiler. A compiler is a complex program. It is easy to mess up one part of it while working on another. So it is important to fully test each new generation of the compiler to ensure that nothing changed except what was supposed to change.

The challenge here is to come up with a set of programs that fully exercises the compiler's capabilities. Every sort of normal and exceptional condition should be tested. Doing this properly requires a thorough knowledge of the compiler and a large dose of ingenuity.

The test programs should be written in such a way that it is obvious when they fail. In addition to reporting on their progress, they could also be written to keep track of the results for a summary display at the end of the run. Since a large number of test cases will be tried, this would help the programmer determine quickly whether or not problems occurred.

The test programs should be augmented with additional tests whenever compiler bugs which were not previously detected are discovered.

Eliminate Quote[]

This exercise does little more than acquaint you with the process of making revisions. You might find it a good way to begin.

There is a global character array in CC1.C called quote[] which is made to look like a string containing just a quotation mark. Originally, Small C did not support the backslash-escape sequences, so this device was used instead.

Eliminate this array and replace the reference to it in string() with "\"".
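
In sketch form (the exact form of the reference in string() may differ), the change replaces something like

		char quote[2];		/* global: made to hold a lone " */
		...
		match(quote)		/* the reference in string() */

with the direct

		match("\"")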

Continued Character Strings

One of Small C's shortcomings is that it provides no means of continuing character strings which are too long to fit in one line. The maximum line size is 127 characters, as determined by LINEMAX and LINESIZE in CC.H. If you have a text editor that supports lines that long, you could easily exceed the width of your screen. In fact, you could recompile the compiler with larger values for these symbols and thereby handle any length line your editor will allow. But that is a poor solution since listings of your programs will (depending on the printer) either truncate or wrap around to column one (throwing off the internal line count of whatever print utility you may be using). This solution is especially bad because it denies you control of your program's appearance.

The standard C solution to this problem is to place a single backslash (\) at the physical end of the source line (without trailing white space). In that case, the backslash and the newline are ignored and the string continues with the first character of the next line. Thus,

		"This is the first part a\
		nd this is the last part."

(where the leading quote is in column one) is the same as

		"This is the first part and this is the last part."

Obviously, this solution is crude and gives the programmer very little control over the appearance of his program. To overcome that limitation, some implementors have deviated by having the compiler also discard leading white space on the continuation line. In that case, the line is continued with the first graphic character on the continuation line.

Current practice, however, is to have compilers concatenate into a single string any series of strings that are separated by nothing but white space (newlines included). Thus,

		"Part one, "
		"part two, " "part three."

would be seen by the compiler as

		"Part one, part two, part three."

Clearly this approach is much better than the previous two. For one thing, it gives the programmer complete control; and for another, it makes the programmer's intentions explicit. There is no guessing as to whether or not the compiler is overlooking leading white space in a continuation line.

Finally, and best of all for Small C, this solution is the easiest to implement. You may recall that the Small C preprocessor operates on a line by line basis. When it finds a string's leading quote, it searches for the trailing quote in the same line and complains if it does not find it. Making the front end of the compiler able to deal with a single string that spans source lines would be difficult.

Therefore, since it is easier and is in keeping with modern practice, I recommend the string concatenation approach. To implement it, only one small function, string(), should require alteration. Presently, it contains a single loop in which it obtains the characters of a string and stores them in the literal pool. On exit from the loop, it caps the string with a null terminator. Prior to entering the loop, it passes the initial value of the literal pool offset litptr back to the caller as a side effect. This will be used in locating the string in the pool when it is dumped at the end of the current function.

String concatenation can be easily added by enclosing the current loop in an outer loop. This loop might best be written as a do...while statement. The ... includes the existing loop, as well as the gch() that bypasses the trailing quote. The while condition could be the expression (match(quote)).

On completing the first string and bypassing its trailing quote, any white space (including newlines) would be skipped and the next token would be tested. If it turns out to be another quote, then it is bypassed by match() and the expression yields true, which restarts the inner loop. Be sure that the passing of litptr occurs before the outer loop, and the terminating of the string with a zero byte occurs after it.
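
A sketch of the revised function, with the body of the existing loop elided (litq[] is assumed here to name the literal pool array):

		string(psp) int *psp; {
			if (match(quote) == 0)
				return 0;
			*psp = litptr;			/* pass pool offset back first */
			do {
				...			/* existing loop: store the
							   string's characters in the pool */
				gch();			/* bypass the trailing quote */
				} while (match(quote));	/* another string? concatenate */
			litq[litptr++] = 0;		/* cap the whole string once */
			return 1;
			}

Because match() skips white space (reading new lines as needed), the while condition automatically absorbs any gap between adjacent strings.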

Testing this change should be extremely simple. Just execute the compiler without parameters so that it inputs from the keyboard and outputs to the screen. Then initialize some character arrays and pointers with simple and concatenated strings. After that, start a function in which you reference character strings. If the generated code is right, you are finished. Remember to try boundary conditions like empty strings, no white space between concatenated strings, and so on.

Better Code for Integer Address Differences

When the compiler subtracts one address from another, the result must be interpreted as the number of objects (not necessarily bytes) lying between the addresses. Subtracting one character pointer from another automatically yields the desired result; but subtracting one integer address from another would yield twice the desired value if it were not scaled down by a factor of two. As an example, consider

		int *ip1, *ip2;
		main() {
			ip2 - ip1;
			}

The code produced by the expression is

					MOV BX,_IP1
					MOV AX,_IP2
					SUB AX,BX
		SWAP12  0	->	XCHG AX,BX
		GETw1n  1	->	MOV AX,1
		ASR12   0	->	MOV CX,AX
					MOV AX,BX
					SAR AX,CL

where the p-codes that perform the division by two are shown on the left, together with their respective values. Five assembly language instructions are used to shift the answer right by one bit to accomplish the division. This is clearly wasteful. How much better it would be to generate

					MOV BX,_IP1
					MOV AX,_IP2
					SUB AX,BX
		ASR12    1	->	MOV CL,1
					SAR AX,CL

instead. This improvement can be achieved easily by modifying down2() and the translation string for ASR12.

At present, down2() generates

		SWAP12  0
		GETw1n  1
		ASR12   0

The first two p-codes can be dropped if the translation string for ASR12 is changed appropriately. Currently, ASR12 translates through

		\011MOV CX,AX\nMOV AX,BX\nSAR AX,CL\n

which makes no use of the p-code's value. By changing this to

		\011?MOV CL,<n>?MOV CX,AX?\n??MOV AX,BX\n?SAR AX,CL\n

the translation will take into account whether or not the p-code's value is zero. If it is, then the shift count is assumed to be in AX as usual. However, a value of one will be taken as a constant shift count, and only

		MOV CL,1
		SAR AX,CL

will be produced. Besides dropping the first two p-codes generated in down2(), the value that goes with ASR12 must be changed to one.
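
In down2(), the change itself is a one-liner; in sketch form, something like

		gen(SWAP12, 0);
		gen(GETw1n, 1);
		gen(ASR12, 0);

becomes simply

		gen(ASR12, 1);		/* the value is now the shift count */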

Be advised that this revision of the ASR12 p-code usage and translation string conflicts with the advice given in Long Integers below. If you intend to carry out that project, then you should simply define a new p-code for use here.

Test this with code like that above. This relatively easy project can be used as a warm-up for the ones that follow.

Better Code for Logical AND and OR Operators

The code generated by the logical AND and logical OR operators can be improved somewhat. The expression

		i && j

where i and j are globals, generates

			MOV AX,_I
			OR AX,AX
			JNE $+5
			JMP _2
			MOV AX,_J
			OR AX,AX
			JNE $+5
			JMP _2
			MOV AX,1
			JMP _3
		_2:
			XOR AX,AX      <--
		_3:

and the expression

		i || j 

generates

			MOV AX,_I
			OR AX,AX
			JE $+5
			JMP _4
			MOV AX,_J
			OR AX,AX
			JE $+5
			JMP _4
			XOR AX,AX      <--
			JMP _5
		_4:
			MOV AX,1
		_5:

Notice in each case that an unnecessary XOR AX,AX instruction is generated. When control reaches these instructions, AX is already known to be zero. These can be eliminated by revising skim() appropriately. By testing the argument tcode for EQ10f or NE10f, it can be determined which of these instructions (GETw1n, where n == 0) should be eliminated.

In the first case, the resulting code

			...
			MOV AX,1
			JMP _3		<--
		_2:
		_3:			<--

can be improved further by eliminating the jump to label 3 as well as the label itself.

Likewise, in the second case

			...
			JE $+5
			JMP _4		<--
			JMP _5		<--
		_4:
			MOV AX,1
		_5:			<--

can be improved further by eliminating the two adjacent jumps to labels 4 and 5 as well as label 5. In this case, since MOV AX,1 occupies 3 bytes, just as JMP _4 did, the two-byte JE $+5 still transfers control directly to label 5, but without referring to the label. Eliminating the first of these jumps is complicated by the fact that dropout(), which is called from other contexts too, must be modified. Eliminating the unused labels is not a priority item since they have no effect on the final EXE file. They simply take up a little space in the ASM file.

Test this from the keyboard first. Then create an exhaustive test program. Remember to test all cases that call dropout(), not just the logical AND and OR. This is probably a nice semester-long project for someone who is new to Small C.

Consolidated Type Recognition

There are three places in the compiler where data declarations are parsed. Global declarations are parsed by dodeclare(), formal arguments are parsed by dofunction(), and local declarations are parsed by statement(). Each of these functions has essentially identical logic for recognizing the lead-in syntax to a declaration--the data type specification. They must each recognize

		char
		int
		unsigned
		unsigned char
		unsigned int

to determine one of four data types--CHR, INT, UCHR, or UINT. This redundancy is not good, but its cost is relatively slight now since so few data types are supported. However, if other data types are added to the compiler, then the cost in code size would become less acceptable. And, of course, there is the inconvenience of having to apply essentially the same revision to three different places.

To improve this situation, I suggest a project in which you consolidate this recognition logic into a single function (say dotype() ). Have it return one of the four data types on success and zero on failure. Then call the new function from the three places in question. Notice that dodeclare() has to handle the special case of extern declarations. Is that really a problem?
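
A minimal sketch of dotype(), assuming the amatch() token matcher and the type constants defined in CC.H:

		dotype() {
			if (amatch("unsigned", 8)) {
				if (amatch("char", 4))
					return UCHR;
				amatch("int", 3);	/* "int" is optional here */
				return UINT;
				}
			if (amatch("char", 4))
				return CHR;
			if (amatch("int", 3))
				return INT;
			return 0;			/* no type lead-in seen */
			}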

Testing is simply a matter of throwing all possible syntax variations at the compiler, verifying that it defines or declares items properly, and making sure that references to them see them properly--as integers or characters, signed or unsigned. This latter test will require devising expressions with operators that have different signed and unsigned versions.

This is basically a warm-up project. If you have learned how to recompile the compiler, this should take only a day or two.

Eliminating the While Queue

You may have wondered, as you studied dowhile(), dofor(), dodo(), and doswitch(), why they each maintain a local copy of their current entry in the while queue. The answer is that it makes referencing the fields in the entry more efficient. Addwhile() might have returned a pointer to the new entry in the global queue. These four functions could then save it in a local pointer and make their references by subscripting that pointer. But in that case, there would be a local reference (relative to BP) for the pointer and then a subscripted reference from the pointer to the item in the queue. So while it looks wasteful, it is not really all that bad. It runs faster and saves space in the code segment, but takes three more words of space on the stack.

But a more significant problem with the present arrangement is the fact that the global while queue restricts the nesting of loops and switches to 10 levels. Why not eliminate the while queue altogether and that limitation with it? Each function that creates a new entry in the queue (the four listed above) already has its own new entry allocated locally on the stack. So, why not let the stack frames that nest with these functions serve the purpose of the global queue?

The purpose of having a global queue is so that the break and continue statements can find the proper stack level, exit label, and loop label for the level of nesting at which they occur. Dobreak() calls readwhile() for a pointer to the lowest entry in the queue; and docont() calls it for the lowest level with a loop label (excludes switch entries).

This purpose can be served by having addwhile() and readwhile() work with the local wq[] arrays. Addwhile() does this already in addition to establishing a new entry in the global queue and loading it. It would simply need to have some code removed so that it only loads the local array. But readwhile() must be able to scan the queue backward. This can be accomplished by embedding a backward pointer chain in the individual wq[] arrays of the four do...() functions.

First, in CC.H, delete the definitions for WQTABSZ, WQSIZ, and WQMAX, define WQLINK with an offset value of zero, and increase the offset values of WQSP, WQLOOP, and WQEXIT by one each.

Then, in CC1.C, delete the global declarations for *wq and *wqptr. Add a global declaration for a head-of-chain integer pointer. This will be zero when there are no active loops or switches, and will point to the last wq[] when there are.

Delete the statement in main() that allocates space for the while queue. Modify addwhile() to load wq[WQLINK] (the current local array) with the global pointer, and then move the address of the current wq[] to the global pointer. This will establish the chain which ends with a null link. Next modify readwhile() to follow the chain rather than stepping backward in the global array. This would mean passing it the new head of chain pointer instead of wqptr in dobreak(). Modify docont() in a similar way. Then delete delwhile() and modify the four do...() functions to replace their call to delwhile() with an assignment of their own wq[WQLINK] to the global pointer. This will shorten the chain.
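
A sketch of the revised addwhile(), together with the replacement for the delwhile() call; wqhead stands for the new head-of-chain global, and the loading of the other fields stays whatever it is now:

		int *wqhead;			/* 0 when no loop or switch is active */

		addwhile(wq) int wq[]; {
			wq[WQLINK] = wqhead;	/* chain back to the enclosing entry */
			...			/* load WQSP, WQLOOP, WQEXIT as before */
			wqhead = wq;		/* this entry is now the innermost */
			}

		/* in each of the four do...() functions, replacing delwhile(): */
		wqhead = wq[WQLINK];

Readwhile() then follows the WQLINK chain from wqhead instead of stepping backward through the global array.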

The end result will be a slightly smaller compiler and one less restriction.

Be sure to think your way through this revision until you are convinced that it will work in every situation. Test this revision first by throwing various nested loops and switches at the compiler and verifying that they all continue and exit properly. Also verify that the stack gets adjusted properly. You will need to declare local variables within the nested blocks to check that out. Next compile a program, like AR.C on the distribution diskette, and verify that it runs. If there are problems in this patch, they will most certainly show up by the program going berserk. Finally, recompile the compiler as the acid test.

Nesting Include Files

As you may recall from the description of inline() in Chapter 22, only a single include file can be active at one time. That is because there is only one variable for holding the file descriptor of include files--input2. Doinclude() processes #include ... directives by simply opening the named file, assigning its file descriptor to input2, and killing the line containing the include directive. Thereafter, inline(), seeing that input2 is not equal to EOF, reads from it instead of input (the primary file's descriptor). When input2 expires, inline() resets it to EOF and reverts back to using input again.

So what happens when an included file contains an #include ... directive? That's right, input2 gets overlaid with the new include file's descriptor and the original include file is completely forgotten. In effect, two include files have been chained, just as MS-DOS batch files can be chained. In this case, however, it just happens to work that way.

Now, how can we make Small C honor #include ... directives properly? How can we make it resume with the first include file when the one it includes is finished? The two obvious approaches are (1) to replace input2 with an array of file descriptors, and (2) use recursion. Recursion is desirable because it would impose no limitation on the levels of nesting. But, as it turns out, Small C's front end is not designed for easy implementation of that idea. Therefore, I suggest the array approach.

You will need to declare a global array with as many levels as the number of active include files you wish to allow, say ten. Also declare a global subscript (say inest) into the array that designates the level of include nesting; you might initialize it to -1 so that it can be used directly as a subscript once it has been incremented. Anytime it is greater than or equal to zero, one or more include files have temporarily superseded the primary input file, and inest locates the current file descriptor.

Obviously, you will need to delete input2. Doinclude() will need revision to: (1) detect nesting overflow, (2) increment inest to the next nesting level, and (3) put the new file descriptor in the array. Inline() will have to be changed to: (1) test inest instead of input2 to determine whether or not an include file is active, (2) subscript into the array for the include file descriptor, and (3) decrement inest when the end of an include file is reached.
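
A sketch of the supporting data and the new doinclude() logic, with hypothetical names:

		#define INCLSZ 10		/* levels of include nesting allowed */
		int inclfd[INCLSZ];		/* descriptors of active include files */
		int inest;			/* current nesting level, -1 if none */

		/* in doinclude(), once fd holds the newly opened descriptor: */
		if (inest >= INCLSZ - 1)
			error("too many include files");
		else
			inclfd[++inest] = fd;
		kill();				/* erase the #include line, as now */

Inline() then reads from inclfd[inest] whenever inest >= 0, and decrements inest when that file is exhausted.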

These changes will endow the compiler with the ability to nest include files. By now, however, it has probably occurred to you that the logic in inline() can be streamlined further by not treating the primary file and the include files differently. Why not go all the way and eliminate input too? Let the first element of the descriptor array refer to the primary file, and the higher elements to the include files. In that case, inest would still be initialized to -1 to tell inline() that the primary file has not yet been opened. When it is greater than zero, an include file is to be used. Finally, another global variable eof can also be eliminated by letting a negative value in inest indicate the end of input text. You will need to use your text editor to search for all references to eof in the compiler and fix them up appropriately.

Study inline() and other front end functions to decide whether -1 in inest can serve both to indicate that the primary file is not open initially and to indicate the end of file condition. If not, then perhaps -2 can serve for the end of file condition.

Testing these changes should also be easy. You might try keyboard input initially, then verify with files that everything works properly. Again, try boundary conditions like null include files, primary file with nothing but an #include, and so on.

Void Argument Lists

A trend in the current movement toward a standardized C language is to use the keyword void in places where, by design, no value is to be found. This compiler already accepts void in front of function definitions to document the fact that the function is not supposed to return a value. But another use of the keyword is in the parentheses that normally contain the list of formal arguments. When that list is intentionally empty, the word void can be written to make it clear that the omission is by design.

The function to be modified for this feature is dofunction(). You will find there a loop controlled by

		while(match(")") == 0)

where the formal argument list is processed. All that is required is to make the loop conditional on not finding void. If void is found, then you must enforce the closing parenthesis. Consider using match() and need() in your new code. This project should be easy to test with keyboard input to the compiler.
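
In sketch form, the revised lead-in to that loop might read

		if (amatch("void", 4))
			need(")");		/* void list: enforce the ")" */
		else
			while (match(")") == 0) {
				...		/* existing argument processing */
				}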

Prototype Function Declarations

Another trend in modern C compilers is to have the compiler verify that the actual arguments passed to functions are of the correct type and number. Failure to properly pass arguments is probably the most frequent error committed by C programmers. So this enhancement goes a long way in helping programmers avoid wasted time.

In order to do this, of course, the compiler must know, at the point of a function's call, what arguments it requires. And, since C allows forward references, that information is not always available. In such cases, the compiler could not police the programmer's arguments.

To overcome this limitation, modern compilers support a feature called function prototyping. The idea is that a function may be declared early in the source file even though it may be defined later. The declaration would specify, in place of formal argument names, the types of the arguments. The result is a prototype (or template) of the way the function is to be called. Of course, the same device could also be used with external functions.

So, at two points the necessary information should be collected. If a prototype declaration comes first, then it specifies the number and type of the arguments. But if there is no prototype declaration, then the function's definition provides the information. If both are present, then the function's definition should be policed according to the prototype declaration. Finally, function calls are policed only if the information has been collected--whether from the prototype or the definition.

The first problem is to decide how to store the argument information. Obviously, it must either be in or tied to global symbol table entries. The problem is that, since the number of arguments is variable, a variable length list is needed. Therefore it would be best to have a pointer in the symbol table designate the head of a list of data descriptors which are stored in a buffer somewhere. The descriptors can be small non-zero numeric values, and a zero value could terminate the list. Byte strings would suffice. As it turns out, there are two unused fields in symbol table entries for functions--SIZE and OFFSET. The latter is already named appropriately for use as an offset into a buffer of argument types.

To implement the buffer, you will need to declare a global character pointer (say argtype) that points to the buffer, for which you must allocate memory in main(). In addition, you will need an integer (say argnext) to keep track of the next available position in the buffer. And, you will need a constant (say ARGTSZ) to specify the size of the buffer and designate its end. In addition, you will need a series of constants for the values that will be stored in the buffer.
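
The supporting declarations might look like this (allocation is shown with calloc(), on the assumption that main() obtains its other buffers the same way):

		#define ARGTSZ 2048		/* size of the argument type buffer */
		char *argtype;			/* pool of argument type strings */
		int argnext;			/* next free position in the pool */

		/* in main(), with the other allocations: */
		argtype = calloc(ARGTSZ, 1);
		argnext = 1;			/* skip [0] so no string starts at 0 */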

At this point you must decide how strictly you will police the programmer. If you are too strict, he will not want to use the compiler. If you are too easy, you will miss true errors. For instance, if an argument is declared to be a character variable, should an integer be acceptable? After all, characters are promoted to integers when referenced, and both appear as integers when they are passed on the stack. Why shouldn't the programmer be allowed to pass integers to a function that is written for characters and vice versa? On the other hand, isn't it likely that such a mismatch is an oversight?

One way to handle such decisions is to throw them back on the programmer, as many compiler writers do. Provide a command line switch that lets the programmer override the checking, or control its severity. Personally, I never use those options. They are too much trouble to remember; I prefer to have one way of invoking the compiler which I use routinely. Therefore, I recommend making the checks fairly lenient, and flagging only those errors that would most likely cause a program to blow up. After all, even with exact matching, you will not catch all of the programmer's errors. He can still pass the wrong integer as an integer argument, for instance. I suggest that you debate this issue and justify your decision.

In any case, I recommend defining the following symbolic constants

	ARG_C	1     /* character */
	ARG_CU	1     /* unsigned character */
	ARG_I	1     /* integer */
	ARG_IU	1     /* unsigned integer */
	ARG_CA	2     /* character array */
	ARG_CAU	2     /* unsigned character array */
	ARG_CP	2     /* character pointer */
	ARG_CPU	2     /* unsigned character pointer */
	ARG_IA	3     /* integer array */
	ARG_IAU	3     /* unsigned integer array */
	ARG_IP	3     /* integer pointer */
	ARG_IPU	3     /* unsigned integer pointer */
	ARG_F	4     /* function address */

where those symbols with the same values are accepted as a match. You can make the rules stricter simply by renumbering the symbols. Notice that arrays and pointers of the same type match; that is because arrays are passed as addresses on the stack and so are effectively pointers as far as the called function is concerned.

Now comes the hard part--revising the logic. To support prototype declarations, you must work on declglb(). At the point where it recognizes a function, it sets id to FUNCTION and calls need() to enforce the closing parenthesis. Between these, you must insert a loop that (1) recognizes a list of declarators without names--for example, (char, int *, unsigned, char [])--(2) deciphers them, and (3) stores one of the symbolic constants in argtype. In each case argnext must be incremented. And, at the end, a null value must be stored. Also, the initial value of argnext must be preserved so it can be fed to addsym() as the offset value (currently zero for functions). The loop must also verify that argtype does not overflow. Consider, in this loop, calling a function (say declarg() ) to perform the recognizing and storing operations for each argument. Not only will it keep declglb() cleaner, but you will have use for it shortly.

Now, you must perform a similar task in dofunction(). And here a decision must be made. Modern C compilers support a simpler way of declaring formal arguments than the method now supported by Small C. While still supporting the original syntax, they also permit functions to be written like

		func(char ch, int *ip, unsigned u, char ca[]) {
			...
			}

where each formal argument is written as a declaration. Now, we have just written a function declarg() that will recognize everything about these arguments except their names. It could be upgraded to either look for a name or not depending on an argument it receives. It could then be called from declglb() or dofunction(). Basically, this would involve adding to it the functionality found in the argument loop of dofunction() and in doargs(). To determine whether the old or new style syntax is being used, this function could return a success or failure indication. If it fails on the first argument, the old style syntax could be assumed. On success, the rest of the arguments could be parsed using the new function. I leave the details to you.

With this arrangement, the programmer would call for argument verification either by writing a prototype declaration or by using the modern syntax. In addition, however, the existing function doargs() could be extended to store information into argtype. That would cover all of the bases.

At the other extreme, the validation information could be collected only in prototype declarations. That would save a lot of revising of the compiler. It would also give the programmer a lot of control; but, it would require every validated function to have a prototype declaration. Since most C programs are written in a "forward reference" style, prototype declarations would be needed anyway, so not much additional work would be demanded of the programmer. You might first try gathering the validation data only in declglb(). Then, after you have argument validation working, decide whether also having dofunction() gather validation information is worth the trouble. Or you could do the new style of formal arguments as a separate project, and at that time gather validation information.

At this point you are ready to do the actual validating. There are two candidates for this operation. First, if a prototype declaration created a symbol table entry and collected validation information in argtype, then when the function is defined by dofunction() the compiler should complain about differences. And, of course, function calls which are handled by callfunc() should definitely verify arguments.

The verification process should simply be a matter of stepping down the function's string in argtype as each argument is parsed. In each case, if the identity and data type do not match the stored code, then error() is called to complain. If the end of the string is reached before the arguments run out, then too many arguments are being passed. Conversely, if unreferenced codes remain in the string after the arguments end, then too few arguments are being passed.
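
In sketch form, the check in callfunc() might run like this, where proto points into argtype (or is zero when no data was collected) and code is the ARG_ value derived from the actual argument just parsed:

		if (proto) {
			if (*proto == 0)
				error("too many arguments");
			else if (*proto++ != code)
				error("wrong argument type");
			}
		...
		/* after the closing parenthesis: */
		if (proto && *proto)
			error("too few arguments");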

If no validation data exists for a function, then no validating should be attempted. This can be detected by making the offset in the symbol table entry zero in that case. But be sure to skip the first byte in argtype so as to guarantee that the first function will be checked. Another approach would be to let the offset be an actual address instead of an offset into the buffer. Since all addresses are guaranteed by the compiler to be non-zero, that should work fine. In addition, it would reduce subscripted references to the buffer to simple pointer references.

Finally, there is the problem of how to gracefully handle functions which take a variable number of arguments. The simplest solution is to gather validation data only in prototype declarations, then avoid writing such declarations for those functions--that is, never validate them at all. Microsoft handles this by allowing the trailing argument in a prototype declaration to be written as ... . This tells the compiler that an unspecified number of additional arguments may follow. Of course, they could not be verified.

The error messages generated by this mechanism should be treated as warnings only; that is, the error message should be generated, but the compiler should function just as though no verifying was being done. The generated code should not be affected. If argtype overflows, then after complaining once, the compiler should simply verify those functions for which data was gathered and quit gathering data for the functions that remain.

As you can see, this is a sizable revision. And, for obvious reasons, I have left a lot of the design decisions and details to you.

In my opinion, this project is a bit much for a semester. It might be broken into two parts--support for prototype declarations with argument verification of function calls and definitions, and support for the new formal argument syntax.

Backward Jumps with Block Locals

The restriction that local declarations in nested blocks cannot coexist in the same function with goto statements can be relaxed somewhat. Backward goto references can be allowed. This can be done by having addlabel() save csp in the symbol table in either the TYPE, SIZE, or CLASS field, all of which are unused by labels. To distinguish between backward and forward references, another unused field could be used as a flag. Set the flag one way when the label is added to the table by dogoto(), and the other way when the label itself is parsed by dolabel(), regardless of whether or not the entry already exists. Then, before generating JMPm, have dogoto() generate ADDSP (passing it the saved value of csp).

If csp is saved in the symbol table in both cases, then a forward reference will produce an adjustment of zero (see gen() ) which will simply be ignored when ADDSP is translated to ASCII by outcode(). Therefore, there is no need to avoid generating ADDSP for forward references.
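
In sketch form, assuming the saved value went into the label's TYPE field:

		/* in dogoto(), once the label's table entry is at hand: */
		gen(ADDSP, ptr[TYPE]);	/* back to the label's stack level;
					   a zero adjustment for forward refs */
		...			/* then generate JMPm as before */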

If the reference is forward and if nogo is true (set by statement() because block locals were declared), issue an error message. In dogoto() the original error message must be eliminated, and nogo must be set only if the reference is forward. In declloc() the error message should be changed to better describe the violation.

Before installing this revision, think it through. Be sure you understand why ADDSP has to be generated, why it works with backward references, and why forward references cannot be allowed. And, as usual, convince yourself that the changes you plan will cover all the bases. This is probably a half semester project.

Word Alignment

Machines with 16 or 32 bit data buses are able to fetch or store word length objects with a single memory access provided the objects are aligned on word boundaries. However, words with odd addresses always require two accesses in machines with 16 bit buses, and require two accesses in about half the cases in machines with 32 bit buses. Obviously, a performance bonus can be obtained in these machines by ensuring that 16 bit objects fall on word boundaries in memory. Currently, the Small C compiler makes no effort to do this. Aside from declaring global pointers, integers, and integer arrays first, the programmer has no means of ensuring word alignment. He can do nothing about function arguments and locals.

Word alignment can be achieved easily by inserting an extra byte before each 16 bit object that would otherwise fall on a byte boundary. But there are three cases to consider--global objects, arguments, and local objects.

In the first case, a global integer (say glc, for global location counter) can be defined which is incremented by the size of every global object that gets defined. Then when an integer, an integer array, or a pointer (to integer or character) is about to be defined, glc is first checked. If it is found to be odd (low bit set), a byte is defined by calling

		gen(BYTEn, 0);

first. Put this fix in declglb() before it calls initials(). Don't forget to bump glc for the filler byte too.
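
In sketch form, where wordobj stands for a test of the identity and type just parsed, and sz for the object's size:

		/* in declglb(), just before initials() is called: */
		if (wordobj && (glc & 1)) {
			gen(BYTEn, 0);		/* one filler byte */
			++glc;			/* count the filler too */
			}
		glc += sz;			/* then count the object itself */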

Arguments are passed on the stack. Since all arguments are 16 bits long, the only thing to do is ensure that the stack pointer is on a word boundary before starting to evaluate the actual arguments in callfunc(). In this case, we already have a location counter, the compiler-relative stack pointer csp. If it is found to be odd on entering callfunc(), then call

		gen(ADDSP, csp - 1);

Gen() takes care of adjusting csp for you.

Now, notice that the last statement in callfunc() restores the stack to its original value, which it calculates as csp + nargs. That fails to take the alignment byte into consideration; so perhaps we should save csp on entry to callfunc(), then restore it from the saved value instead.
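
In sketch form:

		/* in callfunc() */
		int savecsp;
		savecsp = csp;			/* remember the true entry level */
		if (csp & 1)
			gen(ADDSP, csp - 1);	/* drop SP one byte to align it */
		...				/* evaluate and push the arguments */
		gen(ADDSP, savecsp);		/* restore; filler byte included */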

The last case deals with local declarations of integers, integer arrays, and pointers. The pertinent function is declloc(). This change is easy since the stage is already set. There is already a counter called declared which determines the stack frame offset for each local and also accumulates the total number of bytes allocated. When statement() finds its first executable statement in a block, it generates ADDSP to adjust SP for the number of bytes indicated by declared. So all that needs to be done is to find the place in declloc() after the object's identity and data type have been determined and before declared is adjusted to reflect the offset for the current object, and make an alignment adjustment if necessary. Just before

		declared += sz;

test id and type to make the decision. Then if alignment is desired and declared is odd, bump it by one. Notice that aligning locals this way only works if arguments are also aligned on the stack.
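
In sketch form, again letting wordobj stand for the word-length test:

		/* in declloc(), just before the adjustment: */
		if (wordobj && (declared & 1))
			++declared;		/* filler byte aligns this object */
		declared += sz;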

Since alignment is somewhat wasteful of space, you could define a no-alignment command line switch -NA to disable this feature at the programmer's discretion. Alternatively (or in addition) you could look for a defined symbol in the program (say NOALIGN) to override alignment. This has the advantage of allowing the programmer to specify once in a program that there is to be no alignment. Different choices can be made for different programs without the programmer having to remember at compile time.

As always, before making the source changes, convince yourself that the revision will work and that all the bases are covered. You should know, for instance, why the stack is on a word boundary on entry to a function and why declared contains zero at the same point. But do not stop there. What about entry to inner blocks? Is declared zero? Is the stack on a word boundary? If not, how can you guarantee that it will be? (Hint: Look at the point where statement() uses declared.)

Testing this revision is a bit more difficult than most of the others. If your changes do not work, test programs will most likely run anyway--just slower than they should. First, use keyboard entry to catch the gross errors. Then write a small test program that exercises all three cases (globals, arguments, and locals), with and without alignment adjustments. Compile the program, inspect its ASM file, and run it to verify that it has not been corrupted. Finally, obtain a linker map of the program so you know where the globals and functions are. Verify that the globals are properly aligned. Then execute the program with DEBUG, or other debugger, and verify that references to locals generate even numbered addresses. Be sure to write code in your program that will definitely force some word length objects (in each of the three cases) to fall on an odd boundary without alignment; after all, you could walk away satisfied, having verified nothing but happy coincidences.

One final consideration should not be overlooked. The memory allocation routine _alloc() in CSYSLIB.C should be modified to ensure that each dynamically allocated block of memory begins on a double word boundary. That way, the user has the option of improving efficiency by organizing heap memory structures in the order: double words, words, bytes.

This would probably be a nice semester project.

Restarting the Optimizer Loop

In Chapter 27 it was pointed out that each time peep() applies an optimization, the list of potential optimizations is rescanned from the beginning, in an effort to improve the optimized code. It was also noted that it would be more efficient to arrange the optimization cases so that optimized code is fully covered by optimizations that follow in the sequence, allowing the optimizing loop in dumpstage() to simply continue on with a single pass through seq[]. The justification for restarting the loop was given as ease of maintaining the compiler's optimizer. In this project, you should remake that decision for yourself. Find out how much performance is improved by not restarting the loop and balance that against the difficulty of keeping the optimization cases in sequence so that optimizations are not lost on optimized code.

First, compile a version of the compiler with DISOPT defined in CC4.C. Then devise a test program which forces every possible optimization case. This will be used to verify that optimizations are not being missed. Also acquire some typical nontrivial programs for use in testing performance under realistic circumstances. AR.C is a good choice; you could even use the compiler itself.

Compile the first program and note which optimizations were applied. Revise the program until you have forced every one to occur. This will be a learning experience. Are there any cases that cannot be forced?

Now, time the compiles of the conventional programs and make a record of the compiler's speed.

Next, reorder the optimization cases. This can be done either by renumbering the seq##[] arrays, or by changing the way seq[] is initialized in setseq(). Which is easier? Then, revise dumpstage() so that its inner loop is not restarted after successful optimizations. All of this can be done by working with CC4.C alone. If you have already recompiled the compiler and saved the four OBJ files, then only CC4.C needs to be recompiled and assembled.

Now compile the special test program to verify that all of the optimizations are still taking. If not, revise CC4.C again. Is it always possible to find a sequence that covers every case? Could a cyclical situation develop?

Now, time the revised compiler on the conventional test programs. Is it faster? How much faster? Is it worth the trouble?

This is probably a semester-long project.

Stack Probes

One of the risks associated with CPUs that have stacks is that the stack may overrun its allotted space. Since the amount of space needed by the stack depends on the program's algorithm and the data upon which it operates, it is usually not possible to predict with accuracy exactly how much stack space is needed. Recursive algorithms are especially susceptible to erratic, data dependent stack behavior. Of course, if the stack is allowed to overflow, it will corrupt whatever lies above it (toward lower numbered addresses) in memory. And, as a result, unpredictable program behavior will likely occur.

To prevent this, some compiler writers implement a stack probe routine which, on entry to every function, checks to make sure that some reasonable amount of stack space remains. Since this necessarily involves a penalty in performance, it is implemented as a compile-time option. Typically, a program will compile with stack probes during testing, then when the programmer thinks the program is robust, he does the production compile without the safety net.

Since programs are usually compiled many times for just one production compile, stack probing should be a default which can be overridden with an -NSP switch, for example.

Two things are needed to implement stack probes--a tight little library routine for doing the checking, and a change to dofunction() to strategically place a call to the routine in each function. As it turns out, there is already such a function in the Small C library. It is called avail(). When called with a true argument, if the stack has overflowed, the program aborts with an exit code of 1. While the needed functionality is there, this routine probably carries too much baggage for something to be called with every function call.

I suggest using avail() as a pattern for another function, _probe() for instance. This function would be passed no argument and would return no value. But it would check the stack and abort with an exit code of, say, 3 (an unused Small C disaster code); this value would distinguish stack overflows from other insufficient memory problems.

Before leaving avail(), notice how it determines the value of SP. It declares a local character then takes its address with the address operator (&). Since the character is the last thing on the stack, its address is the value of SP. Looking at this, it is apparent that a more efficient way to do this would be to use the address of the argument abort instead. That would avoid the need to adjust SP to allocate the local variable. But it would also introduce an error of four bytes in the checking since the return address and original BP are "above" (lower numbered address) it on the stack. Is that error significant? Should avail() be revised?

Notice that you could declare a formal argument for _probe() even though no actual argument is passed. After all, the only thing you want is the address of where _probe() thinks the argument is. It does not actually have to be there.
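
Putting those observations together, a C version might look like this, where heaplimit is a stand-in for whatever global avail() actually compares against:

		_probe(dummy) int dummy; {	/* no actual argument is passed */
			if (&dummy <= heaplimit)	/* &dummy is nearly SP */
				abort(3);	/* unused disaster code: overflow */
			}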

With these modifications, a lean _probe() function can be written. For the ultimate in efficiency, however, you could write _probe() in assembly language. Then you could avoid the push of BP and its restoration on exit. You could also directly refer to SP. And, calling abort() with an argument of 3 is just a matter of

		MOV AX,3
		PUSH AX
		CALL _abort

in assembly language.

Looking now at dofunction(), we must decide where to generate the call to _probe(). It could go just after generating ENTER which saves BP and then sets it for the new stack frame. At that point, if -NSP has not been specified, generate the call. Do you have a p-code for calling a function that is not in the symbol table? Should you put _probe() in the symbol table when -NSP is not given, or should you define another p-code for calling literal labels?

Note: If you should decide to define another p-code, don't forget to recompile all of the parts of the compiler that may be affected.

Finally, since Small C allocates all locals in a block at one time, wouldn't it be better to wait until the locals are declared, then probe the stack? That way account is taken of the amount of local space used by the function. If that is done, then statement() should generate the call. But the code in statement() that allocates locals executes within every nested block. Should care be taken to ensure that _probe() is called only the first time in a function, or should you call _probe() every time a nested block is entered? Or perhaps, the first time for sure, then subsequently only if additional locals are being allocated? How accurately should the stack be checked anyway?

Testing this revision could be done with an infinite recursive function that calls avail() with each nesting level, in order to report progress by displaying the amount of free memory remaining. You can define a moderate size local array to gobble up stack space in reasonable chunks. Don't forget to test your revision to ask() which handles -NSP.

If done thoughtfully, this is probably a semester project. The decisions made along the way should be justified.

Long Integers

One of the most obvious enhancements to Small C is support for additional data types. Logically, the first step in that direction is to implement long integers. This would break the ice and set the stage for other more complex data types.

This step is a very major undertaking, affecting virtually every part of the compiler--one that I shall not, therefore, describe in detail.

Before starting this project, you should do Consolidated Type Recognition (described earlier). That will create a single function dotype() for parsing data type specifications. Having done that, you will need to revise that function to accept

		long
		long int
		unsigned long
		unsigned long int

in addition to the types it already recognizes. For these cases it should return LONG or ULONG, which should be defined with

		#define LONG (BPW << 3)
		#define ULONG ((BPW << 3) + 1)

in CC.H. This specifies the length (4) in the upper 14 bits and differentiates between the two in the lower two bits.

Next, dodeclare(), dofunction(), and statement() must be revised to accept the new data types and properly represent them in the symbol table.

A new p-code will be needed for defining longs with the DD (define double word) assembler directive.

The primary register will need to be extended to take in DX for the high order 16 bits of long objects. It would consist of DX,AX together, in that order. Similarly, the secondary register must also be extended as CX,BX.

In memory, the low order word should be stored first (with the lowest address). If the Word Alignment project has been performed, then it must be modified to recognize long types and to provide double word alignment in their cases. The choice of double (instead of single) word alignment would not improve performance on machines with 16 bit buses, but it would on those with 32 bit buses.

Function arguments and return values must be generalized to allow the passing and returning of double word values. When passing a long argument, callfunc() must generate

		PUSH DX
		PUSH AX

instead of a single push. This can be handled easily by having PUSH1 make use of its value to differentiate between single and double word pushes. Make use of the ?...?...? device in the translation string. This suggests a parallel revision to POP2 that will produce

		POP BX
		POP CX

Dofunction() can no longer assume that all functions return integers. It must set the TYPE field in the symbol table to whatever type specification precedes the function name. This in turn will require a rewrite of parse() which now assumes it has a function definition when the type lead-in syntax is not seen by dodeclare(). Doreturn() must be revised to return the correct type of value, and to convert the return value to the correct type if necessary.

Until now, conversion of data from one type to another has not been a problem for Small C. Now it must be done, not only here but during expression analysis when binary operators are applied by down2(). See Kernighan and Ritchie[10] for details about the rules for data conversion.

Long objects must be fetched and stored like other objects, so fetch() and store() must take them into account. New p-codes will be needed for these purposes too.

This may seem like a lot of changes, but the largest one by far is the revision to the expression analyzer. Double length constants should be supported, so is[CV] will have to become a sequence of two entries, and every reference to it must take that into account. Number() must be modified to recognize long constants and return their data types correctly.

Test() must be able to test double length expression results. This also will require additional p-codes.

A major decision regarding revisions to p-codes must be made. How should precision be reflected in p-codes? The options appear to be (1) define additional p-codes for the new double word precision, (2) associate with each p-code an additional value that tells the precision, and (3) take advantage of the unused value that many of the affected p-codes already have. Option (1) is not good because it forces additional decisions to be made in the analyzer when a specific p-code must be chosen, and it multiplies the cases that the optimizer must deal with. Option (2) is bad because it requires 50% more space for the staging buffer. And, option (3) cannot be applied uniformly because some multi-precision p-codes already use their values for other purposes.

A compromise solution might be to use the unused high order bits of the p-code itself. P-codes are treated as 16 bit values throughout the compiler, and the high order byte is always zero. That byte could be superimposed with the precision of the operation at the time the code is generated in the staging buffer. Care must be taken to avoid a conflict in the use of these same bits by the optimizer commands. This can be handled by using the high order nibble for the precision. The recognition phase of peep() could mask off those bits when it performs its matching operation. The optimizing phase must also be revised so that it will know what precision to apply to literal p-codes that it writes into the buffer.
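
One way to picture the device, with hypothetical names:

		#define PRECMASK 0xF000		/* high nibble holds the precision */
		#define P_LONG   0x1000		/* marks a double word operation */

		gen(code | P_LONG, value);	/* generating a long operation */

		if ((pcode & ~PRECMASK) == target)	/* matching in peep() */
			...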

Finally, there is the problem of how to translate p-codes according to their precision. Two approaches come to mind: (1) let the precision designate a substring (null separated) in the translation string, and (2) add the precision to the p-code proper to derive the subscript into code[]. Approach (1) has the disadvantage that translating p-codes will be slowed down by having to bypass unwanted substrings. Also, long strings will result; but that can be dealt with easily if the Continued Character Strings revision has been installed in the compiler. Approach (2) would require that you leave gaps in the p-code numbers (as they are assigned in CC.H) to account for the higher precision cases. The nuisance of this can be reduced by separating these p-codes from the others and placing them at the end of the list with appropriate comments.

These methods of dealing with the proliferation of p-codes need more thought. But, perhaps these ideas will help guide you in the right direction. One last point to think of is that your solution should yield gracefully to additional precision data items. Having taken this step, they will certainly follow.

The analyzer will have to be scrutinized thoroughly to ensure that objects of different precision will be properly handled. Also, numeric constants larger than 32767 should be converted to double length values, instead of being designated unsigned as they are now.

It should be obvious that provision must be made to pass the data type to gen() so that it can be properly associated with the p-code. Where does the data type originate? That's right, in the symbol table where primary() finds it. Fortunately, it looks as though primary() requires little revision, if any.

The place where unmatched data types will have to be reconciled is in down2(). Before applying a binary operator, down2() must generate code to convert the lower precision data type to the larger precision one if they differ. And it must make sure that the precision is properly passed on to gen(). In planning for the generation of type conversion code, look toward the possibility of doing The Cast Operator project below.

Down2() will have to scale values added to or subtracted from long addresses by four instead of two. This implies changes to double(). It must likewise generalize the way it scales the difference of two addresses.

Adding longs can be performed by generating in-line code like

		ADD AX,BX
		ADC DX,CX

where the second addition includes the carry from the first one. Likewise,

		SUB AX,BX
		SBB DX,CX

will handle long subtraction.

Multiplication and division are not so easy. In these cases, you will need to write library routines which are declared external only if they are actually used. Morgan[7] gives examples of multi-precision arithmetic routines from which you might get some ideas. Morgan's routines are more generalized than you will need.

Finally, the run time library will need to be upgraded to deal with long integers. The library presently contains 12 functions that convert numbers between integer and ASCII string representations (signed decimal, unsigned decimal, hexadecimal, octal, and any base). For each of these, there should be a corresponding long integer equivalent. The new functions should be named atol(), atolb(), dtol(), ltoa(), ltoab(), ltod(), ltoo(), ltou(), ltox(), otol(), utol(), and xtol(). Having written these, they can be used in revising printf(), fprintf(), scanf(), and fscanf() to handle conversions between ASCII and long integers. This really involves only two functions, since the f...() version of these functions uses the same logic as the other version.

With support for long integers installed, the stage will be set for other data types. At that point Small C will be very nearly a mature compiler.

Obviously, this project should not be attempted unless you already have a comprehensive understanding of all parts of the compiler. It might do for a graduate-level project.

The Cast Operator

The Long Integers project put the compiler in the business of performing data conversions and set the stage for the cast (or coercion) operator. A cast is really just a data conversion on demand. The idea is that by writing a data type in parentheses before any (sub)expression, the result is "coerced" into the designated type. The hard part was done in the Long Integers project. It should only be necessary here to have level13() recognize

		( Type )

as a unary operator. It must not, however, confuse a cast's open parenthesis with a precedence-altering parenthesis; the two can be distinguished by checking for a valid type specification following the opening parenthesis.
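
A minimal sketch of that check follows; the helper name and keyword list are mine, and the real test would use the compiler's own keyword machinery:

	#include <string.h>

	/* Report whether a token names a type, which is how a cast's
	   "(" can be told from a precedence-altering "(". */
	int is_type(char *tok)
	{
		static char *types[] = { "char", "int", "unsigned", "long", 0 };
		int i;
		for (i = 0; types[i]; ++i)
			if (strcmp(tok, types[i]) == 0)
				return 1;
		return 0;
	}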

Since this small project depends on the Long Integers project, it could be used to supplement that project for extra credit or to help fulfill the requirements of a graduate project.

Floating Point Numbers

This project too could be viewed as a follow-on to the Long Integers project. But this one is hefty enough to serve as a separate assignment.

The first question to decide is what internal representation will be used for floating point numbers. With Intel's introduction of the 8087 family of numeric data processors (NDPs), any floating point compiler that targets the 8086 family of processors should use the NDP if one is present. And if one is not present, the necessary operations should be simulated in software by library functions that provide comparable precision. I leave to you the task of obtaining specific information about the Intel NDPs and algorithms for manipulating floating point numbers. Suffice it here to say that doing the necessary research and writing the library routines for addition, subtraction, multiplication, and division is a significant project.

Having done that, you must then make the compiler recognize and properly handle floating point numbers. This is basically a matter of dealing with the same issues you encountered in the Long Integers project. The difference is that the major design decisions (about p-codes, etc.) have been made and a framework now exists for handling various data types. You will only be expanding on what is already in place.

Other Ideas

I could go on endlessly, describing other projects based on the Small C compiler. But at this point I will close by simply listing other ideas that come to mind.

  1. Add support for structures and unions. For specific information on these, see Kernighan and Ritchie[10]. This, together with the preceding projects, would essentially elevate Small C to full C status.
  2. Add fseek() and ftell() to the library. There are already nonstandard versions of these functions that work with integer offsets. But after implementing longs, these standard functions should be added. Use the existing seek and tell functions as patterns.
  3. Enhance the Small C run-time library with sprintf() and sscanf(). These versions of printf() and scanf() work with character strings instead of files. Pattern your functions after those in a professional package like Microsoft C.
  4. Enhance the Small C run-time library with a set of functions that give full access to the facilities of the operating system. Pattern your functions after those in a professional package like Microsoft C.
  5. Enhance the Small C run-time library with a full set of math functions. This is an appropriate project to follow the implementation of floats. Pattern your functions after those in a professional package like Microsoft C.
  6. Upgrade the sophistication of the Small C memory management routines to support the deallocation of memory blocks in any sequence. Reallocate embedded blocks of free memory before enlarging the heap. Combine adjacent blocks of released memory into a single block so that future requests are more likely to be satisfied from embedded space; a sketch of this coalescing step follows the list.
  7. Have the compiler put the stack in a segment of its own so that the data segment can have all 64K bytes if it needs them. This is not a difficult thing to do. The start-up routine in the CALL.ASM module will have to be revised, as will the stack probe logic (above).
  8. Provide support for multiple memory models. Read the documentation for a professional C package, like Microsoft C, and follow their terminology and segment naming conventions. This should follow the Long Integers project since that project sets the stage for handling double length pointers. Pointers of two sizes must be handled--near pointers (single word) and far pointers (double word). Having done this, you can compile Small C in a larger memory model to open it up for further development.
  9. Provide support for arrays of pointers. This can be done by fixing the symbol table to handle the additional information. But this is really only part of the general problem of having the compiler deal with multiple declaration modifiers. It should, for instance, be able to handle pointers to pointers. To do this properly, the declaration handling logic will have to be expanded into a kind of expression analysis, and symbol table entries must be tied to variable length chains of attributes.
  10. Revise the preprocessor to support nesting of comments. This is a relatively simple, self-contained change. Take your ideas from the method the preprocessor already uses to handle #if... nesting.
  11. Add support for local data initializers. Instead of always waiting for the first executable instruction in a block before allocating space for locals, allocate pending locals when an initializer is found, then evaluate the initializing expression and push the result onto the stack as the current object. Whereas global initializers must be constant expressions, local initializers are not so restricted. Originally, C did not support initializing local arrays; current compilers, however, tend to allow it. Is this something you want to support? Don't forget that the element values must be pushed from right to left so that the first element will have the lowest address. Also, uninitialized elements should be set to zero.
  12. Port the compiler to other operating systems on the same family of Intel processors. UNIX, XENIX, MINIX, and ZINU are possibilities. Essentially, this should be a matter of finding a suitable assembler and adapting the run-time library to the new operating system calls.
  13. Port Small C to another processor. This involves all of the issues of the previous project plus having to reshape the output code to the demands of a different processor. The assignment of the primary and secondary registers to physical registers must be decided. Problems will arise from different word lengths, different storage sequence for multi-precision data items, and so on.
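
To make the coalescing step of idea 6 concrete, here is a hedged sketch; the block header layout and names are assumptions, not Small C's actual heap format. Free blocks are assumed to be kept on a list sorted by ascending address, so two blocks are adjacent when one ends exactly where the next begins.

	struct block {
		struct block *next;	/* next free block, by ascending address */
		unsigned size;		/* total size of this block in bytes */
	};

	/* Merge each free block with its neighbor whenever the two are
	   contiguous in memory. */
	void coalesce(struct block *list)
	{
		struct block *p = list;
		while (p && p->next) {
			if ((char *)p + p->size == (char *)p->next) {
				p->size += p->next->size;	/* absorb the neighbor */
				p->next = p->next->next;
			} else
				p = p->next;			/* not adjacent; move on */
		}
	}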
