CHAPTER 14:

COMPATIBILITY WITH FULL C

The differences between Small C and full implementations of the language are important to programmers who anticipate using a full C compiler in the future. No one wants to write programs that will be hard to convert. This upward compatibility is virtually assured by the fact that Small C is a subset compiler. On the other hand, for the same reason, downward conversions can be expected to be difficult, or even out of the question.

This chapter presents these differences so that you will know how to write programs that easily convert to full C and so you can estimate the difficulty of porting full C programs to Small C.

The Big Differences

Of course, the major differences between Small C and full C are the features of the C language that Small C does not support. These differences have no effect on upward portability of programs, but can make downward portability either difficult or unrealistic.

Small C's most significant limitations are on the data types it supports--integers, characters, pointers, and single dimension arrays of integers or characters. Clearly, full C programs which make use of unsupported data types are not candidates for conversion to Small C.

The next most significant limitation is the lack of support for structures and unions. A structure is a collection of objects of any type; other languages often call them records. A union is an object that has multiple declarations; that is, a single piece of memory that can contain any one of several types of data at different times. These limitations are less significant than the previously mentioned ones because they impose no hard limits on the language in terms of its usefulness. The use of structures and unions is never essential, although it may seem so to programmers who depend on them. We can always declare a character array and place in it whatever kinds of data or collections of data we wish. The Small C symbol table is an example (Chapter 20). There is no fundamental limitation to this technique, although it is less convenient than we might desire.

Full C programs which use structures can be ported down to Small C, but with some effort. First, the structures must be replaced with arrays of sufficient size. Then, if mixed character and integer data are to coexist in an array, functions (like getint() and putint() in the compiler) could be written to simplify accessing the array's contents. Finally, a routine search of the program should be performed to find and convert all of the references to the redefined structures. This last step is made easier by using the search facilities of a good text editor.
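
As a rough sketch of the conversion just described, the fragment below keeps a record made of a 16-character name and a 16-bit integer in a plain character array. The layout constants (NAME_OFF, NAME_LEN, VALUE_OFF, REC_SIZE) and the helpers put_int16(), get_int16(), and set_record() are hypothetical names chosen for illustration; they play the role that getint() and putint() play in the compiler but are not copies of them. The code is written in full C, with prototypes that Small C itself would not accept, and it assumes the integer is stored low byte first.

```c
#include <string.h>

/* Hypothetical layout replacing: struct { char name[16]; int value; } */
#define NAME_OFF   0
#define NAME_LEN   16
#define VALUE_OFF  16
#define REC_SIZE   18

/* Store a 16-bit integer at offset off, low byte first. */
void put_int16(char *rec, int off, int value) {
    rec[off]     = (char)(value & 0xFF);
    rec[off + 1] = (char)((value >> 8) & 0xFF);
}

/* Fetch a signed 16-bit integer from offset off. */
int get_int16(const char *rec, int off) {
    int lo = rec[off] & 0xFF;
    int hi = rec[off + 1] & 0xFF;
    int v  = (hi << 8) | lo;
    return (v & 0x8000) ? v - 0x10000 : v;   /* restore the sign */
}

/* Fill in one record: a null-terminated name followed by the value. */
void set_record(char *rec, const char *name, int value) {
    strncpy(rec + NAME_OFF, name, NAME_LEN - 1);
    rec[NAME_OFF + NAME_LEN - 1] = '\0';
    put_int16(rec, VALUE_OFF, value);
}
```

Every reference of the form rec.value in the original program would then become a call such as get_int16(rec, VALUE_OFF), which is the kind of change a text editor's search facility can help locate.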

Undeclared Identifiers

During expression evaluation, the Small C compiler assumes that any undeclared name is a function, and automatically declares it as such. If the reference is followed by parentheses, a call is generated; otherwise, the function's address is generated by reference to the label bearing its name. If the same function is defined later in the program, the label for the function is generated. If, on the other hand, it is not defined, then Small C automatically declares the name as an external reference, to be resolved at link time. This arrangement makes it unnecessary to declare a function before referring to it.

Some full C compilers assume this only if the name is written as a function call (with parentheses). Others gripe even in those cases, and require that we either avoid forward references or predeclare functions that are defined later in the program.

How we handle this difference is largely a matter of preference. We might wish to postpone predeclaring function names until we actually convert to full C. Then, after the new compiler complains, insert the necessary function declarations at the front of the source file. That way we let the compiler hunt down the problem function names for us.
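
A minimal sketch of what the converted source might look like follows. The function names twice() and quadruple() are invented for the example, and the declaration uses a full C prototype, which Small C would not accept; under an older full C compiler the declaration would simply read int twice();.

```c
/* Declaration inserted at the front of the file for full C. */
int twice(int n);

/* Refers to twice() before its definition appears. */
int quadruple(int n) {
    return twice(twice(n));
}

/* The definition that follows the forward reference. */
int twice(int n) {
    return n + n;
}
```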

Function Names as Arguments

Small C accepts

		int arg

to declare a formal argument which points to a function, and

		arg (...)

to call the function. This is because Small C is not particular about what sort of expression precedes the parentheses that specify a function call. It simply evaluates the expression and uses the result as the offset in the code segment to the desired function.

Since function addresses are actually passed as pointers, a better syntax, which is compatible with full C, is:

		int (*arg)()

and

		(*arg)(...)

respectively. This syntax should be used to maintain compatibility with other compilers.
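
The following sketch shows the compatible syntax in use. The names sum_of() and square() are hypothetical, and the parameter types inside the second pair of parentheses are a full C prototype refinement that goes beyond the int (*arg)() form shown above.

```c
/* Apply fn to each of the n elements of v and return the sum. */
int sum_of(int (*fn)(int), int v[], int n) {
    int i, total = 0;
    for (i = 0; i < n; ++i)
        total += (*fn)(v[i]);    /* explicit indirection through the pointer */
    return total;
}

/* A function whose name is passed as an argument. */
int square(int x) {
    return x * x;
}
```

A call such as sum_of(square, v, 3) passes the address of square(), exactly as the text describes.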

Indirect Function Calls

As we saw above, any expression followed by parentheses is taken by Small C as a function call; whereas, full C accepts only primary expressions based on a function name--like (*func) or (*fa[x]). Small C is not so particular, however. For instance, it will accept

		ia[x](...)

to call a function whose address is found in element x of an integer array ia. This would not be accepted by a full C compiler because ia[x], not being in parentheses, is not a primary expression and does not yield a function type. Small C will also accept this call rewritten as

		(*ia[x])()

This is as far as Small C can go since it does not know about pointer arrays. By writing the call this way, only the declaration of the array will need to change when the program is ported to full C.
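
After such a port, the result might look like the sketch below, in which the array is declared as an array of pointers to functions while the call keeps the form used in the Small C source. The names inc(), dec(), fa, and dispatch() are hypothetical, and the prototypes assume a full C compiler.

```c
int inc(int x) { return x + 1; }
int dec(int x) { return x - 1; }

/* Full C declaration: an array of pointers to functions. */
int (*fa[2])(int) = { inc, dec };

/* Call the function whose address is in element x of fa. */
int dispatch(int x, int arg) {
    return (*fa[x])(arg);    /* same call syntax as before the port */
}
```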

Argument Passing

Small C differs from full C compilers in the way arguments are passed to functions. It pushes them on the stack from left to right as they appear in the source code, whereas full C compilers push them in the opposite direction. Under most circumstances this makes no difference. But, as it turns out, full C compilers have a good reason for doing it "backwards."

The functions printf(), scanf(), and their derivatives are written to accept any number of arguments. By passing arguments in the reverse direction, full C compilers guarantee that the first argument will always be in a predictable position in the called function's stack frame--immediately beneath the return address. By having that argument indicate how many other arguments are being passed, the function will know how many arguments to process and where to find them--immediately beneath the first argument. (Refer to these functions in Chapter 12 to see how the first argument does this.) Of course, Dennis Ritchie could have designed the first C compiler to push arguments in the "obvious" order and had these functions accept the control argument in the rightmost position; but that would have seemed unnatural to programmers, so he did the right thing by keeping the language natural and hiding the complexities within the compiler. When Ron Cain wrote the first Small C compiler, his run-time library did not include these functions and he did not envision it ever growing to that point; so compatibility in argument passing was not an objective.

At any rate, when I installed printf() and scanf() in the Small C library, I chose to leave the argument passing algorithm unchanged and have the compiler pass an argument count to the called function so it could locate the left-most argument. The count is passed in CL, the lower half of the CX register. It can be obtained by calling the built-in function CCARGC() early in the called function--before CL gets changed. (See printf() and scanf() in the library (Appendix D) for examples of using CCARGC(). These are found in files FPRINTF.C and FSCANF.C respectively.)

So, what does all of this mean to the programmer? It means that, except for one consideration, we can write portable programs even though they call functions that take a variable number of arguments. The exception is the order in which the arguments are evaluated. Since they are passed from left to right, they are also evaluated in that order. This means that argument expressions may affect the value of arguments to their right by performing assignments, increments, or decrements, whereas most full C compilers would have the left-most arguments affected by those on their right. The best thing to do here is avoid writing function arguments with values that depend on the order in which arguments are evaluated. Also, when porting programs down to Small C, be on the lookout for this very subtle problem. We can always break down such argument expressions so that the assignments are performed by expressions which precede the function call. This problem was discussed with examples under Argument Passing in Chapter 8.
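
As an illustration of breaking down argument expressions, consider the sketch below. The functions next(), diff(), and portable_diff() are hypothetical. A call written as diff(next(&n), next(&n)) would yield -1 when arguments are evaluated left to right and 1 when they are evaluated right to left; moving the side effects ahead of the call removes the dependence.

```c
/* Return the current value of *p, then advance it. */
int next(int *p) {
    return (*p)++;
}

int diff(int a, int b) {
    return a - b;
}

/* Portable form: the side effects are completed before the call. */
int portable_diff(void) {
    int n = 10;
    int first  = next(&n);    /* first  is 10 */
    int second = next(&n);    /* second is 11 */
    return diff(first, second);
}
```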

Another difference surfaces when we wish to write a function that takes a variable number of arguments. With Small C, we must either place the control argument last, or call CCARGC() to locate it. In either case, the logic will have to be revised whenever the program is ported to another compiler.
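
For comparison, here is a sketch of how such a function is commonly written for a full C compiler that provides the standard header <stdarg.h>, a facility this chapter does not otherwise cover. The function name sum_args() is hypothetical. The control argument, a count, comes first, and the remaining arguments are fetched one at a time.

```c
#include <stdarg.h>

/* Sum count integers that follow the control argument. */
int sum_args(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);             /* begin just past the control argument */
    while (count-- > 0)
        total += va_arg(ap, int);    /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

A Small C version of the same function would instead take the count last or call CCARGC(), so this is the logic that would need revising during a port.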

Returned Values

Small C functions return only integer data types; whereas, full C compilers support functions that return any data type. Although this may seem restrictive, that is not the case.

For one thing, since characters are automatically promoted to integers wherever they appear in expressions, there is no practical difference between a function that returns an integer and one that returns a character that gets converted to an integer. For another, Small C does not check the type of value a function returns. So a function might in fact return a character with a statement like

		return (ch);

In such a case, the character gets promoted to an integer when the return expression is evaluated, rather than at the point of the call. Whatever the return expression yields, it will be a 16 bit value. The compiler doesn't really care whether it is a character, an integer, or a pointer. However, if the value of a function enters into operations with other operands, Small C will consider it to be an integer, whereas full C compilers take returned values to be of the declared type.

Evaluation of Assignment Operands

Small C evaluates the left side of assignment operators before evaluating the right side. This means that variables (like subscripts) used in determining the destination of assigned values are not affected by the right side of the expression. Many full C compilers, however, evaluate the right side first, thereby allowing the right side to influence the destination.

We should avoid writing expressions in which assignments, increments, decrements, or function calls on the right of an assignment operator affect objects that are used in determining the destination of the assignment.

Because they have been conditioned by other languages, most programmers would tend not to write expressions that violate this rule anyway. But it is possible in the C language, and the problems it can produce are particularly devious.
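
A small sketch may make the hazard concrete. In the hypothetical functions below, an assignment written as buf[*n] = take(n) would store into a different element depending on which side the compiler evaluates first. Fixing the destination in a separate statement gives the same result everywhere; here it reproduces the left-side-first behavior described for Small C.

```c
/* Return the current value of *p, then advance it. */
int take(int *p) {
    return (*p)++;
}

/* Portable form: choose the destination before evaluating the right side. */
void store_next(int buf[], int *n) {
    int slot = *n;           /* destination is fixed here */
    buf[slot] = take(n);     /* the right side can no longer move it */
}
```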

Octal Escape Sequences

Small C allows only the digits 0-7 in an octal escape sequence such as '\127'. Although this is consistent with Microsoft C and Turbo C, some full C compilers also accept the digits 8 and 9 to which they give the octal values 10 and 11. Since Small C is more restrictive, this difference presents no upward portability problems. There could only be a problem when converting programs written for full C compilers that accept this strange notation. And, even then, we would probably never see an octal number written in this manner.
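
To make the arithmetic explicit, '\127' has the value 1*64 + 2*8 + 7 = 87, which is the ASCII code for W. The sketch below applies the restrictive rule, rejecting any digit outside 0-7; the function name octal_value() is hypothetical and is not part of the Small C compiler.

```c
/* Value of the digits of an octal escape sequence, such as "127".
   Returns -1 if the string is empty or a character is outside 0-7. */
int octal_value(const char *digits) {
    int value = 0;

    if (*digits == '\0')
        return -1;
    while (*digits != '\0') {
        if (*digits < '0' || *digits > '7')
            return -1;                       /* 8 and 9 are rejected */
        value = value * 8 + (*digits - '0');
        ++digits;
    }
    return value;
}
```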

Promoting Characters to Integers

Whenever we port a program from one C compiler to another we must determine how the two compilers promote character variables to integers--with or without sign extension. A difference here can create problems that are very hard to debug. Fortunately, most C compilers, Small C included, do this the same way--unless specified otherwise, they treat characters as signed values.
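
When a program must not depend on the compiler's choice, the promotion can be spelled out. The two hypothetical helpers below are a sketch that assumes 8-bit characters; to_unsigned() always yields 0 through 255, and to_signed() always yields -128 through 127.

```c
/* Promote a character without sign extension (result 0 to 255). */
int to_unsigned(char ch) {
    return ch & 0xFF;
}

/* Promote a character with sign extension (result -128 to 127). */
int to_signed(char ch) {
    int v = ch & 0xFF;
    return (v & 0x80) ? v - 256 : v;
}
```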

Syntax of the #include Directive

Small C #include directives do not require quotation marks or angle brackets around the filename as full C does. Most full C compilers accept the angle brackets as an indication that the file is to be sought in a specific subdirectory (e.g., \include). Thus we usually see

		#include <stdio.h>

to include the standard I/O header file. On the other hand

		#include "Filename"

simply tells the compiler to look in the default directory. Small C accepts both forms, but treats them the same; it always looks in the default directory. For upward compatibility, we should always enclose stdio.h in angle brackets and other files in quotation marks.

Old Style Assignment Operators

As with most modern C compilers, Small C does not recognize the original style of assignment operator in which the equal sign was written as a prefix rather than a suffix. Therefore, sequences like =* and =& are taken as a pair of operators instead of a single assignment operator.

If there is a chance that we may port our programs to a compiler that accepts the old style assignment operators, we should write sequences like these with white space between the operators to avoid the ambiguity. We would probably do this anyway, simply as a matter of good programming style.

File Descriptors

All Small C I/O functions use small integer values called file descriptors to identify files, whereas full C compilers use both pointers and file descriptors, depending on whether high-level or low-level functions are being used. This difference has no consequences as far as the logic of our programs is concerned, since file pointers and descriptors are normally used only to hold a value returned by an open function so that it can be passed to other functions. We might find a need to compare two file descriptors or pointers for equality or inequality, but here again there is no problem since we are comparing either values returned by an open function or a defined value like stdin. We do not really care what the actual values are or whether the variables are pointers or integers.

Full C compilers define in STDIO.H a file control structure called FILE. To declare a file pointer, the programmer writes something like

		FILE *fp;

Although Small C does not utilize file pointers, it supports the writing of this standard syntax by means of a trick. In its STDIO.H, Small C has

		#define FILE char

With this definition, the previous declaration defines fp to be a character pointer. It really doesn't matter what fp is declared to be since anything (integer, character, or pointer) is capable of holding a file descriptor. The main thing is that the declaration is compatible with full C compilers.

Having declared file pointers in this way (or any other way for that matter), we are free to assign to them values returned by the open functions or any of the standard symbols stdin, stdout, stderr, stdaux, or stdprn (defined in STDIO.H).
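
The sketch below shows the portable style this allows. The function copy_stream() is a hypothetical example; it uses fp-style variables only to hold values and pass them to library functions, which is why it does not matter whether FILE is a structure or merely a character. It is written with full C prototypes and assumes the standard functions fgetc() and fputc().

```c
#include <stdio.h>

/* Copy characters from in to out until end of file.
   Returns the number of characters copied. */
int copy_stream(FILE *in, FILE *out) {
    int ch;
    int count = 0;

    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
        ++count;
    }
    return count;
}
```

A typical call would be copy_stream(stdin, stdout), or the arguments could be values returned by an open function.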

Printf() and Scanf() Conversion Specifications

The Small C versions of printf(), fprintf(), scanf(), and fscanf() accept a binary conversion specification (designated by the letter b). Full C compilers do not support this feature, so its use must be considered nonportable.
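
Where binary output is needed in a program that must remain portable, one option is to do the conversion ourselves and print the result with the s specification. The helper below is only a sketch; the name to_binary() is hypothetical and it assumes 8-bit bytes.

```c
/* Write the binary digits of value into buf, most significant digit
   first and without leading zeros, and return the number of digits.
   buf must have room for every bit of an unsigned plus a null byte. */
int to_binary(unsigned value, char *buf) {
    char tmp[sizeof(unsigned) * 8];   /* digits, least significant first */
    int n = 0;
    int i = 0;

    do {
        tmp[n++] = (char)('0' + (value & 1));
        value >>= 1;
    } while (value != 0);

    while (n > 0)                     /* reverse into the caller's buffer */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return i;
}
```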

Reserved Words

Keywords that are used in the C language are reserved; they cannot be used as identifiers in programs. Small C has a restricted set of reserved words, but full C compilers have more. To write upward compatible programs, we should avoid all of full C's reserved words, even though Small C may accept them. The full list of reserved words is:

	auto		double		int*		struct
	break*		else*		long		switch*
	case*		enum		register	typedef
	char*		extern*		return*		union
	const		float		short		unsigned*
	continue*	for*		signed		void*
	default*	goto*		sizeof*		volatile
	do*		if*		static		while*

Words marked with an asterisk are reserved in Small C. If we anticipate porting programs to the Microsoft or Turbo C compilers, we should also avoid these names: cdecl, far, fortran, huge, near and pascal.

Command-Line Arguments

C programs gain access to information in the command line that invokes the program through two arguments which are passed to main(). In full C, we declare main() as

		main(argc, argv) int argc; char *argv[]; {
			...
			}

when we want access to such information. Argc is an integer indicating how many argument strings are in the command line (including the program name and excluding redirection specifications). Argv is an array of character pointers, each pointing to a null-terminated argument string that has been extracted from the command line. By using argv to locate the strings and argc to know how many strings there are, we can write code that accesses the strings. The first argument string (pointed to by argv[0]) is supposed to be the program name. The others follow in the order of their appearance in the command line. Thus argv[1] points to the first string following the program name. Redirection specifications (Chapter 15) are handled by MS-DOS and are not included as command line arguments.

Note: Since versions of MS-DOS earlier than 3.0 did not supply the program name, all C compilers provide a dummy value for the program name when running under older versions of MS-DOS. Small C substitutes an asterisk regardless of the version of the operating system.

Obviously, the above declaration for main() is not acceptable to Small C because it contains a declaration of a pointer array. So, with Small C, we declare

		main(argc, argv) int argc, *argv; {
			...
			}

instead. While this declares argv to be a pointer to integers, it is in fact a pointer to an array of pointers (integer sized objects) which locate the argument strings. By assigning each "integer" to a character pointer, it may then be used to access the designated string. To make all of this easier, the function getarg() is provided in the Small C library (Chapter 12). This function takes the number of the argument sought, the address of a buffer in which to place it, the size of the destination buffer, argc, and argv. It locates the specified argument, copies it to the destination, and returns its length. If fewer command-line arguments exist than are necessary to supply the one specified, getarg() returns EOF.

When porting programs to full C, it is probably best to port getarg() first, then simply change the declaration of main() to be compatible with full C. We might want to add getarg() to the standard run-time library of the new compiler so it will be there automatically when we need it. Of course, we could simply #include it into the source file of every program, but that requires keeping a copy of its source code available and lengthens compile times somewhat.
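
A port of getarg() to full C might resemble the sketch below. It is not the library's source; the name getarg_sketch() is hypothetical, and two details are assumptions: arguments are numbered from zero with argv[0] as the program name, and a string too long for the buffer is truncated to fit.

```c
#include <stdio.h>    /* for EOF */

/* Copy command-line argument n into dest, which holds size bytes.
   Returns the length copied, or EOF if argument n does not exist. */
int getarg_sketch(int n, char *dest, int size, int argc, char *argv[]) {
    const char *src;
    int i = 0;

    if (n < 0 || n >= argc || size < 1)
        return EOF;
    src = argv[n];
    while (src[i] != '\0' && i < size - 1) {
        dest[i] = src[i];
        ++i;
    }
    dest[i] = '\0';
    return i;
}
```

With this in place, only the declaration of main() differs between the Small C and full C versions of a program.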

See Figure 15-1 and the surrounding text in Chapter 15 for a specific example of how command-line arguments are passed to programs.
