CHAPTER 8:
FUNCTIONS
The notion of function in programming languages is based on the mathematical concept of the same name. The idea is that a named algorithm, a function, maps a set of values onto one of a set of result values. When the algorithm's name is written in an expression, together with the values it needs, it represents the result that it produces. In other words, an operand in an expression may be written as a function name together with a set of values upon which the function operates; the resulting value, as determined by the function, replaces the function reference in the expression. For example, in the expression

		abc(a, 2) + 1

the term abc(a, 2) names the function abc and supplies the variable a and the constant 2 from which abc derives a value which is then added to 1 for the final value of the expression. If the algorithm of abc happens to form the product of its two values, then the expression effectively becomes

		(a * 2) + 1

Despite the simplicity of this illustration, functions are vary powerful devices. Their power comes from two facts. First, although a function may be invoked any number of places in a program, it is defined only once; and second, functions may have arbitrarily complex algorithms. It follows from the first fact that functions save work (programming effort) and memory space. And both facts together imply that functions may are an effective structuring device for organizing programs into logical units. The trade off, of course, is the overhead of performing the requisite function calls and returns.

Functions and Subroutines

Programming languages that support functions implement them as special subroutines which return a value and are called from within expressions. They refer to them as functions to distinguish them from ordinary subroutines (or procedures) that do not return values and are not called from within expressions. The C language uses functions as the basic procedural units of programs. In fact, C does not even implement ordinary subroutines. If one were not already aware of this, it may seem to be a major shortcoming. But, as we shall see, it is really an elegant simplification that looses none of the generality of other languages.

C is unique in that it accepts an isolated expression as a statement. Expressions need not be written as parts of larger constructs. Thus we could write i+5; or just 5; as independent statements if we wanted to. But what is the point? What do such statements accomplish? Well, these particular statements accomplish nothing. But when we consider that expressions may invoke functions, and may also include assignment, increment, and decrement operations, then it becomes clear that expressions are capable of doing work beyond that of simply yielding a value.

In fact, C does not even recognize assignment statements as such. Assignment statements are really just expressions containing assignment operators. Furthermore, while the concept of function is essentially that of algorithm yielding a value, functions are not limited to this narrow definition. There is no reason that functions cannot do other things which affect program execution--interact with the operating system, for instance.

These considerations lead to the observation that C has no need for a call statement. To call a function, it is only necessary to write an expression consisting of nothing more than a function reference; that is, the function's name followed by parentheses containing the values to be passed to it. Thus the expression

		func();

is a perfectly valid statement that calls the function func(). Although no values are passed to func(), it is still necessary to write the parentheses so the compiler will know to interpret this as a call to func(). Without the parentheses this would be taken as a reference to the function's address.

By eliminating the distinction between functions and subroutines, it would seem that C creates ambiguities relating to the handling of returned values. Do all functions return a value? And, if so, what happens to the value of a function like func(), above, which is called without a context to utilize its value? The answer to the first question is "no," not all functions actually return a value. On the other hand func() may in fact return a value which in this case is wasted; since it is not used it is simply ignored. The only dangerous situation occurs when a function that does not return a value is called from a context that requires one. Suppose, for instance, that the function novalue(), which does not return a value, is called in the statement

		if (novalue()) i = 5;

Will the assignment be performed? The answer is "maybe so, maybe not." When a function is written to not return a value, then when it returns to the point from which it was called, the last value that happens to be in Small C's primary register (the AX register) will be seen as the returned value. So, functions that do not return a value, effectively return unpredictable values which should be ignored when the function is called.

Terminology

Over the years, accepted terminology concerning the implementation of functions in programming languages has accrued. To begin with, we refer to the part of a program that describes a function as a function declaration or definition. And, as we saw in Chapter 1, C distinguishes between these terms. It will become clear momentarily that we may declare a function's existence without actually defining its algorithm.

The values on which a function works are called arguments. Arguments appear in two contexts--in function definitions and in function calls. In the first case, the arguments are just names by which reference is made to the values that are actually passed to the function when it is called. We call these formal or dummy arguments. In the second case, the arguments specify what is actually passed to the function, and so they are called actual arguments.

We have already been using three terms that need to be identified--call refers to the act of invoking a function, return refers to the return of control from a function back to the point from which it was called, and returned value refers to the value produced by a function.

Functions may affect programs in ways other than through their returned values. We call these actions side effects. Typically, side effects take the form of changes to global objects, changes to objects pointed to by arguments, and input/output operations.

Function Declarations

As we noted above, the distinction which C makes between declarations and definitions applies to functions also.

Declarations, which specify the contents of a function, are said to define the function. On the other hand, there are situations where it is necessary to only declare a function's existence and the type of value it returns. We now consider declarations of the latter type. Definitions will be discussed next.

Earlier we noted that it was the appearance of parentheses following a function's name that informed the compiler to generate a call to the function. That same method is used to distinguish function declarations from other types of declarations. Thus the first declaration in Table 8-1 identifies abc as the name of a function.

Table 8-1: Function Declarations

The second declaration in Table 8-1 looks a bit complicated because it has two sets of parentheses and an asterisk. In fact, it only declares fp to be a pointer to any function that returns integers. As in other declarations, the asterisk identifies the following name as a pointer. Therefore, this declaration reads "fp is a pointer to a function returning int." Using the term object loosely, the asterisk may be read in its usual way as "object at." Thus we could also read this declaration as "the object at fp is a function returning int."

So why the first set of parentheses? By now you have noticed that in C declarations follow the same syntax as references to the declared objects. And, since the asterisk and parentheses (after the name) are expression operators, an evaluation precedence is associated with them. In C, parentheses following a name are associated with the name before the preceding asterisk is applied to the result. Therefore,

		int *fp();

would be taken as

		int *(fp());

which says that fp is a function returning a pointer to an integer, which is not at all like the second declaration in Table 8-1.

I do not mean to imply that Small C will accept these last two examples. Among Small C's restrictions is its limit of only one declaration modifier (asterisk, parentheses, etc.) per name. So, although these two declarations are acceptable C syntax, Small C will reject them. The only exception is pointers to functions as shown in Table 8-1.

The last example in Table 8-1 shows how to declare functions that are defined in a different source file that is compiled separately.

Whereas most C compilers allow us to also declare (in non-definitional declarations) the types of the formal arguments, Small C does not. The parentheses must be empty in Small C programs. The reason some compilers permit this, is so they can ensure that arguments are passed to functions correctly. These prototype declarations are a very useful debugging feature. Chapter 28 discusses the possibility of adding prototype declarations to the Small C compiler.

There are three contexts in which non-definitional function declarations may appear--global declarations, local declarations, and declarations of formal arguments (below). Small C does not accept every form of declaration in Table 8-1 in every context. The first and third types of declaration may be written only at the global level. Whereas the second type (pointer to function) may appear either in local or formal argument declarations.

Of the three types of function declaration in Table 8-1, only one is ever required in a Small C program--the pointer declaration. Unlike most C compilers, when Small C encounters an undefined name it presumes to have a function name. And, since Small C functions return only integers, the compiler is able to automatically declare such names as functions returning integers.

This means that we may freely reference functions in a source file before defining them. At the end of the source file, if any of these automatically declared functions have not been defined, then code is generated which declares them to the assembler as external references. It follows, therefore, that the first and third examples of Table 8-1 are never needed in Small C programs. They are supported only for documentation purposes and for compatibility with full C compilers.

Although Small C does not support functions that return characters, it does accept such declarations. This is because, since characters are automatically promoted to integers, there is no practical difference between functions returning integers and those returning characters. So when Small C sees char func(); it accepts it as int func();

Function Definitions

The second way to declare a function is to fully describe it; that is, to define it. Obviously every function must be defined somewhere. Small C function definitions have the form

		Name (ArgumentList?)
			ArgumentDeclaration?...
			CompoundStatement

Name is the name of the function. ArgumentList? is a list of zero or more names for the arguments that will be received by the function when it is called. These are the formal (dummy) arguments mentioned earlier. They simply tag the actual arguments with names by which they can be referenced within the function. The first actual argument corresponds to the first formal argument, the second with the second, and so forth. ArgumentDeclaration?... is a series of declarations which specify the attributes of the formal arguments. Each name in ArgumentList? must be declared. These are ordinary declarations of variables, pointers, arrays, and function pointers as described in Chapters 4-6 and previously in this chapter. In this context, however, the declared items are not defined; instead, they are presumed to exist on the stack, below the return address, where they are placed before the function is called.

Since every actual argument is passed as a 16-bit value, characters are promoted to integers (as usual) and arrays and strings are passed as pointers. This is entirely consistent with their treatment in expressions. In fact, actual arguments are just that--expressions.

Although a character is passed as a word, we are free to declare its formal argument as either character or word. If it is declared as a character, only the low-order byte of the actual argument will be referenced, and (as usual) will be promoted to an integer. If it is declared as an integer, then all 16 bits will be referenced. The only difference is the point at which the promotion to integer occurs. In the first case it occurs both at the point of the call and at the points of reference in the function being defined (according the argument's formal declaration--unsigned or signed). In the second case it occurs only at the point of the call, where the actual argument is pushed onto the stack. In that case the promotion is regulated by the actual argument's declaration. Of course, the argument may be a character constant, in which case the promotion occurs at compile time, such that the value is positive.

It is generally more efficient to reference integers than characters because there is no need for a machine instruction to set the high-order byte. So it is common to see situations in which a character is passed to a function which declares the argument to be an integer. But there is one caveat here: not all C compilers promote character arguments to integers when passing them to functions; the result is an unpredictable value in the high-order byte of the argument. This should be remembered as a portability issue.

Since there is no way in C to declare strings, we cannot declare formal arguments as strings, but we can declare them as character pointers or arrays. In fact, as we have seen, C does not recognize strings, but arrays of characters. The string notation is merely a shorthand way of writing a constant array of characters.

Furthermore, since an unsubscripted array name yields the array's address and since arguments are passed by value (below), an array argument is effectively a pointer to the array. It follows that, the formal argument declarations arg[] and *arg are really equivalent. The compiler takes both as pointer declarations. Array dimensions in argument declarations are ignored by the compiler since the function has no control over the size of arrays whose addresses are passed to it. It must either assume an array's size, receive its size as another argument, or obtain it elsewhere.

The last, and most important, part of the function definition above is CompoundStatement. This is where the action occurs. Since compound statements may contain local declarations, simple statements, and other compound statements, it follows that functions may implement algorithms of any complexity and may be written in a structured style. Nesting of compound statements is permitted without limit.

As an example of a function definition consider

		func (i, ia, c, cp, fp)
			int i, ia[], (*fp)();
			char c, *cp;
			{...}

Here is a function named func which takes five arguments--an integer, an integer array, a character, a character pointer, and a function pointer. The ellipsis stands for whatever constitutes the function's algorithm. Notice that each argument is declared and only arguments are declared between the argument list and the compound statement. The order of the declarations is immaterial. This function without arguments would be declared as

		func () {...}

Notice that these definitions do not specify whether or not a function returns a value or the type of value returned. Full C compilers accept keywords like int, char, and void before a function definition to provide this information. The word void is a recent innovation which explicitly states that the function does not return a value. This version of Small C accepts void as a comment, but does not accept int or char.

For every function definition, Small C generates an assembler directive declaring the function's name to be public. This means that every Small C function is a potential entry point and so can be accessed externally.

Function Calls

A function is called by writing its name followed by a parenthesized list of argument expressions. The general form is

Name ( ExpressionList? )

where Name is the name of the function to be called. As indicated by the question mark, ExpressionList? is optional. Notice that each argument is in fact an expression. It may be as simple as a variable name or a constant, or it may be arbitrarily complex, including perhaps other function calls. What ever the case, the resulting value is pushed onto the stack where it is passed to the called function.

Small C programs evaluate arguments from left to right, pushing them onto the stack in that order. Then they call the function. On return, the arguments are removed from the stack and execution continues with the primary register containing the value returned by the function.

When the called function receives control, it refers to the first actual argument using the name of the first formal argument. The second formal argument refers to the second actual argument, and so on. In other words, actual and formal arguments are matched by position in their respective lists. Extreme care must be taken to ensure that these lists have the same number and type of arguments. Small C does not verify that functions are called properly.

It was mentioned earlier, that function calls appear in expressions. But, since expressions are legal statements, and since expressions may consist of only a function call, it follows that a function call may be written as a complete statement. Thus the statement

		func (x, y+1, 33);

is legal. It calls func(), passing it three arguments--x, y+1, and 33. Since this call is not part of a larger expression, any value that func() might return will be ignored. As another example, consider

		x = func();

which is also an expression. It calls func() without arguments and assigns the returned value to x. If func() should fail to explicitly return a value, then an unpredictable value will be assigned.

The ability to pass one function a pointer to another function is a very powerful feature of the C language. It enables a function to call any of several other functions with the caller determining which subordinate function is to be called. We shall see this used to good effect in the Small C expression analyzer.

As an example of this technique, consider three functions func(), f1(), and f2() where func() is to call either f1() or f2() depending on instructions from its caller. If we call func() with

		func (f1);

it will know to call f1() instead of f2(). (Recall that a function name, like f1, without parentheses yields the function's address.) Now, if func() is defined as

		func (arg) int (*arg)(); {
			...
			(*arg)();
			...
			}

it will perform properly. Notice that arg is declared to be a function pointer. Also, notice that the designated function is called by writing an expression of the same form as the declaration. Although not strictly necessary to make sense of this expression, the asterisk meaning loosely "object at" appearing before arg refers to the function itself. The first set of parentheses must be written as shown so the compiler will not apply the asterisk to the value returned by th e call, instead of the function pointer. This syntax, which was introduced with version 2.1 of Small C, is consistent with full C.

Originally, Small C accepted

		int arg;

to declare such an argument, and

		arg (...)

to call the function. Although, Small C still accepts this notation, to avoid portability problems, it should not be used.

Argument Passing

Now let us take a closer look at the matter of argument passing. With respect to the method by which arguments are passed, two types of subroutine calls are used in programming languages--call by reference and call by value.

The first method passes arguments in such a way that references to the formal arguments become, in effect, references to the actual arguments. In other words, references (pointers) to the actual arguments are passed, instead of copies of the actual arguments themselves. In this scheme, assignment statements have implied side effects on the actual arguments; that is, variables passed to a function are affected by changes to the formal arguments. Sometimes side effects are beneficial, but usually they are not. Frequently, with this approach, it becomes necessary to declare temporary variables into which the arguments are copied, just so the function can work with them without affecting the originals.

The C language, on the other hand, uses the call by value scheme in which values, not references, are passed to functions. Within a called function, references to formal arguments see values on the stack, instead of the objects from which they were taken.

The most important point to remember about passing arguments in C is that there is no connection between an actual argument and its source. Changes to the arguments received by a function, have no affect what so ever on the objects that might have supplied their values. They can be changed with abandon and their sources will not be affected in any way. This removes a burden of concern from the programmer since he may use arguments as local variables without side effects. It also avoids the need to define temporary variables just to prevent side effects.

It is precisely because C uses call by value that we can pass expressions--not just variables--as arguments. The value of an expression can be copied, but it cannot be referenced since it has no existence in memory; it is not an lvalue. Therefore, call by value adds important generality to the language.

Although the C language uses the call by value technique, it is still possible to write functions that have side effects; but it must be done deliberately. This is possible because of C's ability to handle expressions that yield addresses. And, since any expression is a valid argument, addresses can be passed to functions.

There are two ways to refer to objects that are pointed to by arguments--by using the indirection operator (*) and by subscripting. Recall from Chapters 5 and 6 that pointers may be subscripted just like array names.

To see how a function may produce side effects through a pointer argument, consider

		func (p) int *p; {
			*p = 0;
			}

Notice that p is declared to be a pointer to integers. The assignment statement does not assign zero to p but to the

object at p; that is, the integer to which p points.

Since expressions may include assignment, increment, and decrement operators (Chapter 9), it is possible for argument expressions to affect the values of arguments lying to their right. (Recall that Small C evaluates argument expressions from left to right.) Consider, for example,

		func (y=x+1, 2*y);

where the first argument has the value x+1 and the second argument has the value 2*(x+1). What would be the value of the second argument if arguments were evaluated from right to left? This kind of situation should be avoided, since the C language does not guarantee the order of argument evaluation. The safe way to write this is

		y=x+1;
		func (y, 2*y);

As we noted earlier, the Small C compiler does not verify the number and type of arguments passed to functions. It is the programmer's responsibility to ensure that they match the formal arguments in the function's definition. If too many arguments are given, the function will see only the trailing arguments. If too few are given, however, the function will use items on the stack, below the argument list, as though they were arguments. And that will surely produce undesirable results. This mistake--mismatching actual and formal arguments--is perhaps the most common and most troublesome error made by C programmers.

Occasionally, the need arises to write functions that work with a variable number of arguments. An example is printf() in the library.

In full C, these situations are handled by designing such functions so that the first (left-most) argument indicates how many other arguments are being passed. Then using the knowledge that arguments are pushed onto the stack from right to left (opposite to Small C), the function declares only one argument--the first one written, the last one pushed. The address of that argument is easily assigned to a pointer by writing an address operator (&) before the argument name. It then becomes a simple matter to obtain the following arguments by incrementing the pointer. And, since the first argument indicates how many others follow, the function knows when to quit.

However, since Small C pushes arguments from left to right, this technique would only work if such functions were designed to have the last (right most) argument indicate how many arguments precede it. But that option is not open if functions like printf() are to be compatible with their full C counterparts.

Consequently, Small C provides a means by which a function may determine how many arguments were actually passed to it. With each call, a count of the arguments being passed is placed in the 8086's CL register. This count may be obtained by calling a special function CCARGC() and assigning the returned value to a variable. This function exists in the CALL module of the Small C library. CCARGC() must be called first in the function since the CPU registers are volatile and the argument count would soon be lost. A statement like

		count = CCARGC();

will do the job. Four functions in the Small C library require argument counts; they are printf(), fprintf(), scanf(), and fscanf().

The Stack Frame

The machine stack is used by Small C programs for three purposes--for passing arguments, for saving return addresses, and for allocating local objects. Notice that each of these relates to the invocation of a function. The part of the stack that is used when a function is invoked is called the function's stack frame. Chapter 1 pointed out that C programs are initially given control by performing a normal call to main(). Therefore, every function, including main(), operates within the context of a stack frame. Figure 8-1 illustrates the structure of a Small C stack frame.

Figure 8-1. The Stack Frame

When a function is to be called, the actual arguments are pushed onto the stack in the order in which they are evaluated--left to right. Then the function is called by means of a CALL instruction which pushes the return address--the address of the following instruction--onto the stack above the arguments.

When the called function receives control it immediately pushes the previous frame pointer, in the BP register, onto the stack to preserve it. It then moves the stack pointer (SP) to BP thereby establishing a fixed point of reference for the current stack frame. Arguments are referenced with respect to BP by adding a positive offset. (See Appendix B for an overview of the 8086 CPU registers.)

As local objects are created, they are allocated on the stack by subtracting from SP. They are then referenced like arguments, except with negative offsets. When control exits a block in which locals were allocated, SP is incremented to its original value so the same stack space may be used by other locals.

When control returns to the caller, BP is moved to SP so that a POP instruction can restore BP to its previous value. Then a return instruction, pops the return address from the stack into the processor's instruction pointer (IP). This effects the return by causing the next instruction to come from the location indicated by the return address.

Finally, on return from the function, the calling function, which pushed the arguments onto the stack, removes them by the most efficient means possible--by adding a constant to the stack pointer, by popping an argument or two into an unused register, or by taking no action if no arguments were passed to the function.

Each call allocates a stack frame. As the calls are nested, the frames are stacked one above the other. And, as the calls unnest, the frames are unstacked in the reverse order. As a result there is no interference between function calls, even calls to functions that already have frames on the stack.

Since memory is limited, there is a practical limit to the amount of nesting that can be done. It is the programmer's responsibility to determine that his programs do not overflow the stack space that is available. The Small C library (Chapter 12) includes a function avail() which checks for a stack overflow condition.

Returning from Functions

There are two instances where control is caused to return from a function to its caller. When control reaches the right-most brace of the function's body, an implied return is taken. In that case, no return value is specified and so none is provided. The caller will see whatever happens to be in Small C's primary register at the time of the return. It is important to be sure that any function which is supposed to return a value does not return by this means.

On the other hand, explicit returns are possible by writing statements of the form

		return ExpressionList?;

where return is a required keyword and ExpressionList? is an optional list of expressions. If an expression (list) is provided, its value (the value of the last expression in the list) is returned to the caller. But if there is not an expression, then, as with implicit returns, the returned value is unpredictable and should not be used by the caller.

In full C, functions are declared according to the type of value they return. However, in Small C functions return only integer values. This is a reasonable simplification, when we consider that Small C only supports character and integer variables, and that characters are automatically promoted to integers whenever they are referenced. We are free to write functions that return characters, but the value returned will actually be an integer.

But suppose we should want to have a function return a pointer. How can that be done? Well, since integers and pointers in Small C have the same size, there is no real problem. We can just write our return expression to produce an address instead of an integer. But we must be aware that the returned value will be seen as an integer in the expression in which the function is called. In some expressions the difference is immaterial, but in others it is important. If the difference matters, we can break down the expression into two parts. First, assign the function's value to a pointer of the desired type, and then write the pointer in the expression where the function would have been called. For example, if we wanted to assign five to the character pointed to by func() we would write

		cp = func();
		*cp = 5;

where cp is a character pointer.

Recursive Calls

Using the stack for actual arguments, return addresses, and local objects has the advantage that it permits recursive calls. A function is called recursively when it either calls itself directly or it calls other functions which directly or indirectly call it. Use of recursion can simplify many algorithms, but it does drive up the amount of stack space needed and it usually makes the program logic less obvious. Take the function backward() in Listing 8-1, for example. This function displays the characters of a string in reverse order. It receives the character string (actually a pointer to the first character) as an argument when it is first called. If the character at that address is not zero, it calls itself again passing the address of the next character. This nesting continues until a zero character is found, at which point the current instance of backward() simply returns to the prior one. Control resumes in that instance at putchar() which writes the current character to the standard output file (Chapter 12). That instance then returns to the prior one, and so on. The calls continue to unnest until the first instance displays the first character and returns to its caller. As an exercise, you might try writing this function without recursion.

Listing 8-1: Sample Recursive Function Call

A Sample Function

Suppose we need a function which will compare two character strings, returning zero if they differ, and their length if they match. The first string may be a substring of a larger string, so it is not necessarily terminated with a zero byte. Whereas, the second string being a full string, does have a null terminator. We might write this function as follows:

		streq (str1, str2)  char str1[], str2[]; {
			int k;
			k = 0;
			while (str2[k]) {
				if(str1[k] != str2[k]) return 0;
				++k;
				}
			return k;
			}

The function's name is streq meaning string equality. It expects two arguments, to which it gives the formal names str1 and str2. These are declared to be character arrays; however, since array arguments are passed as addresses, these are actually pointer arguments.

A local integer k is defined and initially set to zero; it will serve as both a subscript into the strings and the value to return if the strings match. The while statement executes repeatedly as long as the character in the second string subscripted by k is not zero; i.e., the end of the second string has not been reached. With each iteration, the compound statement is executed. It first compares corresponding characters (subscripted by k) in the two strings. If they are not equal, the function quits by returning zero to the caller. However, if they match, k is incremented by ++k. At this point, the compound statement is exhausted, so the while condition is checked again, this time with a larger value of k.

If this continues until a zero character is found in str2 then a match has occurred, since none of the preceding characters differed. In that case, k is returned since it contains the length of the matched string. This is true because array elements are numbered from zero. Thus, if str2 contains three characters and a null terminator, then the terminator will be reached when k equals three.

This is one of the string matching functions used by Small C; it finds keywords in programs. The first argument is a pointer into a line of code in a program, and the second is a literal string containing the keyword being sought.

Go to Chapter 9 Return to Table of Contents

CHAPTER 8: FUNCTIONS

CHAPTER 8:
FUNCTIONS