VARIABLES
A variable is a named object that resides in memory and is capable of being examined and modified. The term applies to pointers as well as integers and characters. Although arrays fit this definition, for reasons that appear in Chapter 6, they are better regarded as collections of variables. Pointers are treated in detail in the next chapter.
Storage Class
The term storage class refers to the method by which an object is assigned space in memory. Small C recognizes three storage classes--static, automatic, and external.
MOV AX,_SI
which instructs the CPU to copy the word at address _SI into AX. (See Appendix B for an overview of the 8086 processor.)
Because they exist in permanently reserved memory locations, static variables can be initialized to arbitrary values, which they are guaranteed to have at the beginning of program execution. See Chapter 7 for more about initializing static objects.
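As a minimal sketch of this guarantee, written in standard C rather than Small C, consider two globals. The names limit, counter, and read_initial_state are made up for illustration. The zero value of the uninitialized global is what standard C promises.

```c
/* Globals occupy fixed storage, so their initial values are already
   in place before any function runs. */
int limit = 100;   /* explicitly initialized */
int counter;       /* no initializer: standard C guarantees zero */

/* Returns limit + counter. Called before any assignment, it sees
   only the initial values. */
int read_initial_state(void)
{
    return limit + counter;
}
```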
In Small C, global variables are always static and only global variables are static. Full C, however, supports static locals.
If locals are dynamically allocated at unspecified memory (stack) locations, then how does the program find them? This is done by using the base pointer register (BP) to designate a stack frame (Figure 8-1) for the currently active function. When the function is entered, the prior value of BP is pushed onto the stack and then the new value of SP is moved to BP. This address--the new value of SP--then becomes the base for references to local variables that are declared within the function.
If, for example, an integer i and a character c are defined (in that order) as automatic variables, the stack pointer would be decreased by three--two for i and one for c. If i is to be loaded into the accumulator, it would be done with the assembly instruction
MOV AX,-2[BP]
which tells the CPU to locate the word addressed by BP minus 2 and copy its value into AX. Similarly, c would be obtained by
MOV AL,-3[BP]
which locates the byte addressed by BP minus 3 and copies it into AL, the lower half of AX.
Notice that this method of accessing local variables makes no use of labels, and that with each call of a function its local variables may have different addresses depending on the stack address on entry to the function.
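The dependence of a local's address on the stack depth can be observed from C itself. The following is a rough sketch in standard C; the names seen, probe, and frames_differ are hypothetical. The function records the address of its own local at two nesting depths and reports whether the two activations used different locations.

```c
#include <stdint.h>

/* Addresses captured at depth 0 and depth 1 (illustrative storage). */
static uintptr_t seen[2];

int probe(int depth)
{
    volatile int local = depth;        /* automatic: lives in this frame */
    seen[depth] = (uintptr_t)&local;   /* capture while it is still alive */
    if (depth == 0)
        probe(1);                      /* nested call gets its own frame */
    return local;                      /* read after the call keeps the frame in use */
}

/* Returns 1 when the two activations placed local at different addresses. */
int frames_differ(void)
{
    probe(0);
    return seen[0] != seen[1];
}
```

Both activations are live at the same time here, so their locals cannot share storage.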
Obviously, when a local variable is created it has no dependable initial value. It must be set to an initial value by means of an assignment operation. Full C provides for automatic variables to be initialized in their declarations, like globals. It does this by generating "hidden" code that assigns values automatically after variables are allocated space. Small C does not support this feature; it requires the writing of assignment statements. This may seem like a shortcoming, but it really is not. For one thing, it takes very little more effort to write an assignment statement than an initializer (Chapter 7). And, for another, the result in the object program is the same; Small C is no less efficient for its lack of local initializers.
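The two styles can be put side by side. This is a small sketch with invented function names; the first form is accepted by full C only, the second by both full C and Small C.

```c
/* Full C style: the initializer is compiled into a hidden assignment. */
int with_initializer(void)
{
    int total = 10;
    return total + 1;
}

/* Small C style: declare first, then assign explicitly. */
int with_assignment(void)
{
    int total;
    total = 10;
    return total + 1;
}
```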
One last detail about automatics. When the current function is finished, it is necessary to restore BP and SP to their original values so the calling function will operate properly again. This is done by first moving BP to SP--to deallocate local variables--and then popping the original value of BP back into BP.
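The entry and exit sequences can be modeled in C with a toy byte array standing in for the stack. This is only a sketch: the array mem, the indices sp and bp, and the helper names are assumptions made for illustration, not code that Small C generates. The locals follow the earlier example of an integer i and a character c.

```c
/* Toy model of the 8086 stack: byte-addressed memory growing downward. */
static unsigned char mem[64];
static unsigned sp = 64;   /* stack pointer, starts at the top */
static unsigned bp = 0;    /* base pointer belonging to the caller */

static void push16(unsigned v)
{
    sp -= 2;
    mem[sp] = v & 0xFF;             /* low byte at the lower address */
    mem[sp + 1] = (v >> 8) & 0xFF;
}

static unsigned pop16(void)
{
    unsigned v = mem[sp] | (mem[sp + 1] << 8);
    sp += 2;
    return v;
}

/* Models a function with locals "int i; char c;" and returns i + c. */
int simulated_function(void)
{
    int result;

    push16(bp);            /* PUSH BP   : save the caller's frame base */
    bp = sp;               /* MOV BP,SP : establish the new frame      */
    sp -= 3;               /* allocate i (2 bytes) and c (1 byte)      */

    mem[bp - 2] = 0x34;    /* i = 0x1234, stored at -2[BP]             */
    mem[bp - 1] = 0x12;
    mem[bp - 3] = 7;       /* c = 7, stored at -3[BP]                  */

    result = (mem[bp - 2] | (mem[bp - 1] << 8)) + mem[bp - 3];

    sp = bp;               /* MOV SP,BP : deallocate the locals        */
    bp = pop16();          /* POP BP    : restore the caller's frame   */
    return result;
}

/* Returns 1 when SP and BP are back at their values before the call. */
int stack_restored(void)
{
    return sp == 64 && bp == 0;
}
```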
It is tempting to forget that automatic variables go away when the block in which they are defined exits. This sometimes leads new C programmers to fall into the "dangling reference" trap in which a function returns a pointer to a local variable, as illustrated by
func() {
  int autoint;
  ...
  return (&autoint);
}
When callers use the returned address of autoint they will find themselves messing around with the stack space which autoint used to occupy. In Small C, local variables are always automatic, and only local variables are automatic. Full C, however, supports static locals.
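One common way to stay out of this trap is to let the caller own the storage and pass its address in. The sketch below assumes hypothetical names (fill_value, caller) and an arbitrary value of 42; it copies the value out of the automatic variable instead of returning its address.

```c
/* The caller supplies the storage, so nothing in fill_value's own
   stack frame needs to outlive the call. */
void fill_value(int *out)
{
    int autoint;        /* automatic: gone when fill_value returns */
    autoint = 42;
    *out = autoint;     /* copy the value out, not the address */
}

int caller(void)
{
    int result;         /* lives in caller's frame for the whole call */
    fill_value(&result);
    return result;
}
```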
In Small C, only global declarations can be designated extern and only globals in other modules can be referenced as external.
Scope
The scope of a variable is the portion of the program from which it can be referenced. We might say that a variable's scope is the part of the program that "knows" or "sees" the variable. As we shall see, different rules determine the scopes of global and local objects.
When a variable is declared globally (outside of a function) its scope is the part of the source file that follows the declaration--any function following the declaration can refer to it. Functions that precede the declaration cannot refer to it. Most C compilers would issue an error message in that case. However, Small C assumes that any undeclared name refers to a function, and automatically declares it as such.
The scope of local variables is the block in which they are declared. Local declarations must be grouped together before the first executable statement in the block--at the head of the block. It follows that the scope of a local variable effectively includes all of the block in which it is declared. Since blocks can be nested, it also follows that local variables are seen in all blocks that are contained in the one that declares the variables.
If we declare a local variable with the same name as a global object or another local in a superior block, the new variable temporarily supersedes the higher level declarations. Consider the program in Listing 4-1.
This masking of higher level declarations is an advantage, since it allows the programmer to declare local variables for temporary use without regard for other uses of the same names.
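Independently of Listing 4-1, the masking rule can be sketched with three objects that share one name. The function name shadow_demo and the packing of the result into a single number are choices made only for this illustration.

```c
int x = 1;                     /* global x */

/* Returns 100*inner + 10*outer + global, using the x visible
   at each nesting level. */
int shadow_demo(void)
{
    int seen_inner, seen_outer;
    {
        int x;                 /* masks the global x */
        x = 2;
        {
            int x;             /* masks the outer local x */
            x = 3;
            seen_inner = x;    /* innermost x */
        }
        seen_outer = x;        /* outer local is visible again */
    }
    return 100 * seen_inner + 10 * seen_outer + x;   /* x is the global here */
}
```

Each inner declaration disappears at the closing brace of its block, which is why the global is seen again in the return statement.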
Declarations
Unlike BASIC and FORTRAN, which automatically declare variables when they are first used, C requires every variable to be declared before it is used. This may seem unnecessary, but when we consider how much time is spent debugging BASIC and FORTRAN programs simply because misspelled variable names are not caught for us, it becomes obvious that the time spent declaring variables beforehand is time well spent.
As we saw in Chapter 1, describing a variable involves two actions--declaring its type and defining it in memory (reserving a place for it). Although both of these may be involved, we refer to the C construct that accomplishes them as a declaration. As we saw above, if the declaration is preceded by extern it only declares the type of the variables, without reserving space for them. In such cases, the definition must exist in another source file. If it does not, an "unresolved reference" error will result at link time.
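The difference between declaring and defining can be sketched as follows. The names shared_count and bump_shared are hypothetical. In a real program the definition would normally sit in a different source file; it is placed in the same file here only so that the sketch links on its own.

```c
extern int shared_count;    /* declaration only: states the type, reserves nothing */

/* Uses the external name; the linker must find a definition somewhere. */
int bump_shared(void)
{
    shared_count = shared_count + 1;
    return shared_count;
}

/* Definition: reserves storage and gives the initial value.
   Normally this line would live in another module. */
int shared_count = 5;
```

Removing the last line would leave the reference unresolved, and the error would appear at link time rather than at compile time.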
Table 4-1 contains examples of legitimate variable declarations. Notice that the first two declarations are introduced by a keyword that states the data type of the variables listed. The keyword char declares characters, and int declares integers. This is the standard way to write declarations. Since it is not specified otherwise, both the character and integer variables declared by these statements are assumed by the compiler to contain signed values. We shall see in a moment that this assumption can be changed. The trend today is to make this assumption specific by placing the keyword signed before the data type; but Small C does not recognize that keyword.
The next three declarations begin with the prefix unsigned which further qualifies the declaration by specifying unsigned treatment of the variables. When a declaration is introduced by such a qualifier the data type may be omitted, causing the compiler to assume int.
As stated earlier, the ability to specify unsigned characters is new with this version of Small C. The original C language and earlier versions of Small C did not permit it. However, the trend in C compilers is to support this highly desirable feature, as the present version of Small C does.
Notice that Table 4-1 gives three examples of external declarations. Here, too, when the data type is not given, the compiler assumes int.
When more than one variable is being declared, they are written as a list with the individual names separated by commas. Each declaration is terminated with a semicolon as are all simple C statements.
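A short sketch of the declaration forms described above, written so that it also compiles as standard C. It is not a copy of Table 4-1; the variable names and the helper declaration_sizes_agree are invented for illustration.

```c
char c1, c2;          /* two characters (signed in Small C; plain char
                         signedness varies among full C compilers)     */
int  i, j, k;         /* three signed integers                         */
unsigned int  ui;     /* unsigned integer, data type stated            */
unsigned      u;      /* data type omitted: int is assumed             */
unsigned char uc;     /* unsigned character                            */
extern int    ei;     /* external integer: declared, not defined here  */

/* Returns 1 when the qualifier has not changed the storage size. */
int declaration_sizes_agree(void)
{
    return sizeof(u) == sizeof(ui) && sizeof(uc) == sizeof(c1);
}
```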
Integer Variables
Integers are 16-bit quantities. Signed integers are represented internally in two's complement notation. The high-order bit is a sign bit and the 15 low-order bits are magnitude bits. This gives signed integers a positive range of 0 through 32767, and a negative range of -1 through -32768. As we saw above, a variable is understood to be signed if it is not explicitly declared unsigned.
Unsigned integers differ in that the high-order bit is taken as a magnitude bit. This gives unsigned integers a range of 0 through 65535.
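The two readings of one 16-bit pattern can be checked on a modern compiler. This sketch uses the fixed-width types from stdint.h, which are not part of Small C, and a hypothetical helper named as_signed16.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets a 16-bit pattern as a signed value without changing any bits. */
int16_t as_signed16(uint16_t bits)
{
    int16_t value;
    memcpy(&value, &bits, sizeof value);   /* same bits, different reading */
    return value;
}
```

Under this reading 0x7FFF is 32767, 0x8000 is -32768, and 0xFFFF is -1, while the same patterns taken as unsigned are 32767, 32768, and 65535.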
When a signed integer enters into an operation with an unsigned quantity, the signed integer is treated as though it were unsigned. There is no actual change to its bit pattern; it is simply taken as an unsigned value. The result of such operations is also an unsigned value.
When a signed integer combines with another signed quantity, a signed operation is performed and the result is considered to be signed. In many cases there is no difference between signed and unsigned operations. Examples are addition and subtraction, equality and inequality comparisons. Nevertheless, if either operand is unsigned, the result is unsigned.
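Ordering comparisons are where the distinction shows. The sketch below uses the host compiler's int and unsigned int, which are usually wider than Small C's 16 bits, but the rule being illustrated is the same in standard C. The function names are hypothetical.

```c
/* Mixed comparison: the signed operand is taken as unsigned first. */
int mixed_less(int s, unsigned int u)
{
    return s < u;        /* -1 is read as the largest unsigned value */
}

/* Both operands signed: an ordinary signed comparison. */
int signed_less(int s, int t)
{
    return s < t;
}
```

With these definitions, mixed_less(-1, 1) is false even though signed_less(-1, 1) is true.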
Character Variables
Character variables are stored as 8-bit quantities. When they are fetched from memory, they are always promoted automatically to integers. This is the only automatic data conversion performed by Small C on fetched variables since integers are the largest values it recognizes. Full C likewise promotes characters to integers, but it may also perform further conversions if needed to match the other operand in a binary operation.
Originally, the C language did not distinguish between signed and unsigned character variables. It simply treated them as signed quantities. When it converted them to integers it did so by extending the high-order (sign) bit throughout the high-order byte. But this approach often produces undesirable results when working with characters that have the high-order bit turned on. If, for example, a variable ch contains the value 0x80, one would expect the condition
(ch == 0x80)
to yield true. But since the high-order bit of ch is set, the condition becomes effectively
(0xFF80 == 0x80)
which is false. This forces the programmer to reset the 8 high-order bits to zero by writing something like
((ch & 0xFF) == 0x80)
where & is the bitwise AND operator. The inner parentheses are required because == binds more tightly than &. Written in this form, the condition yields true.
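The effect of sign extension can be reproduced in standard C by declaring the character explicitly signed. The function names below are invented for this sketch, and the argument is assumed to hold the bit pattern 0x80, that is, the value -128.

```c
/* ch is promoted to int with its sign bit extended, so the pattern
   0x80 becomes -128 and cannot equal 128. */
int naive_compare(signed char ch)
{
    return ch == 0x80;
}

/* Masking clears the extended high-order bits before the comparison.
   The parentheses around the AND are needed because == binds more
   tightly than &. */
int masked_compare(signed char ch)
{
    return (ch & 0xFF) == 0x80;
}
```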
Some compiler implementors have decided that it is better to treat character variables as unsigned values (like character constants). This eliminates the need to strip high-order bits and saves untold hours of debugging time. Today, most implementors still follow the UNIX convention, but also provide a compile-time option to treat unqualified character variables as unsigned. Most recently, the ability to designate character variables as either signed or unsigned has become standard.
Small C follows the UNIX convention by assuming that unqualified character variables are signed. However, the current version of Small C does accept the unsigned qualifier in character declarations to reverse this assumption.
Signed characters are represented internally in two's complement notation--the high-order bit being the sign and the 7 low-order bits specifying the magnitude. This gives signed characters a positive range of 0 through 127, and a negative range of -1 through -128.
Unsigned characters differ in that the high-order bit is taken as a magnitude bit and the sign is always presumed to be positive. This gives unsigned characters a range of 0 through 255.
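The two interpretations of one byte can be compared directly. This is a sketch in standard C with hypothetical helper names; it assumes a two's complement machine, as the text does.

```c
#include <string.h>

/* Reads the byte as a signed character, leaving the bits untouched. */
int as_signed_char(unsigned char bits)
{
    signed char value;
    memcpy(&value, &bits, 1);
    return value;
}

/* Reads the byte as an unsigned character. */
int as_unsigned_char(unsigned char bits)
{
    return bits;
}

/* With an unsigned character no masking is needed for this test. */
int matches_0x80(unsigned char ch)
{
    return ch == 0x80;
}
```

The pattern 0xFF is -1 when signed and 255 when unsigned, and an unsigned character holding 0x80 compares equal to 0x80 without any masking.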
As with integers, when a signed character enters into an operation with an unsigned quantity, the character is interpreted as though it were unsigned. The result of such operations is also unsigned. When a signed character joins with another signed quantity, the result is also signed.
There is also a need to change the size of characters when they are stored, since they are represented in the CPU as 16-bit values. In this case, however, it matters not whether they are signed or unsigned. Obviously there is only one reasonable way to put a 16-bit quantity into an 8-bit location--the high-order byte must be chopped off. It is the programmer's responsibility to ensure that significant bits are not lost when characters are stored.
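A minimal sketch of the loss that can occur on storing, using a hypothetical function store_as_char. An unsigned character is used as the destination so that the result is fully defined in standard C.

```c
/* Storing a wider value into a character keeps only the low-order byte. */
unsigned char store_as_char(int value)
{
    unsigned char c;
    c = (unsigned char)value;   /* high-order bits are discarded */
    return c;
}
```

Storing 0x1234 this way leaves 0x34, and storing 256 leaves 0, so any value outside the 8-bit range silently loses information.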