At various points in the planning phase, we must decide between good programming style and efficiency. We often have to make tradeoff decisions. And even when the emphasis is on efficiency we often have to choose between program size and speed.
As a rule, good programming style is more important than efficiency, because programming time is expensive and the more clearly a program is written the more quickly it can be documented and revised. However, there are times when it is more important for programs to work with limited memory and/or to satisfy fixed performance criteria.
This chapter does not deal with programming style, it merely provides information about the efficiency of certain C constructs, as they are implemented by the Small C compiler, so that tradeoff decisions can be made intelligently. The advice given below is not necessarily appropriate in all circumstances. Since, by default, Small C optimizes its output, the following comments relate to the optimized output, not the preliminary code produced by the -NO command-line switch (Chapter 16).
Because of complexities (like instruction pipelining) in the design of the 8086 family of processors and the different instruction timings of the different processors in the family, it is impossible to give exact information on the relative speeds of alternative constructs. So the following information, as it pertains to speed, should be taken as approximate.
Appendix B describes the 8086 CPU. It may help to review that material before proceeding with the following discussions, since they refer to the 8086 registers.
Systems with an 8-bit data bus always require two memory accesses to fetch or store an integer or pointer. However, systems with wider buses can reference a word with just one memory access. With a 16-bit bus, if the word has an even address (is on a word boundary) one access fetches or stores both bytes; but, if the word has an odd address (is on a byte boundary) two memory accesses will be performed by the hardware since the word spans two aligned words in memory.
With a 32-bit bus, the situation is a bit more complicated. If the word is on a byte boundary, then it may either span two aligned double words in memory or it may fall exactly in the middle of one. In the first case, two accesses are required; but, in the latter case one will suffice. So we can expect roughly half of our integers and pointers to be referenced efficiently, and half inefficiently. However, if an integer or pointer is word aligned, then it must fall in the first or second half of an aligned double word. So only one memory access is required.
We can take advantage of the way Small C allocates global storage to ensure that integers and pointers are word aligned. Small C starts segments on word boundaries, and allocates storage in the order in which global definitions appear in the program. By writing all of our integer and pointer definitions before any character definitions we can ensure that all integers fall on word boundaries.
Unfortunately, Small C does not control the alignment of function arguments or local variables. So, although it may run counter to good programming practice, if speed is important, we can declare integers and pointers globally rather than locally.
Chapter 28 contains a discussion on making the compiler align word sized items in every case.
Integers and Characters
Compared to integers, fetching characters takes somewhat more time and decidedly more instruction space in memory. For instance, only one instruction
MOV AX,_GI(three bytes) is needed for a global integer, but two instructions
MOV AL,_GC CBW(three bytes and one byte respectively) are required for a global character. The second instruction converts the byte in AL to a word in AX by extending its sign bit through AH. If the character is declared to be unsigned, CBW is replaced with
XOR AH,AH(two bytes) which zeros AH.
Choosing character variables wherever they are sufficient does not necessarily reduce program size, and may in fact enlarge it somewhat. For instance, suppose we have the choice of defining a variable as an signed integer or a signed character, and suppose it is to be fetched five times in the program. Then we would save one byte of data space by declaring a character, but we would lose five bytes of code space. On the other hand, suppose instead of a variable we were considering an array of 1000 elements. In that case, declaring an integer array would cost 1000 bytes of data space (compared to a character array), while saving only five bytes of code space--not a wise choice. So, in situations where either integers or characters will do, it is usually better to declare integer variables and character arrays.
In terms of speed, the difference is not so clear. Suffice it to say that integers are generally accessed a little faster that characters because of the additional instruction that characters require. However, this tends to be compensated by the need to transfer two bytes instead of one. On the other hand, the second byte may be free, as we saw above, depending on the data bus width and the alignment of the integers.
Since the storing of a character involves transferring just the low order byte of AX to memory, there is no penalty associated with characters. On machines with 8-bit buses or when the destination is aligned on a byte (not word) boundary, storing integers is slower than storing characters. However, they both require the same amount of instruction space--three bytes for global stores.
Globals and Locals
There is no appreciable difference in fetching or storing global and local objects. Fetching a global integer is done with
MOV AX,_GIand a local integer with
MOV AX,-n[BP]where n is an offset into the stack frame (Chapter 8). In the first case, the instruction, together with the address of the global integer _GI, takes three bytes of space. Two bytes of this is the address which the processor uses to locate the operand. In the second case, the instruction together with the immediate value n also takes three bytes. In this case, the processor forms an effective address by adding n to BP; this takes very little time.
Global objects occupy the same amount of space as locals, but since they are static they always exist--even in the EXE file. On the other hand, locals are dynamically allocated at run time, so they require no space in the EXE file. Therefore, a program containing a large global array may waste disk space and take unnecessarily long to load. These disadvantages can be overcome two ways.
First the array can be declared locally in main(). If it requires initial values, then logic must be executed to initialize it; however, this is often a simple process and the space it adds to the code segment may be a small compared to what is saved in the data segment. The time to initialize such an array is negligible since the array is initialized only once. This method has the disadvantage that the array's address must be passed to every function that needs to refer to it. Another disadvantage is that each reference to the array, from functions other than main(), requires an additional memory access since the array is being referenced indirectly through a pointer--the address passed to the function.
A second approach is to declare a global pointer. Then, in main(), allocate memory for the array and assign its address to the pointer. From then on, the pointer can be used just as though it were an array. This method permits functions to refer to the array as a global object. Since the desired initial value for an array's elements is usually zero, the function calloc() can be used to allocate zeroed memory. The disadvantage of this method is that (as above) each reference to an element requires an additional memory access since the array is being referenced indirectly through the pointer--in this case a global pointer.
Since the compiler evaluates constant expressions at compile time, there is no penalty associated with writing constants as expressions. For instance, we may be working with a character array in which every four bytes constitutes a single entry of four one-byte fields. To make the program logic clearer, we could define symbols for the size of an entry (#define SZ 4) and for offsets to the individual fields in each entry (e.g., #define FLD3 2). If we have a pointer to some entry in the array (e.g., *ap), we could refer to the third field of the previous entry as
*(ap + (FLD3 - SZ))In this case the expression generates just
MOV BX,_AP fetch address from pointer MOV AL,-2[BX] indirectly fetch character CBW sign extend to 16 bitswhere the first instruction loads the contents of ap (an address) into BX, the second one moves into AL the byte located two bytes previous to that address, and the third one converts it to a word. Notice that the subexpression (FLD3 - SZ) appears as the offset -2 in the second instruction. This is just as though we had written
*(ap - 2)Without the inner parentheses in this example, the compiler would have evaluated the expression as though it were written
*((ap + FLD3) - SZ)which is effectively the same but not nearly as efficient. This generates
MOV AX,_AP fetch the address from pointer ADD AX,2 add FLD3 to address SUB AX,4 subtract SZ from address MOV BX,AX move address to BX MOV AL,[BX] indirectly fetch character CBW sign extend to 16 bitsNotice here that the constants appear in two places. Also, the optimizer is not smart enough to eliminate the fourth instruction by making the first three work directly with BX.
The point is that, because Small C evaluates constant (sub)expressions at compile time, there is no penalty associated with writing them in terms of their constituent parts. And whenever more than one constant appears in an expression, it is worth trying to group them together.
Remember, from Chapter 9, that the sizeof operator yields a constant value. So it can be used in constant expressions.
Whenever possible, you should write test expressions in if, for, do, and while statements so that they involve a comparison to the constant zero. In such cases, the compiler economizes on the code it generates. This is illustrated by the examples in Listing 19-23. The exact savings depends on which relational operator is involved and whether a signed or unsigned comparison is performed.
Constant Zero Subscripts
The compiler adds a subscript to the address of an array or to the contents of a pointer in order to obtain the address of the specified element. However, if the compiler sees that the subscript has the constant value zero, it eliminates the addition altogether. It follows that, in terms of the code generated, writing array is the same as writing *array--there is no penalty for using the subscript notation. This is true even if a constant expression is used for the subscript. If it evaluates to zero, the addition operation is skipped.
Whenever possible, write a switch statement instead of a string of if...else... statements. The code generated by the compiler is much smaller since, for each condition, it generates only a pair of words (an address and a constant). At run time, a library routine accepts the value of the expression and rapidly runs down the list of constants looking for a match. When it finds one it branches to the corresponding address. With the if...else... approach, however, the expression must be evaluated and the result tested in each case. See Listing 19-22 for an example of the code generated by an if...else... sequence, and Listing 19-24 for a switch statement.
Generally, it is more efficient to use pointer rather than subscripted references. The advantage is that pointer references do not have to involve the addition of a subscript, since the pointer itself can be adjusted like a subscript. For example, the reference
*gipwhere gip is a global integer pointer, generates
MOV BX,_GIP fetch pointer value MOV AX,[BX] indirectly fetch integerOn the other hand, the subscripted reference
gia[gi],where gia is a global integer array and gi is a global integer, generates
MOV AX,_GI fetch subscript MOV BX,OFFSET_GIA fetch array address SHL AX,1 multiply subscript by 2 ADD BX,AX add subscript to address MOV AX,[BX] indirectly fetch integer