CHAPTER 5:
POINTERS

The ability to work with memory addresses is an important feature of the C language. In many situations, array elements can be reached more efficiently through pointers than by subscripting. It also allows pointers and pointer chains to be used in data structures. This added degree of flexibility is always nice, but in systems programming it is absolutely essential.

Addresses and Pointers

Addresses that can be stored and changed are called pointers. A pointer is really just a variable that contains an address. Although pointers can be used to reach objects in memory, their greatest advantage lies in their ability to enter into arithmetic (and other) operations, and to be changed.

Not every address is a pointer. For instance, we can write &var when we want the address of the variable var. The result will be an address which is not a pointer since it does not have a name or a place in memory; it cannot, therefore, have its value altered.

Another example is an array name. As we shall see in Chapter 6, an unsubscripted array name yields the address of the array. But, since the array cannot be moved around in memory, its address is not variable. So, although such an address has a name, it does not exist as an object in memory (the array does, but its address does not) and cannot, therefore, be changed.

A third example is a character string. Chapter 3 indicated that a character string yields the address of the character array specified by the string. In this case the address has neither a name nor a place in memory, so it too is not a pointer.
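The three cases above can be sketched in a few lines of C. This is only an illustration written for a modern hosted compiler (it uses local initializers and const, which are conveniences of current C); the function and variable names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: walks a pointer along a string and counts
   the characters before the terminating zero byte. */
size_t count_chars(const char *s)
{
    const char *p = s;      /* p is a pointer: a named variable holding an address */

    while (*p != '\0')
        ++p;                /* legal: the value of a pointer can be changed */
    return (size_t)(p - s);
}

/* Hypothetical helper: contrasts a pointer with addresses that are not pointers. */
int address_kinds_demo(void)
{
    int var = 7;
    int arr[3] = {1, 2, 3};
    int *ptr;

    ptr = &var;             /* &var is an address, but it is not an object in memory */
    /* &var = ptr;             would not compile: nothing there to assign to */

    ptr = arr;              /* an unsubscripted array name yields the array's address */
    /* arr = ptr;              would not compile: the array's address is fixed */

    ++ptr;                  /* the pointer, however, can be advanced */
    return *ptr;            /* now refers to arr[1] */
}
```

Calling count_chars("hello") passes the address yielded by the character string; that address has no name of its own until it is copied into the pointer s.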

Pointer Declarations

The syntax for declaring pointers is like that for variables (Chapter 4) except that pointers are distinguished by an asterisk that prefixes their names. Table 5-1 illustrates several legitimate pointer declarations. Notice, in the third example, that we may mix pointers and variables in a single declaration. Also notice that the data type of a pointer declaration specifies the type of object to which the pointer refers, not the type of the pointer itself. As we shall see, all Small C pointers (integer or character) contain 16-bit segment offsets.

The best way to think of the asterisk is to imagine that it stands for the phrase "object at" or "object pointed to by." The first declaration in Table 5-1 then reads "the object at (pointed to by) ip is an integer."
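As a brief sketch of this reading, the declarations below are illustrative only and are not a copy of the entries in Table 5-1. The name ip comes from the text; cp, ip2, i, c, and the function name are assumptions made for the example.

```c
/* Hypothetical demonstration function; all names are illustrative. */
int declarations_demo(void)
{
    int  *ip;        /* the object at ip is an integer           */
    char *cp;        /* the object at cp is a character          */
    int  *ip2, i;    /* one integer pointer and one plain integer */
    char  c;

    i   = 10;
    c   = 'A';
    ip  = &i;        /* ip now holds the address of i            */
    cp  = &c;        /* cp now holds the address of c            */
    ip2 = ip;        /* both integer pointers now address i      */

    /* Yields 1 when every pointer reaches the expected object. */
    return *ip == 10 && *cp == 'A' && *ip2 == 10;
}
```

Note that in the third declaration the asterisk belongs to ip2 alone; i is an ordinary integer.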

Table 5-1: Pointer Declarations

Memory Addressing

The size of a pointer depends on the architecture of the CPU and the implementation of the C compiler. The 8086 CPU incorporates a segmented memory addressing scheme in which an effective address is composed of two parts: a 16-bit segment address and a 16-bit offset within the segment. The CPU shifts the segment address left four bits and adds the offset to derive the effective address for a memory reference. (See Appendix B for details, especially Figures B-2 through B-4.)
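The arithmetic can be sketched in C as follows. This is only a model of what the hardware does, not code a Small C program would need; the function name is hypothetical, and the masking to 20 bits reflects the 8086's 20 address lines rather than anything stated in this chapter.

```c
/* Hypothetical model of 8086 address formation.
   Both inputs are treated as 16-bit quantities. */
unsigned long effective_address(unsigned int segment, unsigned int offset)
{
    unsigned long seg = segment & 0xFFFFUL;   /* 16-bit segment address */
    unsigned long off = offset  & 0xFFFFUL;   /* 16-bit offset          */

    /* Segment shifted left four bits, plus the offset.  The result is
       masked to 20 bits because the 8086 has 20 address lines. */
    return ((seg << 4) + off) & 0xFFFFFUL;
}
```

For example, segment 0x1234 with offset 0x0010 gives 0x12350, and so does segment 0x1235 with offset 0x0000; many segment and offset pairs name the same byte.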

To be perfectly general, an 8086 address must occupy two 16-bit words--one for the segment address and one for the offset. But, since a given segment is usually referenced frequently while transitions between segments are relatively rare, the CPU contains segment registers (Figure B-1) that "remember" the segment addresses. Therefore, it is not always necessary for an address to specify more than the 16-bit offset.

The flexibility of this addressing scheme allows programs to utilize various schemes for segmenting memory so as to obtain a suitable compromise between program size and efficiency. Notice that a 16-bit offset effectively restricts segments to 64K bytes or less. So large programs must be divided into multiple segments. In 8086 terminology, addresses that specify both segment and offset are called far addresses, and those that provide only the offset are called near addresses.

Although some C compilers permit us to designate any of several memory addressing schemes (memory models), Small C always uses just one. Figure 5-1 illustrates the way Small C programs use memory and the segment and stack pointer registers.

Figure 5-1. Small C Memory Model

Each Small C program is divided into two segments: a code segment, which is addressed by the CPU's code segment register (CS), and a data segment, which is addressed by the data segment register (DS). The stack segment and data segment are one and the same, so the stack segment register (SS) contains the same address as DS. These segment register addresses remain fixed throughout the execution of a program, so only near pointers are needed. Consequently, pointers are always just 16-bit offsets in Small C programs.

The code segment is only as large as it has to be to contain the program's instructions. The data/stack segment is exactly 64K bytes in size, if that much memory is available when the program begins execution. If less is available, then whatever is available is used. Notice how the data/stack segment is organized. Globally declared data occupies a fixed space at the low (address) end. Following that is the heap--space that has been allocated by calling the library functions malloc() and calloc(). The heap expands and contracts as program execution progresses. At the high end is the stack which grows toward the low end of the segment and shrinks back into the high end as the program executes. This leaves the area in the middle free for either heap or stack use. These last two areas should never overlap; if they do the program will certainly misbehave and may well go berserk. The library function avail() returns the amount of free memory available and optionally aborts the program if the stack overlaps allocated memory.

Pointers and Integers

As the preceding discussion indicates, Small C pointers occupy one word, just as integers do. This is fortunate, since Small C does not support arrays of pointers. In Small C, we are free to store pointer values in integer arrays; the compiler will not complain. This sort of abuse of data types used to be common in C circles, but these days it is frowned upon and the newer compilers gripe about it. Personally, I find it only slightly irksome since it obviates a major Small C shortcoming. A consequence of this practice is that when it becomes necessary to use a pointer stored in an integer array, we must first assign it to a pointer variable so the compiler will know that it is working with an address rather than an integer.
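The practice can be sketched for a modern compiler as shown below. This is an analog, not Small C code: in Small C the array would be a plain int array and no casts would be written, whereas here the standard type intptr_t and explicit casts stand in so that current compilers accept the code. The function and variable names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical demonstration: an address kept in an integer array
   and then used through a pointer variable. */
int through_integer_array(void)
{
    int x = 42;
    intptr_t table[2];          /* integers wide enough to hold an address */
    int *ip;

    table[0] = (intptr_t)&x;    /* store the address as an integer */
    table[1] = 0;               /* zero marks "no address" */

    ip = (int *)table[0];       /* assign to a pointer before using it */
    return *ip;                 /* now the compiler treats it as an address */
}
```

The intermediate assignment to ip is the step the text describes: only after it does the compiler know that the value is an address.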

Pointer Arithmetic

Another major difference between addresses and ordinary variables or constants has to do with the interpretation of addresses.

Since an address points to an object of some particular type, adding one (for instance) to an address should direct it to the next object, not necessarily the next byte. If the address points to integers, then it should end up pointing to the next integer. But, since integers occupy two bytes, adding one to an integer address must actually increase the address by two. A similar consideration applies to subtraction. In other words, values added to or subtracted from an address must be scaled according to the size of the objects being addressed. This automatic correction saves the programmer a lot of thought and makes programs less complex since the scaling need not be coded explicitly. Notice that while the scaling factor for integers is two, for characters it is one; therefore, character addresses do not receive special handling. It should be obvious that if objects of other sizes were supported then different factors would have to be used.

A related consideration arises when we imagine the meaning of the difference of two addresses. Such a result is interpreted as the number of objects between the two addresses. If the objects are integers, the result must be divided by two in order to yield a value which is consistent with this meaning. See Chapter 6 for more on address arithmetic.
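The scaling can be observed directly, as in the following sketch. The function names are hypothetical. Note also that on most current compilers an integer occupies four bytes rather than the two of Small C, so the step is reported in terms of sizeof(int) instead of a fixed number.

```c
#include <stddef.h>

/* Number of bytes the address moves when one is added to an integer address. */
size_t int_step_bytes(void)
{
    int a[2];
    int *p = a;

    return (size_t)((char *)(p + 1) - (char *)p);   /* equals sizeof(int) */
}

/* The same measurement for a character address: no scaling is needed. */
size_t char_step_bytes(void)
{
    char c[2];
    char *p = c;

    return (size_t)((p + 1) - p);                   /* always 1 */
}

/* The difference of two integer addresses is a count of integers, not bytes. */
ptrdiff_t ints_between(void)
{
    int a[8];
    int *lo = &a[1];
    int *hi = &a[6];

    return hi - lo;                                 /* 5 objects apart */
}
```

Under Small C, int_step_bytes() would correspond to the factor of two described above.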

When an integer is added to or subtracted from an address, the result is always another address. Thus, if ptr is a pointer, then ptr+1 is also an address.

The Indirection Operator

The asterisk, which we saw in pointer declarations, can also be used as an operator in expressions. We call it the indirection operator because when it precedes a pointer (or any address) it produces an indirect reference to the indicated object.

Thus, if ip is a properly initialized integer pointer, then *ip yields the integer pointed to by ip. On the other hand, ip yields the address contained in the pointer.
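A short sketch of the distinction follows, written for a modern compiler. The pointer name ip is from the text; the function name and the variable i are assumptions for the example.

```c
/* Hypothetical demonstration of fetching and storing through a pointer. */
int indirection_demo(void)
{
    int i = 5;
    int *ip = &i;     /* ip holds the address of i */

    *ip = *ip + 1;    /* fetch the object at ip, add one, store it back */

    return i;         /* i itself was changed through the pointer */
}
```

Here ip alone would yield the address of i, while *ip yields (or receives) the integer stored there.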

At the assembly-language level, an indirect reference is accomplished by first loading the value of the pointer into a register, and then using it to fetch the object. If ip is a global pointer to integers, then *ip generates

		MOV BX,_IP
		MOV AX,[BX]

The first instruction loads the contents of ip into BX and the second loads the word addressed by BX into AX. It should be clear why this is called an indirect reference.

Pointer Comparisons

One major difference between pointers and other variables is that pointers are always considered to be unsigned. This should be obvious since memory addresses are not signed. This property of pointers (actually all addresses) ensures that only unsigned operations will be performed on them. It further means that the other operand in a binary operation will also be regarded as unsigned (whether or not it actually is). For instance, if (as we saw above) an integer array ia[] actually contains addresses, then it would make sense to write

	(ia[5] > ptr)

which performs an unsigned comparison since ptr is a pointer. Thus, if ia[5] contains -1 and ptr contains 0x1234, the expression will yield true, since -1 is really 0xFFFF--a higher unsigned value than 0x1234. So, although the array is thought by the compiler to contain signed integers, the proper type of comparison is performed anyway.

Note: With the current version of Small C, we could designate the array unsigned for better documentation.
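The effect of treating both operands as unsigned can be modeled on a modern compiler with the 16-bit types from stdint.h. This is a sketch under that assumption; the function names are hypothetical, and int16_t and uint16_t merely stand in for Small C's one-word integers and addresses.

```c
#include <stdint.h>

/* Ordinary signed comparison of two 16-bit values. */
int compare_signed(int16_t a, int16_t b)
{
    return a > b;
}

/* The comparison as it is performed when one operand is an address:
   both values are regarded as unsigned 16-bit quantities. */
int compare_unsigned(int16_t a, int16_t b)
{
    return (uint16_t)a > (uint16_t)b;   /* -1 becomes 0xFFFF */
}
```

With a equal to -1 and b equal to 0x1234, the signed comparison is false but the unsigned one is true, which is the behavior described for (ia[5] > ptr).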

It makes no sense to compare a pointer to anything but another address or zero. C guarantees that valid addresses can never be zero, so that particular value is useful in representing the absence of an address in a pointer.

Furthermore, to avoid portability problems, only addresses within a single array should be compared for relative value (as above). To do otherwise would necessarily involve assumptions about how the compiler organizes memory. Comparisons for equality, however, need not observe this restriction, since they make no assumption about the relative positions of objects.
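Both rules appear in the following sketch: a relational comparison confined to addresses within one array, and zero used to signal the absence of an address. The function names and data are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the address of the first element equal
   to key, or zero (NULL) when no element matches. */
int *find_int(int *first, int *end, int key)
{
    int *p;

    for (p = first; p < end; ++p)   /* p and end lie within the same array */
        if (*p == key)
            return p;
    return NULL;                    /* zero: the absence of an address */
}

/* Hypothetical demonstration; yields 1 when both lookups behave as expected. */
int find_int_demo(void)
{
    int a[4] = {3, 5, 7, 9};

    return find_int(a, a + 4, 7) == &a[2]     /* equality test on addresses */
        && find_int(a, a + 4, 8) == NULL;     /* comparison against zero */
}
```

The address a + 4, one element past the end of the array, is used only for comparison and is never dereferenced.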

An Example

A final example may help pull the ideas in this chapter together. Consider the program fragment in Listing 5-1.

Listing 5-1: Example of the Use of Pointers

This code adds the values of 5 characters to corresponding integers. First, the pointer cend is set 5 characters beyond the address in cp (which we presume has already been initialized properly). The while statement then repeatedly tests whether cp is less than cend. If so, the compound statement is performed. If not, control passes to whatever follows. With each execution of the compound statement, the object at cp (a character) is added to the object at ip (an integer) with the object at ip receiving the result. Then both cp and ip are incremented to the next objects. Since ip is an integer pointer, each increment advances it two bytes to the next integer. The loop executes 5 times. At that point the task is finished, cp is no longer less than cend, and control goes on to whatever follows.
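The listing itself is not reproduced in this text. A fragment consistent with the description might look like the sketch below; it is a reconstruction from the prose, not the original listing. The names cp, ip, and cend come from the description, while the enclosing functions and the sample data are assumptions added so the fragment can be compiled and run.

```c
/* Hypothetical wrapper around the fragment described in the text. */
void add_chars(int *ip, char *cp)
{
    char *cend;

    cend = cp + 5;              /* 5 characters beyond the address in cp */
    while (cp < cend) {
        *ip = *ip + *cp;        /* add the character at cp to the integer at ip */
        ++ip;                   /* advance to the next integer */
        ++cp;                   /* advance to the next character */
    }
}

/* Hypothetical demonstration; yields 1 when all five sums are correct. */
int add_chars_demo(void)
{
    int  totals[5] = {10, 20, 30, 40, 50};
    char deltas[5] = {1, 2, 3, 4, 5};

    add_chars(totals, deltas);
    return totals[0] == 11 && totals[1] == 22 && totals[2] == 33
        && totals[3] == 44 && totals[4] == 55;
}
```

On a modern compiler each increment of ip advances it by sizeof(int) bytes, which is usually four rather than the two bytes of Small C.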
