CHAPTER 3:

CONSTANTS

Small C recognizes three types of constants: numeric, character, and string. Furthermore, numeric constants may be written in three bases: decimal, octal, and hexadecimal. Each type of constant is discussed below.

Decimal Constants

Ordinary decimal constants are written as a sequence of decimal digits, possibly preceded by a plus or minus sign. A minus sign gives the constant a negative value, whereas the absence of a minus sign makes it positive; the plus sign is optional for positive values. The normal range of decimal constants is -32768 through +32767.

However, Small C also allows us to write constants in the range 32768 through 65535. Former versions of the compiler interpreted these as their negative equivalents, -32768 through -1; that is, as the negative values with the same bit patterns. But this interpretation is unexpected and is not consistent with full C compilers, which preserve the value of large positive constants by converting them to long integers.

The current version of Small C takes these large positive constants as unsigned values. While this preserves their value and yields the expected results, there is one caveat: to produce reasonable results, operations performed on unsigned values must be unsigned operations. For most expression operators there is no difference between the signed and unsigned operations, but for the following operators there is a difference:

	*	multiplication

	/	division

	%	modulo (remainder)

	<	less than

	<=	less than or equal

	>	greater than

	>=	greater than or equal

Also, whenever an operation (unary or binary) has an unsigned operand, its result is considered to be unsigned. So in complex expressions, unsignedness can propagate from one operation to the next and produce some unexpected unsigned operations.

There is one final caution about large positive values. Although Small C treats them as unsigned values, they nevertheless appear in the output file as negative values. But don't panic; this is really okay. Recall that each unsigned value over 32767 has the high-order bit set and so has an equivalent negative value with the same bit pattern. When these negative values pass through the assembler, they produce the same binary patterns as their unsigned equivalents, so the end result is the same.

With earlier versions of the Small C compiler, if we wanted an unsigned comparison we had to make sure that one of the operands was thought by the compiler to be an address. This could be done by falsely declaring an integer to be a character pointer (Chapter 5). And, in fact, this unorthodox practice has been used to good effect in many Small C programs. With the current compiler, however, we can write exactly what we intend by using the keyword unsigned.

As was implied in the previous discussion, decimal constants are reduced to their two's complement or unsigned binary equivalent and stored in 16-bit words. Some examples of legitimate decimal constants are 0, 12345, -1024, and +256.

Octal Constants

If a sequence of digits begins with a leading 0 (zero), it is taken as an octal value. In this case the word digits refers only to the octal digits (0 through 7). As with decimal constants, octal constants are converted to their binary equivalent in 16-bit words. Octal constants may, therefore, range from 0 through 0177777. Here, as with decimal constants, we must realize that large values (0100000 and higher) will be treated by the compiler as unsigned values. Some examples of legitimate octal constants are 010, 01234, and 077777. Notice that the octal values 0 through 07 are equivalent to the decimal values 0 through 7.

The old CP/M version of Small C did not recognize octal constants. It took constants with leading zeroes as decimal values. Therefore, when converting a program written for that compiler we should strip leading zeroes from its numeric constants.

Hexadecimal Constants

If a sequence of digits begins with 0x or 0X, it is taken as a hexadecimal value. In this case the word digits refers to the hexadecimal digits (0 through 9 and A through F); the lowercase letters a through f are also acceptable. As with decimal constants, hexadecimal constants are converted to their binary equivalent in 16-bit words. Hexadecimal constants may, therefore, range from 0x0 through 0xffff. Here, as with decimal constants, we must realize that large values (0x8000 and higher) will be treated by the compiler as unsigned values. Some examples of legitimate hexadecimal constants are 0x0, 0x1234, and 0xffff.

Character Constants

Character constants consist of one or two characters surrounded by apostrophes. It may seem odd that a character constant could have two characters in it, but it makes sense when we consider that, like numeric constants, character constants become 16-bit words (integer sized objects) and so have room for two characters.

The constant 'B', for instance, produces 0042 hex (the ASCII value for an uppercase B). And the constant 'AB' produces 4142 hex, which is simply the two characters A and B in the high-order and low-order bytes, respectively. Some compilers do not support multi-character constants, because they present a portability problem between CPUs with different byte orders.

Character constants are always treated as signed integers. Therefore, unlike the large numeric constants mentioned above, character constants receive unsigned operations only when they are combined with unsigned operands. Either way, single-character constants are always positive, because the high-order bit of their 16-bit value is always zero.

We should realize that the same is not always true of character variables. Unless a character variable is specifically declared to be unsigned, its high-order bit will be taken as a sign bit. When the character is referenced, it is converted to a signed integer by extending that bit throughout the eight high-order bits that are appended to it. Therefore, we should not expect a character variable that is not declared unsigned to compare equal to the same character constant if the high-order bit is set. For more on this see Chapter 4.

String Constants

Strictly speaking, C does not recognize character strings, but it does recognize arrays of characters and provides a way to write constant character arrays, which are called strings. Surrounding a character sequence with quotation marks (") sets up an array of characters and generates the address of the array. In other words, at the point in a program where it appears, a string constant produces the address of the specified array of character constants. The array itself is located elsewhere. This is very important to remember. Notice that this differs from a character constant, which generates the value of the constant directly. Just to be sure that this distinctive feature of the C language is not overlooked, consider the following illustrations:

In the program

	#include <stdio.h>

	main() {
		char *cp;
		cp = "hello world\n";
		printf(cp);
	}

the function printf() must receive the address of a string as its first (in this case, only) argument. First, the address of the string is assigned to the character pointer cp. Then the value of cp is passed to the function. Unlike in some other languages, the string itself is not assigned to cp; only its address is. After all, cp is a 16-bit object and, therefore, cannot hold the string itself. The same program could be written more simply as

	#include <stdio.h>

	main() {
		printf("hello world\n");
	}

In this case, it is tempting to think that the string itself is being passed to printf(); but, as before, only its address is.

Since strings may contain as few as one or two characters, they provide an alternative way of writing character constants in situations where the address, rather than the character itself, is needed.

It is a convention in C to identify the end of a character string with a null (zero) character. Therefore, C compilers automatically suffix character strings with such a terminator. Thus, the string "abc" sets up an array of four characters ('a', 'b', 'c', and zero) and generates the address of the first character, for use by the program.

Full C compilers permit long strings to be split between lines, but Small C makes no such provision. See Chapter 28 for suggestions on how to add this capability to Small C; it is really quite easy.

Escape Sequences

Sometimes it is desirable to code nongraphic characters in a character or string constant. This can be done by using an escape sequence, that is, a sequence of two or more characters in which the first (escape) character changes the meaning of the character(s) that follow. The entire sequence generates only one character. C uses the backslash (\) as the escape character. The following escape sequences are recognized by the Small C compiler:


	\n		newline

	\t		tab

	\b		backspace

	\f		form feed

	\ooo		value represented by the octal digits ooo

The term newline refers to a single character which, when written to an output device, starts a new line. Directed to a CRT screen, it would place the cursor at the first column of the next line. When written to an output device or a character stream file, the newline character becomes a sequence of two characters: carriage return and line feed (not necessarily in that order). Conversely, on input a carriage return or a carriage return/line feed pair becomes a single newline character. Some implementations of C use the ASCII carriage return (13) as the newline character, while others use the ASCII line feed (10). It really doesn't matter which is the case, as long as we write \n in our programs. Avoid using the ASCII value directly, since that could produce compatibility problems between different compilers.

The sequence \ooo may be used to represent any character. It consists of the escape character followed by one, two, or three octal digits. The number ends when three digits have been processed or a non-octal character is found.

There is one other type of escape sequence: anything undefined. If the backslash is followed by any character other than the ones described above, then the backslash is ignored and the following character is taken literally. So the way to code a backslash is to write a pair of backslashes, and the way to code an apostrophe or a quotation mark is to write \' or \" respectively.
