
C Reference Manual
2 March 1977
Dennis M. Ritchie
Bell Telephone Laboratories
Murray Hill, New Jersey 07974
Alan Snyder
Laboratory for Computer Science
Massachusetts Institute of Technology
1. Introduction
C is a computer language based on the earlier language B [1],
itself a descendant of BCPL [3]. C differs from B and BCPL
primarily by the introduction of types, along with the
appropriate extra syntax and semantics.
Most of the software for the UNIX time-sharing system [4] is
written in C, as is the operating system itself. In addition to
the UNIX C compiler, there exist C compilers for the HIS 6000 and
the IBM System/370 [2]. This manual describes the C programming
language as implemented by the portable C compiler [6]. It is a
revision by the second author of the original C Reference Manual
(contained in [5]), which describes the UNIX C compiler.
Differences with respect to the UNIX C compiler and undesirable
limitations of the current portable C compiler are described in
footnotes to this document.
The report ``The C Programming Language'' [5] contains a
tutorial introduction to C and a description of a set of portable
routines concerned primarily with I/O.
2. Lexical conventions
There are six kinds of tokens: identifiers, keywords,
constants, strings, expression operators, and other separators.
In general blanks, tabs, newlines, and comments as described
below are ignored except as they serve to separate tokens. At
least one of these characters is required to separate otherwise
adjacent identifiers, constants, and certain operator-pairs.
If the input stream has been parsed into tokens up to a given
character, the next token is taken to include the longest string
of characters which could possibly constitute a token.
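The longest-token rule can be seen in a run of plus signs. The
following is a minimal sketch, written in present-day C so that it
compiles with a current compiler; the function name munch_demo is
hypothetical and does not come from this manual.

```c
/* Hypothetical helper: shows that a+++b is scanned as (a++) + b. */
int munch_demo(void)
{
    int a = 1, b = 2;
    int sum = a+++b;    /* tokens are  a  ++  +  b,  never  a  +  ++b */

    /* sum is 1 + 2 = 3; a has been incremented to 2 afterwards */
    return sum * 10 + a;
}
```

Had the scanner taken the shortest token instead, b would have been
incremented and a left alone.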
2.1 Comments
The characters /* introduce a comment, which terminates with
the characters */. Comments thus may not be nested.
C Reference Manual - 2
2.2 Identifiers (Names)
An identifier is a sequence of letters and digits; the first
character must be alphabetic. The underscore ``_'' counts as
alphabetic. Upper and lower case letters are not distinguished.
There is no limit placed on the length of identifiers; all
characters of internal identifiers are significant. However, the
number of significant characters in external identifiers (i.e.,
function names and names of external variables) may be limited by
1
the operating system to as few as the first five characters.
This limitation on external identifiers can be circumvented, to
some extent, by using the token replacement facility (described
in section 12.1).
2.3 Keywords
The following identifiers are reserved for use as keywords, and
may not be used otherwise:
     int            break
     char           continue
     float          if
     double         else
     long           goto
     short          return
     unsigned       entry
     struct         for
     auto           do
     extern         while
     register       switch
     static         case
     sizeof         default
     typedef
The entry keyword is not currently implemented by any compiler
but is reserved for future use.
2.3 Constants
There are several kinds of constants, as follows:
2.3.1 Integer constants
An integer constant is a sequence of digits. An integer is
taken to be octal if it begins with 0, hexadecimal if it begins
with 0x (or 0X), and decimal otherwise. The digits 8 and 9 have
octal value 10 and 11 respectively. An integer constant
_________________________
1
The UNIX C compiler distinguishes upper and lower case in all
identifiers and accepts keywords only in lower case. In
addition, the UNIX C compiler treats only the first eight
characters of internal identifiers and the first seven characters
of external identifiers as significant.
C Reference Manual - 3
2
immediately followed by l or L is a long integer constant.
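As an illustration of the three radixes, the sketch below writes one
value in decimal, octal, and hexadecimal. It assumes a present-day
compiler, which rejects the digits 8 and 9 in an octal constant, so
that case is not shown. The name radix_demo is hypothetical.

```c
/* Hypothetical helper: the same value written in three radixes. */
int radix_demo(void)
{
    int dec = 31;      /* decimal: no leading 0 */
    int oct = 037;     /* octal: leading 0, 3*8 + 7 */
    int hex = 0x1F;    /* hexadecimal: leading 0x (or 0X), 1*16 + 15 */

    return dec == oct && oct == hex;    /* 1 when all three agree */
}
```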
2.3.2 Character constants
3
A character constant consists of a single ASCII character
enclosed in single quotes `` ' ''. Within a character constant a
single quote must be preceded by a back-slash ``\''. Certain
non-graphic characters, and ``\'' itself, may be escaped
4
according to the following table:
     BS      \b
     NL      \n
     CR      \r
     HT      \t
     VT      \v
     FF      \p
     ddd     \ddd
     \       \\
The escape ``\ddd'' consists of the backslash followed by 1, 2,
or 3 octal digits which are taken to specify the value of the
desired character. A special case of this construction is ``\0''
(not followed by a digit) which indicates a null character.
Character constants behave exactly like integers whose value is
5
the corresponding ASCII code. They do not behave like objects
of character type.
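A short sketch of character constants used as integers follows. It
assumes the ASCII character set and a present-day compiler (which
does not accept the \p escape, so that one is omitted). The name
char_demo is hypothetical.

```c
/* Hypothetical helper: character constants compared as integers. */
int char_demo(void)
{
    int ok = 1;

    ok = ok && ('A' == 65);        /* ASCII code of A */
    ok = ok && ('\101' == 'A');    /* octal 101 is decimal 65 */
    ok = ok && ('\n' == 10);       /* NL */
    ok = ok && ('\0' == 0);        /* null character */
    ok = ok && ('\\' == 92);       /* the back-slash itself */
    return ok;
}
```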
2.3.3 Floating constants
A floating constant consists of an integer part, a decimal
point, a fraction part, an e (or E), and an optionally signed
integer exponent. The integer and fraction parts both consist of
a sequence of digits. Either the integer part or the fraction
part (not both) may be missing; either the decimal point or the e
and the exponent (not both) may be missing. Every floating
constant is taken to be double-precision.
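The permitted forms of a floating constant can be sketched as
follows, assuming a present-day compiler. The name float_forms_demo
is hypothetical.

```c
/* Hypothetical helper: several spellings of floating constants. */
int float_forms_demo(void)
{
    double a = 15.0e-1;    /* every part present */
    double b = 1.5;        /* no exponent */
    double c = 15e-1;      /* no decimal point */
    double d = .5;         /* integer part missing */
    double e = 5.;         /* fraction part missing */

    /* each constant has type double, whatever its spelling */
    return a == b && b == c && d + d == 1.0 && e == 5.0
        && sizeof(1.5) == sizeof(double);
}
```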
_________________________
2
A long integer constant is equivalent to an integer constant
in the portable C compiler.
3
The UNIX C compiler allows 2 characters in character
constants. Other compilers may allow as many characters as can
be packed into a machine word. The order of packed characters in
a machine word is machine-dependent.
4
The UNIX C compiler does not recognize \v or \p.
5
On UNIX, character constants range in value from -128 to 127.
C Reference Manual - 4
2.4 Strings
A string is a sequence of characters surrounded by double
quotes `` " ''. A string has the type array-of-characters (see
below) and refers to an area of storage initialized with the
given characters. The compiler places a null byte ( \0 ) at the
end of each string so that programs which scan the string can
find its end. In a string, the character `` " '' must be
preceded by a ``\'' ; in addition, the same escapes as described
for character constants may be used.
String constants are constant, i.e., they may not be modified.
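The terminating null byte and the \" escape can be checked with a
small sketch. It assumes a present-day compiler; the const qualifier
is not part of the dialect described here and is used only to record
that the string is not modified. The name string_demo is
hypothetical.

```c
/* Hypothetical helper: a string with an embedded double quote. */
int string_demo(void)
{
    const char *s = "ab\"c";    /* characters: a  b  "  c */

    /* four characters plus the null byte added by the compiler */
    return sizeof("ab\"c") == 5 && s[2] == '"' && s[4] == '\0';
}
```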
3. Syntax notation
In the syntax notation used in this manual, syntactic
categories are indicated by italic type, and literal words and
characters in gothic. Alternatives are listed on separate lines.
An optional terminal or non-terminal symbol is indicated by the
subscript ``opt,'' so that
     { expression }
                   opt
would indicate an optional expression in braces.
4. What's in a Name?
C bases the interpretation of an identifier upon two attributes
of the identifier: its storage class and its type. The storage
class determines the location and lifetime of the storage
associated with an identifier; the type determines the meaning of
the values found in the identifier's storage.
There are four declarable storage classes: automatic, static,
external, and register. Automatic variables are created upon
each invocation of the function in which they are defined, and
are discarded on return. Static variables are local to a
function or to a group of functions defined in one source file,
but retain their values independently of function invocations.
External variables are independent of any function and accessible
by separately-compiled functions. Register variables are stored
(if possible) in the fast registers of the machine; like
automatic variables they are local to each function and disappear
on return.
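The difference between automatic and static storage can be sketched
as below, assuming a present-day compiler. The names next_ticket,
issued, and scratch are hypothetical.

```c
/* Hypothetical helper: returns 1, 2, 3, ... on successive calls. */
int next_ticket(void)
{
    static int issued = 0;    /* static: set once, keeps its value */
    int scratch = 0;          /* automatic: created anew on each call */

    scratch = scratch + 1;    /* always 1 */
    issued = issued + scratch;
    return issued;
}
```

If issued were automatic, every call would return 1.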
C supports four fundamental types of objects: characters,
integers, and single- and double-precision floating-point numbers.
Characters (declared, and hereinafter called, char) are
chosen from the ASCII set; they occupy the right-most seven
bits in a machine-dependent unit of storage called a byte.
Integers (int) are represented in 2's complement notation in
a machine-dependent unit of storage called a word. Integers
C Reference Manual - 5
1
should be at least 16 bits long.
The precision and range of single precision floating point
(float) quantities and double-precision floating-point
(double, or long float) quantities are machine-dependent.
Besides the four fundamental types there is a conceptually
infinite class of derived types constructed from the fundamental
types in the following ways:
arrays of objects of most types;
functions which return objects of a given type;
pointers to objects of a given type;
structures containing objects of various types.
In general these methods of constructing objects can be applied
recursively.
5. Objects and lvalues
An object is a manipulatable region of storage; an lvalue is an
expression referring to an object. An obvious example of an
lvalue expression is an identifier. There are operators which
yield lvalues: for example, if E is an expression of pointer
type, then *E is an lvalue expression referring to the object to
which E points. The name ``lvalue'' comes from the assignment
expression ``E1 = E2'' in which the left operand E1 must be an
lvalue expression. The discussion of each operator below
indicates whether it expects lvalue operands and whether it
yields an lvalue.
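A minimal sketch of the two kinds of lvalue mentioned above, an
identifier and an indirection through a pointer, assuming a
present-day compiler. The name lvalue_demo is hypothetical.

```c
/* Hypothetical helper: stores through both kinds of lvalue. */
int lvalue_demo(void)
{
    int x;
    int *p = &x;

    x = 1;       /* an identifier is an lvalue */
    *p = 7;      /* *p is an lvalue referring to the same object */
    return x;    /* the store through *p is visible in x */
}
```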
6. Conversions
A number of operators may, depending on their operands, cause
conversion of the value of an operand from one type to another.
This section explains the result to be expected from such
conversions.
6.1 Characters and integers
A char object may be used anywhere an int may be. In all cases
the char is converted to an int by extending the character value
2
with high-order zero bits.
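The two widening behaviors (zero extension as described here, sign
propagation as on the PDP-11) can both be sketched with a
present-day compiler. The spellings unsigned char and signed char
are assumptions of this sketch and are not part of the dialect
described in this manual; the function names are hypothetical.

```c
/* Hypothetical helpers: widening a character to an int. */
int widen_unsigned(unsigned char c)
{
    return c;    /* high-order bits are filled with zeros */
}

int widen_signed(signed char c)
{
    return c;    /* the sign is propagated into the high-order bits */
}
```

On a machine with 8-bit bytes and 2's complement integers, the bit
pattern that widens to 200 in the first function widens to -56 in
the second.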
_________________________
1
The UNIX C compiler implements a longer variety of integer
(declared as long or long int) and unsigned integers (declared as
unsigned or unsigned int), for which most int operations are
applicable. The portable C compiler treats long int, short int,
and unsigned int as synonymous with int.
2
On the PDP-11, a character is converted to an integer by
propagating its sign through the upper 8 bits of the resultant
integer. Thus, it is possible to have (non-ASCII) characters
with negative values.
C Reference Manual - 6
6.2 Float and double
All floating arithmetic in C is carried out in
double-precision. Whenever a float appears in an expression, it
is lengthened to double by zero-padding its fraction. When a
double must be converted to float, for example by an assignment,
the double is rounded before truncation to float length.
6.3 Float and double; integer and character
Ints and chars may be converted to float or double; truncation
may occur for some values. Conversion of float or double to int
1
or char takes place with rounding. Again, erroneous results are
possible for some values.
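A sketch of the conversion from double to int follows. Note the
assumption: present-day standard C discards the fractional part
(truncation toward 0), which matches the UNIX behavior noted in the
footnote to this section rather than the rounding described above.
The cast notation and the name to_int are not taken from this
manual.

```c
/* Hypothetical helper: converts by discarding the fraction. */
int to_int(double d)
{
    return (int)d;    /* truncates toward 0 on a current compiler */
}
```

With a current compiler, to_int(2.7) is 2 and to_int(-2.7) is -2.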
6.4 Pointers and integers
Integers may be added to pointers; in such cases the int is
converted as specified in the discussion of the addition
operator.
Two pointers to objects of the same type may be subtracted; in
this case the result is converted to an integer as specified in
the discussion of the subtraction operator.
7. Expressions
The precedence of expression operators is the same as the order
of the major subsections of this section (highest precedence
first). Thus the expressions referred to as the operands of +
(section 7.4) are those expressions defined in sections 7.1-7.3.
Within each subsection, the operators have the same precedence.
Left- or right-associativity is specified in each subsection for
the operators discussed therein. The precedence and
associativity of all the expression operators is summarized in an
appendix.
Unless otherwise noted, the order of evaluation of expressions
is undefined. In particular the compiler considers itself free
to compute subexpressions in the order it believes most
efficient, even if the subexpressions involve side effects.
7.1 Primary expressions
Primary expressions involving . , ->, subscripting, and
function calls group left to right.
7.1.1 identifier
An identifier is a primary expression, provided it has been
suitably declared as discussed below. Its type is specified by
its declaration. However, if the type of the identifier is
``array of . . .'', then the value of the identifier-expression
is a pointer to the first object in the array, and the type of
the expression is ``pointer to . . .''. Moreover, an array
identifier is not an lvalue expression.
Likewise, an identifier which is declared ``function returning
_________________________
1
On UNIX, this conversion involves truncation towards 0.
C Reference Manual - 7
. . .'', when used except in the function-name position of a
call, is converted to ``pointer to function returning . . .''.
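Both conversions described in this section can be sketched as
follows, assuming a present-day compiler (the parameter list (void)
and the initialization of an automatic array are modern notation).
The names seven and decay_demo are hypothetical.

```c
static int seven(void)
{
    return 7;
}

/* Hypothetical helper: array and function names used as values. */
int decay_demo(void)
{
    int a[3] = {10, 20, 30};
    int *p = a;                 /* a yields a pointer to a[0] */
    int (*fp)(void) = seven;    /* seven yields a pointer to function */

    return p == &a[0] && *p == 10 && fp() == 7;
}
```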
7.1.2 constant
A decimal, octal, character, or floating constant is a primary
expression. Its type is int in the first three cases, double in
the last.
7.1.3 string
A string is a primary expression. Its type is originally
``array of char''; but following the same rule as in section
7.1.1 for identifiers, this is modified to ``pointer to char''
and the result is a pointer to the first character in the string.
7.1.4 ( expression )
A parenthesized expression is a primary expression whose type
and value are identical to those of the unadorned expression.
The presence of parentheses does not affect whether the
expression is an lvalue.
7.1.5 primary-expression [ expression ]
A primary expression followed by an expression in square
brackets is a primary expression. The intuitive meaning is that
of a subscript. Usually, the primary expression has type
``pointer to . . .'', the subscript expression is int, and the
type of the result is `` . . . ''. The expression ``E1[E2]'' is
identical (by definition) to ``* (( E1 ) + ( E2 )) ''. All the
clues needed to understand this notation are contained in this
section together with the discussions in sections 7.1.1, 7.2.1,
and 7.4.1 on identifiers, *, and + respectively; section 14.3
below summarizes the implications.
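Because E1[E2] is defined as *((E1)+(E2)) and addition is
commutative, the operands of a subscript may even be exchanged. A
minimal sketch, assuming a present-day compiler; the name
subscript_demo is hypothetical.

```c
/* Hypothetical helper: three spellings of the same element. */
int subscript_demo(void)
{
    int a[4] = {0, 10, 20, 30};

    return a[2] == *(a + 2)     /* the definition of subscripting */
        && *(a + 2) == 2[a];    /* *((2) + (a)) is the same object */
}
```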
7.1.6 primary-expression ( expression-list )
opt
A function call is a primary expression followed by parentheses
containing a possibly empty, comma-separated list of expressions
which constitute the actual arguments to the function. The
primary expression must be of type ``function returning . . .'',
and the result of the function call is of type `` . . . ''. As
indicated below, a hitherto unseen identifier followed
immediately by a left parenthesis is contextually declared to
represent a function returning an integer; thus in the most
common case, integer-valued functions need not be declared.
Any actual arguments of type float are converted to double
before the call; any of type char are converted to int.
In preparing for the call to a function, a copy is made of each
actual parameter; thus, all argument-passing in C is strictly by
value. A function may change the values of its formal
parameters, but these changes cannot possibly affect the values
of the actual parameters. On the other hand, it is perfectly
possible to pass a pointer on the understanding that the function
may change the value of the object to which the pointer points.
Note that the order of evaluation of function arguments is not
defined.
Recursive calls to any function are permissible.
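The argument-passing rule can be sketched as below: a change to a
formal parameter is invisible to the caller, while a change made
through a pointer is not. The sketch uses present-day parameter
declarations; all three function names are hypothetical.

```c
/* Changes only its own copy of the argument. */
void bump_copy(int n)
{
    n = n + 1;
}

/* Changes the object to which the pointer points. */
void bump_through(int *p)
{
    *p = *p + 1;
}

/* Hypothetical helper: exercises both functions on one variable. */
int call_demo(void)
{
    int x = 5;

    bump_copy(x);        /* x is still 5 */
    bump_through(&x);    /* x is now 6 */
    return x;
}
```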
C Reference Manual - 8
7.1.7 primary-lvalue . member-of-structure
An lvalue expression followed by a dot followed by the name of
a member of a structure is a primary expression. The object
referred to by the lvalue must be of a structure type, and the
1
member-of-structure must be a member of that structure. The
result of the expression is an lvalue appropriately offset from
the origin of the given lvalue whose type is that of the named
structure member.
Structures are discussed in section 8.5.
7.1.8 primary-expression -> member-of-structure
The primary-expression must be a pointer to a structure and the
2
member-of-structure must be a member of that structure type.
The result is an lvalue appropriately offset from the origin of
the pointed-to structure whose type is that of the named
structure member.
The expression ``E1->MOS'' is exactly equivalent to
``(*E1).MOS''.
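The equivalence of the two member notations can be sketched as
follows, assuming a present-day compiler. The structure point and
the name member_demo are hypothetical.

```c
struct point {
    int x;
    int y;
};

/* Hypothetical helper: reaches members by . and by -> */
int member_demo(void)
{
    struct point pt;
    struct point *pp = &pt;

    pt.x = 3;                      /* lvalue . member */
    pp->y = 4;                     /* pointer -> member */
    return (*pp).x * 10 + pt.y;    /* (*pp).x is the same as pp->x */
}
```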
7.2 Unary operators
Expressions with unary operators group right-to-left.
7.2.1 * expression
The unary * operator means indirection: the expression must be
a pointer, and the result is an lvalue referring to the object to
which the expression points. If the type of the expression is
``pointer to . . .'', the type of the result is `` . . . ''.
7.2.2 & lvalue-expression
The result of the unary & operator is a pointer to the object
referred to by the lvalue-expression. If the type of the
lvalue-expression is `` . . . '', the type of the result is
``pointer to . . .''.
7.2.3 - expression
The result is the negative of the expression. The type of the
expression must be char, int, float, or double. The type of the
3
result is int or double.
_________________________
1
The UNIX C compiler allows any primary-lvalue and assumes it
to have the same form as the structure containing the named
structure member.
2
The UNIX C compiler allows any primary-lvalue and assumes it
to be a pointer which points to an object of the same form as the
structure of which the member-of-structure is a part.
3
The UNIX C compiler defines the type of the result to be the
same as the type of the operand.
C Reference Manual - 9
7.2.4 ! expression
The result of the logical negation operator ! is 1 if the value
of the expression is zero, 0 if the value of the expression is
non-zero. The type of the result is int. The allowable
1
expressions are those allowed by the if statement (section 9.3).
7.2.5 ~ expression
The ~ operator yields the one's complement of its operand. The
type of the expression must be int or char, and the result is
int.
7.2.6 ++ lvalue-expression
The object referred to by the lvalue expression is incremented.
The value is the new value of the lvalue expression and the type
is the type of the lvalue. If the expression is of a fundamental
2
type, it is incremented by 1; if it is a pointer to an object,
it is incremented by the length of the object.
7.2.7 -- lvalue-expression
The object referred to by the lvalue expression is decremented
analogously to the ++ operator.
7.2.8 lvalue-expression ++
The result is the value of the object referred to by the lvalue
expression. After the result is noted, the object referred to by
the lvalue is incremented in the same manner as for the prefix ++
3
operator: by 1 for an object of fundamental type, by the length
of the pointed-to object for a pointer. The type of the result
is the same as the type of the lvalue-expression.
7.2.9 lvalue-expression --
The result of the expression is the value of the object
referred to by the lvalue expression. After the result is
noted, the object referred to by the lvalue expression is
decremented in a way analogous to the postfix ++ operator.
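The prefix and postfix forms differ only in the value they yield.
The sketch below assumes a present-day compiler; the name incr_demo
is hypothetical. Since unary operators group right-to-left, *p++
means *(p++).

```c
/* Hypothetical helper: prefix and postfix ++ on int and pointer. */
int incr_demo(void)
{
    int a[3] = {10, 20, 30};
    int *p = a;
    int i = 5;
    int pre = ++i;       /* i becomes 6; the value is the new 6 */
    int post = i++;      /* the value is the old 6; i becomes 7 */
    int first = *p++;    /* yields a[0]; p then points to a[1] */

    return pre == 6 && post == 6 && i == 7
        && first == 10 && p == &a[1];
}
```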
7.2.10 sizeof expression
The sizeof operator yields the size, in bytes, of its operand.
When applied to an array, the result is the total number of bytes
in the array. The size is determined from the declarations of
the objects in the expression. The major use of sizeof is in
_________________________
1
The UNIX C compiler does not allow float or double operands.
2
The portable C compiler does not allow float or double
operands.
3
The portable C compiler does not allow float or double
operands.
C Reference Manual - 10
communication with routines like storage allocators and I/O
1
systems.
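A common use of sizeof on an array is to recover its element count.
A minimal sketch, assuming a present-day compiler; the names
sizeof_demo and table are hypothetical.

```c
/* Hypothetical helper: total size and element count of an array. */
int sizeof_demo(void)
{
    int table[12];

    return sizeof(table) == 12 * sizeof(int)          /* total bytes */
        && sizeof(table) / sizeof(table[0]) == 12;    /* elements */
}
```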
7.3 Multiplicative operators
The multiplicative operators *, /, and % group left-to-right.
7.3.1 expression * expression
The binary * operator indicates multiplication. If both
operands are int or char, the result is int; if one is int or
char and one float or double, the former is converted to double,
and the result is double; if both are float or double, the result
is double. No other combinations are allowed.
7.3.2 expression / expression
The binary / operator indicates division. The same type
considerations as for multiplication apply.
7.3.3 expression % expression
The binary % operator yields the remainder from the division of
the first expression by the second. Both operands must be int or
char, and the result is int. The use of this operation is not
recommended for negative operands.
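The relation between / and % can be sketched as below. The sketch
assumes a present-day compiler; from the 1999 C standard onward,
integer division truncates toward 0, so the remainder takes the sign
of the first operand, whereas older compilers were free to differ.
The function names are hypothetical.

```c
/* Hypothetical helper: remainder of a divided by b (b non-zero). */
int rem_demo(int a, int b)
{
    return a % b;
}

/* Hypothetical helper: quotient and remainder reassemble a. */
int rem_identity(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```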
7.4 Additive operators
The additive operators + and - group left-to-right.
7.4.1 expression + expression
The result is the sum of the expressions. If both operands are
int or char, the result is int. If both are float or double, the
result is double. If one is char or int and one is float or
double, the former is converted to double and the result is
double. If an int or char is added to a pointer, the former is
converted by multiplying it by the length of the object to which
the pointer points and the result is a pointer of the same type
as the original pointer. Thus if P is a pointer to an object,
the expression ``P+1'' is a pointer to another object of the same
type as the first and immediately following it in storage.
No other type combinations are allowed.
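The scaling applied when an integer is added to a pointer can be
sketched as follows. The sketch assumes a present-day compiler and
uses cast notation, which does not appear in this manual, to measure
the distance in bytes. The name ptr_add_demo is hypothetical.

```c
/* Hypothetical helper: P+1 advances by the length of one object. */
int ptr_add_demo(void)
{
    double d[4];
    double *p = d;

    /* p + 1 points to d[1]; in bytes, that is sizeof(double) away */
    return (p + 1) == &d[1]
        && (char *)(p + 1) - (char *)p == (long)sizeof(double);
}
```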
7.4.2 expression - expression
The result is the difference of the operands. If both operands
are int, char, float, or double, the same type considerations as
for + apply. If an int or char is subtracted from a pointer, the
former is converted in the same way as explained under + above.
If two pointers to objects of the same type are subtracted, the
result is converted (by division by the length of the object) to
an int representing the number of objects separating the
pointed-to objects. This conversion will in general give
_________________________
1
The UNIX C compiler allows this expression anywhere that a
constant is required.
C Reference Manual - 11
unexpected results unless the pointers point to objects in the
same array, since pointers, even to objects of the same type, do
not necessarily differ by a multiple of the object-length.
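Pointer subtraction within one array can be sketched as below,
assuming a present-day compiler. The name ptr_diff_demo is
hypothetical.

```c
/* Hypothetical helper: distance between two elements of an array. */
int ptr_diff_demo(void)
{
    int a[8];
    int *lo = &a[1];
    int *hi = &a[4];

    /* the byte distance is divided by the length of an int */
    return (int)(hi - lo);
}
```

The result is 3, the number of objects separating a[1] from a[4],
whatever the length of an int.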
7.5 Shift operators
The shift operators << and >> group left-to-right.
7.5.1 expression << expression
7.5.2 expression >> expression
Both operands must be int or char, and the result is int. The
second operand should be non-negative. The value of ``E1<<E2''
is E1 (interpreted as a bit pattern) left-shifted E2 bits;
vacated bits are 0-filled. The value of ``E1>>E2'' is E1
(interpreted as a bit pattern) logically right-shifted E2 bit
1
positions. Vacated bits are filled by 0 bits.
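A sketch of the two shifts follows. It assumes a present-day
compiler, where a logical right shift (0-fill) is guaranteed only
for unsigned operands; for that reason the sketch uses unsigned,
which the portable C compiler treats as synonymous with int. The
function names are hypothetical.

```c
/* Hypothetical helpers: shifts on a bit pattern. */
unsigned shift_left(unsigned v, int n)
{
    return v << n;    /* vacated low-order bits are 0-filled */
}

unsigned shift_right(unsigned v, int n)
{
    return v >> n;    /* vacated high-order bits are 0-filled */
}
```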
7.6 Relational operators
The relational operators group left-to-right, but this fact is
not very useful; ``a<b<c'' does not mean what it seems.
7.6.1 expression < expression
7.6.2 expression > expression
7.6.3 expression <= expression
7.6.4 expression >= expression
The operators < (less than), > (greater than), <= (less than or
equal to) and >= (greater than or equal to) all yield 0 if the
specified relation is false and 1 if it is true. For non-pointer
operands, operand conversion is exactly the same as for the +
operator. In addition, pointers of any kind can be compared.
The result in this case depends on the relative locations in
2
storage of the pointed-to objects.
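The remark about ``a<b<c'' can be sketched as below: the expression
groups as (a<b)<c, so the 0 or 1 yielded by the first comparison is
what gets compared with c. The sketch assumes a present-day
compiler; the function names are hypothetical, and the second
function uses the && operator of section 7.10.

```c
/* What a<b<c actually computes: (a<b) yields 0 or 1 first. */
int chained(int a, int b, int c)
{
    return (a < b) < c;
}

/* What was probably intended: b lies strictly between a and c. */
int between(int a, int b, int c)
{
    return a < b && b < c;
}
```

For a = 3, b = 2, c = 1 the first function yields 1 (since 0 < 1)
although the values are in descending order; the second yields 0.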
7.7 Equality operators
7.7.1 expression == expression
7.7.2 expression != expression
The == (equal to) and the != (not equal to) operators are
exactly analogous to the relational operators except for their
lower precedence. (Thus ``a<b == c<d'' is 1 whenever a<b and c<d
have the same truth-value). In addition, pointers may be tested
3
for equality or inequality with zero.
_________________________
1
On UNIX, arithmetic shifting is used, where the vacated bits
are filled with a copy of the sign bit.
2
The UNIX C compiler also allows pointers to be compared to
ints and chars; however, the int or char is first multiplied by
the length of the pointed-to object.
3
As with the relational operators, the UNIX C compiler allows
comparison between pointers and arbitrary integers and
characters.
C Reference Manual - 12
7.8 expression & expression
The & operator groups left-to-right. Both operands must be int
or char; the result is an int which is the bit-wise logical and
function of the operands.
7.9 Inclusive and exclusive OR
7.9.1 expression | expression
7.9.2 expression ^ expression
The | and ^ operators group left-to-right. The operands must
be int or char; the result is an int which is the bit-wise
inclusive (|) or exclusive (^) or function of its operands.
7.10 expression && expression
The && operator returns 1 if both its operands are non-zero, 0
otherwise. Unlike &, && guarantees left-to-right evaluation;
moreover the second operand is not evaluated if the first operand
is 0. The allowable expressions are those allowed by the if
statement (section 9.3).
7.11 expression || expression
The || operator returns 1 if either of its operands is
non-zero, and 0 otherwise. Unlike |, || guarantees left-to-right
evaluation; moreover, the second operand is not evaluated if the
value of the first operand is non-zero. The allowable
expressions are those allowed by the if statement (section 9.3).
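The guarantee that the second operand of && and || may be skipped
can be observed by counting calls. A minimal sketch, assuming a
present-day compiler; the names calls, probe, and short_circuit_demo
are hypothetical.

```c
static int calls;    /* number of times probe has been called */

static int probe(int v)
{
    calls = calls + 1;
    return v;
}

/* Hypothetical helper: packs the call count and both results. */
int short_circuit_demo(void)
{
    int r1, r2;

    calls = 0;
    r1 = probe(0) && probe(1);    /* first is 0: second not evaluated */
    r2 = probe(1) || probe(0);    /* first is non-zero: second skipped */
    return calls * 100 + r1 * 10 + r2;
}
```

Only two of the four calls to probe take place, so the function
returns 201.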
7.12 expression ? expression : expression
Conditional expressions group left-to-right. The first
expression is evaluated (as by the if statement), and if it is
non-zero, the result is the value of the second expression,
otherwise that of the third expression. If the types of the second
and third operand are the same, the result has their common type;
otherwise the same conversion rules as for + apply. Only one of
the second and third expressions is evaluated.
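Two small uses of the conditional expression are sketched below,
assuming a present-day compiler. The function names are
hypothetical. The second relies on the rule that only one of the
second and third expressions is evaluated.

```c
/* Hypothetical helper: the larger of two integers. */
int max_of(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: n / d is never evaluated when d is 0. */
int safe_div(int n, int d)
{
    return d ? n / d : 0;
}
```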
7.13 Assignment operators
There are a number of assignment operators, all of which group
right-to-left. All require an lvalue as their left operand, and
the type of an assignment expression is that of its left operand.
The value is the value stored in the left operand after the
assignment has taken place.
7.13.1 lvalue = expression
The value of the expression replaces that of the object
referred to by the lvalue. The operands need not have the same
type, but both must be int, char, float, double, or pointer. The
expression on the right is converted to the type of the lvalue if
necessary. The conversion from a pointer type to int is simply
to copy the value; thus, it is required that ints be of
sufficient length to hold any legitimate pointer value. The
conversion from int to pointer is defined so that the expression
"p = i = p" (where p is a pointer and i is an int) leaves p with
the same pointer value. The reverse, "i = p = i," is not
C Reference Manual - 13
1
guaranteed to preserve the integer value of i. Some integers
and pointers may convert to pointers which will cause addressing
exceptions when used.
7.13.2 lvalue += expression
lvalue =+ expression
7.13.3 lvalue -= expression
lvalue =- expression
7.13.4 lvalue *= expression
lvalue =* expression
7.13.5 lvalue /= expression
lvalue =/ expression
7.13.6 lvalue %= expression
lvalue =% expression
7.13.7 lvalue >>= expression
lvalue =>> expression
7.13.8 lvalue <<= expression
lvalue =<< expression
7.13.9 lvalue &= expression
lvalue =& expression
7.13.10 lvalue ^= expression
lvalue =^ expression
7.13.11 lvalue |= expression
lvalue =| expression
The behavior of an expression of the form ``E1 op= E2'' or
``E1 =op E2'' may be inferred by taking it as equivalent to
``E1 = E1 op E2''; however, E1 is evaluated only once.
Moreover, expressions like ``i += p'', in which a pointer is
added to an integer, are forbidden. The "op=" form is preferred
over the "=op" form, because it eliminates ambiguities possible
in expressions such as ``x=-1''.
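The assignment operators can be sketched as follows. The sketch
assumes a present-day compiler, which accepts only the "op=" forms;
the older "=op" forms are therefore not shown. The function names
are hypothetical.

```c
/* Hypothetical helper: a chain of op= assignments. */
int compound_demo(void)
{
    int x = 10;

    x += 5;      /* x = x + 5   gives 15 */
    x <<= 1;     /* x = x << 1  gives 30 */
    x %= 7;      /* x = x % 7   gives 2 */
    return x;
}

/* Hypothetical helper: E1 is evaluated only once. */
int once_demo(void)
{
    int a[2] = {5, 5};
    int i = 0;

    a[i++] += 1;    /* i is incremented once; only a[0] changes */
    return a[0] * 100 + a[1] * 10 + i;
}
```

Had ``a[i++] += 1'' been expanded textually to
``a[i++] = a[i++] + 1'', i would have been incremented twice.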
7.14 expression , expression
A pair of expressions separated by a comma is evaluated
left-to-right and the value of the left expression is discarded.
The type and value of the result are the type and value of the
right operand. This operator groups left-to-right. It should be
avoided in situations where comma is given a special meaning, for
example in actual arguments to function calls (section 7.1.6) and
lists of initializers (section 10.2).
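A minimal sketch of the comma operator, assuming a present-day
compiler. The name comma_demo is hypothetical. The parentheses are
needed because comma has the lowest precedence of all operators.

```c
/* Hypothetical helper: the value is that of the right operand. */
int comma_demo(void)
{
    int a, r;

    r = (a = 3, a + 2);    /* a = 3 is evaluated first, then a + 2 */
    return r;
}
```

The same parentheses allow a comma expression to appear as a single
actual argument, as in f((a = 3, a + 2)).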
8. Declarations
Declarations are used within function definitions to specify
the interpretation which C gives to each identifier; they do not
necessarily reserve storage associated with the identifier.
Declarations have the forms
_________________________
1
On UNIX, no conversion is necessary among different pointer
types and integers. Thus, the value of i in this example would
be preserved.
C Reference Manual - 14
declaration:
     decl-specifiers init-declarator-list ;
     type-specifier ;
The declarators in the init-declarator-list contain the
identifiers being declared. The decl-specifiers consist of at
most one type-specifier and at most one storage class specifier.
decl-specifiers:
     type-specifier
     sc-specifier
     type-specifier sc-specifier
     sc-specifier type-specifier
The second form of declaration is used to define structures
(section 8.5).
8.1 Storage class specifiers
The sc-specifiers are:
sc-specifier:
     auto
     static
     extern
     register
The auto, static, and register declarations also serve as
definitions in that they cause an appropriate amount of storage
to be reserved. In the extern case there must be an external
definition (see below) for the given identifiers somewhere
outside the function in which they are declared.
Identifiers declared to be of class register may not be used as
the operand of the address-of operator &. In addition, each
implementation will have its own restrictions on the number and
types of register identifiers which can be supported in any
function. When these restrictions are violated, the offending
1
identifiers are treated as auto.
If the sc-specifier is missing from a declaration, it is
generally taken to be auto.
8.2 Type specifiers
The type-specifiers are
_________________________
1
The portable C compiler treats register as synonymous with
auto.
C Reference Manual - 15
type-specifier:
     int
     char
     float
     double
     long
     long int
     short
     short int
     unsigned
     unsigned int
     long float
     struct { type-decl-list }
     struct identifier { type-decl-list }
     struct identifier
The struct specifier is discussed in section 8.5. If the
type-specifier is missing from a declaration, it is generally
1
taken to be int.
8.3 Declarators
The init-declarator-list appearing in a declaration is a
comma-separated sequence of declarators, each of which may be
followed by an initializer for the declarator (initialization is
discussed in section 10.3).
init-declarator-list:
     init-declarator
     init-declarator , init-declarator-list
init-declarator:
     declarator initializer
                           opt
The specifiers in the declaration indicate the type and storage
class of the objects to which the declarators refer. Declarators
have the syntax:
declarator:
     identifier
     * declarator
     declarator ( )
     declarator [ constant-expression    ]
                                     opt
     ( declarator )
The grouping in this definition is the same as in expressions.
_________________________
1
The UNIX C compiler implements a facility whereby identifiers
can be equated to types. Such identifiers can be used as
type-specifiers. The portable C compiler only partially
implements this facility.
C Reference Manual - 16
8.4 Meaning of declarators
Each declarator is taken to be an assertion that when a
construction of the same form as the declarator appears in an
expression, it yields an object of the indicated type and storage
class. Each declarator contains exactly one identifier; it is
this identifier that is declared.
If an unadorned identifier appears as a declarator, then it has
the type indicated by the specifier heading the declaration.
If a declarator has the form
* D
for D a declarator, then the contained identifier has the type
``. . . pointer to X'', where `` . . . X'' is the type which the
identifier would have had if the declarator had been simply D.
If a declarator has the form
D ( )
then the contained identifier has the type ``. . . function
returning X'', where `` . . . X'' is the type which the
identifier would have had if the declarator had been simply D.
A declarator may have the form
D[constant-expression]
or
D[ ]
In the first case the constant expression is an expression whose
value is determinable at compile time, and whose type is int; in
the second the constant 1 is used. (Constant expressions are
defined precisely in section 15.) Such a declarator makes the
contained identifier have type ``. . . array of X'', where
`` . . . X'' is the type which the identifier would have had if
the declarator had been simply D. The constant specifies the
number of elements in the array.
An array may be constructed from one of the basic types, from a
pointer, from a structure, or from another array (to generate a
multi-dimensional array).
Finally, parentheses in declarators do not alter the type of
the contained identifier except insofar as they alter the binding
of the components of the declarator.
Not all the possibilities allowed by the syntax above are
actually permitted. The restrictions are as follows: functions
may not return arrays, structures or functions, although they may
return pointers to such things; there are no arrays of functions,
although there may be arrays of pointers to functions. Likewise
a structure may not contain a function, but it may contain a
pointer to a function.
As an example, the declaration
int i, *ip, f(), *fip(), (*pfi)();
declares an integer i, a pointer ip to an integer, a function f
returning an integer, a function fip returning a pointer to an
integer, and a pointer pfi to a function which returns an
integer. Also
float fa[17], *afp[17];
declares an array of float numbers and an array of pointers to
float numbers. Finally,
static int x3d[3][5][7];
declares a static three-dimensional array of integers, with rank
3x5x7. In complete detail, x3d is an array of three items: each
item is an array of five arrays; each of the latter arrays is an
array of seven integers. Any of the expressions ``x3d'',
``x3d [ i ]'', ``x3d [ i ] [ j ]'', ``x3d [ i ] [ j ] [ k ]'' may
reasonably appear in an expression. The first three have type
``array'', the last has type int.
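The declarations above can be exercised in a short sketch. It is written in present-day standard C rather than the 1977 dialect described here (hence the void parameter lists), and the names seven, cell, and declarator_demo are invented for illustration.

```c
/* Like f above: a function returning an integer. */
int seven(void) { return 7; }

/* Like fip above: a function returning a pointer to an integer. */
static int cell;
int *fip(void) { return &cell; }

int declarator_demo(void)
{
    int i, *ip, (*pfi)(void);     /* int, pointer to int, pointer to function */
    static int x3d[3][5][7];      /* three arrays of five arrays of seven ints */

    i = 3;
    ip = &i;                      /* "*ip" in an expression yields an int */
    pfi = seven;                  /* "(*pfi)()" in an expression yields an int */
    *fip() = 4;                   /* "*fip()" in an expression yields an int */

    x3d[2][4][6] = *ip + (*pfi)() + *fip();   /* 3 + 7 + 4 */
    return x3d[2][4][6];
}
```

Each use mirrors the form of its declarator, which is the sense in which a declarator is an assertion about expressions of the same form.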
8.5 Structure declarations
Recall that one of the forms for a structure specifier is
struct { type-decl-list }
The type-decl-list is a sequence of type declarations for the
members of the structure:
type-decl-list:
type-declaration
type-declaration type-decl-list
A type declaration is just a declaration which does not mention a
storage class (the storage class ``member of structure'' here
being understood by context) or include an initializer.
type-declaration:
type-specifier declarator-list ;
Within the structure, the objects declared have addresses which
increase as their declarations are read left-to-right. Each
component of a structure begins on an addressing boundary
appropriate to its type. Therefore, there may be unnamed holes
               1
in a structure.
Another form of structure specifier is
struct identifier { type-decl-list }
This form is the same as the one just discussed, except that the
identifier is remembered as the structure tag of the structure
_________________________
1
The UNIX C compiler forces all structures to have an even
length in bytes and be aligned on word boundaries.
specified by the list. A declaration may then be given using the
structure tag but without the list, as in the third form of
structure specifier:
struct identifier
Structure tags allow definition of self-referential and
mutually-recursive structures (forward references to structure
type names must be within the same group of definitions and be a
pointed-to or returned type); they also permit the long part of
the declaration to be given once and used several times. It is
however absurd to declare a structure which contains an instance
of itself, as distinct from a pointer to an instance of itself.
A simple example of a structure declaration, taken from section
16.2 where its use is illustrated more fully, is
struct tnode {
        char tword[20];
        int count;
        struct tnode *left;
        struct tnode *right;
};
which contains an array of 20 characters, an integer, and two
pointers to similar structures. Once this declaration has been
given, the following declaration makes sense:
struct tnode s, *sp;
which declares s to be a structure of the given sort and sp to be
a pointer to a structure of the given sort.
The names of structure members and structure tags may be the
same as ordinary variables, since a distinction can be made by
context. All of the members of a structure must have unique
names. However, a single member name may be used in many
                      1
structure definitions.
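As a sketch of the layout and self-reference rules of this section, the following present-day C fragment checks that member addresses increase in declaration order and follows a pointer member to a second structure. The helper names tnode_members_ascend and tnode_link_demo are hypothetical; offsetof comes from the standard header <stddef.h>, which postdates this manual.

```c
#include <stddef.h>

struct tnode {
    char tword[20];
    int count;
    struct tnode *left;     /* pointer to an instance, not an instance */
    struct tnode *right;
};

/* Returns 1 if member offsets increase in declaration order.
   Holes between members are permitted, so only ordering is tested. */
int tnode_members_ascend(void)
{
    return offsetof(struct tnode, tword) == 0
        && offsetof(struct tnode, count) >= sizeof(char[20])
        && offsetof(struct tnode, left)  >  offsetof(struct tnode, count)
        && offsetof(struct tnode, right) >  offsetof(struct tnode, left);
}

/* Links two nodes through a pointer member and reads back through it. */
int tnode_link_demo(void)
{
    struct tnode s, child, *sp;

    sp = &s;
    child.count = 5;
    sp->count = 1;
    sp->left = &child;
    sp->right = 0;
    return sp->count + sp->left->count;
}
```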
9. Statements
Except as indicated, statements are executed in sequence.
_________________________
1
The UNIX C compiler requires that the names of tags and
members be distinct. In addition, the same member name is
allowed to appear in different structures only if the two members
are of the same type and if their origin with respect to their
structure is the same. Thus, separate structures can share a
common initial segment.
9.1 Expression statement
Most statements are expression statements, which have the form
expression ;
Usually expression statements are assignments or function calls.
9.2 Compound statement
So that several statements can be used where one is expected,
or local variables defined, the compound statement is provided:
compound-statement:
{ declaration-list statement-list }
opt
declaration-list:
declaration
declaration declaration-list
statement-list:
statement
statement statement-list
9.3 Conditional statement
The two forms of the conditional statement are
if ( expression ) statement
if ( expression ) statement else statement
In both cases the expression is evaluated and if it is non-zero,
the first substatement is executed. In the second case the
second substatement is executed if the expression is zero. As
usual the ``else'' ambiguity is resolved by connecting an else
with the last encountered elseless if.
The expression may be of any fundamental type or a pointer.
The comparison with zero is done in a manner appropriate for the
type of the expression.
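The else rule can be seen in a small sketch (present-day C; the function names classify and is_null are invented here). The indentation shows the pairing the compiler actually uses, and some modern compilers warn about the missing braces.

```c
/* Returns 1, 2, or 0 to show which branch ran. */
int classify(int a, int b)
{
    int r = 0;

    if (a)
        if (b)
            r = 1;
        else            /* pairs with "if (b)", the last elseless if */
            r = 2;
    return r;
}

/* A pointer may be tested directly; it is compared with the null pointer. */
int is_null(const char *p)
{
    if (p)
        return 0;
    return 1;
}
```

With a zero first argument neither assignment runs, because the else does not belong to the outer if.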
9.4 While statement
The while statement has the form
while ( expression ) statement
The substatement is executed repeatedly so long as the value of
the expression remains non-zero. The test takes place before
each execution of the statement, and is the same as that
performed by the if statement.
9.5 Do statement
The do statement has the form
do statement while ( expression ) ;
The substatement is executed repeatedly until the value of the
expression becomes zero. The test takes place after each
execution of the statement, and is the same as that performed by
the if statement.
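The difference between testing before and testing after each execution shows up when the condition is false at the start. A minimal sketch in present-day C, with the hypothetical helpers while_runs and do_runs counting how often the body executes:

```c
/* Number of times a while body runs when counting n down to zero. */
int while_runs(int n)
{
    int runs = 0;

    while (n > 0) {         /* test precedes each execution */
        runs++;
        n--;
    }
    return runs;
}

/* Number of times a do body runs for the same countdown. */
int do_runs(int n)
{
    int runs = 0;

    do {                    /* test follows each execution */
        runs++;
        n--;
    } while (n > 0);
    return runs;
}
```

For a positive n the two agree; for n equal to zero the do body still runs once.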
9.6 For statement
The for statement has the form
for ( expression-1    ; expression-2    ; expression-3    ) statement
                  opt               opt               opt
This statement is equivalent to
expression-1;
while ( expression-2 ) {
        statement
        expression-3 ;
}
Thus the first expression specifies initialization for the loop;
the second specifies a test, made before each iteration, such
that the loop is exited when the expression becomes zero; the
third expression typically specifies an incrementation which is
performed after each iteration.
Any or all of the expressions may be dropped. A missing
expression-2 makes the implied while clause equivalent to ``while
( 1 )''; other missing expressions are simply dropped from the
expansion above.
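The expansion can be checked by writing the same loop three ways. This is a sketch in present-day C; sum_for, sum_while, and sum_forever are names chosen for illustration.

```c
/* Sum of 0 .. n-1 using a for statement. */
int sum_for(int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* The same loop written as the equivalent while statement. */
int sum_while(int n)
{
    int i, sum = 0;

    i = 0;                  /* expression-1 */
    while (i < n) {         /* expression-2 */
        sum += i;           /* statement    */
        i++;                /* expression-3 */
    }
    return sum;
}

/* All three expressions dropped: the loop behaves like while (1). */
int sum_forever(int n)
{
    int i = 0, sum = 0;

    for (;;) {
        if (i >= n)
            return sum;
        sum += i++;
    }
}
```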
9.7 Switch statement
The switch statement causes control to be transferred to one of
several statements depending on the value of an expression. It
has the form
switch ( expression ) statement
The expression must be int or char. The statement is typically
compound. Each statement within the statement may be labelled
with case prefixes as follows:
case constant-expression :
where the constant expression must be int or char. No two of the
case constants in a switch may have the same value. Constant
expressions are precisely defined in section 15.
There may also be at most one statement prefix of the form
default :
When the switch statement is executed, its expression is
evaluated and compared with each case constant in an undefined
order. If one of the case constants is equal to the value of the
expression, control is passed to the statement following the
matched case prefix. If no case constant matches the expression,
and if there is a default prefix, control passes to the prefixed
statement. In the absence of a default prefix none of the
statements in the switch is executed.
Case or default prefixes in themselves do not alter the flow of
control.
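Because case and default prefixes do not alter the flow of control, execution continues from one labelled statement into the next unless a break intervenes. A minimal sketch in present-day C (switch_demo is a hypothetical name):

```c
int switch_demo(int c)
{
    int n = 0;

    switch (c) {
    case 'a':
        n += 1;         /* no break: control flows on into case 'b' */
    case 'b':
        n += 10;
        break;
    default:
        n = -1;
    }
    return n;
}
```

Entering at 'a' therefore executes both additions, while entering at 'b' executes only the second.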
9.8 Break statement
The statement
break ;
causes termination of the smallest enclosing while, do, for, or
switch statement; control passes to the statement following the
terminated statement.
9.9 Continue statement
The statement
continue ;
causes control to pass to the loop-continuation portion of the
smallest enclosing while, do, or for statement; that is, to the
end of the loop. More precisely, in each of the statements
while ( ... ) {         do {                    for ( ... ) {
   . . .                   . . .                   . . .
contin: ;               contin: ;               contin: ;
}                       } while ( ... );        }
a continue is equivalent to ``goto contin''.
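A short sketch of break and continue inside a for statement, in present-day C with the invented name sum_odds. The continue passes control to the incrementation, not past it.

```c
/* Sum of the odd integers below n. */
int sum_odds(int n)
{
    int i, sum = 0;

    for (i = 0; ; i++) {
        if (i >= n)
            break;          /* leaves the for statement */
        if (i % 2 == 0)
            continue;       /* skips to i++, then re-enters the loop */
        sum += i;
    }
    return sum;
}
```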
9.10 Return statement
A function returns to its caller by means of the return
statement, which has one of the forms
return ;
return ( expression ) ;
In the first case no value is returned. In the second case, the
value of the expression is returned to the caller of the
function. If required, the expression is converted, as if by
assignment, to the type of the function in which it appears.
Flowing off the end of a function is equivalent to a return with
no returned value.
9.11 Goto statement
Control may be transferred unconditionally by means of the
statement
goto expression ;
The expression should be a label (sections 9.12, 14.4) or an
expression of type ``pointer to int'' which evaluates to a label.
It is illegal to transfer to a label not located in the current
function unless some extra-language provision has been made to
adjust the stack correctly.
9.12 Labelled statement
Any statement may be preceded by label prefixes of the form
identifier :
which serve to declare the identifier as a label. More details
on the semantics of labels are given in section 14.4 below.
9.13 Null statement
The null statement has the form
;
A null statement is useful to carry a label just before the ``}''
of a compound statement or to supply a null body to a looping
statement such as while.
10. Function definitions and global declarations
A C program consists of a sequence of function definitions and
global declarations. Global declarations may be given for simple
variables and for arrays. They are used to declare and/or
reserve storage for objects.
10.1 Function definitions
Function definitions have the form
function-definition:
        type-specifier    function-declarator function-body
                      opt
A function declarator is similar to a declarator for a ``function
returning ...'' except that it lists the formal parameters of the
function in the parentheses which must follow the function name.
Some examples of function-declarators are:
f(a, b)         returns int
*f(a)           returns pointer to int
(*f(a))()       returns pointer to function returning int
The function-body has the form
function-body:
        type-decl-list    function-statement
                      opt
The purpose of the type-decl-list is to give the types of the
formal parameters. No other identifiers should be declared in
this list, and formal parameters should be declared only here.
Formal parameters may be declared as being of class register.
The function-statement is just a compound statement.
function-statement:
compound-statement
A simple example of a complete function definition is
int max (a, b, c)
int a, b, c;
{       int m;
        m = (a > b) ? a : b;
        return (m > c ? m : c);
}
Here ``int'' is the type-specifier; ``max(a, b, c)'' is the
function-declarator; ``int a, b, c;'' is the type-decl-list for
the formal parameters; ``{ . . . }'' is the function-statement.
C converts all float actual parameters to double, so formal
parameters declared float have their declaration adjusted to read
double. Correspondingly, char parameters are adjusted to read
int. Also, since a reference to an array in any context (in
particular as an actual parameter) is taken to mean a pointer to
the first element of the array, declarations of formal parameters
declared ``array of ...'' are adjusted to read ``pointer to
...''. Finally, because neither structures nor functions can be
passed to a function, it is useless to declare a formal parameter
to be a structure or function (pointers to structures or
functions are of course permitted).
A free return statement is supplied at the end of each function
definition, so running off the end causes control, but no value,
to be returned to the caller.
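The adjustment of array parameters to pointers can be observed directly. The sketch below uses present-day C prototypes instead of the declaration style of this manual; sum3, zero_first, and param_demo are hypothetical names, and the initialized automatic array relies on a facility the portable compiler of this manual did not provide.

```c
/* Declared "array of int"; adjusted to read "pointer to int". */
int sum3(int v[])
{
    int s = 0;

    s += *v++;              /* legal only because v is really a pointer */
    s += *v++;
    s += *v;
    return s;
}

/* Writing through the parameter changes the caller's array. */
void zero_first(int v[])
{
    v[0] = 0;
}

int param_demo(void)
{
    int a[3] = { 1, 2, 3 };
    int before = sum3(a);   /* 6; the caller's a is unchanged by v++ */

    zero_first(a);
    return before * 10 + sum3(a);   /* 6, then 0 + 2 + 3 = 5 */
}
```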
10.2 Global declarations
A global declaration has the same form as a declaration within
a function (section 8), except that the sc-specifiers auto and
register may not be used.
Global declarations with sc-specifiers extern or static are
like similar declarations within functions, except that the
identifiers so declared are accessible throughout the remainder
of the source file. A global static declaration reserves storage
which is retained throughout the execution of a program. A
global extern declaration declares that the associated
identifiers have been externally defined, but is not itself such
a definition.
A global declaration without an sc-specifier is an external
definition. It reserves storage for the identifiers and allows
them to be accessed by separately-compiled functions which
contain appropriate extern declarations for the identifiers. It
is an error to have more than one external definition of an
                          1
identifier in a C program. Functions appearing in an external
data definition are declared as extern.
_________________________
1
The UNIX C compiler treats external data definitions and
global extern declarations as equivalent. More than one external
definition of an identifier is allowed, so long as at most one
includes initialization.
10.3 Initialization
Explicit initialization is permitted in declarations which
reserve storage, namely register, auto, and static declarations,
and external definitions. Automatic structures and arrays may
                   1
not be initialized. The initial value of static and extern
identifiers not explicitly initialized is zero. The initial
value of register and auto identifiers not explicitly initialized
is undefined.
An initializer represents the initial value for the
corresponding object being defined (and declared).
initializer:
constant
{ constant-expression-list }
constant-expression-list:
constant-expression
constant-expression , constant-expression-list
Thus an initializer consists of a constant-valued expression, or
comma-separated list of expressions, inside braces. The braces
may be dropped when the expression is just a plain constant. The
exact meaning of a constant expression is discussed in section
15. The expression list is used to initialize arrays and
structures; see below.
The type of the identifier being defined should be compatible
with the type of the initializer: a double constant may
initialize a float or double identifier; a non-floating-point
expression may initialize an int, char, or pointer.
An initializer for an array may contain a comma-separated list
of compile-time expressions. The length of the array is taken to
be the maximum of the number of expressions in the list and the
square-bracketed constant in the array's declarator. This
constant may be missing, in which case 1 is used. The
expressions initialize successive members of the array starting
at the origin (subscript 0) of the array. The acceptable
expressions for an array of type ``array of ...'' are the same as
                       2
those for type ``...''.
Structures can be initialized, but this operation is
incompletely implemented and machine-dependent. Basically the
structure is regarded as a sequence of words and the initializers
_________________________
1
The portable C compiler does not support initialization of
register or auto identifiers.
2
The UNIX C compiler also allows, as a special case, a single
string to be given as the initializer for an array of chars; in
this case, the characters in the string are taken as the
initializing values.
are placed into those words. Structure initialization, using a
comma-separated list in braces, is safe if all the members of the
                                                                1
structure are integers or pointers but is otherwise ill-advised.
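The array and default-value rules of this section can be sketched as follows. The fragment is present-day C, which writes an = before the initializer and, unlike the rule above, does not accept more expressions than the declared bound; the example stays within the cases where both agree. The names table, padded, counter, and the three helper functions are invented for illustration.

```c
int table[] = { 10, 20, 30 };   /* no bound given: length comes from the list */
int padded[5] = { 1, 2 };       /* members without an expression start at zero */
static int counter;             /* static, not explicitly initialized: zero */

int table_length(void)
{
    return (int)(sizeof table / sizeof table[0]);
}

int padded_sum(void)
{
    int i, sum = 0;

    for (i = 0; i < 5; i++)
        sum += padded[i];
    return sum;
}

/* Storage for counter is retained between calls. */
int next_count(void)
{
    return ++counter;
}
```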
11. Scope rules
A complete C program need not all be compiled at the same time:
the source text of the program may be kept in several files, and
precompiled routines may be loaded from libraries. Communication
among the functions of a program may be carried out both through
explicit calls and through manipulation of external data.
Therefore, there are two kinds of scope to consider: first,
what may be called the lexical scope of an identifier, which is
essentially the region of a program during which it may be used
without drawing ``undefined identifier'' diagnostics; and second,
the scope associated with external identifiers, which is
characterized by the rule that references to the same external
identifier are references to the same object.
11.1 Lexical scope
C supports block-structure only within function definitions
(i.e., function definitions may not be nested, but any compound
statement can define variables local to that statement). The
lexical scope of names declared in external definitions extends
from their definition through the end of the file in which they
appear. The same is true for implicit or explicit external
declarations inside of function definitions. The lexical scope
of formal parameters is the body of the function. The lexical
scope of non-external names declared at the head of compound
statements extends from their definition through the end of the
compound statement. The only allowed forward reference to a
label is as the expression in a goto statement.
It is an error to redeclare an identifier already declared in
the current context, except for a consistent set consisting of
any number of external declarations plus at most one external
definition for an identifier.
11.2 Scope of externals
If a function declares an identifier to be extern, then
somewhere among the files or libraries constituting the complete
program there must be an external definition for the identifier.
All functions in a given program which refer to the same external
identifier refer to the same object, so care must be taken that
the type and extent specified in the definition are compatible
with those specified by each function which references the data.
In a multi-file program, an external definition for an external
identifier must appear in exactly one of the files. Any other
files which wish to use the identifier must contain a
_________________________
1
The UNIX C compiler implements initialization of arbitrary
structures, and allows nested bracketed sequences of initializers
for aggregates.
corresponding extern declaration of the identifier. The
identifier can be initialized only in the file where storage is
allocated.
12. Compiler control lines
When a line of a C program begins with the character #, it is
interpreted as a special directive to the compiler. Such
compiler control lines may appear anywhere in the source file,
                                     1
except within comments and constants. The names of compiler
control lines are not reserved; they are recognized by context.
12.1 Token replacement
A compiler-control line of the form
# define identifier token-string
(note: no trailing semicolon) causes the compiler to replace
subsequent instances of the identifier with the given string of
tokens. When processing the # define line, token replacement is
performed on the token string, but not on the identifier. When
token replacement occurs, the inserted token string is not
                                     2
subject to further token replacement. The names of compiler
control lines are not subject to token replacement, nor are
compiler control line arguments specified as identifiers.
This facility is most valuable for definition of ``manifest
constants'', as in
# define tabsize 100
. . .
int table[tabsize];
Macros may be defined by immediately following the identifier
with a parenthesized list of formal parameters (see also section
12.3).
_________________________
1
In order to use compiler control lines with the UNIX C
compiler, it is required that the first line of the source file
begin with #.
2
Unfortunately, the UNIX C compiler uses a different method of
token replacement with different semantics. Token replacement is
not performed on the token-string when processing a # define
line. However, when the token-string is inserted, it is subject
to token replacement.
12.2 File inclusion
In multi-file C programs, it is necessary to have extern
declarations for any external identifier used in files other than
the one in which it is defined. Rather than repeat tedious and
error-prone declarations for each external identifier in each
file, one can create a separate file containing these
declarations and cause it to be dynamically inserted into each
source file.
A compiler control line of the form
# include "filename"
results in the replacement of that line by the entire contents of
                  1
the file filename. Included files may include other files.
This technique is also useful for manifest constants and
structure definitions.
           2
12.3 Macros
The C macro facility allows token replacement strings to be
parameterized. A macro is defined by lines of the form
# macro identifier ( parameter-list )
opt
token-string
# end
The parameter list is a comma-separated list of identifiers,
which are the formal parameters of the macro. The token-string,
which may be given on zero or more input lines, may contain
occurrences of the formal parameter names. When substitution is
performed, these occurrences will be replaced by the
corresponding actual parameters, which are strings of tokens.
The format of a macro ``invocation'' is the same as for
function calls. Thus, the macro facility can be used to write
small ``functions'' (without local variables) which will produce
in-line code. However, one must be careful in that macro
parameters are essentially call by name, whereas function
parameters are call by value. In addition, it is a good idea to
enclose within parentheses all occurrences of formal parameters
in macro definitions, in order to avoid precedence problems after
substitution of actual parameters.
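Both cautions can be illustrated with a sketch. Since the # macro form is specific to the portable compiler, the fragment uses the parameterized # define form mentioned in section 12.1, as accepted by present-day compilers. The macro and function names are hypothetical.

```c
#define SQUARE_BAD(x)  x * x            /* parameters not parenthesized */
#define SQUARE_OK(x)   ((x) * (x))
#define MAX(a, b)      ((a) > (b) ? (a) : (b))

/* Expands to 1 + 2 * 1 + 2, so precedence gives 5 rather than 9. */
int bad_square(void) { return SQUARE_BAD(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)), which is 9. */
int ok_square(void)  { return SQUARE_OK(1 + 2); }

/* The actual parameter i++ is substituted as text and evaluated twice:
   once in the comparison and once as the selected result. */
int max_side_effect(void)
{
    int i = 5, j = 3;
    int m = MAX(i++, j);    /* m becomes 6, i ends at 7 */

    return m * 10 + i;
}
```

A true function called as max(i++, j) would have received the value 5 and incremented i once.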
_________________________
1
The UNIX C compiler also allows the filename to be enclosed in
angle brackets instead of quotation marks. Such a filename is
interpreted relative to a system standard include-file directory.
2
This facility is not supported by the UNIX C compiler.
12.4 Compile-time conditionals
Conditional compilation of source text is provided by the forms
# ifdef identifier          # ifndef identifier
    ...                         ...
# endif                     # endif
These forms cause the text enclosed by the compiler control lines
to be included in the compilation only if the given identifier
has ( ifdef ) or does not have ( ifndef ) a lexical definition.
An identifier is given a lexical definition by # define and
# macro. Compile-time conditionals may be nested.
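A minimal sketch of the two conditional forms, using the hypothetical identifiers LARGE_TABLES and TABSIZE. Only one of the two # define lines for TABSIZE is ever seen by the compiler.

```c
#define LARGE_TABLES            /* gives LARGE_TABLES a lexical definition */

#ifdef LARGE_TABLES
#define TABSIZE 1000            /* included: LARGE_TABLES is defined */
#endif

#ifndef LARGE_TABLES
#define TABSIZE 100             /* skipped */
#endif

int table[TABSIZE];

int table_size(void)
{
    return (int)(sizeof table / sizeof table[0]);
}
```

Removing the first line would select the smaller table without any other change to the source.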
             1
12.5 Undefine
The undefine compiler control line has the form
# undefine identifier
It removes any lexical definition of the identifier established
by a previous # define or # macro. The identifier will
henceforth not be subject to any form of token replacement. When
used with a keyword, # undefine causes the reserved identifier to
lose its built-in meaning and become an ordinary identifier.
                        2
12.6 Renamed identifiers
In writing some system support software, it is often desirable
to use names for functions and external data which are not
subject to accidental conflict with user-chosen names. This
ability is provided by the rename compiler control line, which
has the form
# rename identifier string
The specified identifier will be replaced by the given character
string when it appears in the output of the compiler.
13. Implicit declarations
It is not always necessary to specify both the storage class
and the type of identifiers in a declaration. Sometimes the
storage class is supplied by the context: in external
definitions, and in declarations of formal parameters and
structure members. In a declaration inside a function, if a
storage class but no type is given, the identifier is assumed to
be int; if a type but no storage class is indicated, the
identifier is assumed to be auto. An exception to the latter
rule is made for functions, since auto functions are meaningless
(C being incapable of compiling code into the stack). If the
type of an identifier is ``function returning ...'', it is
_________________________
1
This facility is not supported by the UNIX C compiler.
2
This facility is not supported by the UNIX C compiler.
implicitly declared to be extern.
In an expression, an identifier followed by ( and not otherwise
declared is contextually declared to be ``function returning
int''. As an initializer, an otherwise undeclared identifier is
                                                       1
contextually declared to be ``function returning int''.
For some purposes it is best to consider formal parameters as
belonging to their own storage class. In practice, C treats
parameters as if they were automatic (except that, as mentioned
above, formal parameter arrays, chars, and floats are treated
specially).
14. Types revisited
This section summarizes the operations which can be performed
on objects of certain types.
14.1 Structures
There are only two things that can be done with a structure:
pick out one of its members (by means of the `` . '' or `` -> ''
operators); or take its address (by unary `` & ''). Other
operations, such as assigning from or to it or passing it as a
parameter, draw an error message. In the future, it is expected
that these operations, but not necessarily others, will be
allowed.
14.2 Functions
There are only two things that can be done with a function:
call it, or take its address. If the name of a function appears
in an expression not in the function-name position of a call, a
pointer to the function is generated. Thus, to pass one function
to another, one might say
int f();
...
g (f);
Then the definition of g might read
g (funcp)
int (*funcp)();
{       . . .
        (*funcp)();
        . . .
}
Notice that f was declared explicitly in the calling routine
since its first appearance was not followed by `` ( ''.
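The same pattern in present-day C, with prototypes in place of the older declarations. The names twice, apply, and apply_demo are invented for this sketch.

```c
int twice(int x)
{
    return 2 * x;
}

/* funcp is a pointer to a function taking and returning int. */
int apply(int (*funcp)(int), int arg)
{
    return (*funcp)(arg);
}

int apply_demo(void)
{
    /* twice is not in the function-name position of a call,
       so a pointer to the function is generated. */
    return apply(twice, 21);
}
```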
_________________________
1
The UNIX C compiler contextually declares identifiers in
initializers to be of type int.
14.3 Arrays, pointers, and subscripting
Every time an identifier of array type appears in an
expression, it is converted into a pointer to the first member of
the array. Because of this conversion, arrays are not lvalues.
By definition, the subscript operator [ ] is interpreted in such
a way that ``E1[E2]'' is identical to ``*((E1) + (E2))''.
Because of the conversion rules which apply to +, if E1 is an
array and E2 an integer, then E1[E2] refers to the E2-th member
of E1. Therefore, despite its asymmetric appearance,
subscripting is a commutative operation.
A consistent rule is followed in the case of multi-dimensional
arrays. If E is an n - dimensional array of rank
i x j x . . . x k, then E appearing in an expression is converted
to a pointer to an (n - 1) - dimensional array with rank
j x . . . x k. If the * operator, either explicitly or
implicitly as a result of subscripting, is applied to this
pointer, the result is the pointed-to (n - 1) - dimensional
array, which itself is immediately converted into a pointer.
For example, consider
int x[3][5];
Here x is a 3x5 array of integers. When x appears in an
expression, it is converted to a pointer to (the first of three)
5-membered arrays of integers. In the expression ``x [ i ]'',
which is equivalent to ``*(x+i)'', x is first converted to a
pointer as described; then i is converted to the type of x, which
involves multiplying i by the length of the object to which the
pointer points, namely 5 integer objects. The results are added
and indirection applied to yield an array (of 5 integers) which
in turn is converted to a pointer to the first of the integers.
If there is another subscript the same argument applies again;
this time the result is an integer.
It follows from all this that arrays in C are stored row-wise
(last subscript varies fastest) and that the first subscript in
the declaration helps determine the amount of storage consumed by
an array but plays no other part in subscript calculations.
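These identities can be checked mechanically. The sketch below is present-day C; subscript_demo is a hypothetical name, and the function returns 1 only if every identity holds.

```c
int subscript_demo(void)
{
    int a[4] = { 10, 20, 30, 40 };
    int x[3][5];            /* only addresses are used, never values */
    int ok = 1;

    ok = ok && a[2] == *(a + 2);    /* E1[E2] is *((E1) + (E2)) */
    ok = ok && a[2] == 2[a];        /* so subscripting commutes */

    /* Row-wise storage: stepping the first subscript skips 5 ints. */
    ok = ok && (char *)&x[1][0] - (char *)&x[0][0]
                   == (long)(5 * sizeof(int));

    /* The last subscript varies fastest: neighbours are adjacent. */
    ok = ok && &x[0][1] - &x[0][0] == 1;

    return ok;
}
```

Note that the first bound, 3, appears in none of the address calculations.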
14.4 Labels
Labels do not have a type of their own; they are treated as
having type ``array of int''. Label variables should be declared
``pointer to int''; before execution of a goto referring to the
variable, a label (or an expression deriving from a label) should
be assigned to the variable.
Label variables are a bad idea in general; the switch statement
makes them almost always unnecessary.
15. Constant expressions
In several places C requires expressions which evaluate to a
constant: after case, as array bounds, and in initializers. In
the first two cases, the expression can involve only integer and
character constants, possibly connected by the binary operators
+ - * / % & | ^ << >>
< > <= >= == != && || ? :
or by the unary operators
- ~ !
Parentheses can be used for grouping, but not for function
      1
calls.
A bit more latitude is permitted for initializers. Besides
constant expressions as discussed above, one can have double and
string constants, and one can apply the unary & operator to
external scalars. The unary & can also be applied implicitly by
appearance of functions or unsubscripted external arrays. An
undefined identifier appearing in an initializer is implicitly
                                        2
declared to be a function returning int.
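A sketch of constant expressions in the first two contexts, as array bounds and after case, written in present-day C. The identifiers WSIZE, NSLOTS, slots, slot_count, and which are hypothetical.

```c
#define WSIZE  20
#define NSLOTS (3 * WSIZE + 4)          /* 64 */

int slots[NSLOTS / 2];                  /* the bound evaluates to 32 */

int slot_count(void)
{
    return (int)(sizeof slots / sizeof slots[0]);
}

int which(int c)
{
    switch (c) {
    case WSIZE - 1:                     /* 19 */
        return 1;
    case (WSIZE << 1) | 1:              /* 41 */
        return 2;
    case 'a' + 1:                       /* character constant in an expression */
        return 3;
    }
    return 0;
}
```

Every operand is an integer or character constant, so each expression is fully evaluated at compile time.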
16. Examples.
These examples are intended to illustrate some typical C
constructions as well as a serviceable style of writing C
programs.
16.1 Inner product
This function returns the inner product of its array arguments.
double inner (v1, v2, n)
double v1[], v2[];
{       double sum;
        int i;
        sum = 0.0;
        for (i = 0; i < n; i++)
                sum += v1[i] * v2[i];
        return (sum);
}
The following version is somewhat more efficient, but perhaps a
little less clear. It uses the facts that parameter arrays are
really pointers, and that all parameters are passed by value.
_________________________
1
The UNIX C compiler allows sizeof, but not the relational
operators, &&, ||, !, or conditional expressions.
2
The UNIX C compiler also allows initializers which evaluate to
the address of an external or global static variable plus or
minus a constant, such as ``&a[3]'', where a is an external or
global static array.
double inner (v1, v2, n)
double *v1, *v2;
{       double sum;
        sum = 0.0;
        while (n--)
                sum += *v1++ * *v2++;
        return (sum);
}
The declarations for the parameters are really exactly the same
as in the last example. In the first case array declarations
`` [ ] '' were given to emphasize that the parameters would be
referred to as arrays; in the second, pointer declarations were
given because the indirection operator and ++ were used.
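For readers with a present-day compiler, the pointer version can be sketched with a prototype as follows. Two details are changes made for this sketch and are not in the original: the test is written n-- > 0 so that a negative n ends the loop at once, and the const qualifiers are added. The name inner_demo is hypothetical.

```c
double inner(const double *v1, const double *v2, int n)
{
    double sum = 0.0;

    /* v1, v2 and n are copies, so advancing them
       does not disturb the caller's values. */
    while (n-- > 0)
        sum += *v1++ * *v2++;
    return sum;
}

double inner_demo(void)
{
    static const double a[] = { 1.0, 2.0, 3.0 };
    static const double b[] = { 4.0, 5.0, 6.0 };

    return inner(a, b, 3);      /* 4 + 10 + 18 */
}
```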
16.2 Tree and character processing
Here is a complete C program (courtesy of R. Haight) which
reads a document and produces an alphabetized list of words found
therein together with the number of occurrences of each word.
The method keeps a binary tree of words such that the left
descendant tree for each word has all the words lexicographically
smaller than the given word, and the right descendant has all the
larger words. Both the insertion and the printing routine are
recursive.
The program calls the library routines getchar to pick up
characters and cexit to terminate execution. Cprint is called to
print the results according to a format string.
Because all the external definitions for data are given at the
top, no extern declarations are necessary within the functions.
To stay within the rules, a type declaration is given for each
non-integer function when the function is used before it is
defined. However, since all such functions return pointers which
are simply assigned to other pointers, no actual harm would
result from leaving out the declarations; the supposedly int
function values would be assigned without error or complaint.
# define nwords 1500            /* number of different words */
# define wsize 20               /* max chars per word */
# define tnode struct _tnode    /* make tnode look like a type */
struct _tnode                   /* the basic structure */
    {char tword[wsize];
    int count;
    tnode *left, *right;
    };
tnode space[nwords];            /* the words themselves */
int nnodes nwords;              /* number of remaining slots */
tnode *nextp space;             /* next available slot */
tnode *freep;                   /* free list */
/*
* The main routine reads words until end-of-file,
* i.e., '\0' returned from "getchar".
* "tree" is called to sort each word into the tree.
*/
main (argc, argv)
    int argc;
    char *argv[];
    {tnode *top, *tree();
    char c, word[wsize];
    int i;
    i = top = 0;
    while (c = getchar ())
        if (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z'))
            {if (i < wsize - 1)
                word[i++] = c;
            }
        else
            if (i)
                {word[i++] = '\0';
                top = tree (top, word);
                i = 0;
                }
    tprint (top);
    }
/*
* The central routine. If the subtree pointer is null, allocate
* a new node for it. If the new word and the node's word are the
* same, increase the node's count. Otherwise, recursively sort
* the word into the left or right subtree depending on whether
* the argument word is less or greater than the node's word.
*/
tnode *tree (p, word)
    tnode *p;
    char word[];
    {tnode *alloc ();
    int cond;
    /* Is pointer null? */
    if (p == 0)
        {p = alloc ();
        copy (word, p->tword);
        p->count = 1;
        p->right = p->left = 0;
        return (p);
        }
    /* Is word repeated? */
    if ((cond = compar (word, p->tword)) == 0)
        {p->count++;
        return (p);
        }
    /* Sort into left or right */
    if (cond < 0)
        p->left = tree (p->left, word);
    else
        p->right = tree (p->right, word);
    return (p);
    }
/*
* Print the tree by printing the left subtree, the given node,
* and then the right subtree.
*/
tprint (p)
    tnode *p;
    {while (p)
        {tprint (p->left);
        cprint ("%4d: %s\n", p->count, p->tword);
        p = p->right;
        }
    }
/*
* String comparison: return number ( >, =, < ) 0
* according as s1 ( >, =, < ) s2.
*/
compar (s1, s2)
    char *s1, *s2;
    {int c1, c2;
    while ((c1 = *s1++) == (c2 = *s2++))
        if (c1 == '\0') return (0);
    return (c1 - c2);
    }
/*
* String copy: copy s1 into s2 until the null
* character appears.
*/
copy (s1, s2)
    char *s1, *s2;
    {while (*s2++ = *s1++);
    }
/*
* Node allocation: return pointer to a free node.
* Bomb out when all are gone. Just for fun, there
* is a mechanism for using nodes that have been
* freed, even though no one here calls "free."
*/
tnode *alloc ()
    {tnode *t;
    if (freep)
        {t = freep;
        freep = freep->left;
        return (t);
        }
    if (--nnodes < 0)
        {cprint ("Out of space\n");
        cexit ();
        }
    return (nextp++);
    }
/*
* The uncalled routine which puts a node on the free list.
*/
free (p)
    tnode *p;
    {p->left = freep;
    freep = p;
    }
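For readers working with a present-day compiler, the core of the
program above might be sketched as follows. This is an illustration
under stated assumptions, not the manual's code: malloc stands in
for the fixed array and alloc, the standard strcmp and strncpy stand
in for compar and copy, printf stands in for cprint, and the
prototypes and the name WSIZE are choices of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WSIZE 20                /* max chars per word, as in the text */

struct tnode {
    char tword[WSIZE];
    int count;
    struct tnode *left, *right;
};

/* Sort word into the subtree rooted at p and return the (possibly new)
   root. Smaller words go left, larger words go right. */
struct tnode *tree(struct tnode *p, const char *word)
{
    int cond;

    if (p == NULL) {
        p = malloc(sizeof *p);  /* assumption: heap instead of a fixed array */
        if (p == NULL) {
            fprintf(stderr, "Out of space\n");
            exit(EXIT_FAILURE);
        }
        strncpy(p->tword, word, WSIZE - 1);
        p->tword[WSIZE - 1] = '\0';
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(word, p->tword)) == 0) {
        p->count++;             /* word repeated */
    } else if (cond < 0) {
        p->left = tree(p->left, word);
    } else {
        p->right = tree(p->right, word);
    }
    return p;
}

/* Print the left subtree, the node itself, then the right subtree. */
void tprint(const struct tnode *p)
{
    while (p != NULL) {
        tprint(p->left);
        printf("%4d: %s\n", p->count, p->tword);
        p = p->right;
    }
}
```

A caller would start with a null root, pass each word through tree,
and finish with tprint on the root, just as main does above.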
To illustrate a slightly different technique of handling the same
problem, we will repeat fragments of this example with the tree
nodes treated explicitly as members of an array. The fundamental
change is to deal with the subscript of the array member under
discussion, instead of a pointer to it. The struct declaration
becomes
struct _tnode
    {char tword[wsize];
    int count;
    int left, right;
    };
and alloc becomes
alloc ()
    {int t;
    t = --nnodes;
    if (t <= 0)
        {cprint ("Out of space\n");
        cexit ();
        }
    return (t);
    }
The free-list routines have disappeared because, if we deal
exclusively with subscripts, some sort of map has to be kept,
which is too much trouble.
Now the tree routine returns a subscript also, and it becomes:
int tree (p, word)
    char word[];
    {int cond;
    if (p == 0)
        {p = alloc ();
        copy (word, space[p].tword);
        space[p].count = 1;
        space[p].right = space[p].left = 0;
        return (p);
        }
    if ((cond = compar (word, space[p].tword)) == 0)
        {space[p].count++;
        return (p);
        }
    if (cond < 0)
        space[p].left = tree (space[p].left, word);
    else
        space[p].right = tree (space[p].right, word);
    return (p);
    }
The other routines are changed similarly. It must be pointed out
that this version is noticeably less efficient than the first
because of the multiplications which must be done to compute an
offset in space corresponding to the subscripts.
The observation that subscripts (like ``a[i]'') are
less efficient than pointer indirection (like ``*ap'') holds
true independently of whether or not structures are involved.
There are of course many situations where subscripts are
indispensable, and others where the loss in efficiency is worth a
gain in clarity.
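The subscript-based fragments of this section might be assembled in
present-day C roughly as follows. Several details are assumptions
of this sketch: the upper-case constant names, the use of strcmp and
strncpy in place of compar and copy, and the choice to return the
null subscript 0 from alloc when space runs out, where the manual's
version prints a message and exits.

```c
#include <string.h>

#define NWORDS 1500
#define WSIZE  20

struct tnode {
    char tword[WSIZE];
    int count;
    int left, right;            /* subscripts into space; 0 means "no subtree" */
};

static struct tnode space[NWORDS];
static int nnodes = NWORDS;     /* slots are handed out from the top down */

/* Return the subscript of an unused slot, or 0 when none remain.
   Slot 0 is never handed out because 0 doubles as the null subscript. */
static int alloc(void)
{
    if (nnodes <= 1)
        return 0;
    return --nnodes;
}

/* Sort word into the subtree whose root has subscript p and
   return the subscript of the (possibly new) root. */
int tree(int p, const char *word)
{
    int cond;

    if (p == 0) {
        p = alloc();
        if (p == 0)
            return 0;           /* assumption: drop the word instead of exiting */
        strncpy(space[p].tword, word, WSIZE - 1);
        space[p].tword[WSIZE - 1] = '\0';
        space[p].count = 1;
        space[p].left = space[p].right = 0;
    } else if ((cond = strcmp(word, space[p].tword)) == 0) {
        space[p].count++;
    } else if (cond < 0) {
        space[p].left = tree(space[p].left, word);
    } else {
        space[p].right = tree(space[p].right, word);
    }
    return p;
}
```

Every reference to a node goes through space[p], which is where the
extra offset computation mentioned in the text comes from.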
References
1. Johnson, S. C., and Kernighan, B. W. The
programming language B. Computing Science Technical
Report No. 8, Bell Laboratories, Murray Hill, N. J.,
1972.
2. Peterson, T. G., and Lesk, M. E. A user's guide to
the C language on the IBM 370. Internal Memorandum,
Bell Laboratories, 1974.
3. Richards, M. BCPL: a tool for compiler writing and
system programming. Proc. SJCC 1969, 557-566.
4. Ritchie, D. M., and Thompson, K. L. The UNIX
time-sharing system. Comm. ACM 17, 7 (July 1974),
365-375.
5. Ritchie, D. M., Kernighan, B. W., and Lesk, M. E.
The C programming language. Computing Science
Technical Report No. 31, Bell Laboratories, Murray
Hill, N. J., 1975.
6. Snyder, A. A portable compiler for the language C.
Rep. TR-149, Project MAC, M.I.T., Cambridge, Ma., 1975.
APPENDIX
Syntax Summary
1. Expressions.
expression:
primary
* expression
& expression
- expression
! expression
~ expression
++ lvalue
-- lvalue
lvalue ++
lvalue --
sizeof expression
expression binop expression
expression ? expression : expression
lvalue asgnop expression
expression , expression
primary:
identifier
constant
string
( expression )
primary ( expression-list    )
                         opt
primary [ expression ]
lvalue . identifier
primary -> identifier
lvalue:
identifier
primary [ expression ]
lvalue . identifier
primary -> identifier
* expression
( lvalue )
The primary-expression operators
( ) [ ] . ->
have highest priority and group left-to-right. The unary
operators
* & - ! ~ ++ -- sizeof
have priority below the primary operators but higher than any
binary operator, and group right-to-left. Binary operators
and the conditional operator all group left-to-right, and have
priority decreasing as indicated:
binop:
* / %
+ -
>> <<
< > <= >=
== !=
&
| ^
&&
||
? :
Assignment operators all have the same priority, and all group
right-to-left.
asgnop:
=
+= -= *= /= %= >>= <<= &= ^= |=
=+ =- =* =/ =% =>> =<< =& =^ =|
The comma operator has the lowest priority, and groups
left-to-right.
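The grouping rules of this section can be checked with a few small
expressions. The sketch below is written in present-day C, and the
function names are invented for the illustration. It deliberately
avoids nested conditional expressions and mixtures of | with ^,
since later dialects of C may not treat those as described here.
Some compilers issue warnings that suggest parentheses for
expressions of this kind.

```c
/* Each function returns the value that the stated grouping predicts. */

int binary_left_to_right(void)  { return 10 - 4 - 3; }  /* (10 - 4) - 3 */
int multiplicative_first(void)  { return 2 + 3 * 4; }   /* 2 + (3 * 4) */
int additive_before_shift(void) { return 1 << 2 + 1; }  /* 1 << (2 + 1) */
int equality_before_and(void)   { return 6 & 4 == 4; }  /* 6 & (4 == 4) */

int assignment_right_to_left(void)
{
    int a, b;

    a = b = 5;                  /* a = (b = 5) */
    return a + b;
}

int unary_right_to_left(void)
{
    int v[2] = { 7, 9 };
    int *p = v;
    int first = *p++;           /* *(p++): fetch v[0], then advance p */

    return first + *p;
}

int comma_lowest(void)
{
    int x, y;

    x = 1, y = 2;               /* (x = 1), (y = 2) */
    return x + y;
}
```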
2. Declarations.
declaration:
decl-specifiers init-declarator-list ;
type-specifier ;
decl-specifiers:
type-specifier
sc-specifier
type-specifier sc-specifier
sc-specifier type-specifier
sc-specifier:
auto
static
extern
register
type-specifier:
int
char
float
double
long
long int
short
short int
unsigned
unsigned int
long float
struct { type-decl-list }
struct identifier { type-decl-list }
struct identifier
init-declarator-list:
init-declarator
init-declarator , init-declarator-list
init-declarator:
declarator initializer
                      opt
declarator:
identifier
* declarator
declarator ( )
declarator [ constant-expression    ]
                                opt
( declarator )
type-decl-list:
type-declaration
type-declaration type-decl-list
type-declaration:
type-specifier declarator-list ;
declarator-list:
declarator
declarator , declarator-list
initializer:
constant
{ constant-expression-list }
constant-expression-list:
constant-expression
constant-expression , constant-expression-list
constant-expression:
expression
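The declarator forms listed above combine in ways that are easy to
misread, because the ( ) and [ ] suffixes bind more tightly than
the * prefix, and parentheses are needed to reverse that. The
following sketch in present-day C illustrates the point. The names
are invented for the illustration, and the (void) parameter lists
are a later notation that the grammar above does not include.

```c
#include <stddef.h>

int *ap[3];                     /* [ ] binds first: array of 3 pointers to int */
int (*pa)[3];                   /* parentheses: pointer to an array of 3 int */

static int cell = 7;
int *fp(void) { return &cell; } /* ( ) binds first: function returning pointer to int */

static int answer(void) { return 42; }
int (*pf)(void) = answer;       /* parentheses: pointer to a function returning int */

size_t size_of_ap(void)        { return sizeof ap; }
size_t size_of_pa_target(void) { return sizeof *pa; }
int call_through_pf(void)      { return (*pf)(); }
```

The sizes make the difference visible: ap occupies three pointers,
while the object that pa points to occupies three ints.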
3. Statements.
compound-statement:
{ declaration-list    statement-list }
                  opt
statement:
expression ;
compound-statement
if ( expression ) statement
if ( expression ) statement else statement
while ( expression ) statement
for ( expression    ; expression    ; expression    ) statement
                opt             opt             opt
switch ( expression ) statement
case constant-expression : statement
default : statement
break ;
continue ;
return ;
return ( expression ) ;
goto expression ;
identifier : statement
;
statement-list:
statement
statement statement-list
4. External definitions.
program:
external-definition
external-definition program
external-definition:
function-definition
declaration
function-definition:
type-specifier    function-declarator function-body
              opt
function-declarator:
identifier ( parameter-list    )
                           opt
* function-declarator
function-declarator ( )
function-declarator [ constant-expression    ]
                                         opt
( function-declarator )
parameter-list:
identifier
identifier , parameter-list
function-body:
type-decl-list    function-statement
              opt
function-statement:
compound-statement
5. Compiler control lines
# define identifier token-string
# define identifier( parameter-list ) token-string
# include string
# macro identifier ( parameter-list )
opt
# end
# ifdef identifier
# ifndef identifier
# endif
# undefine identifier
# rename identifier string
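Of the control lines listed above, the define, ifdef, ifndef, and
endif forms survive in present-day C and can be sketched as follows.
The macro and function names are invented for the illustration.
The macro, end, undefine, and rename lines are not shown: they are
not accepted in these forms by a present-day compiler, which spells
the removal of a definition as #undef.

```c
#define NWORDS 1500                         /* define identifier token-string */
#define MAX(a, b) ((a) > (b) ? (a) : (b))   /* define identifier( parameter-list ) token-string */

#ifdef NWORDS                   /* taken: NWORDS is defined above */
#define TABLE_SIZE NWORDS
#endif

#ifndef WSIZE                   /* taken: WSIZE has not been defined yet */
#define WSIZE 20
#endif

int table_size(void)     { return TABLE_SIZE; }
int word_size(void)      { return WSIZE; }
int larger(int a, int b) { return MAX(a, b); }
```

Note that in the parameterized form the left parenthesis follows
the identifier with no intervening space, as the syntax above shows.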