1
0
mirror of https://github.com/PDP-10/its.git synced 2026-01-13 15:27:28 +00:00
PDP-10.its/doc/c/ccdoc.text
Lars Brinkhoff ee33e8228a Clean up C documentation.
Rename CC.PAT[C,SYS] to CC PATCH.  Remove E padding from CCDOC.
2018-11-01 21:23:34 +01:00

639 lines
29 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Random C Documentation
Copyright (c) 1976, 1977, 1978 by Alan Snyder
File: ccdoc.r
Last Modified: 4 April 1978
1. Compiler Structure (28 Feb 1977)
The C compiler consists of six logical phases:
L - lexical analysis phase (C1)
P - parsing phase (C2)
C - code generation phase (C3)
M - macro expansion phase (C4)
E - error message editing phase (C5)
S - optional symbol table dumping phase (C8)
In addition, there may be a control program (CC) that invokes the
above phases. Each of the above phases may be run as separate
jobs. However, a pair of configuration options allows the
following phases to be merged into single jobs:
L and P (called LP)
E and CC (called CC)
The source files which make up the various phases are as follows:
L: c1 c92 c95 c96
P: c21 c22 c23 c24 c25 c26-xx c91 c95 c96
C: c31 c32 c33 c34 c35-xx c91 c94 c95 c96
M: c41 c42-xx c43-xx c92 c93 c94 c95 c96
E: c5 c93 c96
S: c8 c93 c95
Files with suffixes -xx are target machine dependent. Files c25,
c35, and c43 are constructed from the output of GT. (A TECO
macro, install.teco, is provided which does this; if you don't
know TECO, simply use the provided c25, c35, and c43 files as a
guide.) File c42 contains the C routine macros.
C Documentation - 2 - 4 April 1978
2. Intermediate Files (28 Feb 1977)
The phases of the compiler communicate using intermediate files.
These files may be of several formats:
TEXT - consists of lines separated by NEW-LINE
CHARACTER BINARY - an array of arbitrary
characters, with no special interpretation
for any characters
INTEGER BINARY - an array of arbitrary integers
Three operations are performed on intermediate files:
READ
WRITE - causes the file to be created
APPEND - prior existence of file assumed
The following routines are used to peform these operations:
TEXT: COPEN, CGETC, CPUTC, CCLOSE
CHARACTER BINARY: COPEN, CGETC, CPUTC, CCLOSE
INTEGER BINARY: COPEN, GETI, PUTI, CCLOSE
The following table describes the files used by the compiler,
their formats, and which operations are performed by which
phases.
source the source file, provided by the user;
TEXT, READ: L
output the output file, whose name is
determined from the source file name;
TEXT, WRITE: M
TOKEN the internal representation of the
program in the form of tokens (not used
if LP merged); INTEGER BINARY, WRITE:
L, READ: P
CSTORE holds the source form of identifiers
and floating-point constants; CHARACTER
BINARY, WRITE: L, READ: M, E, S
STRING holds character string literals;
CHARACTER BINARY, WRITE: L, READ: M
ERROR error messages in internal form;
INTEGER BINARY, WRITE: CC, APPEND: L,
P, C, M, READ: E
NODE internal representation of program as
syntax trees; INTEGER BINARY, WRITE: P,
READ: C
TYPTAB type table; INTEGER BINARY, WRITE: P,
READ: C, S
HMAC header macros; TEXT, WRITE: P, READ: M
MAC macros; TEXT, WRITE: P, APPEND: C,
READ: M
C Documentation - 3 - 4 April 1978
SYMTAB symbol table, written only if a symbol
table listing is requested; INTEGER
BINARY, WRITE: P, READ: S
3. Library Routines Assumed by the C Compiler
The following routines are assumed to exist in the C library when
the C compiler is loaded:
routine description
fd = copen (s, m, f) open file for input or output
c = cgetc (fd) read character from file
cputc (c, fd) output character to file
b = ceof (fd) test for end-of-file
cclose (fd) close file
puti (i, fd) output integer in internal format
i = geti (fd) read integer in internal format
cexit (cc) terminate phase with completion code
p = getvec (n) allocate block of storage
b = cisfd (x) is X a valid file-descriptor (used by CPRINT)
The I/O routines must support read, write, and append mode, and
TEXT, CHARACTER BINARY, and INTEGER BINARY formats. For the
BINARY formats, no formatting transformations may be performed;
the bits read must exactly equal the bits written for all
possible values. The storage allocator may be quite trivial;
note that the storage so allocated is never freed.
The following external variables are referenced:
name description
int cout; standard output unit
C Documentation - 4 - 4 April 1978
4. Tokens
A token consists of two integer values, a TAG and an INDEX. The
token tags are as follows:
0 20 40 60
0 / int entry
1 eof * char register
2 - float sizeof
3 ; + double long
4 ~ = struct short
5 { < auto unsigned
6 ] > static typedef
7 [ ++ extern
8 ) -- return
9 ( == goto
10 : != if
11 , <= else
12 . >= switch
13 ? << break
14 ~ >> continue
15 ! -> while idn
16 & asgnop do intcon
17 | && for floatcon
18 ^ || default string
19 % case lineno
The following token tags are used internal to the lexical phase:
LEXEOF end of compiler control line
TCONTROL start of compiler control line
TMARG <macro argument>
The interpretation of the INDEX is dependent upon the TAG:
TIDN index of identifier name in CSTORE
TINTCON value of integer constant
TFLOATC index of source representation of float
constant in CSTORE
TSTRING index of source representation of
string constant in string file ('\0' is
represented as "$0", '$' as "$$")
TLINENO the line number
TMARG the argument number (first one is 0)
TEQOP an index repesenting which =op it is:
C Documentation - 5 - 4 April 1978
0 =>>
1 =<<
2 =+
3 =-
4 =*
5 =/
6 =%
7 =&
8 =^
9 =|
others the number of the line upon which the
token appeared
5. Token Intermediate File
The representation of a token in the token intermediate file is a
TAG optionally followed by an INDEX. The various kinds of tokens
are described in the table below:
token tag index
eof 1 (none)
operator tag (none)
keyword tag (none)
asgnop 36 index
idn 75 index
intcon 76 value
floatcon 77 index
string 78 index
lineno 79 line number
C Documentation - 6 - 4 April 1978
6. Syntax Tree Node Types
The node types in syntax trees are as follows:
0 20 40 80
0 % =% if
1 eof / =& goto
2 idn *(2) =^ branch
3 int -(2) =| label
4 float + && stmtlist
5 string = || switch
6 call == . case
7 ? != : default
8 ++(pre) < , return
9 ++(post) > sizeof program
10 --(pre) <= exprstmt
11 --(post) >= elist
12 *(1) <<
13 &(1) >>
14 -(1) =>>
15 ~ =<<
16 ! =+
17 &(2) =-
18 | =*
19 ^ =/
7. Syntax Tree Nodes
A syntax tree node consists of the node type followed by some
number of integers and/or pointers to other syntax tree nodes.
The node types can be grouped into classes according to the
locations of pointers in the nodes. The following tables
describe the formats of the nodes in each of the various node
type classes. The following terms are used:
<iln> internal label number
<e> pointer to expression tree
<s> pointer to statement tree
<chain> part of chain of case labels
<elist> pointer to expression list node
Now for the descriptions:
class 0: no pointers
FIELDNAME <index of name in cstore>
IDENTIFIER <type> <class> <offset>
INTEGER <value>
FLOAT <index of source rep in cstore>
STRING <index of value in string file>
BRANCH <iln>
C Documentation - 7 - 4 April 1978
class 1: pointer at 2
RETURN <lineno> <e>
GOTO <lineno> <e>
LABEL <iln> <s>
EXPRSTMT <lineno> <e>
class 2: pointers at 2, 3, 4
IF <lineno> <e> <s1> <s2>
SWITCH <lineno> <e> <s> <chain>
class 3: pointers at 1, 2
STMTLIST <s1> <s2>
EXPRLIST <elist> <e>
<BINARY-OP> <e1> <e2>
FUNCTION-CALL <e> <elist>
class 4: pointer at 1
<UNARY-OP> <e>
CASE <chain> <iln> <integer>
DEFAULT <chain> <iln>
class 5: pointer at 3
PROGRAM <function-idn> <function-type> <s> <stack-size> <nargs>
8. Looping
The parser converts the three looping constructs (DO, WHILE, FOR)
into statement lists with labels and gotos. Labels must be
present for the actual loop, for the continue statement, and for
the break statement. These transformations are described below:
do <stmt> while (<e>); -> {l3:
<stmt>
l2: if (<e>) goto l3;
l1:;~
where continue = goto l2 and break = goto l1.
while (<e>) <stmt> -> {l2:
if (e) {<stmt> goto l2;~
l1:;~
where continue = goto l2 and break = goto l1.
C Documentation - 8 - 4 April 1978
for (<e1>;<e2>;<e3>) <stmt> -> {<e1>;
l3: if (<e2>)
{<stmt>
l2: <e3>;
goto l3;
~
l1:;~
where continue = goto l2 and break = goto l1.
9. Symbol Table Format (28 Feb 1977)
This section describes the format of the symbol table. There are
two separate dictionaries, the global dictionary and the local
dictionary. The global dictionary is used to hold declarations
that are made at top-level (not within a function definition) and
all extern declarations. The local dictionary is used to hold
all non-extern declarations made within a function definition.
The local definition is used only while processing a function;
when the processing of the function terminates, the local
dictionary is emptied. The local dictionary grows and shrinks as
blocks are entered and exited. When a block is exited, the
declarations local to that block are checked for undefined
structure types and unused automatic variables. Entries for
undefined labels are percolated to the next higher level;
undefined labels are checked at the end of the function.
In order to minimize the number of ways that the compiler can run
out of space, both dictionaries are allocated in one array
of dictionary entries called DICT, as follows:
_________________________________
DBEGIN --> | |
| GLOBAL DEFINITIONS |
|_________________________________|
DGDP --> | |
| FREE |
| |
|_________________________________|
DLDP --> | |
| LOCAL DEFINITIONS |
|_________________________________|
DEND -->
C Documentation - 9 - 4 April 1978
10. Types (28 Feb 1977)
All C types are represented by pointers into a type table
(TYPTAB). The format of the entry consists of a TAG, a SIZE, an
ALIGNMENT class, plus some other values whose interpretation
depends on the TAG. There are TAGs for the basic types (e.g.
INT), plus the type modifiers (e.g. POINTER). The extra value
for most type modifiers is another TYPE (i.e. a pointer to
somewhere else in the TYPTAB). Types not involving structures
are uniquely represented, so that type equality can be done by
comparing the pointers. Type equality involving recursive
structures is not worth the effort, and not really needed anyway.
An entry for a structure type contains a pointer to a list of
field definitions, allocated from the bottom of TYPTAB.
Writing the type table onto an intermediate file involves
converting the pointers to offsets.
11. Machine Description Updates (26 Apr 1976)
The following lists changes to the machine description language.
1. There is a new, optional definition statement,
OFFSETRANGE, which specifies allowable offsets (in
addressable units) in indirect references. Offset
ranges may be specified for each pointer class. The
default is to allow only zero offsets. The form of the
statement is
offsetrange px (lo, hi), ... ;
The lo and hi are integers which specify the lowest and
highest allowable offsets for indirect references of
class px. Either or both of lo and hi may be omitted,
meaning that there is no bound in that direction. The
OFFSETRANGE statement should immediately follow the
POINTER statement.
12. AMOP Updates (2 Apr 1977)
The following lists changes to the set of AMOPs.
1. There are two new AMOPs, LSEQ (0010) and COMMA (0170).
Both take two arbitrary expressions and evaluate them
in order. LSEQ returns its first operand, COMMA
returns its second operand.
2. The increment and decrement operators are now optional.
3. New, optional AMOPS have been added for ++ and -- to
floats and doubles.
C Documentation - 10 - 4 April 1978
4. New, optional AMOPS have been added to specify the
passing of function arguments (in particular, to allow
pushing function arguments on a stack). The AMOPS are
.ARGI, .ARGD, .ARG0, .ARG1, .ARG2, .ARG3, for integers,
doubles (if passed directly), and pointers. Each AMOP
takes one operand, the actual argument, and should be
specified to leave its result in the same location. If
these AMOPs are defined, then the code generator does
not allocate any stack space for argument lists. It
simply invokes one of these operators for each argument
(from left to right) and assumes that they do whatever
is necessary. When these AMOPs are used, the ARGP
argument to the CALL macro is meaningless.
13. Keyword Macro Updates (3 Apr 1977)
The following lists changes to the set of keyword macros.
1. DATA: %DA()
The DATA macro indicates that the following macros define impure
data areas which are not explicitly initialized.
2. IMPURE: %IM()
The IMPURE macro indicates that following macros define impure
data areas which are not explicitly initialized (and thus are
implicitly initialized to zero).
3. PDATA: %PD()
The PDATA macro indicates that following macros define pure data
areas, namely string literals.
4. PURE: %PU()
The PURE macro indicates that following macros define pure code.
5. ALIGNn: %ALn()
Instead of a single ALIGN macro, there are now three ALIGN
macros, for alignment classes 1, 2, and 3. Alignment class 0 is
the alignment for characters.
6. ADCONn: %An(0,BASE,OFFSET)
The arguments to ADCON are now a base and offset, which together
make up a REF that describes either an external or a static
variable or function.
C Documentation - 11 - 4 April 1978
7. PROLOG: %P(FUNCNO,FUNCNAME,FNARGS)
The FNARGS argument has been added: it specifies the number of
formal arguments to the function.
8. RETURN: %RT(FUNCNO)
The FUNCNO argument has been added to allow reference to the
corresponding EPILOG code.
9. STATIC: %ST(N)
The size argument has been flushed, as it could not be reliably
computed, anyway.
10. ELSWITCH: %ELS(N,LBASE,LOFFSET,IBASE,IOFFSET)
This new macro terminates the lists following LSWITCH macros.
11. ETSWITCH: %ETS(LO,LBASE,LOFFSET,IBASE,IOFFSET,HI)
This new macro terminates the lists following TSWITCH macros.
14. Pointer Comparison (15 Dec 1974)
The following comparison operations are allowed on pointers:
1. A pointer may be compared with any pointer,
using any of the comparison operations.
2. A pointer may be tested for equality or
inequality with 0.
In the first case, we have the operation p1:p2 -> l, where p1 and
p2 are the pointers and l is the destination label. The
algorithm is as follows:
if (ctype (p1) < ctype (p2)) exchange (p1, p2);
if (ctype (p1) > ctype (p2))
convert p1 to ctype (p2);
apply operator of ctype (p2);
In the second case, we have the operation p:0 -> l, where p is
the pointer and l is the destination label. The default method
of handling such an operation is to convert the pointer to an
integer and apply the integer comparison operation. However, as
an optimization, we allow the definition of special null-pointer
testing operations, ==0px and !=0px, for any pointer class x.
Note that the pointer-to-integer conversions are not supposed to
really change anything; their main purpose is to accomplish
moving the pointer value from a pointer register to an integer
register (where necessary). In the case of comparing a pointer
to zero, this move may not be necessary as machine instructions
C Documentation - 12 - 4 April 1978
to do the test on the pointer register may exist; that is what
the optional pointer comparison operators are for.
15. Changes to GT (28 Dec 1974)
Two new tables are output by GT for use by the code generation
phase. These tables are OPREG[] and OPMEM[]. The tables contain
one word for each AMOP; the bits of the word indicate registers
in OPREG and memory reference classes in OPMEM. The bits are set
to indicate which locations are possible result locations for the
AMOPs.
The following entries are pre-defined (G indicates set by GT, R
indicates set by the code generator as part of run-time
initialization):
AMOP value set
*u any indirect (of appropriate type) R
!, &&, || all integer registers G
comparison all integer registers G
=, ? any register (of appropriate type) R
call retreg of appropriate type R
e_int intlit G
e_float floatlit G
e_string stringlit G
e_idn appropriate idn class R
16. Scoring Algorithm (28 Dec 1974)
This section describes the scoring algorithm used in the
selection of OPLOCs. It is based on a cost of 4 units for a
register-register move and 5 units for a register-memory move.
An impossible situation is scored at -1000. The total score is
the sum of all applicable scores derived by the following
algorithm:
For each clobbered register which is busy, -10 (2 RM).
Considering the possible result locations of the top-level
operator:
If the desired result location is a label but the top-level
operator never has a label result, -1000. Otherwise, if
the desired result location is register, then:
If some desired register is possible and at least one
possible desired register is free or all of the
desired registers are busy then 0 (there will be no
penalty for picking this OPLOC, on the basis of the
result location, at least). Otherwise, if a busy
desired register is possible and not all desired
registers are busy, then -10 (2 RM). Otherwise, if a
C Documentation - 13 - 4 April 1978
non-desired free register is possible, -4 (1 RR).
Otherwise, if a non-desired busy register is
possible, -14 (1 RR + 2 RM). Otherwise, if memory is
a possible result location, -5 (1 RM).
Otherwise, if the desired result location is memory, then:
If all of the possible result locations are desired,
then 0. Otherwise, if a temporary is desirable and a
free register is a possible result location, then -5
(1 RM). Otherwise, if a temporary is desirable and a
busy register is a possible result location, then -15
(3 RM). Otherwise, -1000.
Considering the possible locations of each of the operands of the
top-level operator:
If the operand is desired in a register and the operand
location is identically the result location of the
top-level operator, then if no desired registers are free,
then -10 (2 RM). If the operand is desired in a register
and the operand's top-level operator may return its result
in a register, then if there is no register which meets
both requirements, -4 (1 RR). If the operand is desired in
a register and the operand's top-level operator never
returns its result in a register, then -5 (1 RM).
If the operand is desired in memory and all non-indirect
memory reference classes are acceptable, then if the
operand's top-level operator never returns its result in
memory, -5 (1 RM). If the operand is desired in memory and
there are one or more unacceptable non-indirect memory
reference classes, then if all possible memory result
locations of the operand's top-level operator are not
acceptable or if the operand's top-level operator may
return its result in a register and a temporary location is
not acceptable, -1000.
17. Changes to GT (31 Dec 1974)
The distinction in the interpretation of the # indicators,
depending upon whether or not they appear in a macro call or not,
no longer exists. The #X indicators now always refer to a call
on the NAME macro. New #'X indicators are recognized which
always refer to the macro argument sequences.