The CMAC Machine

Copyright (c) 1976, 1977, 1978 by Alan Snyder
M.I.T. Laboratory for Computer Science
Cambridge, Ma. 02139

CMAC is a set of macros which form an assembly language for a very
simple machine. It has been designed to be used to bootstrap the C
compiler to new host machines. The code produced for the CMAC machine
is not efficient. However, efficiency is not the relevant
consideration. The primary goal is to minimize the effort needed to
implement the macros.

The CMAC machine has two data types, integers and pointers.
Characters are mapped into integers; character strings are sequences
of integers. The C compiler does not use floating-point; thus,
floating-point is not included in the CMAC machine. Both integers and
pointers are stored in single machine words. On word-addressed
machines, pointers will be the same as integers. On byte-addressed
machines, they will most likely be byte addresses, and thus always
multiples of the number of bytes per word (two, four, and so on).
Pointers are distinguished from integers in the CMAC machine because,
on host machines which are not word-addressed, operations on pointers
(such as increment and decrement) will be different from the
corresponding integer operations.

The CMAC machine has two registers, called A and B. Each register is
capable of holding either an integer or a pointer. All operations are
performed on values in registers. Explicit CMAC instructions are used
to get values into and out of the registers. This division of labor
allows the problems of the different storage classes of memory
references to be isolated in the load and store macro definitions,
thus simplifying the definitions of the operational macros.

CMAC programs are designed to be translated into the machine language
of the host machine using the standard macro assembler of the host
machine, augmented by a set of macro definitions for the CMAC macros.
In order to run the C compiler, the compiler programs should be so
translated and then linked together using the standard linker of the
host machine. In addition, a few support routines (for I/O) must be
hand-coded (these routines are described in a separate document). It
is conceivable that the CMAC programs might first have to be edited in
order to conform to the format expected by the host machine's macro
assembler.

4 April 1978                     - 2 -                The CMAC Machine

Alternatively, the CMAC programs can be translated (again, by some
macro processor) into a compact representation which can then be
interpreted by an interpreter running on the host machine. This
requires both the writing of an interpreter and the writing of the
macro definitions which construct the interpreted representation.
However, the use of this technique may be necessary on small machines
where the direct translation from CMAC macros to machine code results
in excessively large programs.

The form of a CMAC program is that of an ASCII text file consisting of
text lines terminated by the newline (LF) character. Each line
contains one macro call, consisting of a TAB, followed by the macro
name, optionally followed by a TAB and one or more character string
arguments separated by commas. For example,

        FOO     A,B

is a call of a macro named "FOO" with arguments "A" and "B". Macro
names and arguments consist of upper-case letters and digits. Numeric
arguments may be prefixed by a minus sign. In C identifiers, the
underscore character will be represented by 'J'; all external
identifiers in the C compiler are unique in their first five
characters. All numeric arguments are in decimal.

1. Move Macros

The move family of macros is used to move values between memory and
the two registers. There are three basic classes of move macros: load,
store, and load-address. Within each class, there are macros
corresponding to the relevant C storage classes.
The first argument of each of these macros specifies a register
(either A or B). The second argument is dependent on the particular C
storage class; it gives the information needed to specify an exact
storage location. The form of this information is described in
Table I.

    Table I. Forms of Storage Class-Particular Information

    class       information form
    -----       ----------------
    auto        word offset of variable in current stack frame
    extern      the actual C identifier
    static      a static-variable unique number
    literal     an integer value
    parm        parameter number (starting with 0)
    indirect    the register containing the pointer
    register    the register containing the value
    string      a string-literal unique number

1.1 Load Macros

Each macro in the load class has two arguments, a register (either A
or B) and a storage class-dependent argument. The function of a load
macro is to move the value in the location specified by the second
argument into the register specified by the first argument. The load
macros are as follows:

    LAUTO   R,OFFSET    load register from auto variable
    LEXTRN  R,NAME      load register from external variable
    LSTAT   R,#         load register from static variable
    LLIT    R,#         load register with literal value
    LPARM   R,#         load register from parameter
    LVPTR   R,R         load register via pointer in register
    LREG    R,R         move from one register to the other

1.2 Store Macros

Each macro in the store class has two arguments, a register (either A
or B) and a storage class-dependent argument. The function of a store
macro is to move the value in the register specified by the first
argument to the location specified by the second argument.
The store macros are as follows:

    STAUTO  R,OFFSET    store register into auto variable
    STEXTN  R,NAME      store register into external variable
    STSTAT  R,#         store register into static variable
    STPARM  R,#         store register into parameter
    STVPTR  R,R         store register via pointer in register

1.3 Load-address Macros

Each macro in the load-address class has two arguments, a register
(either A or B) and a storage class-dependent argument. The function
of a load-address macro is to construct a pointer to the location
specified by the second argument, and put that pointer value in the
designated register. The load-address macros are as follows:

    LAAUTO  R,OFFSET    load address of auto variable
    LAEXTN  R,NAME      load address of external variable
    LASTAT  R,#         load address of static variable
    LAPARM  R,#         load address of parameter
    LASTRG  R,#         load address of string literal

2. Operate Macros

The operate family of macros performs operations upon values in the
two registers. There are two classes of operate macros: unary and
binary.

2.1 Unary Operators

The unary operators each take one argument, which specifies a
register. This register is used for both the source and destination of
the operation. The unary operate macros are as follows:

    CMINUS  R           arithmetic minus
    CNOT    R           bitwise negation

Since there are only two possible argument values, it is possible to
implement these operations as subroutine calls to one of a pair of
subroutines, one for each register.

2.2 Binary Operators

The binary operators all take their first (or "left") operand from the
A register and their second (or "right") operand from the B register,
and place their result in the A register, leaving the B register
unchanged. The semantics of the operators are the same as the
corresponding C operators. Since the operand locations are fixed, no
arguments are provided to these macros.
The binary operate macros are as follows:

    CADD                integer addition
    CSUB                integer subtraction
    CMUL                integer multiplication
    CDIV                integer division
    CMOD                integer remainder
    CLS                 bitwise left shift
    CRS                 bitwise right shift
    CAND                bitwise AND
    COR                 bitwise OR
    CXOR                bitwise XOR
    PINC                pointer increment
    PDEC                pointer decrement
    PSUB                pointer subtraction

Note that the pointer operations are not necessarily the same as the
integer operations. Pointer increment and decrement take a pointer as
their first operand and an integer as their second operand and produce
a pointer result. The integer represents an offset in words; it may
have to be scaled before it is actually added to or subtracted from
the pointer. Pointer subtraction takes two pointer operands and
produces an integer result. The integer should represent the number of
words that the first pointer is offset from the second; it may be
necessary to scale the result of the subtraction in order to get the
integer in the proper units.

Since the binary operator macros take no arguments, they are easily
implemented as subroutine calls.

3. Conditional-jump Macros

The conditional-jump family of macros all take a single argument, an
internal-label unique number. This number designates a label to which
control should jump depending upon the result of a test specified by
the conditional-jump macro. This test is based on the values in the
registers. There are two types of conditional-jump operators, unary
and binary. The unary operators test the value in the A register. The
binary operators perform a comparison between the values in the A
register and the B register.

The unary conditional-jump macros are as follows:

    JNULL   #           jump if null pointer
    JNNULL  #           jump if non-null pointer

The binary conditional-jump macros are as follows:

    JEQ     #           jump if A == B
    JNE     #           jump if A != B
    JLT     #           jump if A < B
    JGT     #           jump if A > B
    JLE     #           jump if A <= B
    JGE     #           jump if A >= B

4. Keyword Macros

The remaining family of macros is the set of keyword macros. These
correspond closely to the keyword macros used in the abstract machine
of the C compiler. The CMAC keyword macros are described below. Each
description is headed by the name of a macro and its argument names;
following is a description of the arguments and the intended function
of the macro call.

4.1 Program Definition Macros

HEAD

    The HEAD macro marks the beginning of a CMAC program. It may
    produce any needed header statements.

CEND

    The CEND macro marks the end of a CMAC program. It may produce an
    END statement, if needed.

CENTRY NAME

    NAME is a C identifier. The expansion of the CENTRY macro should
    declare the specified variable to be an entry point, that is, one
    which is defined in the current program but accessible to other
    programs.

CEXTRN NAME

    The CEXTRN macro is similar to the CENTRY macro except that it
    defines the variable to be an external reference, that is, one
    which is used in the current program but assumed to be defined in
    another program.

PURE

    The PURE macro indicates that the following program text
    represents pure code or data. This macro may be used, along with
    the IMPURE macro, to segregate pure and impure storage. The use
    of this macro is optional.

IMPURE

    The IMPURE macro indicates that the following program text
    represents impure storage.

4.2 Symbol Defining Macros

CEQU NAME

    NAME is a C identifier; it is to be defined as having a value
    equal to the current value of the location counter.

LABDEF N

    The LABDEF macro defines the location of internal label number N
    to be the current value of the location counter.

STATIC N

    The STATIC macro defines the location of the static variable whose
    internal static variable number is N to be the current value of
    the location counter. Typically, this macro will define an
    assembly language symbol by which the static variable can be
    referenced.

STRDEF N

    The STRDEF macro defines the address of the string constant whose
    internal number is N to be the current value of the location
    counter. It is immediately followed by one or more INTCON macros;
    the last one will define a zero word.

LINNUM N

    The LINNUM macro associates the line in the source program whose
    line number is specified by the integer N with the current value
    of the location counter. It need not produce any code; it is
    provided merely to aid in the reading of CMAC programs.

4.3 Storage Defining Macros

ADCON NAME

    NAME is a C identifier. The ADCON macro should define a word of
    storage initialized with a pointer to the specified external
    variable. This macro is used in the initialization of static and
    external pointers and arrays of pointers.

SADCON N

    N is an integer. The SADCON macro should define a word of storage
    initialized with a pointer to the static variable numbered N. This
    macro is used in the initialization of static and external
    pointers and arrays of pointers.

INTCON I

    The INTCON macro should define a word of storage whose initial
    value is that specified by the integer I. It is used in the
    initialization of static and external variables and arrays, in the
    definition of string constants, and in the construction of tables
    for the LSWITCH macro.

LABCON N

    The LABCON macro should define a word of storage whose initial
    value is the address corresponding to internal label number N. The
    LABCON macro is used to construct the tables for the LSWITCH and
    TSWITCH macros.

STRCON N

    The STRCON macro should define a word of storage whose initial
    value is a pointer to the string constant whose internal string
    number is N. The STRCON macro is used in the initialization of
    static and external variables.

CZERO N

    The CZERO macro specifies the definition of a block of storage
    initialized to zero; the size in words of this storage area is
    specified by the integer N.
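
As an illustration of how STRDEF and INTCON fit together, the
following Python sketch generates the CMAC lines for a string
constant: a STRDEF call, then INTCON calls for the characters, then a
final INTCON defining the zero word. It is only a sketch under stated
assumptions. The function name emit_string_constant is hypothetical,
and emitting one INTCON per character holding its ASCII code is an
assumption; the text says only that character strings are sequences
of integers and that one or more INTCON macros follow the STRDEF.

```python
def emit_string_constant(n, text):
    """Return the CMAC text lines defining string constant number n.

    Hypothetical helper, not part of CMAC itself. Each line has the
    form TAB, macro name, TAB, argument, as in the CMAC program format.
    Assumption: one INTCON per character, holding its ASCII code.
    """
    lines = ["\tSTRDEF\t%d" % n]
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError("CMAC programs are ASCII text: %r" % ch)
        # Numeric arguments are written in decimal.
        lines.append("\tINTCON\t%d" % code)
    # The last INTCON defines the terminating zero word.
    lines.append("\tINTCON\t0")
    return lines


if __name__ == "__main__":
    print("\n".join(emit_string_constant(3, "hi")))
```

Under these assumptions, emit_string_constant(3, "hi") yields a
STRDEF 3 line followed by INTCON 104, INTCON 105 and INTCON 0.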

4.4 Control Macros

PROLOG FUNCNO,FUNCNAME

    The PROLOG macro produces the prolog code for a C function.
    FUNCNAME is the name of the C function. FUNCNO is an integer which
    specifies the internal function number of the function; it may be
    used in conjunction with the EPILOG macro to access the size of
    the function's stack frame. The PROLOG macro should define the
    entry point name and produce the code necessary to save the
    environment of the calling function and to set up the environment
    of the called function using the information provided in the
    function call. These actions may be performed by a subroutine. The
    first eight words of every stack frame are reserved for use by the
    PROLOG macro; that is, the first automatic variable in a function
    is given an offset of eight words. The PROLOG macro call appears
    in a CMAC program immediately before the first instruction of the
    corresponding function.

EPILOG FUNCNO,FRAMESIZE

    The EPILOG macro produces the epilog code for a C function. The
    epilog code should restore the environment of the calling function
    and return to that function. These actions may be performed by a
    subroutine. FUNCNO and FRAMESIZE are integers which specify the
    internal function number of the function and the size in words of
    its stack frame, respectively. These integers can be used to
    define an assembly-language symbol whose value is the size of the
    stack frame; this symbol can then be used by the code produced by
    the PROLOG macro which allocates the stack frame.

CCALL NARGS,ARGP,NAME

    The CCALL macro generates a function call. NARGS is an integer
    specifying the number of arguments to the function call; ARGP is
    an integer specifying the word offset in the caller's stack frame
    of the arguments which have been so placed by previous
    instructions. NAME is the name of the function being called.

CALREG NARGS,ARGP,REG

    The CALREG macro is like the CCALL macro except that the function
    being called has been computed dynamically.
    The address of this function is located in the register specified
    by REG.

CRETRN

    The CRETRN macro produces the statements needed to return from a
    function to the calling function, i.e., transfer to the EPILOG
    code. The returned value of the function will have been placed in
    the A register by previous CMAC instructions.

CGOTO N

    The CGOTO macro produces an unconditional jump to the location
    defined by internal label number N.

LSWITCH N,DEFLT

    The LSWITCH macro should generate code which jumps according to
    the value of the integer in register A. This macro is immediately
    followed by N (N>0) INTCON macros (the cases), which are
    immediately followed by N LABCON macros (the corresponding
    labels), followed by an ELSWIT macro. A search should be made
    through the case list; if a match is found, a jump should be made
    to the label defined by the corresponding LABCON macro. If the
    integer matches none of the list entries, then a jump should be
    made to the internal label whose internal label number is given by
    the integer DEFLT.

ELSWIT N,DEFLT

    This macro completes an LSWITCH.

TSWITCH LO,HI,DEFLT

    The TSWITCH macro produces an indexed jump based on the value of
    the integer in register A. This macro is immediately followed by a
    sequence of HI-LO+1 LABCON macros defining the target labels
    corresponding to integer values from LO to HI. Values outside this
    range should result in transfers to the internal label whose
    internal label number is given by the integer DEFLT.

ETSWIT LO,HI,DEFLT

    This macro completes a TSWITCH.
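
The search and indexing rules described for LSWITCH and TSWITCH can
be modelled in a few lines of Python. The sketch below is illustrative
only: the function names and the representation of a switch as Python
lists are assumptions made for the example, not part of the CMAC
definition, and a real implementation would consist of macro
definitions or a subroutine on the host machine. Each function returns
the internal label number to which control would transfer.

```python
def lswitch_target(a, cases, labels, deflt):
    """Model of LSWITCH: search the case list for the value in A.

    cases  -- the N integers given by the INTCON macros
    labels -- the N label numbers given by the LABCON macros
    deflt  -- the DEFLT label number, used when no case matches
    (Hypothetical representation chosen for this sketch.)
    """
    if len(cases) == 0 or len(cases) != len(labels):
        raise ValueError("LSWITCH needs N > 0 cases and N labels")
    for case, label in zip(cases, labels):
        if a == case:
            return label
    return deflt


def tswitch_target(a, lo, hi, labels, deflt):
    """Model of TSWITCH: index a table of HI-LO+1 label numbers.

    labels[0] corresponds to the value LO and labels[-1] to HI;
    values outside LO..HI transfer to the DEFLT label.
    """
    if len(labels) != hi - lo + 1:
        raise ValueError("TSWITCH needs HI-LO+1 labels")
    if lo <= a <= hi:
        return labels[a - lo]
    return deflt


if __name__ == "__main__":
    print(lswitch_target(7, [1, 7, 9], [10, 11, 12], 99))
    print(tswitch_target(4, 3, 5, [20, 21, 22], 99))
```

The linear search suits sparse case values, while the table form
trades one word per value in the range LO to HI for a direct indexed
jump.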