The CMAC Machine


                           Copyright (c) 1976, 1977, 1978
                                         by
                                     Alan Snyder
                       M.I.T. Laboratory for Computer Science
                                Cambridge, Ma. 02139


          CMAC  is  a  set  of macros which form an assembly language for a
          very simple machine.    It  has  been  designed  to  be  used  to
          bootstrap the C compiler to new host machines.  The code produced
          for  the  CMAC  machine is not efficient.  However, efficiency is
          not the relevant consideration.  The primary goal is to  minimize
          the effort needed to implement the macros.

          The  CMAC  machine  has  two  data  types, integers and pointers.
          Characters  are  mapped  into  integers;  character  strings  are
          sequences   of   integers.      The   C  compiler  does  not  use
          floating-point; thus, floating-point is not included in the  CMAC
          machine.  Both integers and pointers are stored in single machine
          words.

          On   word-addressed  machines,  pointers  will  be  the  same  as
          integers.  On byte-addressed machines, they will most  likely  be
          byte  addresses,  and  thus  always  multiples  of  two, four, or
          whatever.  Pointers are distinguished from integers in  the  CMAC
          machine  because,  on host machines which are not word-addressed,
          operations on pointers (such as increment and decrement) will  be
          different than the corresponding integer operations.

          The  CMAC  machine  has  two  registers,  called  A  and B.  Each
          register is capable of holding either an integer  or  a  pointer.
          All  operations  are  performed on values in registers.  Explicit
          CMAC instructions are used to get values  into  and  out  of  the
          registers.    This  division  of labor allows the problems of the
          different storage classes of memory references to be isolated  in
          the  load  and  store  macro  definitions,  thus  simplifying the
          definitions of the operational macros.

          CMAC programs are designed to  be  translated  into  the  machine
          language  of  the host machine using the standard macro assembler
          of the host machine, augmented by a set of macro definitions  for
          the  CMAC  macros.   In order to run the C compiler, the compiler
          programs should be so translated and then linked  together  using
          the  standard  linker  of  the  host machine.  In addition, a few
          support routines (for I/O) must be hand-coded (these routines are
          described in a separate document).  It is  conceivable  that  the
          4 April 1978                  - 2 -              The CMAC Machine


          CMAC  programs  might have to first be edited in order to conform
          to the format expected by the host machine's macro assembler.

          Alternatively, the CMAC programs can  be  translated  (again,  by
          some  macro  processor)  into  a compact representation which can
          then be  interpreted  by  an  interpreter  running  on  the  host
          machine.   This  requires  both the writing of an interpreter and
          the  writing  of  the  macro  definitions  which  construct   the
          interpreted  representation.   However, the use of this technique
          may be necessary on small machines where the  direct  translation
          from  CMAC  macros  to  machine code results in excessively large
          programs.

          The form of a  CMAC  program  is  that  of  an  ASCII  text  file
          consisting   of   text  lines  terminated  by  the  newline  (LF)
          character.  Each line contains one macro call,  consisting  of  a
          TAB, followed by the macro name, optionally followed by a TAB and
          one  or more character string arguments separated by commas.  For
          example,

                  FOO     A,B

          is a call of a macro named "FOO"  with  arguments  "A"  and  "B".
          Macro  names  and  arguments  consist  of  upper-case letters and
          digits.  Numeric arguments may be prefixed by a minus sign.  In C
          identifiers, the underscore character will be represented by 'J';
          all external identifiers in the C compiler are  unique  in  their
          first five characters.  All numeric arguments are in decimal.

          1.  Move Macros

          The  move family of macros are used to move values between memory
          and the two registers.  There are three  basic  classes  of  move
          macros:  load, store, and load-address.  Within each class, there
          are macros corresponding to the relevant C storage classes.   The
          first  argument  of  each  of  these  macros specifies a register
          (either A or B).    The  second  argument  is  dependent  on  the
          particular  C  storage  class; it gives the information needed to
          specify an exact storage location.  The form of this  information
          is described in Table I.

          1.1  Load Macros

          The  load  class  of  macros  each  has two arguments, a register
          (either A or B) and a  storage  class-dependent  argument.    The
          function  of  a  load  macro is to move the value in the location
          specified by the second argument into the register  specified  by
          the first argument.  The load macros are as follows:
          The CMAC Machine              - 3 -                  4 April 1978


          Table I.  Forms of Storage Class-Particular Information

               class     information form

               auto      word offset of variable in current stack frame
               extern    the actual C identifier
               static    a static-variable unique number
               literal   an integer value
               parm      parameter number (starting with 0)
               indirect  the register containing the pointer
               register  the register containing the value
               string    a string-literal unique number

          _________________________________________________________________

               LAUTO   R,OFFSET         load register from auto variable
               LEXTRN  R,NAME           load register from external variable
               LSTAT   R,#              load register from static variable
               LLIT    R,#              load register with literal value
               LPARM   R,#              load register from parameter
               LVPTR   R,R              load register via pointer in register
               LREG    R,R              move from one register to the other

          1.2  Store Macros

          The  store  class  of  macros  each has two arguments, a register
          (either A or B) and a  storage  class-dependent  argument.    The
          function  of  a  store macro is to move the value in the register
          specified by the first argument to the location specified by  the
          second argument.  The store macros are as follows:

               STAUTO  R,OFFSET         store register into auto variable
               STEXTN  R,NAME           store register into external variable
               STSTAT  R,#              store register into static variable
               STPARM  R,#              store register into parameter
               STVPTR  R,R              store register via pointer in register

          1.3  Load-address Macros

          The  load-address  class  of  macros  each  has  two arguments, a
          register (either A or B) and a storage class-dependent  argument.
          The function of a load-address macro is to construct a pointer to
          the  location  specified  by  the  second  argument, and put that
          pointer value in  the  designated  register.    The  load-address
          macros are as follows:

               LAAUTO  R,OFFSET         load address of auto variable
               LAEXTN  R,NAME           load address of external variable
               LASTAT  R,#              load address of static variable
               LAPARM  R,#              load address of parameter
               LASTRG  R,#              load address of string literal
          4 April 1978                  - 4 -              The CMAC Machine


          2.  Operate Macros

          The  operate  family  of macros perform operations upon values in
          the two registers.  There are  two  classes  of  operate  macros:
          unary and binary.

          2.1  Unary Operators

          The  unary  operators  each  take one argument, which specifies a
          register.   This  register  is  used  for  both  the  source  and
          destination  of  the  operation.  The unary operate macros are as
          follows:

               CMINUS  R                arithmetic minus
               CNOT    R                bitwise negation

          Since there are only two possible argument values, it is possible
          to implement these operations as subroutine calls  to  one  of  a
          pair of subroutines, one for each register.

          2.2  Binary Operators

          The  binary  operators  all  take their first (or "left") operand
          from the A register and their second (or  "right")  operand  from
          the B register, and place their result in the A register, leaving
          the B register unchanged.  The semantics of the operators are the
          same  as  the  corresponding  C  operators.    Since  the operand
          locations are fixed, no arguments are provided to  these  macros.
          The binary operate macros are as follows:

               CADD                     integer addition
               CSUB                     integer subtraction
               CMUL                     integer multiplication
               CDIV                     integer division
               CMOD                     integer remainder
               CLS                      bitwise left shift
               CRS                      bitwise right shift
               CAND                     bitwise AND
               COR                      bitwise OR
               CXOR                     bitwise XOR
               PINC                     pointer increment
               PDEC                     pointer decrement
               PSUB                     pointer subtraction

          Note  that the pointer operations are not necessarily the same as
          the integer operations.  Pointer increment and decrement  take  a
          pointer  as  their  first  operand and an integer as their second
          operand and produce a pointer result.  The integer represents  an
          offset  in  words; it may have to be scaled before it is actually
          added or subtracted to the pointer.   Pointer  subtraction  takes
          two pointer operands and produces an integer result.  The integer
          should  represent  the  number of words that the first pointer is
          offset from the second; it may be necessary to scale  the  result
          The CMAC Machine              - 5 -                  4 April 1978


          of  the  subtraction  in  order  to get the integer in the proper
          units.

          Since the binary operator macros  take  no  arguments,  they  are
          easily implemented as subroutine calls.

          3.  Conditional-jump Macros

          The conditional-jump family of macros all take a single argument,
          a  internal-label  unique number.  This number designates a label
          to which control should jump depending upon the result of a  test
          specified  by  the conditional-jump macro.  This test is based on
          the  values  in  the  registers.    There  are   two   types   of
          conditional-jump   operators,   unary   and  binary.   The  unary
          operators test the value in the A register.  The binary operators
          perform a comparision between the values in the  A  register  and
          the  B  register.    The  unary  conditional-jump  macros  are as
          follows:

               JNULL   #                jump if null pointer
               JNNULL  #                jump if non-null pointer

          The binary conditional-jump macros are as follows:

               JEQ     #                jump if A == B
               JNE     #                jump if A != B
               JLT     #                jump if A < B
               JGT     #                jump if A > B
               JLE     #                jump if A <= B
               JGE     #                jump if A >= B

          4.  Keyword Macros

          The remaining family of macros is  the  set  of  keyword  macros.
          These  correspond  closely  to  the  keyword  macros  used in the
          abstract machine of the C compiler.  The CMAC keyword macros  are
          described  below.   Each  description  is headed by the name of a
          macro and its argument names; following is a description  of  the
          arguments and the intended function of the macro call.

          4.1  Program Definition Macros

               HEAD

          The  HEAD  macro  marks  the beginning of a CMAC program.  It may
          produce any needed header statements.

               CEND

          The CEND macro marks the end of a CMAC program.  It  may  produce
          an END statement, if needed.
          4 April 1978                  - 6 -              The CMAC Machine


               CENTRY  NAME

          NAME is a C identifier.  The expansion of the CENTRY macro should
          declare  the specifed variable to be an entry point, that is, one
          which is defined in the current program but accessible  to  other
          programs.

               CEXTRN  NAME

          The  CEXTRN  macro  is similar to the CENTRY macro except that it
          defines the variable to be an external reference,  that  is,  one
          which is used in the current program but assumed to be defined in
          another program.

               PURE

          The   PURE  macro  indicates  that  the  following  program  text
          represents PURE code or data.  This macro may be used, along with
          the IMPURE macro, to segregate PURE and IMPURE storage.  The  use
          of this macro is optional.

               IMPURE

          The  IMPURE  macro  indicates  that  the  following  program text
          represents impure storage.

          4.2  Symbol Defining Macros

               CEQU    NAME

          NAME is a C identifier; it is to be defined  as  having  a  value
          equal to the current value of the location counter.

               LABDEF  N

          The  LABDEF macro defines the location of internal label number N
          to be the current value of the location counter.

               STATIC  N

          The STATIC macro defines the  location  of  the  static  variable
          whose  internal  static  variable  number  is N to be the current
          value of the location counter.  Typically, this macro will define
          an assembly language symbol by which the static variable  can  be
          referenced.

               STRDEF  N

          The STRDEF macro defines the address of the string constant whose
          internal  number  is  N  to  be the current value of the location
          counter.  It is  immediately  followed  by  one  or  more  INTCON
          macros; the last one will define a zero word.
          The CMAC Machine              - 7 -                  4 April 1978


               LINNUM  N

          The  LINNUM macro associates the line in the source program whose
          line number is specified by the integer N with the current  value
          of  the  location  counter.   It need not produce any code; it is
          provided merely to aid in the reading of CMAC programs.

          4.3  Storage Defining Macros

               ADCON   NAME

          NAME is C identifier.  The ADCON macro should define  a  word  of
          storage  initialized  with  a  pointer  to the specified external
          variable.  This macro is used in the initialization of static and
          external pointers and arrays of pointers.

               SADCON  N

          N is an integer.  The  SADCON  macro  should  define  a  word  of
          storage  initialized  with  a  pointer  to  the  static  variable
          numbered N.  This macro is used in the initialization  of  static
          and external pointers and arrays of pointers.

               INTCON  I

          The  INTCON  macro  should define a word of storage whose initial
          value is that specified by the integer I.   It  is  used  in  the
          initialization  of  static  and external variables and arrays, in
          the definition of string constants, and in  the  construction  of
          tables for the LSWITCH macro.

               LABCON  N

          The  LABCON  macro  should define a word of storage whose initial
          value is the address corresponding to internal  label  number  N.
          The  LABCON macro is used to construct the tables for the LSWITCH
          and TSWITCH macros.

               STRCON  N

          The STRCON macro should define a word of  storage  whose  initial
          value  is  a pointer to the string constant whose internal string
          number is N.  The STRCON macro is used in the  initialization  of
          static and external variables.

               CZERO   N

          The  CZERO  macro  specifies the definition of a block of storage
          initialized to zero; the size in words of this  storage  area  is
          specified by the integer N.
          4 April 1978                  - 8 -              The CMAC Machine


          4.4  Control Macros

               PROLOG  FUNCNO,FUNCNAME

          The  PROLOG  macro  produces  the  prolog  code for a C function.
          FUNCNAME is the name of the C function.   FUNCNO  is  an  integer
          which  specifies the internal function number of the function; it
          may be used in conjunction with the EPILOG macro  to  access  the
          size  of  the  function's  stack  frame.  The PROLOG macro should
          define the entry point name and produce  the  code  necessary  to
          save  the  environment  of the calling function and to set up the
          environment of the called function using the information provided
          in the function call.   These  actions  may  be  performed  by  a
          subroutine.    The  first  eight  words  of every stack frame are
          reserved for  use  by  the  PROLOG  macro;  that  is,  the  first
          automatic  variable  in  a  function  is given an offset of eight
          words.   The  PROLOG  macro  call  appears  in  a  CMAC   program
          immediately  before  the  first  instruction of the corresponding
          function.

               EPILOG  FUNCNO,FRAMESIZE

          The EPILOG macro produces the epilog code for a C function.   The
          epilog  code  should  restore  the  environment  of  the  calling
          function and return to that  function.    These  actions  may  be
          performed  by  a  subroutine.   FUNCNO and FRAMESIZE are integers
          which specify the internal function number of  the  function  and
          the  size  in  words  of  its  stack  frame, respectively.  These
          integers can be used to define an assembly-language symbol  whose
          value  is  the  size  of the stack frame; this symbol can then be
          used by the code produced by the PROLOG macro which allocates the
          stack frame.

               CCALL   NARGS,ARGP,NAME

          The CCALL macro generates a function call.  NARGS is  an  integer
          specifying  the number of arguments to the function call; ARGP is
          an integer specifying the word offset in the caller's stack frame
          of  the  arguments  which  have  been  so  placed   by   previous
          instructions.  NAME is the name of the function being called.

               CALREG  NARGS,ARGP,REG

          The CALREG macro is like the CCALL macro except that the function
          being  called has been computed dynamically.  The address of this
          function is located in the register specified by REG.

               CRETRN

          The CRETRN macro produces the statements needed to return from  a
          function  to  the  calling function, i.e., transfer to the EPILOG
          code.  The returned value of the function will have  been  placed
          in the A register by previous CMAC instructions.
          The CMAC Machine              - 9 -                  4 April 1978


               CGOTO   N

          The  CGOTO  macro  produces an unconditional jump to the location
          defined by internal label number N.

               LSWITCH N,DEFLT

          The LSWITCH macro should generate code which jumps  according  to
          the  value  of  the  integer  in  register  A.    This  macro  is
          immediately followed by N (N>0) INTCON macros (the cases),  which
          are  immediately  followed  by N LABCON macros (the corresponding
          labels), followed by an ELSWIT macro.  A search  should  be  made
          through the case list; if a match is found, a jump should be made
          to  the  label defined by the corresponding LABCON macro.  If the
          integer matches none of the list entries, then a jump  should  be
          made  to  the internal label whose internal label number is given
          by the integer DEFLT.

               ELSWIT  N,DEFLT

          This macro completes an LSWITCH.

               TSWITCH LO,HI,DEFLT

          The TSWITCH macro produces an indexed jump based on the value  of
          the integer in register A.  This macro is immediately followed by
          a  sequence  of  HI-LO+1 LABCON macros defining the target labels
          corresponding to integer values from LO to HI.    Values  outside
          this range should result in transfers to the internal label whose
          internal label number is given by the integer DEFLT.

               ETSWIT  LO,HI,DEFLT

          This macro completes a TSWITCH.