KCC USER DOCUMENTATION

<1 About KCC>

KCC is a compiler for the C language on the PDP-10. It was originally
begun by Kok Chen of Stanford University around 1981 (hence the name
"KCC"), and has had many improvements made to it since then by a
number of people at Stanford, Columbia, and SRI.

It implements C as described by the following references:

    H&S: Harbison and Steele, "C: A Reference Manual",
         HS1: (1st edition) Prentice-Hall, 1984, ISBN 0-13-110008-4
         HS2: (2nd edition) Prentice-Hall, 1987, ISBN 0-13-109802-0
    K&R: Kernighan and Ritchie, "The C Programming Language",
         Prentice-Hall, 1978, ISBN 0-13-110163-3

Currently KCC is only supported for TOPS-20, although there is no
reason it cannot be used for other PDP-10 systems or processors, if
the need arises. The remaining discussion assumes you are on a
TOPS-20 system.

<1 Using KCC>

C source files should have the extension ".C", such as PROG.C and
SUBS.C.

To build a C program, whether from one or more source files
("modules"), there are three things that must be done. First, all
modules have to be compiled with KCC to produce .REL files (e.g.
PROG.REL and SUBS.REL); second, the LINK loader must be invoked to
load all of the necessary modules into an executable core image; and
third, this image must be saved on disk as an .EXE file.

Every complete C program must contain one and only one module that
defines the function "main". This function is where control begins
when the program is executed, and unless otherwise specified the .EXE
file will be named after the module that "main" appears in.

You can make a C program either by using the EXEC commands COMPILE,
LOAD, and SAVE, or by invoking KCC directly. For example, suppose
"main" is defined in PROG.C, and the file SUBS.C contains auxiliary
subroutines.
Then,

    To make:                  EXEC command         Direct KCC invocation
    --------                  ------------         ---------------------
    PROG.EXE from .C files:   @LOAD PROG,SUBS      @CC -q PROG SUBS
                              @SAVE PROG
    Just the .REL files:      @COMPILE PROG,SUBS   @CC -q -c PROG SUBS
    PROG.EXE from .RELs:      Same as 1st          @CC PROG.REL SUBS.REL

One advantage of using the EXEC commands is that they will only
compile those files which appear to require it, i.e. modules for
which the .C file is more recent than the .REL file. The EXEC can
also translate TOPS-20 directory names into a format that the DEC
loader will understand, so that commands like @COMPILE PROG are
possible. However, KCC will do a similar form of conditional
compilation if the -q switch is set, for those modules specified
without a .C extension. (This may become the default someday.)

More commonly, the EXEC at your site may not have been modified to
know about KCC, or you may wish to specify certain options to the
compilation, or you may just come from a UNIX background and feel
more used to the direct invocation method.

<1 Direct Invocation - Compiler switches>

The KCC compiler switches are intended to resemble those of the UN*X
"cc" command as closely as possible. If you are familiar with these,
you can probably use KCC instinctively.

The command line is broken up into argument strings each separated by
a space (NOT by a comma). If an argument string starts with a "-", it
is a switch, otherwise it is a filename. Case is significant in
switches!

Normally, if the filename as given exists, it is used regardless of
its form. The exception is files with a ".REL" extension, which are
never compiled but are passed on to the linking loader. If a filename
does not exist and appears to have no extension, ".C" is added. This
feature is primarily useful with the -q switch as it requests
conditional compilation. Case is not significant in filenames.
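The switch and filename rules above can be summarized in a short
sketch. This is only an illustration written in modern standard C; it
is not KCC's actual argument parser, and the names arg_kind,
has_rel_extension, and classify_arg are hypothetical. It models three
of the stated rules: a leading "-" marks a switch, a ".REL" extension
(in either case) means the file goes straight to the loader, and
anything else is treated as a source file. The rule that adds ".C" to
a nonexistent, extensionless name is left out because it depends on
the file system.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of one command line argument string. */
enum arg_kind {
    ARG_SWITCH,     /* starts with "-"                       */
    ARG_LOAD_ONLY,  /* ".REL" file: never compiled, only loaded */
    ARG_SOURCE      /* anything else: a file to compile      */
};

/* Return 1 if name ends in ".REL"; case is not significant in filenames. */
static int has_rel_extension(const char *name)
{
    static const char ext[] = ".REL";
    size_t len = strlen(name);
    size_t i;

    if (len < 4)
        return 0;
    for (i = 0; i < 4; i++) {
        if (toupper((unsigned char)name[len - 4 + i]) != ext[i])
            return 0;
    }
    return 1;
}

/* Apply the rules in the order the text gives them. */
enum arg_kind classify_arg(const char *arg)
{
    assert(arg != NULL);
    if (arg[0] == '-')
        return ARG_SWITCH;
    if (has_rel_extension(arg))
        return ARG_LOAD_ONLY;
    return ARG_SOURCE;
}
```

Under these assumptions, "-q" is classified as a switch, "SUBS.REL"
and "subs.rel" are both load-only, and "PROG" or "PROG.C" are source
files.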
If none of -c, -E, or -S are given as switches, KCC will invoke LINK
after compilation and an executable file (*.EXE) will be produced.

The ordering of switches and filenames, in general, does not matter;
all switches are processed before compiling starts. However, note
that filenames and libraries will be compiled and/or loaded in the
order given, and -I paths will also be scanned in the order given.

It is possible to specify KCC switches while giving a COMPILE-class
command to the EXEC, if your EXEC recognizes the switch
/LANGUAGE-SWITCHES. The argument to this EXEC switch should be a
double-quoted string which starts with a space. For example:

    @compile foo /laNGUAGE-SWITCHES:" -m -d=sym"

------------------------------------------------------------------------

The following are the available compiler switches, in alphabetical
order. They are the same as those used by UN*X "cc", except where
marked with a "*" -- these are mainly of interest to KCC implementors.

* -A          Specify a file name for the assembler header file
              (included at the start of all assembler output).
  -c          Compile and assemble, but don't link (produce *.REL).
  -C          Retain comments in preprocessor (only useful with -E).
* -d          Debugging output. Same as -d=all. Generates many debug
              files.
* -d=<flags>  Debugging fine-tuning. <flags> are flag names of
              particular kinds of debug output files. The names can
              be abbreviated. Prefixing the name with a '+' turns it
              on; '-' turns it off. All flags are initially assumed
              off. Current flags are:
                  parse   Parse tree output (*.DEB)
                  pho     Peep-Hole Optimizer output (*.PHO) - HUGE!!!
                  sym     Symbol table output (*.CYM)
                  all     All of the above
              E.g. "-d=parse+sym" == "-d=all-pho"
  -D          Define following ident to "1" or string after '='.
              E.g. "-DMAXSIZE=25". Several of these may be specified.
  -E          Run source only through preprocessor, to standard
              output.
* -H          Specify a non-standard location for <>-enclosed
              #include files.
  -i          Loader: load code for multi-section (extended
              addressing) operation.
  -I          Supply a search path for doublequoted #include files.
              Several of these may be specified, and will be searched
              in that order.
* -L          Loader: Specify a non-standard location for library
              files.
* -L=         Loader: Specify an arbitrary string argument to the
              loader. Note that the syntax does not permit spaces to
              be included. Several of these may be given.
  -lnam       Loader: Specify library filename for loader. The "nam"
              argument is used to construct the filename LIBnam.REL
              in the library directory path and this is searched when
              encountered in the specifications.
* -m          Use MACRO rather than FAIL. Semi-obsolete, same as
              -x=macro.
  -O          Optimize (no-op, defaults on). Same as -O=all.
* -O=<flags>  Optimization fine-tuning. Mainly for debugging. <flags>
              are flag names of particular kinds of optimizations.
              The names can be abbreviated. Prefixing the name with a
              '+' turns it on; '-' turns it off. All flags are
              initially assumed off, so to ask for no optimization
              use -O= (same as -O=-all). Current flags are:
                  parse   Parse tree optimization
                  gen     Code generator optimizations
                  object  Object code (peephole) optimizations
                  all     All of the above
              E.g. "-O=parse+gen" == "-O=all-object"
  -o=         Specify output filename for the executable image.
              For UN*X-compatibility kicks, "-o " also works.
* -P=         Portability level specifications. Several switches may
              be given in a format similar to that for -d and -O. The
              flags specify the C implementation level that the
              compiler should use:
                  base    Base level C -- most portable and restricted
                  carm    H&S CARM level -- full implementation
                  ansi    ANSI C draft level (only partly effective)
              Only one of the previous 3 is allowed, plus an optional:
                  kcc     Permit KCC-specific extensions to the
                          selected level.
              The default is "ansi+kcc" if -P is not given.
              -P alone is interpreted as "base".
* -q          Conditional compilation. All file specs without an
              extension will only be compiled if the .C file is more
              recent than the .REL file.
              For example, "cc -q foo bar.c arf.rel" compiles FOO.C
              if it is more recent than FOO.REL, always compiles
              BAR.C, and never compiles ARF.
  -S          Don't assemble (produce *.FAI or *.MAC, plus *.PRE)
  -U          Undefine following identifier. All -U switches are
              processed before any -D switches. Only __FILE__ and
              __LINE__ are predefined.
* -v          Verbose - same as "-v=all".
* -v=         Verbosity switches, similar to -d and -O.
                  fundef - print function names as they are defined
                           (not yet).
                  stats  - show statistics for run
                  load   - show command string given to loader
                           (if any)
  -w          Don't type out warnings.
* -x=         Cross-compile switches. Several switches may be given
              in a format similar to that for -d and -O. The flags
              specify an aspect of the "target machine" that the code
              should be compiled for (case is significant!):
                  Target System:    tops20, tops10, waits, tenex, its
                  Target CPU:       ka, ki, ks, kl0, klx
                  Target Assembler: fail, macro, midas
                  Target char size: ch7 (to compile with 7-bit chars)
              e.g. "-x=ka+tenex". See "Cross-compiling".

------------------------------------------------------------------------

NOTE: Path syntax

The -I, -H, and -L switches all take a "path" as argument. This is
interpreted as specifying both a prefix and a postfix string which
are used to sandwich a partial filename from some other source
(#include "xxx", #include <xxx>, and -lxxx respectively). The two
strings are separated by the character '+' (this is site dependent
however). Thus, for example:

    Specification    Prefix      Postfix        Sample with "xxx"
    -I+[SYS,NEW]     ""          "[SYS,NEW]"    xxx[SYS,NEW]
    -HNEWC:          "NEWC:"     ""             NEWC:xxx
    -LPS:LIB+.REL    "PS:LIB"    ".REL"         PS:LIBxxx.REL

NOTE: Obsolete features

The following switches and interpretations are obsolete. They will
likely be flushed altogether, but are documented here for historical
reasons:

* -n          same as -O= (no optimization)
* -s          same as -d=sym (output *.CYM symbol table dump)

It used to be a feature that "simple" switches, which did not take
any arguments, could be lumped together into a single switch string.
For example, "cc -mS test" is the same as the more standard
"cc -m -S test". However, use of this feature is discouraged; the
potential confusion and inconsistency don't seem to be worth it.

NOTE: Switch Portability

The following lists the switches implemented by other systems but not
by KCC. This information seems useful and this is a convenient place
to put it. Other-system switches that KCC implements are not
included. Switches which mean one thing to KCC but another thing to
other systems are included. Currently only 4.2BSD switches are
listed.

    -g         Output additional symtab info for dbx(1), pass -lg to
               ld(1)
    -go        Ditto for sdb(1).
    -p         Output profiling code for prof(1).
    -pg        Ditto but for gprof(1).
    -R         Passed on to as(1) to make initialized vars shared and
               read-only.
    -Bpath     Use substitute compiler pass programs specified by
               <path>.
    -t[p012]   Use only the pass programs from -B designated by -t.
    ld(1) switches: A, D, d, e, l, M, N, n, o, r, S, s, T, t, u, X,
               x, y, z

<1 User Program - Command line interpretation>

The C runtime startup interprets the command line to a C program in a
consistent fashion, and supports (1) argument string passing, (2) I/O
redirection, (3) pipes, and (4) background processing. There is also
provision for (5) suppressing this default command line
interpretation.

(1) Command line arguments:

Command line arguments can be passed to the main() function from the
EXEC or monitor in the UN*X fashion. That is, main() is given two
arguments, the first of which is an argument count and the second a
pointer to an array of char pointers, each of which constitutes an
argument. Thus it is conventional to declare the parameters to main()
in this way:

    main(argc, argv)
    int argc;
    char **argv;

For example, if you have a C program saved as PROG.EXE, then invoking
PROG with the command:

    @PROG one two

will set argc to 3, and the strings that argv points to will be
"PROG", "one", and "two". Note that arguments are separated by blanks
and not by commas!
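As a rough illustration of the argument passing just described, the
sketch below formats each argument string on its own line. It is
written in modern standard C (prototype-style parameters and
snprintf) so that it can be tried on a current compiler; code meant
for KCC itself would use the old-style parameter declarations shown
above. The helper name format_args is hypothetical and is not part of
the KCC runtime.

```c
#include <assert.h>
#include <stdio.h>

/*
 * format_args (hypothetical helper): write one "argv[i] = text" line
 * per argument into buf, and return the number of arguments that were
 * described. If buf is too small, the partial line is dropped and the
 * count of complete lines is returned.
 */
int format_args(int argc, char **argv, char *buf, size_t buflen)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (i = 0; i < argc; i++) {
        int n = snprintf(buf + used, buflen - used,
                         "argv[%d] = %s\n", i, argv[i]);
        if (n < 0 || (size_t)n >= buflen - used) {
            buf[used] = '\0';   /* drop the partial line and stop */
            break;
        }
        used += (size_t)n;
    }
    return i;
}
```

A main() that calls format_args(argc, argv, buf, sizeof buf) and
prints buf would, for the command "@PROG one two", be expected to
show three lines: one each for "PROG", "one", and "two".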
(2) I/O redirection:

I/O redirection of stdin and stdout is also supported. Thus:

    1. @PROG <foo    ; will take all stdin input from the file "foo".
    2. @PROG >bar    ; will send all stdout output to a new file "bar".
    3. @PROG >>log   ; will append all stdout output to the old file
                     ; "log".

These can be combined:

       @PROG <foo >bar  ; does both 1 and 2. (from "foo", to "bar")

However,

       @PROG <foo>bar   ; interprets "<foo>bar" as a single argument
                        ; string, because it looks like a filename.

(3) Pipes:

On TOPS-20 systems which implement the PIP: device (developed at
Stanford), pipes can also be supported, so that a command such as:

    @PROG | BAZ

causes the stdout of PROG to be redirected to the stdin of BAZ.

(4) Background processing:

Again, provided the EXEC has been suitably modified, a command line
ending in an ampersand ('&') will cause the program to be run in the
background, while the user goes on to do other things:

    @PROG one two&

(5) Suppressing the command line interpretation:

In certain unusual circumstances it may be necessary to suppress the
default command line interpretation, so that the user program itself
can handle it in a different way. For information on how to do this,
see the include file .

<1 C as implemented by KCC>

KCC is intended to conform to the description of C as specified by
Harbison & Steele's "C: A Reference Manual". It is strongly
recommended that all C programmers use this book in preference to
Kernighan & Ritchie. As the ANSI C standard becomes more concrete,
KCC will likewise evolve to conform to this standard; some of the
proposed ANSI features are already implemented.

The -P (portability) switch controls the exact level at which KCC
attempts to compile a C program. There are three possible levels, and
only one of these may be in effect:

    ANSI - permits all currently implemented ANSI constructs to be
           recognized and compiled. This is basically CARM level plus
           some new things; KCC does not yet fully implement the ANSI
           draft standard, as it keeps changing. Users should be
           cautious about using ANSI features.
    CARM - Disables all ANSI-added features which are not in Harbison
           and Steele's CARM book. KCC fully implements this level.
    BASE - The most restrictive level. This is basically the same as
           CARM, but will make KCC complain about some constructs or
           usages that are likely to be unimplemented by some other
           compilers.

In addition, there is a "KCC extensions" flag which is independent of
the level; when enabled, this permits a number of KCC-specific
extensions to be recognized regardless of whatever level is in
effect.

Normally KCC uses the ANSI level with KCC extensions enabled; this
corresponds to "-P=ansi+kcc".

The next several pages document KCC's implementation of C by
following the general ordering of H&S and pointing out aspects where
KCC differs or describing which of several optional behaviors KCC
implements. Any ANSI features which are implemented are also
described.

<2 KCC Lexical Elements> [H&S 2, "Lexical Elements"]

KCC uses the US ASCII character set. There is provision for using a
separate target character set, different from the source set, but
currently the only such is a target set for WAITS ASCII.

KCC has no maximum line length. Error messages will quote only the
most recent part of an offending line if it is longer than 80
characters.

KCC is standard in that nested comments are not supported. If the
sequence "/*" is seen within a comment, a warning message will be
printed just in case the user neglected to terminate the previous
comment.

<2 Identifier names>

KCC adheres to the standard definition of C identifier syntax,
allowing the character "_", the letters A-Z and a-z, and the digits
0-9 as valid identifier characters.

Identifiers may have any length, but only the first 31 characters
(case sensitive) are unique during compilation, which conforms to the
ANSI minimum.
This applies to all of the following name spaces (as per H&S 4.2.4):

    Macro names
    Statement labels
    Structure, union, and enum tags
    Component (member) names
    Ordinary names:
        Enum constants and typedef names.
        Variables (see discussion of storage classes).

However, the situation is different for symbols which must be
exported to the PDP-10 linker. Such names are truncated to 6
characters and case is no longer significant. The character '_'
(underscore) is transformed into '.' (period); the PDP-10 software
allows the additional symbol characters '$' and '%', but there is no
way to generate these with C unless special provision is made; see
#asm and '`' under "KCC Extensions". See also the discussion of
exported symbols.

<2 Reserved Words>

KCC has a number of additional reserved words depending on the
portability level setting. When KCC extensions are allowed, as is
normally the case, the following keywords exist:

    "asm"      - used for assembly code inclusion.
    "entry"    - only in certain special circumstances. See the
                 discussion of libraries and entry points.

When ANSI level is in effect (again, the normal case), there are
three additional reserved words. All can be considered type
modifiers:

    "signed"   Indicates integer type is signed. Implemented.
    "const"    Constant object (recognized but unimplemented)
    "volatile" Volatile object (recognized but unimplemented)

<2 Constants>

The types "int" and "long" are the same -- one PDP-10 word of 36
bits, with the high bit a sign bit. Thus, the largest positive
integer constant is 0377777777777, or 34,359,738,367 (2 to the 35th
power, minus 1).

The type "double" is represented by a PDP-10 hardware format standard
range double precision number (two words). On KA processors the
format is slightly different. The decimal range is from 1.5e-39 to
1.7e38, with eighteen digits of precision.

Character constants have type "int". Multicharacter constants are
non-standard and not supported. Because characters are 9-bit bytes,
numeric escape code values can range from '\0' to '\777'.
Hexadecimal character constants are not permitted.

String constants are stored as 9-bit byte strings, and do not share
storage. That is, two instances of the constant string "foo" will be
stored in two distinct places. On TOPS-20, string constants are put
in the "pure" segment of a program, but this does not actually
enforce any read-only restrictions.

If the portability level is ANSI then adjacent string constants are
concatenated into a single string. Thus, "foo" "bar" is the same as
"foobar".

<2 Preprocessor directives> [H&S 3, "The C Preprocessor"]

All standard C preprocessor directives are supported as described in
Harbison and Steele, including #elif and the "defined" operator. This
page specifies how KCC behaves for situations which are
implementation dependent.

Lexical Conventions: [H&S 3.2]

Preprocessor commands must have '#' as the first character on the
line; whitespace cannot precede it. KCC allows whitespace between the
'#' and the command name (this is non-portable).

Formal parameter names ARE recognized within character and string
constants in macro body definitions.

Comments are treated as whitespace and not passed on to anything
else; however, KCC will print a "Nested comment" warning if it
encounters a comment which contains "/*". This serves both to catch
slightly non-portable usage (see H&S 2.2) and to detect places where
the user may have accidentally omitted a "*/".

Defining Macros: [H&S 3.3]

When defining a macro, formal parameter names are recognized within
string and character constants, and therefore no check is made for
lexical correctness of such constants; this will change when the ANSI
standard firms up. Any comments and whitespace in the macro body are
replaced by a single space.

KCC permits an argument token list (arguments to a macro call) to
extend over multiple lines. Arguments to a call are converted in a
fashion similar to that for macro bodies -- comments and whitespace
are replaced by a single space.
Newlines within an argument list are also considered whitespace.
However, string and character constants in arguments are treated as
tokens, and their contents are not scanned for macro names.

Predefined Macros: [H&S 3.3.4]

    __LINE__   expands into the current decimal line number.
               (BSD, ANSI)
    __FILE__   expands into the current source filename. (BSD, ANSI)
    __DATE__   expands into the date of compilation.
    __TIME__   expands into the time of compilation.
               The date/time of compilation is cleared at the start
               of compilation for each source file, and is set by the
               first occurrence of __DATE__ or __TIME__ within that
               source file.
    __STDC__   expands into the ANSI standard level # (not
               implemented yet).

The first two macros are furnished for compatibility with 4.2BSD; the
next two were added from ANSI. __STDC__ will only be added when
-P=ansi is a full implementation. There are no other predefined
macros; use the file for standard KCC environment definitions.

Undefining and Redefining Macros: [H&S 3.3.5]

It is not an error to redefine an already defined macro, but a
warning message will be output unless the new macro definition is the
same as the old definition; i.e. redundant definitions are allowed.
There is no macro definition stack, i.e. definitions are not
pushed/popped by #define/#undef.

Attempting to define a macro named "defined" will cause an error,
since otherwise it would conflict with the "defined" operator.

Converting Tokens to Strings: [HS2 3.3.8]

KCC does recognize formal parameter names within string and character
constants. This will change as the ANSI standard shapes up.

File Inclusion: [H&S 3.4]

Included files may be nested to 10 levels. Macro expansion is done on
the line if the filename does not start with '<' or '"'. Filenames
may contain '>' or '"' characters.

#include <filename> looks only in the standard directory.
#include "filename" looks first in DSK:, then in the -I paths in
order of specification (left to right), then in the standard
directory.
The standard directory for include files is C: on TOPS-20, on TENEX,
and [SYS,KCC] on WAITS, but this is site dependent in any case.

Conditional Compilation: [H&S 3.5]
    #if,#else,#endif,#elif,#ifdef,#ifndef

The "defined" operator is recognized only within #if and #elif
expressions. Note that neither #elif nor "defined" are in K&R, and
H&S is used as the reference here; neither will be recognized unless
the portability level is at least "carm".

Within the body of a failing conditional, only other conditional
commands are recognized; all others, even illegal commands, are
ignored.

Explicit Line Numbering: [H&S 3.6]
    #line

The information from #line will be used in KCC error messages. Macro
expansion is performed on the line. Like all other preprocessor
commands, #line is eliminated and not passed on when using the -E
switch.

With regard to "#" alone at the start of a line, remember that
whitespace is allowed between the "#" and the command name, thus KCC
will not recognize a "#" alone as a synonym for "#line". If there is
no command name, the line is simply ignored without error.

KCC-specific Commands:
    #asm, #endasm

These two commands cause the text delimited by them to be
macro-expanded (as for -E) and converted into an "asm()" expression
for direct inclusion in the output assembly language file. This
currently only works inside functions. This feature is very likely to
change, and should only be used where absolutely necessary. Keep the
code simple, as someday KCC may want to parse it. See "KCC
Extensions" for additional details.

<2 Storage classes> [H&S 4.3 "Storage Class Specifiers"]

KCC implements the standard storage classes of auto, extern,
register, static, and typedef (H&S sec 4.3), with the following
notes:

REGISTER declarations are currently equivalent to AUTO. KCC does not
assign variables to registers, and optimizations are performed
without using the "hint" given by REGISTER.
AUTO variables are almost always more efficient, and in any case they
are easier to implement.

KCC uses the "omitted-EXTERN" solution to deal with the question of
top-level definitions versus references (H&S sec 4.8). That is,
omitting "extern" from a top-level declaration has the effect of
indicating that this is a defining declaration rather than a
referencing declaration.

Duplicate Declarations:

As per H&S 4.2.5, KCC permits any number of external referencing
declarations, if the types are the same. However, because KCC treats
omitted-extern declarations as defining declarations, these
references must all have an explicit "extern". Likewise, an external
reference may be later followed by a defining declaration.

KCC has additional special handling for declarations of functions,
because it can always be determined whether a function declaration is
a reference or a definition. Any number of "static" referencing
declarations are allowed. Conflicts are resolved as follows:

    If an implicit external reference is followed by a static
    reference or definition, KCC will assume the function is static.
    It is an error if the first reference has an explicit "extern".

    It is also an error if a static reference is followed by an
    external reference or definition.

    In either case compilation proceeds as if the function was
    static.

<2 Initializers> [H&S 4.6 "Initializers"]

KCC adheres to H&S in all required respects. The following notes
cover points which H&S describes as implementation dependent:

Optional braces are allowed for all non-aggregate initializers.

It is permitted to drop braces from initializer lists under the rules
described in H&S 4.6.8 (HS1 4.6.9), but KCC attempts to perform
extremely stringent checking on the "shape" of initializers, and will
complain about too many or too few braces.

FLOATING-POINT initializers may be of any arithmetic type.
KCC performs compile-time floating-point arithmetic, so initializers
for static and external variables may use any constant arithmetic
expression.

POINTER initializers, as described in H&S, must evaluate to an
integer or to an address plus (or minus) an integer constant.

ARRAY initializers are currently not allowed for automatic arrays.
This will change as ANSI permits it.

ENUMERATION initializers may use any integer (as well as enum)
expression.

STRUCTURE initializers can initialize bit fields with any integer
expression. As for arrays, automatic and register structures cannot
be initialized. This will change as ANSI permits it.

UNIONS currently cannot be initialized. This will change as ANSI
permits it.

<2 Exported symbols> [H&S 4.8 "External Names"]

Symbols which are exported to the assembler file have special
restrictions imposed by current PDP-10 software, which only
recognizes 6-character symbols from the set A-Z, 0-9, '.', '$', and
'%'. In particular, case is not significant.

Also, there is a distinction between symbols exported only to the
assembler and those exported both to the assembler and the linker.
While there is technically no reason that any symbol has to be given
to the assembler if it is not also meant for the linker, in practice
it is convenient for debugging to have some "local" symbol
definitions available so that DDT can access them.

Here is a breakdown of export status by storage class:

    typedef  = Exports nothing. (Not a real storage class)
    auto     = Exports nothing. (Local stack variables use an
               internal offset)
    register = Exports nothing.
    static   = If not global scope (i.e. is within a block) then
               nothing exported; an internally-generated label is
               used.
               If global (top-level, within no block) then exported
               to assembler only. A label is made, but no INTERN or
               ENTRY statement.
    extern   = Always exported to both assembler and linker.
               Omitted-extern: a DEFINITION. A label, INTERN, and
               ENTRY are output.
               Explicit-extern: a REFERENCE.
               An EXTERN statement is output, but only if the symbol
               is actually referenced by the code.

Omitted-Extern:

External declarations with no "extern" storage class explicitly given
are assumed to be external DEFINITIONS. A defined extern symbol will
have its own label, plus an INTERN statement telling the assembler
that this is an externally visible symbol, plus an ENTRY statement
which allows library routine search to find this symbol. ENTRY
statements will be put into the .PRE output file rather than the main
output file, since the assembler will need to scan them prior to
anything else.

Explicit-Extern:

If an "extern" is explicitly given, the compiler assumes that it is
simply a REFERENCE. Nothing will be done unless the symbol is
actually referenced by the code, in which case an EXTERN line will be
generated in the assembler output for that file.

The reason for the reference count check is that each assembler
EXTERN constitutes a library search request which must be satisfied
by a module with the corresponding symbol declared as an ENTRY.
Unless this is only done for actual references, the many superfluous
declarations found in *.h files will tend to cause many unneeded
library modules to be loaded.

Static symbols:

Note that global static symbols are passed on to the assembler even
though this is not necessary; an internally-generated label could be
used just as well. The main reason this is done is to facilitate
debugging with DDT, otherwise it could be difficult to identify
static functions when looking at the machine instructions. This may
cause problems if identifiers which are otherwise distinct become
identical as a result of the conversion to a 6-char PDP-10 symbol.

However, a symbol declared static within a given source file will
never be visible from another file that you may link later with it.
For example, a function declared as

    static char *function() { ... }

will only be visible from other functions within the same source
file.
This allows several modules to have functions with the same name
modulo the six character limit, as long as no two of the functions
are both extern. It is STRONGLY recommended for multi-module programs
that you declare as many functions as possible to be "static".

<2 Libraries and Entries>

REL files to be converted by MAKLIB into object libraries must have
any external symbols declared with ENTRY rather than merely INTERNing
them, and this declaration must be at the start of the REL file.

In order to do this, KCC generates a *.PRE "prefix" output file in
addition to the *.FAI or *.MAC output file, and invokes the assembler
in such a way that the PRE file is assembled before the main file.
This file contains ENTRY statements and any other predeclarations
that are needed before the assembler sees the actual code.

Normally the user will never see this file, but if the -S switch is
used then it will be left around as well as the FAI/MAC file. Note
that if running the assembler manually on the FAI/MAC file, you must
invoke it with a command line like this:

    [@]FAIL                      [@]MACRO
    [*]FOO=FOO.PRE,FOO.FAI       [*]FOO=FOO.PRE,FOO.MAC

COMPATIBILITY INFO:

For compatibility, KCC will continue to recognize an "entry" keyword
for some time to come. The following describes the obsolete syntax:

To declare an entry, use the "entry" keyword at the start of the
source, before any other declarations:

    "entry" ident ["," ident ...] ";"

i.e., the keyword "entry", followed by a list of identifiers
separated by commas, followed by a semicolon. This is passed on
essentially verbatim to the assembler, and has no other effect on
compilation. It should be used at the start of any runtimes or other
file intended for a library, on all variables and functions that
should be visible as entries in the library.

Note that it should still be safe to use "entry" as a non-keyword; if
used other than at the start of the file it will be treated like any
other normal identifier.
To repeat: the "entry" statement is no longer necessary. It should
not be used in new code, and should be removed from old code.

<2 Types> [H&S 5 "Types"]

STORAGE UNITS:

A KCC storage unit (what "sizeof" returns) is a 9-bit byte, and there
are 4 of these in each 36-bit PDP-10 word, ordered left to right from
most significant to least significant.

INTEGERS:

KCC's integer types have the following sizes:

    Type     Bits    "sizeof" value
    char        9    1
    short      18    2  (PDP-10 halfword)
    int        36    4  (PDP-10 word)
    long       36    4  (PDP-10 word)

All of these types may be explicitly declared as "signed" if ANSI
level is in effect.

Single variables declared as "char" or "short" are stored
right-justified into a full word; only when packed into an array or
structure are they stored as 9-bit (or 18-bit) bytes, left to right
within each word.

UNSIGNED INTEGERS:

Unsigned integers are fully implemented; any integer object may be
either "signed" or "unsigned", and both forms use exactly the same
amount of storage, with the high order bit considered the sign bit
(if the object is signed). However, because the PDP-10 has no
instructions specifically for unsigned data, some operations are
slower for unsigned ints.

    Addition (+) and subtraction (-) are the same.
    == and != are the same.
    Left shift (<<) always uses the LSH instruction (logical shift).
    Right shift (>>) uses LSH for unsigned, ASH for signed operands.
        ASH is an arithmetic shift which propagates the sign bit.
    <,<=,>,>= are slightly slower for unsigned operands.
    Casts to floating-point are slower.
    Multiply (*) is also slightly slower.
    Divide (/) and remainder (%) are much slower.

CHARACTER:

The plain "char" type is "unsigned char". Sign extension is done only
if chars are explicitly declared as "signed char". Normally a char is
9 bits, although it is possible to compile code using a 7-bit
assumption (see the section on char pointer hints).
Old versions of KCC used to store the chars of a string constant in 7-bit form, packed 5 to a word (ASCIZ format); this is no longer the case and string constants are normally now full 9-bit char strings. An extension to KCC provides five additional types of "char" objects, specified as "_KCCtype_charN", where N is the number of bits in the char and may be one of 6, 7, 8, 9, or 18. All may be signed or unsigned; their "plain" form is unsigned. See the "KCC Extensions" section for additional details. FLOATING-POINT: The "float" type is represented by one word in the PDP-10 single precision floating point format; there is one bit of sign, 8 bits of exponent, and 27 bits of mantissa. The "double" type uses two words in the PDP-10 double precision format. (Note that for the KA-10 this is a software format rather than the more usual hardware format.) The exponent range is approximately 1.5e-39 to 1.7e38 in both formats; single precision has about 8 significant digits and double precision has 18. See a PDP-10 hardware reference manual for details. KCC also supports the new ANSI "long double" type when ANSI level is in effect. Currently this is the same as "double" but this will probably change on KL-10s to use "G" format floating point, which has an exponent range of 2.8e-309 to 9.0e307 but only 17 significant digits. The (double) type can represent all values of (long). That is, conversion of a (long) to a (double) and back to (long) results in exactly the original value. POINTERS: Pointers are always a single word, but can have two different internal formats. Pointers to chars, shorts, or bit fields, are PDP-10 byte pointers (local or one-word global); pointers to all other objects are PDP-10 global word addresses. Byte pointers point to the byte itself rather than to the preceding byte, thus LDB instead of ILDB is done to fetch the byte. 
It is very important to ensure that functions which return values of (char *) be properly declared; likewise, any function arguments which are expected to be (char *) must be cast to this if necessary. Operations which expect a char pointer will not work properly when given a word pointer, and vice versa. See the section on "pointer hints" near the end of this file for additional information. The "NULL" pointer is represented internally as a zero word, i.e. the same representation as the integer value 0, regardless of the type of the pointer. The PDP-10 address 0 (AC 0) is zeroed and never used by KCC, in order to help catch any use of NULL pointers. ARRAYS: The only special thing about arrays is that arrays of chars consist of 9-bit bytes packed 4 to a word, and arrays of shorts have 18-bit halfwords packed 2 to a word; all other objects occupy at least one word. ENUMERATIONS: KCC treats enumeration types simply as integers. In the words of H&S 5.5 (HS1 5.6.1), KCC uses the "integer model" of enumerations, which is what ANSI has adopted. STRUCTURES and UNIONS: Structures and unions are always word-aligned and occupy a whole number of words. Unlike the case for other declarations of type "char" or "short", adjacent "char" and "short" members in a structure are packed together as for arrays. Structures and unions may be assigned, passed as function parameters, and returned as function values. Bit fields are implemented; the maximum size of a bit field is 36 bits. They may be declared as "int", "signed int", or "unsigned int"; plain "int" bitfields are unsigned. Fields are packed left to right, conforming to the PDP-10 byte ordering convention. It's too bad that C does not allow pointers to bit fields, because the PDP-10 byte pointer instructions are perfectly suited to this application! FUNCTIONS: As per H&S. A pointer to a function is simply a word address. For the gory details of function calls and stack usage, see the "Internals" section. TYPEDEFS: As per H&S. 
With regard to 5.10.2 (HS1 5.11.1), KCC has no problems with redefining typedef names in inner blocks. <2 Type Conversions> [H&S 6 "Conversions and Representations"] Integer conversions: There are no representation changes when converting any integer type to any other integer type of the same size. Sign extension and truncation are performed when necessary to convert from one size to another. Conversions from pointers are done as per H&S 6.2.3 (V1 6.3.4); a pointer is treated as an unsigned int and then converted to the destination type using the integral conversion rules. Floating-point conversions: Casting (float) to (double) or (long double) retains the same value. However, (double) to (long double) may lose one digit of precision, depending on the implementation chosen for (long double). A cast to (float) of an int may lose some precision, although a char or short can always be fully transformed. (double) can retain the exact value of an int or long int, which can be restored to its original value by converting back to int. Casting an unsigned integer to a floating-point value always results in a positive number. Pointer conversions: There are a great variety of pointer conversions possible; however, you can make sense of them if you simply note the following Three Laws of Pointers: (1) Nihil ex nihilis -- a NULL pointer always remains NULL. (2) Smaller is finer -- a pointer to any object can always be converted into a pointer to a SMALLER (or equal-sized) object, without losing any information. Converting it back to the original type restores the original value. (3) Bigger is blunter -- converting a pointer to any object to a pointer to a LARGER object will force the pointer to have an alignment suitable for that of the larger type; any fine details of positioning within the new type are lost, and the original pointer cannot be recovered (unless it was already properly aligned to begin with). 
The new object pointed to will completely enclose the smaller object.

Specifically: Chars are aligned on 9-bit byte boundaries, shorts on halfword boundaries, and all other data types on word boundaries (with the exception of bitfields and the _KCCtype_charN types).

Converting any pointer to a (char *) and back is always possible, as a char is the smallest possible object. If the original object was larger than a char, the char pointer will point to the first byte of the object; this is the leftmost 9-bit byte in a word (if word-aligned) or in the halfword (if a short).

A cast to (int *) of a char pointer produces an address that points to the word that the char pointer indicates, regardless of which byte in the word was being pointed at.

Pointer casts are not always trivial, but they are reasonably fast (from 1 to 4 instructions depending on the alignment requirements).

The only exception to the 3 rules is the case of pointers to objects of _KCCtype_charN types (see the KCC extensions section). Casting any pointer to or from those types is performed by first converting the original pointer into a word pointer (thus forcing alignment to a word boundary) and then applying the desired conversion.

Assignment conversions:

KCC permits any casting conversion during an assignment, but will complain about an implied cast if the conversion is not one of the legal assignment conversions.

Unary conversions:

The "Usual Unary Conversions" are different for CARM and ANSI:

        Original operand type           Converted type
                                        CARM            ANSI (default)
        float                           double          float
        signed char/short/bitfield      int             int
        unsigned char/short             unsigned int    int
        unsigned bitfield               unsigned int    *int or @unsigned int

        * = if bitfield has fewer bits than an int.
        @ = if bitfield has more (or same #) bits than an int.

The first difference is (float) to (double).
What H&S describes as an "optional compilation mode" to suppress the unary conversion of (float) to (double) is always in effect for ANSI level, as ANSI is allowing this feature as part of the standard conversions, and the resulting PDP-10 code is much more efficient. If ANSI level is not selected, then all (float) values will be implicitly converted into (double) as per the old C standard. Note that all portability levels require that (float) values always be promoted to (double) in function arguments, so this particular implicit conversion is always in effect. The second difference is in the integer promotions. CARM uses what ANSI calls "unsigned preserving" rules; ANSI uses "value preserving" rules, meaning that a conversion to a wider type should always result in a signed integer type regardless of whether the shorter type was unsigned or not, as long as the new type can represent all values of the old type. Binary conversions: As already noted, (float) values are not always implicitly converted to (double) before being operated on, if ANSI level is in effect. There is one other difference between ANSI and CARM with respect to the usual binary conversions: If one operand is "long" and the other is "unsigned int", CARM: makes both "unsigned long". ANSI: makes both "long". <2 Expressions> [H&S 7 "Expressions"] As per H&S, with the following notes: [7.2.2] (V1 7.2.3) Overflow and underflow are neither noticed nor handled. The result is whatever the PDP-10 hardware gives in those cases. [7.3.3] KCC correctly does not use parentheses to force the usual unary conversions. [7.4.2] (V1 7.3.5) KCC permits component selection for structures returned from functions, except when the component is an array. That is, "f().a" will work and will select component "a" of the returned structure, but it is not legal to do "f().array[i]". This point may be clarified in the future by the ANSI draft standard. 
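The [7.4.2] note on selecting a component of a returned structure can be illustrated with a minimal sketch. The names "point", "make_point" and "point_sum" are hypothetical and do not come from the KCC sources. The code is written in present-day standard C with prototype-style definitions so that it can be tried on a current compiler; since the text says KCC does not yet implement ANSI prototypes, an old-style function definition would presumably be needed there.

```c
/* Hypothetical structure used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Returns a structure by value, as KCC permits. */
struct point make_point(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* Selects non-array components directly from the returned structure.
 * This is the "f().a" form described in the note above. */
int point_sum(int x, int y)
{
    return make_point(x, y).x + make_point(x, y).y;
}
```

The sketch deliberately uses only scalar members; the restricted case is selecting an array member, as in "f().array[i]".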
[7.4.3] (V1 7.3.6) KCC correctly does not allow formal parameters of type "function", so the issue of converting this type does not arise. KCC does not currently do any checking to see if the types of the arguments match the types of the parameters for the called function. When ANSI function prototypes are implemented, this will change. KCC does not issue any warnings about discarded function return values.

[7.5.1] (V1 7.4.1) Casts - KCC correctly implements "narrowing" casts for floating point and for integers.

[7.5.2] (V1 7.4.2) "sizeof" - the result of "sizeof" currently has type (int). This is far more than adequate for any possible size value. The result of sizeof is ALWAYS in terms of 9-bit bytes, regardless of the setting of -x=ch7, with two exceptions: the size of a char is always 1, and the size of a char array is the # of elements (chars) in the array. This is true no matter how many bits are in a char.

[7.5.6] (V1 7.4.6) '&' - Attempting to apply '&' to a "register" variable simply causes KCC to issue a warning message and force the variable to class "auto". KCC does not permit '&' to be applied to array or function names; this will change as ANSI permits it.

[7.5.7] (V1 7.4.7) '*' - Applying the indirection operator to a null pointer (0) simply retrieves (or sets) the contents of AC 0, which should always be zero if nothing accidentally sets it. Treating the null pointer as a char pointer will always retrieve zeroes and set nothing.

[7.6.1] (V1 7.5.1) '*','/','%' - Division by zero is a no-op; the value will be that of the dividend. Truncation is always toward zero whether the operands are negative or not:

        5/2 == (-5)/(-2) == 2
        (-5)/2 == 5/(-2) == -2

For the remainder operator, (x)%0 gives unpredictable garbage. The sign of the remainder will be the same as that of the dividend:

        5%2 == 5%(-2) == 1
        (-5)%2 == (-5)%(-2) == -1

These operations are slower for unsigned than for signed operands. Division in particular is slow.
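The truncation and remainder rules in [7.6.1] can be checked with a short sketch. The function names "quot", "rem_of" and "check_truncation" are made up for this illustration. Current C standards (C99 and later) specify the same truncate-toward-zero behavior, so the checks should also hold on a modern compiler. Division by zero is deliberately not exercised, because outside KCC its behavior is undefined.

```c
#include <assert.h>

/* Thin wrappers so the operators are applied to run-time values. */
int quot(int a, int b)   { return a / b; }
int rem_of(int a, int b) { return a % b; }

/* Verifies the examples given in the text for '/' and '%'. */
void check_truncation(void)
{
    /* Quotients truncate toward zero. */
    assert(quot(5, 2) == 2  && quot(-5, -2) == 2);
    assert(quot(-5, 2) == -2 && quot(5, -2) == -2);

    /* The remainder takes the sign of the dividend. */
    assert(rem_of(5, 2) == 1   && rem_of(5, -2) == 1);
    assert(rem_of(-5, 2) == -1 && rem_of(-5, -2) == -1);

    /* The two results are tied together by (a/b)*b + a%b == a. */
    assert(quot(-5, 2) * 2 + rem_of(-5, 2) == -5);
}
```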
[7.6.2] (V1 7.5.2) '-' - The type of the difference between two pointers is (int). [7.6.3] (V1 7.5.3) '<<','>>' - Left shift (<<) always uses logical shifting; bits can be shifted into the sign bit. Right shift uses logical shifting for unsigned integer types (the sign bit is shifted out, and 0-bits shifted in), but uses ARITHMETIC shifting for signed integer types (the sign bit is propagated). Using a negative value for the right operand reverses the direction of the shift. Using a large number (36 or greater) simply shifts everything to oblivion as expected. Note that it is possible to use left-shift arithmetic shifting (the ASH instruction) by giving a negative shift distance to >>; of course this is very non-portable. [7.8] (V1 7.7) '?' - KCC correctly permits the result of a conditional expression to have structure, union, enumeration, or void types. [7.9.1] (V1 7.8.1) Structure and union assignment is (of course) permitted. [7.9.2] (V1 7.8.2) 'op=' Compound assignment - KCC does not support the obsolete "=+" compound assignment forms. [7.11] (V1 7.10) Constant expressions - KCC can and does evaluate constant floating-point expressions at compile time. Almost all casts are also allowed, except certain pointer-pointer conversions where the result would depend on whether the program was running multi-section. KCC is currently somewhat too liberal about the constant expressions in preprocessor #if statements; it allows the use of any integral constant expression, including enum constants and sizeof operators. This is possible because the preprocessor is integrated with the compiler. The eventual fix for this will probably issue a warning but permit the usage. [7.12] (V1 7.11) KCC correctly does not interleave expression computations. [7.13] (V1 7.12) KCC tries to issue warnings about discarded values. This may change with time. [7.14] (V1 7.13) KCC does some optimization of memory accesses, but not much. 
This may change with the coming of ANSI's "volatile" type modifier.

<2 Statements> [H&S 8 "Statements"]

As per H&S, with the following notes:

[8.7] switch statement - KCC permits the control expression of a switch statement to be of any integral or enumeration type.

<2 Functions> [H&S 9 "Functions"]

[9.4] Adjustments to Parameter Types

Parameters which are declared as "char" or "short" are really handled as type "int", and "float" is really "double"; however, KCC does not implement narrowing as per 9.4, because the description of this is too unclear -- what happens if such a parameter is used as an lvalue? The situation will improve with ANSI function prototypes.

KCC follows the language strictly and does not permit formal parameters of type "function returning...".

<1 The C Libraries> [H&S Part II (V1 11: "The Run-time Library")]

ALL of the facilities described in H&S part II are implemented as described. In addition, various UN*X system call emulations and standard library routines are also supported. The file LIBC.DOC furnishes a complete summary of the implemented library routines; there is also USYS.DOC, which summarizes the system-call simulations. In general, users are advised to read H&S or a UPM (Unix Programmer's Manual) for complete descriptions of library functions, as these files are primarily intended to document KCC-specific differences rather than to provide a user guide.

<2 [H&S 13] Standard Language Additions>
<2 [H&S 14] (V1 11.1) Character Processing>
<2 [H&S 15] (V1 11.2) String Processing>
<2 [H&S 16] Memory Functions>
<2 [H&S 17] (V1 11.5) Input/Output Facilities> (V1: "Standard I/O")
<2 [H&S 18] (V1 11.4) Storage Allocation>
<2 [H&S 19] (V1 11.3) Mathematical Functions>
<2 [H&S 20] Time and Date Functions>
<2 [H&S 21] Control Functions>
<2 [H&S 22] Miscellaneous Functions>

<2 C Library - Other Library Functions>

A few other miscellaneous facilities exist which are not listed in CARM, such as jsys() and the TERMCAP library.
They are described in LIBC.DOC. <1 C Library - UN*X System Calls> The KCC runtime environment is intended to resemble that of UN*X to a limited extent. For example, main() is invoked with "argc, argv" arguments parsed from the command line, and many system calls are emulated. This emulation is not intended to be complete, and the calls exist primarily to help transport software to and from UN*X systems. Whenever possible, the standard portable routines as described in H&S should be used instead of these "system calls". The file USYS.DOC summarizes the calls which KCC supports, and describes how they differ from the UN*X versions. A UPM (Unix Programmer's Manual) should be consulted for descriptions of how these calls should behave on UN*X itself. <1 KCC Language Extensions> KCC implements a number of extensions to the C language which are intended to allow for better integration with other PDP-10 software. It is possible to disable these extensions by means of the -P switch. These extensions are: [1] The "entry" keyword (obsolete). [2] The '`' identifier quoting mechanism. [3] The #asm and asm() assembly language mechanism. [4] The "_KCCtype_charN" data types. <2 Extension [1] - The "entry" keyword> The use of this statement has been described earlier in the discussion of library entry points. However, it is an obsolete feature and should no longer be needed for any purpose. Future versions of KCC will flush it if no one objects. <2 Extension [2] - Identifier Quoting> The current PDP-10 software allows symbols to have 6 characters from the set A-Z, 0-9, ., %, $. KCC maps 0-9 to 0-9, a-z and A-Z to A-Z, and '_' to '.'. KCC supports a non-standard extension to C whereby any characters enclosed within accent-grave ('`') marks are treated as a valid C identifier. This allows the user to specify identifiers containing the characters '$' and '%', as well as any arbitrary character, although KCC will print a warning if a character not in the PDP-10 set is seen. 
Examples: `$FOO`, `OPENF%`, `$$BP`, `switch` This mechanism should be used ONLY where necessary. It is not portable and should be conditionalized if used in portable code. Identifiers defined in this way should be CONSISTENTLY quoted in this way, because they are stored internally with '`' as their first character to distinguish them from normal unquoted identifiers and keywords. This avoids potential confusion and allows one to specify an identifier which is otherwise a reserved keyword, such as `if`. <2 Extension [3] - #asm and asm()> Many C compilers have an escape mechanism which allows the programmer to specify a series of assembly language instructions within a C program. KCC's means of doing this is with the "asm()" expression, which looks exactly like a function call. Currently only one argument is allowed to asm() and this must be a string literal. The text of the string is simply passed directly to the assembler output file at that point in the compilation. There is also a preprocessor command called #asm, which converts everything up to an #endasm into an asm() expression. This is convenient for very long stretches of assembler code, or where the enclosed text must be macro-expanded. Invoke %%CODE or %%DATA to switch between assembling pure and impure (variable) code/data. #asm inclusions will always begin in the code segment, and must always end in the code segment. Never use %%CODE when already in the code segment, or %%DATA when already in the data segment. Because asm() is syntactically an expression, it can only appear where an expression is legal. However, any attempt to use it anywhere but as the sole contents of a function body is highly fraught with peril. 
If it is necessary to specify some assembler directives separate from any function, an acceptable way of doing this is by means of a static dummy function, such as: static void dummyfunct(){ asm("%%DATA\n STUFF: ASCIZ/foo/ \n %%CODE\n"); } It cannot be repeated too often that use of asm() is strongly discouraged. It is possible that someday its functionality will be extended to the point that KCC can parse and understand the contents (thus, for example, references to C auto variables would be allowed); however, this would primarily be for the purpose of allowing KCC to generate .REL files directly rather than to encourage wider use of asm(). At the start of the assembler file, a PURGE is done of all the assembler IF pseudos. Thus, assembler code cannot use any IF pseudo tests, nor macros which use them. Incidentally, attempting to use a SEARCH MONSYM will cause FAIL to barf several times with a "FAIL BUG IN SEARCH" message, due to the lack of the IF pseudos; this is annoying but harmless. MACRO does not have this problem. <2 Extension [4] - "_KCCtype_charN" data types> Normally the "char" data type is 9 bits. In the PDP-10 world much existing software depends on 7-bit characters, and to make it easier to write the necessary system-dependent code a 7-bit char data type was introduced and generalized. The 5 possible char sizes (6, 7, 8, 9, and 18) were chosen because it is only for those sizes that OWGBPs exist (one-word global byte pointers), and thus only those sizes can be guaranteed to work when using extended addressing. Any of the char types can be signed or unsigned; if the plain form is used, unsigned is assumed. Narrowing and widening is done properly whatever the size. Note that the 18-bit size corresponds to "short"; it is included mainly for completeness rather than in the expectation that someone would actually use it. The 9-bit size is the same as regular "char", unless the -x=ch7 option is in effect, in which case "char" is the same as the 7-bit size. 
These types can normally be used just as for "char". However, there are some special effects associated with certain operations:

(1) "sizeof" of an N-bit char array returns the number of N-bit chars (elements) in the array. Usually this is what you want. Giving this number to malloc will cause problems only for chars of 18 bits.

(2) A cast (explicit or implicit) of a string literal to an N-bit char pointer will cause the string literal to be stored as N-bit bytes. This is NOT strict C, which would merely convert the char pointer; however, this is the most useful interpretation. This permits the somewhat bizarre construct of using a string literal to make an array of 18-bit bytes (this is the only aspect where "_KCCtype_char18" differs from "short").

(3) 6-bit string literals are stored as SIXBIT rather than using the low 6 bits of the ASCII char values. Note that while such strings are null-terminated, null is a valid SIXBIT character (meaning space). The value of invalid SIXBIT characters is undefined.

(4) Function parameters cannot be declared to have a type of char size 7 or 8. The reason is complicated; see the last part of this section.

Some examples:

        _KCCtype_char6 tmp[] = "tmp";   /* A 4-element array of SIXBIT chars */
        _KCCtype_char7 wd[5] = "word";  /* A 5-element array of 7-bit chars */
        _KCCtype_char8 packet[40];      /* A 40-element array of 8-bit chars */
        _KCCtype_char18 useless;        /* Same as "unsigned short useless;" */
        _KCCtype_char7 *arg = "text";   /* A pointer to an ASCIZ string */
        _KCCtype_char6 *pt6;            /* A pointer to a 6-bit char string */

        arg = "othertext";              /* Implicit conversion to ASCIZ */
        pt6 = "dskdmp";                 /* Implicit conversion to SIXBIT */
        pkg_call((_KCCtype_char7 *)"argtext");  /* Explicit cast to ASCIZ */

Portability issues:

The long names for these types were deliberately chosen so as to minimize the chances of possible conflict with identifiers in software imported from elsewhere, and to discourage the indiscriminate (non-portable) use of the types.
Note that users who must make heavy use of them (for good reasons, we hope) can simply use typedefs or #defines at the start of their code in order to equate them with simpler names; e.g.

        #define char7 _KCCtype_char7    /* Use shorter typename */

This method also has the advantage of localizing non-portable constructs in a way that gives others a fighting chance to port the software elsewhere by changing the initial definitions.

Storage:

There are a few aspects of the way N-bit char objects are stored which may be surprising at first. Char arrays are always packed starting with the leftmost byte in a word; however, single-char objects (such as "char c;") have their value stored in the rightmost ALIGNED BYTE. This is a necessary consequence of the fact that the '&' operator applied to a char object must result in a valid char pointer, and the very strong desire that all C code work with extended addressing. There are only a few possible kinds of OWGBPs and they all require this alignment.

For 6, 9, and 18 bits this causes no difficulty since bytes of those sizes completely fill a word, and there are no unused low-order bits; thus char values may be stored completely right-justified, and in some cases full-word operations can be performed on them. However, for 7 and 8 bit bytes the rightmost byte will leave 1 and 4 unused low-order bits, respectively, and this is where KCC stores the values for such objects. Debuggers examining a program with IDDT may be surprised that "_KCCtype_char8 foo = 1;" results in a word labelled FOO with its value 020 instead of 1.

This alignment restriction causes no real problems except for the obscure case of function parameter declarations. In the absence of ANSI function prototypes, the default "function argument promotions" are performed when a call is made; all integers shorter than (int) are converted to (int) and passed as such.
But this means that the integer value is right-justified; if the function parameter was declared to match the promoted type (int) then all is well, but attempts to declare it as a 7 or 8 bit char will just result in a confused function (attempts to read the parameter value or take its address will fail since the value is not properly aligned). This could be fixed by having KCC do an implicit conversion upon function entry, but it is far simpler and much, much more efficient to simply declare such parameters as (int) in the first place.

If the code will never be run on a KL then, of course, this and many other things could be simplified.

<1 KCC Internals>

<2 KCC Internals - Memory organization>

A C program compiled by KCC has four distinct memory regions: data, text (code), stack, and free.

DATA  - This contains all user-declared data variables, both initialized (set to user's specification) and un-initialized (set to zero). The first address following this region is stored in "_edata".

TEXT  - This is the UNIX terminology for program code. The first address following this region is stored in "_etext".

STACK - The program stack. This grows upwards in memory.

FREE  - The region of memory that malloc() can dynamically allocate. This starts at the address stored in "_end" and can allocate memory up to (but not including) the address stored in "_ealloc".

In addition, there may be small unused areas of memory.

The normal layout on TOPS-20 for a single-section program:

        Start addr      End addr        Region Name
        LOW             _edata-1        DATA
        _edata                          STACK
                        HIGH-1          - (unused)
        HIGH            _etext-1        TEXT
        _etext          _ealloc-1       FREE
        _ealloc         777777          - (unused, reserved)

Normally LOW == 0 and HIGH == 400000. These correspond to the normal addresses for low and high segments. Also, normally _ealloc is set to 770000, so that pages 770-777 can be reserved for mapping DDT (some people seem to prefer that to IDDT).
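As a rough model of the single-section layout, the sketch below classifies an 18-bit address against the region boundaries. It is ordinary C for a modern host, not KCC runtime code. The type and function names are hypothetical, and the "edata" and "etext" values in example_layout() are invented for illustration; only LOW == 0, HIGH == 400000 and _ealloc == 770000 are taken from the text. Because the text does not say where the stack ends and the unused gap begins, the two are reported together.

```c
/* Regions of the single-section layout.  STACK and the unused gap
 * above it are merged because their boundary is not fixed. */
enum region {
    R_DATA, R_STACK_OR_GAP, R_TEXT, R_FREE, R_RESERVED, R_INVALID
};

/* Boundary addresses, named after the symbols in the text. */
struct layout {
    unsigned long low, edata, high, etext, ealloc;
};

/* Example values (octal).  edata and etext are made-up numbers. */
struct layout example_layout(void)
{
    struct layout m;
    m.low    = 0UL;
    m.edata  = 0100000UL;   /* assumption: end of DATA */
    m.high   = 0400000UL;
    m.etext  = 0450000UL;   /* assumption: end of TEXT */
    m.ealloc = 0770000UL;
    return m;
}

/* Maps an address to the region that contains it. */
enum region classify(const struct layout *m, unsigned long addr)
{
    if (addr > 0777777UL || addr < m->low)
        return R_INVALID;           /* outside the 18-bit section */
    if (addr < m->edata)
        return R_DATA;
    if (addr < m->high)
        return R_STACK_OR_GAP;
    if (addr < m->etext)
        return R_TEXT;
    if (addr < m->ealloc)
        return R_FREE;
    return R_RESERVED;              /* pages 770-777 */
}
```

With these example values, address 770000 falls in the reserved pages and 450000 is the first address available to malloc().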
The normal layout on TOPS-20 for a MULTI-section program:

                        Start           End             Region Name
        Section 0                                       - (unused)
        Section 1       1,,LOW          _edata-1        DATA
                        _edata          1,,HIGH-1       - (unused)
                        1,,HIGH         _etext-1        TEXT
                        _etext          1,,777777       - (unused)
        Section 2       2,,0                            STACK
                                        2,,777777       - (unused)
        Sections 3-37   3,,0            _ealloc-1       FREE (all sections up to 37)
                        _ealloc         37,,777777      - (unused, reserved)

Normally _ealloc is set to 37,,700000 so that pages 700-777 of section 37 are reserved for mapping XDDT (again, for those people who don't know about IDDT).

<2 KCC Internals - Stack structure>

The organization of the portion of the stack seen by a C routine is shown in the following diagram (with the top of the stack being the earlier lines in this file, and the stack pointer at the very top):

SP-->________________________________________________________________
    | Spilled registers                                              |
    |     generated when we need more intermediate values than       |
    |     there are available PDP-10 registers                       |
    |________________________________________________________________|
    |             |                                                  |
    | (as many    | Arguments being stacked for the next call        |
    | repetitions |   These are generated in the reverse of          |
    | of these    |   lexical order; thus the first argument         |
    | two areas   |   appears at the top of the stack. This is       |
    | as levels   |   so that functions like printf which take a     |
    | of nesting  |   variable number of arguments can work.         |
    | in function |__________________________________________________|
    | calls)      |                                                  |
    |             | Values to be saved over the call                 |
    |             |   e.g. if we do foo()+bar() then one function    |
    |             |   has to be called first, and we save its        |
    |             |   value here so we can add it to the other       |
    |             |   result once the second call returns            |
    |_____________|__________________________________________________|
    |                                                                |
    | Local variables                                                |
    |     stored in lexical order, i.e. the first declared           |
    |     variable is lowest on the stack                            |
    |________________________________________________________________|
    |                                                                |
    | Return address for calling function                            |
    |________________________________________________________________|
    | Pointer for return value                                       |
    |     this only exists if the function returns a struct          |
    |     that takes more than two words; otherwise the result       |
    |     is returned in registers 1 and (if two words) 2            |
    |________________________________________________________________|
    |                                                                |
    | Arguments to this call                                         |
    |     in reverse lexical order as described above                |
    |________________________________________________________________|

Of course, not all of these areas are likely to appear at once.

There is no frame pointer, only a stack pointer; generated code always knows the location of the stack pointer in relation to changes in the above structure (as arguments get pushed and popped, registers get spilled and despilled, etc). Thus code to access an argument or local variable will use a different offset from the stack pointer depending on where it is generated.

<2 KCC Internals - Calling conventions and register use>

Arguments to KCC C functions are passed on the stack and returned in the registers. Functions are not expected to save any registers upon entry, and in fact are assumed to clobber all of ACs 1-16 inclusive.

Caller conventions - argument passing:

Since all function calls are assumed to clobber the registers, it is up to the caller to save on the stack any register values which it wishes to preserve over the function call. As described in the section on stack structure, function arguments are then pushed in reverse order onto the stack; the last argument is pushed first, and the first argument is pushed last. Passing a structure as argument consists of copying it whole onto the stack.
If the function is expected to return a structure or union longer than two words, a "zeroth arg" must also be pushed, which is the address of a location that the function should copy the returned structure into. The function is then called with a PUSHJ 17, instruction which adds the return address onto the stack. Caller conventions - result returning: All accumulators (except AC17) are at the callee's disposal. However, AC0 is never used by generated code, as some old programs assume NULL always points to zero, and as the hardware imposes several restrictions on its use. AC15 and AC16 are also reserved for minor KCC runtime functions. Single word function return values are left in AC1; double word returns go in AC1 and AC2. Return values larger than that are copied into the location specified by the struct-return pointer, which is provided by the caller as the "zeroth" argument. <2 KCC Internals - Extended addressing> A C program can be run in an extended section by specifying this in either of two ways at load time, depending on whether you are using KCC or the EXEC to do the loading. (a) KCC: Use the "-i" switch. e.g. @cc -i prog.c (b) LOAD (or LINK): The first module should be C:LIBCKX. e.g. @load c:libckx,prog No special switches need be given to KCC for the generated code to be suitable for extended addressing - the same code will always run either extended or non-extended. In extended sections, code and permanently allocated data (i.e. global variables) live in section N, the stack lives in section N+1, and allocated memory begins in section N+2, expanding to fill all higher sections. Normally N==1; this can be changed if really necessary. All byte pointers not intended for immediate use (e.g. literal arguments to a LDB or DPB instruction) are constructed as OWGBPs (One-Word Global Byte Pointer). <1 Cross-compiling> The -x, -L, -H, and -A switches allow some degree of cross-compilation. 
The effects of the various -x specifications are listed below:

CPU: ka, ki, ks, kl0, klx

KCC can compile code to run on any CPU type; this is done both by
means of different code generation sequences and by assembler macros
which KCC also generates as needed. "ka" specifies a KA-10 using
software format floating point doubles (all other types use hardware
format). "ki" specifies a KI-10, and "ks" both a KS-10 and a KL-10A
without extended addressing. "kl0" specifies a KL-10B capable of
extended addressing, but restricts the code to section 0; "klx"
specifies a KL-10B non-zero section environment.

It is possible to specify more than one CPU type; the intent is to
allow for producing code that will run on all specified machines. As
distributed, KCC code is compiled for "ks+kl0+klx". However, the
results of other combinations are somewhat unpredictable and should be
avoided at the moment.

SYSTEM: tops20, tenex, tops10, waits, its

Currently there are only two things affected by this setting:
character and string constant values, and ERJMP.
[1] If compiling for WAITS (or for anything else if on WAITS),
    character values are mapped to and from WAITS ASCII and standard
    US ASCII.
[2] If compiling for TOPS20 or TENEX, the proper value of ERJMP and an
    auxiliary definition called ERJMPA are generated.
There may be more distinctions in the future.

ASSEMBLER: fail, macro, midas

The assembler selection is independent of the system or CPU.
Currently either FAIL or MACRO can be selected and both will work.
Selecting MIDAS does not yet work completely.

CHARSIZE: ch7

It is possible to request that KCC generate code which assumes that
chars are 7 bits, and char pointers are 7-bit byte pointers. Thus,
arrays of chars will have 5 chars per word, instead of 4. This
feature, invoked by the "-x=ch7" switch, is mainly of use to people
who must integrate C code with old software that cannot deal with
anything but 7-bit bytes. It is not really guaranteed to work in all
conceivable cases.
In particular, you should be aware that many of the normally-compiled
library routines (such as malloc) will continue to return 9-bit char
pointers, although the str- and mem- functions should work with either
9-bit or 7-bit strings.

The values returned by "sizeof" will not change. As explained in the
discussion of the sizeof operator, sizes are always in terms of 9-bit
bytes, except that the size of a char array is always the number of
elements (chars) in the array. sizeof(char) is always 1.

General comments:

Ideally KCC (on any system) should be able to generate code for any
other PDP-10 system. To actually do this requires some understanding
of how the various parts of a program come together. It is not enough
just to specify some -x switches; you must take care of the following:

1. #include files. You may need to use an alternate standard
   include-file directory to satisfy <>-type includes. -H can be used
   to specify an alternate location.

2. Switches. You should use -D to predefine any parameters from
   <c-env.h> which are not properly defaulted. Alternatively you can
   put a different version of c-env.h in a non-standard location
   pointed to by -H (as above).

3. Library. The C runtime library loaded with the program must be the
   correct one (already cross-compiled for the target). KCC always
   generates a default "-lc" request for the C runtime library; the
   location searched for this can be specified by the -L switch.

For details on porting the C library and KCC itself, see the file
PORT.DOC in the KCC source directory.

<1 Char Pointer Hints>

The code generated for handling char pointers always uses byte-pointer
instructions, and so will work for any byte size (at least on machines
implementing the ADJBP instruction). This can sometimes be useful when
dealing with PDP-10 based data structures. However, such pointers have
to be constructed "by hand" since all char pointers that KCC generates
are either 9-bit or 7-bit. See also the -x=ch7 option in
"Cross-compiling".
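
As a rough illustration of why byte-pointer code does not care about
the byte size, the following host-side C sketch models a 36-bit word
in a uint64_t and a byte pointer as a position (P), a size (S) and a
word index. It is not KCC's generated code or any part of its library:
the names struct bptr, bp_make, bp_inc, bp_ldb, bp_dpb, bp_ildb,
bp_idpb and pack_chars are invented for this example, and indexing,
indirection and the global (OWGBP) format are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 36            /* PDP-10 word size */

/* Hypothetical host-side model of a byte pointer (not KCC's own code). */
struct bptr {
    int    p;   /* P field: number of bits to the right of the byte */
    int    s;   /* S field: byte size in bits */
    size_t y;   /* index of the word that holds the byte */
};

/* Make a pointer positioned just before the first byte of word y,
   which is the state the incrementing operations expect. */
static struct bptr bp_make(int size, size_t y)
{
    struct bptr bp;
    assert(size > 0 && size <= WORD_BITS);
    bp.p = WORD_BITS;
    bp.s = size;
    bp.y = y;
    return bp;
}

/* Models IBP: advance one byte, moving to the next word when no whole
   byte is left in the current one. */
static void bp_inc(struct bptr *bp)
{
    bp->p -= bp->s;
    if (bp->p < 0) {
        bp->y++;
        bp->p = WORD_BITS - bp->s;
    }
}

/* Mask covering the addressed byte (meaningful once p < WORD_BITS). */
static uint64_t bp_mask(const struct bptr *bp)
{
    return ((UINT64_C(1) << bp->s) - 1) << bp->p;
}

/* Models LDB: fetch the addressed byte. */
static uint64_t bp_ldb(const struct bptr *bp, const uint64_t *mem)
{
    return (mem[bp->y] & bp_mask(bp)) >> bp->p;
}

/* Models DPB: store into the addressed byte, leaving other bits alone. */
static void bp_dpb(const struct bptr *bp, uint64_t *mem, uint64_t byte)
{
    uint64_t mask = bp_mask(bp);
    mem[bp->y] = (mem[bp->y] & ~mask) | ((byte << bp->p) & mask);
}

/* Models ILDB: increment, then load (the shape of *++ptr). */
static uint64_t bp_ildb(struct bptr *bp, const uint64_t *mem)
{
    bp_inc(bp);
    return bp_ldb(bp, mem);
}

/* Models IDPB: increment, then deposit (the shape of *++ptr = c). */
static void bp_idpb(struct bptr *bp, uint64_t *mem, uint64_t byte)
{
    bp_inc(bp);
    bp_dpb(bp, mem, byte);
}

/* Store the chars of s (without the terminating NUL) starting at
   mem[0], using the given byte size.  Returns the number of words
   touched.  The caller must supply enough zeroed words. */
static size_t pack_chars(uint64_t *mem, int size, const char *s)
{
    struct bptr bp = bp_make(size, 0);
    size_t words = 0;
    for (; *s != '\0'; s++) {
        bp_idpb(&bp, mem, (unsigned char)*s);
        words = bp.y + 1;
    }
    return words;
}
```

In this model, packing "HELLO" with a byte size of 7 fills a single
word, while a byte size of 9 spills the fifth char into a second word;
the same bp_ildb loop reads either layout back, since only the S field
differs.
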
In general, when char pointers are involved, constructs like *++ptr
are faster than *ptr++. This is because *++ptr can usually be folded
by the optimizer into an ILDB (or IDPB) instruction. There is no
equivalent on the PDP-10 to a *ptr++ construct; this must always be
done as at least two instructions.

Whenever possible, try to avoid using two char pointers in
subtraction, as in (ptr1-ptr2). Many instructions have to be executed
to find the difference between two char pointers, due to the strange
internal format.

For the same reason, try to avoid less-than (<, <=) or greater-than
(>, >=) comparison of char pointers. Tests for equality (== and !=)
are fine, however.

Finally, on machines which do not implement the ADJBP instruction (KA,
KI), it is also helpful to avoid adding integers to, or subtracting
them from, char pointers.

None of this applies to other types of pointers, such as (int *),
which are simple addresses and can be manipulated very efficiently.

<1 Portable Math Library>

* Menu:

* PML: (KCC-PML)        Portable Math Library

<1 Local library additions>

* Menu:

* LIBLCL: (KCC-LIBLCL)  Local library additions
* LIBT20: (KCC-LIBT20)  Frank Wancho's TOPS-20 library