# # @(#)README 1.1 94/10/31 SMI; from S5R2 1.2 # December 20, 1985 Fixed by Vida Ghodssi, Guy Harris, Sun Microsystems System V version installed; faster, fixes many bugs (including Gilmore's "can't feed 'cpp's output to itself" bug, mentioned below). 4.3BSD "-M" option added. August 1, 1985 Fixed by Vida Ghodssi, Sun Microsystems Remove the limit on the number of symbols handeled by the symbol table. This was done by making the symbol table bucket based rather than an array of fixed size. December 16, 1983 Fixed by John Gilmore, Sun Microsystems Remove the limit on the number of -D and -U options, by scanning them directly out of argc/argv when needed. Minor change in effect: previously a -U would override a -D even if the -D followed the -U on the command line. e.g. cpp -Ufoo -Dfoo=3 would leave foo undefined. This has been changed to process them in the specified order, leaving foo defined as 3. However, cpp -Dfoo=3 -Ufoo still leaves foo undefined, as it did before. BUG REPORT: cpp should accept lines of the form "# n name" as equivalent to "#line n name" since otherwise you can't run its output back thru itself. Silly for a filter. August 30, 1982 Fixed by Kurt Shoens, UCB If the "#line n name" occurs, then all future references to the current file are generated in terms of "name", instead of the name of file given to cpp in its command argument August 25, 1978 updated to November 29, 1982 Files in this directory form the C preprocessor, which handles '#include' files and macro definition and expansion for the C compiler. This new version is from 5 to 12 times faster (on UNIX systems) than the old. To create the executable file 'cpp' in the current directory: make To install the preprocessor 'cpp' so it will be used by the C compiler: # safety first: backup the existing version cp /lib/cpp /lib/ocpp # install the new version cp cpp /lib/cpp Invocation cpp [-CEPRM] [-Dname] ... [-Dname=def] ... [-Idirectory] ... [-Uname] ... [ []] If there are two non-flag arguments then the first is the name of the input file and the second is the name of the output file. If there is one non-flag argument then it is the name of the input file and the output is written on the standard output. If there are no non-flag arguments then the input is taken from the standard input and the output is written on the standard output. Flag arguments are: -C retain comments in output -Dname define name as "1" -Dname=def define name as def -E ignored -Idirectory add directory to search list for #include files -M generate Makefile dependencies (-C and -M ignored) -P don't insert lines "# 12 \"foo.c\"" into output -R allow recursive macros -Uname undefine name Documentation clarifications: Symbols defined on the command line by "-Dfoo" are defined as "1", i.e., as if they had been defined by "#define foo 1" or "-Dfoo=1". The directory search order for #include files is 1) the directory of the file which contains the #include request (e.g. #include is relative to the file being scanned when the request is made) 2) the directories specified by -I, in left-to-right order 3) the standard directory(s) (which for UNIX is /usr/include) An unescaped linefeed (the single character "\n") terminates a character constant or quoted string. An escaped linefeed (the two-character sequence "\\\n") may be used in the body of a '#define' statement to continue the definition onto the next line. The escaped linefeed is converted into a single blank in the macro body. Comments are uniformly removed (except if the argument -C is specified). They are also ignored, except that a comment terminates a token. Thus "foo/* la di da */bar" may expand 'foo' and 'bar' but will never expand 'foobar'. If neither 'foo' nor 'bar' is a macro then the output is "foobar", even if 'foobar' is defined as something else. The file #define foo(a,b)b/**/a foo(1,2) produces "21" because the comment causes a break which enables the recognition of 'b' and 'a' as formals in the string "b/**/a". Macro formal parameters are recognized in '#define' bodies even inside character constants and quoted strings. The output from #define foo(a) '\a' foo(bar) is the six characters "'\\bar'". Macro names are not recognized inside character constants or quoted strings during the regular scan. Thus #define foo bar printf("foo"); does not expand 'foo' in the second line, because it is inside a quoted string which is not part of a '#define' macro definition. Macros are not expanded while processing a '#define' or '#undef'. Thus #define foo bletch #define bar foo #undef foo bar produces "foo". The token appearing immediately after a '#ifdef' or '#ifndef' is not expanded (of course!). Macros are not expanded during the scan which determines the actual parameters to another macro call. Thus | #define foo(a,b)b a | #define bar hi | foo(bar, | #define bar bye | ) |produces " bye" (and warns about the redefinition of 'bar'). |--->not any longer. Newlines have been stripped by now, so the # is no longer at the beginning of a line. The note is still true, though. When a macro is expanded, the first step is to put the actual arguments in the corresponding locations in the token-string the macro is defined to be. The next step is to start to re- process the token-string as input text. There are some differences between the new and the old preprocessor. Bugs fixed: "1.e4" is recognized as a floating-point number, rather than as an opportunity to expand the possible macro name "e4". Any kind and amount of white space (space, tab, linefeed, vertical tab, formfeed, carriage return) is allowed between a macro name and the left parenthesis which introduces its actual parameters. The comma operator is legal in preprocessor '#if' statements. Macros with parameters are legal in preprocessor '#if' statements. Single-character character constants are legal in preprocessor '#if' statements. Linefeeds are put out in the proper place when a multiline comment is not passed through to the output. The following example expands to "# # #" : #define foo # foo foo foo If the -R flag is not specified then the invocation of some recursive macros is trapped and the recursion forcibly terminated with an error message. The recursions that are trapped are the ones in which the nesting level is non-decreasing from some point on. In particular, #define a a a will be detected. (Use "#undef a" if that is what you want.) The recursion #define a c b #define b c a #define c foo a will not be detected because the nesting level decreases after each expansion of "c". The -R flag specifically allows recursive macros and recursion will be strictly obeyed (to the extent that space is available). Assuming that -R is specified: #define a a a causes an infinite loop with very little output. The tail recursion #define a a a causes the string "<>" to be output infinitely many times. The non-tail recursion #define a b> #define b a< a complains "too much pushback", dumps the pushback, and continues (again, infinitely). Stylistic choice: Nothing (not even linefeeds) is output while a false '#if', '#ifdef', or '#ifndef' is in effect. Thus when all conditions become true a line of the form "# 12345 \"foo.c\"" is output (unless -P). Error and warning messages always appear on standard error (file descriptor 2). Mismatch between the number of formals and actuals in a macro call produces only a warning, and not an error. Excess actuals are ignored; missing actuals are turned into null strings. Comments which worked their way into #if lines no longer cause a syntax error. Newlines found during the scan for actual arguments are changed to blanks so that confusing (for cpp) situations did not occur. Formfeeds (^L) act like newlines w.r.t. recognizing # as the flag for cpp. Incompatibility: The virgule '/' in "a=/*b" is interpreted as the first character of the pair "/*" which introduces a comment, rather than as the second character of the divide-and-replace operator "=/". This incompatibility reflects the recent change in the C language which made "a/=*b" the legal way to write such a statement if the meaning "a=a/ *b" is intended.