.chapter "Scope, Declarations, and Equates" .para As was discussed in sect_progs, the structure of a CLU program is not deeply nested as is customary in block-structured languages, but rather consists of a group of modules all at the same level. The names of the modules, and in the case of clusters, the names of the operations, are globally known throughout this level. However, since modules are not nested within other modules, identifiers used within modules to name, for example, variables, are purely local. Since we expect modules to be rather small (in the absence of nesting), we felt it was reasonable to insist that local identifiers not be redefined within a module. Therefore, although there is block structure within a module, it is not possible to redefine in an inner scope an identifier declared in an outer scope. .para Each full_module defines a scoping unit. In addition, all compound statements define new scoping units in the obvious places. For example, in the if statement, both the then clause and the else clause are new scoping units. .para Variables may be declared anywhere within a scoping unit; declarations are not constrained to appear at the beginning of a unit. The actual scope of the variable begins after its declaration, and continues to the end of the smallest enclosing scoping unit. .para A variable declaration gives a name for the variable and the type of the variable. In addition, an initial value for the variable may be provided. The use of an uninitialized variable will raise an exception if the error is not caught at compile time. (CLU arrays and records are defined in such a way that there are no uninitialized elements or components to worry about, so we can guarantee that all expressions are well-defined by checking variable usage.) .para 2Equates* are used to establish abbreviations for types and constants. 
Each expression in an equate must be compile-time computable, and must produce an object belonging to one of the built-in, immutable types (see sect_semantics.1). .para Equates must all appear in a group at the beginning of a scoping unit; the order of the equates is unimportant, but they may not be recursive. The actual scope of the equates is the entire scoping unit containing them. .chapter "Expressions and Statements" .para CLU is somewhat unusual in that almost all expressions are considered to be just a syntactic means of invoking procedures. This view permits user-defined and built-in types to be treated uniformly: e.g., x + y invokes T$add whether the type of x, T, is built-in or user-defined. This view also fits our model that exceptions arise from invocations (see sect_except). Note that this view does not preclude in-line code for expressions (for both built-in and user-defined types); we simply view the production of in-line code as analogous to in-line substitution for an invocation, followed possibly by some optimization. .para One exception to the view of expressions as invocations is the use of the cand (conditional and) and cor (conditional or) operators. These operators are defined to shortcut evaluation of their operands; for example, the second operand of cand will be evaluated only if the first operand evaluates to true. Thus, cand and cor cannot be explained in terms of invocation. These operators are not available for overloading, and they do not raise any exceptions. .para CLU statements are, for the most part, fairly conventional. The most basic statements are the assignment statement and the invocation statement; the semantics of these statements is discussed in the next section. .para There are a number of compound statements: block, conditional, iterative, tagcase and except statements. Blocks are used to group statements, and to introduce new scoping units. 
The conditional statement is the usual if statement, with an additional elseif form which may be used when there are a number of clauses all at the same level. .para There are two iterative statements: one is the usual while statement; the other, the for statement, is used in conjunction with an iterator which controls the looping. .para The tagcase statement is used to discriminate on the tag of a oneof object; it provides 2arms* for possible values of the tag, plus a special others arm to handle tag values not mentioned explicitly. .para The except statement is used to handle exceptions arising from invocations, plus locally generated exits. Its form is similar to that of the tagcase statement, with arms to handle explicitly named exceptions and exits, and an optional others arm to handle any exception not explicitly mentioned. .para Finally, there are a number of termination statements. The return statement terminates a procedure or iterator in the normal condition, while the signal statement is used to terminate in an exceptional condition. The yield statement is used within an iterator to produce the next item in the sequence. .para 1Return*, signal, and yield are inter-module control mechanisms. The remaining termination statements are all intra-module. The exit statement raises an exit condition that must be handled by a local except statement. The break statement terminates the smallest enclosing loop, and the continue statement terminates just the current cycle of the smallest enclosing loop. .en .sr sect_iter Section 14.5 .sr sect_proctypes Section 9.11 .sr app_types Appendix II .sr sect_handle Section 14.3 .sr app_io Appendix III \k .chapter "Exception Handling and Exits" .para 1 A procedure is designed to perform a certain task, taking some number and types of arguments and returning some number and types of result objects. However, in certain cases (e.g., for particular values of arguments), that task may be impossible to perform. 
In such a case, instead of returning normally (which would imply successful performance of the intended task), the procedure should notify its caller of its failure by signalling an i(exception). .para For example, consider integer division. The int$div procedure takes two integer arguments and returns their quotient. However, if the second argument to int$div is zero, then there is no quotient. In this case, instead of returning, int$div signals the exception zero_divide. We include in the type specification of a procedure a description of the exceptions it may signal; for example, int$div is of type .show proctype (int, int) returns (int) signals (zero_divide) .eshow .para In this section, we will concentrate on exceptions signalled by procedures. However, exceptions may also be signalled by iterators, and all we say about procedures applies to iterators as well, except as described in sect_iter below. .section "The Exception Handling Mechanism" .para 1 The exception handling mechanism consists of two parts, the signalling of exceptions and the handling of exceptions. Signalling is the way a procedure notifies its caller that it has discovered an exceptional condition. Handling is the way that the caller of the procedure specifies what is to be done if the procedure signals an exception. .para Signalling an exception is an alternative form of returning. When a procedure signals an exception, the current activation of that procedure terminates and control is transferred to a handler in the caller. The signaller may return objects to the exception handler, to help explain the exceptional condition. .para An exception is identified by a name. A procedure may signal zero or more exceptions, whose names must be distinct. Since signalling is like returning, each exception has an associated list of types specifying what objects may be returned to the caller. An exception name and its associated list of types is called an i(exception@specification). 
The specifications of the exceptions signalled by a procedure are part of the type of the procedure (see sect_proctypes). In addition, any procedure can signal the exception i(failure), which always has a single accompanying object of type string. The failure exception is implicitly part of all procedure types; it may not be declared explicitly. (The use of i(failure) is intended to indicate errors from which it is unlikely or impossible to recover, such as hardware malfunctions.) .section "Signalling Exceptions" .para 1 An exception is signalled by the signal statement, which has the form: .show signal name lbkt ( expression, etc ) rbkt .eshow where i(name) is the name of the exception to be signalled. .para A signal statement may appear anywhere in the body of a procedure. The execution of a signal statement begins with the evaluation of the expressions (if any), from left to right, to produce a list of i(signal@argument) objects. The activation of the executing procedure is then terminated. Execution continues as described in sect_handle below. .para The named exception must be either i(failure) or one of the exceptions listed in the procedure header. If the exception is i(failure), then there must be exactly one signal argument expression, whose type is string. Otherwise, if the corresponding exception specification in the procedure header has the form .show name (T1, etc, Tn) .eshow then there must be exactly i(n) signal argument expressions, and the type of the i(i)th expression must be included in Ti. 
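.para
As a tentative illustration of these rules, consider the following sketch. The procedure i(checked_index) and its exceptions are hypothetical, invented here only to show how the signal arguments must agree with the exception specifications in the procedure header:
.show
checked_index = proc (i: int, size: int) returns (int) signals (negative (int), too_large (int, int));
    % one argument of type int, as the specification negative (int) requires
    if i < 0 then signal negative (i); end;
    % two arguments of type int, as too_large (int, int) requires
    if i > size - 1 then signal too_large (i, size); end;
    return (i);
    end checked_index;
.eshow
Under the rules above, the statement signal too_large (i) would be illegal in this body, because the specification calls for two arguments. The statement signal failure (i) would be illegal as well, because the single argument of failure must be a string.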
.para The following useless procedure contains a number of examples of signal statements: .show .ta 20 signaller = proc (i: int)  signals (foo, bar (int), bletch (string, bool)); if i < 0 then signal foo; end; if i > 0 then signal bar (i - 1); end; if i = 0 then signal bletch ("zero", true); end; signal failure ("unreachable statement executed"); end signaller; .eshow .section "Handling Exceptions" .para 1 When a procedure activation terminates by signalling an exception, we say that the corresponding procedure invocation (the text of the call) i(raises) that exception. The caller specifies what action should be taken when an exception is raised by the use of i(handlers), which are written using the except statement. .para The except statement has the form: .show statement except s(1)lcurly when_handler rcurly t(1)lbkt others_handler rbkt t(1)end where .long_def others_handler .def1 when_handler when name , etc lbkt ( decl , etc ) rbkt : body .or when name , etc ( * ) : body .def1 others_handler others lbkt ( idn : type_spec ) rbkt : body .eshow We will call the statement to which the handlers are attached S. The handlers handle exceptions that are raised by invocations in the statement S. Each when_handler specifies one or more exception names and a body to be executed if one of those exceptions is raised. The optional others_handler is used to handle all exceptions not explicitly named in the when_handlers. The statement S can be a compound statement, and can even contain other except statements. Whenever two except statements are nested in this fashion, and both have handlers for the same exception, the innermost handler will take precedence (see below). .para An except statement is executed as follows. First, the statement S is executed. If it terminates normally, then the except statement terminates normally also. 
If some exception E is raised in S, and the exception E is not handled by a handler within S, then the execution of S is terminated and the attached handlers are examined to see if any one of them will handle the exception E. If so, then the body of the corresponding handler is executed; when the body terminates, the entire except statement terminates. If there is no handler for the exception E in this except statement, then the except statement itself terminates raising the exception E. This will presumably be handled by some enclosing except statement. .para Thus, when an exception E is raised, control is passed to the innermost exception handler that handles the exception E. Exceptions that are raised inside of handlers are treated no differently from other exceptions: control is passed to the innermost exception handler for that exception (in a surrounding except statement). Whenever a handler terminates, the except statement of which it is a part terminates as well. The set of invocations for which a handler is effective is called the i(range) of that handler. The range of a handler for an exception E is that set of invocations within the attached statement that are not inside the range of a nested handler for the exception E. .para Recall that the infix and prefix operators are merely syntactic sugar for procedure invocations. Thus, the execution of such operators can signal exceptions and these exceptions can be handled by the procedure containing the use of the operator. app_types describes the operations of the built-in types and type generators, and the exceptions that those operations may signal. .para An invocation need not be surrounded by except statements to handle all exceptions potentially raised by that invocation. This policy was adopted because in many cases the programmer can prove that a particular exception will not arise. For example, the invocation int$div(x,7) will never signal zero_divide. 
However, this policy does lead to the possibility that some invocation may raise an exception E and not be within the range of any handler for E. Thus, we make the following rule. If an invocation raises an exception E, and that invocation is not within the range of any handler for E, then the procedure containing that invocation is terminated and signals the exception i(failure). The exception name E is made into a string (all in lower case), and this string is the argument of the failure signal. As a special case, if the original exception E was itself i(failure), then the original string argument is passed along with the new signal, instead of "failure". (This avoids losing the original exception name when a i(failure) propagates up several levels.) .para Now let us consider the form of the handlers in more detail. The when forms handle particular sets of exceptions. The first form, without declarations, simply specifies a set of exception names. This form is used to handle exceptions with no associated signal arguments. The same form i(with) declarations is used to handle exceptions with signal arguments. Each exception must have the same number of arguments as specified in the formal argument list (i.e., the declarations), and their types must match exactly. Within the body of the handler, the declared formal arguments may be used to access the actual signal arguments. These arguments are variables (initialized to the signal arguments), local to the handler body. The second form (with *) can be used to handle any exceptions of the given names, regardless of whether or not there are associated signal arguments. Any actual signal arguments will be thrown away. .para All of the exception names appearing in the when_handlers of an except statement must be distinct. Each exception must be potentially raised by some invocation within the range of the handler. 
For any exception handled using the when form with arguments, all invocations within the range of the handler that potentially raise an exception with that name must provide the exact number and types of signal arguments as specified in the formal argument list. (The programmer must place handlers for an exception sufficiently close to the invocations that raise that exception so that this restriction is satisfied.) .para The others form is optional. At most one may be used in an except statement, and it must appear last. An others_handler handles any exception not handled by another handler in the except statement. If a formal argument is declared, it must be of type string. If the actual exception is not i(failure), then the formal argument will denote a string object which is the name of the actual exception, in lower case; any actual signal arguments will be thrown away. However, if the actual exception is i(failure), then the formal argument will denote the actual (string) signal argument. .section "An Example" .para 1 We now present an example demonstrating the use of exception handlers. We will write a procedure, sum_stream, which reads a sequence of signed decimal integers from a character stream and returns the sum of those integers. The stream is viewed as containing a sequence of fields separated by spaces; each field must consist of a non-empty sequence of digits, optionally preceded by a single minus sign. Sum_stream has the form .show .ta 20 sum_stream = proc (s: stream) returns (int) signals (s(1)overflow, t(1)unrepresentable_integer (string), t(1)bad_format (string)); etc end sum_stream; .eshow Sum_stream signals overflow if the sum of the numbers or an intermediate sum is outside the implemented range of integers. Unrepresentable_integer is signalled if the stream contains an individual number that is outside the implemented range of integers. Bad_format is signalled if the stream contains a field that is not an integer. 
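.para
To suggest how a caller might use this interface, here is a minimal sketch of a hypothetical procedure, i(checked_sum), which is not part of the example proper; its name and its single exception bad_input are assumptions made for illustration. Because unrepresentable_integer and bad_format each carry exactly one string argument, one when_handler with a declaration can serve both, while overflow is handled separately:
.show
checked_sum = proc (s: stream) returns (int) signals (bad_input (string));
    return (sum_stream (s));
    except when unrepresentable_integer, bad_format (field: string):
                % field denotes the text of the offending field
                signal bad_input (field);
           when overflow:
                signal bad_input ("sum out of range");
           end;
    end checked_sum;
.eshow
No handler is given for failure, so a failure raised by sum_stream would terminate checked_sum with the failure exception, by the rule for unhandled exceptions given earlier.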
.para We will use the i(getc) operation of the i(stream) data type (see app_io), whose type is .show proctype (stream) returns (char) signals (end_of_file, not_possible(string)); .eshow This operation returns the next character from the stream, unless the stream is empty, in which case end_of_file is signalled. Not_possible is signalled if the operation cannot be performed on the given stream (e.g., it is an output stream, or does not allow character operations). We will assume that we are given a stream for which getc is possible. .para The following procedure is used to convert character strings to integers: .show .ta 20 s2i = proc (s: string) returns (int) signals (s(1)invalid_character (char), t(1)bad_format, t(1)unrepresentable_integer); etc end s2i; .eshow S2i signals invalid_character if its string argument contains a character other than a digit or a minus sign. Bad_format is signalled if the string contains a minus sign following a digit, more than one minus sign, or no digits. Unrepresentable_integer is signalled if the string represents an integer that is outside the implemented range of integers. .para An implementation of sum_stream is presented in Figure current_figure. .begin_figure "The sum_stream procedure." 
.show
.ta 20 28 36 42
sum_stream = proc (s: stream) returns (int) signals (s(1)overflow,
t(1)unrepresentable_integer (string),
t(1)bad_format (string));
    sum: int := 0;
    num: string := "";
    while true do
        % skip over spaces between values; sum is valid, num is meaningless
        c: char := stream$getc(s);
        while c = ' ' do
            c := stream$getc(s);
            end;
        % read a value; num accumulates new number, sum becomes previous sum
        % (num is reset first, so that it does not retain the previous field)
        num := "";
        while c ~= ' ' do
            num := string$append(num, c);
            c := stream$getc(s);
            end;
            except when end_of_file: end;
        % restore sum to validity
        sum := sum + s2i(num);
        end;
        except when end_of_file: return(sum);
            when unrepresentable_integer: signal unrepresentable_integer (num);
            when bad_format, invalid_character(*): signal bad_format (num);
            when overflow: signal overflow;
            end;
    end sum_stream;
.eshow
.finish_figure There are two loops within an infinite loop: one to skip spaces, and one to accumulate digits for conversion to a number. Notice the placement of the inner end_of_file handler. If end_of_file is raised in the second inner loop, then the sum is computed correctly, and the first invocation of stream$getc will again raise end_of_file. This time, however, the infinite loop is terminated and execution transfers to the other end_of_file handler, which then returns the accumulated sum. .para We have placed the remaining exception handlers outside of the infinite loop to avoid cluttering up the main part of the algorithm. Each of these exception handlers could also have been placed after the particular statement containing the invocation that signalled the corresponding exception. The (*) form is used in the handler for the bad_format and invalid_character exceptions since the signal arguments are not used. Note that the overflow handler catches exceptions signalled by the int$add procedure, which is invoked using the infix + notation. Note also that in this example all of the exceptions raised by sum_stream originate as exceptions signalled by lower-level modules. 
Sum_stream simply reflects these exceptions upwards in terms that are meaningful to its callers. Although some of the names may be unchanged, the meanings of the exceptions (and even the number of arguments) are different in the two levels. .para As mentioned above, we have assumed that stream$getc will not signal not_possible; if it does, then sum_stream will terminate, raising the exception failure("not_possible"). .section "Summary" .para 1 Any activation of a procedure may terminate in one of two ways: it may terminate normally, returning zero or more result objects, or it may signal an exception, along with zero or more signal arguments. In the latter case, we say that the invocation of the procedure may i(raise) the given exception. The set of possible exceptions that may be raised by a procedure invocation is determined from the type of the procedure. This set always includes the i(failure) exception. .para If a procedure invocation is a component of an expression, and the invocation terminates by raising an exception E (with associated signal arguments), then the entire expression immediately terminates, raising the exception E (with the associated signal arguments). The set of possible exceptions that may be raised by an expression is the set of all exceptions that may be raised by procedure invocations within that expression. .para Expressions are embedded in statements. If, during the execution of a statement, an embedded expression terminates by raising an exception E (with associated signal arguments), then the statement itself immediately terminates raising the exception E (with the associated signal arguments). .para Statements may be composed from smaller statements. In general, if a component statement terminates by raising an exception E, then the containing statement also immediately terminates, raising the exception E. 
However, if the statement is an except statement, and the except statement contains a handler that handles the exception E, then the handler body is executed, as described in the preceding section. If an iterator invocation terminates raising an exception E, then the entire for statement which invoked the iterator immediately terminates raising the exception E. .para The set of possible exceptions that may be raised by a non-except statement is the union of the sets of possible exceptions that may be raised by any component expression or statement. The set of possible exceptions that may be raised by an except statement consists of the set of exceptions that may be raised by the component statement, minus the set of exceptions handled by the handlers, plus the set of exceptions that may be raised by the handler bodies. .para Thus, any expression or statement may terminate either normally or by raising some exception. The set of possible exceptions always includes the i(failure) exception. .para Finally, all procedure (and iterator) bodies are implicitly surrounded by an exception handler of the form: .show begin etc i(body) etc end except others (s: string) : signal failure(s) end .eshow .section "Exits and the Placement of Handlers" .para 1 A i(local) transfer of control can be effected by using the exit statement, which has the form: .show exit name lbkt ( expression, etc ) rbkt .eshow The exit statement is similar to the signal statement except that where the signal statement i(signals) an exception to the i(calling) procedure, the exit statement i(raises) the exception directly in the i(current) procedure. An exception raised by an exit statement i(must) be handled by a handler in the procedure containing the exit statement. The handler must explicitly name the particular exception (i.e., the others form cannot be used) and may not throw away any signal arguments (i.e., the (*) form cannot be used). 
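.para
As an illustrative sketch of these rules, the hypothetical procedure i(first_nonspace) below (its name, its exception empty, and the exit name found are assumptions made for illustration) uses an exit with one argument to leave a loop. We assume, as for sum_stream, that getc is possible on the given stream. The exit is handled within the same procedure by a handler that names it explicitly and declares a formal argument for its signal argument:
.show
first_nonspace = proc (s: stream) returns (char) signals (empty);
    while true do
        c: char := stream$getc (s);
        % leave the loop, carrying the character to the handler below
        if c ~= ' ' then exit found (c); end;
        end;
    except when found (ch: char): return (ch);
           when end_of_file: signal empty;
           end;
    end first_nonspace;
.eshow
Here found is raised locally by the exit statement, whereas end_of_file is raised by the invocation of stream$getc; a single except statement handles both.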
.para The exit statement and the signal statement mesh nicely to form a uniform mechanism. The signal statement can be viewed as simply terminating a procedure activation; an exit is then performed at the point of invocation. (Because this exit is implicit, it is not subject to the restrictions listed above.) .para In some cases, however, other requirements may prohibit placing exception handlers to take advantage of the implicit exit. For example, assume that you wish to handle a particular exception signalled by a particular set of invocations. To avoid catching unwanted exceptions, the handler must be placed sufficiently close to the set of invocations so that no other invocation raising an exception of that name is in the range of the handler. The facts that the handlers must be close to the invocations, and that the statement you wish to terminate when the exception is raised may be rather large can require you to put explicit exit statements in the handlers to force termination of the larger statement. The point is that exits are a necessary feature in maintaining the overall effectiveness of the signal mechanism.
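.para
The situation just described can be sketched as follows. The procedure i(not_longer) is hypothetical (its name and the exit names a_done and b_done are assumptions made for illustration), and we again assume that getc is possible on both streams. Both invocations of stream$getc may raise end_of_file, but the two cases must be treated differently, so each handler is attached directly to its own invocation and uses an explicit exit to terminate the enclosing loop:
.show
% returns true if stream a contains no more characters than stream b
not_longer = proc (a: stream, b: stream) returns (bool);
    c: char;    % only a place to put the characters read
    while true do
        c := stream$getc (a);
            except when end_of_file: exit a_done; end;
        c := stream$getc (b);
            except when end_of_file: exit b_done; end;
        end;
    except when a_done: return (true);
           when b_done: return (false);
           end;
    end not_longer;
.eshow
A single end_of_file handler attached to the loop could not tell which stream was exhausted; the explicit exits preserve that distinction while still terminating the whole loop.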