.sr opdefs_pdx Appendix`II
.sr int_pdx Appendix`II.4
.sr real_pdx Appendix`II.5
.sr type_def_sec Section`3.1
.sr cluster_sec Section`13
.sr self Section`7
.sr builtin_secs Sections self.1 to self.7
.sr gen_secs Sections self.8 to self.11
.sr cand_sec Section`10.8
.sr fetch_sec Section`10.5.1
.sr fetch_store_secs Sections 10.5.1 and 11.2.1
.sr force_sec Section`10.11
.sr array_cons_sec Section`10.6
.sr equate_sec Section`8.3
.sr rec_cons_sec Section`10.6
.sr get_set_secs Sections 10.5.2 and 11.2.2
.sr tagcase_sec Section`11.6
.sr system_sec Section`3.1
.sr invoke_sec Section`9.3
.sr const_sec Section`8.3
.sr bind_sec Section`4
.sr failure_sec Section`12.1
.sr cvt_parm_secs Sections 13.4 and 13.5
.chapter "Types, Type Generators, and Type Specifications"
.para
A 2type* consists of a set of objects together with
a set of operations to manipulate the objects.
As discussed in type_def_sec,
types can be classified according to whether their objects are mutable or immutable.
An immutable object (e.g, an integer) has a value that never varies,
while the value (state) of a mutable object can vary over time.
.para
A 2type generator* is a 2parameterized* type definition,
representing a (usually infinite) set of related types.
A particular type is obtained from a type generator by writing
the generator name along with specific values for the parameters;
for every distinct set of legal values,
a distinct type is obtained.
For example,
the array type generator has a single parameter that determines the element type;
array[int], array[real], and array[array[int]] are three distinct
types defined by the array type generator.
Types obtained from type generators are called 2parameterized* types;
others are called 2simple* types.
.para
Within a program,
a type is specified by a syntactic construct called a 2type_spec*.
The type specification for a simple type
is just the identifier (or reserved word) naming the type.
For parameterized types,
the type specification consists of the identifier (or reserved word) naming
the type generator, together with the parameter values.
.para
This section gives an informal introduction to
the built-in types and type generators provided by CLU;
many details (such as error conditions) are not discussed.
Complete and precise definitions are given in opdefs_pdx.
builtin_secs describe the objects, literals,
and some of the operations for each of the built-in types,
while gen_secs describe the objects, type specifications,
and interesting operations of types obtained from the built-in type generators.
A number of operations can be invoked using infix and prefix operators;
as the various operation names are introduced, the corresponding operator,
if any, will follow in parentheses.
.para
In addition, we describe type specifications for user-defined types,
and other special type specifications in Section self.12.
The mechanism by which new types and type generators are implemented
is presented in cluster_sec.
.
.section "Null"
.para
The type null has exactly one immutable object,
represented by the literal nil.
The type null is generally used as a kind of "place holder" in a oneof type
(see self.9).
.
.section "Bool"
.para
The two immutable objects of type bool, with literals true and false,
represent logical truth values.
The binary operations 2equal* (=), 2and* (&), and 2or* (|),
are provided, as well as unary 2not* (~).
.
.section "Int"
.para
The type int models (a range of) the mathematical integers.
The exact range is not part of the language definition,
and can vary somewhat from implementation to implementation (see int_pdx).
Integers are immutable objects,
and are written as a sequence of one or more decimal digits.
The binary operations
2add* (+), 2sub* (-), 2mul* (*), 2div* (/), 2mod* (//),
and 2power* (**) are provided, as well as unary 2minus* (-).
There are binary comparison operations
2lt* (<), 2le* (<=), 2equal* (=), 2ge* (>=), and 2gt* (>).
In addition, there are two operations, 2from_to* and 2from_to_by*,
for iterating over a sequence of integers.
For example, one can iterate over the odd numbers between one and 100 with
.show
int$from_to_by(1, 100, 2)
.eshow
.
.section "Real"
.para
The type real models (a subset of) the mathematical real numbers.
The exact subset is not part of the language definition,
although certain constraints are imposed (see real_pdx).
Reals are immutable objects,
and are written as a 2mantissa* with an (optional) 2exponent*.
A mantissa is either a sequence of one or more decimal digits,
or two sequences (one of which may be empty) joined by a period.
The mantissa must contain at least one digit.
An exponent is 'E' or 'e',
optionally followed by '+' or '-',
followed by one or more decimal digits.
An exponent is required if the mantissa does not contain a period.
As is usual, 2m*E2x* = 2m**102x*.
Examples of real literals are:
.show
3.14        3.14E0        314e-2        .0314E+2        3.        .14
.eshow
.para
As with integers,
the operations
2add* (+), 2sub* (-), 2mul* (*), 2div* (/), 2mod* (//),
2power* (**), 2minus* (-),
2lt* (<), 2le* (<=), 2equal* (=), 2ge* (>=), and 2gt* (>),
are provided.
It is important to note that there is no form of
2implicit* conversion between types.
So, for example,
the various binary operators cannot have one integer and one real argument.
The 2i2r* operation converts an integer to a real,
2r2i* rounds a real to an integer,
and 2trunc* truncates a real to an integer.
.
.section "Char"
.para
The type char provides the alphabet for text manipulation.
Characters are immutable, and form an ordered set.
Every implementation must provide at least 128, but no more than 512, characters;
the first 128 characters are the ASCII characters in their standard order.
.para
Printing ASCII characters (octal 40 thru octal 176),
other than single quote or backslash,
can be written as that character enclosed in single quotes.
Any character can be written by enclosing one of the
following escape sequences in single quotes:
.show 12
escape sequence        s(1)character
.sp .5
\'t(1)'     s(2)(single quote)
\"t(1)"t(2)(double quote)
\\t(1)\t(2)(backslash)
\nt(1)NLt(2)(newline)
\tt(1)HTt(2)(horizontal tab)
\pt(1)FFt(2)(form feed, newpage)
\bt(1)BSt(2)(backspace)
\rt(1)CRt(2)(carriage return)
\vt(1)VTt(2)(vertical tab)
\***t(1)specified by octal value (* is an octal digit)
.eshow
The escape sequences may be written using upper case letters.
Examples of character literals are:
.show
'7'        'a'        '"'        '\"'        '\''        '\B'        '\177'
.eshow
.para
The usual binary comparison operations exist for characters:
2lt* (<), 2le* (<=), 2equal* (=), 2ge* (>=), and 2gt* (>).
There are also two operations, 2i2c* and 2c2i*,
for converting between integers and characters:
the smallest character corresponds to zero,
and the characters are numbered sequentially.
.
.section "String"
.para
The type string is used for representing text.
A string is an immutable sequence of zero or more characters.
Strings are lexicographically ordered, based on the ordering for characters.
A string is written as a sequence of zero or more character representations,
enclosed in double quotes.
Within a string literal,
a printing ASCII character other than double quote or backslash
is represented by itself.
Any character can be represented by using the escape sequences listed above.
Examples of string literals are:
.show
"Item\tCost"        "altmode (\033) = \\033"        ""        " "
.eshow
.para
The characters of a string are indexed sequentially starting from one,
and there are a number of operations that deal with these indexes:
2fetch*, 2substr*, 2rest*, 2indexc*, and 2indexs*.
The 2fetch* operation is used to obtain a character by index.
Invocations of 2fetch* can be written using a special syntax
(fully described in fetch_sec):
.show
s[i]		% get the character at index i of s
.eshow
2Substr* returns a string given a string, a starting index, and a length:
.show
string$substr("abcde", 2, 3) = "bcd"
.eshow
2Rest*, given a string and a starting index, returns the rest of the string:
.show
string$rest("abcde", 3) = "cde"
.eshow
2Indexc* computes the least index at which a character occurs in a string,
and 2indexs* does the same for a string;
the result is zero if the character or string does not occur:
.show 3
string$indexc('d', "abcde") = 4
string$indexs("cd", "abcde") = 3
string$indexs("abcde", "cd") = 0
.eshow
.para
Two strings can be concatenated together with 2concat* (||),
and a single character can be appended to the end of a string with 2append*.
Note that string$concat("abc",`"de") and string$append("abcd",`'e')
produce the 2same* string as writing "abcde".
2C2s* converts a character to a single-character string.
The size of a string can be determined with 2size*.
2Chars* iterates over the characters of a string,
from the first to the last character.
There are also the usual comparison operations:
2lt* (<), 2le* (<=), 2equal* (=), 2ge* (>=), and 2gt* (>).
.
.section "Any"
.para
A type specification is used to restrict
the class of objects that a variable can denote,
a procedure or iterator can take as arguments, a procedure can return, etc.
There are times when no restrictions are desired,
when any object is acceptable.
At such times, the type specification any is used.
For example,
one might wish to implement a table mapping strings to arbitrary objects,
with the intention that different strings could map to objects of different types.
The lookup operation, used to get the object corresponding to a string,
would have its result declared to be of type any.
.para
The type any is the 2union* of all possible types,
and it is the 2only* true union type in CLU;
all other types are 2base* types.
Every object is of type any, as well as being of some base type.
The type any has no operations;
however,
the base type of an object can be tested at run-time (see force_sec).
.
.section "Array Types"
.para
Arrays are one-dimensional, and are mutable.
Arrays are unconventional because the number of elements in an array
can vary dynamically.
Furthermore, there is no notion of an "uninitialized" element.
.para
The 2state* of an array consists of an integer called the 2low bound*,
and a sequence of objects called the 2elements*.
The elements of an array are indexed sequentially,
starting from the low bound.
All of the elements must be of the same type;
this type is specified in the array type specification,
which has the form
.show
array [ type_spec ]
.eshow
Examples of array type specifications are
.show 2
array[int]
array[array[string]]
.eshow
.para
There are a number of ways to create a new array,
of which only two are mentioned here.
The 2create* operation takes an argument specifying the low bound,
and creates a new array with that low bound and no elements.
An array 2constructor* can be used to create an array
with an arbitrary number of elements.
For example,
.show
array[int] $ [5: 1, 2, 3, 4]
.eshow
creates an integer array with low bound five, and four elements, while
.show
array[bool] $ [true, false]
.eshow
creates a boolean array with low bound one (the default), and two elements.
Array constructors are discussed fully in array_cons_sec.
.para
An array type specification states nothing about the bounds of an array.
This is because arrays can grow and shrink dynamically.
2Addh* adds a new element to the end of the array,
with index one greater than the previous top element.
2Addl* adds a new element to the beginning of the array,
and decrements the low bound by one,
so that the new element has an index one less than the previous bottom element.
2Remh* removes the top element;
2reml* removes the bottom element and increments the low bound.
Note that all of these operations preserve the indexes of the other elements.
Also note that these operations do not create holes;
they merely add to or remove from the ends of the array.
.para
As an example, if a 2remh* were performed on the integer array
.show
array[int] $ [5: 1, 2, 3, 4]
.eshow
the element 4 would disappear, and the new top element would be 3,
still with index 7.
If a 0 were added using 2addl*, it would become the new bottom element,
with index 4.
.para
The 2fetch* operation extracts an element by index,
and the 2store* operation replaces an element by index.
There is no notion of an "uninitialized" element;
an index is illegal if no element with that index exists.
Invocations of these operations can be written using special forms
(covered fully in fetch_store_secs):
.show 2
a[i]			% fetch the element at index i of a
a[i] := 3;		% store 3 at index i of a
.eshow
.para
The 2top* and 2bottom* operations return
the element with the highest and lowest index, respectively.
The 2high* and 2low* operations return the highest and lowest indexes,
respectively.
The 2elements* iterator yields the elements from bottom to top,
and the 2indexes* iterator yields the indexes from low to high.
There is also a 2size* operation that returns the number of elements.
.para
Every newly created array has an identity that is distinct from all other arrays;
two arrays can have the same elements without being the same array object.
The identity of arrays can be distinguished with the 2equal* (=) operation.
The 2similar1* operation tests if two arrays have the same state,
using the 2equal* operation of the element type.
2Similar* tests if two arrays have similar states,
using the 2similar* operation of the element type.
For example, writing
.show
ai$[3: 1, 2, 3]
.eshow
(where "ai" is equated to array[int])
in different places produces arrays that are similar1 and similar (but not equal),
while the following produces arrays that are similar, but not similar1 (or equal):
.show
array[ai] $ [1: ai$create(1)]
.eshow
.
.section "Record Types"
.para
A record is a mutable collection of one or more named objects.
The names are called 2selectors*, and the objects are called 2components*.
Different components may have different types.
A record type specification has the form
.show
record [ field_spec , etc ]
.eshow
where
.show
.def field_spec "name , etc : type_spec"
.eshow
Selectors must be unique within a specification,
but the ordering and grouping of selectors is unimportant.
For example, all the of the following name the same type:
.show 2
record [last, first, middle: string, age: int]
record [first, middle, last: string, age: int]
record [last: string, age: int, first, middle: string]
.eshow
.para
A record is created using a record 2constructor*.
For example:
.show
info $ {last: "Jones",  first: "John",  age: 32,  middle: "J."}
.eshow
(assuming that "info" has been equated to one of the above
type specifications; see equate_sec.)
An expression must be given for each selector,
but the order and grouping of selectors need not resemble
the corresponding type specification.
Record constructors are discussed fully in rec_cons_sec.
.para
For each selector "sel",
there is an operation 2get_*sel to extract the named component,
and an operation 2set_*sel to replace the named component with
some other object.
For example,
there are 2get_middle* and 2set_middle* operations for the type specified above.
Invocations of these operations can be written in a special form
(discussed fully in get_set_secs):
.show 2
r.middle			% get the 'middle' component of r
r.age := 33;			% set the 'age' component of r to 33
.eshow
.para
As with arrays,
every newly created record has an identity that is distinct from all other records;
two records can have the same components without being the same record object.
The identity of records can be distinguished with the 2equal* (=) operation.
The 2similar1* operation tests if two records have the same components,
using the 2equal* operations of the component types.
2Similar* tests if two records have similar components,
using the 2similar* operations of the component types.
.
.section "Oneof Types"
.para
A oneof type is a 2tagged discriminated union*.
A oneof is an immutable labeled object,
to be thought of as "one of" a set of alternatives.
The label is called the 2tag*,
and the object is called the 2value*.
A oneof type specification has the form
.show
oneof [ field_spec , etc ]
.eshow
where (as for records)
.show
.def field_spec "name , etc : type_spec"
.eshow
Tags must be unique within a specification,
but the ordering and grouping of tags is unimportant.
.para
As an example of a oneof type,
the representation type for a linked list of integers,
int_list,
might be written
.show 2
oneof [s(1)empty: s(2)null,
t(1)cell:t(2)record [car: int, cdr: int_list]]
.eshow
As another example, the contents of a "number container" might be specified by
.show 4
oneof [s(1)empty:            s(2)null,
t(1)integer:t(2)int,
t(1)real_num:t(2)real,
t(1)complex_num:t(2)complex];
.eshow
.para
For each tag "t" of a oneof type,
there is a 2make_*t operation which takes an object
of the type associated with the tag,
and returns the object (as a oneof) labeled with tag "t".
For example,
.show
number$make_real_num (1.37)
.eshow
creates a oneof object with tag "real_num"
(assuming "number" has been equated to the "number container" type specification above;
see equate_sec).
.para
The 2equal* operation tests if two oneofs have the same tag,
and if so, tests if the two object are the same,
using the 2equal* operation of the value type.
2Similar* tests if two oneofs have the same tag,
and if so, tests if the two objects are similar,
using the 2similar* operation of the value type.
.para
To determine the tag and value parts of a oneof object,
one normally uses the tagcase statement, discussed in tagcase_sec.
.
.section "Procedure and Iterator Types"
.para
Procedures and iterators are immutable objects, created by the CLU system
(see system_sec).
The type specification for a procedure or iterator contains most of the information
stated in a procedure or iterator heading; a procedure type specification has the form
.show
proctype ( lbkt type_spec , etc rbkt ) lbkt returns rbkt lbkt signals rbkt
.eshow
and an iterator type specification has the form
.show
itertype ( lbkt type_spec , etc rbkt ) lbkt yields rbkt lbkt signals rbkt
.eshow
where
.show 4
.long_def exception
.def1 returns "returns ( type_spec , etc )"
.def1 yields "yields ( type_spec , etc )"
.def1 signals "signals ( exception , etc )"
.def1 exception "name lbkt ( type_spec , etc ) rbkt"
.eshow
The first list of type specifications describes
the number, types, and order of arguments.
The returns or yields clause gives the number, types, and order of the objects
to be returned or yielded.
The signals clause lists the exceptions raised by the procedure or iterator;
for each exception name,
the number, types, and order of the objects to be returned is also given.
All names used in a signals clause must be unique, and cannot be "failure"
which has a standard meaning in CLU (see failure_sec).
The ordering of exceptions is not important.
For example, both of the following type specifications name the procedure type
for string$substr:
.show 2
proctype (string, int, int) returns (string) signals (bounds, negative_size)
proctype (string, int, int) returns (string) signals (negative_size, bounds)
.eshow
1String*$chars has the following iterator type:
.show
itertype (string) yields (char)
.eshow
.para
Procedure and iterator types have an 2equal* (=) operation.
Invocation is 2not* an operation,
but a primitive action of CLU semantics (see invoke_sec).
.
.section "Other Type Specifications"
.para
The type specification for a user-defined type has the form
.show
idn lbkt [ constant , etc ] rbkt
.eshow
where each 2constant* must be compile-time computable (see const_sec).
The identifier must be bound to a data abstraction (see bind_sec).
If the referenced abstraction is parameterized,
constants of the appropriate types and number must be supplied.
The order of parameters always matters in user-defined types.
.para
There are three special type specifications that are used
when implementing new abstractions: rep, cvt, and type.
These forms are discussed in cvt_parm_secs.
Within an implementation of an abstraction,
formal parameters declared with type can be used as type specifications.
.para
In addition, identifiers which have been equated to type specifications
can also be used as type specifications.
Equates are discussed in equate_sec.