mirror of
https://github.com/PDP-10/its.git
synced 2026-01-11 23:53:12 +00:00
These were all files included in MIT's 1999 release of SYSDOC and SYSTEM, so they need the license header. CHAORD is an XGP text file; XGP processors should ignore unknown directives so ;comment should work.
2188 lines
95 KiB
Plaintext
Executable File
2188 lines
95 KiB
Plaintext
Executable File
;comment Copyright (c) 1999 Massachusetts Institute of Technology
|
||
;comment See the COPYING file at the top-level directory of this project.
|
||
;comment ------------------------------
|
||
;skip 1
|
||
;list
|
||
;lftmar 350
|
||
;kset 20fg
|
||
|
||
CHAOS ORDER
|
||
|
||
**** DRAFT ****
|
||
NOTES:
|
||
Work more on dynamic flow control, see end of that section.
|
||
Data grams seem to have a bug that the two parties can never
|
||
agree on whether they both agree that it really happened. Am I losing?
|
||
Flush cruft at end?
|
||
Add QES, which is the same as RFC but implies you expect ANS? Any point to it?
|
||
For ITS, a flavor of listening which queues RFCs coming in while index is busy,
|
||
or otherwise avoids that timing problem.
|
||
-------
|
||
This table of contents has not been kept up to date *******
|
||
|
||
Goals
|
||
Non-goals
|
||
Hardware Assumptions and Constraints
|
||
|
||
User appearance
|
||
|
||
Connections
|
||
Contact Names
|
||
ITS implementation
|
||
Lisp Machine implementation
|
||
|
||
Network Control Program
|
||
|
||
Packets
|
||
Hosts attached to more than one subnet
|
||
Subnet and Host numbers
|
||
Packet numbers
|
||
Indices
|
||
Routing
|
||
Operations
|
||
Table of packet field use vs opcode
|
||
Flow and Error Control
|
||
Dynamic Window-size Adjustment
|
||
Uniquization
|
||
Media Handlers
|
||
Buffers
|
||
ITS System Calls
|
||
|
||
Comparison with LCSNET
|
||
Principle differences
|
||
High priority data packets, interrupts, and flushing
|
||
Data grams
|
||
Multiple messages per packet
|
||
Checksums in the packet
|
||
Host-independent user-level protocols
|
||
Very Small Hosts
|
||
|
||
Transmission Media
|
||
|
||
Ethernet
|
||
TEN11 Interface
|
||
DL10 & DTE20
|
||
Asynchronous line
|
||
|
||
Higher-Level Protocols
|
||
|
||
Telnet
|
||
File Access
|
||
Mail
|
||
Locate-named-service
|
||
|
||
>Goals
|
||
|
||
High speed communication between processes running in various local machines.
|
||
By "high speed", I mean much faster than the Arpanet.
|
||
At least comparable to TU20 magtape (30000 characters/second),
|
||
about 10 times the speed of the Arpanet.
|
||
(30-50 kcps seems to be the measured preformance, with ITS, 10/2/78)
|
||
|
||
No undetected errors in data transmission.
|
||
|
||
Not to depend on a particular medium. (However, we are compromising
|
||
by picking a fixed packet size. The simplicity and efficiency are worth it.)
|
||
|
||
Simple enough to put in small pdp11's. Also, simple to the user.
|
||
|
||
As much power as the Arpanet but, hopefully, a lot less hair.
|
||
|
||
Work well for both "telnet" and "file transfer."
|
||
|
||
The initial implementation in ITS should have the "in-system" part as
|
||
small and simple as possible. This includes no per-host tables
|
||
in the NCP, which allows large nets and easy reconfiguration.
|
||
There is, of course, a host-name to address translation table
|
||
used by "user" programs. The NCP has to have a per-subnet table
|
||
which remembers where the bridge(s) to that subnet from
|
||
the local subnet are.
|
||
|
||
The acknowledgement protocol must be designed not to limit
|
||
performance.
|
||
|
||
Statistical flow control ((see below.))
|
||
|
||
Avoid bottlenecks such as the Arpanet "control link". Be immune
|
||
to transmission medium failures which cause deleted, duplicated,
|
||
or out-of-order packets.
|
||
|
||
|
||
>Non-goals
|
||
|
||
Byte sizes other than 8 bits. (pdp10 binary transmission should
|
||
be part of a user-level file-transfer/ML-device protocol.)
|
||
|
||
Compatibility with the Arpanet.
|
||
|
||
Substituting for TEN11 interface functions such as running
|
||
the AI TV11 and XGP.
|
||
|
||
Automatic routing will be deferred. Initially the routing
|
||
tables will be assembled into the programs. A host needs
|
||
one routing table for each subnet it is connected to
|
||
(or one for each network interface it possesses.)
|
||
|
||
|
||
>Hardware Assumptions and Constraints
|
||
|
||
Transmission is physically in "packets" which have headers, rather
|
||
than in, e.g., continuous streams.
|
||
|
||
The chaos net (ether) interface limits the physical length
|
||
of a packet to 4097 bits including overhead bits. The net result
|
||
is the maximum number of data bytes (excluding the header
|
||
defined by this protocol) in any packet is 488. This limitation
|
||
will be extended to the whole network (to keep things simple).
|
||
However some provision will be made for a possible network-wide
|
||
packet-size expansion, which has already been done once.
|
||
|
||
All transmission media will be assumed to be highly-reliable
|
||
but not perfect; "perfect" reliability will be assured by having
|
||
the two ends of a connection use an acknowledgement protocol
|
||
which detects lost messages. Transmission media are required
|
||
to lose any messages that they don't deliver intact. (I.e. there
|
||
must be hardware checksums.)
|
||
|
||
>User appearance
|
||
|
||
The network allows user processes in various machines to
|
||
communicate with each other in various ways, for instance,
|
||
in imitation of a terminal, or in imitation of a disk file
|
||
system. These facilities are built on top of the basic
|
||
capability to send "packets" (a header plus some data in the
|
||
form of 8-bit bytes) through the network. The network undertakes
|
||
never to lose or garble any packets, except when the connection
|
||
is cut off entirely.
|
||
|
||
This document defines the low-level, "in-system" part of the
|
||
protocol. On top of this, special programs (running in user-mode)
|
||
will implement the higher-level protocol that the general user
|
||
program sees. These protocols and programs won't be discussed
|
||
further in this document, but remember that the strange packet
|
||
formats and so forth are not seen by most user programs.
|
||
|
||
>>Connections
|
||
|
||
When two processes wish to communicate, they establish a
|
||
connection between them. This connection allows two streams
|
||
of packets to flow, one in each direction. [Explain why
|
||
connections should be bi-directional rather than uni-directional.
|
||
Basically that's what you always want, and it makes things simpler.]
|
||
|
||
Connections are essentially the only facility provided by the network.
|
||
However, when first establishing the connection it is necessary
|
||
for the two processes to contact each other, and make each
|
||
other known to their respective operating systems. In addition,
|
||
it is often the case (in the usual user-server situation) that
|
||
one of the processes does not exist beforehand, but is to be created
|
||
and made to run a specified program.
|
||
|
||
>>Contact Names
|
||
|
||
The way we choose to implement contacting is to say that one process
|
||
is always a "user" and one process is always a "server". The server
|
||
has some "contact name" to which it "listens". The user requests its
|
||
operating system to connect it to a specified contact name at a
|
||
specified host. If a process at that host is listening to that
|
||
contact name, the two are connected. If no one is listening to that
|
||
contact name, the operating system must create a server process
|
||
which will load itself with the appropriate program and connect up.
|
||
|
||
Discovering which host to connect to to obtain a given service
|
||
is an issue for higher-level protocols. It will not be dealt
|
||
with at all initially (that is, there will be a table of host
|
||
names and numbers and the user will have to enter the name.)
|
||
|
||
Once the connection has been established, there is no more need for
|
||
the contact name, and it is discarded. Indeed, often the contact name
|
||
is simply the name of a network protocol (such as "telnet") and several
|
||
users may want to have connections to that service at the same time,
|
||
so contact names must be "reusable." (In the other common case, the
|
||
contact name will be a "gensym".)
|
||
|
||
As far as the operating systems involved are concerned, contact names
|
||
are simply arbitrary ascii strings defined by user programs. It is
|
||
expected that the various higher-level protocols will define standard
|
||
contact names; for instance, to get the telnet protocol one would
|
||
connect to "telnet"; to get the file transfer protocol one would
|
||
connect to "file-transfer". If a machine receives a request to connect
|
||
to a contact name which no one is currently listening to, a server
|
||
process must be created and made to execute a program which decides,
|
||
from the contact name, what server program to load and execute, or else
|
||
to refuse the request for connection.
|
||
|
||
Contact names have no relation to file names; they are simply
|
||
a device for introducing two processes to each other. If one was
|
||
using the network to transfer a file, one would first contact
|
||
the file transfer server at the appropriate host, then send a
|
||
packet containing the name of the file to be accessed.
|
||
|
||
|
||
>>ITS system calls
|
||
|
||
Ordinary user programs will not access the network directly; they will
|
||
go indirectly through a job-device or sty-type program which will
|
||
use a higher-level protocol to make the network look like what the
|
||
user wants, the traditional things being a terminal and a disk
|
||
file system.
|
||
|
||
Since these intermediate user-mode programs for using the network will
|
||
exist, there is no reason for the interface to the low level network
|
||
provided by the system to look at all like a standard device. Instead,
|
||
it will be designed solely for simplicity and ease of implementation,
|
||
and for a certain degree of efficiency. This interface will be
|
||
described after the interface between Network Control Programs in
|
||
different machines (the low-level protocol) is described.
|
||
|
||
At some future time the intermediate programs might get moved into the
|
||
system for reasons of efficiency, but that should not be allowed to
|
||
complicate the initial implementation.
|
||
|
||
As of October 1978, the opening and closing of connections is completely
|
||
device-dependent, as are "status"-type operations, however byte-string
|
||
I/O is supported in a fashion which is device-independent except when
|
||
errors occur, which is handy in several programs. Packet-level,
|
||
device-dependent I/O is also supported.
|
||
|
||
The .INSRT-able file of routines NETWRK, will be augmented to handle
|
||
both the Chaos net and the Arpa net.
|
||
|
||
|
||
>>Lisp Machine implementation
|
||
|
||
In the case of the Lisp Machine, the only distinction between user
|
||
programs and system programs is who maintains and documents them,
|
||
and how carefully.
|
||
|
||
(More?)
|
||
|
||
>Network Control Program
|
||
|
||
This is the part of the operating system(s) that implements the network
|
||
(obviously).
|
||
|
||
>>Packets
|
||
|
||
The NCP's operate by exchanging packets. A packet consists of a
|
||
header containing control information, and zero or more 8-bit bytes of
|
||
data. Hardware restrictions of the Chaos net interface
|
||
restrict the maximum length of a packet to 253 16-bit words. In fact,
|
||
we will limit it to 252 words (to make packet buffers in pdp10's be 128
|
||
words including two overhead words). Again for the convenience of
|
||
pdp10's, the header should be an even number of 16-bit words.
|
||
|
||
In this section the packets will be described as they look to a pdp11.
|
||
They look the same inside a Lisp Machine, since the byte structure is the
|
||
same. Inside a pdp10, packets are stored with two 16-bit words
|
||
left-adjusted in each pdp10 word. Additionally, the bytes in the data
|
||
portion of the packet are swapped so as to put them in pdp10 standard
|
||
order. pdp11's that act as network interfaces for pdp10's will be required
|
||
to do this byte swapping since they're likely to have more time available
|
||
than the 10 to do it in, and can also do it faster, having a special
|
||
instruction for it. pdp10's that communicate directly to the network will
|
||
have hardware assistance for byte reshuffling in their interfaces. See the
|
||
transmission media section for how packets are encapsulated during
|
||
transmission through the various media.
|
||
|
||
The header is 8 16-bit words and contains the following fields:
|
||
|
||
-----------------------
|
||
|opcode(8) | unused(8)|
|
||
-----------------------
|
||
|fc(4) | nbytes(12) |
|
||
-----------------------
|
||
| destination host # |
|
||
-----------------------
|
||
| destination index |
|
||
-----------------------
|
||
| source host # |
|
||
-----------------------
|
||
| source index |
|
||
-----------------------
|
||
| packet # |
|
||
-----------------------
|
||
| ack packet # |
|
||
-----------------------
|
||
|
||
opcode - tells the receiver of the packet how to interpret
|
||
it. See the Operations section below.
|
||
This is 8 bits long. The 128 opcodes with high
|
||
order bit =0 are for NCP use. The 128 with high
|
||
order bit =1 are for user-program use.
|
||
|
||
unused 8 bits reserved for future use.
|
||
|
||
fc - forwarding count. 4 bits which count the number of times this
|
||
packet has been forwarded by bridges. Initially this field
|
||
is always generated as zero. Each bridge increments it;
|
||
if it overflows, there is assumed to be a loop
|
||
and the packet is discarded.
|
||
|
||
nbytes - the number of 8-bit bytes of data in the data part.
|
||
The maximum value of nbytes is 488. The minimum is 0.
|
||
This is 12 bits long to allow for 4K-bit packets.
|
||
(Actually 12 bits is enough for up to 32K-bit packets.)
|
||
|
||
destination host #
|
||
This is divided into two 8-bit fields. The high
|
||
byte specifies which subnet. The low byte specifies
|
||
which host on that subnet, and (on ethernet subnets)
|
||
is identical to the hardware host number. Neither
|
||
field may be zero in a valid host number.
|
||
|
||
destination index - index for this connection assigned by the
|
||
destination host's NCP.
|
||
|
||
source host # - see destination host #.
|
||
|
||
source index - index for this connection assigned by the
|
||
source host's NCP.
|
||
|
||
packet # - an ascending reference number used in error and
|
||
flow control (see below).
|
||
|
||
ack packet # - used in error and flow control (see below.)
|
||
|
||
>>Hosts attached to more than one subnet
|
||
|
||
(This also applies to hosts with more than one interface to
|
||
the same subnet, if there ever are any.)
|
||
|
||
Such hosts ought to act as bridges. That is, if a packet
|
||
is received which is not addressed to this host, it should
|
||
be sent back out, using this host's routing tables. The
|
||
forwarding count should be used to prevent infinite loops
|
||
in the event of inconsistent forwarding tables in two bridges.
|
||
|
||
It is undesirable for a host to have more than one number.
|
||
So a host connected to multiple subnets should choose one
|
||
subnet as its "home", which is the address which is advertised
|
||
as that host. The host's other network connections are
|
||
un-named bridges. In some causes it may be preferable
|
||
not to pick a "home" subnet; instead, one invents a new
|
||
private subnet which only has that one host on it,
|
||
and all the host's network connections act as bridges
|
||
to that subnet (also to each other).
|
||
|
||
The routing should be set up so that packets destined for such
|
||
a host from a subnet on which it has an interface choose that
|
||
interface as the bridge to that host, so that in fact data flows
|
||
the same way as if the host had more than one number and the
|
||
host-name to host-number lookup magically chose the right number.
|
||
|
||
|
||
>>Subnet and host numbers.
|
||
|
||
Subnet numbers are arbitrary. Host numbers are assigned according to
|
||
position on the cable, as explained (elsewhere).
|
||
|
||
These numbers may be found in the file SYSENG;HOSTS >
|
||
|
||
The physical cable address of a host should include both its subnet number
|
||
and its host number (prior to October 1978 the subnet field of a physical
|
||
address was always 0). This allows the physical address to be used as
|
||
a unique machine identifier and makes it possible for a host to discover
|
||
its full 16-bit host address without prior knowledge.
|
||
|
||
>>Packet numbers
|
||
|
||
Each time the sending user puts another packet into the network, this
|
||
number is increased by one. (These numbers are independent for the
|
||
two directions of a connection.) The receiver uses these numbers to
|
||
get the packets into order and ensure that there are no duplications
|
||
nor omissions. The packet numbers are 16 bits and wrap around to zero
|
||
when they overflow. When the connection is first opened, an initial
|
||
value for the packet# is established. If it was 0, then the packet#
|
||
of the first data packet would be 1.
|
||
|
||
Packet #'s should be compared modulo 2**16. On pdp11's, use
|
||
|
||
CMP A,B
|
||
BMI <A is less> (BMI rather than BLT or BLO)
|
||
|
||
On pdp10's, use
|
||
|
||
SUB A,B
|
||
TRNE A,100000 (rather than CAMGE A,B)
|
||
JRST <A is less>
|
||
|
||
On Lisp machines, use
|
||
|
||
(AND (BITTEST 100000 (- A B))
|
||
<A is less>) [rather than (AND (< A B) ...)]
|
||
|
||
>>Indices
|
||
|
||
Each connection has two indices assigned to it, one at each end. Each
|
||
index is an arbitrary 16-bit number assigned by the NCP at its end; usually
|
||
it is an index into that NCP's tables. Indices are required to be
|
||
non-zero. For maximum simplicity, all packets include both indices. The
|
||
receiver of a packet uses the destination index to find out who to give the
|
||
packet to. Generally the source index is used only for error checking, but
|
||
when a connection is first opened the source index has to be saved and used
|
||
as the destination index in future packets in the reverse direction.
|
||
|
||
To prevent packets somehow left over from old connections from
|
||
interfering with new connections, we require that a certain time
|
||
elapse between successive connections between the same pair of hosts
|
||
with the same index numbers at each end, this time to be longer than
|
||
the maximum time a packet is reasonably expected to sit around in
|
||
the network somewhere (the maximum transit time through all bridges, etc.)
|
||
This requirement is implemented by making part of the index number be
|
||
a "uniquizer"; when a host reuses a slot in its tables, it increments
|
||
the uniquizer, so that the index number for that slot is not the same
|
||
as it was previously. Then if the uniquizer field is sufficiently wide,
|
||
and the rate of creation of connections (actually the rate of allocation
|
||
of indices) is sufficiently low, the requirement will be satisfied.
|
||
For the kind of network we're talking about, the time is a few tens of
|
||
seconds, and the uniquizer need only be a few bits wide. It is up
|
||
to each host how wide it makes the uniquizer, depending on how big
|
||
it wants its tables to be. It is best if table slots are also
|
||
allocated circularly, so that slots are reused at the minimum possible rate.
|
||
|
||
The uniquizer also serves to "more or less" uniquely identify connections
|
||
so that duplicate copies of a Request For Connection (RFC) packet
|
||
can be identified and discarded.
|
||
|
||
A user process's "capability" or "channel" to a connection, used by it
|
||
to ask the NCP to operate on that connection, simply contains the
|
||
appropriate index.
|
||
|
||
Associated with each index the NCP has a "state", the host # and index
|
||
# of the other end of the connection, some read buffers and associated
|
||
variables, including a current packet #, and some write buffers and
|
||
associated variables, again including a current packet #.
|
||
|
||
The "state" can be Closed (no connection or other activity currently
|
||
associated with this index), Open (this index has a connection to
|
||
another index at another machine), RFC-sent (requested another machine
|
||
for a connection, but no answer yet), Listen (listening for a request
|
||
for connection to a certain contact name), Broken (connection closed
|
||
abnormally by network or machine lossage), and RFC-received (waiting
|
||
for a server process to get going and pick up a request for connection
|
||
that came in).
|
||
|
||
|
||
>>Routing
|
||
|
||
This section is a place-holder. Initially routing will be kludged with
|
||
a fixed table. Once the network is set up automatic routing will
|
||
be put in.
|
||
|
||
The routing decision consists of looking at the destination subnet
|
||
field of a packet and deciding which interface (on a multi-interface
|
||
host) to transmit it on, i.e. what is the best route to that subnet.
|
||
In addition, if the destination is not on a subnet to which there
|
||
is a direct interface, one must determine what is the host number of
|
||
the bridge it should be sent to.
|
||
|
||
It also involves recognizing packets which are destined to the
|
||
current host and receiving them.
|
||
|
||
|
||
The following is not yet implemented. It is an initial plan for routing.
|
||
|
||
Gaetways will broadcast RUT packets periodically, perhaps once a minute.
|
||
The destination field is zero and the source field is the address of the
|
||
gateway on the net on which the packet is being broadcasted. The data field
|
||
contains a bunch of 16-bit words, divided into fields in the same way as
|
||
a host address. The subnet field of each word contains the number of a
|
||
subnet which this gateway is able to reach. The host field of each word
|
||
contains the hop count, which is 0 if the gateway is physically connected
|
||
to the specified (non-ether) subnet, 1 if the gateway is connected to
|
||
the specified (ether) subnet, or 1+ the hop count of the closer gateway
|
||
if this gateway indirects through another. The hop counts allow the gateways
|
||
to avoid loops when there are multiple paths; they may not need to be
|
||
looked at by non-gateway hosts.
|
||
|
||
Each host maintains a routing table, with entries keyed by subnet number.
|
||
For subnets to which the host is physically connected, the entry points
|
||
to the host's physical interface. For subnets which the host knows how to
|
||
get to because it has been informed of a gateway via a RUT packet, the entry
|
||
contains the gateway's host address, the hop count, and the time that this
|
||
route was last confirmed by a RUT packet. If the NCP remembers only one
|
||
route to a given subnet, it wants to remember the one with the smallest
|
||
hop count unless that one is more than (say) 5 minutes old. If the NCP
|
||
remembers all the routes, it should stop using ones which are more than
|
||
(say) 5 minutes old. It is not clear that it should ever use one with
|
||
a hop count larger than the minimum. The reason for the time-out is to
|
||
avoid forever losing packets by sending to a gateway which has gone down.
|
||
If there are no gateways up and known about by the NCP which reach a given
|
||
host, it will look like that host is down.
|
||
|
||
>>Operations
|
||
|
||
This section tells what the values of the opcode field in a packet are, and
|
||
how an NCP responds to each one.
|
||
|
||
1 RFC - Request for Connection
|
||
|
||
This message is sent from user to server in order to open a
|
||
connection. The data contains the contact name. Actually, if
|
||
the data contains a space, the characters before the space are
|
||
the contact name and the characters after the space are "arguments".
|
||
This is done so that simple transactions can work (see below).
|
||
|
||
The destination index is zero, because it is not known yet. The
|
||
responses are:
|
||
|
||
OPN, if a server process is found or created that wishes
|
||
to accept the request and open up a connection.
|
||
|
||
CLS, if the connection cannot be opened. The data field
|
||
contains an ascii explanation of why not.
|
||
|
||
ANS, if a simple transaction is occuring. The data contains
|
||
the answer, and no connection is ever created.
|
||
|
||
FWD, if there is no server process at this host, but there
|
||
might be a suitable one some place else.
|
||
|
||
There may also be no response, if the RFC was lost in the network, or
|
||
the destination host is down, or the reply was lost in the network.
|
||
|
||
To increase the reliability of establishment of connections, RFC's and
|
||
OPN's are retransmitted, just as data packets are, until a response is
|
||
elicited. CLS, ANS, and FWD cannot be retransmitted because we take
|
||
the position that retransmission is implemented by something associated
|
||
with a connection, and these packets are not sent by connections (and are
|
||
not acknowledged). OPN is sent by a connection, and RFC is sent by
|
||
a sort of embryonic connection which continues to exist while it awaits
|
||
an OPN or other reply.
|
||
|
||
Since RFC is retransmitted, it is the responsibility of the NCP to detect
|
||
and discard duplicates. When an RFC is received, all existing connections
|
||
in the OPN or RFC-RECEIVED state, and all "pending" RFC's which are awaiting
|
||
servers to connect to them (if the NCP has such), should be checked to see
|
||
if they have the same source host and index number as the received RFC.
|
||
If so, the RFC is a duplicate and should be ignored. Note that connections
|
||
in the LOST, CLOSED, or BROKEN states should not be checked, since these
|
||
are not really connections as far as the foreign host is concerned, but simply
|
||
ghosts of connections left around to remember error status for their controlling
|
||
user programn.
|
||
|
||
Since the response to RFC is not guaranteed, processes issueing RFC's
|
||
must have timeouts. In most implementations the normal host-down timeout
|
||
will suffice.
|
||
|
||
[We should discuss why the special kludgery for control packets, rather
|
||
than using the regular connection mechanism to assure reliable transmission
|
||
of control packets, as the Arpanet does. Also a discussion of which control
|
||
operations inherently need reliability and which inherently don't, and
|
||
why that should be so.]
|
||
|
||
The packet # field contains the first packet # that will be assigned
|
||
to data transmitted from the user process, minus one modulo 2**16. In
|
||
the simplest case, this can be zero, and the first packet sent will be
|
||
packet # 1. One might also imagine uniquizing the packet numbers
|
||
as an extra error check, but this should not be necessary, because
|
||
the indices are uniquized, and connections must be explicitly agreed
|
||
to by each end before data can flow.
|
||
|
||
|
||
2 OPN - Connection Open
|
||
|
||
This is the positive acknowledgement to RFC. The source index field
|
||
conveys the acknowledger's connection index to the requester. The packet #
|
||
field contains the first packet # that will be assigned to data transmitted
|
||
from the server process, minus one modulo 2**16. The data portion of this
|
||
packet is the same as a STS packet (see below), and mainly serves to convey
|
||
the server's window-size to the user. The ack packet # field must contain
|
||
the usual value, i.e. the number that was sent in the packet # field of the RFC.
|
||
The receipt and ack in the OPN serve primarily to terminate retransmission of the RFC.
|
||
|
||
When an OPN is received, a STS is sent in response, telling the server
|
||
the user's window-size. The exchange of an OPN and a STS also serves
|
||
as acknowledgement to each side that the other believes the connection
|
||
is open. No data packets may be committed to the network until after
|
||
this has occurred. If this rule was not followed, and packets happened
|
||
to get out of order, a data packet could arrive before the connection
|
||
was open and cause a LOS.
|
||
|
||
To improve the reliability of establishment of connections, OPN's are
|
||
retransmitted, just as data packets are, until receipted (and
|
||
acknowledged) by STS. Because of this retransmission, the NCP must
|
||
detect and discard duplicate OPN's. If an OPN is received for a connection
|
||
which exists and is not in the RFC-SENT state, the OPN should be ignored
|
||
and no LOS should be sent.
|
||
|
||
OPN's contain 16-bit data and are not byte-swapped. See below under opcode 300.
|
||
|
||
3 CLS - Connection Closed
|
||
|
||
CLS is the negative response to RFC. It indicates that no server was
|
||
listening to the contact name, and one couldn't be created, or for
|
||
some reason the server didn't feel like accepting this request for a
|
||
connection, or the destination NCP was unable to complete the
|
||
connection (e.g. connection table full.) The destination index will
|
||
be the source index of the RFC. The source index will be zero because
|
||
the NCP did not put this connection into its tables. The data bytes,
|
||
if there are any, contain an ascii explanation.
|
||
|
||
CLS is also used to close a connection after it has been open for a while.
|
||
In the Arpanet, the NCP undertakes not to close the connection when the
|
||
user requests it, but waits until all data transfer has completed. This is
|
||
a source of extra complexity, since data transfer may be hung up, there
|
||
have to be timeouts, there have to be connections waiting to be closed
|
||
which aren't owned by any user, etc. It seems simpler to make CLS take
|
||
effect immediately, and let the user processes assure that data transfer
|
||
has been completed. Note that telnet-like applications don't need it, and
|
||
ftp-like applications have to have it separately from closing anyway.
|
||
|
||
Since there is no error recovery or retransmission mechanism for CLS,
|
||
the use of CLS is necessarily optional. However, it is desirable to
|
||
send a CLS when possible to decrease user confusion.
|
||
|
||
4 FWD - forward a request for connection
|
||
|
||
This is a response to RFC which indicates that the desired service
|
||
is not available at the process contacted, but may be available at
|
||
a possibly-different contact name at a possibly-different host. The
|
||
data field contains the new contact name and the ack packet # field
|
||
contains the new host number. The issuer of the RFC should issue
|
||
another RFC to that address.
|
||
|
||
5 ANS - answer to a simple transaction
|
||
|
||
Simple transactions are transactions which consist of a single question
|
||
(request) and a single answer (response). They have no side effects,
|
||
so it is not necessary for either party to know whether the other party
|
||
thinks the transaction was completed or not. Simple transactions need
|
||
no flow control and no error control, other than a timeout by the user
|
||
side to detect lost messages. They are a simple, efficient way for
|
||
doing simple-minded things such as extracting information (such as the
|
||
time or a host name table) from a central server.
|
||
|
||
A simple transaction is initiated by sending an RFC to a server which
|
||
happens to use simple transactions rather than full connections. The
|
||
data field of the RFC consists of the contact name of the server,
|
||
optionally followed by a space and arguments to the request. The
|
||
server responds by sending an ANS packet, whose data field is the
|
||
answer. The destination address of the ANS comes from the source
|
||
address of the RFC. The source address, packet #, and ack packet #
|
||
fields of the ANS are not used and should be zero.
|
||
|
||
The difference between simple transactions (2 packets) and
|
||
full datagrams (4 packets) is that in the simple transaction the two
|
||
parties don't have to agree about whether or not the transaction
|
||
in fact took place, while in the full datagram they do, making
|
||
acknowledgement necessary.
|
||
|
||
The server of a simple transaction should be prepared to process
|
||
the transaction multiple times without error. Simple transactions
|
||
should not have side effects which would be dangerous if repeated.
|
||
|
||
200-377 DATA - Transmits Data
|
||
|
||
The data portion of the packet is data being sent through the
|
||
connection. The packet # is a number that increments by one for each
|
||
data packet sent in this direction on this connection. This is used to
|
||
detect lost packets (which includes packets garbled in transmission and
|
||
packets lost in the statistical flow control scheme) and duplicated
|
||
packets (caused by lost or delayed acknowledges. The NCP undertakes to
|
||
deliver the packets to the destination process in the same order that
|
||
they came from the source process, with no duplications and no
|
||
omissions. Note that any opcode with the sign bit on is a data packet
|
||
as far as the NCP is concerned; if they wish, higher-level protocols
|
||
may use the opcode field to define various different kinds of data
|
||
packets. Thus, what is herein called a data packet may be a "control"
|
||
packet to a higher-level protocol. Normally, opcodes 200 and 300
|
||
should be used. Opcodes 201-277 and 301-377 are to be used for special
|
||
purposes, as defined by higher-level protocols.
|
||
|
||
Opcodes 300-377 are defined to be "16-bit data". Note that this does not
|
||
affect the byte count in the packet header; it is still a count of 8-bit
|
||
bytes. The sole effect of 16-bit data is to prevent byte-swapping when
|
||
communicating with pdp10's; pdp10's store the 2 8-bit bytes in a 16-bit
|
||
"word" in the reverse of the network standard order (defined by pdp11's
|
||
and Lisp machines). ((For purposes of 16-bit data, Interdata machines
|
||
are considered pdp10's.))
|
||
|
||
6 SNS - sense status
|
||
|
||
This packet is a request for a STS packet to be returned. It is used for
|
||
"probing", see the section on flow and error control.
|
||
|
||
Note that, to avoid a timing error, a SNS received by a connection in the
|
||
RFC-sent state should be ignored. This can happen if an OPN is transmitted
|
||
followed by a SNS and the packets get out of order.
|
||
|
||
SNS should not be transmitted on a connection that is not in the Open state.
|
||
|
||
|
||
7 STS - report status
|
||
|
||
STS is used for a variety of purposes. It is the vehicle to carry an
|
||
acknowledgement, when no data packet is being sent on which the
|
||
acknowledge could be piggy-backed. STS is used to set or change the
|
||
window size, to acknowledge opening of a connection, and to carry
|
||
receipts (see the flow control section.) In the future STS will (may)
|
||
be used to carry information used in automatic window-size adjustment.
|
||
Like most packets, the ack packet# field of STS carries an
|
||
acknowledgement, the number of the last packet given to the user
|
||
process. The first two bytes of the data field (low-order byte first)
|
||
carry a receipt, the number of the last packet guaranteed to be given
|
||
to the user process eventually. The next 2 data bytes carry the window
|
||
size (low-order byte first). Additional data bytes will be defined in
|
||
the future.
|
||
|
||
STS's contain 16-bit data and are not byte-swapped. See above under opcode 300.
|
||
|
||
10 RUT - routing information
|
||
|
||
This packet type is reserved for the future, when automatic routing exists.
|
||
|
||
RUT's contain 16-bit data and are not byte-swapped. See above under opcode 300.
|
||
|
||
11 LOS - you are losing
|
||
|
||
If a host receives a packet for a connection that does not exist (other
|
||
than RFC which isn't associated with a particular connection, LOS, and
|
||
CLS which is safe to ignore), it should return a LOS packet to the
|
||
source of the offending packet. The source of the LOS should be the
|
||
destination of the offending packet, and the packet# and ack packet#
|
||
fields should be copied. The data portion of a LOS contains an ascii
|
||
explanation of the problem. A host receiving a LOS should break the
|
||
connection specified by the destination index and inform the associated
|
||
process that something has gone wrong. It should make the LOS packet
|
||
available to that process so the explanation of the problem can be
|
||
read.
|
||
|
||
The LOS packet isn't actually necessary, since if the supposed other
|
||
end of a connection refuses to cooperate (i.e. never sends any
|
||
packets), after a while the NCP will give up, close the connection, and
|
||
inform the user that the foreign host appears to be dead.
|
||
|
||
For debugging, an echo feature is implemented as follows. If you send
|
||
a packet with a data opcode and source and destination indices of 0,
|
||
it will be sent back as a LOS packet. The data field will be destroyed
|
||
by the error explanation, but the packet # and ack packet # fields can
|
||
be used to remember any sequencing or checking information.
|
||
|
||
12 LSN - listen (never transmitted through the net, see below)
|
||
|
||
13 MNT - maintenance
|
||
|
||
Normal NCPs will discard MNT packets, without generating a LOS. This packet
|
||
type is reserved for use by maintenance programs.
|
||
|
||
>>Table of packet field use vs opcode
|
||
|
||
The unused field is never used and must be zero, the forwarding count
|
||
field is always used in the same way, and the nbytes field is always
|
||
the length of the data. Fields marked "0" below are don't care
|
||
rather than must be zero, but zero is certainly recommended. The packet#
|
||
field of CLS, SNS, and STS would best be set the same as the packet# of
|
||
the next data packet to be sent (just as in RFC and OPN).
|
||
|
||
Destination Source
|
||
Opcode Host Index Host Index Packet# Ack pk# Data
|
||
------ ---- ----- ---- ----- ------- ------- ----
|
||
RFC usual 0 usual usual first - 1 0 contact name
|
||
|
||
OPN usual usual usual usual first - 1 usual 0, window size
|
||
|
||
CLS usual usual usual 0 0 0 reason
|
||
|
||
ANS usual usual usual 0 0 0 answer
|
||
|
||
FWD usual usual usual 0 0 new host contact name
|
||
|
||
SNS usual usual usual usual 0 usual 0
|
||
|
||
STS usual usual usual usual 0 usual receipt#, window size
|
||
|
||
LOS src h src i dst h dst i pk# ack pk # reason
|
||
|
||
LSN 0 0 0 0 0 0 contact name
|
||
|
||
Data usual usual usual usual usual usual data
|
||
|
||
MNT completely nonstandard
|
||
|
||
RUT completely nonstandard
|
||
|
||
>>Flow and Error Control
|
||
|
||
The NCPs conspire to ensure that data packets are sent from user to
|
||
user with no duplications, omissions, or changes of order.
|
||
Secondarily, the NCPs attempt to achieve a maximum rate of flow of
|
||
data, and a minimum of overhead and retransmission.
|
||
|
||
The transmission medium is required to lose all damaged packets. Therefore
|
||
error control reduces to retransmission of lost packets, plus immunity to
|
||
duplicated and out-of-sequence packets.
|
||
|
||
The following concepts must be explained: the window, acknowledgement,
|
||
receipt, retransmission, and probing.
|
||
|
||
The window is the set of data packets "in the network" between the
|
||
sending process and the receiving process. Conceptually the window
|
||
slides along as transmission proceeds. When a packet is acknowledged,
|
||
that means that the window is to the right of that packet, since the
|
||
receiving process has gobbled that packet. The window has a fixed size
|
||
to limit the amount of buffer space that is used up if the sender sends
|
||
faster than the receiver receives. If the sending user process tries
|
||
to transmit more packets than the window size allows, it should be made
|
||
to wait until some packets have been acknowledged. The window is made
|
||
sufficiently large to regulate how often acknowledgements must be
|
||
returned. Note that the window includes only data packets, not
|
||
control packets. Control packets are to be processed immediately
|
||
when they are received, and do not take up buffer space. Separate
|
||
mechanisms are provided to deal with control packets being lost in
|
||
the network.
|
||
|
||
No sender is actually required to pay any attention to the window size.
|
||
No receiver is actually required to set the window size to something
|
||
reasonable. However, those hosts that want to maximize performance
|
||
should do something about the window size. The size is initially set
|
||
during the RFC/OPN/STS dialogue, presumably according to the type of
|
||
protocol being used. An NCP may, if it chooses, dynamically adjust the
|
||
window size according to observed network behavior. (See dynamic
|
||
window-size adjustment section below.)
|
||
|
||
An acknowledgement is a signal from a receiver to a sender that all
|
||
packets through packet number "n" have been given to the receiving
|
||
process, therefore the window should go from n+1 through n+window_size.
|
||
Note that acknowledgements can get out of order, so one should use
|
||
the maximum (mod 2^16) of all the acknowledgements ever received
|
||
as the start of the window. Since acknowledgements are so common,
|
||
there is a field (ack packet #) in the data packet which allows
|
||
an acknowledgement to be "piggy-backed" on another packet.
|
||
|
||
A receipt is a signal from a receiver to a sender that all packets
|
||
through packet number "n" have been received successfully by the NCP
|
||
and are guaranteed to be delivered to the user process, therefore they
|
||
need not be retransmitted. Note that acknowledgement implies receipt.
|
||
The separate receipt mechanism is supplied so that useless
|
||
retransmissions can be limited, when the data have been received but
|
||
cannot be acknowledged because the receiving process is being slow
|
||
about reading them. The STS packet is used to send a
|
||
receipt. Receipts are optional.
|
||
|
||
Retransmission is the process of sending all unreceipted data packets
|
||
in the sender's queue through the network to the receiver, except those
|
||
that were last sent very recently (within 1/30'th of a second in ITS.)
|
||
Retransmission occurs every 1/2 second, and when a STS packet is
|
||
received. The idea of retransmission is to keep sending a packet until
|
||
it has been proven to have successfully reached its destination (by
|
||
receipt or acknowledgement.) The reason retransmission occurs in
|
||
response to STS is so that a receiver may cause a faster retranmission
|
||
rate than twice a second, if it so desires. Since STS carries a
|
||
receipt, and very-recently-transmitted packets are not retransmitted,
|
||
this should never cause useless retransmission.
|
||
|
||
A probe is the sending of a SNS packet, in the hope of eliciting
|
||
either a STS or a LOS, depending on whether the other side believes
|
||
in the connection. Probing is used periodically as a way of testing
|
||
that the connection is still open, and also serves as a way to get STS
|
||
packets retransmitted as a hedge against the loss of an acknowledgement,
|
||
which could otherwise stymie the connection.
|
||
|
||
We probe every five seconds, on connections which have unacknowledged
|
||
packets outstanding (a non-empty window), and on connections which have
|
||
not received any packets for one minute. If a connection receives no
|
||
packets for 1 1/2 minutes, this means that at least 5 probes have been
|
||
ignored, and the connection is declared to be broken.
|
||
|
||
The receiver generates STS packets under the following circumstances:
|
||
When a SNS is received (thus a response to a probe). When a duplicate
|
||
packet is received, so that further useless retransmission will be
|
||
prevented by the receipt. When the number of received but not
|
||
acknowledged packets is more than 1/3 the window size; evidently the
|
||
normal piggy-backed acknowledge mechanism is not working, so we
|
||
generate a STS to carry the acknowledge that will empty the window back
|
||
out, hopefully in time before transmission becomes clogged. When the
|
||
window size changes (to tell the other end of the change).
|
||
|
||
When it is time to send a STS, we attempt to send one immediately.
|
||
If this fails (for instance, there might be no buffers available),
|
||
we keep trying to send one every half-second.
|
||
|
||
The receiver can also generate "spontaneous" STS's, to stimulate
|
||
retransmission or to carry an acknowledge, to keep things moving on
|
||
fast devices with insufficient buffering, such as the Gould printer.
|
||
This provides a way for the receiver to speed up the retransmission
|
||
timeout in the sender, and to make sure that acknowledges are happening
|
||
often enough. For example, one might use a timer to generate a STS
|
||
every 1/10th of a second.
|
||
|
||
Note that spontaneous STS's should not be generated until the connection
|
||
is fully open. This means that the server should not send STS until it
|
||
has gotten a STS back from its OPN. STS's other than spontaneous ones
|
||
have no such problem.
|
||
|
||
>>Host Status
|
||
|
||
All physical Chaosnet hosts, even gateways, are required to answer an
|
||
RFC with contact name STATUS and byte count 5 (no "arguments" allowed
|
||
in the data field of the packet) by returning an ANS packet whose data
|
||
field contains: the name of the host in ascii, an octal 200 (any byte
|
||
with the sign bit on terminates the name), and additional status and
|
||
metering information to be defined later, perhaps in a site-dependent
|
||
way. This makes it possible to write a program which determines the
|
||
status of all nodes in the network; the program could either be driven
|
||
by a host table or could try all possible host addresses; the NCP should
|
||
respond promptly to the STATUS RFC rather than starting up a program
|
||
to handle it, if starting up a program would take more than a second
|
||
or two.
|
||
|
||
>>>Here is some narrative description of the NCP.
|
||
|
||
Each receiver (each end of each connection is a receiver, and also a
|
||
sender; think of receivers and senders as little actors inside the NCP)
|
||
has a list of buffers containing packets which have been successfully
|
||
received and are waiting to be read by the user process, and two
|
||
packet# variables. One is the number of the last packet read by the
|
||
user process. The other is the number of the last packet which has
|
||
been acknowledged. If these two are not equal, the receiver needs to
|
||
send an acknowledgement "soon."
|
||
|
||
The received-packet list needs to be kept sorted by packet number, and
|
||
the NCP has to make sure that the user process does not see duplicates
|
||
or omissions. If packets arrive out of order, the NCP has to sort them.
|
||
This means that the user process may not be able to read a packet even
|
||
though the receive list is non-empty, because the first packet in the
|
||
receive list is not the successor of the last packet read by the user
|
||
process.
|
||
|
||
A good way to do this is to have two lists of received packets, each
|
||
of which is sorted. The first list contains those packets which
|
||
the user process may read; the first in the list is the successor
|
||
to the last packet read, and there are no gaps in the list.
|
||
The second list is the remaining packets, which the user may not
|
||
read until some other packet arrives to fill in the gap. Each
|
||
list needs a pointer to head and tail, and the packet number of
|
||
the next packet to be appended to the tail.
|
||
|
||
It is not actually a requirement that an NCP support out-of-order
|
||
packets, rather than simply discarding them and hoping that they
|
||
will be retransmitted in order, but if it's not very hard to do
|
||
one might as well do it.
|
||
|
||
Acknowledgements are sent by putting the last-received variable into
|
||
the "ack packet #" field of an outgoing packet on the opposite
|
||
direction of the appropriate connection, and copying the last-received
|
||
variable into the last-acknowledged variable. Where does the outgoing
|
||
packet come from? First of all, all outgoing data, SNS, and STS
|
||
packets automatically carry acknowledgement for the reverse direction
|
||
of their connection. So if an outgoing packet happens to be sent at a
|
||
time when an acknowledgement is necssary, that takes care of it.
|
||
|
||
Secondly, if the number of outstanding unacknowledged packets is more
|
||
than 1/3 the window size, a STS should be generated and sent immediately
|
||
to acknowledge those packets before the sender fills up the window.
|
||
|
||
Thirdly, the "soon" of four paragraphs back is implemented by a timeout
|
||
in the NCP. If an acknowledgement remains required for a certain amount
|
||
of time, a STS should be generated and sent to carry it. The appropriate
|
||
time interval is 1 second, I would guess. This timeout does not have
|
||
to be too exact, however. One could also not bother with this and let
|
||
the other end's probe timeout trigger the STS via a SNS. However, it
|
||
is desirable to send a receipt fairly soon after receiving a packet
|
||
to avoid useless retransmission. This could be done either by a timeout
|
||
or by sending a receipt when the packet is received for the second time.
|
||
[No known NCP has such a timeout. 10/2/78]
|
||
|
||
The reason for having a timeout here, rather than just sending an
|
||
acknowledgement right away, is two-fold. It allows "batching" of
|
||
acknowledgements, where a single packet can be used to acknowledge
|
||
many packets, which halves the network traffic caused by bulk
|
||
data transfer. It also allows "piggy-backing" of acknowledgements
|
||
on data packets, which (for instance) decreases the network traffic
|
||
caused by remote-echoing telnet connections.
|
||
|
||
When a receiver receives a data packet, it compares the packet # of
|
||
that packet with the last-received variable. If it is less or equal
|
||
(modulo 2^16), it is a duplicate of a packet already given to the user,
|
||
and should be discarded (or it is at least 30000 packets ahead of the
|
||
user, which is unlikely.)
|
||
|
||
If it is greater, it is sorted into the received-packet list at the
|
||
appropriate position (if it has the same packet# as a packet already in
|
||
that list, it is a duplicate and is discarded.) It is NOT acknowledged
|
||
at this time; no packet is ever acknowledged until it has been given to
|
||
the user process ("end to end acknowledgement"). Since a packet on the
|
||
received packet list has not yet been acknowledged, it may be safely
|
||
discarded at any time if the operating system runs out of buffer space,
|
||
PROVIDED that it has not yet been receipted.
|
||
Also, if the receiving user process is not listening to the net, the
|
||
NCP cannot be swamped with arbitrary numbers of packets, since the
|
||
sending user process is not supposed to send more than window-size
|
||
packets past the last one the receiving user process read.
|
||
However, if receipts are being used, once a receipt has been sent for
|
||
a packet that packet may not be discarded. It is up to the individual
|
||
NCP which strategy it prefers to use.
|
||
|
||
Note that the most likely cause of packet duplication is that an
|
||
acknowledge or a receipt was lost, so whenever a duplicate packet is
|
||
discarded, a STS packet should be sent back containing the current
|
||
receipt and acknowledge packet numbers.
|
||
|
||
The sender has a list of packets which have been entrusted to it by the
|
||
user for transmission and one packet # variable, the number of the last
|
||
packet sent by the user. When the user next sends a packet, the sender
|
||
will increment this variable and set the packet# of the sent packet to
|
||
the result. The sender also sets the source and destination host
|
||
numbers and indices of the packet, sets the "ack packet #" to the
|
||
last-received variable of its corresponding receiver, sets the
|
||
receiver's last-acknowledged variable to that (clearing the receiver's
|
||
need-an-acknowledge flag), chooses a transmission medium by checking
|
||
the routing tables, and gives the packet to the transmission medium for
|
||
"immediate" transmission (perhaps it has to wait its turn in a queue.)
|
||
It also saves the packet on a list, in case retransmission is required.
|
||
|
||
With each buffered packet the sender holds in trust, it remembers the time
|
||
that packet was last transmitted. From time to time "retransmission"
|
||
occurs. The sender gives one or more packets from its list to the
|
||
transmission medium. It always starts with the oldest, so as to keep
|
||
things in order, and sends the rest in order until it gets to one that was
|
||
transmitted too recently to do again. Retransmission is used to recover
|
||
from lost or damaged packets, lost or damaged acknowledgements, and packets
|
||
discarded by the receiver due to lack of buffering capacity.
|
||
|
||
Each time a receiver receives a packet, it gives the "ack packet #"
|
||
from that packet to its corresponding sender. The sender discards any
|
||
packets with numbers less than or equal to that, since their successful
|
||
receipt has just been acknowledged, and advances the window. If a STS
|
||
packet is received, its receipt field is processed by discarding
|
||
packets, but the window is not advanced.
|
||
|
||
>>Dynamic Window-size Adjustment
|
||
|
||
This section has not been updated for receipts. Also, it is a bunch
|
||
of junk. Probably we can do without this stuff.
|
||
|
||
Permit me to stress that this stuff is optional for small NCPs.
|
||
|
||
The goals of flow control are:
|
||
1. Error recovery.
|
||
2. If the receiver is faster than the sender, avoid unnecessary
|
||
delays in transmission due to having to wait for an
|
||
acknowledge or having to wait for the sender process to wake up.
|
||
3. If the sender is faster than the receiver, minimize
|
||
retransmissions due to receive buffer overflow.
|
||
4. Minimize the number of otherwise-useless packets generated
|
||
to carry an acknowledgement or a window-size negotiation,
|
||
and minimize useless retransmissions.
|
||
|
||
Consequences of the goals:
|
||
1. All packets will be retransmitted until acknowledged.
|
||
2. The sending NCP must buffer several packets, and packets
|
||
must be acknowledged in groups, not one-by-one.
|
||
3. If the receiver is slow, something must force the sender
|
||
not to send packets too fast.
|
||
4. The interval between retransmissions should not be too small.
|
||
It may be desirable for it to increase if the receiving
|
||
process is not listening for some reason.
|
||
|
||
The window size is the maximum number of packets which may be in the
|
||
network at one time (for one direction of one connection). "In the
|
||
network" means output by the sending process and not yet input by
|
||
the receiving process. (These processes are the entities which
|
||
determine the rate, unless they are so fast that the network slows
|
||
them down.)
|
||
|
||
The window size is not the number of packets acknowledged at a time;
|
||
for best operation the latter must be 1/2 to 1/3 of the former.
|
||
See below.
|
||
|
||
If the sending process is slow (and determines the rate), things
|
||
are relatively simple. We just have to have a big enough window
|
||
size and frequent enough acknowledgement to cover for sending
|
||
process wakeup delays.
|
||
|
||
If things are not limited by the sender, then
|
||
Window size
|
||
Flow rate = ---------------
|
||
Round trip time
|
||
|
||
Round trip time = time to wake up sender process (multiplied
|
||
by the fraction of the time this
|
||
is necessary)
|
||
+ time packet is sitting in sender buffers
|
||
before it is transmitted
|
||
+ transit time through the net
|
||
+ time packet is sitting in receiver buffers
|
||
before it is read; this is the maximum
|
||
of time to process previous packets
|
||
and time to wakeup sender process
|
||
+ time until acknowledge is generated and sent
|
||
+ transit time through the net
|
||
|
||
The round trip time is the time between when packet N is output by the
|
||
sending process and when it is acknowledged, permitting the sending
|
||
process to output packet N+WS.
|
||
|
||
The main variable components of the round trip time are the delay
|
||
before acknowledgement and the delay waiting in the receiver buffer for
|
||
packets to be processed. If these were zero, the round trip time would
|
||
consist of two process wakeups and two network transit times
|
||
(determined by the delay waiting for the cable and waiting for previous
|
||
packets from this host to be transmitted, the time needed to load and
|
||
unload the interface in the buffer, and the actual transmission time,
|
||
multiplied by the number of bridges in the path.)
|
||
|
||
This ideal round trip time is probably on the order of 2 seconds.
|
||
The timeout for retransmission should be 2 to 3 times the round trip
|
||
time. The timeout for acknowledgement should be 1/3 to 1/2 the
|
||
round trip time. One could either measure the actual round trip time,
|
||
or use an estimate of say 3 seconds, a little higher than the ideal.
|
||
It would be a good idea to measure the round trip time in any case,
|
||
which is done by getting the elapsed time since transmission when
|
||
a packet is discarded due to its being acknowledged, and averaging
|
||
that.
|
||
|
||
The receiver process should initially set the window size to the
|
||
maximum flow rate it wants to handle times the desirable round trip
|
||
time.
|
||
|
||
Symptoms of improper window size:
|
||
|
||
If the window-size is too large, the round trip time becomes
|
||
long due to packet processing delay in the receiver buffer.
|
||
(There will be many packets in the receiver buffer, and the
|
||
receiver will be processing them slowly.) The long round-trip
|
||
time will cause unnecessary retransmissions. Retransmissions
|
||
could also be caused by the NCP's discarding received packets
|
||
due to insufficient core to buffer them.
|
||
|
||
If the window-size is too small, excessive process blocking
|
||
and waking up occurs. The receiver process often empties its
|
||
buffer and has to block until more packets arrive. The sender
|
||
process often fills up its buffer and has to block until
|
||
some of the buffered packets are acknowledged. A small window
|
||
size also causes acknowledgements to have to be sent more
|
||
frequently than necessary. Note that from the receiver's
|
||
point of view it is impossible to distinguish between the
|
||
window size being too small and the sending process being
|
||
too slow.
|
||
|
||
Here is a scheme for dynamic adjustment of the window size:
|
||
|
||
Note that window-size adjustments cannot take effect
|
||
(in the sense of fixing the symptoms) immediately, so it
|
||
is necessary to limit the rate at which the window size
|
||
is adjusted.
|
||
|
||
When the receiver receives (and discards) a duplicate of a
|
||
packet it already has in its buffer, this indicates either
|
||
that an acknowledgement was lost or that the window size
|
||
is too large. Since packets are assumed not be lost very
|
||
often, we may as well assume the window size is too large
|
||
and send a WIN packet to decrease it. Another possibility
|
||
would be to have the sender detect the long round-trip
|
||
time and complain to the receiver, who could adjust the
|
||
window size. The receiver must not decrease the window
|
||
size again until all packets currently buffered have
|
||
been read and acknowledged, indicating that the sender
|
||
has had a chance to decrease the number of packets
|
||
buffered at its end. A reasonable amount to decrease
|
||
the window size by is 1/3 of its current value.
|
||
|
||
When the sending process wants to output a packet, but the number of
|
||
packets already buffered is greater than or equal to the window size,
|
||
it should send a WTS, indicating that the problem is too small a window
|
||
size or too slow a receiver rather than too slow a sender. When the
|
||
receiving process wants to input a packet, but the buffer is empty, and
|
||
a flag is set indicating that a WTS has been received, it should send a
|
||
WIN packet adjusting the window size upward by 1/2 of its current value
|
||
(and clear the WTS-received flag). This is rate-limited by preventing
|
||
the sender from sending a second WTS until all the packets buffered at
|
||
the time the first WTS was sent have been acknowledged, indicating that
|
||
the receiver has had time to act on the first WTS.
|
||
|
||
The variables required. For both the sending and receiving sides, a
|
||
packet number which has to be acknowledged before WTS or WIN can be
|
||
sent again, and a flag saying whether this number is operative. Also,
|
||
a WTS-received flag in the receiver.
|
||
|
||
It is important to meter the performance of this mechanism and find out
|
||
whether it does anything and whether what it does is right.
|
||
|
||
Consider the possibilities of changing this into a more symmetric and
|
||
negotiation-based scheme, where the sender always initiates window size
|
||
changing and the receiver either agrees or ignores the request.
|
||
Consider using elapsed time as an additional rate-limiter (have to use
|
||
the other thing, too, so idle connections don't keep changing window
|
||
size; this may be deleteable if it is always sender-initiated.)
|
||
|
||
More notes on the subject of window-size too small.
|
||
This is identical to receiver too slow. The net flow rate
|
||
out of the sender is trying to be higher than that into
|
||
the receiver, so packets pile up in buffers at each end.
|
||
The round-trip becomes arbitrarily high to preserve the
|
||
equation and divide window size down enough to get the
|
||
flow rate.
|
||
|
||
The situation where the window-size is too small and we want to do
|
||
something about it has to be distinguished from two other situations.
|
||
One, the receiver is accepting packets slowly but the sender is also
|
||
sending them slowly. We don't want to change the window-size, because
|
||
it doesn't matter since packets aren't piling up, and at any time they
|
||
might both decide to go fast. Two, the receiver's net flow rate is
|
||
high, but its response time is long (it's taking packets in bursts).
|
||
Here the round-trip time is still long, but making the window size
|
||
smaller would make things worse.
|
||
|
||
The symptoms that distinguish the case where we want to make the
|
||
window-size smaller are: the round-trip time is long, the sender
|
||
buffer is full, and the number of packets acknowledged at a time is
|
||
small compared to the window size. Actually, the last two are sufficient,
|
||
since if the acknowledgement batch size is small, and we know it's
|
||
not the sender's fault, may as well decrease the window size
|
||
to save buffer space and decrease the round-trip time.
|
||
|
||
>>Uniquization
|
||
|
||
To avoid problems with packets left over from old connections
|
||
causing problems with new connections, we do two things. First of
|
||
all, packets are not accepted as input unless the source and
|
||
destination hosts and indices correspond to a known, existent
|
||
connection. By itself, this should be adequate, provided that
|
||
retransmission is only done by the originating host, not by intervening
|
||
gateways and bridges in the network. This is because we can safely
|
||
assume that when a host agrees to open a connection with a certain
|
||
index number at its end, it will give up on any previous connection
|
||
with the same index, therefore it won't retransmit any old packets
|
||
with that index once it has sent out a new RFC or OPN. The indications
|
||
are that our network will be "local" enough that indeed retransmission
|
||
will only be done by the original host.
|
||
|
||
Problems could still occur if packets get out of order, so that an OPN
|
||
establishing a new connection gets ahead of a data packet for an old
|
||
connection with the same index. To protect against this, it is
|
||
necessary to assure that at least a few seconds elapse before an index
|
||
number is reused. This could be done either by remembering when an
|
||
index is last used, or by reserving part of the 16-bit index number as
|
||
a uniquization field, which is incremented each time an
|
||
otherwise-the-same index is reused. This field needs to big enough to
|
||
cover for the maximum delay of an old data packet with the same index,
|
||
and depends on the rate of creation of connections. Which method is
|
||
chosen is at the discretion of each local NCP. Another necessary
|
||
assumption is that when a system crashes and is reloaded (thus
|
||
forgetting any remembered information about which indices were in use
|
||
when and so forth) that the time to reload it is more than a few
|
||
seconds.
|
||
|
||
Problems could occur not only with left over data packets, but also
|
||
with left over control packets. This isn't too much of a problem since
|
||
control packets are not retransmitted, but it could still happen that a
|
||
host gets faked out into thinking that it has a connection to another
|
||
host that the other host doesn't know about. In this case, it should
|
||
just look like the connection was opened and then either the other host
|
||
went down or the connection was broken by a LOS packet, since the other
|
||
host won't generate any data packets and won't accept any.
|
||
|
||
>>Media handlers
|
||
|
||
A host may be connected to more than one transmission medium. It has
|
||
service programs for each.
|
||
|
||
When a packet is received that is not addressed to this host, the
|
||
forwarding count should be incremented. If it doesn't overflow, the
|
||
packet should be sent back out according to the routing tables,
|
||
otherwise it should be discarded. Normally it would not be useful to
|
||
send a packet back out on the same subnet it came in on, but we may as
|
||
well let the forwarding count catch this along with other loops.
|
||
|
||
When a packet is received, if the opcode is RFC, it is handled
|
||
specially. The contact name is compared against those of all the
|
||
indices which are in the Listening state. If a match is found, that
|
||
index is put into the RFC-received state, its LSN packet is discarded,
|
||
and the RFC packet is put into its input list so that the server
|
||
process can see it. If no server is listening to that contact name,
|
||
the RFC packet is placed on the pending-RFC list, and (in the case of
|
||
ITS) a server process is created which will load itself with a suitable
|
||
program to open an index in "server" mode, gobble an RFC packet, look
|
||
at the contact name, and either reload itself with the appropriate
|
||
server program or send a CLS reply.
|
||
|
||
When a non-RFC packet is received, the system must look for a receiver
|
||
index to handle it. If none is found, or the state is wrong, or the
|
||
source host and index don't match, a LOS should be sent unless the
|
||
received packet was a LOS. Otherwise, if the received packet is WIN,
|
||
WTS, or NOP, it is processed and discarded. Other packets are given to
|
||
the user; OPN, CLS, and LOS cause a state change but are also given to
|
||
the user as input.
|
||
|
||
The transmitting side of a transmission medium handler has a queue of
|
||
packets to be transmitted. It should send them out, in order, as fast
|
||
as possible, except that if a receiving host has no buffer space (which
|
||
can be detected because its chaosnet interface will cause
|
||
"interference" on the ether), it should look down the list for another
|
||
host to send to. [No known NCP bothers to look for another
|
||
host to send to. 10/2/78] As long as packets to the same host are sent in the
|
||
order they are queued, everything will be all right. (Actually, this
|
||
normally shouldn't matter much.) In addition, when the packets are put
|
||
into the transmit queue, the destination host number has to be looked
|
||
up in a table to determine which transmission medium to use to get to
|
||
it and (in the case of ether) which physical host number to put in the
|
||
packet trailer for the hardware.
|
||
|
||
>>Buffers
|
||
|
||
In ITS, the buffering scheme will be as follows. There will be a pool of
|
||
128-word packet buffers available. When it runs out, more can be made. When
|
||
there are many free some can be flushed. 128-word buffers are made out of
|
||
1024-word pages (adding a new page type), rather than using the existing
|
||
128-word buffer mechanism, because there is a limited number of 128-word
|
||
buffers, and that mechanism is a little painful to use. There are likely
|
||
to be several hundred (?) packet buffers (say 12K of core) in use when
|
||
high-speed (e.g. mag-tape speed) file transfer is going on.
|
||
|
||
Each packet buffer has a two-word header, and 126 words which can hold a
|
||
packet. Packet buffers can be on one (or sometimes two) of six lists:
|
||
The free list. The receive list, of which there are two for each
|
||
index, one for packets which the user may safely read and one for
|
||
out-of-order packets which are awaiting the arrival of an earlier
|
||
packet before the user may read them. The send list, of which there is
|
||
one for each index. The transmission list. The pending-RFC list. The
|
||
pending-LSN list.
|
||
|
||
The free list contains packet buffers which are free. They are threaded
|
||
together through addresses in the first word. Zero ends the list.
|
||
|
||
The transmission list contains packets which are to be transmitted out
|
||
on the network "immediately". At interrupt level packets are pulled
|
||
off of this list and sent. (There might be more than one transmission
|
||
list if a machine is connected to more than one physical medium.)
|
||
The transmission list is threaded through addresses in the left half
|
||
of the first word. Zero ends the list. After transmission -1 is stored
|
||
to indicate that the packet is not on the transmission list any more.
|
||
If the right half of the first word is -1, indicating that the packet
|
||
is not also on a send list, it is returned to free.
|
||
|
||
Each send list contains packets for a particular connection which have
|
||
been entrusted to the system by the user to be sent, but have not yet
|
||
been acknowledged. They are threaded together through the right half
|
||
of the first word. The second word contains the time that the packet
|
||
was last transmitted (actually, the time that it was last put on the
|
||
transmission list.)
|
||
|
||
Each receive list contains packets which have been received on a particular
|
||
connection and not yet read by the user. They are threaded together
|
||
by addresses in the first word, and the list ends with zero. The receive
|
||
lists must be kept sorted by packet number.
|
||
|
||
The pending-RFC list contains request-for-connection packets which have
|
||
not yet been either accepted or rejected. They are threaded together
|
||
through the first word. When a server process finishes getting created
|
||
and loaded, it will take an RFC off the pending-RFC list and put it
|
||
on its own receive list. The second word of these packets contains
|
||
the time received so that the system can know when something has gone
|
||
wrong and they should be thrown away.
|
||
|
||
The pending-LSN list contains LSN packets for all the listening users.
|
||
These packets are just used as a handy place to save the contact name
|
||
being listened to. They are threaded together through the first word.
|
||
The source-index field in the packet header can, of course, be used
|
||
to find which user this packet belongs to.
|
||
|
||
>>ITS System Calls
|
||
|
||
(Other systems would have similar calls, with appropriate
|
||
changes for their own ways of doing things.)
|
||
|
||
OPEN
|
||
|
||
Not allowed. (I said this wasn't a "standard" device!)
|
||
Instead use:
|
||
|
||
CHAOSO
|
||
|
||
arg 1 - receive channel number
|
||
arg 2 - transmit channel number
|
||
arg 3 - receive window size
|
||
|
||
First, the two specified channels are closed. Then an index
|
||
is assigned to the user and the two channels are set up to
|
||
point to it. Two channels are used since in general ITS
|
||
channels are unidirectional, and to allow to the user to
|
||
handle receive and transmit interrupts differently.
|
||
|
||
The created index is placed in the Closed state. To set up
|
||
a connection, IOT an RFC or LSN packet down the transmit
|
||
channel.
|
||
|
||
|
||
PKTIOT
|
||
arg 1 - channel number
|
||
arg 2 - address of a 126.-word block.
|
||
|
||
Always transfers exactly one packet.
|
||
The format of the 126.-word block is:
|
||
16 16 4
|
||
-----------------------------------------
|
||
| opcode | unused | fc | nbytes | 0 |
|
||
-----------------------------------------
|
||
|destination host |destination index| 0 |
|
||
-----------------------------------------
|
||
| source host | source index | 0 |
|
||
-----------------------------------------
|
||
| packet # | ack packet # | 0 |
|
||
-----------------------------------------
|
||
| data1 | data2 ...
|
||
|
||
... data487 |
|
||
-----------------------------------------
|
||
|
||
In the descriptions below, if an error is said to
|
||
occur that means IOC error 10. (channel in illegal mode) [3?]
|
||
is signalled. (Probably change this to an error return?) *******
|
||
|
||
In the case of an output PKTIOT, the user sets only
|
||
the opcode, nbytes, and data-n fields. When the
|
||
NCP copies the packet into a buffer in the system
|
||
it sets the other fields of the header to the
|
||
appropriate values.
|
||
|
||
This is not completely true. When outputting an RFC,
|
||
the user sets the destination host field, and sets the
|
||
ack packet # to the receive window size desired. The user
|
||
also sets the window size when outputting an OPN.
|
||
|
||
The NCP checks for the following special values
|
||
in the opcode field of a packet output by the user:
|
||
|
||
RFC - error if the index is not in the Closed state.
|
||
The packet is transmitted (but not queued for
|
||
possible retransmission) and the index enters
|
||
the RFC-sent state. The user should do an input
|
||
PKTIOT which will wait for the OPN, CLS, FWD, or ANS reply
|
||
packet to arrive. The NCP also copies and saves
|
||
the user-specified host number and window size.
|
||
|
||
LSN - error if the index is not in the Closed state.
|
||
It is put into the Listen state. The packet
|
||
is not transmitted, but it is saved so that
|
||
when an RFC comes in the system can compare
|
||
the contact names. (Note- LSN is a special
|
||
opcode which is never actually transmitted
|
||
through the net.) The pending-RFC list is searched
|
||
to see if an RFC with the same contact name has
|
||
been received. If so, it is given to this index
|
||
as if it was received just after the LSN was
|
||
sent out.
|
||
|
||
OPN - error if the connection is not in the RFC-received
|
||
state. It is put into the Open state. The
|
||
packet is transmitted (but not queued for
|
||
retransmission, since until it is received
|
||
the other end does not know what index to
|
||
send acknowledgements to.) The system also
|
||
copies and remembers the window size.
|
||
|
||
CLS - error if the connection is not in the Open
|
||
or the RFC-received state. It is put into
|
||
the Closed state and the packet is transmitted
|
||
(but not queued for retransmission). This packet
|
||
may optionally contain data bytes which are
|
||
an ascii excuse for the close.
|
||
|
||
FWD - error if the connection is not in the RFC-received
|
||
state. The packet is transmitted, but not queued
|
||
for retransmission, and the connection is put into
|
||
the Closed state.
|
||
|
||
ANS - error if the connection is not in the RFC-received
|
||
state. The packet is transmitted, but not queued
|
||
for retransmission, and the connection is put into
|
||
the Closed state.
|
||
|
||
200 or higher - This is a data packet. Error if the
|
||
connection is not in the Open state. A packet#
|
||
is assigned, the destination and source fields
|
||
are filled in, and the packet is transmitted and
|
||
queued for retransmission.
|
||
|
||
Any other opcode causes an error.
|
||
|
||
In the case of an input PKTIOT, the user will get an error
|
||
if the connection is in the Closed or Broken state,
|
||
except if it is in the Closed state and there are data
|
||
packets queued. This is so that the user can read the
|
||
CLS packet. Otherwise, it will hang until a packet
|
||
arrives, then return the packet into the user's
|
||
126.-word block.
|
||
|
||
The user should check the sign bit of the first word,
|
||
which will be set if this is a data packet. The
|
||
non-data packets which can get given to the user are
|
||
RFC, OPN, FWD, ANS, LOS, and CLS. (You shouldn't be
|
||
surprised if you get something else, though!)
|
||
|
||
|
||
IOT, SIOT
|
||
These can be used to do unit-mode 8-bit-byte transfers.
|
||
Control bit 1.4 means don't-hang, and applies to both input
|
||
and output. Only data packets with opcode 200 will be
|
||
transferred. Anything else on input causes the transfer
|
||
to stop, like an end-of-file. Use PKTIOT to find out what
|
||
the story is. (The correct way is to verify that there are
|
||
some packets in the input buffer, then do a SIOT, and if it
|
||
transfers 0 bytes then the first packet in the input buffer
|
||
must not be a data packet, so PKTIOT it in.)
|
||
|
||
There can be input available to SIOT even when the state is
|
||
not %CSOPN (e.g. if the input buffer contains data and
|
||
a CLS packet.) In this case, you should first SIOT (if you
|
||
care to pick up the data) then PKTIOT.
|
||
|
||
CLOSE
|
||
|
||
Immediately closes the connection. All buffers and other
|
||
information associated with the index are discarded. Normally
|
||
the user should first IOT a CLS
|
||
packet containing an ascii explanation for why it is
|
||
closing. Note that any data previously written on the
|
||
connection but not yet received by the other end will be
|
||
lost. User programs should exchange "bye" commands of some
|
||
sort before closing if they care about losing data. It is
|
||
done this way to keep the NCP simple.
|
||
|
||
|
||
RESET
|
||
|
||
Does nothing.
|
||
|
||
|
||
FORCE
|
||
|
||
If there is a partially-filled output packet (created by IOT
|
||
or SIOT), it is transmitted.
|
||
|
||
FLUSH
|
||
|
||
On an output channel, does FORCE and then waits until
|
||
there are no queued output buffers. I.e., waits for
|
||
all output to be received and acknowledged by the foreign
|
||
host. This in fact waits for acknowledge, not just receipt.
|
||
|
||
|
||
RCHST
|
||
|
||
val 1 SIXBIT/CHAOS/
|
||
val 2 0
|
||
val 3 0
|
||
val 4 0
|
||
val 5 -1
|
||
|
||
|
||
RFNAME
|
||
|
||
val 1 SIXBIT/CHAOS/
|
||
val 2 0
|
||
val 3 0
|
||
val 4 0
|
||
val 5 0 or 1 (i.e. .UAI or .UAO)
|
||
|
||
|
||
WHYINT
|
||
|
||
val 1 - %WYCHA
|
||
val 2 - state
|
||
val 3 - number of packets queued (receive,,transmit)
|
||
val 4 - window size (receive,,transmit)
|
||
val 5 - input channel#,,output channel# (-1 if not open or I/O-pushed)
|
||
|
||
LH(val 3) is the number of packets available to input IOT.
|
||
This is different from the number of received packets
|
||
if some are out of order. This is increased by 1 if
|
||
there is a partially-read buffer available to SIOT;
|
||
this packet is not available to PKTIOT. This is zero
|
||
if the connection is direct-connected to a STY.
|
||
|
||
RH(val 3) is the number of packets which have been transmitted
|
||
by output IOT but which have not yet been received and
|
||
acknowledged by the foreign host.
|
||
|
||
The state codes are:
|
||
|
||
%CSCLS Closed
|
||
%CSLSN Listen
|
||
%CSRFC RFC-received
|
||
%CSRFS RFC-sent
|
||
%CSOPN Open
|
||
%CSLOS Broken by receipt of "LOS" packet.
|
||
%CSINC Broken by incomplete transmission (no acknowledge
|
||
for a long time)
|
||
|
||
|
||
NETBLK
|
||
|
||
Similar to Arpanet NETBLK.
|
||
|
||
|
||
STYNET
|
||
|
||
arg 1 - STY channel
|
||
arg 2 - Chaos channel to connect to, or
|
||
-1 to disconnect
|
||
arg 3 - Other Chaos channel (not actually used)
|
||
arg 4 - Output-reset character sequence, up to 3 8-bit
|
||
characters left-justified.
|
||
|
||
This works the same as on the Arpanet. The specified STY
|
||
is connected to or disconnected from a Chaos net channel.
|
||
Data is transferred in and out by the system without the
|
||
need for intervention by the user program. If an unusual
|
||
condition occurs, the STY is disconnected from the Chaos
|
||
channel and the user is interrupted. It is illegal to do
|
||
I/O on any of the involved channels while they are connected.
|
||
|
||
This call is provided for the benefit of the "Telnet" server.
|
||
|
||
|
||
CHAOSQ
|
||
|
||
arg 1 - address of a 126.-word block (packet buffer)
|
||
|
||
This is a special system call for use by the ATSIGN CHAOS
|
||
program, which is a daemon program that gets run when
|
||
an RFC is received that does not match up against an
|
||
existing LSN.
|
||
|
||
The first packet on the pending-RFC queue is copied
|
||
into the packet buffer, then moved to the end of the
|
||
queue (so that the right thing happens when several
|
||
RFC's are pending at the same time.)
|
||
|
||
The call fails if the pending-RFC queue is empty.
|
||
|
||
The program should use the contact name in this
|
||
packet to choose a server program to execute. This
|
||
server program will then LSN to (presumably) the same
|
||
contact name, thus picking up the RFC.
|
||
|
||
Interrupts
|
||
|
||
IOC error interrupts occur if an attempt is made to IOT
|
||
when the connection is in an improper state, or to IOT
|
||
a packet with an illegal opcode.
|
||
|
||
An I/O channel interrupt is signalled on the input channel
|
||
when the number of queued buffers changes from zero to
|
||
non-zero.
|
||
|
||
An I/O channel interrupt is signalled on the output channel
|
||
when the number of queued buffers changes from greater or
|
||
equal to the window size, to less than the window size.
|
||
|
||
An I/O channel interrupt is signalled on the input channel
|
||
when the connection state changes.
|
||
|
||
Interrupts can be used for
|
||
|
||
(1) detecting when input arrives.
|
||
(2) detecting when the system is willing to accept
|
||
output.
|
||
(3) detecting when the other end does a CLOSE.
|
||
(4) detecting when a requested connection
|
||
is accepted or rejected.
|
||
(5) detecting when a request for connection
|
||
comes into a listening server.
|
||
|
||
>ITS packages
|
||
|
||
Here document CHSDEF and NNETWK someday. Also the host table.
|
||
|
||
>An NCP in English
|
||
|
||
This section contains the salient routines and variables of the ITS
|
||
Chaos Network Control Program, in English.
|
||
|
||
The following variables exist per-connection:
|
||
|
||
CHSUSR the user who owns the connection, and his channel number for
|
||
it. This is used for reporting interrupts, and to keep track
|
||
of who owns what.
|
||
|
||
CHSSTA the state. The following states exist:
|
||
Closed - this is the initial state
|
||
Listening - awaiting an RFC that matches the user's LSN.
|
||
RFC-received - entered from the Listening state when a matching
|
||
RFC arrives.
|
||
RFC-sent - entered from the closed state when an RFC is transmitted.
|
||
Open - entered from RFC-sent when an OPN is received, from RFC-received
|
||
when an OPN is sent. This is the "active" state.
|
||
Lost - entered when a LOS is received
|
||
Incomplete transmission - entered when probing produces no response.
|
||
|
||
Some flags also exist:
|
||
Send STS as soon as possible
|
||
Connection is direct-connected to a STY.
|
||
|
||
CHSIBF List of received packets, in order, which the user may read.
|
||
|
||
CHSPBF List of out-of-order received packets. The user may not read these.
|
||
When the missing packets arrive, these are transferred to CHSIBF.
|
||
|
||
CHSOBF List of transmitted packets which have not yet been receipted. This
|
||
list retains the packets for retransmission. Control packets do
|
||
not go on this list, only data packets. RFC's and OPN's DO go on this list.
|
||
|
||
CHSNOS Number of output slots. This is the number of packets which the user
|
||
may output before the window fills and the user must wait. It is equal
|
||
to the window size minus the number of unacknowledged packets.
|
||
|
||
CHSPKN The number of the last packet given to the user, and the number of
|
||
the last packet sent by the user. Used in assigning packet numbers,
|
||
checking order, and sending acknowledgements.
|
||
|
||
CHSACK The numbers of the last packets acknowledged in each direction
|
||
of the connection.
|
||
|
||
CHSNBF The number of packets in CHSIBF and the number of packets in CHSPBF.
|
||
Redundant information which is handy now and then.
|
||
|
||
CHSITM The time (in 30ths of a second) that a packet was last received
|
||
from the network for this connection. Used in probing/incomplete-transmission.
|
||
|
||
CHSWIN The two window sizes for the two directions of the connection.
|
||
|
||
CHSLCL The local host and index numbers.
|
||
|
||
CHSFRN The foreign host and index numbers.
|
||
|
||
---
|
||
When the user reads a packet, this algorithm is executed:
|
||
If the CHSIBF list is empty, then if the connection state is
|
||
not Open, give an error, otherwise await the arrival of a packet
|
||
into CHSIBF. Remove the first packet in CHSIBF and give it to
|
||
the user. If it is a data packet (opcode >= 200), "virtually
|
||
acknowledge" it as follows. Put its packet number into CHSPKN
|
||
as the last packet given to the user. Get the last packet number
|
||
acknowledged from CHSACK. If the difference of these is more than
|
||
1/3 the window size, send a STS.
|
||
|
||
When the user outputs a packet, this algorithm is executed:
|
||
(what algorithm?)
|
||
|
||
---------- Material after this point may be inoperative ----------
|
||
|
||
>Comparison with LCSnet
|
||
, and other blathering.
|
||
|
||
>>Principle differences
|
||
|
||
The LCSnet proposed protocol is called DSP. The Chaosnet protocol will
|
||
just be called chaos in this section.
|
||
|
||
(1) DSP specifies things in terms of bytes where Chaosnet specifies
|
||
them in terms of packets. We choose packets to increase the simplicity
|
||
and efficiency of the scheme. DSP has to work in terms of bytes because
|
||
it allows packets to be reformatted en route, hence
|
||
|
||
(2) DSP assumes that gateways can exist between networks with the same
|
||
protocols but different packet sizes. Therefore, the protocol has to
|
||
allow for the fact that packets may be reformatted en route. I happen
|
||
to believe that this situation is extremely unlikely to exist, and in
|
||
fact gateways between "different" networks will have to do much more
|
||
than just change the packet size. Therefore, it makes sense to make
|
||
the gateway worry about gateway issues, rather than have them permeate
|
||
the whole protocol. I believe that gateways will look more like
|
||
regular network ports than like transmission media; to get from a host
|
||
on the local net to a host on the arpa net, one will connect to the
|
||
arpa net gateway and ask it to open a connection from itself to the
|
||
host on the arpa net, then tie those connections together. A gateway
|
||
will perform not only packet reformatting, but protocol translation,
|
||
flow control on both sides, and maybe even character set translation.
|
||
There can also be entities called "bridges", which connect two networks
|
||
(or two separate segments of one network) with the same protocol. A bridge
|
||
simply forwards any packets it receives, but never alters the packets,
|
||
and never looks inside them except to find out where to forward them to.
|
||
|
||
(3) A related difference is that DSP includes the arpa net, and TCP,
|
||
and by extension all the networks in the universe, in its port number
|
||
address space. Chaosnet would have you connect to a gateway, then
|
||
send the gateway the port number to connect to in the foreign
|
||
address space separately. Chaosnet does have to include all networks
|
||
reachable by bridges in its address space.
|
||
|
||
(4) Chaosnet has an "opcode" field in the packet header, where DSP
|
||
does not. DSP acheives the same effect with various bits here and
|
||
there. It makes little difference unless user-level programs decide
|
||
to exploit the opcode field.
|
||
|
||
(5) DSP and Chaosnet have quite different mechanisms for creating
|
||
connections. In DSP, one never creates a connection, exactly;
|
||
one simply starts sending to a port address. Local network note
|
||
#3 mumbles about how one discovers which port address to send to,
|
||
but I have not seen any specifics. In Chaosnet, the mechanism
|
||
for finding out where to send to and the mechanism for creating
|
||
a connection are intertwined; the excuse is that often the process
|
||
being connected to is created at the same time as the connection.
|
||
|
||
(6) DSP uses unique, never-reused port IDs. Chaosnet does not.
|
||
The problem with unique, never-reused IDs is that I know of no
|
||
system that can implement them. Multics comes close, with the
|
||
aid of a special hardware clock. The clock is set from the
|
||
operator's watch when the system is powered on, and the mechanism
|
||
depends on the fact that the error in the operator's watch is
|
||
less then the time required to bring up the system after a power
|
||
failure. Small systems that cannot afford special hardware just
|
||
for this, and don't have permanent disk storage, would find it
|
||
very hard to generate unique IDs.
|
||
|
||
Chaosnet prefers to rely on a scheme that doesn't require special
|
||
hardware, but nearly always works. By requiring a connection
|
||
to be opened before data packets can be sent through it, and by
|
||
some assumptions about the structure of the network, the problem
|
||
is eliminated. See the Flow and Error Control section for
|
||
further discussion.
|
||
|
||
(7) DSP closes the two directions of a connection separately. Why?
|
||
|
||
>>High priority data packets, interrupts, and flushing.
|
||
|
||
The basic idea is to note that if you want to send a high priority
|
||
message, this means you want it out of order with respect to previously-
|
||
sent data on some connection. Therefore, high priority data should
|
||
be sent over an auxiliary connection. The per-connection overhead
|
||
is not prohibitively high, and this eliminates vast quantities of
|
||
hair from the innermost portion of the system.
|
||
|
||
One advantage that DSP gains by having "high priority messages"
|
||
built into the system is that it also incorporates a standardized
|
||
way to "mark" a particular point in a data stream. However, this
|
||
is comparatively unimportant, particularly since I think high-priority
|
||
messages will probably never get used. The only place I've heard
|
||
them proposed to be used is with Telnet, but ordinary terminals
|
||
get along quite well without "out of band" signals when used with
|
||
reasonable operating system software.
|
||
|
||
Interrupts and flushing of input are random crocks associated
|
||
with high priority messages. I don't propose to implement them either.
|
||
|
||
>>Datagrams. (connections only used to pass a single packet.)
|
||
|
||
These are easy. The guy who wishes to send a datagram does
|
||
OPEN, IOTs an RFC to the service to which the gram is to be
|
||
sent, and NETBLKs waiting for the connection to open up. He
|
||
then IOTs the data packet, FLUSHes waiting for it to get there,
|
||
then CLOSEs.
|
||
|
||
The server OPENs and IOTs an OPN in response to the RFC. She
|
||
then IOTs in the datagram packet, CLOSEs, and goes off processing
|
||
the message.
|
||
|
||
Four packets are transmitted, two in each direction. (An RFC, an OPN,
|
||
a DATA, and an ACKing NOP.) No need to send any CLS messages, since
|
||
each user process knows to do a CLOSE system call after one data
|
||
packet has been transmitted. It has been claimed that this is
|
||
the theoretical minimum if acknowledgement is required. The reason
|
||
is that the data packet must contain some unique id generated by
|
||
the RECEIVER to avoid duplicates, and it must be acknowledged,
|
||
so that's two packets in each direction, with no combining possible.
|
||
|
||
Note that as [someone at] PARC has pointed out, for the important
|
||
case of side-effect-free transactions, a timeout can substitute
|
||
for acknowledgement, and only two packets are necessary. See ANS.
|
||
|
||
|
||
>>Why not multiple messages per packet?
|
||
|
||
[1] Not needed for data. The division of the data stream into
|
||
packets is invisble to the real user, anyway. It's only used by
|
||
the "user-ring" portion of the network system software.
|
||
|
||
[2] Extra complexity. Consider the hair involved with packed
|
||
control messages in the Arpanet. Because of the control link being
|
||
shared between multiple connections between the same pair of hosts,
|
||
this could save a little. I don't know of any NCP that does this;
|
||
furthermore, having that shared facility is a bad idea. The only
|
||
case in the Arpanet where packing two control messages into one
|
||
packet is useful is when opening a connection the receiver wants
|
||
to send STR and ALL both. In this protocol we just put the window
|
||
size in as part of the RFC and OPN messages.
|
||
|
||
[3] There is an argument that having message boundaries separate
|
||
from packet boundaries is useful because gateways between different
|
||
networks may need to split up packets because the two networks
|
||
may have different maximum packet sizes. My feeling about this
|
||
is that the gateway is likely to have to do a good deal more than
|
||
that. It seems like too much to wish for that the two networks
|
||
should use exactly the same packet format, protocols, or even character
|
||
set; so the gateway rather than being a packet-reformatting device
|
||
is much more likely to look like a user program with two connections,
|
||
one on one network and one on the other, which passes data between
|
||
the connections with appropriate conversion. In particular, flow
|
||
control is likely to be host1 to gateway and host2 to gateway,
|
||
rather than host1 to host2.
|
||
|
||
|
||
>>Why not have a checksum in the packet?
|
||
|
||
This network is likely to have a very diverse collection of machines
|
||
on it, which means it will be impossible to define a checksum which
|
||
can be computed efficiently in software on all machines. Now all
|
||
hardware links in the system ought to have whatever amount of
|
||
hardware checking is appropriate to them, but due to the efficiency
|
||
costs of a user-level end to end checksum, this should not be
|
||
a built-in requirement of the basic low-level protocol. Instead,
|
||
checksumming should be an optional feature which some higher-level
|
||
protocols (those that need it because the data being passed through
|
||
them is so vitally important that every possible effort must be made
|
||
to ensure its correctness) may implement. Checksumming should
|
||
be implemented at the user level in exactly the same way and for exactly
|
||
the same reasons as encryption should be implemented at the user level.
|
||
|
||
|
||
>>How about host-independent user-level protocols, where one just
|
||
connects to a service and doesn't have to know what host it's at today?
|
||
|
||
Yeah, how about it? As far as I know, this protocol provides an
|
||
adequate base for constructing such a thing. Also I haven't
|
||
seen anything published on the subject.
|
||
|
||
|
||
>>Very small hosts.
|
||
|
||
E.g. we'd like to put the Chess machine on the net. It has very little
|
||
memory, but not totally impotent microcode. A small host need only
|
||
support one connection, may ignore WIN, LOS, and CLS, may only have one packet
|
||
in transmission at a time, and may process receive packets one at a time
|
||
(ignoring any that come in until the first one has been fully processed).
|
||
It IS necessary to check that received DATA packets come in in the right order.
|
||
|
||
RFC may be handled by remembering the other guy's host number and index,
|
||
and sending back a completely canned OPN. The contact name is ignored.
|
||
If a second user tries to connect while a first user is connected,
|
||
the first user gets bumped. Let them fight it out on some larger
|
||
machine (or the floor) for who will get to use the small machine.
|
||
Never originate any packet type other than DATA and that one OPN.
|
||
|
||
Attaching ordinary terminals "directly" to the network is obviously
|
||
undesirable.
|
||
|
||
>Transmission Media
|
||
|
||
This section describes how packets are encapsulated for transmission
|
||
through various media, and what auxiliary hair is needed by each
|
||
medium.
|
||
|
||
|
||
>>Ethernet
|
||
|
||
The messages transmitted through the ether (or CAIOS) net consist of
|
||
a packet followed by a three-word trailer:
|
||
|
||
+----------------+
|
||
| header | 8 words
|
||
+----------------+
|
||
| data | 0-52 words
|
||
+----------------+
|
||
| immediate dest | 1 word
|
||
+----------------+
|
||
| immediate src | 1 word
|
||
+----------------+
|
||
| CRC check word | 1 word
|
||
+----------------+
|
||
|
||
The three trailer words are looked at by the hardware; the last two
|
||
of them are supplied by the hardware. The reason this stuff is in
|
||
a trailer rather than a leader is that the caiosnet hardware actually
|
||
transmits the packet backwards. However, this is transparent to
|
||
the software.
|
||
|
||
Bytes are sent two per word. The low-order byte is first (pdp11 standard).
|
||
|
||
|
||
>>TEN11 Interface
|
||
|
||
[The following total hair has not been checked recently.]
|
||
|
||
Since interrupts can't be sent through the TEN11 interface, the pdp10 can
|
||
only service the network at relatively-infrequent intervals, for instance
|
||
every 1/60'th of a second. Therefore it is necessary to have queues of
|
||
packet buffers in each direction. This provides high speed by allowing
|
||
several packets to be processed at a time.
|
||
|
||
The speed and reliability of the TEN11 interface eliminates any need for
|
||
error checking. (ha ha) [ho ho] <he he> To decrease the load on the AI
|
||
pdp10, it is assumed that the pdp11's will be responsible for swapping the
|
||
bytes in the data portions of packets so that they will be in pdp10
|
||
standard order.
|
||
|
||
Even though the contents of packets will not be error-checked, the
|
||
pdp10 must check addresses to avoid being screwed by a losing pdp11.
|
||
|
||
The form of a packet encapsulated for the TEN11 interface will then be
|
||
|
||
|-----------------|-----------------|----|
|
||
| queue thread | 0=empty, 1=full | 0 |
|
||
|-----------------|-----------------|----|
|
||
| #bytes | opcode | unused | 0 |
|
||
|-----------------|-----------------|----|
|
||
|destination host |destination index| 0 |
|
||
|-----------------|-----------------|----|
|
||
| source host | source index | 0 |
|
||
|-----------------|-----------------|----|
|
||
| packet # | ack packet # | 0 |
|
||
|-----------------|-----------------|----|
|
||
| data 0 | data 1 | data 2 . . . | 0 |
|
||
| | 0 |
|
||
|-----------------|-----------------|----|
|
||
|
||
for a total of 31 36-bit words, or 62 pdp11 words.
|
||
|
||
The queue thread is the pdp11 address of the next packet-buffer in
|
||
a queue, or zero if this is the last. The empty/full indicator
|
||
says whether this buffer currently contains a packet, or not.
|
||
|
||
The following is an attempt to express the algorithms of the
|
||
pdp10 and pdp11 in concise form. Hopefully they are self-
|
||
explanatory.
|
||
|
||
Several queues of buffers need to exist in the pdp11. Only
|
||
two of these are known to the pdp10.
|
||
|
||
TO10QF - first buffer queued for transmission to the 10.
|
||
|
||
TO10QL - last buffer queued for transmission to the 10.
|
||
Exists so that buffers can be appended to the
|
||
list more quickly.
|
||
|
||
TO10AC - first buffer in list of buffers actively being
|
||
gobbled by the 10. Set by 11, cleared by 10.
|
||
|
||
TO10FR - copy of TO10AC. Used to free the buffers after
|
||
the 10 gobbles them.
|
||
|
||
(come-from pdp11-packet-receivers
|
||
when (eq (destination-host ?packet) pdp10)
|
||
;re-arrange 8-bit bytes for 10
|
||
(swap-bytes (data-part packet))
|
||
;Add to to-10 queue
|
||
(set (thread packet) nil) ;nil=0
|
||
(cond ((null TO10QF)
|
||
(setq TO10QF packet TO10QL packet))
|
||
(t (set (thread TO10QL) packet)
|
||
(setq TO10QL packet)))
|
||
;Try to activate to-10 queue
|
||
(cond ((null TO10FR)
|
||
(setq TO10FR TO10QF
|
||
TO10AC TO10QF
|
||
TO10QF nil
|
||
TO10QL nil)))
|
||
)
|
||
|
||
(come-from pdp11-polling-loop
|
||
when (and (not (null TO10FR)) ;buffers were sent to 10
|
||
(null TO10AC)) ;and 10 has finished gobbling
|
||
(mapcar 'make-buffer-free TO10FR) ;mapcar follows queue words
|
||
(setq TO10FR nil) ;OK to activate more buffers now
|
||
(cond ((not (null TO10QF)) ; more stuff waiting, activate it now
|
||
(setq TO10FR TO10QF
|
||
TO10AC TO10QF
|
||
TO10QF nil
|
||
TO10QL nil)))
|
||
)
|
||
|
||
(come-from pdp10-clock-level
|
||
when (and (not (null TO10AC)) ;11 is sending buffers
|
||
(not web-buffers-locked))
|
||
;copy to user, process control message, or reject if buffers full
|
||
(mapcar 'process-input
|
||
TO10AC)
|
||
;signal pdp11 that all packets have been gobbled
|
||
(setq TO10AC nil))
|
||
|
||
|
||
FRBFLS - list of free buffers. cons-buffer gets from here,
|
||
make-buffer-free puts here.
|
||
|
||
FM10AC - list of buffers into which pdp10 may place packets
|
||
set by 11 / cleared by 10.
|
||
|
||
FM10GB - copy of FM10AC, used by 11 to process buffers after
|
||
10 has placed packets into them.
|
||
|
||
(come-from pdp11-polling-loop
|
||
when (and (null FM10GB) ;10 needs some buffers &
|
||
(not (null FRBFLS))) ; free buffers available
|
||
;give the 10 a list of a suitable number of empty buffers
|
||
(repeat max-at-a-whack times
|
||
(and (null FRBFLS) (exitloop))
|
||
(setq buffer (cons-buffer)) ;pull off free list
|
||
(set (full-indicator buffer) nil) ;contains no packet
|
||
(set (thread buffer) FM10GB) ;cons onto list
|
||
(setq FM10GB buffer))
|
||
(setq FM10AC FM10GB) ;give buffer list to 10
|
||
)
|
||
|
||
(come-from pdp11-polling-loop
|
||
when (and (not (null FM10GB)) ;gave 10 some buffers
|
||
(null FM10AC)) ;which it has used
|
||
;process packets sent from 10.
|
||
(mapcar
|
||
'(lambda (buffer)
|
||
(cond ((null (full-indicator buffer))
|
||
(make-buffer-free buffer)) ;didn't get used
|
||
(t (swap-bytes buffer)
|
||
(send-to-destination buffer))))
|
||
FM10GB)
|
||
(setq FM10GB nil)) ;no buffers active in 10 now
|
||
|
||
(come-from pdp10-clock-level
|
||
when (and (not (null FM10AC)) ;buffer space avail
|
||
(not web-buffers-locked)) ;no M.P. interference
|
||
;send as many packets as possible
|
||
(mapcar
|
||
'(lambda (buffer)
|
||
(cond ((needs-to-be-sent ?packet) ;find a sendable packet somewhere
|
||
(copy-into buffer packet)
|
||
(set (full-indicator buffer) t))))
|
||
FM10AC)
|
||
;signal pdp11 to gobble the packets
|
||
(setq FM10AC nil))
|
||
|
||
|
||
To avoid the need for a gross number of exec pages in the pdp10,
|
||
the FM10AC and TO10AC words, and all the packet buffers, should
|
||
lie in a single 4K page. The best location for this page varies
|
||
from machine to machine. On dedicated 11's such as the AI TV11,
|
||
the MC console 11, etc. it should probably just be the first 4K
|
||
of memory. On the logo machine, it would probably be best to put
|
||
this page up in high memory where RUG won't mess with it. In the
|
||
case of the mini-robot system, I'm not sure.
|
||
|
||
It would be best not to try to use this protocol with "general
|
||
purpose" machines, because of problems with finding the list
|
||
headers and packet buffers, problems with telling whether the
|
||
machine is running the right program, etc. It should be used
|
||
just as a way for the AI machine to get a path to the net.
|
||
|
||
>>DL10 & DTE20
|
||
|
||
[Outline only]
|
||
|
||
[Just use the pdp11 as a substitute for a direct chaosnet interface.]
|
||
|
||
[Basically, the 11 says ready to transfer (in either direction), the 10
|
||
sets up the pointers and says to transfer, and the 11 transfers the cruft.
|
||
To eliminate an extra level of buffering, on input transfers the 11 makes a
|
||
copy of the first 4 16-bit words of the header available to the 10 when it first
|
||
says "ready to transfer." The 10 uses these to decide where to copy the
|
||
packet into. It helps if you don't try to use a DL10 on a machine with a
|
||
cache.]
|
||
|
||
>>Asynchronous line
|
||
|
||
Packets are encapsulated by preceding them with start of text (202),
|
||
and following them with a 1-byte additive checksum and an end of text (203).
|
||
The 16-bit words are transmitted low order byte first. If the checksum
|
||
is wrong the receiver ignores the packet. The start and end characters are
|
||
just there to help in ignoring random noise on the line. If they don't
|
||
appear, the packet is ignored. The full 60-word packet is not transmitted;
|
||
the #bytes count is used to determine how many of the data bytes to
|
||
transmit; the receiver fills the remainder with zero (or garbage, at its
|
||
convenience.)
|
||
|
||
This protocol is intended mainly for communication between the plasma
|
||
physics pdp11 in bldg. 38 and a pdp11 in 545, until the caiosnet
|
||
gets extended that far (or a longer-distance, lower-speed caiosnet
|
||
is extended to various machines off in that direction.)
|