mirror of https://github.com/pkimpel/retro-220.git synced 2026-03-01 17:36:17 +00:00

Table of Contents

Using the BAC-Assembler
Introducing the BAC-Assembler

Background
User Interface

Assembler Notation

BAC-Assembler Source Files
Labels and Symbols
Operands

Address Primaries
Address Expressions

Machine Instructions
Pseudo-Instructions

Simple Pseudo-Instructions
CNST Pseudo-Instruction
FBGR Pseudo-Instruction

Pre-Loading the Literal Pool

Using the BAC-Assembler

The Burroughs Algebraic Compiler for the 220 (BAC-220 or BALGOL) was composed of four main parts:

The Generator program, used to customize the compiler for specific environments.
The Compiler Main module, which did a one-pass compilation of the BALGOL source program.
The Compiler Overlay module, which linked to the compiled BALGOL code any library routines and any machine-language routines that were included with the input to the compiler. This module also generated the necessary coding to support overlays and the symbolic dump feature, if used.
The library routines (SIN, COS, SQRT, READ, WRITE, etc.).

Two assemblers were used to prepare object code for the compiler, one for the Generator program, and one for everything else. We have no documentation or software in any form for either of these assemblers. Thus, in order to have a means to create object code from the compiler source listings donated to the Computer History Museum by Professor Donald Knuth, both assemblers had to be reverse-engineered from the listings.

This wiki page describes what we term the BAC-Assembler, used for all BALGOL modules except the Generator program.

Introducing the BAC-Assembler

BAC-Assembler is a cross-assembler. It is written in JavaScript and runs in a standard web browser. You can load the assembler from this project's hosting site:

http://www.phkimpel.us/Burroughs-220/software/tools/BAC-Assembler.html

You can also run the assembler from another web server where you have set up the emulator files. The assembler consists of one HTML file, with the JavaScript code and CSS style sheet embedded within it. It also depends upon the webUI/B220FramePaper.html and webUI/resources/ajax-spinner.gif files from the retro-220 emulator.

Background

The original assembler for the compiler Main, Overlay and library modules appears to have been an internal tool developed and used within the Burroughs ElectroData division. The assembler used a typical input record format with fixed label and op-code fields, and a comma-delimited operand field. The STAR I and STAR II assemblers that Burroughs made available to its 220 customers, on the other hand, used a strictly columnar record format.

Tom Sawyer uncovered a memo at the Charles Babbage Institute, dated 15 March 1960, from D. L. Stevens to L. P. Robinson, with subject "Erdwinn's ALGOL Compiler." In it Stevens reports on a day-long meeting and demonstration of the BALGOL compiler, which was still in development at the time. Based on statements in this memo, it is likely that the assembler used for BALGOL was "ESAP #1" (Engineering Symbolic Assembly Program #1), originally developed to support the design automation programs for the 3500 project. The 3500 project eventually produced what became known as the B100/200/300-series systems.

The primary purpose of the BAC-Assembler has been to generate object code for the transcribed BALGOL listings. Since we have no information on the original assembler that ran on the 220, the syntax and semantics for BAC-Assembler were determined by inspecting and reverse-engineering the listings of the BALGOL compiler. Although it is a complete assembler that can be used for general 220 programming, it implements only the features that were needed to assemble the BALGOL source code successfully. It does not attempt to be a faithful recreation of whatever assembler was used in 1960. That original assembler likely had other features that are not apparent from these listings and thus have not been implemented in the BAC-Assembler.

User Interface

When you open the assembler in a browser, you will see a page containing the controls and listing area described below.

The interface has two file-picker controls plus additional controls to specify output from the assembler. The large area with scrollbars will display any listing generated during an assembly. The page has a light-yellow background to distinguish it from the GEN-Assembler utility used for the BALGOL Generator program, as the user interface for both assemblers is almost identical.

The controls at the top of the page are:

List Pass 1 -- If this checkbox is ticked, the assembler will produce a listing during its initial pass. The listing will show address assignments but not generated object code. It is not normally useful for regular programming, and the checkbox is not ticked by default.
List Pass 2 -- If this checkbox is ticked, the assembler will produce a listing during its second pass. This listing includes address assignments and generated object code. This checkbox is ticked by default.
Write Checksum -- If this checkbox is ticked, the assembler will generate a checksum word for the generated code and output it as an additional word of object code on the medium selected by the next control. The checksum is simply the sum of all the words output by the assembler, ignoring the three high-order bits of each sign digit and discarding any arithmetic overflow.
Output Mode -- This pull-down list selects the medium to which the generated object code will be written. The choices are:
- No Object -- No object code will be produced by the assembler. This option may be useful for syntax-only assemblies or simply to generate a listing of the code.
- Loadable Deck -- Produces a standard 220 "format-6" image of a self-loading card deck in a separate temporary window. This is the default selection. Each card image holds up to six words of 220 code, and each card bootstraps the next card during the load process. The deck is formatted to be read from Cardatron input unit 1.
- Paper Tape -- Produces a self-loading paper tape image in a separate temporary window. The image is configured for paper tape reader 1. Bootstrap it with a 6 1000 04 0000 instruction in the C register.
- BALGOL ML Deck -- Produces a BALGOL "machine language" card deck image in a separate temporary window. This format also has up to six words of object code per card, but is formatted for use by the BALGOL Object Loader and Generator programs. This is the format in which library routines must be output. See Appendix F of the BALGOL Reference Manual under the heading "PREPARATION OF EXTERNAL PROGRAMS."
- Gen MEDIA Deck -- Produces a Generator Program INPUTMEDIA/OUTPUTMEDIA card deck. This format is used to supply custom input/output routines for use by the compiler and library. It has one word of object code per card, as described in Appendix F of the BALGOL Reference Manual under the heading "INPUT-OUTPUT PROCEDURES."
- Object Tape -- Produces a retro-220 emulator magnetic tape image in a separate temporary window. This image can be saved and used as described in the Using Magnetic Tape wiki page.
Extract Listing -- Clicking this button will select all of the text in the listing area of the assembler page. Once this is done, you can copy/paste the listing text into some other program for printing or saving to disk.
Pre-load Pool -- This control allows you to specify a JSON file that pre-loads the assembler's literal pool. It is optional and is not normally needed for regular 220 programming. See the discussion on Pre-Loading the Literal Pool below for more information.
Load Source & Go -- This control selects the source file to be assembled. Selecting a file with this control automatically starts the assembly process, so you should establish any other control settings before using this one. See the discussion below on source files for the assembler.

Note that object code output in the form of card decks or magnetic tape is generated in a separate temporary window. From this window you can save the object code text to a local file or copy/paste the text into another program. Once you have captured the object code, simply close the temporary window.

Assembler Notation

This section describes the syntax and semantics of the notation used by the BAC-Assembler.

BAC-Assembler Source Files

The assembler reads source from ordinary text files. The lines in these files may be delimited by a CR-LF pair, LF only, or CR only. The assembler does not recognize horizontal tab (HT) characters and does not do tab expansion.

Each line in a file contains one machine instruction or assembler pseudo-instruction and is laid out as follows:

Columns	Description
1-4	Ignored by the assembler. In the original assembler, they would have been used for Cardatron format band selection.
5-9	Blank, or a symbolic label, or a point label (i.e., `*label`).
10	Ignored by the assembler.
11-14	Symbolic operation code (standard 220 mnemonics).
15	Override sign digit (0-9, +, -). The digit in this column sets the sign digit in the word.
16	Ignored by the assembler.
17-72	Operands and comments. Operand fields are delimited by commas. The first space terminates the operands. Any text after the space is ignored by the assembler and may be used for comments.
73-80	Card sequence number and identification. These columns are ignored by the assembler.
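The column layout above can be illustrated with a small record parser. This is a hypothetical Python sketch, not code from the BAC-Assembler (which is written in JavaScript); the names `SourceRecord`, `cols`, and `parse_record` are invented here. It assumes labels are upper-case, and it does not handle `$`-quoted strings that contain spaces, which a real `CNST` operand may have.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch of the fixed-column source layout; not BAC-Assembler code.
LABEL_RE = re.compile(r"[A-Z][A-Z0-9]*")   # assumption: upper-case letters only

@dataclass
class SourceRecord:
    label: str = ""          # symbol from columns 5-9, without any asterisk
    is_point: bool = False   # True when the label field holds *label
    opcode: str = ""         # columns 11-14
    sign: str = ""           # column 15, or "" if blank
    operands: list = field(default_factory=list)  # comma-delimited operands
    comment: str = ""        # text after the first space in columns 17-72

def cols(line, first, last):
    """Return 1-based, inclusive columns first..last with blanks trimmed."""
    return line[first - 1:last].strip()

def parse_record(line):
    line = line.ljust(80)                  # short lines are padded with blanks
    raw = cols(line, 5, 9)
    is_point = raw.startswith("*")
    label = raw[1:] if is_point else raw
    if label:
        limit = 4 if is_point else 5       # the asterisk uses one of 5 columns
        if not LABEL_RE.fullmatch(label) or len(label) > limit:
            raise ValueError(f"invalid label {raw!r}")
    text = line[16:72]                     # columns 17-72
    operand_text, _, comment = text.partition(" ")  # first space ends operands
    return SourceRecord(
        label=label,
        is_point=is_point,
        opcode=cols(line, 11, 14),
        sign=cols(line, 15, 15),
        operands=operand_text.split(",") if operand_text else [],
        comment=comment.strip(),
    )

# Columns 1-4 blank, *A in 5-9, CWF in 11-14, sign 4 in column 15.
record = parse_record("    *A    CWF 4 ERFRM+28,42 set format band")
print(record)
```

Columns 1-4, 10, 16, and 73-80 are simply never read, which mirrors the assembler ignoring them.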

Labels and Symbols

Symbolic addresses are represented in the assembler by labels. Labels are identifiers that consist of from one to five alphanumeric characters. Labels must begin with a letter and consist only of letters and decimal digits.

By default, the presence of a non-blank symbol in the label field of a source record (columns 5-9) causes the value of the assembler's current address counter to be assigned as the value of that symbol. Certain pseudo-instructions may cause the symbol to have a different value, however.

Labels have two forms, global labels and point labels. A global label is any symbol in the label field of a record that is not prefixed with an asterisk (*). Such a symbol has global scope throughout the assembly and may appear in the label field only once. The assembler will issue an error if the same global symbol is redefined by appearing more than once in a label field. Global labels used as operands are referred to simply by their symbol.

A point label is a symbol in the label field that is prefixed with an asterisk. The asterisk is not part of the symbol; it merely identifies it as representing a point label. Due to the presence of the asterisk, point labels may consist of at most four characters.

Point labels have local scope and may appear multiple times in label fields throughout the assembly. They are frequently used to implement branches and data references to nearby locations. The scope of a point label is:

from the line where it is declared backward to (but not including) the prior declaration of the same point label, or to the beginning of the program if the label was not previously declared; and
from the line where it is declared forward to (but not including) the next declaration of the same point label, or to the end of the program if the label is not declared further on.

A point label used in the operand field of an instruction is coded as the symbol without the asterisk, but followed by a plus (+) or minus (-) sign. A plus sign refers to the location of the next declaration of the point label later in the source; a minus sign refers to the location of the prior declaration of the point label earlier in the source. As an example, here is a code snippet from the BALGOL Overlay module:

0732  4 2006 63 3492   *A    CWF 4 ERFRM+28,42
0733  0 0000 42 0292         LDB   DUMBS
0734  0 9999 20 0738         IBB   A+,9999
0735  0 0000 42 3554         LDB   +SCRTB+117
0736  0 0000 44 1691         STP   LIBRX
0737  0 0000 30 1690         BUN   LIBRF
0738  0 0000 10 0208   *A    CAD   HALT1
0739  0 0008 33 0742         BSA   *+3,8
0740  0 0000 41 3531         LDR   +525005250
0741  0 0001 40 3450         STR   HALT
0742  0 0000 42 0275         LDB   OP
0743  0 0412 40 0751   *A    STB   C+,04
0744  0 9999 20 0752         IBB   A+,9999
0745  0 0000 44 1847         STP   WEMX
0746  0 0000 30 1812         BUN   WEM
0747  3 0102 03 0000         CNST  30102030000
0748  0 0000 42 0751         LDB   C+
0749  1 0000 42 0000         LDB - 0
0750  0 0000 30 0743         BUN   A-
0751  0 0000 00 0000   *C    HLT   0
0752  0 0000 10 0002   *A    CAD   BUF

There are four declarations of the point label A in this code, but the operands referencing it can only reach the immediately next or prior declaration of the label with respect to the location of the operand. To see the way that the point labels resolve to addresses, examine the address fields (last four digits of the instruction words) in the example above.
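The resolution rule for point labels amounts to a search over the sorted declaration lines of each label. The following is a hypothetical Python sketch with invented function names, not the assembler's own code. It assumes that a reference on the same line as a declaration of that label skips that declaration, a case the rules above do not settle. The demonstration uses the Overlay snippet, treating each word's address as its source line number.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sketch of point-label resolution; not BAC-Assembler code.
def build_point_table(declarations):
    """declarations: (source_line, name, address) for each *name label field."""
    table = {}
    for line, name, address in sorted(declarations):
        table.setdefault(name, []).append((line, address))
    return table

def resolve_point(table, name, direction, line):
    """Return the address for name+ or name- referenced from source `line`."""
    entries = table.get(name, [])
    lines = [decl_line for decl_line, _ in entries]
    if direction == "+":
        i = bisect_right(lines, line)      # first declaration after `line`
    else:
        i = bisect_left(lines, line) - 1   # last declaration before `line`
    if not 0 <= i < len(entries):
        raise LookupError(f"no {'next' if direction == '+' else 'prior'} *{name}")
    return entries[i][1]

# Declarations from the Overlay snippet (address used as the line number).
table = build_point_table([
    (732, "A", 732), (738, "A", 738), (743, "A", 743),
    (751, "C", 751), (752, "A", 752),
])
print(resolve_point(table, "A", "+", 734))   # IBB A+,9999 at 0734
print(resolve_point(table, "A", "+", 744))   # IBB A+,9999 at 0744
print(resolve_point(table, "A", "-", 750))   # BUN A- at 0750
```

The three calls print 738, 752, and 743, matching the address fields of those instructions in the listing.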

Operands

The assembler generates machine instructions by "assembling" words using the mnemonic operation code from columns 11-14 of a source record, the sign digit from column 15, and zero or more comma-delimited symbolic address expressions from the operand field in columns 17-72. The address expressions are in turn composed of address primaries and address operators.

The number of address expressions the assembler expects to encounter varies by the mnemonic operation code. A given operation code may require certain address expressions to be specified; some address expressions may be optional and will generally have the value of zero if omitted.

Address Primaries

The BAC-Assembler recognizes the following as primaries (fundamental operands) in address expressions:

Primary	Description
`*`	Current value of the assembler's location counter, i.e., the address of the instruction being assembled.
integer	An unsigned decimal literal of four or fewer digits. The value of the primary is the value of the integer. Such primaries are used for several purposes, depending on the instruction being assembled and the relative position in the list of operands. Uses include absolute memory addresses, offsets to symbolic addresses, partial-word (sL) designators, unit numbers in I/O instructions, and other variant field values.
symbol	An alphanumeric symbol referring to a global label. The value of the primary is the address at which the symbol was defined in a label field.
symbol+	A forward reference to a point label. The value of the primary is the address of the next declaration of that point label later in the source.
symbol-	A backward reference to a point label. The value of the primary is the address of the prior declaration of that point label earlier in the source.
+integer	A positive numeric literal. The integer must be unsigned and ten or fewer decimal digits in length. The value of the primary is the address of the word in the literal pool where the literal value will be stored. The sign digit of the literal value will be `0` (positive).
-integer	A negative numeric literal. Similar to a positive numeric literal, except that the sign digit of the word in the literal pool will be `1` (negative).
`$string$`	A string literal. The value of the primary is the address of the first word of the string data in the literal pool. When used in the operand list for a machine instruction, the string data is limited to five characters (one word). Operands for the `CNST` pseudo-instruction may be longer strings. Words in the pool will have a sign of `2`. The `$` characters are string quotes and are not stored in the literal pool. Four ASCII characters are used to represent non-graphic 220 control characters: `|` for carriage return (code 16), `~` for horizontal tab (code 26), `^` for form feed (code 15), and `_` for the non-printing character (code 02).

Address Expressions

Each comma-delimited operand for an instruction is a symbolic address expression. These expressions are made up from address primaries and address operators. The two address operators are + (addition) and - (subtraction).

The value of the expression is computed from the values of the primaries operated upon by the operators. Evaluation of the expression is strictly left-to-right. The resulting value is truncated on the left to a four-digit memory address. If the address is negative, it is converted to its tens-complement value before being truncated (e.g., -123456 would be converted to 6544).

Examples:

Expression	Value
`*`	The current value of the location counter
`ABC`	The address associated with the symbol ABC
`ABC-1`	The address associated with the symbol ABC, minus 1
`ABC-DEF+2`	The difference between the addresses for ABC and DEF, plus 2 words
`+1234`	The address of the literal value +0000001234 in the literal pool
`+1234-+5678`	The difference between the addresses of the two literals in the literal pool
`B+`	The address of the next declaration of the point label B (i.e., *B)
`B-+3`	The address of the prior declaration of the point label B, plus 3 words
`B---1234-1234`	The address of the prior declaration of the point label B minus the address of the pool literal -1234, minus 1234 words
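The left-to-right evaluation and four-digit truncation can be sketched in Python as follows. This is a hypothetical illustration, not the BAC-Assembler's code. The tokenizing rule for telling `B-` (a point reference) from `B` followed by a subtraction is an assumption inferred from the examples in the table: a sign after a symbol marks a point reference when it is followed by another sign or by the end of the expression. String literals are not handled, and the symbol, point-label, and pool addresses in the demonstration are invented values.

```python
import re

# Hypothetical sketch of address-expression evaluation; not BAC-Assembler code.
INT_RE = re.compile(r"[0-9]+")
LIT_RE = re.compile(r"[+-][0-9]+")
SYM_RE = re.compile(r"[A-Z][A-Z0-9]*")

def tokenize(expr):
    """Split an expression into (operator, primary) pairs, left to right."""
    pairs, i, op = [], 0, "+"
    while True:
        c = expr[i:i + 1]                        # "" at the end of the text
        if c == "*":
            prim, i = ("loc",), i + 1
        elif c in ("+", "-"):                    # signed primary: pool literal
            m = LIT_RE.match(expr, i)
            if not m or len(m.group()) > 11:
                raise ValueError(f"bad literal in {expr!r}")
            prim, i = ("lit", m.group()), m.end()
        elif "0" <= c <= "9":
            m = INT_RE.match(expr, i)
            if len(m.group()) > 4:
                raise ValueError(f"integer too long in {expr!r}")
            prim, i = ("int", int(m.group())), m.end()
        elif "A" <= c <= "Z":
            m = SYM_RE.match(expr, i)
            name, i = m.group(), m.end()
            sign, after = expr[i:i + 1], expr[i + 1:i + 2]
            # Assumed rule: a sign followed by another sign or by the end of
            # the expression marks a point-label reference such as B+ or B-.
            if sign in ("+", "-") and after in ("", "+", "-"):
                prim, i = ("point", name, sign), i + 1
            else:
                prim = ("sym", name)
        else:
            raise ValueError(f"unexpected {c!r} in {expr!r}")
        pairs.append((op, prim))
        if i == len(expr):
            return pairs
        op, i = expr[i], i + 1
        if op not in ("+", "-"):
            raise ValueError(f"bad operator {op!r} in {expr!r}")

def evaluate(expr, location, symbols, points, pool):
    """Fold primaries strictly left to right, then truncate to four digits."""
    total = 0
    for op, prim in tokenize(expr):
        kind = prim[0]
        if kind == "loc":
            value = location
        elif kind == "int":
            value = prim[1]
        elif kind == "sym":
            value = symbols[prim[1]]
        elif kind == "point":
            value = points[(prim[1], prim[2])]
        else:                                    # "lit": address of pool word
            value = pool[prim[1]]
        total = total + value if op == "+" else total - value
    return total % 10000    # Python's % gives the tens complement of negatives

# Invented addresses, for illustration only.
symbols = {"ABC": 100, "DEF": 40}
points = {("B", "-"): 500, ("B", "+"): 520}
pool = {"+1234": 3000, "-1234": 3001, "+5678": 3002}
print(evaluate("*+3", 739, symbols, points, pool))
print(evaluate("B---1234-1234", 0, symbols, points, pool))
```

With these values, `*+3` at location 0739 gives 742 (as in the `BSA *+3,8` line of the Overlay snippet), and `B---1234-1234` gives 500 - 3001 - 1234 = -3735, which truncates to 6265.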

Machine Instructions

The bulk of the source records fed to the assembler will typically be those representing machine instructions. The following table shows how the list of operand address expressions in columns 17-72 of the record are assembled into a machine instruction word. See the Operational Characteristics of the Burroughs 220 reference manual for details on the format of instruction words and the meaning of the sub-fields for each instruction.

Operands in square brackets are optional and may be omitted. If an operand in the middle of the list is omitted, its comma must be retained, although commas for omitted operands at the end of the list may also be omitted. Unless noted otherwise below, the value of omitted operands is zero.

The following conventions are used for operands in the table below:

aaaa -- the operand address. In some instructions, such as shifts, this field is a count and not an address, and not all of the high-order digits are used.
bu -- format band number and unit number, a combined field used in some Cardatron instructions.
c -- a control digit used in the Cardatron CWR instruction to specify how the "T relays" are to be set for card machine control.
cccc or ccc -- control digits not used by the instruction. These fields are often used to store addresses for subroutine parameters, constant values, and repetition counts for loops.
d -- a control or variant digit with a meaning specific to the instruction.
f -- a digit indicating the instruction is to execute in a special mode, e.g., whether it targets a partial-word (sL) operand field or a whole word.
hhu -- a two-digit head (lane) and one-digit unit number -- a combined field used in some magnetic tape instructions.
k -- a digit indexing the category code word for magnetic tape scan (MTC, MFC) instructions.
kk -- the size of a block to be written by magnetic tape write instructions.
nnnn, nn, or n -- a count or other parametric value used by the instruction.
r -- digit used to specify reload-lockout in some Cardatron instructions.
sL -- a partial-word designator, used in instructions that operate on a field of digits within a word. s is the starting digit number in the word and L is the length of the field, starting with the s digit and extending to the left. A value of 0 for either digit is interpreted as 10. Digits in a word are numbered ±1234567890, where ± is the sign digit (which cannot be indexed by s). In order for the partial-word designator to be valid, the field must not extend to the left of digit 1, i.e., the relation L <= s must hold.
u -- unit number for an input/output instruction
v -- a control or variant digit with a meaning specific to the instruction.
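The sL convention can be made concrete with a short Python sketch that maps a designator to the digit positions it selects. It is a hypothetical illustration (the function names are invented), not assembler code. The validity check follows from the numbering: a field that starts at digit s and extends L digits to the left stays within digits 1-10 only when L <= s. The sample word is the `IBB A+,9999` instruction at 0734 in the Overlay snippet.

```python
# Hypothetical sketch of partial-word (sL) field selection; not BAC-Assembler code.
def sl_digits(sl):
    """Map a two-digit sL string to (leftmost, rightmost) digit numbers 1-10."""
    s = int(sl[0]) or 10             # a 0 digit is interpreted as 10
    length = int(sl[1]) or 10
    if length > s:                   # the field would run into the sign digit
        raise ValueError(f"invalid partial-word designator {sl!r}")
    return s - length + 1, s

def extract(word, sl):
    """word: 11 characters, the sign digit followed by digits 1-10."""
    first, last = sl_digits(sl)
    return word[first:last + 1]      # word[0] is the sign, so digit n is word[n]

instruction = "09999200738"          # 0 9999 20 0738
print(extract(instruction, "04"))    # address field, digits 7-0
print(extract(instruction, "62"))    # op-code field, digits 5-6
print(extract(instruction, "44"))    # variant field, digits 1-4
```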

Word Format	Mnem	Operands	Notes
`± cccc 00 aaaa`	`HLT`	`[aaaa],[cccc]`
`± cccc 01 aaaa`	`NOP`	`[aaaa],[cccc]`
`± unnv 03 aaaa`	`PRD`	`aaaa,u,nn,[v]`
`± unnv 04 aaaa`	`PRB`	`aaaa,u,[v],[nn]`
`± unnv 05 aaaa`	`PRI`	`aaaa,u,nn,[v]`
`± unn0 06 aaaa`	`PWR`	`aaaa,u,nn`
`± u000 07 aaaa`	`PWI`	`aaaa,u`
`± cccc 08 aaaa`	`KAD`	`[aaaa],[cccc]`
`± dnnf 09 aaaa`	`SPO`	`aaaa,nn,[d]`
`± 0000 10 aaaa`	`CAD`	`aaaa,[cccc]`
`± 0001 10 aaaa`	`CAA`	`aaaa,[cccc]`
`± 0000 11 aaaa`	`CSU`	`aaaa,[cccc]`
`± 0001 11 aaaa`	`CSA`	`aaaa,[cccc]`
`± 0000 12 aaaa`	`ADD`	`aaaa,[cccc]`
`± 0001 12 aaaa`	`ADA`	`aaaa,[cccc]`
`± 0000 13 aaaa`	`SUB`	`aaaa,[cccc]`
`± 0001 13 aaaa`	`SUA`	`aaaa,[cccc]`
`± cccc 14 aaaa`	`MUL`	`aaaa,[cccc]`
`± cccc 15 aaaa`	`DIV`	`aaaa,[cccc]`
`± cccc 16 aaaa`	`RND`	`[aaaa],[cccc]`
`± cccc 17 aaaa`	`EXT`	`aaaa,[cccc]`
`± sLf0 18 aaaa`	`CFA`	`aaaa,[sL]`
`± sLf1 18 aaaa`	`CFR`	`aaaa,[sL]`
`± cccc 19 aaaa`	`ADL`	`aaaa,[cccc]`
`± nnnn 20 aaaa`	`IBB`	`aaaa,nnnn`
`± nnnn 21 aaaa`	`DBB`	`aaaa,nnnn`
`± n000 22 aaaa`	`FAD`	`aaaa,[n]`
`± n001 22 aaaa`	`FAA`	`aaaa,[n]`
`± n000 23 aaaa`	`FSU`	`aaaa,[n]`
`± n001 23 aaaa`	`FSA`	`aaaa,[n]`
`± cccc 24 aaaa`	`FMU`	`aaaa,[cccc]`
`± cccc 25 aaaa`	`FDV`	`aaaa,[cccc]`
`± sLnn 26 aaaa`	`IFL`	`aaaa,sL,nn`
`± sLnn 27 aaaa`	`DFL`	`aaaa,sL,nn`
`± sLnn 28 aaaa`	`DLB`	`aaaa,sL,nn`
`± 0nn0 29 aaaa`	`RTF`	`aaaa,nn`
`± cccc 30 aaaa`	`BUN`	`aaaa,[cccc]`
`± cccc 31 aaaa`	`BOF`	`aaaa,[cccc]`
`± cccc 32 aaaa`	`BRP`	`aaaa,[cccc]`
`± cccd 33 aaaa`	`BSA`	`aaaa,d,[ccc]`
`± cccd 33 aaaa`	`BPA`	`aaaa,[d],[ccc]`
`± cccd 33 aaaa`	`BMA`	`aaaa,[d],[ccc]`	The `d` operand defaults to 1 if omitted
`± ccc0 34 aaaa`	`BCH`	`aaaa,[ccc]`
`± ccc1 34 aaaa`	`BCL`	`aaaa,[ccc]`
`± ccc0 35 aaaa`	`BCE`	`aaaa,[ccc]`
`± ccc1 35 aaaa`	`BCU`	`aaaa,[ccc]`
`± sLnn 36 aaaa`	`BFA`	`aaaa,sL,nn`
`± sLnn 36 aaaa`	`BZA`	`aaaa,[sL],[nn]`
`± sLnn 37 aaaa`	`BFR`	`aaaa,sL,nn`
`± sLnn 37 aaaa`	`BZR`	`aaaa,[sL],[nn]`
`± u000 38 aaaa`	`BCS`	`aaaa,u`
`± ccc0 39 aaaa`	`SOR`	`[aaaa],[ccc]`
`± ccc1 39 aaaa`	`SOH`	`[aaaa],[ccc]`
`± ccc2 39 aaaa`	`IOM`	`aaaa,[ccc]`
`± sLf0 40 aaaa`	`STA`	`aaaa,[sL]`
`± sLf1 40 aaaa`	`STR`	`aaaa,[sL]`
`± sLf2 40 aaaa`	`STB`	`aaaa,[sL]`
`± cccc 41 aaaa`	`LDR`	`aaaa,[cccc]`
`± ccc0 42 aaaa`	`LDB`	`aaaa,[ccc]`
`± ccc1 42 aaaa`	`LBC`	`aaaa,[ccc]`
`± cccd 43 aaaa`	`LSA`	`d,[aaaa],[ccc]`
`± cccc 44 aaaa`	`STP`	`aaaa,[cccc]`
`± ccc1 45 aaaa`	`CLA`	`[aaaa],[ccc]`
`± ccc2 45 aaaa`	`CLR`	`[aaaa],[ccc]`
`± ccc3 45 aaaa`	`CAR`	`[aaaa],[ccc]`
`± ccc4 45 aaaa`	`CLB`	`[aaaa],[ccc]`
`± ccc5 45 aaaa`	`CAB`	`[aaaa],[ccc]`
`± ccc6 45 aaaa`	`CRB`	`[aaaa],[ccc]`
`± ccc7 45 aaaa`	`CLT`	`[aaaa],[ccc]`
`± cccc 46 aaaa`	`CLL`	`aaaa,[cccc]`
`± ccc0 48 aaaa`	`SRA`	`aaaa,[ccc]`
`± ccc1 48 aaaa`	`SRT`	`aaaa,[ccc]`
`± ccc2 48 aaaa`	`SRS`	`aaaa,[ccc]`
`± ccc0 49 aaaa`	`SLA`	`aaaa,[ccc]`
`± ccc1 49 aaaa`	`SLT`	`aaaa,[ccc]`
`± ccc2 49 aaaa`	`SLS`	`aaaa,[ccc]`
`± uhh0 50 aaaa`	`MTS`	`aaaa,hhu`
`± uhh0 50 aaaa`	`MFS`	`aaaa,hhu`	Note that the sign digit is initialized to 4.
`± uhh4 50 aaaa`	`MLS`	`hhu,[aaaa]`
`± uhh8 50 aaaa`	`MRW`	`hhu,[aaaa]`
`± uhh9 50 aaaa`	`MDA`	`hhu,[aaaa]`
`± uhhk 51 aaaa`	`MTC`	`aaaa,hhu,k`
`± uhhk 51 aaaa`	`MFC`	`aaaa,hhu,k`	Note that the sign digit is initialized to 4.
`± un00 52 aaaa`	`MRD`	`aaaa,u,n,[v]`	The `v` operand is added to digit 4 in the instruction word. The pre-defined symbol `BMOD`, which has the value 8, may be used for this operand to indicate B-Register modification of words.
`± un01 52 aaaa`	`MNC`	`aaaa,u,n,[v]`	The `v` operand is added to digit 4 in the instruction word. The pre-defined symbol `BMOD`, which has the value 8, may be used for this operand to indicate B-Register modification of words.
`± un00 53 aaaa`	`MRR`	`aaaa,u,n,[v]`	The `v` operand is added to digit 4 in the instruction word. The pre-defined symbol `BMOD`, which has the value 8, may be used for this operand to indicate B-Register modification of words.
`± unkk 54 aaaa`	`MIW`	`aaaa,u,n,kk`
`± unkk 55 aaaa`	`MIR`	`aaaa,u,n,kk`
`± unkk 56 aaaa`	`MOW`	`aaaa,u,n,kk`
`± unkk 57 aaaa`	`MOR`	`aaaa,u,n,kk`
`± un00 58 aaaa`	`MPF`	`u,n,[aaaa]`
`± un01 58 aaaa`	`MPB`	`u,n,[aaaa]`
`± u002 58 aaaa`	`MPE`	`u,[aaaa]`
`± unn0 59 aaaa`	`MIB`	`aaaa,u,[nn]`
`± unn1 59 aaaa`	`MIE`	`aaaa,u,[nn]`
`± unnv 60 aaaa`	`CRD`	`aaaa,u,[v],[nn]`	The predefined symbol `RLO`, which has the value 1, may be used for the `v` operand to indicate reload-lockout.
`± u01v 60 aaaa`	`CNC`	`aaaa,u,[v]`	The predefined symbol `RLO`, which has the value 1, may be used for the `v` operand to indicate reload-lockout.
`± u0cb 61 aaaa`	`CWR`	`aaaa,bu,[c]`	The value (`b`-1)×2 is added to digit 4 in the instruction word.
`± u00r 62 aaaa`	`CRF`	`aaaa,bu,[r]`	The value (`b`-1)×2 is added to digit 4 in the instruction word. The predefined symbol `RLO`, which has the value 1, may be used for the `r` operand to indicate reload-lockout.
`± u00r 63 aaaa`	`CWF`	`aaaa,bu,[r]`	The value (`b`-1)×2 is added to digit 4 in the instruction word.
`± u000 64 aaaa`	`CRI`	`aaaa,u`
`± u000 65 aaaa`	`CWI`	`aaaa,u`
`± 0nn0 66 aaaa`	`HPW`	`aaaa,nn`
`± cccc 67 aaaa`	`HPI`	`[aaaa],[cccc]`
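To show how operands map onto the word formats in the table, here is a hypothetical Python sketch for two rows, `BSA` and `CWF`. The function names are invented and the code is not taken from the assembler; the expected words come from the Overlay snippet, where `ERFRM+28` evaluates to 3492 and the `BSA *+3,8` instruction sits at 0739. For `CWF`, the combined `bu` operand 42 means band 4 and unit 2, so (4-1)×2 = 6 lands in digit 4.

```python
# Hypothetical sketch of packing instruction fields; not BAC-Assembler code.
def instruction_word(sign, variant, opcode, address):
    """Format a word as 'sign vvvv oo aaaa', truncating fields on the left."""
    return f"{sign} {variant % 10000:04d} {opcode % 100:02d} {address % 10000:04d}"

def bsa(address, d, ccc=0, sign=0):
    """BSA (33): word format ± cccd 33 aaaa."""
    return instruction_word(sign, ccc * 10 + d, 33, address)

def cwf(address, bu, r=0, sign=0):
    """CWF (63): word format ± u00r 63 aaaa, with (b-1)*2 added to digit 4.

    Assumes r + (b-1)*2 stays a single decimal digit."""
    b, u = divmod(bu, 10)            # bu combines band number b and unit u
    return instruction_word(sign, u * 1000 + r + (b - 1) * 2, 63, address)

print(cwf(3492, 42, sign=4))   # CWF 4 ERFRM+28,42 with ERFRM+28 = 3492
print(bsa(742, 8))             # BSA *+3,8 assembled at 0739
```

These print `4 2006 63 3492` and `0 0008 33 0742`, the same words shown in the listing.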

Pseudo-Instructions

The BAC-Assembler supports a number of pseudo-instructions. These do not represent machine instructions, but instead specify address and location information to the assembler or provide convenient ways to specify constant values and other blocks of data.

Simple Pseudo-Instructions

REM: A remark or comment. This line in the source will appear on any listings with the REM blanked out, but otherwise it is ignored by the assembler.

DEFN: This pseudo defines the value of a symbol. The symbol in the label field of the record is assigned the value of the single address expression in the operand field. The address expression must be resolvable by the assembler during its first pass, i.e., all symbols in the expression must have been defined prior to this point in the source.

LOCN: Normally the assembler's location counter is incremented by one for each word of object code generated. This pseudo sets the location counter to the value of the single address expression in the operand field. As with DEFN, this expression must be resolvable during the first pass of the assembly.

F244: This pseudo assembles a word of data from three operand expressions.

The value of the first expression goes in the sL=22 field.
The value of the second expression goes in the sL=64 field.
The value of the third expression goes in the sL=04 field.

The values of the operand expressions are converted to tens-complement if negative and truncated on the left as necessary to fit into their respective fields. The sign of the word may be specified in column 15 of the source record as usual. Thus, the pseudo-instruction "F2443 -1,7,654321" would generate a word with the value "3 9900 07 4321."

F424: This pseudo assembles a word from three operand expressions similar to the F244 pseudo, but the fields are placed into the sL=44, sL=62, and sL=04 fields instead. This is the same layout as a machine instruction word.
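The field placement for F244 and F424 is simple digit arithmetic, sketched below in Python. The helper names are hypothetical, and the code only illustrates the rules described above; it is not the assembler's implementation. Negative values become tens-complement and are truncated on the left by taking the value modulo a power of ten.

```python
# Hypothetical sketch of the F244 and F424 pseudo-instructions.
def place(value, width):
    """Tens-complement a negative value and keep its low-order `width` digits."""
    return f"{value % 10 ** width:0{width}d}"

def group(sign, digits):
    """Print a sign and ten digits in the usual 4-2-4 grouping."""
    return f"{sign} {digits[0:4]} {digits[4:6]} {digits[6:10]}"

def f244(sign, a, b, c):
    return group(sign, place(a, 2) + place(b, 4) + place(c, 4))  # sL 22, 64, 04

def f424(sign, a, b, c):
    return group(sign, place(a, 4) + place(b, 2) + place(c, 4))  # sL 44, 62, 04

print(f244(3, -1, 7, 654321))     # the F2443 example above
print(f424(0, 9999, 20, 738))     # same layout as an IBB instruction word
```

The first call reproduces the `3 9900 07 4321` word from the F244 example.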

FINI: This pseudo must be the last command in an assembly. Any source records following this one are ignored. The value of the single address expression operand is taken as the address where execution for the program should start. This address is used only when the object code is output as a format-6 card deck.

`CNST` Pseudo-Instruction

The CNST pseudo assembles a comma-delimited list of numeric and string literals in the operand area (columns 17-72) into consecutive data words. Each element of the list may be one of the following:

An unsigned integer literal of 1-11 digits. If the literal is less than 11 digits in length, the sign and high-order digits of the word will be 0.
A plus sign (+) followed by an integer literal of 1-10 digits. The sign of the word will be 0.
A minus sign (-) followed by an integer literal of 1-10 digits. The sign of the word will be 1.
A $-delimited string. The string characters will be translated to 220 character codes and stored in words with sign of 2. If the string is not a multiple of five characters in length, the final word will be padded on the right with spaces (220 code 00). Four ASCII characters are used to represent non-graphic 220 control characters: "|" is used for carriage return (code 16), "~" is used for horizontal tab (code 26), "^" is used for form feed (code 15), and "_" is used for the non-printing character (code 02).

A CNST pseudo with a blank operand field will cause a single word of zeroes to be assembled. If one of the operands is omitted (i.e., there are two consecutive commas in the operand field), a word of zeroes will be stored for that operand.

A string literal may be continued across multiple records by leaving the operation code field on the second and subsequent records blank. The text of the string will continue to column 72 on all but the last record, and will resume in column 17 on all but the first record. The closing $-delimiter appears only on the last record. This is the only BAC-Assembler instruction that may cross multiple source records.
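The encoding rules for `CNST` operands can be sketched in Python as follows. This is a hypothetical illustration with invented names. Only the character codes named on this page (space and the four control characters) are included; translating letters and digits would require the full 220 character-code table, which is not reproduced here, so other characters raise `KeyError`. Continuation of strings across records is not modeled.

```python
import re

# Hypothetical sketch of CNST operand encoding; not BAC-Assembler code.
# Only the codes named on this page are listed; a real translator needs the
# full 220 character-code table.
KNOWN_CODES = {" ": "00", "_": "02", "^": "15", "|": "16", "~": "26"}

def cnst_number(text):
    """Encode one numeric CNST operand as an 11-digit word (sign first)."""
    if text == "":                          # omitted operand: a word of zeroes
        return "0" * 11
    if text[0] in "+-":
        if not re.fullmatch(r"[0-9]{1,10}", text[1:]):
            raise ValueError(f"bad signed literal {text!r}")
        sign = "0" if text[0] == "+" else "1"
        return sign + text[1:].zfill(10)
    if not re.fullmatch(r"[0-9]{1,11}", text):
        raise ValueError(f"bad unsigned literal {text!r}")
    return text.zfill(11)                   # sign and high digits become 0

def cnst_string(text, codes=KNOWN_CODES):
    """Encode the characters between the $ quotes as sign-2 words."""
    words = []
    for start in range(0, len(text), 5):
        chunk = text[start:start + 5].ljust(5)  # pad the last word with spaces
        words.append("2" + "".join(codes[ch] for ch in chunk))
    return words

print(cnst_number("30102030000"))   # CNST 30102030000 from the Overlay snippet
print(cnst_number("-5"))
print(cnst_string("|~^"))
```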

`FBGR` Pseudo-Instruction

The FBGR pseudo generates Cardatron format bands from a somewhat COBOL PICTURE-like representation of the layout for a card image or print line. Here are some examples from the Compiler Main module:

FR1   FBGR  INPUT,T2Z1B4A,15(T5A)
FR2   FBGR  INPUT,16(P5A),P10Z
FR3   FBGR  PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBNNNN,BT5A,44B
FR6   FBGR  PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBZZZZ,5BT5A,44B
FR7   FBGR  PRINT,49B,TZZZZZZNNNN,BBB,T6Z10BNNNN,50B
FR4   FBGR  PRINT,7(T5A),85B
FR8   FBGR  PRINT,TZZNNNNZZZZ,4B,16(T5A),32B

There are two classes of format bands -- those used for input from card readers and those used for output to card punches and line printers. Each FBGR pseudo generates a data block of 29 words. These data blocks are referenced by Card Read Format Load (CRF, 62) instructions for input bands and Card Write Format Load (CWF, 63) instructions for output bands. Note that the address in CRF and CWF instructions must reference the last word in the block of 29 words.

The first operand in a band definition is a mnemonic that indicates the input/output class:

INPUT specifies a format band for input from a card reader. The format string must define a card image of exactly 80 characters.
PUNCH specifies a format band for output to a card punch. The format string must define a card image of exactly 80 characters.
PRINT specifies a format band for output to a line printer. The format string must define a line image of exactly 120 characters.

Following the class mnemonic is a list of comma-delimited format phrases. Most phrases describe how a contiguous range of columns on the card machine will be converted to or from one word in 220 memory. The format phrases are terminated by the first space or when column 72 is reached.

The Cardatron split each card column or print position into two parts -- a numeric code (the bottom nine rows on a card: 1-9) and a zone code (the top three rows on a card: +, -, 0). Each code could be transferred to and from the 220 or suppressed individually. The numeric code for a column is transferred before the zone code. Since FBGR considers a format band from the perspective of columns on the card machine, you do not normally need to consider the separate numeric and zone codes, but there are exceptions, especially when dealing with signs. For more information, see Chapter 6 in the Operational Characteristics of the Burroughs 220 reference manual.

The phrases are composed from letter codes that determine how the Cardatron numeric and zone codes are to be treated. A letter code may be prefixed with an integer repeat count. Thus the phrase "TZZZZZZNNNN" is the same as "T6Z4N." A single phrase may also be enclosed in parentheses and prefixed by a repeat count, e.g., "16(T5A)." No commas are permitted within parentheses, and parentheses may not be nested.

The following table lists the phrase letter codes and their use in input and output format bands.

Code	Input Band Use	Output Band Use
A	copy two zone/numeric digits to memory from card machine	copy two digits from memory to card machine
B	ignore two digits from card machine, store nothing in memory	supply two zero digits to card machine, do not transfer digits from memory
N	copy a numeric digit to memory from a card column, ignoring the zone digit	copy one digit from memory to a card column, normally supplying a zero for the zone (see `X`)
P	store a zero for the sign digit, do not transfer a digit from the card machine	ignore sign digit in memory, do not transfer a digit to the card machine
S	ignore the numeric portion of a card column and copy the zone digit to the sign digit of the memory word	copy the sign digit from the memory word as the zone of a separate card column
T	like P, but store a 2 for the sign instead of zero	same as P
X	store zone digit from card machine to memory	copy a digit from memory to card machine as an over-punch for next code
Z	store zero digit in memory, do not transfer a digit from the card machine	skip/ignore a digit in memory, transfer nothing to the card machine

These codes are typically used as follows:

A: transfer a zone/numeric digit pair to or from one column on the card machine as an alphanumeric character.
B: ignore one column on the card machine on input; supply a blank column (code 00) on output.
N: transfer the numeric digit from a card column to or from a digit in memory.
P: store a positive sign digit on input; ignore the sign digit in memory on output.
S: transfer the sign as a separate card column to/from the sign digit in the memory word.
T: store signs of 2 for alphanumeric words on input for use with the console devices; not normally used for output.
X: transfer an over-punched sign (zone digit only) as a separate digit to/from memory.
Z: fill zero digits in memory on input; skip digits in memory on output.

Note that the P, S, and T codes must reference the sign digit of a 220 word; the assembler issues an error if they do not. This ensures that the band phrases stay aligned with the 220 memory words.
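The column and digit counts implied by the table above, together with the sign-alignment rule, can be checked mechanically. The following sketch works on phrases whose repeat counts have already been expanded, counts card-machine columns and memory digits, and rejects a P, S, or T code that does not fall on a word's sign digit. The names analyzeBand and CODE_COST are hypothetical, and counting X as zero columns is our assumption, since X supplies only the zone (over-punch) part of a column.

```javascript
// Sketch only: tallies card-machine columns and memory digits for a band
// whose phrases are already expanded (no repeat counts), and enforces the
// rule that P, S, and T must land on the sign digit of a 220 word.
// ASSUMPTION: X is counted as zero columns because its zone digit is
// over-punched into a column supplied by a neighboring code.

const DIGITS_PER_WORD = 11; // sign digit + 10 numeric digits

const CODE_COST = {
  A: { cols: 1, digits: 2 },
  B: { cols: 1, digits: 0 },
  N: { cols: 1, digits: 1 },
  P: { cols: 0, digits: 1 },
  S: { cols: 1, digits: 1 },
  T: { cols: 0, digits: 1 },
  X: { cols: 0, digits: 1 }, // assumption, see above
  Z: { cols: 0, digits: 1 },
};

function analyzeBand(expandedPhrases) {
  let columns = 0;
  let digits = 0; // running offset into the memory buffer, in digits
  for (const phrase of expandedPhrases) {
    for (const code of phrase) {
      const cost = CODE_COST[code];
      if (!cost) throw new Error("unknown code " + code + " in " + phrase);
      if ("PST".includes(code) && digits % DIGITS_PER_WORD !== 0) {
        throw new Error(code + " is not at a sign digit in phrase " + phrase);
      }
      columns += cost.cols;
      digits += cost.digits;
    }
  }
  return { columns, words: digits / DIGITS_PER_WORD };
}
```

Applied to the expanded phrases of the FR1 and FR3 examples below, it reports 80 columns and 16 words for FR1, and 120 columns and 3 words for FR3.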

Here are two of the examples above annotated with what the band phrases do:

FR1   FBGR  INPUT,T2Z1B4A,15(T5A)

Ignore the first column [1B] on the card. In the first memory word of the buffer, store a sign digit of 2 [T], followed by two zero digits [2Z]; then transfer the next four card columns [4A] alphanumerically as two-digit character codes. The two zero digits cause the four alphanumeric characters to be right-justified in the memory word. Transfer the next 75 characters alphanumerically from the card [15(T5A)] to the next 15 memory words, storing the words with signs of 2. Notice that the total number of card columns input is 80 (1 + 4 + 75), as is required for an INPUT format band.

FR3   FBGR  PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBNNNN,BT5A,44B

Output 49 columns of spaces [49B] to the line printer. From the first word of the memory buffer, ignore the sign and the first six digits [TZZZZZZ] and transfer the low-order four digits of the word numerically [NNNN] to the next four columns on the line. Output three more columns of spaces [BBB]. From the second word in the memory buffer, output the sign digit as a separate column [S], then a space [B], then four columns of numeric digits from the word [NNNN], another space [B], two columns of numeric digits from the word [NN], another space [B], and four columns of numeric digits from the last four digits of the word [NNNN]. Output a space [B]; then, ignoring the sign of the third memory word [T], output that word as five alphanumeric columns [5A]. Finish the line with 44 spaces [44B]. Notice that the total number of print columns output is 120 (49 + 4 + 3 + 14 + 6 + 44), as is required for a PRINT format band.
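To make the digit-by-digit behavior of an output band concrete, here is a simplified model of the numeric parts of FR3. It is a sketch under loudly stated assumptions: zone codes are ignored, each N is printed as its decimal digit, and the A, S, and X codes are not modeled at all. The function renderNumericPhrase is hypothetical.

```javascript
// Simplified model of a numeric output phrase, for illustration only.
// B prints a blank, N prints the next memory digit, and P, T, and Z skip
// a memory digit without printing anything. ASSUMPTION: zone codes are
// ignored, and the A, S, and X codes are not modeled.

function renderNumericPhrase(phrase, word = "") {
  // word is a signed ten-digit string such as "+0000004321":
  // index 0 is the sign digit, indexes 1-10 are the numeric digits.
  const digits = Array.from(word);
  let pos = 0;
  let line = "";
  for (const code of phrase) {
    if (code === "B") {
      line += " "; // blank column (code 00); no memory digit used
    } else if (code === "N") {
      if (pos >= digits.length) throw new Error("phrase runs past the word");
      line += digits[pos++]; // one memory digit to one print column
    } else if (code === "P" || code === "T" || code === "Z") {
      pos++; // skip a memory digit; nothing printed
    } else {
      throw new Error("code not modeled in this sketch: " + code);
    }
  }
  return line;
}
```

With the first FR3 phrase, renderNumericPhrase("TZZZZZZNNNN", "+0000004321") returns "4321". Substituting P for S in the second phrase (since S is not modeled), renderNumericPhrase("PBNNNNBNNBNNNN", "+1234567890") returns " 1234 56 7890".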

Pre-Loading the Literal Pool

This is a feature implemented solely to aid in assembly of the BALGOL compiler and is not something that would normally be useful for other 220 programming. Unless you are recovering an old program such as the BALGOL compiler, you can safely skip this section.

Operand fields on an assembler source record may contain literals. Examples of literals are +1234567, -7654321, $ALPHA$ (an alphanumeric literal), and address expressions such as +START+2. The assembler allocates storage for these literal values and treats each literal value as a symbol that has an associated address. The assembler attempts to have all references to a given literal value share a single address.

Literals are assigned contiguous addresses in an area of memory termed the "literal pool." This pool is allocated at the end of the program, after the FINI pseudo-instruction is encountered. The order of entries within the pool is determined by the assembler. We do not know the ordering scheme used by the original assembler -- it was probably determined by the way its symbol table was organized -- so it is not surprising that the BAC-Assembler usually places the words of a literal pool in a different sequence than the original assembler did.
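The pooling described above can be sketched as follows: each distinct literal value gets one word, and the words occupy contiguous addresses starting at a given address (in the Main module listing below, 4109, the location shown on the FINI line). This is not BAC-Assembler's code, and buildLiteralPool and normalizeLiteral are hypothetical names. Assumptions: entries are ordered by first reference, which need not match either assembler's order; numeric literals are compared by value after padding to ten digits; and address-expression literals such as +START+2, which need the symbol table, are not handled.

```javascript
// Sketch of literal pooling, not BAC-Assembler's implementation.
// ASSUMPTIONS: pool entries are ordered by first reference (not
// necessarily the order either assembler used), and only numeric and
// $...$ literals are handled; address expressions such as +START+2
// would need the symbol table.

// Reduce a literal to a canonical form so equal values compare equal.
function normalizeLiteral(text) {
  const m = /^([+-])(\d{1,10})$/.exec(text);
  if (m) return m[1] + m[2].padStart(10, "0"); // signed ten-digit word
  if (/^\$.{1,5}\$$/.test(text)) return text;    // alphanumeric literal
  throw new Error("literal form not handled in this sketch: " + text);
}

// Assign one contiguous pool address per distinct literal value.
function buildLiteralPool(references, poolStart) {
  const addressOf = new Map(); // normalized value -> pool address
  for (const ref of references) {
    const value = normalizeLiteral(ref);
    if (!addressOf.has(value)) {
      addressOf.set(value, poolStart + addressOf.size);
    }
  }
  return addressOf;
}
```

For instance, with references +1234567, $ALPHA$, +01234567, and -7654321 and a pool starting at 4109, the two spellings of 1234567 share address 4109, $ALPHA$ gets 4110, and -7654321 gets 4111.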

This difference in sequence does not affect whether the generated instructions execute properly, but it does complicate matching the listings generated by the BAC-Assembler against the transcriptions of the original BALGOL listings in order to verify those transcriptions. That verification is much simpler if the BAC-Assembler places the literal pool words at the same locations as the original assembler did.

Thus, to support verification of the BALGOL transcriptions, BAC-Assembler allows an initial literal pool, including addresses for the literal values, to be specified in advance and pre-loaded before assembly begins. The initial literal pool is specified in a text file using JSON notation; this file is termed a "poolSet." As an example, here is the literal pool as transcribed from the listing for the compiler's Main module:

046 70 0     4109         FINI  1
             4109               +0371720000
             4110               +6099999999
             4111               +5822570000
             4112               +9999999999
             4113               +6034037172
             4114               +4959035600
             4115               +4959045600

Here is the text of the corresponding poolSet that can be used to pre-load the literal pool into the BAC-Assembler:

{"poolSet": [
    {"poolLoc": 4109,
     "poolData": [
        "+0371720000",
        "+6099999999",
        "+5822570000",
        "+9999999999",
        "+6034037172",
        "+4959035600",
        "+4959045600"]
    }
]}

A poolSet consists of an object with a single member named poolSet. This member must be an array containing at least one object. Only the first element in this array is used; any additional entries are ignored. (The GEN-Assembler supports multiple assembly units, so its poolSet arrays may contain multiple objects, but BAC-Assembler does not support this.)

The first object within the array must have two members:

An integer value named poolLoc. This is the absolute memory address of the start of the pool.
An array of strings named poolData. The strings represent the values of words in the pool, and each string must have one of two forms:
- A ten-digit decimal number, padded on the left with zeroes as necessary to make ten digits, and prefixed by a plus or minus sign.
- A dollar sign ($) followed by one to five alphanumeric characters, one of which may itself be a dollar sign. Note that within a poolSet definition, strings are not terminated by a dollar sign; for example, the literal $ALPHA$ is written as "$ALPHA".
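A minimal sketch of reading a poolSet and applying the rules above follows. loadPoolSet is a hypothetical name, and returning a Map from address to word is just one convenient representation for comparing against a transcribed listing. The sketch does not check the characters after the $ against the 220 character set.

```javascript
// Sketch only: a hypothetical loader for poolSet text, not the code
// BAC-Assembler actually uses. Returns a Map from absolute address to
// the word string that should be pre-loaded there.
// ASSUMPTION: characters after "$" are not checked against the 220
// character set.

function loadPoolSet(jsonText) {
  const doc = JSON.parse(jsonText);
  if (!doc || !Array.isArray(doc.poolSet) || doc.poolSet.length < 1) {
    throw new Error("poolSet must be an array containing at least one object");
  }
  // Only the first object is used; any further entries are ignored.
  const { poolLoc, poolData } = doc.poolSet[0] || {};
  if (!Number.isInteger(poolLoc)) throw new Error("poolLoc must be an integer");
  if (!Array.isArray(poolData)) throw new Error("poolData must be an array of strings");

  const pool = new Map();
  poolData.forEach((word, i) => {
    const valid =
      typeof word === "string" &&
      (/^[+-]\d{10}$/.test(word) || // signed ten-digit number
       /^\$.{1,5}$/.test(word));     // "$" plus 1-5 characters, no closing "$"
    if (!valid) throw new Error("invalid poolData entry at index " + i);
    pool.set(poolLoc + i, word);
  });
  return pool;
}
```

Given the Main module poolSet shown above, the returned Map holds seven entries, mapping 4109 to "+0371720000" through 4115 to "+4959045600".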

The format of the poolData strings must match that of the pool literal values output by the original assembler. For more examples, see the scans of the BALGOL listings at the Computer History Museum cited above or the *.baca files in the retro-220 emulator's software/BALGOL directory.