mirror of
https://github.com/PDP-10/its.git
synced 2026-04-24 19:40:28 +00:00
Added SPELL/ESPELL, spellchecker.
This commit is contained in:
committed by
Lars Brinkhoff
parent
b2341133ef
commit
65a92f9a9f
@@ -117,6 +117,7 @@ from scratch.
|
|||||||
- ARCSAL, archive salvager
|
- ARCSAL, archive salvager
|
||||||
- RMTDEV, MLDEV for non-ITS hosts
|
- RMTDEV, MLDEV for non-ITS hosts
|
||||||
- IDLE, list idle users
|
- IDLE, list idle users
|
||||||
|
- SPELL, ESPELL spell checker
|
||||||
|
|
||||||
6. A brand new host table is built from the host table source and
|
6. A brand new host table is built from the host table source and
|
||||||
installed into SYSBIN; HOSTS3 > using H3MAKE.
|
installed into SYSBIN; HOSTS3 > using H3MAKE.
|
||||||
|
|||||||
@@ -667,6 +667,11 @@ respond "*" ":midas device;atsign rmtdev_gz;rmtdev\r"
|
|||||||
respond "*" ":midas sys1;ts idle_gren;idle\r"
|
respond "*" ":midas sys1;ts idle_gren;idle\r"
|
||||||
expect ":KILL"
|
expect ":KILL"
|
||||||
|
|
||||||
|
# spell
|
||||||
|
respond "*" ":midas sys1;ts spell_syseng;spell\r"
|
||||||
|
expect ":KILL"
|
||||||
|
respond "*" ":link sys1;ts espell,sys;ts spell\r"
|
||||||
|
|
||||||
# ndskdmp tape
|
# ndskdmp tape
|
||||||
respond "*" ":link kshack;good ram,.;ram ram\r"
|
respond "*" ":link kshack;good ram,.;ram ram\r"
|
||||||
respond "*" ":link kshack;ddt bin,.;@ ddt\r"
|
respond "*" ":link kshack;ddt bin,.;@ ddt\r"
|
||||||
|
|||||||
577
doc/info/spell.1
Normal file
577
doc/info/spell.1
Normal file
@@ -0,0 +1,577 @@
|
|||||||
|
-*-Text-*-
|
||||||
|
This is the file <info>ispell.info, which documents the ISPELL
|
||||||
|
spelling checker/corrector.
|
||||||
|
|
||||||
|
File: ISPELL Node: Top Up: (DIR) Next: Correcting
|
||||||
|
|
||||||
|
1. INTRODUCTION
|
||||||
|
|
||||||
|
This memo documents a spelling check/correction program maintained on
|
||||||
|
the four MIT ITS computers and on the MIT-XX Twenex computer. The program
|
||||||
|
is called "SPELL" on ITS and "ISPELL" (for ITS-style SPELL) on Twenex.
|
||||||
|
(The name ISPELL is used on Twenex to avoid conflict with the standard DEC
|
||||||
|
TOPS-20 program called SPELL. This memo does NOT apply to the DEC TOPS-20
|
||||||
|
program.)
|
||||||
|
|
||||||
|
The program's behavior is nearly identical on ITS and Twenex. (The
|
||||||
|
principal difference is that arguments to a command are separated by commas
|
||||||
|
on ITS and spaces on Twenex.) A command consists of a command name and its
|
||||||
|
arguments on one line, for example
|
||||||
|
|
||||||
|
SET SCRIBE
|
||||||
|
LOAD MYDICT,3
|
||||||
|
CORRECT FOO BAR,FOO OUT
|
||||||
|
CORRECT FOO BAR,FOO OUT/13
|
||||||
|
ASK PROCEDE
|
||||||
|
|
||||||
|
SPELL is designed to process input text for text formatters TJ6, R,
|
||||||
|
SCRIBE, PUB, and TEX. It has the necessary knowledge about the command
|
||||||
|
structure of these programs to avoid attempting to check the spelling of
|
||||||
|
formatter commands. It can be told by means of special comments to
|
||||||
|
suppress checking of designated portions of the text.
|
||||||
|
|
||||||
|
SPELL's querying mechanism for correcting misspelled words works best
|
||||||
|
when it is used from a display terminal, but SPELL may be effectively used
|
||||||
|
from a printing terminal or slow display, especially if the "context
|
||||||
|
display" and "alternate spellings display" (see below) are disabled. It
|
||||||
|
may be used effectively from uppercase-only terminals, since it uses the
|
||||||
|
capitalization from the original word in the file when a spelling
|
||||||
|
correction is typed in.
|
||||||
|
|
||||||
|
SPELL has a dictionary of about 38000 words and requires about 43K of
|
||||||
|
memory to run.
|
||||||
|
|
||||||
|
Report bugs and complaints to BUG-SPELL@MIT-ML.
|
||||||
|
|
||||||
|
* Menu:
|
||||||
|
|
||||||
|
* Correcting:: The principal operation
|
||||||
|
* Training:: Making a list of all unknown words from a file
|
||||||
|
* Asking:: Asking about specific words, and other fun and games
|
||||||
|
* Emacs:(EMACS)Fixit
|
||||||
|
Using SPELL from EMACS to check the spelling of one word
|
||||||
|
* Load/Dump:: Loading and dumping dictionaries
|
||||||
|
* Options:: Switches that control formatter mode etc.
|
||||||
|
* Syntax:: Syntax of commands and file names, and operation from "JCL"
|
||||||
|
* Esoterica::
|
||||||
|
|
||||||
|
File: ISPELL Node: Correcting Previous: Top Up: Top Next: Training
|
||||||
|
|
||||||
|
2. The CORRECT command
|
||||||
|
|
||||||
|
The principal operation which SPELL performs is that of "correcting" a
|
||||||
|
file. This consists of reading the file and optionally writing an output
|
||||||
|
file in which misspellings are corrected. The corrections are made either
|
||||||
|
by substituting a word known to SPELL or by retyping the word manually.
|
||||||
|
Whether output is being written or not, every misspelling (that is, every
|
||||||
|
word not known to SPELL) is displayed, and the user is queried about the
|
||||||
|
action to be taken.
|
||||||
|
|
||||||
|
The command name "CORRECT" is followed by the filespec of the input
|
||||||
|
file, the optional filespec of the output file, and an optional switch
|
||||||
|
specifying the starting line number. If this line number is given, all
|
||||||
|
earlier lines in the input file are copied to the output file without being
|
||||||
|
checked. If no output is desired, omit the second file specification.
|
||||||
|
|
||||||
|
CORRECT INPUT FILE,OUTPUT FILE
|
||||||
|
CORRECT INPUT FILE,OUTPUT FILE/13
|
||||||
|
CORRECT INPUT FILE (produce no corrected output)
|
||||||
|
|
||||||
|
2.1 Suppressing checking of designated portions of the text
|
||||||
|
|
||||||
|
When SPELL is being operated in a formatter mode, it can suppress
|
||||||
|
checking of regions containing figures drawn with strange fonts or other
|
||||||
|
nonsensical-looking text. It does this by looking for comments looking
|
||||||
|
like (for the R formatter) "^K &&&SPELLON" or "^K &&&SPELLOFF". The
|
||||||
|
comments must be exactly as shown: a control K, space, three ampersands,
|
||||||
|
and the word entirely in capitals. What follows later on the line is
|
||||||
|
immaterial. The SPELLOFF command disables checking until the next SPELLON
|
||||||
|
command. The unchecked text is still copied into the output file. The
|
||||||
|
commands are:
|
||||||
|
|
||||||
|
R "^K &&&SPELLON" and "^K &&&SPELLOFF"
|
||||||
|
TEX "% &&&SPELLON" and "% &&&SPELLOFF"
|
||||||
|
TJ6 ".C &&&SPELLON" and ".C &&&SPELLOFF"
|
||||||
|
PUB ".<< &&&SPELLON" and ".<< &&&SPELLOFF"
|
||||||
|
SCRIBE "@comment(&&&SPELLON)" and "@comment(&&&SPELLOFF)"
|
||||||
|
|
||||||
|
Remember that SPELL does not expand formatter macros. While it would be
|
||||||
|
convenient to put SPELLOFF and SPELLON commands in macros, it would not
|
||||||
|
have the desired effect. SPELL responds to the commands only when they
|
||||||
|
literally appear in the text.
|
||||||
|
|
||||||
|
2.2 Querying during a correction
|
||||||
|
|
||||||
|
When SPELL finds an unknown word during a correction, it displays:
|
||||||
|
(1) The unknown word.
|
||||||
|
(2) If the "LIST" option is on (which it normally is), a
|
||||||
|
list of words that are known to SPELL and are close to
|
||||||
|
the unknown word. One of these might be the word that
|
||||||
|
was intended. These words are displayed with index
|
||||||
|
numbers beginning with zero.
|
||||||
|
(3) If the "DISPLAY" option is on (which it normally is),
|
||||||
|
the line number and a few lines of the file showing the
|
||||||
|
context in which the word appeared.
|
||||||
|
|
||||||
|
The user then types one of the commands listed below. These commands are
|
||||||
|
acted on IMMEDIATELY. No <return> is typed after them. The intention is
|
||||||
|
to allow rapid interaction with the program. However, care is required to
|
||||||
|
avoid typing the wrong thing.
|
||||||
|
|
||||||
|
A or <space> Accept the word, but do not put it in the dictionary. It
|
||||||
|
will still be unknown, so SPELL will query again if it
|
||||||
|
encounters this word again.
|
||||||
|
|
||||||
|
I Accept the word and put it into dictionary number 1. The
|
||||||
|
word will henceforth be known, so SPELL will not query about
|
||||||
|
it again. If the dictionary is written out at the end of
|
||||||
|
the run, it can be read back in on future runs and SPELL
|
||||||
|
will know all such words.
|
||||||
|
|
||||||
|
D1 to D9 Accept the word and put it into dictionary number N.
|
||||||
|
|
||||||
|
0 to 9 Substitute the numbered choice for the unknown word. (This
|
||||||
|
is only useful if corrected output is being written.) The
|
||||||
|
corrected word will be given the same capitalization as the
|
||||||
|
unknown word, if possible. If not possible, SPELL will say
|
||||||
|
so.
|
||||||
|
|
||||||
|
R The user types in a new word to replace the unknown one.
|
||||||
|
(This is only useful if corrected output is being written.)
|
||||||
|
As above, the capitalization of the original word will be
|
||||||
|
used. The capitalization of the typed-in word is ignored.
|
||||||
|
(Hence SPELL can be operated effectively from
|
||||||
|
upper-case-only terminals.)
|
||||||
|
|
||||||
|
+/-<letter> Sets or clears the indicated option, just as for the SET or
|
||||||
|
CLEAR command at the top level. The letter typed after the
|
||||||
|
"+" or "-" is a single letter abbreviation for the option
|
||||||
|
name. It is the first letter of the option name, except for
|
||||||
|
TJ6, whose abbreviation is "J". After giving this command,
|
||||||
|
the user must still dispose of the unknown word.
|
||||||
|
|
||||||
|
W The unknown word, and all future words, are accepted. That
|
||||||
|
is, the rest of the input file is immediately copied to the
|
||||||
|
output file without further checking. This is sometimes
|
||||||
|
useful if the system is going down in one minute. It also
|
||||||
|
complements the optional starting line feature of the
|
||||||
|
CORRECT command.
|
||||||
|
|
||||||
|
^L The display is refreshed.
|
||||||
|
|
||||||
|
^G The entire correction operation is aborted, and SPELL goes
|
||||||
|
back to the top level to await the next operation command.
|
||||||
|
|
||||||
|
? A brief summary of the available commands is displayed,
|
||||||
|
showing the currently selected options.
|
||||||
|
|
||||||
|
File: ISPELL Node: Training Previous: Correcting Up: Top Next: Asking
|
||||||
|
|
||||||
|
3. The TRAIN command
|
||||||
|
|
||||||
|
This is a sort of "off-line" version of correcting. Instead of
|
||||||
|
querying the user, SPELL lists all unknown words in an "exceptions" file,
|
||||||
|
and puts them into dictionary number 1 also. It is often possible to
|
||||||
|
determine quickly whether a file contains errors by training and then
|
||||||
|
examining the exceptions file.
|
||||||
|
|
||||||
|
The word "TRAIN" is followed by the filespecs of the input file and
|
||||||
|
the exceptions file.
|
||||||
|
|
||||||
|
SPELL can suppress its training of designated portions of the text in
|
||||||
|
exactly the same way as during corrections, by using the "&&&SPELLON" and
|
||||||
|
"&&&SPELLOFF" comments.
|
||||||
|
|
||||||
|
File: ISPELL Node: Asking Previous: Training Up: Top Next: Load/Dump
|
||||||
|
|
||||||
|
4. The ASK command
|
||||||
|
|
||||||
|
This is used for manual interrogation of the dictionary about a single
|
||||||
|
word. The command is "ASK" followed by the word. SPELL will indicate
|
||||||
|
whether it knows the word. If the word is known, SPELL will also indicate
|
||||||
|
whether suffix removal was used, and whether the word has suffix flags.
|
||||||
|
(Users generally do not need to be concerned with this information.) If
|
||||||
|
the word is not known, SPELL will list similar words that it knows, if
|
||||||
|
any.
|
||||||
|
|
||||||
|
5. The JUMBLE command
|
||||||
|
|
||||||
|
This command is provided to solve "scrambled word" problems of the
|
||||||
|
sort that appear in some newspapers. The command "JUMBLE" is followed by
|
||||||
|
the string (not more than 8 letters). All permutations of the letters in
|
||||||
|
the string that are known to SPELL are displayed. Because the program is
|
||||||
|
not clever about redundant permutations, it will display words more than
|
||||||
|
once if they contain duplicated letters.
|
||||||
|
|
||||||
|
File: ISPELL Node: Load/Dump Previous: Asking Up: Top Next: Options
|
||||||
|
|
||||||
|
6. The LOAD and DUMP commands
|
||||||
|
|
||||||
|
The LOAD command reads a file and places the words in it into one of 9
|
||||||
|
dictionaries that SPELL has in addition to its main dictionary. The word
|
||||||
|
LOAD is followed by the dictionary filespec and an optional dictionary
|
||||||
|
number (1 through 9). If the number is not given, dictionary number 1 will
|
||||||
|
be loaded. (One incremental dictionary is usually sufficient.) The "DUMP"
|
||||||
|
operation has the same syntax and writes the indicated dictionary onto a
|
||||||
|
file.
|
||||||
|
|
||||||
|
A word may be in only one dictionary. If it is in any dictionary it
|
||||||
|
is "known" for purposes of correcting, training, etc. When loading a
|
||||||
|
dictionary, SPELL ignores any word that it already knows, even if it is in
|
||||||
|
a different dictionary than the one specified.
|
||||||
|
|
||||||
|
Incremental dictionaries are useful for maintaining private lists of
|
||||||
|
words that appear in one's files and are known to be correct but are not in
|
||||||
|
SPELL's main dictionary. When correcting a file, the user should type "I"
|
||||||
|
when such a word is encountered, and then dump the incremental dictionary
|
||||||
|
at the end of the run. This dictionary may than be read back in at the
|
||||||
|
start of the next run.
|
||||||
|
|
||||||
|
Dictionaries other than number 1 are occasionally useful for
|
||||||
|
maintaining different lists of jargon words or words specific to particular
|
||||||
|
subjects.
|
||||||
|
|
||||||
|
File: ISPELL Node: Options Previous: Load/Dump Up: Top Next: Syntax
|
||||||
|
|
||||||
|
7. The SET and CLEAR commands
|
||||||
|
|
||||||
|
SPELL has 7 "option" flags to control its operation:
|
||||||
|
|
||||||
|
"R" option Reading "R" text: ignore commands, font indicators,
|
||||||
|
comments, string and number register names, and inline
|
||||||
|
macro names. It correctly handles "hexadecimal" fonts as
|
||||||
|
in ^FAword^F*. Warning: Since command lines are ignored,
|
||||||
|
titles appearing in chapter, section, and figure macros
|
||||||
|
are not checked and should be inspected manually. The
|
||||||
|
same is true of string registers that are to expand to
|
||||||
|
actual text. Also, SPELL is not clever about things that
|
||||||
|
are quoted or translated, or other obscure features of
|
||||||
|
text formatters. For example, it thinks "^Q^K" starts a
|
||||||
|
comment, because it doesn't know about the action of ^Q.
|
||||||
|
This is true, to varying degrees, for all of the
|
||||||
|
formatter options.
|
||||||
|
|
||||||
|
"TJ6" option Reading TJ6 text: ignore spelling of command lines and
|
||||||
|
font indicators. See the warning under the "R" option.
|
||||||
|
|
||||||
|
"SCRIBE" option Reading SCRIBE text: ignore all words preceded by "@",
|
||||||
|
and the arguments to those commands whose arguments are
|
||||||
|
not expected to be text. Only comments of the form
|
||||||
|
@comment(...) are recognized and ignored. Comments of
|
||||||
|
the form @begin(comment)...@end(comment) are not. The
|
||||||
|
latter types of comments may be protected by additional
|
||||||
|
"&&&SPELLOFF" and "&&&SPELLON" comments, if desired. See
|
||||||
|
the warning under the "R" option.
|
||||||
|
|
||||||
|
"PUB" option Reading PUB text: like TJ6.
|
||||||
|
|
||||||
|
"TEX" option Reading TEX text: ignore commands, font indicators, and
|
||||||
|
comments. See the warning under the "R" option.
|
||||||
|
|
||||||
|
"LIST" option List suggested alternate spellings when it encounters an
|
||||||
|
unknown word. If you don't care about the alternate
|
||||||
|
spellings, you can make it run faster by turning this
|
||||||
|
off.
|
||||||
|
|
||||||
|
"DISPLAY" option Display the context and line number of the unknown word.
|
||||||
|
It saves wear and tear on printing terminals to turn this
|
||||||
|
off if you don't really need it.
|
||||||
|
|
||||||
|
Options are turned on by the command SET (option name), off by CLEAR
|
||||||
|
(option name). At present, only the first letter of the word following SET
|
||||||
|
or CLEAR is looked at, and "J" must be used to denote TJ6, so that "T" can
|
||||||
|
denote TEX. The HELP command displays the currently enabled options. Only
|
||||||
|
one of the formatter options ("TJ6", "R", "PUB", "SCRIBE", and "TEX") may
|
||||||
|
be on at one time. Turning any of them on clears the others.
|
||||||
|
|
||||||
|
The initial options are "R", "LIST", and "DISPLAY". In addition to
|
||||||
|
the top level commands SET and CLEAR, options may be set or cleared when
|
||||||
|
SPELL is querying during a correction. To do this, type a plus sign to
|
||||||
|
set, or a minus sign to clear, followed by a SINGLE LETTER abbreviation for
|
||||||
|
the option (or "J" for TJ6). For example, if you had DISPLAY off and you
|
||||||
|
decide you want to see the context of a particular word after all, you can
|
||||||
|
turn it on by typing "+D".
|
||||||
|
|
||||||
|
8. The QUIT and KILL commands
|
||||||
|
|
||||||
|
"KILL" exits from SPELL and kills the job. "QUIT" exits but allows it
|
||||||
|
to be restarted.
|
||||||
|
|
||||||
|
File: ISPELL Node: Syntax Previous: Options Up: Top Next: Esoterica
|
||||||
|
|
||||||
|
9. ENTERING AND EDITING COMMANDS
|
||||||
|
|
||||||
|
A command consists of a command name ("CORRECT" etc.) followed by
|
||||||
|
whatever arguments (e.g. filespecs) are appropriate, terminated by a
|
||||||
|
carriage return. The command name may be abbreviated to any unambiguous
|
||||||
|
prefix. In the command line, rubout deletes the last character. ^U
|
||||||
|
deletes the entire line, allowing one to start over. ^R redisplays the
|
||||||
|
material typed so far, in case the screen got messed up. A question mark
|
||||||
|
gives a help message. ^Q quotes the following character in a filename.
|
||||||
|
|
||||||
|
Filespecs and other arguments are separated from each other by commas,
|
||||||
|
for example
|
||||||
|
CORR FOO BAR,FOO OUT
|
||||||
|
LOAD MYDICT,3
|
||||||
|
To specify the starting line number for a correction operation, follow the
|
||||||
|
last filespec by a slash and the number, e.g.
|
||||||
|
CORR FOO BAR,FOO OUT/13
|
||||||
|
In the JCL line, an altmode is equivalent to a carriage return in
|
||||||
|
terminating the command, making it possible to put multiple commands into a
|
||||||
|
JCL string meaningfully.
|
||||||
|
|
||||||
|
In filenames, the Sname initially defaults to the user's working
|
||||||
|
directory, and is "sticky", i. e. will subsequently default to the last
|
||||||
|
specification used. The device and second filename always default to
|
||||||
|
"DSK:" and ">", and are not sticky. The first filename must always be
|
||||||
|
typed: it has no default value.
|
||||||
|
|
||||||
|
10. STARTING SPELL WITH A COMMAND LINE
|
||||||
|
|
||||||
|
If a command line ("JCL") is supplied when SPELL is started, it will
|
||||||
|
use that line, for as long as it lasts, instead of taking commands from the
|
||||||
|
terminal. Any command error cancels the rest of the command line and
|
||||||
|
reverts to interactive operation.
|
||||||
|
|
||||||
|
Since <return> can't be put into the command line, SPELL allows
|
||||||
|
altmode to separate commands. This is allowed only when the command is
|
||||||
|
being read from JCL.
|
||||||
|
|
||||||
|
example:
|
||||||
|
:SPELL LOAD MYDICT$CORRECT FOO,BAR
|
||||||
|
loads dictionary MYDICT > and corrects FOO >, writing
|
||||||
|
corrected output to BAR >.
|
||||||
|
|
||||||
|
:SPELL LOAD MYDICT$TRAIN FOO,FOO ERRS$KILL
|
||||||
|
Loads MYDICT > and trains FOO >, putting exceptions
|
||||||
|
into FOO ERRS, and then kills itself.
|
||||||
|
|
||||||
|
File: ISPELL Node: Esoterica Previous: Syntax Up: Top
|
||||||
|
|
||||||
|
11. ESOTERICA
|
||||||
|
|
||||||
|
The following information should not be necessary for normal operation
|
||||||
|
of SPELL. It is included for completeness.
|
||||||
|
|
||||||
|
11.1 File operations
|
||||||
|
|
||||||
|
SPELL reads and writes files in "block image" mode, using blocks of
|
||||||
|
128 words. On writing, the last word is padded with ^C. On reading, any
|
||||||
|
padding of ^@, ^A, ^B, or ^C at the end of the last word is removed.
|
||||||
|
|
||||||
|
11.2 Word identification algorithm
|
||||||
|
|
||||||
|
A word is any uninterrupted sequence of letters and apostrophes, which
|
||||||
|
does not begin or end with an apostrophe. Any punctuation, digit, or
|
||||||
|
control character separates words. Words are not checked if they are being
|
||||||
|
ignored under control of the text formatter mode. Any word consisting of a
|
||||||
|
single letter, or any word more than 40 letters long, is considered to be
|
||||||
|
correctly spelled.
|
||||||
|
|
||||||
|
11.3 Closeness algorithm
|
||||||
|
|
||||||
|
Two words are considered to be "close" if they differ in only one of
|
||||||
|
the following ways:
|
||||||
|
Two adjacent letters are interchanged.
|
||||||
|
One letter is different.
|
||||||
|
One letter is missing in one and present in the other.
|
||||||
|
|
||||||
|
This criterion is used in suggesting close words when an unknown word
|
||||||
|
is found while correcting a file or when an unknown word is asked for with
|
||||||
|
the "A" command. Hence the word SEQUENCE will be suggested if the program
|
||||||
|
encounters SEUQENCE, SERQUENCE, SEQUNCE, or SEQUENCW.
|
||||||
|
|
||||||
|
11.4 Dictionary policy
|
||||||
|
|
||||||
|
It is the policy of this program to contain only one spelling of a
|
||||||
|
word, even if ordinary dictionaries show two or more "acceptable"
|
||||||
|
spellings. Hence, the dictionary contains LABELED and LABELING, but not
|
||||||
|
LABELLED or LABELLING, even though all four are actually acceptable. The
|
||||||
|
intention is to enforce uniformity within each document. The author
|
||||||
|
apologizes for the restriction on creativity and diversity that this
|
||||||
|
necessitates, but believes that it is the best policy for this program.
|
||||||
|
|
||||||
|
The dictionary contains many technical and computer terms such as
|
||||||
|
MICROPROGRAM and DEBUGGER, but does not contain extreme jargon words such
|
||||||
|
as CONTROLIFY or VALRET. The dictionary contains no proper names other
|
||||||
|
than names of countries and states of the United States. The reason is
|
||||||
|
that it would be virtually impossible to contain all of the proper names
|
||||||
|
that commonly arise in normal use. Users should keep proper names (and
|
||||||
|
other correctly spelled words) that arise in their own work in private
|
||||||
|
dictionaries to avoid having to repeatedly tell SPELL to accept them.
|
||||||
|
|
||||||
|
The dictionary is significantly smaller than that found in other
|
||||||
|
spelling checkers, such as the DEC TOPS-20 program. The author believes
|
||||||
|
that the larger dictionary would not reduce the number of false misspelling
|
||||||
|
indications by very much. Users who find words that SPELL does not know,
|
||||||
|
but should, are urged to mail pointers to lists of such words (or the
|
||||||
|
documents in which they appear) to BUG-SPELL@MIT-ML. They will be
|
||||||
|
considered for addition to the dictionary. Users who wish to argue that an
|
||||||
|
extremely large dictionary would be better should mail pointers to specific
|
||||||
|
documents demonstrating this. The argument might be valid, but evidence is
|
||||||
|
needed.
|
||||||
|
|
||||||
|
11.5 The "BASK" command
|
||||||
|
|
||||||
|
The "BASK" command is a variant of the "ASK" command intended for
|
||||||
|
invocations of SPELL by other programs. The command line (presumably from
|
||||||
|
a JCL line) is "BASK" followed by the word to be looked up, and then the
|
||||||
|
name of the file to which the information is to be written, for example:
|
||||||
|
|
||||||
|
:SPELL BASK WHEREVER,TEST OUT$KILL
|
||||||
|
looks up the word "wherever", and writes a report in the file "TEST OUT".
|
||||||
|
The report is similar to the information displayed after the "A" command,
|
||||||
|
except that formatting characters are absent and the result of the search
|
||||||
|
is indicated by the first character of the file.
|
||||||
|
|
||||||
|
If the word was found directly, the file consists of a "*" followed by
|
||||||
|
any dictionary flags that the word may have had.
|
||||||
|
If the word was found through suffix removal, the file consists of a
|
||||||
|
"+", the root word, and the dictionary flags of the root word.
|
||||||
|
If the word was not found and there are no words close to it, the file
|
||||||
|
consists of a "#".
|
||||||
|
If the word was not found but there are close words, the file consists
|
||||||
|
of a "&" immediately followed by a list of close words, separated
|
||||||
|
from each other by newlines.
|
||||||
|
The file is always terminated by a <newline>.
|
||||||
|
|
||||||
|
11.6 Dictionary flags
|
||||||
|
|
||||||
|
Words in SPELL's main dictionary (but not the other dictionaries) may
|
||||||
|
have flags associated with them to indicate the legality of suffixes
|
||||||
|
without the need to keep the full suffixed words in the dictionary. The
|
||||||
|
flags have "names" consisting of single letters. Their meaning is as
|
||||||
|
follows:
|
||||||
|
|
||||||
|
Let # and @ be "variables" that can stand for any letter. Upper case
|
||||||
|
letters are constants. "..." stands for any string of zero or more
|
||||||
|
letters, but note that no word may exist in the dictionary which is not at
|
||||||
|
least 2 letters long, so, for example, FLY may not be produced by placing
|
||||||
|
the "Y" flag on "F". Also, no flag is effective unless the word that it
|
||||||
|
creates is at least 4 letters long, so, for example, WED may not be
|
||||||
|
produced by placing the "D" flag on "WE".
|
||||||
|
|
||||||
|
"V" flag:
|
||||||
|
...E --> ...IVE as in CREATE --> CREATIVE
|
||||||
|
if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
|
||||||
|
|
||||||
|
"N" flag:
|
||||||
|
...E --> ...ION as in CREATE --> CREATION
|
||||||
|
...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
|
||||||
|
if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
|
||||||
|
|
||||||
|
"X" flag:
|
||||||
|
...E --> ...IONS as in CREATE --> CREATIONS
|
||||||
|
...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
|
||||||
|
if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
|
||||||
|
|
||||||
|
"H" flag:
|
||||||
|
...Y --> ...IETH as in TWENTY --> TWENTIETH
|
||||||
|
if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
|
||||||
|
|
||||||
|
"Y" FLAG:
|
||||||
|
... --> ...LY as in QUICK --> QUICKLY
|
||||||
|
|
||||||
|
"G" FLAG:
|
||||||
|
...E --> ...ING as in FILE --> FILING
|
||||||
|
if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
|
||||||
|
|
||||||
|
"J" FLAG"
|
||||||
|
...E --> ...INGS as in FILE --> FILINGS
|
||||||
|
if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
|
||||||
|
|
||||||
|
"D" FLAG:
|
||||||
|
...E --> ...ED as in CREATE --> CREATED
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IED as in IMPLY --> IMPLIED
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#ED as in CROSS --> CROSSED
|
||||||
|
or CONVEY --> CONVEYED
|
||||||
|
|
||||||
|
"T" FLAG:
|
||||||
|
...E --> ...EST as in LATE --> LATEST
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IEST as in DIRTY --> DIRTIEST
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#EST as in SMALL --> SMALLEST
|
||||||
|
or GRAY --> GRAYEST
|
||||||
|
|
||||||
|
"R" FLAG:
|
||||||
|
...E --> ...ER as in SKATE --> SKATER
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#ER as in BUILD --> BUILDER
|
||||||
|
or CONVEY --> CONVEYER
|
||||||
|
|
||||||
|
"Z FLAG:
|
||||||
|
...E --> ...ERS as in SKATE --> SKATERS
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#ERS as in BUILD --> BUILDERS
|
||||||
|
or SLAY --> SLAYERS
|
||||||
|
|
||||||
|
"S" FLAG:
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IES as in IMPLY --> IMPLIES
|
||||||
|
if # .eq. S, X, Z, or H,
|
||||||
|
...# --> ...#ES as in FIX --> FIXES
|
||||||
|
if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...# --> ...#S as in BAT --> BATS
|
||||||
|
or CONVEY --> CONVEYS
|
||||||
|
|
||||||
|
"P" FLAG:
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
|
||||||
|
if # .ne. Y, or @ = A, E, I, O, or U,
|
||||||
|
...@# --> ...@#NESS as in LATE --> LATENESS
|
||||||
|
or GRAY --> GRAYNESS
|
||||||
|
|
||||||
|
"M" FLAG:
|
||||||
|
... --> ...'S as in DOG --> DOG'S
|
||||||
|
|
||||||
|
Note: The existence of a flag on a root word in the directory is not by
|
||||||
|
itself sufficient to cause SPELL to recognize the indicated word ending.
|
||||||
|
If there is more than one root for which a flag will indicate a given word,
|
||||||
|
only one of the roots is the correct one for which the flag is effective;
|
||||||
|
generally it is the longest root. For example, the "D" rule implies that
|
||||||
|
either PASS or PASSE, with a "D" flag, will yield PASSED. The flag must be
|
||||||
|
on PASSE; it will be ineffective on PASS. This is because, when SPELL
|
||||||
|
encounters the word PASSED and fails to find it in its dictionary, it
|
||||||
|
strips off the "D" and looks up PASSE. Upon finding PASSE, it then accepts
|
||||||
|
PASSED if and only if PASSE has the "D" flag. Only if the word PASSE is
|
||||||
|
not in the main dictionary at all does the program strip off the "E" and
|
||||||
|
search for PASS. Furthermore, some combinations of flags are forbidden to
|
||||||
|
allow for dense flag encoding to save space. For example, only one of the
|
||||||
|
"P", "J", or "V" flags may be on in any one word.
|
||||||
|
|
||||||
|
Therefore, be careful when installing dictionary flags. When in doubt
|
||||||
|
about how to install a flag, don't; let the program do it for you. When a
|
||||||
|
word is read into the main dictionary, whenever possible SPELL sets the
|
||||||
|
appropriate flag in an existing word instead of entering the word itself.
|
||||||
|
So, for example, reading PASSED into the dictionary would result in the "D"
|
||||||
|
flag being set correctly.
|
||||||
|
Warning: The above definitions are part of the internal behavior of SPELL,
|
||||||
|
and are subject to change without notice. Users copying the dictionary
|
||||||
|
should always get the then-current version of this document to get an
|
||||||
|
accurate description of the flags.
|
||||||
|
|
||||||
|
11.7 Preparing new versions of SPELL
|
||||||
|
|
||||||
|
The main dictionary is dictionary 0, and may be accessed by that
|
||||||
|
number. When it is dumped, its flags will be written also. A file
|
||||||
|
containing flags must not be read into a dictionary other than zero, nor
|
||||||
|
should it be read in if dictionaries other than zero contain any words.
|
||||||
|
(Otherwise, it may attempt unsuccessfully to set a flag on a word whose
|
||||||
|
dictionary number is nonzero.) Therefore, flags should be used only on the
|
||||||
|
dictionary that is loaded when a new version of SPELL is created.
|
||||||
|
|
||||||
|
To create a new version of SPELL, assemble it under MIDAS and start
|
||||||
|
it. Give the "LOAD" command and read in the master dictionary, specifying
|
||||||
|
dictionary zero. Then give the "WRITE" command, with the name of the
|
||||||
|
desired image file. On ITS, this should be a file such as "TS SPELL". On
|
||||||
|
Twenex, it should be a file such as "<SUBSYS>ISPELL.EXE". On ITS, it will
|
||||||
|
ask whether to purify it. Say "N", for the purification stuff is still not
|
||||||
|
repaired from an ancient version. On Twenex it will always be written in
|
||||||
|
sharable form.
|
||||||
580
doc/info/spell.two
Normal file
580
doc/info/spell.two
Normal file
@@ -0,0 +1,580 @@
|
|||||||
|
-*-Text-*-
|
||||||
|
This is the file <info>ispell.info, which documents the ISPELL
|
||||||
|
spelling checker/corrector. The canonical "R" source for this is
|
||||||
|
[MIT-XX]SRC:<WBA>SPDOOC.R
|
||||||
|
|
||||||
|
File: ISPELL Node: Top Up: (DIR) Next: Correcting
|
||||||
|
|
||||||
|
1. INTRODUCTION
|
||||||
|
|
||||||
|
This memo documents a spelling check/correction program maintained on
|
||||||
|
the four MIT ITS computers and on the MIT-XX Twenex computer. The program
|
||||||
|
is called "SPELL" on ITS and "ISPELL" (for ITS-style SPELL) on Twenex.
|
||||||
|
(The name ISPELL is used on Twenex to avoid conflict with the standard DEC
|
||||||
|
TOPS-20 program called SPELL. This memo does NOT apply to the DEC TOPS-20
|
||||||
|
program or to programs called NSPELL or OSPELL on ITS.)
|
||||||
|
|
||||||
|
The program's behavior is nearly identical on ITS and Twenex. (The
|
||||||
|
principal difference is that arguments to a command are separated by commas
|
||||||
|
on ITS and spaces on Twenex.) A command consists of a command name and its
|
||||||
|
arguments on one line, for example
|
||||||
|
|
||||||
|
SET SCRIBE
|
||||||
|
LOAD MYDICT,3
|
||||||
|
CORRECT FOO BAR,FOO OUT
|
||||||
|
CORRECT FOO BAR,FOO OUT/13
|
||||||
|
ASK PROCEDE
|
||||||
|
|
||||||
|
SPELL is designed to process input text for text formatters TJ6, R,
|
||||||
|
SCRIBE, PUB, and TEX. It has the necessary knowledge about the command
|
||||||
|
structure of these programs to avoid attempting to check the spelling of
|
||||||
|
formatter commands. It can be told by means of special comments to
|
||||||
|
suppress checking of designated portions of the text.
|
||||||
|
|
||||||
|
SPELL's querying mechanism for correcting misspelled words works best
|
||||||
|
when it is used from a display terminal, but SPELL may be effectively used
|
||||||
|
from a printing terminal or slow display, especially if the "context
|
||||||
|
display" and "alternate spellings display" (see below) are disabled. It
|
||||||
|
may be used effectively from uppercase-only terminals, since it uses the
|
||||||
|
capitalization from the original word in the file when a spelling
|
||||||
|
correction is typed in.
|
||||||
|
|
||||||
|
SPELL has a dictionary of about 38000 words and requires about 43K of
|
||||||
|
memory to run.
|
||||||
|
|
||||||
|
Report bugs and complaints to BUG-SPELL@MIT-ML.
|
||||||
|
|
||||||
|
* Menu:
|
||||||
|
|
||||||
|
* Correcting:: The principal operation
|
||||||
|
* Training:: Making a list of all unknown words from a file
|
||||||
|
* Asking:: Asking about specific words, and other fun and games
|
||||||
|
* Load/Dump:: Loading and dumping dictionaries
|
||||||
|
* Options:: Switches that control formatter mode etc.
|
||||||
|
* Syntax:: Syntax of commands and file names, and operation from "JCL"
|
||||||
|
* Esoterica::
|
||||||
|
|
||||||
|
File: ISPELL Node: Correcting Previous: Top Up: Top Next: Training
|
||||||
|
|
||||||
|
2. The CORRECT command
|
||||||
|
|
||||||
|
The principal operation which SPELL performs is that of "correcting" a
|
||||||
|
file. This consists of reading the file and optionally writing an output
|
||||||
|
file in which misspellings are corrected. The corrections are made either
|
||||||
|
by substituting a word known to SPELL or by retyping the word manually.
|
||||||
|
Whether output is being written or not, every misspelling (that is, every
|
||||||
|
word not known to SPELL) is displayed, and the user is queried about the
|
||||||
|
action to be taken.
|
||||||
|
|
||||||
|
The command name "CORRECT" is followed by the filespec of the input
|
||||||
|
file, the optional filespec of the output file, and an optional switch
|
||||||
|
specifying the starting line number. If this line number is given, all
|
||||||
|
earlier lines in the input file are copied to the output file without being
|
||||||
|
checked. If no output is desired, omit the second file specification.
|
||||||
|
|
||||||
|
CORRECT INPUT FILE,OUTPUT FILE
|
||||||
|
CORRECT INPUT FILE,OUTPUT FILE/13
|
||||||
|
CORRECT INPUT FILE (produce no corrected output)
|
||||||
|
|
||||||
|
2.1 Suppressing checking of designated portions of the text
|
||||||
|
|
||||||
|
When SPELL is being operated in a formatter mode, it can suppress
|
||||||
|
checking of regions containing figures drawn with strange fonts or other
|
||||||
|
nonsensical-looking text. It does this by looking for comments looking
|
||||||
|
like (for the R formatter) "^K &&&SPELLON" or "^K &&&SPELLOFF". The
|
||||||
|
comments must be exactly as shown: a control K, space, three ampersands,
|
||||||
|
and the word entirely in capitals. What follows later on the line is
|
||||||
|
immaterial. The SPELLOFF command disables checking until the next SPELLON
|
||||||
|
command. The unchecked text is still copied into the output file. The
|
||||||
|
commands are:
|
||||||
|
|
||||||
|
R "^K &&&SPELLON" and "^K &&&SPELLOFF"
|
||||||
|
TEX "% &&&SPELLON" and "% &&&SPELLOFF"
|
||||||
|
TJ6 ".C &&&SPELLON" and ".C &&&SPELLOFF"
|
||||||
|
PUB ".<< &&&SPELLON" and ".<< &&&SPELLOFF"
|
||||||
|
SCRIBE "@comment(&&&SPELLON)" and "@comment(&&&SPELLOFF)"
|
||||||
|
|
||||||
|
Remember that SPELL does not expand formatter macros. While it would be
|
||||||
|
convenient to put SPELLOFF and SPELLON commands in macros, it would not
|
||||||
|
have the desired effect. SPELL responds to the commands only when they
|
||||||
|
literally appear in the text.
|
||||||
|
|
||||||
|
2.2 Querying during a correction
|
||||||
|
|
||||||
|
When SPELL finds an unknown word during a correction, it displays:
|
||||||
|
(1) The unknown word.
|
||||||
|
(2) If the "LIST" option is on (which it normally is), a
|
||||||
|
list of words that are known to SPELL and are close to
|
||||||
|
the unknown word. One of these might be the word that
|
||||||
|
was intended. These words are displayed with index
|
||||||
|
numbers beginning with zero.
|
||||||
|
(3) If the "DISPLAY" option is on (which it normally is),
|
||||||
|
the line number and a few lines of the file showing the
|
||||||
|
context in which the word appeared.
|
||||||
|
|
||||||
|
The user then types one of the commands listed below. These commands are
|
||||||
|
acted on IMMEDIATELY. No <return> is typed after them. The intention is
|
||||||
|
to allow rapid interaction with the program. However, care is required to
|
||||||
|
avoid typing the wrong thing.
|
||||||
|
|
||||||
|
A or <space> Accept the word, but do not put it in the dictionary. It
|
||||||
|
will still be unknown, so SPELL will query again if it
|
||||||
|
encounters this word again.
|
||||||
|
|
||||||
|
I Accept the word and put it into dictionary number 1. The
|
||||||
|
word will henceforth be known, so SPELL will not query about
|
||||||
|
it again. If the dictionary is written out at the end of
|
||||||
|
the run, it can be read back in on future runs and SPELL
|
||||||
|
will know all such words.
|
||||||
|
|
||||||
|
D1 to D9 Accept the word and put it into dictionary number N.
|
||||||
|
|
||||||
|
0 to 9 Substitute the numbered choice for the unknown word. (This
|
||||||
|
is only useful if corrected output is being written.) The
|
||||||
|
corrected word will be given the same capitalization as the
|
||||||
|
unknown word, if possible. If not possible, SPELL will say
|
||||||
|
so.
|
||||||
|
|
||||||
|
R The user types in a new word to replace the unknown one.
|
||||||
|
(This is only useful if corrected output is being written.)
|
||||||
|
As above, the capitalization of the original word will be
|
||||||
|
used. The capitalization of the typed-in word is ignored.
|
||||||
|
(Hence SPELL can be operated effectively from
|
||||||
|
upper-case-only terminals.) SPELL then checks the spelling
|
||||||
|
of the new word. If the new word is unknown, SPELL queries
|
||||||
|
again, and the user can reply with any of the commands in
|
||||||
|
this list, including another "R". Therefore, if you are not
|
||||||
|
sure the word you are typing in is correctly spelled, type
|
||||||
|
it anyway. If it is wrong, SPELL will give you another
|
||||||
|
chance to fix it, perhaps by displaying alternative
|
||||||
|
spellings. If the new word is correctly spelled but SPELL
|
||||||
|
does not know it, typing "I" when queried the second time
|
||||||
|
will put it in the dictionary.
|
||||||
|
|
||||||
|
+/-<letter> Sets or clears the indicated option, just as for the SET or
|
||||||
|
CLEAR command at the top level. The letter typed after the
|
||||||
|
"+" or "-" is a single letter abbreviation for the option
|
||||||
|
name. It is the first letter of the option name, except for
|
||||||
|
TJ6, whose abbreviation is "J". After giving this command,
|
||||||
|
the user must still dispose of the unknown word.
|
||||||
|
|
||||||
|
W The unknown word, and all future words, are accepted. That
|
||||||
|
is, the rest of the input file is immediately copied to the
|
||||||
|
output file without further checking. This is sometimes
|
||||||
|
useful if the system is going down in one minute. It also
|
||||||
|
complements the optional starting line feature of the
|
||||||
|
CORRECT command.
|
||||||
|
|
||||||
|
^L The display is refreshed.
|
||||||
|
^G The entire correction operation is aborted, and SPELL goes
|
||||||
|
back to the top level to await the next operation command.
|
||||||
|
|
||||||
|
? A brief summary of the available commands is displayed,
|
||||||
|
showing the currently selected options.
|
||||||
|
|
||||||
|
File: ISPELL Node: Training Previous: Correcting Up: Top Next: Asking
|
||||||
|
|
||||||
|
3. The TRAIN command
|
||||||
|
|
||||||
|
This is a sort of "off-line" version of correcting. Instead of
|
||||||
|
querying the user, SPELL lists all unknown words in an "exceptions" file,
|
||||||
|
and puts them into dictionary number 1 also. It is often possible to
|
||||||
|
determine quickly whether a file contains errors by training and then
|
||||||
|
examining the exceptions file.
|
||||||
|
|
||||||
|
The word "TRAIN" is followed by the filespecs of the input file and
|
||||||
|
the exceptions file.
|
||||||
|
|
||||||
|
SPELL can suppress its training of designated portions of the text in
|
||||||
|
exactly the same way as during corrections, by using the "&&&SPELLON" and
|
||||||
|
"&&&SPELLOFF" comments.
|
||||||
|
|
||||||
|
File: ISPELL Node: Asking Previous: Training Up: Top Next: Load/Dump
|
||||||
|
|
||||||
|
4. The ASK command
|
||||||
|
|
||||||
|
This is used for manual interrogation of the dictionary about a single
|
||||||
|
word. The command is "ASK" followed by the word. SPELL will indicate
|
||||||
|
whether it knows the word. If the word is known, SPELL will also indicate
|
||||||
|
whether suffix removal was used, and whether the word has suffix flags.
|
||||||
|
(Users generally do not need to be concerned with this information.) If
|
||||||
|
the word is not known, SPELL will list similar words that it knows, if
|
||||||
|
any.
|
||||||
|
|
||||||
|
5. The JUMBLE command
|
||||||
|
|
||||||
|
This command is provided to solve "scrambled word" problems of the
|
||||||
|
sort that appear in some newspapers. The command "JUMBLE" is followed by
|
||||||
|
the string (not more than 8 letters). All permutations of the letters in
|
||||||
|
the string that are known to SPELL are displayed. Because the program is
|
||||||
|
not clever about redundant permutations, it will display words more than
|
||||||
|
once if they contain duplicated letters.
|
||||||
|
|
||||||
|
File: ISPELL Node: Load/Dump Previous: Asking Up: Top Next: Options
|
||||||
|
|
||||||
|
6. The LOAD and DUMP commands
|
||||||
|
|
||||||
|
The LOAD command reads a file and places the words in it into one of 9
|
||||||
|
dictionaries that SPELL has in addition to its main dictionary. The word
|
||||||
|
LOAD is followed by the dictionary filespec and an optional dictionary
|
||||||
|
number (1 through 9). If the number is not given, dictionary number 1 will
|
||||||
|
be loaded. (One incremental dictionary is usually sufficient.) The "DUMP"
|
||||||
|
operation has the same syntax and writes the indicated dictionary onto a
|
||||||
|
file.
|
||||||
|
A word may be in only one dictionary. If it is in any dictionary it
|
||||||
|
is "known" for purposes of correcting, training, etc. When loading a
|
||||||
|
dictionary, SPELL ignores any word that it already knows, even if it is in
|
||||||
|
a different dictionary than the one specified.
|
||||||
|
|
||||||
|
Incremental dictionaries are useful for maintaining private lists of
|
||||||
|
words that appear in one's files and are known to be correct but are not in
|
||||||
|
SPELL's main dictionary. When correcting a file, the user should type "I"
|
||||||
|
when such a word is encountered, and then dump the incremental dictionary
|
||||||
|
at the end of the run. This dictionary may than be read back in at the
|
||||||
|
start of the next run.
|
||||||
|
|
||||||
|
Dictionaries other than number 1 are occasionally useful for
|
||||||
|
maintaining different lists of jargon words or words specific to particular
|
||||||
|
subjects.
|
||||||
|
|
||||||
|
File: ISPELL Node: Options Previous: Load/Dump Up: Top Next: Syntax
|
||||||
|
|
||||||
|
7. The SET and CLEAR commands
|
||||||
|
|
||||||
|
SPELL has 7 "option" flags to control its operation:
|
||||||
|
|
||||||
|
"R" option Reading "R" text: ignore commands, font indicators,
|
||||||
|
comments, string and number register names, and inline
|
||||||
|
macro names. It correctly handles "hexadecimal" fonts as
|
||||||
|
in ^FAword^F*. Warning: Since command lines are ignored,
|
||||||
|
titles appearing in chapter, section, and figure macros
|
||||||
|
are not checked and should be inspected manually. The
|
||||||
|
same is true of string registers that are to expand to
|
||||||
|
actual text. Also, SPELL is not clever about things that
|
||||||
|
are quoted or translated, or other obscure features of
|
||||||
|
text formatters. For example, it thinks "^Q^K" starts a
|
||||||
|
comment, because it doesn't know about the action of ^Q.
|
||||||
|
This is true, to varying degrees, for all of the
|
||||||
|
formatter options.
|
||||||
|
|
||||||
|
"TJ6" option Reading TJ6 text: ignore spelling of command lines and
|
||||||
|
font indicators. See the warning under the "R" option.
|
||||||
|
|
||||||
|
"SCRIBE" option Reading SCRIBE text: ignore all words preceded by "@",
|
||||||
|
and the arguments to those commands whose arguments are
|
||||||
|
not expected to be text. Only comments of the form
|
||||||
|
@comment(...) are recognized and ignored. Comments of
|
||||||
|
the form @begin(comment)...@end(comment) are not. The
|
||||||
|
latter types of comments may be protected by additional
|
||||||
|
"&&&SPELLOFF" and "&&&SPELLON" comments, if desired. See
|
||||||
|
the warning under the "R" option.
|
||||||
|
|
||||||
|
"PUB" option Reading PUB text: like TJ6.
|
||||||
|
|
||||||
|
"TEX" option Reading TEX text: ignore commands, font indicators, and
|
||||||
|
comments. See the warning under the "R" option.
|
||||||
|
|
||||||
|
"LIST" option List suggested alternate spellings when it encounters an
|
||||||
|
unknown word. If you don't care about the alternate
|
||||||
|
spellings, you can make it run faster by turning this
|
||||||
|
off.
|
||||||
|
|
||||||
|
"DISPLAY" option Display the context and line number of the unknown word.
|
||||||
|
It saves wear and tear on printing terminals to turn this
|
||||||
|
off if you don't really need it.
|
||||||
|
|
||||||
|
Options are turned on by the command SET (option name), off by CLEAR
|
||||||
|
(option name). At present, only the first letter of the word following SET
|
||||||
|
or CLEAR is looked at, and "J" must be used to denote TJ6, so that "T" can
|
||||||
|
denote TEX. The HELP command displays the currently enabled options. Only
|
||||||
|
one of the formatter options ("TJ6", "R", "PUB", "SCRIBE", and "TEX") may
|
||||||
|
be on at one time. Turning any of them on clears the others.
|
||||||
|
|
||||||
|
The initial options are "R", "LIST", and "DISPLAY". In addition to
|
||||||
|
the top level commands SET and CLEAR, options may be set or cleared when
|
||||||
|
SPELL is querying during a correction. To do this, type a plus sign to
|
||||||
|
set, or a minus sign to clear, followed by a SINGLE LETTER abbreviation for
|
||||||
|
the option (or "J" for TJ6). For example, if you had DISPLAY off and you
|
||||||
|
decide you want to see the context of a particular word after all, you can
|
||||||
|
turn it on by typing "+D".
|
||||||
|
|
||||||
|
8. The QUIT and KILL commands
|
||||||
|
|
||||||
|
"KILL" exits from SPELL and kills the job. "QUIT" exits but allows it
|
||||||
|
to be restarted.
|
||||||
|
|
||||||
|
File: ISPELL Node: Syntax Previous: Options Up: Top Next: Esoterica
|
||||||
|
|
||||||
|
9. ENTERING AND EDITING COMMANDS
|
||||||
|
|
||||||
|
A command consists of a command name ("CORRECT" etc.) followed by
|
||||||
|
whatever arguments (e.g. filespecs) are appropriate, terminated by a
|
||||||
|
carriage return. The command name may be abbreviated to any unambiguous
|
||||||
|
prefix. In the command line, rubout deletes the last character. ^U
|
||||||
|
deletes the entire line, allowing one to start over. ^R redisplays the
|
||||||
|
material typed so far, in case the screen got messed up. A question mark
|
||||||
|
gives a help message. ^Q quotes the following character in a filename.
|
||||||
|
|
||||||
|
Filespecs and other arguments are separated from each other by commas,
|
||||||
|
for example
|
||||||
|
CORR FOO BAR,FOO OUT
|
||||||
|
LOAD MYDICT,3
|
||||||
|
To specify the starting line number for a correction operation, follow the
|
||||||
|
last filespec by a slash and the number, e.g.
|
||||||
|
CORR FOO BAR,FOO OUT/13
|
||||||
|
In the JCL line, an altmode is equivalent to a carriage return in
|
||||||
|
terminating the command, making it possible to put multiple commands into a
|
||||||
|
JCL string meaningfully.
|
||||||
|
|
||||||
|
In filenames, the Sname initially defaults to the user's working
|
||||||
|
directory, and is "sticky", i. e. will subsequently default to the last
|
||||||
|
specification used. The device and second filename always default to
|
||||||
|
"DSK:" and ">", and are not sticky. The first filename must always be
|
||||||
|
typed: it has no default value.
|
||||||
|
10. STARTING SPELL WITH A COMMAND LINE
|
||||||
|
|
||||||
|
If a command line ("JCL") is supplied when SPELL is started, it will
|
||||||
|
use that line, for as long as it lasts, instead of taking commands from the
|
||||||
|
terminal. Any command error cancels the rest of the command line and
|
||||||
|
reverts to interactive operation.
|
||||||
|
|
||||||
|
Since <return> can't be put into the command line, SPELL allows
|
||||||
|
altmode to separate commands. This is allowed only when the command is
|
||||||
|
being read from JCL.
|
||||||
|
|
||||||
|
example:
|
||||||
|
:SPELL LOAD MYDICT$CORRECT FOO,BAR
|
||||||
|
loads dictionary MYDICT > and corrects FOO >, writing
|
||||||
|
corrected output to BAR >.
|
||||||
|
|
||||||
|
:SPELL LOAD MYDICT$TRAIN FOO,FOO ERRS$KILL
|
||||||
|
Loads MYDICT > and trains FOO >, putting exceptions
|
||||||
|
into FOO ERRS, and then kills itself.
|
||||||
|
|
||||||
|
File: ISPELL Node: Esoterica Previous: Syntax Up: Top
|
||||||
|
|
||||||
|
11. ESOTERICA
|
||||||
|
|
||||||
|
The following information should not be necessary for normal operation
|
||||||
|
of SPELL. It is included for completeness.
|
||||||
|
|
||||||
|
11.1 File operations
|
||||||
|
|
||||||
|
SPELL reads and writes files in "block image" mode, using blocks of
|
||||||
|
128 words. On writing, the last word is padded with ^C. On reading, any
|
||||||
|
padding of ^@, ^A, ^B, or ^C at the end of the last word is removed.
|
||||||
|
|
||||||
|
11.2 Word identification algorithm
|
||||||
|
|
||||||
|
A word is any uninterrupted sequence of letters and apostrophes, which
|
||||||
|
does not begin or end with an apostrophe. Any punctuation, digit, or
|
||||||
|
control character separates words. Words are not checked if they are being
|
||||||
|
ignored under control of the text formatter mode. Any word consisting of a
|
||||||
|
single letter, or any word more than 40 letters long, is considered to be
|
||||||
|
correctly spelled.
|
||||||
|
|
||||||
|
11.3 Closeness algorithm
|
||||||
|
|
||||||
|
Two words are considered to be "close" if they differ in only one of
|
||||||
|
the following ways:
|
||||||
|
Two adjacent letters are interchanged.
|
||||||
|
One letter is different.
|
||||||
|
One letter is missing in one and present in the other.
|
||||||
|
|
||||||
|
This criterion is used in suggesting close words when an unknown word
|
||||||
|
is found while correcting a file or when an unknown word is asked for with
|
||||||
|
the "A" command. Hence the word SEQUENCE will be suggested if the program
|
||||||
|
encounters SEUQENCE, SERQUENCE, SEQUNCE, or SEQUENCW.
|
||||||
|
11.4 Dictionary policy
|
||||||
|
|
||||||
|
It is the policy of this program to contain only one spelling of a
|
||||||
|
word, even if ordinary dictionaries show two or more "acceptable"
|
||||||
|
spellings. Hence, the dictionary contains LABELED and LABELING, but not
|
||||||
|
LABELLED or LABELLING, even though all four are actually acceptable. The
|
||||||
|
intention is to enforce uniformity within each document. The author
|
||||||
|
apologizes for the restriction on creativity and diversity that this
|
||||||
|
necessitates, but believes that it is the best policy for this program.
|
||||||
|
|
||||||
|
The dictionary contains many technical and computer terms such as
|
||||||
|
MICROPROGRAM and DEBUGGER, but does not contain extreme jargon words such
|
||||||
|
as CONTROLIFY or VALRET. The dictionary contains no proper names other
|
||||||
|
than names of countries and states of the United States. The reason is
|
||||||
|
that it would be virtually impossible to contain all of the proper names
|
||||||
|
that commonly arise in normal use. Users should keep proper names (and
|
||||||
|
other correctly spelled words) that arise in their own work in private
|
||||||
|
dictionaries to avoid having to repeatedly tell SPELL to accept them.
|
||||||
|
|
||||||
|
The dictionary is significantly smaller than that found in other
|
||||||
|
spelling checkers, such as the DEC TOPS-20 program. The author believes
|
||||||
|
that the larger dictionary would not reduce the number of false misspelling
|
||||||
|
indications by very much. Users who find words that SPELL does not know,
|
||||||
|
but should, are urged to mail pointers to lists of such words (or the
|
||||||
|
documents in which they appear) to BUG-SPELL@MIT-ML. They will be
|
||||||
|
considered for addition to the dictionary. Users who wish to argue that an
|
||||||
|
extremely large dictionary would be better should mail pointers to specific
|
||||||
|
documents demonstrating this. The argument might be valid, but evidence is
|
||||||
|
needed.
|
||||||
|
|
||||||
|
11.5 The "BASK" command
|
||||||
|
|
||||||
|
The "BASK" command is a variant of the "ASK" command intended for
|
||||||
|
invocations of SPELL by other programs. The command line (presumably from
|
||||||
|
a JCL line) is "BASK" followed by the word to be looked up, and then the
|
||||||
|
name of the file to which the information is to be written, for example:
|
||||||
|
|
||||||
|
:SPELL BASK WHEREVER,TEST OUT$KILL
|
||||||
|
looks up the word "wherever", and writes a report in the file "TEST OUT".
|
||||||
|
The report is similar to the information displayed after the "A" command,
|
||||||
|
except that formatting characters are absent and the result of the search
|
||||||
|
is indicated by the first character of the file.
|
||||||
|
|
||||||
|
If the word was found directly, the file consists of a "*" followed by
|
||||||
|
any dictionary flags that the word may have had.
|
||||||
|
If the word was found through suffix removal, the file consists of a
|
||||||
|
"+", the root word, and the dictionary flags of the root word.
|
||||||
|
If the word was not found and there are no words close to it, the file
|
||||||
|
consists of a "#".
|
||||||
|
If the word was not found but there are close words, the file consists
|
||||||
|
of a "&" immediately followed by a list of close words, separated
|
||||||
|
from each other by newlines.
|
||||||
|
The file is always terminated by a <newline>.
|
||||||
|
11.6 Dictionary flags
|
||||||
|
|
||||||
|
Words in SPELL's main dictionary (but not the other dictionaries) may
|
||||||
|
have flags associated with them to indicate the legality of suffixes
|
||||||
|
without the need to keep the full suffixed words in the dictionary. The
|
||||||
|
flags have "names" consisting of single letters. Their meaning is as
|
||||||
|
follows:
|
||||||
|
|
||||||
|
Let # and @ be "variables" that can stand for any letter. Upper case
|
||||||
|
letters are constants. "..." stands for any string of zero or more
|
||||||
|
letters, but note that no word may exist in the dictionary which is not at
|
||||||
|
least 2 letters long, so, for example, FLY may not be produced by placing
|
||||||
|
the "Y" flag on "F". Also, no flag is effective unless the word that it
|
||||||
|
creates is at least 4 letters long, so, for example, WED may not be
|
||||||
|
produced by placing the "D" flag on "WE".
|
||||||
|
|
||||||
|
"V" flag:
|
||||||
|
...E --> ...IVE as in CREATE --> CREATIVE
|
||||||
|
if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
|
||||||
|
|
||||||
|
"N" flag:
|
||||||
|
...E --> ...ION as in CREATE --> CREATION
|
||||||
|
...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
|
||||||
|
if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
|
||||||
|
|
||||||
|
"X" flag:
|
||||||
|
...E --> ...IONS as in CREATE --> CREATIONS
|
||||||
|
...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
|
||||||
|
if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
|
||||||
|
|
||||||
|
"H" flag:
|
||||||
|
...Y --> ...IETH as in TWENTY --> TWENTIETH
|
||||||
|
if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
|
||||||
|
|
||||||
|
"Y" FLAG:
|
||||||
|
... --> ...LY as in QUICK --> QUICKLY
|
||||||
|
|
||||||
|
"G" FLAG:
|
||||||
|
...E --> ...ING as in FILE --> FILING
|
||||||
|
if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
|
||||||
|
|
||||||
|
"J" FLAG"
|
||||||
|
...E --> ...INGS as in FILE --> FILINGS
|
||||||
|
if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
|
||||||
|
|
||||||
|
"D" FLAG:
|
||||||
|
...E --> ...ED as in CREATE --> CREATED
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IED as in IMPLY --> IMPLIED
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#ED as in CROSS --> CROSSED
|
||||||
|
or CONVEY --> CONVEYED
|
||||||
|
|
||||||
|
"T" FLAG:
|
||||||
|
...E --> ...EST as in LATE --> LATEST
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IEST as in DIRTY --> DIRTIEST
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#EST as in SMALL --> SMALLEST
|
||||||
|
or GRAY --> GRAYEST
|
||||||
|
|
||||||
|
"R" FLAG:
|
||||||
|
...E --> ...ER as in SKATE --> SKATER
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#ER as in BUILD --> BUILDER
|
||||||
|
or CONVEY --> CONVEYER
|
||||||
|
|
||||||
|
"Z FLAG:
|
||||||
|
...E --> ...ERS as in SKATE --> SKATERS
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
|
||||||
|
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#ERS as in BUILD --> BUILDERS
|
||||||
|
or SLAY --> SLAYERS
|
||||||
|
|
||||||
|
"S" FLAG:
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@IES as in IMPLY --> IMPLIES
|
||||||
|
if # .eq. S, X, Z, or H,
|
||||||
|
...# --> ...#ES as in FIX --> FIXES
|
||||||
|
if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
|
||||||
|
...@# --> ...@#S as in BAT --> BATS
|
||||||
|
or CONVEY --> CONVEYS
|
||||||
|
|
||||||
|
"P" FLAG:
|
||||||
|
if @ .ne. A, E, I, O, or U,
|
||||||
|
...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
|
||||||
|
if # .ne. Y, or @ = A, E, I, O, or U,
|
||||||
|
...@# --> ...@#NESS as in LATE --> LATENESS
|
||||||
|
or GRAY --> GRAYNESS
|
||||||
|
|
||||||
|
"M" FLAG:
|
||||||
|
... --> ...'S as in DOG --> DOG'S
|
||||||
|
|
||||||
|
Note: The existence of a flag on a root word in the directory is not by
|
||||||
|
itself sufficient to cause SPELL to recognize the indicated word ending.
|
||||||
|
If there is more than one root for which a flag will indicate a given word,
|
||||||
|
only one of the roots is the correct one for which the flag is effective;
|
||||||
|
generally it is the longest root. For example, the "D" rule implies that
|
||||||
|
either PASS or PASSE, with a "D" flag, will yield PASSED. The flag must be
|
||||||
|
on PASSE; it will be ineffective on PASS. This is because, when SPELL
|
||||||
|
encounters the word PASSED and fails to find it in its dictionary, it
|
||||||
|
strips off the "D" and looks up PASSE. Upon finding PASSE, it then accepts
|
||||||
|
PASSED if and only if PASSE has the "D" flag. Only if the word PASSE is
|
||||||
|
not in the main dictionary at all does the program strip off the "E" and
|
||||||
|
search for PASS. Furthermore, some combinations of flags are forbidden to
|
||||||
|
allow for dense flag encoding to save space. For example, only one of the
|
||||||
|
"P", "J", or "V" flags may be on in any one word.
|
||||||
|
Therefore, be careful when installing dictionary flags. When in doubt
|
||||||
|
about how to install a flag, don't; let the program do it for you. When a
|
||||||
|
word is read into the main dictionary, whenever possible SPELL sets the
|
||||||
|
appropriate flag in an existing word instead of entering the word itself.
|
||||||
|
So, for example, reading PASSED into the dictionary would result in the "D"
|
||||||
|
flag being set correctly.
|
||||||
|
Warning: The above definitions are part of the internal behavior of SPELL,
|
||||||
|
and are subject to change without notice. Users copying the dictionary
|
||||||
|
should always get the then-current version of this document to get an
|
||||||
|
accurate description of the flags.
|
||||||
|
|
||||||
|
11.7 Preparing new versions of SPELL
|
||||||
|
|
||||||
|
The main dictionary is dictionary 0, and may be accessed by that
|
||||||
|
number. When it is dumped, its flags will be written also. A file
|
||||||
|
containing flags must not be read into a dictionary other than zero, nor
|
||||||
|
should it be read in if dictionaries other than zero contain any words.
|
||||||
|
(Otherwise, it may attempt unsuccessfully to set a flag on a word whose
|
||||||
|
dictionary number is nonzero.) Therefore, flags should be used only on the
|
||||||
|
dictionary that is loaded when a new version of SPELL is created.
|
||||||
|
|
||||||
|
To create a new version of SPELL, assemble it under MIDAS and start
|
||||||
|
it. Give the "LOAD" command and read in the master dictionary, specifying
|
||||||
|
dictionary zero. Then give the "WRITE" command, with the name of the
|
||||||
|
desired image file. On ITS, this should be a file such as "TS SPELL". On
|
||||||
|
Twenex, it should be a file such as "<SUBSYS>ISPELL.EXE". On ITS, it will
|
||||||
|
ask whether to purify it. Say "N", for the purification stuff is still not
|
||||||
|
repaired from an ancient version. On Twenex it will always be written in
|
||||||
|
sharable form.
|
||||||
|
| ||||||