mirror of
https://github.com/PDP-10/its.git
synced 2026-01-25 19:56:53 +00:00
Added SPELL/ESPELL, spellchecker.
committed by Lars Brinkhoff
parent b2341133ef
commit 65a92f9a9f
doc/info/spell.1 (normal file, 577 lines added)
@@ -0,0 +1,577 @@
-*-Text-*-
This is the file <info>ispell.info, which documents the ISPELL
spelling checker/corrector.

File: ISPELL Node: Top Up: (DIR) Next: Correcting

1. INTRODUCTION

This memo documents a spelling check/correction program maintained on
the four MIT ITS computers and on the MIT-XX Twenex computer. The program
is called "SPELL" on ITS and "ISPELL" (for ITS-style SPELL) on Twenex.
(The name ISPELL is used on Twenex to avoid conflict with the standard DEC
TOPS-20 program called SPELL. This memo does NOT apply to the DEC TOPS-20
program.)

The program's behavior is nearly identical on ITS and Twenex. (The
principal difference is that arguments to a command are separated by commas
on ITS and spaces on Twenex.) A command consists of a command name and its
arguments on one line, for example

    SET SCRIBE
    LOAD MYDICT,3
    CORRECT FOO BAR,FOO OUT
    CORRECT FOO BAR,FOO OUT/13
    ASK PROCEDE

SPELL is designed to process input text for text formatters TJ6, R,
SCRIBE, PUB, and TEX. It has the necessary knowledge about the command
structure of these programs to avoid attempting to check the spelling of
formatter commands. It can be told by means of special comments to
suppress checking of designated portions of the text.

SPELL's querying mechanism for correcting misspelled words works best
when it is used from a display terminal, but SPELL may be effectively used
from a printing terminal or slow display, especially if the "context
display" and "alternate spellings display" (see below) are disabled. It
may be used effectively from uppercase-only terminals, since it uses the
capitalization from the original word in the file when a spelling
correction is typed in.

SPELL has a dictionary of about 38000 words and requires about 43K of
memory to run.

Report bugs and complaints to BUG-SPELL@MIT-ML.

* Menu:

* Correcting::  The principal operation
* Training::    Making a list of all unknown words from a file
* Asking::      Asking about specific words, and other fun and games
* Emacs:(EMACS)Fixit
                Using SPELL from EMACS to check the spelling of one word
* Load/Dump::   Loading and dumping dictionaries
* Options::     Switches that control formatter mode etc.
* Syntax::      Syntax of commands and file names, and operation from "JCL"
* Esoterica::

File: ISPELL Node: Correcting Previous: Top Up: Top Next: Training

2. The CORRECT command

The principal operation which SPELL performs is that of "correcting" a
file. This consists of reading the file and optionally writing an output
file in which misspellings are corrected. The corrections are made either
by substituting a word known to SPELL or by retyping the word manually.
Whether output is being written or not, every misspelling (that is, every
word not known to SPELL) is displayed, and the user is queried about the
action to be taken.

The command name "CORRECT" is followed by the filespec of the input
file, the optional filespec of the output file, and an optional switch
specifying the starting line number. If this line number is given, all
earlier lines in the input file are copied to the output file without being
checked. If no output is desired, omit the second file specification.

    CORRECT INPUT FILE,OUTPUT FILE
    CORRECT INPUT FILE,OUTPUT FILE/13
    CORRECT INPUT FILE          (produce no corrected output)

2.1 Suppressing checking of designated portions of the text

When SPELL is being operated in a formatter mode, it can suppress
checking of regions containing figures drawn with strange fonts or other
nonsensical-looking text. It does this by looking for comments of the
form (for the R formatter) "^K &&&SPELLON" or "^K &&&SPELLOFF". The
comments must be exactly as shown: a control K, space, three ampersands,
and the word entirely in capitals. What follows later on the line is
immaterial. The SPELLOFF command disables checking until the next SPELLON
command. The unchecked text is still copied into the output file. The
commands are:

    R       "^K &&&SPELLON" and "^K &&&SPELLOFF"
    TEX     "% &&&SPELLON" and "% &&&SPELLOFF"
    TJ6     ".C &&&SPELLON" and ".C &&&SPELLOFF"
    PUB     ".<< &&&SPELLON" and ".<< &&&SPELLOFF"
    SCRIBE  "@comment(&&&SPELLON)" and "@comment(&&&SPELLOFF)"

Remember that SPELL does not expand formatter macros. While it would be
convenient to put SPELLOFF and SPELLON commands in macros, it would not
have the desired effect. SPELL responds to the commands only when they
literally appear in the text.
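
The effect of these marker comments can be modeled in a few lines of
Python. This is only an illustrative sketch, not SPELL's own code: the
names MARKERS and checked_lines are hypothetical, ^K is taken to be the
ASCII control character 0x0B, and the marker is assumed to begin the line
(the memo does not say where on a line it must appear).

```python
# Comment prefixes per formatter mode, copied from the table in section 2.1.
# "\x0b" is ASCII control-K. The names here are hypothetical.
MARKERS = {
    "R": "\x0b &&&",
    "TEX": "% &&&",
    "TJ6": ".C &&&",
    "PUB": ".<< &&&",
    "SCRIBE": "@comment(&&&",
}

def checked_lines(lines, mode):
    """Yield (line, check) pairs; check is False inside a SPELLOFF region.

    Every line is yielded, mirroring the fact that unchecked text is
    still copied to the output file.
    """
    prefix = MARKERS[mode]
    checking = True
    for line in lines:
        if line.startswith(prefix + "SPELLOFF"):
            checking = False          # stays off until the next SPELLON
        elif line.startswith(prefix + "SPELLON"):
            checking = True
        yield line, checking
```

Because the match is purely textual, a marker produced by expanding a
macro would never be seen by a scanner like this one, which is the
limitation noted above.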

2.2 Querying during a correction

When SPELL finds an unknown word during a correction, it displays:
    (1) The unknown word.
    (2) If the "LIST" option is on (which it normally is), a
        list of words that are known to SPELL and are close to
        the unknown word. One of these might be the word that
        was intended. These words are displayed with index
        numbers beginning with zero.
    (3) If the "DISPLAY" option is on (which it normally is),
        the line number and a few lines of the file showing the
        context in which the word appeared.

The user then types one of the commands listed below. These commands are
acted on IMMEDIATELY. No <return> is typed after them. The intention is
to allow rapid interaction with the program. However, care is required to
avoid typing the wrong thing.

A or <space>    Accept the word, but do not put it in the dictionary. It
                will still be unknown, so SPELL will query again if it
                encounters this word again.

I               Accept the word and put it into dictionary number 1. The
                word will henceforth be known, so SPELL will not query about
                it again. If the dictionary is written out at the end of
                the run, it can be read back in on future runs and SPELL
                will know all such words.

D1 to D9        Accept the word and put it into dictionary number N.

0 to 9          Substitute the numbered choice for the unknown word. (This
                is only useful if corrected output is being written.) The
                corrected word will be given the same capitalization as the
                unknown word, if possible. If not possible, SPELL will say
                so.

R               The user types in a new word to replace the unknown one.
                (This is only useful if corrected output is being written.)
                As above, the capitalization of the original word will be
                used. The capitalization of the typed-in word is ignored.
                (Hence SPELL can be operated effectively from
                upper-case-only terminals.)

+/-<letter>     Sets or clears the indicated option, just as for the SET or
                CLEAR command at the top level. The letter typed after the
                "+" or "-" is a single letter abbreviation for the option
                name. It is the first letter of the option name, except for
                TJ6, whose abbreviation is "J". After giving this command,
                the user must still dispose of the unknown word.

W               The unknown word, and all future words, are accepted. That
                is, the rest of the input file is immediately copied to the
                output file without further checking. This is sometimes
                useful if the system is going down in one minute. It also
                complements the optional starting line feature of the
                CORRECT command.

^L              The display is refreshed.

^G              The entire correction operation is aborted, and SPELL goes
                back to the top level to await the next operation command.

?               A brief summary of the available commands is displayed,
                showing the currently selected options.

File: ISPELL Node: Training Previous: Correcting Up: Top Next: Asking

3. The TRAIN command

This is a sort of "off-line" version of correcting. Instead of
querying the user, SPELL lists all unknown words in an "exceptions" file,
and puts them into dictionary number 1 also. It is often possible to
determine quickly whether a file contains errors by training and then
examining the exceptions file.

The word "TRAIN" is followed by the filespecs of the input file and
the exceptions file.

SPELL can suppress its training of designated portions of the text in
exactly the same way as during corrections, by using the "&&&SPELLON" and
"&&&SPELLOFF" comments.

File: ISPELL Node: Asking Previous: Training Up: Top Next: Load/Dump

4. The ASK command

This is used for manual interrogation of the dictionary about a single
word. The command is "ASK" followed by the word. SPELL will indicate
whether it knows the word. If the word is known, SPELL will also indicate
whether suffix removal was used, and whether the word has suffix flags.
(Users generally do not need to be concerned with this information.) If
the word is not known, SPELL will list similar words that it knows, if
any.

5. The JUMBLE command

This command is provided to solve "scrambled word" problems of the
sort that appear in some newspapers. The command "JUMBLE" is followed by
the string (not more than 8 letters). All permutations of the letters in
the string that are known to SPELL are displayed. Because the program is
not clever about redundant permutations, it will display words more than
once if they contain duplicated letters.
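
The repeated output follows directly from enumerating permutations by
letter position rather than by distinct arrangement. The Python sketch
below reproduces that behavior; it is not SPELL's implementation, and the
function name jumble and the set `known` (standing in for the dictionary)
are hypothetical.

```python
from itertools import permutations

def jumble(letters, known):
    """Return every permutation of `letters` found in `known`, repeats included.

    `known` is a hypothetical stand-in for SPELL's dictionary: a set of
    upper-case words.
    """
    if len(letters) > 8:
        raise ValueError("JUMBLE accepts at most 8 letters")
    hits = []
    # permutations() treats equal letters at different positions as
    # distinct, so a word with repeated letters is produced several times.
    for perm in permutations(letters.upper()):
        word = "".join(perm)
        if word in known:
            hits.append(word)
    return hits
```

With known = {"ACT", "CAT", "LEVEL"}, jumble("tca", known) reports each
word once, while jumble("level", known) reports LEVEL four times (two E's
times two L's). Collecting the hits in a set instead of a list would
remove the repeats.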

File: ISPELL Node: Load/Dump Previous: Asking Up: Top Next: Options

6. The LOAD and DUMP commands

The LOAD command reads a file and places the words in it into one of 9
dictionaries that SPELL has in addition to its main dictionary. The word
LOAD is followed by the dictionary filespec and an optional dictionary
number (1 through 9). If the number is not given, dictionary number 1 will
be loaded. (One incremental dictionary is usually sufficient.) The "DUMP"
operation has the same syntax and writes the indicated dictionary onto a
file.

A word may be in only one dictionary. If it is in any dictionary it
is "known" for purposes of correcting, training, etc. When loading a
dictionary, SPELL ignores any word that it already knows, even if it is in
a different dictionary than the one specified.
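
The one-dictionary-per-word rule can be pictured as a single table that
maps each known word to the number of the dictionary holding it. The
Python sketch below is an illustration under that assumption; the name
load_words is hypothetical, and folding words to upper case is a guess
rather than something this memo states.

```python
def load_words(words, number, known):
    """Add `words` to dictionary `number`, skipping words already known.

    `known` maps each word to the one dictionary number that holds it
    (a hypothetical model of SPELL's tables). Returns the words added.
    """
    added = []
    for word in words:
        key = word.upper()            # assumption: case is not significant
        if key not in known:          # known in ANY dictionary: ignore it
            known[key] = number
            added.append(key)
    return added
```

For example, loading ["hello", "foo"] into dictionary 3 when HELLO is
already in the main dictionary adds only FOO, so a later DUMP of
dictionary 3 would not contain HELLO.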

Incremental dictionaries are useful for maintaining private lists of
words that appear in one's files and are known to be correct but are not in
SPELL's main dictionary. When correcting a file, the user should type "I"
when such a word is encountered, and then dump the incremental dictionary
at the end of the run. This dictionary may then be read back in at the
start of the next run.

Dictionaries other than number 1 are occasionally useful for
maintaining different lists of jargon words or words specific to particular
subjects.

File: ISPELL Node: Options Previous: Load/Dump Up: Top Next: Syntax

7. The SET and CLEAR commands

SPELL has 7 "option" flags to control its operation:

"R" option        Reading "R" text: ignore commands, font indicators,
                  comments, string and number register names, and inline
                  macro names. It correctly handles "hexadecimal" fonts as
                  in ^FAword^F*. Warning: Since command lines are ignored,
                  titles appearing in chapter, section, and figure macros
                  are not checked and should be inspected manually. The
                  same is true of string registers that are to expand to
                  actual text. Also, SPELL is not clever about things that
                  are quoted or translated, or other obscure features of
                  text formatters. For example, it thinks "^Q^K" starts a
                  comment, because it doesn't know about the action of ^Q.
                  This is true, to varying degrees, for all of the
                  formatter options.

"TJ6" option      Reading TJ6 text: ignore spelling of command lines and
                  font indicators. See the warning under the "R" option.

"SCRIBE" option   Reading SCRIBE text: ignore all words preceded by "@",
                  and the arguments to those commands whose arguments are
                  not expected to be text. Only comments of the form
                  @comment(...) are recognized and ignored. Comments of
                  the form @begin(comment)...@end(comment) are not. The
                  latter types of comments may be protected by additional
                  "&&&SPELLOFF" and "&&&SPELLON" comments, if desired. See
                  the warning under the "R" option.

"PUB" option      Reading PUB text: like TJ6.

"TEX" option      Reading TEX text: ignore commands, font indicators, and
                  comments. See the warning under the "R" option.

"LIST" option     List suggested alternate spellings when it encounters an
                  unknown word. If you don't care about the alternate
                  spellings, you can make it run faster by turning this
                  off.

"DISPLAY" option  Display the context and line number of the unknown word.
                  It saves wear and tear on printing terminals to turn this
                  off if you don't really need it.

Options are turned on by the command SET (option name), off by CLEAR
(option name). At present, only the first letter of the word following SET
or CLEAR is looked at, and "J" must be used to denote TJ6, so that "T" can
denote TEX. The HELP command displays the currently enabled options. Only
one of the formatter options ("TJ6", "R", "PUB", "SCRIBE", and "TEX") may
be on at one time. Turning any of them on clears the others.

The initial options are "R", "LIST", and "DISPLAY". In addition to
the top level commands SET and CLEAR, options may be set or cleared when
SPELL is querying during a correction. To do this, type a plus sign to
set, or a minus sign to clear, followed by a SINGLE LETTER abbreviation for
the option (or "J" for TJ6). For example, if you had DISPLAY off and you
decide you want to see the context of a particular word after all, you can
turn it on by typing "+D".

8. The QUIT and KILL commands

"KILL" exits from SPELL and kills the job. "QUIT" exits but allows it
to be restarted.

File: ISPELL Node: Syntax Previous: Options Up: Top Next: Esoterica

9. ENTERING AND EDITING COMMANDS

A command consists of a command name ("CORRECT" etc.) followed by
whatever arguments (e.g. filespecs) are appropriate, terminated by a
carriage return. The command name may be abbreviated to any unambiguous
prefix. In the command line, rubout deletes the last character. ^U
deletes the entire line, allowing one to start over. ^R redisplays the
material typed so far, in case the screen got messed up. A question mark
gives a help message. ^Q quotes the following character in a filename.

Filespecs and other arguments are separated from each other by commas,
for example
    CORR FOO BAR,FOO OUT
    LOAD MYDICT,3
To specify the starting line number for a correction operation, follow the
last filespec by a slash and the number, e.g.
    CORR FOO BAR,FOO OUT/13
In the JCL line, an altmode is equivalent to a carriage return in
terminating the command, making it possible to put multiple commands into a
JCL string meaningfully.

In filenames, the Sname initially defaults to the user's working
directory, and is "sticky", i.e. will subsequently default to the last
specification used. The device and second filename always default to
"DSK:" and ">", and are not sticky. The first filename must always be
typed: it has no default value.

10. STARTING SPELL WITH A COMMAND LINE

If a command line ("JCL") is supplied when SPELL is started, it will
use that line, for as long as it lasts, instead of taking commands from the
terminal. Any command error cancels the rest of the command line and
reverts to interactive operation.

Since <return> can't be put into the command line, SPELL allows
altmode to separate commands. This is allowed only when the command is
being read from JCL.

example:
    :SPELL LOAD MYDICT$CORRECT FOO,BAR
        loads dictionary MYDICT > and corrects FOO >, writing
        corrected output to BAR >.

    :SPELL LOAD MYDICT$TRAIN FOO,FOO ERRS$KILL
        Loads MYDICT > and trains FOO >, putting exceptions
        into FOO ERRS, and then kills itself.
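
The command syntax of sections 9 and 10 is simple enough to model
directly. The following Python sketch splits a JCL string at altmodes and
then breaks one ITS-style command into its name, comma-separated
arguments, and optional starting line number. It is an illustration only:
split_jcl and parse_command are hypothetical names, altmode is taken to be
ASCII ESC (shown as "$" in the examples above), and ^Q quoting, rubout
processing, and filename defaulting are all ignored.

```python
ALTMODE = "\x1b"   # ASCII ESC; displayed as "$" in the examples above

def split_jcl(jcl):
    """Split a JCL string into individual command lines at altmodes."""
    return [cmd.strip() for cmd in jcl.split(ALTMODE) if cmd.strip()]

def parse_command(line):
    """Split one command into (name, args, start_line).

    start_line is None unless the command ends in "/<number>".
    """
    name, _, rest = line.partition(" ")
    rest, slash, start = rest.partition("/")
    args = [a.strip() for a in rest.split(",")] if rest.strip() else []
    return name.upper(), args, (int(start) if slash else None)
```

Under these assumptions, "CORR FOO BAR,FOO OUT/13" parses to the name
CORR, the two filespecs "FOO BAR" and "FOO OUT", and starting line 13.
Resolving CORR to CORRECT would be a separate step that searches the
command table for a unique name beginning with the typed prefix.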

File: ISPELL Node: Esoterica Previous: Syntax Up: Top

11. ESOTERICA

The following information should not be necessary for normal operation
of SPELL. It is included for completeness.

11.1 File operations

SPELL reads and writes files in "block image" mode, using blocks of
128 words. On writing, the last word is padded with ^C. On reading, any
padding of ^@, ^A, ^B, or ^C at the end of the last word is removed.
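
The padding rule can be sketched in Python as follows. The sketch assumes
the usual PDP-10 packing of five 7-bit characters into each 36-bit word,
which this memo does not state explicitly, and it models a file as a plain
string; the function names are hypothetical, and the 128-word block
structure is ignored.

```python
CHARS_PER_WORD = 5                  # assumption: five 7-bit chars per word
PAD_ON_WRITE = "\x03"               # ^C
PAD_ON_READ = "\x00\x01\x02\x03"    # ^@, ^A, ^B, ^C

def pad_for_write(text):
    """Pad text with ^C so that it fills a whole number of words."""
    shortfall = -len(text) % CHARS_PER_WORD
    return text + PAD_ON_WRITE * shortfall

def strip_on_read(text):
    """Remove trailing padding characters, looking only at the last word."""
    if not text:
        return text
    last_word_start = (len(text) - 1) // CHARS_PER_WORD * CHARS_PER_WORD
    head, last = text[:last_word_start], text[last_word_start:]
    return head + last.rstrip(PAD_ON_READ)
```

Restricting the strip to the final word means that a control character
appearing earlier in the file is left alone.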

11.2 Word identification algorithm

A word is any uninterrupted sequence of letters and apostrophes, which
does not begin or end with an apostrophe. Any punctuation, digit, or
control character separates words. Words are not checked if they are being
ignored under control of the text formatter mode. Any word consisting of a
single letter, or any word more than 40 letters long, is considered to be
correctly spelled.
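
This definition maps naturally onto a regular expression. The Python
sketch below is one reading of it, not SPELL's scanner: the names WORD and
words_to_check are hypothetical, leading and trailing apostrophes are
treated as separators, the 40-letter limit is applied to the whole token
including apostrophes, and formatter modes are ignored.

```python
import re

# One or more letters, optionally continued by apostrophes that are
# followed by more letters; never a leading or trailing apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")

def words_to_check(text):
    """Return the words that would actually be looked up in the dictionary."""
    found = WORD.findall(text)
    # Single letters and words longer than 40 characters count as correct.
    return [w for w in found if 1 < len(w) <= 40]
```

Applied to the text "Don't stop: it's a 2nd 'quoted' word.", this yields
Don't, stop, it's, nd, quoted, and word: the digit splits "2nd", the
quotation marks are dropped, and the single letter "a" is never looked up.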

11.3 Closeness algorithm

Two words are considered to be "close" if they differ in only one of
the following ways:
    Two adjacent letters are interchanged.
    One letter is different.
    One letter is missing in one and present in the other.

This criterion is used in suggesting close words when an unknown word
is found while correcting a file or when an unknown word is asked for with
the "A" command. Hence the word SEQUENCE will be suggested if the program
encounters SEUQENCE, SERQUENCE, SEQUNCE, or SEQUENCW.
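
The three criteria can be written as a direct pairwise test. The Python
sketch below checks two given words against each other; how SPELL actually
searches its dictionary for close words is not described in this memo, and
the function name is_close is hypothetical. Identical words are treated as
not close, which is an assumption.

```python
def is_close(a, b):
    """True if a and b differ by exactly one of the three listed changes."""
    if a == b:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True                       # one letter is different
        return (len(diffs) == 2
                and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])   # adjacent letters swapped
    if abs(len(a) - len(b)) == 1:
        short, long_ = sorted((a, b), key=len)
        # one letter missing: deleting some letter of the longer word
        # must give the shorter word
        return any(long_[:i] + long_[i + 1:] == short
                   for i in range(len(long_)))
    return False
```

All four misspellings in the example above (SEUQENCE, SERQUENCE, SEQUNCE,
SEQUENCW) satisfy is_close against SEQUENCE.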

11.4 Dictionary policy

It is the policy of this program to contain only one spelling of a
word, even if ordinary dictionaries show two or more "acceptable"
spellings. Hence, the dictionary contains LABELED and LABELING, but not
LABELLED or LABELLING, even though all four are actually acceptable. The
intention is to enforce uniformity within each document. The author
apologizes for the restriction on creativity and diversity that this
necessitates, but believes that it is the best policy for this program.

The dictionary contains many technical and computer terms such as
MICROPROGRAM and DEBUGGER, but does not contain extreme jargon words such
as CONTROLIFY or VALRET. The dictionary contains no proper names other
than names of countries and states of the United States. The reason is
that it would be virtually impossible to contain all of the proper names
that commonly arise in normal use. Users should keep proper names (and
other correctly spelled words) that arise in their own work in private
dictionaries to avoid having to repeatedly tell SPELL to accept them.

The dictionary is significantly smaller than that found in other
spelling checkers, such as the DEC TOPS-20 program. The author believes
that the larger dictionary would not reduce the number of false misspelling
indications by very much. Users who find words that SPELL does not know,
but should, are urged to mail pointers to lists of such words (or the
documents in which they appear) to BUG-SPELL@MIT-ML. They will be
considered for addition to the dictionary. Users who wish to argue that an
extremely large dictionary would be better should mail pointers to specific
documents demonstrating this. The argument might be valid, but evidence is
needed.

11.5 The "BASK" command

The "BASK" command is a variant of the "ASK" command intended for
invocations of SPELL by other programs. The command line (presumably from
a JCL line) is "BASK" followed by the word to be looked up, and then the
name of the file to which the information is to be written, for example:

    :SPELL BASK WHEREVER,TEST OUT$KILL
looks up the word "wherever", and writes a report in the file "TEST OUT".
The report is similar to the information displayed after the "A" command,
except that formatting characters are absent and the result of the search
is indicated by the first character of the file.

If the word was found directly, the file consists of a "*" followed by
    any dictionary flags that the word may have had.
If the word was found through suffix removal, the file consists of a
    "+", the root word, and the dictionary flags of the root word.
If the word was not found and there are no words close to it, the file
    consists of a "#".
If the word was not found but there are close words, the file consists
    of a "&" immediately followed by a list of close words, separated
    from each other by newlines.
The file is always terminated by a <newline>.
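
A calling program only needs the first character of the report to decide
what happened. The Python sketch below shows one way such a caller might
read it. The names STATUS and parse_bask are hypothetical, and because
this memo does not specify how the flags and root word are laid out after
"*" or "+", that part of the report is returned as an unparsed string.

```python
STATUS = {
    "*": "found",
    "+": "found by suffix removal",
    "#": "not found, no close words",
    "&": "not found, close words listed",
}

def parse_bask(report):
    """Return (status, payload) for the text of a BASK report file."""
    if not report or report[0] not in STATUS:
        raise ValueError("not a BASK report")
    status = STATUS[report[0]]
    if report[0] == "&":
        # close words, one per line; splitlines() also accepts CR LF
        return status, report[1:].splitlines()
    # flags (and the root word for "+"), left unparsed
    return status, report[1:].rstrip("\r\n")
```

For a report consisting of "&" followed by two words on separate lines,
the payload is a list of those two words; for "#" it is the empty string.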

11.6 Dictionary flags

Words in SPELL's main dictionary (but not the other dictionaries) may
have flags associated with them to indicate the legality of suffixes
without the need to keep the full suffixed words in the dictionary. The
flags have "names" consisting of single letters. Their meaning is as
follows:

Let # and @ be "variables" that can stand for any letter. Upper case
letters are constants. "..." stands for any string of zero or more
letters, but note that no word may exist in the dictionary which is not at
least 2 letters long, so, for example, FLY may not be produced by placing
the "Y" flag on "F". Also, no flag is effective unless the word that it
creates is at least 4 letters long, so, for example, WED may not be
produced by placing the "D" flag on "WE".

"V" flag:
        ...E --> ...IVE  as in CREATE --> CREATIVE
        if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE

"N" flag:
        ...E --> ...ION  as in CREATE --> CREATION
        ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
        if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN

"X" flag:
        ...E --> ...IONS  as in CREATE --> CREATIONS
        ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
        if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS

"H" flag:
        ...Y --> ...IETH  as in TWENTY --> TWENTIETH
        if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH

"Y" FLAG:
        ... --> ...LY  as in QUICK --> QUICKLY

"G" FLAG:
        ...E --> ...ING  as in FILE --> FILING
        if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING

"J" FLAG:
        ...E --> ...INGS  as in FILE --> FILINGS
        if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS

"D" FLAG:
        ...E --> ...ED  as in CREATE --> CREATED
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IED  as in IMPLY --> IMPLIED
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ED  as in CROSS --> CROSSED
                                   or CONVEY --> CONVEYED

"T" FLAG:
        ...E --> ...EST  as in LATE --> LATEST
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#EST  as in SMALL --> SMALLEST
                                    or GRAY --> GRAYEST

"R" FLAG:
        ...E --> ...ER  as in SKATE --> SKATER
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ER  as in BUILD --> BUILDER
                                   or CONVEY --> CONVEYER

"Z" FLAG:
        ...E --> ...ERS  as in SKATE --> SKATERS
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ERS  as in BUILD --> BUILDERS
                                    or SLAY --> SLAYERS

"S" FLAG:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IES  as in IMPLY --> IMPLIES
        if # .eq. S, X, Z, or H,
                ...# --> ...#ES  as in FIX --> FIXES
        if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
                ...# --> ...#S  as in BAT --> BATS
                                or CONVEY --> CONVEYS

"P" FLAG:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
        if # .ne. Y, or @ = A, E, I, O, or U,
                ...@# --> ...@#NESS  as in LATE --> LATENESS
                                     or GRAY --> GRAYNESS

"M" FLAG:
        ... --> ...'S  as in DOG --> DOG'S
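
As a worked illustration, three of the rules above ("G", "D", and "S")
can be transcribed into Python almost line for line, together with the
two length restrictions stated at the start of this section. The function
name apply_flag is hypothetical; this is a reading of the rules as
written here, not the program's code, and the remaining flags are left
out for brevity.

```python
VOWELS = "AEIOU"

def apply_flag(root, flag):
    """Return the suffixed word that `flag` licenses on `root`, or None.

    Only the "G", "D", and "S" rules are transcribed in this sketch.
    """
    if len(root) < 2:
        return None                  # no dictionary word is this short
    last, prev = root[-1], root[-2]
    if flag == "G":
        word = root[:-1] + "ING" if last == "E" else root + "ING"
    elif flag == "D":
        if last == "E":
            word = root + "D"
        elif last == "Y" and prev not in VOWELS:
            word = root[:-1] + "IED"
        else:
            word = root + "ED"
    elif flag == "S":
        if last == "Y" and prev not in VOWELS:
            word = root[:-1] + "IES"
        elif last in "SXZH":
            word = root + "ES"
        else:
            word = root + "S"
    else:
        raise ValueError("flag not covered by this sketch")
    # A flag is ineffective unless the result has at least 4 letters.
    return word if len(word) >= 4 else None
```

This reproduces the examples in the rules (FILE to FILING, IMPLY to
IMPLIED, CONVEY to CONVEYED, FIX to FIXES) and returns None for the "D"
flag on WE, since WED has only three letters.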

Note: The existence of a flag on a root word in the dictionary is not by
itself sufficient to cause SPELL to recognize the indicated word ending.
If there is more than one root for which a flag will indicate a given word,
only one of the roots is the correct one for which the flag is effective;
generally it is the longest root. For example, the "D" rule implies that
either PASS or PASSE, with a "D" flag, will yield PASSED. The flag must be
on PASSE; it will be ineffective on PASS. This is because, when SPELL
encounters the word PASSED and fails to find it in its dictionary, it
strips off the "D" and looks up PASSE. Upon finding PASSE, it then accepts
PASSED if and only if PASSE has the "D" flag. Only if the word PASSE is
not in the main dictionary at all does the program strip off the "E" and
search for PASS. Furthermore, some combinations of flags are forbidden to
allow for dense flag encoding to save space. For example, only one of the
"P", "J", or "V" flags may be on in any one word.

Therefore, be careful when installing dictionary flags. When in doubt
about how to install a flag, don't; let the program do it for you. When a
word is read into the main dictionary, whenever possible SPELL sets the
appropriate flag in an existing word instead of entering the word itself.
So, for example, reading PASSED into the dictionary would result in the "D"
flag being set correctly.
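
The PASSED example can be traced with a small Python sketch of the lookup
order for the "D" flag. The names d_flag_roots and accepted_by_d_flag are
hypothetical, `main` is a stand-in for the main dictionary that maps each
root to its set of flag letters, and the position of the "...Y" root in
the search order is an assumption: the memo describes only the PASSE
versus PASS case.

```python
VOWELS = set("AEIOU")

def d_flag_roots(word):
    """Candidate roots that the "D" rule could expand to `word`, in try order."""
    roots = []
    if word.endswith("ED"):
        roots.append(word[:-1])                   # ...E  --> ...ED
        if (word.endswith("IED") and len(word) >= 4
                and word[-4] not in VOWELS):
            roots.append(word[:-3] + "Y")         # ...@Y --> ...@IED
        stem = word[:-2]
        if stem and stem[-1] != "E" and not (
                stem[-1] == "Y" and stem[-2:-1] not in VOWELS):
            roots.append(stem)                    # ...@# --> ...@#ED
    return roots

def accepted_by_d_flag(word, main):
    """True if `word` is accepted through the "D" flag of some root."""
    for root in d_flag_roots(word):
        if root in main:
            return "D" in main[root]   # the first root found decides
    return False
```

With both PASSE and PASS in `main`, only the flag on PASSE matters: a "D"
flag on PASS alone leaves PASSED rejected, exactly as described in the
note above. If PASSE is absent, the flag on PASS becomes effective.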

Warning: The above definitions are part of the internal behavior of SPELL,
and are subject to change without notice. Users copying the dictionary
should always get the then-current version of this document to get an
accurate description of the flags.

11.7 Preparing new versions of SPELL

The main dictionary is dictionary 0, and may be accessed by that
number. When it is dumped, its flags will be written also. A file
containing flags must not be read into a dictionary other than zero, nor
should it be read in if dictionaries other than zero contain any words.
(Otherwise, it may attempt unsuccessfully to set a flag on a word whose
dictionary number is nonzero.) Therefore, flags should be used only on the
dictionary that is loaded when a new version of SPELL is created.

To create a new version of SPELL, assemble it under MIDAS and start
it. Give the "LOAD" command and read in the master dictionary, specifying
dictionary zero. Then give the "WRITE" command, with the name of the
desired image file. On ITS, this should be a file such as "TS SPELL". On
Twenex, it should be a file such as "<SUBSYS>ISPELL.EXE". On ITS, it will
ask whether to purify it. Say "N", for the purification stuff is still not
repaired from an ancient version. On Twenex it will always be written in
sharable form.
doc/info/spell.two (normal file, 580 lines added)
@@ -0,0 +1,580 @@
|
||||
-*-Text-*-
|
||||
This is the file <info>ispell.info, which documents the ISPELL
|
||||
spelling checker/corrector. The canonical "R" source for this is
|
||||
[MIT-XX]SRC:<WBA>SPDOOC.R
|
||||
|
||||
File: ISPELL Node: Top Up: (DIR) Next: Correcting
|
||||
|
||||
1. INTRODUCTION
|
||||
|
||||
This memo documents a spelling check/correction program maintained on
|
||||
the four MIT ITS computers and on the MIT-XX Twenex computer. The program
|
||||
is called "SPELL" on ITS and "ISPELL" (for ITS-style SPELL) on Twenex.
|
||||
(The name ISPELL is used on Twenex to avoid conflict with the standard DEC
|
||||
TOPS-20 program called SPELL. This memo does NOT apply to the DEC TOPS-20
|
||||
program or to programs called NSPELL or OSPELL on ITS.)
|
||||
|
||||
The program's behavior is nearly identical on ITS and Twenex. (The
|
||||
principal difference is that arguments to a command are separated by commas
|
||||
on ITS and spaces on Twenex.) A command consists of a command name and its
|
||||
arguments on one line, for example
|
||||
|
||||
SET SCRIBE
|
||||
LOAD MYDICT,3
|
||||
CORRECT FOO BAR,FOO OUT
|
||||
CORRECT FOO BAR,FOO OUT/13
|
||||
ASK PROCEDE
|
||||
|
||||
SPELL is designed to process input text for text formatters TJ6, R,
|
||||
SCRIBE, PUB, and TEX. It has the necessary knowledge about the command
|
||||
structure of these programs to avoid attempting to check the spelling of
|
||||
formatter commands. It can be told by means of special comments to
|
||||
suppress checking of designated portions of the text.
|
||||
|
||||
SPELL's querying mechanism for correcting misspelled words works best
|
||||
when it is used from a display terminal, but SPELL may be effectively used
|
||||
from a printing terminal or slow display, especially if the "context
|
||||
display" and "alternate spellings display" (see below) are disabled. It
|
||||
may be used effectively from uppercase-only terminals, since it uses the
|
||||
capitalization from the original word in the file when a spelling
|
||||
correction is typed in.
|
||||
|
||||
SPELL has a dictionary of about 38000 words and requires about 43K of
|
||||
memory to run.
|
||||
|
||||
Report bugs and complaints to BUG-SPELL@MIT-ML.
|
||||
|
||||
* Menu:
|
||||
|
||||
* Correcting:: The principal operation
|
||||
* Training:: Making a list of all unknown words from a file
|
||||
* Asking:: Asking about specific words, and other fun and games
|
||||
* Load/Dump:: Loading and dumping dictionaries
|
||||
* Options:: Switches that control formatter mode etc.
|
||||
* Syntax:: Syntax of commands and file names, and operation from "JCL"
|
||||
* Esoterica::

File: ISPELL Node: Correcting Previous: Top Up: Top Next: Training

2. The CORRECT command

The principal operation which SPELL performs is that of "correcting" a
file. This consists of reading the file and optionally writing an output
file in which misspellings are corrected. The corrections are made either
by substituting a word known to SPELL or by retyping the word manually.
Whether output is being written or not, every misspelling (that is, every
word not known to SPELL) is displayed, and the user is queried about the
action to be taken.

The command name "CORRECT" is followed by the filespec of the input
file, the optional filespec of the output file, and an optional switch
specifying the starting line number. If this line number is given, all
earlier lines in the input file are copied to the output file without being
checked. If no output is desired, omit the second file specification.

        CORRECT INPUT FILE,OUTPUT FILE
        CORRECT INPUT FILE,OUTPUT FILE/13
        CORRECT INPUT FILE              (produce no corrected output)

2.1 Suppressing checking of designated portions of the text

When SPELL is being operated in a formatter mode, it can suppress
checking of regions containing figures drawn with strange fonts or other
nonsensical-looking text. It does this by looking for comments of the
form (for the R formatter) "^K &&&SPELLON" or "^K &&&SPELLOFF". The
comments must be exactly as shown: a control K, space, three ampersands,
and the word entirely in capitals. What follows later on the line is
immaterial. The SPELLOFF command disables checking until the next SPELLON
command. The unchecked text is still copied into the output file. The
commands are:

        R       "^K &&&SPELLON" and "^K &&&SPELLOFF"
        TEX     "% &&&SPELLON" and "% &&&SPELLOFF"
        TJ6     ".C &&&SPELLON" and ".C &&&SPELLOFF"
        PUB     ".<< &&&SPELLON" and ".<< &&&SPELLOFF"
        SCRIBE  "@comment(&&&SPELLON)" and "@comment(&&&SPELLOFF)"

Remember that SPELL does not expand formatter macros. While it would be
convenient to put SPELLOFF and SPELLON commands in macros, it would not
have the desired effect. SPELL responds to the commands only when they
literally appear in the text.
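
The SPELLOFF/SPELLON behavior described above can be sketched in a few
lines of Python. This is only an illustration for the R formatter, not
SPELL's actual code: the names checked_lines, MARKER_ON, and MARKER_OFF
are hypothetical, and the sketch assumes that a marker may appear
anywhere on a line and that the marker line itself is not checked.

```python
# Hypothetical sketch of SPELLON/SPELLOFF handling for R-formatter text.
MARKER_ON = "\x0b &&&SPELLON"    # control-K, space, three ampersands, SPELLON
MARKER_OFF = "\x0b &&&SPELLOFF"  # control-K, space, three ampersands, SPELLOFF

def checked_lines(lines):
    """Yield (line, is_checked) pairs; every line is still copied through."""
    checking = True
    for line in lines:
        if MARKER_OFF in line:
            checking = False
            yield line, False      # assumption: marker lines are never checked
        elif MARKER_ON in line:
            checking = True
            yield line, False
        else:
            yield line, checking

text = ["good text", "\x0b &&&SPELLOFF figure", "xyzzy", "\x0b &&&SPELLON", "more"]
flags = [is_checked for _, is_checked in checked_lines(text)]
```

Because the match is literal, a marker in lower case or one produced by
a macro expansion is never seen, which matches the warning above.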

2.2 Querying during a correction

When SPELL finds an unknown word during a correction, it displays:

        (1) The unknown word.
        (2) If the "LIST" option is on (which it normally is), a
            list of words that are known to SPELL and are close to
            the unknown word. One of these might be the word that
            was intended. These words are displayed with index
            numbers beginning with zero.
        (3) If the "DISPLAY" option is on (which it normally is),
            the line number and a few lines of the file showing the
            context in which the word appeared.

The user then types one of the commands listed below. These commands are
acted on IMMEDIATELY. No <return> is typed after them. The intention is
to allow rapid interaction with the program. However, care is required to
avoid typing the wrong thing.

A or <space>  Accept the word, but do not put it in the dictionary. It
              will still be unknown, so SPELL will query again if it
              encounters this word again.

I             Accept the word and put it into dictionary number 1. The
              word will henceforth be known, so SPELL will not query about
              it again. If the dictionary is written out at the end of
              the run, it can be read back in on future runs and SPELL
              will know all such words.

D1 to D9      Accept the word and put it into dictionary number N, where
              N is the digit typed after the "D".

0 to 9        Substitute the numbered choice for the unknown word. (This
              is only useful if corrected output is being written.) The
              corrected word will be given the same capitalization as the
              unknown word, if possible. If not possible, SPELL will say
              so.

R             The user types in a new word to replace the unknown one.
              (This is only useful if corrected output is being written.)
              As above, the capitalization of the original word will be
              used. The capitalization of the typed-in word is ignored.
              (Hence SPELL can be operated effectively from
              upper-case-only terminals.) SPELL then checks the spelling
              of the new word. If the new word is unknown, SPELL queries
              again, and the user can reply with any of the commands in
              this list, including another "R". Therefore, if you are not
              sure the word you are typing in is correctly spelled, type
              it anyway. If it is wrong, SPELL will give you another
              chance to fix it, perhaps by displaying alternative
              spellings. If the new word is correctly spelled but SPELL
              does not know it, typing "I" when queried the second time
              will put it in the dictionary.

+/-<letter>   Sets or clears the indicated option, just as for the SET or
              CLEAR command at the top level. The letter typed after the
              "+" or "-" is a single letter abbreviation for the option
              name. It is the first letter of the option name, except for
              TJ6, whose abbreviation is "J". After giving this command,
              the user must still dispose of the unknown word.

W             The unknown word, and all future words, are accepted. That
              is, the rest of the input file is immediately copied to the
              output file without further checking. This is sometimes
              useful if the system is going down in one minute. It also
              complements the optional starting line feature of the
              CORRECT command.

^L            The display is refreshed.

^G            The entire correction operation is aborted, and SPELL goes
              back to the top level to await the next operation command.

?             A brief summary of the available commands is displayed,
              showing the currently selected options.
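
The capitalization transfer used by the "0 to 9" and "R" commands
might look roughly like the following Python sketch. The memo does not
say which capitalization patterns SPELL recognizes, so the three
patterns below (all capitals, all lower case, initial capital) are an
assumption, and transfer_case is a hypothetical name. Returning None
stands in for the case where SPELL "will say so".

```python
# Hypothetical sketch: give a replacement the capitalization of the original.
def transfer_case(original, replacement):
    """Return replacement recased like original, or None if not possible."""
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    if original[0].isupper() and original[1:].islower():
        return replacement[0].upper() + replacement[1:].lower()
    return None  # mixed capitalization: assumed to be the "not possible" case

fixed = transfer_case("Seuqence", "SEQUENCE")
```

The capitalization of the typed-in word plays no part in the result,
which is why an upper-case-only terminal is sufficient.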

File: ISPELL Node: Training Previous: Correcting Up: Top Next: Asking

3. The TRAIN command

This is a sort of "off-line" version of correcting. Instead of
querying the user, SPELL lists all unknown words in an "exceptions" file,
and puts them into dictionary number 1 also. It is often possible to
determine quickly whether a file contains errors by training and then
examining the exceptions file.

The word "TRAIN" is followed by the filespecs of the input file and
the exceptions file.

SPELL can suppress its training of designated portions of the text in
exactly the same way as during corrections, by using the "&&&SPELLON" and
"&&&SPELLOFF" comments.

File: ISPELL Node: Asking Previous: Training Up: Top Next: Load/Dump

4. The ASK command

This is used for manual interrogation of the dictionary about a single
word. The command is "ASK" followed by the word. SPELL will indicate
whether it knows the word. If the word is known, SPELL will also indicate
whether suffix removal was used, and whether the word has suffix flags.
(Users generally do not need to be concerned with this information.) If
the word is not known, SPELL will list similar words that it knows, if
any.

5. The JUMBLE command

This command is provided to solve "scrambled word" problems of the
sort that appear in some newspapers. The command "JUMBLE" is followed by
the string (not more than 8 letters). All permutations of the letters in
the string that are known to SPELL are displayed. Because the program is
not clever about redundant permutations, it will display words more than
once if they contain duplicated letters.
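
The duplicate output mentioned above is what a naive permutation search
produces. The Python sketch below illustrates the effect. It is not
SPELL's code, and jumble and the known_words set are hypothetical
stand-ins for the command and the dictionary.

```python
from itertools import permutations

# Hypothetical sketch: test every ordering of the letters against a word set.
def jumble(letters, known_words):
    """Return the known words among all orderings, repeats included."""
    if len(letters) > 8:
        raise ValueError("JUMBLE accepts at most 8 letters")
    hits = []
    for perm in permutations(letters.upper()):
        candidate = "".join(perm)
        if candidate in known_words:
            hits.append(candidate)   # repeated letters give repeated hits
    return hits

hits = jumble("otol", {"TOOL", "LOOT", "LOTO"})
```

With the two O's in "otol", each word is reached by two different
orderings and therefore appears twice in the result.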

File: ISPELL Node: Load/Dump Previous: Asking Up: Top Next: Options

6. The LOAD and DUMP commands

The LOAD command reads a file and places the words in it into one of 9
dictionaries that SPELL has in addition to its main dictionary. The word
LOAD is followed by the dictionary filespec and an optional dictionary
number (1 through 9). If the number is not given, dictionary number 1 will
be loaded. (One incremental dictionary is usually sufficient.) The "DUMP"
operation has the same syntax and writes the indicated dictionary onto a
file.

A word may be in only one dictionary. If it is in any dictionary it
is "known" for purposes of correcting, training, etc. When loading a
dictionary, SPELL ignores any word that it already knows, even if it is in
a different dictionary than the one specified.

Incremental dictionaries are useful for maintaining private lists of
words that appear in one's files and are known to be correct but are not in
SPELL's main dictionary. When correcting a file, the user should type "I"
when such a word is encountered, and then dump the incremental dictionary
at the end of the run. This dictionary may then be read back in at the
start of the next run.

Dictionaries other than number 1 are occasionally useful for
maintaining different lists of jargon words or words specific to particular
subjects.

File: ISPELL Node: Options Previous: Load/Dump Up: Top Next: Syntax

7. The SET and CLEAR commands

SPELL has 7 "option" flags to control its operation:

"R" option        Reading "R" text: ignore commands, font indicators,
                  comments, string and number register names, and inline
                  macro names. It correctly handles "hexadecimal" fonts as
                  in ^FAword^F*. Warning: Since command lines are ignored,
                  titles appearing in chapter, section, and figure macros
                  are not checked and should be inspected manually. The
                  same is true of string registers that are to expand to
                  actual text. Also, SPELL is not clever about things that
                  are quoted or translated, or other obscure features of
                  text formatters. For example, it thinks "^Q^K" starts a
                  comment, because it doesn't know about the action of ^Q.
                  This is true, to varying degrees, for all of the
                  formatter options.

"TJ6" option      Reading TJ6 text: ignore spelling of command lines and
                  font indicators. See the warning under the "R" option.

"SCRIBE" option   Reading SCRIBE text: ignore all words preceded by "@",
                  and the arguments to those commands whose arguments are
                  not expected to be text. Only comments of the form
                  @comment(...) are recognized and ignored. Comments of
                  the form @begin(comment)...@end(comment) are not. The
                  latter types of comments may be protected by additional
                  "&&&SPELLOFF" and "&&&SPELLON" comments, if desired. See
                  the warning under the "R" option.

"PUB" option      Reading PUB text: like TJ6.

"TEX" option      Reading TEX text: ignore commands, font indicators, and
                  comments. See the warning under the "R" option.

"LIST" option     List suggested alternate spellings when it encounters an
                  unknown word. If you don't care about the alternate
                  spellings, you can make it run faster by turning this
                  off.

"DISPLAY" option  Display the context and line number of the unknown word.
                  It saves wear and tear on printing terminals to turn this
                  off if you don't really need it.

Options are turned on by the command SET (option name), off by CLEAR
(option name). At present, only the first letter of the word following SET
or CLEAR is looked at, and "J" must be used to denote TJ6, so that "T" can
denote TEX. The HELP command displays the currently enabled options. Only
one of the formatter options ("TJ6", "R", "PUB", "SCRIBE", and "TEX") may
be on at one time. Turning any of them on clears the others.

The initial options are "R", "LIST", and "DISPLAY". In addition to
the top level commands SET and CLEAR, options may be set or cleared when
SPELL is querying during a correction. To do this, type a plus sign to
set, or a minus sign to clear, followed by a SINGLE LETTER abbreviation for
the option (or "J" for TJ6). For example, if you had DISPLAY off and you
decide you want to see the context of a particular word after all, you can
turn it on by typing "+D".

8. The QUIT and KILL commands

"KILL" exits from SPELL and kills the job. "QUIT" exits but allows it
to be restarted.

File: ISPELL Node: Syntax Previous: Options Up: Top Next: Esoterica

9. ENTERING AND EDITING COMMANDS

A command consists of a command name ("CORRECT" etc.) followed by
whatever arguments (e.g. filespecs) are appropriate, terminated by a
carriage return. The command name may be abbreviated to any unambiguous
prefix. In the command line, rubout deletes the last character. ^U
deletes the entire line, allowing one to start over. ^R redisplays the
material typed so far, in case the screen got messed up. A question mark
gives a help message. ^Q quotes the following character in a filename.

Filespecs and other arguments are separated from each other by commas,
for example

        CORR FOO BAR,FOO OUT
        LOAD MYDICT,3

To specify the starting line number for a correction operation, follow the
last filespec by a slash and the number, e.g.

        CORR FOO BAR,FOO OUT/13

In the JCL line, an altmode is equivalent to a carriage return in
terminating the command, making it possible to put multiple commands into a
JCL string meaningfully.

In filenames, the Sname initially defaults to the user's working
directory, and is "sticky", i.e. will subsequently default to the last
specification used. The device and second filename always default to
"DSK:" and ">", and are not sticky. The first filename must always be
typed: it has no default value.

10. STARTING SPELL WITH A COMMAND LINE

If a command line ("JCL") is supplied when SPELL is started, it will
use that line, for as long as it lasts, instead of taking commands from the
terminal. Any command error cancels the rest of the command line and
reverts to interactive operation.

Since <return> can't be put into the command line, SPELL allows
altmode to separate commands. This is allowed only when the command is
being read from JCL.

Examples:

        :SPELL LOAD MYDICT$CORRECT FOO,BAR
                Loads dictionary MYDICT > and corrects FOO >, writing
                corrected output to BAR >.

        :SPELL LOAD MYDICT$TRAIN FOO,FOO ERRS$KILL
                Loads MYDICT > and trains FOO >, putting exceptions
                into FOO ERRS, and then kills itself.
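
The ITS command syntax of sections 9 and 10 can be sketched in Python as
follows. This is an illustration only: split_jcl and parse_command are
hypothetical names, the sketch assumes the "$" in the examples stands
for the altmode (escape) character, and it ignores ^Q quoting, command
abbreviation, and filename defaults.

```python
ALTMODE = "\x1b"  # assumption: altmode is the ASCII escape character

def split_jcl(jcl):
    """Split a JCL string into individual commands at each altmode."""
    return [cmd.strip() for cmd in jcl.split(ALTMODE) if cmd.strip()]

def parse_command(command):
    """Return (name, arguments, starting line number or None), ITS style."""
    name, _, rest = command.partition(" ")
    start_line = None
    if "/" in rest:
        # The /N switch follows the last filespec.
        rest, _, number = rest.rpartition("/")
        start_line = int(number)
    args = [arg.strip() for arg in rest.split(",")] if rest else []
    return name, args, start_line

commands = split_jcl("LOAD MYDICT\x1bTRAIN FOO,FOO ERRS\x1bKILL")
parsed = parse_command("CORRECT FOO BAR,FOO OUT/13")
```

On Twenex the arguments are separated by spaces rather than commas, so
the split on "," would need to change there.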

File: ISPELL Node: Esoterica Previous: Syntax Up: Top

11. ESOTERICA

The following information should not be necessary for normal operation
of SPELL. It is included for completeness.

11.1 File operations

SPELL reads and writes files in "block image" mode, using blocks of
128 words. On writing, the last word is padded with ^C. On reading, any
padding of ^@, ^A, ^B, or ^C at the end of the last word is removed.

11.2 Word identification algorithm

A word is any uninterrupted sequence of letters and apostrophes, which
does not begin or end with an apostrophe. Any punctuation, digit, or
control character separates words. Words are not checked if they are being
ignored under control of the text formatter mode. Any word consisting of a
single letter, or any word more than 40 letters long, is considered to be
correctly spelled.
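
As a rough illustration of the rule above, the following Python sketch
extracts the words that would be checked. It is not SPELL's code. The
names WORD_RE and words_to_check are hypothetical, and two details are
assumptions: leading and trailing apostrophes are simply dropped, and
apostrophes count toward the 40-letter limit.

```python
import re

# Letters, optionally joined by interior apostrophes (hypothetical pattern).
WORD_RE = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")

def words_to_check(text):
    """Return the words that would be looked up in the dictionary."""
    # Single letters and words longer than 40 are treated as correct.
    return [w for w in WORD_RE.findall(text) if 1 < len(w) <= 40]

found = words_to_check("Don't stop, 'quoted' a x2y")
```

Formatter-mode filtering (commands, comments, font indicators) would
have to happen before this step and is not shown.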

11.3 Closeness algorithm

Two words are considered to be "close" if they differ in only one of
the following ways:

        Two adjacent letters are interchanged.
        One letter is different.
        One letter is missing in one and present in the other.

This criterion is used in suggesting close words when an unknown word
is found while correcting a file or when an unknown word is asked for with
the "ASK" command. Hence the word SEQUENCE will be suggested if the program
encounters SEUQENCE, SERQUENCE, SEQUNCE, or SEQUENCW.
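
The three criteria above can be written as a pairwise test. The Python
sketch below is one straightforward way to do so and is not necessarily
how SPELL searches its dictionary. The name is_close is hypothetical,
and treating identical words as not close is an assumption.

```python
def is_close(a, b):
    """True if a and b differ by exactly one of the three listed edits."""
    if a == b:
        return False  # assumption: a word is not "close" to itself
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one letter is different
        # Two adjacent letters are interchanged.
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) == 1:
        # One letter is missing in one word and present in the other.
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False

examples = ["SEUQENCE", "SERQUENCE", "SEQUNCE", "SEQUENCW"]
results = [is_close("SEQUENCE", w) for w in examples]
```

All four misspellings from the text test as close to SEQUENCE.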

11.4 Dictionary policy

It is the policy of this program to contain only one spelling of a
word, even if ordinary dictionaries show two or more "acceptable"
spellings. Hence, the dictionary contains LABELED and LABELING, but not
LABELLED or LABELLING, even though all four are actually acceptable. The
intention is to enforce uniformity within each document. The author
apologizes for the restriction on creativity and diversity that this
necessitates, but believes that it is the best policy for this program.

The dictionary contains many technical and computer terms such as
MICROPROGRAM and DEBUGGER, but does not contain extreme jargon words such
as CONTROLIFY or VALRET. The dictionary contains no proper names other
than names of countries and states of the United States. The reason is
that it would be virtually impossible to contain all of the proper names
that commonly arise in normal use. Users should keep proper names (and
other correctly spelled words) that arise in their own work in private
dictionaries to avoid having to repeatedly tell SPELL to accept them.

The dictionary is significantly smaller than that found in other
spelling checkers, such as the DEC TOPS-20 program. The author believes
that the larger dictionary would not reduce the number of false misspelling
indications by very much. Users who find words that SPELL does not know,
but should, are urged to mail pointers to lists of such words (or the
documents in which they appear) to BUG-SPELL@MIT-ML. They will be
considered for addition to the dictionary. Users who wish to argue that an
extremely large dictionary would be better should mail pointers to specific
documents demonstrating this. The argument might be valid, but evidence is
needed.

11.5 The "BASK" command

The "BASK" command is a variant of the "ASK" command intended for
invocations of SPELL by other programs. The command line (presumably from
a JCL line) is "BASK" followed by the word to be looked up, and then the
name of the file to which the information is to be written, for example:

        :SPELL BASK WHEREVER,TEST OUT$KILL

looks up the word "wherever", and writes a report in the file "TEST OUT".
The report is similar to the information displayed after the "ASK" command,
except that formatting characters are absent and the result of the search
is indicated by the first character of the file.

If the word was found directly, the file consists of a "*" followed by
any dictionary flags that the word may have had.

If the word was found through suffix removal, the file consists of a
"+", the root word, and the dictionary flags of the root word.

If the word was not found and there are no words close to it, the file
consists of a "#".

If the word was not found but there are close words, the file consists
of a "&" immediately followed by a list of close words, separated
from each other by newlines.

The file is always terminated by a <newline>.
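
A calling program could classify the report by its first character, as
in the Python sketch below. This is an illustration under assumptions:
parse_bask_report and STATUS are hypothetical names, and <newline> is
taken to be a single "\n". The memo does not say how the root word and
flags are delimited in the "*" and "+" cases, so the sketch returns that
part of the report uninterpreted.

```python
# Hypothetical sketch: classify a BASK report by its first character.
STATUS = {
    "*": "found",
    "+": "found via suffix removal",
    "#": "not found, no close words",
    "&": "not found, close words listed",
}

def parse_bask_report(report):
    """Return (status, raw body, list of close words)."""
    status = STATUS.get(report[:1], "unrecognized")
    body = report[1:].rstrip("\n")   # drop the terminating newline
    close_words = body.split("\n") if report[:1] == "&" else []
    return status, body, close_words

result = parse_bask_report("&SEQUENCE\nSEQUENCES\n")
```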

11.6 Dictionary flags

Words in SPELL's main dictionary (but not the other dictionaries) may
have flags associated with them to indicate the legality of suffixes
without the need to keep the full suffixed words in the dictionary. The
flags have "names" consisting of single letters. Their meaning is as
follows:

Let # and @ be "variables" that can stand for any letter. Upper case
letters are constants. "..." stands for any string of zero or more
letters, but note that no word may exist in the dictionary which is not at
least 2 letters long, so, for example, FLY may not be produced by placing
the "Y" flag on "F". Also, no flag is effective unless the word that it
creates is at least 4 letters long, so, for example, WED may not be
produced by placing the "D" flag on "WE".

"V" flag:
        ...E --> ...IVE  as in CREATE --> CREATIVE
        if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE

"N" flag:
        ...E --> ...ION  as in CREATE --> CREATION
        ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
        if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN

"X" flag:
        ...E --> ...IONS  as in CREATE --> CREATIONS
        ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
        if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS

"H" flag:
        ...Y --> ...IETH  as in TWENTY --> TWENTIETH
        if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH

"Y" flag:
        ... --> ...LY  as in QUICK --> QUICKLY

"G" flag:
        ...E --> ...ING  as in FILE --> FILING
        if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING

"J" flag:
        ...E --> ...INGS  as in FILE --> FILINGS
        if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS

"D" flag:
        ...E --> ...ED  as in CREATE --> CREATED
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IED  as in IMPLY --> IMPLIED
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ED  as in CROSS --> CROSSED
                                   or CONVEY --> CONVEYED

"T" flag:
        ...E --> ...EST  as in LATE --> LATEST
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#EST  as in SMALL --> SMALLEST
                                    or GRAY --> GRAYEST

"R" flag:
        ...E --> ...ER  as in SKATE --> SKATER
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ER  as in BUILD --> BUILDER
                                   or CONVEY --> CONVEYER

"Z" flag:
        ...E --> ...ERS  as in SKATE --> SKATERS
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ERS  as in BUILD --> BUILDERS
                                    or SLAY --> SLAYERS

"S" flag:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IES  as in IMPLY --> IMPLIES
        if # .eq. S, X, Z, or H,
                ...# --> ...#ES  as in FIX --> FIXES
        if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#S  as in BAT --> BATS
                                  or CONVEY --> CONVEYS

"P" flag:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
        if # .ne. Y, or @ = A, E, I, O, or U,
                ...@# --> ...@#NESS  as in LATE --> LATENESS
                                     or GRAY --> GRAYNESS

"M" flag:
        ... --> ...'S  as in DOG --> DOG'S
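
To make the notation concrete, here is a Python sketch of three of the
rules above ("D", "S", and "G"), together with the two length
restrictions. It is a transcription of the rules as written in this
memo, not SPELL's implementation. The names apply_d, apply_s, apply_g,
RULES, and expand are hypothetical.

```python
VOWELS = "AEIOU"

def apply_d(root):
    if root.endswith("E"):
        return root + "D"                 # CREATE --> CREATED
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IED"          # IMPLY --> IMPLIED
    return root + "ED"                    # CROSS --> CROSSED, CONVEY --> CONVEYED

def apply_s(root):
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IES"          # IMPLY --> IMPLIES
    if root[-1] in "SXZH":
        return root + "ES"                # FIX --> FIXES
    return root + "S"                     # BAT --> BATS, CONVEY --> CONVEYS

def apply_g(root):
    # FILE --> FILING, CROSS --> CROSSING
    return (root[:-1] if root.endswith("E") else root) + "ING"

RULES = {"D": apply_d, "S": apply_s, "G": apply_g}

def expand(root, flag):
    """Return the suffixed word, or None if a length restriction applies."""
    if len(root) < 2:
        return None                       # roots are at least 2 letters long
    word = RULES[flag](root)
    return word if len(word) >= 4 else None

forms = [expand("CREATE", "D"), expand("IMPLY", "S"), expand("FILE", "G")]
```

Note that expand("WE", "D") yields None, matching the WED example in the
text.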

Note: The existence of a flag on a root word in the dictionary is not by
itself sufficient to cause SPELL to recognize the indicated word ending.
If there is more than one root for which a flag will indicate a given word,
only one of the roots is the correct one for which the flag is effective;
generally it is the longest root. For example, the "D" rule implies that
either PASS or PASSE, with a "D" flag, will yield PASSED. The flag must be
on PASSE; it will be ineffective on PASS. This is because, when SPELL
encounters the word PASSED and fails to find it in its dictionary, it
strips off the "D" and looks up PASSE. Upon finding PASSE, it then accepts
PASSED if and only if PASSE has the "D" flag. Only if the word PASSE is
not in the main dictionary at all does the program strip off the "E" and
search for PASS. Furthermore, some combinations of flags are forbidden to
allow for dense flag encoding to save space. For example, only one of the
"P", "J", or "V" flags may be on in any one word.

Therefore, be careful when installing dictionary flags. When in doubt
about how to install a flag, don't; let the program do it for you. When a
word is read into the main dictionary, whenever possible SPELL sets the
appropriate flag in an existing word instead of entering the word itself.
So, for example, reading PASSED into the dictionary would result in the "D"
flag being set correctly.
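
The lookup order described in the note (try the longer root first, and
fall back to the shorter root only if the longer one is absent) can be
sketched in Python as follows. This is an illustration, not SPELL's
code. The name accepts_ed and the dictionary layout (a mapping from
root to a set of flag letters) are hypothetical, and the "...IED" case
of the "D" rule is deliberately left out.

```python
def accepts_ed(word, dictionary):
    """Decide whether a word ending in ED is accepted via the "D" flag."""
    if not word.endswith("ED"):
        return False
    longer_root = word[:-1]               # PASSED --> PASSE
    if longer_root in dictionary:
        # The longer root exists, so only its own flag counts.
        return "D" in dictionary[longer_root]
    shorter_root = word[:-2]              # PASSED --> PASS
    return "D" in dictionary.get(shorter_root, set())

flag_on_passe = accepts_ed("PASSED", {"PASSE": {"D"}, "PASS": set()})
flag_on_pass = accepts_ed("PASSED", {"PASSE": set(), "PASS": {"D"}})
```

With the flag on PASS only, PASSED is rejected, because the presence of
PASSE in the dictionary stops the search before PASS is consulted.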

Warning: The above definitions are part of the internal behavior of SPELL,
and are subject to change without notice. Users copying the dictionary
should always get the then-current version of this document to get an
accurate description of the flags.

11.7 Preparing new versions of SPELL

The main dictionary is dictionary 0, and may be accessed by that
number. When it is dumped, its flags will be written also. A file
containing flags must not be read into a dictionary other than zero, nor
should it be read in if dictionaries other than zero contain any words.
(Otherwise, it may attempt unsuccessfully to set a flag on a word whose
dictionary number is nonzero.) Therefore, flags should be used only on the
dictionary that is loaded when a new version of SPELL is created.

To create a new version of SPELL, assemble it under MIDAS and start
it. Give the "LOAD" command and read in the master dictionary, specifying
dictionary zero. Then give the "WRITE" command, with the name of the
desired image file. On ITS, this should be a file such as "TS SPELL". On
Twenex, it should be a file such as "<SUBSYS>ISPELL.EXE". On ITS, it will
ask whether to purify it. Answer "N", because the purification code has
not been repaired since an ancient version. On Twenex it will always be
written in sharable form.