SPELL V2.0 DOCUMENTATION

Michael C. Adler
December 22, 1982

(C) 1982 Michael C. Adler
This program has been released into the public domain by
the author. It may neither be sold for profit nor included
in a sold software package without permission of the
author.

The first SPELL using this dictionary was probably written
by Ralph Gorin at Stanford. It was transported to MIT by
Wayne Mattson. Both the program at MIT and the dictionary
were most recently revised by William Ackerman at MIT.
Section 5 of this document was copied from portions of Mr.
Ackerman's documentation.

Thanks to all for the effort spent designing the
dictionary!

Spell is a program, written for Z80 processors running CP/M,
designed to detect misspellings in a document.

1. USING SPELL

The minimum configuration of SPELL requires the files
SPELL.COM and DICT.DIC (the main dictionary). At the time of
execution, DICT.DIC must be on either the default drive or drive
A:.

The name of the file to be corrected must be included on the
command line that is used to invoke spell. If a drive name is
specified as a second file name, output is directed to the speci-
fied drive. Thus,

SPELL useless.doc

will check the file "useless.doc" and direct output to the
default drive and

SPELL b:useless.doc c:

will check the file "b:useless.doc" and direct output to drive C:.

Spell will check the input file for errors by comparing each
word in the file to the dictionary. If a word is not found, a
null (ascii 0) is placed before the word. To change this marking
character, see section 4, PATCHING SPELL. If a backup version
(.BAK file type) of the input file exists, it will be deleted.
The input file will be renamed to a backup file and the checked
file will replace the input file.

2. USER DICTIONARIES

A user dictionary is a list of correct words that can be
loaded by SPELL to augment the main dictionary. Words such as
proper nouns can be placed in user dictionaries to inhibit error
marking. User dictionary files may be formatted in any way that
the user desires, as long as words are delimited by non-alphabe-
tic characters.

SPELL will automatically search for the user dictionary
SPELL.DIC on the default drive and on drive A: if it is not on
the default one. Its contents are then loaded and temporarily
added to the dictionary. It must be loaded again to be included
in subsequent executions of SPELL.

SPELL will also automatically search for d:file.UDC, where
file is the name of the file being corrected and d: is the drive
on which file is found. If found, it is also loaded and tempo-
rarily augments the dictionary. Thus, users may create separate
dictionaries for each text file being corrected. After locating
d:file.UDC, SPELL will search for the file d:file.ADD. This file is
created by WordStar's ^QL command (see section 3) and is not an
ASCII file. d:file.ADD contains commands generated by WordStar
to include specific words in the user dictionary associated with
d:file. SPELL will temporarily place all of the words in it in
the dictionary and will also save the words by copying them into
d:file.UDC.

It is possible to load additional user dictionaries by
specifying them on the SPELL command line. A list of user dic-
tionaries must be preceded by a dollar sign. A dictionary is
specified by a file name and an optional drive name. If no drive
is specified, the default drive is searched and then drive A: is
checked. Extensions are ignored and default to .DIC. Hence, the
command line:

SPELL useless.doc b: $dict1 c:dict2 dict3.fun

would correct useless.doc and direct output to drive B:. User
dictionary DICT1.DIC would be loaded from the default drive or
drive A:, dictionary DICT2.DIC would be loaded from drive C:,
and DICT3.DIC would be loaded from the default drive or drive A:.
Notice that the extension .fun was ignored.
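
The command line rules above can be sketched in Python as follows. This
is only an illustration of the rules as described, not SPELL's own Z80
code; the function name parse_command_line and the shape of its result
are hypothetical.

```python
def parse_command_line(args):
    """Split SPELL-style arguments into (input_file, output_drive, dictionaries).

    Hypothetical helper: it models only the rules described in the text.
    """
    input_file = args[0]
    output_drive = None           # None stands for "use the default drive"
    dictionaries = []
    in_dictionary_list = False

    for arg in args[1:]:
        if arg.startswith("$"):
            in_dictionary_list = True   # a dollar sign opens the dictionary list
            arg = arg[1:]
            if not arg:
                continue
        if not in_dictionary_list:
            output_drive = arg.upper()  # a drive name given as the second file name
            continue
        drive, colon, name = arg.rpartition(":")
        base = name.split(".")[0]       # any extension is ignored
        dictionaries.append(drive.upper() + colon + base.upper() + ".DIC")
    return input_file, output_drive, dictionaries
```

A dictionary returned without a drive prefix would be searched for on the
default drive and then on drive A:, as described above.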

3. WordStar's ^QL COMMAND

Files checked by SPELL can be corrected using WordStar. In
response to ^QL, the user is asked which portions of the file
should be searched. WordStar will then position the cursor on
the first marked word and print a menu offering F (Fix word), B
(Bypass word), I (Ignore word), D (Add to dictionary), and S (Add
to supplemental dictionary). The F option deletes the error
marker and returns to the WordStar main menu, allowing the user
to correct the word. B will leave the word marker and will
search for the next misspelled word. In this implementation of
SPELL, the I, D and S options all perform the same function
(although I is easier to use because no question is asked by
WordStar). If any of these options (I, D, S) is chosen, the
mark will be removed and the word will be added to file.ADD.
Thus, choosing these options informs SPELL that the word is cor-
rect and should not be marked again. The D and S options do not
add the word to SPELL's main dictionary because the compression
method used to store the dictionary is too complicated to allow
such modification efficiently. After any option except F is chosen,
WordStar will automatically search for the next marked word.

4. PATCHING SPELL

It is not necessary to recompile SPELL to change the charac-
ter that marks misspelled words. The byte at 0800H contains the
marking character. In the distribution version of SPELL, it is
null, or 0. DDT or another debugger can be used to change 0800H
to the ASCII value of the desired marker.
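
For users who prefer to patch the file itself rather than use a debugger,
the following Python sketch shows the arithmetic. It assumes the standard
CP/M convention that a .COM file is loaded at 0100H, so the byte seen at
0800H in memory lies at offset 0700H in SPELL.COM. The function name
patch_marker is hypothetical.

```python
LOAD_ADDRESS = 0x0100     # CP/M loads .COM files at 0100H
MARKER_ADDRESS = 0x0800   # address of the marking character, per the text


def patch_marker(image, marker):
    """Return a copy of the SPELL.COM image with a new marking character."""
    value = ord(marker)
    if value > 0x7F:
        raise ValueError("marker must be a single ASCII character")
    offset = MARKER_ADDRESS - LOAD_ADDRESS   # 0700H within the file
    if len(image) <= offset:
        raise ValueError("image is too short to contain the marker byte")
    patched = bytearray(image)
    patched[offset] = value
    return bytes(patched)
```

To use it, read SPELL.COM in binary mode, pass the bytes to patch_marker,
and write the result to a new file. Keep the original until the patched
copy has been tried.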

5. PROGRAM AND DICTIONARY CHARACTERISTICS

5.1 Word identification algorithm

A word is any uninterrupted sequence of letters and
apostrophes, which does not begin or end with an apostrophe.
Any punctuation, digit, or control character separates words.
Any word consisting of a single letter, or any word more than
40 letters long, is considered to be correctly spelled.
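
A minimal Python sketch of these rules follows. It is not taken from
SPELL's source. The name words_to_check is hypothetical, converting words
to upper case is an assumption, and the sketch counts apostrophes toward
the 40 character limit, which the text does not specify.

```python
import re

# Letters with optional interior apostrophes; a word never starts or
# ends with an apostrophe, and anything else separates words.
WORD_RE = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")


def words_to_check(text):
    """Return the words in text that would need a dictionary lookup."""
    result = []
    for match in WORD_RE.finditer(text):
        word = match.group(0)
        # Single letters and very long words are treated as correct.
        if len(word) == 1 or len(word) > 40:
            continue
        result.append(word.upper())
    return result
```

For example, the digits in "x86" split off the single letter "x", which
is then skipped, and the quotes around 'quoted' are treated as separators.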

5.2 Dictionary policy

It is the policy of this program to contain only one
spelling of a word, even if ordinary dictionaries show two
or more "acceptable" spellings. Hence, the dictionary
contains LABELED and LABELING, but not LABELLED or LABELLING,
even though all four are actually acceptable. The intention
is to enforce uniformity within each document. The author
apologizes for the restriction on creativity and diversity
that this necessitates, but believes that it is the best policy
for this program.

The dictionary contains many technical and computer
terms such as MICROPROGRAM and DEBUGGER, but does not contain
extreme jargon words such as CONTROLIFY or VALRET. The
dictionary contains no proper names other than names of countries
and states of the United States. The reason is that it
would be virtually impossible to contain all of the proper names
that commonly arise in normal use. Users should keep proper
names (and other correctly spelled words) that arise in
their own work in private dictionaries to avoid having to repeat-
edly tell SPELL to accept them.

The dictionary is significantly smaller than that found
in other spelling checkers, such as the DEC TOPS-20 program.
The author believes that the larger dictionary would not reduce
the number of false misspelling indications by very much.

[Note: I believe that this dictionary is actually MUCH larger
than any dictionaries currently available for microcomputers.
-Michael]

5.3 Dictionary flags

Words in SPELL's main dictionary (but not the other dictio-
naries) may have flags associated with them to indicate the
legality of suffixes without the need to keep the full
suffixed words in the dictionary. The flags have "names" consis-
ting of single letters. Their meaning is as follows:

Let # and @ be "variables" that can stand for any letter.
Upper case letters are constants. "..." stands for any
string of zero or more letters, but note that no word may
exist in the dictionary which is not at least 2 letters long, so,
for example, FLY may not be produced by placing the "Y" flag
on "F". Also, no flag is effective unless the word that it
creates is at least 4 letters long, so, for example, WED
may not be produced by placing the "D" flag on "WE".

"V" flag:
...E --> ...IVE as in CREATE --> CREATIVE
if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE

"N" flag:
...E --> ...ION as in CREATE --> CREATION
...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN

"X" flag:
...E --> ...IONS as in CREATE --> CREATIONS
...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS

"H" flag:
...Y --> ...IETH as in TWENTY --> TWENTIETH
if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH

"Y" FLAG:
... --> ...LY as in QUICK --> QUICKLY

"G" FLAG:
...E --> ...ING as in FILE --> FILING
if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING

"J" FLAG:
...E --> ...INGS as in FILE --> FILINGS
if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS

"D" FLAG:
...E --> ...ED as in CREATE --> CREATED
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IED as in IMPLY --> IMPLIED
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ED as in CROSS --> CROSSED
or CONVEY --> CONVEYED

"T" FLAG:
...E --> ...EST as in LATE --> LATEST
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IEST as in DIRTY --> DIRTIEST
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#EST as in SMALL --> SMALLEST
or GRAY --> GRAYEST

"R" FLAG:
...E --> ...ER as in SKATE --> SKATER
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ER as in BUILD --> BUILDER
or CONVEY --> CONVEYER

"Z" FLAG:
...E --> ...ERS as in SKATE --> SKATERS
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ERS as in BUILD --> BUILDERS
or SLAY --> SLAYERS

"S" FLAG:
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IES as in IMPLY --> IMPLIES
if # .eq. S, X, Z, or H,
...# --> ...#ES as in FIX --> FIXES
if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
...# --> ...#S as in BAT --> BATS
or CONVEY --> CONVEYS

"P" FLAG:
if @ .ne. A, E, I, O, or U,
...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
if # .ne. Y, or @ = A, E, I, O, or U,
...@# --> ...@#NESS as in LATE --> LATENESS
or GRAY --> GRAYNESS

"M" FLAG:
... --> ...'S as in DOG --> DOG'S
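
As an illustration, three of the rules above ("D", "S" and "G") can be
written out in Python as shown below. This is a sketch of the rules as
listed, not SPELL's implementation; the names RULES and expand are
hypothetical, and roots are assumed to be upper case and at least 2
letters long.

```python
VOWELS = "AEIOU"


def apply_d_flag(root):
    if root.endswith("E"):
        return root + "D"                     # CREATE -> CREATED
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IED"              # IMPLY -> IMPLIED
    return root + "ED"                        # CROSS -> CROSSED, CONVEY -> CONVEYED


def apply_s_flag(root):
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IES"              # IMPLY -> IMPLIES
    if root[-1] in "SXZH":
        return root + "ES"                    # FIX -> FIXES
    return root + "S"                         # BAT -> BATS, CONVEY -> CONVEYS


def apply_g_flag(root):
    if root.endswith("E"):
        return root[:-1] + "ING"              # FILE -> FILING
    return root + "ING"                       # CROSS -> CROSSING


RULES = {"D": apply_d_flag, "S": apply_s_flag, "G": apply_g_flag}


def expand(root, flag):
    """Return the suffixed word, or None if it is shorter than 4 letters."""
    word = RULES[flag](root)
    return word if len(word) >= 4 else None
```

The final check in expand reflects the rule that no flag is effective
unless the word it creates is at least 4 letters long.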

Note: The existence of a flag on a root word in the dictionary
is not by itself sufficient to cause SPELL to recognize the
indicated word ending. If there is more than one root for
which a flag will indicate a given word, only one of the roots
is the correct one for which the flag is effective; generally it
is the longest root. For example, the "D" rule implies that
either PASS or PASSE, with a "D" flag, will yield PASSED. The
flag must be on PASSE; it will be ineffective on PASS. This
is because, when SPELL encounters the word PASSED and fails to

5

find it in its dictionary, it strips off the "D" and looks
up PASSE. Upon finding PASSE, it then accepts PASSED if and
only if PASSE has the "D" flag. Only if the word PASSE is not in
the main dictionary at all does the program strip off the "E"
and search for PASS. Furthermore, some combinations of flags
are forbidden to allow for dense flag encoding to save space.
For example, only one of the "P", "J", or "V" flags may be on in
any one word.
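
The lookup order in the PASSED example can be sketched in Python as
follows. The sketch covers only the two candidate roots discussed in the
note (with and without the final "E"); it does not handle the "IED" case
or any other flag. The name accepts_ed_word and the use of a mapping from
root to a set of flag letters are assumptions made for illustration.

```python
def accepts_ed_word(word, dictionary):
    """Decide whether a word ending in ED is accepted.

    dictionary is a hypothetical mapping: root word -> set of flag letters.
    """
    if word in dictionary:
        return True
    if not word.endswith("ED"):
        return False
    longer_root = word[:-1]               # PASSED -> PASSE
    if longer_root in dictionary:
        # The longer root exists, so only its own flag counts.
        return "D" in dictionary[longer_root]
    shorter_root = word[:-2]              # PASSED -> PASS
    return "D" in dictionary.get(shorter_root, set())
```

With both PASSE and PASS present, a "D" flag on PASS alone has no effect,
which matches the behavior described in the note.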

6. SPELL INTERNALS

SPELL uses a number of temporary files during execution.
The file file.D$$ is the union of file.UDC and file.ADD. At the
end of execution, file.UDC and file.ADD are deleted and file.D$$
is renamed to file.UDC. The file file.$$$ is the output file.
At the end of execution, file.BAK is deleted, the input file is
renamed to file.BAK, and file.$$$ is renamed to the input file
name. Warning: if you do not have room on your disk for
file.BAK, file.DOC and file.$$$ at the same time, either use two
drives or delete file.BAK before you start.

SPELL corrects files with two passes of the input file. On
the first pass, the words in the file are sorted alphabetically
and duplicate words are eliminated. An attempt is then made to
search for the words in the dictionary. Words that are found are
flagged in memory. On the second pass of the input file, SPELL determines
whether each word was found by locating it in memory. This
method makes the operation of SPELL more efficient because common
words must be looked up only once and because the dictionary can
be searched sequentially, minimizing disk head travel. If all of
the file does not fit in memory on the first pass, the input file
is partitioned into sections small enough to fit into memory and
is then corrected in a series of two pass operations until the
entire file has been checked. It is unlikely that memory will be
filled in large systems by even large text files as 3000 individ-
ual words should fit easily.
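
The two pass scheme can be sketched in Python as shown below. This is a
simplified illustration, not SPELL's code: it does not model the
partitioning of large files, the name check_text is hypothetical, and
is_known is a hypothetical callable that stands in for the sequential
dictionary search.

```python
import re

# Letters with optional interior apostrophes (see section 5.1).
WORD_RE = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")


def check_text(text, is_known, marker="\0"):
    """Return text with the marker placed before each unknown word."""
    # Pass 1: gather the distinct words, sort them, and look each one
    # up exactly once, in alphabetical order.
    unique = sorted({m.group(0).upper() for m in WORD_RE.finditer(text)})
    found = {}
    for word in unique:
        trivially_ok = len(word) == 1 or len(word) > 40
        found[word] = trivially_ok or is_known(word)

    # Pass 2: rescan the text and consult the in-memory results.
    def mark(match):
        word = match.group(0)
        return word if found[word.upper()] else marker + word

    return WORD_RE.sub(mark, text)
```

Because the lookups in pass 1 run in sorted order, a dictionary stored
alphabetically on disk can be read front to back, which is the benefit
the text describes.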

7. DICTIONARY INTERNALS

The dictionary has been compressed significantly in order
to save space. Dictionary records are all 256 bytes long and
each record contains as many words as will fit. Individual words
are stored in the following code:

4 bits -- Number of characters to copy from the previous
word. Because the dictionary is stored in
alphabetical order, this saves a large number of
characters. This field is 0 at the beginning of
each record.

x * 5 bits -- Characters are stored in 5 bit code. There may be
any number of 5 bit characters. A character
string is terminated by the following field.

3 bits -- Set to 111 binary to indicate the end of the word.
Since 11100 binary is greater than 26, all
alphabetic characters can be stored without using
this combination.

4 bits -- Number of bits of flag data following the word.
The bit position of the flags has been ordered so
that the flags most frequently used are earliest.
Flags not stored are assumed to be off.

x bits -- Flag data. x is determined by the previous field.

Each bit represents one of the 14 suffix flags.
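
The layout above can be illustrated with a short Python sketch that
builds and reads one word entry as a string of bits. Several details are
assumptions, because the text does not give them: the character codes
(A=1 through Z=26, apostrophe=27), the order of the flag bits in
FLAG_ORDER, and the function names. Packing entries into 256 byte records
is not modeled.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ'"   # assumed codes 1..27
FLAG_ORDER = "SDGMRZYNXJTPVH"              # hypothetical ordering of the 14 flags


def encode_word(word, previous, flags):
    """Encode word relative to the previous word as a string of bits."""
    shared = 0
    limit = min(len(word), len(previous), 15)   # the count field has 4 bits
    while shared < limit and word[shared] == previous[shared]:
        shared += 1
    bits = format(shared, "04b")
    for ch in word[shared:]:
        bits += format(ALPHABET.index(ch) + 1, "05b")
    bits += "111"                                # end of word
    flag_bits = "".join("1" if f in flags else "0" for f in FLAG_ORDER)
    flag_bits = flag_bits.rstrip("0")            # trailing "off" flags are not stored
    return bits + format(len(flag_bits), "04b") + flag_bits


def decode_word(bits, previous):
    """Return (word, flags, number of bits consumed)."""
    word = previous[:int(bits[:4], 2)]
    pos = 4
    while bits[pos:pos + 3] != "111":
        word += ALPHABET[int(bits[pos:pos + 5], 2) - 1]
        pos += 5
    pos += 3
    count = int(bits[pos:pos + 4], 2)
    pos += 4
    flags = {f for f, b in zip(FLAG_ORDER, bits[pos:pos + count]) if b == "1"}
    return word, flags, pos + count
```

Under these assumptions, CREATOR stored after CREATE with two flags set
takes 25 bits instead of the 56 bits needed for seven 8 bit characters.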

8. MODIFYING THE MAIN DICTIONARY

The source for the main dictionary can currently be found in
the file "[MIT-XX]SRC:<WBA>SPELL.DCT". In order to make it com-
patible with SPELL, all of the "/" characters that delimit flags
must be converted to "%" characters so that flags will be consid-
ered earlier in the alphabet than apostrophes (DOG%S should be before
DOG'S). The file must then be sorted alphabetically. No utili-
ties are provided with SPELL to accomplish either of these tasks.
Without high capacity disk drives, you may find it necessary to
perform the above steps on a larger computer.
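
On a larger computer, the two preparation steps might be carried out
with a few lines of Python, as sketched below. The sketch assumes one
entry per line and that "sorted alphabetically" means plain ASCII order,
in which "%" (25H) precedes the apostrophe (27H). The function name
prepare_source is hypothetical.

```python
def prepare_source(lines):
    """Convert flag delimiters from "/" to "%" and sort the entries."""
    converted = []
    for line in lines:
        entry = line.strip()
        if entry:                         # skip blank lines
            converted.append(entry.replace("/", "%"))
    # Python compares ASCII strings by character code, so "%" sorts
    # before "'" and both sort before the letters.
    return sorted(converted)
```

For instance, the entries DOG'S, DOG/S and CAT/M come out in the order
CAT%M, DOG%S, DOG'S.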

Once a copy of the main dictionary has been placed on the
microcomputer, use the program DICCRE to create a dictionary.
Include the name of the source file on the DICCRE command line.
DICCRE will create the files DICT.DIC (compressed dictionary) and
SPELL0.MAC (pointer file to dictionary) ON THE DEFAULT DISK
DRIVE. When it has finished converting the input file to the
dictionary file, it will execute a warm boot if the output file
is on the same drive as the input file. However, if the output
file is not on the same disk, it will ask whether another input
file exists. This feature allows the user to put the source file
on two disks in case it does not fit on one. DICCRE will combine
them into one dictionary file. If no more files exist, answer N
to the question. If another file does exist, put the disk with
the new file in the input drive and type Y.

After the dictionary file has been created, it is necessary
to recompile SPELL with the new pointer file, SPELL0.MAC. If
your assembler does not support the INCLUDE statement, you will
have to replace the line INCLUDE SPELL0.MAC in the file SPELL.MAC
with the contents of SPELL0.MAC. After SPELL is recompiled, be
sure to use the correct copy of DICT.DIC with it or you will
obtain unpredictable results.

For more information about dictionaries, see the file:
[MIT-XX]SS:<WBA>DICT.LETTER

Good luck and happy hacking!

Michael Adler (MADLER@MIT-ML)
3 Sunny Knoll Terrace
Lexington, MA 02173
