SPELL V2.0 DOCUMENTATION
                       Michael C. Adler
                      December 22, 1982

   (C) 1982 Michael C. Adler
   This  program  has been released into the public domain  by
   the author.  It may neither be sold for profit nor included
   in  a  sold  software package  without  permission  of  the
   author.

   The  first SPELL using this dictionary was probably written
   by Ralph Gorin at Stanford.   It was transported to MIT  by
   Wayne Mattson.   Both the program at MIT and the dictionary
   were  most  recently  revised by William Ackerman  at  MIT.
   Section 5 of this document was copied from portions of  Mr.
   Ackerman's documentation.

         Thanks to all for the effort spent designing the
         dictionary!

    Spell is a program, written for Z80 processors running CP/M,
designed to detect misspellings in a document.

1.  USING SPELL

    The  minimum  configuration  of  SPELL  requires  the  files
SPELL.COM  and  DICT.DIC (the main dictionary).   At the time  of
execution,  DICT.DIC must be on either the default drive or drive
A:.

    The name of the file to be corrected must be included on the
command  line that is used to invoke spell.   If a drive name  is
specified as a second file name, output is directed to the speci-
fied drive.  Thus,

              SPELL useless.doc

will  check  the  file  "useless.doc" and direct  output  to  the
default drive and

              SPELL b:useless.doc c:

will check the file "b:useless.doc" and direct output to disk c.

    Spell will check the input file for errors by comparing each
word in the file to the dictionary.   If a word is not  found,  a
null (ascii 0) is placed before the word.  To change this marking
character,  see section 4,  PATCHING SPELL.   If a backup version
(.BAK  file type) of the input file exists,  it will be  deleted.
The  input file will be renamed to a backup file and the  checked
file will replace the input file.

2.  USER DICTIONARIES

    A  user  dictionary is a list of correct words that  can  be
loaded  by SPELL to augment the main dictionary.   Words such  as
proper  nouns can be placed in user dictionaries to inhibit error
marking.   User dictionary files may be formatted in any way that
the user desires,  as long as words are delimited by non-alphabe-
tic characters.
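
    The free format described above can be illustrated with a
short Python sketch.  It is not taken from SPELL itself; the
function name and the folding of words to upper case are assump-
tions made for the example.

```python
import re

def read_user_dictionary(text):
    """Return the set of words found in a user dictionary.

    Any run of letters counts as a word; every other character is
    treated as a delimiter.  Upper-casing is an assumption of this
    sketch, not something the SPELL documentation states.
    """
    return {w.upper() for w in re.findall(r"[A-Za-z]+", text)}

# Layout does not matter: commas, tabs and newlines all delimit words.
words = read_user_dictionary("Adler, Gorin;\nMattson  Ackerman\tWordStar")
```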

    SPELL  will  automatically  search for the  user  dictionary
SPELL.DIC  on the default drive and on drive A:  if it is not  on
the default one.   Its contents are then loaded and  temporarily
added to the dictionary.   It must be loaded again to be included
in subsequent executions of SPELL.

    SPELL  will also automatically search for d:file.UDC,  where
file is the name of the file being corrected and d:  is the drive
on which file is found.   If found,  it is also loaded and tempo-
rarily augments the dictionary.   Thus, users may create separate
dictionaries for each text file being corrected.   After locating
d:file.UDC,  SPELL will search for d:file.ADD.   This  file  is
created  by WordStar's ^QL command (see section 3) and is not  an
ASCII  file.   d:file.ADD contains commands generated by WordStar
to include specific words in the user dictionary associated  with
d:file.   SPELL  will temporarily place all of the words in it in
the dictionary and will also save the words by copying them  into
d:file.UDC.

    It  is  possible  to load additional  user  dictionaries  by
specifying them on the SPELL command line.   A list of user  dic-
tionaries  must  be preceded by a dollar sign.   A dictionary  is
specified by a file name and an optional drive name.  If no drive
is specified,  the default drive is searched and then drive A: is
checked.  Extensions are ignored and default to .DIC.  Hence, the
command line:

    SPELL useless.doc b: $dict1 c:dict2 dict3.fun

would  correct useless.doc and direct output to drive  B:.   User
dictionary  DICT1.DIC  would be loaded from the default drive  or
drive  A:,   dictionary DICT2.DIC would be loaded from drive  C:,
and DICT3.DIC would be loaded from the default drive or drive A:.
Notice that the extension .fun was ignored.
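
    The command line rules above can be summarized in a short
Python sketch.  It only models the rules as this document states
them; the function name, the returned tuple layout and the hand-
ling of a lone "$" are assumptions, and SPELL's real parser may
differ in details.

```python
def parse_command_line(args):
    """Split SPELL's arguments into (input file, output drive, dictionaries).

    Each dictionary is returned as (drive or None, file name).  A drive of
    None stands for "default drive first, then drive A:".
    """
    tokens = args.split()
    input_file = tokens[0]
    output_drive = None
    dictionaries = []
    in_dictionary_list = False
    for tok in tokens[1:]:
        if tok.startswith("$"):
            in_dictionary_list = True      # "$" starts the dictionary list
            tok = tok[1:]
            if not tok:                    # guard for a lone "$" (assumption)
                continue
        if in_dictionary_list:
            drive, _, name = tok.rpartition(":")
            base = name.split(".")[0]      # any extension is ignored
            dictionaries.append((drive.upper() or None, base.upper() + ".DIC"))
        elif tok.endswith(":"):
            output_drive = tok.upper()     # second "file name" is a drive
    return input_file, output_drive, dictionaries

result = parse_command_line("useless.doc b: $dict1 c:dict2 dict3.fun")
```

Here result holds the input file useless.doc, the output drive
B:, and the three dictionaries DICT1.DIC, DICT2.DIC (drive C) and
DICT3.DIC.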

3.  WordStar's ^QL COMMAND

    Files checked by SPELL can be corrected using WordStar.   In
response  to ^QL,  the user is asked which portions of  the  file
should  be searched.   WordStar will then position the cursor  on
the  first marked word and print a menu offering F (Fix word),  B
(Bypass word), I (Ignore word), D (Add to dictionary), and S (Add
to  supplemental  dictionary).   The F option deletes  the  error
marker and returns to the WordStar main menu,  allowing the  user
to  correct  the  word.   B will leave the marker in place and will
search for the next misspelled word.   In this implementation  of
SPELL,  the  I,  D  and S options all perform the  same  function
(although  I  is  easier to use because no question is  asked  by
WordStar).   If any of these options (I, D, S) is chosen,  the
mark  will  be removed and the word will be  added  to  file.ADD.
Thus,  choosing these options informs SPELL that the word is cor-
rect and should not be marked again.   The D and S options do not
add  the word to SPELL's main dictionary because the  compression
method  used to store the dictionary is too complicated to  allow
such modification efficiently.  After any option except F is
chosen, WordStar will automatically search for the next marked
word.

4.  PATCHING SPELL

    It is not necessary to recompile SPELL to change the charac-
ter that marks misspelled words.   The byte at 0800H contains the
marking character.   In the distribution version of SPELL,  it is
null,  or 0.  DDT or another debugger can be used to change 0800H
to the ASCII value of the desired marker.
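
    For readers without DDT, the same patch can be applied to a
copy of the file on another machine.  The Python sketch below
assumes that 0800H is a memory address and that SPELL.COM is
loaded at 0100H, as CP/M does for .COM files, so the byte sits at
file offset 0700H.  The function name is hypothetical and the
demonstration uses a dummy image, not the real program.

```python
LOAD_ADDRESS = 0x0100     # CP/M loads .COM files at 0100H
MARKER_ADDRESS = 0x0800   # address of the marking character (from the text)

def patch_marker(image, marker):
    """Return a copy of a SPELL.COM image with a new marking character."""
    offset = MARKER_ADDRESS - LOAD_ADDRESS        # file offset 0700H
    if len(marker) != 1 or ord(marker) > 0x7F:
        raise ValueError("marker must be a single ASCII character")
    if offset >= len(image):
        raise ValueError("image is too short to contain the marker byte")
    patched = bytearray(image)
    patched[offset] = ord(marker)
    return bytes(patched)

# Demonstration on an all-zero dummy image of 0900H bytes.
dummy = bytes(0x0900)
patched = patch_marker(dummy, "@")
```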

5.  PROGRAM AND DICTIONARY CHARACTERISTICS

5.1 Word identification algorithm

    A   word  is  any  uninterrupted  sequence  of  letters  and
apostrophes,  which  does not begin or end with  an   apostrophe.
Any  punctuation,   digit,  or control character separates words.
Any  word consisting of a single  letter,   or any word more than
40 letters long, is considered to be correctly spelled.
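
    The rules above can be expressed compactly.  The following
Python sketch is an illustration of the stated rules, not SPELL's
own code; the names are hypothetical, and counting only letters
(not apostrophes) toward the 40 letter limit is an assumption.

```python
import re

# A word is a run of letters and apostrophes that begins and ends
# with a letter.  Anything else separates words.
WORD_RE = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")

def find_words(text):
    """Return the words of `text` in order of appearance."""
    return [m.group(0) for m in WORD_RE.finditer(text)]

def needs_lookup(word):
    """Single letters and words over 40 letters are accepted unchecked."""
    letters = sum(ch.isalpha() for ch in word)
    return 1 < letters <= 40

sample = find_words("'Tis the dog's 2nd bone.")
```

In the sample, the leading apostrophe of 'Tis is dropped and the
digit in "2nd" separates words, leaving Tis, the, dog's, nd and
bone.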

5.2  Dictionary policy

    It   is   the  policy  of this program to contain  only  one
spelling of a word,   even  if  ordinary  dictionaries  show  two
or   more   "acceptable"  spellings.    Hence,   the   dictionary
contains  LABELED  and LABELING,  but not LABELLED or  LABELLING,
even  though all four are actually acceptable.     The  intention
is  to  enforce  uniformity  within  each  document.   The author
apologizes  for the  restriction  on  creativity  and   diversity
that  this necessitates,  but believes that it is the best policy
for this program.

    The   dictionary   contains  many  technical   and  computer
terms  such as MICROPROGRAM and DEBUGGER,  but does  not  contain
extreme  jargon   words  such as  CONTROLIFY  or   VALRET.    The
dictionary contains no proper names other than names of countries
and  states  of the United States.     The  reason  is  that   it
would  be virtually impossible to contain all of the proper names
that  commonly arise in normal use.   Users should  keep   proper
names   (and  other  correctly  spelled  words)  that   arise  in
their own work in private dictionaries to avoid having to repeat-
edly tell SPELL to accept them.

    The dictionary is significantly  smaller  than  that   found
in  other spelling  checkers,   such  as the DEC TOPS-20 program.
The  author believes that the larger dictionary would not  reduce
the number of false misspelling indications by very much.

[Note:   I  believe that this dictionary is actually MUCH  larger
than  any dictionaries currently available for microcomputers.
-Michael]

5.3  Dictionary flags

    Words  in SPELL's main dictionary (but not the other dictio-
naries)  may have flags associated with  them  to  indicate   the
legality   of   suffixes  without  the  need  to  keep  the  full
suffixed words in the dictionary.  The flags have "names" consis-
ting of single  letters.    Their  meaning  is  as follows:

Let   #  and  @  be  "variables"  that can stand for any  letter.
Upper  case  letters  are constants.   "..."   stands   for   any
string   of  zero  or  more letters,   but note that no word  may
exist in the dictionary which is not at least 2 letters long, so,
for example,  FLY may not be produced  by  placing the  "Y"  flag
on  "F".   Also,  no  flag is effective unless the word  that  it
creates is at least 4 letters  long,   so,   for   example,   WED
may  not  be produced by placing the "D" flag on "WE".

"V" FLAG:
       ...E --> ...IVE  as in CREATE --> CREATIVE
       if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE

"N" FLAG:
       ...E --> ...ION  as in CREATE --> CREATION
       ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
       if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN

"X" FLAG:
       ...E --> ...IONS  as in CREATE --> CREATIONS
       ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
       if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS

"H" FLAG:
       ...Y --> ...IETH  as in TWENTY --> TWENTIETH
       if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH

"Y" FLAG:
       ... --> ...LY  as in QUICK --> QUICKLY

"G" FLAG:
       ...E --> ...ING  as in FILE --> FILING
       if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING

"J" FLAG:
       ...E --> ...INGS  as in FILE --> FILINGS
       if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS

"D" FLAG:
       ...E --> ...ED  as in CREATE --> CREATED
       if @ .ne. A, E, I, O, or U,
               ...@Y --> ...@IED  as in IMPLY --> IMPLIED
       if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
               ...@# --> ...@#ED  as in CROSS --> CROSSED
                               or CONVEY --> CONVEYED

"T" FLAG:
       ...E --> ...EST  as in LATE --> LATEST
       if @ .ne. A, E, I, O, or U,
               ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
       if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
               ...@# --> ...@#EST  as in SMALL --> SMALLEST
                               or GRAY --> GRAYEST

"R" FLAG:
       ...E --> ...ER  as in SKATE --> SKATER
       if @ .ne. A, E, I, O, or U,
               ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
       if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
               ...@# --> ...@#ER  as in BUILD --> BUILDER
                               or CONVEY --> CONVEYER

"Z" FLAG:
       ...E --> ...ERS  as in SKATE --> SKATERS
       if @ .ne. A, E, I, O, or U,
               ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
       if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
               ...@# --> ...@#ERS  as in BUILD --> BUILDERS
                               or SLAY --> SLAYERS

"S" FLAG:
       if @ .ne. A, E, I, O, or U,
               ...@Y --> ...@IES  as in IMPLY --> IMPLIES
       if # .eq. S, X, Z, or H,
               ...# --> ...#ES  as in FIX --> FIXES
       if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
               ...# --> ...#S  as in BAT --> BATS
                               or CONVEY --> CONVEYS

"P" FLAG:
       if @ .ne. A, E, I, O, or U,
               ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
       if # .ne. Y, or @ = A, E, I, O, or U,
               ...@# --> ...@#NESS  as in LATE --> LATENESS
                               or GRAY --> GRAYNESS

"M" FLAG:
       ... --> ...'S  as in DOG --> DOG'S
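
    As an illustration of how the rules above read when applied
to a root, here is a Python sketch covering four of the flags (G,
D, S and Y).  It follows the rules as written in this section,
including the two letter minimum for roots and the four letter
minimum for results.  The function name is hypothetical and the
remaining flags are omitted for brevity.

```python
VOWELS = "AEIOU"

def apply_flag(root, flag):
    """Return the word that `flag` derives from `root`, or None."""
    root = root.upper()
    if len(root) < 2:
        return None                       # roots have at least 2 letters
    last, prev = root[-1], root[-2]
    consonant_y = last == "Y" and prev not in VOWELS
    if flag == "G":
        word = root[:-1] + "ING" if last == "E" else root + "ING"
    elif flag == "D":
        if last == "E":
            word = root + "D"             # ...E  --> ...ED
        elif consonant_y:
            word = root[:-1] + "IED"      # ...@Y --> ...@IED
        else:
            word = root + "ED"            # ...@# --> ...@#ED
    elif flag == "S":
        if consonant_y:
            word = root[:-1] + "IES"      # ...@Y --> ...@IES
        elif last in "SXZH":
            word = root + "ES"            # ...#  --> ...#ES
        else:
            word = root + "S"             # ...#  --> ...#S
    elif flag == "Y":
        word = root + "LY"
    else:
        raise ValueError("flag not covered by this sketch")
    # A flag is effective only if the result has at least 4 letters.
    return word if len(word) >= 4 else None
```

For example, apply_flag("IMPLY", "D") gives IMPLIED, while
apply_flag("WE", "D") gives None because WED has only three
letters.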

Note:    The  existence of a flag on a root word in the dictionary
is  not  by  itself sufficient to cause SPELL  to  recognize  the
indicated   word   ending.   If there is more than one  root  for
which a flag will indicate a given word,  only  one  of the roots
is the correct one for which the flag is effective;  generally it
is  the longest root.   For example,  the "D" rule  implies  that
either PASS or PASSE,  with a "D" flag,  will yield PASSED.   The
flag must be on  PASSE;  it  will  be  ineffective on PASS.  This
is  because,  when SPELL encounters the word PASSED and fails  to
find  it  in  its  dictionary,   it strips off the "D" and  looks
up  PASSE.   Upon finding PASSE,  it then accepts PASSED  if  and
only if PASSE has the "D" flag.  Only if the word PASSE is not in
the  main dictionary at all does the program strip off  the   "E"
and search  for PASS.   Furthermore,  some combinations of  flags
are  forbidden  to allow for dense flag encoding to  save  space.
For example, only one of  the "P", "J", or "V" flags may be on in
any one word.
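
    The PASSED example can be traced with a small Python sketch
of the lookup for words ending in ED.  The dictionary is modelled
as a mapping from root to a set of flag letters, which is an
assumption made for the example.  The text only states that the
root ending in E is tried before the shorter root; the position
of the ...IED case in the search order is a guess.

```python
VOWELS = "AEIOU"

def accepts_ed(word, dictionary):
    """Decide whether `word` is acceptable under the "D" rules alone.

    `dictionary` maps each root to the set of flag letters it carries.
    """
    word = word.upper()
    if word in dictionary:
        return True
    if len(word) < 4 or not word.endswith("ED"):
        return False
    candidates = [word[:-1]]                      # ...E  --> ...ED
    if word.endswith("IED") and len(word) >= 5 and word[-4] not in VOWELS:
        candidates.append(word[:-3] + "Y")        # ...@Y --> ...@IED
    stem = word[:-2]                              # ...@# --> ...@#ED
    if len(stem) >= 2 and stem[-1] != "E" and (
            stem[-1] != "Y" or stem[-2] in VOWELS):
        candidates.append(stem)
    for root in candidates:
        if root in dictionary:
            # The first root found decides; shorter roots are not consulted.
            return "D" in dictionary[root]
    return False

flag_on_passe = accepts_ed("PASSED", {"PASSE": {"D"}, "PASS": set()})
flag_on_pass = accepts_ed("PASSED", {"PASSE": set(), "PASS": {"D"}})
```

With PASSE present, only its own "D" flag matters, so the second
call rejects PASSED even though PASS carries the flag.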

6.  SPELL INTERNALS

    SPELL  uses  a number of temporary files  during  execution.
The file file.D$$ is the union of file.UDC and file.ADD.   At the
end of execution,  file.UDC and file.ADD are deleted and file.D$$
is  renamed to file.UDC.   The file file.$$$ is the output  file.
At the end of execution,  file.BAK is deleted,  the input file is
renamed  to file.BAK,  and file.$$$ is renamed to the input  file
name.   Warning:   if  you  do  not have room on  your  disk  for
file.BAK,  file.DOC and file.$$$ at the same time, either use two
drives  or delete file.BAK before you start.
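
    The renaming sequence for the output file explains why three
copies may exist at once.  The Python sketch below replays that
sequence in a scratch directory on a modern system; the function
name is hypothetical and the file contents are placeholders.

```python
import os
import tempfile

def finish_run(directory, name, ext):
    """Rotate the files for input `name.ext` as the text describes."""
    def path(e):
        return os.path.join(directory, name + "." + e)
    if os.path.exists(path("BAK")):
        os.remove(path("BAK"))            # the old backup is deleted
    os.rename(path(ext), path("BAK"))     # the input becomes the backup
    os.rename(path("$$$"), path(ext))     # the checked output takes its name

with tempfile.TemporaryDirectory() as d:
    # Just before the end of a run, all three files are on the disk.
    for e, text in (("DOC", "original"), ("BAK", "older"), ("$$$", "checked")):
        with open(os.path.join(d, "USELESS." + e), "w") as f:
            f.write(text)
    finish_run(d, "USELESS", "DOC")
    remaining = sorted(os.listdir(d))
    with open(os.path.join(d, "USELESS.DOC")) as f:
        doc_text = f.read()
    with open(os.path.join(d, "USELESS.BAK")) as f:
        bak_text = f.read()
```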

    SPELL corrects files with two passes of the input file.   On
the first pass,  the words in the file are sorted  alphabetically
and  duplicate words are eliminated.   An attempt is then made to
search for the words in the dictionary.  Words that are found are
marked.   On the second pass of the input file,  SPELL determines
whether  each word was found by locating it in  memory.   This
method makes the operation of SPELL more efficient because common
words  must be looked up only once and because the dictionary can
be searched sequentially, minimizing disk head travel.  If all of
the file does not fit in memory on the first pass, the input file
is partitioned into sections small enough to fit into memory  and
is  then  corrected in a series of two pass operations until  the
entire file has been checked.  It is unlikely that memory will be
filled in large systems by even large text files as 3000 individ-
ual words should fit easily.
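
    The two pass method can be sketched in Python as follows.
This is a simplified model, not SPELL's implementation:  suffix
flags, user dictionaries and the partitioning of large files are
left out, the dictionary is a plain sorted list of upper case
words, and all names are hypothetical.

```python
import re

WORD_RE = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")

def check(text, sorted_dictionary, marker="\0"):
    """Return `text` with `marker` placed before each word not found."""
    # Pass 1: collect the distinct words in alphabetical order.
    unique = sorted({m.group(0).upper() for m in WORD_RE.finditer(text)})

    # Walk the sorted dictionary once, in step with the sorted words,
    # so the dictionary is only ever read front to back.
    found = set()
    entries = iter(sorted_dictionary)
    entry = next(entries, None)
    for word in unique:
        while entry is not None and entry < word:
            entry = next(entries, None)
        if entry == word:
            found.add(word)

    # Pass 2: copy the text, consulting the in-memory results.
    def mark(m):
        w = m.group(0)
        if not 1 < len(w) <= 40 or w.upper() in found:
            return w
        return marker + w
    return WORD_RE.sub(mark, text)

checked = check("Teh quick brown fox", ["BROWN", "FOX", "QUICK"], marker="*")
```

A visible marker is used in the example; the default is the null
character described in section 1.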

7.  DICTIONARY INTERNALS

    The dictionary has been significantly compressed in order
to  save space.   Dictionary records are all 256 bytes  long  and
each record contains as many words as will fit.  Individual words
are stored in the following code:

    4 bits -- Number  of  characters to copy from  the  previous
              word.    Because  the  dictionary  is  stored   in
              alphabetical  order,  this saves a large number of
              characters.   This field is 0 at the beginning  of
              each record.

x * 5 bits -- Characters are stored in 5 bit code.  There may be
              any  number  of  5 bit  characters.   A  character
              string is terminated by the following field.

    3 bits -- Set to 111 binary to indicate the end of the word.
              Since  11100  binary  is  greater  than  26,   all
              alphabetic characters can be stored without  using
              this combination.

    4 bits -- Number  of  bits of flag data following the  word.
              The bit position of the flags has been ordered  so
              that  the flags most frequently used are earliest.
              Flags not stored are assumed to be off.

    x bits -- Flag data.  x is determined by the previous field.

  Each bit represents one of the 14 suffix flags.
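
    To make the field layout concrete, here is a Python sketch
that packs and unpacks words using the fields listed above.  It
is a model only.  The assignment of 5 bit codes to characters
(apostrophe = 0, A = 1 ... Z = 26) is an assumption, flag data is
passed through as a raw string of bits because the flag order is
not given here, and the 256 byte record boundaries are ignored.

```python
ALPHABET = "'ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # assumed codes 0..26

def encode(entries):
    """Pack (word, flag_bits) pairs, given in sorted order, into a bit string."""
    bits, previous = [], ""
    for word, flags in entries:
        # Count leading characters shared with the previous word (4 bit field).
        shared = 0
        limit = min(len(word), len(previous), 15)
        while shared < limit and word[shared] == previous[shared]:
            shared += 1
        bits.append(format(shared, "04b"))
        bits.extend(format(ALPHABET.index(c), "05b") for c in word[shared:])
        bits.append("111")                       # end of word
        bits.append(format(len(flags), "04b"))   # number of flag bits
        bits.append(flags)
        previous = word
    return "".join(bits)

def decode(bits):
    """Inverse of encode(): return the list of (word, flag_bits) pairs."""
    entries, previous, pos = [], "", 0
    while pos < len(bits):
        shared = int(bits[pos:pos + 4], 2)
        pos += 4
        word = previous[:shared]
        # No character code starts with 111, so 111 can only be the terminator.
        while bits[pos:pos + 3] != "111":
            word += ALPHABET[int(bits[pos:pos + 5], 2)]
            pos += 5
        pos += 3
        count = int(bits[pos:pos + 4], 2)
        pos += 4
        entries.append((word, bits[pos:pos + count]))
        pos += count
        previous = word
    return entries

sample = [("CREATE", "101"), ("CREATURE", "1"), ("CROSS", "")]
packed = encode(sample)
```

The three sample words occupy 97 bits in this form, against 152
bits for their 19 letters stored as one byte each.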

8.  MODIFYING THE MAIN DICTIONARY

    The source for the main dictionary can currently be found in
the file "[MIT-XX]SRC:<WBA>SPELL.DCT".   In order to make it com-
patible with SPELL,  all of the "/" characters that delimit flags
must be converted to "%" characters so that flags will sort
earlier than apostrophes (DOG%S should be before
DOG'S).   The file must then be sorted alphabetically.  No utili-
ties are provided with SPELL to accomplish either of these tasks.
Without high capacity disk drives,  you may find it necessary  to
perform the above steps on a larger computer.
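
    On a larger computer, the two preparation steps might be
done with a few lines of Python.  The sketch assumes one entry
per line in the source file and a plain ASCII sort, neither of
which is confirmed by this document; the function name is hypo-
thetical.

```python
def prepare_source(lines):
    """Replace "/" flag delimiters with "%" and sort the entries."""
    converted = [line.strip().replace("/", "%")
                 for line in lines if line.strip()]
    # In ASCII order "%" (25H) comes before "'" (27H), so an entry
    # with flags sorts ahead of the same word followed by an apostrophe.
    return sorted(converted)

prepared = prepare_source(["DOG'S\n", "DOG/S\n", "CAT/S\n", "\n"])
```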

    Once  a copy of the main dictionary has been placed  on  the
microcomputer,  use  the  program DICCRE to create a  dictionary.
Include  the name of the source file on the DICCRE command  line.
DICCRE will create the files DICT.DIC (compressed dictionary) and
SPELL0.MAC  (pointer  file  to dictionary) ON  THE  DEFAULT  DISK
DRIVE.   When  it has finished converting the input file  to  the
dictionary  file,  it will execute a warm boot if the output file
is on the same drive as the input file.   However,  if the output
file is not on the same disk,  it will ask whether another  input
file exists.  This feature allows the user to put the source file
on two disks in case it does not fit on one.  DICCRE will combine
them into one dictionary file.   If no more files exist, answer N
to the question.   If another file does exist,  put the disk with
the new file in the input drive and type Y.

    After the dictionary file has been created,  it is necessary
to  recompile SPELL with the new pointer  file,  SPELL0.MAC.   If
your  assembler does not support the INCLUDE statement,  you will
have to replace the line INCLUDE SPELL0.MAC in the file SPELL.MAC
with the contents of SPELL0.MAC.   After SPELL is recompiled,  be
sure  to  use  the correct copy of DICT.DIC with it or  you  will
obtain unpredictable results.

    For more information about dictionaries, see the file:
         [MIT-XX]SS:<WBA>DICT.LETTER

Good luck and happy hacking!

Michael Adler       (MADLER@MIT-ML)
3 Sunny Knoll Terrace
Lexington, MA  02173