\input blue.tex \loadindexmacros \report

\bluepictures\indmodelpic

\bluechapter Creating an Index

\beginsummary
This chapter treats the creation of a modest index within
a one-pass \TeX{} job.
In general, a proof run and a final run are needed.
\endsummary

Making an index is an art. The fundamental problem is
\bluedisplaycenterline What to include in an index?

Computer-assisted indexing is not simple either.
The issues are how to
\bitem mark up keywords or phrases
\bitem associate page numbers with them
\bitem sort and compress the raw Index Reminders (^{IR}s), and
\bitem typeset the result.
\smallbreak
My approach is to create proof indexes\Dash also called mini-indexes\Dash
for each chapter and to
learn from those what should be included in the total index.
I found this very pleasant in practice.
Even if you prefer \cs{makeindex} for the real index,
generating a chapter index on the fly can be of great help.\ftn{The automatic
  generation of an index is said to be a feature of Literate Programming tools.
  For LP within \TeX{} itself, for example with Gurari's Pro\TeX, this on-the-fly
  indexing within \TeX{} can be used.}

\bluehead Use

I'll show how to mark up Knuth's four types of IRs,
accents,
font switches, and
spaces that are part of the IR.

\blueexample Markup, commands and resulting index

The right column was obtained via
\bitem ^|\loadindexmacros|, at the beginning of the script
\bitem ^|\sortindex|, where the index is to be made, and
\bitem ^|\pasteupindex|, for the pasteup of
      the index.
\smallbreak
\thisverbatim{\catcode`\|=12
  \catcode`!=12 \unmc
  \catcode`\*=0 }
\begindemo
Types of IR
0  ^{return}
1  ^|verbatim|
2  ^|\controlsequence|
3  ^\<syntactic quantity>
Accents ^{\'el\`eve!},
font changing ^{\bf bold}
and spaces ^{control\ symbol}
Control sequences
 ^{\TeX, and \AmSTeX}
 ^{Lamport and \LaTeX}
brackets ^{\tt< \rm and \tt>}
\newpage ^{return}
\newpage ^{return}%on purpose
\sortindex\pasteupindex\bye
*yields\obeylines
\quad {\tt {}< \rm {}and \tt {}>}{} {\oldstyle1}
\quad {\bf {}bold}{} {\oldstyle1}
\quad {control\ symbol}{} {\oldstyle1}
\quad {\tt \char 92\hbox {controlsequence}}{} {\oldstyle1}
\quad {\'el\`eve!}{} {\oldstyle1}
\quad {Lamport and \LaTeX{}} {\oldstyle1}
\quad {\TeX, and \AmSTeX{}} {\oldstyle1}
\quad {return}{} {\oldstyle1}--{\oldstyle3}
\quad $\langle \hbox {syntactic\
            quantity}\rangle ${} {\oldstyle1}
\quad {\tt verbatim}{} {\oldstyle1}
\enddemo
Consecutive page numbers, such as those of `return,' are automatically represented as a range.
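
Consecutive page numbers of an entry are merged into a range. The Python
sketch below illustrates the idea only. It is not the \bluetex{} macro code.
The names |compress_pages| and |fmt_range| are hypothetical, and joining
separate runs with a comma is an assumption of the sketch.
\beginverbatim
```python
# Hypothetical sketch of page-range compression, not the BLUe macros.

def fmt_range(first, last):
    """Format one run of pages; in TeX, -- gives an en-dash."""
    return str(first) if first == last else f"{first}--{last}"


def compress_pages(pages):
    """Collapse page numbers into ranges, e.g. [1, 2, 3, 7] -> '1--3, 7'."""
    pages = sorted(set(pages))          # numerical order, duplicates dropped
    if not pages:
        return ""
    parts = []
    start = prev = pages[0]
    for p in pages[1:]:
        if p == prev + 1:               # consecutive: extend the current run
            prev = p
            continue
        parts.append(fmt_range(start, prev))
        start = prev = p                # gap: close the run, open a new one
    parts.append(fmt_range(start, prev))
    return ", ".join(parts)


print(compress_pages([1, 2, 3]))        # 1--3   (the pages of 'return' above)
print(compress_pages([3, 1, 2, 2, 7]))  # 1--3, 7
```
!endverbatim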

\exercise What makes a good index? Of course this is a million-dollar
         question. Let us concentrate on the number of entries and on the
         number of page numbers per entry. Which of the two extremes
         sketched below do you consider the better one?
         One with many entries pointing to issues spread throughout the book\Dash
         like \TB{} ;-))), and, pushing the limits just for the imagination,
         an index with pointers to related work on the Internet, accessible
         with a single mouse click\Dash
         or one with few page numbers per entry?\ftn{Courtesy Erik Frambach.}
\answer As usual it all depends on your application. End of answer.
       But\Dash there is always a but\Dash the complaint I heard
       most about \TB{} was that the information is spread all over,
       and that it is hard to find what you are looking for.
       Therefore I consider a few page numbers per entry beneficial.
       (Let us forget about the intrinsic complexity of the
        subject, certainly at the time.)
       BLUe's format supports scrutinizing parts of an index,
       because it is so easy to generate an index per chapter on the fly;
       it is hardly more difficult than generating a table of contents.
       An index per chapter can be scrutinized more easily, and
       its redundancies removed.
       That the index provides a mechanism
       to link things across chapters is a good thing, however;
       don't misunderstand me.
       But, IMHO and with all respect, don't overuse it. Remember De Vinne's
       adage `The last thing to learn is simplicity.'

%end answer

\bluehead Markup of Index Reminders

IRs are at the heart of the process. ^^{IR,\ markup}
Knuth distinguished {\oldstyle4} types to facilitate processing outside \TeX.
I'll adopt his IR syntax and types.

\bluesubhead Syntax

Knuth's IRs obey the following syntax. ^^{IR,\ syntax}
\begincenterverbatim
<word(s)>!]!!<digit>!]<page number>.
!endcenterverbatim
The digits {\oldstyle0}, {\oldstyle1}, {\oldstyle2}, and {\oldstyle3}
denote, respectively, the types
words,
verbatim words,
control sequences, and
syntactic quantities.
A user does not have to bother about the digits or about the
page numbers; the macros supply them.
Knuth adopted the conventions in the accompanying table for
the word(s) of IRs.\ftn{See \TB{} {\oldstyle424} for the IR types
  and for what is typeset in the result.
  In \cs{vref} the markup is inserted as the replacement
  text of \cs{next}. What is set in the index is governed by the macros
  which are included after \cs{begindoublecolumns} in the \TeX book script.}
$$\vbox{\offinterlineskip\def\tstrut{\vrule height2.5ex depth.5ex width0pt}
\halign{\tstrut#\hfill\quad\vrule\quad&#\hfill\quad\vrule\quad&#\hfill\cr
Mark up&Typeset in copy$^*$&IR \cr
\noalign{\hrule}
|^{...}|  &\dots        &|... !!0 |$\langle page\, no\rangle$.\cr
|^!vrt...!vrt|  &|!vrt...!vrt|&
                             |... !!1 |$\langle page\, no\rangle$.\cr
|^!vrt\...!vrt| &|!vrt\...!vrt| &
                              |... !!2 |$\langle page\, no\rangle$.\cr
|^\<...>|&$\langle\dots\rangle^{**}$&
                              |... !!3 |$\langle page\, no\rangle$.\cr
\noalign{\vskip.5ex\hrule width1cm\relax\vskip1ex}
\multispan3{\quad$^*\,$\vrt\dots\vrt\ denotes manmac's, TUGboat's,\dots
  verbatim  \hfil}\cr
\multispan3{\quad$^{**}\,$in \cs{rm} \hfil}\cr
}}$$
For the user only the word(s) matter.
The table gives, per IR type, the allowed markup, what is set in the copy,
and the resulting IR.
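
The layout of a raw IR is regular enough to be parsed with one regular
expression. The Python sketch below assumes one IR per line of the file
index. The names |parse_ir|, |IR_LINE| and |KINDS| are hypothetical, and
the exclamation mark is written as the escape |\x21| in the code. The word
part is matched lazily up to a type digit, so a word may itself end in an
exclamation mark.
\beginverbatim
```python
import re

# Hypothetical parser for one raw IR line, not the BLUe macros.
BANG = "\x21"   # the exclamation mark that separates word part and type digit

# <word(s)> <exclamation mark><digit> <page number>.
IR_LINE = re.compile(r"(?P<words>.*?)\s*\x21(?P<kind>[0-3])\s+(?P<page>\d+)\.\s*$")

KINDS = {0: "words", 1: "verbatim", 2: "control sequence",
         3: "syntactic quantity"}


def parse_ir(line):
    """Split one raw IR line into (word part, type digit, page number)."""
    m = IR_LINE.match(line)
    if m is None:
        raise ValueError(f"not an IR line: {line}")
    return m.group("words"), int(m.group("kind")), int(m.group("page"))


words, kind, page = parse_ir("return " + BANG + "0 1.")
print(words, KINDS[kind], page)   # return words 1
```
!endverbatim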

\bluesubhead Markup

The markup for IRs is close to natural.
Precede the entry by a circumflex, or by a double one for
a silent\ftn{Silent IRs appear
  only in the index, not on the page.}
index entry.

\blueexample IR markup

\thisverbatim{\catcode`\|=12 \catcode`\^=12
  \catcode`!=12 \catcode`*=0 \unmc}
\beginverbatim
^{\'el\`eve!}^|verbatim text|^|\controlsequence|^\<a metalinguistic variable>
^^\<a metalinguistic variable>  %for silent ones, double the ^
{\sl^{ligatures}} |'$|^|\,||$''|%from the TeX book script
^^{markup commands, see control sequences}
^{Lamport and \LaTeX}           %text and control sequences with sort keys
*endverbatim

\thissubsubhead{\runintrue}
\bluesubsubhead Spaces \par are, as always, difficult. ^^{IR\ and\ spaces}
In an IR they separate the parts of the IR, and they also
occur in the word part.

\bitem A plain space is neglected during sorting.
\bitem The markup `\cs{\char32}', a control space,
      yields a space that takes part in sorting, according to
      the ordering table.
\bitem \cs{space} as markup is neglected during sorting.
      By default this token is a member of
      the set of control sequences to be ignored.
      It is set in the index as \cs{\char32}.
\smallbreak

\exercise What should you do when part of a title should also appear in the index?
\answer The naive approach is to enclose that part in braces and precede
       it by a circumflex. However, that goes wrong because a title
       is stored and reused in many places. Instead, copy the words and
       mark them up as a silent IR.

\blueexample Spaces

\thisverbatim{\catcode`\|=12 }
\begindemo
^{\space}%an ignored cs
^{a\ a}  %control space
^{aa}
^{a\ b}
^{a \TeX}
^{a\ \bf a}
^{\TeX book}
^{xyz beta}%space neglected in
          %sorting
^{xyza}
^|\space|
\sortindex\pasteupindex\bye
!yields
\noindent  Sorted result in file index.srt
\thisverbatim{\catcode`\!=12
             \catcode`\;=0 }
\beginverbatim
\space {} !0 1.
a\ \bf a{} !0 1.
a\ a{} !0 1.
a\ b{} !0 1.
aa{} !0 1.
a \TeX {} !0 1.
space{} !2 1.
\TeX book{} !0 1.
xyza{} !0 1.
xyz beta{} !0 1.
;endverbatim
\enddemo
Explanation. \cs{space} belongs to the set of control sequences
to be ignored, ^^{ignored\ control\ sequences}
ICSs for short.
This means that it is skipped with respect to sorting,
except when it occurs as the last token of the word part.
In that case it is ordered as a space,
i.e., it gets the lowest value.
This explains why `\cs{space}' comes first.

`a \cs{TeX}' and `\cs{TeX} book' are sorted via the default sorting key
of \cs{TeX}, which is `tex.'

`xyza' precedes `xyz beta' because the plain space is silent:
`xyza' is compared with `xyzbeta.'
When word ordering is preferred, a \cs{\char32}, a control space, must
be used instead.
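
The rules for spaces, ICSs and sorting keys can be combined into one sort
key. The Python sketch below is a hypothetical model, not the \bluetex{}
macros. The names |tokens| and |sort_key| are made up. Control sequences
outside the default sets are sorted on their name, and equal keys are
broken by the raw word; both are assumptions. Under these assumptions it
prints the demo entries in the same order as the index.srt listing above.
\beginverbatim
```python
# Hypothetical sort-key sketch for the word part of an IR, assuming the
# default \conseqs, \consyms and \srtkeypairs sets of this chapter.
CONSEQS = {"c", "space", "bf", "it", "rm", "tt", "sub", "relax"}
CONSYMS = {"`", "'", '"', "~", "\x5e"}     # \x5e is the circumflex accent
SRTKEYS = {"AmSTeX": "amstex", "LAMSTeX": "lamstex", "LaTeX": "latex",
           "TeX": "tex", "PS": "PostScript"}


def tokens(word):
    """Split a word into control words, control symbols and characters."""
    out, i = [], 0
    while i < len(word):
        if word[i] == "\\" and i + 1 < len(word):
            j = i + 1
            while j < len(word) and word[j].isalpha():
                j += 1
            if j == i + 1:                  # control symbol, e.g. a control space
                out.append(word[i:i + 2])
                i += 2
            else:                           # control word: TeX skips spaces after it
                out.append(word[i:j])
                i = j
                while i < len(word) and word[i] == " ":
                    i += 1
        else:
            out.append(word[i])
            i += 1
    return out


def sort_key(word):
    toks = tokens(word)
    key = []
    for n, t in enumerate(toks):
        last = (n == len(toks) - 1)
        if t == " ":
            continue                        # plain space: neglected
        if t == "\\ ":
            key.append(" ")                 # control space: lowest value
        elif t.startswith("\\") and len(t) > 1:
            name = t[1:]
            if name in SRTKEYS:
                key.append(SRTKEYS[name].lower())
            elif name == "space" and last:
                key.append(" ")             # trailing \space sorts as a space
            elif name in CONSEQS or name in CONSYMS:
                continue                    # ignored control sequence
            else:
                key.append(name.lower())    # assumption: other cs sort on their name
        else:
            key.append(t.lower())           # upper- and lowercase rank equal
    return "".join(key)


entries = ["\\space", "a\\ a", "aa", "a\\ b", "a \\TeX", "a\\ \\bf a",
           "\\TeX book", "xyz beta", "xyza", "space"]
# Equal keys are broken by the raw word; that tie-break is an assumption.
ordered = sorted(entries, key=lambda w: (sort_key(w), w))
for w in ordered:
    print(w)
```
!endverbatim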

\bluehead Special tokens

While sorting, special tokens are either neglected
or replaced by another sequence.
\bluetex{} provides two sets of tokens to be ignored while sorting:
^|\conseqs| and ^|\consyms|.\ftn{There are two sets because the space
  after the token is handled differently in the result.}
Replacing a control sequence by another sequence
is called associating a sorting key with the
control sequence.

For the moment, active symbols cannot be part of an IR.

\bluesubhead Tokens to be ignored

In practice I needed tokens like \cs{tt}
as part of the IR, and these must be neglected while sorting.\ftn{The reason is
 that {\tt <} and {\tt >} are used, which are printed wrongly
 outside the typewriter font.}
I decided to ignore those tokens while sorting and to keep
them unchanged in the final index.elm.
By default \bluetex{} knows the following sets of tokens to be ignored.
\begincenterverbatim
\conseqs{\c\space\bf\it\rm\tt\sub\relax}
\consyms{\`\'\"\^\~}
!endcenterverbatim

\bluesubhead Sorting keys

To extend a set, use the macro \cs{add}.

\blueexample Use of sorting keys

By default \bluetex{} provides the following
sorting keys.
\begincenterverbatim
\srtkeypairs{\AmSTeX{amstex}
            \LAMSTeX{lamstex}
            \LaTeX{latex}
            \TeX{tex}
            \PS{PostScript}}
!endcenterverbatim
Suppose that we have a macro \cs{fourtex} and that we would like it to be sorted
as `4tex.'
This can be done by extending the set ^|\srtkeypairs|^^|\add|
as follows.
\thisverbatim{\unmc}
\begindemo
\add\fourtex{4tex}to\srtkeypairs
Copy with ^{IR \fourtex}
^{IR 1}
^{IR 5}
^{IR a}
%
\sortindex   %with 4tex for \fourtex
\pasteupindex%Set `IR \fourtex{}
            %<pagenumbers>'
\bye
!yields
then the file index.srt will contain the IRs
\thisverbatim{\catcode`\!=12
             \catcode`\;=0 }
\beginverbatim
IR 1 !0 <pageno>.
IR \fourtex{} !0 <pageno>.
IR 5 !0 <pageno>.
IR a !0 <pageno>.
;endverbatim
\enddemo
with \cs{fourtex}  sorted on 4tex.
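
A sorting key replaces the control word only while the sort key is
computed. The hypothetical Python sketch below mimics \cs{srtkeypairs} with
a dictionary and \cs{add} with a function |add|. It is not BLUe's
implementation. It reproduces the order of the index.srt listing above.
\beginverbatim
```python
import re

# Hypothetical Python analogue of \srtkeypairs and of
# \add\fourtex{4tex}to\srtkeypairs; not the BLUe macros themselves.
srtkeypairs = {"AmSTeX": "amstex", "LAMSTeX": "lamstex", "LaTeX": "latex",
               "TeX": "tex", "PS": "PostScript"}


def add(cs, key, table):
    """Associate a sorting key with a control word such as \\fourtex."""
    table[cs.lstrip("\\")] = key


def sort_key(word, table):
    # Replace each control word (and the spaces TeX skips after it) by its
    # sorting key; an unknown control word is sorted on its name.
    word = re.sub(r"\\([A-Za-z]+)\s*",
                  lambda m: table.get(m.group(1), m.group(1)), word)
    return word.replace(" ", "").lower()   # plain spaces are neglected


add("\\fourtex", "4tex", srtkeypairs)
irs = ["IR a", "IR 5", "IR \\fourtex", "IR 1"]
print(", ".join(sorted(irs, key=lambda w: sort_key(w, srtkeypairs))))
# IR 1, IR \fourtex, IR 5, IR a
```
!endverbatim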

\exercise What should you do when `to' is part of the sorting key?
\answer Add an extra level of braces.


\bluehead Ordering

A fundamental issue with indexes is the ordering. ^^{ordering}
The ^{ASCII} table is not suitable because the codes of a lowercase letter
and its uppercase counterpart differ by 32, so that words differing only in
case would end up far apart. I decided to rank these letters as equal; more
precisely, to assign the lowercase ASCII values to both. From the accompanying
table I prefer the {\oldstyle1}$^{st}$ column to the
{\oldstyle2}$^{nd}$ one.

Moreover, accented letters are not part of ASCII. How should we
order, for example, e, \'e, \`e, \^e, \"e?
I decided to rank accented letters equal to their unaccented counterparts,
because from the accompanying table I prefer
the {\oldstyle3}$^{rd}$ column to the {\oldstyle4}$^{th}$ one.

I know that non-letters precede letters, but what about their relative
ordering? I decided to stay as close as possible to the ASCII ordering.

Then there is the problem of digits. In IRs they occur both in the
word(s) and in the page numbers. For page numbers I used numerical ordering;
for the word(s) I used alphabetical ordering, so that, for example,
`IR 10' precedes `IR 5.'\ftn{I could have applied
 a look-ahead mechanism and used numerical ordering throughout.
 Maybe another time.}

Furthermore, a user can select
the so-called ^{word\ ordering},\ftn{This means that a space precedes
   all letters. A plain space, in contrast, is neglected in the ordering.}
by using \cs{\char32}, \TeX nically a control space, as markup for a space.
Personally, I prefer from the accompanying table
the {\oldstyle5}$^{th}$ column to the {\oldstyle6}$^{th}$.
\def\btablecaption{}
\def\footer{}
\nonframed
\def\rowstblst{}
$$\def\header{lower vs.\ upper case\cs
             accents vs.\ unaccented\cs
             word ordering}
\vruled\btable{\vtop{\halign{&\tstrut\quad#\hfil\cr
el      &  el               \cr
El\`eve &  em               \cr
em      &  El\`eve \cr}}\cs
\vtop{\halign{&\tstrut\quad#\hfil\cr
el                 & el                \cr
\hbox{\'el\`eve}   & em                \cr
em                 & \hbox{\'el\`eve}  \cr}}
\cs\vtop{\halign{&\tstrut\quad#\hfil\cr
sea lion & seal    \cr
seal     & sea lion\cr}}
}%end \btable
$$
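
The three choices above (case folding, ignoring accents, and a control space
that sorts before all letters) can be modelled by one key function. The
Python sketch below is hypothetical, and |order_key| is a made-up name. It
handles only letters, control symbols and spaces, not control words. It
also shows that digits in the word part sort alphabetically.
\beginverbatim
```python
# Hypothetical sketch of the ordering rules, not the BLUe ordering table.
ACCENTS = {"'", "`", '"', "~", "\x5e"}   # \x5e is the circumflex accent


def order_key(word):
    """Sort key: case folded, accents dropped, control space before letters."""
    key = []
    i = 0
    while i < len(word):
        c = word[i]
        if c == "\\" and i + 1 < len(word):
            nxt = word[i + 1]
            if nxt == " ":
                key.append(" ")          # control space: sorts before letters
            elif nxt not in ACCENTS:
                key.append(nxt.lower())  # other control symbol: keep its char
            i += 2                       # accent control symbols are dropped
            continue
        if c == " ":
            i += 1                       # a plain space is neglected
            continue
        key.append(c.lower())            # upper- and lowercase rank equal
        i += 1
    return "".join(key)


print(", ".join(sorted(["em", "El\\`eve", "el"], key=order_key)))
# el, El\`eve, em
print(", ".join(sorted(["em", "\\'el\\`eve", "el"], key=order_key)))
# el, \'el\`eve, em
print(", ".join(sorted(["seal", "sea\\ lion"], key=order_key)))
# sea\ lion, seal
print(", ".join(sorted(["seal", "sea lion"], key=order_key)))
# seal, sea lion
print(", ".join(sorted(["IR 5", "IR 10"], key=order_key)))
# IR 10, IR 5
```
!endverbatim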

\bluehead Typesetting the index

The specifications for typesetting a \bluetex{} index are
\bitem represent the four IR types as in the \TeX book
\bitem set the index in two balanced columns, possibly preceded
      by one-column copy
\bitem set subsidiary entries as in the \TeX book
\bitem indent continuation lines by {\oldstyle2}em
\bitem indent subsidiary entries by {\oldstyle1}em
%\bitem underline page numbers which represent the definition or
%       the main source of information
%\bitem represent a page number in italics when that page contains
%       an instructive example of the concept in question.
\smallbreak
Users can edit index.elm\Dash read: add markup\Dash and
provide the necessary macros in, for example, \cs{preindex}.
In short, follow Knuth. To please Frans Goddijn I introduced the tag
^|\numberstyle|, by default equal to \cs{oldstyle}.

\exercise And what about subentries?
\answer My approach is to consider subentries as a typesetting problem, in the sense
  that the full entries are specified, and only after sorting, when typesetting, are
  redundant first parts suppressed, if one considers this better. This is similar to how
  BLUe's format system typesets references, inspired by the \AMS.
  For the moment I have not implemented the handling of subentries, because as of
  {\oldstyle1995} I consider it of low priority.
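
The suppression of redundant first parts described above could look as
follows. This is a hypothetical Python sketch of something not implemented
in \bluetex{}, and |suppress_heads| is a made-up name. It works on already
sorted entries whose head and tail are separated by a comma and a plain
space. The IRs of this chapter write that separator as a comma followed by
a control space. The {\oldstyle1}em indent of subsidiary entries is
represented by four spaces.
\beginverbatim
```python
# Hypothetical sketch of subentry typesetting, not part of BLUe.

def suppress_heads(entries, sep=", "):
    """Replace a head repeated from the previous entry by an indentation."""
    lines, prev_head = [], None
    for entry in entries:
        head, found, tail = entry.partition(sep)
        if found and head == prev_head:
            lines.append("    " + tail)   # redundant first part suppressed
        else:
            lines.append(entry)           # new head: set the entry in full
        prev_head = head
    return lines


sorted_entries = ["index, enrich", "index, file", "IR, markup",
                  "IR, processes and files", "IR, syntax", "ordering"]
print("\n".join(suppress_heads(sorted_entries)))
# index, enrich
#     file
# IR, markup
#     processes and files
#     syntax
# ordering
```
!endverbatim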

\bluehead Customization

A user might wish to intervene, in order
\bitem to include other tokens to be ignored while sorting
\bitem to supply an ordering of his/her own
\bitem to enrich the sorted and compressed file index.elm.
\smallbreak

\bluesubhead Adding tokens

What are reasonable requirements to impose upon the handling of
markup control sequences (cs for short)? In my opinion
\bitem the cs must be defined
\bitem ^|\makexref| writes the cs unexpanded
\bitem its ordering is unknown, and therefore must be supplied
\bitem ^|\setupnxtokens| ensures that the cs-s are written %, unexpanded,
      to ^{index.srt} and ^{index.elm}.
\smallbreak
As a consequence I decided to neglect these `in-between'
control sequences while sorting.
For those who favour a one-pass job, I have nevertheless provided the
following.\ftn{It is simpler to add those control sequences to
      index.elm.}

A set of tokens can be extended via \cs{add}^^|\add| as follows.
\beginverbatim
\add\hfil to\conseqs  or  \add\`to\consyms  or  \add\hfil{hfil}to\srtkeypairs
%with auxiliary \def\add#1to#2{...}
!endverbatim
Each element  from \cs{conseqs} is redefined in such a way
that the control sequence token is written to the file
with a space appended.\ftn{\cs{noexpand} is used instead of \cs{string}.}

\bluesubhead Modifying ordering

A general way is to `copy' the ordering table
and to modify it.\ftn{My \cs{fifo} is just a shortcut, which also
  prevents typos in assigning the ASCII values. For \cs{fifo}, see
  my `FIFO and LIFO sing the BLUes.'}

And what about a macro to add entries to the table?
Such a macro is easy to write, and superficially looks convenient
for an innocent user.
At the moment, however, I don't consider such macros worthwhile
for an innocent user, unless a very modest index has to be made.
And this completes the circle: a different ordering is not wanted, I guess.

\bluesubhead The process and files involved

As in manmac, \bluetex{} stores the raw IRs in the file index.
^^{IR,\ processes\ and\ files}
The file index^^{index,\ file}\ftn{The default name index is the value of
 the toks variable ^|\irfile|, which is used in \cs{sortindex}.}
is read and stored in an array for internal sorting.
After sorting, the number of entries is
reduced,\ftn{Entries which differ only in their page
  number are collected into one entry.}
and the result is written to the file ^{index.srt}.
Then index.srt is transformed into the file
index.elm.\ftn{The default name ^{index.elm} is the value of the toks variable
     ^|\indexfile|, which is used in |\pasteupindex|.
     The transformation abandons the IR syntax:
     the part which specifies the kind of IR
     is deleted, and the word part is marked up
     accordingly.}
The result is typeset via ^|\pasteupindex|.
Schematically it comes down to the following.
\bigskip
$$\vbox{\hsize.5\hsize%
\indmodelpic
}$$
^|\loadindexmacros| loads the index and sorting macros, and performs
initializations. It is safeguarded against double loading.\ftn{I introduced
   this because I start each chapter with \cs{loadindexmacros},
   independent of whether the chapter is run on its own or as part of the whole.}
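
The sort-and-compress step described above can be illustrated in a few
lines of Python. The sketch is hypothetical, and |sort_and_compress| is a
made-up name. It takes IRs already split into word, type and page, and it
uses plain lowercasing as a stand-in for the ordering table. It collects
the page numbers of equal entries in numerical order.
\beginverbatim
```python
from itertools import groupby

# Hypothetical sketch of sorting and compressing raw IRs, not the BLUe macros.

def sort_and_compress(irs, key=str.lower):
    """Sort raw (word, type, page) IRs; merge those differing only in page."""
    # Sorting on the raw word after the key keeps equal words adjacent.
    ordered = sorted(irs, key=lambda ir: (key(ir[0]), ir[0], ir[1], ir[2]))
    merged = []
    for (word, kind), group in groupby(ordered, key=lambda ir: (ir[0], ir[1])):
        pages = sorted({page for _, _, page in group})   # numerical order
        merged.append((word, kind, pages))
    return merged


raw = [("return", 0, 3), ("verbatim", 1, 1), ("return", 0, 1),
       ("return", 0, 2), ("return", 0, 1)]
for entry in sort_and_compress(raw):
    print(entry)
# ('return', 0, [1, 2, 3])
# ('verbatim', 1, [1])
```
!endverbatim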


\bluesubhead Enriching the index

Enriching the index is necessary when, for example, ^^{index,\ enrich}
\bitem control sequences have to be typeset
\bitem special symbols are needed, or
\bitem cross-references within the index are required.
\smallbreak
The best way is to start from the ^{index.elm} file and edit it.

\bluesubhead Typesetting the enriched file

When the default name index.elm is used,
just say \cs{pasteupindex}.
For another file name, assign this name to the toks
variable \cs{indexfile}
before invoking \cs{pasteupindex}.

\bluehead Extras

Undoubtedly people favour their own subset of \TeX, or more likely \LaTeX.
There is good news: you don't have to use BLUe's format system. I gathered the sorting
and indexing macros into an independent, self-contained set in the file plainindex.tpl.

The bad news is that so far I have done little to prevent name clashes.

\bluesubhead \TeX nical details

The indexing details are treated in `BLUe's Indexes,' and
the sorting aspects in `Sorting in BLUe,' both available from CTAN.
\endinput
\bye