SI_TEX.CHA Change file for SISISI-hyphenation in TeX 3.1 16.7.1991
(c) for SISISI-hyphenation: Wilhelm Barth, Heinrich Nirschl
(c) for TeX changes: Heini Hofstaedter, Harald Mueller
additional changes until 10. October 1991 HH
=========================================================================
This is the change file for inclusion of SISISI into TeX V.3.14.
There are three main changes in this file (which are of course quite
intermingled):
1) the replacement of Liang's hyphenation by SISISI.
2) the change of the interface of the actual hyphenation
to the post-hyphenation and the line-break algorithm.
This change is necessary because in original TeX, only the
information "hyphen - no hyphen" can be transmitted to the
line break algorithm because of two "bottlenecks":
1. Hyphenation --> Post-Hyphenation: Here a hyphen at position i
is marked by the 1-bit information odd(hyf[i]).
2. Post-Hyphenation --> Line-Breaking: Here a hyphen is marked
by the existence of a discretionary node.
From SISISI, there must be more information flowing to the
line-break algorithm:
a. Is a hyphen a main hyphen or a "nebenhyphen"? To this end,
a change is made such that a penalty node is allowed as
the first node in the prebreak list of a discretionary
node. This penalty node contains the penalty associated with
this specific hyphen. For "nebenhyphens", this value is set
by an additional primitive "\nebenhyphenpenalty".
Instead of the hyf-array, an array hyfpen is used that
contains the actual penalties; the old "if odd(hyf[i])" now
is converted ti "if hyfpen[i] <> inf_penalty".
b. A hyphen may be more complicated than a single -: ck-hyphens
and triple-consonant-hyphens may be present. To include these,
another array hyf_disc is introduced that contains ready-made
discretionary nodes for such hyphens.
At first, we simply wanted to include these discretionary
during the post-hyphenation into the new node list. But then
Heini felt that the impossiblity of ligatures with parts of such
a discretionary node was not acceptable; therefore he introdiced
the "x-mechanism" (characterized by many variables starting with
"x") and changed the reconstitute function so that more than the
single hyphen_char could be introduced into the reconstitution
process.
3) "Dirty" changes for the hyphenation of words with umlauts.
These changes are necessary because the umlauts in TeX are not
(necessarily) real characters, but something else (see definition
of umlauts in plain.tex or german.tex).
Ordinarily, the hyphenation only accepts sequences of letters,
ligatures (which are expanded to their single characters), and
so-called "implicit kerns". An umlaut, however, is usually
represented as a letter character and another character consisting
of two dots ("dieresis character"), which are connected by
explicit kerns (not implicite ones!). Therefore, words containing
umlauts are not hyphenable words in TeX, which is not acceptable
for german texts.
Therefore, we devised a "bypass" for umlauts as follows: An umlaut
is now represented as a special discretionary node (so-called
"dirty umlaut") as follows:
\discretionary{\kern4321sp m\kern<code>sp\kern<back>sp\kern-1em}%
{}%
{<umlaut>}
where the 4321sp is a "marker" for the "dirty umlaut", the "m"
letter is used to derive the font of the umlaut, <code> is
the font number that should be associated with this umlaut (the
font numbers of the umlauts are given below); and <back> is used
to cancel the preceding kerns, i.e.\ <back> = - (4321 + <code>).
In other words, the prebreak list contains the information that
this is an umlaut (albeit a "dirty" one) and some more
information needed for the hyphenation, the postbreak list is
empty, and the main list contains the umlaut proper. By a
special if-statement in the line-break algorithm, a hyphenation
at a discretionary node of this format is suppressed, so that
always the umlaut of the main list is printed.
The font numbers for the lowercase umlauts (which are given
in the second \kern node) must be:
ae 228
oe 246
ue 252
The codes for the uppercase umlauts can be chosen arbitrarily.
In order to get the hyphenation for these "dirty umlauts"
running, one must include lccode assignments of the form
\lccode<code>=<lccode>
for each of the new letters (see example below).
A typical declaration of the "dirty umlauts" runs like this:
%%%%%%%%%%%%%%%%%%%%% declaration of dirty umlauts %%%%%%%%%%%%%%%%%%%%%%
\def\A"{\discretionary{\kern 4321sp m\kern 196sp\kern-4517sp\kern-1em}{}{\"A}}
\def\a"{\discretionary{\kern 4321sp m\kern 228sp\kern-4549sp\kern-1em}{}{\"a}}
\def\O"{\discretionary{\kern 4321sp m\kern 214sp\kern-4535sp\kern-1em}{}{\"O}}
\def\o"{\discretionary{\kern 4321sp m\kern 246sp\kern-4567sp\kern-1em}{}{\"o}}
\def\U"{\discretionary{\kern 4321sp m\kern 220sp\kern-4541sp\kern-1em}{}{\"U}}
\def\u"{\discretionary{\kern 4321sp m\kern 252sp\kern-4573sp\kern-1em}{}{\"u}}
\lccode196=228 % uppercase umlaut A
\lccode214=246 % uppercase umlaut O
\lccode220=252 % uppercase umlaut U
\lccode228=228 % uppercase umlaut a
\lccode246=246 % uppercase umlaut o
\lccode252=252 % uppercase umlaut u
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
When (if?) one day N.Schwarz's DC/EC fonts arrive, the changes of the
third kind can be eliminated. How long this will take ... well, wait and
see.
But when they arrive, it's not longer possible to choose the codes for
uppercase umlauts arbitrarily. The codes for uppercase umlauts will
then be those, shown in the example above.
A number of procedures of SISISI are declared after the TeX program. As
these procedures are used in the hyphenation, of course, they must be
declared as `forward'.
In web2c gibt es keine forward's !!!!
%@x
%procedure initialize; {this procedure gets things started properly}
%@y
%@<forward declarations@>
%procedure initialize; {this procedure gets things started properly}
%@z
-------------------------------------------------------------------------
%@x
%@d pool_name_length=8
%@y
%@d pool_name_length=11
% {number of characters of ``pool_f_name''}
%@z
%see the following change of the pool file name "pool_f_name"
-------------------------------------------------------------------------
To have all messages, coming from SI_TEX, displayed on screen
and written to the log-file simultanously, we make the following,
additional definitions.
temp nicht mehr vorhanden, in log ve�ndert.
@x
@d wlog_ln(#)==write_ln(log_file,#)
@d wlog_cr==write_ln(log_file)
@y
@d wlog_ln(#)==write_ln(log_file,#)
@d wlog_cr==write_ln(log_file)
@d si_put(#) == @+ begin
if (file_offset<>0)or(term_offset<>0) then print_ln;
if selector<>log_only then wterm(#);
if (selector=log_only)or(selector=term_and_log) then
wlog(#);
end
@d si_put_ln(#) == @+ begin
if (file_offset<>0)or(term_offset<>0) then print_ln;
if selector<>log_only then wterm_ln(#);
if (selector=log_only)or(selector=term_and_log) then
wlog_ln(#);
end
@d si_put_cr == print_ln
We add a new primitive \nebenhyphenpenalty for setting the penalty of a
minor hyphen:
@x
@d int_pars=55 {total number of integer parameters}
@d count_base=int_base+int_pars {256 user \.{\\count} registers}
@y
@d neben_hyphen_penalty_code=55 {penalty for break after neben-hyphen}
@d si_debug_code=56 {value is treated as an set of active debugging sections}
@d supress_debug_mark_code=57 {same as above; if a section number is set, then
the output of the section number and the debugging output banner
of this section is supressed}
@d int_pars=58 {total number of integer parameters}
@d count_base=int_base+int_pars {256 user \.{\\count} registers}
@z
-------------------------------------------------------------------------
@x
@<Assign the values |depth_threshold:=show_box_depth|...@>=
@y
@d neben_hyphen_penalty==int_par(neben_hyphen_penalty_code)
@d si_debug(#) == if odd(int_par(si_debug_code) div #) then begin
if not odd(int_par(supress_debug_mark_code)div #) then
begin
si_put_cr;
si_put_ln(' <* debugging output section #',#:#,' *>');
end; {marking supression}
{ dig_count entfernt !!! }
@d gubed_si == end
@f si_debug == begin
@f gubed_si == end
@f gubed == nil
As now hyphens may have different penalties, we must somehow associate
a penalty with a discretionary node. We do this by allowing a penalty
node as the first node in the prebreak list of a discretionary node.
The following changes are in two case statements that check the nodes
in the sublists of a discretionary node; we add an empty case for
penalty nodes (two more case statements follow a little bit later).
The following change is in the line-break section, where try_break is
given the penalty stored in the penalty node. This change is the reason
for all other changes concerning penalties. Moreover, in this change
a line break is suppressed if the discretionary node is actually
a "dirty umlaut" (via "if not is_dirty_umlaut(cur_p) then ... try_break").
@x
begin s:=pre_break(cur_p); disc_width:=0;
if s=null then try_break(ex_hyphen_penalty,hyphenated)
else begin repeat @<Add the width of node |s| to |disc_width|@>;
s:=link(s);
until s=null;
act_width:=act_width+disc_width;
try_break(hyphen_penalty,hyphenated);
act_width:=act_width-disc_width;
end;
r:=replace_count(cur_p); s:=link(cur_p);
while r>0 do
begin @<Add the width of node |s| to |act_width|@>;
decr(r); s:=link(s);
end;
prev_p:=cur_p; cur_p:=s; goto done5;
end
@y
begin
s:=pre_break(cur_p); disc_width:=0;
if s=null then try_break(ex_hyphen_penalty,hyphenated)
else if (type(s)=penalty_node) and (link(s)=null) then begin
flush_node_list(s); s := null;
try_break(ex_hyphen_penalty,hyphenated); end
else if not is_dirty_umlaut(cur_p) then begin
if type(s)=penalty_node then s:=link(s);
repeat @<Add the width of node |s| to |disc_width|@>;
s:=link(s);
until s=null;
act_width:=act_width+disc_width;
if type(pre_break(cur_p))=penalty_node then begin
try_break(penalty(pre_break(cur_p)),hyphenated) end
else begin try_break(hyphen_penalty,hyphenated); end;
act_width:=act_width-disc_width;
end;
r:=replace_count(cur_p); s:=link(cur_p);
while r>0 do begin
@<Add the width of node |s| to |act_width|@>;
decr(r); s:=link(s);
end;
prev_p:=cur_p; cur_p:=s; goto done5;
end
There's a little problem, converting a device independent (``.dvi'')
to a post script file (``.ps''). I think that this program part could
cause this problem. so let's see...
!a <-- = --> @x With this, i want to try to reproduce this error.
Because, after i had inserted this change, that error didn`t occur
again!
@<Modify the end of the line...@>=
q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false;
if q<>null then {|q| cannot be a |char_node|}
if type(q)=glue_node then
begin delete_glue_ref(glue_ptr(q));
glue_ptr(q):=right_skip;
subtype(q):=right_skip_code+1; add_glue_ref(right_skip);
goto done;
end
else begin if type(q)=disc_node then
@<Change discretionary to compulsory and set
|disc_break:=true|@>
else if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0;
end
else begin q:=temp_head;
while link(q)<>null do q:=link(q);
end;
@<Put the \(r)\.{\\rightskip} glue after node |q|@>;
done:
!a <-- = --> @y
@<Modify the end of the line...@>=
q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false;
if q<>null then {|q| cannot be a |char_node|}
if type(q)=glue_node then
begin delete_glue_ref(glue_ptr(q));
glue_ptr(q):=right_skip;
subtype(q):=right_skip_code+1; add_glue_ref(right_skip);
goto done;
end
else begin
if type(q)=disc_node then begin
hyfdeb_b:=false;
if is_dirty_umlaut(q) then begin
si_debug(64) hyfdeb_b:=true; gubed_si;
si_put_cr;
si_put_ln(' ??? There's a line break at a dirty umlaut!?');
end;
if not hyfdeb_b then begin
@<Change discretionary to compulsory and set |disc_break:=true|@>
end;
end else
if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0;
end else begin
q:=temp_head;
while link(q)<>null do q:=link(q);
end;
@<Put the \(r)\.{\\rightskip} glue after node |q|@>;
done:
!a <-- = --> @z
If the break actually occurs, the penalty node is made a separate list
with only one node, which is immediately flushed.
@x
begin s:=pre_break(q); link(q):=s;
while link(s)<>null do s:=link(s);
pre_break(q):=null; q:=s;
end
@y
begin
s:=pre_break(q);
if type(s)=penalty_node then begin
s:=link(s);
link(pre_break(q)):= null;
flush_node_list(pre_break(q));
end;
link(q):=s;
while link(s)<>null do s:=link(s);
pre_break(q):=null; q:=s;
end
@z
The type `array[0..65] of 0..256' is needed more often, so we make it
an explicit type (see big SISISI change at end of change file). hc is
now declared of this type.
@x
@!hc:array[0..65] of 0..256; {word to be hyphenated}
@!hn:small_number; {the number of positions occupied in |hc|}
@!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result}
@y
@!hc:a_word; {word to be hyphenated}
@!hn:small_number; {the number of positions occupied in |hc|}
@!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result}
{ some variables for debugging output }
@!hyfdeb_i,hyfdeb_j: integer;
@!hyfdeb_b:boolean;
@!si_p, si_q, si_r, si_s: pointer;
@z
Our hyphenation routines need even more "more local variables".
@x
@ Hyphenation routines need a few more local variables.
@y
@!dirty_umlaut:array [-1..64] of pointer; {the [-1] field is used, if a
whatsit node occurs inbetween a word. This can happen due to
Partl's changes in ``german.sty'' }
@ Hyphenation routines need a few more local variables.
@z
The dirty_umlaut array must be initialized for each word.
And we enable watching the node list, representing the word
to be hyphenated.
@x
if s<>null then
begin @<Skip to node |ha|, or |goto done1| if no hyphenation
@y
si_p:=cur_p; si_q:=null;
for jj:=-1 to 64 do dirty_umlaut[jj] := null;
if s<>null then begin
si_debug(1)
{ hyfdeb_i:=depth_threshold; depth_threshold:=100;
hyfdeb_j:=breadth_max; breadth_max:=100;
}
show_node_list(si_p);
{ depth_threshold:=hyfdeb_i; breadth_max:=hyfdeb_j; }
gubed_si;
@<Skip to node |ha|, or |goto done1| if no hyphenation
@z
@x
hyphenate;
end;
done1: end
@y
hyphenate;
end;
done1:
si_debug(32)
hyfdeb_i := depth_threshold; hyfdeb_j := breadth_max;
if si_q<>null then begin
depth_threshold:=100; {show all levels of a list}
breadth_max:=200; {show all nodes of one level of a list}
si_r:=link(si_q); link(si_q):=null;
end else begin
(* take the values, set by the user, for this variables! *)
breadth_max:=show_box_breadth; depth_threshold:=show_box_depth;
end;
show_node_list(si_p);
si_put_cr;
if si_q <> null then link( si_q ) := si_r;
depth_threshold:=hyfdeb_i; breadth_max:=hyfdeb_j;
gubed_si;
end
@z
The prehyphenation loop must now also skip over the "dirty umlauts".
@x
loop@+ begin if is_char_node(s) then
begin c:=qo(character(s)); hf:=font(s);
end
else if type(s)=ligature_node then
if lig_ptr(s)=null then goto continue
else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q);
end
else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue
else if type(s)=whatsit_node then
begin @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>;
goto continue;
end
else goto done1;
if lc_code(c)<>0 then
if (lc_code(c)=c)or(uc_hyph>0) then goto done2
@y
loop@+ begin
if is_char_node(s) then begin
c:=qo(character(s)); hf:=font(s);
end else if type(s)=ligature_node then begin
if lig_ptr(s)=null then goto continue
else begin
q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q);
end
end else if is_dirty_umlaut(s) then begin
c:=get_char_of(s); hf:=get_font_of(s);
end else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue
else if type(s)=whatsit_node then begin
@<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>;
goto continue;
end else goto done1;
if lc_code(c)<>0 then
if (lc_code(c)=c)or(uc_hyph>0) then goto done2
@z
The interface variables for the hyphenation are extended: hyfpen[i]
holds the actual penalty of a hyphen at position i, hyf_disc[i]
contains a "special discretionary" for triple-consonant-hyphens and
ck-hyphens. hc_si is the word as given to the SISISI algorithm;
hyf_si is returned from SISISI and contains markers for the hyphens.
After this change, there are many changes where `odd(hyf[i])' is
simply replaced by `hyfpen[i]<>inf_penalty'.
Another procedure to be changed is `reconstitute': It now has to handle
special dicretionary nodes. For a detailed feeling about the code changes
in reconstitute, either (1) look through it bit by bit, or (2) ask Heini
Hofstaedter.
@x
end
else if type(s)=ligature_node then
@<Move the characters of a ligature node to |hu| and |hc|;
but |goto done3| if they are not all letters@>
@y
end else if is_dirty_umlaut(s) then begin
if get_font_of(s)<>hf then goto done3;
c:=get_char_of(s);
if lc_code(c)=0 then goto done3;
if hn=63 then goto done3;
hb:=s; incr(hn); hu[hn]:=c; hc[hn]:=lc_code(c);
dirty_umlaut[hn]:=s;
for jj:=1 to replace_count(s) do s:=link(s);
end else if type(s)=whatsit_node then begin
@<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>;
dirty_umlaut[-1]:=s;
end else if type(s)=ligature_node then
@<Move the characters of a ligature node to |hu| and |hc|;
but |goto done3| if they are not all letters@>
@z
Also in the conversion loop (where the node list is shuffled into the
hc array), we must skip a "dirty umlaut".
@x
loop@+ begin if not(is_char_node(s)) then
case type(s) of
ligature_node: do_nothing;
@y
loop@+ begin
if not(is_char_node(s)) then
if is_dirty_umlaut(s) then begin
for jj:=1 to replace_count(s)-1 do s:=link(s);
end else
case type(s) of
ligature_node: do_nothing;
@z
Well, there are a lot of new variables that are necessary for the new
hyphenation. Most of them are necessary for Heini's change in the
reconstitute function, where now discretionary nodes for hyphens with
more than one character (e.g. the german ck-hyphens or
three-consonant-hyphens) can be inserted.
@x
@ @<Local variables for hyphenation@>=
@!i,@!j,@!l:0..65; {indices into |hc| or |hu|}
@!q,@!r,@!s:pointer; {temporary registers for list manipulation}
@!bchar:halfword; {right boundary character of hyphenated word, or |non_char|}
@y
@!hyfpen:array [0..64] of integer; {penalties for possible discretionary
hyphens}
@!hyf_si:trennvektor;
@!hc_si:wort;
@!hyf_disc:array [0..64] of pointer;
@ @<Local variables for hyphenation@>=
@!i,@!j,@!l,@!kk:0..65; {indices into |hc| or |hu|}
@!q,@!r,@!s,@!qq,@!qr,@!hold_r:pointer; {temporary registers for list manipulation}
@!bchar:halfword; {right boundary character of hyphenated word, or |non_char|}
@!xchg_char:array [0..10] of 0..256; {type of |hu| and |hc|}
@!add_char, {additional characters when hyphenating e.g.\
Schiffahrt $\rightarrow$ Schif{\sl f}-fahrt: here
|add_char| = 1}
@!xchg_i,xchg_k,xchg_hu: 0..65; {type like |i|, |j| and |l| }
@!spec_hyf: pointer; {special hyphen; causes different spelling of the
word, when it is hyphenated at this point}
@!si_j:integer;
@z
The new variable hold_r is initialized (the kern node is a dummy node,
as far as I can see).
@x
if not is_char_node(hb) then
if type(hb)=ligature_node then if odd(subtype(hb)) then
@y
hold_r := new_kern(1235); link(hold_r):=r;
if not is_char_node(hb) then
if type(hb)=ligature_node then if odd(subtype(hb)) then
@z
Initialize some variables if the list starts with a "dirty umlaut".
@x
else if type(ha)=ligature_node then
if font(lig_char(ha))<>hf then goto found2
else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1);
@y
else if is_dirty_umlaut(ha) then begin
if get_font_of(ha) <> hf then goto found2
else begin init_list:=ha; init_lig := false; hu[0]:=get_char_of(ha);
end
end else if type(ha)=ligature_node then
if font(lig_char(ha))<>hf then goto found2
else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1);
@z
@x
common_ending: flush_node_list(r);
@<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>;
flush_list(init_list)
@y
common_ending:
qq:=hold_r;
for kk:={1}j to hn do if dirty_umlaut[kk] <> null then begin
while link(qq) <> dirty_umlaut[kk] do qq := link(qq);
qr := link(qq);
for i:=1 to replace_count(qr) do
if qr<>null then qr:=link(qr)
else begin {test:}
si_put_ln('...??..X --> ????? !');
end; {:test}
if qr<>null then begin
link(qq):=link(qr); link(qr):=null;
end else link(qq):=null;
end;
if dirty_umlaut[-1]<>null then begin
qq:=hold_r;
while (link(qq)<>dirty_umlaut[-1])and(link(qq)<>null) do qq:=link(qq);
if link(qq)=dirty_umlaut[-1] then begin
link(qq):=link(dirty_umlaut[-1]);
link(dirty_umlaut[-1]):=null;
end else begin
si_put(' *** ERROR: node list is damaged! ');
si_put_ln('(post hyphenation - unhinge a whatsit node)');
end;
end;
{}
r:=hold_r;
while link(r)<>null do begin
if (mem_min<=link(r))and(link(r)<=mem_max) then r:=link(r)
else begin
wlog_cr; wterm_cr;
wlog_ln('???$ - ERROR in node list!');
wterm_ln('???$ - ERROR in node list!');
kk:=depth_threshold; l:=breadth_max;
depth_threshold:=100; breadth_max:=100;
show_node_list(hold_r);
link(r):=null;
depth_threshold:=kk; breadth_max:=l;
end;
end;
{}
flush_node_list(hold_r);
@<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>;
flush_list(init_list)
@z
Another odd->inf_penalty change (this time only textual).
@x
If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]|
is odd and such that the result of |reconstitute| would have been different
if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed|
to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero.
@y
If there exists an index |k| in the range $j\le k\le m$ such that
|hyfpen[k]<>inf_penalty| and such that the result of |reconstitute|
would have been different if $x_{k+1}$ had been |hchar|, then |reconstitute|
sets |hyphen_passed| to the smallest such~|k|. Otherwise it sets
|hyphen_passed| to zero.
@z
In the reconstitute function, we set the boundary value n immediately
before the next "dirty umlaut" by a simple for-loop starting at the
original n.
@x
begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null;
@y
@!ii:small_number;
begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null;
for ii:=n downto j do
if dirty_umlaut[ii] <> null then n:=ii-1;
@z
@x
else begin if hchar<non_char then if odd(hyf[j]) then
begin hyphen_passed:=j; hchar:=non_char;
end;
if op_byte(q)<kern_flag then
@<Carry out a ligature replacement, updating the cursor structure
@y
else begin if hchar<non_char then if hyfpen[j]<>inf_penalty then
begin hyphen_passed:=j; hchar:=non_char;
end;
if op_byte(q)<kern_flag then
@<Carry out a ligature replacement, updating the cursor structure
@z
Now we are in the post-hyphenation's main loop; here must append a
"dirty umlaut" to the growing node list. This is simply done by an
if-statement that checks for the presence of a "dirty umlaut" and
appends it.
Moreover, we have another textual odd->inf_penalty change.
@x
repeat l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1;
if hyphen_passed=0 then
begin link(s):=link(hold_head);
while link(s)>null do s:=link(s);
if odd(hyf[j-1]) then
begin l:=j; hyphen_passed:=j-1; link(hold_head):=null;
end;
end;
if hyphen_passed>0 then
@<Create and append a discretionary node as an alternative to the
unhyphenated word, and continue to develop both branches until they
become equivalent@>;
until j>hn;
link(s):=q
@ In this repeat loop we will insert another discretionary if |hyf[j-1]| is
odd, when both branches of the previous discretionary end at position |j-1|.
@y
if dirty_umlaut[-1]<>null then begin
{hook in the previous saved whatsit node before the first
character of the hyphenated word}
link(s):=dirty_umlaut[-1];
s:=link(s);
end;
repeat
if dirty_umlaut[j]<>null then begin
link(s):=dirty_umlaut[j];
while link(s)>null do s:=link(s);
link(hold_head):=null;
j:=j+1; hyphen_passed:=0;
if hyfpen[j-1]<>inf_penalty then begin
l:=j; hyphen_passed:=j-1;
si_debug(16)
si_put_ln(' *** append a hyphen immediatly after a dirty umlaut!');
gubed_si;
end;
end else begin
l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1;
if hyphen_passed=0 then begin
link(s):=link(hold_head);
while link(s)>null do s:=link(s);
if hyfpen[j-1]<>inf_penalty then begin
l:=j; hyphen_passed:=j-1; link(hold_head):=null;
end;
end;
end; {if dirty_umlaut[j]<>null then-else}
if hyphen_passed>0 then
@<Create and append a discretionary node as an alternative to the
unhyphenated word, and continue to develop both branches until they
become equivalent@>;
until j>hn;
link(s):=q;
si_q:=s; {save the last node of the hyphenated word for debugging output}
for j:=0 to 64 do if hyf_disc[j]<>null then begin
flush_node_list( hyf_disc[j]);
end
@ In this repeat loop we will insert another discretionary if
|hyfpen[j-1]<>inf_penalty|, when both branches of the previous
discretionary end at position |j-1|.
@z
Here comes the big reconstitution action (cheers to Heini!).
@x
@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>=
minor_tail:=null; pre_break(r):=null; hyf_node:=new_character(hf,hyf_char);
if hyf_node<>null then
begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node);
end;
while l<=i do
begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1;
if link(hold_head)>null then
begin if minor_tail=null then pre_break(r):=link(hold_head)
@y
As the value of |hyfpen[i]| is necessary for creating a penalty node,
we must create this node before we ``kill'' the hyphenation information in
|hyfpen[i]| by assigning |inf_penalty|. |minor_tail| contains the
pointer to the penalty node or |null|, if no such node has been created.
@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>=
if (hyfpen[i]<>inf_penalty) and (hyfpen[i]<>hyphen_penalty)
then minor_tail:=new_penalty(hyfpen[i])
else minor_tail:=null;
hyfpen[i]:=inf_penalty;
pre_break(r):=minor_tail;
hyf_node:=new_character(hf,hyf_char);
{Exchange |pre_break|-list of the selfbuilt |disc_node| with |hu|}
xchg_k:=i;
if hyf_disc[xchg_k]<>null then begin
spec_hyf:=pre_break(hyf_disc[xchg_k]);
add_char:=subtype(spec_hyf);
xchg_i:=1;
repeat
spec_hyf:=link(spec_hyf);
if spec_hyf<>null then begin
xchg_char[xchg_i]:=qo(character(spec_hyf));
incr(xchg_i);
end;{if}
until spec_hyf=null;
if hyf_node<>null then begin
xchg_char[xchg_i]:=hyf_char;
free_avail(hyf_node);
end else decr(xchg_i);
xchg_hu:=i-xchg_i+add_char+2; {this is the index of |hu| where the first
character is replaced by a character of the
|pre_break|-list of |spec_hyf|}
for xchg_k:=1 to xchg_i do begin
xchg_char[0]:=hu[xchg_hu];
hu[xchg_hu]:=xchg_char[xchg_k];
xchg_char[xchg_k]:=xchg_char[0];
incr(xchg_hu);
end; {|for|}
xchg_k:=i;
i:=xchg_hu-1;
xchg_hu:=xchg_hu-xchg_i;
end {|if hyf_disc[]<>null|}
else if hyf_node<>null then begin
incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node);
end;
while l<=i do begin
l:=reconstitute(l,i,font_bchar[hf],non_char)+1;
if link(hold_head)>null then
begin if minor_tail=null then pre_break(r):=link(hold_head)
@z
@x
if hyf_node<>null then
begin hu[i]:=c; {restore the character in the hyphen position}
l:=i; decr(i);
end
@y
if hyf_disc[xchg_k]<>null then begin
{ Restore the original word in |hu| }
i:=xchg_k; l:=i+1;
for xchg_k:=1 to xchg_i do begin
hu[xchg_hu]:=xchg_char[xchg_k];
incr(xchg_hu);
end;{for}
end else if hyf_node<>null then begin
hu[i]:=c; {restore the character in the hyphen position}
l:=i; decr(i);
end
.. but now we kill all of Liang's code and replace it by some re- and
back-en- and decoding, a call to abteilen, and the construction of the
special discretionaries. Then we go right into the SISISI hyphenation.
=========================================================================
@x
if trie_char(cur_lang+1)<>qi(cur_lang) then return; {no patterns for |cur_lang|}
hc[0]:=0; hc[hn+1]:=0; hc[hn+2]:=256; {insert delimiters}
for j:=0 to hn-r_hyf+1 do
begin z:=trie_link(cur_lang+1)+hc[j]; l:=j;
while hc[l]=qo(trie_char(z)) do
begin if trie_op(z)<>min_quarterword then
@<Store \(m)maximum values in the |hyf| table@>;
incr(l); z:=trie_link(z)+hc[l];
end;
end;
found: for j:=0 to l_hyf-1 do hyf[j]:=0;
for j:=0 to r_hyf-1 do hyf[hn-j]:=0
@ @<Store \(m)maximum values in the |hyf| table@>=
begin v:=trie_op(z);
repeat v:=v+op_start[cur_lang]; i:=l-hyf_distance[v];
if hyf_num[v]>hyf[i] then hyf[i]:=hyf_num[v];
v:=hyf_next[v];
until v=min_quarterword;
end
@ The exception table that is built by \TeX's \.{\\hyphenation} primitive is
@y
for j:=0 to hn do
hc_si[j] := ktab[hc[j]];
si_debug(2)
si_put(' hc:');
for j:=0 to hn do si_put( xchr[hc[j]] );
si_put_ln(':');
si_put(' hc_si:');
for j:=0 to hn do si_put( xchr[rev_tab[hc_si[j]]] );
si_put_ln(':');
gubed_si;
si_debug(4)
si_put_ln('< decimal character codes >');
si_put(' hc:');
for j:=0 to hn do begin
si_put(hc[j]:3,'-');
if ((j+1) mod 16)=0 then begin
si_put_cr; si_put(' ');
end;
end;
si_put_cr;
si_put(' hc_si:');
for j:=0 to hn do begin
si_put( rev_tab[hc_si[j]]:3,'-' );
if ((j+1) mod 16)=0 then begin
si_put_cr; si_put(' ');
end;
end;
si_put_cr; si_put_cr;
gubed_si;
abteilen(hc_si,hyf_si,hn,funktioniert,zerleg);
for j:=0 to 64 do begin
hyf_disc[j]:=null; hyfpen[j]:=inf_penalty;
end;
if funktioniert then
for j:=0 to hn do begin
if hyf_si[j]=neben then hyfpen[j] := neben_hyphen_penalty
else if hyf_si[j]=haupt then hyfpen[j] := hyphen_penalty
else if hyf_si[j]=nebenck then begin
hyfpen[j+1] := neben_hyphen_penalty;
s:=get_node(small_node_size); {discretionary node erzeugen}
type(s):=disc_node;
hyf_disc[j+1]:=s;
{|pre_break|-Liste: }
pre_break(s):=new_penalty(neben_hyphen_penalty); s:=pre_break(s);
subtype(s):=0; { number of additional characters }
link(s):=new_character(hf,hu[j+2]); {'k' oder'K'}
link(link(s)):=null;
s:=hyf_disc[j+1];
post_break(s):=null; link(s):=null;
end
else if hyf_si[j]=haupt3 then begin
hyfpen[j-1]:=hyphen_penalty;
s:=get_node(small_node_size); {discretionary node erzeugen}
type(s):=disc_node;
hyf_disc[j-1]:=s;
{|pre_break|-Liste: }
pre_break(s):=new_penalty(hyphen_penalty); s:=pre_break(s);
subtype(s):=1;
link(s):=new_character(hf,hu[j]); { Dreifachkonsonant }
link(link(s)):=null;
{|post_break|- und |link|-Liste: }
s:=hyf_disc[j-1];
post_break(s):=null;
link(s):=null;
end
end; {|if funktioniert|}
si_debug(8)
if funktioniert then begin
for j:=0 to hn do
if hyfpen[j]<>inf_penalty then begin
if hyf_disc[j]<>null then begin
if subtype(pre_break(hyf_disc[j]))=0 then
si_put( xchr[hu[j+1]], '-' )
else {if subty...}
si_put( xchr[hu[j+1]], xchr[hu[j+1]], '=' );
{ i hope so }
end else begin
si_put( xchr[hu[j]] );
if hyfpen[j]=hyphen_penalty then
si_put('=')
else si_put('-');
end;
end else begin
si_put( xchr[hu[j]] );
end;
end else begin
si_put_ln(' no hyphens were found in this word!');
end;
si_put_cr;
gubed_si;
goto done1;
found:
hyfpen[0]:=inf_penalty;
for j:=1 to hn do
if odd(hyf[j]) then hyfpen[j]:=hyphen_penalty
else hyfpen[j]:=inf_penalty;
done1:
@ Das folgende Modul ersetzt die gesamte Liang'sche Trennung. Aufgabe
dieses Modules ist es, ein in hc (ein Array of 0..255 ) liegendes Wort
nach Trennstellen zu untersuchen und die gefundenen Trennstellen in
hyf (ebenfalls ein Array of neben...) an den entsprechenden Stellen
einzutragen. Die genaue Beschreibung der Vorgangsweise findet man im
INSTITUTSBERICHT NR. 26, TECHNISCHE UNIVERSIT\"AT WIEN, INSTITUT
F\"UR PRAKTISCHE INFORMATIK.
Alle Prozeduren f\"ur die Silbentrennung \SISISI, die bei jedem
Start von TeX aufgerufen werden.
@ Dieser Modul beinhaltet alle f\"ur die neue Silbentrennung |abteilen|
n\"otigen Initialisierungen bzw. Prozeduren, die nur einmal am Anfang
ausgef\"uhrt werden sollen.
@
@<Types in the outer block@>=
string80 = packed array [0..80] of eight_bits;
buchstabe = eight_bits;
a_word = array[0..maxwl] of 0..256;
wort = a_word;
sperrvektor=array [0..maxwl] of boolean;
dudt=array [0..maxwl] of boolean;{frueher kein type !!}
{ trennung = (keine, neben, nebenck, haupt, haupt3); }
trennvektor = array [0..maxwl] of integer; { Trennstelle bei i ==>
zwischen Buchstabe i und i+1 wird getrennt }
ktabtype = packed array [0..255] of buchstabe;
verdschlue = 0..8388647; { 2**23 - 1 } { -PRAK!- unsigned long in C }
varr = array [0..maxwl] of verdschlue;
intarr = array [0..maxwl] of integer;
{ darf nicht als var Parameter uebergeben werden !!}
infotype = record
{ frueher ein variantenRecord }
ausnahme:boolean;
untrennbar:boolean;
{ erste :0..7;
zweite:0..7; }
erste :integer;
zweite:integer;
endung,vorsilbe,stamm,ehervor,stammallein:boolean;
end;
infobyte = eight_bits;
{ darf nicht als var Parameter uebergeben werden !!}
tableentry=record
tabv:verdschlue;
tabb:infobyte;
frei:boolean;
wiederfrei:boolean;
end;
hashelement = packed array [0..3] of eight_bits;
htabtype = packed array [0..tabsize] of hashelement;
function dig_count( v: integer): integer;
{ calculates the number of decimal digits of ``v'' }
var x, e, c: integer;
begin
x := v div 10; e:=1; c:=1;
while e<=x do begin
e := e * 10; incr(c);
end;
dig_count:=c;
end;
@<Handling of dirty umlauts@>=
procedure hashfunk (var w:wort; anfang,ende:integer;
var v:varr; var ind,k:intarr );
{ -PRAK!- in C long statt integer }
var vacc, indacc, kacc : integer;
i : integer;
begin
vacc := 31415; indacc := 152; kacc := 271;
dbg_put_ln(' * HFNC =');
for i := anfang to ende do begin
dbg_put(' * HFNC ',i-anfang:3,' A');
vacc := (vacc * 31 + w [i]) mod 8388648;
dbg_put('B');
v[i] := vacc;
dbg_put('C');
indacc := (indacc * 33 + w[i]) mod (tabsize+1);
dbg_put('D');
ind[i] := indacc;
dbg_put('E');
kacc := (kacc * 15 + w[i]) mod ((tabsize+1) div 2);
dbg_put('F');
k[i] := 2*kacc + 1;
dbg_put_ln('G');
end; {|for|}
end; {|hashfunk|}
procedure unpackinfo ( b:infobyte;
{ var i:infotype }
var iausnahme :boolean;
var iuntrennbar :boolean;
var ierste :integer;
var izweite :integer;
var iendung :boolean;
var ivorsilbe :boolean;
var istamm :boolean;
var iehervor :boolean;
var istammallein:boolean );
begin
{ with i do begin }
if odd(b) then begin { 87654321 }
iausnahme:= true;
b:= b div 2; { 08765432 }
if odd(b) then iuntrennbar:= true
else iuntrennbar:= false;
b:=b div 2; { 00876543 }
ierste := b mod 8; { 00876xxx }
izweite := b div 8; { 00000876 }
end else begin
iausnahme:= false; { 87654321 }
b:= b div 2; { 08765432 }
if odd(b) then iendung:= true
else iendung:= false;
b:= b div 2; { 00876543 }
if odd(b) then ivorsilbe:= true
else ivorsilbe:= false;
b:= b div 2; { 00087654 }
if odd(b) then istamm:= true
else istamm:= false;
b:= b div 2; { 00008765 }
if odd(b) then iehervor:= true
else iehervor:= false;
b:= b div 2; { 00000876 }
if odd(b) then istammallein:= true
else istammallein:= false;
end; {|if-then-else|}
{ end; |with|}
end; {|unpackinfo|}
procedure unpackentry( h:hashelement;
{ var e:tableentry }
var etabv:verdschlue;
var etabb:infobyte;
var efrei:boolean;
var ewiederfrei:boolean );
begin
{ with e do begin }
dbg_put_cr;
dbg_put(' ==> UPCKNTY 1');
etabv:= h[0]+h[1]*256+h[2]*256*256;
dbg_put(' 2');
if odd(h[3]) then begin { Ausnahme }
dbg_put(' 3 A');
etabb:= h[3];
dbg_put('1');
efrei:= false;
ewiederfrei:= false;
end
else begin
dbg_put(' 3 B');
etabb:= h[3] mod 64;
dbg_put('1');
{ h[3]:=h[3] div 64;} {-PRAK!- in C ein var-Paramter, darf daher
nicht ver�ndert werden. }
dbg_put('2');
{ durch folgende 2 Abfragen ersetzt. -PRAK!- }
if odd(h[3] div 64) then efrei := true
else efrei := false;
if odd(h[3] div 128) then ewiederfrei := true
else ewiederfrei := false;
end;
dbg_put_ln('- 4');
{ end; |with|}
end; {|unpackentry|}
procedure hashsuch ( v:verdschlue;
ind,k : integer;
{ var i:infotype; }
var iausnahme :boolean;
var iuntrennbar :boolean;
var ierste :integer;
var izweite :integer;
var iendung :boolean;
var ivorsilbe :boolean;
var istamm :boolean;
var iehervor :boolean;
var istammallein:boolean;
var g:boolean );
{ sucht Eintragung in der Hashtabelle, g ist TRUE falls gefunden }
var
gef,ngef:boolean;
j:integer;
entry : tableentry;
begin
gef := false; ngef := false; j := ind;
repeat
unpackentry(htab[j],entry.tabv,entry.tabb,entry.frei,entry.wiederfrei);
if entry.frei then ngef := true
else if (v = entry.tabv) and not entry.wiederfrei then begin
unpackinfo(entry.tabb,iausnahme,iuntrennbar,ierste,izweite,
iendung,ivorsilbe,istamm,iehervor,istammallein);
gef := true;
end else
j := (j+k) mod (tabsize + 1)
until gef or ngef;
g := gef;
end; {|hashsuch|}
procedure hashload ( var status:integer );
{ einlesen der Hash-Tabelle von file fn }
var i : integer;
e : tableentry;
begin
{ Orginal sitex3.cha -PRAK!-
for i:=1 to file_name_size do name_of_file:= ' ';
name_of_file[1]:='h';
name_of_file[2]:='f';
name_of_file[3]:='3';
name_length := 3;
anzahl := 0;
if a_open_in(hashfile) then begin
}
{ geaendert !! -PRAK!- }
for i:=1 to file_name_size do name_of_file[i]:= ' ';
name_of_file[1]:='h';
name_of_file[2]:='f';
name_of_file[3]:='3';
name_length := 3;
anzahl := 0;
{ -PRAK!- texinputpath f�r C notwendig }
if a_open_in(hashfile, TEX_INPUT_PATH) then begin
for i := 0 to tabsize do begin
{ -PRAK!- in C Probleme mit den typen beim einlesen }
read (hashfile, htab[i][0]); { htab = packed array [0..tabsize] of }
read (hashfile, htab[i][1]); { hashelement, und }
read (hashfile, htab[i][2]); { hashelement = packed array 0..3 }
read (hashfile, htab[i][3]);
unpackentry (htab[i],e.tabv,e.tabb,e.frei,e.wiederfrei);
if not e.frei and not e.wiederfrei then anzahl := anzahl + 1;
end; {|for|}
status:= 0;
a_close(hashfile);
end else status:= 1;
end; {|hashload|}
procedure inittrennen;
var ch: 0..255;
bst: buchstabe;
i, status: integer;
is_iniTeX: boolean;
begin
{ Konversionstabelle |ktab| initialisieren }
{ ======================================== }
for ch := 0 to 255 do begin
{ if (ch in [" ",",",";",":","-","/","?","!","(",")",".","""","'","^"]) then }
case ch of
" ",",",";",":","-","/","?","!","(",")",".","""","'","^":
if ch="." then ktab[ch]:= "."
else ktab[ch] := trennzeichen
{ else ktab[ch] := sonderzeichen geht nicht !!}
end; { case }
if (ktab[ch] <> ".") and (ktab[ch] <> trennzeichen) then ktab[ch] := sonderzeichen;
end; { for }
for bst := bst_a to bst_z do begin
ktab[ord('A')+ord(bst)-ord(bst_a)]:=bst;
ktab[ord('a')+ord(bst)-ord(bst_a)]:=bst
end;
{ Umlaute und scharfes s - f\"ur ``production \TeX version'' }
ktab[255]:=bst_sz;
ktab[246]:=bst_oe;
ktab[252]:=bst_ue;
ktab[228]:=bst_ae;
{ Umlaute und scharfes s - f\"ur ``initialization \TeX version'' }
ktab["1"]:=bst_sz;
ktab["2"]:=bst_ue;
ktab["3"]:=bst_oe;
ktab["4"]:=bst_ae;
ktab[25]:=bst_sz; {?-wieso? HH}
{testing:}
for i:=0 to 255 do rev_tab[ktab[i]]:=i;
rev_tab[ktab[255]]:=223;
xchr[223]:=chr(223); xchr[228]:=chr(228); xchr[246]:=chr(246);
xchr[252]:=chr(252); xchr[255]:=chr(223); xchr[ 25]:=chr(223);
{:end of testing}
Die Prozedur ``hashload'' darf im Initialisierungslauf von TeX auf
keinen Fall aufgerufen werden. Deshalb wird zun\"achst festgestellt, ob
dieser Programmlauf ein Initialisierungslauf ist oder nicht. Abh\"angig
davon wird ``hashload'' nur aufgerufen, wenn dies ein ``Arbeits''-lauf
(production-version) von \TeX ist.
=============================================================================}
is_iniTeX:=false;
init is_iniTeX:=true; tini
if not is_iniTeX then begin
hashload(status);
if status <> 0 then begin
si_put_cr;
si_put_ln
(' * * * * * * * * * *!');
si_put_ln('**** ERROR: Can not read hash-file!');
si_put_cr;
jump_out;
end;
end; {|not iniTeX|}
end; {|inittrennen|}
begin
b:=0;
dbg_put_cr;
{ with i do begin }
if i.ausnahme then
begin
dbg_put(' ### PCKINF 1A utr:');
if i.untrennbar then dbg_put('Y') else dbg_put('N');
dbg_put_ln(',',ord(i.untrennbar):3,' 1.',i.erste:3,' 2.',i.zweite:3);
b:=1+2*ord(i.untrennbar)+4*i.erste+32*i.zweite
; dbg_put('B');
end
else
begin
dbg_put(' ### PCKINF 2A');
b:= 2*ord(i.endung)+4*ord(i.vorsilbe)+8*ord(i.stamm)+16*ord(i.ehervor)
+32*ord(i.stammallein);
dbg_put('B');
end;
{ end; with }
dbg_put_ln('-3');
end; {|packinfo|}
procedure packentry( e:tableentry; var h:hashelement );
{ /infobyte+frei,wiederfrei/verdschhigh/verdschmiddle/verdschllow/ }
begin
{ with e do begin }
dbg_put_cr;
dbg_put(' +++ PCKNTY A');
h[0]:= e.tabv mod 256; { 87654321abcdefghxxxxxxxx}
dbg_put('B');
e.tabv:= e.tabv div 256; { 0000000087654321abcdefgh}
dbg_put('C');
h[1]:= e.tabv mod 256; { 0000000087654321xxxxxxxx}
dbg_put('D');
h[2]:= e.tabv div 256; { 000000000000000087654321}
dbg_put('E');
if odd(e.tabb) then h[3]:= e.tabb { Ausnahme !! }
else
begin
dbg_put('F1');
h[3]:=e.tabb+64*ord(e.frei)+128*ord(e.wiederfrei); { wfxxxxxx }
dbg_put('-2');
end;
dbg_put_ln('G');
{ end; |with|}
end; {|packentry|}
procedure hashetr ( var w:wort; laenge:integer; i:infotype;
var g:boolean );
begin
dbg_put_ln(' ** HTR 1');
if anzahl >= tabsize then begin
dbg_put_ln(' ** HTR 2a');
g := false;
end
else begin
dbg_put_ln(' ** HTR 2b');
if (anzahl < 10) or (anzahl mod 100 = 0) then begin
si_put_cr; si_put(' -Anzahl:',anzahl,'. ');
end;
dbg_put_ln(' ** HTR 3');
hashfunk(w,1,laenge,v1,ind1,k1);
dbg_put_ln(' ** HTR 4');
v := v1[laenge];
ind := ind1[laenge];
k := k1[laenge];
gef := false; ngef := false;
repeat
dbg_put_ln(' ** HTR 5 A');
unpackentry(htab[ind],entry.tabv,entry.tabb,entry.frei,entry.wiederfrei);
dbg_put_ln(' ** HTR B');
if entry.frei or entry.wiederfrei then begin
anzahl := anzahl + 1;
ngef := true;
entry.tabv := v;
dbg_put_ln(' ** HTR C1');
packinfo (i,entry.tabb);
dbg_put_ln(' ** HTR D1');
entry.frei := false;
entry.wiederfrei := false;
dbg_put_ln(' ** HTR E1');
packentry(entry,helem);
dbg_put_ln(' ** HTR F1');
htab[ind] := helem; { -PRAK!- in C memcpy() }
end else begin
dbg_put_ln(' ** HTR C2');
gef := v = entry.tabv;
if not gef then ind := (ind+k) mod (tabsize + 1);
dbg_put_ln(' ** HTR D2');
end;
until gef or ngef;
dbg_put_ln(' ** HTR 6');
g := ngef;
end
end; {|hashetr|}
procedure hashempty;
var i:integer;
elem : tableentry;
he : hashelement;
begin
{ with elem do begin }
elem.tabv:=0;
elem.tabb := 0; { !!!!!! Irgend ein Affe hat diese Initialisierung vergessen.
( Und dies ist eine Beleidigung fuer jeden Affen ) HM }
elem.frei := true;
elem.wiederfrei := false;
{ end; }
packentry (elem, he);
for i := 0 to tabsize do htab[i] := he; { -PRAK!- in C memcpy() }
anzahl := 0;
end; {|hashempty|}
procedure hashsave (var status:integer);
{ abspeichern der Hash-Tabelle auf file }
var i: integer;
begin
for i:=1 to file_name_size do name_of_file[i]:=' ';
name_of_file[1]:='h';
name_of_file[2]:='f';
name_of_file[3]:='3';
name_length:=3;
if a_open_out(hashfile) then begin
for i := 0 to tabsize do begin
write (hashfile, htab[i][0]:4);
write (hashfile, htab[i][1]:4);
write (hashfile, htab[i][2]:4);
write (hashfile, htab[i][3]:4);
write_ln (hashfile);
end;
status:=0;
a_close(hashfile);
end else status:=1;
end; {|hashsave|}
function hashsize :integer;
begin
hashsize:=anzahl;
end;
procedure zahl(line:string80;var i:integer; ende:integer; var j:integer);
var aus: boolean;
begin
j:=0;
repeat
if i<=ende then
if ktab[line[i]]="." then i:=i+1
else aus:=true
else aus:=true
until aus;
repeat
if i<=ende then begin
{ if line[i] in ["0".."9"] then begin }
case line[i] of
"0","1","2","3","4","5","6","7","8","9":
begin j:=10*j+line[i]-"0";i:=i+1
end;
{ else aus:=true geht nicht !!! }
end;
if (line[i] <> "0") and
(line[i] <> "1") and
(line[i] <> "2") and
(line[i] <> "3") and
(line[i] <> "4") and
(line[i] <> "5") and
(line[i] <> "6") and
(line[i] <> "7") and
(line[i] <> "8") and
(line[i] <> "9") then aus :=true;
end else aus:=true
until aus;
end; {|zahl|}
procedure infobau(line: string80; anfang,ende: integer;
{ var inform: infotype;}
var informausnahme :boolean;
var informuntrennbar :boolean;
var informerste :integer;
var informzweite :integer;
var informendung :boolean;
var informvorsilbe :boolean;
var informstamm :boolean;
var informehervor :boolean;
var informstammallein:boolean;
var g: boolean);
var i,j:integer;ok,fehler:boolean;
b:buchstabe;
begin
g:=false;
{ with inform do begin } { initialisierung }
informausnahme:=false;
informendung:=false;
informvorsilbe:=false;
informehervor:=false;
informstamm:=false;
informstammallein:=false;
i:=anfang; ok:=false; fehler:=false;
repeat
if i<=ende then
if line[i]="." then i:=i+1 else ok:=true
else fehler:=true
until ok or fehler;
if not fehler then begin
b:=ktab[line[i]];
if b = bst_a then begin
{ausnahme}
informausnahme:=true;
i:=i+1;
if ktab[line[i]] = bst_u then begin
i := i + 1; informuntrennbar := true;
end else informuntrennbar := false;
zahl(line,i,ende,j);
if j=0 then begin
informerste:=7; informzweite:=7;
{ Record-komponente ``zweite'' wird }
{ initialisiert, damit beim packen keine }
{ undefinierten Ergebnisse entstehen (die u.U. zu }
{ RUN-TIME-ERRORS f\"uhren k\"onnen!) }
end else begin
j:=j-2;
if (j>=0) and (j<=6) then informerste:=j
else fehler:=true;
zahl(line,i,ende,j);
if j=0 then informzweite:=7
else begin
j:=j-informerste-4;
if (j>=0) and (j<=6) then informzweite:=j
else fehler:=true;
end;
end;
if not fehler then g:=true;
end {|if b = bst_a |}
else begin
ok:=false;
repeat
case b of
{bst_v} 22: informvorsilbe:=true;
{bst_e} 5: informendung:=true;
{bst_s} 19: informstamm:=true;
{bst_b} 2: informehervor:=true;
{bst_t} 20: informstammallein:=true;
{trennzeichen} 31:;
othercases fehler:=true
endcases;
if i=ende then ok:=true
else begin
i:=i+1;
b:=ktab[line[i]]
end
until ok or fehler;
if not fehler then
g := (informvorsilbe or informendung or informstamm)
and (not informehervor or informvorsilbe and informendung)
and (not informstammallein or informstamm);
end; {|if b = bst_a then-else|}
end {|if not fehler|}
{ end |with inform|}
end; {|infobau|}
procedure eintragen (line: string80; l: integer );
var i, laenge: integer;
inform: infotype;
w: wort;
g: boolean;
begin
laenge := 1;
while line [laenge] <> "." do laenge := laenge + 1;
laenge := laenge -1;
if laenge >= 1 then begin
if laenge <= maxwl then begin
infobau(line,laenge+1,l,
inform.ausnahme,inform.untrennbar,
inform.erste,inform.zweite,inform.endung,
inform.vorsilbe,inform.stamm,inform.ehervor,
inform.stammallein,g);
if not g then begin
si_put_cr; si_put(' Info falsch: ')
end else begin
for i:= 1 to laenge do w[i]:=ktab[line[i]];
hashetr(w,laenge,inform,g);
if not g then begin
si_put_cr; si_put(' Tabellenfehler: ');
end;
end;
if not g then begin
for i:=1 to l do si_put( xchr[line[i]] );
si_put_cr;
end
end {|if laenge <= maxwl|}
else begin
si_put_cr; si_put_ln(' zu langes Wort: ');
for i:=1 to l do si_put( xchr[line[i]] );
si_put_cr;
end
end {|if laenge >= 1|}
else begin
si_put_cr; si_put_ln(' Falsche Zeile: ');
for i:=1 to l do si_put( xchr[line[i]] );
si_put_cr;
end
end; {|eintragen|}
procedure naechsterbst(var i:integer;var tr:trennvektor;
var dudstop:sperrvektor;var b:buchstabe;
var w:wort);
begin
if (i>0) and (tr[i]=keine) and not dudstop[i]
then begin b:=w[i];i:=i-1 end
else b:=trennzeichen
end;
{---------------------------------------------------------------------------}
{ duden(tr) -> duden(tr,w,tr1,dudtr,dud,dudstop,laenge,ok,zerlegungen) }
procedure duden (tr:trennvektor;
var w:wort;
var tr1 :trennvektor;
var dudtr:trennvektor;
dud :dudt; { array [0..maxwl] of boolean; }
var dudstop:sperrvektor;
laenge : integer;
var ok :boolean;
var zerlegungen:integer );
{ traegt in dudtr die Trennstellen nach Dudenregeln ein }
{ frueher mtype laut = (.....) }
var i:integer;
j:integer;
zust:integer;
aus:boolean;
tre:integer;
letzte:integer;
l,lalt:integer;{ vormals laut }
b:buchstabe;
begin { duden }
i:=laenge;
for j := 0 to laenge do dudtr[j] := tr[j];
while (i>0) and not dud [i] do i:=i-1;
while i>0 do begin
zust:=1; aus:=false; letzte := 0;
l:=kons;
b:=w[i];
i:=i-1;
repeat
{naechsten Laut lesen}
{ procedure nlaut; }
begin { nlaut }
tre:=i+1;
lalt:=l;
case b of
{bst_a } 1:begin l:=vok_a; naechsterbst(i,tr,dudstop,b,w) end;
{bst_o } 15:begin l:=vok_o; naechsterbst(i,tr,dudstop,b,w) end;
{bst_ue} 29:begin l:=vok_ue; naechsterbst(i,tr,dudstop,b,w) end;
{bst_oe} 28:begin l:=vok_oe; naechsterbst(i,tr,dudstop,b,w) end;
{bst_ae} 27:begin l:=vok_ae; naechsterbst(i,tr,dudstop,b,w) end;
{bst_e } 5:begin
l:=vok_e; naechsterbst(i,tr,dudstop,b,w);
if b = bst_i then begin
l:=vok_ie;
naechsterbst(i,tr,dudstop,b,w);
if b = bst_e then begin
{ungetbst = } i := i + 1;
l:=vok_e;
b := bst_i;
end;
end
end;
{bst_i} 9:begin
l:=vok_i; naechsterbst(i,tr,dudstop,b,w);
if b = bst_e then begin l:=vok_ei; naechsterbst(i,tr,dudstop,b,w) end
else if b = bst_a then begin l:=vok_ai; naechsterbst(i,tr,dudstop,b,w) end
end;
{bst_u} 21:begin
l:=vok_u; naechsterbst(i,tr,dudstop,b,w);
if b = bst_e then begin l:=vok_eu; naechsterbst(i,tr,dudstop,b,w) end
else if b = bst_a then begin l:=vok_au; naechsterbst(i,tr,dudstop,b,w) end
else if b = bst_ae then begin l:=vok_aeu; naechsterbst(i,tr,dudstop,b,w) end
else if b = bst_q then begin l:=kons; naechsterbst(i,tr,dudstop,b,w) end
end;
{bst_t} 20:begin
l:=kons; naechsterbst(i,tr,dudstop,b,w);
if b = bst_s then naechsterbst(i,tr,dudstop,b,w)
end;
{bst_h} 8:begin
l:=kons; naechsterbst(i,tr,dudstop,b,w);
if b = bst_c then begin
naechsterbst(i,tr,dudstop,b,w);
if b = bst_s then naechsterbst(i,tr,dudstop,b,w)
end
else if b = bst_p then naechsterbst(i,tr,dudstop,b,w)
else if b = bst_t then naechsterbst(i,tr,dudstop,b,w)
end;
{bst_b, bst_c, bst_d, bst_f, bst_g, bst_j, bst_k, bst_l, bst_m,
bst_n, bst_p,
bst_q, bst_r, bst_s, bst_v, bst_w, bst_x, bst_z, bst_sz }
if not aus then
{Naechster Zustand}
case zust of
1:if l=kons then zust:=2
else zust:=4;
2:if l<>kons then zust:=3;
3:if l=kons then zust:=6
else if l<>lalt then begin
zust:=4;
letzte:=tre;
dudtr[letzte]:=neben
end;
4:if l=kons then zust:=6
else zust:=3;
6:if l=kons then begin
zust:=2;
letzte:=tre;
if (w[tre]=bst_c) and (w[tre+1] = bst_k) then
dudtr[letzte]:=nebenck
else dudtr[letzte]:=neben
end
else begin
zust:=4;
letzte:=tre;
dudtr[letzte]:=neben
end
end
until aus;
if zust=2 then dudtr[letzte]:=keine
else if zust=4 then
{ if not (lalt in [vok_ai,vok_ei,vok_au,vok_aeu,vok_eu,vok_ie]) then }
if (lalt < vok_ie) or (lalt > vok_eu) then
dudtr[letzte]:=keine;
while (i>0) and not dud[i] do i:=i-1
end;
end { duden };
{---------------------------------------------------------------------------}
{ procedure trennen (zustand : integer; anfang: integer; spv:sperrvektor;
var ok1 : boolean);
}
procedure trennen (
{ trennen }
zustand : integer;
anfang : integer;
spv : sperrvektor;
var ok1 : boolean;
{ abteilen }
var tr :trennvektor;
var dudtr:trennvektor;
var zerlegungen:integer ;
var dud:dudt; { array [0..maxwl] of boolean; }
var v:varr;
var ind,k:intarr;
var dudstop:sperrvektor;
{ abteilen aufruf }
var w:wort;
var tr1:trennvektor;
laenge : integer );
{ Zustand : 1 ... Es kann nur Vorsilbe oder Stamm kommen,
rechts muss noch ein Stamm kommen.
2 ... Es kann Endung, Vorsilbe oder Stamm kommen.
3 ... Es kann nur Vorsilbe oder Stamm kommen, im linken
Teil muessen noch Trennstellen eingefuegt werden,
rechts muss noch ein Stamm kommen.
4 ... Es kann nur Vorsilbe oder Stamm kommen, im linken
Teil muessen noch Trennstellen eingefuegt werden.
5 ... Trennstelle mit entfallenem Konsonanten }
var
gef, nichtok:boolean;
schnitt:integer;
inform:infotype;
i:integer;
ok_stamm, ok_vor, ok_end, ok_help:boolean;
tri:integer;
{ p_stamm }
i1,tre:integer;
ok : boolean;
stop_ptr :integer;
spvtmp :sperrvektor; { lokale Kopie des Sperrvektors,
fr die bersetzung nach C notwendig.
-PRAK!- }
{ procedure p_endung(var endok:boolean); }
{ procedure p_vorsilbe(var vorok:boolean); }
{ procedure p_stamm(var stok:boolean); }
{ procedure p_stammallein (var staok:boolean); }
begin { trennen }
spvtmp := spv;
ok1:=false;
if anfang = laenge + 1
then { Ende des Wortes erreicht}
if (zustand = 2) or (zustand=4) then {erfolgreich zerschnitten }
begin
dud[anfang-1]:=true;
for i := laenge downto 1 do
begin
if dudtr[i-1] = haupt3 then
begin
dudtr[i-1] := keine;
tri := haupt3;
end
else
tri := dudtr[i];
if zerlegungen = 0 then
tr1[i] := tri
else
tr1[i] := kombtab[tr1[i],tri];
end;
zerlegungen:=zerlegungen+1;
ok1:= true
end
else { kein Stamm im Wortrest ok1 = false }
else { restlichen Teil zerschneiden }
begin
hashfunk (w,anfang,laenge,v,ind,k);
schnitt := laenge;
nichtok := false;
repeat
{ links abschneiden }
repeat
if spv[schnitt] then { Schnittstelle gesperrt }
gef:=false
else
begin
hashsuch (v[schnitt],ind[schnitt],k[schnitt],
inform.ausnahme,inform.untrennbar,
inform.erste,inform.zweite,inform.endung,
inform.vorsilbe,inform.stamm,inform.ehervor,
inform.stammallein,gef);
end;
if gef then
spv[schnitt]:=true
else
schnitt := schnitt - 1
until gef or (schnitt = anfang - 1);
if gef then { teilwortanfang ist bekanntes wort }
begin
if not inform.ausnahme then
begin
if inform.endung and (zustand=2) then
begin
{ --- p_endung(ok_end); --- }
ok_end:=false;
if (w[anfang]=bst_c) or (w[anfang]=bst_h) then
begin
dud[anfang-1]:=true;
tr[anfang-1]:=neben
end;
{ Fugen 's' bevorzugen }
if schnitt < laenge then
if (w[schnitt+1] = bst_s) and not spv[schnitt+1]
then begin
spv[schnitt+1]:=true;
trennen (2, schnitt+2, spv, ok_help,
tr,dudtr,zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_end := ok_help
end;
trennen (2, schnitt+1, spv, ok_help,
tr,dudtr,zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_end := ok_end or ok_help;
{eingetragene Trennstellen loeschen}
tr[anfang-1]:=keine;
dud[anfang-1]:=false;
{ --- p_endung(ok_end) - ende --- }
ok1 := ok1 or ok_end
end
else
ok_end:=false;
if inform.vorsilbe
then begin
{ --- p_vorsilbe(ok_vor); --- }
{ Trennstelle(n) vor der Vorsilbe eintragen }
if zustand = 1 then
tr[anfang-1]:=neben
else begin
dud[anfang-1]:=true;
if zustand = 5 then
tr[anfang-1] := haupt3
else
tr[anfang-1]:=haupt;
end;
{ Trennstellen innerhalb der Vorsilbe eintragen }
dud[schnitt]:=true;
{ Rest des Wortes zerschneiden }
trennen(1, schnitt+1, spv, ok_vor,
tr,dudtr,zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
{eingetragene Trennstellen loeschen}
tr[anfang-1]:=keine;
dud[anfang-1]:=false;
dud[schnitt]:=false;
{ --- p_vorsilbe(ok_vor) - ende --- }
ok1 := ok1 or ok_vor
end
else
ok_vor := false;
if inform.stamm and not ok_vor
then begin
if not inform.stammallein
then begin
{ --- p_stamm(ok_stamm); --- }
ok_stamm := false;
stop_ptr := 0;
if zustand = 1 then
tr[anfang-1]:=neben
else begin
dud[anfang-1]:=true;
if zustand = 5 then
tr[anfang-1] := haupt3
else
tr[anfang-1]:=haupt
end;
{ev. Ausnahmetrennstellen eintragen}
if inform.ausnahme then begin
if inform.erste<>7 then begin
tre:=anfang+inform.erste+1;
tr[tre]:=neben;
if inform.zweite<>7 then begin
tre:=tre+inform.zweite+2;
if inform.untrennbar then begin
stop_ptr := tre;
dudstop[stop_ptr] := true;
end
else
tr[tre]:=neben
end
else begin
if inform.untrennbar then begin
tr[tre] := keine;
stop_ptr := tre;
dudstop[tre]:=true;
end
end
end
else begin {inform.erste = 7}
stop_ptr := schnitt-1;
dudstop[stop_ptr] := true
end
end;
{ Fugen 's' bevorzugen }
if schnitt < laenge then
if (w[schnitt+1] = bst_s) and not spv[schnitt+1]
then begin
spv [schnitt+1] := true;
trennen (2, schnitt+2, spv, ok_help,tr,dudtr,
zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_stamm := ok_help
end;
{ Sonderbehandlung eines in der Wortfuge entfallenen mitlauts }
if (schnitt >=2) and (schnitt < laenge) then
if konsonant[w[schnitt]] and not konsonant[w[schnitt+1]] and
(w[schnitt] = w[schnitt-1]) then begin
trennen(5, schnitt, spv, ok_help,tr,dudtr,
zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_stamm := ok_stamm or ok_help;
end;
{ restlichen Teil zerschneiden }
trennen (2, schnitt+1, spv, ok_help,tr,dudtr,
zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_stamm := ok_stamm or ok_help;
{ eingetragene Trennstellen entfernen }
if inform.ausnahme then
for i1:=anfang-1 to schnitt do
begin
tr[i1]:=keine;
dud[i1]:=false;
dudstop[stop_ptr] := false;
end
else begin
tr[anfang-1]:=keine;
dud[anfang-1]:=false;
end;
{ --- p_stamm(ok_stamm) - ende --- }
ok1 := ok1 or ok_stamm;
end
else
if (anfang=1) and (schnitt=laenge)
then begin
if zustand = 1 then
tr[anfang-1]:=neben
else begin
dud[anfang-1]:=true;
if zustand = 5 then
tr[anfang-1] := haupt3
else
tr[anfang-1]:=haupt
end;
{ev. Ausnahmetrennstellen eintragen}
if inform.ausnahme then begin
if inform.erste<>7 then begin
tre:=anfang+inform.erste+1;
tr[tre]:=neben;
if inform.zweite<>7 then begin
tre:=tre+inform.zweite+2;
if inform.untrennbar then begin
stop_ptr := tre;
dudstop[stop_ptr] := true;
end
else
tr[tre]:=neben
end
else begin
if inform.untrennbar then begin
tr[tre] := keine;
stop_ptr := tre;
dudstop[tre]:=true;
end
end
end
else begin {inform.erste = 7}
stop_ptr := schnitt-1;
dudstop[stop_ptr] := true;
end;
end;
{ Fugen 's' bevorzugen }
if schnitt < laenge then
if (w[schnitt+1] = bst_s) and not spv[schnitt+1]
then begin
spv [schnitt+1] := true;
trennen (2, schnitt+2, spv, ok_help,tr,dudtr,
zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_stamm := ok_help
end;
{ Sonderbehandlung eines in der Wortfuge entfallenen mitlauts }
if (schnitt >=2) and (schnitt < laenge) then
if konsonant[w[schnitt]] and not konsonant[w[schnitt+1]] and
(w[schnitt] = w[schnitt-1]) then begin
trennen(5, schnitt, spv, ok_help,tr,dudtr,
zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_stamm := ok_stamm or ok_help;
end;
{ restlichen Teil zerschneiden }
trennen (2, schnitt+1, spv, ok_help,tr,dudtr,
zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
ok_stamm := ok_stamm or ok_help;
{ eingetragene Trennstellen entfernen }
if inform.ausnahme then
for i1:=anfang-1 to schnitt do
begin
tr[i1]:=keine;
dud[i1]:=false;
dudstop[stop_ptr] := false;
end
else begin
tr[anfang-1]:=keine;
dud[anfang-1]:=false;
end;
{ --- p_stamm(ok_stamm) - ende --- }
ok1 := ok1 or ok_stamm;
end;
{naechsten Schleifendurchlauf vorbereiten}
schnitt := schnitt - 1;
nichtok := schnitt = anfang - 1
end
else { kein bekanntes Wort am Beginn des Teilwortes }
nichtok := true;
until nichtok
end;
spv := spvtmp; { spv wiederherstellen, fr die C Version. -PRAK!- }
end { trennen };
procedure abteilen ( var w:wort; var tr1:trennvektor;
laenge : integer; var ok:boolean;
var zerlegungen:integer );
label 99;
{ kommt in den outer Block }
{ mtype sperrvektor=array [0..maxwl] of boolean; }
var
i:integer;
dud:dudt; { array [0..maxwl] of boolean; }
v:varr;
ind,k:intarr;
spv,dudstop:sperrvektor;
tr, dudtr:trennvektor;
begin { abteilen }
tr1[0] := keine;
for i:=0 to laenge do begin
tr[i]:=keine; spv[i]:=false; dudstop[i]:=false; dud[i]:=false;
if i > 0 then
if (w[i] = trennzeichen) or (w[i] = sonderzeichen)
then begin
zerlegungen := 0;
ok := false;
goto 99;
end;
end;
zerlegungen:=0;
trennen(3, 1, spv, ok,
tr,dudtr,zerlegungen,dud,v,ind,k,
dudstop,w,tr1,laenge);
for i:=0 to laenge do
if tr1[i]=nebenck then begin
tr1[i]:=keine; { c-k wird zu -ck da sp\"ater dement- }
tr1[i-1]:=nebenck; {sprechend getrennt wird }
end;
99:
end;{ abteilen }
{----------------------------------------------------------------------------}
{identification number for dirty umlauts; see description in
change file}
function is_dirty_umlaut(p:pointer):boolean;
var res:boolean;
q:pointer;
begin
res:=false;
if p<>null then
if type(p)=disc_node then
if pre_break(p)<>null then
if link(pre_break(p))<>null then
if link(link(pre_break(p)))<>null then begin
q:=pre_break(p);
if type(q)=penalty_node then q:=link(q);
if (type(q)=kern_node)and(width(q)=spec_letter_ID)and
is_char_node(link(q))and(type(link(link(q)))=kern_node) then res:=true;
end;
is_dirty_umlaut:=res;
end; {|is_dirty_umlaut|}
function get_char_of(s:pointer):integer;
var q:pointer;
begin
q:=pre_break(s);
if type(q)=penalty_node then q:=link(q);
get_char_of:=width(link(link(q)));
end; {|get_char_of|}
function get_font_of(s:pointer):integer;
var q:pointer;
begin
q:=pre_break(s);
if type(q)=penalty_node then q:=link(q);
get_font_of:=font(link(q));
end; {|get_font_of|}
@x
@ Now let's go back to the easier problem, of building the linked
trie. When \.{INITEX} has scanned the `\.{\\patterns}' control
sequence, it calls on |new_patterns| to do the right thing.
@<Declare procedures for preprocessing hyph...@>=
procedure new_patterns; {initializes the hyphenation pattern data}
label done, done1;
var k,@!l:small_number; {indices into |hc| and |hyf|}
@!digit_sensed:boolean; {should the next digit be treated as a letter?}
@!v:quarterword; {trie op code}
@!p,@!q:trie_pointer; {nodes of trie traversed during insertion}
@!first_child:boolean; {is |p=trie_l[q]|?}
@!c:ASCII_code; {character being inserted}
begin if trie_not_ready then
begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}}
@<Enter all of the patterns into a linked trie, until coming to a right
brace@>;
end
else begin print_err("Too late for "); print_esc("patterns");
help1("All patterns must be given before typesetting begins.");
error; link(garbage):=scan_toks(false,false); flush_list(def_ref);
end;
end;
@y
@ We do not build a trie (wether a linked nor a packed). Instead, we
create a hash table for the \SISISI-hyphenation algorithm. But we
still use the data provided by the `\.{\\patterns}' primitive.
@<Declare procedures for preprocessing hyph...@>=
procedure new_patterns; {initializes the hyphenation pattern data}
label done;
var k: small_number; {index into |zeile|}
@!zeile:string80; { Silbe <space> Type <,> als Eintrag der Hashtabelle }
@!status:integer;
@!c:ASCII_code; {character being inserted}
begin
if trie_not_ready then begin
set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}}
hashempty; { create the empty hash table }
@<Enter all of the patterns into a hash table, until coming to a right
brace; then skip an optional space@>;
hashsave(status);
if status<>0 then begin
wterm_ln('Fehler bei hash save !!!!!'{?});
wlog_ln( 'Fehler bei hash save !!!!!'{?});
end;
end else begin
print_err("Too late for "); print_esc("patterns");
help1("All patterns must be given before typesetting begins.");
error; link(garbage):=scan_toks(false,false); flush_list(def_ref);
end;
end;
@z
@x
@ Novices are not supposed to be using \.{\\patterns}, so the error
messages are terse. (Note that all error messages appear in \TeX's string
pool, even if they are used only by \.{INITEX}.)
@<Enter all of the patterns into a linked trie...@>=
k:=0; hyf[0]:=0; digit_sensed:=false;
loop@+ begin get_x_token;
case cur_cmd of
letter,other_char:@<Append a new letter or a hyphen level@>;
spacer,right_brace: begin if k>0 then
@<Insert a new pattern into the linked trie@>;
if cur_cmd=right_brace then goto done;
k:=0; hyf[0]:=0; digit_sensed:=false;
end;
othercases begin print_err("Bad "); print_esc("patterns");
@.Bad \\patterns@>
help1("(See Appendix H.)"); error;
end
endcases;
end;
done:
@y
@ Here we build the hash table for the \SISISI-hyphenation.
@<Enter all of the patterns into a hash table, until coming to a right...@>=
for k:=0 to maxwl do zeile[k]:=" ";
k:=0;
loop@+ begin
get_x_token; { \SISISI Worttabelle hat das Format 'Text.Type,'.
Der Beistrich wird eigens abgefragt, da er in TeX
umdefiniert werden kann. }
if (cur_cmd = right_brace)or(cur_chr = ",") then
begin
{ Ein Eintrag der Worttabelle - letzter
(|cur_cmd|=|right_brace|) - gelesen }
if k>0 then eintragen(zeile,k); { ein Eintrag vorhanden }
if cur_cmd = right_brace then goto done;
k := 0; { hyf[] ist \"uberfl\"ussig, da die
Trenninformation in zeile mitverpackt ist. }
end
else { Das Wort ist noch nicht fertig gelesen }
if (cur_cmd = letter) or (cur_cmd = other_char)
or (cur_cmd = spacer) or (cur_chr = ".")
then begin
@<Append a new letter or a hyphen level@>;
end else begin
print_err("Bad "); print_esc("patterns");
@.Bad \\patterns@>
help1("(See Appendix H.)"); error;
end;
end; {|loop|}
done:
@z
@x
@ @<Append a new letter or a hyphen level@>=
if digit_sensed or(cur_chr<"0")or(cur_chr>"9") then
begin if cur_chr="." then cur_chr:=0 {edge-of-word delimiter}
else begin cur_chr:=lc_code(cur_chr);
if cur_chr=0 then
begin print_err("Nonletter");
@.Nonletter@>
help1("(See Appendix H.)"); error;
end;
end;
if k<63 then
begin incr(k); hc[k]:=cur_chr; hyf[k]:=0; digit_sensed:=false;
end;
end
else if k<63 then
begin hyf[k]:=cur_chr-"0"; digit_sensed:=true;
end
@y
@ @<Append a new letter or a hyphen level@>=
if (cur_chr<>" ")then begin
{ Das Leerzeichen wird \"uberlesen! }
if (cur_chr <> ".") then begin
{ der Punkt trennt die Silbe von der Silbenart und bleibt erhalten }
if (cur_chr<"0")or(cur_chr>"9") then begin
cur_chr:=lc_code(cur_chr);
if cur_chr=0 then begin
print_err("Nonletter");
@.Nonletter@>
help1("(See Appendix H.)"); error; cur_chr:=128;
end;
end;
end;
if k<63 then begin
incr(k); zeile[k]:=cur_chr;
end;
end;
@z
@x
@ When the following code comes into play, the pattern $p_1\ldots p_k$
appears in |hc[1..k]|, and the corresponding sequence of numbers $n_0\ldots
n_k$ appears in |hyf[0..k]|.
@<Insert a new pattern into the linked trie@>=
begin @<Compute the trie op code, |v|, and set |l:=0|@>;
q:=0; hc[0]:=cur_lang;
while l<=k do
begin c:=hc[l]; incr(l); p:=trie_l[q]; first_child:=true;
while (p>0)and(c>so(trie_c[p])) do
begin q:=p; p:=trie_r[q]; first_child:=false;
end;
if (p=0)or(c<so(trie_c[p])) then
@<Insert a new trie node between |q| and |p|, and
make |p| point to it@>;
q:=p; {now node |q| represents $p_1\ldots p_l$}
end;
if trie_o[q]<>min_quarterword then
begin print_err("Duplicate pattern");
@.Duplicate pattern@>
help1("(See Appendix H.)"); error;
end;
trie_o[q]:=v;
end
@y
@ Well, that was everything about the initialization of the \SISISI hash
table (at this place).
All following sections, describing the initialization of the trie are
replaced by ``empty sections'' in order to have same section numbering
in SI\TeX as in \TeX.
@z
@x
@ @<Insert a new trie node between |q| and |p|...@>=
begin if trie_ptr=trie_size then overflow("pattern memory",trie_size);
@:TeX capacity exceeded pattern memory}{\quad pattern memory@>
incr(trie_ptr); trie_r[trie_ptr]:=p; p:=trie_ptr; trie_l[p]:=0;
if first_child then trie_l[q]:=p@+else trie_r[q]:=p;
trie_c[p]:=si(c); trie_o[p]:=min_quarterword;
end
@y
@ ``Empty section''.
@z
------------------------------------------------------------------------------
@x
@ @<Compute the trie op code, |v|...@>=
if hc[1]=0 then hyf[0]:=0;
if hc[k]=0 then hyf[k]:=0;
l:=k; v:=min_quarterword;
loop@+ begin if hyf[l]<>0 then v:=new_trie_op(k-l,hyf[l],v);
if l>0 then decr(l)@+else goto done1;
end;
done1:
@y
@ Further ``empty section''.
@z
@x
@ Finally we put everything together: Here is how the trie gets to its
final, efficient form.
The following packing routine is rigged so that the root of the linked
tree gets mapped into location 1 of |trie|, as required by the hyphenation
algorithm. This happens because the first call of |first_fit| will
``take'' location~1.
@<Declare procedures for preprocessing hyphenation patterns@>=
procedure init_trie;
var @!p:trie_pointer; {pointer for initialization}
@!j,@!k,@!t:integer; {all-purpose registers for initialization}
@!r,@!s:trie_pointer; {used to clean up the packed |trie|}
@!h:two_halves; {template used to zero out |trie|'s holes}
begin @<Get ready to compress the trie@>;
if trie_root<>0 then
begin first_fit(trie_root); trie_pack(trie_root);
end;
@y
@ Because we do not use the trie, we fill it with dummy data.
But the main reason, why we do not remove this procedure is, that
it is called and the used variables are referenced somewhere in \TeX.
@<Declare procedures for preprocessing hyphenation patterns@>=
procedure init_trie;
var @!p:trie_pointer; {pointer for initialization}
@!j,@!k,@!t:integer; {all-purpose registers for initialization}
@!r,@!s:trie_pointer; {used to clean up the packed |trie|}
@!h:two_halves; {template used to zero out |trie|'s holes}
begin
trie_root:=0;
@z