# koi8-plaintex.rus - KOI8 to PlainTeX conversion for translit
#
# version 1.0
#
# Created by Alexander L. Belikoff, 1997
# The TeX transliteration sequences follow the AMS Cyrillic convention for
# WNCYR fonts with the cyracc.def file.
# To be used with the translit.c program by Jan Labanowski. For the format of
# this file, consult the translit documentation.
# Process the KOI8-encoded file with translit. For example:
# translit -i myfile.ko8 -o myfile.tex -t koi8-plaintex.rus
# Then process it with Plain TeX by using:
# tex myfile.tex
# and then use your favorite dvi2something program. E.g., for PostScript use:
# dvips myfile.dvi
#
# characters that are not ASCII (plus DEL) and not KOI8 letters become a star
0 [\0x7F-\0xA2\0xA4-\0xB2\0xB4-\0xBF] 0 "$\star$"
# dehyphenate words, e.g. con- (NL)cert is changed to concert(NL)
# Below is a complicated (?) regular expression that joins a hyphenated
# word. It looks for one or more letters (saved as substring 1)
# followed by a hyphen (which may be followed by zero or more spaces
# or tabs). The hyphen must be followed by a NewLine (characters 0A-0D hex
# are various new line sequences), which is saved as substring 2. Then it looks
# for zero or more tabs and spaces (at the beginning of the line). Then it
# looks for the rest of the hyphenated word and saves it as substring 3.
# The word may have punctuation attached. Then it looks again for some spaces
# or tabs. The substitute string drops everything that was not within (),
# i.e., the hyphen and spaces/tabs, and inserts only the substrings, but in a
# different order: 1 (word beginning) is followed by 3 (word end) and then by
# the NewLine. The {\2\1\3} would be equally good. The string is then returned
# back for processing (output code is -1). Note that since the input regular
# expression is very long, I chopped it into several lines by using \NL.
# If \ is followed by white space, the \ and all white space that follows it
# are removed by the program. Be careful not to use "\white_space" in strings,
# lists or regular expressions. If you must, enter \ as a code (i.e., \0x5C).
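#
# As an illustration only (Python, not translit syntax), the same join can
# be sketched as below. The ASCII letter class and the punctuation set are
# assumptions; the actual translit rule may use different character lists.
#
#   import re
#
#   # 1 = word beginning, 2 = NewLine sequence, 3 = word end (+ punctuation)
#   DEHYPHEN = re.compile(
#       r"([A-Za-z]+)-[ \t]*([\x0a-\x0d]+)[ \t]*([A-Za-z.,;:!?]+)[ \t]*")
#
#   def dehyphenate(text):
#       # keep only the saved substrings, reordered as 1, 3, 2
#       return DEHYPHEN.sub(r"\1\3\2", text)
#
#   print(repr(dehyphenate("con-\ncert is")))   # 'concert\nis'
#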
# uncomment lines below if you want to dehyphenate
# these can be represented correctly only in Latin charset
0 "_" 1 "\_"
0 "&" 1 "\&"
0 "#" 1 "\#"
0 "@" 1 "@"
# Cyrillic letters
0 "\0xF4\0xFD" 2 "T{\cydot}Shch" # to prevent C
0 "\0xF4\0xDD" 2 "T{\cydot}shch" # to prevent C
0 "\0xD4\0xFD" 2 "t{\cydot}Shch" # to prevent c
0 "\0xD4\0xDD" 2 "t{\cydot}shch" # to prevent c
0 "\0xF4\0xFB" 2 "T{\cydot}Sh" # to prevent C
0 "\0xF4\0xDB" 2 "T{\cydot}sh" # to prevent C
0 "\0xD4\0xFB" 2 "t{\cydot}Sh" # to prevent c
0 "\0xD4\0xDB" 2 "t{\cydot}sh" # to prevent c
0 "\0xF4\0xF3" 2 "T{\cydot}S" # to prevent C
0 "\0xF4\0xD3" 2 "T{\cydot}s" # to prevent C
0 "\0xD4\0xF3" 2 "t{\cydot}S" # to prevent c
0 "\0xD4\0xD3" 2 "t{\cydot}s" # to prevent c