Subj : BBS Promotion
To   : Nicholas Boel
From : mark lewis
Date : Fri Feb 10 2017 08:35 pm


On 2017 Feb 10 07:32:52, you wrote to me:

NB>>> TimEd is probably trying to convert the UTF-8 Russian characters to
NB>>> IBMPC, which won't happen.

ml>> FWIW: there is no "conversion"... it is simply displaying the
ml>> glyphs represented by those raw bytes in their CP437 codepage
ml>> positions... CP437 and other old-school codepage characters are only
ml>> one byte wide... any "conversion" might come from translating
ml>> between single byte codepages where the character glyph is
ml>> transliterated from one position in the first codepage to another
ml>> position in the second codepage where its glyph is stored... in that
ml>> case, the raw byte changes because the position in the codepage
ml>> changed and the byte is the position...

NB> You say potato, etc..

yes and no... it is really easy to understand though...

NB> Fact of the matter is CP437/IBMPC will not display Russian characters
NB> properly,

of course not... their glyphs are different than latin glyphs... this is really
simple when looking at the old school way... there are numerous tables of 256
bytes... each byte represents one character, a glyph... some are actually
control characters (eg: CR, LF) and others are just language characters aka
glyphs... in one table, the space character is held in position 32decimal (aka
20hex)... another table also has the space in position 32decimal (aka 20hex)...
great! no "conversion" is needed for the space character... now, if the
capital letter 'A' is held in the first table at position 65decimal (aka 41hex)
and the capital letter 'A' is held in position 25 decimal (aka 19hex) in the
second table then some "conversion" is needed or you will see the wrong
character when using one of the two pages... one will be right and the other
just won't be... this is actually transliteration... there are mapping files
created to point to the proper position for the 'A' when using the second table
(aka codepage)... this is easily seen when overlaying CP855 on top of CP437...
most characters will align in the same cells of the table but some are
different... they are generally up in the higher-than-127 range where the line
drawing and box characters reside in CP437...
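
A minimal Python sketch of the table idea above, assuming Python's built-in
cp437, cp855, cp866 and koi8_r codecs are faithful copies of those codepages.
The helper names differing_cells and transliterate are made up for this
illustration. It overlays two tables to find the cells that differ, then moves
a glyph from its position in one table to its position in another:

```python
def differing_cells(cp_a, cp_b):
    """Return the byte positions whose glyph differs between two codepages."""
    cells = []
    for pos in range(256):
        raw = bytes([pos])
        # errors="replace" guards against any cell a codec leaves undefined
        if raw.decode(cp_a, errors="replace") != raw.decode(cp_b, errors="replace"):
            cells.append(pos)
    return cells


def transliterate(raw, src, dst):
    """Move each glyph from its position in src to its position in dst."""
    return raw.decode(src).encode(dst)


diff = differing_cells("cp437", "cp855")
print(f"{len(diff)} of 256 cells differ, lowest is {min(diff):#04x}")

# space sits at 0x20 in both tables, so its byte does not change
print(transliterate(b" ", "cp866", "koi8_r"))
# Cyrillic capital A sits at 0x80 in CP866 but at 0xE1 in KOI8-R
print(transliterate(b"\x80", "cp866", "koi8_r"))
```

The byte changes only when the glyph's cell number changes, which is the whole
of the "conversion" between single byte codepages.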

then someone came along and said "hey! we can do better" so Unicode and its
UTF-8, UTF-16 and UTF-32 encodings were born... UTF-8 works in 8bit units, is
lossless, and can reach all 1112064 valid positions in the Unicode table
instead of the original 256... converting from codepages to UTF-8 is easy
because every character exists in that huge table... going the other way is not
guaranteed because the glyphs just don't all map over... in some languages,
they have used "double characters" like "ae" to indicate the single ae
character which i don't know how to make on this OS... other languages may also
have an "ae" character but in them you cannot use "a" and "e" side by side to
indicate the single "ae" character... i don't know why, that's just the way it
is...
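
A short Python sketch of that one-way guarantee, again assuming the built-in
cp437 and cp866 codecs. The sample word is only an illustration. Every CP437
cell survives a trip to UTF-8 and back, but Cyrillic text has no cells to land
in when going from UTF-8 to CP437:

```python
every_cell = bytes(range(256))

# codepage -> UTF-8: every CP437 glyph has a place in the bigger table
as_utf8 = every_cell.decode("cp437").encode("utf-8")

# ...and coming back lands every glyph in its original cell
round_trip = as_utf8.decode("utf-8").encode("cp437")
print(round_trip == every_cell)

# UTF-8 -> codepage: Cyrillic has no cells in CP437, so this direction fails
russian = "Привет"
try:
    russian.encode("cp437")
except UnicodeEncodeError as err:
    print("no CP437 cell for:", err.object[err.start:err.end])

# the best a strict table lookup can do is substitute a placeholder
print(russian.encode("cp437", errors="replace"))
# the same text fits fine in a Cyrillic codepage
print(russian.encode("cp866"))
```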

anyway, i'm just trying to help you understand why there's no "conversion" as
such in the old school code pages... there is transliteration where one glyph
lives in one spot in this table and another spot in that table... UTF stuff
just greatly expands the size of the tables which means that the glyphs are now
represented by one or more bytes instead of the single byte that was the table
position number in the old school code pages...
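
To make the "no conversion, just raw bytes" point concrete, here is a small
Python sketch. It assumes a viewer that only knows CP437 and uses Python's
cp437 codec to stand in for it. One Cyrillic glyph is two bytes in UTF-8, and
the viewer simply shows whatever sits in those two cells:

```python
glyph = "Я"                      # Cyrillic capital YA, U+042F
raw = glyph.encode("utf-8")      # two bytes in UTF-8
print(raw.hex(" "))

# a CP437-only viewer does no conversion: each byte is looked up as a cell
seen_as_cp437 = raw.decode("cp437")
print(len(glyph), "glyph sent,", len(seen_as_cp437), "glyphs shown")
```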

NB> whether they're UTF-8 or not.

true...

NB> The only somewhat possible way for him to read it properly would be to
NB> change his default encoding to CP866 or KOI8-R,

exactly...

NB> and even then there is no guarantee that the translation from UTF-8
NB> will work as expected.

because it depends also on what his OS can display... what i mean by this is
that he has to be able to load the OS with the needed code page to view them
correctly but if he does that, he'll lose all the normal latin glyphs...
switching to UTF-8 on the OS will alleviate this but it requires that the
software is also able to transliterate the characters to their new positions in
the UTF-8 table so they can be rendered properly... we've seen this with the
box and line drawing characters... there's one or two BBS related packages out
there that do properly transliterate them to their new positions in the UTF-8
table... i don't recall who did them or what packages they are/were but they
are or have been participants in AGORANET and at least one of them was either a
BBS or a terminal program...
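
A rough Python sketch of what that kind of transliteration looks like for the
box and line drawing characters. This is not taken from any of those packages,
it only assumes Python's built-in cp437 codec. Three CP437 bytes for the top
edge of a box are mapped to their Unicode positions and written out as UTF-8:

```python
# top edge of a box as raw CP437 bytes: corner, line, corner
cp437_box = b"\xda\xc4\xbf"

# transliterate each glyph to its position in the Unicode table
box = cp437_box.decode("cp437")
utf8_box = box.encode("utf-8")

print([f"U+{ord(ch):04X}" for ch in box])
print(len(cp437_box), "bytes in CP437,", len(utf8_box), "bytes in UTF-8")
```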

so, ok... too long a day... only 20:30 here and i'm already going to call it a
night... on a friday damned night at that :(

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it
wrong...
... Well done! is better than well said!
---
* Origin:  (1:3634/12.73)