Subj : not all is lost but far too much for far too long
To : Rob Swindell
From : Maurice Kinal
Date : Wed Jul 03 2019 10:13 pm

Hello Rob!

RS> It's an idea. But that's not how *other* charsets/encodings work

Apart from both using 8-bit bytes, UTF-8 is totally different from standard
8-bit character sets. Scanning a legacy message for 8-bit characters won't help
decipher it unless one already knows which character set it is in, whereas with
UTF-8 that doesn't matter, because valid UTF-8 byte sequences identify
themselves. The "CHRS: UTF-8 4" kludge is totally useless, especially when it is
wrong, as in "CHRS: UTF-8 2", which still happens.
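A rough sketch of the point above, in Python (the post itself names no language): strict UTF-8 decoding tells a UTF-8 body apart from a legacy 8-bit one without any kludge, while a legacy body only reveals that it is "some 8-bit charset". The names `classify_body` and `chrs_kludge` are hypothetical, and the kludge parser assumes the `^ACHRS: <charset> <level>` form with CR-terminated lines. Valid UTF-8 is a strong signal rather than proof: a very short legacy text can occasionally decode as UTF-8 by accident.

```python
# Hypothetical helpers: classify a FidoNet message body from its bytes alone
# and compare that with what the CHRS kludge claims.

def classify_body(body: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'unknown 8-bit' for a raw message body."""
    if all(b < 0x80 for b in body):
        return "ascii"  # no 8-bit bytes, so no charset question arises
    try:
        body.decode("utf-8")  # strict mode rejects any malformed sequence
    except UnicodeDecodeError:
        # Some legacy 8-bit charset (CP866, CP437, Latin-1, ...);
        # the bytes alone do not say which one.
        return "unknown 8-bit"
    return "utf-8"


def chrs_kludge(message: bytes):
    """Return (charset, level) from the first ^ACHRS line, or None.

    Assumes the form b"\\x01CHRS: <charset> <level>" on a CR-terminated line.
    """
    prefix = b"\x01CHRS: "
    for line in message.split(b"\r"):  # FidoNet message lines end in CR
        if line.startswith(prefix):
            fields = line[len(prefix):].split()
            if len(fields) == 2 and fields[1].isdigit():
                return fields[0].decode("ascii", "replace"), int(fields[1])
            return None  # malformed kludge line
    return None


msg = b"\x01CHRS: UTF-8 2\rA M\xc3\xb8\xc3\xb8se once bit my sister...\r"
# Drop kludge lines (those starting with ^A) to get the visible body.
body = b"\r".join(l for l in msg.split(b"\r") if not l.startswith(b"\x01"))
print(chrs_kludge(msg))     # ('UTF-8', 2): the wrong level, should be 'UTF-8 4'
print(classify_body(body))  # utf-8, regardless of what the kludge says
```

The body check never consults the kludge, which is the point: a wrong level such as "UTF-8 2" changes nothing about what the bytes actually are.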

ON> So, if we wanted to help enforce at a reader (or even tosser
ON> level) how to handle, I would offer this up as a required BOM to
ON> the message body that is UTF8.

RS> And why is that better than a header field ("control paragraph"
RS> as defined in FTS-5003) which indicates UTF-8?

It isn't. Also, the 0x8D bug present in a lot of FidoNet software ensures that
the UTF-8 kludge will be wrong whenever that particular trailing byte occurs,
since the mangled text is no longer valid UTF-8. With true 8-bit character sets
such as CP866 the bug affects only one character in 128, whereas in UTF-8 it
rises to about three or four per language, and probably more for many of the
"24-bit" languages (those that need three-byte UTF-8 sequences).

-={ echo "A Møøse once bit my sister..." | file -b - }=-
UTF-8 Unicode text

Using UTF-8 exclusively is the way to go. It is rather obvious that it is
superior to all the 8-bit character sets combined ... which sounds like a
ridiculous statement, but it is true.

Life is good,
Maurice

... Don't cry for me, I have vi.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)