Subj : Re: not all is lost but far too much for far too long
To : Maurice Kinal
From : Ozz Nixon
Date : Fri Jun 28 2019 09:23 pm
On 2019-06-28 02:01:09 +0000, Maurice Kinal -> Torsten Bamberg said:
FTN Header versus actual message body conveying Unicode.
When I telnet to a SQL server that speaks only Unicode, the first thing
it returns is always these three bytes (Pascal notation): #239#187#191
When I telnet to a web server that serves Unicode, it too returns
#239#187#191, followed by the <!doctype html> etc.
So... would it not make sense for systems that post UTF-8 to put the
same introduction at the start of the message body? Then authors *know*
it potentially has Unicode, leave it damn well alone, and parse it as
UTF-8 instead of 8-bit chars...
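A rough sketch of what that reader-side check could look like, in Python
rather than Pascal. The function name decode_body and the CP437 fallback
are my assumptions for illustration only, not anything an FTN standard
or any existing reader defines:

```python
import codecs

# codecs.BOM_UTF8 is b'\xef\xbb\xbf', the same bytes as #239#187#191.
UTF8_BOM = codecs.BOM_UTF8


def decode_body(raw: bytes, fallback: str = "cp437") -> str:
    """Decode a message body, using a leading UTF-8 BOM as the signal.

    Hypothetical helper: the name and the CP437 fallback are assumptions.
    """
    if raw.startswith(UTF8_BOM):
        # BOM present: strip it and parse the rest as UTF-8.
        return raw[len(UTF8_BOM):].decode("utf-8")
    # No BOM: treat the body as legacy 8-bit text.
    return raw.decode(fallback)
```

A body that starts with the BOM but is not valid UTF-8 will raise
UnicodeDecodeError here; a real reader would probably want to catch that
and fall back rather than drop the message.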
This is how I am coding things here, just based upon NexusSQL,
PremierSQL, MS SQL, Apache and Nexus Web Service. I do not have access
to my Oracle box nor the MySQL 5 server to see if they do the same
during the initial connection negotiation(s).
A quick google: it's the UTF-8 byte order mark (BOM). Some editors save
the BOM inside the file (in order to be used as a header), which
regularly causes confusion because it is optional.
So, if we wanted to help enforce how this is handled at the reader (or
even tosser) level, I would offer this up: require the BOM at the start
of any message body that is UTF-8.
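On the posting side, this would amount to something like the following
Python sketch. encode_body is a made-up name, and where exactly the BOM
would sit relative to kludge lines is not something I have worked out
here:

```python
import codecs


def encode_body(text: str) -> bytes:
    """Encode a message body as UTF-8 with a leading BOM.

    Hypothetical helper; the name is an assumption for illustration.
    """
    # Drop any BOM character already at the front so it is never doubled.
    text = text.removeprefix("\ufeff") if hasattr(text, "removeprefix") \
        else (text[1:] if text.startswith("\ufeff") else text)
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

Python's built-in "utf-8-sig" codec writes the same three bytes when
encoding, so text.encode("utf-8-sig") gives the same result for a body
that does not already begin with a BOM.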
Ozz
--- ExchangeBBS NNTP Server v3.1/Linux64
* Origin: (1:1/123)