Subj : UTF-8 nodelist report
To   : Sergey Dorofeev
From : Michiel van der Vlist
Date : Sun Mar 09 2025 11:42 am

Hello Sergey,

On Friday March 07 2025 15:01, you wrote to me:

MV>> He insists on entering the 'a' and 'o' with umlaut in Säve and
MV>> Björn in 202/208 in Latin-1 in the normal ASCII nodelist. So in
MV>> the ASCII list they are replaced by question marks by MakeNl. In
MV>> the UTF list which in his case is just a copy of the ASCII
MV>> segment submitted, they appear "as submitted" and the line is
MV>> flagged as in error by my program.

SD> I think it is not very contradictory. I he will success in entering
SD> non-ASCII chars in nodelist (making it full 8-bit), encoding must be
SD> defined.

The encoding for the regular nodelist IS defined: ASCII and ASCII only. For backward compatibility it must stay that way. There may still be nodelist processing software around that breaks when the highest bit is not zero. That is why MakeNl (without the ALLOW8BIT setting) substitutes a question mark for every character with the highest bit set.
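
To make the substitution concrete, here is a minimal Python sketch of the idea. It is not MakeNl's actual code; the function name force_ascii is made up for illustration, and the only assumption is that each byte with the high bit set becomes one question mark.

```python
def force_ascii(line: bytes) -> bytes:
    """Replace every byte with the high bit set (>= 0x80) by b'?'.

    Hypothetical helper for illustration only, not taken from MakeNl.
    """
    return bytes(b if b < 0x80 else ord("?") for b in line)


# Latin-1 bytes as they would arrive in a submitted segment.
print(force_ascii(b"S\xe4ve"))   # a with umlaut is byte 0xE4 in Latin-1
print(force_ascii(b"Bj\xf6rn"))  # o with umlaut is byte 0xF6 in Latin-1
```

Plain ASCII input passes through such a filter unchanged, which is the point: old software never sees a byte above 0x7F.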

The encoding for the UTF nodelist is also defined: UTF-8.
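
A line with Latin-1 bytes in it is simply not valid UTF-8, which is why it gets flagged. A rough sketch of such a check in Python follows; it is only an illustration of the principle, not my actual program, and the name is_valid_utf8 is invented for the example.

```python
def is_valid_utf8(line: bytes) -> bool:
    """Return True if the raw line decodes cleanly as UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


good = "Säve".encode("utf-8")  # proper UTF-8: the umlaut takes two bytes
bad = b"S\xe4ve"               # the same name submitted as Latin-1

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```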

SD>  Ok, if it will be latin-1, but let it be only for European
SD> segments. That is, lets define encoding on per-region or even
SD> per-network basis.

Very bad idea. Having more than one encoding within the same file is a bad idea anyway, not just for the nodelist but for ANY text file.
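
To illustrate why: an 8-bit byte carries no information about which codepage it belongs to, so the reader has to guess. The small Python sketch below decodes the same four bytes three ways. The choice of CP437 and CP866 as the competing codepages is just an assumption for the example.

```python
raw = b"S\xe4ve"  # "Säve" as submitted in Latin-1

# Decode the identical bytes under three different 8-bit encodings.
readings = {enc: raw.decode(enc) for enc in ("latin-1", "cp437", "cp866")}

for enc, text in readings.items():
    print(f"{enc:8} -> {text}")
```

Only one of the three readings is the name that was meant, and nothing in the bytes themselves says which one.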

SD> So when importing nodelist, it must be split back on segments and
SD> correctly transcoded. E.g. default encoding if ASCII, so Zone records
SD> must be ASCII. But zone may specify own encoding, so regions in it may
SD> use it in own record, and define encoding for underlying regions.
SD> Further, region record use zone encoding, and may define encoding for
SD> networks. Network record use region encoding and may define encoding
SD> for node records.

Are you serious? You really still want every back alley in Fidonet to have its own 8-bit encoding? With all the forward and backward re-encoding and other limitations? C'mon... That's chaos! Unicode was invented for the very purpose of getting rid of all this codepage shit.

Why do you think Microsoft went full Unicode internally, three decades ago? Why do you think 99% of what is on the web is UTF-8? To get rid of the mess of all the hundreds of 8-bit encodings that floated around!

Nah, as far as the nodelist goes, it is either just ASCII or UTF-8. No more codepage shit.


Cheers, Michiel

--- GoldED+/W32-MSVC 1.1.5-b20170303
* Origin: Nieuw Schnøørd (2:280/5555)