Subj : UTF-8 nodelist report
To   : Nicholas Boel
From : Michiel van der Vlist
Date : Sun Mar 09 2025 10:23 pm

Hello Nicholas,

On Sunday March 09 2025 12:47, you wrote to me:

>> My notoriously unreliable memory tells me that your name should
>> actually be spelled "Boël" with two dots on the 'e'. Is that
>> correct?

NB> That is correct. However:

OK.

NB> 1) The US keyboard doesn't easily allow for it

It is not the keyboard. Contrary to what they do in Belgium, in The Netherlands we do not use a special Dutch keyboard with dedicated keys for the special characters. We use the same physical US keyboard that you have. It is the keyboard driver that takes care of the special characters. Some keys have been made into "dead keys" by the driver. E.g. to get an 'e' with diaeresis I first press the key for the double quote, followed by the 'e'. To type the double quote itself, I have to press that key twice. It is easy once you are used to it.
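
To make the dead key idea concrete, here is a minimal Python sketch of what such a driver does with a stream of key presses. It is not the code of any real keyboard driver: the DEAD_KEYS table and the type_keys function are made up for illustration, and a real layout may handle corner cases differently.

```python
import unicodedata

# Hypothetical dead-key table: key pressed -> Unicode combining character.
DEAD_KEYS = {
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute accent
    "`": "\u0300",  # grave accent
}

def type_keys(keys):
    """Turn a sequence of key presses into text, dead-key style."""
    out = []
    pending = None  # dead key that is still waiting for its base letter
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # remember it, emit nothing yet
            else:
                out.append(key)
            continue
        if key == pending:
            out.append(key)  # dead key pressed twice: the character itself
        else:
            composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
            # Use the accented letter if one exists, else emit both keys.
            out.append(composed if len(composed) == 1 else pending + key)
        pending = None
    if pending is not None:
        out.append(pending)  # dead key left over at the end of input
    return "".join(out)
```

With this sketch, the key sequence B, o, ", e, l comes out as "Boël", and pressing the double quote key twice gives a single double quote.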

NB> 2) I don't care to type some kind of alt-key combination to produce
NB> diaereses every time I want to type my name.

NB> 3) Whether I write it that way or not, most Americans don't realize
NB> what they are actually there for (or even the definition of a
NB> diaereses and what it actually means). All through life it has been
NB> pronounced "bowl" or something random like that, unless they have
NB> actually asked myself, a family member, or friend in the past.

Understood. So I take it you are not really interested in the UTF nodelist project to get your name with an 'ë' in the nodelist?

>> This particular problem could indeed be easily fixed at the ZC's
>> side. But it should not be needed. Plus that it is hard to fight
>> against someone knowingly and willingly looking for loopholes to
>> derail the system. If this particular loophole were blocked it
>> could easily turn into an arms race that benefits no one. The
>> weekly error may be the lesser evil...

NB> This 'loophole' was originally done for the ASCII nodelist, though,
NB> correct? I seem to recall it being in the original nodelist long
NB> before it was an issue in the UTF-8 version.

I do not recall the exact time frame, but it is correct that the main issue is with the ASCII nodelist. That it also appears as an error in the UTF-8 nodelist is "collateral damage".

The ZC does double processing: one run for the ASCII list and one for the UTF list. For the UTF list MakeNl is run with ALLOW8BIT set. For regions that do not participate in the UTF list, the ZC uses the ASCII segment for both lists. For regions that do participate, their separate UTF region segment is used for the UTF list.

RC20 does not participate in the UTF nodelist project. He does not send two different segments. But his segment is not pure ASCII; it contains some characters in Latin-1. So for the ASCII list, the ZC's MakeNl substitutes question marks for the non-ASCII characters. For the UTF list, they are passed "as is". But that was only noticed after I started my weekly error report.
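
The effect of the two passes on a Latin-1 byte can be sketched in a few lines of Python. This is an illustration only, not MakeNl's actual code: the function names are made up, and the assumption is simply that the ASCII pass replaces every byte above 127 with a question mark while the 8-bit pass copies bytes unchanged.

```python
def to_ascii_list(segment: bytes) -> bytes:
    """ASCII pass (assumed behaviour): non-ASCII bytes become '?'."""
    return bytes(b if b < 0x80 else ord("?") for b in segment)

def is_valid_utf8(segment: bytes) -> bool:
    """Check whether the bytes decode cleanly as UTF-8."""
    try:
        segment.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

latin1_name = "Boël".encode("latin-1")  # b'Bo\xebl', one byte for the 'ë'
utf8_name = "Boël".encode("utf-8")      # b'Bo\xc3\xabl', two bytes for the 'ë'

ascii_pass = to_ascii_list(latin1_name)  # b'Bo?l'
passed_as_is = latin1_name               # 8-bit pass: bytes copied unchanged
```

Here is_valid_utf8(passed_as_is) returns False, because a lone Latin-1 byte such as 0xEB is not a valid UTF-8 sequence. That is presumably why a segment passed "as is" shows up as an error in a list that is supposed to be UTF-8, while is_valid_utf8(utf8_name) returns True.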


Cheers, Michiel

--- GoldED+/W32-MSVC 1.1.5-b20170303
* Origin: Nieuw Schnøørd (2:280/5555)