| Document: FSC-0083
| Version: 001
| Date: 17 June 1995
|
| Jonathan de Boyne Pollard, FIDONET#2:440/4.0

A proposed standard for message IDs on FTN systems.

by

Jonathan de Boyne Pollard, FIDONET#2:440/4.0

Version 0.02, Sun 19950507

This document is (c) Copyright 1995 Jonathan de Boyne Pollard, all
rights reserved. Originally written on Tuesday 19950131.

Permission is hereby granted to copy and use this document without
modification in any way that you see fit, provided that you do not
attempt to make money from it, and that you understand that I take no
responsibility whatsoever for any effect that it may have on your
machine, data, marital status, or cat.

Especial permission to freely use and redistribute this document in
its original form is given to developers of FTN softwares and whatever
FIDONET Technical Standards bodies may exist from time to time.

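========================================================================
0.0 Definition of terms
========================================================================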

This document assumes familiarity with several terms in common use in
discussion of mail systems, such as `User Agent', `Message Transport
Agent', and so forth.

Robot mail programs qualify as UAs, incidentally.

0.1 Knackered Backward Form
---------------------------

This specification uses a modified BNF notation for discussion of
textual representation of message IDs.

Literal syntax elements (terminal nodes of the grammar) are enclosed
in single quotes.

'MSGID:' '@' '<' '"'

Non-terminal nodes are enclosed in angle brackets (less than and
greater than signs).

<quoted-text> <hex-text> <q-p-site-identifier>

Production rules comprise a non-terminal, followed by productions.
Alternate productions for the same non-terminal are separated by a
vertical bar.

<qtext-char> ::=
'"' '"'
| <any-character-except-quotes-NUL-or-CR>

Optional sequences within a production are indicated in two ways.
Square brackets enclose a sequence that may occur exactly once or not
at all.

[ '@' <dns-name> ':' ]

Curly braces enclose a sequence that may be repeated any number of
times. A leading numeric prefix (usually 0 or 1) indicates the
minimum number of repetitions.

1*{ <hex-character> }

0.1.1 Some standard production rules
------------------------------------

<whitespace-char> ::= <tab> | <space>

<whitespace> ::= 1*{ <whitespace-char> }

<hex-character> ::=
'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|
'A'|'B'|'C'|'D'|'E'|'F'|
'a'|'b'|'c'|'d'|'e'|'f'

<upper-hex-char> ::=
'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'A'|'B'|'C'|'D'|'E'|'F'

<qtext-char> ::=
'"' '"'
| <any-ASCII-character-except-quotes-NUL-or-CR>

<quoted-text> ::= '"' 0*{ <qtext-char> } '"'

<quoted-char> ::=
<any-ASCII-character-except-quotes-backslash-NUL-or-CR>
| '\' <any-ASCII-character-except-NUL-or-CR>

<quoted-string> ::= '"' 0*{ <quoted-char> } '"'

<word> ::= 1*{ <any-ASCII-character-above-SPACE-and-below-DEL> }

Note the difference between the two forms of quoting. <quoted-text>
is a string with embedded quotation marks represented by double
quotation marks (the way that most BASIC languages do). However,
<quoted-string> is a string with all quotation marks and backslashes
(and, indeed, any other character) escaped by the backslash character,
in the style of the C and C++ languages.
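
As an illustration only, the two quoting forms can be sketched in
Python as follows. The function names are invented for this example
and are not part of the standard.

```python
def to_quoted_text(s: str) -> str:
    """<quoted-text>: embedded quotation marks are doubled (BASIC style)."""
    return '"' + s.replace('"', '""') + '"'


def to_quoted_string(s: str) -> str:
    """<quoted-string>: backslashes and quotes are backslash-escaped (C style)."""
    # Escape backslashes first, so that the backslashes added for the
    # quotation marks are not themselves escaped a second time.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'
```

Given the five characters a"b\c, the first function yields "a""b\c"
and the second yields "a\"b\\c".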

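========================================================================
1.0 Definition and use of message IDs
========================================================================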

For the purposes of this document, the network is considered to form a
vast distributed database of messages, which uses replication and
store and forward distribution to ensure that all carriers of the
database are kept up to date. Every message, whether netmail or
echomail, carries a primary message ID that uniquely identifies it,
and zero or more reference message IDs that uniquely identify any
messages that it refers to.

A primary message ID is a globally unique key that is used for
uniquely identifying any single given mail message in the database
(that is, counting all replicas of a message over all of the network
as "one"). The reference message IDs are used by user agents to form
a reply graph, allowing the user to easily navigate the
messagebase.

Message transport protocols may require the data in a message ID to be
encoded so that it may be safely transported. This standard
distinguishes between the "underlying" message IDs and the encoded
forms. This chapter discusses the underlying message IDs and the
concepts behind them without reference to a particular encoding, and
subsequent chapters discuss the various encoded forms.

1.1 Components of a message ID
------------------------------

A message ID comprises two parts, namely a site identifier and a local
part. Both of these parts are arbitrary 8-bit binary data, which
implementations are free to store in any way they choose, but which
they should never alter. There are no distinguished characters in
either the site identifier or local part, and in particular no
terminating characters, so implementations must usually store an
additional length count for each.

The "minimum maximum" lengths for the site ID and local part are 64
octets each, and conforming implementations may not impose shorter
maximum length restrictions. In fact, implementations are encouraged
to impose no length restrictions on message IDs whatsoever (for
example, it is not unreasonable to expect site IDs to exceed 256
octets on occasion).
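
A minimal sketch of one possible in-memory representation follows, in
Python. The class name and field names are assumptions made for this
example; the standard only requires that both parts be kept as
unaltered 8-bit data with their lengths known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageID:
    # Both parts are opaque octet strings. Python's bytes type carries
    # its own length, so NUL and CR need no special treatment.
    site_id: bytes
    local_part: bytes


a = MessageID(b"FIDONET#2:440/4.0", b"\x00\x01binary\r")
b = MessageID(b"fidonet#2:440/4.0", b"\x00\x01binary\r")
# Comparison is octet for octet, so these two IDs are different.
assert a != b
```

Making the class immutable (frozen) is a design choice of the sketch,
reflecting the rule that implementations should never alter either
part.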

1.2 Preservation of uniqueness
------------------------------

A site that creates messages (by entering them into the distributed
database) must also issue message IDs, and must ensure that the global
uniqueness property of message IDs is preserved.

A site MUST ensure that it issues unique local parts to individual
messages. Two or more sites may not have the same site identifier,
unless they *all* co-operate to ensure that they do not issue
duplicate local parts.

The administrative procedures necessary to obtain a unique site
identifier are beyond the scope of this document. Usually site
identifiers will be FTN 5D addresses, or fully qualified DNS names,
because administrative procedures for assigning such are already in
place. However, they are not restricted to be such.

The means by which a site invents new local parts is beyond the scope
of this document. A discussion of some example options for
implementors to consider is given in an appendix.

1.3 Reference message IDs
-------------------------

Reference message IDs in a message denote messages to which it is
related, comprising a "local subset" of the overall reply graph (i.e.
the direct and indirect ancestors of the message), which each message
carries around with it.

Carrying around multiple reference message IDs provides overlap,
allowing for the overall reply graph to be reconstructed even in the
absence of intermediate messages (if they had expired, or had not yet
arrived due to propagation lag, for example).

UAs that conform to this standard MUST ensure that only messages that
start new threads (i.e. messages entered into the network not in
response to any existing message) have no reference message IDs.

All other messages that they create MUST contain at least one
reference message ID, being that of the message that is being
responded to.

[[ Luckily, schemes already in existence mean that in practice
non-conforming User Agents will generally preserve this single back
link, as well. ]]

When responding to a message, user agents must create the reference
message ID list of the response by taking the list of reference
message IDs from the original message, and appending the primary
message ID of the original message to the tail.

A reference message ID list should not be truncated unless transport
or storage limitations are in danger of being exceeded, in which case
message IDs may only be removed from the head of the list. Removing
from the tail would eliminate links to immediate ancestor messages,
and removing from the middle would alter the reply graph.
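
The construction of the reference list for a response can be sketched
as follows, in Python. The function name, the tuple representation of
a message ID, and the max_refs parameter (standing in for whatever
transport or storage limit applies) are all assumptions of this
example.

```python
from typing import List, Optional, Tuple

MessageID = Tuple[bytes, bytes]  # (site identifier, local part)


def references_for_reply(original_refs: List[MessageID],
                         original_primary: MessageID,
                         max_refs: Optional[int] = None) -> List[MessageID]:
    """Build the reference message ID list for a response."""
    # Take the original's list and append its primary ID to the tail.
    refs = list(original_refs) + [original_primary]
    # Truncate only when a limit forces it, and only from the head, so
    # that the links to the nearest ancestors survive.
    if max_refs is not None and len(refs) > max_refs:
        refs = refs[len(refs) - max_refs:]
    return refs
```

A response to a thread-starting message therefore carries exactly one
reference message ID, namely the primary message ID of that message.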

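========================================================================
2.0 Quoted printable encoding for storing 8-bit data in 7-bit transports
========================================================================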

To encode the 8-bit data in message IDs for transport by 7-bit
transport layers, we use a variation on the widely used Quoted
Printable form [RFC1521] [RFC1522].

2.1 Grammar of Quoted Printable encoding
----------------------------------------

The grammar of the 7-bit encoding of 8-bit data in a quoted printable
word is as follows.

<q-p-word> ::=
<word>
| <quoted-text>
| [ '=' ] 1*{ <q-p-character> } [ '=' ]

<q-p-character> ::=
<any-ASCII-character-bar-ctls-wspace-quote-and-equals>
| <q-p-quoted-char>

<q-p-quoted-char> ::= '=' <upper-hex-char> <upper-hex-char>

2.2 Conversion from 8-bit to 7-bit
----------------------------------

Rule #1 (non-quoted transparent 7-bit): Where the 8-bit data consist
of nothing but ASCII characters above SPACE and below DEL, they
may be copied literally to the 7-bit representation.

Rule #2 (quoted transparent 7-bit): Where the 8-bit data consist of
nothing but ASCII characters except CR and NUL, they may be
converted to the 7-bit representation by enclosing them in quotes,
and escaping every embedded quotation mark with a second quotation
mark.

Rule #3 (8-bit quoted): Where the 8-bit data contain CR or NUL, or any
non-ASCII characters, they are converted to a 7-bit representation
in two stages.

Firstly, all non-ASCII characters, all ASCII control characters,
SPACE, DEL, '"', and '=', are converted to "quoted" form. Quoted
form is an '=' character followed by the hexadecimal value of the
character represented as two uppercase hexadecimal digits.

Secondly, the entire string is then enclosed by one leading and
one trailing '=' character.
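
The three rules can be sketched as a single Python function, given
here as an illustration rather than a reference implementation. The
function name is invented, and trying Rule 1 first, then Rule 2, then
Rule 3 is an assumption of the sketch: the rules themselves only say
what "may" be done when each applies.

```python
def qp_encode(data: bytes) -> str:
    """Encode 8-bit message ID data as a 7-bit <q-p-word> (sketch)."""
    # Rule 1: nothing but ASCII above SPACE and below DEL: copy literally.
    if data and all(0x20 < b < 0x7F for b in data):
        return data.decode("ascii")
    # Rule 2: nothing but ASCII, and no CR or NUL: enclose in quotes and
    # double every embedded quotation mark.
    if all(b < 0x80 and b not in (0x00, 0x0D) for b in data):
        return '"' + data.decode("ascii").replace('"', '""') + '"'
    # Rule 3, stage one: quote non-ASCII, controls, SPACE, DEL, '"' and
    # '=' as '=' followed by two uppercase hexadecimal digits.
    quoted = "".join(
        "=%02X" % b if (b <= 0x20 or b >= 0x7F or b in (0x22, 0x3D)) else chr(b)
        for b in data
    )
    # Rule 3, stage two: one leading and one trailing '='.
    return "=" + quoted + "="
```

For example, the single octet CR (hexadecimal 0D) cannot be handled by
Rules 1 or 2, and so becomes the five characters ==0D= under Rule 3.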

2.3 Conversion from 7-bit to 8-bit
----------------------------------

Where the 7-bit field is delimited by equals signs, it is a fair bet
that it comprises 8-bit data to which Rule 3 has been applied.
However, it is possible that sites in the 7-bit world may produce data
with leading and trailing equals signs.

Reverse of Rule #3 : If, after stripping the leading and trailing '=',
the remaining text can be converted back using the reverse of Rule
3, then that 8-bit data is the actual message ID. Otherwise the
reverse of Rule 2 should be applied to the original 7-bit data.

Reverse of Rule #2 : If the 7-bit data are enclosed by quotes the
reverse of Rule 2 should be applied to remove the enclosing quotes
and the doubling of any embedded quotes (8-bit form does not have delimiter
characters and so does not require quoting). Otherwise the
reverse of Rule 1 should be applied.

Reverse of Rule #1 : The 7-bit data are copied to the 8-bit data.
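
The decoding cascade can be sketched in Python as follows. The
function names are invented for this example, and the sketch assumes
that the 7-bit field has already been isolated and contains only ASCII
characters.

```python
from typing import Optional

UPPER_HEX = "0123456789ABCDEF"


def _reverse_rule3(body: str) -> Optional[bytes]:
    """Decode the text between the enclosing '=' signs, or return None."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "=":
            pair = body[i + 1:i + 3]
            # Anything other than two UPPERCASE hex digits is invalid.
            if len(pair) != 2 or pair[0] not in UPPER_HEX or pair[1] not in UPPER_HEX:
                return None
            out.append(int(pair, 16))
            i += 3
        elif "!" <= ch <= "~" and ch != '"':
            out.append(ord(ch))
            i += 1
        else:
            return None  # whitespace, control or quote: not a Rule 3 encoding
    return bytes(out)


def qp_decode(text: str) -> bytes:
    """Convert a 7-bit field back to 8-bit message ID data (sketch)."""
    if len(text) >= 3 and text[0] == "=" and text[-1] == "=":
        decoded = _reverse_rule3(text[1:-1])
        if decoded is not None:
            return decoded
    # Reverse of Rule 2: strip enclosing quotes, undouble embedded ones.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1].replace('""', '"').encode("ascii")
    # Reverse of Rule 1: copy unchanged.
    return text.encode("ascii")
```

Note that a field such as =a=3d= (lowercase hexadecimal) fails the
reverse of Rule 3 and is therefore copied through unchanged.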

2.4 Rationale
-------------

The intention is that <q-p-word> tokens will not be parsed as separate
words by most 7-bit grammars. The elimination of quotes, whitespace,
and control characters by Rule 3 is part of achieving this.

Rules 1 and 2 allow message IDs created by 7-bit standards to enter
and travel within the 8-bit world, and be restored to their original
form when they return to the 7-bit world. Returning 7-bit message IDs
to their original form means that 7-bit duplicate checking is not
broken by 8-bit gateways.

The unfortunate side-effect is that any 8-bit data generated in the
7-bit world will be returned to the 7-bit world as 7-bit data in Q-P
encoded form. However, the original 8-bit data are unlikely to work
in the 7-bit world in the first place, so this is no great loss.

Rule 3 is the most general rule of the three. Rule 3 applies to true
8-bit message IDs generated in the 8-bit world that use 8-bit
characters, allowing them to travel across the 7-bit world with a
reasonable chance of remaining intact.

The elimination of the equals sign by Rule 3, replacing it with its
Q-P encoding, ensures that the decoding process can assume that an
equals sign not followed by two uppercase hex characters is not a
valid Rule 3 encoding, and so fall back to decoding Rule 2.

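========================================================================
3.0 Storage of message IDs in type 2.0, 2.0+, and 2.2 message packets
========================================================================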

Type 2.0 message packets [FTS0001], type 2.0+ message packets
[FSC0039], and type 2.2 message packets [FSC0045] are used for message
transport over much of FIDONET. They do not have space in their
message headers available for message IDs (along with a lot of other
things), therefore message IDs must be transferred to the body of the
message for transport in these forms, and retrieved from the body of
the message afterwards.

The existing "kludge line" mechanisms [FSC0068] are used to do this.

There are two concerns here.

Firstly, it is preferable that as much of the reply graph as possible
is preserved, even in the face of tools that use existing MSGID/REPLY
schemes [FTS0009].

Secondly, message IDs are 8 bit data, and must be encoded into a 7-bit
form that will be reliably transported in the bodies of type 2.0,
2.0+, and 2.2 message packets.

3.1 Conversion to and from kludge lines
---------------------------------------

The primary message ID of a message is stored to and retrieved from a
"MSGID:" kludge line.

All of the reference message IDs of a message are stored, in order
from first to last, in a single "REFER:" kludge line. The last
reference message ID of a message (its immediate ancestor, in other
words) is stored in a "REPLY:" kludge line. Note that the information
in the "REFER:" kludge line is a superset of the information in the
"REPLY:" kludge line.

If a message has zero reference message IDs (it is the start of a new
thread), then the "REFER:" and "REPLY:" kludge lines are omitted.
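
As a sketch of the storing direction, the following Python function
builds the kludge lines for one message. It assumes that each message
ID has already been converted to its 7-bit form and is supplied as a
(site, local part) pair of strings. The function name and the order in
which the lines are emitted are assumptions of the example.

```python
from typing import List, Tuple

SOH = "\x01"  # kludge lines begin with the ASCII SOH character

EncodedID = Tuple[str, str]  # (encoded site ID, encoded local part)


def kludge_lines(primary: EncodedID,
                 references: List[EncodedID]) -> List[str]:
    """Return the MSGID, REPLY and REFER kludge lines for one message."""
    def pair(mid: EncodedID) -> str:
        return mid[0] + " " + mid[1]

    lines = [SOH + "MSGID: " + pair(primary)]
    if references:
        # REPLY carries only the immediate ancestor (the last reference).
        lines.append(SOH + "REPLY: " + pair(references[-1]))
        # REFER carries the whole list, in order from first to last.
        lines.append(SOH + "REFER: " + " ".join(pair(r) for r in references))
    return lines
```

A message that starts a new thread has an empty reference list, and so
yields a MSGID line only.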

If, upon decoding from type 2.0, 2.0+, or 2.2 message transport
format, a "REFER:" kludge line exists, then its contents are assumed
to be the complete list of reference message IDs (in encoded form) for
the message, and the "REPLY:" kludge line is ignored. Otherwise, the
content of the "REPLY:" kludge line (if any) is used for the single
reference message ID of the message.
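
The precedence of "REFER:" over "REPLY:" when retrieving can be
sketched as follows, in Python. The function names are invented, and
the sketch assumes that the message body is available as a string
whose lines are separated by CR. It returns the still-encoded
reference field and leaves tokenising and decoding to the caller.

```python
from typing import Dict, Optional

SOH = "\x01"


def kludges_of(body: str) -> Dict[str, str]:
    """Collect 'NAME: content' kludge lines from a message body."""
    found: Dict[str, str] = {}
    for line in body.split("\r"):
        if line.startswith(SOH) and ": " in line:
            name, content = line[1:].split(": ", 1)
            found.setdefault(name, content)  # keep the first occurrence
    return found


def encoded_references(body: str) -> Optional[str]:
    """Return the encoded reference list, or None for a new thread."""
    kludges = kludges_of(body)
    if "REFER" in kludges:
        return kludges["REFER"]   # complete list; REPLY is ignored
    return kludges.get("REPLY")   # single reference, if any
```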

3.2 Compatibility with existing MSGID/REPLY schemes
---------------------------------------------------

There are two compatibility considerations. It is important that
encoded message IDs be correctly parsed by implementations using older,
less versatile standards. It is also important that implementations
expecting older MSGID/REPLY pairs will destroy as little linking
information as possible.

3.2.1 Grammar considerations
----------------------------

There are two valid interpretations of FTS-0009, both of which
(should) use the following grammar :

<msgid> ::= <soh> 'MSGID: ' <address-text> <whitespace> <hex-text>
<reply> ::= <soh> 'REPLY: ' <address-text> <whitespace> <hex-text>

<soh> ::= ASCII SOH character
<address-text> ::= <quoted-text> | <word>
<hex-text> ::= 1*{ <hex-character> }

The "VFIDO" interpretation assumes that MSGID/REPLY kludges are the
textual representation of an (address, number) ordered pair. Systems
using this interpretation may change the case of <hex-text> or may
renormalise <quoted-text> if they find it to be a FTN 5D address.

Message IDs from this standard that are stored in MSGID/REPLY kludges
will be mangled by software applying the VFIDO interpretation of
FTS-0009. Such software is not compatible with this standard.

The "Mark Kimes" interpretation assumes that MSGID/REPLY kludges are
text separated by whitespace, and preserves the contents of
<quoted-text> and <hex-text> without change.

The encoding scheme outlined in section 2.2 produces two whitespace
separated text fields. So software applying the "Mark Kimes"
interpretation of FTS-0009 will not mangle the encoded message IDs.

In many cases, softwares using the "Mark Kimes" interpretation will in
fact parse <hex-text> as

<hex-text> ::= <word>

As long as software applying the "Mark Kimes" interpretation of
FTS-0009 is not written to truncate either field, or complain about a
non-numeric <hex-text> portion, it is compatible with this standard.

3.2.2 Reply linking
-------------------

FTS-0009 implementations will generate MSGID kludges, transfer the
content (Mark Kimes interpretation) of the MSGID kludge data of an
original message into the REPLY data of a response message, and will
not generate a REFER kludge.

So reply linking will be preserved, but reference information beyond
the immediate ancestor of a message will be lost.

3.3 Quoted printable encoding
-----------------------------

The 8-bit data in message IDs is encoded into 7-bit MSGID/REPLY data
for transport in type 2.0, 2.0+, and 2.2 message packets by using the
quoted printable encoding outlined in chapter 2, along with the
following grammar.

<msgid> ::= <soh> 'MSGID: ' <7-bit-encoding>
<reply> ::= <soh> 'REPLY: ' <7-bit-encoding>
<refer> ::= <soh> 'REFER: '
<7-bit-encoding> 0*{ <whitespace> <7-bit-encoding> }

<7-bit-encoding> ::= <q-p-site-ID> <whitespace> <q-p-local-part>

<q-p-site-ID> ::= <q-p-word>
<q-p-local-part> ::= <q-p-word>

Applying Rule 1 of Q-P encoding to local parts is safe as long as
<hex-text> (from the FTS-0009 grammar) is in actuality treated as
<word> by most implementations, as outlined in the compatibility
notes.

Rule 2 should not be applied to local parts, because the grammar of
FTS-0009 does not allow for quoted text in the <hex-text> portion.

The restrictions in Rule 3 have deliberate effect here. FTS-0009
sites will rarely produce data with leading and trailing equals signs,
so reversing Rule 3 will be unlikely to be subject to spurious data.
In theory, relaxing Rule 3 reversal to include decoding lowercase
hexadecimal as well as uppercase hexadecimal would mean that sites
that convert the case of MSGID/REPLY (as part of the "VFIDO"
interpretation) would not break Q-P encoding.

However, the "VFIDO" interpretation will usually do far more damage
than simple case conversion, which will be impossible to restore.
Rather than attempt the reverse conversion (which could have the
undesirable effect of causing different messages to end up with the
same 8-bit message ID if the local part were truncated to eight
characters in the 7-bit world), any "VFIDO" mangling that occurs will
prevent Q-P decoding from succeeding.

This means that 8-bit message IDs that look like incomplete or damaged
Q-P encodings are not gateway problems, but are more likely to be the
result of a site using the "VFIDO" interpretation in the 7-bit world.

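========================================================================
4.0 Storage of message IDs in type 2.3 message packets
========================================================================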

The storage format of type 2.3 messages (so-called "extensible type 2"
[TYPE2EXT]) provides space in the message headers for both a primary
message ID and an arbitrary list of reference message IDs.

All message IDs are stored as 8-bit binary strings, using length
counts rather than delimiters. Therefore message IDs can be stored
directly in type 2.3 messages.

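========================================================================
5.0 Storage of message IDs in type 3.x message packets
========================================================================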

There is such a wide variety of type 3 message formats that this
standard doesn't hope to cover them all.

For those with binary "chunks", chunk types 'PMID' (primary message
ID) and 'RFER' (reference message IDs) are expected to have the
following form :

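+-----------------------------------+
| Length of site identifier  WORD32 |
+-----------------------------------+
| Site identifier            ...    |
+-----------------------------------+
| Length of local part       WORD32 |
+-----------------------------------+
| Local part                 ...    |
+-----------------------------------+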
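
A sketch of packing and unpacking one message ID in this layout
follows, in Python. The function names are invented. Treating WORD32
as an unsigned 32-bit little-endian integer is an assumption of the
example, since the byte order is not stated here, and the sketch
handles a single message ID only.

```python
import struct
from typing import Tuple


def pack_message_id(site_id: bytes, local_part: bytes) -> bytes:
    """Length-prefix both parts; no delimiter or terminator is used."""
    return (struct.pack("<I", len(site_id)) + site_id +
            struct.pack("<I", len(local_part)) + local_part)


def unpack_message_id(buf: bytes, offset: int = 0) -> Tuple[bytes, bytes, int]:
    """Return (site_id, local_part, offset just past the message ID)."""
    (site_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    site_id = buf[offset:offset + site_len]
    offset += site_len
    (local_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    local_part = buf[offset:offset + local_len]
    offset += local_len
    return site_id, local_part, offset
```

Because the lengths are explicit, octets such as NUL and CR pass
through without any encoding.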

Those schemes that use text format headers and require field
delimiters may care to use the Q-P encoding outlined in chapter 2.

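========================================================================
6.0 Storage of message IDs in RFC822 and RFC1036 messages
========================================================================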

The grammar of "Internet" messages is defined by the standards for
ARPA text messages [RFC0822] and for Usenet news messages [RFC1036].

6.1 Restrictions on interconversion
-----------------------------------

Interconversion between a FIDO message ID and an RFC822 Message-ID is
restricted by several factors. The major factor is that RFC0822
actually places greater restrictions upon Message-IDs than this
standard does upon FIDO message IDs (in part because this standard is
designed to also be able to handle X.400 message identifiers and
others transparently as well). It mandates that the <address> portion
of a Message-ID be a valid DNS name.

A secondary factor is reversibility, in that many gateways exist
between FTN and RFC822, and so message IDs that cross the boundary
more than once will retain as much of their original ID information as
possible. There is more information contained within a FIDO message
ID than in an RFC822 Message-ID. In particular, the <address>
portions of RFC822 Message-IDs are not case sensitive, whereas the
site ID of a FIDO message ID is treated as 8-bit data for the purposes
of comparison.

These are handled by restricting the allowable conversions that a
conformant gateway may use on a message ID, by ensuring that all of
the FIDO information is not lost when converted to the (narrower
bandwidth) RFC822 Message-ID format, and by allowing gateway softwares
to infer a meaning from the site identifier portion of a message ID.

This is the *only* part of this standard where it is allowed for
softwares to place a meaning on the site identifier of a message ID.

6.1 Converting to RFC822 form
-----------------------------

6.1.1 Site identifier recognition
---------------------------------

Gateway softwares are allowed to examine a site identifier of a
message ID and determine whether it is in a format that they recognise
or not. This standard specifies what gateway softwares should do when
they encounter a site identifier that is a recognisable DNS name or
one that is a recognisable FIDO 5D address, and what form the DNS name
for RFC822 must take.

Site identifiers that are not FIDO 5D addresses are really beyond the
scope of FIDONET documentation. If an implementation recognises
another form of site identifier (such as X.400 O/R addresses) then it
is free to translate that site identifier to and from DNS form, as
long as it knows how (there are RFCs on how to perform X.400
conversion).

This message ID standard imposes no restrictions on site identifiers,
allowing any scheme to be administered on FIDONET. It is therefore up
to the site identification schemes themselves to provide their own
mappings to and from DNS names.

Gateways are free to drop messages with message IDs that they do not
understand how to convert. Both the FIDONET and RFC worlds depend
heavily upon message IDs for detecting duplicate messages, and so it
is better that a gateway should NOT distribute messages with message
ID formats that it doesn't understand how to convert to RFC822 form,
rather than that it does so incorrectly.

6.1.1.1 Site identifiers that are DNS names
-------------------------------------------

If the site identifier of a message ID can be parsed as a legal DNS
name according to the grammar of RFC822 then, even if it cannot be
resolved to an IP address or MX record, it must be used as the domain
name of the RFC message ID, and the local part must be passed through
unchanged.

This allows for RFC message IDs to enter and leave 8-bit FIDONET
without change, even via gateways that have no knowledge of or
connectivity to the originating RFC host.

6.1.1.2 Site identifiers that are FIDO 5D addresses
---------------------------------------------------

The conversion process for message IDs where the site identifier can
be parsed as a FIDO 5D address in the forms DOMAIN#Z:N/N.P or
Z:N/N.P@DOMAIN depends upon the "domain" (in the FIDO sense of the
word) of the address.

6.1.1.2.1 Site identifiers that are 5D addresses in FIDONET
-----------------------------------------------------------

If the site identifier of a message ID is parseable as a FIDO 5D
address of the form Z:N/N.P@FIDONET or FIDONET#Z:N/N.P (i.e. in the
FIDONET domain itself), then the DNS name used for the RFC message ID
must be the DNS equivalent of that address.

This is because MX records exist in the DNS for all of the zone:net
pairs for 5D addresses in the FIDONET "domain", in the form

p#.f#.n#.z#.fidonet.org

where # is a number without leading zeroes giving the appropriate
portion of the 5D address. Therefore this is the conversion that must
be used.
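
This conversion can be sketched in Python as follows. The function
name and the regular expressions are assumptions of the example, as is
the decision to compare the FIDO "domain" name without regard to case.
Anything that is not exactly one of the two recognised forms yields
None, leaving the gateway to decide what to do with it.

```python
import re
from typing import Optional

# The two recognised textual forms of a FIDO 5D address.
_HASH_FORM = re.compile(r"([A-Za-z0-9]+)#([0-9]+):([0-9]+)/([0-9]+)\.([0-9]+)")
_AT_FORM = re.compile(r"([0-9]+):([0-9]+)/([0-9]+)\.([0-9]+)@([A-Za-z0-9]+)")


def fidonet_dns_name(site_id: str) -> Optional[str]:
    """Map a 5D address in the FIDONET domain to its DNS name."""
    m = _HASH_FORM.fullmatch(site_id)
    if m:
        domain, zone, net, node, point = m.groups()
    else:
        m = _AT_FORM.fullmatch(site_id)
        if m is None:
            return None
        zone, net, node, point, domain = m.groups()
    if domain.upper() != "FIDONET":
        return None  # other FIDO "domains" have no agreed DNS name
    # int() drops any leading zeroes, as the DNS form requires.
    return "p%d.f%d.n%d.z%d.fidonet.org" % (
        int(point), int(node), int(net), int(zone))
```

Both FIDONET#2:440/4.0 and 2:440/4.0@FIDONET map to the DNS name
p0.f4.n440.z2.fidonet.org.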

6.1.1.2.2 Site identifiers that are 5D addresses outside of FIDONET
-------------------------------------------------------------------

Most other "domains" (in the FIDO sense of the word), are free to
choose their own DNS domain name, but have not yet done so.

Therefore, constructs such as p3.f0.n444.z81.os2net.ftn (which several
people have INCORRECTLY inferred from other FTS documentation) are NOT
ALLOWED as the DNS name in an RFC Message-ID. .ftn is NOT a valid
top-level DNS domain, for a start, and there is no guarantee that
OS2NET would adopt that DNS name, either.

(p#.f#.n#.z#.os2net.fidonet.org anyone ?)

6.1.1.2.3 Conversion of local parts
-----------------------------------

Where a gateway has recognised a site identifier to represent a FIDO
5D address that it knows the DNS name for, the local part must then be
encoded.

According to the grammar in RFC822, any ASCII character (from NUL to
DEL) is legal in the local part of an RFC822 Message-ID, because
<quoted-pair> (q.v.) allows any special characters to be escaped.

Since RFC822 transport is merely 7-bit, just like type 2.0, 2.0+, and
2.2 message packets are, we use the quoted-printable scheme given in
chapter 2.

However,

6.1.1.3 Site identifiers that are not recognisable 5D addresses
---------------------------------------------------------------

No implementation may extend the FIDO 5D address to DNS name
conversions for site IDs that are given above. If the message ID is
"almost, but not quite" a FIDO 5D address, then the message should for
preference be discarded at the gateway rather than being passed
through.

Message IDs with arbitrary site identifiers are perfectly acceptable to
this standard, since it ascribes no meaning to site identifiers within
FIDONET. However, RFC822 and the existing RFC domain name system
can only handle a restricted subset of the whole range of FIDO 5D
addresses.

6.1.1.4 Other site identifiers
------------------------------

As mentioned before, gateways are allowed to support other site
identification schemes that are not FIDO 5D addresses, and convert
site identifiers in those forms to DNS names as they please.

It should be borne in mind when designing such conversion schemes that
the domain part of an RFC 822 message ID can only contain ASCII
characters that are not control characters, whitespace, or special
delimiter characters, because of the definition of <atom> in that
standard (q.v.). The quoted printable encoding outlined in chapter 2
of this document is probably not sufficient for handling full 8-bit
site identifier schemes, in which case the scheme in RFC1522 should be
investigated.

6.1.2 Preserving information
----------------------------

Although this standard recognises two forms for a FIDO 5D address,
there is only one valid form for that address in the DNS. For reverse
conversions to succeed, when an RFC message re-enters 8-bit FIDONET
(possibly via another gateway), the *exact form* of the original site
identifier must be reconstructed, otherwise FIDO softwares will treat
the two message IDs as different.

Although other schemes exist, which encode the 5D address in the local
part, and use a "generic" domain name of "fidonet.org" (which is not a
valid host name), it is preferred that the semantics of a message ID
("WXYZ local part generated at ABCDE site") be preserved, especially
as FIDONET sites are visible to the RFC world via the DNS anyway.

It is therefore suggested that the original FIDONET site identifier
(since it will be 7-bit text) be encoded as a <comment> token
immediately following the relevant message ID, using quoting to escape
any embedded punctuation (q.v. the grammar in RFC 822).

6.2 Converting from RFC822 form
-------------------------------

When converting from RFC822 form back to 8-bit FIDONET message IDs,
gateways should determine whether the address portion of the
Message-ID is a hostname under the fidonet.org domain.

If it is, a comment token should be scanned for to find the original
form of the 5D address, and the site identifier should be
reconstructed from it if found, or from the given DNS name in the form
DOMAIN#Z:N/N.P if no comment token were present. The inverse of the
quoted printable encoding outlined in chapter 2 should then be applied
to the local part.
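
The reconstruction of the site identifier can be sketched in Python as
follows. The function name is invented, and the sketch assumes that
the address portion and the text of any following comment token have
already been extracted from the Message-ID by an RFC822 parser. It
treats only names of the form p#.f#.n#.z#.fidonet.org as FIDONET
hosts, which is an assumption of the example.

```python
import re
from typing import Optional

_FIDO_HOST = re.compile(
    r"p([0-9]+)\.f([0-9]+)\.n([0-9]+)\.z([0-9]+)\.fidonet\.org",
    re.IGNORECASE)


def site_id_from_rfc822(domain: str, comment: Optional[str] = None) -> bytes:
    """Rebuild the 8-bit site identifier from an RFC822 Message-ID."""
    m = _FIDO_HOST.fullmatch(domain)
    if m is None:
        # Not a FIDONET host name: store the RFC822 form without change.
        return domain.encode("ascii")
    if comment is not None:
        # The comment carries the exact original form of the 5D address.
        return comment.encode("ascii")
    point, node, net, zone = m.groups()
    return ("FIDONET#%s:%s/%s.%s" % (zone, net, node, point)).encode("ascii")
```

With no comment present, p0.f4.n440.z2.fidonet.org becomes the site
identifier FIDONET#2:440/4.0.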

Otherwise, the 7-bit RFC822 Message-ID should be stored in the 8-bit
FIDONET message ID without change.

6.3 Reply linking
-----------------

According to RFC1036, message IDs occur in the Message-ID: and in the
References: header for news (echomail). Although RFC822 specifies an
In-Reply-To: header for mail (netmail), it makes it difficult to use,
because it need not contain a message ID.

The model for message identification used by RFC1036 closely matches
the model outlined in this standard (it is probable that there is only
one way to skin this particular cat). There is thus a direct mapping
between the primary message ID defined by this standard and the
RFC1036 Message-ID: header, and also between the reference message
IDs defined by this standard and the RFC1036 References: header.

This means that in normal use the reference message ID list will be
properly maintained by Usenet softwares.

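========================================================================
A.0 Discussion on generating unique local parts
========================================================================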

How any given site generates unique local parts is up to it. So this
appendix should only be taken as a guideline.

On sites where there is only one piece of software assigning message
IDs (e.g. there is only one UA, or the MTA itself assigns message
IDs), then a simple "take a ticket" scheme could work. Multiple
instances of that piece of software running simultaneously would need
to arbitrate access to that "ticket dispenser" amongst themselves.

A discussion of `sequencers' (which is the proper name for this idea)
and how atomic operations on them can be implemented, can be found in
any good computer science textbook on concurrent systems.

Unfortunately, in today's heterogeneous world, it is difficult to the
point of impossibility to get every piece of software to agree to use
one single central sequencer.

It is obvious that using just the date/time for a message ID is
insufficient on multitasking systems, or even on single tasking
systems that can generate multiple messages per clock tick.

What is less obvious is that it is not a good idea to use the name of
the software generating the message ID and a sequencer maintained by
that software as the unique local part. The problem here is that it
is not guaranteed that different softwares will use different names
(especially if they are called "Message Editor" (-:), so it is
possible that different softwares could generate duplicate local
parts.

Some form of "product ID code" would of course rectify this, but given
the amount of software in use and under development these days, a
centrally administered product ID database hasn't been a viable option
for decades now.

There are, of course, simpler schemes, that can guarantee to produce
unique local parts, because they rely on features that are guaranteed
unique to every individual application running, and do not rely on
different applications co-operating to use the same central
facilities, such as a site-wide sequencer.

One commonly used scheme is to use a combination of the current date
and time and the process and thread IDs of the software creating the
message ID.

e.g. 1995Jan31.123426.26.1
or 1995013112343600260001

This doesn't have to be human-readable calendar time, of course. It
could equally well be the POSIX 1003.1 time (seconds since The Epoch),
or the Julian date plus the time of day.

If the time isn't granular enough, a sequence number (which can be
maintained individually by each process) can be added to increase its
granularity.

On just about every operating system in the world, including
multi-user ones, the <time,process,thread,seq> 4-tuple will be unique
on one machine *forever* (or until the clock wraps around, at least).

e.g. 1995Jan31.123426.26.1.2
or 19950131123436002600010003

On multiple machine sites, where all machines share the one site
identifier, the above scheme can be extended to include the "hidden"
local machine name, which will be assumed to be made available (in
some fashion) to the softwares generating the message IDs.

This yields a unique <machine,time,process,thread,seq> 5-tuple.

e.g. utopium.1995Jan31.123848.26.1.4
or utopium.19950131123907002600010005

Again, the "intra-site" machine name can be anything, from the local
uname() (for UNIX people) to the NETBIOS machine name (for PC based
LAN systems).
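
As a guideline only, the tuple scheme might be sketched in Python as
follows. The function name, the field separator, and the use of UTC
calendar time are assumptions of the example. The sequence number is
kept per process, as suggested above, and is shared safely between
threads.

```python
import itertools
import os
import threading
import time
from typing import Optional

_sequence = itertools.count(1)   # per-process sequence number
_sequence_lock = threading.Lock()


def new_local_part(machine: Optional[str] = None) -> str:
    """Build a <[machine,]time,process,thread,seq> local part."""
    with _sequence_lock:
        seq = next(_sequence)
    stamp = time.strftime("%Y%m%d.%H%M%S", time.gmtime())
    fields = [stamp, str(os.getpid()), str(threading.get_ident()), str(seq)]
    if machine:
        # On multiple machine sites, prefix the intra-site machine name.
        fields.insert(0, machine)
    return ".".join(fields)
```

A caller on a single machine site would use new_local_part(), whilst
one machine amongst several sharing a site identifier might use
new_local_part("utopium").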

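========================================================================
Bibliography and Author
========================================================================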

[FTS0001] A Basic FIDONET Technical Standard, version 15. Randy Bush,
Pacific Systems Group. FIDONET#1:105/6.0. 30th August
1990.

( Defines the type 2.0 packet message transport format. )

[FTS0009] A standard for message identifiers and reply chain linkage,
version 1. Jim Nutt. FIDONET#1:114/30.0. 17th December
1991.

( Defines the MSGID/REPLY kludges. )

[FSC0034] Gateways to and from FIDONET. Technical, administrative,
and policy considerations. Randy Bush, Pacific Systems
Group. FIDONET#1:105/6.0. 30th August 1990.

( Discussion on features that should be preserved across
gateways, and on good gateway behaviour in general. )

[FSC0039] A type 2 packet extension proposal, version 4. Mark A.
Howard. FIDONET#1:260/340. 29th September 1990.

( Defines the type 2.0+ packet message transport format. )

[FSC0045] A proposal for a new packet format, version 1. Thom
Henderson. FIDONET#1:107/542.1. 17th April 1990.

( Defines the type 2.2 packet message transport format. )

[FSC0068] A proposed replacement for FTS-0004, version 1. Mark Kimes.
FIDONET#1:380/16.0. 13th December 1992.

( Defines kludge lines. )

[RFC0822] Standard for the format of ARPA Internet text messages.
David Crocker, University of Delaware. 13th August 1982.

( Defines the grammar and semantics of RFC messages. )

[RFC1036] Standard for the interchange of USENET messages. M Horton,
AT&T Bell Labs; and R. Adams, Center for Seismic Studies.
December 1987.

( Defines changes to the grammar and semantics of RFC822
that are required for news instead of mail, including
reply linking. )

[RFC1521] MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for specifying and describing the format of
Internet message bodies. N. Borenstein, Bellcore; and N.
Freed, Innosoft. September 1993.

( Defines Quoted Printable encoding of text. )

[RFC1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
Message header extensions for non ASCII text. K. Moore,
University of Tennessee. September 1993.

( Defines how to use Q-P encoding in message headers. )

[TYPE2EXT] An extension to type 2.0, 2.0+, and 2.2 message transport
formats to eliminate most kludge lines from the message
body. Jonathan de Boyne Pollard. FIDONET#2:440/4.0.
[ Not yet released. ]

========================================================================
Jonathan de Boyne Pollard
FIDONET#2:440/4.0