Document: FSC-0007
Version:  002
Date:     17-Apr-90





              FidoNet(r) RFC822-Style Message Format
   (Informal Proposed Message Format Specification - Draft, revised)
                    Robert Heller @ 1:321/153.0
                         April 17, 1990





Status of this document:

    This FSC suggests a proposed protocol for the FidoNet(r) community,
    and requests discussion and suggestions for improvements.
    Distribution of this document is unlimited.

    Fido and FidoNet are registered marks of Tom Jennings and Fido
    Software.





1. Purpose.
===========

The purpose of this document is to outline my ideas concerning
FidoNet(r) message format, both as stored on disk as message
files handled by BBS (or other "conferencing") programs and as
these messages exist packed into "bundles" or "packets" as
transmitted from machine to machine.  I think using a uniform
format for normal message storage will make things easier, at
least as far as "standardized" message bundling and transmitting
software is concerned.  If done right, it also makes things easier
for BBS and conferencing software writers.

This specification is only a first draft proposal.  Just
something to put on the table for discussion. Feel free to
comment on it. I am open to suggestions.

2. Preliminary Definitions.
===========================

I will be using BNF notation to describe the format of data
fields.  This is a fairly standard notation and should be familiar
to anyone who has taken a compiler design course.  To make things
a little briefer, I will be using some pre-defined
pseudo-terminal symbols.  These symbols are defined as:

  o The symbol ALPHA refers to any ASCII alphabetic character,
    including the uppercase letters ('A' thru 'Z', 41H thru
    5AH), the lowercase letters ('a' thru 'z', 61H thru 7AH) and
    these characters: '#' (23H), '$' (24H), '&' (26H), '*'
    (2AH), '+' (2BH), '-' (2DH), '=' (3DH), '^' (5EH), and '_'
    (5FH).

  o The symbol DIGIT refers to any of the ASCII characters '0'
    thru '9' (30H thru 39H).

|  o The symbol NEWLINE refers to the single ASCII character LF (0AH)
|    when the message is in transit.  When the message is stored, it
|    refers to the local O/S's newline convention for text files (e.g.
|    LF under UNIX, CRLF under MS-DOS and CP/M, CR under OS-9, etc.),
|    or whatever is convenient for the BBS software.

  o The symbol WHITESPACE refers to one or more ASCII space
    (20H) or tab (09H) characters.

  o The symbol OPTWHITESPACE refers to zero or more ASCII space
    (20H) or tab (09H) characters.

  o The symbol TEXT refers to zero or more printable ASCII
    characters not including a NEWLINE sequence.

  o The symbol NULL refers to the null string (no characters at
    all).

Oh, one other thing:  message files contain only printable ASCII
characters and NEWLINE sequences (packed messages will have
non-printable bytes).  Also, I'll number the definitions.  I am
also only using six BNF operator characters: a vertical bar (|)
for alternation, braces ({}) for comments, single quotes (') for
character and string literals, and parens (()) for expression
grouping.
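
As an illustration only, the pre-defined symbols above can be
written down as regular-expression fragments.  This is a sketch in
Python, not part of the proposal.  The helper name matches() is
hypothetical, and reading "printable ASCII" as 20H thru 7EH (so a
tab is not TEXT) is my assumption.

```python
import re

# Regex fragments for the pre-defined symbols (illustrative sketch).
ALPHA = r"[A-Za-z#$&*+\-=^_]"     # letters plus the nine extra characters
DIGIT = r"[0-9]"                  # '0' thru '9'
WHITESPACE = r"[ \t]+"            # one or more spaces or tabs
OPTWHITESPACE = r"[ \t]*"         # zero or more spaces or tabs
TEXT = r"[\x20-\x7E]*"            # assumption: printable means 20H-7EH


def matches(symbol, s):
    """Return True if the whole string s matches the symbol pattern."""
    return re.fullmatch(symbol, s) is not None
```

For example, matches(ALPHA, "_") is true and matches(ALPHA, "@") is
false, which is why '@' can safely separate <user> from <nodeid>
in the address syntax.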


3. Definitions 1: Stored Message.
=================================


Changed or added definitions are indicated by an '*' after the def number.

The goal symbol is "<message>".

  { A message consists of a header followed by a
    NEWLINE followed by a message body. }
<message>::=<header> NEWLINE <message-body>             {Def 1.1}
  { A message body is just unbounded text. }
<message-body>::=NULL | (TEXT NEWLINE <message-body>)   {Def 1.2}
  { A header is more complicated:  There are a series of
    header line types. }
<header>::= NULL | (<header-line> NEWLINE <header>)     {Def 1.3} *
  { This syntax allows the possibility of a null header -
    this needs to be checked for by semantic routines,
    since it makes no sense. }
<header-line>::=<to>|<from>|<date>|<attributes>|
               <cost>|<subject>|<via>|<origin>|
               <area>|<seen-by>|<path>|<message-id>|
               <processed-by>|<other-header-line>      {Def 1.4} *
  { Now for the header line formats themselves.
    Some notes:  certain header lines are required (<to>,
    <from>, and <date>), and some can only occur once (<to>,
    <from>, <date>, and <subject>). Except for these
    restrictions, most header lines can either be omitted or can
    occur more than once. }
<to>::='To: ' <address>                                 {Def 1.5}
<from>::='From: ' <address>                             {Def 1.6}
<address>::=<user> OPTWHITESPACE '@' OPTWHITESPACE
           <nodeid>                                    {Def 1.7}
<user>::= ALPHA <user1>                                 {Def 1.8}
<user1>::= NULL | ((ALPHA | DIGIT | WHITESPACE) <user1>){Def 1.9}
       { Note: this is the full blown FidoNet node address -
         includes optional zone and point numbers.
         It does not include the "domain". I am not sure
         about this - I think more discussion on the whole
         idea of "domains" and "zones" is needed.  My feeling
         is we should look into a symbolic addressing system,
         similar to what the Internet uses. }
<nodeid>::= ((<digits> ':') | NULL) {zone}
           <digits> '/' <digits>   {basic net/node}
           (('.' <digits>) | NULL) {point}             {Def 1.10}
<digits>::= DIGIT | (DIGIT <digits>)                    {Def 1.11}
<date>::='Date: ' <date-string>                         {Def 1.12}
       { Here it is: my idea for a *standard* date string }
 { day-of-week month date, year hour:minute AM/PM time-zone }
 { Although not specified, hours and minutes are zero padded to
   two digits.  The date and year are not padded at all.}
<date-string>::= <day> ' ' <month> ' ' <digits> ', '
                <digits> ' ' <digits> ':' <digits>
                <am-pm> ((' ' <time-zone>) | NULL)     {Def 1.13}
<day>::= 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' |
        'Sat' | 'Sun'                                  {Def 1.14}
<month>::= 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' |
          'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' |
          'Nov' | 'Dec'                                {Def 1.15}
       { If the AM/PM indicator is missing (null), the hours
         field is assumed to be in 24-hour format (i.e. 00 to 23) }
<am-pm>::= 'AM' | 'PM' | NULL                           {Def 1.16}
       { This field is optional.  It makes sense given that
         FidoNet(r) is international. }
<time-zone>::= ALPHA | (ALPHA <time-zone>)              {Def 1.17}
<subject>::=('Subject: ' | 'Subject (Private): ') TEXT  {Def 1.18}
<cost>::='Cost: ' <money-sign> <digits>
        (('.' <digits>) | NULL)                        {Def 1.19}
       { This is tricky, given the international nature
         of FidoNet(r).  I guess it isn't critical. }
<money-sign>::= '$' | '#' | NULL                        {Def 1.20}
<via>::= 'Via: ' <nodeid> ', ' <date-string> <program>  {Def 1.21}
<program>::= NULL | (' ' TEXT)                          {Def 1.22}
<processed-by>::= 'Processed-by: ' TEXT                 {Def 1.22.1} *
       { This replaces the 'tear' line. }
<origin>::= 'Origin: ' TEXT '(' <nodeid> ')'            {Def 1.23} *
<area>::= 'Area: ' <areaname>                           {Def 1.24}
       { I'm leaving the question of all caps for the
         area name open:  other than ease of comparison,
         is it necessary to be all caps? }
<areaname>::= ALPHA | (ALPHA <areaname>)                {Def 1.25}
<seen-by>::= 'Seen-By: ' <node-list>                    {Def 1.26} *
<node-list>::= <nodelist-nodeid> |
                 (<nodelist-nodeid> <node-list>)       {Def 1.27} *
<nodelist-nodeid> ::= ((<digits> ':') | NULL)
                 ((<digits> '/') | NULL)
                 (<digits> | NULL)
                 (('.' <digits>) | NULL)               {Def 1.28} *
       { This is also open-ended.  Should there be a
         standard format for this?
         The syntax here is somewhat ambiguous - it
         allows for certain bogus forms.  It needs semantic
         routines to handle these forms (raise an error
         or whatever).  Writing the grammar to avoid these
         problems would add complexity not needed at this
         level. }
<path>::= 'Path: ' <node-list>                          {Def 1.28.1} *
<message-id>::= 'Message-id: ' <nodeid> ' ' <serialnum> {Def 1.29} *
       { This is the syntax proposed by Jim Nutt }
<serialnum>::= {8 hex digits}                           {Def 1.29.1} *
       { I've left out a proper grammar rule or token
         for a hexadecimal number. }
<attributes>::= 'Attributes: ' <attrlist>               {Def 1.30}
<attrlist>::=<attribute> | (<attribute> ', ' <attrlist>){Def 1.31}
       { This is probably not complete, but...}
<attribute>::='Kill Sent' | 'File Attached' | 'File Request' |
             'Sent' | 'Crash' | 'Audit'                {Def 1.32}
       { Maybe we should forget about an 'Attributes: '
         header tag and instead have a collection of
         additional header tags to handle each possible
         attribute - i.e. 'File-Attached: <filename>',
         'File-Request: <filename> <update-info>',
         'Sent: <date-sent>', etc. header lines. }
<other-header-line>::=<tagname> ': ' (TEXT | NULL)      {Def 1.33}
       { This is the expansion hook. }
<tagname>::= ALPHA <tagname1>                           {Def 1.34}
<tagname1>::=NULL | ((ALPHA | WHITESPACE | DIGIT |
            <pun>) <tagname1>)                         {Def 1.35}
       { This is also open-ended.  Restriction: colon (:)
         cannot be allowed! }
<pun>::='(' | ')' | '.' | ',' | ';'                     {Def 1.36}
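
To show how the "semantic routines" mentioned above might sit on
top of this grammar, here is a minimal sketch in Python.  It is an
illustration, not the author's implementation.  It assumes NEWLINE
is a single LF, and the names parse_message, check_headers and
parse_address are hypothetical.  It splits header from body (Defs
1.1 to 1.3), checks the required and once-only header lines, and
breaks an <address> into user and node parts (Defs 1.7 and 1.10).

```python
import re

# Def 1.10: optional zone, net/node, optional point.
NODEID = re.compile(r"(?:([0-9]+):)?([0-9]+)/([0-9]+)(?:\.([0-9]+))?")

REQUIRED = ("To", "From", "Date")               # must be present
ONCE_ONLY = ("To", "From", "Date", "Subject")   # at most one of each


def parse_message(text):
    """Split a stored message into ([(tag, value), ...], body)."""
    lines = text.split("\n")
    headers = []
    i = 0
    while i < len(lines) and lines[i] != "":
        tag, sep, value = lines[i].partition(": ")
        if not sep:
            raise ValueError("not a header line: %r" % lines[i])
        headers.append((tag, value))
        i += 1
    if i >= len(lines) - 1:
        raise ValueError("no NEWLINE between header and body")
    return headers, "\n".join(lines[i + 1:])


def check_headers(headers):
    """Return a list of problems the grammar alone cannot catch."""
    tags = ["Subject" if t == "Subject (Private)" else t
            for t, _ in headers]
    problems = ["missing " + t for t in REQUIRED if t not in tags]
    problems += ["duplicate " + t for t in ONCE_ONLY if tags.count(t) > 1]
    return problems


def parse_address(value):
    """Split '<user> @ <nodeid>' into (user, (zone, net, node, point))."""
    user, sep, node = value.rpartition("@")
    m = NODEID.fullmatch(node.strip(" \t"))
    if not sep or m is None:
        raise ValueError("bad address: %r" % value)
    parts = tuple(int(g) if g is not None else None for g in m.groups())
    return user.rstrip(" \t"), parts
```

A missing zone or point comes back as None rather than a default
value, since this document does not say what the default should be.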

4. Packed Message Format.
=========================

A packed message is simply a regular message preceded by some binary
header (i.e. "envelope") info and followed by a NUL (00H) byte after
the message text:

      Offset
     dec hex
             .-----------------------------------------------.
       0   0 |    0     |     3      |    0      |    0      |
             +-----------------------+-----------------------+
       2   2 | origZone (low order)  | origZone (high order) |
             +-----------------------+-----------------------+
       4   4 | origNet  (low order)  | origNet  (high order) |
             +-----------------------+-----------------------+
       6   6 | origNode (low order)  | origNode (high order) |
             +-----------------------+-----------------------+
       8   8 | origPoint (low order) | origPoint (high order)|
             +-----------------------+-----------------------+
      10   A | destZone (low order)  | destZone (high order) |
             +-----------------------+-----------------------+
      12   C | destNet  (low order)  | destNet  (high order) |
             +-----------------------+-----------------------+
      14   E | destNode (low order)  | destNode (high order) |
             +-----------------------+-----------------------+
      16  10 | destPoint (low order) | destPoint (high order)|
             +-----------------------+-----------------------+
      18  12 | Attribute (low order) | Attribute (high order)|
             +-----------------------+-----------------------+
      20  14 |      message text (includes ASCII header)     |
             ~                    unbounded                  ~
             |                 null terminated               |
             `-----------------------------------------------'
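
The layout above can be read and written with a few lines of code.
The following Python sketch is only an illustration.  It assumes
the first row of the diagram is a 16-bit message type with value 3
stored low byte first (that reading of the diagram is mine), and
the names FIELDS, pack_message and unpack_message are hypothetical.

```python
import struct

# Ten little-endian 16-bit words, 20 bytes in all (offsets 0 thru 19).
HEADER = struct.Struct("<10H")
FIELDS = ("type",
          "origZone", "origNet", "origNode", "origPoint",
          "destZone", "destNet", "destNode", "destPoint",
          "Attribute")


def pack_message(fields, text):
    """Build binary header + ASCII message text + terminating NUL."""
    words = [fields[name] for name in FIELDS]
    return HEADER.pack(*words) + text.encode("ascii") + b"\x00"


def unpack_message(data, offset=0):
    """Return (fields, text, offset of the byte after the NUL)."""
    words = HEADER.unpack_from(data, offset)
    start = offset + HEADER.size
    end = data.index(b"\x00", start)      # ValueError if unterminated
    text = data[start:end].decode("ascii")
    return dict(zip(FIELDS, words)), text, end + 1
```

The returned offset lets a caller step through several packed
messages laid end to end in one buffer.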

Some notes:  I've included both the Zone and Point addresses in
the packed message headers.  This does not really affect things
like routing and point mapping.  The packets themselves have
addressing info in their headers (as described in FSC001).  The
addressing in the packet header is used by the transmitting
programs.  The internal addressing info is processed by re-packing
programs - that is, programs which peel off routed messages
(messages that are "just passing through") and re-packet them for
later re-transmission to another node during a future mail event.
Messages destined for the current node (one whose address exactly
matches all four destination address words) get extracted from the
packet and stored in the message base.  Note that only the ASCII
message text is stored.  The binary header is discarded at this
point.
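
The store-or-forward decision described above might look like the
following Python sketch.  It is an illustration under assumptions:
each message is taken to be a (header, text) pair where header is a
mapping with the four destination words from the layout in section
4, and the function name sort_messages is hypothetical.

```python
def sort_messages(messages, here):
    """Separate local messages from ones just passing through.

    messages is a list of (header, text) pairs; here is the current
    node's (zone, net, node, point) address.
    """
    stored, forwarded = [], []
    for header, text in messages:
        dest = (header["destZone"], header["destNet"],
                header["destNode"], header["destPoint"])
        if dest == here:
            # All four words match: keep only the ASCII text.
            stored.append(text)
        else:
            # Routed message: keep the header for re-packing.
            forwarded.append((header, text))
    return stored, forwarded
```

Comparing all four words at once means a message for point 1 of a
node is not mistaken for a message to the node itself.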

5. Conclusions.
===============

It is my belief that FidoNet(r) is sooner or later going to
need some of the extendability provided by this sort of message
format.  In fact it already needs some of these fields, and has
been "faking it" for some time now: things like EchoMail
("Area: ", "Origin: ", "Seen-By: ", and "Path: " header lines), points
and zones (extra addressing hacks), uucp gatewaying (more extra
addressing hacks), and routing ("Via: " header lines).  Going to an
RFC822-style message format also helps to increase the variety of
BBS and conferencing software - this will help improve the "state
of the art" in this regard.  Also, using an RFC822-style message
format allows indefinite extensibility - as new ideas regarding
messages and conferencing develop, the message format can be
easily extended to handle them.

6. Contact Info.
================

Comments, suggestions, gripes, etc. can be sent to me at any of
these addresses:

       ARPANet:        [email protected]
       BITNET:         [email protected]
       Genie:          RHELLER
       BIX:            lockshill.bbs
       CompuServe:     71450,3432
       FidoNet:        Robert Heller @ 1:321/153.0
       USMail:         HC82 Box 29 LH1, Locks Hill Road, Wendell, MA 01379
       Voice Phone:    Home: 617-544-6933, Work: 413-545-0528
       Data Phone:     617-544-8337 at 300, 1200, or 2400 BAUD, 24 hours,
                       except during FidoNet(r) mail periods.

7. More Information.
====================

I have written a set of EchoMail processing programs using the message
format described in this document.  The code is in C and is freely
available for evaluation.  If you would like a copy, let me know and I
will get one to you.  I developed the code under OS-9/68000, but it
should port easily to other platforms.