Z-machine Common Save-File Format Standard.

Z-machine Common Save-File Format Standard.
also called Quetzal:
Quetzal Unifies Efficiently The Z-Machine Archive Language
version 1.4 (03-Nov-97)

- 1 - Conventions used within this document, and within the file

1.1 A 'byte' is an 8-bit unsigned quantity.

1.2 A 'word' is a 16-bit unsigned quantity.

1.3 Bitfields are represented as blocks of characters, with the first
character representing the most significant bit of the byte in
question. Multi-bit subfields are indicated by using the same character
multiple times, and values of 0 or 1 indicate that these bits are
always of the specified value. Therefore a bitfield described as
010abbcc cccdd111 would be a two-byte bitfield containing four
subfields, a, of 1 bit, b, 2 bits, c, 5 bits, and d, 2 bits, together
with a field 'hardwired' to 010 and one to 111.

1.4 All multi-byte numbers are stored in big-endian form: most significant
byte first, then in strictly descending order of significance.

1.5 The reader is assumed to already be familiar with the Z-machine;
in particular its instruction set, memory map and stack conventions.

- 2 - Overall structure

2.1 For the purposes of flexibility, the overall format will be a new IFF
type. A standard core is defined, and customised information can be
stored by specific interpreters in such a way that it can be easily
read by others. The FORM type is 'IFZS'.

2.2 Several chunks are defined within this document to appear in the IFZS
FORM.

'IFhd' 5.4
'CMem' 3.7
'UMem' 3.8
'Stks' 4.10
'IntD' 7.8

2.3 Several chunks may also appear by convention in any IFF FORM:

'AUTH' 7.2, 7.3
'(c) ' 7.2, 7.4
'ANNO' 7.2, 7.5

- 3 - Contents of dynamic memory

3.1 Since the contents of dynamic memory may be anything up to 65534 bytes,
it is desirable to have some form of compression available as an
option. Bryan Scattergood's port of ITF uses a method that is both
elegant and effective, and this is the method adopted.

3.2 The data is compressed by exclusive-oring the current contents of
dynamic memory with the original (from the original story file). The
result is then compressed with a simple run-length scheme: a non-zero
byte in the output represents the byte itself, but a zero byte is
followed by a length byte, and the pair represent a block of n+1 zero
bytes, where n is the value of the length byte.

3.3 It is not necessary to compress optimally, if to do so would be
difficult. For example, an interpreter that does not store the whole
of dynamic memory in physical memory may compress a single page at a
time, ignoring the possibility of a run crossing a page boundary;
this case can be encoded as two adjacent runs of bytes. It is
required, however, that interpreters read encoded data even if it does
not happen to be compressed to their particular page-boundary
preferences. This is not difficult, requiring merely the maintenance of
a small amount of state (namely the current run length, if any) across
page boundaries on a read.

3.4 If the decoded data is shorter than the length of dynamic memory, then
the missing section is assumed to be a run of zeroes (and hence equal
to the original contents of that part of dynamic memory). This permits
the removal of redundant runs at the end of the encoded block; again
it is not necessary to implement this on writes, but it must be
understood on reads.

3.5 Two error cases are possible on reads: the decoded data may be larger
than dynamic memory, and the encoded data may finish with an incomplete
run (a zero byte without a length byte). These should be dealt with in
whatever way seems appropriate to the interpreter writer.

3.6 Dissenting voices have suggested that compression is unnecessary in
today's world of cheap storage, and so the format also includes the
capability to dump the contents of dynamic memory without modification.
The ability to write such files is optional; the ability to read both
types is necessary. It is an error for this dump to be shorter or
longer than the expected length of dynamic memory.

3.7 The IFF chunk used to contain the compressed data has type 'CMem'.
Its format is as follows:

3.7.1 4 bytes 'CMem' chunk ID
3.7.2 4 bytes n chunk length
3.7.3 n bytes ... compressed data as above

3.8 The chunk used to contain the uncompressed data has type 'UMem'. It
has the format:

3.8.1 4 bytes 'UMem' chunk ID
3.8.2 4 bytes n chunk length
3.8.3 n bytes ... simple dump of dynamic memory

- 4 - Contents of stacks

4.1 One of the biggest differences between current interpreters is how they
handle the Z-machine's stacks. Conceptually, there are two, but many
interpreters store both in the same array. This format stores both in
the same IFF chunk, which has chunk ID 'Stks'.

4.2 The IFF format includes a length field on each chunk, so we can write
only the used portion of the stacks, to save space. The least recent
frames on the stacks are saved first, to ensure that the missing part
appears at the end of the data in the file.

4.3 Each frame has the format:

4.3.1 3 bytes ... return PC (byte address)
4.3.2 1 byte 000pvvvv flags
4.3.3 1 byte ... variable number to store result
4.3.4 1 byte 0gfedcba arguments supplied
4.3.5 1 word n number of words of evaluation
stack used by this call
4.3.6 v words ... local variables
4.3.7 n words ... evaluation stack for this call

4.4 The return PC is a byte offset from the start of the story file.

4.6 The p flag is set on calls made by CALL_xN (discard result), in which
case the variable number is meaningless (and should be written as a
zero).

4.7 Assigning each of the possible 7 supplied arguments a letter a-g in
order, each bit is set if its respective argument is supplied. The
evaluation stack count allows the reconstruction of the chain of frame
pointers for all possible stack models. Words on the evaluation stack
are also stored least recent first.

4.8 Although some interpreters may impose an arbitrary limit on the size of
the stacks (such as ZIP's 1024-word total stack size), others may not,
or may set larger limits. This means that the size of a stack dump may
be larger than will fit. If you cannot dynamically resize your stack
you must trap this as an error.

4.9 The stack pointer itself is not stored anywhere in the save file,
except implicitly, as the top frame on the stack will be the last
saved.

4.10 The chunk itself is simply a sequence of frames as above:

4.10.1 4 bytes 'Stks' chunk ID
4.10.2 4 bytes n chunk length
4.10.3 n bytes ... frames (oldest first)

4.11 In Z-machine versions other than V6 execution starts at an address
rather than at a routine, and therefore data can be pushed on the
evaluation stack without anything being on the call stack. Therefore,
in all versions other than V6 a dummy stack frame must be stored as
the first in the file (the oldest chunk).

4.11.1 The dummy frame has all fields set to zero except n, the amount
of evaluation stack used. Note that this may also be zero if the
game does not use any evaluation stack at the top level.

4.11.2 This frame must be written even if no evaluation stack is used at
the top level, and therefore interpreters may assume its presence on
savefiles for V1-5 and V7-8 games.

- 5 - Associated Story File

5.1 We now come to one of the most difficult (yet most important) parts of
the format: how to find the story file associated with this save file,
or the related (but easier) problem of checking whether a given save
file belongs to a given story.

5.2 Considering the easier second problem first, the actual name of the
story file is often not much use. Firstly, filenames are highly
dependent on the operating system in use, and secondly, many original
Infocom story files were called simply 'story.data' or similar.

5.3 The method most existing interpreters use is to compare the variables
at offsets $2, $12, and $1C in the header (that is, the release number,
the serial number and the checksum), and refuse to load if they differ.
These variables are duplicated in the file (since the header will be
compressed with the rest of dynamic memory).

5.4 This data will be stored in a chunk of type 'IFhd'. This chunk must
come before the [CU]Mem and Stks chunks to save interpreters the
trouble of decoding these only to find that the wrong story file is
loaded. The format is:

5.4.1 4 bytes 'IFhd' chunk ID
5.4.2 4 bytes 13 chunk length
5.4.3 1 word ... release number ($2 in header)
5.4.4 6 bytes ... serial number ($12 in header)
5.4.5 1 word ... checksum ($1C in header)
5.4.6 3 bytes ... PC (see 5.8)

5.5 If the save file belongs to an old game that does not have a checksum,
it should be calculated in the normal way from the original story file
when saving. It is possible that a future version of this format may
have a larger IFhd chunk, but the first 13 bytes will always contain
this data, and if the other chunks described herein are present they
will be guaranteed to contain the data specified.

5.6 The first problem (of trying to find a story file given only a save
file) cannot really be solved in an operating-system independent
manner, and so there is provision for OS-dependent chunks to handle
this.

5.7 It should be noted that the current state of the IFhd chunk means
it has odd length (13 bytes). It should, of course, be written with
a pad byte (as mentioned in 8.4.1).

5.8 The value of the PC saved in the chunk depends on the version of the
Z-machine which the story runs on.

5.8.1 On Z-machine versions 3 and below, the SAVE instruction takes a
branch depending on the success of the save. The saved PC points to
the one or two bytes which describe this branch.

5.8.2 On versions 4 and above, the SAVE instruction stores a value
depending on the success of the save. The saved PC points to the single
byte describing where to store the result.

5.8.3 This behaviour differs from that specified by previous versions of this
standard, but the behaviour expected there would be difficult to
implement in existing interpreters. The situation has been complicated
as the patches available for the Zip interpreter did not correctly
implement the previous standard; instead, they behaved as specified
here.

- 6 - Miscellaneous

6.1 It must be specified exactly what the magic cookie returned by CATCH
is, since this value can be stored in any random variable, on the
evaluation stack, or indeed anywhere in memory.

6.2 For greatest independence of internal interpreter implementation, CATCH
is hereby specified to return the number of frames currently on the
system stack. This makes THROW slightly inefficient on many
interpreters (a current frame count can be maintained internally to
avoid problems with CATCH), but this is unavoidable without using two
stacks and a fixed-size activation record (always 15 local variables).
Since most applications of CATCH/THROW do not unwind enormous depths,
(and they are somewhat infrequent), this should not be too much of a
problem.

6.3 The numbers of pictures and sounds do not need specification, since
they are requested by number by the story file itself.

- 7 - Extensions to the Format

7.1 One of the advantages of the IFF standard is that extra chunks can be
added to the format to extend it in various ways. For example, there
are three standard chunk types defined, namely 'AUTH', '(c) ', and
'ANNO'.

7.2 'AUTH', '(c) ', and 'ANNO' chunks all contain simple ASCII text
(all characters in the range 0x20 to 0x7E).

7.2.1 The only indication of the length of this text is the chunk length
(there is no zero byte termination as in C, for example).

7.2.2 The IFF standard suggests a maximum of 256 characters in this text
as it may be displayed to the user upon reading, although it could
get longer if required.

7.3 The 'AUTH' chunk, if present, contains the name of the author or
creator of the file. This could be a login name on multi-user systems,
for example. There should only be one such chunk per file.

7.4 The '(c) ' chunk contains the copyright message (date and holder,
without the actual copyright symbol). This is unlikely to be useful on
save files. There should only be one such chunk per file.

7.5 The 'ANNO' chunk contains any textual annotation that the user or
writing program sees fit to include. For save files, interpreters
could prompt the user for an annotation when saving, and could write
an ANNO with the score and time for V3 games, or a chunk containing
the name/version of the interpreter saving it, and many other things.

7.6 The 'ANNO', '(c) ' and 'AUTH' chunks are all user-level information.
Interpreters must not rely on the presence or absence of these chunks,
and should not store any internal magic that would not make sense to
a user in them.

7.7 These chunks should be either ignored or (optionally) displayed to
the user. '(c) ' chunks should be prefixed with a copyright symbol
if displayed.

7.8 The save-file may contain interpreter-dependent information. This is
stored in an 'IntD' chunk, which has format:

7.8.1 4 bytes 'IntD' chunk ID
7.8.2 4 bytes n chunk length
7.8.3 4 bytes ... operating system ID
7.8.4 1 byte 000000sc flags
7.8.5 1 byte ... contents ID
7.8.6 2 bytes 0 reserved
7.8.7 4 bytes ... interpreter ID
7.8.8 n-12 bytes ... data

7.9 The operating system and interpreter IDs are normal IFF 4-character
IDs in form. Please register IDs used with me <[email protected]>, so
this can be managed sensibly. They can then be added to future
versions of this specification, and contents IDs can be assigned.

7.10 If the s flag is set, then the contents are only meaningful on the
same machine/network on which they were saved. This covers filenames
and similar things. How to handle checking if this is indeed the same
machine is an open question, and beyond the scope of this document.
It is certainly true, however, that if the operating system ID does
not match the current system and this bit is set, then the chunk
should not be copied.

7.11 If the c flag is set, the contents should not be copied when loading
and saving a game--they are only relevant to the exact current
state of play as stored in the file. The data need not be copied
even if this flag is clear, but must not be copied if it is set.

7.12 If the interpreter ID is ' ' (four spaces), then the chunk contains
information useful to *all* interpreters running on a particular
system. This can store a magical OS-dependent reference to the original
story file, which need not worry about vagaries of filename handling on
more than one system. This chunk may contain anything that can be put
in a file and retrieved intact. If the file is restored on a suitable
system this can be used to do Good Things.

7.13 If the operating-system ID is ' ', then the chunk contains data
useful to *all* ports of a particular interpreter. This may or may
not be useful.

7.14 The interpreter and operating-system IDs may not both be ' '.
This should not be neccessary.

7.15 If neither ID is ' ', the contents are meaningful only to a
particular port of a particular interpreter. Save-file specific
preferences probably fall into this category.

7.16 The contents ID will be defined when chunk IDs are picked. Its
purpose is to allow multiple chunks to be written containing
different data, which is necessary if they need different settings
of the c and s flags.

7.17 These extensions add no overhead to interpreters which choose not to
handle them, except for larger save files and more chunks to skip
when reading files written on another program. Interpreters are not
expected to preserve these optional chunks when files are re-saved,
although some may be copied, at the option of the interpreter writer
or user.

7.18 The only required chunks are 'IFhd', either 'CMem' or 'UMem', and
'Stks'. The total overhead to a save file is 12 bytes plus 8 for each
chunk; in the minimal case ('IFhd', '[CU]Mem', 'Stks' = 3 chunks), this
comes to 36 bytes.

7.19 The following operating system IDs have been registered:

7.19.1 'DOS ' MS-DOS (also PC-DOS, DR-DOS)
7.19.2 'MACS' Macintosh
7.19.3 'UNIX' Generic UNIX

7.20 The following interpreter IDs have been registered:

7.20.1 'JZIP' JZIP, the enhanced ZIP by John Holder

7.21 The following extension chunks have been registered to date:

System ID Interp ID Content ID Section
7.21.1 'MACS' ' ' 0 7.22

7.22 The following chunk has been registered for MacOS, to enable a
Macintosh interpreter to find a story file given a save file using
the System 7 ResolveAlias call. The MacOS alias record can be of
variable size: the actual size can be calculated from the chunk size.
Aliases are valid only on the same network as they were saved.

7.22.1 4 bytes 'IntD' chunk ID
7.22.2 4 bytes n chunk length (variable)
7.22.3 4 bytes 'MACS' operating system ID: MacOS
7.22.4 1 byte 00000010 flags (s set; c clear)
7.22.5 1 byte 0 contents ID
7.22.6 2 bytes 0 reserved
7.22.7 4 bytes ' ' interpreter ID: any
7.22.8 n-12 bytes ... MacOS alias record referencing
the story file; from NewAlias

- 8 - Introduction to the IFF format.

8.1 This is based on the official IFF standards document, which is rather
long and contains much that is irrelevant to the task in hand. Feel
free to mail me if there are errors, inconsistencies, or omissions.
For the inquisitive, a document containing much of the original
standard, including the philosophy behind the structure, can be found
at http://www.cica.indiana.edu/graphics/image_specs/ilbm.format.txt

8.2 IFF stands for "Interchange File Format", and was developed by a
committee consisting of people from Commodore-Amiga, Electronic Arts
and Apple. It draws strongly on the Macintosh's concept of resources.

8.3 The most fundamental concept in an IFF file is that of a chunk.
8.3.1 A chunk starts with an ID and a length.
8.3.2 The ID is the concatenation of four ASCII characters in the range 0x20
to 0x7E.
8.3.3 If spaces are present, they must be the last characters (there
must be no printing characters after a space).
8.3.4 IDs are compared using a simple 32-bit equality test - note that this
implies case sensitivity.
8.3.5 The length is a 32-bit unsigned integer, stored in big-endian format
(most significant byte, then second most, and so on).

8.4 After the ID and length, there follow (length) bytes of data.
8.4.1 If length is odd, these are followed by a single zero byte. This byte
is *not* included in the chunk length, but it is very important, as
otherwise many 68000-based readers will crash.

8.5 A simple IFF file (such as the ones we will be considering) consists of
a *single* chunk of type 'FORM'.
8.5.1 The contents of a FORM chunk start with another 4-character ID.
8.5.2 This ID is also the concatenation of four characters, but these
characters may only be uppercase letters and trailing spaces. This is
to allow the FORM sub-ID to be used as a filename extension.

8.6 After the sub-ID comes a concatenation of chunks. The interpretation of
these chunks depends on the FORM sub-ID (in this proposal, the sub-ID
is 'IFZS'), except that a few chunk types always have the same meaning
(notably the 'AUTH', '(c) ' and 'ANNO' chunks described in section 7).
For reference, the other reserved types are: 'FOR[M1-9]', 'CAT[ 1-9]',
'LIS[T1-9]', 'TEXT', and ' ' (that is, four spaces).

8.7 Each of these chunks may contain as much data as required, in whatever
format is required.

8.8 Multiple chunks with the same ID may appear; the interpretation of such
chunks depends on the chunk. For example, multiple ANNO chunks are
acceptable, and simply refer to multiple annotations. If more than one
chunk of a certain type is found, when the reader was only expecting
one, (for example, two 'IFhd' chunks), the later chunks should simply
be ignored (hopefully with a warning to the user).

8.9 Indeed, skipping is the expected procedure for dealing with any unknown
or unexpected chunk.

8.10 Certain chunks may be compulsory if the FORM is meaningless without
them. In this case the 'IFhd', '[CU]Mem' and 'Stks' are compulsory.

- 9 - Resources available

9.1 A set of patches exists for the Zip and Frotz interpreters, adding
Quetzal support. They can be obtained from:

http://www.geocities.com/SiliconValley/Vista/6631/

9.2 A utility, 'ckifzs' is available as C source code to check the
validity of generated save files. A small set of correct Quetzal
files are also available. These may be of use in debugging an
interpreter supporting Quetzal. These may be obtained from the
web page mentioned in 9.1.

9.3 This document is updated whenever errors are noticed or new extension
chunks are registered. The latest text version will always be available
from the above web page. The latest revision designated stable
(currently version 1.3) will be in the the IF archive, ftp.gmd.de,
in the directory /if-archive/infocom/interpreters/specification/.

9.4 This document is itself available in a number of forms. The base
version is this text version, but there is also a PDF version
(converted by John Holder) and an HTML version (converted by Graham
Nelson). Links to all of these may be found on the web page.

9.5 A few interpreters support Quetzal; details will appear here as
they become available.

- 10 - Credits.

10.1 This standard was created by Martin Frost <[email protected]>. Comments
and suggestions are always welcome (and any errors in this document
are entirely my own).

10.2 The following people have contributed with ideas and criticism
(alphabetic order):

King Dale <[email protected]>
Marnix Klooster <[email protected]>
Graham Nelson <[email protected]>
Andrew Plotkin <[email protected]>
Matthew T. Russotto <[email protected]>
Bryan Scattergood <[email protected]>
Miron Schmidt <[email protected]>
Colin Turnbull <[email protected]>
John Wood <[email protected]>