Second Generation Adventure Games

Second Generation Adventure Games
Reprinted from
The Journal of Computer Game Design
Volume 1, number 2, (August 1987): pages 4-7
Copyright 1987 by David Graves
[email protected]

Expanding the horizons of the traditional adventure game format has created a
new genre in literary entertainment: the participant novel. In a
participant novel, the player is more a character in a plot than a solver of
puzzles. Implementing a participant novel demands special computer modeling
techniques. As the program increases in complexity, use of the proper
representation of the novel's simulated environment becomes critical.
Without a strong model of the world and its physical laws, and without a
dynamic interface to this world, the developer is doomed to producing a
limited and unstimulating game. The field of Artificial Intelligence
provides programming constructs useful for creating these games of greater
complexity and sophistication. This paper will present a few ideas borrowed
from the field of AI: use of lists and trees, attributes, natural language
parsing, English generation, semantic frames, and goal-oriented routines, and
show how they support the complex model that a participant novel requires.
All of the ideas presented have been successfully applied to a working game
system.

LISTS AND TREES

Given that a participant novel will be proliferated with a great many people,
places, and things (hereafter called "objects"), there must be a dynamic
method for tracking and altering the position and relationships of each
physical object. One can create a model of the world as lists of containers:
cities contain houses, houses contain rooms, rooms contain things, and things
contain other things. The traditional method for representing complex
relationships of objects is through lists of lists, also called trees. Under
this arrangement, every object has pointers to its parent (the thing that
contains it), its children (the things it contains), and its siblings (the
things in the same place with it). This can be coded as three arrays:
parent, child, and sibling. Each slot in the array contains either an
object's identifying number or "nil" (a null pointer). An object that is
empty would have nil in its child slot, and an object that is alone in a
container would have nil in its sibling slot. (Object numbers would not be
declared explicitly, of course; they would be assigned to each object's
symbol name by a compiler).

Objects containing several other objects are represented using the child and
sibling arrays together. The container points to the first child and the
first child points to his sibling, who points to the next sibling. The end
of the list is indicated with a nil sibling pointer. For example, a chest
containing a lamp and a rope would have the object number of the lamp in its
child slot, and the object number of the rope would be in the lamp's sibling
slot. The lamp's sibling slot would be nil. Both the lamp and rope would
have the object number of the chest in their parent slots, and nil in their
child slots.

The sibling and child links define a formal tree. The links from a child
back up to its parent result in a directed graph rather than a tree proper.
The inclusion of the parent links allow easier traversal of the tree in
either direction.

Representing the world in this way makes it very easy to manipulate groups of
objects. Moving a chest to new location involves changing only the pointers
of the old and new locations and the chest. The contents of the chest never
need be considered. No matter how large the object tree becomes, moving any
object and all of its contents can be done by changing only a few pointers.
Since people, places and things are all represented in the same type of data
structure, only one routine is required for all types of object movement.

It is important that the constructs used in the model be simple and
consistent. The model should be general enough that special constructs will
not be required each time the programmer thinks of a new type of object to
implement. For example, supporting surfaces (such as tables and chairs) can
be implemented using the same parent, child, sibling scheme. The objects
resting on the table are its children. The internal representation of the
table and its children is identical to the container representation, so
supporting surfaces are easily added to the model.

ATTRIBUTES OF OBJECTS

All objects have characteristics or "attributes" that define that object's
limitations. Some attributes are boolean (stating whether this object is
flammable or immovable for example) and some may have a numeric value
associated with them, such as the weight and volume of an object. Some
attributes are unchangeable (such as flammability and mass) and some are
variable, describing only the current state of an object (i.e. closed,
locked, and burning).

Using attributes to define how an object may be manipulated within the model
world allows the programmer to define a much more consistent and dynamic
world. Consider the case where the programmer creates a fireplace and some
logs. He adds some special case logic to his program that will support
burning the logs with the expectation that the player will attempt to burn
the logs in the fireplace. However, if the player tries burning a chair in
the fireplace, nothing happens because the luckless programmer didn't think
of the player trying that.

This mess can be avoided by building the game's verb routines around a system
of attributes. By defining the attribute "flammable" and making the Burn
verb check for the presence of this attribute, the programmer allows the
player to burn down anything flammable, such as furniture, papers, doors, and
fireplace logs. Thus, he defines the generalized phenomenon of "burning"
rather than coding individual cases.

Once the programmer has decided on the set of attributes he wishes to
support, he need only write individual action routines that manipulate those
attributes, then decide which attributes apply to each of his objects. Thus,
some whiskey could be defined as having the attributes liquid, edible, and
flammable. Groups of attributes can work together in subtle and wonderful
ways, allowing the player to perform activities in ways not even considered
by the programmer. Would you think of putting moonshine whiskey in a lantern
if you had no kerosene?

RECOGNIZING NATURAL LANGUAGE

Understanding natural language has proven to be a difficult problem in AI,
but much can be accomplished in a participant novel by recognizing relatively
simple statements in the following syntactic form:

[Subject] VerbClause [DirectObject] [Preposition IndirectObject]

Optional sections of the syntax are enclosed in brackets. Each word in a
command typed by the player is looked up by a dictionary routine and is
identified as a verb, noun, adjective, or whatever. The most simple command
or "statement of intent" is simply a verb. Instructions to another person
are indicated by using that person's name as the subject of the sentence, as
in "Sam, run!". Omitting the subject of the sentence implies that the player
himself is the actor. Direct and indirect objects are noun clauses, which
consist of some optional adjectives followed by a noun. Prepositions like
"at," "in," and "on" separate a direct object from an indirect object.
Conjunctions (such as "and"), punctuation, and articles (such as "a" and
"the") can be tossed out by the parser as "noise", or verified to appear in
the expected positions.

This syntax will accept input strings such as "Put the gray stone into the
large oak chest". By further extending the parser to allow a list of objects
as a direct object and allowing sentences to be strung together, the program
can recognize statements like "Open the chest, take the lamp, the knife, and
the long rope, then climb the stairs". Since everyone has a slightly
different way of saying what they mean, the program must have a very large
dictionary with links between synonyms. Thus, "Set the lamp on the table"
and "Place the lantern onto the table" could be recognized as having the same
meaning.

Travel is something that happens frequently in this type of game, so it
should be very easy for the player to express what is desired. The parsing
scheme above will recognize travel syntax such as "Climb the tree" or "Enter
the cottage," but not "Go east" or its abbreviated forms, "East" and "e".
Recognition of these few irregular forms can be added to the parser as a
special case.

The parser should be able to recognize modifiers for verbs. The simple case
is the use of an adverb, as in "Carefully open the door". A more complex
case is where the player uses a verb and a preposition together in a way that
changes the meaning of the verb. For example "put down" means drop, "put
out" means extinguish, and "put on" means wear. This can be implemented by
creating a table with verb/preposition pairs followed by the resulting verb.
When a verb/preposition pair in the command string matches an entry in the
table, it can be replaced with the resulting verb. Parsing then continues as
if no preposition has been seen yet. This technique also allows recognizing
a command with a dangling preposition such as "Take the cloak off" which
would otherwise be rejected by the parser as an incomplete sentence.

We are all accustomed to using pronouns (such as "it") and expecting our
listener to resolve what we mean by reviewing the preceding context. A
natural language parser might be expected to do the same. A simple rule for
implementing parsing of "it" could be "Replace the 'it' with the last direct
object named". This allows proper parsing of "Put the lamp on the table and
light it," but not "Put the lamp into the chest, then close it". In the
second statement, the player probably meant to close the chest, not the lamp.
Changing the rule to use the last direct or indirect object in place of "it"
won't work in all cases either. The effect of this rule on the command "Put
the lamp on the table and light it" would be to set the table on fire. While
it is possible to define the semantics of "it" for each of these cases
through complex programming, it might make more sense to choose a simple
rule, such as "use the last direct object in place of 'it' in all cases,"
then inform the user of the limitations of the parser through the game
documentation.

The parser could be expanded to allow words such as "everything" or "all" in
place of a direct object. This lets the player make statements like "Drop
everything into the chest". Note that the semantics of "everything" change
with the verb used. In "Drop everything," "everything" really means
"everything that I'm holding," while in "Get everything," it means
"everything that I'm not holding". For some verbs, such as "Examine," the
scope of the word "everything" is "all visible objects, whether held or not".
The player may also wish to set the scope of "everything" to a limited
domain, as in "Look at everything on the table". Regardless of how
"everything" is used in a sentence, it can be implemented by simply calling
the verb action routine once for every object within the the command's scope.

GENERATING ENGLISH

While interacting with a participant novel, the player is going to read vast
quantities of text. Some of this may be "canned text" and some will need to
be generated. The constant attributes of a location could be described with
fixed text, but certainly not the descriptions of the objects there. Since
most objects can be moved around, placed in and on each other, and change
state, the program must be capable of dynamically producing the text to
describe them. The key to generating interesting text is to vary the length
of the sentences generated, vary the sentence structure, and use different
synonyms whenever possible. For example, to describe the "children" of a
table the program might generate: "There is a knife, some bread, and a key
on the table," or "A knife, some bread, and a key are on the table," or
"Resting on the table are a knife, some bread, and a key". The description
generator would have a list of synonymous phrases such as "resting on,"
"lying on," and "sitting on".

A text generator must pay careful attention to the issue of grammar. When a
sentence contains a list of objects, the objects must be counted so that the
conjunction "and" can be inserted before the last object. Lists containing
more than one object will use "is" instead of "are" (except when using the
form "There is a <list>...."). Each object in a list should be followed by a
comma (except the last two) and preceded by a the appropriate article ("a,"
"the" or "some"). Selecting the article for an object depends on the
attributes of the object and the type of sentence. When introducing an
ordinary object, the article is "a" or "an". An object that has the
attribute of being "a quantity" (bread, wine, cheese) the article would be
"some". Familiar people would be mentioned by name, without any preceding
article. Objects that were mentioned recently would have the article "the".
The attributes of an object will also dictate the type of preposition used in
the generated text. For instance, describing the children of a surface (such
as a table) would dictate the use of "on," while containers would have things
"in" them.

The result of the player's actions can be reported via a text generation
facility as well. When the player enters a vague command such as "Drink,"
the program would give all the details of what happened, like "You drink some
cool water from the stream," thus informing the player that his canteen has
not been taxed. This reporting mechanism should make use of pronouns (like
"he," "she," or "it") when an object is mentioned more than once in a
sentence, to avoid generating text that sounds too mechanical. An example of
this would be "You give the bowling ball to Sam, but he drops it". As
before, the attributes of an object will aid in selecting the appropriate
pronoun.

FRAME-BASED EXCEPTIONS

Any non-trivial model of reality will contain a large number of objects whose
characteristics and behaviors differ from normal expectations. These
exceptions might range from personality quirks to magical or "high-
technology" physical properties that deviate from the norm. A computer model
of such a world needs a simple method to define, organize, and deal with all
these exceptions. One method is to attach to each exceptional object some
information about the context in which an exception applies, and the effect
of invoking that exceptional behavior. The effect is defined as a fragment
of executable code. The context is defined in a semantic frame. In AI, a
frame may be defined as "an instantiation of a context". In this scheme, a
frame is a simple template for a single semantic action using a specific verb
type and specific objects. Any object can point to a unique frame, which
points to a code fragment. This way, a frame defines the semantic context in
which an exception handling code fragment applies.

[Addendum, August, 1991: Note that in object-oriented languages, frames are
a built-in construct (usually called "selectors"). Thus an object-oriented
language makes it very easy to implement a base of general rules with
hierarchies of exceptional rules (methods)].

Assume we had modeled the mouth of a volcano as a large container. Normal
semantics for an object tossed into container dictate that the object will
come to rest inside the container. In this case, however, we want to have
the object be completely destroyed, thus eliminating any possibility of
getting it back. The volcano could then be defined with a frame that reads
"Throw anything into the volcano", to provide the context in which the
exception will be invoked. Whenever an actor throws any object into the
volcano, this frame is selected as matching the actor's action. After the
default processing for this action, the code fragment attached to the frame
is executed. The code fragment would cause the object to be destroyed, and
perhaps display a colorful description of the event.

Frames may specify a specific context (by requiring a specific verb, direct
object, and indirect object) or a more general context (with a range of
verbs, and "wild card" matching of direct or indirect objects). Moreover,
some frames must take precedence over others. Any action might match the
frames attached to the direct object, the indirect object, the location, or
the character performing the action. Since an action might invoke several
conflicting exceptions, the game program must select the one frame that is
the best match to this action. In most cases, "best match" means selecting
the matching frame with the most specific statement of action. (Other
criteria for precedence may be used as well, such as selecting the frame
attached to the direct object rather than the indirect object). Consider the
case of an explosive with a context frame stating "Throw the explosive into
the volcano". This frame is more specific than the volcano's "Throw anything
into the volcano", so the explosive's frame is selected. The code fragment
can then define some effect that is an exception to the "normal" exception.

An object may have any number of frames attached to it, and multiple frames
may point to the same code fragment. A code fragment may override the
default semantics for an action completely, or simply append additional
semantics. Finally, objects may be defined which inherit frames from other
objects, which speeds development and lowers the chance of coding error when
defining objects with similar exceptional properties. It is important to
note that the use of semantic frames is intended only for those actions or
properties that are exceptional, and should therefore be used as sparingly as
possible. Attaching the same exception frame to many objects can be an
indication that it is not so exceptional after all. In this case, one may
want to define a new object attribute for this "non-exceptional" exception,
and handle those objects via the normal semantics routines.

Incidentally, one can use a scheme of frames and code fragments to control
behavior for simulated actors in the story, but the results are very limited.
The actor becomes only a "stimulis-response" being; he responds only to
actions or commands from the player, and speaks only when spoken to. For
example, an actor named Sam might have a frame attached to him, stating
"Throw anything at Sam". If the player (or another actor) throws any object
at Sam, the attached code fragment is executed, which might put out a message
like: "Sam is outraged by the insult, and draws his weapon". Clearly, the
actor can only demonstrate those behaviors that were pre-programmed. Things
become much more interesting when actors are capable of original behavior.
Each computer-controlled actor should be able to recognize a need, then
create a plan to fulfill it. Different personality types (based on
attributes, once again) would give rise to different types of behavior.

GOALS AND SUB-GOALS

A standard feature in many adventure games is the requirement that the player
remember long sequences of actions in minute detail and type these sequences
when needed. Forgetting to perform one of the steps in a sequence results in
an error message. For example, if the player sees a bottle of beer on the
table and says "Drink the beer," he is likely to get an error message like
"You aren't holding the beer" or "The beer isn't open". Memorizing detailed
sequences of instructions is not what most people would call fun. Wouldn't
it be much nicer if the computer could just "understand" the player's intent?

It turns out that it is relatively easy to create a system where the
programmer defines a corrective action for such an error, instead of giving
an error message. By defining the prerequisite state attributes for any
action, the system can automatically break a goal into sub-goals. Thus, when
the Drink routine finds that the player is not holding the beer, a new sub-
goal is set: get the beer. Upon resolving that sub-goal, the Drink routine
is re-entered. It then checks to see if this container is open and if not,
it sets a new goal: open the container. In the end the player is told of
how all these events proceeded with a message like: "You take the beer from
the table, open it, and drink from it".

Filling in the implied prerequisite actions is a simple form of planning,
called backwards chaining. Any new goals must be stored on a stack rather
than in a static structure, since any goal can create new sub-goals, which
may create others. Each goal must be resolved in order, the latest goal
first. The game program simply attempts the action on the top of the stack,
pops it off if successful, or pushes new sub-goals if needed. It is
important to note that only errors concerning variable attributes are
correctable. Errors caused by conflicts in fixed attributes (such as
attempting to set fire to a stone) are not correctable by creating a new
sub-goal. If a non-correctable error occurs, the player is informed and his
goal stack is cleared.

Note that once all of the verb routine have all their prerequisite states
defined as sub-goals, it becomes very easy to simulate intelligent behavior
by other actors in the story. For example, if the player asks another
character to "Go out, get the rope and return," the character appears to make
intelligent decisions like opening the door before attempting to exit, and
untying the rope from a tree before attempting to walk away with it. The
pace of the game remains brisk since the player need not specify each
detailed step in a process. The software "understands" what is implied by
the player.

SUMMARY

By selecting a simple, yet powerful representation for the system, the
designer can create a much more dynamic, consistent model of the novel's
fantasy world. Building each verb routine around a set of attributes instead
of coding special cases allows the player to perform actions that were not
thought of by the designer. A well defined parsing scheme, aided by a large
dictionary, allows a player to form his commands many different ways, instead
of just the one "right way". Dynamic generation of text in a variety of
formats can make the computer generated text much more interesting for the
player to read. Defining exceptions to normal semantics via frame-activated
code fragments aids in managing the complexity of such exceptions. Routines
that can detect simple state errors and create correcting sub-goals can fill
in the implied actions, and thereby help the game keep pace. Of course, even
the best software needs a well written story to make an interesting
participant novel, but that a topic for another paper altogether.