Intelligent Narrative Technologies: Papers from the 2013 AIIDE Workshop (WS-13-21)
Evaluating the Use of Action Templates to
Create Suspense in Narrative Planning
Matthew William Fendt, David Roberts, R. Michael Young
Digital Games Research Center
Department of Computer Science
NC State University
Raleigh, NC 27695-8206
Abstract
The IRIS story generation system is a narrative planner that
is designed to create suspenseful stories at the story-action
level. It uses a set of action templates that are injected into
the planning process to create suspense around a given goal.
The action templates were evaluated independently of the rest of the IRIS system against a series of other techniques designed to create suspense. Results show that it is possible to use this set of action templates to create localized moments of suspense in a story at the story-action level without the need for discourse-level suspense.
Introduction
Figure 1: The IRIS story generation system. The part of the
system that is being evaluated in this paper is the set of suspense action templates. They are used in the story planning
process to produce plans whose execution will be suspenseful. The plans are used to create a story outline containing
suspense that can then be instantiated in different media.
In recent years, the narrative aspect of games has grown
in importance to game developers and players. Games like
Fallout: New Vegas and Mass Effect 3 emphasize the impact of player decisions on the outcome of the story. This emphasis on narrative provides an opportunity to apply computer science to improve the game author's ability to incorporate his or her story into the game. Researchers
seek to dynamically generate content, model player gaming
behavior, influence player agency and decision making, provide a tailored game experience, and more, all in the interest
of improving the narrative game experience for the player.
The IRIS story generation system (Fendt 2010), too, seeks
to improve the player’s narrative experience. One of the system’s components is a small set of action templates that are
introduced into the story plan creation process to create localized moments of story-action level suspense. Story-action
level refers to the selection and combination of the story actions themselves, independent of how they are told or the
medium in which they are presented. These action templates
were evaluated independently of the rest of the IRIS system in order to validate their effectiveness in creating suspense. The IRIS system will use these templates to produce a story outline that contains suspenseful action sequences. The output provides game designers with an alternative to writing all of the suspenseful story moments manually. Additionally, the outline nature of the IRIS output allows flexibility in its instantiation into the game world, giving a choice of how the suspense is realized in the game. An overview of the IRIS
system is shown in Figure 1.
Over the course of two experiments, four different suspense creation processes were compared: the IRIS action
templates, an operationalization of a case-based reasoning
system with narrative properties called MINSTREL (Turner
1994), a human author with creative writing experience, and
a logically consistent random baseline. Each process created
a set of suspenseful story fragments given the same domain
of a murder mystery and a grammar from which to produce
the story fragments. Participants evaluated pair-wise comparisons of the different processes with regard to suspense. We found that the fragments produced by the human author and by most of the IRIS templates, particularly those involving a disparity of knowledge between the audience and the protagonist, were significantly more suspenseful than the random story fragments. Additionally, the IRIS action templates did not perform significantly differently from the human author.
This validates the IRIS action templates as a viable means
of creating suspense at the story-action level.
Related Work
Suspense has been defined in multiple ways (Cheong and Young 2008; Rodell 1943), but the definition that
we will use is: “the feeling of excitement or anxiety that
audience members feel when they are waiting for something to happen and are uncertain about a significant outcome” (Cheong and Young 2008). Suspense can be created
many different ways. Some ways involve having the audience identify and empathize with the protagonist (Vorderer 1996; Zillmann 1996). Other ways to build suspense come
at the discourse, or narration, level, such as showing less
detail and allowing the audience's imagination to fill in the details (Wulff 1996), or having the audience anticipate failure,
even if the possibility of failure is actually low (Wulff 1996).
These ways of creating suspense are difficult to achieve at
the story-action level, however, since most of them involve
carefully crafted discourse. One way to create suspense at
the story-action level is to demonstrate that there are only a
few paths to a successful outcome, and then systematically cut off those paths (Cheong and Young 2008; Gerrig and Bernardo 1994). For example, the hero is trying to get off an
island with an erupting volcano. The hero tries to get to the
helicopter, but finds that someone flew off in it. Then, the
hero tries to cross the bridge, only to find that it has burnt
down. The dwindling options produce a feeling of suspense
in the reader because the hero's success in getting off the island is uncertain. This feeling of suspense is partially
influenced by the suspense intrinsic in the sequence of the
story actions themselves. Discourse can also be used to create suspense; however, in this paper we focus on the former.
This story-action level method for creating suspense is the
backbone of the action templates evaluated in this paper.
There have been relatively few systems that introduce suspense into story generation. Besides IRIS, two story generation systems that have included suspense are Suspenser and
MINSTREL. Suspenser (Cheong and Young 2008) is a planning system that generates suspenseful stories. It generates
suspense at the discourse level. Suspenser first uses a planner to produce the story events. It then iteratively increases
suspense by replacing and reordering story events. However,
since Suspenser generates suspense at the discourse level, it
may not be generating stories that are intrinsically suspenseful, but rather adding suspense to the stories after they have
been generated. Suspenser is also not guaranteed to generate high-suspense stories, since it can get stuck in a local maximum when adding suspense. The discourse-level suspense created by Suspenser is complementary to the story-action
level suspense our IRIS system creates. In future work, the
two approaches could be combined; however, that is beyond
the scope of this paper.
MINSTREL (Turner 1994) is a case-based reasoning system that creates stories that have, among other properties,
suspense. It creates suspense both at the story-action level
and the discourse level. MINSTREL has a rigid model that
determines when and how to create this suspense. The only
appropriate time to create suspense in its model is in a scene
that the author wants to emphasize, and when the character’s life is in danger. MINSTREL provides a framework for
when, when not, and how to create suspense around an outcome. Its weakness is the simplistic way in which each of
these criteria is determined. This is likely due to the fact
that suspense is just one of several narrative phenomena that
the system is trying to capture. Additionally, it only provides
this one formula to generate suspense.
IRIS (Fendt 2010) is a narrative planner that uses a Drama Manager and belief/desire/intention mental modeling with intention revision to produce suspenseful stories. This paper reports the results of two experiments on the suspense-producing components of the IRIS system and shows that a set of action templates is sufficient to create story-action level suspense in story fragment generation. IRIS produces local moments of suspense around certain goals at the story-action level. The long-term goal for the IRIS system is to use these templates to produce a story outline containing suspense. The story outline can then be adapted to many different discourses in media such as text-based and three-dimensional games.
Description of Suspenseful Action Templates
The IRIS suspense templates were created to cover the different ways suspense can arise within a specific definition of suspense. The templates were designed to create suspense at the story-action level. Within the story-action level, the IRIS system creates suspense through protagonist plan failure. The protagonist's plan will fail if one of the preconditions of the actions in the plan does not hold. There are three ways that a precondition can fail to hold.
The first way a precondition can fail to hold is an inconsistency between the protagonist’s beliefs and the actual state
of the world. This inconsistency can be unknown to the audience too, so that the audience is surprised by the plan failure,
or known to the audience, resulting in dramatic irony.
The second way a precondition can fail to hold is if the system deliberately modifies the state of the world to introduce plan failure. In this case, a precondition of the protagonist's plan was true up to a point, but was then reversed by the system. This is analogous to initial state revision in planning (Riedl and Young 2005). The modification is unknown to the protagonist and to the audience alike. It is kept secret from the audience because the change is a retroactive modification to the initial state of the world that seemingly has no perpetrator, so it would look unnatural to the audience if they saw it changed.
The third way a precondition can fail to hold is if an antagonistic agent takes action to thwart the precondition in the
protagonist’s plan. This agent could be another character or
nature. The action may or may not be observed by the protagonist. It is observed by the audience, because otherwise this case would be a subset of the first one, in which, unknown to the protagonist and the audience, the state of the world is inconsistent with the protagonist's beliefs.
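For concreteness, the following minimal sketch shows the precondition check that underlies all three failure modes, using a STRIPS-style state represented as a set of true propositions. The plan contents and names are illustrative assumptions of ours, not code from the IRIS implementation.

```python
# Minimal sketch of plan execution with precondition checking, assuming a
# STRIPS-style state (a set of true propositions). All names are illustrative.

def first_failure(plan, state):
    """Return (action, missing preconditions) for the first failing step, else None.

    A step fails exactly when a precondition is absent from the state, whether
    because the protagonist's beliefs were wrong, because the system revised
    the initial state, or because an antagonist's action removed the fact.
    """
    state = set(state)
    for action, preconditions, effects in plan:
        missing = preconditions - state
        if missing:
            return action, missing
        state |= effects  # apply add effects (delete effects omitted for brevity)
    return None

plan = [
    ("move(detective, bedroom)",
     {"at(detective, dining_room)"},            # preconditions
     {"at(detective, bedroom)"}),               # add effects
    ("arrest(detective, maid)",
     {"at(detective, bedroom)", "at(maid, bedroom)"},
     {"arrested(maid)"}),
]

# An antagonistic action (the third failure mode) has removed "at(maid, bedroom)":
print(first_failure(plan, {"at(detective, dining_room)"}))
# -> ('arrest(detective, maid)', {'at(maid, bedroom)'})
```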
The high-level Drama Manager actions shown in Figure 2 were translated into the suspense templates shown in Figure 3. This translation was made by enumerating the ways these actions could fail using the three types of plan failure described above. The suspense templates that were not rated as suspenseful in the experiment are indicated by an asterisk.
Successful Action, SA(a): A character successfully performs an action a.
Failed Action, FA(a): A character attempts action a and fails.
Conflicting Action, CA(a): A character successfully performs an action a that will interfere with the protagonist's plan.
Hidden Alteration, HA(a): A change to the world state a is made without the characters' or audience's knowledge.
True Observation, TO(a): A character makes a correct observation about some state of the world a.
True Utterance, TU(a): One character truthfully informs another character about some state of the world a.
Discourse Statement, DS(a): A statement a about the world that is made to the audience but not the characters.
Figure 2: The high-level actions in the IRIS system that can be combined to create suspense.
Cut Off Path
∗ 1) CA(a), TO(a)
∗ 2) FA(a), TO(a)
3) CA(a), withhold TO(a), FA(b), TO(b)
4) CA(a) (without character present), FA(b), TO(b)
Introduce New Info But Then Cut Off Path
∗ 5) TU(a), HA(a), FA(b), TO(b)
∗ 6) TO(a), HA(a), FA(b), TO(b)
7) TU(a), CA(b), withhold TO(b), FA(c), TO(c)
8) TO(a), CA(b), withhold TO(b), FA(c), TO(c)
9) TU(a), CA(b) (without character present), FA(c), TO(c)
10) TO(a), CA(b) (without character present), FA(c), TO(c)
Dramatic Irony
∗ 11) DS(a), ..., plan failure due to world condition in DS(a)
Figure 3: Action templates that combine the high-level actions from Figure 2 into suspenseful combinations of actions. The templates marked with an asterisk were not rated as suspenseful in the evaluation.
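To make the template notation concrete, here is one possible encoding of the Figure 2 actions and a Figure 3 template as data. This representation is our illustration of the notation, not code taken from the IRIS system.

```python
# One possible encoding of the high-level actions and a suspense template.
# The Step fields mirror the annotations in Figure 3; names are illustrative.
from typing import NamedTuple

class Step(NamedTuple):
    action: str            # SA, FA, CA, HA, TO, TU, or DS (see Figure 2)
    arg: str               # the action or world condition the step is about
    withheld: bool = False # "withhold TO(x)" in templates 3, 7, and 8
    absent: bool = False   # "without character present" in templates 4, 9, and 10

# Template 7: TU(a), CA(b), withhold TO(b), FA(c), TO(c)
TEMPLATE_7 = [
    Step("TU", "a"),                 # a character truthfully informs another
    Step("CA", "b"),                 # an action interferes with the protagonist's plan
    Step("TO", "b", withheld=True),  # the protagonist's observation of b is withheld
    Step("FA", "c"),                 # the protagonist attempts an action and fails
    Step("TO", "c"),                 # the failure is observed
]
```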
Initial Experimental Design
An initial experiment evaluated the IRIS templates against multiple other suspense creation processes: three human authors, an operationalization of the MINSTREL system, and a logically and temporally consistent random baseline. The hypotheses were that the human authors and the IRIS fragments would be rated higher than the random baseline, and that the IRIS fragments would be rated comparably to the human authors. This rating would be done by participants reading the fragments and selecting which fragment, if any, was more suspenseful.
Four hypotheses were tested:
Hypothesis 1: The story fragments generated with the
IRIS action templates will be comparably suspenseful to
the fragments generated by the human authors selecting
from the same set of story actions. To ensure that the story
fragments that the action templates and the human authors
generate can be compared, both compose their
fragments from the same set of story actions. This puts a
limitation on the human author since he or she can only use
a limited set of story actions instead of an unlimited vocabulary. If this hypothesis holds, then the action templates will
have been shown to be sufficient to generate suspense at the
story-action level comparable to a human author selecting
from the same set of story actions.
Hypothesis 2: The story fragments generated with the
IRIS action templates will be more suspenseful than the
randomly-generated baseline. It needs to be shown that the
action templates are an improvement over no method of creating suspense at all.
Hypothesis 3: The story fragments generated by
the human authors will be more suspenseful than the
randomly-generated baseline. This demonstrates that the
human authors were actually able to create more suspense
than random combinations of story events.
Hypothesis 4: The story fragments generated with the
IRIS action templates will be at least as suspenseful as
the MINSTREL formula. Since the IRIS action templates
are more complicated to implement and apply than the MINSTREL formula, it is desirable that they perform at least as
well as the simpler MINSTREL approach.
In the initial experiment, the story creation processes were
given a bank of English sentences to use in story fragment
creation. The domain was a knight trying to slay a dragon in
a medieval setting. Example sentences included “The knight
attacks the dragon,” and “The wizard tells the knight where
the dragon is hiding.” Some simple rules were also given
to ensure that the story would be logically and temporally
consistent. This was done to ensure that participants did not rate a fragment poorly because it did not make sense.
The study was conducted online in HTML with JavaScript to help handle the selection and display of the story fragments. Participants were recruited by snowball sampling through email, message boards, in person, and on Facebook.
The participants were first given a link to the consent form. Then, they were given a demographic survey asking their age and gender. In the demographic survey and the comparisons, the participants always had the option to leave any question unanswered. Next, they were given the background information about the domain of the story. The participants were then given a series of story fragment pairs. They were asked to read each fragment and indicate which was more suspenseful. They were given four options: Story Fragment A was suspenseful and Story Fragment B was not, Story Fragment B was suspenseful and Story Fragment A was not, Both were suspenseful, or Neither were suspenseful. After selecting their choice and hitting the Next button, they were shown another pairing.
Initial Experimental Results
117 people completed the demographic survey and were presented with the experiment. There were 34 women with an average age of 33.6 ± 15.2, and 83 men with an average age of 26.2 ± 9.8.
Table 1 and Figure 4 show the results of the comparisons with the IRIS action templates. Using a series of one-tailed, 2x2 Chi-squared tests without Yates' correction, the IRIS templates never significantly outperformed the random baseline. Some templates, namely 1, 5, 6, and 11, performed poorly against all four of the other fragment creation processes. This is notable because they were included in the follow-up experiment and also performed poorly there, whereas the templates with marginal results in the initial experiment performed much better in the follow-up experiment. This suggests inter-experiment validity, because the templates that did poorly in the first experiment also did poorly in the second. Had they instead shown a significant improvement, it would have suggested that the sentence creation in the initial experiment was confounding the performance of the IRIS action templates.
The other fragment creation processes were also compared to the random baseline. The null hypothesis for these comparisons was that they would perform comparably to random. The alternative hypothesis was that they would perform better than random. A one-tailed, 2x2 Chi-squared test without Yates' correction was performed for human authors 2 (χ²(1) = 0.34, p = 0.28) and 3 (χ²(1) = 3.00, p = 0.06). Human author 3 seemed to perform better than random, but human author 2 did not. For human author 1 and MINSTREL, the participants reported more counts of preferring random than these methods, so a two-tailed, 2x2 Chi-squared test was performed for human author 1 (χ²(1) = 4.06, p = 0.04) and MINSTREL (χ²(1) = 0.85, p = 0.89). This indicates that these methods did not perform better than random; human author 1 performed worse than random.
Table 1: p values for the IRIS action templates versus the five other creation processes in the initial experiment. A one-tailed Chi-squared test was performed versus the random baseline, and a two-tailed Chi-squared test was performed versus the others. A dash indicates that more than 1/3 of the responses rated neither as suspenseful. Values are listed per row in column order (Rand, MINSTREL, Hum 1, Hum 2, Hum 3); rows with fewer than five entries contain dashes whose column positions were lost in extraction.
T1: 0.03, 0.03, 0.05, 0.18, 0.03
T3: 0.09, 0.06, 0.37, 0.09
T4: 0.11, 0.02, 0.03
T5: 0.11, 0.03, 0.01, 0.001
T6: 0.3, 0.001, 0.05, 0.001, 0.03
T7: 0.24, 0.87, 0.55, 0.05, 0.3
T8: 0.36, 1, 0.11, 0.06, 0.05
T9: 0.18, 0.66, 0.51, 0.06, 0.29
T10: 0.54, 0.002
T11: 0.44, 0.73, 0.57, 0.43
Figure 4: A visualization of the comparison between the IRIS action templates and the other creation processes in the initial experiment. Red indicates that the column was rated higher and black that they were not significantly different, both at the p = 0.05 level. Gray indicates that more than 1/3 of the responses rated neither as suspenseful.
Initial Experimental Discussion
The results of this initial experiment showed that all creation processes, with the exception of one of the human authors, were rated similarly with regard to suspense. Also, few of the fragments were rated as unsuspenseful. It was hypothesized that something other than the creation process was introducing suspense, namely the sentences themselves. In the follow-up experiment, the story actions were constructed in a template-based way instead of from the hand-written sentences of the first experiment, to mitigate the effect of discourse-level suspense being introduced by the sentences themselves.
Another interesting result was that, about halfway through the initial experiment, it was observed that the one-sentence fragment stories were rated significantly lower than fragments with two or more sentences. A one-tailed 2x2 Chi-squared test of whether the one-sentence fragments were rated lower than those with two or more sentences was significant (χ²(1) = 38.05, p < 0.0001). There was no significant difference among fragments with regard to length as long as they had two or more sentences. The initial set of IRIS templates had several one-sentence templates. These were removed from the follow-up experiment (described below).
Finally, the participants were given 30 pair-wise comparisons to make among the different story fragments. There was some concern that they might experience fatigue partway through the experiment and begin to answer the questions differently. Pair-wise 2x2 two-tailed Chi-squared tests were run on each pairing of the 30 questions. No significant difference was found in the responses as the survey progressed.
Follow-Up Experimental Design
For the follow-up experiment, the domain and story actions were changed, but the design and hypotheses remained the same as in the initial experiment. All of the creation methods being tested produced story fragments using the same fragment creation grammar. An excerpt of the grammar is shown in Figure 5. The domain was a murder mystery: a detective is trying to figure out who killed the host of a dinner party. This is the goal around which all of the story fragments try to create suspense.
Actions:
<person> observes <condition>.
<person1> attacks <person2>.
<person> moves to <place>.
The detective tries to arrest <person>.
Conditions:
that <person> is dead
that <person> is at <place>
Modifiers:
(...) but fails
(Unbeknownst to the detective) ...
People: The detective; The maid
Places: The dining room; The bedroom
Figure 5: Excerpt of the story fragment creation grammar.
To prevent the possible introduction of the discourse-level suspense that proved problematic in the initial experiment, a grammar was used for the creation of story fragments. Fragment length was limited to between 2 and 6 sentences so that readers would not be distracted from the suspenseful goal. Additionally, the same person or place was not allowed to fill both slots of a sentence; for example, “The maid attacks the maid” was not allowed.
A human author with creative writing experience was
recruited to produce suspenseful story fragments. He was
given the creation grammar, the rules described in the previous paragraph, and told to create as many suspenseful story
fragments around the goal of the detective trying to find the
killer as he felt he could. He produced seven story fragments of four to six sentences in length.
Next, a set of story fragments was created using the
MINSTREL method of creating suspense (see the Related Work section).
Since IRIS is designed to create suspense at the story-action
level and not the discourse level, the step where emotion is
interjected at the discourse level was omitted. An example
MINSTREL fragment was “The maid attacked the detective.
The detective tried to move to the attic but failed.”
Then, the story fragments from the IRIS action templates shown in Figure 3 were created. There were multiple ways to create fragments from each of the templates, so all
of the combinations were generated. There is some subtlety
in templates 3, 7, & 8, and 4, 9, & 10. In the first three, an
action is performed without the main character’s knowledge,
and in the second three an action is performed without that
character being present. However, the suspense is being generated at the story-action level and not the discourse level, so
the templates do not make any commitment as to how this
action is conveyed to the audience. Since the participants
will be reading the story as text and not LISP-style action
syntax, a thin but necessary discourse is introduced. These
are the template-based “Modifiers” described in Figure 5.
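To make this thin discourse layer concrete, the sketch below shows one way a realized action could be wrapped in the Figure 5 modifiers. The function and its flags are our illustration, not the generator used in the study.

```python
# Sketch of wrapping a realized story action in the Figure 5 modifiers.
# The function name and flags are our own, for illustration only.
def render(text: str, failed: bool = False, hidden: bool = False) -> str:
    if failed:   # FA(x): the "(...) but fails" modifier
        text = text.rstrip(".") + " but fails."
    if hidden:   # the action is not observed by the protagonist
        text = "Unbeknownst to the detective, " + text[0].lower() + text[1:]
    return text

print(render("The maid moves to the bedroom.", hidden=True))
# Unbeknownst to the detective, the maid moves to the bedroom.
print(render("The detective tries to arrest the maid.", failed=True))
# The detective tries to arrest the maid but fails.
```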
For the logically and temporally consistent baseline, a
program was created that generated random story fragments
following the story creation rules. The actual story fragments were generated at run time during each participant’s
experimental session.
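The paper does not include the baseline generator itself; the sketch below shows one way such a run-time generator could look over the Figure 5 excerpt. It enforces the 2-6 sentence length and the distinct-slot rule, but omits the logical and temporal consistency checks the actual program applied.

```python
import random

# Hypothetical generator over the Figure 5 grammar excerpt. It enforces the
# 2-6 sentence length and the rule against reusing one entity in both slots,
# but omits the study's logical and temporal consistency checks.
PEOPLE = ["The detective", "The maid"]
PLACES = ["the dining room", "the bedroom"]

def sentence() -> str:
    kind = random.choice(["observe", "attack", "move", "arrest"])
    if kind == "attack":
        attacker, victim = random.sample(PEOPLE, 2)   # distinct slot fillers
        return f"{attacker} attacks {victim.lower()}."
    if kind == "move":
        return f"{random.choice(PEOPLE)} moves to {random.choice(PLACES)}."
    if kind == "arrest":
        return "The detective tries to arrest the maid."  # only non-detective person
    observer, observed = random.sample(PEOPLE, 2)         # distinct slot fillers
    return f"{observer} observes that {observed.lower()} is at {random.choice(PLACES)}."

def random_fragment() -> str:
    """A fragment of 2-6 sentences, generated fresh at run time per participant."""
    return " ".join(sentence() for _ in range(random.randint(2, 6)))

print(random_fragment())
```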
The study was conducted online in HTML with JavaScript
to help handle the selection and display of the story fragments. Participants were recruited by snowball sampling in
email and in person, and through undergraduate and graduate computer science classes. For students recruited through
courses, participation fulfilled an experimental course requirement. The participants were first given a link to the
consent form. Then, they were given a demographic survey
asking their age and gender. In the demographic survey and
the comparisons, the participants always had the option to
leave any question unanswered. Next, they were given the
background information about the domain of the story.
The participants were then given a series of story fragment pairs. They were asked to read each fragment and indicate which was more suspenseful. They were given four options: Story Fragment A was suspenseful and Story Fragment B was not, Story Fragment B was suspenseful and Story Fragment A was not, Both were suspenseful, or Neither were suspenseful. After selecting their choice and hitting the Next button, they were shown another pairing. A screenshot of the survey page is shown in Figure 6.
Figure 6: A screenshot of the survey, where participants read and compared the suspensefulness of two story fragments.
There were 36 pair-wise combinations: the 11 IRIS action templates crossed with the 3 other story creation methods, plus the 3 pairings among those methods. At run time, a random fragment from the possible ones for the given approach was selected. The same hypotheses were tested as in the previous experiment.
Follow-Up Experimental Results
83 people completed the demographic survey and were presented with the experiment, with an average age of 22.1 ± 2.7. There were 9 women and 74 men who participated in the
experiment. There were too few women to separate them out
and compare their responses to those of the men, so this is a
potential threat to external validity. A sizable portion of the
participants were computer science undergraduates, which
accounts for the low average age. To evaluate if the IRIS
templates would be rated comparably to the human authors
and if the templates would be rated comparably to MINSTREL, we used the two-tailed 2x2 Chi-square test without
Yates’ correction to compare the participants’ response to a
null hypothesis of equal responses for both approaches. The
responses for “Both were suspenseful” were added to the
counts for believing one method or the other was more suspenseful. For example, 32 people reported that they found
template 7 more suspenseful than MINSTREL, seven found
MINSTREL more suspenseful than template 7, 12 reported
them both suspenseful, and 12 reported neither suspenseful.
So the 12 that reported both were suspenseful were added
to the counts for each of the two approaches, bringing the
count up to 44 for template 7 and 19 for MINSTREL. This was compared with a null hypothesis of 31 for each, which is half of the responses going to each approach (the expected counts were rounded to whole numbers). If there were no difference between the two methods, then the reported counts would not differ significantly from the expected results. In this case, χ²(1) = 5.13, p = 0.02, indicating that participants rated template 7 more suspenseful than MINSTREL.
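The shape of this test is straightforward to reproduce. The sketch below runs a 2x2 Chi-squared test without Yates' correction on illustrative counts (stand-ins, not the study's full response tables), assuming scipy is available.

```python
# Sketch of the paper's 2x2 Chi-squared comparison (no Yates' correction).
# The counts here are illustrative stand-ins, not the study's response data.
from scipy.stats import chi2_contingency

observed = [
    [44, 19],  # approach A: rated suspenseful / not ("both" credited to each)
    [19, 44],  # approach B: rated suspenseful / not
]

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, two-tailed p = {p:.4f}")
# For the directional comparisons against the random baseline, the paper uses
# a one-tailed test; with df = 1 that corresponds to halving this p value.
```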
To evaluate Hypothesis 2, that the IRIS templates would
be rated higher than random, and Hypothesis 3, that the human authors would be rated higher than random, we used
the one-tailed 2x2 Chi-square test without Yates’ correction
to compare the participants’ response to a null hypothesis of
equal responses for both approaches. Here the process was
similar to before, except that the responses “Both were suspenseful” were discarded, since we wanted to measure whether one approach performed better, not merely comparably.
In all cases, if more than 1/3 of the responses were “Neither were suspenseful,” then the above tests were not performed. This is a threshold we decided on to indicate that the participants found both approaches to lack suspense.
One-tailed 2x2 Chi-squared tests were performed to measure whether MINSTREL and the human author performed significantly better than random. The test for MINSTREL failed to reject the null hypothesis of performing comparably to the random baseline (χ²(1) = 0.01, p = 0.45), while the test for the human author was marginally significant against that null hypothesis (χ²(1) = 2.33, p = 0.06).
Follow-Up Experimental Discussion
Hypothesis 3, that the human author would perform better than random, was supported. This is important because a human author is often considered the ideal that story generation systems strive toward, and it needs to be shown that the human author performed well enough to support useful claims when comparing a generation system to the human author.
Hypothesis 4, that the IRIS templates would perform at
least as well as MINSTREL, was also supported. The IRIS
fragments performed much better than MINSTREL, and
MINSTREL did not perform significantly differently than
random. This could be due to the fact that MINSTREL requires some discourse-level suspense, which was purposely minimized in the experiment.
For six of the 11 IRIS templates, Hypotheses 1 and 2 were
supported as well. The exceptions were templates 1, 2, 5, 6, and
11. These templates will be removed from the IRIS template
bank in future work. The most successful templates were
those where the audience had knowledge about the protagonist’s imminent plan failure and the protagonist did not. This
makes intuitive sense and matches the system's definition of suspense, which involves anxiety and uncertainty around the success of an outcome from the audience's perspective.
These results show that it is possible to create localized,
story-action level moments of suspense in a story using a
small set of action templates comparable to a human author
selecting from the same set of story actions.
Table 2: p values for the IRIS action templates versus the three other creation processes in the follow-up experiment. A one-tailed Chi-squared test was performed versus the random baseline, and a two-tailed Chi-squared test was performed versus the other two. A dash indicates that more than 1/3 of the responses rated neither as suspenseful. Values are listed per comparison column in row order (T1 through T11); some cells were lost in extraction, so the per-template alignment of later entries is not recoverable.
Rand: 0.001, 0.003, 0.35, 0.34, 0.002, 0.001, 0.13, 0.04, –
MINSTREL: 0.46, 0.01, 0.05, 0.44, 0.52, 0.02, 0.05, 0.04, 0.17, –
Human: 0.06, 0.0001, 0.47, 0.92, 0.12, 0.09, 0.10, 0.39, 0.52, 0.51
Figure 7: A visualization of the comparison between the IRIS action templates and the three other creation processes in the follow-up experiment. Red indicates the column was rated higher, green that the row was rated higher, and black that they were not significantly different, all at the p = 0.05 level. Gray indicates that more than 1/3 of the responses rated neither as suspenseful. One comparison was right at this 1/3 threshold but had a significant p value.
Future Work
With the suspense action templates having been experimentally verified, they will be incorporated into the IRIS narrative planning system. The action templates will be combined
with a cognitive model of the protagonist along with a model
of intention revision to produce a complete narrative planner.
The action templates also have value independent of the
IRIS system. It has been shown that suspense, comparable
to a human author selecting from the same set of actions,
can be created by following a small set of action templates.
These can be incorporated into a range of narrative generation systems, so long as the action templates can be built
into those systems. The action templates can also help guide
human authors themselves. Authors can use the templates as
building blocks for suspense action sequences which could
then be made even more suspenseful in different ways by the
addition of discourse level suspense. Alternatively, discourse
could augment another narrative phenomenon to create templates that are instead mysterious or surprising.
Conclusion
The IRIS action templates were intended to create localized
moments of suspense at the story-action level in a narrative.
They were evaluated against several other suspenseful fragment creation processes. Participants read and made pairwise comparisons rating the suspense of these story fragments. It was found that human authors and most of the IRIS
action templates, particularly those involving a disparity of
knowledge between the audience and the protagonist, were
rated significantly more suspenseful than the random baseline. Additionally, the IRIS action templates were rated comparably to the human author. This validates the IRIS action
templates as a viable means of creating story-action level
suspense in narrative generation.
References
Cheong, Y., and Young, R. 2008. Narrative generation for
suspense: Modeling and evaluation. Interactive Storytelling
144–155.
Fendt, M. W. 2010. Dynamic social planning and intention
revision in generative story planning. In Proceedings of the
Fifth International Conference on the Foundations of Digital
Games, 254–255. ACM.
Gerrig, R. J., and Bernardo, A. B. 1994. Readers as problem-solvers in the experience of suspense. Poetics 22(6):459–472.
Riedl, M. O., and Young, R. M. 2005. Open-world planning
for story generation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1719–1720.
Rodell, M. F. 1943. Mystery fiction: theory and technique.
Duell, Sloan and Pearce.
Turner, S. 1994. The creative process: A computer model of
storytelling and creativity. Lawrence Erlbaum.
Vorderer, P. 1996. Toward a psychological theory of suspense. Suspense: Conceptualizations, theoretical analyses,
and empirical explorations, 233–254.
Wulff, H. J. 1996. Suspense and the influence of cataphora
on viewers’ expectations. Mahwah, New Jersey: Lawrence
Erlbaum Associates.
Zillmann, D. 1996. The psychology of suspense in dramatic
exposition. Suspense: Conceptualizations, theoretical analyses, and empirical explorations 199–231.