The Effect of Audio-Visual Semantic Congruence on the
Believability of Visual Effects
Christian Jacobsen
Dann Sandgreen
Niels Christian Nilsson
Thomas D. T. Miksa
Aalborg University Copenhagen
08ml705@imi.aau.dk
ABSTRACT
The present experiment is designed to test the assumption that semantic congruence in cross-modal stimuli can function as
a variable for affecting the perceived believability of a visual effects sequence. More specifically, we argue that increasing
audio-visual semantic congruency will result in a corresponding increase in believability. In the experiment, participants
were asked to view a visual effects sequence and rate several elements of that sequence as being either real or not real.
Believability was assessed by having participants attach a confidence to their real, not real rating by expressing their
certainty on a six-step Likert scale. The visual effects sequence was presented with three different soundtracks, exhibiting
varying levels of semantic congruence with the visuals. The results were ultimately inconclusive, showing no meaningful
relationship across varying congruency levels, but we believe that the fault lies in the experiment design due to the
difficulty of testing ecologically valid complex stimuli. Consequently, possible refinements to the experiment design are
presented for future investigations into the subject.
Keywords
Cross-modal integration, audio-visual semantic congruency, ecological validity, measuring believability, visual effects,
soundscapes.
1. INTRODUCTION
Previous research has shown that semantic congruence in cross-modal
perception, specifically audio-visual stimulation, plays a
significant role in the integration of information, and has been
shown to influence behavioural performance significantly
(Olivetti Belardinelli, 2004 p. 167). The research specifically
shows a marked improvement in behavioural performance when
semantically congruent cross-modal stimuli are presented, with the
opposite occurring when they are incongruent (Laurienti, et al., 2004
p. 412). However, these results originate from experiments using
highly simplified stimulus sets, e.g. a blue disc presented with an
audio recording of the word blue, or pictures of animals with
accompanying environmental sounds. The respective tasks
were to indicate the colour of the disc and the type of animal, with
both scenarios showing performance enhancements when the
presented cross-modal stimuli were semantically congruent
(Reissner, 2008 pp. 24-25). These types of experiments can be
criticised for lacking the richness of content that normal everyday
cross-modal stimulus sets comprise (Laurienti, et al., 2004
p. 411).
This level of richness is often what creators of visual effects or
computer generated imagery (CGI), in both film and
documentaries, strive for in order to ensure a level of realism that
will have the audience judge what they see as convincing in terms
of believability. The basic premise for the present experiment is
an assumption that the underlying mechanism causing the
enhanced behavioural performance observed with semantically
congruent cross-modal stimuli can have the transferred effect of
manipulating the mind to more readily accept CGI as being
believable. The purpose of the experiment was therefore to investigate
this assumption in a more ecologically valid context than
previously conducted experiments, by using more complex
stimuli.
Note that believability in this context should be understood not as
the suspension of disbelief generally characterising fictional
visual effects (Walton, 2004 p. 335), but rather as a term closely
related to realism. So at the positive extreme of the scale, defined
as the highest possible level of believability, the presented visual
effect stimulus is perceived as real.
Our motive for focusing exclusively on semantic congruency in
visual effects is that, when looking at film conventions,
experience accrued over the past century would seem to point to
the conclusion that semantic congruence is not necessarily a
prerequisite for visual effects being perceived as believable. Film
audio often possesses only a moderate level of congruency with the
presented visuals, and yet it is most often able to convey a
very convincing illusion (Sonnenschein, 2001 pp. 190-195).
This phenomenon could mainly be explained by the
ability of the human nervous system to bind multi-sensory
percepts in close spatial and temporal proximity into a coherent
whole (Laurienti, et al., 2004 p. 405). For example, the visual
stimulus of a car is instantly bound to the simultaneous auditory
stimulus of an engine running, creating a coherent integrated
impression of a car with the ignition turned on. However, this
impression is created on the basis of associations made throughout
a person’s lifetime, and is therefore naturally governed by a
contextual or semantic inference based on the person’s
accumulated experience (Laurienti, et al., 2004 p. 406).
Consequently, the believability of visual effects must inevitably
be governed by semantic content, as well as spatial and temporal
parameters.
form of binary evaluation was insufficient since the present
experiment sought to investigate the degree to which the variable
congruency components were perceived as real.
In order to make up for this deficiency, the binary rating was
coupled with a certainty score making the overall measurement of
believability gradable. An added benefit was that this form of
believability assessment, contrary to the binary rating, would
produce analysable results even if all ratings turned out to be not
real.
Based on the presented information, our assumption was that
audio-visual semantic congruency could work as a variable for
affecting the believability of visual effects, and the experiment
presented in this paper was designed to investigate if this claim
would hold in practice. This assumption is expressed in the
hypothesis presented in the following section.
2.1 Participants
The participants partaking in the experiment comprised 60 adult
volunteers (mean age 25 years, 43 males, and 17 females) with
self-reported normal hearing and normal or corrected-to-normal
vision. The participants were selected by addressing arbitrary
people at the Copenhagen University College of Engineering and
Aalborg University Copenhagen, making the employed sampling
technique classifiable as convenience sampling (Trochim, 2006).
All the participants were consequently Medialogy undergraduate
students, engineering students, faculty, or other employees at
these two educational institutions. When assigning the participants
to the three experimental conditions, the aspiration was to assign
an equal number of Medialogy students, engineering students,
faculty and other employees to each of the three groups. This was
believed to reduce the possible bias induced by the difference in
theoretical and practical knowledge of visual effects
accompanying the different educational backgrounds. All
participants gave written informed consent and were offered a
beverage for participating.
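The balancing aim described above can be illustrated with a short sketch. It is only an assumption about how such an assignment could be carried out: the actual procedure and the number of participants in each background category are not reported, so the counts in `HYPOTHETICAL_COUNTS` and the function name `assign_balanced` are hypothetical.

```python
import random
from collections import Counter

GROUPS = ("low", "medium", "high")

# Hypothetical background counts summing to 60; the real counts are not reported.
HYPOTHETICAL_COUNTS = {"medialogy": 25, "engineering": 20, "faculty": 8, "employee": 7}


def assign_balanced(participants, seed=0):
    """Assign (participant_id, background) pairs to the three congruency groups.

    Each background category is shuffled and dealt out round-robin. The dealing
    position carries over from one category to the next, so the overall group
    sizes stay equal while each category is spread as evenly as possible.
    """
    rng = random.Random(seed)
    strata = {}
    for pid, background in participants:
        strata.setdefault(background, []).append(pid)

    assignment = {}
    position = 0
    for background in sorted(strata):
        members = strata[background]
        rng.shuffle(members)
        for pid in members:
            assignment[pid] = GROUPS[position % len(GROUPS)]
            position += 1
    return assignment


# Build a hypothetical participant list and assign it.
participants = []
for background, count in HYPOTHETICAL_COUNTS.items():
    for i in range(count):
        participants.append((f"{background}-{i}", background))

assignment = assign_balanced(participants)
print(Counter(assignment.values()))
```

With 60 participants this yields 20 per group, and the number of participants from any one background differs by at most one between groups.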
1.1 Hypothesis
By presenting a visual effects sequence with increasingly
semantically congruent soundtracks, a corresponding increase in
perceived believability of the visual effects will be present and
measurable.
2. EXPERIMENT DESIGN
The experiment follows an independent groups design. One visual
effects sequence and three different soundtracks were designed
specifically for the experiment, which was carried out across
three comparable groups of participants.
The operational definition of the independent variable audio-visual semantic congruence does, in principle, only necessitate
two levels of congruency within the context of the present
experiment. More specifically, a comparison of the effects of high
and low congruency should suffice when attempting to determine
if the believability of visual effects changes as the result of
difference in congruency. However, an independent variable with
only two levels does not provide much information about the
relationship between it and the dependent variable (Cozby, 1997
p. 147). Simply using a high and low level of congruency would
for instance provide little and possibly even faulty information
about how a soundtrack conforming to the aforementioned film
conventions would influence believability. Hence, the
independent variable audio-visual semantic congruency was
designed to have three levels. As the experiment revolved around
how an increasingly congruent soundtrack would influence the
believability of the visual effects, the stimuli defining the level of
audio-visual semantic congruence necessarily had to be the audio.
2.2 Stimuli
Two categories of stimuli had to be designed for the experiment;
namely visual and auditory stimuli. As outlined in the first
paragraph of this section, the auditory stimuli constituted the
independent variable while the visual stimuli comprised the
dependent variable of the experiment. More accurately, the
dependent variable was the subjective believability of the visual
stimuli; the end result being that a single visual sequence was
needed while three different auditory stimuli sets were required.
The following two subsections outline the functional and aesthetic
choices made in designing these different sets of stimuli.
2.2.1 Visual stimuli
The dependent variable believability can be operationally defined
as the degree to which the individual computer generated
elements of a visual effects sequence are mistakable for their real
world correlates. Here the individuality of the visual elements is
stressed; since the believability of each element may change as a
result of variations in the unique auditory components associated
with them. Visual elements with varying audio across the three
levels of semantic congruence will throughout the following be
referred to as variable congruency components.
Three primary requirements had to be satisfied in creating the
visual stimuli. First, it should contain four congruency
components for the audio to work with, i.e. components that have
a variable semantic congruency connection to a corresponding
auditory element. Second, the computer generated elements would
have to exhibit a high level of realism to be useful in this
experimental context; in other words, they should to some extent
be believable as real objects. Third, in order to be perceived
as believable, the content should be relatable to the personal
experience of the intended audience so that they can judge it properly. The
thematic choice for the final sequence was the concept of a
computer generated thunderstorm.
Whereas believability within some contexts may be scalable it is
less certain whether perceived realism is gradable or not. So far,
no evidence suggests that we are able to distinguish between more
than two grades of real (Rademacher, 2002 p. 25). This arguably
implies that we possess a single internal threshold determining
whether we consider a given percept to be real or not
(Rademacher, 2002 pp. 81-82). The present experiment adopts
this stance entailing that the believability of the individual
computer generated elements first and foremost is defined by
whether they are regarded as either real or not real. However, this
The primary reason for choosing the theme of a thunderstorm lay
in the animations. With the thunderstorm theme we were able to
create a dynamic sequence with large amounts of moving
elements present. It also enabled us to rely on physics based
systems, e.g. wind and gravity, and consequently avoid any
animation problems associated with the Uncanny Valley; where
suboptimal animations can lead to uncanny movement that would
certainly compromise the believability factor (Bartneck, et al.,
2007 p. 368).
to not make the tested elements stand out. These added elements
consist of the grass, the mountains, the background trees as well
as the foreground branch occasionally visible in the upper left
corner of the frame. An equally important reason for adding these
experimentally extraneous elements was that a certain level of
complexity was needed in the scene to reach a desirable level of
ecological validity. To elaborate, a final sequence with low
consistency in the form of visibly missing elements would not
provide the desired ecological validity or stimuli complexity.
The final visual effect sequence is set in a location resembling the
African savannah, a thematic choice based solely on aesthetic
considerations, with a view over a great grass field, a tree in the
middleground and flashing thunderclouds raging above. Halfway
through the sequence, the tree is struck by lightning, breaks
apart at the lower forked part of the trunk and is subsequently
engulfed in flames. The four visual elements chosen as variable
congruency components were the thunderclaps, the lightning
strike, the tree breaking, and the fire, all elements deemed to have
a high level of semantic familiarity. In other words, both the
auditory and visual stimuli associated with the four chosen
components, as well as the connection between them, should be
highly familiar to all but a few test subjects, even when doing
convenience based sampling.
Additional information concerning the visual stimuli can be found
in the exam enclosure under visual design. The three video
sequences used for the experiment can be found on the appended
CD in the folder experiment stimuli.
2.2.2 Auditory stimuli
An important feature of the auditory stimuli was that they had to
complement the visual stimuli in creating the sense of being
situated on the African savannah. Creating a sonic impression of
being in a specific location is a common practice in the film
industry and referred to as a soundscape; defined by film sound
theorist Rick Altman as “the characteristic types of sound
commonly heard in a given period or location” (Altman, 1992 p.
252).
Initially, a choice had to be made between creating a purely
virtual sequence with no real objects present, or a composite shot
containing live action footage as well as computer generated
elements. The purely virtual sequence was the better choice for
two main reasons; one, we did not have easy access to a location
that corresponded visually to the chosen theme of the African
savannah; and two, the presence of real footage alongside the
computer generated elements could lead to quality requirements
of the virtual elements that we might not be able to satisfy.
Consequently, the final sequence consists solely of virtual
computer generated elements. This was an
acceptable compromise because the elements ultimately need
not be rated as real to provide usable test results, an argument that
was established earlier in this section. Figure 1 shows a screenshot
from the final visual effects sequence.
The characteristic sounds for the African savannah were identified
as cicada shrill, wind, trees and vegetation swaying in the wind;
all referenced from a savannah documentary (BBC, 2005).
Additional auditory elements for the thunderstorm were identified
as thunderclaps, lightning strikes, rustling leaves, tree cracking,
falling, burning and fire.
Establishing a soundscape for a savannah thunderstorm delimited
what the three soundtracks should contain, and following this
delimitation the semantic components needed to be varied through
three levels of congruency; high, medium and low. This meant
one level for each soundtrack linked to the four variable
congruency components; the thunderclaps, the lightning strike, the
tree breaking, and the fire.
The auditory elements in the soundtracks were mixed as if
captured with a microphone mounted on a camera. Additionally,
the function of the three soundtracks is, unlike those common in
film, not designed to guide the viewer’s attention to specific
narrative elements in the sequence (Sonnenschein, 2001 pp. 195-198). Instead, they function to create believable soundscapes that
place the viewer on the savannah during a thunderstorm.
Since all the visual components in the sequence are computer
generated they obviously have no inherent capability of
generating any sonic characteristics. Their audio elements could
therefore never be truly congruent, which is why the highest
congruency level is denoted high as opposed to real. In this regard, the
semantic criterion for the high congruency soundtrack was that its
audio elements were recordings of authentic events corresponding as
closely as possible to the events as they would have occurred in the
real world. For example, when a part of the tree trunk breaks, the
sound of actual wood breaking should be heard in order to
facilitate a high semantic audio-visual congruency.
Figure 1 - Screenshot from the final visual effects sequence.
To obtain believable camera motion and, at the same time,
acquire real-world scenery to build the virtual scene
on, the sequence was based on live action footage. Consequently,
while all elements in the final sequence are
computer generated, the camera movements are transferred
directly from the live action footage. Since there are no real
objects present in the sequence, several elements, aside from the
four variable congruency components, had to be added in order to
create a consistent holistic impression of the sequence, as well as
The technique used for the medium congruency level is similar to
what is commonly denoted as Foley effects, a term known from
the film industry. A Foley effect is a recreation of an audio event
on a soundtrack which, although meant to support the
believability of a scene (Wyat, et al., 2007 pp. 166-167), is
produced by objects that sound similar but not necessarily directly
correspond to the visuals presented (Singer, 2008).
The participants were asked to evaluate whether the individual
elements were real or not real. The option, don’t remember, was
also made available in case the participants did not remember
seeing certain elements in the sequence. For each realism rating
the participants additionally had to specify how certain they were
on a six-point Likert scale, where one signified very uncertain
and six very certain. The order in which the different elements
were presented on paper was pseudo randomized. More
specifically, three different types of questionnaires containing the
same items in three different orders were randomly assigned to the
participants. This ensured that all items on the questionnaire
appeared in the first, second and last third of the questionnaire an
equal number of times, thus reducing the risk of order effects
(Cozby, 1997 p. 118).
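One simple way to obtain three orders with this property is to split the nine items into three blocks and rotate the blocks. The sketch below assumes this rotation scheme and an arbitrary starting order; the orders actually used in the experiment are not reported, and the function name `rotated_orders` is hypothetical.

```python
# The nine questionnaire items, in an arbitrary (assumed) starting order.
ITEMS = [
    "grass", "foreground branch", "lightning strike",
    "background trees", "mountains", "clouds",
    "middleground tree", "fire", "rain",
]


def rotated_orders(items, n_blocks=3):
    """Return n_blocks orders in which every item visits every third once."""
    if len(items) % n_blocks != 0:
        raise ValueError("items must divide evenly into blocks")
    size = len(items) // n_blocks
    blocks = [items[i * size:(i + 1) * size] for i in range(n_blocks)]

    orders = []
    for shift in range(n_blocks):
        # Rotate whole blocks so each block occupies each third exactly once.
        rotated = blocks[shift:] + blocks[:shift]
        orders.append([item for block in rotated for item in block])
    return orders


orders = rotated_orders(ITEMS)
for version, order in enumerate(orders, start=1):
    print(f"Questionnaire {version}: {', '.join(order)}")
```

If the three versions are handed out equally often, each item appears in the first, second and last third of the questionnaire an equal number of times.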
As a result, the criterion for the medium congruency soundtrack
was that the audio elements should have sonic characteristics
similar to those used in the high congruency one, but instead
consist of recorded audio with a wide semantic gap to the
viewed events. For example, the audible thunderclaps were not
actual thunderclaps, but instead generated through recordings of
rockslides containing similar frequency characteristics and
amplitude envelopes, with applied post-processing effects
emphasizing audio characteristics similar to those of real
thunderclaps.
Lastly, the semantic criterion for the low congruency soundtrack
was that the audio elements should only correlate to the same
basic type of audio events occurring in the sequence, e.g. the
burning branch hitting the ground is accompanied by an impact
sound. Congruency was lowered by using audio events whose
material properties conflicted with the properties shown in the
sequence. For example, the event of the branch hitting the ground
was paired with the sound of a brick hitting
a metal surface instead of the more congruent wood and leaves
hitting dirt.
A reference image labelled with the names of the elements,
intended to aid the participants in identifying elements, was
provided after they had watched the sequence. The quality of the
image was, however, greatly reduced making it impossible to use
it as a reference when evaluating the individual elements. The
participants were additionally informed that the sole purpose of
the image was to help them distinguish between the different
items in the questionnaire.
Additional information concerning the auditory stimuli can be
found in the exam enclosure under audio design. A soundtrack
comparison video can be found on the appended CD in the folder
experiment stimuli.
All material pertaining to the experimental procedure can be
found on the appended CD in the folder experiment design.
2.4 Data Analysis
The nine items on the questionnaire, each corresponding to an
element in the sequence, were analyzed independently of one
another. The three possible answers - real, not real and don’t
remember - associated with the realism rating were treated as
nominal data and analysed with a Chi-square test for statistical significance
(Cozby, 1997 p. 214). These results will be referred to as nominal
ratings throughout the rest of the paper. The levels of certainty
obtained from the Likert scales were treated as interval data, and
one-way analysis of variance (ANOVA) was used to test for
significant differences between group means (Cozby, 1997 p. 215).
As the ANOVA only accounts for the overall effect of the
independent variable, t-tests were employed to investigate the
differences between the individual congruency level groups
(Tullis, et al., 2008 p. 31). All statistical tests were performed with
a significance level α = 0.05, entailing that p < α would indicate a
statistically significant difference.
2.3 Procedure
The objective of the experiment was to measure how the
participants passively assessed the believability of the sequence,
since this resembled the form of assessment associated with
watching a film (Rademacher, 2002 p. 27). The information
supplied prior to the participants watching the sequence did for
this reason not specify what they were to evaluate, but simply that
they were to evaluate the sequence after watching it. An
independent groups design was employed for the experiment,
entailing that each participant only experienced one of the three
levels of audio-visual semantic congruence. Each group
comprised 20 participants.
The visual stimuli were displayed in their original resolution
(720x576) on a 14.1" WXGA+ screen with a resolution of
1440x900, using Monacor MD-4300 headphones to present the
auditory stimuli. Volume, brightness and contrast levels were
adjusted in advance and were identical for all participants.
Assuming that the results were statistically significant, the
hypothesis would be supported if an increase in the level of
audio-visual semantic congruence would result in more people
rating the variable congruency components as real. A similar
pattern should emerge in relation to the level of certainty
accompanying the real ratings. Here the group averages should
increase as the participants would become increasingly certain
that these elements are in fact real. The participants who rated
certain elements in the sequence as being not real should
conversely become increasingly doubtful as the level of audio-visual semantic congruence increased.
After experiencing the sequence, participants were asked to
account for their initial impression of the individual elements in
the sequence by filling out a questionnaire. The questionnaire
comprised the following nine items, four of which were related
to the variable congruency components (the lightning strike, the
clouds, the middleground tree and the fire): the
grass, the foreground branch, the lightning strike, the background
trees, the mountains, the clouds, the middleground tree, the fire
and the rain. Please note that the clouds item in the questionnaire
comprises both the clouds and the associated thunderclaps. The
five visual elements without a varying semantic component were
included for two reasons. First; to mask the significance of the
variable congruency components and secondly; to investigate if
the changes in audio-visual semantic congruence also would
influence how the non-varying congruency elements were
perceived.
3. RESULTS
This section first and foremost outlines the results related to the
four variable congruency components as these had direct
relevance for testing the hypothesis. For the nominal data,
frequency distributions are reported, while for the certainty ratings
mean distributions are reported including the standard deviation.
An Excel sheet containing the complete dataset, the associated
graphs and significance tests can be found on the appended CD in
the folder experiment design.
3.1 Nominal Ratings
datasets generally had a slightly larger number of don’t remember
ratings distributed across the three levels of congruence than the
four variable congruency components.
None of the calculated p-values, whether for the four variable
congruency components or for the remaining visual elements, was
lower than the selected significance level α = 0.05. This implies
that none of the frequencies of ratings for the three congruency
levels of each dataset are statistically different from one another
(Nelson, 1999 p. 91).
3.2 Mean Certainty Levels
The mean levels of certainty associated with the real ratings of the
middleground tree were 4.0±0.8 for low congruence, 4.5±1.4 for
medium congruence, and 4.6±0.7 for high congruence. The
corresponding mean levels of certainty for the not real ratings
were 4.4±0.9 for low congruence, 4.1±1.4 for medium
congruence, and 4.8±1.1 for high congruence.
For the participants who rated the lightning strike as real, the
mean levels of certainty were 3.9±0.5 for low congruence, 3.6±0.9
for medium congruence, and 5.2±0.8 for high congruence. For the
not real ratings the averages were 4.5±0.7 for low congruence,
4.3±1.2 for medium congruence, and 5.0±1.0 for high congruence.
The mean levels of certainty related to the real ratings of the fire
were 3.8±1.1 for low congruence, 3.9±1.5 for medium
Figure 2 - Frequency distribution of the nominal data for the fire.
Figure 3 - Frequency distribution of the nominal data for the clouds.
The remaining five datasets in the sequence showed little or no
indication of any significant patterns in the distribution of
frequencies across groups. It is additionally notable that these five
The nominal ratings related to the clouds showed more irregular
variation in the frequencies across the three groups. At the lowest
level of audio-visual semantic congruence, 14 of 20 rated the
clouds real while 6 rated them not real. Out of the 20 people who
experienced the medium level of congruency, 6 rated the clouds
real while 13 rated them not real. Finally, the clouds accompanied
by highly congruent audio were perceived as real by 10
participants and not real by 6 participants. Note that 4 out of the
20 participants experiencing this soundtrack did not remember the
clouds. The bar chart in figure 3 illustrates the frequency
distribution of the nominal ratings for the clouds.
The number of participants who rated the fire as real increased
together with the level of audio-visual semantic congruence. More
specifically, the frequency of real ratings was 6 for low
congruence, 7 for medium congruence, and 9 for high congruence.
Conversely, the frequency of not real ratings decreased in that 14
rated the fire not real for low congruence, 12 for medium
congruence, and 10 for high congruence. The bar chart displayed
in figure 2 illustrates how these nominal ratings were distributed
across the three groups.
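The Chi-square test of independence described in the data analysis section can be sketched on the fire frequencies reported above. This is only an illustration under stated assumptions: it uses the real and not real counts alone (the don't remember ratings are left out, unlike in the analysis described in the paper), and the function names are hypothetical. The p-value uses the closed form of the Chi-square upper tail, which is valid only for an even number of degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_p_value(statistic, dof):
    """Upper-tail probability; exact closed form for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even dof")
    half = statistic / 2
    series = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * series


# Rows: real, not real. Columns: low, medium, high congruence.
fire = [
    [6, 7, 9],
    [14, 12, 10],
]

statistic, dof = chi_square_independence(fire)
p_value = chi_square_p_value(statistic, dof)
print(f"chi2 = {statistic:.2f}, dof = {dof}, p = {p_value:.2f}")
```

On these counts the statistic is about 1.26 with two degrees of freedom, and the p-value is well above α = 0.05, so the apparent trend in the fire ratings does not reach significance in this simplified version either.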
A somewhat similar distribution of nominal ratings was apparent
from data related to the lightning strike. Here the number of real
and not real ratings for low and high congruence was identical.
More specifically, 11 of 20 participants rated the lightning strike
real while 8 rated it not real. The lightning strike in the sequence
with a medium level of congruency was rated real by 10 and not
real by the other 10 of the 20 participants.
The number of people who rated the middleground tree as real
and not real was close to identical for all three levels of audio-visual semantic congruence. The biggest difference was that 10
out of 20 rated the tree as not real for medium congruence, while
8 did so for both high and low congruence. The number of real
ratings was 9 for low congruence and 10 for both medium and
high congruence.
Figure 4 - Mean certainty levels for the clouds rated
real. Error bars indicate ± standard deviation from
the mean.
Figure 5 - Mean certainty levels for the clouds rated
not real. Error bars indicate ± standard deviation
from the mean.
congruence, and 4.8±0.7 for high congruence. The corresponding
mean values for the not real ratings were 4.8±1.1 for low
congruence, 5.0±0.6 for medium congruence, and 4.8±1.5 for high
congruence.
Within the category of stimuli design, one possible contributing
factor could be the quality of the visual stimuli. Qualitative
observations during testing combined with the number of real
ratings would seem to indicate that this was not a factor; however,
there is insufficient data to rule it out completely. Another
possible factor was the complexity of the visual stimuli. The
number of extraneous elements not directly related to the
experiment could have significantly affected the test participants'
subjective assessment of the variable congruency components.
This visual density of the sequence coupled with the fact that the
soundtracks were not intentionally designed to focus the attention
of the participants, prevented us from being able to accurately
predict what elements the participants would be aware of;
something that is strongly highlighted by the arbitrary distribution
of the don’t remember ratings. Additionally, the semantic
difference between the three congruency levels could have been
too small to properly differentiate the three soundtracks. This
issue would not be readily apparent in the test data with such a
relatively small sample size, and with no qualitative data to
dismiss the claim we have to acknowledge it as a possible source
of error.
Finally, the mean certainty levels associated with the clouds rated
as real were 3.9±1.0 for low congruence, 4.5±0.8 for medium
congruence, and 4.8±0.7 for high congruence. The mean
certainties for the not real ratings were, conversely, 4.5±0.8 for low
congruence, 4.2±1.4 for medium congruence, and 4.1±0.9 for high
congruence. Figure 4 and figure 5 illustrate the mean certainty of
the real and not real ratings for the clouds.
The mean certainty levels connected with the real and not real
ratings of the remaining five elements in the sequence roughly
correspond to the data sets reported for the middleground tree, the
lightning strike, and the fire.
None of the ANOVAs comparing the individual sets of
certainty levels for each of the elements rated real revealed any
significant difference. This was also the case for the comparison
of certainty levels accompanying the not real ratings, since all p-values were higher than the selected significance level α = 0.05.
With one exception, the t-tests showed no significant differences
between the three groups for either real or not real ratings. The
t-test used to compare the high and low congruence certainty for
the clouds rated as real indicated a significant
difference, with p=0.03.
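The pattern of a non-significant ANOVA alongside one significant pairwise t-test can be illustrated from the summary statistics reported for the clouds rated real. The sketch below rests on several assumptions: it works from the rounded means and standard deviations rather than the raw data, it takes the group sizes from the number of real ratings for the clouds (14, 6 and 10), it uses a pooled-variance Student's t-test (the variant used in the paper is not stated), and it compares against critical values from standard statistical tables. It is therefore not expected to reproduce the reported p-value exactly.

```python
import math


def anova_from_summary(ns, means, sds):
    """One-way ANOVA F statistic from group sizes, means and standard deviations."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = n_total - len(ns)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t_from_summary(n1, m1, sd1, n2, m2, sd2):
    """Student's t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / std_error, df


# Clouds rated real: low, medium, high congruence (rounded values from the text).
ns = [14, 6, 10]
means = [3.9, 4.5, 4.8]
sds = [1.0, 0.8, 0.7]

f_stat, df_b, df_w = anova_from_summary(ns, means, sds)
t_stat, df_t = pooled_t_from_summary(ns[0], means[0], sds[0], ns[2], means[2], sds[2])

# Critical values at alpha = 0.05 from standard tables.
F_CRITICAL_2_27 = 3.35   # F distribution, df = (2, 27)
T_CRITICAL_22 = 2.074    # two-tailed t distribution, df = 22

print(f"F({df_b}, {df_w}) = {f_stat:.2f}, significant: {f_stat > F_CRITICAL_2_27}")
print(f"t({df_t}) = {t_stat:.2f}, significant: {abs(t_stat) > T_CRITICAL_22}")
```

Under these assumptions F is about 3.25, just below its critical value, while the low versus high comparison gives t of about 2.44, above its critical value. This mirrors the reported pattern, and it also shows how an uncorrected pairwise test can reach significance when the overall ANOVA does not.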
In the category of test design, the overriding flaw likely to
have affected participant response was the phrasing of the real,
not real rating. Qualitative observations suggested that some
participants interpreted the real rating as meaning appearing real. This
tendency could likely have been offset by phrasing the question
differently, e.g. real-world object versus computer-generated
object. More specifically, this would have established a
framework for the term real to operate within (Rademacher, 2002
p. 3). Questions regarding elements that did not have a variable
semantic connection were included to mask the significance of the
variable congruency components, but also to investigate whether
the varying congruency levels would affect the subjective ratings
of said elements. This decision could further have exacerbated the
issues created by the visual density of the sequence. Together,
these two factors are likely to have saturated the test participants’
memory capacity and consequently affected their assessment of
the variable congruency components, while also adding directly
to the number of don’t remember ratings. This saturation effect
could be explained by the limited short-term memory capacity
common to all human beings, leading to
capacity overflow. If the participants’ attention was not directed at
the variable congruency components, this capacity overflow could
have prevented them from storing relevant information in either
working or long-term memory, subsequently preventing them
from accurately assessing the nominal ratings of the four
congruency components (Sutcliffe, 2003 p. 38). Conversely, the
four variable congruency components arguably have a moderate
to high attentional salience by virtue of the movement, vivid
colours, or contrast to the remaining elements of the sequence.
This attentional salience could have counteracted the saturation
issue (Sutcliffe, 2003 p. 29).
4. DISCUSSION
Ultimately, the experimental results are inconclusive and, with a
single exception, not statistically significant. With that said, there
were some individual data sets that showed somewhat weak, but
interesting tendencies. The nominal rating of the fire showed
indication of a concomitant relationship between an increasing
level of congruency and the number of ratings marked as real.
This relationship is in concordance with the ideal test data
established in the section on data analysis, and could be
interpreted as being in support of the hypothesis. The level of
certainty tied to the nominal rating shows similar indications,
albeit less clear, with the certainty of the real ratings showing the
same tendency as the nominal ratings. However, the significance
of this is diminished by the fact that the certainty of the not real
ratings does not show an inverse tendency, i.e. certainty does not
decrease with an increasing level of congruency.
Conversely, both certainty levels pertaining to the clouds show the
desired tendency; with certainty of the real ratings rising in
concordance with the congruency level, and the not real certainty
having the inverse inclination. However, the significance of this
data set is diminished by the fact that the nominal rating of the
clouds shows no meaningful tendencies across the congruency
levels. The remaining components with variable congruency
levels, i.e. the lightning strike and the tree breaking, showed no
relationships between certainty ratings and no meaningful
tendencies in the nominal ratings.
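The desired tendency discussed above can be stated as a simple check: mean certainty of the real ratings should rise with the congruency level, while mean certainty of the not real ratings should fall. The sketch below applies that check to the mean certainty values reported for the clouds in the results. The function names are illustrative assumptions, and the check is purely descriptive; it says nothing about statistical significance.

```python
def is_strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def is_strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def matches_ideal_pattern(real_means, not_real_means):
    """Check the hypothesised tendency across low, medium, high congruence.

    Certainty of real ratings should increase and certainty of not real
    ratings should decrease as congruence increases.
    """
    return (is_strictly_increasing(real_means)
            and is_strictly_decreasing(not_real_means))


# Mean certainty for the clouds, ordered low, medium, high congruence,
# as reported in the results.
clouds_real = [3.9, 4.5, 4.8]
clouds_not_real = [4.5, 4.2, 4.1]

print(matches_ideal_pattern(clouds_real, clouds_not_real))  # True
```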
The inclination so far, based on the interpretations of the data,
would lean towards a rejection of the hypothesis. This verdict is
likely unfounded, as there are several mitigating factors that argue
against both rejection and acceptance of the hypothesis. The
mitigating factors can roughly be divided into two categories: the
design of the stimuli presented to the test participants and the
design of the test procedure.
General care was taken not to prime the test participants; however,
this could also constitute a flaw in the design, since directed
attention is governed both by perceptual stimuli and by
background knowledge (Sutcliffe, 2003 p. 29). The lack of priming
could have made the assessment more difficult for the test
participants than it needed to be. Finally, since the test was performed
across independent groups, it does not account for the subjectivity of
the test participants, and the sample size was not large enough to
counteract this factor. Consequently, the variations in the test data
could just be random fluctuations completely unrelated to the
congruency levels, something that is supported by the general lack of
statistical significance.
Cozby Paul C. Methods in Behavioural Research [Book]. - Mountain
View : Mayfield Publishing Company, 1997. - 6th Edition. - ISBN: 1-55934-659-0.
A more general mitigating factor could be the dominance
of the visual modality (Sinnett, et al., 2007 p. 673), acting as a
natural inhibitor on the effect of the soundtracks. Additionally, it
could be argued that people have a high tolerance for moderately
incongruent cross-modal stimuli due to the prevalence of such
incongruence in films.
Laurienti Paul J. [et al.] Semantic congruence is a critical factor in
multisensory behavioral performance [Journal] // Experimental Brain
Research. - [s.l.] : Springer, 2004. - pp. 405 - 414.
Nelson Stephen L. Mba's Guide to Microsoft Excel 2000: The Essential
Excel Reference for Business Professionals [Book]. - [s.l.] : Redmond
Technology Press, 1999. - ISBN: 978-0967298108.
Olivetti Belardinelli M. and Sestieri, C. and Matteo, R. and Delogu, F.
and Gratta, C. and Ferretti, A. and Caulo, M. and Tartaro, A. and
Romani, G. Audio-visual crossmodal interactions in environmental
perception: an fMRI investigation [Journal] // Cognitive Processing. - [s.l.] : Springer, 2004. - pp. 167 - 174.
The common denominator of most of these mitigating factors is
that they result from the attempt at making the
experiment ecologically valid. Upon reflection, it would appear
that a more appropriate method would be to test a visual effects
sequence containing only one varying congruency component. A
certain complexity would still be needed to make the experiment
ecologically valid, but great care should be taken to ensure that all
extraneous elements remain as non-intrusive as possible.
Furthermore, the considerations concerning the test design should
also be adhered to; consequently, the only real, not real
question posed should pertain to the single variable congruency
component present in the sequence. Adopting such a
methodology would, we feel, yield clearer and more conclusive
results.
Rademacher Pablo M. Measuring the Perceived Visual Realism of
Images [Report] / Department of Computer Science ; University of North
Carolina. - Chapel Hill : [s.n.], 2002. - Ph.D thesis.
Reissner T. M. Hearing this while seeing that: Semantic congruence
affects processing of audiovisual stimuli [Report]. - Braunschweig : Der
Technischen Universität Carolo-Wilhelmina zu Braunschweig, 2008.
Singer Philip Rodrigues How It's Done... [Online] // Art of Foley. - 13
December 2008. - 13 December 2008. - http://www.marblehead.net/foley/.
Sinnett Scott, Spence Charles and Soto-Faraco Salvador Visual
dominance and attention: The Colavita effect revisited [Journal] //
Perception & Psychophysics. - [s.l.] : Psychonomic Society Publications,
2007. - Number 5 : Vol. 69. - pp. 673-686.
5. CONCLUSION
Inconclusive test results notwithstanding, we still feel confident
that there is a connection present between perceived believability
and audio-visual semantic congruency. Unfortunately, we were
not able to measure this connection with the present experiment
design. By applying the changes to the test design outlined in the
discussion and designing a simpler, but still ecologically valid,
visual effects sequence with a comparatively simple soundtrack;
the claimed semantic congruency connection should be
measurable.
Sonnenschein David Sound Design - The Expressive Power of Music,
Voice, and Sound Effects in Cinema [Book]. - Studio City : [s.n.], 2001.
Sutcliffe Alastair Cognitive Psychology for Multimedia Information
Processing [Book Section] // Multimedia and Virtual Reality: Designing
Multisensory User Interfaces. - London : LEA publishers, 2003.
Trochim William M.K. Nonprobability Sampling [Online] // Research
Methods Knowledge Base. - 20 October 2006. - 16 December 2008. - http://www.socialresearchmethods.net/kb/sampnon.php.
Tullis Thomas and Albert William Measuring the User Experience:
Collecting, Analyzing, and Presenting Usability Metrics [Book]. - Burlington : Morgan Kaufmann Publishers, 2008. - ISBN: 9780123735584.
6. ACKNOWLEDGMENTS
We would like to thank Luis E. Bruni for constructive
conversations on semantic congruency, Camilla Hägg for help
with retrieving the raw camera footage, Gry Poulsen for advice
regarding statistics, and Daniel and Martin Miksa for helping with
audio field recording.
Walton K.L. Fearing Fictions [Book Section] // Aesthetics and the
Philosophy of Art: The Analytic Tradition - An Anthology / book auth.
Lamarque Peter and Olsen Stein Haugom. - [s.l.] : Blackwell Publishing,
2004.
Wyatt Hilary and Amyes Tim Audio Post Production for Television and
Film: An introduction to technology and techniques [Book]. - Oxford :
Focal Press, 2007. - 3rd Edition.
7. REFERENCES
Altman Rick Sound Theory / Sound Practice [Book]. - New York :
Routledge, 1992.
Bartneck Christoph [et al.] Is The Uncanny Valley An Uncanny Cliff?
[Conference] // 16th IEEE International Conference on Robot & Human
Interactive Communication. - Jeju, Korea : IEEE, 2007. - pp. 368-373.
BBC BBC Wild Africa [DVD]. - BBC, 2005. - Episode 2 - Savannah;
00:02:10; 00:35:17.