
Proceedings
Short Papers
ISBN: 978-981-09-4946-4
Preface
This year, CASA received 77 papers. Twenty-seven papers were selected and published in a special issue of the Computer Animation and Virtual Worlds (CAVW) journal, published by Wiley. These Proceedings contain the 13 short papers that were selected for publication under ISBN 978-981-09-4946-4. In total, 40 papers were presented at the conference.
CASA was founded by the Computer Graphics Society in Geneva in 1988 and is the oldest conference on computer animation in the world. It has been held in various countries around the world, in recent years in Istanbul, Turkey (2013), Houston, USA (2014), and this year in Singapore.
This year, the conference started its first day with two workshops: the 3D Telepresence Workshop and the Industrial Workshop on 3D Graphics & Virtual Reality. The workshops were followed by two days of presentations of the 27 full papers and a dozen short papers. Addressing this year's conference are four invited speakers: 1) Mark Sagar, from the University of Auckland, New Zealand; 2) Dieter Fellner, from the Fraunhofer Institute, Germany; 3) Cai Yiyu, from Nanyang Technological University; and 4) Zheng Jianming, from the same university. A panel on "Sharing life with Social Robots and Virtual Humans, is it our future?" was also held on May 13.
The Program Co-Chairs
Daniel Thalmann, NTU, Singapore & EPFL, Switzerland
Jian Jun Zhang, Bournemouth University, UK
COMMITTEES
Conference Chairs
Dieter Fellner – Fraunhofer Institute, Darmstadt, Germany
Nadia Magnenat Thalmann – NTU, Singapore & MIRALab, University of Geneva, Switzerland
Junfeng Yao – Xiamen University, China

Program Chairs
Jian Jun Zhang – Bournemouth University, UK
Daniel Thalmann – NTU, Singapore & EPFL, Switzerland

Local Arrangement Committee
Qi Cao – NTU, Singapore
Poh Yian Lim – NTU, Singapore
Yaminn Khin – NTU, Singapore
International Program Committee
Norman Badler – University of Pennsylvania, US
Selim Balcisoy – Sabanci University, Turkey
Jan Bender – TU Darmstadt, Germany
Ronan Boulic – EPFL, Switzerland
Yiyu Cai – NTU, Singapore
Marc Cavazza – Teesside University, UK
Bing-Yu Chen – National Taiwan University, Taiwan
Guoning Chen – University of Houston, US
Yiorgos Chrysanthou – University of Cyprus, Cyprus
Frederic Cordier – UHA, France
Justin Dauwels – NTU, Singapore
Etienne de Sevin – MASA Group R&D, France
Zhigang Deng – University of Houston, US
Fabian Di Fiore – Hasselt University, Belgium
Arjan Egges – Utrecht University, Netherlands
Abdennour El Rhalibi – Liverpool John Moores University, UK
Petros Faloutsos – York University, Canada
Ugur Gudukbay – Bilkent University, Turkey
Xiaohu Guo – The University of Texas at Dallas, US
Mario Gutierrez – OZWE SàRL, Switzerland
James Hahn – George Washington University, US
Ying He – NTU, Singapore
Zhiyong Huang – A*STAR, Singapore
Veysi Isler – METU, Turkey
Jean-Pierre Jessel – Paul Sabatier University, France
Xiaogang Jin – Zhejiang University, China
Sophie Joerg – Clemson University, US
Chris Joslin – Carleton University, Canada
Prem Kalra – IIT Delhi, India
Mustafa Kasap – Microsoft Turkey, Turkey
Andy Khong – NTU, Singapore
Scott King – Texas A&M University - Corpus Christi, US
Taku Komura – University of Edinburgh, UK
Caroline Larboulette – University of South Brittany, France
Rynson Lau – City University of Hong Kong, Hong Kong
Binh Le – University of Houston, US
Wonsook Lee – University of Ottawa, Canada
J.P. Lewis – Victoria University, Australia
Tsai-Yen Li – National Chengchi University, Taiwan
Xin Li – Louisiana State University, US
Hao Li – University of Southern California, US
Ming Lin – UNC Chapel Hill, US
Wan-Chun Ma – Weta Digital, New Zealand
Anderson Maciel – UFRGS, Brazil
Nadia Magnenat Thalmann – NTU, Singapore and MIRALab, University of Geneva, Switzerland
Dinesh Manocha – UNC Chapel Hill, US
Franck Multon – University Rennes 2, France
Soraia Musse – PUCRS, Brazil
Rahul Narain – University of California, Berkeley, US
Luciana Nedel – UFRGS, Brazil
Junyong Noh – KAIST, South Korea
Veronica Orvalho – FCUP, Portugal
Igor Pandzic – FER, Croatia
George Papagiannakis – University of Crete, Greece
Laura Papaleo – University of Paris-Sud, France
Nuria Pelechano – UPC, Spain
Christopher Peters – KTH Royal Institute of Technology, Sweden
Julien Pettre – INRIA, France
Pierre Poulin – Université de Montréal, Canada
Nicolas Pronost – Université Claude Bernard Lyon 1, France
Taehyun Rhee – Victoria University of Wellington, New Zealand
Isaac Rudomin – BSC, Spain
Jun Saito – Marza Animation Planet, Tokyo, Japan
Yann Savoye – University of Innsbruck, Austria
Hubert Shum – Northumbria University, UK
Matthias Teschner – University of Freiburg, Germany
Daniel Thalmann – NTU, Singapore and EPFL, Switzerland
Xin Tong – MSRA, China
Jun Wang – Nanjing University of Aeronautics and Astronautics, China
Jack Wang – University of Hong Kong, Hong Kong
Enhua Wu – University of Macau, China
Junsong Yuan – NTU, Singapore
Cem Yuksel – University of Utah, US
Zerrin Yumak – NTU, Singapore
Jian Zhang – Bournemouth University, UK
Jianmin Zheng – NTU, Singapore
Contents
1. Data-Driven Model for Spontaneous Smiles
   Laura Trutoiu, Nancy Pollard, Jeffrey Cohn and Jessica Hodgins ................... 1
2. Expectancy Violations Related to a Virtual Human's Joint Gaze Behavior in Real-Virtual Human Interactions
   Kangsoo Kim, Arjun Nagendran, Jeremy Bailenson and Greg Welch ................... 5
3. Segmentation-Based Graph Representation for 3D Animations
   Guoliang Luo, Quanhua Tang and Yihan Liu ................... 9
4. Motion Puppetry Taking Skeletal Similarity into Account
   Norihide Kaneko, Reo Takahashi and Issei Fujishiro ................... 13
5. Real-Time Marker-Less Implicit Behavior Tracking for User Profiling in a TV Context
   Francois Rocca, Pierre-Henri De Deken, Fabien Grisard, Matei Mancas and Bernard Gosselin ................... 17
6. Robust Space-Time Footsteps for Agent-Based Steering
   Glen Berseth, Mubbasir Kapadia and Petros Faloutsos ................... 21
7. Avatar Chat: A Prototype of a Multi-Channel Pseudo Real-Time Communication System
   Kei Tanaka, Dai Hasegawa, Martin J. Dürst and Hiroshi Sakuta ................... 25
8. On Streams and Incentives: A Synthesis of Individual and Collective Crowd Motion
   Arthur van Goethem, Norman Jaklin, Atlas Cook IV and Roland Geraerts ................... 29
9. Constrained Texture Mapping via Voronoi Diagram Base Domain
   Peng Cheng, Chunyan Miao and Nadia Thalmann ................... 33
10. Hybrid Modeling of Multi-Physical Processes for Volcano Animation
    Fanlong Kong, Changbo Wang, Chen Li and Hong Qin ................... 39
11. Determining Personality Traits from Goal-Oriented Driving Behaviors: Toward Believable Virtual Drivers
    Andre Possani-Espinosa, J. Octavio Gutierrez-Garcia and Isaac Vargas Gordillo ................... 43
12. Virtual Meniscus Examination in Knee Arthroscopy Training
    Bin Weng and Alexei Sourin ................... 47
13. Space Deformation for Character Deformation using Multi-Domain Smooth Embedding
    Zhiping Luo, Remco Veltkamp and Arjan Egges ................... 51
Data-Driven Model for Spontaneous Smiles
Laura Trutoiu1, Nancy Pollard1, Jeffrey F. Cohn2, Jessica Hodgins1
1 Carnegie Mellon University, 2 University of Pittsburgh
Abstract
We present a generative model for spontaneous smiles that preserves their dynamics and can thus be used to generate genuine animations. We use a high-resolution motion capture dataset of spontaneous smiles to represent the accurate temporal information present in spontaneous smiles. The smile model consists of data-driven interpolation functions generated from a Principal Component Analysis model and two blendshapes, neutral and peak. We augment the model for facial deformations with plausible, correlated head motions as observed in the data. The model was validated in two perceptual experiments that compared animations generated from the model, animations generated directly from motion capture data, and animations with traditional blendshape-based approaches using ease-in/ease-out interpolation functions. Animations with model interpolation functions were rated as more genuine than animations with ease-in/ease-out interpolation functions for different computer-generated characters. Our results suggest that data-driven interpolation functions accompanied by realistic head motions can be used by animators to generate more genuine smiles than animations with generic ease-in/ease-out interpolation functions.
1 Introduction
Facial animation research has made significant progress in the quality of the static appearance of realistic faces [1; 2] as well as in high-resolution techniques for capturing dynamic facial expressions [3; 4]. Generative models for the deformations that occur on the face, however, do not represent the full range of subtle human expression. These generative models are particularly useful if they integrate with traditional animation methods, which often rely on keyframing static expressions. In this paper, we use high-resolution motion capture data to build a model of one such dynamic facial expression: smiles.
Our smile model consists of two parts: (1) data-driven interpolation functions to model smile expressions and (2) plausible head motions. We start with high-resolution motion capture data of smiles for one individual. The motion capture data contains both spontaneous smiles, elicited through various activities, and posed smiles. For the smile expressions, we build a generative model that produces interpolation functions that are nonlinear in time. These interpolation functions capture the plausible velocity as well as the multiple peaks that occur in natural smiles. For each data-driven interpolation function we provide a plausible head motion.
Through perceptual studies, we demonstrate that our model outperforms the commonly used ease-in/ease-out interpolation functions. We evaluate our model based on how the smiles are rated for genuineness. In a first perceptual experiment, we compare model smiles with recorded high-resolution spontaneous smiles, as well as smiles generated with ease-in/ease-out interpolation functions. Our data showed no significant difference between the high-resolution spontaneous smiles and the model smiles. In a second experiment, we find that our model-based interpolation functions coupled with appropriate head motions can generalize to different CG characters.
2 Related work
Smiles can be categorized, depending on how they are elicited, as spontaneous or posed. Posed or deliberate smiles mask other emotions rather than genuinely convey amusement or joy. Conversely, spontaneous smiles more often convey genuine emotions, though the perceptual labels associated with spontaneous smiles are diverse: smiles can be perceptually labeled as polite, embarrassed, or fearful [5; 6; 7]. What are the cues that help differentiate between different types of smiles?
Many cues, static and dynamic, affect how a smile is perceived [7]. The Duchenne marker (slight wrinkling at the outer corner of the eyes [8]) is considered an indication of a genuine smile. However, the timing of the Duchenne marker relative to the smile, a dynamic cue, also impacts whether the smile is perceived as genuine [9]. Spontaneous smiles have been found to have smaller amplitude and slower onset (start time) than posed smiles [6; 10].
Few research projects explicitly consider generative models for smiles and laughter. Krumhuber and colleagues [11] used a data-based heuristic to generate genuine smiles, characterized by symmetric smiles with long onset and offset durations and a shorter peak. A discrete model for smiles with a limited number of parameters was proposed by Ochs and colleagues [12]. Previous research suggested that temporal information is required to maintain the genuineness of smile expressions when linearizing spatial motion, as in the case of blendshape interpolation [13].
Figure 1: PCA model for genuine smiles: (left) original smile profiles and (right) generated profiles.
3 Smile model
Our smile model consists of two parts: (1) a generative model for smile expressions, represented as interpolation functions, and (2) plausible head motions. For the smile expression model, we represent smiles in the temporal domain as data-driven interpolation functions.
We capture temporal nonlinearities in the data-driven interpolation functions with a generative Principal Component Analysis (PCA) model. Our model includes plausible head motions because animations presented without head motion appeared artificial and rigid.
We used a dataset of high-resolution motion capture data for the face and head movement. The dataset was recorded by Trutoiu and colleagues [13]. For the animations, a computer-generated version of the participant was created by a professional artist. Animations were created in Autodesk Maya either from the high-resolution data or from blendshape interpolation with various functions.
3.1 Generative model
We modeled the temporal properties of spontaneous smile expressions as interpolation functions and used PCA to construct a generative model of data-driven interpolation functions. Twenty-five spontaneous smiles from the same subject (SD) were used for the PCA model. High-resolution expressions from the motion capture data were reconstructed in a least-squares fashion with two blendshapes to build a time series dataset for smile dynamics. The time series are represented by the coefficients s. PCA models the variability in the data-driven interpolation functions.
We represented the original dataset using the first ten principal components, accounting for 98% of the variance. Next, we projected new, random coefficients within one standard deviation of the original coefficients onto these ten PCA dimensions. Figure 1 shows the input and output of the PCA model: the original time series correspond to the input and the generated time series to the output. Using the last two terms of the newly generated time series, we scaled back each part of the smile to create interpolation functions of different durations. Animated smiles generated using these interpolation functions were used in both experiments.
3.2 Plausible head motions
We hypothesized that a plausible head motion is proportional to the smile amplitude, similar to laughing, where sound correlates with torso movements. Cohn and colleagues found a moderate correlation between head pitch and smile intensity, with the intensity increasing as the head moved downwards [14].
We evaluated the correlation between head motion and smiles in our dataset and found a moderate average correlation only for head pitch, −0.38 (on a scale from −1 to 1, where 0 indicates no correlation). Some smile samples show stronger correlations, with a maximum of −0.8. Based on these results, we generated plausible head motions derived from the interpolation functions, such that the head pitch is proportional to the smile amplitude.
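A minimal sketch of this step, assuming recorded head-pitch and smile-amplitude signals of equal length and an illustrative maximum pitch value (the paper does not give the scaling constant):

```python
# Hedged sketch: measure the Pearson correlation between head pitch and smile
# amplitude, then synthesize a head-pitch curve proportional to a generated
# interpolation function. Signal names and the scaling are assumptions.
import numpy as np

def pearson(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    a = a - a.mean(); b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def plausible_head_pitch(smile_curve, max_pitch_deg=-5.0):
    """Head pitch proportional to smile amplitude (negative = downward pitch)."""
    s = np.asarray(smile_curve, float)
    return max_pitch_deg * (s - s.min()) / (s.max() - s.min() + 1e-8)

# Usage (hypothetical data):
# r = pearson(recorded_pitch, recorded_smile)     # around -0.38 on average in the paper
# pitch = plausible_head_pitch(generated_smile_curve)
```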
4 Perceptual experiments
The goal of Experiment 1 was to evaluate a small
number of samples from the model relative to
ground truth animations and ease-in/ease-out animations. In Experiment 2, we tested a large sample
of model smiles and applied the model to multiple
CG characters.
We hypothesized that interpolation functions generated from our PCA model would result in smile animations with higher genuineness ratings than ease-in/ease-out interpolation functions. We expected that adding plausible head motions, proportional to the smile amplitude, to the smile expressions would increase the perceived genuineness of the animation.
Figure 2: Experiment 1 results (N = 61): genuineness ratings for the spontaneous, model, ease-in/ease-out, and posed conditions, with and without head motion.
Figure 3: The three CG characters in Experiment 2.
4.1 Experiment 1: Different smile types
To evaluate our model, we conducted a within-subjects experiment with the following independent variables: smile type (spontaneous, posed, model, and ease-in/ease-out smiles) and head motion (with and without). The dependent variable was perceived genuineness on a scale from 1 to 100. In this experiment, the head motions for the model and ease-in/ease-out conditions were identical and proportional to the model smile profile, as described above. Sixty-one viewers rated 24 smile animations on Amazon's Mechanical Turk. All animations were created with a CG character that matched our participant SD in appearance. The high-resolution animations were obtained directly from the recorded motion capture data and are considered ground truth animations that match the recorded video as closely as possible.
Three samples of each of the following types were used: (1) spontaneous smiles, (2) posed smiles, (3) model smiles, and (4) ease-in/ease-out smiles. For the model smiles, two static blendshapes (neutral and peak) were obtained from the collection of spontaneous smiles. For each model interpolation function we created counterpart ease-in/ease-out curves with the same durations as the model smiles from neutral to peak and from peak to the end of the smile.
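For illustration, a counterpart ease-in/ease-out curve with matched onset and offset durations could be built with a standard smoothstep easing function; the frame counts and the optional hold segment below are hypothetical, not taken from the paper.

```python
# Sketch of a generic ease-in/ease-out (smoothstep) interpolation function from
# neutral (0) to peak (1) and back, with onset/offset lengths chosen to match a
# given model smile.
import numpy as np

def ease(t):
    """Classic ease-in/ease-out (smoothstep) for t in [0, 1]."""
    return 3 * t**2 - 2 * t**3

def ease_in_out_smile(n_onset, n_offset, n_hold=0):
    onset = ease(np.linspace(0.0, 1.0, n_onset))
    hold = np.ones(n_hold)
    offset = ease(np.linspace(1.0, 0.0, n_offset))
    return np.concatenate([onset, hold, offset])

# e.g. a counterpart curve for a model smile with a 40-frame onset and 60-frame offset:
# curve = ease_in_out_smile(n_onset=40, n_offset=60)
```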
4.1.1 Results
We conducted a repeated-measures 4 (smile type)
× 2 (head motion) ANOVA to investigate possible effects of the independent variables on smile
genuineness. Both independent variables and their
interaction significantly impacted genuineness ratings.
We found a significant main effect of smile type on smile genuineness, F(3, 420) = 19.13, p < .0001. Posed smiles (rating of 35.62) were rated as significantly less genuine than all other conditions, which were not significantly different from each other. Similarly, head motion had a significant effect on smile genuineness: animations with head motion had an average rating of 52.56 while animations without head motion averaged 37.51, F(1, 420) = 120.80, p < .0001.

Figure 4: Genuineness ratings for three characters and two smile types.
We further investigated the significant interaction between smile type and head motion, F(3, 420) = 2.83, p = .0379. For animations with head motion, spontaneous smiles (59.31) were significantly different from ease-in/ease-out smiles (53.28), F(1, 420) = 4.12, p = .043, but not significantly different from model smiles (57.93), F(1, 420) = .215, p = .643. Without head motion, only posed smiles were significantly less genuine. This interaction is shown in Figure 2.
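A hedged sketch of how such a 4 x 2 repeated-measures ANOVA could be reproduced with statsmodels, assuming a long-format table with one genuineness rating per viewer and condition (this is not the authors' analysis script):

```python
# Minimal repeated-measures ANOVA sketch for the design described above.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# ratings: columns = ['viewer', 'smile_type', 'head_motion', 'genuineness']
def run_anova(ratings: pd.DataFrame):
    model = AnovaRM(ratings, depvar='genuineness', subject='viewer',
                    within=['smile_type', 'head_motion'], aggregate_func='mean')
    return model.fit()

# print(run_anova(ratings))  # reports F and p for both main effects and their interaction
```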
4.2 Experiment 2: Multiple characters
The independent variables used in this experiment are the smile type (model or ease-in/ease-out), the CG character (female SD, male KB, or cartoon-like CP), and the smile sample (1 to 12). The dependent variable is smile genuineness. The twelve data-driven interpolation functions and corresponding head motions are from the SD model. The peak blendshapes are shown in Figure 3. Animations were shown with head motion based on the model smiles.
We used a mixed experiment design, 3 (character) × 2 (smile type) × 12 (smile sample), with character as a between-subjects variable while smile type and smile sample were within-subjects variables. Each participant saw each animation type (n = 24) for only one character. Fifty-eight participants viewed and rated the SD animations, 57 participants rated the KB animations, and 63 participants rated the CP animations.
Table 1: Significant results from Experiment 2: Multiple characters.

Effect | F-test | Post-hoc
Main effects:
Smile type | F(1, 4031) = 35.13, p < .0001 | Model smiles (56.13) are rated as more genuine than ease-in/ease-out smiles (52.22)
Sample | F(11, 4030) = 28, p < .0001 | The ratings for samples varied from 64.79 (sample 5) to 45.82 (sample 3)
Two-way interactions:
Smile type * Sample | F(11, 4030) = 7.15, p < .0001 | Model smile sample 5 is rated highest (70.42) while model smile sample 12 is rated lowest (43.51)
Smile type * Character | F(2, 4031) = 17.30, p < .0001 | For SD and KB, model smiles are rated higher than ease-in/ease-out smiles

4.2.1 Results
Significant main effects were observed for smile type and sample, but not for CG character (shown in Table 1). The ANOVA results indicate that our model is appropriate for use with photorealistic characters such as KB and SD. However, more research is required to investigate how this type of model can be used with cartoon-like characters.
5 Discussion
The primary contribution of this paper is to demonstrate that data-driven interpolation functions accompanied by correlated head motions are appropriate for modeling smiles. Our smile model of interpolation functions and plausible head motions is rated as more genuine than animations based on the commonly used ease-in/ease-out interpolation functions, even when the ease-in/ease-out examples had the same overall timing to and from the peak of the smile and the same head motions. The model preserves naturally occurring smile accelerations, decelerations, and multiple smile peaks. In contrast, animations with ease-in/ease-out interpolation functions are smooth with a single peak and therefore may not accurately represent spontaneous smiles.

References
[1] Henrik Wann Jensen, Stephen R. Marschner, Marc Levoy, and Pat Hanrahan. A practical model for subsurface light transport. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 511-518, 2001.
[2] Jorge Jimenez, Timothy Scully, Nuno Barbosa, Craig Donner, Xenxo Alvarez, Teresa Vieira, Paul Matts, Verónica Orvalho, Diego Gutierrez, and Tim Weyrich. A practical appearance model for dynamic facial color. ACM Transactions on Graphics, 29(6):141:1-141:10, December 2010.
[3] Li Zhang, Noah Snavely, Brian Curless, and Steven M. Seitz. Spacetime faces: High-resolution capture for modeling and animation. In Data-Driven 3D Facial Animation, pages 248-276. Springer, 2007.
[4] Thabo Beeler, Fabian Hahn, Derek Bradley, Bernd Bickel, Paul Beardsley, Craig Gotsman, Robert W. Sumner, and Markus Gross. High-quality passive facial performance capture using anchor frames. In ACM SIGGRAPH 2011 Papers (SIGGRAPH '11), 2011.
[5] Paul Ekman. Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage (revised and updated edition). W. W. Norton & Company, September 2001.
[6] Jeffrey F. Cohn and Karen L. Schmidt. The timing of facial motion in posed and spontaneous smiles. Journal of Wavelets, Multi-resolution and Information Processing, 2:1-12, 2004.
[7] Zara Ambadar, Jeffrey F. Cohn, and L. I. Reed. All smiles are not created equal: Morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. Journal of Nonverbal Behavior, 33(1):17-34, 2009.
[8] Paul Ekman, R. J. Davidson, and Wallace Friesen. The Duchenne smile: Emotional expression and brain physiology. Journal of Personality and Social Psychology, 58(2):342-53, February 1990.
[9] Eva G. Krumhuber and Antony S. R. Manstead. Can Duchenne smiles be feigned? New evidence on felt and false smiles. Emotion, 9(6):807-820, 2009.
[10] Karen L. Schmidt, Yanxi Liu, and Jeffrey F. Cohn. The role of structural facial asymmetry in asymmetry of peak facial expressions. Laterality, 11(6):540-61, November 2006.
[11] Eva Krumhuber and Arvid Kappas. Moving smiles: The role of dynamic components for the perception of the genuineness of smiles. Journal of Nonverbal Behavior, 29(1):3-24, April 2005.
[12] Magalie Ochs, Radoslaw Niewiadomski, Paul Brunet, and Catherine Pelachaud. Smiling virtual agent in social context. Cognitive Processing, pages 1-14, 2011.
[13] Laura C. Trutoiu, Elizabeth J. Carter, Nancy Pollard, Jeffrey F. Cohn, and Jessica K. Hodgins. Spatial and temporal linearities in posed and spontaneous smiles. ACM Transactions on Applied Perception, 8(3):1-17, August 2014.
[14] Jeffrey F. Cohn, Lawrence I. Reed, Tsuyoshi Moriyama, Jing Xiao, Karen Schmidt, and Zara Ambadar. Multimodal coordination of facial action, head rotation, and eye motion during spontaneous smiles. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, pages 129-135, 2004.
Expectancy Violations Related to a Virtual Human’s
Joint Gaze Behavior in Real-Virtual Human Interactions
Kangsoo Kim1 , Arjun Nagendran1 , Jeremy Bailenson2 , and Greg Welch1
1 University of Central Florida, 2 Stanford University
Abstract
Joint gaze—the shared gaze by the individuals
toward a common object/point of interest—
offers important non-verbal cues that allow
interlocutors to establish common ground in
communication between collocated humans.
Joint gaze is related to but distinct from mutual
gaze—the gaze by the interlocutors towards
each other such as during eye contact, which
is also a critical communication cue. We
conducted a user study to compare real human
perceptions of a virtual human (VH) with
their expectancy of the VH’s gaze behavior.
Each participant experienced and evaluated
two conditions: (i) VH with mutual gaze only
and (ii) VH with mutual and joint gaze. We
found evidence of positive responses when the VH exhibited joint gaze, and preliminary evidence supporting the effect of expectancy violation, i.e., more positive perceptions when participants were presented with VH gaze capabilities that exceeded what they expected.
Figure 1: Virtual human exhibiting mutual gaze (left) and joint
gaze (right).
Keywords: joint gaze, expectancy violation, human perception, virtual human, avatar

1 Introduction
Expectancy violation (EV) is a well-known phenomenon in human communication and psychology [1]. EV arises when one encounters an unexpected behavior and, as a result, experiences either positive or negative feelings. For example, if a child is given a gift, she will likely be happier if the gift was unexpected (e.g., "out of the blue") than if the gift was expected (e.g., it was her birthday). This would be a positive violation. Conversely, not receiving a gift on her birthday, when she clearly expected it, would be a negative violation that could cause her to have an unfavorable response.
In this paper, we conduct a user study that independently varies a virtual human's (VH) joint gaze behavior (Fig. 1), and investigate the effects of a mismatch between user expectations of the VH's gaze behavior and the VH's actual gaze behavior, with respect to the user's perceptions of the VH. Joint gaze is the shared gaze that interacting interlocutors typically exhibit when attending to a common object of interest. Joint gaze is an important aspect of establishing common ground, so interlocutors generally expect joint gaze when attempting to establish joint attention to a shared object. For example, if you explain directions to a partner while pointing toward features on a map, you would expect your partner to look at the map. If your partner does not look at the map, you might be puzzled and wonder whether your partner is paying attention. A positive (or negative) violation corresponds to a subject who initially has a low (or high) expectation of a VH's joint gaze and later evaluates the VH more positively (or negatively) after actually meeting a VH with (or without) joint gaze. We hypothesize that an expectancy violation related to the VH's joint gaze will influence one's perceptions of the VH.
Among the prior research looking at the importance of gaze behavior in VH systems, some work has looked at the gaze behavior between VHs in a virtual environment [2, 3], while other work has looked at gaze between real humans and VHs in real/mixed environments [4, 5, 6, 7, 8]. Our interest is in the latter, due to the involvement of real objects in the interaction, as opposed to interactions in a purely virtual environment. Previous research supported the importance of VH eye gaze (mostly mutual gaze, i.e., eye contact) for human perception (e.g., social presence) or task performance. However, there is relatively little research narrowing the focus down to joint gaze and one's expectancy. This paper presents preliminary results about the effects of a VH's joint gaze and its expectancy violation on one's perceptions of the VH.

2 Virtual Human
A remote human controller manipulated our VH from a separate room (Wizard of Oz). We provided the controller with a video feed of the experimental space so he could see the experimental environment and direct the VH's gaze either toward the subject's face or toward the map, depending on the current trial/subject (Fig. 1). The controller used an interface with an infrared camera (TrackIR) and a magnetic tracking device (Razer Hydra) to perform the VH's facial expressions, mouth movement, and changes of gaze direction effectively via our previously developed system [9]. The upper torso of our VH was displayed at near human size on a 55" 2D flat screen, and a table with black curtains blocked the place where the lower torso should have been, so subjects could feel that the VH was behind the table (Fig. 2). In our scenario, the VH expressed a normal or slightly pleasant facial expression during the interaction, so that the subjects could feel the VH's emotional state was consistent. The VH generally initiated the conversation unless the subject started talking first, but did not say anything proactively during the interaction. In other words, the VH only made positive reciprocal answers (e.g., "Yes, I understand.") to the subject's affirmative questions such as "Do you understand?".

Figure 2: Virtual human setup (left) and facial expressions (right).

3 Experiment
3.1 Scenario and Manipulation
Our human subjects were introduced to a VH and told his name ("Michael"). They were then told that the VH was a new student at the university who was currently in a building off campus, but needed to return for a lecture on campus, and that he was late. The subjects were then asked to staff a "help desk" and to provide the VH with directions using a campus map and a pen. We had two conditions for the VH's gaze behavior: (i) mutual gaze only and (ii) mutual gaze with joint gaze (Fig. 1). While the VH always looked at the subject's face without looking down at the map in the "mutual-only" condition, he looked at the map occasionally in the "mutual+joint" condition. In both conditions, the VH exhibited small natural upper-torso movements and eye blinks. Subjects experienced both conditions and evaluated the two VHs. The overall procedure is illustrated in Fig. 3. First, subjects saw the VH verbally explaining the situation, that he would be looking for directions to the campus, and completed a demographic pre-questionnaire. They then experienced interactions 1 and 2, explaining the map, with the VH performing a different gaze behavior in each interaction (either "mutual-only" or "mutual+joint"). After each interaction, subjects were asked to complete a questionnaire about their perceptions of the VH and their sense of EV with respect to the VH's gaze behavior (5-point Likert scale). Finally, they compared the two conditions and reported their preference in questionnaire 3. To prevent the subjects from becoming familiar with the same set of directions, we counterbalanced the destination on the map as well as the VH's gaze behavior across interactions.

Figure 3: Overall procedure.
3.2 Human Subjects
A total of 28 subjects were recruited from the University of Central Florida and received $15 in monetary compensation for the experiment. Subjects were 75% male (n = 21, mean age = 19.95, range = 18–26) and 25% female (n = 7, mean age = 20.71, range = 18–24). Most of them (n = 26) had previous experience with virtual characters through video games or virtual reality applications. All subjects were aware that what they interacted with was a VH.
4 Results and Discussion
Joint Gaze: We evaluated subjects’ responses
from (comparison) quesionnaire 3 to check
which gaze condition of VH subjects preferred. More than 50% of subjects chose “mutual+joint” VH as their preference in all the
questions, which indicates the importance of
joint gaze feature in VH system (Table 1). However, there were still considerable number of
people who did not feel any difference between
two conditions. According to informal discussion with subjects after the study, a majority
of the subjects addressed that the VH’s verbal response capability far exceeded what they
had previously experienced although our VH
merely responded their affirmative questions.
We guess that highly engaging verbal communication might overwhelm the effect of joint gaze,
so subjects could not feel any difference between two conditions.
Expectancy Violation (EV): Although we
used the same questions for questionnaire 1 and
2 (perception / sense of EV) for experiment consistency, we only analyzed subjects’ responses
from questionnaire 1 to evaluate EV effects, because their expectation might be biased for the
multiple interactions. After the first interaction,
we asked for their sense of EV with respect to
the VH’s gaze behavior, e.g., “How would you
rate the capability of virtual human’s gaze com-
Figure 4: Population by subject-reported sense of EV in VH’s
gaze behavior. Population with “mutual+joint” VH
tends towards the highest (5) in the sense of EV while
population with “mutual-only” is more towards (4) in
the sense of EV, which can be interpreted that VH’s joint
gaze encourages more positive violation.
pared to what you expected?” (5-scale, 1: more
negative than what I expected, 3: same as what
I expected, 5: more positive than what I expected). We expected both negative and positive
responses, but the responses were mostly positive, so we focused on positive violations. The
results indicated that “mutual+joint” VH encouraged more positive violation than “mutualonly” VH. In other words, subjects with “mutual+joint” VH tended to evaluate the VH’s gaze
behavior more positively, compared to what they
expected before, than with “mutual-only” VH
(Fig. 4). T-tests showed a significant difference
in subject-reported EV of VH’s gaze behavior
for “mutual-only” (M = 3.643, SD = 0.842) and
“mutual+joint” (M = 4.500, SD = 0.650) conditions; t(24) = -3.01, p = 0.006.
When we analyzed the relationship between
subject’s perceptions and their sense of EV, we
observed high-reliability between the responses
from 9 questions in Table 2 (Cronbach’s alpha
> 0.80), so we averaged their responses into a
single value and used it as their perception re-
Table 1: Subjects' responses from comparison questionnaire 3. The value indicates the number of people who preferred the condition, with its percentage out of the total 28 subjects in parentheses.

Question | Mutual+Joint | Mutual-Only | No Difference
Which virtual human did you like more? | 17 (61%) | 2 (7%) | 9 (32%)
Which interaction did you enjoy more? | 16 (57%) | 8 (29%) | 4 (14%)
Which interaction were you more engaged with? | 14 (50%) | 3 (11%) | 11 (39%)
Which virtual human did you think more pay attention to what you were explaining? | 21 (75%) | 2 (7%) | 5 (18%)
Which virtual human did you feel that more understood what you were explaining? | 17 (61%) | 6 (21%) | 5 (18%)
Which virtual human did you feel more as if it was a real human? | 16 (57%) | 3 (11%) | 9 (32%)
Which virtual human gave you more sense of physical presence? | 14 (50%) | 2 (7%) | 12 (43%)
Which virtual human was more natural (human-like)? | 18 (64%) | 3 (11%) | 7 (25%)
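A small Python sketch of the analyses reported in this section, assuming arrays of EV ratings per condition and a subjects-by-items matrix of the nine perception responses (variable names are illustrative, not the authors' scripts):

```python
# Independent-samples t-test on the EV ratings of the two gaze conditions, and
# Cronbach's alpha over the nine perception items before averaging them.
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """items: (n_subjects, n_items) matrix of Likert responses."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# ev_mutual_only, ev_mutual_joint: 1-D arrays of EV ratings after interaction 1
# t, p = stats.ttest_ind(ev_mutual_only, ev_mutual_joint)   # reported: t(24) = -3.01, p = .006
# alpha = cronbach_alpha(perception_items)                   # reported alpha > 0.80
# perception_score = perception_items.mean(axis=1)           # averaged per-subject response
```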
Table 2: Nine questions for subjects' perception responses from questionnaire 1 (5-point scale, 1: strongly disagree, 5: strongly agree). Subjects' responses to these questions were correlated (Cronbach's alpha > 0.80).
1. You liked the virtual human.
2. You enjoyed the interaction with the virtual human.
3. You were engaged in the interaction with the virtual human.
4. You had the feeling that the virtual human was paying attention to what you explained.
5. The virtual human seemed to understand what you explained.
6. You had the feeling that the virtual human was a real human.
7. You had the feeling that the virtual human was physically present in real space.
8. The interaction with the virtual human was natural.
9. You had the feeling that the virtual human looked at the map.

Figure 5: Mean of subjects' perception responses over self-reported sense of EV. In both the "mutual-only" and "mutual+joint" conditions, a higher positive EV (x-axis) resulted in a higher perception value (y-axis).

5 Conclusions
We have presented a user study aimed at understanding the effects of a VH's joint gaze behavior and the phenomenon of expectancy violation (EV) with respect to a human's perception of the joint gaze behavior of a VH. As expected, joint gaze was found to be an important characteristic for subjects to build positive responses to the VH during a map explanation scenario. We also discovered preliminary evidence of a positive EV effect: subjects evaluated the VH more positively corresponding to how much the VH's gaze behavior exceeded their expectations (positively), regardless of the presence of joint gaze. In the future, we will consider a large-sample study investigating the effects of a user's previous experience and expectations related to various features of virtual or robotic humans. If we find a certain feature that causes a negative violation in general, which means people normally have high expectations about that feature, it would indicate that the feature should be carefully considered for future VHs.

References
[1] Judee K. Burgoon, Deborah A. Newton, Joseph B. Walther, and E. James Baesler. Nonverbal expectancy violations and conversational involvement. Journal of Nonverbal Behavior, 13(2):97-119, 1989.
[2] Jeremy N. Bailenson, Andrew C. Beall, and Jim Blascovich. Gaze and task performance in shared virtual environments. The Journal of Visualization and Computer Animation, 13(5):313-320, 2002.
[3] Rutger Rienks, Ronald Poppe, and Dirk Heylen. Differences in head orientation behavior for speakers and listeners: An experiment in a virtual environment. ACM Transactions on Applied Perception, 7(1):1-13, 2010.
[4] Roel Vertegaal, Robert Slagter, Gerrit van der Veer, and Anton Nijholt. Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes. In SIGCHI Conference on Human Factors in Computing Systems, pages 301-308, 2001.
[5] Alex Colburn, Michael F. Cohen, and Steven Drucker. The role of eye gaze in avatar mediated conversational interfaces. Technical Report MSR-TR-2000-81, Microsoft Research, 2000.
[6] William Steptoe, Robin Wolff, Alessio Murgia, Estefania Guimaraes, John Rae, Paul Sharkey, David Roberts, and Anthony Steed. Eye-tracking for avatar eye-gaze and interactional analysis in immersive collaborative virtual environments. In ACM Conference on Computer Supported Cooperative Work, pages 197-200, 2008.
[7] Maia Garau, Mel Slater, Vinoba Vinayagamoorthy, Andrea Brogni, Anthony Steed, and M. Angela Sasse. The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. In SIGCHI Conference on Human Factors in Computing Systems, pages 529-536, 2003.
[8] Gary Bente, Felix Eschenburg, and Nicole C. Krämer. Virtual gaze: A pilot study on the effects of computer simulated gaze in avatar-based conversations. Virtual Reality (LNCS), 4563:185-194, 2007.
[9] Arjun Nagendran, Remo Pillat, Adam Kavanaugh, Greg Welch, and Charles Hughes. AMITIES: Avatar-mediated interactive training and individualized experience system. In ACM Symposium on Virtual Reality Software and Technology, pages 143-152, 2013.
Segmentation-based Graph Representation for 3D
Animations
Guoliang Luo
gl.luo@yahoo.com
Jiangxi Normal University
Quanhua Tang
quanhuatang@163.com
Jiangxi Normal University
Yihan Liu
yi-han.liu@etu.unistra.fr
University of Strasbourg
Abstract
3D animation is becoming another popular type of multimedia data thanks to the development of animation technologies. 3D animation data, as a sequence of meshes each of which is a set of points, differs from 2D objects and 3D static models, and thus requires new signatures for data management. In this paper, we present a new segmentation-based method for representing and comparing 3D animations. The main idea is to group the deformed triangles and the rigid triangles on the model surface for the spatial segmentation. We then represent the segmentation as a weighted graph, where each node denotes a triangle group, either 'deformed' or 'rigid', and edges denote neighborhoods. Moreover, we annotate each node with a vector of geometrical attributes of the triangles within the group. Our experimental results show that the dissimilarities of the signatures reflect motion dissimilarities among 3D animations.

Keywords: 3D animation, segmentation, graph representation

1. Introduction
The recent and rapid advancement of techniques for modelling 3D animations has led to an abundance of 3D animation data, which makes data management techniques, such as animation data representation and shape retrieval, more and more necessary. Although such techniques have been intensively studied for 3D static models, they remain a new and challenging task for 3D animations. In the computer graphics research field, 3D animations can be classified into Mesh Sequences, where a 3D mesh deforms with a fixed number of vertices and fixed topology, and Dynamic Meshes, where a 3D model has a varying number of vertices and/or topology. In fact, a dynamic mesh can be converted into a mesh sequence by computing the vertex correspondence between neighbouring frames, which can be another challenging and computationally demanding task [1]. For the sake of simplicity, we work on mesh sequences.
This paper is organized as follows. In Section 2, we first review previous shape retrieval techniques in both the Computer Graphics and Computer Vision domains. Then we introduce a segmentation method for 3D animations and their graph representation in Section 3. To validate the new representation, we apply it to computing animation similarities in Section 3.3. After showing the experimental results in Section 4, we conclude in Section 5.

2. Related Works
Shape retrieval is one of the most popular applications of shape signatures [2]. In this section, we briefly review the state of the art of existing work on 3D shape retrieval.
An abundance of research has been devoted to 3D shape retrieval during the last decade, which can be classified into geometry-based methods [3,4,5,6] and graph-based methods [7]. Geometry-based methods compare 3D models based on static geometrical properties, or shape descriptors. For example, the shape histogram divides the space into blocks and counts the number of surface points within each block, which corresponds to a histogram bin [3]; the spin image maps 3D vertices on the surface into 2D space via a cylindrical coordinate system [4]. Other shape descriptors include spherical harmonics [5] and statistical histograms of geometrical attributes [6], etc. This objective can also be achieved by comparing
multiple 2D views of each 3D model. Such methods hold the advantage of reducing the problem space from 3D to 2D, for which many existing approaches can be directly applied [8].
In comparison, graph-based methods include not only geometry information, but also the spatial connections among shape components, i.e., topological information. Most typically, the graphs are skeletons extracted from 3D objects [7]. For example, Sundar et al. match two shapes by using a greedy algorithm to compute the maximum common structures between the skeletal graphs of the two shapes [7]. More generally, with the existing techniques for extracting a skeleton from a 3D model, both of the graph distance computing methods, i.e., Maximum Common Subgraph [9] and Graph Edit Distance [10], can be used to compare the skeletons.
However, different from 3D shapes, 3D animations also carry dynamic behaviours, i.e., mesh deformations caused by actions, for which reason shape retrieval methods may not be applicable to animations.
3. Similarities of 3D Animations
To extract the dynamic behaviors of a 3D animation, we first present a spatial segmentation method to divide a mesh into 'deformed' and 'rigid' parts; the segmentation result is further represented with a weighted graph, i.e., each graph node is represented with a set of dynamic features. This representation allows us to compute graph similarity as animation similarity.

3.1 Segmentation of 3D animations
Note that one can choose among several descriptors to represent the dynamic behaviors of 3D models [1,3,4,5,6,7,11]. Without loss of generality, in our method we use the strain $s_i^p$ ($i = 1,\dots,M$, $p = 1,\dots,N$) to measure the deformation of triangle $t_i$ in frame $f_p$, where M and N are the number of triangles within each frame and the number of frames, respectively. In [11], Luo et al. propose a normalized strain with values ranging in [0, 1], where a larger value indicates higher deformation, and vice versa.
Having the strain values of each triangle within each frame, we then perform the segmentation following Algorithm 1.

Algorithm 1: Spatial segmentation of a 3D animation.
Step 1: For each triangle, compute the maximum strain over all frames, i.e., $s_i = \max_{p=1,\dots,N} s_i^p$. See Figure 1(a).
Step 2: Starting from a random rigid triangle whose strain is less than a threshold $\varepsilon$, apply region growing to group the neighboring triangles, until including any further neighboring triangle makes the average strain of the group exceed $\varepsilon$. Each such group is a 'rigid' segment ($\varepsilon = 0.5$ in our experiments).
Step 3: The above step runs iteratively until the strains of all ungrouped triangles are larger than $\varepsilon$.
Step 4: For the remaining ungrouped triangles, we continue to merge the spatially reachable deformed triangles. Each of the obtained groups is a 'deformed' segment.
Step 5: The above process stops when no ungrouped triangles remain.

Finally, after removing small groups by merging them with the most similar neighboring segment, a deforming mesh is divided into 'deformed' and 'rigid' segments. See Figure 1(b) and the supplemental video.
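A Python sketch of Algorithm 1, assuming a precomputed per-triangle maximum strain and an edge-adjacency list; unlike the paper, seeds are taken in ascending strain order rather than at random, purely for determinism:

```python
# Illustrative region-growing segmentation over triangle adjacency (not the
# authors' Matlab code). adjacency[i] lists triangles sharing an edge with i.
import numpy as np
from collections import deque

def segment_rigid_deformed(max_strain, adjacency, eps=0.5):
    max_strain = np.asarray(max_strain, float)
    labels = -np.ones(len(max_strain), dtype=int)     # -1 = ungrouped
    kinds = []

    def grow(seed, rigid):
        seg_id = len(kinds)
        group, total = [seed], max_strain[seed]
        labels[seed] = seg_id
        queue = deque([seed])
        while queue:
            for nb in adjacency[queue.popleft()]:
                if labels[nb] != -1:
                    continue
                if rigid and (total + max_strain[nb]) / (len(group) + 1) > eps:
                    continue                          # adding nb would exceed eps on average
                if not rigid and max_strain[nb] < eps:
                    continue                          # deformed segments keep strained triangles
                labels[nb] = seg_id
                group.append(nb)
                total += max_strain[nb]
                queue.append(nb)
        kinds.append('rigid' if rigid else 'deformed')

    # Steps 2-3: grow rigid segments from low-strain seeds.
    for i in np.argsort(max_strain):
        if labels[i] == -1 and max_strain[i] < eps:
            grow(i, rigid=True)
    # Steps 4-5: group the remaining (deformed) triangles by spatial reachability.
    for i in range(len(max_strain)):
        if labels[i] == -1:
            grow(i, rigid=False)
    return labels, kinds
```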
3.2 Graph Representation
We describe in this section the representation of the above segmentation results as weighted graphs. First, we represent the spatial segmentation as a graph, where each node corresponds to a spatial segment and the edges denote the neighborhoods among spatial segments. See Figure 1(c). Moreover, similar to Luo et al.'s work [12], we also represent each node with a vector of attributes. For the sake of simplicity, we use the percentage of surface area and the statistical distribution of strains as node attributes. In our experiments, the distribution of strains is a histogram with 6 bins. That is, the i-th node can be written as $n_i = (n_i^1, \dots, n_i^7)$, where the first 6 items are the histogram bins and the last is the percentage of surface area. Therefore, we can compute the dissimilarity $d_{ij}$ between two nodes as the Euclidean distance between the two vectors:
$$d_{ij} = \sqrt{\sum_{k=1}^{7} \left(n_i^k - n_j^k\right)^2},$$
where $n_i$ and $n_j$ are two graph nodes.
Note that the different attributes are equally weighted. The weighting strategy can be optimized if any of the attributes is found to be highly representative of dynamic behavior.
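As an illustration, the node attribute vector and the node distance $d_{ij}$ could be computed as follows (per-segment strain values and areas are hypothetical inputs):

```python
# Sketch of the 7-D node attributes (6-bin strain histogram plus area share)
# and their Euclidean distance.
import numpy as np

def node_attributes(segment_strains, segment_area, total_area, bins=6):
    hist, _ = np.histogram(segment_strains, bins=bins, range=(0.0, 1.0))
    hist = hist / max(hist.sum(), 1)                   # normalized strain distribution
    return np.append(hist, segment_area / total_area)  # (n_i^1, ..., n_i^7)

def node_distance(n_i, n_j):
    return float(np.linalg.norm(np.asarray(n_i) - np.asarray(n_j)))
```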
Figure 1: (a) Strains in sampled frames, with red/blue denoting high/low deformation; (b) segmentation in left and right views; (c) graph representation with 'deformed' and 'rigid' nodes; (a,b,c) are produced by our method, while (d) shows the segmentation result of Wuhrer et al.'s method [14].
Figure 1(a,b,c) shows the segmentation and the representation of a galloping 'Horse'. As can be seen, although we have not refined the segment boundaries, the graph representation clearly shows the dynamic behaviors of the different parts, i.e., which segment is deformed and which is not. More segmentation results for different animations are shown in the supplemental video.
3.3 Similarity of 3D Animations
Now that we have the graph representations of the segmentations of two animations, we compare graph similarity as the motion similarity of the animations. The graph similarity reflects motion similarity because, as described in Section 3.2, each graph node contains the dynamic features of the corresponding spatial segment.
In this work, we choose the Graph Edit Distance [2] to compare graphs, which calculates the cost of the operations (or edit path) needed to transform one graph into the other; the operations can be the addition/deletion of nodes/edges. Neuhaus et al. [13] have proposed an efficient approach based on dynamic programming that finds an edit path with minimum cost. Their method requires as input the topology of the two graphs and the node distances between them, and outputs the minimum cost of an optimized edit path.
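For illustration, an off-the-shelf graph edit distance (NetworkX's generic implementation, used here as a stand-in for the Neuhaus et al. [13] algorithm) can compare two segmentation graphs whose nodes carry the 7-dimensional attribute vectors; this is suitable for small graphs such as those produced by the segmentation above. The attribute key 'vec' is an assumption.

```python
# Hedged sketch: graph edit distance between two segmentation graphs, with the
# node substitution cost set to the Euclidean distance between attribute vectors.
import networkx as nx
import numpy as np

def build_graph(node_vectors, edges):
    """node_vectors: {node_id: 7-D attribute vector}; edges: iterable of (i, j)."""
    G = nx.Graph()
    for nid, vec in node_vectors.items():
        G.add_node(nid, vec=np.asarray(vec, float))
    G.add_edges_from(edges)
    return G

def animation_distance(G1, G2):
    subst = lambda a, b: float(np.linalg.norm(a['vec'] - b['vec']))
    unit = lambda a: 1.0                               # deletion / insertion cost
    return nx.graph_edit_distance(G1, G2,
                                  node_subst_cost=subst,
                                  node_del_cost=unit, node_ins_cost=unit)
```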
Table 1: Animations used in our experiments, the timings, and their dissimilarities to 'Gallop-Camel'.

Animation | #Triangles | #Frames | Timing (s) | Distance to Gallop-Camel / Ranking
Gallop-Camel | 43778 | 48 | 2.53 | 0 / 0
Gallop-Horse | 16858 | 48 | 0.8 | 2.24 / 1
Jumping1 | 29999 | 55 | 1.2 | 2.48 / 3
Jumping2 | 29999 | 55 | 1.1 | 2.83 / 5
Walking | 29999 | 55 | 1.5 | 2.52 / 4
Jogging | 29999 | 55 | 1.3 | 2.45 / 2
4. Experiments and Discussions
In this section, we present experiments with a set of 3D animations using the presented motion similarity measurement method. Readers may refer to our supplementary video demonstrations for more results.
We have implemented the proposed method in Matlab and run it on an Intel Pentium(R) 2.7 GHz machine with 2 GB of memory. The 3D animation data used, together with the segmentation timings, are shown in Table 1, which demonstrates the efficiency of our method: all the tested data can be segmented within 2 seconds.
In [14], Wuhrer et al. propose a segmentation method for animated meshes that locates the segmentation boundaries in highly deformed regions, where the deformation of each edge is measured by the maximum change of its dihedral angle over time. Compared to the segmentation of the galloping 'Horse' obtained with Wuhrer et al.'s method, see Figure 1(d), our result in Figure 1(b) contains 'deformed' segments, which inherently carry more motion information.
Similarity Measurement. In order to validate the effectiveness of the new representation, we proceed to compute motion similarities by measuring graph dissimilarity. Taking 'Gallop-Camel' as the reference, we have obtained its distance to the other five animations, 'Gallop-Horse', 'Jumping1', 'Jumping2', 'Walking', and 'Jogging', shown in the last column of Table 1.
As can be seen, 'Gallop-Camel' has the smallest distance to 'Gallop-Horse'. Moreover, compared to the jumping motions, 'Gallop-Camel' has a smaller distance to 'Jogging', as both are rapid motions performed by four-limbed bodies.
5. Conclusions
We have presented a weighted graph representation of 3D animations based on segmentation, with each node denoting a spatial segment and being weighted with a vector of geometrical properties. Additionally, we have shown the effectiveness of the representation for motion similarity measurement. This idea is driven by the fact that our weighted graph representation inherently carries motion information within the vector space of each node.
Moreover, although our experiments show satisfactory results, one may extend the potential of our graph comparison by incorporating different attributes for each graph node, which helps to avoid ambiguous node matching between two graphs.

Acknowledgements
This work is funded by Jiangxi Normal University. We specially thank Professor Hao Wang and Associate Professor Gang Lei for providing experimental devices.

References
[1] Van Kaick, O., Zhang, H., Hamarneh, G., & Cohen-Or, D. (2011). A survey on shape correspondence. Computer Graphics Forum, 30(6), 1681-1707.
[2] Tangelder, J. W., & Veltkamp, R. C. (2008). A survey of content based 3D shape retrieval methods. Multimedia Tools and Applications, 39(3), 441-471.
[3] Ankerst, M., Kastenmüller, G., Kriegel, H. P., & Seidl, T. (1999). 3D shape histograms for similarity search and classification in spatial databases. In Advances in Spatial Databases (pp. 207-226). Springer Berlin Heidelberg.
[4] Johnson, A. E. (1997). Spin-Images: A Representation for 3-D Surface Matching. Doctoral dissertation, Carnegie Mellon University.
[5] Kazhdan, M., Funkhouser, T., & Rusinkiewicz, S. (2003). Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing (Vol. 6).
[6] Sidi, O., van Kaick, O., Kleiman, Y., Zhang, H., & Cohen-Or, D. (2011). Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering. ACM Transactions on Graphics, 30(6), 126.
[7] Sundar, H., Silver, D., Gagvani, N., & Dickinson, S. (2003). Skeleton based shape matching and retrieval. In Shape Modeling International 2003 (pp. 130-139). IEEE.
[8] Riesenhuber, M., & Poggio, T. (2000). Computational models of object recognition in cortex: A review (AI Memo 1695). Massachusetts Institute of Technology, Artificial Intelligence Lab.
[9] Bunke, H., & Shearer, K. (1998). A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3), 255-259.
[10] Gao, X., Xiao, B., Tao, D., & Li, X. (2010). A survey of graph edit distance. Pattern Analysis and Applications, 13(1), 113-129.
[11] Luo, G., Cordier, F., & Seo, H. (2014). Similarity of deforming meshes based on spatio-temporal segmentation. In Eurographics 2014 Workshop on 3D Object Retrieval. The Eurographics Association.
[12] Luo, P., Wu, Z., Xia, C., Feng, L., & Ma, T. (2013). Co-segmentation of 3D shapes via multiview spectral clustering. The Visual Computer, 29(6-8), 587-597.
[13] Neuhaus, M., Riesen, K., & Bunke, H. (2006). Fast suboptimal algorithms for the computation of graph edit distance. In Structural, Syntactic, and Statistical Pattern Recognition (pp. 163-172). Springer Berlin Heidelberg.
[14] Wuhrer, S., & Brunton, A. (2010). Segmenting animated objects into near-rigid components. The Visual Computer, 26(2), 147-155.
Motion Puppetry Taking Skeletal Similarity into Account
Norihide Kaneko1, Reo Takahashi2, Issei Fujishiro3
Keio University, Japan
kurikuri-bouzu@jcom.home.ne.jp1, {r.takahashi2, fuji3}@fj.ics.keio.jp
been used in the motion conversion. The
method was proposed by Gleicher in 1998 [1],
and has become indispensable when reusing
motion capture data. The original retargeting
was intended as a post processing of motion
capture. On the other hand, with the advent of
real-time capturing devices such as Microsoft
Kinect™, motion capture has begun to be used
for manipulating an avatar instantly. An online motion retargeting a variety of avatars is
often referred to as puppetry. Several studies
have proposed actual puppetry methods,
though these suffer from problems that lack
interactivity or require a user to input much
metadata in addition to training animations.
Thus, we attempted to address these problems
by incorporating into the previous works, (1)
body parts classification based on the
symmetry of 3DCG models, and (2) skeletal
similarity.
Abstract
Motion retargeting is a well-known method for
generating character animations in computer
graphics, where the original motion of an actor
can be transferred to an avatar with a different
skeletal structure and/or limbs’ length. A
specific type of retargeting for manipulating
various avatars interactively is referred to as
puppetry. However, previous learning-based
works of puppetry cannot be said to be fully
interactive, or require a large amount of
metadata about the avatar's skeleton as well as
training data of motions. Thus, we attempted to
integrate existing methods by taking into
account skeletal similarity between an actor
and his/her avatar, to come up with an
interactive and intuitive motion retargeting
method that only requires a relatively small
amount of training animations together with
simple input metadata. Moreover, by
classifying avatar’s body parts in a procedural
manner, a more flexible motion puppetry was
realized, where the system user is allowed to
specify desirable part-to-part correspondences.
2. Related work
Considering the target avatar's skeleton, motion retargeting is generally divided into two types: retargeting to human-like avatars and retargeting to non-human-like ones. The original work by Gleicher [1] is known as the first attempt to retarget captured motion to an avatar with different limb lengths, by applying spatiotemporal constraints to the captured motion data, although the avatar must possess a human skeleton. Recent studies have achieved more flexible retargeting to non-human-like avatars by learning the correspondence between a given actor's motion data and the avatar's. The benefit of
Keywords: Computer animation, character
animation, motion retargeting, joint matching.
1. Introduction
Many 3D character animations in computer
graphics have been created using motion
capture data. In order to fill the gap between an actor's skeleton and an avatar's, a procedure called motion retargeting has often
this type of retargeting method lies in its wider range of applicability in terms of the target avatar's topology.
Yamane et al. [2] used the Shared Gaussian
Process Latent Variable Model (SGPLVM)
technique to convert captured data to the
motions of various avatars, including Luxo Jr.,
Squirrel, and Penguin. They combined the
SGPLVM and the Inverse Dynamics method,
to realize high-quality control of avatar
motions. However, due to its expensive
algorithms, it works only in an off-line
environment. Seol et al. [3] focused most of their attention on puppetry; their interactive avatar motion control uses a linear combination of multiple regression functions generated from active parameters of an avatar and those of an actor. However, the unclear correspondence between their joints requires a large amount of metadata. Rhodin et al. [4]
showed various retargeting not only for body
motions but also for facial expressions through
the use of Canonical Correlation Analysis
(CCA), though their method cannot learn
multiple retargeting animations.
Figure 1: Overview of the system
for correct joint matching and ease of
retargeting indication. For example, the actor is
allowed to transfer his or her arm motion to the
avatar’s leg. Second, the joint similarities are
calculated for each of the body parts. These
values will be used in the next parts-based
regression process.
To classify the body parts, we took into
account the bilateral symmetries of the 3DCG
avatar’s initial poses. In Figure 2, we start a
search from the root joint and examine the
angle consistency of the children joints
𝑗1 , 𝑗2 , and 𝑗3 in the joint branch. Suppose that
we choose 𝑗1 . Another joint 𝑗2 has angle
between 𝑗1 and 𝑗2 (𝜃𝑗1,𝑗2), and the other joint 𝑗3
has angle between 𝑗1 and 𝑗3 (𝜃𝑗1,𝑗3 ). If these
angles are identical, 𝑗1 is classified as spine.
However, when choosing 𝑗2 , 𝜃𝑗2,𝑗1 and 𝜃𝑗2,𝑗3
have different values, and thus 𝑗2 is classified
as leg. As is the case with 𝑗2 , 𝑗3 is also
identified as leg. As such, spine and leg joints
can be classified. If the end joint of a leg chain has a height value above a given threshold, that chain is assumed to be an arm. A body part located at a spine end is considered to be a tail. As for facial rigs, their parts cannot be estimated automatically because they may contain facial-expression joints, so the system user has to specify them manually.
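For illustration, the following Python sketch (our own reconstruction; the joint representation, the angle tolerance, and the height threshold are assumptions, not values from the paper) applies the angle-consistency rule to the children of a branching joint and promotes leg chains whose end joint lies above a height threshold to arms.

```python
import numpy as np

def angle(u, v):
    """Angle (radians) between two joint-offset vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def classify_branch(children, tol=1e-2):
    """children: dict name -> offset vector of the three child joints j1, j2, j3.
    A child whose two angles to its siblings are (nearly) identical is 'spine';
    the others are 'leg', following the bilateral-symmetry argument in the text."""
    labels = {}
    names = list(children)
    for name in names:
        others = [n for n in names if n != name]
        a1 = angle(children[name], children[others[0]])
        a2 = angle(children[name], children[others[1]])
        labels[name] = "spine" if abs(a1 - a2) < tol else "leg"
    return labels

def promote_to_arm(labels, end_heights, height_threshold=0.5):
    """If the end joint of a 'leg' chain sits above a height threshold,
    reinterpret that chain as an 'arm' (threshold value is an assumption)."""
    return {n: ("arm" if lab == "leg" and end_heights.get(n, 0.0) > height_threshold else lab)
            for n, lab in labels.items()}

# Example: three children of the root in a T-posed biped-like rig.
children = {"j1": np.array([0.0, 1.0, 0.0]),    # points up the spine
            "j2": np.array([0.3, -1.0, 0.0]),   # left leg
            "j3": np.array([-0.3, -1.0, 0.0])}  # right leg
labels = classify_branch(children)
print(promote_to_arm(labels, end_heights={"j2": 0.1, "j3": 0.1}))
```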
3. System Overview
Our approach integrates joint matching with
the learning-based method proposed by Seol et
al. [3], so that it does not require metadata
except for training animations to realize
intuitive motion retargeting. Our puppetry
system consists of the learning part, which
learns three kinds of motion-related data,
followed by the real-time mapping part, as
shown in Figure 1. The values for skeletal
similarity calculated in joint matching are then
referred by the subsequent parts-based
regression. In the parts-based regression, the
input actor’s motions are reflected directly to
retargeting result. Furthermore, avatar’s
intrinsic animations are reflected to the result
in the motion classification process. Each of
these processes will be explained in the
following subsections in more detail.
Joint matching is performed between the
actor’s joint 𝑗𝑢 and avatar’s joint 𝑗𝑐 for the
same body parts class, as in Jacobson et al. [5].
The similarity value Similarity𝑗𝑢,𝑗𝑐 between 𝑗𝑢
and 𝑗𝑐 is defined as follows.
3.1 Joint Matching
The joint matching consists of two steps. First,
both actor and avatar models are divided into
five parts, i.e., head, tail, arm, leg, and spine
in the parameter values; K_d denotes the variance of the parameter; K_r is the regression error value, defined as p_c − ξ(p_u), computed on the data (1); in contrast, K_c is the error value computed on the data (2).
Figure 2: Body parts classification
Similarity_{j_u, j_c} = ω_bet K_bet(j_u, j_c) + ω_ori K_ori(j_u, j_c) + ω_per K_per(j_u, j_c),

where K_bet denotes the difference of betweenness centralities, which represents how far the joint is located from the root of the skeleton network; K_ori is the inner product of the joint directions; and K_per is the relative length of the joint expressed as a percentage. Some body parts may be missing from the actor's skeleton (e.g., a tail), but it is assumed herein that all body parts of the avatar have corresponding parts in the actor's skeleton, so the most plausible actor body parts are decided in terms of the averaged similarity value.

The function ξ is defined as follows:

ξ(p_u) = (A − C) / (1 + (p_u / B)²) + C,

where A, B, and C are fitting parameters derived from the following least-squares problem over the maximum, minimum, and initial parameter values:

min_{A,B,C} Σ_{i ∈ {max, init, min}} ‖p_{c,i} − ξ(p_{u,i})‖².
The weight coefficients 𝜔𝑛 are given as the
solutions of the following equation:
min_ω Σ_{m=1}^{M} | p_{c,m} − Σ_{n=1}^{N} ω_n ξ_{n,m}(p_{u,n,m}) |²,   s.t.  Σ_{n=1}^{N} ω_n = 1,  ω > 0,

where M denotes the number of frames in all the animations (2). This problem can be solved with a QP solver.

3.2 Parts-based Regression
In the parts-based regression, retargeting is
carried out for each of the body parts with (1)
actor’s and avatar’s motion data and (2) avatarintrinsic animation data. The data (1)
comprises all the poses of the actor and avatar,
whereas the data (2) has the avatar’s intrinsic
animations that any actor cannot mimic easily
(e.g. quadruped walk, gallop), together with its
approximate animations of the actor (e.g.
bipedal walk, run).
3.3 Motion Classification
Motion classification is performed by SVM
with the parameters for the specified parts.
These parameters consist of 9 DoF per joint: the actor's joint position, speed, and acceleration. At runtime, the system estimates the motion class based on the captured actor's parameters,
and composes the avatar’s intrinsic animation
of the same class together with the result of
parts-based regression.
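A minimal sketch of this classification step, assuming scikit-learn as the SVM implementation; the feature layout (9 values per joint) follows the description above, while the number of joints, the class labels, and the synthetic training data are purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# X: one row per frame, 9 values per tracked joint
# (3D position, 3D speed, 3D acceleration), flattened.
# y: motion-class labels taken from the training animations.
rng = np.random.default_rng(0)
n_joints = 15
X_train = rng.normal(size=(200, 9 * n_joints))
y_train = rng.choice(["walk", "gallop_like"], size=200)

clf = SVC(kernel="rbf")      # SVM classifier over the motion parameters
clf.fit(X_train, y_train)

# At runtime, the captured actor parameters of the current frame are
# classified, and the avatar-intrinsic animation of that class is composed
# with the result of the parts-based regression.
frame_features = rng.normal(size=(1, 9 * n_joints))
print(clf.predict(frame_features)[0])
```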
After active parameters 𝑝𝑐 are extracted from
the avatar’s animation data (1), the actor’s N
parameters 𝑝𝑢 are found as the ones with the
smallest parameter errors 𝐸𝑟𝑟𝑜𝑟𝑝𝑢 ,𝑝𝑐 in the
body parts obtained through the joint matching
process. Using these parameters, we generate the regression function p_c = Σ_{n=1}^{N} ω_n ξ(p_{u,n}). By substituting the actor's parameters into this function, the corresponding avatar animation can be estimated at runtime. We followed the definition of the parameters given in Seol et al. [3].
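The regression can be sketched as follows. The three-point least-squares fit of A, B, C and the simplex-constrained weight problem are written here with SciPy's general-purpose SLSQP optimizer as a stand-in for the QP solver mentioned in the text; the selection of the actor's N parameters with smallest error is not reproduced, and all names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def xi(p_u, A, B, C):
    """xi(p_u) = (A - C) / (1 + (p_u / B)^2) + C."""
    return (A - C) / (1.0 + (p_u / B) ** 2) + C

def fit_xi(pu_pts, pc_pts):
    """Fit A, B, C to the (max, init, min) parameter pairs by least squares."""
    def residual(abc):
        A, B, C = abc
        return np.sum((pc_pts - xi(pu_pts, A, B, C)) ** 2)
    return minimize(residual, x0=np.array([1.0, 1.0, 0.0])).x

def fit_weights(pc_frames, xi_frames):
    """Solve min_w sum_m |pc_m - sum_n w_n xi_{n,m}|^2  s.t. sum(w) = 1, w > 0.
    xi_frames has shape (M, N): xi_{n,m}(p_{u,n,m}) per frame m and parameter n."""
    M, N = xi_frames.shape
    objective = lambda w: np.sum((pc_frames - xi_frames @ w) ** 2)
    cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(1e-6, 1.0)] * N
    res = minimize(objective, x0=np.full(N, 1.0 / N), bounds=bounds,
                   constraints=cons, method="SLSQP")
    return res.x

def predict(p_u, abc_per_param, weights):
    """Runtime evaluation of p_c = sum_n w_n * xi_n(p_{u,n})."""
    return sum(w * xi(p, *abc) for w, p, abc in zip(weights, p_u, abc_per_param))
```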
4. Results
Owing to the body parts classification in joint
matching, our system enables the user to map
arbitrary actor’s body parts to specific avatar’s
ones readily.
A retargeting result is shown in Figure 3,
where the correspondence between the actor’s
arms and the sheep’s head is obtained. The
sheep's basic running motion is modulated with detailed expressions according to the actor's playacting (e.g., arm swinging and body tilting). The result of retargeting to a different avatar is shown in Figure 6.
Error_{p_u, p_c} is defined as follows:

Error_{p_u, p_c} = ω_s K_s(p_u, p_c) + ω_v K_v(p_u, p_c) + ω_d K_d(p_u, p_c) + ω_r K_r(p_u, p_c) + ω_c K_c(p_u, p_c),
where the new term K_s, for the similarity value estimated in joint matching, has been added to the original expression in [3]. K_v denotes the difference of the directional vectors of the 3D avatar's mesh geometry induced by the change
animation data to learn. It also remains a challenging task to extend it so that it can handle facial models with non-divisible features, for example by adopting additional principles such as CCA [4].
Acknowledgements
This work has been partially supported by
MEXT KAKENHI Grant Number 25120014
and JSPS KAKENHI Grant Number 26240015.
References
Figure 3: Mapped character’s motion
[1] M. Gleicher. Retargeting Motion to New
Characters. In Proc. SIGGRAPH 98, pages 33–42,
1998.
[2] K. Yamane, Y. Ariki, and J. Hodgins. Animating
Non-Humanoid Characters with Human Motion
Data. In Proc. SCA 2010, pages 169–178, 2010.
[3] Y. Seol, C. Sullivan, and J. Lee. Creature features:
Online Motion Puppetry for Non-Human
Characters. In Proc. SCA 2013, pages 213–221,
2013.
[4] H. Rhodin, J. Tompkin, K. I. Kim, K. Varanasi, H.
Seidel, and C. Theobalt. Interactive Motion
Mapping for Real-time Character Control.
Computer Graphics Forum, 33:273–282, 2014.
[5] A. Jacobson, D. Panozzo, O. Glauser, C. Pradalier, O. Hilliges, and O. Sorkine-Hornung. Tangible and Modular Input Device for Character Articulation. ACM Transactions on Graphics, 33:82–93, 2014.
5. Discussions
The advantages of our method can be
summarized in the following two points.
Metadata Reduction – In the previous method [3], the user must provide, as metadata, all correspondences between his/her own joints and the avatar's. In our method, however, the automatic matching procedure drastically reduces the amount of metadata. Figure 4 shows the results of the parts-based regression executed without metadata: with our method, the avatar's legs follow the actor's leg motion properly, whereas the previous method does not work well.
Retargeting Indication – Grouping body parts
leads to flexible linkage of body parts between
the actor and avatar, as shown in Figure 5.
6. Future Work
The present motion puppetry method could be
further improved so as to require less
Figure 5: Switching target body parts by simple
indication. Mapping actor’s arms to legs (a), to
ears (b).
Figure 4: Qualitative comparison. Our method
gives appropriate motions to the avatar (a),
whereas the existing method [3] moves the body
parts in a wrong way without metadata (b).
Figure 6: Retargeting to a different avatar.
Real-time marker-less implicit behavior tracking for
user profiling in a TV context
F. Rocca, P.-H. De Deken, F. Grisard, M. Mancas & B. Gosselin
Numediart Institute - University of Mons
Mons, Belgium
{francois.rocca, pierre-henri.dedeken, fabien.grisard,
matei.mancas, bernard.gosselin}@umons.ac.be
Abstract
is known on the different media segments, it is
possible to update the user profiling.
In Section 2, we present techniques for gaze estimation based on head pose and describe the marker-less method used in this study. Section 3 describes the descriptor used to define the user interest. In Section 4, we present the experimental setup used to estimate the interest and send these data to the profiling platform. Finally, we conclude in Section 5.
In this paper, we present a marker-less motion
capture system for user analysis and profiling.
In this system, we perform automatic face tracking and head-direction extraction. The aim is to identify moments of attentive focus in a non-invasive way and to dynamically improve the user profile by detecting which media have drawn the user's attention. Our method is based on face detection and head-pose estimation in 3D using a consumer depth camera. This
study is realized in the scenario of TV
watching with second screen interaction
(tablet, smartphone), a behaviour that has
become common for spectators. Finally, we
show how the analysed data could be used to
establish and update the user profile.
2. State of the art
The orientation of the head is less difficult to
estimate than the direction of the eyes. The
head direction and the eye-gaze direction are
strongly correlated. Physiological studies have
shown that the prediction of gaze is a
combination of the direction of the eyes and
the direction of the head [1]. In this work, the
distance from the sensor can be up to a few
meters, and at these distances, the eye tracking
becomes very difficult to achieve. An initial
study showed the link between eye-gaze
direction and the head direction. In this study, the correlation was assessed qualitatively while a user focused his gaze on a map (Figure 1). The results were obtained with the eye-tracking system FaceLAB [2] and show that the average error is 3 to 4 cm on a plane located 1 meter away, which means that the angular difference is very small. The direction of the head is intrinsically linked to the direction of the eyes. This is especially the case when the head stays within a rotation range that is comfortable for the
Keywords: head pose estimation, viewer
interest, face tracking, attention, user profiling
1. Introduction
In this work we will focus on the analysis of a
user sitting in front of his television. It will
give us information on the spectator's behavior. What draws the user's interest? The analysis of the interest that the user pays to his environment is significant for user profiling. This information can be known or
estimated by different methods. The best way
is to get a rapid estimation of the interest based
on gaze direction and also the duration of the
gaze fixation in this direction. Once the interest
head of the actor and they are tracked through
multiple cameras. The markers are often
colored dots or infrared reflective markers and
the cameras depend on the markers type.
Accurate tracking requires multiple cameras
and specific software to compute head pose
estimation, but these systems are very expensive and complex, and they require precise positioning of the markers and calibration (OptiTrack [4], Qualisys [5]).
Marker-less tracking is another approach for
face motion capture and a wide range of
methods exists. Some marker-less equipment
uses infrared cameras to compute tracking of
characteristic points. For example, FaceLAB
gives the head orientation and the position of
lips, eyes and eyebrows [2]. But there are also
algorithms using only a consumer webcam. We
can cite Facetracker using PNP [6] and
FaceAPI [2]. Marker-less systems use classical
cameras or infrared cameras to compute
tracking of characteristic points. Based on
consumer infrared camera, we can cite the
Microsoft KinectV1 SDK [7]. The KinectV1
SDK is free, easy to use and contains multiple
tools for user tracking such as face tracking
and head pose estimation. These tools combine
2D and 3D information obtained with the
KinectV1 sensor. Based on consumer 3D sensors, there are also methods using random regression forests for head-pose estimation from depth images only [8].
In this work we choose to use the KinectV2
with the new version of the SDK [9]. The
KinectV2 is composed of a color camera (1080p) and a depth sensor (512x424 pixels). The technology behind the new sensor is infrared ToF (time of flight). This sensor
measures the time it takes for pulses of laser
light to travel from the laser projector to a
target surface, and then back to an image
sensor. Based on this measure, the sensor gives
a depth map. To achieve head pose, at least the
upper part of the user's KinectV2 skeleton has
to be tracked in order to identify the position of
the head. The position of the head is located
using the head pivot from the 3D skeleton only
on the depth map. The head pose estimation is
based on the face tracking and it is achieved on
the color images. Consequently, the face tracking depends on the lighting conditions, even though the KinectV2 remains stable in darker lighting conditions.
user. Therefore, the direction of the face gives a good indication of where the user is looking when it is not possible to clearly obtain the direction of the eyes.
Figure 1. The head direction and the eye-gaze are highly correlated (intersection of the head direction with the screen vs. intersection of the eye-gaze with the screen).
The gaze estimation can be achieved by
calculating the orientation of the head, and
these rotations have physiological limits and
specific names (Figure 2). For an average user,
the range of motion extending from the sagittal
flexing of the head to the extension (head
movement from the front to the rear) is about 60° to 70°. This movement is more commonly
called “Pitch”. Regarding the front lateral
flexion (movement from right to left when
looking ahead), it occurs around 40° in each
direction and is called “Roll”. The last
movement, a horizontal axial rotation (head
motion by looking horizontally from right to
left), is around 78° in each direction [3] and is
named “Yaw”. All the motions of head rotation
can be obtained by combining these angles.
In the animation industry, head pose estimation
and head movements are almost exclusively
captured with physical sensors and optical
analysis. Physical sensors such as gyroscopes,
accelerometers and magnetometers are placed
on the head to compute the head rotation.
Figure 2. The 3 different degrees of freedom:
pitch, roll and yaw [7].
Another way for head pose estimation is
marker-based optical motion capture. These
systems are able to capture the subtlety of the
motion because the markers are placed on the
3. Fixation duration and direction for interest measurements
Based on the gaze estimation, or in this case on
the head direction, it is possible to measure the
interest of a person to a specific element of his
environment by calculating the intersection
between the direction of the face and a 2D
plane (or a 3D volume). In this case, the TV
screen will be represented by a 2D plane and
another 2D plane will be used for the second
screen. The previous head-pose estimation will
give an indication on what the user is looking
at. In the context of visual attention to television, a study showed that there are four types of behavior depending on the fixation duration [10].
This classification is given in Table 1.
Figure 3. User watches the TV with a tablet in
his hand (second screen). The head is
about 2.5 meters from the TV.
4.2 User interaction and media
The system allows us to know when the user
watches the TV (main screen) or the tablet
(second screen) using the interest durations defined in Section 3. When the user does not watch the TV, the image is hidden but the media keeps running, so the user still hears the sound. The user can
use a keyboard to control the player and
navigate into the video enrichment displayed
next to the player. The video used for the tests
is a mashup of CNN Student News. It has been
enriched with links to related web pages that
are displayed next to the video.
Table 1. Attentive behavior over time.

Duration               Behavior
≤ 1.5 sec.             Monitoring
1.5 sec. to 5.5 sec.   Orienting
5.5 sec. to 15 sec.    Engaged
> 15 sec.              Stares
These four measures of attention correspond to being first attracted by something (“monitoring behavior”), then intrigued (“orienting behavior”); the more time passes, the more the user becomes interested (“engaged behavior”), and beyond 15 seconds the user is captivated (“staring behavior”). These measures were established for TV watching and are used here to describe the interaction with one or more screens.
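As an illustration of this measurement, the sketch below (our own simplification; the plane placement and the example vectors are assumptions) intersects the head-direction ray with a screen plane and maps the fixation duration onto the four behaviors of Table 1.

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Return the 3D intersection of the head-direction ray with a plane, or None."""
    denom = np.dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None                        # looking parallel to the plane
    t = np.dot(plane_point - origin, plane_normal) / denom
    return origin + t * direction if t > 0 else None

def attention_level(duration_s):
    """Map a fixation duration (seconds) to the behaviors of Table 1."""
    if duration_s <= 1.5:
        return "monitoring"
    if duration_s <= 5.5:
        return "orienting"
    if duration_s <= 15.0:
        return "engaged"
    return "stares"

# Example: head roughly 2.5 m in front of the TV plane (z = 0), looking at it.
head_pos = np.array([0.0, 1.0, 2.5])
head_dir = np.array([0.05, -0.1, -1.0]); head_dir /= np.linalg.norm(head_dir)
hit = ray_plane_intersection(head_pos, head_dir,
                             plane_point=np.array([0.0, 1.0, 0.0]),
                             plane_normal=np.array([0.0, 0.0, 1.0]))
print(hit, attention_level(7.2))   # -> intersection point, "engaged"
```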
4.3 Behavior analyses
When the user comes into the field of view of
the KinectV2, placed under the TV, his
skeleton is tracked and the head orientation is
estimated. The Tracker Computer performs the
process and determines what the user is
watching with an accuracy of a few
centimeters: Screen 1 (video player or list of enrichments), Screen 2, or elsewhere. This information is complemented by the attentive behavior over time and is sent to the Player Server (Figures 4 & 5). The television displays the web page from the player server containing the media player accompanied by enrichments related to the playing video segment [11]. The overall workflow is given in Figure 4.
4. Experimental setup
4.1 Placement
The purpose of the experiment is to get a
maximum of information on user implicit
behavior in front of the TV. Several users watch a TV (main screen) and at the same time need to focus on some of its content while playing a tablet game (second screen). The
sofa is installed 2 meters away from the TV
which is equipped with a KinectV2 (Figure 3).
Figure 4. Overall working flow.
Acknowledgements
4.4 User Profiling
The data coming from the tracking is related to each video segment through the player server (Figure 5). The User Profiling module receives measures of interest (or disinterest) for each video segment. A score of interest can then be calculated for each keyword from the profiling list. This score list makes it possible to establish the user profile and to know what the user is interested in.
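A minimal sketch of such a score update, under our own assumptions about the data layout (segments carrying keyword lists and an interest weight derived from the observed attention level); the weights and function names are illustrative and do not come from the profiling platform.

```python
from collections import defaultdict

# Interest weight per attention level (illustrative values, not from the paper).
LEVEL_WEIGHT = {"monitoring": 0.1, "orienting": 0.4, "engaged": 0.8, "stares": 1.0}

def update_profile(profile, segments):
    """profile: dict keyword -> accumulated interest score.
    segments: iterable of (keywords, attention_level) per watched video segment."""
    for keywords, level in segments:
        w = LEVEL_WEIGHT.get(level, 0.0)
        for kw in keywords:
            profile[kw] += w
    return profile

profile = defaultdict(float)
session = [(["politics", "economy"], "engaged"),
           (["sports"], "monitoring"),
           (["economy"], "stares")]
print(dict(update_profile(profile, session)))
```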
This work is supported by the LinkedTV Project funded by the European Commission through the 7th Framework Program (FP7-287911) and by the funds “Region Wallonne” WBHealth for the Project RobiGame.
References
[1] S. Langton, H. Honeyman, and E. Tessler,
“The influence of head contour and nose
angle on the perception of eye-gaze
direction,” Perception and Psychophysics,
vol. 66, no. 5, pp. 752–771, 2004
[2] Seeing Machines. FaceLAB5 & FaceAPI;
Face and eye tracking application, 2011.
[3] V. Ferrario, et al. “Active range of motion of the head and cervical spine: a three-dimensional investigation in healthy young adults,” J. Orthopaedic Research, vol. 20, no. 1, pages 122–129, 2002.
[4] OptiTrack, Optical motion tracking
solutions. www.optitrack.com/ accessed
on 23/03/2015
[5] Qualisys. Products and services based on
optical mocap. http://www.qualisys.com
accessed on 19/03/2015
[6] F. Rocca, et al. Head Pose Estimation by
Perspective-n-Point Solution Based on 2D
Markerless Face Tracking. Intelligent
Technologies for Interactive Entertainment:
6th International ICST Conference, 2014
[7] Microsoft Kinect Software Development Kit, www.microsoft.com/en-us/kinectforwindowsdev/Start.aspx, accessed on 20/02/2015
[8] G. Fanelli, J. Gall, and L. Van Gool. Real
time head pose estimation with random
regression forests. CVPR, 617-624, 2011.
[9] KinectV2 SDK, https://msdn.microsoft.
com/en-us/library/dn799271.aspx, accessed
on 18/03/2015
[10] R. Hawkins, S. Pingree, et al. What Produces Television Attention and Attention Style? Human Communication Research, 31(1), pages 162–187, 2005
[11] J. Kuchař and T. Kliegr. GAIN: web service for user tracking and preference learning - a smart TV use case. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys '13), ACM, New York, NY, USA, 467–468, 2013
Figure 5. User interest is sent from the User Tracker to the User Profiling through the Player Server.
At the end of each session we obtain an event timeline: the interest value for each screen, player controls, etc. This updates the score list in the user profile (Figure 6).
Figure 6. Event timeline for one session of 15
minutes
5. Conclusion
In this paper, we have described a marker-less
motion capture system for implicit behaviour
analysis in a TV context using a consumer
depth camera. This system allows us to
establish and update a user profile using user
interest based on head pose estimation and
using the duration of fixation separated into 4
levels of attention. The aim is to identify moments of attentive focus in a non-invasive way and to dynamically improve the user profile by detecting which parts of the media have drawn the user's attention.
Robust Space-Time Footsteps for Agent-Based
Steering
Glen Berseth
University of British Columbia
Mubbasir Kapadia
Rutgers University
Petros Faloutsos
York University
Abstract
body and thus achieve denser crowd packing [1].
While footstep-based steering is known to remove many artifacts from the problem of mapping human motion to the steering behaviour,
there are still a number of challenges researchers
face when adopting more complex agent models. In a standard footstep-based steering approach, an A* algorithm is used to find optimal paths from the agent’s current location to
the agent’s target location as a sequence of footsteps. Yet, how do we know the types of footsteps to use or what is a good stepping distance
range, or even how should we initially configure
an agent to ensure it can reach its target?
Recent agent-based steering methods abandon the standard particle abstraction of an
agent’s locomotion abilities and employ more
complex models from timed footsteps to
physics-based controllers. These models often
provide the action space of an optimal search
method that plans a sequence of steering actions
for each agent that minimize a performance criterion. The transition from particle-based models to more complex models is not straightforward and gives rise to a number of technical
challenges. For example, a disk geometry is constant, symmetric and convex, while a footstep model may be non-convex and dynamic. In
this paper, we identify general challenges associated with footstep-based steering approaches
and present a new space-time footstep planning
steering model that is robust to challenging scenario configurations.
Keywords: Crowd Simulation, Footsteps, Planning and Analysis
We focus on the issue of making a footstep-based steering algorithm resilient to the environment configuration. Specifically, we present a robust footstep-based steering algorithm that avoids invalid initial configurations and prunes undesirable and potentially unsound short-term goal states.
This is done in two steps. First, geometric
checks are used when adding an agent to a scenario, ensuring the agent can make an initial
step. Second, we add constrained random footsteps to handle cases where pre-defined step intervals can result in an inability to find a plan.
These, together with the properties of the A*
search method, construct a more robust steering
strategy.
1 Introduction
While traditionally sliding particle models have
been the standard for crowd simulation, there
has been recent interest in footstep-based steering. Footstep-based models do not suffer from
the footskate issue, where feet turn or slide while
in contact with the ground. They can easily incorporate dynamic representations of the agent’s
2 Related Work
ditional check to make sure the agent can make
a footstep from its initial configuration. Given
the forward direction of the agent, a rectangular
region can be traced out in front of the agent.
An example initial geometry check is illustrated
in Figure 1(b) that detects overlap with nearby
obstacles. These two properties together ensure
that the agent does not start intersecting any geometry and will be able to make an initial step.
Sliding particle methods [2, 3] model the agent
as a disk centred at the particle’s location. There
are a number of issues when driving bipedal
characters with only position information. Sliding disks can instantly change their forward direction, which is not natural for a biped. For
people, complex interactions occur at doorways, where the tendency is to step aside for oncoming traffic; two disk models, in contrast, may both try to push through the doorway at the same time, resulting in an unusual sliding-contact motion between the agents. Footstep-based models do not suffer from
these issues and have been used in dynamic environments [4].
Footstep-based planning is used by both the
computer animation community [1] and the
robotics community. This work uses a similar
method to both of these works but focuses on
creating an algorithm that can endure random
environment layouts and agent state configurations.
4 Robust Footstep Planning
In a footstep-based steering method, a plan is
computed between two locations that is free of
collisions. The actions in the plan can be understood as a sequence of footstep actions in space-time ⟨step_0, ..., step_n⟩.
The state of a footstep-based agent is defined
as
s_t = {(x, y), (ẋ, ẏ), (f_x, f_y), f_φ, I ∈ {L, R}},   (1)

where (x, y) and (ẋ, ẏ) are the position and velocity of the centre of mass. The current footstep is described by the location (f_x, f_y), orientation f_φ, and foot I ∈ {L, R}. Potential actions
are created, using an inverted pendulum model,
between states by considering an action with orientation φ , velocity and time duration. Each step
has a cost related to step length and ground reaction forces. The heuristic function is then a
combination of the expected cost of the step and
the number of steps left to reach the goal.
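A compact sketch of this state and cost structure (the field names, cost weights, and heuristic constants are our own illustrative choices, not the authors' implementation):

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class FootstepState:
    x: float; y: float       # centre-of-mass position
    vx: float; vy: float     # centre-of-mass velocity
    fx: float; fy: float     # current footstep location
    fphi: float              # footstep orientation
    foot: str                # 'L' or 'R'

def step_cost(prev: FootstepState, nxt: FootstepState,
              w_len=1.0, w_force=0.5):
    """Cost of one step: step length plus a stand-in for ground-reaction effort."""
    length = math.hypot(nxt.fx - prev.fx, nxt.fy - prev.fy)
    speed_change = math.hypot(nxt.vx - prev.vx, nxt.vy - prev.vy)
    return w_len * length + w_force * speed_change

def heuristic(state: FootstepState, goal, step_length=0.7):
    """Expected cost of one more step plus an estimate of the steps left to the goal."""
    dist = math.hypot(goal[0] - state.x, goal[1] - state.y)
    return step_length + dist / step_length
```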
3 Initial Agent Placement
Extracting a valid plan for a footstep-based steering agent is still a poorly understood problem; an infinite number of possible plans exists
steering agent is still a poorly understood problem; an infinite number of possible plans exists
that could lead the agent to its target. To understand the conditions necessary to ensure a plan
can be found we must analyze the problem inductively.
A scenario s is a collection of agents A and
obstacles O. In traditional crowd simulation the
geometry of an agent is a disk. In footstep-based
models the geometry for an agent is dynamic depending on the current state of the agent. This
dynamic geometry suffers from complex configurations that can result in an invalid state where
the agent cannot proceed without colliding with an obstacle.
To add agents to a scenario, particular locations could be hand-selected, but the most versatile method is to add agents randomly.
In order to add an agent to a scenario two properties must be satisfied. The first, which is true
for any type of crowd simulation algorithm, is
that the agent must not overlap any items in the
scenario. A footstep-based model needs an ad-
4.1 Improved Footstep Sampling
The A* planning algorithm in a footstep-based
steering model is used to compute safe navigation decisions during simulation. The successor
states that are generated using an A* model can
be ad-hoc, with different footstep angles and durations at fixed intervals [1]. However, there is
an infinite number of geometry combinations in
a scenario and it is simple to construct an example scenario where an ad-hoc method can not
find a valid step when many exist. To make the
planning system more robust, randomized footsteps are introduced. The randomized angle orientations (in radians) and step time lengths are
limited to be between 0.3 and 1.3. An example
is shown in Figure 1(c), where an agent starts
more realistic plan.
4.3 Additional Footstep Types
The last feature of the planning system is a new
footstep style. In lieu of common forward stepping at different angles, footsteps that simulate
in-place turning can be used. In-place turning is
done by allowing the agent to take steps where
the agent’s heels are close, with the feet being
nearly perpendicular or the next stance foot is
placed pointing inward, as shown in Figure 1(d).
5 Analysis and Results
A group of metrics similar to [5] is used to compare footstep-based algorithms. These metrics use the concept of a reference-agent; other agents added to the scenario make the scenario more challenging for the reference-agent.
Scenario specific metrics are defined and then
aggregate metrics over a sizable sampling of
10, 000 scenarios (S) are used.
For a single scenario s, completed is 1 when the reference-agent reaches its target location before the maximum simulation time expires¹, and 0 otherwise. The second metric, solved, is 1 when completed is 1 and the reference-agent reaches its target location without any collisions; otherwise solved = 0. These metrics are aggregated over S with completion = ∑_{s∈S} completed(s) and coverage = ∑_{s∈S} solved(s). An average of completed over all agents in a scenario is used as all-completed, which is equal to the percentage of agents in the simulation that have completed = 1. Similarly, all-solved is an average
for solved. To measure computational performance, the time spent simulating S is computed,
denoted as simTime.
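The aggregation of these metrics can be sketched as follows (the record layout is hypothetical; the completed/solved flags are assumed to be provided by the simulator):

```python
def aggregate(scenarios):
    """scenarios: list of dicts with keys
    'completed' (bool, reference-agent reached its target in time),
    'solved' (bool, completed and collision-free),
    'agent_completed' (list of bools, one per agent in the scenario)."""
    completion = sum(s["completed"] for s in scenarios)
    coverage = sum(s["solved"] for s in scenarios)
    all_completed = sum(
        sum(s["agent_completed"]) / len(s["agent_completed"]) for s in scenarios
    ) / len(scenarios)
    return {"completion": completion, "coverage": coverage,
            "all-completed": 100.0 * all_completed}

example = [{"completed": True, "solved": True, "agent_completed": [True, True, False]},
           {"completed": False, "solved": False, "agent_completed": [False, True]}]
print(aggregate(example))
```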
Three versions of footstep-based algorithms
are compared. The first version is a common
footstep-based method baseline [1]. The second version, baseline-with-randomization, is the
baseline method with randomized footstep actions. The final version of the algorithm, robust,
includes both randomized footstep actions and
geometric checks. Using a combination of metrics, comparisons are made as to the effective-
Figure 1: Corner cases that are avoided by the
robust algorithm. The left foot is blue,
the right green, the dashed box is the
geometry overlap check and the green
star is a generated waypoint. The navy
blue squares are obstacles.
in a corner and can escape after a feasible random step is generated. By adding randomized
step angles and step distances, the algorithm
achieves better theoretical properties for steering in any possible scenario configuration.
4.2 Finite Horizon Planning
When planning is used, the path found is guaranteed to be sound from beginning to end. However, it is common to mix long range global
planning with a more dynamic finite horizon
planning between waypoints. When using finite horizon planning, the final state of the composed plan can put the agent in a state where the agent cannot make a step. An example of an
agent getting stuck straddling an obstacle is illustrated in Figure 1(a). These invalid states can
be avoided by applying the same configuration
checks used when an agent is randomly placed
in a scenario, at the end of every short term plan.
Additional optimizations can be placed on
short term planning to improve efficiency and
fitness. The first of these is to never execute the
final action in a short term plan unless that action
is a final goal state. The geometry configuration
validation ensures there will be a possible footstep, but re-planning one step earlier results in a
¹ We give an agent more than enough time to navigate around the boundary of the scenario twice.
crease the computational performance of the algorithm, as undesired search areas are avoided.
Limitations A simple rectangular geometry is
used to validate initial configurations. It might
be possible to achieve 100% all-completed by
using a more complex geometry check. Other
metrics could be used to compare the algorithms, such as ground truth similarity.
Future Work Any-time planning algorithms
could be used to increase the acceptability of
footstep-based steering methods. It is possible
to further increase the coverage of the algorithm
using a method such as [6] to optimize the parameters of the steering algorithm.
ness of each footstep-based algorithm. The results of these comparisons can be seen in Figure 2.
Figure 2: A comparison of three footstep-based
algorithms, using five performance
metrics.
References
Notably, the largest increase in fitness comes
from adding randomness and in-place turning to
the algorithm. These two features together allow the algorithm to generate a wider variety of
possible footsteps and increase coverage from
37% to 81%. Adding geometry checks to the
algorithm increases the overall completion to a
perfect 100% from an original 46%. The robust
version of the algorithm is surprisingly capable.
After simulating ∼ 45, 100 agents, only ∼ 150
agents do not reach their target locations.
The robust algorithm has significant computational performance improvements over baseline. By using random footstep actions, the algorithm explores the search space more resourcefully, avoiding locally optimal regions in favour
of smoother, lower effort, plans. With geometrical checks pruning undesired branches from the
search space, the final algorithm simulates the
10, 000 scenarios in ∼ 1/5 the time.
[1] Shawn Singh, Mubbasir Kapadia, Glenn
Reinman, and Petros Faloutsos. Footstep
navigation for dynamic crowds. CAVW,
22(2-3):151–158, 2011.
[2] Shawn Singh, Mubbasir Kapadia, Petros
Faloutsos, and Glenn Reinman. An open
framework for developing, evaluating, and
sharing steering algorithms. In MIG, pages
158–169, 2009.
[3] Mubbasir Kapadia, Shawn Singh, William
Hewlett, and Petros Faloutsos. Egocentric affordance fields in pedestrian steering.
In ACM SIGGRAPH I3D, pages 215–223,
2009.
[4] Mubbasir Kapadia, Alejandro Beacco,
Francisco Garcia, Vivek Reddy, Nuria
Pelechano, and Norman I. Badler. Multidomain real-time planning in dynamic
environments. SCA ’13, pages 115–124,
New York, NY, USA, 2013. ACM.
6 Conclusion
[5] Mubbasir Kapadia, Matthew Wang, Shawn
Singh, Glenn Reinman, and Petros Faloutsos. Scenario space: Characterizing coverage, quality, and failure of steering algorithms. In ACM SIGGRAPH/EG SCA, 2011.
A more sound footstep-based steering method
has been presented. This method has been analyzed and compared to a common version of the
algorithm using numerical analysis with metrics
for completion and coverage. The new method
is found to be excellent at avoiding invalid states
and almost perfect at completing simulations.
The most significant improvement to the algorithm comes from adding random and in-place
stepping features. These new features also in-
[6] G. Berseth, M. Kapadia, B. Haworth, and
P. Faloutsos. SteerFit: Automated Parameter Fitting for Steering Algorithms. In
Vladlen Koltun and Eftychios Sifakis, editors, Proceedings of ACM SIGGRAPH/EG
SCA, pages 113–122, 2014.
Avatar Chat: A Prototype of a Multi-Channel Pseudo
Real-time Communication System
Kei Tanaka
Dai Hasegawa
Martin J. Dürst
Hiroshi Sakuta
Aoyama Gakuin University
c5614135@aoyama.jp, {hasegawa, duerst, sakuta}@it.aoyama.ac.jp
communication. When you send a message
such as “Ooh really?” and receive a message
such as “Yes”, it is difficult to distinguish
between strong agreement and disgust.
Abstract
The Internet and mobile terminals are increasingly becoming our common tools. Text-based chat
communication in mobile terminals is now
widely used because of its convenience of
asynchronicity. However, it is difficult for an
asynchronous text-based chat system to realize
rich and lively aspects of communication such
as bodily information exchange (gesture, facial
expression, etc.) and temporal information
exchange (overlap of utterances and speech
timing). In this paper, we propose a text-based
chat system that allows us to deliver nonverbal information and temporal information
without losing convenience (asynchronicity).
To do so, we will introduce avatars and a
virtual time axis into a conventional chat
system.
Figure 1 : Communication Style
Face-to-face real-time communication encompasses non-verbal communication with bodily
information and timing information. Bodily
information includes gestures and facial
expressions. It is thought that by using this
information, emotions and feelings can be
transmitted more accurately. But non-face-to-face text-based chat systems cannot transmit
such information. Emoticons and illustration
images are used to transmit bodily information
such as gestures and facial expressions. But
their expressiveness is limited.
Time information includes overlaps of utterances, speech pauses, and utterance speed. This
characteristic is elucidated by the study of
timing information in audio speech dialog. For
example, Kawashima et al. [1] confirm that the
presence and length of speech pauses in comic
dialogue depends on the function and purpose
of the utterances. Also Uchida’s study [2]
considers what kind of influence utterance
Keywords: Avatar, chat, communication, nonverbal information
1. Introduction
Conventional communication is face-to-face
communication. But with the spread of mobile
terminals and the Internet, text-based chat
communication came to be widely used.
Communication using mobile terminals is
highly convenient because it is not limited by
location and time. An advantage of chat communication is that it places little mental burden on the users and imposes only weak restrictions on the communication, because there is no need to reply immediately as in face-to-face communication. However,
non-face-to-face text-based chat communication is limited in information carrying capacity
in comparison with face-to-face real-time
speed and speech pauses have on the perception of a speaker's character, and concludes that the impression of the speaker changes when the utterance speed is changed.
In text-based chat systems, users exchange
texts of a certain size in turns. This makes it
difficult to express the overlap of utterances.
2.1 Chat system expressing time information
Yamada [3] proposes a text-based chat system allowing synchronous communication.
Messages are displayed in single-word units as
they are being input by keyboard. This system
allows real time communication without time
loss when typing, making it possible to express
overlaps of utterances and speech timing. But it
requires the user to continuously watch the
screen to not miss the timing of each message.
Figure 2 : Chat using AR avatar
Many text-based chat systems can be used as
synchronous-like communication tools and
therefore have a higher immediacy than
asynchronous communication systems such as
bulletin boards. But there is still a time lag due
to typing. Therefore, these systems are not real
time in the same way as face-to-face
communication. Also, such systems do not
allow replay; therefore, timing information is not conserved and cannot be re-examined.
Figure 1 shows the current trade-off between the richness and the convenience of various communication media, and our proposal to improve it with avatar-mediated chat.
To solve the problems identified above, we
propose an online chat system using avatars as
shown in Figure 2. This system can transmit
bodily information by avatar animation and
time information such as speech timing. The
overall communication is captured in a log
called a Communication Log. This Communication Log is extended by the participants by
adding new utterances and accompanying
animations at the end.
Figure 3 : UI configuration
Therefore the advantage of text-based chat
systems - that it is not necessary to reply
immediately - is lost. Also, because this system
is purely text-based, it cannot transmit bodily
information.
2.2 Chat system expressing emotion
Kusumi et al. [4] describe an online chat
communication system where an avatar is used
to express part of the bodily information in
three-dimensional virtual space. This system lets an avatar express emotion by selecting an emotion icon. It can thus express emotion through avatar animation, but it allows neither the expression of timing information nor synchronous communication.
3. AR avatar chat system
In this chapter, we explain the structure of our
avatar chat system which produces pseudo
real-time communication.
2. Related Research
3.1 System overview
The system’s user interface is shown in
Figure 3. The system consists of a text input
area, a list of selectable animations, an area for
text display, and an area for avatar display. We
In this chapter, we discuss related research on the transmission of non-verbal information.
show the procedure of chat communication
using this system.
1. User inputs a text message.
2. User selects an animation to be performed
by the avatar from the list of animations.
3. User A chooses the timing of the start of
the animation (relative to the last message
from User B) using a slider.
4. User A sends the message with the related
timing and animation parameters by
pushing the submit button.
5. A selected number of past animations, including the newest one sent by User A, is played to User B. These animations are replayed continuously until they are amended by a new utterance.
6. It is now User B’s turn to add a text
message and an animation and choose the
time delay (between 0 and 5 seconds) for
the utterance, and send the message by
pushing the submit button.
7. The users repeat steps 1 through 6. (A sketch of the message parameters exchanged in steps 1-4 is given below.)
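A sketch of the parameters such a message could carry, mirroring steps 1-4 above; the field names and types are our own, while the 30-character text limit and the 0-5 second timing range come from the system description.

```python
from dataclasses import dataclass

@dataclass
class ChatMessage:
    text: str               # utterance text, at most 30 characters
    animation: str          # one of the selectable avatar animations, e.g. "Fun"
    timing_offset_s: float  # start delay relative to the previous utterance (0-5 s)
    sender: str             # user identifier

    def validate(self) -> None:
        if len(self.text) > 30:
            raise ValueError("text is limited to 30 characters")
        if not 0.0 <= self.timing_offset_s <= 5.0:
            raise ValueError("timing offset must lie between 0 and 5 seconds")

msg = ChatMessage(text="Ooh really?", animation="Shock",
                  timing_offset_s=1.2, sender="UserA")
msg.validate()   # would then be sent to the server and appended to the communication log
```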
Users input text messages by tapping a text
input area. The maximum length of a text
message is 30 characters. The user can select
the animation that for the avatar by pushing an
animation button from the list of animations.
The selected animation is displayed in the
animation confirmation area, which allows the
user to confirm the animation. The remark
display area displays messages that have been
transmitted and received. Received messages
are displayed left-aligned, and sent messages
are displayed right-aligned. Only one message
is displayed at any given time. Messages are
displayed for 2 seconds unless interrupted
earlier. The animated avatars are displayed an
AR markers in the avatar display area. It is
usual to displays one’s own avatar on right side,
and the avatar of the correspondent on the left
side.
notifications and stores this information in its database. The animation display module obtains from the parameter database the specified parameters for the number of past utterances specified by the user. It performs the animation of the avatars and displays the message at the same time.
3.3 Bodily information display function
This system makes the avatar perform animations to express bodily information. We created 12 types of animation, each with a duration between 1 and 2 seconds, as shown in Figure 5.
Figure 4 : System configuration
Figure 5 : Animation list
We base our selection of animations on a study
by Inoue et al. [5]. They extracted a set of 100
scenes, each of about 3 seconds in length,
showing expression of emotions from the
series “The Simpsons”. A set of 60 emotion-related words from previous research was reduced to 37 emotional words for use in a
questionnaire. Each scene’s emotional content
was matched against each of the 37 words with
a four-step scale. Based on the questionnaire
results, the scenes were classified into five
categories using factor analysis. The five
emotion categories are: Negative emotion of
introversion, negative emotion of extroversion,
positive feelings, tense feelings, and feelings of
lost interest. Based on these results, we created
3.2 System configuration
The system is implemented as a Web
application using Android terminals as shown
in Figure 4. The application on each terminal
consists of a data transmission module, a data
receiver module and an animation display
module. The data transmission module
transmits parameters that consist of input
message, selected animation, display timing of
animation and user information to the server by
using TCP. The data receiver module receives the parameters transmitted by push
the “Sad” animation to express negative
emotion of introversion (sadness, disappointment) by reference to that result, and the
“Shout” animation to express negative emotion
of extroversion, and the “Fun” animation to
express positive feelings. We also created the
“Shock” and “Oops” animations to express
surprise and tense feeling, and the “Bored”
animation to express feelings of lost interest.
4. Conclusions
In this study, we proposed and implemented a
chat system for the purpose of achieving
virtual synchronous communication and
transmitting non-verbal information. Both synchronous and non-verbal communication have been difficult to achieve in conventional text-based chat systems while maintaining the advantage of a chat system that an immediate reply is not required. Non-verbal bodily
information is transmitted by using animated
avatars. Timing information is expressed by
placing the animations on a virtual timeline
that can be repeatedly replayed.
The next steps in our research will be
evaluation and improvement of the UI,
followed by evaluation experiments to
investigate the influence of the timing
information on conversation content and
efficiency.
Figure 6 : Selectable timing range of newly
added animation
References
In addition, we created 5 animations to express modalities that convey the utterance attitude, the communication attitude towards the listener, and the attitude towards the content of the proposition. In particular, we created the “Why” animation to express a request for clarification, the “?” animation to express a query, the “Yes” animation to express agreement, the “Yeah” animation to express strong agreement, and the “No” animation to express disagreement.
[1] H. Kawashima, L. Scogging, and T. Matsuyama. Analysis of the Dynamic Structure of Manzai: Toward a Natural Utterance-Timing Control. Human Interface
Journal, 9(3), 379-390, 2007-08-25, in
Japanese
[2] T. Uchida. Impression of Speaker's Personality and the Naturalistic Qualities of
Speech: Speech Rate and Pause Duration.
Educational psychology research, 53(1), 113, 2005-03-31, in Japanese
[3] Y. Yamada and Y. Takeuchi. Development
of Free Turn Chatting System and Analysis
of the dynamics of social dialog. Technical
Report of the Institute of Electronics,
Information and Communication Engineers,
HCS, 102(734), 19-24, 2003-03-11, in
Japanese
[4] T. Kusumi, H. Komeda, and T. Kojima.
Improving Communication in 3D-MUD
(Multi User Dungeon) by Using Avatar's
Facial Expression Features. Journal of the
Japan Society for Educational Technology,
31(4), 415-424, 2008-03-10, in Japanese
3.4 Time information display function
In order to control the speech timing, this
system expresses its virtual time axis with a
time seek bar. Users communicate by
connecting avatar animations. The virtual time
of the animation does not proceed until a new
animation is added. The communication log that was created appears to be real-time communication when it is played back to the users. This makes it possible to express
speech timing because users arbitrarily decide
the timing of the newly added animation. A
reply with its animation can be started between
0 and 5 seconds after the start of the last
utterance. This makes it possible to express the interruption of past utterances by overlapping utterances, as shown in Figure 6.
On Streams and Incentives:
A Synthesis of Individual and Collective Crowd Motion
Arthur van Goethem
Norman Jaklin
Atlas Cook IV
Roland Geraerts
TU Eindhoven
Utrecht University
University of Hawaii at Manoa
Utrecht University
a.i.v.goethem@tue.nl
n.s.jaklin@uu.nl
acook4@hawaii.edu
r.j.geraerts@uu.nl
Figure 1: Left: A dense crowd of agents collaboratively moves through a narrow doorway. Right: A 2D representation of the doorway
shows that each agent interpolates between individual behavior (green) and coordinated behavior (red).
Abstract
We present a crowd simulation model that combines
the advantages of agent-based and flow-based
paradigms while only relying on local information.
Our model can handle arbitrary and
dynamically changing crowd densities, and it
enables agents to gradually interpolate between
individual and coordinated behavior. Our model
can be used with any existing global path planning and local collision-avoidance method. We
show that our model reduces the occurrence of
deadlocks and yields visually convincing crowd
behavior for high-density scenarios while maintaining individual agent behavior at lower densities.
Keywords: crowd simulation, multi-agent system, autonomous virtual agents
1 Introduction
Crowd simulation models can be divided into
agent-based simulations and flow-based simulations. Agent-based simulations focus on the behaviors of each individual in the crowd. While these
methods usually work well at low to medium densities, they struggle when handling high crowd densities due to a lack of coordination between the agents.
By contrast, flow-based simulations aim at simulating collective emergent phenomena by treating a
crowd as one large entity. These techniques typically perform well with high-density scenarios because they facilitate a high level of coordination
among the agents. However, they struggle to handle low- to medium-density scenarios because they
omit the individuality of the crowd members.
Contributions. We propose a new model that combines the advantages of agent-based and flow-based
paradigms while only relying on local information.
It enables the simulation of large numbers of virtual
agents at arbitrary and dynamically changing crowd
densities. Our technique preserves the individuality of each agent in any virtual 2D or multi-layered
3D environment. The model performs as well as
existing agent-based models that focus on low- to
medium-density scenarios, while also enabling the
simulation of large crowds in highly dense situations without any additional requirements or user interference. Compared to existing agent-based models, our model significantly reduces the occurrence
of deadlocks in extremely dense scenarios. Our
model is flexible and supports existing methods for
computing global paths, simulating an agent’s individual behavior, and avoiding collisions with other
agents. Furthermore, it yields energy-efficient and
more realistic crowd movement that displays emergent crowd phenomena such as lane formation and
the edge effect [1].
2 Overview of our model
We represent each agent as a disk with a variable radius. The center of the disk is the current position
of the agent. Each agent has a field of view (FOV),
which is a cone stretching out from the agent’s current position, centered on the agent’s current velocity vector and bounded by both a maximum lookahead distance dmax = 8 meters and a maximum
viewing angle φ = 180◦ .
Let A be an arbitrary agent. We perform the following five steps in each simulation cycle:
1. We compute an individual velocity for agent A.
It represents the velocity A would choose if no
other agents were in sight. Our model is independent of the exact method that is used.
2. We compute the local crowd density that agent A
can perceive; see Section 3.1.
3. We compute the locally perceived stream velocity
of agents near A; see Section 3.2.
4. We compute A’s incentive λ. This incentive is
used to interpolate between the individual velocity from step 1 and the perceived stream velocity
from step 3; see Section 3.3.
5. The interpolated velocity is passed to a collisionavoidance algorithm. Our model is independent
of the exact method that is used.
3 Streams
We define streams as flows of people that coordinate their movement by either aligning their paths or
following each other. This leads to fewer collisions and fewer abrupt changes in the direction of movement. A
dominant factor is the local density ρ.
3.1 Computing local density information
We use the agent's FOV to compute ρ. We determine the set N of neighboring agents whose current position lies inside A's FOV. We sum up the area ∆(N) occupied by each agent N ∈ N and divide it by the total area ∆(FOV) of A's FOV. A FOV
it by the total area ∆(F OV ) of A’s FOV. A FOV
occupied to one third can already be considered a
highly crowded situation. Thus, we multiply our result by 3 and cap it at a maximum of 1. Formally,
we define the crowd density ρ as follows:
ρ := min( (3 / ∆(FOV)) · Σ_{N ∈ N} ∆(N), 1 ).   (1)
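Equation (1) translates directly into code; the disk areas used for ∆(N) and the half-disk FOV area below are our own simplifications.

```python
import math

def local_density(neighbor_radii, fov_area):
    """Equation (1): rho = min(3 / area(FOV) * sum of neighbour areas, 1)."""
    occupied = sum(math.pi * r * r for r in neighbor_radii)
    return min(3.0 * occupied / fov_area, 1.0)

# FOV: a 180-degree cone of radius d_max = 8 m -> half-disk area.
fov_area = 0.5 * math.pi * 8.0 ** 2
print(local_density([0.3, 0.3, 0.25, 0.35], fov_area))
```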
3.2 The perceived stream velocity
Let B be a single agent in A’s FOV, and let xA and
xB be their current positions, respectively. We define the perceived velocity vper(A,B) as an interpolation between B’s velocity vB and a vector vdir(A,B)
of the same length that points along the line of sight
between A and B; see Figure 2. Let ρ ∈ [0, 1] be the
local density in A's FOV, and let d_{A,B} = ‖x_B − x_A‖ / d_max be the relative distance between A and B. A factor f_{A,B} = ρ · d_{A,B} is used to angularly interpolate
between v_B and v_dir(A,B). The larger ρ is, the more A is inclined to pick a following strategy rather than an alignment strategy.
Let N5 be a set of up to 5 nearest neighbors of A.
To avoid perceived stream velocities canceling each
other out, we restrict the angle between the velocities of A and each neighbor to strictly less than π/2.
We define the average perceived stream speed s as
follows:
s := (1 / |N_5|) · Σ_{N ∈ N_5} ‖v_per(A,N)‖.   (2)
The locally perceived stream velocity vstream perceived by agent A is then defined as follows:
v_stream := s · ( Σ_{N ∈ N_5} v_per(A,N) ) / ‖ Σ_{N ∈ N_5} v_per(A,N) ‖.   (3)
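The following sketch combines the angular interpolation described above with Equations (2) and (3); the rotation helper and the neighbour filtering are written out explicitly, and all names are our own.

```python
import numpy as np

def rotate(v, ang):
    c, s = np.cos(ang), np.sin(ang)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

def perceived_velocity(xA, xB, vB, rho, d_max=8.0):
    """Interpolate angularly between v_B and a vector of equal length along the
    line of sight A->B, by the factor f_{A,B} = rho * d_{A,B}."""
    los = xB - xA
    v_dir = los / np.linalg.norm(los) * np.linalg.norm(vB)
    f = rho * np.linalg.norm(los) / d_max
    ang = np.arctan2(v_dir[1], v_dir[0]) - np.arctan2(vB[1], vB[0])
    ang = (ang + np.pi) % (2 * np.pi) - np.pi    # shortest signed angle
    return rotate(vB, f * ang)

def stream_velocity(xA, vA, neighbours, rho, d_max=8.0):
    """Equations (2) and (3) over up to five nearest neighbours whose velocity
    makes an angle of less than pi/2 with v_A."""
    per = [perceived_velocity(xA, xB, vB, rho, d_max)
           for xB, vB in neighbours[:5]
           if np.dot(vB, vA) > 0.0]              # angle(vB, vA) < pi/2
    if not per:
        return vA.copy()
    s = np.mean([np.linalg.norm(v) for v in per])       # Eq. (2)
    total = np.sum(per, axis=0)
    n = np.linalg.norm(total)
    return s * total / n if n > 1e-9 else vA.copy()     # Eq. (3)
```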
3.3 Incentive
The incentive λ is defined by four different factors:
internal motivation γ, deviation Φ, local density ρ,
and time spent τ . We simulate the behavior of an
agent A in a way such that – aside from the internal
motivation factor – the most dominant factor among
Φ, ρ and τ has the highest impact on A's behavior.
We define the incentive λ as follows:
λ := γ + (1 − γ) · max( Φ, (1 − ρ)³, τ ).   (4)
Internal motivation γ ∈ [0, 1] determines a minimum incentive that an agent has at all times. For
the local density ρ, a non-linear relation with the incentive is desired, and we use (1 − ρ)³.
Figure 2: An example of the perceived velocity v_per(A,B), based on an interpolation between v_B and v_dir(A,B).
The deviation factor Φ makes agent A leave a stream
when vstream deviates too much from vindiv . We
use a threshold angle φmin . Whenever the angle between vstream and vindiv is smaller than φmin , the
factor Φ will be 0. This yields stream behavior unless the other factors determine a different strategy.
If the angle is greater than φmin , we gradually increase Φ up to a maximum deviation of 2φmin . Angles greater than this threshold correspond to a deviation factor of 1, thus yielding individual steering
behavior. Let φdev be the smallest angle between
vindiv and vstream . We define the deviation factor Φ
as follows:
Φ := min( max( (φ_dev − φ_min) / φ_min, 0 ), 1 ).   (5)
The time spent factor τ is used to make stream behavior less attractive the longer it takes the agent to
reach its goal. We initially calculate the expected
time τexp agent A will need to get to its destination.
How this is done depends on how A’s individual velocity is calculated, i.e. what method is used as a
black box. We keep track of the actual simulation
time τspent that has passed since A has started moving. We define the time spent factor τ as follows:
τ := min( max( (τ_spent − τ_exp) / τ_exp, 0 ), 1 ).  (6)
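As a minimal sketch combining Equations (4)-(6) (function and parameter names are ours):

```python
def incentive(gamma, phi_dev, phi_min, rho, t_spent, t_exp):
    """Combine internal motivation, deviation, local density and time spent
    into the incentive lambda of Eq. (4)."""
    clamp01 = lambda x: min(max(x, 0.0), 1.0)
    phi = clamp01((phi_dev - phi_min) / phi_min)   # deviation factor, Eq. (5)
    tau = clamp01((t_spent - t_exp) / t_exp)       # time-spent factor, Eq. (6)
    return gamma + (1.0 - gamma) * max(phi, (1.0 - rho) ** 3, tau)
```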
Finally, let β = φ_dev · λ be the deviation angle scaled by the incentive. We rotate v_stream towards v_indiv by β. In general, the lengths of v_indiv and v_stream are not equal. Therefore, we also linearly interpolate the lengths of these vectors. The resulting velocity is the new velocity for agent A in the next simulation cycle.

4 Experiments

Our model has been implemented in a framework based on the Explicit Corridor Map [2]. We use one CPU core of a PC running Windows 7 with a 3.1 GHz AMD FX 8120 8-core CPU, 4 GB RAM, and a Sapphire HD 7850 graphics card with 2 GB of onboard GDDR5 memory. To compute v_indiv, we combined our model with the Indicative Route Method (IRM) [3]. To benchmark and validate our model, we use the Steerbench framework [4]. Our benchmarking score is defined as follows:

score = 50c + e + t.  (7)

It is comprised of the average number of collisions c per agent, the average kinetic energy e, and the average time t spent by an agent. A lower score is considered to be a better result.

We used six different scenarios; see Figure 3. Preferred speeds were randomly chosen between 0.85 and 2.05 meters per second. We have tested our model with three popular collision-avoidance methods [5, 6, 7]; see Figure 4. We have also compared our model to the same scenarios when only individual behavior is displayed. Here, we use the IRM together with the collision-avoidance method by Moussaïd et al. [7] because this yielded the best results. Figure 5 shows the corresponding mean Steerbench scores per agent over 50 runs per scenario. Figure 6 shows the average percentage of agents that did not reach their goal in a total time of 200 seconds with stream behavior turned on and off. Figure 7 shows the average running times needed to compute one step of the simulation for an increasing number of agents in the military and hallway-stress scenarios. Our model runs at interactive rates in typical gaming or simulation scenarios, even when coordination among the agents is high.

Figure 3: The different scenarios in our experiments are (from top to bottom): merging-streams, crossing-streams, hallway1, hallway2, narrow-50 and military.

Figure 4: Mean Steerbench scores of the three different collision-avoidance methods for our test scenarios. The scores are averaged over 50 runs per agent. In all our experiments, lower scores are better.

Figure 5: Mean Steerbench scores for the scenarios with our streams model turned on and off. The scores are averaged per agent over 50 runs.

Figure 6: Percentage of agents that did not reach their goal in high-density scenarios with our streams model turned on and off, averaged over 50 runs each.

Figure 7: Average running times to compute one step of the simulation (in ms) for an increasing number of agents in the military and hallway-stress scenarios. Each measurement is the average of 10 runs for the same number of agents. Deadlocks frequently occur for more than 1000 agents in military. In the hallway-stress scenario, we could simulate up to 2000 agents simultaneously without any deadlocks.
5 Conclusion and Future Work
We have introduced a crowd simulation model that interpolates an agent's steering strategy between individual behavior and coordination with the crowd.
Local streams determine an agent’s trajectory when
local crowd density is high. This allows the simulation of large numbers of autonomous agents at interactive rates.
We have validated our model with the Steerbench
framework [4] by measuring the average numbers of
collisions, expended kinetic energy, and time spent.
Experiments show that our model works as well as
existing agent-based methods in low- to medium-density scenarios, while showing a clear improvement when handling large crowds in densely packed
environments. These conclusions are also validated
in the accompanying video.
The flexibility to use any global planning method
and any local collision-avoidance method as a black
box makes our model applicable to a wide range
of research fields that require the simulation of autonomous virtual agents. We believe that our model
can form a basis for improving crowd movement in
future gaming and simulation applications, in CGI-enhanced movies, in urban planning software,
and in safety training applications. For further details on our model, we refer the interested reader to
the full-length version of this paper [8].
References
[1] S. J. Guy, J. Chhugani, S. Curtis, P. Dubey, M. C. Lin, and
D. Manocha. Pledestrians: A least-effort approach to crowd simulation. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics
Symposium on Computer Animation, pages 119–128. Eurographics
Association, 2010.
[2] R. Geraerts. Planning short paths with clearance using Explicit
Corridors. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, pages 1997–2004, 2010.
[3] I. Karamouzas, R. Geraerts, and M. Overmars. Indicative routes for
path planning and crowd simulation. 4th International Conference
on Foundations of Digital Games, pages 113–120, 2009.
[4] S. Singh, M. Kapadia, P. Faloutsos, and G. Reinman. Steerbench:
A benchmark suite for evaluating steering behaviors. Computer
Animation and Virtual Worlds, 20(5-6):533–548, 2009.
[5] J. van den Berg, S. Guy, M. C. Lin, and D. Manocha. Reciprocal
n-body collision avoidance. In Robotics Research, pages 3–19.
Springer, 2011.
[6] I. Karamouzas, P. Heil, P. van Beek, and M. Overmars. A predictive
collision avoidance model for pedestrian simulation. In Motion in
Games, pages 41–52. Springer, 2009.
[7] M. Moussaı̈d, D. Helbing, and G. Theraulaz. How simple rules
determine pedestrian behavior and crowd disasters. Proceedings
of the National Academy of Sciences, 108(17):6884–6888, April
2011.
[8] Arthur van Goethem, Norman Jaklin, Atlas Cook IV, and Roland
Geraerts. On streams and incentives: A synthesis of individual
and collective crowd motion. Technical Report UU-CS-2015-005,
Department of Information and Computing Sciences, Utrecht University, March 2015.
Acknowledgements
This research was partially funded by the COMMIT/ project,
http://www.commit-nl.nl.
Constrained Texture Mapping via Voronoi Diagram
Base Domain
Peng Cheng1,2 Chunyan Miao1 Nadia Magnenat Thalmann1,2
1. School of Computer Engineering, Nanyang Technological University, Singapore
2. Institute for Media Innovation, Nanyang Technological University, Singapore
pcheng1, ascymiao, nadiathalmann@ntu.edu.sg
Abstract
Constrained texture mapping builds extra correspondence between mesh features and texture details. In this paper, we propose a Voronoi-diagram-based matching method to address the positionally constrained texture mapping problem. The first step is to generate a Voronoi diagram on the mesh surface taking user-defined feature points as Voronoi sites. Then we build a Voronoi diagram with the corresponding feature points on the image plane. Finally, we create an exponential mapping between every pair of Voronoi cells. The proposed method is simple and efficient, allowing real-time constraint editing and high-level semantic constraints. Experiments show that the proposed method achieves good results with far fewer constraints.
1 Introduction
Texture mapping is a common technique to enhance visual realism in computer graphics. To
build meaningful correspondence between surface geometry features and texture details, Lévy
et al. [1] [2] [3] formulated this problem as a
constrained optimization problem. Constraints
are often defined as correspondence between
feature points on the mesh surface and those
in the 2D image. However, many constraints
are needed to match only one meaningful feature. To reduce texture distortion and to avoid
foldovers, users need to manually edit these constraints. This is a painstaking task.
We propose a Voronoi diagram base domain
method to address the constrained texture map-
ping problem. The key advantage of our method
is that we can achieve decent texture mapping
performance by relying on only one point or
curve constraint for each general feature. On the
image plane, we construct a Voronoi diagram
with all the constraints as Voronoi seeds. On
the mesh surface, we construct the other Voronoi
diagram also with constraints as Voronoi seeds.
As the constraints between the 2D image and the mesh surface are mapped one-to-one, the Voronoi cells of the two Voronoi diagrams are also mapped one-to-one. For each pair of Voronoi cells, we use a bilinear exponential map for the local texture mapping.
Thanks to the closed nature of the Voronoi cells, we can avoid global foldovers among the cells. Moreover, the local exponential map minimizes texture distortion near the feature seeds. Figure 1 shows the constrained texture mapping results of the proposed method with Beijing Opera faces on a mask model. With only four constraints on the eyes, nose, and mouth, we achieve good results. Existing major work [3] needs to specify 27 point constraints to map a tiger face onto a human face, and an expensive post-processing step of smoothing and removing redundant Steiner vertices is also necessary.
2 Related work
Lévy et al. [1] first studied the constrained texture mapping problem. The main challenge of this problem is to avoid foldovers and to obtain a smooth mapping with small distortion. Floater and Hormann [4] [5] summarized this problem from a geometric modeling point of view.

Table 1: Notations in the proposed method.

C_M = {C_M^1, C_M^2, ..., C_M^m}: the point constraint set on the mesh surface M
C_I = {C_I^1, C_I^2, ..., C_I^m}: the point constraint set in the texture image space I
V_M: the Voronoi diagram on M with C_M as the Voronoi sites
V_M^i: a Voronoi cell of V_M
V_I: the Voronoi diagram on I with C_I as the Voronoi sites
V_I^i: a Voronoi cell of V_I
exp_i: the exponential map between V_M^i and V_I^i

Figure 1: Texture mapping results of Beijing opera face images on the mask model (499, 955) produced by the proposed method with only four constraints on the eyes, nose, and mouth.
Existing methods for the problem can be
grouped into two categories, namely, global optimization based methods [1] [6] [7] and image
warping based methods [2][3][8]. For the global
optimization method [1], its objective function contains two parts. The first part encodes the constraints defined by the user, for example as the squared deviation from the constraint data points. The second part controls the smoothness of the final mapping. This is a compromise between satisfying the constraints and minimizing the distortion of the mapping. Hard constraints can be implemented using Lagrange multipliers. However, these methods fail to guarantee a bijective embedding. Moreover, these optimization-based methods are not guaranteed to converge.
As for the image-warping based methods [3][8], Delaunay triangulation and edge swaps are frequently used to satisfy positional constraints and to avoid foldovers. Eckstein et al. [2] constructed a 2D mesh in the image plane and warped it onto the mesh surface. The 2D mesh was constructed with the same topology as the mesh surface. The mapping was then created between corresponding vertices and triangles. The limitation is that the method is complicated and not robust, though it may handle a large set of constraints.
Kraevoy et al.[3] and Lee et al. [9] performed
embedding by adding a fixed rectangular virtual boundary. Then, they applied the Delaunay method to triangulate the region between the true and virtual boundaries. A subsequent smoothing step is required to reduce distortion after aligning the user-specified hard constraints. In contrast, the proposed method does not need such an expensive post-processing smoothing step, thanks to the properties of the exponential map.
3 Method
In this section, we introduce a novel method based on a global Voronoi base domain and a local exponential map. One key advantage of our proposed method is that it relies on very few natural constraints, which can be exactly satisfied while preserving the metric of the original 3D geometry.

A triangular mesh surface M is represented as the pair (T, X), where T is the topology or connectivity information, and X = {x1, x2, ..., xN} is the set of vertex positions in R^3. The input texture image I is a 2D planar space I(u, v), where u, v ∈ [0, 1]. The notations are listed in Table 1.
3.1 Overview
We suppose that the mesh surface has a natural boundary and that the texture has already been segmented. Both boundaries are sketched by the user in our implementation. The user may then interactively specify the constraints by clicking points on the surface and in the image. Our approach generates a one-to-one mapping from M to I in the following two steps, as shown in Figure 2.
• Building Voronoi Base Domain. We first build a 2D Voronoi diagram V_I in the image space. The boundary is the texture boundary, either segmented from the input or interactively sketched by the user. The Voronoi sites are the feature constraints C_I. Then we build the Voronoi diagram V_M on the mesh surface; the details are described in Section 3.2. Next, based on the two Voronoi diagrams V_I and V_M and the corresponding feature constraints, we can easily generate the one-to-one mapping between the Voronoi cells of V_I and V_M.
• Computing Local Bilinear Map. For the local mapping between each pair of Voronoi cells, we assume that the area near the Voronoi sites (features) is more important than the area far from them. The exponential map, or geodesic polar coordinates, has smaller distortion near its source point (the Voronoi site). Therefore, the exponential map fulfills the distortion requirement and gives a good match around each feature. Section 3.3 provides the implementation details.
3.2 Building Voronoi Base Domain on
Surface
To compute the bisectional curve between Voronoi sites, we measure the surface distance on the mesh by computing geodesics from the Voronoi sites to all the other vertices. Computing exact geodesic distances is expensive [10] [11]. We propose a simple and efficient method to compute a smooth distance field, which leads to a smooth segmentation of the mesh surface. We flood the distance and angle fields of the source points to all the other points with only one traversal of each vertex. Moreover, the geodesic flooding converges faster when there are more source points. Algorithm 1 lists the detailed steps. In Algorithm 1, tag_i is a flag indicating the Voronoi cell V_M^j that vertex x_i lies in, as shown in Figure 3; d_i is the geodesic distance from x_i to its Voronoi site, as shown in Figure 4; and α_i is the exponential angle of the corresponding geodesic curve.
Figure 3: The tag of each vertex is indicated by color. Red, blue, yellow, and green are the Voronoi cells of the left eye, right eye, nose, and mouth. (a) Mask model (499, 955); (b) refined mask model (31762, 62467).

Figure 2: Workflow of the Voronoi-diagram-based constrained texture mapping method. (a) The input mesh surface; (b) the boundary (red curve), point constraints (green spheres), and corresponding Voronoi diagram in the texture image; (c) the boundary (red curve), point constraints (green spheres), and corresponding Voronoi diagram on the mesh surface; (d) the parametrization result; (e), (f) different views of the textured surface.
We follow [12] to update the geodesic distance and angle. In Algorithm 1, the subfunction Distance(s, i) updates the geodesic distance of vertex x_i using the distance of its neighbor s, and Angle(s, i) updates the angle of the corresponding geodesic. Once we have the Voronoi cell tag and the distance of each vertex, we can compute the Voronoi points of V_M by matching distances to the related Voronoi cells of V_I. Next, the Voronoi edges are traced from the starting Voronoi point in counterclockwise order around their Voronoi cell.
Algorithm 1 Voronoi Base Domain on Surface.
Require: A mesh surface M(T, X) and the feature constraints C_M on the surface.
Ensure: For each x_i ∈ X, a tag tag_i denoting the Voronoi cell that x_i lies in, the distance d_i from x_i to the corresponding site, and the angle α_i of the corresponding geodesic direction.
  for i = 0; i < |X|; i++ do
    tag_i = −1;  d_i = ∞;  α_i = 0;
  end for
  for i = 0; i < |C_M|; i++ do
    tag_{C_M^i} = i;  d_{C_M^i} = 0;  α_{C_M^i} = 0;
    Heap.push(C_M^i);
  end for
  while Heap.notEmpty() do
    s = Heap.getSmallest();
    for each i ∈ Neighbor(s) do
      Newd_i = Distance(s, i);
      if Newd_i < d_i then
        Heap.push(i);
        d_i = Newd_i;
        tag_i = tag_s;
        α_i = Angle(s, i);
      end if
    end for
  end while
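For illustration, Algorithm 1 can be sketched in a few lines of Python. This is a simplified version in which the Distance and Angle updates of [12] are replaced by plain edge-length propagation and the angle field is omitted; all names are ours:

```python
import heapq
import math

def voronoi_flood(vertices, neighbors, sites):
    """Multi-source Dijkstra-style flooding that assigns to every vertex the
    tag of its nearest site and an approximate geodesic distance to it.

    vertices  -- list of 3D points (tuples)
    neighbors -- adjacency list: neighbors[i] is a list of vertex indices
    sites     -- list of vertex indices used as Voronoi sites
    """
    n = len(vertices)
    tag = [-1] * n
    dist = [math.inf] * n
    heap = []
    for j, s in enumerate(sites):
        tag[s], dist[s] = j, 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist[s]:
            continue                                     # stale heap entry
        for i in neighbors[s]:
            new_d = d + math.dist(vertices[s], vertices[i])   # edge-length update
            if new_d < dist[i]:
                dist[i] = new_d
                tag[i] = tag[s]
                heapq.heappush(heap, (new_d, i))
    return tag, dist
```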
Figure 4: Geodesic distance field on the refined mask model (31762, 62467).

3.3 Generating Local Bilinear Mapping

Our goal is to map the 2D image area V_I^i to the surface area V_M^i, such that the constraint point C_I^i in the image lies at the constraint point C_M^i on the mesh. One alternative for defining a 2D coordinate system on the surface around C_M^i is the exponential map, or geodesic polar coordinates [13]. The exponential map projects a point v on M to the tangent plane T_v at v. For any unit vector v ∈ T_p, a geodesic g_v with g_v(0) = p and g_v'(0) = v exists and is unique.

The challenge of computing an approximate exponential map exp_v is to trace smooth and accurate radial angles for each vertex. Schmidt et al. [14][15] extended Dijkstra's graph distance algorithm [16] to trace the angle difference with a rotation angle while unfolding the neighboring triangles. Melvær et al. [12] achieve smoother geodesic polar coordinates using a simple linear angle interpolation, and we follow the same linear angle interpolation procedure in our implementation. In order to match each Voronoi pair V_I^i and V_M^i, we employ a piecewise linear scaling of the angle in the exponential map, as shown in Figure 5. Equation (1) gives the scale parameters:

α_i / ∑_{i=1}^{5} α_i = β_i / ∑_{i=1}^{5} β_i.  (1)

Figure 5: Piecewise scale of angle for each pair of the Voronoi diagrams.
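A minimal sketch of this piecewise linear angle scaling (Equation (1)); the cumulative-sum formulation and all names are ours:

```python
import numpy as np

def rescale_angle(theta, alphas, betas):
    """Map a polar angle theta measured in the mesh Voronoi cell (whose
    segments subtend the angles alphas) to the matching angle in the image
    cell (segments betas), preserving the per-segment proportions of Eq. (1)."""
    a_edges = np.concatenate(([0.0], np.cumsum(alphas)))
    b_edges = np.concatenate(([0.0], np.cumsum(betas)))
    theta = theta % a_edges[-1]
    i = int(np.searchsorted(a_edges, theta, side="right")) - 1
    i = min(i, len(alphas) - 1)
    t = (theta - a_edges[i]) / alphas[i]   # fractional position inside segment i
    return b_edges[i] + t * betas[i]
```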
4 Experimental Results and
Discussion
In this section, we evaluate our proposed method using different texture images and show its validity, efficiency, and robustness. In Figure 6, we apply the proposed method with a checkerboard image to show the smoothness of the parametrization.

In Figure 7, we compare the proposed method with constrained harmonic mapping.
Figure 6: (a) Checkerboard image and constraints (green spheres); (b) mask mesh surface (499, 955) and constraints; (c), (d) textured mask surface in front and side views.

Figure 7: Comparison with the constrained harmonic map. (a) Texture image and constraints; (b) mask surface and constraints; (c), (e) parametrization and texture mapping results of constrained harmonic mapping; (d), (f) parametrization and texture mapping results of the proposed method.
Although constrained harmonic mapping leads to a smoother parametrization, the texture mapping result shows large distortion, as can be seen in Figure 7: the constrained features of the eyes and mouth are strongly distorted. Moreover, the constraints are prone to cause foldovers because of the minimization of the Dirichlet energy, so users have to carefully maintain the structural consistency between constraints. In contrast, our proposed method does not suffer from this distortion problem, thanks to the properties of the exponential map.
Acknowledgements

Cheng Peng would like to acknowledge the Ph.D. grant from the Institute for Media Innovation, Nanyang Technological University, Singapore. This research is supported by the National Research Foundation, Prime Minister's Office, Singapore under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme Office.

References

[1] Bruno Lévy. Constrained texture mapping for polygonal meshes. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '01, pages 417–424, New York, NY, USA, 2001. ACM.
[2] Ilya Eckstein, Vitaly Surazhsky, and Craig Gotsman. Texture mapping with hard constraints. In Computer Graphics Forum, volume 20, pages 95–104. Wiley Online Library, 2001.
[3] Vladislav Kraevoy, Alla Sheffer, and Craig Gotsman. Matchmaker: Constructing constrained texture maps. In ACM SIGGRAPH 2003 Papers, SIGGRAPH '03, pages 326–333, New York, NY, USA, 2003. ACM.
[4] Michael S. Floater and Kai Hormann. Surface parameterization: a tutorial and survey. In Advances in Multiresolution for Geometric Modelling, pages 157–186. Springer, 2005.
[5] Kai Hormann, Bruno Lévy, Alla Sheffer, et al. Mesh parameterization: Theory and practice. 2007.
[6] Bruno Lévy, Sylvain Petitjean, Nicolas Ray, and Jérôme Maillot. Least squares conformal maps for automatic texture atlas generation. In ACM Transactions on Graphics (TOG), volume 21, pages 362–371. ACM, 2002.
[7] Mathieu Desbrun, Mark Meyer, and Pierre Alliez. Intrinsic parameterizations of surface meshes. In Computer Graphics Forum, volume 21, pages 209–218. Wiley Online Library, 2002.
[8] Yuewen Ma, Jianmin Zheng, and Jian Xie. Foldover-free mesh warping for constrained texture mapping.
[9] Tong-Yee Lee, Shao-Wei Yen, and I-Cheng Yeh. Texture mapping with hard constraints using warping scheme. IEEE Transactions on Visualization and Computer Graphics, 14(2):382–395, 2008.
[10] Vitaly Surazhsky, Tatiana Surazhsky, Danil Kirsanov, Steven J. Gortler, and Hugues Hoppe. Fast exact and approximate geodesics on meshes. In ACM Transactions on Graphics (TOG), volume 24, pages 553–560. ACM, 2005.
[11] Joseph S. B. Mitchell, David M. Mount, and Christos H. Papadimitriou. The discrete geodesic problem. SIAM Journal on Computing, 16(4):647–668, 1987.
[12] Eivind Lyche Melvær and Martin Reimers. Geodesic polar coordinates on polygonal meshes. In Computer Graphics Forum, volume 31, pages 2423–2435. Wiley Online Library, 2012.
[13] Manfredo Perdigão do Carmo. Differential Geometry of Curves and Surfaces. Prentice-Hall, Englewood Cliffs, 1976.
[14] Ryan Schmidt, Cindy Grimm, and Brian Wyvill. Interactive decal compositing with discrete exponential maps. ACM Transactions on Graphics (TOG), 25(3):605–613, 2006.
[15] Qian Sun, Long Zhang, Minqi Zhang, Xiang Ying, Shi-Qing Xin, Jiazhi Xia, and Ying He. Texture brush: an interactive surface texturing interface. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pages 153–160. ACM, 2013.
[16] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, 1959.
Hybrid Modeling of Multi-physical Processes for
Volcano Animation
Fanlong Kong
Software Engineering Institute,
East China Normal University.
Changbo Wang∗
Software Engineering Institute,
East China Normal University.
email: cbwangcg@gmail.com
Chen Li
Software Engineering Institute,
East China Normal University.
Hong Qin
Computer Science Department,
Stony Brook University.
Abstract
Many complex and dramatic natural phenomena (e.g., volcano eruption) are difficult to animate for graphics tasks, because a single type of physical process and its numerical simulation frequently cannot afford high-fidelity and effective scene production. Volcano eruption and its subsequent interaction with the earth is one such complicated phenomenon that demands multi-physical processes and their tight coupling. In technical essence, volcano animation includes heat transfer, lava-lava collision, lava-rock interaction, melting, solidification, fire and smoke modeling, etc. Yet, the tight synchronization of multi-physical processes and their inter-transition involving multiple states for volcano animation still exhibits many technical challenges in graphics and animation. This paper documents a novel and effective solution for volcano animation that embraces multi-physical processes and their tight unification. First, we introduce a multi-physical process model with state transition dictated by temperature, and add a dynamic viscosity that varies with the temperature. Second, we augment traditional SPH with several required attributes in order to better handle our new multi-physical process model, which can simulate lava and its melting, solidification, and interaction with the earth. A particle system is developed to handle multi-physical interaction and heat transfer. Finally, multi-physical quantities are tightly coupled to support interaction with the surroundings, including fluid-solid coupling, ground friction, lava-smoke coupling, smoke creation, etc.
Keywords: Volcano Animation, Multi-physical
Processes and Interaction, Heat Transfer
1 Introduction and Motivation
Volcano eruption is one of the most horrifying and dramatic natural phenomena on earth. It results in natural disasters and huge economic losses for human beings whenever and wherever it occurs throughout the world. Its high-fidelity simulation has attracted a large amount of scientific attention in many relevant fields, ranging from geophysics, atmospheric sciences, and civil engineering to emergency management. Despite earlier research progress on this subject, realistic simulation of volcano eruption remains of high interest to graphics and animation.
In movies and games with disaster scenes, high-fidelity animation of volcano eruption is indispensable. Moreover, animating volcano eruption both precisely and efficiently can also benefit human beings in many other respects, such as disaster prevention, rescue, and emergency planning. In the long run, it might also be possible to consider how human beings could make better use of the enormous energy burst during a volcano eruption.
In graphics there have been many natural phenomena that are well animated with high precision, such as water, cloud, smoke, fire, debris
flow, ice, sand, etc. Compared with the aforementioned natural phenomena, volcano eruption
and its interaction with earth and atmosphere are
much more complex than what a single type of
physical processes could handle. For volcano
eruption, there are many different types of participating media including lava, mountain, and
smoke, and the interaction and movement of different materials. Complicated interaction of different substrates and multiple physical processes
are involved simultaneously. Our belief is that complex scenes such as volcano eruption cannot be simulated using a single traditional physical process and/or simple numerical models. Multi-physical processes must be invoked to better handle the complex scene production; it is therefore much more challenging to design realistic models for such a task. Aside
from the issue of multi-physical processes, their
effective integration presents another key challenge. At the technical level, in volcano animation lava can be described by a free surface fluid
with larger viscosity and complicated boundary
conditions, mountain can be described by a rigid
body with no movement, and smoke can be described by particles driven by a vorticity velocity
field. Recent works tend to focus on lava simulation rather than volcano animation, and are therefore unable to produce highly realistic scenes. Even though some numerical models have been proposed in the recent past, they generally require large-scale computation that is not suitable for a desktop environment; hence they are less suitable for the purpose of computer animation in terms of time and precision.
This paper focuses on high-performance and
efficient modeling for complex volcano animation. Towards this ambitious goal, we propose
a multi-physical process model that can perform
efficiently for volcano animation. Our key contributions include:
Multi-physical process modeling with state transition dictated by temperature. A novel multi-physical process model is proposed to handle complex scenes with multi-physical quantities and their tight coupling. To model state transitions, we choose temperature as a core quantity; the other quantities depend on temperature.

Discrete modeling of the multi-physical process. We choose SPH as our base model; however, traditional SPH cannot handle the multi-physical process model well. Temperature, particle type, lifetime, and temperature-dependent viscosity must all be integrated into traditional SPH. Moreover, after the SPH particles are split into several types, a particle system is proposed to handle multi-physical interaction via heat transfer.

Tight coupling of multi-physical quantities and their interaction with surroundings. Towards photo-realism, the multi-physical quantities must be efficiently coupled and must interact with the surroundings naturally, including fluid-solid coupling, ground friction, melting, lava-smoke coupling, etc.
2 Multi-physical Process Model
Volcano eruption and its subsequent interaction
with earth is a complicated phenomenon that
includes heat transfer, lava-lava collision, lava-rock interaction, melting, solidification, fire and smoke modeling, etc. Only the tight synchronization of multi-physical processes can handle its animation correctly. In our multi-physical process model, lava is described by a free-surface fluid with large viscosity and complicated boundary conditions, the mountain is described by a rigid body with no movement, and smoke is described by particles driven by a vorticity velocity field. All these physical processes are kept synchronized by way of the temperature.
Since lava is described as fluids, the motion of
lava can be formulated by the Navier–Stokes equations [1]:

ρ(∂/∂t + u · ∇)u = −∇P + µ∇ · (∇u) + f,  (1)
∇ · u = 0,  (2)

where ρ denotes the density, u the velocity, P the pressure, µ the fluid viscosity, and f the sum of the external forces on the fluid. The viscosity force varies with the temperature.

Figure 1: Workflow of the whole framework.

Figure 2: (a) Lava fluid particle → lava solid particle: a phase transition occurs in the regions where lava fluid particles make contact with the mountain. (b) Lava solid particle → lava fluid particle: the transition occurs in the regions where lava fluid particles make contact with lava solid particles. All the phase transitions are handled by the temperature.
Continuum equations are not directly suitable for computer animation, because numerical integration must be invoked for domain discretization in both space and time. For a more realistic appearance, lava should be incompressible. We choose a predictive-corrective incompressible SPH (PCISPH) model [2] as our fluid solver to simulate lava and its melting, solidification, and interaction with the earth. However, conventional SPH cannot handle the multi-physical process model well. In order to better handle our new multi-physical process model, we augment the traditional SPH with several required attributes, including temperature, particle type, lifetime, the number of neighboring particles, the type of the neighboring particles, and adaptive viscosity. Table 1 shows the detailed quantities of the different particles. In Table 1, neighbor rigid num is used in the generation of smoke particles, and life records how long ago the smoke particle was generated. The whole workflow of our framework is shown in Figure 1.

Table 1: Quantities of different particles.

Lava fluid particle (SPH): particle type, density, position, pressure, velocity, temperature, neighbor rigid num, dynamic viscosity
Lava solid particle (SPH): particle type, density, position, pressure, velocity, temperature, neighbor rigid num, dynamic viscosity
Mountain particle (SPH): particle type, density, position, pressure, velocity, temperature
Smoke particle (particle system): particle type, position, velocity, life, neighbor smoke num

As heat transfer occurs in lava-lava collision, lava-rock interaction, melting, solidification, and fire and smoke modeling, it is necessary to run the heat transfer process during the entire volcano animation to guide and couple with the other physical processes.

In an actual eruption, the lava moves quickly. During the volcano animation, lava is divided into fluid lava and solid lava in order to simulate solidification and melting. Our phase transition model is similar to the work in [3]: every particle stores a melting temperature Tm and a solidification temperature Ts.

3 Tight Coupling in Volcano Animation

In volcano animation, lava flows have complicated interactions with their surroundings, including lava-rock interaction, ground friction, and the coupling of lava and smoke. A single type of physical process cannot afford the synchronization of these physical processes and physical quantities. We document a particle system with multi-physical processes to handle the interactions. All participating media are described as particles with different quantities, so the unification of these physical processes can be guaranteed. We simulate the coupling of lava and mountain as rigid-fluid coupling. Since lava is extremely viscous, it moves very slowly on the mountain. To keep the result realistic, the ground friction cannot be ignored. Similar to [4], we apply a ground friction force on the lava particles. In the multi-physical process model, the interaction between smoke and lava is an essential part. The smoke always changes along with the course of the volcano eruption. The majority of smoke particles are generated at the eruption. The remaining smoke particles are generated similarly to how the diffuse particles are generated in [5]. Only lava fluid particles with a temperature higher than Tmin generate smoke particles.
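To make the temperature-driven state transition concrete, here is a minimal per-particle sketch; the field names and the Tmin value are illustrative assumptions, not the authors' implementation:

```python
T_MIN_SMOKE = 900.0   # hypothetical stand-in for Tmin; tune per scene

def update_particle_state(p):
    """Temperature-driven phase transition and smoke spawning for one lava
    particle p (an object with .type, .temperature, .T_melt, .T_solid)."""
    if p.type == "lava_fluid" and p.temperature <= p.T_solid:
        p.type = "lava_solid"                     # solidification
    elif p.type == "lava_solid" and p.temperature >= p.T_melt:
        p.type = "lava_fluid"                     # melting
    # only sufficiently hot fluid lava emits smoke particles
    return p.type == "lava_fluid" and p.temperature > T_MIN_SMOKE
```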
Figure 3: Comparison of scenes without phase transition (left column) and with phase transition (right column). With phase transition, the motion of lava is closer to physical reality, and it is easier to produce compound phenomena.

4 Conclusion, Discussion, and Future Work

This paper has documented an effective multi-physical process model for volcano animation. Abundant physical quantities are necessary to simulate the various accompanying phenomena in a volcano eruption and its subsequent interaction with the earth and atmosphere. Towards the goal of high fidelity and photo-realism, the physical quantities are tightly integrated in our system, and a highly effective rendering technique is also devised. In practice, in order to achieve better computing speed, we must accept certain tradeoffs. For example, our computing model simplifies lava as a fluid (but in physical reality lava consists of liquid, ash, stone, and their complex mixture). Another observation is that, even though we model the heat transfer in volcano animation, we simplify the complicated phase transitions and divide the states of lava into fluid and solid only. Many complicated phenomena, such as gasification and tephra generation, are still neglected. Moreover, the moving stones, ash, air, and their interactions are not explicitly modeled in the interest of high performance.

Future work includes involving more substrates in volcano animation and taking multi-phase transformation and moving obstacles into account. More accurate multi-phase and multi-physical models shall be researched for the rapid and precise production of complex scenes.

References

[1] K. Erleben, J. Sporring, K. Henriksen, and H. Dohlmann. Physics-based animation. Charles River Media, Hingham, 2005.
[2] Barbara Solenthaler and Renato Pajarola. Predictive-corrective incompressible SPH. In ACM Transactions on Graphics (TOG), volume 28, page 40. ACM, 2009.
[3] Barbara Solenthaler, Jürg Schläfli, and Renato Pajarola. A unified particle model for fluid–solid interactions. Computer Animation and Virtual Worlds, 18(1):69–82, 2007.
[4] A. Hérault, G. Bilotta, A. Vicari, E. Rustico, and C. Del Negro. Numerical simulation of lava flow using a GPU SPH model. Annals of Geophysics, 54(5):600–620, 2011.
[5] M. Ihmsen, N. Akinci, G. Akinci, and M. Teschner. Unified spray, foam and air bubbles for particle-based fluids. The Visual Computer, 28(6-8):669–677, 2012.
Determining Personality Traits from Goal-oriented
Driving Behaviors: Toward Believable Virtual Drivers
Andre Possani-Espinosa (a,1), J. Octavio Gutierrez-Garcia (b,2) and Isaac Vargas Gordillo (a,3)
(a) Department of Digital Systems, (b) Department of Computer Science
Instituto Tecnológico Autónomo de México, Mexico, DF 01080
(1) andre.possani@itam.mx, (2) octavio.gutierrez@itam.mx, (3) vargoris@hotmail.com
Abstract

This paper lays the foundations for the design of believable virtual drivers by proposing a methodology for profiling players using the open racing car simulator. Data collected from fifty-nine players about their driving behaviors and personality traits give insights into how personality traits should affect the behavior of believable virtual drivers. The data analysis was conducted using the J48 decision tree algorithm. Empirical evidence shows that goal-oriented driving behaviors can be used to determine whether players are introvert or extravert.

Keywords: personality traits, player modeling, driving behaviors, virtual drivers.

1. Introduction

In a realistic video game, a player can compete against either human opponents or artificial opponents controlled by a computer. Believable artificial opponents are fundamental to engage players and make a video game more entertaining. In addition, Loyall and Bates indicate that believable agents must have personalities [1].

This paper lays the foundations for the design of believable artificial opponents in the context of car racing games by giving first insights into how personality traits should affect the behavior of believable artificial opponents. To do this, the personality traits of fifty-nine players were extracted. The players completed a Jung typology test [2] consisting of seventy-two yes-or-no questions. After completing the test, the players were asked to play with a racing car simulator [3], which was modified in order to monitor the driving behavior of players. In this paper, the relationship between personality traits and driving behaviors is explored using a decision tree analysis. The results show that it is possible to determine whether players are introvert or extravert.

The structure of the paper is as follows. Section 2 presents the Jung typology test. Section 3 introduces the open racing car simulator. Section 4 presents the data analysis and results. Section 5 includes a comparison with related work and Section 6 gives some concluding remarks.

2. The Jung typology test

The Jung typology test [2] was used to determine the personality traits of players. It was selected due to its brevity and ease of completion.

According to the Jung typology test, the following four bipolar dimensions characterize the personality of a person: (i) the extraversion-introversion dimension, (ii) the intuition-sensing dimension, (iii) the feeling-thinking dimension, and (iv) the judging-perceiving dimension. Given that each of the four dimensions consists of two opposite poles, there are sixteen personality types, e.g., Extraversion-Sensing-Thinking-Perceiving.
3. The open racing car simulator

The open racing car simulator (TORCS) [3] was used to collect data on the driving behavior of players. The car was built taking into account realistic specifications of regular cars. The racetrack was designed to test players' driving skills at different levels of complexity.

Figure 1: Screenshot from the racetrack.

4. Empirical results

4.1 Data collection

Fifty-nine players were asked to complete the Jung typology test in order to determine their personality type. Afterward, the players were asked to complete four pairs of laps adopting different goals. In the first pair of laps, the players were instructed to get familiar with the car, the racetrack, a Thrustmaster steering wheel, and a pedal set. The virtual car had an automatic transmission. In the second pair of laps, the players were asked to complete the laps as fast as possible. In the third pair of laps, the players were instructed to complete the laps as cautiously as possible. Finally, in the fourth pair of laps, the players were instructed to complete the laps as fast as possible and simultaneously as cautiously as possible. The players were asked to adopt different goals in each pair of laps because there is psychological evidence of a relationship between goals and personality traits [4].

In order to profile the driving behavior of players, five feature categories were defined. The first, second, third, and fourth categories are composed of features extracted from laps with no objective, fast laps, cautious laps, and fast but cautious laps, respectively. The fifth category is composed of aggregate features from all the laps. From each lap type, five features were extracted: lap time, maximum speed, penalty time, and the number of times the car goes off track to the left and to the right. The penalty time is the amount of time the car spends completely off the track. The car was considered to be off track, either to the left or to the right, when it was completely off the track as shown in Fig. 1.

4.2 Data analysis

The data analysis was conducted using a decision tree analysis. The decision tree algorithm used in this work was J48, which is implemented in the WEKA data mining software [5].

The training and validation sets of the decision tree for the extraversion-introversion dimension (Fig. 2) consisted of thirty and twenty-nine randomly selected instances, respectively. Consequently, approximately 50% of the data was used to create the decision tree, and the remaining instances were used to validate it. An instance was composed of the twenty-seven variables that profile the driving behavior of a player, e.g., the maximum speed reached in cautious laps. Each instance was labeled as either extraversion or introversion according to the results of the personality test.

It should be noted that, due to the small size of the sample, there was not sufficient data to build and validate decision trees for the intuition-sensing dimension, the feeling-thinking dimension, and the judging-perceiving dimension. Whereas the data set used to build and validate the decision tree for the extraversion-introversion dimension had an almost equal number of introversion and extraversion instances, in the case of the other dimensions the instances were dominated by one of the (dimension) poles. For instance, there were only ten instances labeled as perceiving and the remaining forty-nine instances were labeled as judging.

In addition to the training and validation sets, the remaining input parameters of the WEKA data mining software used to build the decision tree were: (i) the minimum number of instances per leaf, which was set to 2, and (ii) the confidence threshold for pruning, which was set to 0.25. It should be noted that the decision tree was pruned to avoid overfitting.
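The analysis itself was done with WEKA's J48. Purely as an illustrative stand-in (not the setup used in the paper), an analogous decision tree with a minimum of two instances per leaf can be fitted with scikit-learn, which has no direct equivalent of J48's confidence-based pruning:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# X: one row per player with the 27 driving-behavior features,
# y: "introversion" / "extraversion" labels from the Jung typology test.
def fit_profile_tree(X, y):
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, train_size=30, random_state=0, stratify=y)
    tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2)
    tree.fit(X_train, y_train)
    return tree, tree.score(X_val, y_val)   # validation accuracy
```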
4.3 Results
From the results shown in Table 1 and Fig. 2,
three observations were drawn.
Observation 1. It is possible to determine whether a player is introvert or extravert by using a relatively small decision tree composed of seven decision nodes (Fig. 2), with a successful classification rate of 72.4% (Table 1).
Table 1: Confusion matrix of the decision tree for the extraversion-introversion dimension.

                         Predicted Introversion   Predicted Extraversion
Actual Introversion               8                         2
Actual Extraversion               6                        13

Percentage of correctly classified instances of the training set: 100%
Percentage of correctly classified instances of the validation set: 72.4%
Overall percentage of correctly classified instances: 86.4%
Observation 2. A combination of features from
different lap types is necessary to determine
personal traits of drivers.
Features from different lap types are used as decision nodes of the decision tree (Fig. 2). The features were automatically selected based on entropy, namely the information gain ratio, which indicates how useful a feature is for classifying the players, for instance, as either introvert or extravert. After selecting a feature, e.g., the root decision node, the feature with the highest information gain ratio is selected from the remaining features. This process is repeated until all the instances have been classified.
Figure 2: Decision tree for the extraversion-introversion dimension.

The J48 decision tree algorithm automatically selected the features that were the most informative, which corresponded to features from laps where the players pursued different goals. This confirms the relationship between the goal-oriented driving behaviors of players and their personality traits. In addition, it also validates the present methodology (where players were asked to adopt different goals) for determining their personality traits.

Observation 3. Overall, the profile of extravert players involves:

• To some extent, not going off track to the left in the fast laps, as denoted by the decision node: number of times the car goes off track to the left in fast laps ≤ 0.29.
• Not being among the fastest in the fast laps, as denoted by the decision node: lap time in fast laps ≤ 0.94.
• Not going off track to the right in the cautious laps, as denoted by the decision node: number of times the car goes off track to the right in cautious laps ≤ 0.
• In general, sometimes going off track to the left, as denoted by the decision node: aggregate number of times the car goes off track to the left from the aggregate features > 0.
• Not attaining the highest maximum speed in the fast laps, as denoted by the decision node: maximum speed in fast laps ≤ 0.96.

As shown in the decision tree depicted in Fig. 2, its longest branch classified the majority of the extravert players. There are other branches classifying extravert players; however, they only classified a few extravert players and thus may not be entirely representative.
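Read as a single rule, the longest branch of Fig. 2 corresponds roughly to the conjunction of the conditions listed above. The following predicate is only a hypothetical paraphrase (the feature names are ours, and the tree in Fig. 2 remains the authoritative definition):

```python
def looks_extravert(f):
    """f is a dict of the normalized lap features named in Observation 3."""
    return (f["off_track_left_fast"] <= 0.29
            and f["lap_time_fast"] <= 0.94
            and f["off_track_right_cautious"] <= 0
            and f["off_track_left_aggregate"] > 0
            and f["max_speed_fast"] <= 0.96)
```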
5. Related work

The importance of believable artificial opponents to engage and entertain players is commonly stressed. Nevertheless, only a few research efforts ([6, 7, 8]) have been reported.

Gallego et al. [6] propose creating virtual drivers in car racing video games by evolving neural networks using genetic algorithms. The behavior of the virtual drivers is determined by a fitness function that evaluates them mostly based on how stable and fast they are. Gallego et al. generate efficient virtual drivers; however, the drivers may not be realistic.

Muñoz et al. [7] contribute an imitation learning mechanism to create believable virtual drivers. Muñoz et al. assume that a believable virtual driver is a driver that imitates the driving behavior of the current human player. The main limitation of their approach is that, in order to imitate the behavior of a human player, some data has to be collected to train the mechanism.

Lu et al. [8] propose a personality model for generating realistic driving behaviors. The personality model is based on (i) a three-dimensional model: psychoticism, extraversion, and neuroticism, and (ii) six descriptors related to each dimension. In order to generate personality-based driving behaviors, Lu et al. conducted a study involving a number of participants who labeled computer-generated driving behaviors as aggressive, egocentric, active, risk-taking, tense, or shy. However, the participants (who are required to label driving behaviors) may not know what an egocentric or a shy driving style looks like.

6. Conclusion and future work

The contributions of this work are as follows.

• Providing empirical evidence of the relationship between personality traits and driving behaviors of players.
• Obtaining the first profile of extravert players in car racing games.
• Devising a methodology for profiling players based on personality traits extracted from their driving behaviors.

The above contributions lay the foundations for the design of believable virtual drivers. Future work will focus on exploiting the insights gained from the analysis of the empirical evidence to design and implement believable virtual drivers.

Acknowledgements

This work has been supported by Asociación Mexicana de Cultura A.C.

References

[1] A.B. Loyall and J. Bates. Personality-rich believable agents that use language. In Proceedings of the 1st International Conference on Autonomous Agents, pp. 106-113, 1997.
[2] Humanmetrics Inc. Personality test based on C. Jung and I. Briggs Myers type theory, available at http://www.humanmetrics.com, 2015.
[3] B. Wymann, E. Espié, C. Guionneau, C. Dimitrakakis, R. Coulom and A. Sumner. TORCS, the open racing car simulator, v1.3.6, available at http://www.torcs.org, 2015.
[4] R.A. Emmons. Motives and Life Goals. In Handbook of Personality Psychology, Hogan, Johnson and Briggs (Eds.), Academic Press, San Diego, pp. 485-512, 1997.
[5] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: An update. SIGKDD Explorations, 11(1):10-18, 2009.
[6] F. Gallego, F. Llorens, M. Pujol and R. Rizo. Driving-Bots with a Neuroevolved Brain: Screaming Racers. Inteligencia Artificial, 9(28):9-16, 2005.
[7] J. Muñoz, G. Gutierrez and A. Sanchis. Towards imitation of human driving style in car racing games. In Believable Bots: Can Computers Play Like People?, P. Hingston (Ed.), Springer Berlin Heidelberg, pp. 289-313, 2012.
[8] X. Lu, Z. Wang, M. Xu, W. Chen and Z. Deng. A personality model for animating heterogeneous traffic behaviors. Computer Animation and Virtual Worlds, 25(3-4):363-373, 2014.
Virtual Meniscus Examination in Knee Arthroscopy
Training
Weng Bin and Alexei Sourin
Nanyang Technological University
Abstract

Knee arthroscopy is a minimally invasive surgery performed on the knee joint. Virtual simulation of arthroscopy is an extremely important training tool that allows medical students to acquire the necessary motor skills before they can approach real patients. We propose how to achieve visually realistic deformation of the virtual meniscus by using a linear co-rotational finite element method applied to a coarse mesh enriched with monitor points responsible for the simulation of fine wrinkles. The simulation is performed in real time and closely follows the actual meniscus deformation. The generated wrinkles are easily adjustable and do not require large memory.

Keywords: meniscus, embedding, monitor point, arthroscopy

1. Introduction

The meniscus is a vulnerable, thin, crescent-shaped tissue with variable thickness located within the knee joint between the two leg bones (Figure 1a). Menisci are thinner at the inner border and thicker at the outer border.

Figure 1: (a) A frame of a real surgical video. (b) Screenshot from our simulation.

For the treatment of meniscus lesions, arthroscopic surgery is the common surgical procedure. Its first step is the inspection of the meniscus by deflecting it up and down with a probing instrument. The meniscus has a complicated anisotropic elasticity. First, it is very flexible in the vertical direction (indicated by arrows in Figure 1a), while it is stiff horizontally. Moreover, when the meniscus is pressed with the probe near its thin inner boundary, it touches the cartilage of the opposite bone and may develop short elastic deformations that look like wrinkles (see Figure 1a). By tactile examination of the meniscus with the probe and analysis of its visible elastic deformation, the surgeons draw conclusions about its integrity.

Virtual simulation of arthroscopy has become an extremely important training tool. There are several commercial virtual knee arthroscopy simulators available, for example, SIMENDO arthroscopy from SIMENDO, ArthroSim from ToLTech, ARTHRO Mentor from Simbionix, and ArthroS from VirtaMed. Besides, there are several relevant projects, e.g., [1], considering menisci deformations. However, very little work has been done on the simulation of the meniscus wrinkles, since it is challenging when using physically-based virtual models with high resolution [2]. On the other hand, fine wrinkles on the virtual meniscus can be produced by a data-driven method; however, this requires pre-storage of large offline simulation data and lacks the flexibility of tuning the wrinkle deformation. This motivated our research, and in this paper we propose a tuneable simulation method that requires much less data storage. To model the meniscus wrinkles, we enrich the concept of embedding by using it to monitor local states of soft-tissue deformation. The main contributions of the paper are: /1/ We use embedding as a local deformation state monitor; /2/ We dynamically define the local reference state of the wrinkles according to the complex instrument-to-tissue and tissue-to-tissue interactions; /3/ We have proposed and implemented a fast function-based method for local wrinkle modelling.
2. Related Works
The simulations of the meniscus were
commonly done using the popular finite
element methods [1] and mass-spring methods
[3]. Other physically-based methods could be
used as well, e.g., [4], however simulation of
the wrinkles would require using high
resolution mesh which is challenging to
generate in real time. We follow the general
technique of embedding by incorporating a
high resolution surface mesh of the meniscus
into a coarse volumetric mesh model for real
time physical simulation. Furthermore, we
enrich the concept of embedding as a technique
to monitor local deformation states, and define
reference state for wrinkles formation of thin
deformable bodies such as the menisci.
3.2. The embedding of monitor points and
region points
We embed so-called monitor points into the
inner border of the meniscus before the
simulation in order to monitor the deformation
state near the inner border of the meniscus. The
points are used not only to monitor the local
deformation state, but also to control the local
deformation near the thin border of the
meniscus. To define the affected area by the
monitor points, we also introduce region points.
The region points should be positioned inside
the meniscus, and the distance between the
monitor and the region points define the
wrinkle area. The embedding follows three
steps: /1/ We manually select a few data point
pairs (one monitor data point and one region
data point) along the inner side of the meniscus
(see Figures 2a); /2/ We generate two smooth
curves by interpolation on these data points
separately (see Figure 2b); /3/ We sample the
monitor and the region points on the
interpolated curves, respectively, with the
desired resolution. The monitor points are then
embedded into the existing simulation mesh by
the same way as the embedding of the surface
vertices can be done. Then, the monitor points
are driven by the simulation mesh.
There are many existing methods for wrinkle generation on 2D objects, e.g., skin [5] or cloth [6]. However, the meniscus is in general a 3D object, and these methods are not directly applicable to it. For wrinkles on 3D objects, Seiler et al. first proposed a data-driven method in [2] and later improved it in [7] and [8]. In these works, fine wrinkles are generated by adding pre-stored high-resolution offline simulation data to a low-resolution simulation, so storing large amounts of offline simulation data is necessary. Since the fine details rely on the offline simulation results, tuning the deformation takes significant time. In contrast, our method does not need to store large offline simulation data and is easy to tune by adjusting a few parameters.
3. Meniscus deformation with wrinkles
3.1. The overall simulation of the meniscus examination
We use two Geomagic Touch desktop haptic devices together with a 3D-printed knee model, and a video monitor for displaying the simulated virtual scene. We embed the high-resolution meniscus surface mesh into the coarse hexahedron mesh, which is driven by the co-rotational finite element method with an implicit solver. We use the LCP force-feedback method proposed in [9], which updates and stores the LCP state in the simulation loop (25-30 Hz) and repeatedly applies it in the haptic loop (500-1000 Hz). The whole simulation runs in real time, but the fine wrinkles of the meniscus cannot be generated this way because of the coarse simulation mesh.
3.2. The embedding of monitor points and region points
We embed so-called monitor points into the inner border of the meniscus before the simulation in order to monitor the deformation state near the inner border. These points are used not only to monitor the local deformation state, but also to control the local deformation near the thin border of the meniscus. To define the area affected by the monitor points, we also introduce region points. The region points should be positioned inside the meniscus, and the distance between the monitor and the region points defines the wrinkle area. The embedding follows three steps: /1/ we manually select a few data point pairs (one monitor data point and one region data point) along the inner side of the meniscus (see Figure 2a); /2/ we generate two smooth curves by interpolating these data points separately (see Figure 2b); /3/ we sample the monitor and the region points on the interpolated curves, respectively, at the desired resolution (see the sketch after Figure 2). The monitor points are then embedded into the existing simulation mesh in the same way as the surface vertices, so that they are driven by the simulation mesh.
Figure 2: (a) The red points (on the meniscus boundary) are monitor points, while the blue points (in the middle of the meniscus) are region points. (b) The two respective B-spline curves are reconstructed.
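Steps /2/ and /3/ can be realized with standard spline tools. The following is a minimal sketch under our own assumptions: the manually selected data points are given as small 3D arrays, and SciPy's splprep/splev stand in for whatever curve-fitting routine the original implementation relies on.

import numpy as np
from scipy.interpolate import splprep, splev

def sample_curve_points(data_points, num_samples):
    """Fit an interpolating cubic B-spline through the manually picked data
    points (step /2/) and resample it at the desired resolution (step /3/)."""
    pts = np.asarray(data_points, dtype=float)       # shape (k, 3)
    tck, _ = splprep(pts.T, s=0.0)                   # s=0 -> curve passes through the points
    u = np.linspace(0.0, 1.0, num_samples)
    x, y, z = splev(u, tck)
    return np.stack([x, y, z], axis=1)               # shape (num_samples, 3)

# Hypothetical data point pairs picked along the inner border (step /1/).
monitor_data = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (2.0, 0.3, 0.1), (3.0, 0.2, 0.0)]
region_data  = [(0.0, 0.8, 0.0), (1.0, 1.0, 0.0), (2.0, 1.1, 0.1), (3.0, 1.0, 0.0)]

monitor_points = sample_curve_points(monitor_data, num_samples=40)
region_points  = sample_curve_points(region_data, num_samples=40)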
With reference to Figure 3, the affected region
on the surface is defined by both the monitor
and the region points. Each surface point can
be affected by several monitor points, and there
can be overlapping affected areas.
Figure 3: The projection of the surface points onto the region line, where Pim0 is the monitor point, Pir0 is the region point, Pjs is the surface point, and Pjp is the projection of the surface point onto the region line.
Figure 4: The displacements of the current monitor points. The green points (below) are the reference positions of the wrinkles, the red points (on top) are the current monitor points, the purple points are the displaced points (called wrinkle points), and the black vectors are the displacements.
3.3. Definition of the reference state of the
wrinkles
We need to capture the deformation state just
before the wrinkles appear, and store this state
as a reference for the modelling of the wrinkles.
To monitor the state of deformation, we need
the monitor points both in the current and in
the last time step. In each time step, according
to the instrument-to-tissue contact point, we
select a few points Pkmc to monitor the local
deformation state. The interaction between the tissues also needs to be considered. If the number of collision points is larger than Ns, we start the monitoring. If the number of points that move in the opposite direction compared with the last step is larger than Nc, we store the monitor points of the last step as reference points. A threshold Ne is set for ending the wrinkles: if the number of collision points is smaller than Ne, we stop adding the wrinkles and clear the reference state. The reference state of the wrinkles defined by this process can adapt to complex instrument-to-tissue and tissue-to-tissue interactions. The process is summarized in the following pseudocode:

if numberOfCollideWithLowerSurface > Ns
        && collideWithInstrument = true
        && needWrinkles = false then
    select points Pkmc according to the instrument-meniscus contact point
    for each Pkmc do
        if it moves in the opposite direction compared with the last step then
            count++
        end if
    end for
    if count > Nc then
        needWrinkles ← true
        wrinkleReferencePoints ← the monitor points of the last step
    end if
end if
if numberOfCollideWithLowerSurface < Ne
        && collideWithInstrument = false then
    needWrinkles ← false
    wrinkleReferencePoints ← NULL
end if
if needWrinkles = true then
    apply the wrinkling algorithm
end if
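For illustration only, a compact Python version of this update could look as follows; the threshold values Ns, Nc and Ne, the array layout and the state dictionary are our own placeholders, not the actual implementation.

import numpy as np

def update_wrinkle_state(state, disp_cur, disp_prev, monitor_prev,
                         n_lower_collisions, collide_with_instrument,
                         Ns=8, Nc=4, Ne=2):
    """One simulation-loop update of the wrinkle reference state (Section 3.3).
    disp_cur, disp_prev: (k, 3) displacements of the selected monitor points Pkmc
    in the current and previous time steps; monitor_prev: their previous positions."""
    if n_lower_collisions > Ns and collide_with_instrument and not state["need_wrinkles"]:
        # A monitor point "reverses" when its current displacement opposes the previous one.
        count = int(np.sum(np.einsum("ij,ij->i", disp_cur, disp_prev) < 0.0))
        if count > Nc:
            state["need_wrinkles"] = True
            state["reference_points"] = monitor_prev.copy()
    if n_lower_collisions < Ne and not collide_with_instrument:
        state["need_wrinkles"] = False
        state["reference_points"] = None
    return state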
3.4. A function-based method to generate the
meniscus wrinkles
Once the reference state for the wrinkles is found, the wrinkles can be generated by displacing the current wrinkle points with respect to the reference state (see Figure 4). We use a simple and efficient method to generate the magnitude of the wrinkles based on the sine function:

f(x) = A × sin(x),        (1)
Both the magnitude A and the function domain are tuneable for controlling the wrinkle shape. After the generation of the wrinkle points, the displacements between the wrinkle and the monitor points are propagated according to the relation described in Section 3.2, and a Gaussian function is used to control the attenuation of the propagated displacements.
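As a rough sketch of this step, under our own simplifying assumptions (the wrinkle displacement direction is taken along supplied per-point normals, and the propagation weight is a Gaussian of the distance to each reference point), the generation could look like this:

import numpy as np

def generate_wrinkles(reference_pts, normals, surface_pts, A=0.5, periods=3.0, sigma=1.0):
    """Displace the reference points with a sine profile (Eq. 1) and propagate the
    displacements to the surface vertices with Gaussian attenuation (Section 3.4)."""
    k = len(reference_pts)
    x = np.linspace(0.0, periods * np.pi, k)                     # tunable function domain
    magnitude = A * np.sin(x)                                    # f(x) = A * sin(x)
    wrinkle_pts = reference_pts + magnitude[:, None] * normals   # displaced wrinkle points

    displacement = np.zeros_like(surface_pts)
    for p_ref, p_wr in zip(reference_pts, wrinkle_pts):
        d = np.linalg.norm(surface_pts - p_ref, axis=1)
        w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))               # Gaussian attenuation
        displacement += w[:, None] * (p_wr - p_ref)
    return wrinkle_pts, surface_pts + displacement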
4. Analysis of the Modelling Results
We implemented the proposed method using
the open source platform SOFA (Simulation
Open Framework Architecture) [10]. Figure 1b
illustrates the simulated deformation process
and compares it with the actual meniscus
deformation (Figure 1a). The whole simulation runs at about 50 FPS on a desktop computer with an Intel Xeon dual-core 2.27 GHz CPU and 12 GB RAM. Compared with the data-driven method [7], which requires pre-storing 3.33 MB of data, our method requires only 12.9 KB for the monitor and region points. Moreover, by adjusting the value of A and the parameters of the sine function in Eq. 1, we can easily tune the wrinkle deformation without regenerating offline simulation data. Although we generate visually plausible wrinkles on the meniscus, the quantitative error might be large because the positions of the wrinkle bulges differ from those observed in real surgery. Our method also cannot yet produce the dynamic waving movement of the wrinkles.
5. Conclusion and Future Work
We have proposed and successfully implemented an efficient method for generating local meniscus wrinkles when the meniscus is examined with a virtual surgical probe. Compared with existing methods, our method produces fine wrinkle deformation with a smaller memory footprint. The wrinkle generation can also be integrated into the meniscus cutting process, one of the ways in which a meniscus injury is treated surgically; this will involve updating the monitor points and control regions.
Acknowledgements
This project is supported by the Ministry of Education of Singapore Grant MOE2011-T2-1006 "Collaborative Haptic Modeling for Orthopaedic Surgery Training in Cyberspace". The project is also supported by Fraunhofer IDM@NTU, which is funded by the National Research Foundation (NRF) and managed through the multi-agency Interactive & Digital Media Programme Office (IDMPO) hosted by the Media Development Authority of Singapore (MDA). The authors sincerely thank Mr. Fareed Kagda (MD), orthopaedic surgeon, for his evaluation of the simulation results and his useful advice.
References
[1] Wang, Y., et al. vKASS: a surgical procedure simulation system for arthroscopic anterior cruciate ligament reconstruction. Computer Animation and Virtual Worlds, 2013. 24(1): p. 25-41.
[2] Seiler, M., J. Spillmann, and M. Harders. Enriching coarse interactive elastic objects with high-resolution data-driven deformations. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 2012, Eurographics Association: Lausanne, Switzerland. p. 9-17.
[3] Jinghua, L., et al. A knee arthroscopy simulator for partial meniscectomy training. In 7th Asian Control Conference (ASCC 2009), 2009.
[4] Nealen, A., et al. Physically Based Deformable Models in Computer Graphics. Computer Graphics Forum, 2006. 25(4): p. 809-836.
[5] Rémillard, O. and P.G. Kry. Embedded thin shells for wrinkle simulation. ACM Trans. Graph., 2013. 32(4): p. 1-8.
[6] Rohmer, D., et al. Animation wrinkling: augmenting coarse cloth simulations with realistic-looking wrinkles. ACM Trans. Graph., 2010. 29(6): p. 1-8.
[7] Seiler, M., J. Spillmann, and M. Harders. Data-Driven Simulation of Detailed Surface Deformations for Surgery Training Simulators. IEEE Transactions on Visualization and Computer Graphics, 2014. 20(10): p. 1379-1391.
[8] Seiler, M.U., J. Spillmann, and M. Harders. Efficient Transfer of Contact-Point Local Deformations for Data-Driven Simulations. 2014. p. 29-38.
[9] Saupin, G., C. Duriez, and S. Cotin. Contact Model for Haptic Medical Simulations. In Proceedings of the 4th International Symposium on Biomedical Simulation. 2008, Springer-Verlag: London, UK. p. 157-165.
[10] Faure, F., et al. SOFA: A Multi-Model Framework for Interactive Physical Simulation. In Soft Tissue Biomechanical Modeling for Computer Assisted Surgery, Y. Payan, Editor. 2012, Springer Berlin Heidelberg. p. 283-321.
Space Deformation for Character Deformation using
Multi-Domain Smooth Embedding
Zhiping Luo
Utrecht University
Z.Luo@uu.nl
Remco C. Veltkamp
Utrecht University
R.C.Veltkamp@uu.nl
Arjan Egges
Utrecht University
J.Egges@uu.nl
Abstract
We propose a novel space deformation method based on domain decomposition to animate character skin. The method supports smoothness and local controllability of deformations, and can achieve interactive interpolation rates. Given a character, we partition it into multiple domains according to skinning weights and attach each domain to a linear system, without seam artifacts. Examples are presented for articulated deformable characters with localized changes in the deformation of each near-rigid body part. An application example is provided by usage in deformation energies, which are known to preserve shape and volumetric features.
Keywords: space deformation, radial basis
functions, character deformation, deformable
surface
1 Introduction
Space deformation is a common acceleration strategy used in nonlinear variational deformation (e.g. [1]), which supports the preservation of shape details and volumetric properties, and in deformable solids such as finite element models [2], which offer interior dynamics in addition to quasistatic skins, to produce realistic deformations. In such contexts, a coarse representation loosely enclosing the original mesh surface is established to carry out the expensive computations, and the resulting deformations are propagated back to the original mesh by an efficient space deformation. Shepard's interpolation scheme [3], though extensively adopted as a space deformation, is not smooth enough to interpolate surfaces.
Radial basis functions (RBFs) are the most versatile and commonly used smooth interpolation technique in graphics and animation. Radial basis functions are mostly based on Euclidean distances; in that case, movements of one branch of a model may affect others that are close to it in Euclidean space, which often happens in character deformation. Levi and Levin [4] compute geodesic distances to overcome this limitation. Nevertheless, such distance metrics depend heavily on the mesh topology or representation, leading to a loss of generality.
A skeletal character consists of limb segments, and the deformation of each segment is locally controlled. Vaillant et al. [5] propose to partition the character into multiple bone-associated domains and to approximate the deformation of each segment by a field function. The method, however, obtains its speedups by hardware acceleration and requires a composition of the field functions into a global field function, which is difficult to realize in practice.
Contribution. We propose smooth embedding based on radial basis functions (RBFs) for character deformation. To avoid interplay between mesh branches, we partition the character into multiple domains according to the associated skinning weights and attach each domain to a small linear system of local RBFs. Regions at and around the boundaries of contacting skin parts are smoothed in a post-processing step, avoiding seam artifacts. In contrast to [5, 6], our method does not blend field functions; instead, we only introduce a simple geometric post-processing step to remove seam artifacts.
Our multi-domain RBF interpolation scheme runs at interactive rates and gives rise to smooth shape deformations. We test our method with a nonlinear variational deformation energy, and the results demonstrate its effectiveness.
Figure 1: Pipeline overview: We first sample a set of points (black dots), ready to be RBF centers, on
the mesh surface, and then partition the character into multiple domains based on skinning
weights, indicated by colors. Each segment, indicated by a colour, is accordingly associated
with a small group of samples for the construction of a local RBFs interpolation. Our method
is applied in interactive applications.
2 Multi-domain smooth space deformation
We use RBFs with compact support to interpolate displacements:

Φ(x) = exp(−||x − c||² / σ²) + aᵀ·x + b,        (1)
where c ∈ R³ is the RBF center and x ∈ R³ is the evaluation point. The presence of the linear term helps to reproduce the global behaviour of the function. The resulting sparse system is solved by LU decomposition, leading to identical results. In our implementation, σ is the average distance between RBF centers, following the guideline of ALGLIB [7], a popular numerical analysis library, which guarantees that there will be several centers within distance σ of each evaluation point.
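As a rough illustration of the per-domain interpolation, the sketch below builds the system of Eq. (1) for one domain, with a Gaussian kernel plus a linear polynomial term, and solves it with an LU factorization; the use of NumPy/SciPy and the toy data layout are our own assumptions, not the original implementation.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def fit_domain_rbf(centers, displacements):
    """Fit Eq. (1) for one domain.
    centers: (n, 3) RBF centers, displacements: (n, 3) known displacements."""
    n = len(centers)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    sigma = d[np.triu_indices(n, 1)].mean()            # average center spacing (ALGLIB guideline)

    K = np.exp(-(d ** 2) / sigma ** 2)                 # kernel part of Eq. (1)
    P = np.hstack([centers, np.ones((n, 1))])          # linear term a^T x + b
    A = np.block([[K, P], [P.T, np.zeros((4, 4))]])    # augmented interpolation system
    rhs = np.vstack([displacements, np.zeros((4, 3))])
    coeffs = lu_solve(lu_factor(A), rhs)               # solved by LU decomposition
    return coeffs, sigma

def evaluate_domain_rbf(x, centers, coeffs, sigma):
    """Evaluate the fitted interpolant at query points x of shape (m, 3)."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    K = np.exp(-(d ** 2) / sigma ** 2)
    P = np.hstack([x, np.ones((len(x), 1))])
    return K @ coeffs[:len(centers)] + P @ coeffs[len(centers):]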
An overview of our method is shown in Figure 1. We employ Poisson disk sampling [8] to sample the RBF centers on the mesh surface and, based on the resulting segmentation, put them into the corresponding clusters. In our space deformation, only the RBF centers are embedded, and the original surface is left to a set of domain-associated RBF interpolation systems. For complex figures such as the human hand, our method still addresses the above limitations, given a proper weight map; in practice, such weights are often painted by skilled artists, which provides this guarantee. We also computed exact geodesic distances [9] as an alternative to the Euclidean metric. Although a single RBF system with these values yields results similar to ours, it may have drawbacks, e.g., low-order smoothness at and around the contact regions. Figure 2 shows the comparisons.
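The partition of the sampled centers into bone-associated domains can be sketched as a simple argmax over the skinning weights; the weight-matrix layout below is an assumption made for illustration only.

import numpy as np

def partition_by_skinning_weights(center_weights):
    """Assign each sampled RBF center to the bone with the largest skinning weight,
    yielding one cluster of centers per domain.
    center_weights: (n_centers, n_bones) skinning weights of the sampled centers."""
    dominant_bone = np.argmax(center_weights, axis=1)
    domains = {}
    for center_idx, bone in enumerate(dominant_bone):
        domains.setdefault(int(bone), []).append(center_idx)
    return domains    # bone id -> indices of the centers in that domain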
We iteratively employ Laplacian smoothing to remove the sharp features at and around the boundaries of the contact regions, specifically using

v_i^(t+1) = v_i^t + (1 / Σ_{j∈N(i)} ω_{i,j}) · Σ_{j∈N(i)} ω_{i,j} (v_j^t − v_i^t),        (2)

where N(i) is the set of one-ring neighbouring vertices of v_i, and ω_{i,j} is the cotangent weight [10]. The method is efficient even in complex motions, including twisting and bending, as illustrated in Figure 3.
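A compact version of this smoothing step is sketched below; the cotangent weights ω_{i,j} [10] are assumed to be precomputed, and only the vertices at and around the domain boundaries would be passed in.

import numpy as np

def laplacian_smooth(vertices, neighbors, weights, boundary_ids, iterations=5):
    """Iteratively apply Eq. (2) to the selected boundary vertices.
    neighbors[i]  : list of one-ring neighbour indices N(i)
    weights[i][j] : precomputed cotangent weight w_ij for the edge (i, j)."""
    v = vertices.copy()
    for _ in range(iterations):
        v_new = v.copy()
        for i in boundary_ids:
            w = np.array([weights[i][j] for j in neighbors[i]])
            nbr = v[neighbors[i]]
            v_new[i] = v[i] + (w[:, None] * (nbr - v[i])).sum(axis=0) / w.sum()
        v = v_new
    return v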
3 Results
An experiment is conducted to investigate how the number of samples affects the performance. We increase the number of RBF centers and report the resulting timings in Table 1, which shows the efficiency of our method for larger numbers of RBF centers.
Our method also succeeds in volumetric deformation, using the volumetric PriMO energy [1] for instance, as shown in Figure 4. In more detail, the volumetric PriMO energy, a nonlinear variational deformation technique that keeps the deformation as rigid as possible, is computationally expensive. In our test on the Armadillo model with 5,406 vertices, 10,808 triangles and 300 samples on the surface (Figure 4), the space deformation using multi-domain smooth embedding takes only 0.07 s per frame on average, whereas the volumetric deformation costs 1.64 s per frame on average, showing that the runtime cost of the space deformation is minimal and does not introduce significant overhead.
Figure 2: Both single RBFs with geodesic distances and our method avoid interplay between fingers, whereas single RBFs with Euclidean distances do not. Our method, however, supports higher-order smoothness than the geodesic single-RBF variant. The trackball indicates the rotating motion of a finger.
Figure 3: Elbow twisting (left), with Laplacian smoothing for 5 iterations. Leg bending (right), with smoothing for 3 iterations.
Samples | Initialize (ms)        | Runtime (ms)           | Total (ms)
        | Single | Multi-domain  | Single | Multi-domain  | Single | Multi-domain
50      | 1.02   | 0.18          | 19.54  | 28.17         | 20.56  | 28.35
100     | 3.92   | 0.43          | 30.51  | 42.52         | 34.43  | 42.95
150     | 13.63  | 1.08          | 44.94  | 58.77         | 58.57  | 59.85
200     | 27.98  | 1.90          | 57.81  | 74.11         | 85.79  | 76.01
250     | 52.34  | 3.26          | 72.56  | 89.81         | 124.90 | 93.07
300     | 91.48  | 5.53          | 86.56  | 105.73        | 178.04 | 111.26
350     | 144.99 | 8.51          | 103.09 | 119.82        | 248.08 | 128.33

Table 1: Timings. Multi-domain interpolation shows an increasing advantage as the number of samples grows. Note that the additional runtime cost of our method, with respect to single-RBF interpolation, is introduced by the smoothing step rather than by solving the multiple linear systems themselves. All times are in milliseconds. The model has 5,103 vertices, 10,202 triangles and six domains.
Figure 4: We apply multi-domain space deformation in an energy-minimization-based deformation technique, namely the volumetric PriMO energy with voxels [1].
4 Conclusion
In this paper, we presented a space deformation method based on domain decomposition for character deformation. The method is built on radial basis functions and supports smooth deformations without seam artifacts. We have applied it in a nonlinear variational deformation technique, demonstrating its efficiency.
For skinned models with many bones, our method is likely to yield domains with a very small number of samples, making the resulting interpolation matrices inefficient. A possible solution is to group the contacting domains that have the fewest samples into a new domain and to update the RBFs accordingly.
Our domain decomposition method holds promise for performing surface deformation in real-time animation and simulation applications on commodity laptops in the near future.
Acknowledgement: This publication was
supported by the Dutch national program COMMIT.
References
[1] Mario Botsch, Mark Pauly, Martin Wicke, and Markus Gross. Adaptive space deformations based on rigid cells. Computer Graphics Forum, 26(3):339–347, 2007.
[2] Theodore Kim and Doug L. James. Physics-based character skinning using multi-domain subspace deformations. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '11, pages 63–72, 2011.
[3] R.E. Barnhill, R.P. Dube, and F.F. Little. Properties of Shepard's surfaces. Journal of Mathematics, 13(2), 1983.
[4] Zohar Levi and David Levin. Shape deformation via interior RBF. IEEE Transactions on Visualization and Computer Graphics, 20(7):1062–1075, July 2014.
[5] Rodolphe Vaillant, Loïc Barthe, Gaël Guennebaud, Marie-Paule Cani, Damien Rohmer, Brian Wyvill, Olivier Gourmel, and Mathias Paulin. Implicit skinning: Real-time skin deformation with contact modeling. ACM Trans. Graph., 32(4):125:1–125:12, July 2013.
[6] Rodolphe Vaillant, Gaël Guennebaud, Loïc Barthe, Brian Wyvill, and Marie-Paule Cani. Robust iso-surface tracking for interactive character skinning. ACM Trans. Graph., 33(6):189:1–189:11, November 2014.
[7] ALGLIB: a cross-platform numerical analysis and data processing library. Available online at http://www.alglib.net/. Accessed: 2015-03-03.
[8] Kenric B. White, David Cline, and Parris K. Egbert. Poisson disk point sets by hierarchical dart throwing. In Proceedings of the 2007 IEEE Symposium on Interactive Ray Tracing, RT '07, pages 129–132, 2007.
[9] Shi-Qing Xin and Guo-Jin Wang. Improving Chen and Han's algorithm on the discrete geodesic problem. ACM Trans. Graph., 28(4):104:1–104:8, September 2009.
[10] Mark Meyer, Mathieu Desbrun, Peter Schröder, and Alan H. Barr. Discrete differential-geometry operators for triangulated 2-manifolds. In Visualization and Mathematics III, pages 35–57. Springer, 2003.