Proceedings Short Papers
ISBN: 978-981-09-4946-4

Preface
This year, CASA received 77 papers. 27 papers have been selected and published in a special issue of the Computer Animation and Virtual Worlds Journal published by Wiley (CAVW). These Proceedings contain the 13 short papers that have been selected for publication with the ISBN number 978-981-09-4946-4. In total, 40 papers have been presented at the conference. CASA was founded by the Computer Graphics Society in 1988 in Geneva and is the oldest conference on computer animation in the world. It has been held in various countries around the world and, in recent years, in 2013 in Istanbul, Turkey, in 2014 in Houston, USA, and this year in Singapore. This year, the conference started on its first day with two workshops: the 3D Telepresence Workshop and the Industrial Workshop on 3D Graphics & Virtual Reality. The workshops were followed by two days of presentations of the 27 full papers and a dozen short papers. Addressing this year's conference are four invited speakers: 1) Mark Sagar, from the University of Auckland, New Zealand, 2) Dieter Fellner, from the Fraunhofer Institute, Germany, 3) Cai Yiyu, from Nanyang Technological University and 4) Zheng Jianming, from the same university. A panel on "Sharing life with Social Robots and Virtual Humans, is it our future?" was also held on May 13.
The Program Co-Chairs
Daniel Thalmann, NTU, Singapore & EPFL, Switzerland
Jian Jun Zhang, Bournemouth University, UK

COMMITTEES
Conference Chairs
Dieter Fellner Nadia Magnenat Thalmann Junfeng Yao Fraunhofer Institute, Darmstadt, Germany NTU, Singapore & MIRALab, University of Geneva, Switzerland Xiamen University, China
Program Chairs
Jian Jun Zhang Daniel Thalmann Bournemouth University, UK NTU, Singapore & EPFL, Switzerland
Local Arrangement Committee
Qi Cao Poh Yian Lim Yaminn Khin NTU, Singapore NTU, Singapore NTU, Singapore
International Program Committee
Norman Badler Selim Balcisoy Jan Bender Ronan Boulic Yiyu Cai Marc Cavazza Bing-Yu Chen Guoning Chen Yiorgos Chrysanthou Frederic Cordier Justin Dauwels Etienne de Sevin Zhigang Deng Fabian Di Fiore Arjan Egges Abdennour El Rhalibi Petros Faloutsos Ugur Gudukbay Xiaohu Guo Mario Gutierrez James Hahn Ying He Zhiyong Huang Veysi Isler Jean-Pierre Jessel Xiaogang Jin Sophie Joerg Chris Joslin University of Pennsylvania, US Sabanci University, Turkey TU Darmstadt, Germany EPFL, Switzerland NTU, Singapore Teesside University, UK National Taiwan University, Taiwan University of Houston, US University of Cyprus, Cyprus UHA, France NTU, Singapore MASA Group R&D, France University of Houston, US Hasselt University, Belgium Utrecht University, Netherlands Liverpool John Moores University, UK York University, UK Bilkent University, Turkey The University of Texas at Dallas, US OZWE SàRL, Switzerland George Washington University, US NTU, Singapore A*STAR, Singapore METU, Turkey Paul Sabatier University, France Zhejiang University, China Clemson University, US Carleton University, Canada Prem Kalra Mustafa Kasap Andy Khong Scott King Taku Komura Caroline Larboulette Rynson Lau Binh Le Wonsook Lee J.P.
Lewis Tsai-Yen Li Xin Li Hao Li Ming Lin Wan-Chun Ma Anderson Maciel Nadia Magnenat Thalmann Dinesh Manocha Franck Multon Soraia Musse Rahul Narain Luciana Nedel Junyong Noh Veronica Orvalho Igor Pandzic George Papagiannakis Laura Papaleo Nuria Pelechano Christopher Peters Julien Pettre Pierre Poulin Nicolas Pronost Taehyun Rhee Isaac Rudomin Jun Saito Yann Savoye Hubert Shum Matthias Teschner Daniel Thalmann Xin Tong Jun Wang Jack Wang Enhua Wu Junsong Yuan Cem Yuksel Zerrin Yumak Jian Zhang Jianmin Zheng IIT Delhi, India Microsoft Turkey, Turkey NTU, Singapore Texas A&M University - Corpus Christi, US University of Edinburgh, UK University of South Brittany City University of Hong Kong, Hong Kong University of Houston, US University of Ottawa, Canada Victoria University, Australia National Chengchi University, Taiwan Louisiana State University, US University of Southern California, US UNC Chapel Hill, US Weta Digital, New Zealand UUFRGS, Brazil NTU, Singapore and MIRALab, University of Geneva, Switzerland UNC Chapel Hill, US University Rennes 2, France PUCRS, Brazil University of California, Berkeley, US UFRGS, Brazil KAIST, South Korea FCUP, Portugal FER, Croatia University of Crete, Greece University of Paris-SUD, France UPC, Spain KTH Royal Institute of Technology, Sweden INRIA, France Universite de Montreal, Canada Université Claude Bernard Lyon 1, France Victoria University of Wellington, New Zealand BSC, Spain Marza Animation Planet, Tokyo University of Innsbruck, Austria Northumbria University, UK University of Freiburg, Germany NTU,Singapore and EPFL, Switzerland MSRA, China Nanjing University of Aeronautics and Astronautics, China University of Hong Kong, Hong Kong University of Macau, China NTU, Singapore University of Utah, US NTU, Singapore Bournemouth University, UK NTU, Singapore Contents 1. Data-Driven Model for Spontaneous Smiles Laura Trutoiu, Nancy Pollard, Jeffrey Cohn and Jessica Hodgins ................... 1 2. Expectancy Violations Related to a Virtual Human’s Joint Gaze ................... Behavior in Real-Virtual Human Interactions Kangsoo Kim, Arjun Nagendran, Jeremy Bailenson and Greg Welch 5 3. Segmentation-Based Graph Representation for 3D Animations Guoliang Luo, Quanhua Tang and Yihan Liu ................... 9 4. Motion Puppetry Taking Skeletal Similarity into Account Norihide Kaneko, Reo Takahashi and Issei Fujishiro ................... 13 5. Real-Time Marker-Less Implicit Behavior Tracking for User ................... Profiling in a TV Context Francois Rocca, Pierre-Henri De Deken, Fabien Grisard, Matei Mancas and Bernard Gosselin 17 6. Robust Space-Time Footsteps for Agent-Based Steering Glen Berseth, Mubbasir Kapadia and Petros Faloutsos ................... 21 7. Avatar Chat: A Prototype of a Multi-Channel Pseudo Real-Time ................... Communication System Kei Tanaka, Dai Hasegawa, Martin J. Dürst and Hiroshi Sakuta 25 8. On Streams and Incentives: A Synthesis of Individual and ................... Collective Crowd Motion Arthur van Goethem, Norman Jaklin, Atlas Cook Iv and Roland Geraerts 29 9. Constrained Texture Mapping via Voronoi Diagram Base Domain Peng Cheng, Chunyan Miao and Nadia Thalmann ................... 33 10. Hybrid Modeling of Multi-Physical Processes for Volcano ................... Animation Fanlong Kong, Changbo Wang, Chen Li and Hong Qin 39 11. Determining Personality Traits from Goal-Oriented Driving ................... Behaviors: Toward Believable Virtual Drivers Andre Possani-Espinosa, J. 
Octavio Gutierrez-Garcia and Isaac Vargas Gordillo 43
12. Virtual Meniscus Examination in Knee Arthroscopy Training Bin Weng and Alexei Sourin ................... 47
13. Space Deformation for Character Deformation using Multi-Domain Smooth Embedding Zhiping Luo, Remco Veltkamp and Arjan Egges ................... 51

Data-Driven Model for Spontaneous Smiles
Laura Trutoiu1, Nancy Pollard1, Jeffrey F. Cohn2, Jessica Hodgins1 (1 Carnegie Mellon University, 2 University of Pittsburgh)

Abstract
We present a generative model for spontaneous smiles that preserves their dynamics and can thus be used to generate genuine animations. We use a high-resolution motion capture dataset of spontaneous smiles to represent the accurate temporal information present in spontaneous smiles. The smile model consists of data-driven interpolation functions generated from a Principal Component Analysis model and two blendshapes, neutral and peak. We augment the model for facial deformations with plausible, correlated head motions as observed in the data. The model was validated in two perceptual experiments that compared animations generated from the model, animations generated directly from motion capture data, and animations with traditional blendshape-based approaches with ease-in/ease-out interpolation functions. Animations with model interpolation functions were rated as more genuine than animations with ease-in/ease-out interpolation functions for different computer-generated characters. Our results suggest that data-driven interpolation functions accompanied by realistic head motions can be used by animators to generate more genuine smiles than animations with generic ease-in/ease-out interpolation functions.

1 Introduction
Facial animation research has made significant progress in the quality of the static appearance of realistic faces [1; 2] as well as the high-resolution techniques for capturing dynamic facial expressions [3; 4]. Generative models for the deformations that occur on the face, however, do not represent the full range of subtle human expression. These generative models are particularly useful if they integrate with traditional animation methods, which often rely on keyframing static expressions. In this paper, we use high-resolution motion capture data to build a model of one such dynamic facial expression, smiles. Our smile model consists of two parts: (1) data-driven interpolation functions to model smile expressions and (2) plausible head motions. We start with high-resolution motion capture data of smiles for one individual. The motion capture data contains both spontaneous smiles, elicited through various activities, and posed smiles. For the smile expressions, we build a generative model that produces interpolation functions nonlinear in time. These interpolation functions capture the plausible velocity as well as the multiple peaks that occur in natural smiles. For each data-driven interpolation function we provide a plausible head motion. Through perceptual studies, we demonstrate that our model outperforms the commonly used ease-in/ease-out interpolation functions. We evaluate our model based on how the smiles are rated for genuineness. In a first perceptual experiment, we compare model smiles with recorded high-resolution spontaneous smiles, and also smiles generated with ease-in/ease-out interpolation functions. Our data showed no significant difference between the high-resolution spontaneous smiles and the model smiles.
In a second experiment, we find that our model-based interpolation functions coupled with appropriate head motions can generalize to different CG characters.

2 Related work
Smiles can be categorized, depending on how they are elicited, as spontaneous or posed. Posed or deliberate smiles mask other emotions rather than genuinely convey amusement or joy. Conversely, spontaneous smiles more often convey genuine emotions, though the perceptual labels associated with spontaneous smiles are diverse. Smiles can be perceptually labeled as polite, embarrassed, or fearful [5; 6; 7]. What are the cues that help differentiate between different types of smiles? Many cues, static and dynamic, affect how a smile is perceived [7]. The Duchenne marker (slight wrinkling at the outer corner of the eyes [8]) is considered an indication of a genuine smile. However, the timing of the Duchenne marker relative to the smile, a dynamics cue, also impacts whether the smile is perceived as genuine [9]. Spontaneous smiles have been found to have smaller amplitude and slower onset (start time) than posed smiles [6; 10]. Few research projects explicitly consider generative models for smiles and laughter. Krumhuber and colleagues [11] used a data-based heuristic to generate genuine smiles characterized by symmetry, long onset and offset durations, and a shorter peak. A discrete model for smiles with a limited number of parameters was proposed by Ochs and colleagues [12]. Previous research suggested that temporal information is required to maintain the genuineness of smile expressions when linearizing spatial motion, as in the case of blendshape interpolation [13].

Figure 1: PCA model for genuine smiles: (left) original smile profiles and (right) generated profiles.

3 Smile model
Our smile model consists of two parts: (1) a generative model for smile expressions, represented as interpolation functions, and (2) plausible head motions. For the smile expression model, we represent smiles in the temporal domain as data-driven interpolation functions. We capture temporal nonlinearities in data-driven interpolation functions with a generative Principal Component Analysis (PCA) model. Our model includes plausible head motions because animations presented without head motion appeared artificial and rigid. We used a dataset of high-resolution motion capture data for the face and head movement. The dataset was recorded by Trutoiu and colleagues [13]. For animations, a computer-generated version of the participant was created by a professional artist. Animations were created in Autodesk Maya either from the high-resolution data or from blendshape interpolation with various functions.

3.1 Generative model
We modeled the temporal properties of spontaneous smile expressions as interpolation functions and used PCA to construct a generative model of data-driven interpolation functions. Twenty-five spontaneous smiles from the same subject (SD) were used for the PCA model. High-resolution expressions of motion capture data were reconstructed in a least-squares fashion with two blendshapes to build a time series dataset for smile dynamics. The time series are represented by the coefficients s. PCA models the variability in the data-driven interpolation functions. We represented the original dataset using the first ten principal components, accounting for 98% of the variance.
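The generative model described here, together with the sampling step described next, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the common resampling length, the uniform sampling range, and the function names are our assumptions.

```python
import numpy as np

def build_pca_model(smile_curves, T=400, n_components=10):
    """Fit a PCA model to blendshape-coefficient time series s(t), one per smile."""
    # Resample every recorded smile to a common length so they can be stacked.
    X = np.stack([np.interp(np.linspace(0, 1, T),
                            np.linspace(0, 1, len(s)), s) for s in smile_curves])
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions (rows of Vt).
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = (U * S)[:, :n_components]          # per-smile component scores
    return mean, Vt[:n_components], scores

def sample_interpolation_function(mean, components, scores, rng):
    # Draw new scores within one standard deviation of the observed scores
    # and project them back through the retained components.
    std = scores.std(axis=0)
    z = rng.uniform(-std, std)
    return mean + z @ components
```

Calling sample_interpolation_function with rng = np.random.default_rng() then yields a new, nonlinear-in-time coefficient curve that can drive the neutral-to-peak blendshape pair.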
Next, we projected new, random coefficients within one standard deviation of the original coefficients onto these ten PCA dimensions. Figure 1 shows the input and output of the PCA model: the original time series correspond to the input and the generated time series to the output. Using the last two terms of the newly generated time series, we scaled back each part of the smile to create interpolation functions of different durations. Animated smiles generated using these interpolation functions were used in both experiments.

3.2 Plausible head motions
We hypothesized that a plausible head motion is proportional to the smile amplitude, similar to laughing, where sound correlates with torso movements. Cohn and colleagues found a moderate correlation between head pitch and smile intensity, with the intensity increasing as the head moved downwards [14]. We evaluated the correlation between head motion and smiles in our dataset and found a moderate average correlation only for head pitch, −0.38 (on a scale from −1 to 1, with 0 indicating no correlation). Some smile samples show stronger correlations, with a maximum of −0.8. Based on these results, we generated plausible head motions derived from the interpolation functions, such that the smile amplitude is proportional to head pitch.

4 Perceptual experiments
The goal of Experiment 1 was to evaluate a small number of samples from the model relative to ground truth animations and ease-in/ease-out animations. In Experiment 2, we tested a large sample of model smiles and applied the model to multiple CG characters. We hypothesized that interpolation functions generated from our PCA model would result in smile animations with high genuineness ratings compared to ease-in/ease-out interpolation functions. We expected that adding plausible head motions, proportional to the smile amplitude, to the smile expressions would increase the perceived genuineness of the animation.

Figure 2: Experiment 1 results.
Figure 3: The three CG characters in Experiment 2.

4.1 Experiment 1: Different smile types
To evaluate our model, we conducted a within-subjects experiment with the following independent variables: smile type (spontaneous, posed, model, and ease-in/ease-out smiles) and head motion (with and without). The dependent variable was perceived genuineness on a scale from 1 to 100. In this experiment, the head motions for the model and ease-in/ease-out conditions were identical and proportional to the model smile profile as described above. Sixty-one viewers rated 24 smile animations on Amazon's Mechanical Turk. All animations were created with a CG character that matched our participant SD in appearance. The high-resolution animations were obtained directly from the recorded motion capture data and are considered ground truth animations that match the recorded video as closely as possible. Three samples of each of the following types were used: (1) spontaneous, (2) posed, (3) model, and (4) ease-in/ease-out. For the model smiles, two static blendshapes (neutral and peak) were obtained from the collection of spontaneous smiles. For each model interpolation function we created counterpart ease-in/ease-out curves with the same durations as the model smiles from neutral to peak and from peak to the end of the smile.
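Before the results, here is a small sketch of how a single interpolation function s(t) drives the two blendshapes, how a duration-matched ease-in/ease-out baseline can be built, and how head pitch can be tied to smile amplitude. The gain value and the function names are our assumptions; only the proportional coupling and the matched durations come from the text above.

```python
import numpy as np

def apply_interpolation(neutral, peak, s):
    """Blendshape interpolation: one output frame per value of the coefficient s(t).
    neutral, peak: (V, 3) vertex arrays; s: (T,) interpolation function."""
    return neutral[None, :, :] + s[:, None, None] * (peak - neutral)[None, :, :]

def ease_in_out(n_up, n_down):
    # Baseline curve: smoothstep up to the peak and back down, matching the
    # to-peak and from-peak durations of the corresponding model curve.
    smooth = lambda t: t * t * (3.0 - 2.0 * t)
    return np.concatenate([smooth(np.linspace(0.0, 1.0, n_up)),
                           smooth(np.linspace(1.0, 0.0, n_down))])

def head_pitch(s, gain_deg=-5.0):
    # Head pitch proportional to smile amplitude; the gain is an assumed value,
    # and the sign follows the negative pitch correlation reported above.
    return gain_deg * s
```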
4.1.1 Results
We conducted a repeated-measures 4 (smile type) × 2 (head motion) ANOVA to investigate possible effects of the independent variables on smile genuineness. Both independent variables and their interaction significantly impacted genuineness ratings. We found a significant main effect of smile type on smile genuineness, F(3, 420) = 19.13, p < .0001. Posed smiles (rating of 35.62) were rated as significantly less genuine than all other conditions, which were not significantly different from each other. Similarly, head motion had a significant effect on smile genuineness: animations with head motion had an average rating of 52.56 while animations without head motion averaged 37.51, F(1, 420) = 120.80, p < .0001. We further investigated the significant interaction between smile type and head motion, F(3, 420) = 2.83, p = .0379. For animations with head motion, spontaneous smiles (59.31) were significantly different from ease-in/ease-out smiles (53.28), F(1, 420) = 4.12, p = .043, but not significantly different from model smiles (57.93), F(1, 420) = .215, p = .643. Without head motion, only posed smiles were significantly less genuine. This interaction is shown in Figure 2.

4.2 Experiment 2: Multiple characters
The independent variables used in this experiment are the smile type (model or ease-in/ease-out), the CG character (female SD, male KB, or cartoon-like CP), and the smile sample (1 to 12). The dependent variable is smile genuineness. The twelve data-driven interpolation functions and corresponding head motions are from the SD model. The peak blendshapes are shown in Figure 3. Animations were shown with head motion based on the model smiles. We used a mixed experiment design, 3 (character) × 2 (smile type) × 12 (smile sample), with character as a between-subjects variable while smile type and smile sample were within-subjects variables. Each participant saw each animation type (n = 24) for only one character. Fifty-eight participants viewed and rated SD animations, 57 participants rated KB animations, and 63 participants rated CP animations.

Table 1: Significant results from Experiment 2: Multiple characters.
Main effects:
Smile type: F(1,4031) = 35.13, p < .0001. Post-hoc: model smiles (56.13) are rated as more genuine than ease-in/ease-out smiles (52.22).
Sample: F(11,4030) = 28, p < .0001. Post-hoc: the ratings for samples varied from 64.79 (sample 5) to 45.82 (sample 3).
Two-way interactions:
Smile type * Sample: F(11,4030) = 7.15, p < .0001. Post-hoc: model smile sample 5 is rated highest (70.42) while model smile sample 12 is rated lowest (43.51).
Smile type * Character: F(2,4031) = 17.30, p < .0001. Post-hoc: for SD and KB, model smiles are rated higher than ease-in/ease-out smiles.

Figure 4: Genuineness ratings for three characters and two smile types.

4.2.1 Results
Significant main effects were observed for smile type and sample but not for CG character (shown in Table 1). The ANOVA statistical analysis indicates that our model is appropriate for use with photorealistic characters such as KB and SD. However, more research is required to investigate how this type of model can be used with cartoon-like characters.

[4] Thabo Beeler, Fabian Hahn, Derek Bradley, Bernd Bickel, Paul Beardsley, Craig Gotsman, Robert W. Sumner, and Markus Gross. High-quality passive facial performance capture using anchor frames. ACM SIGGRAPH 2011 papers on - SIGGRAPH '11, 1(212):1, 2011.
[5] Paul Ekman. Telling lies: Clues to deceit in the marketplace, politics, and marriage (revised and updated edition). W. W.
Norton & Company, 2 rev sub edition, September 2001. [6] Jeffrey F. Cohn and Karen L. Schmidt. The timing of facial motion in posed and spontaneous smiles. Journal of Wavelets, Multi-resolution and Information Processing, 2:1–12, 2004. [7] Zara Ambadar, Jeffrey F Cohn, and L I Reed. All smiles are not created equal: Morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. Journal of Nonverbal Behavior, 33(1):17–34, 2009. [8] Paul Ekman, R J Davidson, and Wallace Friesen. The Duchenne smile: emotional expression and brain physiology. Journal of Personality and Social Psychology, 58(2):342–53, February 1990. [9] Eva G Krumhuber and Antony S R Manstead. Can Duchenne smiles be feigned? New evidence on felt and false smiles. Emotion, 9(6):807–820, 2009. 5 Discussion The primary contribution of this paper is to demonstrate that data-driven interpolation functions accompanied by correlated head motions are appropriate for modeling smiles. Our smile model of interpolation functions and plausible head motions is rated as more genuine than animations based on the commonly used ease-in/easeout interpolation functions, even when the easein/ease-out examples had the same overall timing to and from the peak of the smile and the same head motions. The model preserves naturally occurring smile accelerations, decelerations, and multiple smile peaks. In contrast, animations with ease-in/ease-out interpolation functions are smooth with a single peak and therefore may not accurately represent spontaneous smiles. References [1] Henrik Wann Jensen, Stephen R Marschner, Marc Levoy, and Pat Hanrahan. A practical model for subsurface light transport. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 511–518, 2001. [2] Jorge Jimenez, Timothy Scully, Nuno Barbosa, Craig Donner, Xenxo Alvarez, Teresa Vieira, Paul Matts, Verónica Orvalho, Diego Gutierrez, and Tim Weyrich. A practical appearance model for dynamic facial color. ACM Transactions on Graphics, 29(6):141:1–141:10, December 2010. [3] Li Zhang, Noah Snavely, Brian Curless, and Steven M Seitz. Spacetime faces: High-resolution capture for˜ modeling and animation. In Data-Driven 3D Facial Animation, pages 248–276. Springer, 2007. [10] Karen L Schmidt, Yanxi Liu, and Jeffrey F Cohn. The role of structural facial asymmetry in asymmetry of peak facial expressions. Laterality, 11(6):540–61, November 2006. [11] Eva Krumhuber and Arvid Kappas. Moving smiles: The role of dynamic components for the perception of the genuineness of smiles. Journal of Nonverbal Behavior, 29(1):3–24, April 2005. [12] Magalie Ochs, Radoslaw Niewiadomski, Paul Brunet, and Catherine Pelachaud. Smiling virtual agent in social context. Cognitive Processing, pages 1–14, 2011. [13] Laura C. Trutoiu, Elizabeth J. Carter, Nancy Pollard, Jeffrey F. Cohn, and Jessica K. Hodgins. Spatial and temporal linearities in posed and spontaneous smiles. ACM Transactions on Applied Perception, 8(3):1–17, August 2014. [14] Jeffrey F. Cohn, Lawrence I. Reed, Tsuyoshi Moriyama, Jing Xiao, Karen Schmidt, and Zara Ambadar. Multimodal coordination of facial action, head rotation, and eye motion during spontaneous smiles. IEEE Proceedings of the International Conference on Automatic Face and Gesture Recognition, pages 129–135, 2004. 
4 Expectancy Violations Related to a Virtual Human’s Joint Gaze Behavior in Real-Virtual Human Interactions Kangsoo Kim1 , Arjun Nagendran1 , Jeremy Bailenson2 , and Greg Welch1 1 University of Central Florida University 2 Stanford Abstract Joint gaze—the shared gaze by the individuals toward a common object/point of interest— offers important non-verbal cues that allow interlocutors to establish common ground in communication between collocated humans. Joint gaze is related to but distinct from mutual gaze—the gaze by the interlocutors towards each other such as during eye contact, which is also a critical communication cue. We conducted a user study to compare real human perceptions of a virtual human (VH) with their expectancy of the VH’s gaze behavior. Each participant experienced and evaluated two conditions: (i) VH with mutual gaze only and (ii) VH with mutual and joint gaze. We found evidence of positive responses when the VH exhibited joint gaze, and preliminary evidence supporting the effect of expectancy violation, i.e., more positive perceptions when participants were presented with VH’s gaze capabilities that exceeded what was expected. Figure 1: Virtual human exhibiting mutual gaze (left) and joint gaze (right). In this paper, we conduct a user study that independently varies a virtual human’s (VH) joint gaze behavior (Fig. 1), and investigate the effects of a mismatch between user expectations of the VH’s gaze behavior and the VH’s actual gaze behavior, with respect to the user’s perceptions of the VH. Joint gaze is the shared gaze that interacting interlocutors typically exhibit when attending to a common object of interest. Joint gaze is an important aspect of establishing common ground, so interlocutors generally expect joint gaze when attempting to establish joint attention to a shared object. For example, if you explain directions to a partner while pointing toward features on a map, you would expect your partner to look at the map. If your partner does not look at the map, you might be puzzled and wonder whether your partner is paying attention. A positive (or negative) violation corresponds to when a subject initially having a low (or high) expectation of a VH’s joint gaze later evaluates the VH more positively (or negatively) after they actually meet a VH with (or without) joint gaze. We hypothesize that an expectation violation related to the VH’s joint gaze will influence one’s perceptions of the VH. Keywords: joint gaze, expectancy violation, human perception, virtual human, avatar 1 Introduction Expectancy violation (EV) is a well-known phenomenon in human communications and psychology [1]. The phenomenon of EV arises when one encounters an unexpected behavior, and as a result experiences either positive or negative feelings. For example, if a child is given a gift, she will likely be happier if the gift was unexpected (e.g., “out of the blue”) than if the gift was expected (e.g., it was her birthday). This would be a positive violation. Conversely, not receiving a gift on her birthday, when she clearly expected it, would be a negative violation that could cause her to have an unfavorable response. Among the prior research looking at the importance of gaze behavior in VH systems, some work has looked at the gaze behavior between VHs in a virtual environment [2, 3], while other work has looked at gaze between real humans 5 3 Experiment and VHs in real/mixed environments [4, 5, 6, 7, 8]. 
Our interest is in the latter due to the involvement of real objects in the interaction, as opposed to interactions in a purely virtual environment. Previous research supported the importance of VH eye gaze (mostly mutual gaze, i.e. eye contact) in human perception (e.g., socialpresence) or task performance. However, there is relatively little research narrowing the focus down to joint gaze and one’s expectancy. This paper presents preliminary results about the effects of VH’s joint gaze and its expectancy violation in one’s perceptions of the VH. 3.1 Scenario and Manipulation Our human subjects were introduced to a VH and told his name (“Michael”). They were then told that the VH was a new student at the university who was currently in a building off campus, but needed to return for a lecture on campus, and that he was late. The subjects were then asked to staff a “help desk” and to provide the VH with directions using a Campus Map and a pen. We had two conditions of the VH’s gaze behavior: (i) mutual gaze only and (ii) mutual gaze with joint gaze (Fig. 1). While the VH always looked at the subject’s face without looking down to the map in “mutual-only” condition, he looked at the map occasionally in the “mutual+joint” condition. In both conditions, the VH exhibited small natural uppertorso movement and eye blinks. Subjects experienced both conditions and evaluated the two VHs. The overall procedure is illustrated in Fig. 3. First, subjects saw the VH verbally explaining the situation that he would look for directions to the campus, and completed a demographic pre-questionnaire. They experienced both interaction 1 and 2 explaining the map, but the VH performed a different gaze behavior in each interaction—either “mutual-only” or “mutual+joint.” After each interaction, subjects were asked to complete a questionnaire about their perceptions of the VH and sense of their EV with respect to the VH’s gaze behavior (5scale Likert). Finally, they compared two conditions and reported their preference in questionnaire 3. To prevent the subjects from familiarizing themselves with the same set of directions, we counter-balanced a different destination on the map as well as the VH’s gaze behavior for each interaction. 2 Virtual Human A remote human controller manipulated our VH from a separate room (Wizard of Oz). We provided the controller with a video feed of the experimental space so he could see the experimental environment and affect the VH gaze either toward the subjects face or toward the map, depending on the current trial/subject (Fig. 1). The controller used an interface with an infrared camera (TrackIR) and a magnetic tracking device (Razer Hydra) to perform the VH’s facial expressions, mouth movement, and change of gaze direction effectively via our system developed previously [9]. The upper torso of our VH was displayed in near human-size on a 55” 2D flat screen, and a table with black curtains blocked the place where the lower torso should have been, so subjects could feel that the VH was behind the table (Fig. 2). In our scenario, the VH expressed a normal or slightly pleasant facial expression during the interaction, so that the subjects could feel the VH’s emotional state was consistent. The VH generally initiated the conversation unless the subject started talking first, but did not say anything proactively during the interaction. In other words, the VH only made positive reciprocal answers (e.g. “Yes, I understand.”) to the subject’s affirmative question “Do you understand?”. 
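The gaze manipulation itself reduces to a small piece of condition logic. The sketch below is ours (hypothetical names, simplified to a single gaze target per frame) and merely illustrates the operator-driven behavior described above; it is not the actual AMITIES implementation [9].

```python
from dataclasses import dataclass
from typing import Tuple

Point3 = Tuple[float, float, float]

@dataclass
class GazeTargets:
    subject_face: Point3   # tracked position of the participant's face
    campus_map: Point3     # fixed position of the map on the help desk

def gaze_target(condition: str, operator_requests_joint: bool,
                targets: GazeTargets) -> Point3:
    """Pick the VH's gaze target for the current frame.

    condition: 'mutual-only' or 'mutual+joint' (the two study conditions).
    operator_requests_joint: momentary flag set by the human controller
    (hypothetical; in the study the controller drove the gaze directly).
    """
    if condition == 'mutual+joint' and operator_requests_joint:
        return targets.campus_map      # joint gaze toward the shared map
    return targets.subject_face        # otherwise keep mutual gaze / eye contact
```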
Figure 3: Overall procedure. Figure 2: Virtual human setup (left) and facial expressions (right). 6 3.2 Human Subjects A total of 28 subjects were recruited from The University of Central Florida, and received $15 of monetary compensation for the experiment. Subjects were 75% male (n = 21, mean age = 19.95, range = 18–26) and 25% female (n = 7, mean age = 20.71, range = 18–24). Most of them (n = 26) had previous experience of virtual characters through video games or virtual reality applications. All the subjects were aware that what they interacted with was a VH. 4 Results and Discussion Joint Gaze: We evaluated subjects’ responses from (comparison) quesionnaire 3 to check which gaze condition of VH subjects preferred. More than 50% of subjects chose “mutual+joint” VH as their preference in all the questions, which indicates the importance of joint gaze feature in VH system (Table 1). However, there were still considerable number of people who did not feel any difference between two conditions. According to informal discussion with subjects after the study, a majority of the subjects addressed that the VH’s verbal response capability far exceeded what they had previously experienced although our VH merely responded their affirmative questions. We guess that highly engaging verbal communication might overwhelm the effect of joint gaze, so subjects could not feel any difference between two conditions. Expectancy Violation (EV): Although we used the same questions for questionnaire 1 and 2 (perception / sense of EV) for experiment consistency, we only analyzed subjects’ responses from questionnaire 1 to evaluate EV effects, because their expectation might be biased for the multiple interactions. After the first interaction, we asked for their sense of EV with respect to the VH’s gaze behavior, e.g., “How would you rate the capability of virtual human’s gaze com- Figure 4: Population by subject-reported sense of EV in VH’s gaze behavior. Population with “mutual+joint” VH tends towards the highest (5) in the sense of EV while population with “mutual-only” is more towards (4) in the sense of EV, which can be interpreted that VH’s joint gaze encourages more positive violation. pared to what you expected?” (5-scale, 1: more negative than what I expected, 3: same as what I expected, 5: more positive than what I expected). We expected both negative and positive responses, but the responses were mostly positive, so we focused on positive violations. The results indicated that “mutual+joint” VH encouraged more positive violation than “mutualonly” VH. In other words, subjects with “mutual+joint” VH tended to evaluate the VH’s gaze behavior more positively, compared to what they expected before, than with “mutual-only” VH (Fig. 4). T-tests showed a significant difference in subject-reported EV of VH’s gaze behavior for “mutual-only” (M = 3.643, SD = 0.842) and “mutual+joint” (M = 4.500, SD = 0.650) conditions; t(24) = -3.01, p = 0.006. When we analyzed the relationship between subject’s perceptions and their sense of EV, we observed high-reliability between the responses from 9 questions in Table 2 (Cronbach’s alpha > 0.80), so we averaged their responses into a single value and used it as their perception re- Table 1: Subject’s responses from comparison questionnaire 3. The value indicates the number of people who preferred the condition, and its percentage out of total 28 subjects in parentheses. Question Mutual+Joint Mutual-Only No Difference Which virtual human did you like more? 
17 (61%) 2 (7%) 9 (32%) Which interaction did you enjoy more? 16 (57%) 8 (29%) 4 (14%) Which interaction were you more engaged with? 14 (50%) 3 (11%) 11 (39%) Which virtual human did you think more pay attention to what you were explaining? 21 (75%) 2 (7%) 5 (18%) Which virtual human did you feel that more understood what you were explaining? 17 (61%) 6 (21%) 5 (18%) Which virtual human did you feel more as if it was a real human? 16 (57%) 3 (11%) 9 (32%) Which virtual human gave you more sense of physical presence? 14 (50%) 2 (7%) 12 (43%) Which virtual human was more natural (human-like)? 18 (64%) 3 (11%) 7 (25%) 7 sponse. In Fig. 5, we found evidence of a tendency that a subjects’ perception became more positive (i.e. larger values in y-axis) as their expectancy of gaze was more positively violated (i.e. more towards 5 in x-axis) when we compared 3, 4, and 5 columns. Although the sample size (N) was small and varied, the tendency could be interpreted that subject’s perception was influenced by their expectancy, which was positively violated after the interaction. gaze behavior exceeded their expectation (positively) regardless of the presence of joint gaze. In the future, we will consider a large-sample study investigating the effects of a user’s previous experience and expectations related to various features of virtual or robotic humans. If we find a certain feature that causes a negative violation in general, which means people normally have high expectations about the feature, it would indicate that the feature should be carefully considered for future VHs. Table 2: Nine questions for subject’s perception responses from questionnaire 1 (5-scale, 1: strongly disagree, 5: strongly agree). Subject’s responses from these questions were correlated (Cronbach’s alpha > 0.80). 1. 2. 3. 4. 5. 6. 7. 8. 9. References [1] Judee K. Burgoon, Deborah A. Newton, Joseph B. Walther, and E. James Baesler. Nonverbal expectancy violations and conversational involvement. Journal of Nonverbal Behavior, 13(2):97–119, 1989. [2] Jeremy N. Bailenson, Andrew C. Beall, and Jim Blascovich. Gaze and task performance in shared virtual environments. The Journal of Visualization and Computer Animation, 13(5):313–320, 2002. [3] Rutger Rienks, Ronald Poppe, and Dirk Heylen. Differences in head orientation behavior for speakers and listeners: an experiment in a virtual environment. ACM Transactions on Applied Perception, 7(1):1–13, 2010. [4] Roel Vertegaal, Robert Slagter, Gerrit van der Veer, and Anton Nijholt. Eye gaze patterns in conversations: there is more to conversational agents than meets the eyes. In SIGCHI Conference on Human Factors in Computing Systems, pages 301–308, 2001. [5] Alex Colburn, Michael F. Cohen, and Steven Drucker. The Role of Eye Gaze in Avatar Mediated Conversational Interfaces. In Technical Report MSR-TR-200081, Microsoft Research, 2000. [6] William Steptoe, Robin Wolff, Alessio Murgia, Estefania Guimaraes, John Rae, Paul Sharkey, David Roberts, and Anthony Steed. Eye-tracking for avatar eye-gaze and interactional analysis in immersive collaborative virtual environments. In ACM Conference on Computer Supported Cooperative Work, pages 197–200, 2008. [7] Maia Garau, Mel Slater, Vinoba Vinayagamoorthy, Andrea Brogni, Anthony Steed, and M. Angela Sasse. The Impact of Avatar Realism and Eye Gaze Control on Perceived Quality of Communication in a Shared Immersive Virtual Environment. In SIGCHI Conference on Human Factors in Computing Systems, pages 529–536, 2003. 
[8] Gary Bente, Felix Eschenburg, and Nicole C. Krämer. Virtual gaze. A pilot study on the effects of computer simulated gaze in avatar-based conversations. Virtual Reality (LNCS), 4563:185–194, 2007. [9] Arjun Nagendran, Remo Pillat, Adam Kavanaugh, Greg Welch, and Charles Hughes. AMITIES: Avatarmediated Interactive Training and Individualized Experience System. In ACM Symposium on Virtual Reality Software and Technology, pages 143–152, 2013. You liked the virtual human. You enjoyed the interaction with the virtual human. You were engaged in the interaction with the virtual human. You had the feeling that the virtual human was paying attention to what you explained. The virtual human seemed to understand what you explained. You had the feeling that the virtual human was a real human. You had the feeling that the virtual human was physically present in real space. The interaction with the virtual human was natural. You had the feeling that the virtual human looked at the map. Figure 5: Mean of subjects’ perception responses over selfreported sense of EV. In both “mutual-only” and “mutual+joint” conditions, a higher positive EV (x-axis) resulted in a higher value of perception (y-axis). 5 Conclusions We have presented a user study aimed at understanding the effects of a VH’s joint gaze behavior and the phenomenon of expectancy violation (EV) with respect to a human’s perception of the joint gaze behavior of a VH. As expected, joint gaze was found to be an important characteristic for subjects to build positive responses to the VH during a map explanation scenario. We also discovered preliminary evidence of a positive EV effect—subjects evaluated the VH more positively corresponding to how much the VH’s 8 Segmentation-based Graph Representation for 3D Animations Guoliang Luo gl.luo@yahoo.com Jiangxi Normal University Quanhua Tang quanhuatang@163.com Jiangxi Normal University Yihan Liu yi-han.liu@etu.unistra.fr University of Strasbourg number of vertices and fixed topology, and Dynamic Mesh that a 3D model has varied number of vertices and/or topology. In fact, a dynamic mesh can be converted into a mesh sequence by computing the vertex correspondence between neighbouring frames, which can be another challenging and computational task [1]. For the sake of simplicity, we work on mesh sequence. This paper is organized as follows. In Section 2, we first review the previous shape retrieval techniques in both Computer Graphics and Computer Vision domains. Then we introduce a segmentation method for 3D animations and their graph representation in Section 3. To validate the new representation, we apply it for computing the animation similarities in Section 3.3. After showing the experimental results in Section 4, we conclude in Section 5. Abstract 3D animation is becoming another popular multimedia data because of the development of animation technologies. A 3D animation data, as a sequence of meshes within each is a set of points, is a different data other than 2D objects or 3D static models, and thus requires new signatures for data management. In this paper, we present a new segmentation-based method for representing and comparing 3D animations. The main idea is to group both the deformed triangles and the rigid triangles on the model surface for the spatial segmentation. Then we represent the segmentation into a weighted graph, where each node denotes a triangle group, either ‘deformed’ or ‘rigid’, and edges denote neighborhoods. 
Moreover, we annotate each node with a vector of geometrical attributes of the triangles within the group. Our experimental results show that the dissimilarities of the signatures reflect motion dissimilarities among 3D animations.

2. Related Works
Shape retrieval is one of the most popular applications of shape signatures [2]. In this section, we briefly review the state of the art of the existing works on 3D shape retrieval. An abundance of research has been devoted to 3D shape retrieval during the last decade, which can be classified into geometry-based methods [3,4,5,6] and graph-based methods [7]. Geometry-based methods compare 3D models based on static geometrical properties, or shape descriptors. For example, the shape histogram divides the space into blocks and counts the number of surface points within each block, which corresponds to a histogram bin [3]; the spin image maps 3D surface vertices into 2D space via a cylindrical coordinate system [4]. Other shape descriptors include spherical harmonics [5] and statistical histograms of geometrical attributes [6], etc. This objective can also be achieved by comparing

Keywords: 3D animation, segmentation, graph representation

1. Introduction
The recent, rapid advancement of techniques for modelling 3D animations has led to an abundance of 3D animation data, which makes data management techniques, such as animation data representation and shape retrieval, more and more necessary. Although such techniques have been intensively studied for 3D static models, they remain a new and challenging task for 3D animations. In the computer graphics research field, 3D animations can be classified into Mesh Sequence, where a 3D mesh deforms with a fixed

multiple 2D views of each 3D model. Such methods hold the advantage that they reduce the problem space from 3D to 2D, for which many existing approaches can be directly applied [8]. In comparison, graph-based methods include not only geometry information, but also the spatial connections among shape components, i.e., topological information. Most typically, the graphs are skeletons extracted from 3D objects [7]. For example, Sundar et al. match two shapes by using a greedy algorithm to compute the maximum common structures between the skeletal graphs of the two shapes [7]. More generally, with the existing techniques for extracting skeletons from 3D models, both of the graph distance computing methods, i.e., Maximum Common Subgraph [9] and Graph Edit Distance [10], can be used to compare the skeletons. However, different from 3D shapes, 3D animations also carry dynamic behaviours, i.e., mesh deformation caused by actions, for which reason shape retrieval methods may not be applicable to animations.

Algorithm 1: Spatial segmentation of a 3D animation.
Step 1: For each triangle, compute the maximum strain throughout all frames, i.e., s_i = max_{p=1,...,N}(s_i^p). See Figure 1(a).
Step 2: Starting from a random rigid triangle whose strain is less than a threshold ε, we apply region growing to group the neighboring triangles, until including any further neighboring triangle would make the average strain of the group exceed ε. Each such group is a 'rigid' segment (ε = 0.5 in our experiments).
Step 3: The above step runs iteratively until the strains of all ungrouped triangles are larger than ε.
Step 4: For the remaining ungrouped triangles, we continue to merge the spatially reachable deformed triangles. Each of the obtained groups is a 'deformed' segment.
Step 5: The above process repeats until no ungrouped triangles remain. Finally, after removing small groups by merging them with the most similar neighboring segment, a deforming mesh is divided into 'deformed' and 'rigid' segments. See Figure 1(b) and the supplemental video.

3. Similarities of 3D Animations
To extract the dynamic behaviors of a 3D animation, we first present a spatial segmentation method to divide a mesh into 'deformed' and 'rigid' parts, and the segmentation result is further represented with a weighted graph, i.e., each graph node is represented with a set of dynamic features. This representation allows us to compute graph similarity as animation similarity.

3.1 Segmentation of 3D animations
Note that one can choose among several descriptors to represent the dynamic behaviors of 3D models [1,3,4,5,6,7,11]. Without losing generality, in our method we use the strain s_i^p (i = 1,...,M, p = 1,...,N) to measure the deformation of triangle t_i in frame f_p, where M and N are the number of triangles within each frame and the number of frames, respectively. In [11], Luo et al. propose a normalized strain, with values ranging in [0, 1], a larger value indicating higher deformation, and vice versa. Having the strain values of each triangle within each frame, we then process the segmentation by following Algorithm 1.

3.2 Graph Representation
We describe in this section the representation of the above segmentation results as weighted graphs. First, we represent the spatial segmentation as a graph, where each node corresponds to a spatial segment and the edges denote the neighborhoods among spatial segments. See Figure 1(c). Moreover, similar to Luo et al.'s work [12], we also represent each node with a vector of attributes. For the sake of simplicity, we use the percentage of surface area and the statistical distribution of strains as node attributes. In our experiment, the distribution of strains is a histogram of 6 bins. That is, the i-th node can be denoted as n_i = (n_i^1,...,n_i^7), where the first 6 items are the histogram bins and the last is the percentage of surface area. Therefore, we can compare the dissimilarity d_ij between two nodes as the Euclidean distance of the two vectors:
d_ij = sqrt( sum_{k=1,...,7} (n_i^k - n_j^k)^2 ),
where n_i and n_j are two graph nodes. Note that the different attributes are equally weighted. The weighting strategy can be optimized if any of the attributes is found to be highly representative of dynamic behavior.

Figure 1: (a) Strains in sampled frames, with red/blue denoting high/low deformation; (b) segmentation into 'deformed' and 'rigid' parts, in left and right views; (c) graph representation; (d) segmentation result by Wuhrer et al.'s method [14]. Panels (a,b,c) show the segmentation and graph representation by our method.

Figure 1(a,b,c) shows the segmentation and the representation of a galloping 'Horse'. As can be seen, although we have not refined the segment boundaries, the graph representation clearly shows the dynamic behaviors of the different parts, i.e., which segments are deformed and which are not. More segmentation results for different animations are shown in the supplemental video.

3.3 Similarity of 3D Animations
Now that we have the graph representations of the segmentations of two animations, we compare graph similarity as the motion similarity of the animations. The graph similarity reflects motion similarity because, as described in Section 3.2, each graph node contains the dynamic features of the corresponding spatial segment.
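A simplified sketch of Algorithm 1 and of the node attributes of Section 3.2 is given below. It is our illustration, not the authors' Matlab implementation: the mesh is assumed to be given as per-triangle maximum strains plus an adjacency list, and the small-segment merging of Step 5 is omitted.

```python
import numpy as np

def segment_by_strain(max_strain, neighbors, eps=0.5):
    """Greedy region growing over triangles (simplified sketch of Algorithm 1).

    max_strain: per-triangle maximum strain over all frames, values in [0, 1].
    neighbors:  list of adjacent-triangle index lists (mesh connectivity).
    Returns one label per triangle; 'rigid' groups (strain < eps) are grown
    first, then the remaining triangles form 'deformed' groups.
    """
    labels = -np.ones(len(max_strain), dtype=int)
    next_label = 0
    for phase in ('rigid', 'deformed'):
        for seed in range(len(max_strain)):
            if labels[seed] != -1:
                continue
            if phase == 'rigid' and max_strain[seed] >= eps:
                continue
            group, stack = [seed], [seed]
            labels[seed] = next_label
            while stack:
                t = stack.pop()
                for nb in neighbors[t]:
                    if labels[nb] != -1:
                        continue
                    # In the rigid phase, only add a neighbor if the group's
                    # average strain stays below the threshold.
                    if phase == 'rigid' and np.mean(max_strain[group + [nb]]) > eps:
                        continue
                    labels[nb] = next_label
                    group.append(nb)
                    stack.append(nb)
            next_label += 1
    return labels

def node_attributes(max_strain, areas, tri_idx, bins=6):
    # 6-bin strain histogram of the segment plus its share of surface area.
    hist, _ = np.histogram(max_strain[tri_idx], bins=bins, range=(0.0, 1.0))
    hist = hist / max(hist.sum(), 1)
    return np.concatenate([hist, [areas[tri_idx].sum() / areas.sum()]])
```

The node dissimilarity d_ij is then simply np.linalg.norm applied to the difference of two such 7-dimensional attribute vectors.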
In this work, we choose the Graph Edit Distance [2] to compare graphs, which calculates the cost of the operations (or edit path) needed to transform one graph into the other; the operations can be the addition/deletion of nodes/edges. Neuhaus et al. [13] have proposed an efficient approach based on dynamic programming that finds an edit path with minimum cost. Their method requires as inputs the topology of the two graphs and the node distances between them, and outputs the minimum cost of an optimized edit path.

Table 1: Used animations in our experiments, the timings, and their dissimilarities to 'Gallop-Camel'.
Animation / #Triangles / #Frames / Timing (seconds) / Distance to Gallop-Camel (ranking)
Gallop-Camel / 43778 / 48 / 2.53 / 0 (0)
Gallop-Horse / 16858 / 48 / 0.8 / 2.24 (1)
Jumping1 / 29999 / 55 / 1.2 / 2.48 (3)
Jumping2 / 29999 / 55 / 1.1 / 2.83 (5)
Walking / 29999 / 55 / 1.5 / 2.52 (4)
Jogging / 29999 / 55 / 1.3 / 2.45 (2)

In [14], Wuhrer et al. propose a segmentation method for animated meshes that locates the segmentation boundaries in the highly deformed regions, where the deformation of each edge is measured by the maximum change of the dihedral angle over time. Compared to the segmentation result for the galloping 'Horse' obtained with Wuhrer et al.'s method (see Figure 1(d)), our result in Figure 1(b) contains 'deformed' segments, which inherently carry more motion information.

Similarity Measurement. In order to validate the effectiveness of the new representation, we proceed to compute the motion similarities by measuring graph dissimilarity. Taking 'Gallop-Camel' as the reference, we have obtained its distance to the other 5 animations, 'Gallop-Horse', 'Jumping1', 'Jumping2', 'Walking' and 'Jogging', shown in the last column of Table 1.

4. Experiments and Discussions
In this section, we present experiments with a set of 3D animations using the presented motion similarity measurement method. Readers may refer to our supplementary video demonstrations for more results. We have implemented the proposed method in Matlab scripts and run it on an Intel Pentium(R) at 2.7 GHz with 2 GB of memory. The 3D animation data used, together with the timings of the segmentation, are shown in Table 1, which shows the efficiency of our method: all the tested data can be segmented within 2 seconds. As can be seen, 'Gallop-Camel' has the least distance to 'Gallop-Horse'. Moreover, compared to the jumping motions, 'Gallop-Camel' has a smaller distance to 'Jogging', as both are rapid motions performed by four-limbed bodies.
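The suboptimal edit-path search of Neuhaus et al. [13] is not reproduced here, but the same comparison can be approximated with an off-the-shelf routine. The sketch below uses NetworkX (our choice, not the paper's implementation), with the node distance of Section 3.2 as the substitution cost and an assumed unit cost for node insertion and deletion.

```python
import networkx as nx
import numpy as np

def animation_distance(G1, G2, timeout=10):
    """Approximate graph edit distance between two segmentation graphs.

    Each node is expected to carry its 7-D attribute vector (6 strain-histogram
    bins + surface-area share) under the node attribute key 'attr'.
    """
    subst = lambda a, b: float(np.linalg.norm(a['attr'] - b['attr']))
    unit = lambda a: 1.0                      # assumed insertion/deletion cost
    return nx.graph_edit_distance(G1, G2,
                                  node_subst_cost=subst,
                                  node_del_cost=unit,
                                  node_ins_cost=unit,
                                  timeout=timeout)  # bound the search time
```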
Pattern Analysis and applications, 13(1), 113-129. [11] Luo G., Cordier F., and Seo H (2014). Similarity of deforming meshes based on spatio-temporal segmentation. Eurographics 2014 Workshop on 3D Object Retrieval. The Eurographics Association. [12] Luo, P, Wu Z., Xia C., Feng L., and Ma T. Co-segmentation of 3D shapes via multiview spectral clustering. The Visual Computer 29, no. 6-8 (2013): 587-597. [13] Neuhaus, M., Riesen, K., & Bunke, H. (2006). Fast suboptimal algorithms for the computation of graph edit distance. In Structural, Syntactic, and Statistical Pattern Recognition (pp. 163-172). Springer Berlin Heidelberg. [14] Wuhrer, S., & Brunton, A. (2010). Segmenting animated objects into nearrigid components. The Visual Computer, 26(2), 147-155. 5. Conclusions We have presented a weighted graph representation of 3D animations based on segmentation, with each node denoting a spatial segment and being weighted with a vector of geometrical properties. Additionally, we have shown the effectiveness of the representation for motion similarity measurement. This idea is driven by the fact that our weighted graph representation inherently carries motion information within the vector space of each node. Moreover, although our experiments show satisfactory results, one may extend the potential of our graph comparison by incorporating different attributes of each graph node, which helps to avoid ambiguous node matching between two graphs. Acknowledgements This work is funded by Jiangxi Normal University. We specially thank Professor Hao Wang and Associate Professor Gang Lei for providing experimental devices. References [1] Van Kaick, O., Zhang, H., Hamarneh, G., & Cohen‐Or, D. (2011, September). A survey on shape correspondence. In Computer Graphics Forum (Vol. 30, No. 6, pp. 1681-1707). Blackwell Publishing Ltd. [2] Tangelder, J. W., & Veltkamp, R. C. (2008). A survey of content based 3D shape retrieval methods. Multimedia tools and applications, 39(3), 441-471. [3] Ankerst, M., Kastenmüller, G., Kriegel, H. P., & Seidl, T. (1999, January). 3D shape histograms for similarity search and classification in spatial databases. In Advances in Spatial Databases (pp. 207226). Springer Berlin Heidelberg. 12 Motion Puppetry Taking Skeletal Similarity into Account Norihide Kaneko1 Reo Takahashi2 Issei Fujishiro3 Keio University, Japan kurikuri-bouzu@jcom.home.ne.jp1 { r.takahashi2, fuji3}@fj.ics.keio.jp been used in the motion conversion. The method was proposed by Gleicher in 1998 [1], and has become indispensable when reusing motion capture data. The original retargeting was intended as a post processing of motion capture. On the other hand, with the advent of real-time capturing devices such as Microsoft Kinect™, motion capture has begun to be used for manipulating an avatar instantly. An online motion retargeting a variety of avatars is often referred to as puppetry. Several studies have proposed actual puppetry methods, though these suffer from problems that lack interactivity or require a user to input much metadata in addition to training animations. Thus, we attempted to address these problems by incorporating into the previous works, (1) body parts classification based on the symmetry of 3DCG models, and (2) skeletal similarity. Abstract Motion retargeting is a well-known method for generating character animations in computer graphics, where the original motion of an actor can be transferred to an avatar with a different skeletal structure and/or limbs’ length. 
A specific type of retargeting for manipulating various avatars interactively is referred to as puppetry. However, previous learning-based works of puppetry cannot be said to be fully interactive, or require a large amount of metadata about the avatar's skeleton as well as training data of motions. Thus, we attempted to integrate existing methods by taking into account skeletal similarity between an actor and his/her avatar, to come up with an interactive and intuitive motion retargeting method that only requires a relatively small amount of training animations together with simple input metadata. Moreover, by classifying avatar’s body parts in a procedural manner, a more flexible motion puppetry was realized, where the system user is allowed to specify desirable part-to-part correspondences. 2. Related work Considering target avatars’ skeleton, motion retargeting is generally divided into two types; to human like avatars or to non-human like ones. The original work by Gleicher [1] was known as the first attempt to retarget captured motion to an avatar with different limbs’ length by applying its spatiotemporal constraints to captured motion data, whereas the avatar needs to possess a skeleton of human beings. Recent studies have achieved more flexible retargeting to non-human like avatars by learning the correspondence between a given actor’s motion data and avatar’s one. The benefit of Keywords: Computer animation, character animation, motion retargeting, joint matching. 1. Introduction Many 3D character animations in computer graphics have been created using motion capture data. In order to fill the gap between an actor’s skeleton and an avatar’s one, the procedure, called motion retargeting, has often 13 this type of retargeting method lies in a wider range of its applicability in terms of target avatar’s topology. Yamane et al. [2] used the Shared Gaussian Process Latent Variable Model (SGPLVM) technique to convert captured data to the motions of various avatars, including Luxo Jr., Squirrel, and Penguin. They combined the SGPLVM and the Inverse Dynamics method, to realize high-quality control of avatar motions. However, due to its expensive algorithms, it works only in an off-line environment. Seol et al. [3] focused most of their attention on puppetry, and indeed their interactive avatar motion control uses a linear combination of multi regression functions generated from active parameters of an avatar and those of an actor. However, unclear correspondence relations of joints between them require much metadata. Rhodin et al. [4] showed various retargeting not only for body motions but also for facial expressions through the use of Canonical Correlation Analysis (CCA), though their method cannot learn multiple retargeting animations. Figure 1: Overview of the system for correct joint matching and ease of retargeting indication. For example, the actor is allowed to transfer his or her arm motion to the avatar’s leg. Second, the joint similarities are calculated for each of the body parts. These values will be used in the next parts-based regression process. To classify the body parts, we took into account the bilateral symmetries of the 3DCG avatar’s initial poses. In Figure 2, we start a search from the root joint and examine the angle consistency of the children joints 𝑗1 , 𝑗2 , and 𝑗3 in the joint branch. Suppose that we choose 𝑗1 . Another joint 𝑗2 has angle between 𝑗1 and 𝑗2 (𝜃𝑗1,𝑗2), and the other joint 𝑗3 has angle between 𝑗1 and 𝑗3 (𝜃𝑗1,𝑗3 ). 
If these angles are identical, 𝑗1 is classified as spine. However, when choosing 𝑗2 , 𝜃𝑗2,𝑗1 and 𝜃𝑗2,𝑗3 have different values, and thus 𝑗2 is classified as leg. As is the case with 𝑗2 , 𝑗3 is also identified as leg. As such, spine and leg joints can be classified. If the end of leg joints with a height position value upper than a threshold, they are assumed to be an arm. A body part located at a spine end is considered as a tail. As for facial rigs, we cannot estimate their parts because they may sometimes have facial expression joints. So, the system user has to specify them manually. 3. System Overview Our approach integrates joint matching with the learning-based method proposed by Seol et al. [3], so that it does not require metadata except for training animations to realize intuitive motion retargeting. Our puppetry system consists of the learning part, which learns three kinds of motion-related data, followed by the real-time mapping part, as shown in Figure 1. The values for skeletal similarity calculated in joint matching are then referred by the subsequent parts-based regression. In the parts-based regression, the input actor’s motions are reflected directly to retargeting result. Furthermore, avatar’s intrinsic animations are reflected to the result in the motion classification process. Each of these processes will be explained in the following subsections in more detail. Joint matching is performed between the actor’s joint 𝑗𝑢 and avatar’s joint 𝑗𝑐 for the same body parts class, as in Jacobson et al. [5]. The similarity value Similarity𝑗𝑢,𝑗𝑐 between 𝑗𝑢 and 𝑗𝑐 is defined as follows. 3.1 Joint Matching The joint matching consists of two steps. First, both actor and avatar models are divided into five parts, i.e., head, tail, arm, leg, and spine 14 in the parameter values; 𝐾𝑑 the variance of the parameter; 𝐾r the regression error value described as 𝑝𝑐 − 𝜉(𝑝𝑢 ) using the data (1). In contrast, 𝐾𝑐 is the error value using the data (2). Figure 2: Body parts classification The function 𝜉 is defined as follows: Similarity𝑗𝑢 ,𝑗𝑐 = 𝜔𝑏𝑒𝑡 𝐾𝑏𝑒𝑡 (𝑗𝑢 , 𝑗𝑐 ) + 𝜔𝑜𝑟𝑖 𝐾𝑜𝑟𝑖 (𝑗𝑢 , 𝑗𝑐 ) + 𝜔𝑝𝑒𝑟 𝐾𝑝𝑒𝑟 (𝑗𝑢 , 𝑗𝑐 ) 𝑝 2 𝜉(𝑝𝑢 ) = (𝐴 − 𝐶)/(1 + ( 𝐵𝑢 ) ) + 𝐶 , where, 𝐾𝑏𝑒𝑡 denotes the difference of betweenness centralities. This value represents how far it is located from the root in the network; 𝐾𝑜𝑟𝑖 the inner product of the joint directions; and 𝐾𝑝𝑒𝑟 the percentage of the relative length of the joint. There may be missing body parts in the actor’s skeleton (e.g. tail). But it is assumed herein that all the body parts of the avatar should have their corresponding parts in the actor’s skeleton, thus the most plausible actor’s body parts are decided in terms of averaged similarity value. where 𝐴, 𝐵, 𝐶 are fitting parameters and derived from the following least square method with their maximum, minimum, and initial values: 2 𝑚𝑖𝑛A,B,C ∑𝑖∈{𝑚𝑎𝑥,𝑖𝑛𝑖𝑡,𝑚𝑖𝑛}‖𝑝𝑐,𝑖 − 𝜉(𝑝𝑢,𝑖 )‖ . The weight coefficients 𝜔𝑛 are given as the solutions of the following equation: 𝑀 2 𝑁 𝑚𝑖𝑛 ∑ |𝑝𝑐,𝑚 − ∑ 𝜔𝑛 𝜉𝑛,𝑚 (𝑝𝑢,𝑛,𝑚 )| 𝜔 𝑚=1 3.2 Parts-based Regression 𝑛=1 𝑠. 𝑡. ∑𝑁 𝑛=1 𝜔𝑛 = 1, 𝜔 > 0, where M denotes the number of frames in all the animations (2). This can be solved by the QP solver. In the parts-based regression, retargeting is carried out for each of the body parts with (1) actor’s and avatar’s motion data and (2) avatarintrinsic animation data. The data (1) comprises all the poses of the actor and avatar, whereas the data (2) has the avatar’s intrinsic animations that any actor cannot mimic easily (e.g. 
quadruped walk, gallop), together with its approximate animations of the actor (e.g. bipedal walk, run). 3.3 Motion Classification Motion classification is performed by SVM with the parameters for the specified parts. These parameters consist of 9 DoF per joint, actor’s joint position, speed, and acceleration. In runtime, the system estimates the motion class based on the captured actor’s parameters, and composes the avatar’s intrinsic animation of the same class together with the result of parts-based regression. After active parameters 𝑝𝑐 are extracted from the avatar’s animation data (1), the actor’s N parameters 𝑝𝑢 are found as the ones with the smallest parameter errors 𝐸𝑟𝑟𝑜𝑟𝑝𝑢 ,𝑝𝑐 in the body parts obtained through the joint matching process. Using these parameters, we generate the regression function 𝑝𝑐 = ∑𝑁𝑛=1 𝜔𝑛 𝜉(𝑝𝑢,𝑛 ) . Substituting the actor’s parameters into the function, the corresponding avatar’s animation can be estimated in runtime. We followed the definition of the parameters shown in Seol et al. [3]. 4. Results Owing to the body parts classification in joint matching, our system enables the user to map arbitrary actor’s body parts to specific avatar’s ones readily. A retargeting result is shown in Figure 3, where the correspondence between the actor’s arms and the sheep’s head is obtained. The sheep’s basic motion of running is modulated with detailed expressions according to the actor’s playacting (e.g. arm, swing and body tilting). Result of retargeting to a different avatar is shown in Figure 6. 𝐸𝑟𝑟𝑜𝑟𝑝𝑢 ,𝑝𝑐 is defined as follows. 𝐸𝑟𝑟𝑜𝑟𝑝𝑢 ,𝑝𝑐 = 𝜔𝑠 𝐾𝑠 (𝑝𝑢 , 𝑝𝑐 ) + 𝜔𝑣 𝐾𝑣 (𝑝𝑢 , 𝑝𝑐 ) + 𝜔𝑑 𝐾𝑑 (𝑝𝑢 , 𝑝𝑐 ) + 𝜔𝑟 𝐾𝑟 (𝑝𝑢 , 𝑝𝑐 ) + 𝜔𝑐 𝐾𝐶 (𝑝𝑢 , 𝑝𝑐 ), where the new term of 𝐾𝑠 for the similarity value estimated in joint matching was added to the original expression in [3]. 𝐾𝑣 denotes the differences of the directional vector of 3D avatar’s mesh geometry invoked by the change 15 animation data to learn. Also it is a challenge task to extend it so that it can handle facial models having non-divisible features by adopting additional principles like CCA [4]. Acknowledgements This work has been partially supported by MEXT KAKENHI Grant Number 25120014 and JSPS KAKENHI Grant Number 26240015. References Figure 3: Mapped character’s motion [1] M. Gleicher. Retargeting Motion to New Characters. In Proc. SIGGRAPH 98, pages 33–42, 1998. [2] K. Yamane, Y. Ariki, and J. Hodgins. Animating Non-Humanoid Characters with Human Motion Data. In Proc. SCA 2010, pages 169–178, 2010. [3] Y. Seol, C. Sullivan, and J. Lee. Creature features: Online Motion Puppetry for Non-Human Characters. In Proc. SCA 2013, pages 213–221, 2013. [4] H. Rhodin, J. Tompkin, K. I. Kim, K. Varanasi, H. Seidel, and C. Theobalt. Interactive Motion Mapping for Real-time Character Control. Computer Graphics Forum, 33:273–282, 2014. [5] A. Jacobson, D. Panozzo, O. Glauser, C.Pradalier, O. Hilliges, and O. S. Hornung. Tangible and Modular Input Device for Character Articulation. ACM Transactions on Graphics, 33:82–93, 2014. 5. Discussions The advantages of our method can be summarized in the following two points. Metadata Reduction – In the previous method [3], it is necessary for the user to give as metadata, all correspondences between his/her own joints and his/her avatar’s ones. In our method, however, the automatic matching procedure succeeded to reduce the amount of metadata drastically. Figure 4 shows the results of parts-based regression executed without metadata. 
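As a concrete illustration of the parts-based regression in Section 3.2, the following sketch (a minimal Python prototype written for this text, not the authors' implementation) fits the transfer function ξ(p_u) = (A − C)/(1 + (p_u/B)²) + C to the three (max, init, min) parameter correspondences and then evaluates it for a new actor parameter. The function names, the initial guess, the bound on B and the example values are our own assumptions.

```python
# Sketch of fitting the per-parameter transfer function xi of Section 3.2.
import numpy as np
from scipy.optimize import least_squares

def xi(p_u, A, B, C):
    """Transfer function mapping an actor parameter to an avatar parameter."""
    return (A - C) / (1.0 + (p_u / B) ** 2) + C

def fit_xi(p_u_samples, p_c_samples):
    """Fit A, B, C from the three corresponding (max, init, min) samples."""
    p_u_samples = np.asarray(p_u_samples, dtype=float)
    p_c_samples = np.asarray(p_c_samples, dtype=float)

    def residuals(params):
        A, B, C = params
        return xi(p_u_samples, A, B, C) - p_c_samples

    # Heuristic initial guess: A near the avatar value at the smallest |p_u|,
    # C near the value at the largest |p_u|, B a mid-range scale (assumption).
    order = np.argsort(np.abs(p_u_samples))
    x0 = [p_c_samples[order[0]],
          max(np.abs(p_u_samples).mean(), 1e-3),
          p_c_samples[order[-1]]]
    # Keep B strictly positive so the expression is well defined.
    bounds = ([-np.inf, 1e-6, -np.inf], [np.inf, np.inf, np.inf])
    return least_squares(residuals, x0, bounds=bounds).x

# Actor parameter at its (max, init, min) poses and the avatar parameter
# observed at the same three poses in the training animations (made-up values).
A, B, C = fit_xi([1.5, 0.0, -0.8], [0.2, 1.0, 0.5])
print(xi(0.5, A, B, C))   # estimated avatar parameter for a new actor pose
```

In the full method one such function is fitted per active parameter and per body part, and the fitted functions are then combined with the weights ω_n obtained from the quadratic program described above.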
In our method, the avatar’s legs optimally follow the actor’s leg motion, whereas the previous method cannot work well. a) Retargeting Indication – Grouping body parts leads to flexible linkage of body parts between the actor and avatar, as shown in Figure 5. b) 6. Future Work The present motion puppetry method could be further improved so as to require less Figure 5: Switching target body parts by simple indication. Mapping actor’s arms to legs (a), to ears (b). a) b) Figure 4: Qualitative comparison. Our method gives appropriate motions to the avatar (a), whereas the existing method [3] moves the body parts in a wrong way without metadata (b). Figure 6: Retargeting to a different avatar. 16 Real-time marker-less implicit behavior tracking for user profiling in a TV context F. Rocca, P.-H. De Deken, F. Grisard, M. Mancas & B.Gosselin Numediart Institute - University of Mons Mons, Belgium {francois.rocca, pierre-henri.dedeken, fabien.grisard, matei.mancas, bernard.gosselin}@umons.ac.be Abstract is known on the different media segments, it is possible to update the user profiling. In section 2, we present techniques for gaze estimation based on head pose and we will describe the marker-less method used in this study. Section 3 shows the descriptor used to defined the user interest. In section 4, we present the experimental setup which was used to estimate the interest and send these data to the profiling platform. Finally we conclude in section 5. In this paper, we present a marker-less motion capture system for user analysis and profiling. In this system we perform an automatic face tracking and head direction extraction. The aim is to identify moments of attentive focus in a non-invasive way to dynamically improve the user profile by detecting which media have drawn the user attention. Our method is based on the face detection and head-pose estimation in 3D using a consumer depth camera. This study is realized in the scenario of TV watching with second screen interaction (tablet, smartphone), a behaviour that has become common for spectators. Finally, we show how the analysed data could be used to establish and update the user profile. 2. State of the art The orientation of the head is less difficult to estimate than the direction of the eyes. The head direction and the eye-gaze direction are strongly correlated. Physiological studies have shown that the prediction of gaze is a combination of the direction of the eyes and the direction of the head [1]. In this work, the distance from the sensor can be up to a few meters, and at these distances, the eye tracking becomes very difficult to achieve. An initial study showed the link between eye-gaze direction and the head direction. In this study, the correlation was assessed qualitatively when user focuses his gaze on a map (Figure 1). The results were obtained with the eye tracking system FaceLAB [2] and show that the average error is 3 to 4 cm to a plane located 1 meter away, which means that the angular difference is very small. The direction of the head is intrinsically linked to the direction of the eyes. This is especially the case when still in a rotating area of the head comfortable for the Keywords: head pose estimation, viewer interest, face tracking, attention, user profiling 1. Introduction In this work we will focus on the analysis of a user sitting in front of his television. It will give us information on the spectator behavior. What draw the user interest? 
The analysis of the interest that the user brings to his environment is significant for the user profiling. This information can be known or estimated by different methods. The best way is to get a rapid estimation of the interest based on gaze direction and also the duration of the gaze fixation in this direction. Once the interest 17 head of the actor and they are tracked through multiple cameras. The markers are often colored dots or infrared reflective markers and the cameras depend on the markers type. Accurate tracking requires multiple cameras and specific software to compute head pose estimation but these systems are very expensive and complex, and they need for precise positioning of markers and calibration (Optitrack [4], Qualisys [5]). Marker-less tracking is another approach for face motion capture and a wide range of methods exists. Some marker-less equipment uses infrared cameras to compute tracking of characteristic points. For example, FaceLAB gives the head orientation and the position of lips, eyes and eyebrows [2]. But there are also algorithms using only a consumer webcam. We can cite Facetracker using PNP [6] and FaceAPI [2]. Marker-less systems use classical cameras or infrared cameras to compute tracking of characteristic points. Based on consumer infrared camera, we can cite the Microsoft KinectV1 SDK [7]. The KinectV1 SDK is free, easy to use and contains multiple tools for user tracking such as face tracking and head pose estimation. These tools combine 2D and 3D information obtained with the KinectV1 sensor. Based on 3D consumer sensor there are also methods using random regression forest for head pose estimation from only depth images [8]. In this work we choose to use the KinectV2 with the new version of the SDK [9]. The KinectV2 is composed by a color camera (1080p) and a depth sensor (512x424 pixels). The technology behind the new sensor is infrared TOF for time of flight. This sensor measures the time it takes for pulses of laser light to travel from the laser projector to a target surface, and then back to an image sensor. Based on this measure, the sensor gives a depth map. To achieve head pose, at least the upper part of the user's KinectV2 skeleton has to be tracked in order to identify the position of the head. The position of the head is located using the head pivot from the 3D skeleton only on the depth map. The head pose estimation is based on the face tracking and it is achieved on the color images. Consequently, the face tracking is dependent on the light conditions, even if KinectV2 is stable into darker light conditions. user. Therefore, the direction of the face gives a good indication on the look when it is not possible to clearly get the direction of the eyes. Intersection of the head direction with the screen Intersection of the eye-gaze with the screen Figure 1. The head direction and the eye-gaze are highly correlated. The gaze estimation can be achieved by calculating the orientation of the head, and these rotations have physiological limits and specific names (Figure 2). For an average user, the range of motion extending from the sagittal flexing of the head to the extension (head movement from the front to the rear) is about 60° to 70°. This movement is more commonly called “Pitch”. Regarding the front lateral flexion (movement from right to left when looking ahead), it occurs around 40° in each direction and is called “Roll”. 
The last movement, a horizontal axial rotation (head motion by looking horizontally from right to left), is around 78° in each direction [3] and is named “Yaw”. All the motions of head rotation can be obtained by combining these angles. In the animation industry, head pose estimation and head movements are almost exclusively captured with physical sensors and optical analysis. Physical sensors such as gyroscopes, accelerometers and magnetometers are placed on the head to compute the head rotation. Figure 2. The 3 different degrees of freedom: pitch, roll and yaw [7]. Another way for head pose estimation is marker-based optical motion capture. These systems are able to capture the subtlety of the motion because the markers are placed on the 18 3. Fixing duration and direction for interest measurements TV KinectV2 Based on the gaze estimation, or in this case on the head direction, it is possible to measure the interest of a person to a specific element of his environment by calculating the intersection between the direction of the face and a 2D plane (or a 3D volume). In this case, the TV screen will be represented by a 2D plane and another 2D plane will be used for the second screen. The previous head-pose estimation will give an indication on what the user is looking at. In a visual attention to television context, a study showed that there are four types of behavior depending on the fixing duration [10]. This classification is given in Table 1. Tablet Figure 3. User watches the TV with a tablet in his hand (second screen). The head is about 2.5 meters from the TV. 4.2 User interaction and media The system allows us to know when the user watches the TV (main screen) or the tablet (second screen) using the interest durations given in point 3. When the user does not watch TV, the image is hidden but the media runs because the user hears the sound. The user can use a keyboard to control the player and navigate into the video enrichment displayed next to the player. The video used for the tests is a mashup of CNN Student News. It has been enriched with links to related web pages that are displayed next to the video. Table 1. Attentive behavior over time. Duration ≤1.5 sec. Behavior Monitoring 1.5 sec. to 5.5 sec. Orienting 5.5 sec. to 15 sec. Engaged >15 sec. Stares These four measures of attention correspond to be firstly attracted by something with “monitoring behavior”, and then intrigued, “orienting behavior”, and more time passes more the user becomes interested, “engaged behavior”, and beyond 15 seconds the user is captivated with a “staring behavior”. These measures have been established for a TV watching and used to correctly describe the interaction with one or more screens. 4.3 Behavior analyses When the user comes into the field of view of the KinectV2, placed under the TV, his skeleton is tracked and the head orientation is estimated. The Tracker Computer performs the process and determines what the user is watching with an accuracy of a few centimeters: Screen 1 (video player or list of enrichement), Screen 2 or elsewhere. These informations are completed by attentive behavior over time and are sent to the Player Server (Figures 4 & 5). The televison displays the web page from the player server containing the media player accompanied by enrichments related to playing video segment [11]. The working flow structure is given on Figure 4. 4. Experimental setup 4.1 Placement The purpose of the experiment is to get a maximum of information on user implicit behavior in front of the TV. 
Several users watch a TV (main screen) and need in the same time to focus on some of the content while playing to a tablet game (second screen). The sofa is installed 2 meters away from the TV which is equipped with a KinectV2 (Figure 3). Figure 4. Overall working flow. 19 Acknowledgements 4.4 User Profiling The data coming from tracking is related to each video segment through the server player (Figure 5). The User Profiling module receives measures of interest (or disinterest) for each video segment. A score of interest could be calculated for each keyword from the profiling list. This score list allows to establish the user profile and to know by what the user is interested. This work is supported by the LinkedTV Project funded by the European Commission through the 7th Framework Program (FP7- 287911) and by the funds “Region Wallonne” WBHealth for the Project RobiGame. References [1] S. Langton, H. Honeyman, and E. Tessler, “The influence of head contour and nose angle on the perception of eye-gaze direction,” Perception and Psychophysics, vol. 66, no. 5, pp. 752–771, 2004 [2] Seeing Machines. FaceLAB5 & FaceAPI; Face and eye tracking application, 2011. [3] V. Ferrario, et al. “Active range of motion of the head and cervical spine: a threedimensional investigation in healthy young adults,” J. Orthopaedic Research, vol. 20, no. 1, pages. 122–129, 2002. [4] OptiTrack, Optical motion tracking solutions. www.optitrack.com/ accessed on 23/03/2015 [5] Qualisys. Products and services based on optical mocap. http://www.qualisys.com accessed on 19/03/2015 [6] F. Rocca, et al. Head Pose Estimation by Perspective-n-Point Solution Based on 2D Markerless Face Tracking. Intelligent Technologies for Interactive Entertainment: 6th International ICST Conference, 2014 [7] Microsoft Kinect Software Development Kit, www.microsoft.com/en-us/kinectfo rwindowsdev/Start.aspx, on 20/02/2015 [8] G. Fanelli, J. Gall, and L. Van Gool. Real time head pose estimation with random regression forests. CVPR, 617-624, 2011. [9] KinectV2 SDK, https://msdn.microsoft. com/en-us/library/dn799271.aspx, accessed on 18/03/2015 [10] R. Hawkins, S. Pingree, et al. What Produces Television Attention and Attention Style?. Human Communication Research, 31(1), pages 162-187, 2005 [11] J. Kuchař and T. Kliegr. GAIN: web service for user tracking and preference learning - a smart TV use case. RecSys '13. In Proceedings of the 7th ACM conference on Recommender systems. ACM, New York, NY, USA, 467-468. 2013 Figure 5. User interest is sent from the User Tracker to the User Profiling trough the Player Server. At the end of each session we get events timeline: the interest value for each screen, player control, etc. This will update the list of the score on the user profile (Figure 6). Figure 6. Event timeline for one session of 15 minutes 5. Conclusion In this paper, we have described a marker-less motion capture system for implicit behaviour analysis in a TV context using a consumer depth camera. This system allows us to establish and update a user profile using user interest based on head pose estimation and using the duration of fixation separated into 4 levels of attention. The aim is to identify moments of attentive focus in a non-invasive way to dynamically improve the user profile by detecting which parts of the media have drawn the user attention. 
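A minimal sketch of the two operations behind the interest measure of Sections 3 and 4 is given below: intersecting the head-direction ray with the plane of a screen, and classifying the resulting fixation duration according to Table 1. The code is illustrative only and is not the authors' implementation; the helper names, the example head position and the screen geometry are assumptions, with the 2.5 m viewing distance taken from Figure 3 and the duration thresholds taken from Table 1.

```python
# Head-direction ray vs. screen plane, plus the attention classes of Table 1.
import numpy as np

def hit_point(head_pos, head_dir, plane_point, plane_normal):
    """Return the intersection of the head-direction ray with a screen plane,
    or None when the user looks away from (or parallel to) the plane."""
    denom = np.dot(head_dir, plane_normal)
    if abs(denom) < 1e-9:
        return None                      # gaze parallel to the screen
    t = np.dot(plane_point - head_pos, plane_normal) / denom
    return None if t <= 0 else head_pos + t * head_dir

def attention_class(fixation_seconds):
    """Attentive behaviour over time (Table 1)."""
    if fixation_seconds <= 1.5:
        return "monitoring"
    if fixation_seconds <= 5.5:
        return "orienting"
    if fixation_seconds <= 15.0:
        return "engaged"
    return "stares"

# TV screen modelled as a plane about 2.5 m in front of the user (cf. Figure 3).
head = np.array([0.0, 1.2, 0.0])         # head pivot from the skeleton (m)
gaze = np.array([0.0, -0.05, 1.0])       # head direction, pitched slightly down
gaze = gaze / np.linalg.norm(gaze)
p = hit_point(head, gaze,
              plane_point=np.array([0.0, 1.0, 2.5]),
              plane_normal=np.array([0.0, 0.0, -1.0]))
print(p, attention_class(7.0))           # -> intersection point, "engaged"
```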
20 Robust Space-Time Footsteps for Agent-Based Steering Glen Berseth University of British Columbia Mubbasir Kapadia Rutgers University Petros Faloutsos York University Abstract body and thus achieve denser crowd packing [1]. While footstep-based steering is known to remove many artifacts from the problem of mapping human motion to the steering behaviour, there are still a number of challenges researchers face when adopting more complex agent models. In a standard footstep-based steering approach, an A* algorithm is used to find optimal paths from the agent’s current location to the agent’s target location as a sequence of footsteps. Yet, how do we know the types of footsteps to use or what is a good stepping distance range, or even how should we initially configure an agent to ensure it can reach its target? Recent agent-based steering methods abandon the standard particle abstraction of an agent’s locomotion abilities and employ more complex models from timed footsteps to physics-based controllers. These models often provide the action space of an optimal search method that plans a sequence of steering actions for each agent that minimize a performance criterion. The transition from particle-based models to more complex models is not straightforward and gives rise to a number of technical challenges. For example, a disk geometry is constant, symmetric and convex, while a footstep model maybe non-convex and dynamic. In this paper, we identify general challenges associated with footstep-based steering approaches and present a new space-time footstep planning steering model that is robust to challenging scenario configurations. Keywords: Crowd Simulation, Footsteps, Planning and Analysis We focus on the issue of making a footstepbased steering algorithm resilient to environment configuration. Specifically, we present a robust footstep-based steering algorithm to avoid invalid initial configuration and to prune undesirable and potentially unsound short term goal states. This is done in two steps. First, geometric checks are used when adding an agent to a scenario, ensuring the agent can make an initial step. Second, we add constrained random footsteps to handle cases where pre-defined step intervals can result in an inability to find a plan. These, together with the properties of the A* search method, construct a more robust steering strategy. 1 Introduction While traditionally sliding particle models have been the standard for crowd simulation, there has been recent interest in footstep-based steering. Footstep-based models do not suffer from the footskate issue, where feet turn or slide while in contact with the ground. They can easily incorporate dynamic representations of the agent’s 21 2 Related Work ditional check to make sure the agent can make a footstep from its initial configuration. Given the forward direction of the agent, a rectangular region can be traced out in front of the agent. An example initial geometry check is illustrated in Figure 1(b) that detects overlap with nearby obstacles. These two properties together ensure that the agent does not start intersecting any geometry and will be able to make an initial step. Sliding particle methods [2, 3] model the agent as a disk centred at the particle’s location. There are a number of issues when driving bipedal characters with only position information. Sliding disks can instantly change their forward direction, which is not natural for a biped. 
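A minimal sketch of the two placement checks described in Section 3 is given below, assuming, for brevity, disk-shaped obstacles instead of the box obstacles used in the paper: the agent disk must not overlap any obstacle, and a rectangular region traced out along the agent's forward direction must be clear so that an initial footstep exists. Function names and the rectangle dimensions are illustrative assumptions.

```python
# Initial-placement validity: no disk overlap and a clear forward rectangle.
import math

def disk_overlap(p, r, obs_center, obs_radius):
    """Agent disk of radius r at p overlaps a disk-approximated obstacle."""
    return math.dist(p, obs_center) < r + obs_radius

def forward_rect_clear(p, forward, rect_len, rect_width, obs_center, obs_radius):
    """True when the oriented rectangle in front of the agent does not
    intersect the (disk-approximated) obstacle."""
    fx, fy = forward                       # unit forward direction
    # Express the obstacle centre in the agent's local frame
    # (x along forward, y along the left-hand normal).
    dx, dy = obs_center[0] - p[0], obs_center[1] - p[1]
    local_x = dx * fx + dy * fy
    local_y = -dx * fy + dy * fx
    # Closest point of the rectangle [0, rect_len] x [-w/2, w/2] to the centre.
    cx = min(max(local_x, 0.0), rect_len)
    cy = min(max(local_y, -rect_width / 2), rect_width / 2)
    return math.hypot(local_x - cx, local_y - cy) >= obs_radius

def valid_initial_configuration(p, r, forward, obstacles,
                                rect_len=0.8, rect_width=0.5):
    """Both checks must hold against every obstacle (dimensions are examples)."""
    return all(not disk_overlap(p, r, c, ro) and
               forward_rect_clear(p, forward, rect_len, rect_width, c, ro)
               for c, ro in obstacles)

# Agent at the origin facing +x, one obstacle half a metre ahead.
print(valid_initial_configuration((0.0, 0.0), 0.25, (1.0, 0.0),
                                  [((0.5, 0.0), 0.2)]))   # -> False
```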
For people, complex interactions occur at doorways where tendency is to step aside for oncoming traffic, both disc models may try to push through the doorway at the same time. Resulting in an unusual sliding contact motion between the agent. Footstep-based models do not suffer from these issues and have been used in dynamic environments [4]. Footstep-based planning is used by both the computer animation community [1] and the robotics community. This work uses a similar method to both of these works but focuses on creating an algorithm that can endure random environment layouts and agent state configurations. 4 Robust Footstep Planning In a footstep-based steering method, a plan is computed between two locations that is free of collisions. The actions in the plan can be understood as a sequence of footstep actions in spacetime hstep0 . . . stepn i. The state of a footstep-based agent is defined as st = {(x, y), (ẋ, ẏ), ( fx , fy ), fφ , I ∈ {L, R}}. (1) Where (x, y) and (ẋ, ẏ) are the position and velocity of the centre of mass. The current footstep is described by the location ( fx , fy ), orientation fφ , and foot I ∈ {L, R}. Potential actions are created, using an inverted pendulum model, between states by considering an action with orientation φ , velocity and time duration. Each step has a cost related to step length and ground reaction forces. The heuristic function is then a combination of the expected cost of the step and the number of steps left to reach the goal. 3 Initial Agent Placement Extracting a valid plan to for a footstep-based steering agent is still a poorly understood problem; an infinite number of possible plans exists that could lead the agent to its target. To understand the conditions necessary to ensure a plan can be found we must analyze the problem inductively. A scenario s is a collection of agents A and obstacles O. In traditional crowd simulation the geometry of an agent is a disk. In footstep-based models the geometry for an agent is dynamic depending on the current state of the agent. This dynamic geometry suffers from complex configurations that can result in an invalid state where the agent can not proceed without colliding with an obstacle. To add agents to a scenario, particular locations could be hand selected but the most versatile method would be to add agents randomly. In order to add an agent to a scenario two properties must be satisfied. The first, which is true for any type of crowd simulation algorithm, is that the agent must not overlap any items in the scenario. A footstep-based model needs an ad- 4.1 Improved Footstep Sampling The A* planning algorithm in a footstep-based steering model is used to compute safe navigation decisions during simulation. The successor states that are generated using an A* model can be ad-hoc, with different footstep angles and durations at fixed intervals [1]. However, there is an infinite number of geometry combinations in a scenario and it is simple to construct an example scenario where an ad-hoc method can not find a valid step when many exist. To make the planning system more robust, randomized footsteps are introduced. The randomized angle orientations (in radians) and step time lengths are limited to be between 0.3 and 1.3. An example is shown in Figure 1(c), where an agent starts 22 more realistic plan. 4.3 Additional Footstep Types (a) (b) (c) (d) The last feature of the planning system is a new footstep style. 
In lieu of common forward stepping at different angles, footsteps that simulate in-place turning can be used. In-place turning is done by allowing the agent to take steps where the agent’s heels are close, with the feet being nearly perpendicular or the next stance foot is placed pointing inward, as shown in Figure 1(d). 5 Analysis and Results A group of metrics similar to [5] are used to compare footstep-based algorithms. These metrics use the concept of a reference-agent, Other agents added to the scenario make the scenario more challenging for the reference-agent. Scenario specific metrics are defined and then aggregate metrics over a sizable sampling of 10, 000 scenarios (S) are used. For a single scenario s, when the referenceagent reaches its target location before the max simulation time expires1 , completed is 1 and 0 otherwise. The second metric, solved, is 1 when completed is 1 and the reference-agent reaches its target location without any collisions, otherwise solved = 0. These metrics are aggregated over S with completion = ∑s∈S completed(s) and coverage = ∑s∈S solved(s). An average of completed is used over all agents in a scenario as all-completed that is equal to the percentage of agents in the simulation that have completed = 1. Similarly, all-solved is an average for solved. To measure computational performance, the time spent simulating S is computed, denoted as simTime. Three versions of footstep-based algorithms are compared. The first version is a common footstep-based method baseline [1]. The second version, baseline-with-randomization, is the baseline method with randomized footstep actions. The final version of the algorithm, robust, includes both randomized footstep actions and geometric checks. Using a combination of metrics, comparisons are made as to the effective- Figure 1: Corner cases that are avoided by the robust algorithm. The left foot is blue, the right green, the dashed box is the geometry overlap check and the green star is a generated waypoint. The navy blue squares are obstacles. in a corner and can escape after a feasable random step is generated. By adding randomized step angles and step distances, the algorithm achieves better theoretical properties for steering in any possible scenario configuration. 4.2 Finite Horizon Planning When planning is used, the path found is guaranteed to be sound from beginning to end. However, it is common to mix long range global planning with a more dynamic finite horizon planing between waypoints. When using finite horizon planning, the final state of the composed plan can put the agent in a state were the agent can not make a step. An example of an agent getting stuck straddling an obstacle is illustrated in Figure 1(a). These invalid states can be avoided by applying the same configuration checks used when an agent is randomly placed in a scenario, at the end of every short term plan. Additional optimizations can be placed on short term planning to improve efficiency and fitness. The first of these is to never execute the final action in a short term plan unless that action is a final goal state. The geometry configuration validation ensures there will be a possible footstep, but re-planning one step earlier results in a 1 We give an agent more than enough time to navigate around the boundary of the scenario twice. 23 crease the computational performance of the algorithm, as undesired search areas are avoided. Limitations A simple rectangular geometry is used to validate initial configurations. 
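The mixture of fixed, randomized and in-place footstep actions of Sections 4.1 and 4.3 can be sketched as follows (our own illustration, not the authors' code). The fixed action set and the in-place turn parameters are assumptions, while the [0.3, 1.3] range for randomized step angles and durations is taken from the paper.

```python
# Candidate footstep actions considered when expanding a node of the plan.
import random
from dataclasses import dataclass

@dataclass
class FootstepAction:
    angle: float      # step orientation relative to the current heading (rad)
    duration: float   # step time length (s)
    in_place: bool    # in-place turning step (Section 4.3)

# An ad-hoc fixed action set at regular intervals (values are illustrative).
FIXED_ACTIONS = [FootstepAction(a, d, False)
                 for a in (-0.6, -0.3, 0.0, 0.3, 0.6)
                 for d in (0.4, 0.7, 1.0)]

def successor_actions(n_random=4, seed=None):
    """Fixed actions, a few constrained random steps whose angle magnitude and
    duration are drawn from [0.3, 1.3] (Section 4.1), and in-place turns."""
    rng = random.Random(seed)
    random_steps = [FootstepAction(rng.choice((-1, 1)) * rng.uniform(0.3, 1.3),
                                   rng.uniform(0.3, 1.3), False)
                    for _ in range(n_random)]
    turns = [FootstepAction(a, 0.5, True) for a in (-1.4, 1.4)]
    return FIXED_ACTIONS + random_steps + turns

print(len(successor_actions(seed=0)))   # 15 fixed + 4 random + 2 turns = 21
```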
It might be possible to achieve 100% all-completed by using a more complex geometry check. Other metrics could be used to compare the algorithms, such as ground truth similarity. Future Work Any-time planning algorithms could be used to increase the acceptability of footstep-based steering methods. It is possible to further increase the coverage of the algorithm using a method such as [6] to optimize the parameters of the steering algorithm. ness of each footstep-based algorithm. The results of these comparisons can be seen in Figure 2. Figure 2: A comparison of three footstep-based algorithms, using five performance metrics. References Notably, the largest increase in fitness comes from adding randomness and in-place turning to the algorithm. These two features together allow the algorithm to generate a wider variety of possible footsteps and increase coverage from 37% to 81%. Adding geometry checks to the algorithm increases the overall completion to a perfect 100% from an original 46%. The robust version of the algorithm is surprisingly capable. After simulating ∼ 45, 100 agents, only ∼ 150 agents do not reach their target locations. The robust algorithm has significant computational performance improvements over baseline. By using random footstep actions, the algorithm explores the search space more resourcefully, avoiding locally optimal regions in favour of smoother, lower effort, plans. With geometrical checks pruning undesired branches from the search space, the final algorithm simulates the 10, 000 scenarios in ∼ 1/5 the time. [1] Shawn Singh, Mubbasir Kapadia, Glenn Reinman, and Petros Faloutsos. Footstep navigation for dynamic crowds. CAVW, 22(2-3):151–158, 2011. [2] Shawn Singh, Mubbasir Kapadia, Petros Faloutsos, and Glenn Reinman. An open framework for developing, evaluating, and sharing steering algorithms. In MIG, pages 158–169, 2009. [3] Mubbasir Kapadia, Shawn Singh, William Hewlett, and Petros Faloutsos. Egocentric affordance fields in pedestrian steering. In ACM SIGGRAPH I3D, pages 215–223, 2009. [4] Mubbasir Kapadia, Alejandro Beacco, Francisco Garcia, Vivek Reddy, Nuria Pelechano, and Norman I. Badler. Multidomain real-time planning in dynamic environments. SCA ’13, pages 115–124, New York, NY, USA, 2013. ACM. 6 Conclusion [5] Mubbasir Kapadia, Matthew Wang, Shawn Singh, Glenn Reinman, and Petros Faloutsos. Scenario space: Characterizing coverage, quality, and failure of steering algorithms. In ACM SIGGRAPH/EG SCA, 2011. A more sound footstep-based steering method has been presented. This method has been analyzed and compared to a common version of the algorithm using numerical analysis with metrics for completion and coverage. The new method is found to be excellent at avoiding invalid states and almost perfect at completing simulations. The most significant improvement to the algorithm comes from adding random and in-place stepping features. These new features also in- [6] G. Berseth, M. Kapadia, B. Haworth, and P. Faloutsos. SteerFit: Automated Parameter Fitting for Steering Algorithms. In Vladlen Koltun and Eftychios Sifakis, editors, Proceedings of ACM SIGGRAPH/EG SCA, pages 113–122, 2014. 24 Avatar Chat: A Prototype of a Multi-Channel Pseudo Real-time Communication System Kei Tanaka Dai Hasegawa Martin J. Dürst Hiroshi Sakuta Aoyama Gakuin University c5614135@aoyama.jp, {hasegawa, duerst, sakuta}@it.aoyama.ac.jp communication. 
When you send a message such as “Ooh really?” and receive a message such as “Yes”, it is difficult to distinguish between strong agreement and disgust. Abstract Increasingly the Internet and mobile terminals become our common tools. Text-based chat communication in mobile terminals is now widely used because of its convenience of asynchronicity. However, it is difficult for an asynchronous text-based chat system to realize rich and lively aspects of communication such as bodily information exchange (gesture, facial expression, etc.) and temporal information exchange (overlap of utterances and speech timing). In this paper, we propose a text-based chat system that allows us to deliver nonverbal information and temporal information without losing convenience (asynchronicity). To do so, we will introduce avatars and a virtual time axis into a conventional chat system. Figure 1 : Communication Style Face-to-face real-time communication encompasses non-verbal communication with bodily information and timing information. Bodily information includes gestures and facial expressions. It is thought that by using this information, emotions and feelings can be transmitted more accurately. But non-face-toface text-based chat systems cannot transmit such information. Emoticons and illustration images are used to transmit bodily information such as gestures and facial expressions. But their expressiveness is limited. Time information includes overlaps of utterances, speech pauses, and utterance speed. This characteristic is elucidated by the study of timing information in audio speech dialog. For example, Kawashima et al. [1] confirm that the presence and length of speech pauses in comic dialogue depends on the function and purpose of the utterances. Also Uchida’s study [2] considers what kind of influence utterance Keywords: Avatar, chat, communication, nonverbal information 1. Introduction Conventional communication is face-to-face communication. But with the spread of mobile terminals and the Internet, text-based chat communication came to be widely used. Communication using mobile terminals is highly convenient because it is not limited by location and time. It is an advantage that chat communication has few burdens of mind and restrictions to communication are weak because there is no need to reply immediately like in face-to-face communication. However, non-face-to-face text-based chat communication is limited in information carrying capacity in comparison with face-to-face real-time 25 speed and speech pause has on the perception of a speaker’s character, and concludes that the impression of the speaker is changed when changing the utterance speed. In text-based chat systems, users exchange texts of a certain size in turns. This makes it difficult to express the overlap of utterances. 2.1 Chat system expressing time information Yamada [3] proposes a text-based chat system allowing synchronous communication. Messages are displayed in single-word units as they are being input by keyboard. This system allows real time communication without time loss when typing, making it possible to express overlaps of utterances and speech timing. But it requires the user to continuously watch the screen to not miss the timing of each message. Figure 2 : Chat using AR avatar Many text-based chat systems can be used as synchronous-like communication tools and therefore have a higher immediacy than asynchronous communication systems such as bulletin boards. But there is still a time lag due to typing. 
Therefore, these systems are not real time in the same way as face-to-face communication. Also, such systems do not allow replay; therefore, timing information is not conserve and cannot be re-examined. Figure 1 shows the current state of the trade-off between richness of convenience of various communication media, and our proposal to improve it with avatar-mediated chat. To solve the problems identified above, we propose an online chat system using avatars as shown in Figure 2. This system can transmit bodily information by avatar animation and time information such as speech timing. The overall communication is captured in a log called a Communication Log. This Communication Log is extended by the participants by adding new utterances and accompanying animations at the end. Figure 3 : UI configuration Therefore the advantage of text-based chat systems - that it is not necessary to reply immediately - is lost. Also, because this system is purely text-based, it cannot transmit bodily information. 2.2 Chat system expressing emotion Kusumi et al. [4] describe an online chat communication system where an avatar is used to express part of the bodily information in three-dimensional virtual space. This system can let an avatar express emotion by selecting icon of emotion. This system can express emotion by using avatar animation, but it does neither allow to express timing information, nor to communicate synchronously. 3. AR avatar chat system In this chapter, we explain the structure of our avatar chat system which produces pseudo real-time communication. 2. Related Research 3.1 System overview The system’s user interface is shown in Figure 3. The system consists of a text input area, a list of selectable animations, an area for text display, and an area for avatar display. We In this chapter, we discuss related research for the purpose of transmission of the nonverbal information. 26 show the procedure of chat communication using this system. 1. User inputs a text message. 2. User selects an animation to be performed by the avatar from the list of animations. 3. User A chooses the timing of the start of the animation (relative to the last message from User B) using a slider. 4. User A sends the message with the related timing and animation parameters by pushing the submit button. 5. A selected number of past animations including the newest one sent by User A is played to User B. These animations are replayed continuously replayed until they are amended by a new utterance. 6. It is now User B’s turn to add a text message and an animation and choose the time delay (between 0 and 5 seconds) for the utterance, and send the message by pushing the submit button. 7. The users repeat steps 1 through 6. Users input text messages by tapping a text input area. The maximum length of a text message is 30 characters. The user can select the animation that for the avatar by pushing an animation button from the list of animations. The selected animation is displayed in the animation confirmation area, which allows the user to confirm the animation. The remark display area displays messages that have been transmitted and received. Received messages are displayed left-aligned, and sent messages are displayed right-aligned. Only one message is displayed at any given time. Messages are displayed for 2 seconds unless interrupted earlier. The animated avatars are displayed an AR markers in the avatar display area. 
It is usual to displays one’s own avatar on right side, and the avatar of the correspondent on the left side. notifications and stores this information in its database. The animation display module obtains the specified parameters the number of past utterances specified by the user from the parameter database. It performs the animation of the avatars and displays a message at same time. 3.3 Bodily information display function This system make avatar perform animation to express bodily information. We created 12 types of animation, each with a duration between 1 and 2 seconds, as show in Figure 5. Figure 4 : System configuration Figure 5 : Animation list We base our selection of animations on a study by Inoue et al. [5]. They extracted a set of 100 scenes, each of about 3 seconds in length, showing expression of emotions from the series “The Simpsons”. A set of 60 emotionrelated words from previous research was reduced to 37 emotional words for use in a questionnaire. Each scene’s emotional content was matched against each of the 37 words with a four-step scale. Based on the questionnaire results, the scenes were classified into five categories using factor analysis. The five emotion categories are: Negative emotion of introversion, negative emotion of extroversion, positive feelings, tense feelings, and feelings of lost interest. Based on these results, we created 3.2 System configuration The system is implemented as a Web application using Android terminals as shown in Figure 4. The application each terminal consists of a data transmission module, a data receiver module and an animation display module. The data transmission module transmits parameters that consist of input message, selected animation, display timing of animation and user information to the server by using TCP. The data receiver module receives the parameters transmitted by push 27 the “Sad” animation to express negative emotion of introversion (sadness, disappointment) by reference to that result, and the “Shout” animation to express negative emotion of extroversion, and the “Fun” animation to express positive feelings. We also created the “Shock” and “Oops” animations to express surprise and tense feeling, and the “Bored” animation to express feelings of lost interest. 4. Conclusions In this study, we proposed and implemented a chat system for the purpose of achieving virtual synchronous communication and transmitting non-verbal information. Synchronous and non-verbal communication have been both difficult to be transmitted in conventional text-based chat systems, all while maintaining the advantage of a chat system to not require immediate reply. Non-verbal bodily information is transmitted by using animated avatars. Timing information is expressed by placing the animations on a virtual timeline that can be repeatedly replayed. The next steps in our research will be evaluation and improvement of the UI, followed by evaluation experiments to investigate the influence of the timing information on conversation content and efficiency. Figure 6 : Selectable timing range of newly added animation References In addition we created 5 animations to express modalities that show utterance attitude, communication attitude to towards the listener, and attitude towards the contents of the proposition. 
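A compact sketch of the Communication Log of Sections 3.1 and 3.4 is given below (written for this text, not taken from the system). Class and field names are assumptions, while the 30-character message limit and the 0–5 s start offset on the virtual time axis follow the paper.

```python
# Communication Log on a virtual time axis, replayable as pseudo real-time chat.
from dataclasses import dataclass
from typing import List

MAX_TEXT_LEN = 30          # characters per message (Section 3.1)
MAX_OFFSET = 5.0           # seconds after the previous utterance (Section 3.4)

@dataclass
class Utterance:
    user: str
    text: str
    animation: str          # e.g. "Fun", "Sad", "Yes" (Figure 5)
    start: float            # position on the virtual time axis (s)

class CommunicationLog:
    def __init__(self):
        self.utterances: List[Utterance] = []

    def add(self, user, text, animation, offset):
        """Append a new utterance; overlapping speech is possible because the
        offset may be shorter than the previous animation's duration."""
        text = text[:MAX_TEXT_LEN]
        offset = min(max(offset, 0.0), MAX_OFFSET)
        prev_start = self.utterances[-1].start if self.utterances else 0.0
        self.utterances.append(Utterance(user, text, animation,
                                         prev_start + offset))

    def replay(self):
        """Return utterances in virtual-time order for playback."""
        return sorted(self.utterances, key=lambda u: u.start)

log = CommunicationLog()
log.add("A", "Ooh really?", "Shock", 0.0)
log.add("B", "Yes", "Yeah", 0.8)        # reply overlaps the previous animation
for u in log.replay():
    print(f"{u.start:4.1f}s  {u.user}: {u.text} [{u.animation}]")
```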
In particular, we created the “Why” animation to express request clarification, the “?” animation to express a query, the “Yes” animation to express agreement, the “Yeah” animation to express strong agreement, and the “No” animation to express disagreement and. [1] H. Kawashima, L. Scogging, and T. Matsuyama. Analysis of the Dynamic Structure of Manzai: Toward a Natural Utterance-Timing Control. Human Interface Journal, 9(3), 379-390, 2007-08-25, in Japanese [2] T. Uchida. Impression of Speaker's Personality and the Naturalistic Qualities of Speech: Speech Rate and Pause Duration. Educational psychology research, 53(1), 113, 2005-03-31, in Japanese [3] Y. Yamada and Y. Takeuchi. Development of Free Turn Chatting System and Analysis of the dynamics of social dialog. Technical Report of the Institute of Electronics, Information and Communication Engineers, HCS, 102(734), 19-24, 2003-03-11, in Japanese [4] T. Kusumi, H. Komeda, and T. Kojima. Improving Communication in 3D-MUD (Multi User Dungeon) by Using Avatar's Facial Expression Features. Journal of the Japan Society for Educational Technology, 31(4), 415-424, 2008-03-10, in Japanese 3.4 Time information display function In order to control the speech timing, this system expresses its virtual time axis with a time seek bar. Users communicate by connecting avatar animations. The virtual time of the animation does not proceed until a new animation is added. The communication log that was created seems to be perform real-time communication when the log is played back to the users. This makes it possible to express speech timing because users arbitrarily decide the timing of the newly added animation. A reply with its animation can be started between 0 and 5 seconds after the start of the last utterance. This allows to express the interruption of past utterances by overlapping utterances as shown in Figure 6. 28 On Streams and Incentives: A Synthesis of Individual and Collective Crowd Motion Arthur van Goethem Norman Jaklin Atlas Cook IV Roland Geraerts TU Eindhoven Utrecht University University of Hawaii at Manoa Utrecht University a.i.v.goethem@tue.nl n.s.jaklin@uu.nl acook4@hawaii.edu r.j.geraerts@uu.nl Figure 1: Left: A dense crowd of agents collaboratively moves through a narrow doorway. Right: A 2D representation of the doorway shows that each agent interpolates between individual behavior (green) and coordinated behavior (red). Abstract We present a crowd simulation model that combines the advantages of agent-based and flow-based paradigms while only relying on local information. Our model can handle arbitrary and dynamically changing crowd densities, and it enables agents to gradually interpolate between individual and coordinated behavior. Our model can be used with any existing global path planning and local collision-avoidance method. We show that our model reduces the occurrence of deadlocks and yields visually convincing crowd behavior for high-density scenarios while maintaining individual agent behavior at lower densities. Keywords: crowd simulation, multi-agent system, autonomous virtual agents 1 Introduction Crowd simulation models can be divided into agent-based simulations and flow-based simulations. Agent-based simulations focus on the behaviors of each individual in the crowd. While these methods usually work well at low to medium densities, they struggle when handling high crowd densities due to a lack of coordination between the agents. 
29 By contrast, flow-based simulations aim at simulating collective emergent phenomena by treating a crowd as one large entity. These techniques typically perform well with high-density scenarios because they facilitate a high level of coordination among the agents. However, they struggle to handle low- to medium-density scenarios because they omit the individuality of the crowd members. Contributions. We propose a new model that combines the advantages of agent-based and flow-based paradigms while only relying on local information. It enables the simulation of large numbers of virtual agents at arbitrary and dynamically changing crowd densities. Our technique preserves the individuality of each agent in any virtual 2D or multi-layered 3D environment. The model performs as well as existing agent-based models that focus on low- to medium-density scenarios, while also enabling the simulation of large crowds in highly dense situations without any additional requirements or user interference. Compared to existing agent-based models, our model significantly reduces the occurrence of deadlocks in extremely dense scenarios. Our model is flexible and supports existing methods for computing global paths, simulating an agent’s individual behavior, and avoiding collisions with other agents. Furthermore, it yields energy-efficient and more realistic crowd movement that displays emergent crowd phenomena such as lane formation and the edge effect [1]. 2 Overview of our model We represent each agent as a disk with a variable radius. The center of the disk is the current position of the agent. Each agent has a field of view (FOV), which is a cone stretching out from the agent’s current position, centered on the agent’s current velocity vector and bounded by both a maximum lookahead distance dmax = 8 meters and a maximum viewing angle φ = 180◦ . Let A be an arbitrary agent. We perform the following five steps in each simulation cycle: 1. We compute an individual velocity for agent A. It represents the velocity A would choose if no other agents were in sight. Our model is independent of the exact method that is used. 2. We compute the local crowd density that agent A can perceive; see Section 3.1. 3. We compute the locally perceived stream velocity of agents near A; see Section 3.2. 4. We compute A’s incentive λ. This incentive is used to interpolate between the individual velocity from step 1 and the perceived stream velocity from step 3; see Section 3.3. 5. The interpolated velocity is passed to a collisionavoidance algorithm. Our model is independent of the exact method that is used. 3 Streams We define streams as flows of people that coordinate their movement by either aligning their paths or following each other. This leads to fewer collisions and abrupt changes in the direction of movement. A dominant factor is the local density ρ. 3.1 Computing local density information We use the agent’s FOV to compute ρ. We determine the set N of neighboring agents that have their current position inside A’s FOV. We sum up the area ∆(N ) occupied for each agent N ∈ N and divide it by the total area ∆(F OV ) of A’s FOV. A FOV occupied to one third can already be considered a highly crowded situation. Thus, we multiply our result by 3 and cap it at a maximum of 1. Formally, we define the crowd density ρ as follows: ρ := min ! X 3 ∆(N ), 1 . ∆(F OV ) N ∈N 3.2 The perceived stream velocity Let B be a single agent in A’s FOV, and let xA and xB be their current positions, respectively. 
We define the perceived velocity vper(A,B) as an interpolation between B’s velocity vB and a vector vdir(A,B) of the same length that points along the line of sight between A and B; see Figure 2. Let ρ ∈ [0, 1] be the −xA k local density in A’s FOV, and let dA,B = kxdBmax be the relative distance between A and B. A factor fA,B = ρ · dA,B is used to angularly interpolate between vB and vdir(A,B) . The larger ρ is the more A is inclined to pick a follow strategy rather than an alignment strategy. Let N5 be a set of up to 5 nearest neighbors of A. To avoid perceived stream velocities canceling each other out, we restrict the angle between the velocities of A and each neighbor to strictly less than π2 . We define the average perceived stream speed s as follows: X 1 · ||vper(A,N ) ||. (2) s := |N5 | N ∈N5 The locally perceived stream velocity vstream perceived by agent A is then defined as follows: P vper(A,N ) N ∈N5 vstream := s · P , (3) || vper(A,N ) || N ∈N5 3.3 Incentive The incentive λ is defined by four different factors: internal motivation γ, deviation Φ, local density ρ, and time spent τ . We simulate the behavior of an agent A in a way such that – aside from the internal motivation factor – the most dominant factor among Φ, ρ and τ has the highest impact on A0 s behavior. We define the incentive λ as follows: λ := γ + (1 − γ) · max Φ, (1 − ρ)3 , τ . (4) Internal motivation γ ∈ [0, 1] determines a minimum incentive that an agent has at all times. For the local density ρ, a non-linear relation with the incentive is desired, and we use (1 − ρ)3 . vper(A,B) vB vdir(A,B) xB vA B xA A (1) Figure 2: An example of the perceived velocity vper(A,B) based 30 on an interpolation between vB and vdir(A,B) . The deviation factor Φ makes agent A leave a stream when vstream deviates too much from vindiv . We use a threshold angle φmin . Whenever the angle between vstream and vindiv is smaller than φmin , the factor Φ will be 0. This yields stream behavior unless the other factors determine a different strategy. If the angle is greater than φmin , we gradually increase Φ up to a maximum deviation of 2φmin . Angles greater than this threshold correspond to a deviation factor of 1, thus yielding individual steering behavior. Let φdev be the smallest angle between vindiv and vstream . We define the deviation factor Φ as follows: ! φdev − φmin Φ := min max , 0 , 1 . (5) φmin The time spent factor τ is used to make stream behavior less attractive the longer it takes the agent to reach its goal. We initially calculate the expected time τexp agent A will need to get to its destination. How this is done depends on how A’s individual velocity is calculated, i.e. what method is used as a black box. We keep track of the actual simulation time τspent that has passed since A has started moving. We define the time spent factor τ as follows: ! τspent − τexp τ := min max , 0 , 1 . (6) τexp Figure 3: The different scenarios in our experiments are (from top to bottom): merging-streams, crossing-streams, hallway1, hallway2, narrow-50 and military. average time t spent by an agent. A lower score is considered to be a better result. We used six different scenarios; see Figure 3. Preferred speeds were randomly chosen between 0.85 and 2.05 meters per second. We have tested our model with three popular collision-avoidance methods [5, 6, 7]; see Figure 4. We have also compared our model to the same scenarios when only individual behavior is being displayed. 
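The quantities of Section 3 can be prototyped directly from Equations (1) and (4)–(6), as in the sketch below (ours, not the authors' implementation). The values of γ and φ_min, the speed-interpolation weight and all example numbers are assumptions; the density, deviation, time-spent and incentive formulas follow the paper.

```python
# One incentive update for an agent A: density, deviation, time, blending.
import numpy as np

def local_density(neighbor_areas, fov_area):
    """Eq. (1): a FOV occupied to one third already counts as fully crowded."""
    return min(3.0 * sum(neighbor_areas) / fov_area, 1.0)

def deviation_factor(phi_dev, phi_min):
    """Eq. (5): 0 below phi_min, 1 beyond 2*phi_min, linear in between."""
    return min(max((phi_dev - phi_min) / phi_min, 0.0), 1.0)

def time_factor(t_spent, t_expected):
    """Eq. (6): streams become less attractive the later the agent is."""
    return min(max((t_spent - t_expected) / t_expected, 0.0), 1.0)

def incentive(gamma, phi, rho, tau):
    """Eq. (4): the most dominant factor drives the agent's behaviour."""
    return gamma + (1.0 - gamma) * max(phi, (1.0 - rho) ** 3, tau)

def blended_velocity(v_indiv, v_stream, lam, phi_dev):
    """Rotate v_stream towards v_indiv by beta = phi_dev * lambda and
    interpolate the speeds (the interpolation weight lam is our assumption)."""
    beta = phi_dev * lam
    c, s = np.cos(beta), np.sin(beta)
    if v_stream[0] * v_indiv[1] - v_stream[1] * v_indiv[0] < 0:
        s = -s                               # rotate towards v_indiv
    rotated = np.array([c * v_stream[0] - s * v_stream[1],
                        s * v_stream[0] + c * v_stream[1]])
    speed = (1.0 - lam) * np.linalg.norm(v_stream) + lam * np.linalg.norm(v_indiv)
    return speed * rotated / np.linalg.norm(rotated)

# Example update (all numbers are made up for illustration).
v_indiv, v_stream = np.array([1.4, 0.0]), np.array([1.0, 0.6])
phi_dev = np.arccos(np.dot(v_indiv, v_stream)
                    / (np.linalg.norm(v_indiv) * np.linalg.norm(v_stream)))
rho = local_density([0.13] * 6, fov_area=np.pi * 8.0 ** 2 / 2)  # dmax = 8 m, 180 deg FOV
lam = incentive(gamma=0.1,
                phi=deviation_factor(phi_dev, phi_min=0.5),
                rho=rho,
                tau=time_factor(t_spent=30.0, t_expected=40.0))
print(round(lam, 2), blended_velocity(v_indiv, v_stream, lam, phi_dev))
```

At the low density of this example the incentive is close to 1, so the blended velocity stays close to the agent's individual velocity, as intended by the model.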
Here, we use the IRM together with the collision-avoidance method by Moussaı̈d et al. [7] because this yielded the best results. Figure 5 shows the corresponding mean Steerbench scores per agent over 50 runs per scenario. Figure 6 shows the average percentage of agents that did not reach their goal in a total time of 200 seconds with stream behavior turned on and off. Figure 7 shows the average running times needed to compute one step of the simulation for an increasing number of agents in the military and hallway-stress scenarios. Our model runs at interactive rates in typical gaming or simulation scenarios, even when coordination among the agents is high. Finally, let β = φdev λ be the deviation angle angle scaled by the incentive. We rotate vstream towards vindiv by β. In general, the lengths of vindiv and vstream are not equal. Therefore, we also linearly interpolate the lengths of these vectors. The resulting velocity is the new velocity for agent A in the next simulation cycle. 4 Experiments Our model has been implemented in a framework based on the Explicit Corridor Map [2]. We use one CPU core of a PC running Windows 7 with a 3.1 GHz AMD FXTM 8120 8-Core CPU, 4 GB RAM and a Sapphire HD 7850 graphics card with 2 GB of onboard GDDR5 memory. To compute vindiv , we combined our model with the Indicative Route Method (IRM) [3]. To benchmark and validate our model, we use the Steerbench framework [4]. Our benchmarking score is defined as follows: score = 50c + e + t. (7) It is comprised of the average number of collisions c per agent, the average kinetic energy e, and the 5 Conclusion and Future Work 31 We have introduced a crowd simulation model that interpolates an agent’s steering strategy between in- 2250 2000 % of agents not reaching goal mean Steerbench scores 2500 Moussaïd et al. Karamouzas et al. van den Berg et al. 1750 1500 1250 1000 750 500 250 0 crossing hallway1 hallway2 merging narrow-50 mean Steerbench scores 350 300 with streams without streams 250 200 150 100 50 0 crossing hallway1 hallway2 merging narrow-50 narrow-100 Figure 5: Mean Steerbench scores for the scenarios with our streams model turned on and off. The scores are averaged per agent over 50 runs. 80 60 40 20 0 narrow-50 narrow-100 120 hallway-stress military 100 80 60 40 20 0 0 400 800 1200 1600 2000 Figure 7: Average running times to compute one step of the simulation (in ms) for an increasing number of agents in the military and hallway-stress scenarios. Each measurement is the average of 10 runs for the same number of agents. Deadlocks frequently occur for more than 1000 agents in military. In the hallway-stress scenario, we could simulate up to 2000 agents simultaneously without any deadlocks. dividual behavior and coordination with the crowd. Local streams determine an agent’s trajectory when local crowd density is high. This allows the simulation of large numbers of autonomous agents at interactive rates. We have validated our model with the Steerbench framework [4] by measuring the average numbers of collisions, expended kinetic energy, and time spent. Experiments show that our model works as well as existing agent-based methods in low- to mediumdensity scenarios, while showing a clear improvement when handling large crowds in densely packed environments. These conclusions are also validated in the accompanying video. 
The flexibility to use any global planning method and any local collision-avoidance method as a black box makes our model applicable to a wide range of research fields that require the simulation of autonomous virtual agents. We believe that our model can form a basis for improving crowd movement in future gaming and simulation applications, in CGI-enhanced movies, in urban planning software, and in safety training applications. For further details on our model, we refer the interested reader to the full-length version of this paper [8].

Acknowledgements

This research was partially funded by the COMMIT/ project, http://www.commit-nl.nl.

References

[1] S. J. Guy, J. Chhugani, S. Curtis, P. Dubey, M. C. Lin, and D. Manocha. Pledestrians: A least-effort approach to crowd simulation. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 119–128. Eurographics Association, 2010.
[2] R. Geraerts. Planning short paths with clearance using Explicit Corridors. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, pages 1997–2004, 2010.
[3] I. Karamouzas, R. Geraerts, and M. Overmars. Indicative routes for path planning and crowd simulation. In Proceedings of the 4th International Conference on Foundations of Digital Games, pages 113–120, 2009.
[4] S. Singh, M. Kapadia, P. Faloutsos, and G. Reinman. Steerbench: A benchmark suite for evaluating steering behaviors. Computer Animation and Virtual Worlds, 20(5-6):533–548, 2009.
[5] J. van den Berg, S. Guy, M. C. Lin, and D. Manocha. Reciprocal n-body collision avoidance. In Robotics Research, pages 3–19. Springer, 2011.
[6] I. Karamouzas, P. Heil, P. van Beek, and M. Overmars. A predictive collision avoidance model for pedestrian simulation. In Motion in Games, pages 41–52. Springer, 2009.
[7] M. Moussaïd, D. Helbing, and G. Theraulaz. How simple rules determine pedestrian behavior and crowd disasters. Proceedings of the National Academy of Sciences, 108(17):6884–6888, April 2011.
[8] A. van Goethem, N. Jaklin, A. Cook IV, and R. Geraerts. On streams and incentives: A synthesis of individual and collective crowd motion. Technical Report UU-CS-2015-005, Department of Information and Computing Sciences, Utrecht University, March 2015.

Constrained Texture Mapping via Voronoi Diagram Base Domain

Peng Cheng 1,2, Chunyan Miao 1, Nadia Magnenat Thalmann 1,2
1. School of Computer Engineering, Nanyang Technological University, Singapore
2. Institute for Media Innovation, Nanyang Technological University, Singapore
pcheng1, ascymiao, nadiathalmann@ntu.edu.sg

Abstract

Constrained texture mapping builds extra correspondences between mesh features and texture details. In this paper, we propose a Voronoi-diagram-based matching method to address the positional constrained texture mapping problem. The first step is to generate a Voronoi diagram on the mesh surface taking user-defined feature points as Voronoi sites. Then we build a Voronoi diagram with corresponding feature points on the image plane. Finally, we create an exponential mapping between every pair of Voronoi cells.
The proposed method is simple and efficient, allowing real-time constraint editing and high-level semantic constraints. Experiments show that the proposed method achieves good results with far fewer constraints.

1 Introduction

Texture mapping is a common technique to enhance visual realism in computer graphics. To build meaningful correspondence between surface geometry features and texture details, Lévy et al. [1] [2] [3] formulated this problem as a constrained optimization problem. Constraints are often defined as correspondences between feature points on the mesh surface and those in the 2D image. However, many constraints are needed to match only one meaningful feature. To reduce texture distortion and to avoid foldovers, users need to manually edit these constraints, which is a painstaking task.

We propose a Voronoi diagram base domain method to address the constrained texture mapping problem. The key advantage of our method is that we can achieve decent texture mapping performance by relying on only one point or curve constraint for each general feature. On the image plane, we construct a Voronoi diagram with all the constraints as Voronoi seeds. On the mesh surface, we construct another Voronoi diagram, also with the constraints as Voronoi seeds. As the constraints between the 2D image and the mesh surface are one-to-one mapped, the Voronoi cells of the two Voronoi diagrams are one-to-one mapped as well. For each pair of Voronoi cells, we use a bilinear exponential map for the local texture mapping. Thanks to the property of closeness of Voronoi cells, we can avoid global foldovers among the Voronoi cells. Moreover, the local exponential map minimizes texture distortion near the feature seeds.

Figure 1 shows the constrained texture mapping results of the proposed method with Beijing Opera face images on a mask model. With only four constraints for the eyes, nose, and mouth, we achieve good results. Existing work [3] needs to specify 27 point constraints to map a tiger face onto a human face, and an expensive post-processing step of smoothing and reducing redundant Steiner vertices is also necessary.

Figure 1: Texture mapping results of Beijing Opera face images on the mask model (499,955) of the proposed method with only four constraints for the eyes, nose and mouth.

2 Related work

Lévy et al. [1] first studied the constrained texture mapping problem. Avoiding foldovers and obtaining a smooth mapping with small distortion is the main challenge of this problem. Floater and Hormann [4] and Hormann et al. [5] summarized this problem from a geometric modeling point of view. Existing methods for the problem can be grouped into two categories, namely global optimization based methods [1] [6] [7] and image warping based methods [2] [3] [8]. For the global optimization methods [1], the objective function contains two parts. The first part expresses the constraints defined by the user, for example the squared deviation of the constraint data points. The second part controls the smoothness of the final mapping. This is a compromise between satisfying constraints and minimizing the distortion of the mapping. Hard constraints can be implemented by using Lagrange multipliers. However, these methods fail to guarantee a bijective embedding. Moreover, these optimization-based methods cannot be guaranteed to converge. As for the image-warping based methods [3] [8], Delaunay triangulation and edge swaps are frequently used to satisfy positional constraints and to avoid fold-overs. Eckstein et al.
[2] constructed a 2D mesh in the image plane and warped it onto the mesh surface. The 2D mesh was constructed with the same topology as the mesh surface. Then a mapping was created between corresponding vertices and triangles. The limitation is that it is complicated and not robust, though it may handle a large set of constraints. Kraevoy et al. [3] and Lee et al. [9] performed the embedding by adding a fixed rectangular virtual boundary. Then, they applied the Delaunay method to triangulate the region between the true and virtual boundaries. A subsequent smoothing step is required to reduce distortion after aligning the user-specified hard constraints. In contrast, the proposed method does not need such an expensive post-processing smoothing step, due to the properties of the exponential map.

3 Method

In this section, we introduce a novel method based on a global Voronoi base domain and local exponential maps. One key advantage of our proposed method is that it relies on very few natural constraints, which can be exactly satisfied while preserving the metrics of the original 3D geometry. A triangular mesh surface M is represented as the pair (T, X), where T is the topology or connectivity information, and X = {x_1, x_2, ..., x_N} are the geometric positions of the vertices in R³. The input texture image I is a 2D planar space I(u, v), where u, v ∈ [0, 1]. Notations are listed in Table 1.

Table 1: Notations in the proposed method.
  C_M = {C_M^1, C_M^2, ..., C_M^m} : point constraint set on the mesh surface M
  C_I = {C_I^1, C_I^2, ..., C_I^m} : point constraint set in the texture image space I
  V_M   : Voronoi diagram on M with C_M as the Voronoi sites
  V_M^i : Voronoi cell in V_M
  V_I   : Voronoi diagram on I with C_I as the Voronoi sites
  V_I^i : Voronoi cell in V_I
  exp^i : exponential map between V_M^i and V_I^i

3.1 Overview

We suppose that the mesh surface has a natural boundary and that the texture has already been segmented. Both boundaries are sketched by the user in our implementation. Then the user may interactively specify the constraints by clicking points on the surface and the image. Our approach generates a one-to-one mapping from M to I in the following two steps, as shown in Figure 2.

• Building the Voronoi Base Domain. We first build a 2D Voronoi diagram V_I in the image space. The boundary is the texture boundary, either segmented from the input or interactively sketched by the user. The Voronoi sites are the feature constraints C_I. Then we build the Voronoi diagram V_M on the mesh surface. The details are described in Section 3.2. Next, based on the two Voronoi diagrams V_I and V_M and the corresponding feature constraints, we can easily generate the one-to-one mapping relation between the Voronoi cells of V_M and V_I.

• Computing the Local Bilinear Map. For the local mapping between each pair of Voronoi cells, we assume that the area near the Voronoi sites (features) is more important than the area far from the Voronoi sites. The exponential map, or geodesic polar coordinates, has smaller distortion near its source point (the Voronoi site). Therefore, the exponential map fulfills the distortion requirement and provides a good match around features. Section 3.3 provides the implementation details.

3.2 Building the Voronoi Base Domain on the Surface

To compute the bisection curves between Voronoi sites, we measure surface distances on the mesh by computing geodesics from the Voronoi sites to all the other vertices. Computing exact geodesic distances is expensive [10] [11]. We propose a simple and efficient method to compute a smooth distance field, which leads to a smooth segmentation of the mesh surface.
We flood the distance and angle fields from the source points to all the other points with only one traversal of each vertex. Moreover, geodesic flooding may converge faster when there are more source points. Algorithm 1 lists the detailed steps. In Algorithm 1, tag_i is a flag indicating the Voronoi cell V_M^j that vertex x_i lies in, as shown in Figure 3, d_i is the geodesic distance from x_i to its Voronoi site, as shown in Figure 4, and α_i is the exponential angle of the corresponding geodesic curve.

Figure 3: The tag of each vertex is indicated by color. Red, blue, yellow and green are the Voronoi cells of the left eye, right eye, nose and mouth. (a) Mask model (499,955); (b) Refined mask model (31762, 62467).

Figure 2: Workflow of the Voronoi diagram based constrained texture mapping method. (a) The input mesh surface; (b) The boundary (red curve), point constraints (green spheres) and corresponding Voronoi diagram in the texture image; (c) Boundary (red curve), point constraints (green spheres), and corresponding Voronoi diagram on the mesh surface; (d) Parametrization result; (e), (f) Different views of the textured surface.

We follow [12] to update the geodesic distance and angle. In Algorithm 1, the subfunction Distance(s, i) updates the geodesic distance of vertex x_i from the distance of its neighbor s, and Angle(s, i) updates the angle of the corresponding geodesic. Once we have the Voronoi cell tag and distance field of each vertex, we can compute the Voronoi points of V_M by matching distances to the related Voronoi cells of V_I. Next, the Voronoi edges are traced from the start Voronoi point in counterclockwise order within their Voronoi cell.

Algorithm 1: Voronoi Base Domain on Surface.
  Require: A mesh surface M(T, X) and feature constraints C_M on the surface.
  Ensure: For each x_i ∈ X, a tag denoting the Voronoi cell that x_i lies in, the distance d_i from x_i to its Voronoi site, and the angle α_i of the corresponding geodesic direction.
  for i = 0; i < |X|; i++ do
      tag_i = −1; d_i = ∞; α_i = 0;
  end for
  for i = 0; i < |C_M|; i++ do
      tag_{C_M^i} = i; d_{C_M^i} = 0; α_{C_M^i} = 0;
      Heap.push(C_M^i);
  end for
  while Heap.notEmpty() do
      s = Heap.getSmallest();
      for each i ∈ Neighbor(s) do
          Newd_i = Distance(s, i);
          if Newd_i < d_i then
              Heap.push(i);
              d_i = Newd_i; tag_i = tag_s; α_i = Angle(s, i);
          end if
      end for
  end while

smoother geodesic polar coordinates using a simple linear angle interpolation. Then, we follow the same linear angle interpolation procedure in our implementation. In order to match each Voronoi pair V_I^i and V_M^i, we employ a piecewise linear scale of the angle in the exponential map, as shown in Figure 5. Equation (1) shows the scale parameters:

α_i / Σ_{j=1}^{5} α_j = β_i / Σ_{j=1}^{5} β_j.   (1)

Figure 4: Geodesic distance field on the refined mask model (31762, 62467).

3.3 Generating the Local Bilinear Mapping

Our goal is to map the 2D image area V_I^i to the surface area V_M^i, such that the constraint point C_I^i in the image lies at the constraint point C_M^i on the mesh. One way of defining a 2D coordinate system on the surface around C_M^i is the exponential map, or geodesic polar coordinates [13]. The exponential map projects a point v on M to the tangent plane T_v at v. For any unit vector v ∈ T_p, a geodesic g_v with g_v(0) = p and g_v′(0) = v exists and is unique. The challenge in computing an approximate exponential map exp_v is to trace smooth and accurate radial angles for each vertex. Schmidt et al.
[14][15] extended Dijkastra’s graph distance algorithm [16] to trace angle difference with rotation angle while unfolding neighbor triangle. M elvær et al.[12] achieve Figure 5: Piecewise scale of angle for each pair of the Voronoi diagrams. 4 Experimental Results and Discussion In this section, we evaluate our proposed method using different texture images and show its validity, efficiency and robustness. In Figure 6, we apply the proposed method with checkerboard image to show the smoothness of the parametrization. In Figure 7, we compared the proposed method with constrained harmonic mapping. 36 (a) (a) (b) (b) (c) (c) (d) (d) Figure 6: (a) Checkerboard image and constraints(green spheres); (b) Mask mesh surface(499,955) and constraints; (c),(d) Textured mask surface of front and side views. (e) (f) Figure 7: Comparison with constrained harmonic map.(a) Texture image and constraints; (b) Mask surface and constraints; (c), (e) parametrization and texture mapping results of constrained harmonic mapping; (d),(f) parametrization and texture mapping result of proposed method. Although constrained harmonic mapping leads to a more smooth parametrization, the texture mapping result has big distortion as you see in Figure 7. The constraint feature of eyes and mouth have a big distortion. Moreover, constraints are prune to lead to foldovers for the minimization of Dirichlet energy. Users have to carefully maintain the structure consistence of between constraints. In contrast, our proposed method does not suffer from the distortion problem due to the property of exponential map. Graphics and Interactive Techniques, SIGGRAPH ’01, pages 417–424, New York, NY, USA, 2001. ACM. [2] Ilya Eckstein, Vitaly Surazhsky, and Craig Gotsman. Texture mapping with hard constraints. In Computer Graphics Forum, volume 20, pages 95–104. Wiley Online Library, 2001. [3] Vladislav Kraevoy, Alla Sheffer, and Craig Gotsman. Matchmaker: Constructing constrained texture maps. In ACM SIGGRAPH 2003 Papers, SIGGRAPH ’03, pages 326–333, New York, NY, USA, 2003. ACM. Acknowledgements Cheng Peng would like to acknowledge the Ph.D. grant from the Institute for Media Innovation, Nanyang Technological University, Singapore. This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme Office. [4] Michael S Floater and Kai Hormann. Surface parameterization: a tutorial and survey. In Advances in multiresolution for geometric modelling, pages 157–186. Springer, 2005. [5] Kai Hormann, Bruno Lévy, Alla Sheffer, et al. Mesh parameterization: Theory and practice. 2007. References [1] Bruno Lévy. Constrained texture mapping for polygonal meshes. In Proceedings of the 28th Annual Conference on Computer [6] Bruno Lévy, Sylvain Petitjean, Nicolas Ray, and Jérome Maillot. Least squares 37 conformal maps for automatic texture atlas generation. In ACM Transactions on Graphics (TOG), volume 21, pages 362– 371. ACM, 2002. [7] Mathieu Desbrun, Mark Meyer, and Pierre Alliez. Intrinsic parameterizations of surface meshes. In Computer Graphics Forum, volume 21, pages 209–218. Wiley Online Library, 2002. the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pages 153–160. ACM, 2013. [16] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959. [8] Yuewen Ma, Jianmin Zheng, and Jian Xie. Foldover-free mesh warping for constrained texture mapping. 
[9] Tong-Yee Lee, Shao-Wei Yen, and ICheng Yeh. Texture mapping with hard constraints using warping scheme. Visualization and Computer Graphics, IEEE Transactions on, 14(2):382–395, 2008. [10] Vitaly Surazhsky, Tatiana Surazhsky, Danil Kirsanov, Steven J Gortler, and Hugues Hoppe. Fast exact and approximate geodesics on meshes. In ACM Transactions on Graphics (TOG), volume 24, pages 553–560. ACM, 2005. [11] Joseph SB Mitchell, David M Mount, and Christos H Papadimitriou. The discrete geodesic problem. SIAM Journal on Computing, 16(4):647–668, 1987. [12] Eivind Lyche Melvær and Martin Reimers. Geodesic polar coordinates on polygonal meshes. In Computer Graphics Forum, volume 31, pages 2423–2435. Wiley Online Library, 2012. [13] Manfredo Perdigao Do Carmo and Manfredo Perdigao Do Carmo. Differential geometry of curves and surfaces, volume 2. Prentice-Hall Englewood Cliffs, 1976. [14] Ryan Schmidt, Cindy Grimm, and Brian Wyvill. Interactive decal compositing with discrete exponential maps. ACM Transactions on Graphics (TOG), 25(3):605–613, 2006. [15] Qian Sun, Long Zhang, Minqi Zhang, Xiang Ying, Shi-Qing Xin, Jiazhi Xia, and Ying He. Texture brush: an interactive surface texturing interface. In Proceedings of 38 Hybrid Modeling of Multi-physical Processes for Volcano Animation Fanlong Kong Software Engineering Institute, East China Normal University. Changbo Wang∗ Software Engineering Institute, East China Normal University. email: cbwangcg@gmail.com Chen Li Software Engineering Institute, East China Normal University. Hong Qin Computer Science Department, Stony Brook University. Abstract physical interaction and heat transfer. Finally, multi-physical quantities are tightly coupled to support interaction with surroundings including fluid-solid coupling, ground friction, lava-smoke coupling, smoke creation, etc. Many complex and dramatic natural phenomena (e.g., volcano eruption) are difficult to animate for graphics tasks, because frequently a single type of physical processes and their numerical simulation can not afford high-fidelity and effective scene production. Volcano eruption and its subsequent interaction with earth is one such complicated phenomenon that demands multi-physical processes and their tight coupling. In technical essence, volcano animation includes heat transfer, lava-lava collision, lavarock interaction, melting, solidification, fire and smoke modeling, etc. Yet, the tight synchronization of multi-physical processes and their inter-transition involving multiple states for volcano animation still exhibit many technical challenges in graphics and animation. This paper documents a novel and effective solution for volcano animation that embraces multi-physical processes and their tight unification. First, we introduce a multi-physical process model with state transition dictated by temperature, and add dynamic viscosity that varies according to the temperature. Second, we augment traditional SPH with several required attributes in order to better handle our new multi-physical process model that can simulate lava and its melting, solidification, interaction with earth. A particle system is developed to handle multi- Keywords: Volcano Animation, Multi-physical Processes and Interaction, Heat Transfer 1 Introduction and Motivation Volcano eruption is one of the most horrified and dramatic natural phenomena on earth. It results in natural disaster and huge economic loss for human beings whenever and wherever it occurs throughout the world. 
Its high-fidelity simulation has attracted a large amount of scientific attention in many relevant fields, ranging from geophysics, atmospheric sciences, and civil engineering to emergency management. Despite earlier research progress on this subject, realistic simulation of volcano eruption remains of high interest to graphics and animation. In movies and games with disaster scenes, high-fidelity animation of volcano eruption is indispensable. Moreover, animating volcano eruption both precisely and effectively can also benefit human beings in many other aspects such as volcano prevention, disaster rescue, and emergency planning. In the long run, it might also be possible to consider how human beings could make better use of the enormous energy burst during volcano eruption.

In graphics, many natural phenomena have been animated with high precision, such as water, cloud, smoke, fire, debris flow, ice, sand, etc. Compared with the aforementioned natural phenomena, volcano eruption and its interaction with earth and atmosphere are much more complex than what a single type of physical process could handle. For volcano eruption, there are many different types of participating media, including lava, mountain, and smoke, and the interaction and movement of different materials. Complicated interactions of different substrates and multiple physical processes are involved simultaneously. Our belief is that complex scenes such as volcano eruption cannot be simulated by using a traditional single physical process and/or simple numerical models. Multi-physical processes must be invoked to better handle the complex scene production; therefore, it is much more challenging to design realistic models for such a task. Aside from the issue of multi-physical processes, their effective integration presents another key challenge. At the technical level, in volcano animation lava can be described by a free-surface fluid with larger viscosity and complicated boundary conditions, the mountain can be described by a rigid body with no movement, and smoke can be described by particles driven by a vorticity velocity field. Recent works tend to focus on lava simulation rather than volcano animation, and are therefore unable to produce highly realistic scenes. Even though some numerical models have been proposed in the recent past, they generally require large-scale computation that is not suitable for a desktop environment; hence they are less suitable for the purpose of computer animation in terms of time and precision.

This paper focuses on high-performance and efficient modeling for complex volcano animation. Towards this ambitious goal, we propose a multi-physical process model that can perform efficiently for volcano animation. Our key contributions include:

• Multi-physical process modeling with state transition dictated by temperature. A novel multi-physical process model is proposed to handle complex scenes with multi-physical quantities and their tight coupling. To model state transition, we choose temperature as a core quantity; other quantities are dependent on temperature.

• Discrete modeling for the multi-physical process. We choose SPH as our base model; however, traditional SPH cannot handle the multi-physical process model well. Temperature, particle type, lifetime, and temperature-dependent viscosity must all be integrated into traditional SPH.
Moreover, after SPH particles are split into several types, a particle system is proposed to handle multi-physical interaction via heat transfer.

• Tight coupling of multi-physical quantities and their interaction with surroundings. Towards photo-realism, multi-physical quantities must be efficiently coupled and interact with surroundings naturally, including fluid-solid coupling, ground friction, melting, lava-smoke coupling, etc.

2 Multi-physical Process Model

Volcano eruption and its subsequent interaction with earth is a complicated phenomenon that includes heat transfer, lava-lava collision, lava-rock interaction, melting, solidification, fire and smoke modeling, etc. Only the tight synchronization of multi-physical processes can handle its animation in a correct way. In our multi-physical process model, lava is described by a free-surface fluid with larger viscosity and complicated boundary conditions, the mountain is described by a rigid body with no movement, and smoke is described by particles driven by a vorticity velocity field. All these physical processes are kept synchronized by way of the temperature. Since lava is described as a fluid, the motion of lava can be formulated by the Navier-Stokes equations [1]:

ρ(∂u/∂t + u · ∇u) = −∇P + μ∇ · (∇u) + f,   (1)
∇ · u = 0,   (2)

where ρ denotes the density, u the velocity, P the pressure, μ the fluid viscosity, and f the sum of external forces on the fluid. The viscosity force varies along with the temperature.

Figure 1: Workflow of the whole framework.

Figure 2: (a) A phase transition process occurs at the regions where lava fluid particles make contact with the mountain (T_p ≤ T_solid: lava fluid particle → lava solid particle). (b) A transition process occurs at the regions where lava fluid particles make contact with lava solid particles (T_p ≥ T_melt: lava solid particle → lava fluid particle). All the phase transitions are handled by the temperature.

Continuum equations are not suitable for computer animation, because numerical integration must be invoked for domain discretization in both space and time. For a more realistic appearance, lava should be incompressible. We choose a predictive-corrective incompressible SPH (PCISPH) model [2] as our fluid solver to simulate lava and its melting, solidification, and interaction with earth. However, conventional SPH cannot handle the multi-physical process model well; in order to better handle our new multi-physical process model, we augment traditional SPH with several required attributes, including temperature, particle type, lifetime, the number of neighboring particles, the types of neighboring particles and adaptive viscosity. Table 1 shows the detailed quantities of the different particles. As heat transfer occurs in lava-lava collision, lava-rock interaction, melting, solidification, fire and smoke modeling, it is necessary to run the process of heat transfer during the entire volcano animation to guide and couple with the other physical processes. In an actual eruption, the lava moves quickly. During the volcano animation, lava is divided into fluid lava and solid lava in order to simulate the solidification and melting.
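To make the augmented particle state concrete, here is a small Python sketch of one possible data layout together with a temperature-driven state transition. It is an illustration under stated assumptions rather than the authors' code: the attribute names follow Table 1, but the threshold values T_MELT and T_SOLID, the viscosity range and the linear viscosity ramp are invented for the example, and the actual SPH forces are left to the underlying PCISPH solver.

```python
from dataclasses import dataclass
import numpy as np

LAVA_FLUID, LAVA_SOLID, MOUNTAIN, SMOKE = range(4)

@dataclass
class Particle:
    ptype: int
    position: np.ndarray
    velocity: np.ndarray
    density: float = 0.0
    pressure: float = 0.0
    temperature: float = 1200.0          # illustrative value, degrees C
    dynamic_viscosity: float = 0.0
    life: float = 0.0                    # only used by smoke particles
    neighbor_rigid_num: int = 0          # number of neighboring rigid particles

# Assumed melting / solidification temperatures (stored per particle in the paper).
T_MELT, T_SOLID = 1000.0, 800.0
MU_HOT, MU_COLD = 1.0e2, 1.0e5           # assumed viscosity range (Pa*s)

def temperature_dependent_viscosity(T: float) -> float:
    """Assumption: viscosity ramps linearly from MU_HOT (above T_MELT) to
    MU_COLD (below T_SOLID); the paper only states that viscosity varies
    with temperature."""
    if T >= T_MELT:
        return MU_HOT
    if T <= T_SOLID:
        return MU_COLD
    w = (T_MELT - T) / (T_MELT - T_SOLID)
    return MU_HOT + w * (MU_COLD - MU_HOT)

def update_phase(p: Particle) -> None:
    """Temperature-dictated state transition between fluid and solid lava."""
    if p.ptype == LAVA_FLUID and p.temperature <= T_SOLID:
        p.ptype = LAVA_SOLID                      # solidification
    elif p.ptype == LAVA_SOLID and p.temperature >= T_MELT:
        p.ptype = LAVA_FLUID                      # melting
    if p.ptype in (LAVA_FLUID, LAVA_SOLID):
        p.dynamic_viscosity = temperature_dependent_viscosity(p.temperature)
```

In a full system, updates of this kind would run once per simulation step, after the heat-transfer pass and before the (PCI)SPH force computation.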
Our phase transition model is similar to the work in [3]: every particle stores a melting temperature T_m and a solidification temperature T_s.

Figure 3: The comparison of scenes without phase transition (left column) and scenes with phase transition (right column). With phase transition, the motion of lava is closer to physical reality, and it is easier to produce compound phenomena.

Table 1: Quantities of different particles.
  Particle type       | Particle method | Required quantities
  Lava fluid particle | SPH             | particle type, density, position, pressure, velocity, temperature, neighbor rigid num, dynamic viscosity
  Lava solid particle | SPH             | particle type, density, position, pressure, velocity, temperature, neighbor rigid num, dynamic viscosity
  Mountain particle   | SPH             | particle type, density, position, pressure, velocity, temperature
  Smoke particle      | Particle system | particle type, position, velocity, life, neighbor smoke num

In Table 1, neighbor rigid num is used in the generation of smoke particles, and life records how long the smoke particle has existed. The whole workflow of our framework is shown in Figure 1.

3 Tight Coupling in Volcano Animation

In volcano animation, lava flows have complicated interactions with surroundings, including lava-rock interaction, ground friction and the coupling of lava and smoke. A single type of physical process cannot afford the synchronization of these physical processes and physical quantities. We document a particle system with multi-physical processes to handle the interactions. All participating media are described as particles with different quantities, so the unification of these physical processes can be guaranteed.

We simulate the coupling of lava and mountain as rigid-fluid coupling. Since lava is extremely viscous, it moves very slowly on the mountain. To keep the result realistic, the ground friction cannot be ignored. Similar to what [4] introduced, we apply ground friction to the lava particles.

In the multi-physical process model, the interaction between smoke and lava is an essential part. Smoke always changes along with the course of the volcano eruption. The majority of smoke particles are generated at the eruption. The remaining smoke particles are generated in a way similar to how the diffuse particles are generated in [5]. Only lava fluid particles with a temperature higher than T_min generate smoke particles.
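As a rough sketch of how the temperature field can drive both the coupling and the smoke creation described above, the fragment below exchanges heat between neighboring particles and emits smoke only from sufficiently hot lava fluid particles. The conduction-style update, the rate constant K_HEAT, the threshold T_MIN_SMOKE and the emission probability are all assumptions made for illustration; the paper does not give these formulas.

```python
import random
from dataclasses import dataclass

LAVA_FLUID, LAVA_SOLID, MOUNTAIN, SMOKE = range(4)

@dataclass
class P:                        # minimal particle record for this sketch
    ptype: int
    temperature: float
    life: float = 0.0

K_HEAT = 0.05                   # assumed fraction of the temperature gap exchanged per step
T_MIN_SMOKE = 900.0             # assumed threshold: only hotter lava fluid emits smoke
SMOKE_PROB = 0.02               # assumed per-step emission probability

def heat_transfer(particles, neighbors, dt):
    """Conduction-style exchange; `neighbors[i]` is the index list produced by
    the SPH neighbor search (assumed symmetric, so the exchange conserves heat)."""
    dT = [0.0] * len(particles)
    for i, p in enumerate(particles):
        for j in neighbors[i]:
            dT[i] += K_HEAT * dt * (particles[j].temperature - p.temperature)
    for i, p in enumerate(particles):
        p.temperature += dT[i]

def emit_smoke(particles, dt):
    """Temperature-gated smoke creation from hot lava fluid particles."""
    new_smoke = []
    for p in particles:
        if p.ptype == LAVA_FLUID and p.temperature > T_MIN_SMOKE:
            if random.random() < SMOKE_PROB:
                new_smoke.append(P(SMOKE, p.temperature))
        elif p.ptype == SMOKE:
            p.life += dt        # age existing smoke particles
    particles.extend(new_smoke)
```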
4 Conclusion, Discussion, and Future Work

This paper has documented an effective multi-physical process model for volcano animation. Abundant physical quantities are necessary to simulate the various accompanying phenomena in volcano eruption and its subsequent interaction with earth and atmosphere. Towards the goal of high fidelity and photo-realism, physical quantities are tightly integrated in our system and a highly effective rendering technique is also devised. In practice, in order to achieve better computing speed, we must consider certain tradeoffs. For example, our computing model simplifies lava as a fluid (but in physical reality lava consists of liquid, ash, stone, and their complex mixture). Another observation is that, even though we model heat transfer in the volcano animation, we simplify the complicated phase transition and divide the states of lava into fluid and solid. There are still many complicated phenomena, such as gasification and tephra generation, that are neglected. Moreover, the moving stones, ash, air and their interactions are not explicitly modeled in the interest of high performance. Future work includes involving more substrates in volcano animation, taking multi-phase transformation and moving obstacles into account, etc. More accurate multi-phase and multi-physical models shall be researched for the rapid and precise production of complex scenes.

References

[1] K. Erleben, J. Sporring, K. Henriksen, and H. Dohlmann. Physics-based animation. Charles River Media, Hingham, 2005.
[2] B. Solenthaler and R. Pajarola. Predictive-corrective incompressible SPH. In ACM Transactions on Graphics (TOG), volume 28, page 40. ACM, 2009.
[3] B. Solenthaler, J. Schläfli, and R. Pajarola. A unified particle model for fluid–solid interactions. Computer Animation and Virtual Worlds, 18(1):69–82, 2007.
[4] A. Hérault, G. Bilotta, A. Vicari, E. Rustico, and C. Del Negro. Numerical simulation of lava flow using a GPU SPH model. Annals of Geophysics, 54(5):600–620, 2011.
[5] M. Ihmsen, N. Akinci, G. Akinci, and M. Teschner. Unified spray, foam and air bubbles for particle-based fluids. The Visual Computer, 28(6-8):669–677, 2012.

Determining Personality Traits from Goal-Oriented Driving Behaviors: Toward Believable Virtual Drivers

Andre Possani-Espinosa a,1, J. Octavio Gutierrez-Garcia b,2 and Isaac Vargas Gordillo a,3
a Department of Digital Systems, b Department of Computer Science, Instituto Tecnológico Autónomo de México, Mexico, DF 01080
1 andre.possani@itam.mx, 2 octavio.gutierrez@itam.mx, 3 vargoris@hotmail.com

Abstract

This paper lays the foundations for the design of believable virtual drivers by proposing a methodology for profiling players using the open racing car simulator. Data collected from fifty-nine players about their driving behaviors and personality traits give insights into how personality traits should affect the behavior of believable virtual drivers. The data analysis was conducted using the J48 decision tree algorithm. Empirical evidence shows that goal-oriented driving behaviors can be used to determine whether players are either introvert or extravert.

Keywords: personality traits, player modeling, driving behaviors, virtual drivers.

1. Introduction

In a realistic video game, a player can compete against either human opponents or artificial opponents controlled by a computer. Believable artificial opponents are fundamental to engage players and make a video game more entertaining. In addition, Loyall and Bates indicate that believable agents must have personalities [1].
This paper lays the foundations for the design of believable artificial opponents in the context of car racing games by giving the first insights into how personality traits should affect the behavior of believable artificial opponents. To do this, the personality traits of fifty-nine players were extracted. The players completed a Jung typology test [2] consisting of seventy-two yes or no questions. After completing the test, the players were asked to play with a racing car simulator [3], which was modified in order to monitor the driving behavior of players. In this paper, the relationship between personality traits and driving behaviors is explored by using a decision tree analysis. The results show that it is possible to determine whether players are either introvert or extravert.

The structure of the paper is as follows. Section 2 presents the Jung typology test. Section 3 introduces the open racing car simulator. Section 4 presents the data analysis and results. Section 5 includes a comparison with related work and Section 6 gives some concluding remarks.

2. The Jung typology test

The Jung typology test [2] was used to determine the personality traits of players. The Jung typology test was selected due to its brevity and ease of completion. According to the Jung typology test, the following four bipolar dimensions can characterize the personality of a person: (i) the extraversion-introversion dimension, (ii) the intuition-sensing dimension, (iii) the feeling-thinking dimension, and (iv) the judging-perceiving dimension. Given that each one of the four dimensions consists of two opposite poles, there are sixteen personality types, e.g., Extraversion-Sensing-Thinking-Perceiving.

3. The open racing car simulator

The open racing car simulator [3] (TORCS) was used to collect data on the driving behavior of players. The car was built taking into account realistic specifications of regular cars. The racetrack was designed to test players' driving skills on different levels of complexity.

Figure 1: Screenshot from the racetrack.

4. Empirical results

4.1 Data collection

Fifty-nine players were asked to complete the Jung typology test in order to determine their personality type. Afterward, the players were asked to complete four pairs of laps adopting different goals. In the first pair of laps, the players were instructed to get familiar with the car, the racetrack, a Thrustmaster steering wheel and a pedal set. The virtual car had an automatic transmission. In the second pair of laps, the players were asked to complete the laps as fast as possible. In the third pair of laps, the players were instructed to complete the laps as cautiously as possible. Finally, in the fourth pair of laps, the players were instructed to complete the laps as fast as possible and simultaneously as cautiously as possible. The players were asked to adopt different goals in each pair of laps because there is psychological evidence of a relationship between goals and personality traits [4].

In order to profile the driving behavior of players, five feature categories were defined. The first, second, third, and fourth categories are composed of features extracted from laps with no objective, fast laps, cautious laps, and fast but cautious laps, respectively. The fifth category is composed of aggregate features from all the laps. From each lap type, five features were extracted: lap time, maximum speed, penalty time, and the number of times the car goes off track to the left and to the right. The penalty time is the amount of time the car spends completely off the track. The car was considered to be off track either to the left or to the right when it was completely off the track, as shown in Fig. 1.

4.2 Data analysis

The data analysis was conducted using a decision tree analysis. The decision tree algorithm used in this work was J48, which is implemented in the WEKA data mining software [5]. The training and validation sets of the decision tree for the extraversion-introversion dimension (Fig. 2) consisted of thirty and twenty-nine randomly selected instances, respectively. Consequently, approximately 50% of the data was used to create the decision tree, and the remaining instances were used to validate it. An instance was composed of the twenty-seven variables that profile the driving behavior of a player, e.g., the maximum speed reached in cautious laps. Each instance was labeled as either extraversion or introversion according to the results of the personality test.
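For readers who want to reproduce a comparable analysis outside WEKA, the sketch below uses scikit-learn's DecisionTreeClassifier as a stand-in for J48 with the setup described in Section 4.2 (roughly half of the 59 instances for training, 27 features, at least two instances per leaf; the pruning parameters are given below). The entropy criterion and the depth limit only approximate J48's gain-ratio splits and confidence-based pruning, and the feature matrix and labels here are placeholders rather than the study's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Placeholder data: 59 instances x 27 driving features, labeled E or I.
X = rng.random((59, 27))
y = np.array(["extraversion"] * 30 + ["introversion"] * 29)

# Roughly half of the instances build the tree, the rest validate it.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=30, random_state=0, stratify=y)

clf = DecisionTreeClassifier(
    criterion="entropy",   # entropy-based splits (J48 uses the gain ratio)
    min_samples_leaf=2,    # at least two instances per leaf
    max_depth=4,           # crude stand-in for J48's confidence-based pruning
    random_state=0)
clf.fit(X_train, y_train)

print("training accuracy:  ", clf.score(X_train, y_train))
print("validation accuracy:", clf.score(X_val, y_val))
```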
It should be noted that due to the small size of the sample, there was not sufficient data to build and validate decision trees for the intuition-sensing dimension, the feeling-thinking dimension, and the judging-perceiving dimension. Whereas the data set used to build and validate the decision tree for the extraversion-introversion dimension had an almost equal number of introversion and extraversion instances, in the case of the other dimensions the instances were dominated by one of the (dimension) poles. For instance, there were only ten instances labeled as perceiving, and the remaining forty-nine instances were labeled as judging.

In addition to the training and validation sets, the remaining input parameters of the WEKA data mining software to build the decision tree were: (i) the minimum number of instances per leaf, which was set to 2, and (ii) the confidence threshold for pruning, which was set to 0.25. It should be noted that the decision tree was pruned to avoid overfitting.

4.3 Results

From the results shown in Table 1 and Fig. 2, three observations were drawn.

Observation 1. It is possible to determine whether a player is either introvert or extravert by using a relatively small decision tree composed of seven decision nodes (Fig. 2) with a success classification rate of 72.4% (Table 1).

Table 1: Confusion matrix of the decision tree for the extraversion-introversion dimension.
                      | Predicted Introversion | Predicted Extraversion
  Actual Introversion |           8            |           6
  Actual Extraversion |           2            |          13
  Percentage of correctly classified instances of the training set: 100%
  Percentage of correctly classified instances of the validation set: 72.4%
  Overall percentage of correctly classified instances: 86.4%

Observation 2. A combination of features from different lap types is necessary to determine personality traits of drivers. Features from different lap types are used as decision nodes of the decision tree (Fig. 2). The features were automatically selected based on entropy, namely the information gain ratio, which indicates how useful a feature is for classifying the players, for instance, as either introvert or extravert. After selecting a feature, e.g., the root decision node, the feature with the highest information gain ratio is selected from the remaining features. This process is repeated until all the instances have been classified.

Figure 2: Decision tree for the extraversion-introversion dimension.

• To some extent, not going off track to the left in the fast laps, as denoted by the decision node: number of times the car goes off track to the left in fast laps ≤ 0.29.
• Not being among the fastest in the fast laps, as denoted by the decision node: lap time in fast laps ≤ 0.94.
• Not going off track to the right in the cautious laps, as denoted by the decision node: number of times the car goes off track to the right in cautious laps ≤ 0.
• In general, sometimes going off track to the left, as denoted by the decision node: aggregate number of times the car goes off track to the left from aggregate features > 0.
• Not attaining the highest maximum speed in the fast laps, as denoted by the decision node: maximum speed in fast laps ≤ 0.96.

The J48 decision tree algorithm automatically selected the features that were the most informative, which corresponded to features from laps where the players pursued different goals.
This confirms the relationship between goal-oriented driving behaviors of players and their personality traits. In addition, this also validates the present methodology (where players were asked to adopt different goals) for determining their personality traits. Observation 3. Overall, the profile of extravert players involves: 45 As shown in the decision tree depicted in Fig. 2, its longest branch classified the majority of the extravert players. There are other branches classifying extravert players, however, they only classified a few extravert players, and thus, may not be entirely representative. • • The above contributions lay the foundations for the design of believable virtual drivers. 5. Related work Future work will focus on exploiting the insights gained from the analysis of the empirical evidence to design and implement believable virtual drivers. The importance of believable artificial opponents to engage and entertain players is commonly stressed. Nevertheless, only a few research efforts ([6, 7, 8]) have been proposed. Acknowledgements Gallego et al. [6] propose creating virtual drivers in car racing video games by evolving neural networks using genetic algorihtms. The behavior of virtual drivers is determined by a fitness function that evaluates virtual drivers mostly based on how stable and fast they are. Gallego et al. generate efficient virtual drivers, however, the drivers may not be realistic. This work has been supported by Asociación Mexicana de Cultura A.C. References [1] A.B. Loyall and J. Bates. Personality-rich Muñoz et al. [7] contribute an imitation learning mechanism to create believable virtual drivers. Muñoz et al. assume that a believable virtual driver is a driver that imitates the driving behavior of the current human player. The main limitation of the approach of Muñoz et al. is that in order to imitate the behavior of a human player some data has to be collected to train the mechanism. [2] [3] [4] Lu et al. [8] propose a personality model for generating realistic driving behaviors. The personality model is based on (i) a threedimension model: psychoticism, extraversion, and neuroticism, and (ii) six descriptors related to each dimension. In order to generate personality-based driving behaviors, Lu et al. conducted a study involving a number of participants who labeled computer-generated driving behaviors as either aggressive, egocentric, active, risk-taking, tense, and shy. However, participants (which are required to label driving behaviors) may not know how an egocentric or a shy driving style looks like. [5] [6] [7] [8] 6. Conclusion and future work The contributions of this work are as follows. • Providing empirical evidence of the relationship between personality traits and driving behaviors of players. Obtaining the first profile of extravert players in car racing games. Devising a methodology for profiling players based on personality traits extracted from their driving behaviors. 46 believable agents that use language. In Proceedings of the 1st International Conference on Autonomous Agents, pp. 106-113, 1997 Humanmetrics Inc. Personality test based on C. Jung and I. Briggs Myers type theory, available at http://www.humanmetrics.com, 2015 B. Wymann, E. Espié, C. Guionneau, C. Dimitrakakis, R. Coulom and A. Sumner. TORCS, the open racing car simulator, v1.3.6, available at http://www.torcs.org, 2015 R.A. Emmons. Motives and Life Goals. In Handbook of Personality Psychology, Hogan, Johnson and Briggs (Eds.), Academic Press, San Diego, pp. 
485-512, 1997 M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: An update. SIGKDD Explorations, 11(1):10-18, 2009 F. Gallego, F. Llorens, M. Pujol and R. Rizo. Driving-Bots with a Neuroevolved Brain: Screaming Racers. Inteligencia Artificial, 9(28):9-16, 2005 J. Muñoz, G. Gutierrez and A. Sanchis. Towards imitation of human driving style in car racing games. Believable Bots: Can Computers Play Like People?. P. Hingston (ed.), Springer Berlin Heidelberg, pp. 289-313, 2012 X. Lu, Z. Wang, M. Xu, W. Chen and Z. Deng. A personality model for animating heterogeneous traffic behaviors. Computer Animation and Virtual Worlds, 25(3-4):363373, 2014 Virtual Meniscus Examination in Knee Arthroscopy Training Weng Bin and Alexei Sourin Nanyang Technological University procedure. Its first step is the inspection of the meniscus by deflecting it up and down with a probing instrument. The meniscus has a complicated anisotropic elasticity. First, it is very flexible in a vertical direction (indicated by arrows in Figure 1a), while it is stiff horizontally. Moreover, when the meniscus is pressed with the probe near its inner thin boundary, it touches the cartilage of the opposite bone and may develop short elastic deformations looking like wrinkles (see Figure 1a). By tactile examination of the meniscus with the probe and analysing its visual elastic deformation, the surgeons conclude about its integrity. Abstract Knee arthroscopy is a minimally invasive surgery performed on the knee joint. Virtual simulation of arthroscopy is an extremely important training tool that allows the medical students to acquire necessary motor skills before they can approach real patients. We propose how to achieve visually realistic deformation of the virtual meniscus by using linear co-rotational finite element method applied to the coarse mesh enriched with monitor points responsible for fine wrinkles simulation. The simulation is performed in real time and it closely follows the actual meniscus deformation. The generated wrinkles are easily adjustable and do not require large memory. Virtual simulation of arthroscopy becomes an extremely important training tool. There are several commercial virtual knee arthroscopy simulators available, for examples, SIMENDO arthroscopy from SIMENDO, ArthroSim from ToLTech, ARTHRO Mentor from Simbionix and ArthroS from VirtaMed. Besides, there are several relevant projects, e.g., [1], considering menisci deformations. However, very few works have been done on the simulation of the meniscus wrinkles since it is challenging when using physically-based virtual models with high resolution [2]. On the other hand, fine wrinkles on the virtual meniscus can be produced by data-driven method, however it requires pre-storage of large off-line simulation data and lacks the flexibility of tuning the wrinkle deformation. It motivated our research, and in this paper we propose a tuneable simulation method that requires much less data storage. To model the meniscus wrinkles, we enrich the concept of embedding by using it to monitor local states of soft tissue deformation. The main contributions of the paper are: /1/ We use embedding as a local deformation state monitor; /2/ We dynamically define local reference state of the wrinkles according to complex instrument-to-tissue and tissue-to- Keywords: meniscus, embedding, monitor point, arthroscopy 1. 
Introduction Meniscus is a vulnerable thin crescent-shaped tissue with variable thickness located within the knee joint between the two leg bones (Figure 1a). Menisci are thinner in the inner border while thicker in the outer border. (a) (b) Figure 1: (a) A frame of a real surgical video. (b) Screenshot from our simulation. For the treatment of meniscus lesion, arthroscopic surgery is the common surgical 47 coarse hexahedron mesh, which is driven by the co-rotational finite element method with an implicit solver. We use the LCP force feedback method proposed in [9], which updates and stores the LCP state in the simulation loop (25-30 Hz) and repeatedly applies in haptic loop (500-1000 Hz). The whole simulation can run in real time, but the fine wrinkles of the meniscus cannot be generated this way due to the coarse simulation mesh. tissue interactions. We have proposed and implemented a fast function-based method for local wrinkles modelling. 2. Related Works The simulations of the meniscus were commonly done using the popular finite element methods [1] and mass-spring methods [3]. Other physically-based methods could be used as well, e.g., [4], however simulation of the wrinkles would require using high resolution mesh which is challenging to generate in real time. We follow the general technique of embedding by incorporating a high resolution surface mesh of the meniscus into a coarse volumetric mesh model for real time physical simulation. Furthermore, we enrich the concept of embedding as a technique to monitor local deformation states, and define reference state for wrinkles formation of thin deformable bodies such as the menisci. 3.2. The embedding of monitor points and region points We embed so-called monitor points into the inner border of the meniscus before the simulation in order to monitor the deformation state near the inner border of the meniscus. The points are used not only to monitor the local deformation state, but also to control the local deformation near the thin border of the meniscus. To define the affected area by the monitor points, we also introduce region points. The region points should be positioned inside the meniscus, and the distance between the monitor and the region points define the wrinkle area. The embedding follows three steps: /1/ We manually select a few data point pairs (one monitor data point and one region data point) along the inner side of the meniscus (see Figures 2a); /2/ We generate two smooth curves by interpolation on these data points separately (see Figure 2b); /3/ We sample the monitor and the region points on the interpolated curves, respectively, with the desired resolution. The monitor points are then embedded into the existing simulation mesh by the same way as the embedding of the surface vertices can be done. Then, the monitor points are driven by the simulation mesh. There are many existing methods for wrinkles generation on 2D objects, e.g., skin [5] or cloth [6]. However, the meniscus is generally a 3D object and the above methods are not directly applicable to it. For the wrinkles on 3D objects, Seiler et al. first proposed a data-driven method in [2], and then improved their method in [7] and [8]. In these works, fine wrinkles are generated by adding pre-stored high-resolution offline simulation data to low-resolution simulation. The storage of large offline simulation data is necessary. Since the fine details relied on the offline simulation results, it would take significant time to tune the deformation. 
On the contrary, our method does not need to store large offline simulation data, and it is easy to tune by adjusting a few parameters. 3. Meniscus wrinkles deformation with 3.1. The overall simulation of the meniscus examination (a) We use two Geomagic Touch desktop haptic devices together with a 3D-printed knee model, and a video monitor for displaying the simulated virtual scene. We embed the highresolution meniscus surface mesh into the (b) Figure 2: (a) The red points (on the meniscus boundary) are monitor points while the blue points (in the middle of the meniscus) are region points. (b) Two respective B-spline curves are reconstructed. 48 With reference to Figure 3, the affected region on the surface is defined by both the monitor and the region points. Each surface point can be affected by several monitor points, and there can be overlapping affected areas. if count >Nc then needWrinkles ← true the monitor points in the last step as reference points end if end if if numberOfCollideWithLowerSurface < Ne && collideWithInstrument=false then needWrinkles ← false wrinkleReferencePoints ←NULL end if if needWrinkles = true then implement the wrinkling algorithm end if Pjs Pim 0 Pir 0 Pjp Figure 3: The projection of the surface points on the region line, where Pim 0 is the monitor Pkwc Pkmc point, Pir 0 is the region point, P js is the Pkw 0 surface point and Pjp is the projected points from the surface points to the region line. Figure 4: The displacements of the current monitor points. The green points (below) are the reference position of the wrinkles, the red points (on top) are the current monitor points, the purple points are the displaced points (called wrinkle points), and the black vectors are the displacements. 3.3. Definition of the reference state of the wrinkles We need to capture the deformation state just before the wrinkles appear, and store this state as a reference for the modelling of the wrinkles. To monitor the state of deformation, we need the monitor points both in the current and in the last time step. In each time step, according to the instrument-to-tissue contact point, we select a few points Pkmc to monitor the local deformation state. The interaction between the tissues is also needed to be considered. If the number of the collision points is larger than Ns, we start the monitoring. If the number of the points that move in the opposite direction with last step is larger than Nc, we store the monitor points in the last step as reference points. A threshold Ne is set for the ending of the wrinkles. If the number of the collision points is smaller than the Ne, we stop adding the wrinkles and clear the reference state. The reference state of the wrinkles defined by this process can adapt to the complex instrumenttissue and tissue-to-tissue interaction. The process is summarized in the following pseudo code. 3.4. A function-based method to generate the meniscus wrinkles As long as the reference state for the wrinkles is found, the wrinkles can be generated by displacing current wrinkle points with respect to the reference state (see Figure 4). We use a simple and efficient method to generate the magnitude of the wrinkles based on the sine function: (1) f (x) = A×sin(x) Both the magnitude A and the function domain are tuneable for controlling the wrinkles shape. After the generation of the wrinkle points, the displacements between the wrinkle and the monitor points are propagated by the relation described in Section 3.2. 
A Gaussian function is used to control the attenuation of the displacements propagation. if numberOfCollideWithLowerSurface >Ns && collideWithInstrument=true && needWrinkles=false then Select Pkmc according to the instrument-meniscus 4. Analysis Results contact point for each Pkmc if move in the opposite direction with last step then count++ end if end for of the Modelling We implemented the proposed method using the open source platform SOFA (Simulation 49 Open Framework Architecture) [10]. Figure 1b illustrates the simulated deformation process and compares it with the actual meniscus deformation (Figure 1a). The whole simulation can be run with about 50 FPS on a desktop computer with Intel Xeon dual core 2.27 GHz CPU and 12 GB RAM. Compared with the data-driven method [7], where pre-storage of 3.33 MB data is required, our method requires only 12.9 KB of monitor points and region points. Besides, by adjusting the value of A and the parameters of sine function in Eq. 1, we can easily tune the wrinkles deformation without re-generation of the off-line simulation data. Although we generate visually plausible wrinkles on meniscus, the quantitative errors might be large due to the different bulges positions of wrinkles with real surgery. Our method still cannot produce the dynamic waving movement of the wrinkles. References [1] Wang, Y., et al., vKASS: a surgical procedure simulation system for arthroscopic anterior cruciate ligament reconstruction. Computer Animation and Virtual Worlds, 2013. 24(1): p. 25-41. [2] Seiler, M., J. Spillmann, and M. Harders, Enriching coarse interactive elastic objects with high-resolution data-driven deformations, in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 2012, Eurographics Association: Lausanne, Switzerland. p. 9-17. [3] Jinghua, L., et al. A knee arthroscopy simulator for partial meniscectomy training. in Asian Control Conference, 2009. ASCC 2009. 7th. 2009. [4] Nealen, A., et al., Physically Based Deformable Models in Computer Graphics. Computer Graphics Forum, 2006. 25(4): p. 809-836. [5] Rémillard, O. and P.G. Kry, Embedded thin shells for wrinkle simulation. ACM Trans. Graph., 2013. 32(4): p. 1-8. [6] Rohmer, D., et al., Animation wrinkling: augmenting coarse cloth simulations with realistic-looking wrinkles. ACM Trans. Graph., 2010. 29(6): p. 1-8. [7] Seiler, M., J. Spillmann, and M. Harders, Data-Driven Simulation of Detailed Surface Deformations for Surgery Training Simulators. Visualization and Computer Graphics, IEEE Transactions on, 2014. 20(10): p. 1379-1391. [8] Seiler, M.U., J. Spillmann, and M. Harders, Efficient Transfer of ContactPoint Local Deformations for Data-Driven Simulations. 2014. p. 29-38. [9] Saupin, G., C. Duriez, and S. Cotin, Contact Model for Haptic Medical Simulations, in Proceedings of the 4th international symposium on Biomedical Simulation. 2008, Springer-Verlag: London, UK. p. 157-165. [10] Faure, F., et al., SOFA: A Multi-Model Framework for Interactive Physical Simulation, in Soft Tissue Biomechanical Modeling for Computer Assisted Surgery, Y. Payan, Editor. 2012, Springer Berlin Heidelberg. p. 283-321. 5. Conclusion and Future Work We have proposed and successfully implemented an efficient method for generation of local meniscus wrinkles when it is examined with the virtual surgical probe. Compared with the existing methods, our method produces fine wrinkles deformation with less memory. 
The wrinkle generation can further be integrated into the meniscus cutting process, one way in which a meniscus injury is treated surgically. This will require updating the monitor points and the control regions.

Acknowledgements

This project is supported by the Ministry of Education of Singapore Grant MOE2011-T2-1006 "Collaborative Haptic Modeling for Orthopaedic Surgery Training in Cyberspace". The project is also supported by Fraunhofer IDM@NTU, which is funded by the National Research Foundation (NRF) and managed through the multi-agency Interactive & Digital Media Programme Office (IDMPO) hosted by the Media Development Authority of Singapore (MDA). The authors sincerely thank Mr. Fareed Kagda (MD), orthopaedic surgeon, for his evaluation of the simulation results and his useful advice.

References

[1] Wang, Y., et al. vKASS: a surgical procedure simulation system for arthroscopic anterior cruciate ligament reconstruction. Computer Animation and Virtual Worlds, 2013. 24(1): p. 25-41.
[2] Seiler, M., Spillmann, J., and Harders, M. Enriching coarse interactive elastic objects with high-resolution data-driven deformations. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2012. Eurographics Association: Lausanne, Switzerland. p. 9-17.
[3] Jinghua, L., et al. A knee arthroscopy simulator for partial meniscectomy training. In Asian Control Conference (ASCC 2009), 2009.
[4] Nealen, A., et al. Physically Based Deformable Models in Computer Graphics. Computer Graphics Forum, 2006. 25(4): p. 809-836.
[5] Rémillard, O. and Kry, P.G. Embedded thin shells for wrinkle simulation. ACM Trans. Graph., 2013. 32(4): p. 1-8.
[6] Rohmer, D., et al. Animation wrinkling: augmenting coarse cloth simulations with realistic-looking wrinkles. ACM Trans. Graph., 2010. 29(6): p. 1-8.
[7] Seiler, M., Spillmann, J., and Harders, M. Data-Driven Simulation of Detailed Surface Deformations for Surgery Training Simulators. IEEE Transactions on Visualization and Computer Graphics, 2014. 20(10): p. 1379-1391.
[8] Seiler, M.U., Spillmann, J., and Harders, M. Efficient Transfer of Contact-Point Local Deformations for Data-Driven Simulations. 2014. p. 29-38.
[9] Saupin, G., Duriez, C., and Cotin, S. Contact Model for Haptic Medical Simulations. In Proceedings of the 4th International Symposium on Biomedical Simulation, 2008. Springer-Verlag: London, UK. p. 157-165.
[10] Faure, F., et al. SOFA: A Multi-Model Framework for Interactive Physical Simulation. In Soft Tissue Biomechanical Modeling for Computer Assisted Surgery, Y. Payan, Editor. 2012, Springer Berlin Heidelberg. p. 283-321.


Space Deformation for Character Deformation using Multi-Domain Smooth Embedding

Zhiping Luo, Utrecht University, Z.Luo@uu.nl
Remco C. Veltkamp, Utrecht University, R.C.Veltkamp@uu.nl
Arjan Egges, Utrecht University, J.Egges@uu.nl

Abstract

We propose a novel space deformation method based on domain decomposition to animate character skin. The method supports smoothness and local controllability of deformations, and achieves interactive interpolation rates. Given a character, we partition it into multiple domains according to skinning weights and attach each domain to a linear system, without seam artifacts. Examples are presented for articulated deformable characters with localized changes in the deformation of each near-rigid body part. An application example is provided by usage in deformation energies, known to offer preservation of shape and volumetric features.

Keywords: space deformation, radial basis functions, character deformation, deformable surface

1 Introduction

Space deformation is a common acceleration strategy in nonlinear variational deformation (e.g. [1]), which supports the preservation of shape details and volumetric properties, and in deformable solids such as the finite element model [2], which offers interior dynamics in addition to quasistatic skins, to produce realistic deformations. In such contexts, a coarse representation loosely enclosing the original mesh surface is established to carry out the expensive computations, and the resulting deformations are propagated back to the original mesh by an efficient space deformation.
Shepard's interpolation scheme [3], though extensively adopted as a space deformation, is not smooth enough to interpolate surfaces. Radial basis functions (RBFs) are the most versatile and commonly used smooth interpolation technique in graphics and animation. RBFs are mostly based on Euclidean distances; in that case, the movement of one branch of a model may affect other branches that are close to it in Euclidean space, which frequently happens in character deformation. Levi and Levin [4] compute geodesic distances to overcome this limitation. Nevertheless, such distance metrics depend heavily on the mesh topology or representation, leading to a loss of generality. A skeletal character consists of limb segments, and the deformation of each segment is locally controlled. Vaillant et al. [5] propose to partition the character into multiple bone-associated domains and to approximate the deformation of each segment by a field function. The method, however, relies on hardware acceleration for its speedups and requires a composition of the field functions into a global field function, which is difficult to realize in practice.

Contribution. We propose a smooth embedding based on radial basis functions (RBFs) for character deformation. To avoid interplay between mesh branches, we partition the character into multiple domains according to the associated skinning weights and attach each domain to a small linear system of local RBFs. Regions at and around the boundaries of contacting skin parts are smoothed in a post-processing step, avoiding seam artifacts. In contrast to [5, 6], our method does not blend field functions; instead, we only introduce a simple geometric post-processing to remove seam artifacts.

Figure 1: Pipeline overview. We first sample a set of points (black dots), ready to be RBF centers, on the mesh surface, and then partition the character into multiple domains based on skinning weights, indicated by colors. Each segment, indicated by a colour, is accordingly associated with a small group of samples for the construction of a local RBF interpolation.

Our method is applied in interactive applications: our multi-domain RBF interpolation scheme can run at interactive rates and gives rise to smooth shape deformations. We test our method with a nonlinear variational deformation energy, and the results demonstrate its effectiveness.

2 Multi-domain smooth space deformation

We use RBFs with compact support to interpolate displacements:

    Φ(x) = exp(−‖x − c‖² / σ²) + aᵀ·x + b,    (1)

where c ∈ R³ is the RBF center and x ∈ R³ is the evaluation point. The presence of the linear term helps to reproduce the global behavior of the function. The resulting sparse matrix is solved by LU decomposition, which leads to identical results. In our implementation, σ is the average distance between RBF centers, following the guideline of ALGLIB [7], a popular numerical analysis library; this guarantees that there are several centers within distance σ of each evaluation point.

An overview of our method is shown in Figure 1. We employ Poisson disk sampling [8] to sample the RBF centers on the mesh surface and, based on the resulting segmentation, put them into the corresponding clusters. In our space deformation, only the RBF centers are embedded, and the deformation of the original surface is left to a set of domain-associated RBF interpolation systems. For complex figures such as a human hand, our method still addresses the limitations, given a proper weight map; in practice, such weights are often painted by skilled artists, which provides this guarantee. We also computed exact geodesic distances [9] as an alternative to the Euclidean metric. A single RBF system with these values yields results similar to ours, but it can have drawbacks, e.g. low-order smoothness at and around contact regions. Figure 2 shows the comparison.

We iteratively apply Laplacian smoothing to remove the sharp features at and around the boundaries of contact regions, specifically using

    vᵢᵗ⁺¹ = vᵢᵗ + (1 / Σ_{j∈N(i)} ω_{i,j}) · Σ_{j∈N(i)} ω_{i,j} (vⱼᵗ − vᵢᵗ),    (2)

where N(i) is the set of one-ring neighboring vertices of vᵢ, and ω_{i,j} is the cotangent weight [10]. The method remains effective even for complex motions involving twisting and bending, as illustrated in Figure 3.
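To make the per-domain interpolation of Eq. (1) concrete, here is a minimal Python sketch of fitting and evaluating one domain's RBF system for 3D displacements. It is only a sketch under stated assumptions: the function names are hypothetical, a dense solve stands in for the sparse LU factorization of the paper, the Gaussian kernel is evaluated everywhere rather than with compact support, and the σ estimate is one plausible reading of the ALGLIB guideline.

    import numpy as np

    def average_center_spacing(centers):
        """sigma estimate: average nearest-neighbour distance between the
        RBF centers (one reading of the ALGLIB guideline)."""
        c = np.asarray(centers, dtype=float)
        d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        return d.min(axis=1).mean()

    def fit_domain_rbf(centers, displacements, sigma):
        """Fit one domain's RBF system to the 3D displacements of its centers,
        using the Gaussian kernel plus the linear term of Eq. (1)."""
        c = np.asarray(centers, dtype=float)
        n = len(c)
        diff = c[:, None, :] - c[None, :, :]
        K = np.exp(-np.sum(diff ** 2, axis=2) / sigma ** 2)   # kernel matrix
        P = np.hstack([c, np.ones((n, 1))])                   # linear part a^T x + b
        A = np.block([[K, P], [P.T, np.zeros((4, 4))]])
        rhs = np.vstack([np.asarray(displacements, dtype=float), np.zeros((4, 3))])
        coeffs = np.linalg.solve(A, rhs)                      # dense solve for brevity
        return coeffs[:n], coeffs[n:]                         # kernel weights, linear coefficients

    def evaluate_domain_rbf(points, centers, weights, linear, sigma):
        """Evaluate the interpolated displacement field at the mesh vertices
        assigned to the same domain."""
        p = np.asarray(points, dtype=float)
        c = np.asarray(centers, dtype=float)
        diff = p[:, None, :] - c[None, :, :]
        K = np.exp(-np.sum(diff ** 2, axis=2) / sigma ** 2)
        P = np.hstack([p, np.ones((len(p), 1))])
        return K @ weights + P @ linear

One such small system is constructed per skinning-weight domain, fitted to the displacements of that domain's RBF centers, and evaluated only at the vertices of the same domain; the seams between domains are then handled by the smoothing of Eq. (2).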
3 Results

An experiment is conducted to investigate how the number of samples affects performance. We increase the number of RBF centers and report the resulting timings in Table 1, which shows the efficiency of our method for larger numbers of RBF centers.

    Samples | Initialize (ms)      | Runtime (ms)         | Total (ms)
            | Single  Multi-domain | Single  Multi-domain | Single  Multi-domain
    50      | 1.02    0.18         | 19.54   28.17        | 20.56   28.35
    100     | 3.92    0.43         | 30.51   42.52        | 34.43   42.95
    150     | 13.63   1.08         | 44.94   58.77        | 58.57   59.85
    200     | 27.98   1.90         | 57.81   74.11        | 85.79   76.01
    250     | 52.34   3.26         | 72.56   89.81        | 124.90  93.07
    300     | 91.48   5.53         | 86.56   105.73       | 178.04  111.26
    350     | 144.99  8.51         | 103.09  119.82       | 248.08  128.33

Table 1: Timings in milliseconds. Multi-domain interpolation shows a growing advantage as the number of samples increases. Note that the additional runtime cost of our method, with respect to single-RBF interpolation, is introduced by the smoothing step rather than by solving the multiple linear systems themselves. The model has 5,103 vertices, 10,202 triangles and six domains.

Figure 2: Both a single RBF system with geodesic distances and our method avoid interplay between the fingers, whereas a single RBF system with Euclidean distances does not. Our method, however, supports higher-order smoothness than the geodesic single-RBF system. The trackball indicates the rotating motion of a finger.

Figure 3: Elbow twisting (left), Laplacian smoothing with 5 iterations. Leg bending (right), smoothing with 3 iterations.

Our method also succeeds in volumetric deformation, for instance using the volumetric PriMO energy [1], as shown in Figure 4. In more detail, the volumetric PriMO energy, as a nonlinear variational deformation technique that keeps the deformation as-rigid-as-possible, is computationally expensive. In our test on the Armadillo model with 5,406 vertices, 10,808 triangles and 300 samples on the surface (Figure 4), the space deformation using multi-domain smooth embedding takes on average only 0.07 s per frame, whereas the volumetric deformation costs on average 1.64 s per frame, showing that the runtime cost of the space deformation is minimal and does not introduce a significant overhead.

Figure 4: We apply the multi-domain space deformation in an energy-minimization-based deformation technique, namely the volumetric PriMO energy with voxels [1].
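As the note in Table 1 points out, the extra runtime of the multi-domain scheme comes from the boundary smoothing step of Eq. (2) rather than from the per-domain solves; its visual effect is shown in Figure 3. The following minimal Python sketch of that step is an assumption-laden illustration, not the paper's code: the data layout (a vertex array plus a precomputed one-ring neighbour list) and the function name are hypothetical, and uniform weights stand in for the cotangent weights [10] for brevity.

    import numpy as np

    def smooth_boundary(vertices, neighbors, boundary_ids, iterations=3):
        """Iterative Laplacian smoothing of Eq. (2), restricted to the vertices
        at and around the domain boundaries."""
        v = np.asarray(vertices, dtype=float).copy()
        for _ in range(iterations):
            v_next = v.copy()
            for i in boundary_ids:
                ring = neighbors[i]                   # one-ring neighbour indices N(i)
                if len(ring) == 0:
                    continue
                w = np.ones(len(ring))                # stand-in for cotangent weights
                # Eq. (2): v_i <- v_i + (1 / sum_j w_ij) * sum_j w_ij * (v_j - v_i)
                delta = (w[:, None] * (v[ring] - v[i])).sum(axis=0) / w.sum()
                v_next[i] = v[i] + delta
            v = v_next                                # synchronous update per iteration
        return v

With uniform weights each update simply moves a boundary vertex toward the centroid of its one-ring; the elbow and leg examples of Figure 3 correspond to 5 and 3 such iterations, respectively.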
4 Conclusion

In this paper, we presented a space deformation method based on domain decomposition for character deformation. The method is built on radial basis functions and supports smooth deformations without seam artifacts. We have applied it in a nonlinear variational deformation technique, demonstrating its efficiency. For skinned models with many bones, our method is likely to yield domains with very few samples, making the resulting interpolation matrices inefficient; a possible solution is to group contacting domains that have only a minimal number of samples into a new domain and to update the RBFs accordingly. Our domain decomposition method holds promise for performing surface deformation in real-time animation and simulation applications on commodity laptops in the near future.

Acknowledgement: This publication was supported by the Dutch national program COMMIT.

References

[1] Mario Botsch, Mark Pauly, Martin Wicke, and Markus Gross. Adaptive space deformations based on rigid cells. Computer Graphics Forum, 26(3):339–347, 2007.
[2] Theodore Kim and Doug L. James. Physics-based character skinning using multi-domain subspace deformations. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '11, pages 63–72, 2011.
[3] R.E. Barnhill, R.P. Dube, and F.F. Little. Properties of Shepard's surfaces. Journal of Mathematics, 13(2), 1983.
[4] Zohar Levi and David Levin. Shape deformation via interior RBF. IEEE Transactions on Visualization and Computer Graphics, 20(7):1062–1075, July 2014.
[5] Rodolphe Vaillant, Loïc Barthe, Gaël Guennebaud, Marie-Paule Cani, Damien Rohmer, Brian Wyvill, Olivier Gourmel, and Mathias Paulin. Implicit skinning: Real-time skin deformation with contact modeling. ACM Trans. Graph., 32(4):125:1–125:12, July 2013.
[6] Rodolphe Vaillant, Gaël Guennebaud, Loïc Barthe, Brian Wyvill, and Marie-Paule Cani. Robust iso-surface tracking for interactive character skinning. ACM Trans. Graph., 33(6):189:1–189:11, November 2014.
[7] ALGLIB: a cross-platform numerical analysis and data processing library. Available online at http://www.alglib.net/. Accessed: 2015-03-03.
[8] Kenric B. White, David Cline, and Parris K. Egbert. Poisson disk point sets by hierarchical dart throwing. In Proceedings of the 2007 IEEE Symposium on Interactive Ray Tracing, RT '07, pages 129–132, 2007.
[9] Shi-Qing Xin and Guo-Jin Wang. Improving Chen and Han's algorithm on the discrete geodesic problem. ACM Trans. Graph., 28(4):104:1–104:8, September 2009.
[10] Mark Meyer, Mathieu Desbrun, Peter Schröder, and Alan H. Barr. Discrete differential-geometry operators for triangulated 2-manifolds. In Visualization and Mathematics III, pages 35–57. Springer, 2003.