Animating Virtual Humans in Intelligent
Multimedia Storytelling
Minhua Eunice Ma and Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee
Derry, Northern Ireland
Outline
• State-of-the-art virtual human animation standards
  - VRML/X3D & MPEG-4 for object modelling
  - H-Anim & MPEG-4 SNHC for humanoid modelling
  - VHML & STEP for human animation modelling
  - Natural language to 3D animation
• Language visualisation (animation) in the intelligent multimodal storytelling system CONFUCIUS
  - Humanoid animation in CONFUCIUS
  - Multiple animation channels
  - Space sites of virtual humans
  - Virtual object manipulation
• Conclusion & future work
PGNet 2005
Liverpool, 28 June 2005
Four levels of virtual human representation
Current virtual human representation languages can be classified into four groups according to their level of abstraction, ranging from 3D geometry modelling up to language-driven animation:

Level 1 (3D object modelling, low-level animation): VRML (X3D), MPEG-4
Level 2 (3D human modelling): H-Anim, MPEG-4 SNHC
Level 3 (human animation modelling): VHML (BAML), XML-based; STEP, script-based
Level 4 (natural language to animation, high-level animation): CONFUCIUS, AnimNL
Level 1: 3D object modelling
• VRML (Virtual Reality Modelling Language) is a hierarchical scene-description language that defines the geometry and behaviour of a 3D scene. X3D is the successor to VRML.
• MPEG-4 uses BIFS (Binary Format for Scenes) for real-time streaming. BIFS borrows many concepts from VRML; BIFS and VRML can be seen as different representations of the same data.
Level 2: 3D human modelling
• H-Anim is a standard VRML97 representation for humanoids. It defines standard human joint articulations, segment dimensions, and sites for "end effectors" and attachment points for clothing.
• MPEG-4 SNHC (Synthetic/Natural Hybrid Coding) incorporates H-Anim and provides an efficient way to animate virtual humans, along with tools for efficient compression of the animation parameters associated with the H-Anim human model.
H-Anim joint-segment hierarchy
• An H-Anim file contains a joint-segment hierarchy.
• Each joint node may contain other joint nodes and a segment node that describes the body part associated with the joint.
• Each segment is a normal VRML transform node describing the body part's geometry and texture.
• H-Anim humanoids can be animated using keyframing, inverse kinematics, and other animation techniques.
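The joint-segment hierarchy described above can be sketched as nested objects; this is an illustrative Python sketch, not H-Anim syntax, and the class names are hypothetical:

```python
# Illustrative sketch of an H-Anim-style joint-segment hierarchy.
# Joint and segment names below follow the H-Anim standard; the
# Python classes themselves are hypothetical.

class Segment:
    """A body part; in H-Anim this holds geometry and texture."""
    def __init__(self, name):
        self.name = name

class Joint:
    """A joint node: may contain nested joints and one segment."""
    def __init__(self, name, segment=None, children=None):
        self.name = name
        self.segment = segment          # body part attached to this joint
        self.children = children or [] # nested joint nodes

def all_joints(joint):
    """Depth-first traversal of the joint hierarchy."""
    yield joint.name
    for child in joint.children:
        yield from all_joints(child)

# A small fragment of the standard skeleton:
humanoid = Joint("HumanoidRoot", Segment("sacrum"), [
    Joint("sacroiliac", Segment("pelvis"), [
        Joint("l_hip", Segment("l_thigh")),
        Joint("r_hip", Segment("r_thigh")),
    ]),
])

print(list(all_joints(humanoid)))
# → ['HumanoidRoot', 'sacroiliac', 'l_hip', 'r_hip']
```

Keyframing and inverse kinematics then operate on the rotation of each joint node in this tree.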
H-Anim models on the Web
Virtual human models   Authors
Nancy [1]              Cindy Ballreich
Baxter, Nana [2]       Christian Babski
Y.T., Hiro [3]         Matt Beitler
Dilbert [3]            Matt Beitler
Max [3]                Matt Beitler
Jake [3]               Matt Beitler
Dork [4]               Michael Miller

URLs:
[1] http://www.ballreich.net/vrml/h-anim/nancy_h-anim.wrl
[2] http://ligwww.epfl.ch/~babski/StandardBody
[3] http://www.cis.upenn.edu/~beitler/H-Anim/Models/H-Anim1.1/
[4] http://students.cs.tamu.edu/mmiller/hanim/proto/dork-proto.wrl
Level 3: Human animation modelling
• VHML (Virtual Human Mark-up Language) is an XML-based language which provides an intuitive way to define virtual human animation. It is composed of several sub-languages: DMML, FAML, BAML, SML, and EML.
• STEP is a scripting language for human actions. It has a Prolog-like syntax, which makes it compatible with most standard logic programming languages.
VHML & STEP examples
<left-calf-flex amount="medium">
<right-calf-flex amount="medium">
<left-arm-front amount="medium">
<right-arm-front amount="medium">
Standing on my knees I beg you pardon
</right-arm-front></left-arm-front>
</right-calf-flex></left-calf-flex>

A. A VHML example

script(walk_forward_step(Agent), ActionList) :-
    ActionList = [parallel([script_action(walk_pose(Agent)),
                            script_action(move(Agent, front, fast))])].

B. A STEP example
Level 4: Natural language to animation
• High-level animation applications convert natural language into virtual human animation. Little research on virtual human animation focuses on this level.
• The AnimNL project aims to enable people to use natural language instructions to tell virtual humans what to do.
• CONFUCIUS also deals with language animation.
• Research at this level will lead to powerful web-based applications.
Architecture of CONFUCIUS
Input: natural language sentences.

Natural Language Processing (WordNet, the LCS database, and the FDG parser), supported by a knowledge base of language knowledge and a surface transformer, maps the input to a semantic representation. Visual/audio knowledge (3D models and animations, with audio encapsulated in the graphic models) is built with 3D authoring tools from existing 3D models and virtual human models. A media allocator passes the semantic representation to three output components: the animation engine (with non-speech audio), text-to-speech, and the presentation agent (Merlin the Narrator). The 3D virtual world is synchronised with speech in VRML, and narration integration produces the final multimodal presentation.
Humanoid animation in CONFUCIUS
The animation generation pipeline starts from the semantic representation:

1. Animation controller: check whether the event predicate matches basic human motions in the animation library. If it does, either load a pre-created keyframe animation or provide an animation specification for animation generation (motion instantiation); if not, fall back on user interaction.
2. Environment placement: apply spatial information and place objects and humans into a specified environment.
3. Camera controller: automatic camera placement, applying cinematic rules.

The output is a VRML file of the virtual story world.
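The branch at the top of the pipeline (match against basic motions in the library, or fall back on user interaction) can be sketched as follows; function and file names here are hypothetical, not from CONFUCIUS itself:

```python
# Illustrative sketch of the animation controller's branch:
# a matching event predicate loads a pre-created keyframe animation,
# otherwise the system falls back on user interaction.

def select_motion(event_predicate, motion_library, ask_user):
    """Return an animation specification for the event predicate."""
    if event_predicate in motion_library:       # match basic motions?
        return motion_library[event_predicate]  # pre-created keyframes
    return ask_user(event_predicate)            # no match: ask the user

library = {"walk": "anims/walk.wrl", "wave": "anims/wave.wrl"}
print(select_motion("walk", library, lambda p: f"user spec for {p}"))
# → anims/walk.wrl
print(select_motion("pirouette", library, lambda p: f"user spec for {p}"))
# → user spec for pirouette
```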
Multiple animation channels
• Third-level human animation modelling languages (VHML, STEP) provide a facility to specify both sequential and parallel temporal relations.
• Simultaneous animations cause a Dining Philosophers-style problem for higher-level animation using predefined animation data: multiple animations may request access to the same body parts at the same time.
• Multiple animation channels allow characters to run multiple animations at the same time, e.g. walking with the lower body while waving with the upper body.
• To avoid conflicts, one channel is often disabled while a specific animation is playing on another channel.
Animations / Involved joints   sacroiliac   l_hip   r_hip   …   r_shoulder
walk                           2            2       2       …   1
jump                           2            2       2       …   1
wave                           0            0       0       …   2
run                            2            2       2       …   1
scratch head                   0            0       0       …   2
sit                            2            2       2       …   1
…                              …            …       …       …   …
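A minimal sketch of the channel-conflict check: two animations can run in parallel only if they drive disjoint sets of joints. The per-animation joint sets below are assumptions for illustration (walking drives the lower body, waving the right arm), as is the function name:

```python
# Illustrative sketch: multiple animation channels, with a conflict
# check over the joints each animation drives. Joint names follow
# H-Anim; which joints each animation uses is an assumption here.

ANIMATION_JOINTS = {
    "walk": {"sacroiliac", "l_hip", "r_hip"},
    "jump": {"sacroiliac", "l_hip", "r_hip"},
    "wave": {"r_shoulder", "r_elbow"},
    "scratch_head": {"r_shoulder", "r_elbow", "r_wrist"},
}

def can_run_in_parallel(anim_a, anim_b):
    """Two animations may share the character only if they drive
    disjoint joint sets, i.e. occupy separate animation channels."""
    return ANIMATION_JOINTS[anim_a].isdisjoint(ANIMATION_JOINTS[anim_b])

print(can_run_in_parallel("walk", "wave"))  # lower body vs upper body → True
print(can_run_in_parallel("walk", "jump"))  # both drive the hips → False
```

When the check fails, the system would disable one channel for the duration of the conflicting animation, as described above.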
Space sites of virtual humans
• Types of virtual objects
  - Small props, manipulated by hands or feet, e.g. cup, hat, ball
  - Big props, sources or targets of actions, e.g. table, chair, tree
  - Stage props, which have internal structure, e.g. house, restaurant, chapel
• Site tags of virtual humans
  - For manipulating small props: six sites on the hands (three on each hand), one site on the head (skull_tip), and one site on each foot tip
  - For big prop placement: five sites indicating the five directions around the human body: x_front, x_back, x_left, x_right, x_bottom. Big props like a table or chairs are usually placed at these positions.
  - For stage prop setting: five more space tags indicating farther places: far_front, far_back, far_left, far_right, far_top. Stage props (e.g. a house) are often located at these far sites.

[Figures: hand postures illustrating grip, pincer grip, pushing, and pointing]
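Placement at these site tags can be sketched as offsets from the humanoid's position. The numeric offsets below are illustrative assumptions, not values from CONFUCIUS:

```python
# Illustrative sketch: space sites around a virtual human, keyed by
# the site tags named above. Offset magnitudes are assumptions
# (near sites ~1 unit, far sites ~10 units).

SITE_OFFSETS = {
    "x_front":  (0.0, 0.0, 1.0),   "far_front": (0.0, 0.0, 10.0),
    "x_back":   (0.0, 0.0, -1.0),  "far_back":  (0.0, 0.0, -10.0),
    "x_left":   (-1.0, 0.0, 0.0),  "far_left":  (-10.0, 0.0, 0.0),
    "x_right":  (1.0, 0.0, 0.0),   "far_right": (10.0, 0.0, 0.0),
    "x_bottom": (0.0, -1.0, 0.0),  "far_top":   (0.0, 10.0, 0.0),
}

def place_prop(human_position, site):
    """World position for a prop placed at one of the human's sites."""
    x, y, z = human_position
    dx, dy, dz = SITE_OFFSETS[site]
    return (x + dx, y + dy, z + dz)

print(place_prop((2.0, 0.0, 0.0), "x_front"))   # a table just in front
# → (2.0, 0.0, 1.0)
print(place_prop((2.0, 0.0, 0.0), "far_front"))  # a house in the distance
# → (2.0, 0.0, 10.0)
```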
Virtual object manipulation
Two approaches to organising the knowledge required for successful grasping:
1. Store applicable objects in the animation file of an action, and use lexical knowledge of nouns to infer hypernymy relations between objects.
2. Include the manipulation hand postures and movements within the object description, besides its intrinsic object properties. Such objects can describe in detail their own functionality and their possible interactions with virtual humans.
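The second approach can be sketched as "smart objects" that carry their own grasp posture. The dictionary entries and function name below are illustrative assumptions; the four posture names come from the slide:

```python
# Illustrative sketch of approach 2: each object description stores
# the hand posture needed to manipulate it, alongside intrinsic
# properties (sizes here are made-up placeholder values in metres).

OBJECTS = {
    "button": {"size": 0.02, "posture": "index_pointing"},
    "coin":   {"size": 0.02, "posture": "pincer_grip"},
    "cup":    {"size": 0.10, "posture": "grip"},
    "chair":  {"size": 0.90, "posture": "palm_push"},
}

def grasp_posture(obj_name):
    """Look up the manipulation posture stored with the object,
    rather than with the grasping animation."""
    return OBJECTS[obj_name]["posture"]

print(grasp_posture("cup"))    # → grip
print(grasp_posture("chair"))  # → palm_push
```

Keeping the posture with the object means a single generic "grasp" action works across objects without enumerating them in each animation file.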
Four stored hand postures for interacting with 3D objects:
- index pointing (press a button)
- pincer grip (use thumb and index finger to pick up small objects)
- grip (hold a cup handle, knob, or bottle)
- palm push (push a piece of furniture)
Conclusion
• Classified virtual human representation languages into four levels of abstraction.
• CONFUCIUS is an overall framework for intelligent multimedia storytelling, combining 3D modelling/animation techniques with natural language understanding technologies to achieve higher-level virtual human animation.
• A number of current projects are based on virtual human animation, working in various application domains. Few of them take the modern NLP approach on which a high-level human animation system should be based.
• The value of CONFUCIUS lies in generating 3D animation from natural language by automating language parsing, semantic representation, and animation production.
• Potential application areas: computer games, animation production and direction, multimedia presentation, shared virtual worlds.
• Future work: coordination & synchronisation of multiple virtual humans.