MPEG-4, FULL MULTIMEDIA STANDARD

MPEG-4, NETWORKED
MULTIMEDIA
STANDARD
MULTIMEDIA SYSTEMS
IREK DEFEE
MULTIMEDIA DATA REPRESENTATION
• MULTIMEDIA SYSTEMS CAN BE BUILT ONLY IF
THERE IS A REPRESENTATION OF
MULTIMEDIA DATA.
• SUCH A REPRESENTATION SHOULD BE
STANDARDIZED SO THAT EVERYBODY CAN
USE IT.
• THERE ARE MANY SUCH STANDARDS;
WE WILL GIVE AN EXAMPLE OF ONE OF
THE MOST ADVANCED.
• MPEG-4 IS A STANDARD WHICH ENABLES
EFFICIENT REPRESENTATION OF
COMPLEX MULTIMEDIA DATA FOR
FULLY INTERACTIVE APPLICATIONS
• THE MPEG-4 STANDARD IS OBJECT-BASED
• OBJECT-BASED TREATMENT AND
COMPRESSION OF VIDEO MAKE THE
MPEG-4 STANDARD VERY FLEXIBLE, MODERN
AND EFFICIENT
• THE BASIC VIDEO AND AUDIO PARTS OF THIS
STANDARD ARE USED IN DIGITAL TELEVISION
AND BLU-RAY DISCS
• BUT MPEG-4 INVOLVES ALL DATA TYPES AND
ALSO SPECIAL TOOLS FOR PARTICULAR
APPLICATIONS AND WE GIVE EXAMPLES OF
THEM NEXT
Synthetic Visual Tools in MPEG-4
Motivation
• A new type of data is appearing in multimedia
applications: Synthetic
• Both synthetic and natural data can
co-exist in today’s applications
• This new data needs to be compressed and
streamed in most applications
• New technologies are needed for:
– Compression and streaming of synthetic data
Some examples of applications
• 3D video
• Augmented reality
• Telepresence
• Scientific visualization
• Virtual reality
• …
Application example: Telepresence
The room is computer-generated
Application example: Scientific visualization
Application example: Virtual 3D world
Synthetic Visual Tools in MPEG-4
• Version 1
– Face animation
– 2D dynamic mesh
– Scalable coding of synthetic texture
– View dependent scalable coding of texture
• Version 2
– Body animation
– 3D model compression
Face Animation
Example of applications
• Virtual meeting, tele-presence, videoconferencing, ...
• Virtual story teller, virtual actor, user
interface, ...
• Games, avatars, ...
Face animation
• Face: an object ready for
rendering and animation
– A realistic representation of a
“human” face
– Capable of animation by a
reasonable set of parameters
driven by speech, facial
expressions, or others
Face animation
• Shape, texture and expressions:
– specified parameters in the
incoming bitstream
– remote as well as local control
of these parameters
Initial face object
[Figure: head coordinate system with X, Y and Z axes and the pitch, yaw and roll rotations]
• Gaze along the Z axis
• All face muscles relaxed
• Eyelids tangential to the iris
• Pupil one-third of full eye
• Lips: in contact; horizontal
• Mouth: closed; upper and lower teeth
touching
• Tongue: flat; tip touching front teeth
Face animation parameters
• Three sets of parameters are used to describe
a face and its animation characteristics:
– Facial Definition Parameters (FDPs)
– Facial Animation Parameters (FAPs)
– FAP Interpolation Table (FIT)
Facial Definition Parameters - FDPs
• Defines a specific face via
– 3D feature points
– 3D mesh/scene graph
– Face Texture
– Face Animation Table (FAT)
Face Feature Points
• Normalize animation
parameters
• Find feature correspondence
in different faces
• Roughly define shape
[Figure: numbered feature points (groups 2.x to 11.x) marked on the head model with x, y, z axes, plus detail panels for the right eye, left eye, nose, teeth, tongue and mouth. Legend: feature points affected by FAPs; other feature points.]
3D mesh and feature points
Texture
FDPs
• Facial Definition Parameters
• Two modes:
– To customize the face model at the receiver to
a particular face
– To download a face model along with its
animation information
• Generally, sent once per session
– for calibration and/or download
– could be sent more often for “special effects”
Face Animation Parameters - FAPs
• Represent a complete set of facial actions
=> allow representation of most of the
natural facial expressions
• All FAPs involving translational
movement are expressed in Facial Animation
Parameter Units (FAPUs)
• This allows consistent interpretation of FAPs
on any facial model.
FAPUs
– IRISD0: iris diameter in the neutral face (by definition equal to the
distance between the upper and lower eyelid); IRISD = IRISD0 / 1024
– ES0: eye separation; ES = ES0 / 1024
– ENS0: eye-nose separation; ENS = ENS0 / 1024
– MNS0: mouth-nose separation; MNS = MNS0 / 1024
– MW0: mouth width; MW = MW0 / 1024
– AU: angle unit; 10^-5 rad
[Figure: neutral face with the distances IRISD0, ES0, ENS0, MNS0 and MW0 marked]
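The FAPU definitions can be tried out with a short Python sketch. It only illustrates the normalization idea: the class and function names, and the neutral-face measurements, are made up for this example, and AU is taken as 10^-5 rad, which is how the sketch reads the slide's angle unit.

```python
from dataclasses import dataclass


@dataclass
class NeutralFace:
    """Distances measured on a neutral face, in model units (made-up values)."""
    iris_diameter: float          # IRISD0
    eye_separation: float         # ES0
    eye_nose_separation: float    # ENS0
    mouth_nose_separation: float  # MNS0
    mouth_width: float            # MW0


def fapus(face):
    """Each distance unit is the neutral-face distance divided by 1024."""
    return {
        "IRISD": face.iris_diameter / 1024,
        "ES": face.eye_separation / 1024,
        "ENS": face.eye_nose_separation / 1024,
        "MNS": face.mouth_nose_separation / 1024,
        "MW": face.mouth_width / 1024,
        "AU": 1e-5,  # angle unit in radians (assumed reading of the slide)
    }


def fap_to_displacement(fap_value, unit, units):
    """Convert a FAP amplitude into model units for one particular face."""
    return fap_value * units[unit]


# Two hypothetical face models of different size receive the same FAP value.
small = NeutralFace(10.24, 61.44, 40.96, 30.72, 51.2)
large = NeutralFace(20.48, 122.88, 81.92, 61.44, 102.4)
d_small = fap_to_displacement(512, "MW", fapus(small))
d_large = fap_to_displacement(512, "MW", fapus(large))
```

The same FAP value of 512 in MW units moves a point by half the mouth width on either model, which is the sense in which FAPUs give a consistent interpretation across faces.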
FAPs
• Facial Animation Parameters
• 2 high-level FAPs:
– Viseme (visual correlate of phoneme)
– Expression (joy, anger, fear, disgust, sadness,
surprise)
• textual description of expression parameters
• point to groups of FAPs used together to achieve an expression
• 66 low level FAPs
– associated with the displacement or rotation of the facial
feature points
What is not standardized ?
• The way to extract the parameters
– Markers
– Speech driven
– Image analysis and
feature extraction
• The choice of which parameters to code and
with what precision
– Quantization
– Rate control
FITs
• FAP Interpolation Tables
• Specification of interpolation rules for
some/all FAPs by the sender.
• Sender specifies FAP Interpolation Graph
(FIG) and set of interpolation functions
• Allows higher degree of control over the
animation results.
What is standardized?
• FDPs
– BIFS Syntax and Semantics
– Rules for decoding and adaptation
• FAPs
– Bitstream syntax and semantics
– Rules for decoding and animation
• FITs
– Syntax and semantics
– Decoding rules
MPEG-4 AUDIO
This part of the standard deals with audio signals.
It covers a very broad range:
• Speech Coding
• General Audio Coding
• Scalable Audio Coding
Media Objects and Associated Operations
• Objects
– Natural audio
– Synthetic audio
– Control
• Operations on objects
– Synchronize
– Decode
– Compose into compound objects
– Present
– Interact
[Figure: audio objects A1, A2 and A3 combined through mixing (+) and effects (Fx)]
Advantages of Object Framework
• Each signal is coded with the most efficient coding
system
– Natural
– Synthetic
• Composition of objects into audio scene
– Rate conversion
– Mix and Equalization
– Effects
• Final mix is done in the terminal
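Because the final mix is done in the terminal, the receiver can weight each decoded object itself. A minimal Python sketch of that composition step follows; it assumes the objects are already decoded to equally long sample lists at one common rate, so rate conversion, equalization and effects are left out. The function name and the sample values are illustrative only.

```python
def compose(objects, gains):
    """Mix decoded audio objects into one signal, one gain per object."""
    length = len(objects[0])
    if any(len(samples) != length for samples in objects):
        raise ValueError("all objects must have the same number of samples")
    if len(gains) != len(objects):
        raise ValueError("one gain is needed per object")
    return [
        sum(gain * samples[n] for gain, samples in zip(gains, objects))
        for n in range(length)
    ]


# Two hypothetical decoded objects, e.g. a speech object and a music object.
speech = [0.5, -0.5, 0.25, 0.0]
music = [0.25, 0.25, -0.25, -0.25]

default_mix = compose([speech, music], gains=[1.0, 0.5])
speech_only = compose([speech, music], gains=[1.0, 0.0])
```

Changing the gains changes the presentation without touching the coded objects, which is what separates this framework from delivering a single pre-mixed signal.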
System Overview
[Figure: terminal stages Demux, Decode, Synch, Compose and Present, with Interaction. Blocks shown: Obj 1, Obj 2, Dec 1, Dec 2, Audio Scene Graph, Compound Object, Listener, Channel Description, IPR Mgmt Interf., IPR Control.]
Objects are delivered separately,
synchronized, decoded and
composed
Audio Object Functionalities
• Signal compression
• Scalability
– bit rate
– signal bandwidth
– presentation rate
– encoder or decoder complexity
• Extraction and re-use
• Robustness to channel errors
Scalability depending on the bit rate
[Figure: coders plotted against bit rate (2 to 64 kbps). Channels marked: satellite, cellular phone, secure communication, Internet, ISDN. Coders marked: TTS, speech coding, general audio coding, scalable coder. Typical audio bandwidth: 4 kHz, 8 kHz, 20 kHz.]
Application Domains
Profiles
• Speech
– low rate speech coders and TTS (Text-to-Speech)
• Scalable
– speech coders
– general audio coders
– all coders in scalable configuration
• Synthetic
– wavetable synthesis
– score driven synthesis
– TTS
MPEG-4 Speech Coding: Overview
• Excellent compression by using source model
– Linear Predictive Coding (LPC)
– Pitch or noise excitation
• Better compression than “general audio”
coders
– only for “clean speech” from single talker
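To make the source-model idea concrete, here is a small Python sketch of linear prediction using the autocorrelation method and the Levinson-Durbin recursion. It is a textbook formulation written for this note, not the analysis procedure of any MPEG-4 coder; the function names and the test frame are assumptions.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n + lag]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Return (coefficients, residual energy) for the predictor
    x_hat[n] = a[1]*x[n-1] + ... + a[order]*x[n-order]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# A decaying exponential is predicted almost perfectly by one coefficient.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(frame, 1), 1)
```

A coder built on this model sends the predictor coefficients plus a compact description of the excitation (pitch or noise) instead of the waveform itself, which is why it works well only for signals that fit the model.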
Speech Coders
• Harmonic Vector Excitation Coder (HVXC)
• Code Excitation Linear Prediction (CELP)
• Wideband CELP
[Figure: signal bandwidth (kHz) versus channel bit rate (kb/s) for HVXC, CELP and WB CELP]
Communication Characteristics of Coders
• Low bit rate
– HVXC: 1.2 kb/s to 1.7 kb/s variable rate;
2.0 kb/s to 4.0 kb/s constant rate
– CELP: 4.0 kb/s to 24 kb/s constant rate
• Low one-way delay
– HVXC: 33.5 ms to 56 ms
– CELP: 15 ms to 45 ms
• Not compromised for modem signals
Bit Rate Scalability
• Parameters coded using multi-stage Vector
Quantization
– base plus enhancement layer
• Enhancement layers can be stripped in
– server
– channel
– decoder
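The base-plus-enhancement structure can be sketched in Python with a two-stage vector quantizer. The codebooks below are tiny made-up examples, not the MPEG-4 tables; the point is only that the second stage codes the residual of the first, so its index can be dropped anywhere along the chain.

```python
def nearest(codebook, vector):
    """Index of the codeword with the smallest squared error to `vector`."""
    def squared_error(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vector))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def encode(vector, stages):
    """Quantize `vector` stage by stage; each stage codes the previous residual."""
    indices = []
    residual = list(vector)
    for codebook in stages:
        index = nearest(codebook, residual)
        indices.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return indices


def decode(indices, stages):
    """Sum the selected codewords; fewer indices give a coarser result."""
    out = [0.0] * len(stages[0][0])
    for index, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


# Hypothetical codebooks: a coarse base layer and a finer enhancement layer.
BASE = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
ENHANCEMENT = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]]
STAGES = [BASE, ENHANCEMENT]

indices = encode([1.2, 0.9], STAGES)
base_only = decode(indices[:1], STAGES)  # enhancement index stripped
full = decode(indices, STAGES)
```

Stripping the enhancement layer amounts to discarding the trailing indices, which a server, a channel node or the decoder can each do without re-encoding.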
Parameter Update Scalability
[Figure: frames divided into sub-frames]
• Linear Prediction Model
– updated every frame
– interpolated every sub-frame
• Excitation
– gain updated every subframe
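The frame and sub-frame update rates can be illustrated with a short Python sketch. It uses plain linear interpolation between the previous and current frame; the slide does not say in which parameter domain or with which weights the coders interpolate, so both are assumptions here, as are the function name and the values.

```python
def interpolate_subframes(previous, current, n_subframes):
    """Linearly interpolate model parameters across the sub-frames of a frame.

    Sub-frame s (counted from 1) uses weight s / n_subframes, so the last
    sub-frame uses exactly the parameters transmitted for the current frame.
    """
    result = []
    for s in range(1, n_subframes + 1):
        w = s / n_subframes
        result.append([(1 - w) * p + w * c for p, c in zip(previous, current)])
    return result


# Hypothetical parameter vectors received for two consecutive frames.
previous_frame = [0.0, 1.0]
current_frame = [1.0, 0.0]
per_subframe = interpolate_subframes(previous_frame, current_frame, 4)
```

Only one parameter set per frame is transmitted; the smoother per-sub-frame values are derived at the decoder at no extra bit cost, while the excitation gain is sent for every sub-frame.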
MPEG-4 BIFS
MPEG-4 DATA INCLUDE VARIOUS MEDIA TYPES
WHICH CAN BE USED AT THE SAME TIME.
THIS REQUIRES A MECHANISM FOR THEIR
ORGANIZATION IN TIME AND SPACE.
THIS MECHANISM IN THE MPEG-4 STANDARD IS
CALLED
BIFS – BINARY FORMAT FOR SCENES
BIFS: WHY?
MPEG-4 is an object based system
– => A Scene Description is needed to compose the objects
BIFS is the MPEG-4 scene description protocol
– to compose MPEG-4 objects
– to describe interaction with MPEG-4 objects
– to animate MPEG-4 objects
Example of an MPEG-4 Audiovisual Scene
(1)
2D Audio-visual scene
Audio and Video + Scrolling Text
and Still Images
2D Audio-visual scene
Audio and Video + Still Images
Example of an MPEG-4 Audiovisual Scene
(2)
3D Audio-visual scene
3D World + arbitrary
shaped video + still images +
3D Objects
BIFS Scene Features (v2)
• Body Animation
• Advanced Audio
– Perceptual approach to modify natural source
– Acoustic properties for physically based audio rendering
• Stream and server control
– VCR controls
– Application specific messaging
• Extensibility (Prototypes)
– Definition of new BIFS interfaces
• Hierarchical 3D objects
– Progressive loading and local degradation of 3D mesh
• Web interface
– Linking and embedding of a web page
MPEG-4 Systems Principle
• Scene Description Stream: a data stream describing the whole scene
• Object Descriptor Stream: a data stream describing which objects
are there
• Visual Streams, Audio Stream, ...: separate data streams for each
object
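The relation between the three kinds of stream can be modelled in a few lines of Python. This is a simplified data-structure sketch: the class names, fields and IDs are invented for illustration and do not follow the actual MPEG-4 Systems syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementaryStream:
    es_id: int
    kind: str  # "visual" or "audio"


@dataclass
class ObjectDescriptor:
    """Says which elementary streams carry one media object."""
    od_id: int
    streams: List[ElementaryStream]


@dataclass
class SceneNode:
    """A scene description node; media nodes point at an object descriptor."""
    name: str
    od_id: Optional[int] = None
    children: List["SceneNode"] = field(default_factory=list)


def streams_for_scene(root, descriptors):
    """Walk the scene and collect the elementary streams it needs, in order."""
    by_id = {od.od_id: od for od in descriptors}
    needed = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.od_id is not None:
            needed.extend(by_id[node.od_id].streams)
        stack.extend(reversed(node.children))
    return needed


# Hypothetical scene: one video object and one audio object.
scene = SceneNode("root", children=[
    SceneNode("video", od_id=1),
    SceneNode("audio", od_id=2),
])
descriptors = [
    ObjectDescriptor(1, [ElementaryStream(101, "visual")]),
    ObjectDescriptor(2, [ElementaryStream(201, "audio")]),
]
needed_ids = [es.es_id for es in streams_for_scene(scene, descriptors)]
```

The indirection through object descriptors means the scene can be described without knowing how each object is coded or delivered.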
Interactive Scene Description
BIFS content in MPEG-4 system
[Figure: stages DELIVERY, DECODING, SCENE GRAPH MANAGEMENT, COMPOSITION, RENDERING and PRESENTATION. Delivery carries the BIFS-Update ES, the BIFS-Anim ES and the MPEG-4 streams. The scene graph holds VRML nodes and MPEG-4 nodes: 2D nodes, 3D nodes, 2D+3D nodes, audio nodes, interaction, S&N sound, FBA.]
MPEG-4: An integrated Multimedia System
[Figure: blocks Network, TransMux, FlexMux, Elementary Streams, Decoding, Primitive AV Objects, Composition and Rendering, Display and Local User Interaction (MPEG-4 interactive scene). The elementary streams are organised by BIFS scene description information and object descriptors.]
BIFS Delivery: BIFS Command
[Figure: scene graph with the nodes Root, Transform, BikeSwitch, Bike, Body, Switch, Head, Left Arm, Right Arm, Left Leg and Right Leg, updated by commands arriving in a BIFS-Command ES]
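The idea of updating a scene through commands can be sketched in Python. The tuple-based command format and the `Node` class are invented for this illustration and are not the BIFS-Command syntax; `whichChoice` is borrowed from the VRML Switch node, and reading the slide's -1 and 0 as values of that field is an assumption.

```python
class Node:
    """A scene-graph node with named fields and child nodes."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def find(self, name):
        """Depth-first search for the first node called `name`."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


def apply_command(root, command):
    """Apply one update command (a hypothetical tuple format) to the scene."""
    kind = command[0]
    if kind == "replace_field":
        _, node_name, field_name, value = command
        root.find(node_name).fields[field_name] = value
    elif kind == "insert_node":
        _, parent_name, node = command
        root.find(parent_name).add(node)
    else:
        raise ValueError(f"unknown command: {kind}")


# Part of the slide's scene graph, with the bike initially switched off.
root = Node("Root")
transform = root.add(Node("Transform"))
bike_switch = transform.add(Node("BikeSwitch", whichChoice=-1))
bike_switch.add(Node("Bike"))

for command in [
    ("replace_field", "BikeSwitch", "whichChoice", 0),
    ("insert_node", "Transform", Node("Body")),
]:
    apply_command(root, command)
```

Each command touches only the part of the scene it names, so a scene can evolve over time without being retransmitted as a whole.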
BIFS Delivery: BIFS Anim
[Figure: the same scene graph (Root, Transform, BikeSwitch, Bike, Body, Head, Left Arm, Right Arm, Left Leg, Right Leg) animated by a BIFS-Anim ES made of frames labelled I and P]
BIFS Scene Compression
• BIFS Scene
– factor 10-25 on scene text files
– Context dependency
– Hierarchical, linear quantization of scene data
– Differential multiple fields coding and mesh coding
integration (v2)
• BIFS-Anim
– factor 15-30 compression of animation
– Linear quantization
– Predictive coding (including rotation and normals)
– Adaptive arithmetic encoding
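Two of the BIFS-Anim steps, linear quantization and predictive coding, can be shown with a small Python sketch. It is a generic illustration rather than the BIFS-Anim bitstream: the function names, the value range and the bit depth are assumptions, the predictor is simply the previous sample, and the arithmetic coding stage is left out.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer level in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def dequantize(level, lo, hi, bits):
    levels = (1 << bits) - 1
    return lo + level / levels * (hi - lo)


def encode_track(values, lo, hi, bits):
    """Send the first level as is, then only differences between levels."""
    levels = [quantize(v, lo, hi, bits) for v in values]
    return [levels[0]] + [b - a for a, b in zip(levels, levels[1:])]


def decode_track(symbols, lo, hi, bits):
    """Accumulate the differences and map the levels back to values."""
    level = 0
    out = []
    for symbol in symbols:
        level += symbol
        out.append(dequantize(level, lo, hi, bits))
    return out


# A hypothetical animated field moving at constant speed.
track = [0.0, 0.2, 0.4, 0.6]
symbols = encode_track(track, lo=0.0, hi=1.0, bits=4)
restored = decode_track(symbols, lo=0.0, hi=1.0, bits=4)
```

Smooth motion turns into a run of small, repetitive symbols, which is the kind of input an adaptive arithmetic coder compresses well.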
BIFS Scene Features
• Audio video (objects) playback
• 2D Composition & Graphics
– 2D composition, basic shapes
• 3D Composition & Graphics
– Full VRML capabilities
• Advanced audio composition
• Interactivity and Behavior
– Local manipulation and animation of objects
• Scripting (JavaScript)
– Programming of behaviors
• Face Animation
MPEG-4 & BIFS based services
[Figure: a client connected through BIFS to three kinds of service: stored content, broadcast, and communication with a live source or user]
Conclusion
• BIFS provides a rich toolkit for composition of
MPEG-4 media in a very flexible and general way
• BIFS can be profiled to best fit the application
area
• Provides a good mix of
– Functionality
– Complexity
– Compression