MPEG-4, NETWORKED MULTIMEDIA STANDARD
MULTIMEDIA SYSTEMS
IREK DEFEE

MULTIMEDIA DATA REPRESENTATION
• MULTIMEDIA SYSTEMS CAN BE BUILT ONLY IF THERE IS A REPRESENTATION OF MULTIMEDIA DATA
• SUCH A REPRESENTATION SHOULD BE STANDARDIZED SO THAT EVERYBODY CAN USE IT
• THERE ARE MANY SUCH STANDARDS; WE WILL GIVE AN EXAMPLE OF THE MOST ADVANCED ONE

• MPEG-4 IS A STANDARD WHICH ENABLES EFFICIENT REPRESENTATION OF COMPLEX MULTIMEDIA DATA FOR FULLY INTERACTIVE APPLICATIONS
• THE MPEG-4 STANDARD IS OBJECT-BASED

• VIDEO OBJECT-BASED TREATMENT AND COMPRESSION MAKE THE MPEG-4 STANDARD VERY FLEXIBLE, MODERN AND EFFICIENT
• BASIC VIDEO AND AUDIO PARTS OF THIS STANDARD ARE USED IN DIGITAL TELEVISION AND BLU-RAY DISCS
• BUT MPEG-4 COVERS ALL DATA TYPES AND ALSO SPECIAL TOOLS FOR PARTICULAR APPLICATIONS; WE GIVE EXAMPLES OF THEM NEXT

Synthetic Visual Tools in MPEG-4

Motivation
• A new type of data is appearing in multimedia applications: synthetic data
• Synthetic and natural data can co-exist in today's applications
• This new data needs to be compressed and streamed in most applications
• New technologies are needed for:
– Compression and streaming of synthetic data

Some examples of applications
• 3D video
• Augmented reality
• Telepresence
• Scientific visualization
• Virtual reality
• …

Application example: Telepresence
The room is generated by computer

Application example: Scientific visualization

Application example: Virtual 3D world

Synthetic Visual Tools in MPEG-4
• Version 1
– Face animation
– 2D dynamic mesh
– Scalable coding of synthetic texture
– View dependent scalable coding of texture
• Version 2
– Body animation
– 3D model compression

Face Animation

Example of applications
• Virtual meeting, tele-presence, videoconferencing, ...
• Virtual story teller, virtual actor, user interface, ...
• Games, avatars, ...

Face animation
• Face: an object ready for rendering and animation
– A realistic representation of a "human" face
– Capable of animation by a reasonable set of parameters driven by speech, facial expressions, or others

Face animation
• Shape, texture and expressions:
– specified by parameters in the incoming bitstream
– remote as well as local control of these parameters

Initial face object
[Figure: head coordinate system with X, Y and Z axes and the pitch, yaw and roll rotations]
• Gaze along the Z axis
• All face muscles relaxed
• Eyelids tangential to the iris
• Pupil one-third of full eye
• Lips: in contact; horizontal
• Mouth: closed; upper and lower teeth touching
• Tongue: flat; tip touching front teeth

Face animation parameters
• Three sets of parameters are used to describe a face and its animation characteristics:
– Facial Definition Parameters (FDPs)
– Facial Animation Parameters (FAPs)
– Facial Interpolation Transform (FIT)

Facial Definition Parameters - FDPs
• Define a specific face via
– 3D feature points
– 3D mesh/scene graph
– Face Texture
– Face Animation Table (FAT)

Face Feature Points
[Figure: numbered feature points (groups 2.x to 11.x) on frontal and profile views of the head, with detail views of the right eye, left eye, nose, teeth, tongue and mouth; the legend distinguishes feature points affected by FAPs from other feature points]
• Normalize animation parameters
• Find feature correspondence in different faces
• Roughly define shape

3D mesh and feature points

Texture

FDPs
• Facial Definition Parameters
• Two modes:
– To customize the face model at the receiver to a particular face
– To download a face model along with its animation information
• Generally sent once per session
– for calibration and/or download
– could be sent more often for "special effects"

Facial Animation Parameters - FAPs
• Represent a complete set of facial actions => allow representation of most natural facial expressions
• All FAPs involving translational movement are expressed in terms of Facial Animation Parameter Units (FAPUs)
• FAPUs allow consistent interpretation of FAPs on any facial model
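The role of FAPUs can be illustrated with a minimal sketch. It assumes that each distance FAPU is a neutral-face distance divided by 1024 and that the angle unit is 10^-5 rad. The class name, function names and face measurements below are hypothetical and chosen only for illustration; they are not part of the standard's syntax.

```python
from dataclasses import dataclass


@dataclass
class NeutralFace:
    """Distances measured on one neutral face model (hypothetical model units)."""
    iris_diameter: float          # IRISD0
    eye_separation: float         # ES0
    eye_nose_separation: float    # ENS0
    mouth_nose_separation: float  # MNS0
    mouth_width: float            # MW0


def fapus(face: NeutralFace) -> dict:
    """Build the FAPU table: each distance unit is a neutral-face distance / 1024."""
    return {
        "IRISD": face.iris_diameter / 1024,
        "ES": face.eye_separation / 1024,
        "ENS": face.eye_nose_separation / 1024,
        "MNS": face.mouth_nose_separation / 1024,
        "MW": face.mouth_width / 1024,
        "AU": 1e-5,  # angle unit in radians, independent of the face
    }


def fap_to_displacement(fap_value: int, unit: str, face: NeutralFace) -> float:
    """Scale a FAP amplitude by the given FAPU of the target face."""
    return fap_value * fapus(face)[unit]


# Two hypothetical face models of different size receive the same FAP value.
small_face = NeutralFace(10.0, 60.0, 45.0, 30.0, 40.0)
large_face = NeutralFace(20.0, 120.0, 90.0, 60.0, 80.0)

small_move = fap_to_displacement(512, "MW", small_face)
large_move = fap_to_displacement(512, "MW", large_face)
```

The same FAP value of 512 in MW units moves a feature point by half the mouth width on both models, which is why one FAP stream can drive faces of different proportions.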

FAPUs
[Figure: neutral face annotated with the distances IRISD0, ES0, ENS0, MNS0 and MW0]
• IRISD0: iris diameter in the neutral face (by definition equal to the distance between upper and lower eyelid); IRISD = IRISD0 / 1024
• ES0: eye separation; ES = ES0 / 1024
• ENS0: eye - nose separation; ENS = ENS0 / 1024
• MNS0: mouth - nose separation; MNS = MNS0 / 1024
• MW0: mouth width; MW = MW0 / 1024
• AU: angle unit; 10^-5 rad

FAPs
• Facial Animation Parameters
• 2 high-level FAPs:
– Viseme (visual correlate of phoneme)
– Expression (joy, anger, fear, disgust, sadness, surprise)
   • textual description of expression parameters
   • point to groups of FAPs used together to achieve an expression
• 66 low-level FAPs
– associated with the displacement or rotation of the facial feature points

What is not standardized?
• The way to extract the parameters
– Markers
– Speech driven
– Image analysis and feature extraction
• The choice of which parameters to code and with which precision
– Quantization
– Rate control

FITs
• Facial Interpolation Rules
• Specification of interpolation rules for some/all FAPs by the sender
• Sender specifies FAP Interpolation Graph (FIG) and set of interpolation functions
• Allows higher degree of control over the animation results

What is standardized?
• FDPs
– BIFS syntax and semantics
– Rules for decoding and adaptation
• FAPs
– Bitstream syntax and semantics
– Rules for decoding and animation
• FITs
– Syntax and semantics
– Decoding rules

MPEG-4 AUDIO
THIS PART OF THE STANDARD IS FOR AUDIO SIGNALS
IT COVERS A VERY BROAD RANGE
• Speech Coding
• General Audio Coding
• Scalable Audio Coding

Media Objects and Associated Operations
• Objects
– Natural audio
– Synthetic audio
– Control
• Operations on objects
– Synchronize
– Decode
– Compose into compound objects
– Present
– Interact
[Figure: audio objects A1, A2 and A3 combined through a mixer (+) and an effects block (Fx)]

Advantages of Object Framework
• Each signal coded with most efficient coding system
– Natural
– Synthetic
• Composition of objects into audio scene
– Rate conversion
– Mix and Equalization
– Effects
• Final mix is done in the terminal

System Overview
[Figure: processing chain Demux, Decode, Synch, Compose, Present, Interaction; Obj 1 and Obj 2 pass through Dec 1 and Dec 2 and are combined via the Audio Scene Graph into a Compound Object for the Listener; further blocks labelled Channel Description, IPR Mgmt Interf. and IPR Control]
Objects are delivered separately, synchronized, decoded and composed

Audio Object Functionalities
• Signal compression
• Scalability
– bit rate
– signal bandwidth
– presentation rate
– encoder or decoder complexity
• Extraction and re-use
• Robustness to channel errors

Scalability depending on the bit rate
[Figure: bit-rate axis from 2 to 64 kbps with application ranges (Satellite, Cellular phone, Secure com., Internet, ISDN), coder ranges (Scalable Coder, TTS, Speech coding, General audio coding) and typical audio bandwidth (4 kHz, 8 kHz, 20 kHz)]

Application Domains Profiles
• Speech
– low rate speech coders and TTS (Text-to-Speech)
• Scalable
– speech coders
– general audio coders
– all coders in scalable configuration
• Synthetic
– wavetable synthesis
– score driven synthesis
– TTS

MPEG-4 Speech Coding: Overview
• Excellent compression by using source model
– Linear Predictive Coding (LPC)
– Pitch or noise excitation
• Better compression than "general audio" coders
– only for "clean speech" from single talker

Speech Coders
• Harmonic Vector Excitation Coder (HVXC)
• Code Excited Linear Prediction (CELP)
• Wideband CELP
[Figure: signal bandwidth (kHz) versus channel bitrate (kb/s) for HVXC, CELP and WB CELP]

Communication Characteristics of Coders
• Low bit rate
– HVXC: 1.2 kb/s to 1.7 kb/s var. rate; 2.0 kb/s to 4.0 kb/s const. rate
– CELP: 4.0 kb/s to 24 kb/s const. rate
• Low one-way delay
– HVXC: 33.5 ms to 56 ms
– CELP: 15 ms to 45 ms
• Not compromised for modem signals

Bit Rate Scalability
• Parameters coded using multi-stage Vector Quantization
– base plus enhancement layer
• Enhancement layers can be stripped in
– server
– channel
– decoder

Parameter Update Scalability
[Figure: frames divided into sub-frames]
• Linear Prediction Model
– updated every frame
– interpolated every sub-frame
• Excitation
– gain updated every sub-frame

MPEG-4 BIFS
MPEG-4 DATA INCLUDE VARIOUS MEDIA TYPES WHICH CAN BE USED AT THE SAME TIME
THIS REQUIRES A MECHANISM FOR THEIR ORGANIZATION IN TIME AND SPACE
THIS MECHANISM IN THE MPEG-4 STANDARD IS CALLED BIFS: BINARY FORMAT FOR SCENES

BIFS: WHY?
• MPEG-4 is an object-based system
– => A scene description is needed to compose the objects
• BIFS is the MPEG-4 scene description protocol
– to compose MPEG-4 objects
– to describe interaction with MPEG-4 objects
– to animate MPEG-4 objects

Example of an MPEG-4 Audiovisual Scene (1)
• 2D audio-visual scene: Audio and Video + Scrolling Text and Still Images
• 2D audio-visual scene: Audio and Video + Still Images

Example of an MPEG-4 Audiovisual Scene (2)
• 3D audio-visual scene: 3D World + arbitrary shaped video + still images + 3D Objects

BIFS Scene Features (v2)
• Body Animation
• Advanced Audio
– Perceptual approach to modify natural source
– Acoustic properties for physical based audio rendering
• Stream and server control
– VCR controls
– Application specific messaging
• Extensibility (Prototypes)
– Definition of new BIFS interfaces
• Hierarchical 3D objects
– Progressive loading and local degradation of 3D mesh
• Web interface
– Linking and embedding of a web page

MPEG-4 Systems Principle
• Scene Description Stream: a data stream describing the whole scene
• Object Descriptor Stream: a data stream describing which objects are there
• Visual Stream, Visual Stream, Visual Stream, Audio Stream, ...: separate data streams for each object

Interactive Scene Description
BIFS content in MPEG-4 system
[Figure: stages DELIVERY (MPEG-4 Streams: BIFS-Update ES, BIFS-Anim ES), DECODING, SCENE GRAPH MANAGEMENT and PRESENTATION (COMPOSITION, RENDERING); node labels: VRML Nodes, MPEG-4 Nodes, 2D Nodes, Audio Nodes, 3D Nodes, 2D+3D Nodes, Interaction, S&N, Sound, FBA]

MPEG-4: An Integrated Multimedia System
[Figure: Network, TransMux, FlexMux, Elementary Streams, Decoding, Primitive AV Objects, Composition and Rendering (organised by BIFS Scene Description Information and Object Descriptor), MPEG-4 Interactive Scene, Display and Local User Interaction]

BIFS Delivery: BIFS Command
[Figure: scene graph with nodes Root, Transform, BikeSwitch, Bike, Body Transform, Body, Head, Left Arm, Right Arm, Left Leg, Right Leg and a Switch with values -1 and 0; a BIFS-Command ES (… CV RS) acts on the graph]

BIFS Delivery: BIFS Anim
[Figure: scene graph with nodes Root, Transform, BikeSwitch, Bike, Body Transform, Body, Head, Left Arm, Right Arm, Left Leg, Right Leg; a BIFS-Anim ES (... P P P I) acts on the graph]

BIFS Scene Compression
• BIFS Scene
– factor 10-25 on scene text files
– Context dependency
– Hierarchical, linear quantization of scene data
– Differential multiple fields coding and mesh coding integration (v2)
• BIFS-Anim
– factor 15-30 compression of animation
– Linear quantization
– Predictive coding (including rotation and normals)
– Adaptive arithmetic encoding

BIFS Scene Features
• Audio video (objects) playback
• 2D Composition & Graphics
– 2D composition, basic shapes
• 3D Composition & Graphics
– Full VRML capabilities
• Advanced audio composition
• Interactivity and Behavior
– Local manipulation and animation of objects
• Scripting (JavaScript)
– Programming of behaviors
• Face Animation

MPEG-4 & BIFS based services
[Figure: Client linked through BIFS to Stored Content, Broadcast, Communication and Live Source / User]

Conclusion
• BIFS provides a rich toolkit for composition of MPEG-4 media in a very flexible and general way
• BIFS can be profiled to best fit the application area
• Provides a good mix of
– Functionality
– Complexity
– Compression
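
Two of the BIFS-Anim steps listed under BIFS Scene Compression, linear quantization and predictive coding, can be sketched as follows. This is only an illustration of the general technique, not the bitstream syntax of the standard. The function names, the value range, the bit depth and the intra period are assumptions, and the adaptive arithmetic coding stage is left out.

```python
def quantize(value, vmin, vmax, bits):
    """Linear quantization: map a value in [vmin, vmax] to an integer level."""
    levels = (1 << bits) - 1
    clamped = min(max(value, vmin), vmax)
    return round((clamped - vmin) / (vmax - vmin) * levels)


def dequantize(level, vmin, vmax, bits):
    """Inverse mapping from an integer level back to a value."""
    levels = (1 << bits) - 1
    return vmin + level / levels * (vmax - vmin)


def encode(samples, vmin, vmax, bits, intra_period):
    """Emit an I frame every intra_period samples and P frames in between.

    An I frame carries the quantized level itself; a P frame carries only
    the difference from the previous quantized level.
    """
    frames = []
    previous = 0
    for index, sample in enumerate(samples):
        level = quantize(sample, vmin, vmax, bits)
        if index % intra_period == 0:
            frames.append(("I", level))
        else:
            frames.append(("P", level - previous))
        previous = level
    return frames


def decode(frames, vmin, vmax, bits):
    """Rebuild the animated field values from the I and P frames."""
    values = []
    previous = 0
    for kind, payload in frames:
        level = payload if kind == "I" else previous + payload
        values.append(dequantize(level, vmin, vmax, bits))
        previous = level
    return values


# Hypothetical animated field (for example one coordinate of a Transform node).
samples = [0.0, 0.1, 0.2, 0.3, 0.4]
frames = encode(samples, vmin=0.0, vmax=1.0, bits=8, intra_period=4)
decoded = decode(frames, vmin=0.0, vmax=1.0, bits=8)
```

For a smoothly moving field the P-frame payloads stay small compared with the full quantized level, which is what makes a later entropy coding stage effective.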