Advanced Video Capabilities of HD DVD-Video
Kilroy Hughes
Digital Media Architect
Microsoft Corporation

Contents
• What is HD DVD-Video Format?
• Video Capabilities
• Video/Graphics Layout Model (2D)
• Video/Graphics Composition Model (3D)
• Presentation and Synchronization Model (4D)
• Programming
• Animation
• Resource Management
• Output
• Conclusion

What is HD DVD-Video?
• “HD DVD-Video Format” is an APPLICATION format (i.e. content format) defined by the DVD Forum for use on various storage media
• The HD DVD-Video Application format is currently specified for use on:
  – HD DVD-ROM discs (blue laser, 15 – 60GB)
  – DVD-ROM discs (red laser, 4.7 – 16.8GB)
  – R/W storage (flash memory, hard disk, etc.)
• The format can combine video, audio, text, and graphics from optical disc, internal player storage, local area network storage, and program streams from the Web into realtime interactive video presentations

Advanced Video Capabilities
• Simultaneous presentation of:
  – 1 stream of up to 1920x1080P30 HD video (MPEG-2, H.264, or SMPTE 421M)
  – 1 stream of up to 720x576P30 video (SD required, HD optional)
  – 3 streams of up to 8 channel audio; streams can be from different sources
  – 1 stream of text and graphics Subtitles, or bitmap Subpictures
  – 16 Applications with programmed text, images, drawing, and animation
  – A graphical cursor controlled by a pointing device (e.g. joystick, mouse, trackball, pressure pad, etc.)
• Z-order and alpha blend of graphics objects, and alpha blending of graphics and video planes
• Independent scaling, clipping, and positioning of all video and graphics objects
• Property animation (i.e. object size, position, transparency, color, etc. can be changed over time)
• Frame accurate composition and animation based on timecodes derived from video time (video frame/position) or Application time (elapsed time)
• Network support that enables updating presentations on optical disc with new content and programming that can be downloaded and streamed from the Web (e.g. new subtitles, new commentary, new movie trailers, new menus, new storyboard guide, new video games, etc.)

2D Video/Graphics Layout Model
[Figure: 2D layout of the Canvas coordinate space, the Full Screen Display Aperture, and an Application Region]
• Canvas coordinate space: origin (0, 0), extending to (+2^31 -1) on each axis
• Full Screen Display Aperture: author specified (e.g. 1920x1080, 1280x720)
• Application Region: has its own Application coordinates with origin (0, 0)
• All origins are at the upper left
• Text and video objects in the figure are labeled Visible or Invisible according to where they fall
Note: Only App text/graphics inside both the App Region and the Aperture are visible
Example of a video object with portions outside the visible Aperture; the figure labels the points (-200, 1000), (1920, 1080), and (2220, 1000). (To pan right, the video object position would be animated left, etc.)
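The visibility rule of the layout model can be sketched as rectangle intersection. This is an illustration only: it assumes axis-aligned rectangles in Canvas coordinates with exclusive right and bottom edges, and the names `Rect`, `intersect`, `visible_part`, `source_offset`, plus the sample Application Region and object sizes, are hypothetical and not taken from the HD DVD-Video specification.

```python
from typing import NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in Canvas coordinates (origin upper left)."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


def visible_part(obj: Rect, app_region: Rect, aperture: Rect) -> Optional[Rect]:
    """An object is visible only where it lies inside BOTH the
    Application Region and the Aperture."""
    clip = intersect(app_region, aperture)
    return None if clip is None else intersect(obj, clip)


def source_offset(obj: Rect, visible: Rect) -> Tuple[int, int]:
    """Offset, inside the object, of the first visible pixel."""
    return (visible.left - obj.left, visible.top - obj.top)


# Assumed values for illustration only.
APERTURE = Rect(0, 0, 1920, 1080)
APP_REGION = Rect(-500, -500, 3000, 2000)   # hypothetical region larger than the Aperture
wide_object = Rect(-200, 0, 2220, 1080)     # wider than the Aperture on both sides

visible = visible_part(wide_object, APP_REGION, APERTURE)

# Pan right by 100 pixels: the object itself moves LEFT by 100.
panned = Rect(wide_object.left - 100, wide_object.top,
              wide_object.right - 100, wide_object.bottom)
visible_after_pan = visible_part(panned, APP_REGION, APERTURE)

print(visible, source_offset(wide_object, visible))
print(visible_after_pan, source_offset(panned, visible_after_pan))
```

In this sketch the visible rectangle stays the size of the Aperture while the offset into the object grows, which is the sense in which animating the object position left produces a pan to the right.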
3D Multi-Plane Composition Model
[Figure: planes stacked along the Z-axis, listed here from the Point of View (front) to the back]
• Cursor: Object opacity style
• Interactive Graphics: Application and Object Z-order, Object opacity style
• Subtitles: Object opacity style
• Secondary Video: Alpha key & Rect
• Primary Video and background: Opaque

3D Multi-Application and Object Composition Model
[Figure: the Interactive Graphics Plane, holding z-ordered Applications (Z=0, Z=1, … Z=N); each Application Region holds z-ordered Objects (z=0.0, z=0.1, … z=0.n for the first Application, through z=N.0, z=N.1, … z=N.n for the last). Text and graphics objects are contained in an Application’s 3D Region.]
• Painter’s algorithm draws objects from back to front, from z=N.n to z=0.0, with “Source Over” mixing
• Application and Object Z-orders can be dynamically changed by programming

Video Keying and Blending
• The Primary Video Plane is opaque, and any area not filled with video will show a designated background color
• The Secondary Video Plane can be “luma” and chroma keyed, can have transparent objects called “clear rectangles”, and can set an Opacity style property (alpha value) for the entire video object
  – “Luma key” treats author designated sub-black pixels as transparent to the Primary Video below it (intended for professionally pre-produced blue screen or rotoscoped mattes)
  – “Chroma key” allows authors to designate a transparent color range, with the caveat that color quantization and block transforms used in video compression may result in rough edges and unintended areas of opacity or transparency (may be appropriate for “live video” overlay)
  – A video alpha channel for alpha per pixel is not supported
• “Clear Rectangles” are layout objects defined in Graphics Plane Applications that “cut a hole” through any graphics objects in the same area and reveal either the Primary or Secondary Video beneath as designated

Primary and Secondary Clear Rectangles
[Figure: Graphics Plane over Secondary Video over Primary Video; a Secondary Clear Rect reveals the Secondary Video and a Primary Clear Rect reveals the Primary Video]

[Image slides, by title:]
• Primary Video Plane
• Secondary Video Overlay (Not Keyed)
• Secondary Video with Chroma Key
• An Image in the Graphics Plane Overlaying Primary Video
• Example of a “Clear Rectangle” Punching Through Graphics to Video
• Secondary Video with Clear Rectangle to Secondary Video Plane
• Secondary Video with Clear Rectangle to Primary Video Plane

Presentation and Synchronization Model
• HD DVD-Video uses an XML presentation language referred to as “iHD” for frame accurate video and graphics presentation and animation
• A “Title Timeline” is specified for each presentation sequence (a Title); Video Clips, Audio Clips, Subtitles, Applications, and Resources are laid out in sequences on that timeline and called Tracks
• Multiple Titles can be combined in a Playlist, which contains all the valid content and playback sequences defined for a disc and its associated downloaded and streamed content
• iHD Applications use a timing language that can reference the timecode of a Title, which is synchronized to a frame of video or audio on each Track, so iHD Applications can create deterministic, frame accurate, interactive graphics and video presentations
• Simple interactive video applications without interactive graphics can be created with only a Playlist, video and audio Program Streams, and Time Map indexes for those Program Streams

Playlists
• Typically multiple Titles in a Playlist
• Each Title has its own timeline and Title:Timecode
• Video Clips sequence to form Video Tracks
• Audio Clips sequence to form Audio Tracks
• Subtitle Segments sequence to form Subtitle Tracks
• Application Segments sequence to form Application Tracks
• Application Resource Tracks span one Application
• Title Resource Tracks span multiple Applications
• Playlist Applications and Resources span multiple Titles
• Playlists also specify:
  – Configuration information such as Aperture size
  – Navigation mapping of Tracks for remote controls
  – Media attributes that identify codec, resolution, active area, source frame rate, number of audio channels, nominal bitrate, etc.

Playlist Title with 3 Video Clips
[Figure: a Title Timeline with chapter marks Ch1, Ch2, Ch3, and End; a Video Track of Video Clips 1–3 and an Audio Track of Audio Clips 1–3]
• Program Stream “Clips” can be segments of the same or different files. They are combined on the Title timeline and “spliced” on playback
• TMAP (File 1.MAP), TMAP (File 2.MAP), TMAP (File 3.MAP): three Time Map files provide timecode > byte offset indexes for three video files
• P-storage A/V (File 1.EVOB), Disc A/V (File 2.EVOB), Web A/V (File 3.EVOB): file/byte offsets are used to play Program Streams from files or HTTP: protocol

Playlist with Secondary Video
[Figure: a Title Timeline with chapter marks Ch1, Ch2, Ch3, and End, carrying three groups of Tracks]
• Main Video: Video Clips 1–3 and Audio Clips 1–3
• Sub Video: Video Clips 1–3 and Audio Clips 1–3
• Application: Menu (App 1), Tablet PC (App 2), Tracking (App 3), each with its own Resources (App 1 Resources, App 2 Resources, App 3 Resources)
4D layout of content that can be shown in Primary Video Plane, Secondary Video Plane, and Graphics Plane with additional control by iHD Application programs

Playlist Resource Management
[Figure: the same Title Timeline and Main Video, Sub Video, and Application Tracks as above, with an App Resources Track at the bottom]
The Resource Track on the bottom schedules loading and unloading of all required Application files into a 64MB File Cache so they are instantly accessible to the user during any portion of the Title when that App is “valid”

iHD Programming
• Optimized mix of Declarative and Procedural languages
• Declarative Markup language handles most presentation needs with simple tags and reliable, realtime performance using native code and hardware
• Compact ECMAScript Procedural language provides full programmability, through content and player APIs, author handled events and state machine

iHD XML and ECMAScript Language
[Figure: block diagram of the iHD language and object models]
• Language components: Markup, Style, Timing, Script
• Advanced Content Files (Playlist, Manifest, Markup, Script, Resources)
• Content Object Model: Image, text, etc. Objects; Video, Audio, etc. Objects
• System Object Model: Playlist, App, etc. Objects; Network, Player, etc. Objects

Animation
• Property animation
  – Any object (graphics, text, drawing, video) can change its properties over time in response to simple markup statements
  – Properties include position, size, opacity, color, z-order, etc.
• Bitmap animation
  – Bitmap animations are a sequence of images that capture a pre-rendered animation.
  – Playback can use a timed sequence of PNG or JPG image files (good for frame accuracy and for trick modes such as reverse play), or a single MNG file.
• Cell animation
  – Cell animation combines bitmap or property animated objects with separate backgrounds. Performance is improved because the entire frame does not have to be stored and redrawn each frame, and it is more flexible because animated foreground objects can be added, removed, and controlled by programming and user input.
• Animation can be synchronized to the Title Clock, Application Clock, or Page Clock
  – If an animation is synchronized to the Title Clock, it will pause when video pauses, jump to a timecoded animation frame or state when the video jumps to that timecode, play slow when the video plays slow, etc. One thing this enables is “video tracking hotspots”, which are graphics or interactive regions superimposed over “objects” in the video, such as adding a halo to a person who is walking around, appearing and disappearing from the video.
  – If an animation is synchronized to the Application Clock, it will continue to run or loop regardless of video playback
  – If an animation is synchronized to an Application “page”, it can be run each time the page is loaded; for instance to do a menu build, or “fly in” a video image

Audio/Video Output Synchronization
• Most “DVD” video is 24 frame per second progressive source, such as movies and episodic television
• HD DVD-Video perpetuates the practice of encoding 24P source as 30i by adding repeat field flags to generate 60Hz timing and (optionally) 3:2 pulldown
• The HD DVD-V system is capable of ignoring the repeat flags and outputting pure 24 frame per second video, text, and graphics over HDMI … but
• The current consumer electronics industry direction is to apply 3:2 pulldown and convert to 60 fields per second somewhere in the display pipeline in order to generate a raster signal for analog connections to CRT displays
• It is very important that new HD displays and their HDMI inputs support 1080P24 input mode. Scaling and refresh should be handled in the display with methods appropriate for its particular display technology (which will rarely be CRT), and not add an extra step of inverse telecine detection, deinterlacing, scaling, and filtering

The 50Hz/60Hz “Problem”
• The legacy solution of +4% speed shift from 24Hz to 25Hz no longer works with compressed digital audio outputs (and was never really satisfactory)
• HD DVD-V format requires that video be encoded at either 50Hz or 60Hz, so most content will be 24P encoded with 60Hz timing
• Europe’s “HD Ready” logo indicates a display will handle both 50Hz and 60Hz HDMI input, but what about 24Hz?
• Unless Europe (and other 50Hz regions) require 24Hz on HDMI displays, the options are:
  – Wait for a format converted 50Hz version of each disc
  – Watch the 60Hz version at 30i with 3:2 pulldown
  – Speed shift 24P to 25P and watch at 50Hz with pitch shifted uncompressed audio over HDMI

The Interlace “Problem”
• Most new DVD players and displays today support 480P over analog component interfaces at various refresh rates (e.g. 72Hz refresh)
• But the encoded video has reduced vertical resolution intended to reduce flicker on interlaced CRT displays (done by CCD sensors that mix adjacent “scan” lines, optical filters, FIR filters on resampling, etc.)
• Deinterlace chips can’t restore the vertical resolution that was thrown away (a separate issue from the number of scan lines)
• The industry needs to change this production and display model for HD DVD-V and BD!!!
  – Encode 1080P24 video at full vertical resolution to enable full resolution progressive display
  – Players must apply anti-alias and interlace filtering if they subsample and sequentially output 540 line fields for 1080i30 signal output (also applies to generated text and graphics)

Take Aways on HD DVD-Video
• XML Playlists accomplish “on the fly” editing and mixing in the player, like EDLs or AAF on video editing workstations
• Players include an HD video and graphics “blender” that alpha blends multiple planes of video, graphics, and text in realtime with frame and pixel accuracy
• Resources from various storage and network sources are marshaled and managed for realtime presentations that can be interactively navigated by users
• Advanced audio and video codecs provide state of the art quality and efficiency, including 1080P video and mathematically lossless 8 channel audio
• Programmable and network updatable user experiences create new entertainment possibilities that combine the flexibility of the Web with the high quality and reliable consumer experience of DVD-Video

Thank You
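The frame rate arithmetic behind the output slides (3:2 pulldown of 24P to 60 fields per second, and the 24P to 25P speed shift) can be checked with a short sketch. It models only the repeat counts of the cadence and the speed ratio; the function names are hypothetical, the 120 minute running time is an assumed example, and real players signal pulldown with repeat field flags rather than by duplicating data as done here.

```python
import math
from fractions import Fraction


def pulldown_32(frames):
    """Expand progressive frames into fields with a 3:2 cadence.

    Frames alternately contribute 3 fields and 2 fields, so every
    2 source frames become 5 fields (24 frames -> 60 fields).
    Field parity simply alternates across the output sequence.
    """
    fields = []
    for index, frame in enumerate(frames):
        repeat = 3 if index % 2 == 0 else 2
        for _ in range(repeat):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


def speed_shift_ratio(source_fps=24, target_fps=25):
    """Playback speed ratio when 24P is simply played at 25P."""
    return Fraction(target_fps, source_fps)


def pitch_shift_semitones(source_fps=24, target_fps=25):
    """Audio pitch rise, in semitones, if the shift is left uncorrected."""
    return 12 * math.log2(target_fps / source_fps)


def shifted_runtime_minutes(runtime_minutes, source_fps=24, target_fps=25):
    """Running time after the speed shift."""
    return Fraction(runtime_minutes) / speed_shift_ratio(source_fps, target_fps)


one_second = pulldown_32(list(range(24)))
print(len(one_second))                      # number of fields from 24 frames
print(float(speed_shift_ratio() - 1))       # fractional speed increase
print(pitch_shift_semitones())              # pitch rise in semitones
print(float(shifted_runtime_minutes(120)))  # assumed 120 minute feature
```

The speed ratio of 25/24 is an increase of about 4.2%, the figure the slide rounds to +4%.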