Chapter 10 Video
Multimedia Systems

Key Points
  The display of moving pictures depends on persistence of vision.
  Uncompressed video requires 26 MBytes per sec (NTSC) or 31 MBytes per sec (PAL).
  Digitization may be performed in the camera (e.g. DV) or using a capture card attached to a computer.
  NTSC, PAL and SECAM are analogue video standards. All three use interlaced fields.
  CCIR 601 is a standard for digital video. It uses Y'CBCR colour with 4:2:2 chrominance sub-sampling. The data rate is 166 Mbits per sec.
  Video compression can make use of spatial (intra-frame) and temporal (inter-frame) compression.
  Spatial compression is still-image compression applied to individual frames.
  Temporal compression is based on frame differences and key frames.
  Motion JPEG applies JPEG compression to each frame. It is usually performed in hardware.
  Cinepak, Intel Indeo and Sorenson are popular software codecs used in multimedia. They are based on vector quantization.
  MPEG video is an elaborate codec that combines DCT-based compression of key frames (I-pictures) with forward and backward prediction of intermediate frames (P-pictures and B-pictures) using motion compensation.
  QuickTime is a component-based multimedia architecture providing cross-platform support for video, and incorporating many codecs. It has its own file format that is widely used for distributing video in multimedia.
  Digital video editing is non-linear (like film editing).
  Most digital post-production tasks are applications of image manipulation operations to the individual frames of a clip.
  For delivery using current technology, it may be necessary to sacrifice frame size, frame rate, colour depth, and image quality.
  Streamed video is played as soon as it arrives without being stored on disk, so it allows for live transmission and 'video on demand'.
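The data rates quoted in the key points can be checked with a little arithmetic. The following is a minimal sketch rather than a definitive calculation. It assumes 24-bit colour, a 640 × 480 frame at 30 frames per second for NTSC, a 768 × 576 square-pixel frame at 25 frames per second for PAL (this PAL frame size is an assumption, not stated in the key points), 8 bits per sample for CCIR 601, and 1 MByte = 2^20 bytes. All function and variable names are illustrative.

```python
MBYTE = 1024 * 1024  # assumption: 1 MByte = 2**20 bytes


def uncompressed_bytes_per_sec(width, height, fps, bytes_per_pixel=3):
    """Data rate of uncompressed video with 24-bit (3-byte) pixels."""
    return width * height * bytes_per_pixel * fps


def ccir601_bits_per_sec(samples_per_line, active_lines, fps, bits_per_sample=8):
    """Data rate of 4:2:2 sub-sampled video.

    With 4:2:2 sub-sampling every pair of pixels carries two Y samples,
    one CB sample and one CR sample: two samples per pixel on average.
    """
    samples_per_pixel = 2
    return samples_per_line * active_lines * samples_per_pixel * bits_per_sample * fps


ntsc_rate = uncompressed_bytes_per_sec(640, 480, 30)
pal_rate = uncompressed_bytes_per_sec(768, 576, 25)    # assumed PAL frame size
ccir601_rate = ccir601_bits_per_sec(720, 480, 29.97)   # 525-line system, 480 active lines

print(f"NTSC:     {ntsc_rate / MBYTE:.1f} MBytes per sec")
print(f"PAL:      {pal_rate / MBYTE:.1f} MBytes per sec")
print(f"CCIR 601: {ccir601_rate / 1e6:.1f} Mbits per sec")
```

Under these assumptions the results come out at roughly 26 MBytes per sec, 31 to 32 MBytes per sec and 166 Mbits per sec, in line with the figures in the key points.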

Moving Pictures
  All current moving pictures depend on the following phenomena
    – Persistence of vision
        A lag in the eye's response results in 'after-images'
    – Fusion frequency
        If a sequence of still images is presented above this frequency, we experience a continuous visual sensation
        Depends on the brightness of the image relative to the viewing environment
        Below this frequency, a flickering effect is perceived

Generate Moving Pictures
  Video
    – Use a video camera to capture a sequence of frames
  Animation
    – Generate each frame individually, either by computer or by other means

Digital Video
  A video sequence consists of a number of frames
  Each frame is a single image produced by digitizing the time-varying signal generated by a video camera

Digital Video
  Think about the size of uncompressed digital video (NTSC video format)
    – Bitmapped image for each video frame: 640 × 480 pixels with 24-bit color = 0.9 MB/frame
    – 30 frames per second: 900 KB/frame × 30 frames/sec = 26 MB/sec
    – 60 seconds per minute: 26 MB/sec × 60 secs/minute = about 1,600 MB/minute
  This strains current processing, storage and data transmission!

Create Digital Video
  Get an analog/digital video signal from
    – video camera
    – video tape recorder (VTR)
    – broadcast signal
  Digitize the analog video & compress it

Digitizing Analog Video
  In computer
    – Video capture card
        Converts analog to digital & compresses
        Can also decompress & convert digital to analog
    – Compress through
        Video capture card (hardware codec)
        Software (software codec)

Digitizing Analog Video
  In camera
    – Digitize and compress using circuitry inside the camera
    – Transfer the digitized signal from camera to computer through
        IEEE 1394 interface (FireWire): 400 Mb/sec
        USB: 12 Mb/sec (version 1.1) ~ 480 Mb/sec (version 2.0)

Digitize in Computer vs. Camera
  Digitize in camera
    – Advantage
        Digital signals are resistant to corruption when transmitted down cables and stored on tape
    – Disadvantage
        The user has no control over the trade-off between picture quality and data rate (file size)

Analog Video Standard
  Standard  Frame/Field rate (per sec)  Scan lines  Aspect ratio  Horizontal/Vertical frequency  Worldwide
  NTSC      29.97 / 59.94               525 (480)   4:3           15.734 kHz / 60 Hz             US, Japan, Taiwan, …
  PAL       25 / 50                     625 (576)   4:3           15.625 kHz / 50 Hz             Western Europe, …
  SECAM     25 / 50                     625 (576)   4:3           15.625 kHz / 50 Hz             France, …

Display Video on TV
  [Figures: cross-section of a CRT; delta-delta shadow-mask CRT (scanned from "Computer Graphics: Principles and Practice")]

Field and Interlace
  Transmitting many entire pictures in a second requires a lot of bandwidth
  Field
    – Divide each frame into two fields
        One consists of the odd-numbered lines of each frame, the other of the even lines
  Interlace
    – Each frame is built up by interlacing the fields
  PAL
    – 50 fields/sec => 25 frames/sec
  NTSC
    – 59.94 fields/sec => 29.97 frames/sec

Display Video on Computer
  Progressive scanning
    – Write all lines of each frame to the frame buffer
    – Refresh the whole screen from the frame buffer at a high rate

Display Video
  [Figure: frames i, i+1, i+2, i+3, … as displayed on a TV and on a computer monitor]

Field & Interlace Artifacts
  A video clip of flash light on the water surface
    – Odd lines of frame i
    – Even lines of frame i+1
    – Combine the previous two fields of analog video for progressive display

Prevent Interlace Artifacts
  Average the two fields to construct a single frame
  Discard half of the fields and interpolate the remaining fields to construct full frames
  Convert each field into a single frame (reduces the frame rate, but much better!)
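
The first two ways of preventing interlace artifacts can be sketched in a few lines. This is only an illustrative sketch, not how any particular editing program does it: a frame is assumed to be a list of rows of grey-level values with an even number of rows, and the function names are hypothetical. Lines are numbered from 1 as on the slides, so the odd field holds rows 1, 3, 5, … (list indices 0, 2, 4, …).

```python
def split_fields(frame):
    """Return (odd_field, even_field) for a frame given as a list of rows."""
    return frame[0::2], frame[1::2]


def average_fields(frame):
    """Average the two fields: each odd/even pair of lines is replaced
    by their mean, written to both line positions."""
    odd, even = split_fields(frame)
    out = []
    for odd_row, even_row in zip(odd, even):
        mean_row = [(a + b) / 2 for a, b in zip(odd_row, even_row)]
        out.append(mean_row)
        out.append(list(mean_row))
    return out


def interpolate_field(field):
    """Discard the other field and rebuild a full frame from one field,
    filling each missing line with the mean of the lines above and below.
    The last missing line has no line below, so it repeats the line above."""
    out = []
    for k, row in enumerate(field):
        out.append(list(row))
        if k + 1 < len(field):
            below = field[k + 1]
            out.append([(a + b) / 2 for a, b in zip(row, below)])
        else:
            out.append(list(row))
    return out


# A tiny 4-line, 2-pixel-wide frame used only for illustration.
frame = [[0, 0], [10, 10], [20, 20], [30, 30]]
odd_field, even_field = split_fields(frame)
blended = average_fields(frame)
rebuilt = interpolate_field(odd_field)
print(blended)
print(rebuilt)
```

Averaging softens moving edges because it mixes two moments in time, while interpolating a single field keeps one moment in time at the cost of half the vertical detail.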

Types of Analog Video
  Component video
    – Three components: Y (luminance), U and V (color)
    – Often used in production and post-production
  Composite video
    – Combines the three components into one signal
    – Each color component (U and V) is allocated half the bandwidth of the luminance (Y)
    – Often used in transmission
  S-video
    – Separates the luminance from the two color components (two signals in total)

Digital Video Standards
  CCIR 601 (Rec. ITU-R BT.601)
    – Specifies the image format and coding for digital television signals

  Parameter                                Value
  YUV encoding                             4:2:2
  Sampling frequency for Y (MHz)           13.5
  Sampling frequency for U and V (MHz)     6.75
  No. of samples per line                  720
  No. of levels for Y component            220
  No. of levels for U, V components        225

Perplexing NTSC System
  Analog to digital: 640 × 480, pixels are square
  CCIR 601 standard: 720 × 480, pixels are not square

CCIR 601 Sampling
  [Figure: positions of Y samples and CB and CR samples for 4:2:2 sampling (co-site), 4:1:1 sampling (co-site) and 4:2:0 sampling (not co-site)]

Compression & Data Stream Standards
  Sampling produces a digital representation of a video signal
  This must be compressed and then formed into a data stream for transmission
  Further standards are needed to specify the compression algorithm and the format of the data stream

Compression & Data Stream Standards
  DV standard
    – For semi-professional use & news-gathering
  MPEG-2 standard
    – For family use
    – Organized into different profiles and levels
        The most common combination is Main Profile at Main Level (MP@ML)
        Used for digital television broadcasts & DVD video

Introduction to Video Compression
  To suit consumers' hardware, video data needs to be compressed twice
    – First during capture
    – Then again when it is prepared for distribution

Video Compression
  Digital video compression algorithms operate on a sequence of bit-mapped images
    – Spatial compression (intra-frame)
        Compress each individual image in isolation
    – Temporal compression (inter-frame)
        Store the differences between sub-sequences of frames

Spatial Compression
  The compression method is similar to image compression
    – Lossless
        No information loss
        Compression ratio is lower
    – Lossy
        Some information loss
        Compression ratio is higher
  Why recompressing video is unavoidable
    – The compressors used for capture are not suitable for multimedia delivery
    – For post-production

Temporal Compression
  Key frames
    – Certain frames in a sequence are designated as key frames
  Difference frames
    – Each of the frames between the key frames is replaced by a difference frame
    – Records only the differences between the frames

Time Required for Compression & Decompression
  Symmetrical
    – Compression & decompression of a piece of video take the same time
  Asymmetrical
    – Compression & decompression of a piece of video do not take the same time
    – Generally, compression takes longer

Motion JPEG (MJPEG)
  A popular approach to compressing video during capture
  Applies JPEG compression to each frame (no temporal compression)
  Therefore it is called "Motion JPEG"

DV
  Compression based on the DCT
  Performs temporal compression (motion compensation) between the two fields of each frame
  Quality is varied dynamically to maintain a constant data rate

MJPEG vs. DV
            Data rate   Compression method                          Compression ratio
  MJPEG     24 Mb/sec   JPEG compression                            7:1 (mid-range capture card)
  DV        25 Mb/sec   DCT-based compression, motion compensation  5:1 (4:1:1 sampling)

Software Codecs for Multimedia
  Popular software codecs
    – MPEG-1
    – Cinepak
    – Intel Indeo
    – Sorenson

Vector Quantization
  Encoder: source → group into vectors → find closest code-vector in the code book → index
  Decoder: index → table lookup in the code book → unblock → reconstruction (output)

MPEG
  Stands for Moving Picture Experts Group (a joint committee of the ISO and the IEC)
  Works on standards for the coding of moving pictures and associated audio

MPEG Family
  MPEG-1
    – Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mb/s
  MPEG-2
    – Generic coding of moving pictures and associated audio
    – For broadcasting & studio work
  MPEG-3
    – No longer exists (has been merged into MPEG-2)
  MPEG-4
    – Very low bit rate audio-visual (integrated multimedia) coding

MPEG Family
  MPEG-7
    – Multimedia content description interface
  MPEG-21
    – Vision statement
        To enable transparent & augmented use of multimedia resources across a wide range of networks and devices
    – Objectives
        To understand how the elements fit together
        To identify new standards which are required if gaps in the infrastructure exist
        To accomplish the integration of different standards

MPEG-1 Standard
  Defines a data stream syntax and a decompressor, allowing manufacturers to develop different compressors
  MPEG-1 compression
    – Temporal compression based on motion compensation
    – Spatial compression based on quantization & coding of the frequency coefficients produced by a DCT of the data

MPEG-1 Objective
  Medium quality video (VHS-like)
  Bit rate < 1.5 Mb/s
    – 1.15 Mb/s for video
    – 350 kb/s for audio & additional data
  Asymmetrical application
    – Store video & audio on CD-ROM
  Picture format: SIF (Source Input Format)
    – 4:2:0 sub-sampled
    – Frame size @ frame rate
        352 × 288 @ 25 Hz
        352 × 240 @ 30 Hz

Motion Compensation
  [Figure: an object moving between frames, and the area of potential change]
  Divide each frame into macroblocks of 16 × 16 pixels
  Predict where the corresponding macroblock is in the next frame
    – Try all possible displacements within a limited range
    – Choose the best match
  Construct the difference frame by subtracting each macroblock from its predicted counterpart
    – Keep the motion vectors describing the predicted displacement of macroblocks between frames

Picture Type
  I (intra) pictures
    – Coded without reference to other pictures
    – Low compression ratio
  P (predicted) pictures
    – Coded using motion-compensated prediction from a past I or P picture
    – Higher compression ratio than I pictures
  B (bidirectionally predicted) pictures
    – Coded by bidirectional interpolation between the I or P pictures that precede & follow them
    – Highest compression ratio
  All are compressed using the MPEG version of JPEG compression

Group of Pictures (GOP)
  An MPEG sequence in display order
    I01 B02 B03 P04 B05 B06 I11 B12 B13 P14 B15 B16 I21
  An MPEG sequence in bitstream order (decode order)
    I01 P04 B02 B03 I11 B05 B06 P14 B12 B13 I21 B15 B16

MPEG-1 Video Compression Techniques
  Motion compensation
  Frequency transform
  Variable-length coding
  Color signal subsampling
  Quantization
  Predictive coding
  Picture interpolation

QuickTime
  Apple, 1991
  Time base, non-linear editing
  Component-based architecture
    – Compressor components
        Cinepak, Intel Indeo codecs
    – Sequence grabber components
    – Movie control component
    – Transcoder
        Translates data between different formats
    – Video digitizer component
  Supports MPEG-1, DV, OMF, AVI, OpenDML

Digital Video Editing & Post-production
  Editing
  Compositing
  Reverse shot
    – Conversation between two people

Film & Video Editing
  Traditional
  In point and out point
  Timecode
    – SMPTE timecode
    – Hours, minutes, seconds, frames
  VHS
    – Two copying operations are enough to produce a serious loss of quality
    – Constructed linearly

Digital Video Editing
  Random access
  Non-destructive
  Premiere
    – Three main windows
        Project, timeline, monitor (Figs. 10.12-14)
    – Timelines
        Have several video tracks
    – Transitions (Fig. 10.15)
    – Cuts and transitions
        In a cut, two clips are butted together
        In a transition, two clips overlap
    – Image processing is required to construct transitional frames

Digital Video Post-production
  Over- or under-exposed, out of focus, color cast, digital artifacts
    – Provide image manipulation programs
        Adjust levels, sharpen, blur
    – The same correction may be needed for every frame, so the levels can be set for the first frame and the adjustment will be applied to as many frames as the user specifies.
    – If the light fades during a sequence, it will be necessary to increase the brightness gradually to compensate.
        Apply a suitable correction at chosen frames and allow the parameter values at intermediate frames to be interpolated
        Varying parameter values over time

Keying
  Selecting transparent areas
  Blue screening
    – Chroma keying: any color
    – Alpha channel
    – Luma keying: a brightness threshold is used to determine which areas are transparent
  Select explicitly
    – Create a mask
    – In film and video, a mask is called a matte
    – Matte out: removing unwanted elements
    – Split-screen effects
  Alpha channel created in another application

Track matte
  Chroma keying and luma keying
    – Color and brightness change between frames
    – Use a sequence of masks as the matte
        Separate video track: track matte
    – Track matte
        Created painstakingly by hand
        Generated from a single still image by applying simple geometrical transformations over time to create a varying sequence of mattes

Adobe After Effects
  Apply a filter to a clip and vary it over time
  A wide range of controls for the filter's parameters
  Premiere: parameter values are interpolated linearly between key frames
  After Effects: interpolation can use Bezier curves

Preparing Video for Multimedia Delivery
  Frame size, frame rate, color depth, image quality
  People sit close to monitors, so a large picture is not necessary
  Higher frame rates are needed to eliminate flicker only if the display is refreshed at the same rate.
    – Computer monitors are refreshed at a much higher rate from VRAM.
  Limiting colors
    – Not all codecs support it

Streamed Video & Video Conference
  Streamed video
    – Delivering a video data stream from a remote server, to be displayed as it arrives
    – As opposed to downloading an entire file to disk & playing it from there
    – Opens up the possibility of delivering live video on computers

Streamed Video & Video Conference
  Video conference
    – Streamed video isn't restricted to a single transmitter broadcasting to many consumers: any suitably equipped computer can act as both receiver & transmitter
    – Users on several machines can communicate visually, taking part in what is usually called a video conference
  [Figure: a single transmitter with multiple receivers, compared with all computers acting as receivers & transmitters]

Obstacles to Streamed Video
  Bandwidth
    – SIF MPEG-1 video requires a bandwidth of 1.86 Mb/sec
    – Decent quality streamed video is restricted to LANs, T1 lines, ADSL & cable modems for now
  Delivery time over the network
    – Deliver data with the minimum of delay
        Delay may cause independently delivered video & audio streams to lose synchronization

Conventional Delivery of WWW Video
  Embedded video
    – Transfer movie files from a server to the user's machine
    – Play back from disk once the entire file has arrived
  Progressive download (HTTP streaming)
    – Transfer movie files to the user's disk
    – Start playing as soon as enough of the file has arrived
    – The file usually remains on the user's disk after playback is completed
  Cannot be used for live video!
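
How much is "enough" for a progressive download to start playing can be estimated with a simple model. The sketch below is an illustration under stated assumptions, not a description of how any real player decides: it assumes a constant connection bandwidth and a constant movie data rate, and the function name and the 1.5 Mb/sec connection speed are hypothetical. If the player waits w seconds and the movie lasts D seconds, the download must stay ahead of playback up to the last frame, which gives w ≥ D × (movie rate / bandwidth − 1) whenever the movie's data rate exceeds the bandwidth.

```python
def startup_delay(duration_s, movie_rate_bps, bandwidth_bps):
    """Shortest wait (in seconds) before playback can start without the
    player ever running out of downloaded data.

    Assumes constant bandwidth and a constant movie data rate.
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    if bandwidth_bps >= movie_rate_bps:
        # The connection keeps up with playback, so no wait is needed
        # in this idealized model.
        return 0.0
    return duration_s * (movie_rate_bps / bandwidth_bps - 1)


# A 60-second SIF MPEG-1 movie at 1.86 Mb/sec over an assumed 1.5 Mb/sec link.
delay = startup_delay(60, 1.86e6, 1.5e6)
print(f"wait about {delay:.1f} s before starting playback")
```

For this example the wait is about 14.4 seconds. The same formula shows why the approach cannot serve live video: the whole file must already exist for the download to run ahead of playback.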
  [Figure: percentage of the movie (0 to 100) downloaded and played, plotted against time, with the moment playback starts marked]

True Streaming
  Each streamed frame is played as soon as it arrives over the network
  The video file is never stored on the user's disk
    – The length of a streamed movie is limited only by the storage size at the server, not by the user's machine
  Suited to live video & video on demand (VOD)
    – The network must be able to deliver the data stream fast enough for playback
    – The movie's data rate & quality are restricted to what the network can deliver

State of the Art
  Leading techniques over the Internet
    – RealVideo (Real Networks)
    – Streaming QuickTime (Apple)
    – Media Player (Microsoft)
  Architecture
    – RTSP (Real Time Streaming Protocol)
        Controls the playback of video streams
    – Providing several compressed versions of a movie
        The server chooses the appropriate one to fit the speed of the user's connection

Codecs for Video Conferencing
  H.261
    – Designed for two-way telecommunication applications over ISDN
    – A precursor of MPEG-1
        DCT-based compression with motion compensation
        It does not use B-pictures
  H.263
    – Very low bit rate video
  H.263+
    – An extension of H.263

H.261
  Real-time constraint
    – Video conferencing cannot tolerate long delays without becoming disjointed
    – Maximum delay: 150 ms (about 7 frames/sec)
  Bit rate: p × 64 kbps (p = 1 to 30)
  Picture format
    – CIF (Common Intermediate Format)
        Component (size): Y (352 × 288), Cb & Cr (176 × 144)
        Picture rate: 29.97 frames/sec
    – QCIF (Quarter CIF)
        Component (size): Y (176 × 144), Cb & Cr (88 × 72)

H.263
  Very low bit rate video (< 64 kbps)
  Primary target rate is about 27 kbps (V.34 modem)
  Compression techniques
    – Chroma sub-sampling 4:2:0
    – DCT compression with quantization
    – Run-length and variable-length encoding of coefficients
    – Motion compensation with forward & backward prediction
    – Compresses QCIF pictures at rates as low as 3.5 frames/sec
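
To get a feel for how hard these codecs have to work, the H.261 figures can be combined into a rough compression ratio. This is a back-of-the-envelope sketch, not part of the standard: it assumes 8 bits per sample and the full 29.97 pictures per second for both CIF and QCIF (the rate is only stated for CIF), and the names used are illustrative.

```python
CIF = ((352, 288), (176, 144))    # (Y size, Cb/Cr size)
QCIF = ((176, 144), (88, 72))


def h261_channel_bps(p):
    """Channel bit rate of p x 64 kbps, with p between 1 and 30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * 64_000


def raw_bps(picture_format, fps=29.97, bits_per_sample=8):
    """Uncompressed bit rate: one Y plane plus the Cb and Cr planes."""
    (y_w, y_h), (c_w, c_h) = picture_format
    samples_per_picture = y_w * y_h + 2 * c_w * c_h
    return samples_per_picture * bits_per_sample * fps


def required_ratio(picture_format, p):
    """Compression ratio needed to fit the raw video into the channel."""
    return raw_bps(picture_format) / h261_channel_bps(p)


print(f"CIF at p = 30: about {required_ratio(CIF, 30):.0f}:1")
print(f"QCIF at p = 1: about {required_ratio(QCIF, 1):.0f}:1")
```

Under these assumptions CIF needs roughly 19:1 compression even at the highest channel rate, and QCIF on a single 64 kbps channel needs well over 100:1, which is why low bit rate conferencing also lowers the picture rate.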