Technology Seminar
MPEG Video Technology
Kit Barritt, Principal Lecturer, Sony Training Centre
this is not a rehearsal
V2.0 Nov 2000

Content
• Standards
• Frame Structure
• Frame Types
• Frame Sequences
• Difference Frames
• Motion Prediction

Content
• MPEG Encoding
  – Inter-frame Process
    • I/P/B Frames
    • Motion Prediction
  – Intra-frame Process
    • DCT
    • Zig Zag Scan & Run length encoding
    • Entropy coding
    • Quantisation

MPEG
• Edit techniques
• Multi-generation considerations
• Compression System Comparisons
  – MPEG-2 422P@ML
  – DVC
• MPEG System Layer
  – MPEG Transmission
    • Programme Stream
    • Transport Stream
    • PID, PAT & PMT
  – Time Clocks

Content
• MPEG-4
  – history
  – Audio coding
  – Video coding

The MPEG Standards Set
[Diagram labels: MPEG, MPEG-1, MPEG-2, MPEG-3, MPEG-4, MPEG-7, MPEG-21]

Levels and Profiles
• MPEG-2 Profile @ Level
• Profiles
  – Simple, Main, SNR, Spatial, High, 422 Profile
• Levels
  – High, High-1440, Main, Low

Levels and Profiles
Each level cell reads samples/line / lines/frame / frames/sec / max bit-rate (Mbps); "-" marks a combination not listed.

Profile  Frame types  Chroma sampling  High level        High-1440 level   Main level      Low level
Simple   I & P        4:2:0            -                 -                 720/576/30/15   -
Main     I, P & B     4:2:0            1920/1152/60/80   1440/1152/60/60   720/576/30/15   352/288/30/4
SNR      I, P & B     4:2:0            -                 -                 720/576/30/15   352/288/30/4
Spatial  I, P & B     4:2:0            -                 1440/1152/60/60   -               -
High     I, P & B     4:2:0 or 4:2:2   1920/1152/60/100  1440/1152/60/80   720/576/30/20   -
422      I, P & B     4:2:0 or 4:2:2   1920/1088/60/300  -                 720/608/30/50   -

MPEG-2 Sample Structure 4:2:2
[Figure: sample grid. Legend: luminance samples, chrominance samples]

MPEG-2 Sample Structure 4:2:0
[Figure: sample grid. Legend: luminance samples, chrominance samples, skip]

MPEG Block Structure
[Figure: an 8 x 8 pixel block of Y, Cr or Cb samples]
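As a rough illustration of the two sample structures and the 8 x 8 block size, the sketch below counts samples and blocks per frame. The 720 x 608 frame size is the 422 Profile, Main level entry from the table. The function names are illustrative only, and the code assumes that 4:2:2 halves the chroma resolution horizontally while 4:2:0 halves it both horizontally and vertically.

```python
def samples_per_frame(width, height, chroma_format):
    """Return (luma_samples, chroma_samples) for one frame.

    chroma_samples counts Cb and Cr together.
    Assumption: 4:2:2 halves chroma horizontally only,
    4:2:0 halves chroma horizontally and vertically.
    """
    luma = width * height
    if chroma_format == "4:2:2":
        chroma_each = (width // 2) * height
    elif chroma_format == "4:2:0":
        chroma_each = (width // 2) * (height // 2)
    else:
        raise ValueError("unsupported chroma format: " + chroma_format)
    return luma, 2 * chroma_each


def blocks_per_frame(width, height, chroma_format):
    """Number of 8 x 8 blocks (Y, Cb and Cr) needed to cover one frame."""
    luma, chroma = samples_per_frame(width, height, chroma_format)
    return (luma + chroma) // 64


if __name__ == "__main__":
    for fmt in ("4:2:2", "4:2:0"):
        luma, chroma = samples_per_frame(720, 608, fmt)
        print(fmt, "samples:", luma + chroma,
              "blocks:", blocks_per_frame(720, 608, fmt))
```

At 8 bits per sample, the 4:2:2 total for 720 x 608 is 875,520 samples, so 4:2:0 carries three quarters of the data of 4:2:2 before any compression is applied.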
[Figure: a 422P@ML macroblock consisting of four Y blocks, two Cr blocks and two Cb blocks]

MPEG Frame Structure
• Single macroblock
• Slices, each consisting of a number of macroblocks
• An MPEG 422P@ML frame of 45 x 38 macroblocks

Inter-field and BRR Process
[Block diagram labels: Input Video; INTER frame process (generate B/P frame; I frames and P/B frames passed on); INTRA frame process (bit rate reduction); coded stream to application or transmission stream]

Prediction Frame
• Source Frame A, Source Frame B
• Frame B is the same as A except for: (difference shown in the figure)
• Frame A is encoded as an I frame
• Frame B is encoded as a P frame

Prediction Frame
[Figure labels: Source Frame 1, Source Frame 2, Source Frame 3; "Good Prediction" (twice); "Coded as I Frame" (twice); "Coded as B Frame"]

Frame Types
• I Frame
  – Intra frame compression
• P Frame
  – Forward Prediction frame
    • Typically 30% of the size of an I frame
• B Frame
  – Bi-directional Prediction
    • May refer to I or P frames before or after the frame being coded
    • Typically 50% of the size of a P frame (i.e. 15% of the size of an I frame)

MPEG Frame Sequence
• 12 frame GOP: I1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 I13
• 4 frame GOP: I1 P2 B3 P4 I5 P6 B7 P8 I9
• 2 frame GOP: I1 B2 I3 B4 I5 B6 I7

MPEG P Frame Generation
• Presentation stream: Frame 1, Frame 2, Frame 3, Frame 4
• Access stream: I1 = Fr1, P2 = Fr1 - Fr2

Difference Frame
[Figure: Frame N, Frame N+1 and the difference (Frame N) - (Frame N+1); shown for three example picture pairs]

R Frame Generation
• Presentation stream: Frame 1, Frame 2, Frame 3, Frame 4
• Access stream: I1 = Fr1, R2 = Fr2 - Fr3
Note: MPEG does not support R frames; they are a type of B frame.

B Frame Generation
• Presentation stream: Frame 1, Frame 2, Frame 3, Frame 4
• Access stream: I1 = Fr1, B2 = (Fr1 + Fr3)/2 - Fr2
Note: B frames can refer to I frames or P frames. Motion prediction can refer to earlier or later frames.

Motion Prediction
[Diagram: frames F1 and F2 are compared; the result goes to compression]

Motion Prediction
[Diagram: frames F1 and F2 are compared with an X, Y displacement; the result goes to compression together with the X, Y vector data]

Motion Prediction
[Figure: Frame N and Frame N+1]

Motion Prediction
[Figure: difference frame without motion prediction, and difference frame with motion prediction]
Graphic taken from: “MPEG Video Compression Standard”, Mitchell, Pennebaker, Fogg & LeGall

MPEG Coder
General Structure
[Diagram: video stream, then inter frame (P/B) coder, then intra frame coder, then MPEG stream]

Typical Picture
• Most 8 x 8 pixel areas are evenly shaded

MPEG Intra-frame Coder
[Diagram: access units from the inter-frame coder pass through the DCT, quantiser, zig zag scan, run length encoder and entropy coder to form the MPEG stream]

DCT Process
[Diagram: x, y (time domain), then DCT, then fx, fy (frequency domain)]

Time and Frequency Domain
• Oscilloscope (time domain): data values 0.1, 0.2, 0.3, 0.35, 0.3...
• Spectrum analyser (frequency domain): TP data values 0, 0, 0, 1, 0, 0, ....
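The oscilloscope and spectrum analyser views above can be sketched with a one-dimensional DCT over eight samples. This is a minimal illustration, not the coder's implementation: it uses the orthonormal DCT-II scaling (an assumption; the slides do not specify scaling), and the helper names `dct_1d` and `basis_wave` are invented for this example. An evenly shaded run of samples produces a single DC term, and a pure cosine at one of the DCT frequencies produces a single non-zero coefficient, much like the spectrum analyser values 0, 0, 0, 1, 0, 0, ....

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (assumed scaling)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def basis_wave(k, n=8):
    """Time-domain samples of the k-th DCT basis function."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)]


if __name__ == "__main__":
    # Evenly shaded samples: all the energy lands in coefficient 0 (DC).
    print([round(c, 3) for c in dct_1d([100] * 8)])
    # A single spatial frequency: one coefficient is 1, the rest are 0.
    print([abs(round(c, 3)) for c in dct_1d(basis_wave(3))])
```

The two-dimensional 8 x 8 transform used on picture blocks applies the same idea along rows and then columns.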
DCT Spatial Frequency Patterns
[Figure: the DCT spatial frequency patterns]

Typical DCT Transform
Input block, 64 sample values:
83 87 92 90 89 91 47 95 98 81 77 96 71 44 58 49 27 43 65 40 64 99 61 83 21 45 51 59 80 87 94 56 62 41 98 82 68 79 72 42 70 76 53 85 55 48 63 74 75 57 54 46 52 60 84 69 50 97 67 88 73 66 78 86

Coefficient matrix after the DCT:
284   18  -12    2    3   -7    0    0
 10    9    8    1   -4    1    0    0
 -6    2    1    0    2    4    2   -1
  2    1   -1    1   -3    0    2    0
  0    0    0    1    2    0    0   -1
  1   -2    1    0    1   -2    3    0
  0    0    0   -1    1    2    0    0
  0    0    0    0    0    0    0    0

Signal Entropy, Before DCT
[Plot: sample probability (log scale, 10^-1 down to 10^-5) against signal level (0 to 250)]

Signal Entropy, After DCT
[Plot: sample probability (log scale, 10^-1 down to 10^-5) against signal level (-128 to 128)]

Quantisation
• Original DCT coefficient matrix value: 213 (11010101)
• Quantisation matrix value: 16
• Coded value: 213 / 16 = 13 (1101)
• Decoder DCT matrix value: 13 x 16 = 208 (11010000)
• Quantisation error: 213 - 208 = 5

Mosquitoes
[Images: source file; 2:1 compressed JPEG; 4:1 compressed JPEG; 12:1 compressed JPEG]

Mosquitoes
[Image: 12:1 compressed JPEG]

Zig Zag Scan Pattern
Columns are u = 0 to 7, rows are v = 0 to 7.

Zig Zag Scan Definition (0)
 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63

Alternate Zig Zag Scan Definition (1)
 0  4  6 20 22 36 38 52
 1  5  7 21 23 37 39 53
 2  8 19 24 34 40 50 54
 3  9 18 25 35 41 51 55
10 17 26 30 42 46 56 60
11 16 27 31 43 47 57 61
12 15 28 32 44 48 58 62
13 14 29 33 45 49 59 63

Huffman Code Table
(s = sign bit)

Variable Length Code  Run Length    Level
0000 110 s            0             4
0010 0110 s           0             5
0010 0001 s           0             6
0010 0101 s           1             3
0000 100 s            2             2
0010 0100 s           3             2
0001 01 s             6             1
0001 00 s             7             1
0000 111 s            8             1
0000 101 s            9             1
0010 0111 s           10            1
0010 0011 s           11            1
0010 0010 s           12            1
0000 01               Escape
10                    End_of_block

Escape Codes
• Run length: 0 to 63, 6 bits
• Size: -2047 to +2047, 12 bits

Editing Betacam SX
• There are three forms of editing:
  – Camcorder based backspace edit
  – Tape to tape editing
  – Hard disk based editing

Editing Betacam SX
• The unit recorded on tape is a GOP (B-frame, I-frame)
• Recordings must start and finish at GOP boundaries
• A B frame must be re-coded if its reference I-frame is removed

Editing Betacam SX
[Figure: frame sequence B I B I B I B I R I, with sources A and B]
Backspace/Assembly edit performed with an R frame

Insert Editing in a SX VTR
• The two machines are connected via SDI
• Edit is decode - re-code
• Frame accurate editing is possible
• Existing frames around the edit point must be recoded
• Pre-read heads are used

Pre-read Process
[Diagram labels: Pre-read head, BRR Decode, Input Video, Edit Switch, BRR Encode, Record head, Tape Direction]

Tape to tape editing
• Four possible scenarios exist
  – In-point is
    • B frame
    • I frame
  – Out-point is
    • B frame
    • I frame

In-point B-frame
A:  B I B I B I B I B I B I B
B:  B I B I B I B I B I B I B
    B I B I B I B´ I´ B I B I B
In-point marked. Insert edit performed in an A220.

In-point I-frame
A:  B I B I B I B I B I B I B
B:  B I B I B I B I B I B I B
    B I B I B I B´ I´ B I B I B
Rec On at the in-point.

Out-point B-frame
A:  B I B I B I B I B I B I B
B:  B I B I B I B I B I B I B
    B I B I B I B´ I´ B I B I B
Rec On, out-point marked.

Out-point I-frame
A:  B I B I B I B I B I B I B
B:  B I B I B I B I B I B I B
    B I B I B I B I B´ I´ B I B
Rec On, out-point marked.

Editing Betacam SX
A:  I B I B I B I B I B I B I
B:  B I B I B I B I B I B I B
    B I B I B´ I´ B´ I´ I´ B B´ I´ B´
Rec On. Frame accurate editing.

Editing MPEG
[Figure: streams A and B, each with its own decoder (A DEC, B DEC); overlap edit]
Hard disc edit where scenes A and B are played out concurrently

Editing MPEG
[Figure: I B I B I sequences with an edit flag; frames marked "Not Used" and "Used"]
MPEG is transmitted as SDDI/4 x SDDI with an edit flag; in the decoder, the first B frame after an edit is processed as an R frame.

Multigeneration Performance
[Plot: SNR (dB), -20 to -50, against quantisation value, 0 to 60. Key: first generation, Q variable; second generation, Q = 34]

GOP Shift
[Plot: SNR (dB), 28 to 36, against generation number, for chroma and luminance. Curves: no GOP shift between generations; GOP shift between generations]

System Comparison
• Betacam SX
  – 4:2:2 sampling
  – 608 lines
  – MPEG-2 422P@ML
  – 10:1 compression
  – ‘Full picture’ compression ratio 9.1:1
  – Video data rate = 18 Mbit/s
  – Audio: 4 channel, 48 kHz, uncompressed
  – Recorded data rate = 43.8 Mbit/s

System Comparison
• DVC
  – 4:2:0 sampling
  – 576 lines
  – DVC compression (intra frame)
  – ‘5:1’ compression
  – Video data rate = 25 Mbit/s
  – ‘Full picture’ compression ratio 6.6:1
  – Audio: 2 channel, 48 kHz, uncompressed
  – Recorded data rate = 40.4 Mbit/s

Compression Ratios
• Is compression ratio a good measure of quality?
• What else must be considered?

Compression Ratio
• Frame 1: 875 KByte, Frame 2: 875 KByte
• Coded as I B I B: 175 KByte
• Betacam SX ratio: 10:1

Compression Ratio
• Frame 1: 875 KByte, Frame 2: 875 KByte
• Coded as I I I I: 300 KByte
• I frame only MPEG ratio: ~6:1 (equivalent to 30 Mbps)

Compression Ratio
• Frame 1: 875 KByte, Frame 2: 875 KByte
• Coded as DV DV DV DV: 250 KByte
• DV ratio: ‘5:1’ (quoted), ‘full picture’ ratio 6.6:1

Compression Ratio
• Frames 1 to 4 coded as I P P P
• Data rate 4-8 Mbps typically
• MPEG transmission ratio: ~20:1

Quality Comparison
• Cannot use analogue measurement techniques
  – all systems produce ‘perfect’ results
• Data rate or compression ratio is not a good measure of quality
• Different systems have different efficiencies
• Must measure a system on the basis of visual impairment

Tektronix PQA
• Picture Quality Assessor
  – Compares a reference sequence with the recorder output
  – Weights data on the basis of how we see
  – Produces a PQR (Picture Quality Rating)

PQR    Rating   Meaning
0      Perfect  No measurable error
0-5    Good     No noticeable error
5-10   Fair     Discernible errors, but ‘Broadcast Quality’
10+    Poor     Not ‘Broadcast Quality’

MPEG System Layer
• ‘Presentation Units’: Fr 1, Fr 2, Fr 3, Fr 4
• ‘Access Units’: B1, I2, B3, I4
• ‘Elementary Stream’: B1 I2 B3 I4
• PES (Packetised Elementary Stream): HD Data, HD Data
  – Variable length, <64k (except video)

Stream Conversion
• SX native is output on SDTI
• A stream convertor turns this into TS
• This is data re-ordering and there is no quality loss
• TS can be interfaced to ATM
  – again this is loss-less re-ordering
SX is MPEG, and stream conversion is a loss-less process.

Transcoding
• Transcoding is used by different people to mean different processes
• We use it to mean: the conversion of one MPEG bit-stream into another without decoding back to baseband (SDI)

Transcoding
• Transcoding is a key technology
• Many companies are working on transcoding
  – Atlantic project
• Sony have a chip set and can demonstrate its function
• Transcoding has the advantage of:
  – minimising the loss of quality during the process
  – allowing coder decisions to be reused in a subsequent process
  – improving signal quality
• Principally, transcoding keeps the signal in the MPEG domain

Transcoding
Bit-rate / GOP Change
[Diagram: stream in (e.g. MPEG2 @ 8 - 10 Mbps), then transcode decoder, then transcode encoder, then stream out (e.g. MPEG2 @ 4 Mbps). Video data and metadata (coding decisions: vectors, Q-factor, etc.) pass from the decoder to the encoder]

Transcoding
[Workflow diagram labels: MPEG; Digi.. Beta Digi; Editing, I or IB 4:2:2; Server, Long/Short GOP 4:2:2; Archive (DTF), Long/Short GOP 4:2:2; Transmission; DV; Production; Analog; Contribution/Distribution; MPEG ENCODE; Acquisition, Long GOP 4:2:0; transcoding between stages]

Transcoding
[Plot: quality, 28 to 35, against generation, 1 to 7. Curves: MPEG Transcoding; Digital-S/MPEG Hybrid (GOP shift)]

MPEG ES
Video Sequence
• Sequence header
• GOP header
• Picture header, picture data (repeated for each picture)
  – Slice 1, Slice 2, Slice ..
    • MacroBlock 1, MacroBlock 2, MacroBlock ..
      – DCT data 1, DCT data 2, DCT data ..
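The nesting of the elementary stream can be explored by scanning for start codes. The slide does not give the code values; the ones below are the MPEG-2 video start code values (prefix 00 00 01 followed by one byte) and are brought in from the standard, not from this deck. The function names and the toy byte string are illustrative assumptions, and the 0xAA filler bytes stand in for header and picture data that a real stream would carry. Macroblocks and DCT data have no start codes of their own, so the scan stops at slice level.

```python
START_CODE_PREFIX = b"\x00\x00\x01"

# Start code values assumed from the MPEG-2 video specification.
NAMED_CODES = {
    0xB3: "sequence header",
    0xB8: "GOP header",
    0xB7: "sequence end code",
}


def classify(code):
    """Map the byte that follows the 00 00 01 prefix to a layer name."""
    if code == 0x00:
        return "picture header"
    if 0x01 <= code <= 0xAF:
        return "slice"
    return NAMED_CODES.get(code, "other")


def scan_start_codes(data):
    """Return a list of (byte offset, layer name) for each start code."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify(data[pos + 3])))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return found


if __name__ == "__main__":
    filler = b"\xAA\xAA"  # placeholder payload, not real header content
    toy_stream = (
        START_CODE_PREFIX + b"\xB3" + filler    # sequence header
        + START_CODE_PREFIX + b"\xB8" + filler  # GOP header
        + START_CODE_PREFIX + b"\x00" + filler  # picture header
        + START_CODE_PREFIX + b"\x01" + filler  # slice 1
        + START_CODE_PREFIX + b"\x02" + filler  # slice 2
        + START_CODE_PREFIX + b"\xB7"           # sequence end code
    )
    for offset, name in scan_start_codes(toy_stream):
        print(offset, name)
```

A real parser would go on to decode the fields inside each header; this sketch only shows how the layers are delimited.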
Sequence end code (terminates the video sequence)

MPEG Programme Stream
[Diagram: Video, Audio, Data and Clock inputs; packetised elementary streams; MUX; programme stream output. A pack consists of a pack header (PH), a system header (SH) and multiplexed PES packets]
• System Header (optional) contains: e.g.* max data rate, number of streams, timing
  * can be used by the receiver (RX) to determine whether to attempt to decode the stream
• Pack Header contains: System Clock Reference

PES Header
PES-packet fields, in order:

Field                                  Bits  Value
PES-packet start code prefix           24    0000 0000 0000 0000 0000 0001
Stream_id                              8     xxxx xxxx
PES-packet length                      16    xxxx xxxx xxxx xxxx
Flags 1                                8     10xx xxxx
Flags 2                                8     PDxx xxxx
PES header data length                 8     xxxx xxxx
Presentation Time Stamp (if present)
Decoding Time Stamp (if present)
Further optional fields

MPEG Transport Stream
[Diagram: PES packets (HD, Data) are split across 188-byte transport packets, each with a header (HD) followed by an adaptation field and/or payload (AF/Payload); payload alignment shown]

Transport packet header:

Field  Bits  Meaning
Sync   8     47H / 01000111
TEI    1     Transport Error Indicator
PUSI   1     Payload Unit Start Indicator
TP     1     Transport Priority
PID    13    Packet ID (which programme in the stream); 17 values have special meaning, therefore 8175 are left
TSC    2     Transport Scrambling Control
AFC    2     Adaptation Field Control
CC     4     Continuity Counter (increments with each 188-byte packet of the same PID)
Total  32

MPEG Transport Stream Generation
[Diagram: Video 1, Video 2, Audio 1, Audio 2, Service Info (PMT etc.), Private Data and Null Packs are multiplexed into a transport stream of 188-byte packets, e.g. V A V V SI PD Null V]

Programme Map Table
Transport packet containing the PMT. Programme Map Table for Prog. 3:

Entry                    PID
PID for Clock Reference  46
PID for Video            512
PID for Audio            76
PID for Subtitle Data    5
etc

Programme Control
PID = 0

Programme Number  PID value for Prog. Map Table
0                 12 (Network Information Table)
1                 43 (PMT for Programme 1)
17                325
23                871
312               6
etc               etc

[Diagram: the PID entries point to the PMT for Programme 1, the PMT for Programme 2, and so on]

MPEG Clock System
[Diagram labels: Encoder, Decoder; System Clock (27 MHz, 42 bit), sent more than once per 0.7 s; System Clock Sync; DEC Clock; Time Stamp (90 kHz, 33 bit); DEC Buffer; Release]

Time Stamps
[Diagram: MPEG stream, then buffer (DTS), then decoder (PTS), then monitor]

MPEG Transmission Sequence
• Presentation stream: F1 F2 F3 F4 F5
• Access stream: I1 P2 B3 P4 I5
• Transmission stream: I1 P2 P4 B3 I5

MPEG-4
• Originally intended for very low data rate for:
  – Video conferencing
  – Portable video phones
• Low data limit raised (now 5 Kbps to 10 Mbps)
• Wide range of resolutions supported
• Progressive and interlace supported
• Now intended for complex multimedia type applications

MPEG-4
• Audiovisual scene made up of media objects:
  – Objects can be:
    • Natural or Synthetic
    • Audio or Video
  – Objects are composited into the scene
  – Objects can interact with objects at the receiver’s end

Audio Coding
• Profiles:
  – Speech
  – Synthesis
  – Scalable
  – Main
  – High quality audio
  – Low delay audio
  – Natural audio
  – Mobile audio internetworking

Natural Audio Coding
[Chart: scale 2 4 6 8 10 12 14 16 24 32 48 64. Coder ranges: scalable coder; TTS; speech coding; general audio coding. Typical audio bandwidth: 4 kHz, 8 kHz, 20 kHz]

Audio Coding
• Synthetic Methods
  – TTS (Text to speech)
    • 200 bps to 1200 bps
    • text plus prosodic parameters
    • interface standard, not a normative synthesizer
  – Score driven
    • SAOL (Structured audio orchestra language)
    • Synthesis can be by many processes
      – wavetable, FM, additive and others
    • control is via “score”
    • MIDI can be used

Video Coding
• Profiles
  – for natural objects
    • Simple
    • Simple scalable
    • Core
    • Main
    • N-bit
  – for synthetic objects
    • Simple facial animation
    • Scalable textural
    • Basic Animated 2D Texture
    • Hybrid visual

Video Coding
• Synthetic objects
  – parametric coding of
    • faces
    • bodies
    • static and dynamic meshes
• Face Animation
  – Facial description parameters
  – Facial animation parameters
  – Face animation table
  – Face interpolation technique

Video coding
• 2D animated meshes
  – region of polygons
  – could have texture map
[Figure: example of a mesh from the MPEG-4 Standard]

Sprite Coding
• MPEG-4 supports a moving sprite over a static background
[Figure: example of sprite coding from the MPEG-4 Standard]

MPEG Products
• Broadcast
  – VTRs: Betacam SX, IMX
  – Servers: MAV-70, MAV-555
• Contribution
  – EBU Standard
• DVD
  – players and authoring systems
• Transmission
  – Satellite and terrestrial

MPEG Open World
• Pro-MPEG Forum
  – Consortium of many companies working toward greater interoperability
www.pro-mpeg.org

Thank You
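Worked example for the MPEG Transport Stream slide: the 32-bit transport packet header (Sync 8, TEI 1, PUSI 1, TP 1, PID 13, TSC 2, AFC 2, CC 4) can be unpacked with a few shifts and masks. This is a minimal sketch under stated assumptions: the function name and dictionary keys are invented for illustration, the test packet is synthetic, and the PID value 512 is simply the video PID shown on the Programme Map Table slide. The reading of AFC = 1 as "payload only" comes from the MPEG-2 systems standard, not from the slides.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Unpack the 4-byte header of one 188-byte transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync byte 47H not found")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "tei": (b1 >> 7) & 0x1,          # Transport Error Indicator
        "pusi": (b1 >> 6) & 0x1,         # Payload Unit Start Indicator
        "tp": (b1 >> 5) & 0x1,           # Transport Priority
        "pid": ((b1 & 0x1F) << 8) | b2,  # 13-bit Packet ID
        "tsc": (b3 >> 6) & 0x3,          # Transport Scrambling Control
        "afc": (b3 >> 4) & 0x3,          # Adaptation Field Control
        "cc": b3 & 0x0F,                 # Continuity Counter
    }


if __name__ == "__main__":
    # Synthetic packet: PUSI set, PID 512, AFC = 1, CC = 5, empty payload.
    header = bytes([0x47, 0x42, 0x00, 0x15])
    packet = header + bytes(TS_PACKET_SIZE - len(header))
    print(parse_ts_header(packet))
```

A demultiplexer would first read the packets with PID = 0 to find each programme's map table, then use the PIDs listed there to select the video and audio packets.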