Technology Seminar MPEG Video Technology Kit Barritt Principal Lecturer

advertisement
Technology Seminar
MPEG Video Technology
Kit Barritt
Principal Lecturer
Sony Training Centre
this is not a rehearsal
V2.0 Nov 2000
Content
•
•
•
•
•
•
Standards
Frame Structure
Frame Types
Frame Sequences
Difference Frames
Motion Prediction
this is not a rehearsal
Content
• MPEG Encoding
– Inter-frame Process
• I/P/B Frames
• Motion Prediction
– Intra-frame Process
•
•
•
•
this is not a rehearsal
DCT
Zig Zag Scan & Run length encoding
Entropy coding
Quantisation
MPEG
• Edit techniques
• Multi-generation considerations
• Compression System Comparisons
– MPEG-2 422P@ML
– DVC
• MPEG System Layer
– MPEG Transmission
• Programme Stream
• Transport Stream
• PID, PAT & PMT
– Time Clocks
this is not a rehearsal
Content
• MPEG-4
– history
– Audio coding
– Video coding
this is not a rehearsal
The MPEG Standards Set
MPEG-21
MPEG-7
MPEG
MPEG-2
MPEG-1
MPEG-4
MPEG-3
this is not a rehearsal
Levels and Profiles
• MPEG-2 Profile @ Level
• Profiles
– Simple, Main, SNR, Spatial, High, 422 Profile
• Levels
– High, High-1440, Main, Low
this is not a rehearsal
Levels and Profiles
Profile
Frame Types
Level
Chroma Sampling
I&P
4:2:0
Main
I, P & B
4:2:0
Samples/line
Lines/frame
High
Frames/sec
Max Bit-rate (MBps)
1920
1152
60
80
Samples/line
High Lines/frame
1440 Frames/sec
Max Bit-rate (MBps)
1440
1152
60
60
Samples/line
Main Lines/frame
Frames/sec
Max Bit-rate (MBps)
Low
this is not a rehearsal
Simple
Samples/line
Lines/frame
Frames/sec
Max Bit-rate (MBps)
720
576
30
15
720
576
30
15
352
288
30
4
SNR
I, P & B
4:2:0
Spatial
I, P & B
4:2:0
High
I, P & B
720
576
30
15
352
288
30
4
I, P & B
4:2:0 or 4:2:2 4:2:0 or 4:2:2
1920
1152
60
100
1440
1152
60
60
422
1920
1088
60
300
1440
1152
60
80
720
576
30
20
720
608
30
50
MPEG-2 Sample Structure
4:2:2
Represent Luminance Samples
Represent Chrominance Samples
this is not a rehearsal
MPEG-2 Sample Structure
4:2:0
Represent Luminance Samples
Represent Chrominance Samples
Skip
this is not a rehearsal
MPEG Block Structure
8 x 8 Pixel Block
Y, Cr or Cb
Y
Y
Block Block
Y
Y
Block Block
Cr
Block
Cb
Block
+
+
Cb
Block
Cr
Block
422P @ ML Macro Block consisting of
Y, Cr & Cb Blocks
this is not a rehearsal
MPEG Frame Structure
45
Single Macro Block
Slices Consisting of a
Number of Macro Blocks
38
An MPEG 422P @ ML Frame of 45 x 38 Macro Blocks
this is not a rehearsal
Inter-field and BRR Process
I Frame
Input
Video
I
Generate
B/P Frame
Bit Rate
Reduction
P/B
INTER frame Process
this is not a rehearsal
INTRA frame Process
Coded Stream to
Application or
Transmission Stream
Prediction Frame
Source Frame A
Source Frame B
Frame B is the
same as A
except for:
Frame A is encoded as a I Frame Frame B is encoded as a P Frame
this is not a rehearsal
Prediction Frame
Source Frame 1
Source Frame 2
Good Prediction
Coded as I Frame
this is not a rehearsal
Source Frame 3
Good Prediction
Coded as B Frame
Coded as I Frame
Frame Types
• I Frame
– Intra frame compression
• P Frame
– Forward Prediction frame
• Typically 30% Size of I frame
• B Frame
– Bi-directional Prediction
• May refer to I or P frames before or after the coding frame
• Typically 50% the size of P frame (I.e 15% size of I frame)
this is not a rehearsal
MPEG Frame Sequence
I 1 P2
P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 I13
12 Frame GOP
I 1 P2 B3 P4 I5 P6 B7 P8 I 9
4 Frame GOP
I 1 B2 I 3 B4 I 5 B6 I 7
2 Frame GOP
this is not a rehearsal
MPEG P Frame Generation
Presentation
Stream
Access
Stream
this is not a rehearsal
Frame 1
Frame 2
I1 = Fr1
P2 = Fr1 - Fr2
Frame 3
Frame 4
Difference Frame
Frame N
(Frame N) - (Frame N+1)
this is not a rehearsal
Frame N+1
Difference Frame
Frame N
(Frame N) - (Frame N+1)
this is not a rehearsal
Frame N+1
Difference Frame
Frame N
(Frame N) - (Frame N+1)
this is not a rehearsal
Frame N+1
R Frame Generation
Presentation
Stream
Access
Stream
Frame 1
Frame 2
I1 = Fr1
R2 = Fr 2 - Fr3
Frame 3
Note: MPEG does not support R Frames, they are a type of B Frame
this is not a rehearsal
Frame 4
B Frame Generation
Presentation
Stream
Frame 1
1/2
Access
Stream
I1 = Fr1
Frame 2
-
Frame 3
1/2
B2 = Fr1 + Fr3 -Fr2
2
Note: B frames can refer to I frames or P frames
Motion Prediction can refer to earlier or later frames
this is not a rehearsal
Frame 4
Motion Prediction
F1
F2
To Compression
this is not a rehearsal
Motion Prediction
F2
F1
Y
X
To Compression
(+ X, Y Vector Data)
this is not a rehearsal
Motion Prediction
Frame N
this is not a rehearsal
Frame N+1
Motion Prediction
Difference Frame
Without Motion Prediction
Graphic taken from: “MPEG Video Compression Standard”
this is not
a rehearsal
Mitchell, Pennebaker, Fogg & LeGall
Difference Frame
With Motion Prediction
MPEG Coder
Video
Stream
Inter Frame
(P/B) Coder
Intra Frame
Coder
General Structure
this is not a rehearsal
MPEG
Stream
Typical Picture
Most 8x8 Pixel areas are evenly shaded
this is not a rehearsal
MPEG Intra -frame Coder
Access Units
from Inter-frame
coder
this is not a rehearsal
DCT
Quantiser
Zig Zag
Scan
Run Length
Encoder
Entropy
Coder
MPEG
Stream
DCT Process
fx
y
DCT
x
Time Domain
this is not a rehearsal
fy
Frequency Domain
Time and Frequency Domain
Oscilloscope
Time Domain
Data Values
0.1, 0.2, 0.3, 0.35, 0.3...
Spectrum Analyser
TP
Data Values
0, 0, 0, 1, 0, 0, ....
Frequency Domain
this is not a rehearsal
DCT Spacial Frequency
Patterns
this is not a rehearsal
Typical DCT Transform
83
87
92
90
89
91
47
95
98
81
77
96
71
44
58
49
27
43
65
40
64
99
61
83
21
45
51
59
80
87
94
56
62
41
98
82
68
79
72
42
70
76
53
85
this is not a rehearsal
284 18 -12
2
3
-7
0
0
10
9
8
1
-4
1
0
0
55
-6
2
1
0
2
4
2
-1
48
63
2
1
-1
1
-3
0
2
0
74
75
57
0
0
0
1
2
0
0
-1
54
46
52
60
1
-2
1
0
1
-2
3
0
84
69
50
97
67
0
0
0
-1
1
2
0
0
88
73
66
78
86
0
0
0
0
0
0
0
0
DCT
Signal Entropy
-1
10
Sample 10-2
Probability
-3
10
-4
10
-5
10
0
50
100
150
Signal
Level
Before DCT
this is not a rehearsal
200
250
Signal Entropy
-1
10
Sample 10-2
Probability
-3
10
-4
10
-5
10
-128
After DCT
this is not a rehearsal
0
Signal
Level
128
Quantisation
Original
DCT Coeff Matrix
Quantisation
Matrix
213
213 / 16
11010101
Decoder
DCT Matrix
16
13
1101
Quantisation Error: 213-208 = 5
this is not a rehearsal
208
x 16 = 208
11010000
Mosquitos
Source File
2:1 Compressed JPEG
4:1 Compressed JPEG 12:1 Compressed JPEG
this is not a rehearsal
Mosquitoes
12:1 Compressed JPEG
this is not a rehearsal
Zig Zag Scan Pattern
4
5
0
1
2
3
0
0
1
5
6 14 15 27 28
1
2
4
7
2
3
1
2
0
0
4
6 20 22 36 38 52
13 16 26 29 42
1
1
5
7 21 23 37 39 53
8 12 17 25 30 41 43
2
2
8
19 24 34 40 50 54
3
9 11 18 24 31 40 44 53
3
3
9
18 25 35 41 51 55
4
10 19 23 32 39 45 52 54
4
10 17 26 30 42 46 56 60
5
20 22 33 38 46 51 55 60
5
11 16 27 31 43 47 57 61
6
21 34 37 47 50 56 59 61
6
12 15 28 32 44 48 58 62
v 7
35 36 48 49 57 58 62 63
v 7
13 14 29 33 45 49 59 63
this is not a rehearsal
3
4
5
6
u
7
0
Zig Zag Scan
Definition (0)
6
u
7
Alternate Zig Zag Scan
Definition (1)
Huffman Code Table
Variable Length Code
0000 110 s
0010 0110 s
0010 0001 s
0010 0101 s
0000 100 s
0010 0100 s
0001 01 s
0001 00 s
0000 111 s
0000 101 s
0010 0111 s
0010 0011 s
0010 0010 s
000001
10
this is not a rehearsal
Run Length
0
0
0
1
2
3
6
7
8
9
10
11
12
Escape
End_of_block
Level
4
5
6
3
2
2
1
1
1
1
1
1
1
Escape Codes
To System Comparison
this is not a rehearsal
Run Length
0 - 63
Size
-2047 to +2047
6 Bits
12 Bits
Editing Betacam SX
• There are three forms of editing:
– Camcorder based backspace Edit
– Tape to tape editing
– Hard disk based editing
this is not a rehearsal
Editing Betacam SX
• The unit recorded on tape is a GOP (B-frame, IFrame)
• Recordings must start and finish at GOP boundaries
• B frame must be re-coded if the reference I-frame is
removed
this is not a rehearsal
Editing Betacam SX
B
I
B
I
B
I
B
I
R
I
A
Backspace/Assembly Edit
performed with an R frame
this is not a rehearsal
B
B
I
B
Insert Editing in a SX VTR
•
•
•
•
The two machines are connected via SDI
Edit is decode - re-code
Frame accurate editing is possible
Existing frames around the edit point must be recoded
• Pre-read heads are used
this is not a rehearsal
Pre-read Process
BRR
Decode
Input
Video
this is not a rehearsal
Edit
Switch
BRR
Encode
Pre-read
Tape
Direction
Record
Tape to tape editing
• Four possible scenarios exist
– In-point is
• B Frame
• I frame
– Out point is
• B frame
• I frame
this is not a rehearsal
In-point B-frame
A
B
I
B
I
B
I
B
I
B
I
B
I
B
B
B
I
B
I
B
I
B
I
B
I
B
I
B
B
I
B
I
B
I
B´ I´
B
I
B
I
B
In-point
Insert Edit
performed in an A220
this is not a rehearsal
In-point I-frame
A
B
I
B
I
B
I
B
I
B
I
B
I
B
B
B
I
B
I
B
I
B
I
B
I
B
I
B
B
I
B
I
B
I
B´ I´
B
I
B
I
B
Rec On
In-point
this is not a rehearsal
Out-point B-frame
A
B
I
B
I
B
I
B
I
B
I
B
I
B
B
B
I
B
I
B
I
B
I
B
I
B
I
B
B
I
B
I
B
I
B´ I´
B
I
B
I
B
Rec On
Out-point
this is not a rehearsal
Out-point I-frame
A
B
I
B
I
B
I
B
I
B
I
B
I
B
B
B
I
B
I
B
I
B
I
B
I
B
I
B
B
I
B
I
B
I
B
I
B´ I´
B
I
B
Rec On
Out-point
this is not a rehearsal
Editing Betacam SX
A
I
B
I
B
I
B
I
B
I
B
I
B
I
B
B
I
B
I
B
I
B
I
B
I
B
I
B
B
I
B
I
B´
I´
B´ I´
I´
B
B´ I´ B´
Rec On
Frame accurate Editing
this is not a rehearsal
Editing MPEG
I
B
I
B
I
B
I
B
I
B
I
B
B
I
I
B
A
DEC
B
DEC
Overlap Edit
Hard disc edit where scene A+B are played out concurrently
this is not a rehearsal
Editing MPEG
Edit Flag
I
B
I
B
I
Not Used
B
I
B
I
B
Used
MPEG transmitted as SDDI/4 x SDDI with an edit flag,
in decoder first B frame after an edit is processed as an R frame
this is not a rehearsal
Multigeneration Performance
SNR(dB)
-20
-30
-40
Key:
First Generation, Q variable
Second Generation, Q=34
-50
0
10
20
30
40
Quantisation Value
this is not a rehearsal
50
60
GOP Shift
36
Chroma
35
34
SNR 33
(dB)
Luminance
32
31
30
29
28
4
1
Generation Number
No GOP Shift Between Generations
GOP Shift Between Generations
this is not a rehearsal
System Comparison
• Betacam SX
–
–
–
–
–
–
–
–
4:2:2 sampling
608 lines
MPEG-2 422P@ML
10:1 Compression
‘Full picture’ compression ratio 9.1:1
Video Data Rate = 18 MBits/s
Audio - 4 Channel, 48KHz, Uncompressed
Recorded Data Rate = 43.8Mb/s
this is not a rehearsal
System Comparison
• DVC
–
–
–
–
–
–
–
–
4:2:0 sampling
576 lines
DVC Compression (intra frame)
‘5:1’ Compression
Video data rate = 25 MBits/s
‘Full picture’ compression ratio 6.6:1
Audio - 2 channel, 48KHz, uncompressed
Recorded data rate = 40.4Mb/s
this is not a rehearsal
Compression Ratios
• Is compression ratio a good measure of quality?
• What else must be considered?
this is not a rehearsal
Compression Ratio
875 KByte
875 KByte
Frame 1
Frame 2
I
B
I
B
175KByte
Betacam SX
Ratio: 10:1
this is not a rehearsal
Compression Ratio
875 KByte
875 KByte
Frame 1
Frame 2
I
I
I
I
300KByte
I frame only MPEG
Ratio: ~6:1
(equiv. 30 Mbps)
this is not a rehearsal
Compression Ratio
875 KByte
875 KByte
Frame 1
Frame 2
DV
DV
DV
DV
250KByte
DV
Ratio: ‘5:1’ (quoted)
‘full picture’ ratio 6.6:1
this is not a rehearsal
Compression Ratio
Frame 1
I
Frame 2
Frame 3
P
P
I
PPP
Data rate 4-8Mbps typically
MPEG Transmission
Ratio: ~20:1
this is not a rehearsal
Frame 4
P
Quality Comparison
• Cannot use analogue measurement techniques
– all systems produce ‘perfect’ results
• Data rate or Compression ratio is not a good measure
of quality
• Different systems have different efficiencies
• Must measure system based on visual impairment
this is not a rehearsal
Textronix PQA
• Picture Quality Assessor
– Compares reference sequence with recorder output
– weights data on the basis of how we see
– produces a PQR (Picture Quality Rating)
•
•
•
•
this is not a rehearsal
0:
0-5,
5-10
10+
Perfect
Good
Fair
Poor
- No measurable error
- No noticeable error
- Discernable errors, but ‘Broadcast Quality’
- Not ‘Broadcast Quality’
MPEG System Layer
'Presentation Units'
Fr
'Access Units'
B1
'Elementary Stream'
B1
PES
Packetised Elementary Stream
Fr
1
I2
I2
Fr
3
B3
B3
HD
Fr
2
Data
4
I4
I4
HD
Data
Variable Length <64k (Except Video)
this is not a rehearsal
Stream Conversion
•
•
•
•
SX native is output on SDTI
Stream convertor turns this into TS
This is data re-ordering and there is no quality loss
TS can be interfaced to ATM
– again this is loss-less re-odering
SX is MPEG
&
Stream Conversion is a loss-less
Process
this is not a rehearsal
Transcoding
• Transcoding is used by different people to mean
different processes
• We use it to mean:
The conversion of one MPEG bit-stream into another
without decoding back to baseband (SDI)
this is not a rehearsal
Transcoding
• Transcoding is a key technology
• Many companies are working on transcoding
– Atlantic project
• Sony have a chip set and can demonstrate its function
• Transcoding has the advantage of:
– minimising the loss of quality during the process
– allowing coder decisions to be reused in a subsequent
process
– improving signal quality
• Principally transcoding keeps the signal in the MPEG
domain
this is not a rehearsal
Transcoding
Bit-rate / GOP Change
Stream In
E.g.,
MPEG2 @
8 - 10 Mbps
Transcode
Decoder
Video Data
Coding Decisions
(Vectors, Q-Factor, etc.)
Metadata
this is not a rehearsal
Transcode
Encoder
Stream Out
E.g.,
MPEG2 @
4 Mbps
Transcoding
MPEG
Digi.. Beta
Digi
Editing
I or IB
4:2:2
Server
Transcoding
Long/Short GOP
4:2:2
Archive
(DTF)
Long/Short GOP
4:2:2
this is not a rehearsal
Transmission
Transcoding
DV
Production
Transcoding
Analog
Contribution
/Distribution
MPEG ENCODE
Acquisition
Long GOP
4:2:0
Transcoding
35
34
MPEG
Transcoding
Quality
33
32
31
30
29
28
1
2
3
4
5
Generation
this is not a rehearsal
6
7
Digital-S/MPEG
Hybrid(GOP shift)
MPEG ES
Video Sequence
Header
GOP
Header
Picture
Header
Picture
Data
Picture
Header
Picture
Data
Slice 1 Slice 2 Slice ..
MacroBlock1 MacroBlock2 MacroBlock..
DCTData1 DCTData2
this is not a rehearsal
DCTData..
Sequence
End code
MPEG Programme Stream
Video
Audio
Data
Clock
Packetised Elemental Streams
M
U
X
Programme Stream
Pack
PH
SH
PH
SH
Multiplexed PES Packets
System Header (Optional) Contains:
eg.*Max Data Rate, No of Streams Timing
*can be used by RX to determine whether to attempt to decode stream
Pack Header Contains: System Clock Ref
this is not a rehearsal
PES Header
PES-packet
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
PES-packet start code prefix
x x x x x x x x
Stream_id
x x x x x x x x
x x x x x x x x
PES-packet length
1 0 x x x x x x
P D x x x x x x
x x x x x x x x
Flags 1
Flags 2
PES header data length
Presentation Time Stamp (if present)
Decoding Time Stamp (if present)
Further optional fields
this is not a rehearsal
MPEG Transport Stream
PES Packets
HD
HD
Data
P
P
P
Payload
Alignment
P
P
Header
P
P
Adaption Field
AF/Payload
HD
188 Bytes
Sync 8
TEI 1
PUSI 1
TP
1
PID 13
TSC 2
AFC 2
CC 4
47H/01000111
Transport Error Indicator
Payload Unit Start Indicator
Transport Priority
Packet ID (which prgramme in stream) 17 have special meaning, therefore 8175 left
Transport Scramble Control
Adaption Field Control
Continuity Counter (increments every 188 bytes)
Total 32 bits
this is not a rehearsal
MPEG Transport Stream
Generation
Video 1
188
Video 2
Audio 1
Transport Stream
188 bytes
Audio 2
V
Service Info
Private Data
Null Packs
this is not a rehearsal
A
V
V
SI
PD Null
V
PMT etc.
Programme Map Table
Transport Packet containing PMT
Programme Map Table for Prog. 3
PID for Clock Reference
PID for Video
PID for Audio
PID for Subtitle Data
etc
this is not a rehearsal
46
512
76
5
Programme Control
PID = 0
Programme
Number
0
1
17
23
312
etc
PID value for
Prog. Map Table
Network
Information Table
12
43
325
871
6
etc
PMT for
Programme 1
PMT for
Programme 2
this is not a rehearsal
MPEG Clock System
System
Clock
>Once per 0.7 S
System Clock
Sync
DEC Clock
27MHz 42 Bit
Release
Time Stamp
DEC Buffer
90kHz 33Bit
Encoder
Decoder
this is not a rehearsal
Time Stamps
Buffer
Decoder
Monitor
MPEG Stream
DTS
this is not a rehearsal
PTS
MPEG Transmission
Sequence
Presentation
Stream
F1
F2
F3
F4
F5
Access Stream
I1
P2
B3
P4
I5
Transmission
Stream
this is not a rehearsal
I1
P2
P4 B3
I5
MPEG-4
• Originally intended for very low data rate for:
– Video conferencing
– Portable video phones
•
•
•
•
Low data limit raised (now 5Kbps to 10Mbps)
Wide range of resolutions supported
Progressive and interlace supported
Now intended for complex multimedia type
applications
this is not a rehearsal
MPEG-4
• Audiovisual scene made up of media objects:
– Objects can be:
• Natural or Synthetic
• Audio or Video
– Objects are composited into the scene
– Objects can interact with objects at the receiver’s end
this is not a rehearsal
Audio Coding
• Profiles:
–
–
–
–
–
–
–
–
Speech
Synthesis
Scalable
Main
High quality Audio
Low delay audio
Natural audio
Mobile audio internetworking
this is not a rehearsal
Natural Audio Coding
2
4
6
8
10
12
14
16
24
32
48
64
Scalable Coder
TTS
Speech coding
General audio Coding
4KHz
8KHz
Typical Audio Bandwidth
this is not a rehearsal
20KHz
Audio Coding
• Synthetic Methods
– TTS (Text to speech)
• 200bps to 1200bps
• text plus prosodic parameters
• interface standard not normative sythesizer
– Score driven
• SAOL (Structured audio orchestra language)
• Synthesis can be by many processes
– wavetable, FM,additive and others
• control is via “score”
• MIDI can be used
this is not a rehearsal
Video Coding
• Profiles
– for natural objects
•
•
•
•
•
Simple
Simple scalable
Core
Main
N-bit
– for synthetic objects
•
•
•
•
this is not a rehearsal
Simple facial animation
Scalable textural
Basic Animated 2D Texture
Hybrid visual
Video Coding
• Synthetic objects
– parametric coding of
• faces
• bodies
• static and dynamic meshes
• Face Animation
–
–
–
–
Facial description parameters
Facial animation parameters
Face animation table
Face interpolation technique
this is not a rehearsal
Video coding
• 2D animated meshes
– region of polygons
– could have texture map
Example of a mesh from the MPEG-4 Standard
this is not a rehearsal
Sprite Coding
• MPEG-4 supports moving sprite over static
background
Example of sprite coding from the MPEG-4 Standard
this is not a rehearsal
MPEG Products
• Broadcast
– VTRs - Betacam SX, IMX
– Servers - MAV-70, MAV-555
• Contribution
– EBU Standard
• DVD
– players and authoring systems
• Transmission
– Satellite and terrestrial
this is not a rehearsal
MPEG Open World
• Pro-MPEG Forum
– Consortium of many companies working toward greater
Interoperabilty
www.pro-mpeg.org
this is not a rehearsal
Thank You
this is not a rehearsal
Download