Powerpoint file on video compression.

advertisement

VIDEO COMPRESSION

FUNDAMENTALS

Pamela C. Cosman

1

Compressing Digital Video

Exploit spatial redundancy within frames (like JPEG: transforming, quantizing, variable length coding)

Exploit temporal redundancy between frames

Only the sun has changed position between these 2 frames

Previous Frame

Current Frame

2

Simplest Temporal Coding - DPCM

 Frame 0 (still image)

Difference frame 1 = Frame 1

– Frame 0

Difference frame 2 = Frame 2

– Frame 1

If no movement in the scene, all difference frames are 0.

Can be greatly compressed!

If movement, can see it in the difference images

0 1 2 3

3

Difference Frames

 Differences between two frames can be caused by

Camera motion: the outlines of background or stationary objects can be seen in the Diff Image

Object motion: the outlines of moving objects can be seen in the Diff Image

Illumination changes (sun rising, headlights, etc.)

Scene Cuts: Lots of stuff in the Diff Image

Noise

4

Difference Frames

 If the only difference between two frames is noise (nothing moved), then you won’t recognize anything in the Difference Image

 But, if you can see something in the Diff

Image and recognize it, there’s still correlation in the difference image

 Goal: remove the correlation by compensating for the motion

5

6

7

8

Types of Motion

Frame n

 Translation: simple movement of typically rigid objects

 Camera pans vs. movement of objects

Frame n Frame n+1

Frame n+1

(Rotation)

Frame n+2

(Zoom)

Rotation: spinning about an axis

Camera versus object rotation

Zooms –in/out

Camera zoom vs. object zoom (movement in/out)

9

Describing Motion

Translational

Move (object) from (x,y) to (x+dx,y+dy)

Rotational

Rotate (object) by (r rads) (counter/clockwise)

Zoom

Move (in/out) from (object) to increase its size by

(t times)

Which is easiest? Which are we most likely to encounter?

10

Motion Estimation

 Determining parameters for the motion descriptions

For some portion of the frame, estimate its movement between 2 frames- the current frame and the reference frame

What is some portion?

Individual pixels (all of them)?

Lines/edges (have to find them first)

Objects (must define them)

Uniform regions (just chop up the frame)

11

General Idea

For a region P

C region P

R in the current frame, find a in the search window in reference frame so that Error(P

R

,P

C

) is minimized

Issues: Error measures, search techniques, choice of search window, choice of reference frame, choice of region P

C

Current

Search window

Reference

Frame

Portion of interest

Frame

P

C

12

Block-based Motion Estimation

P

C is a block of pixels (in the current frame)

The search window is a rectangular segment

(in the reference frame)

T=1 (reference) T=2 (current)

13

Motion Vectors

 A motion vector (MV) describes the offset between the location of the block being coded (in the current frame) and the location of the best-match block in the reference frame

T=1 (reference) T=2 (current)

14

Motion Compensation

The blocks being predicted are on a grid

1

9

5

6

2

14

13

10

3

7

15

11

4

8

12

1

5

9

13

2

6

10

14

3

7

11

8

12

15 16

16

The blocks used for prediction are NOT

4

15

Motion Vector Search

 1. Mean squared error

Select a block in the reference frame to minimize

Σ(b(B ref

)-b(B curr

)) 2

2. Mean abs. error

Select block to minimize

Σ|b(B ref

)-b(B curr

)|

 Given error measure, how to efficiently determine best-match block in search window?

Full search: best results, most computation

Logarithmic search – heuristic, faster

Hierarchical motion estimation

16

Motion Vector Search

 Full search: Evaluate every position in the search window

Logarithmic Search: First examine positions marked 1.

Choose best of these (lowest error measure) and examine positions marked 2 surrounding it

Choose the best of these, and examine the positions marked 3

Final result = best of these

17

Hierarchical Motion Estimation

 Use an averaging filter on the image, then downsample by a factor of 2

 Conduct a search on the downsampled image (only ¼ of the size)

 Given the results of the search on the downsampled image, return to the full resolution image and refine the search there

18

Motion Compensation

 The standards do not specify HOW the encoder will find the motion vectors (MVs)

 The encoder can use exhaustive/fast search,

MSE /MAE/other error metric, etc.

 The standard DOES specify

The allowable syntax for specifying the MVs

What the decoder will do with them

 What the decoder does is to grab the indicated block from reference frame, and glue it in place

19

Standard specifies bit stream

 The video compression standards define syntax and semantics for the bit stream between encoder and decoder

 This allows future encoders of better performance to remain compatible with existing decoders.

ENCODER bit stream

DECODER not this

Standard defines not this this

Encoder is not specified by

MPEG except that it produces a compliant bit stream

Compliant decoder must interpret all legal MPEG bit streams

 Also allows for commercially secret encoders to be compatible with standard decoders

Today’s Ho-Hum

Encoder

Tomorrow’s Nifty

Encoder

Very secret

Encoder

Today’s

Decoder

Today’s decoder still works!

20

Motion Compensation Example

Frame n-1

(0,0)

(0,0)

(-16,0)

(16,7)

(5,0)

(5,2)

(0,0)

(0,0)

(20,-24) (0,0) (-20,-18) (0,0)

Frame n

MOTION COMPENSATED

Frame n

21

Objects versus Macroblocks

 Real moving objects will not coincide with boundaries of macroblocks background moving object

Prediction error

Moving object well encoded with motion vector

Background well encoded (no motion vector)

Prediction error

If encoder sends MV=(MotX,MotY), object well coded, but background poorly coded

If encoder sends MV=(0,0), background well coded, but moving object poorly coded

Either approach is valid

22

Motion Compensation

 This glued together frame is called

 the motion compensated frame

The encoder can also form the difference between the motion compensated frame and the actual frame.

 This is called the motion compensated difference frame

 This difference frame formed using MC should have less correlation between pixels than the difference frame formed without using MC

23

Motion Compensated Difference

Frames

Suppose we are doing lossless coding

Encoder has sequence of frames: …, F(n-2), F(n-1)

 Next: encode F(n)

 Past frames have been losslessly encoded, so the decoder knows F(n-1) perfectly already

 Encoder sends the motion vectors for frame F(n) relative to frame F(n-1), to form motion compensated frame M(n)

Encoder knows M(n), Decoder knows M(n)

24

Motion Compensation Example

F(n-1)

(0,0)

(0,0)

(-16,0)

(16,7)

(5,0)

(5,2)

(0,0)

(0,0)

(20,-24) (0,0) (-20,-18) (0,0)

F(n)

M(n)

MOTION COMPENSATED

Frame

25

Encoding Difference Frames

 Encoder forms motion compensated diff frame:

MCD(n) = F(n) – M(n)

 Encoder losslessly encodes MCD(n)

 Decoder can then do

F(n) = MCD(n) + M(n)

→ knows F(n) exactly

With no motion compensation encoder could do frame diff:

FD(n) = F(n) – F(n-1)

Encoder losslessly encodes FD(n)

Decoder can then do

F(n) = FD(n) + F(n-1)

→ knows F(n) exactly

 If successive frames are very similar:

 fewer bits to send Motion Vectors + MCD(n) instead of FD(n)

 fewer bits to send FD(n) instead of F(n)

26

Motion compensated difference frames

 Decoder knows F(n-1) and, once you send the motion vectors, it knows M(n)

Send FD(n)

Reference Frame

F(n-1)

Original Frame

F(n)

Difference Image

FD(n)=F(n)-F(n-1)

Send Motion

Vectors

Motion compensated frame M(n)

Send MCD(n)

Motion compensated difference image

MCD(n) =F(n) – M(n)

27

Motion Compensated Difference

Frames

But we are NOT doing lossless coding

Encoder has sequence of frames: …, F(n-2), F(n-1)

 Next: encode F(n)

Past frames have been lossy encoded, so the decoder has versions …, G(n-2), G(n-1)

Encoder knows …, G(n-2), G(n-1) also

 Encoder sends the motion vectors for frame F(n) relative to frame G(n-1), to form motion compensated frame M(n)

28

Encoding Difference Frames

 Encoder forms motion compensated difference frame: MCD(n) = F(n) – M(n)

 Encoder lossy encodes MCD(n)

 Call the decoder version MCD*(n)

 If the decoder received MCD(n) exactly, could do: F(n) = MCD(n) + M(n)

 But with MCD*(n), decoder can do

G(n) = MCD*(n) + M(n)

→ knows F(n) approximately

29

Motion estimation philosophy

Goal of motion estimation is NOT to provide a careful analysis of the actual motion

Goal is to achieve a given quality of representation of the video while globally minimizing the bit rate required to send

The motion information

The prediction error information

Most of the time, for a given representation quality

 fewer bits to send MV+MCD(n) instead of sending FD(n)

 fewer bits to send FD(n) instead of sending F(n) itself.

30

Motion Compensation for Chrominance

Luminance is highly correlated, more so than chrominance

The “best” motion vectors are available by searching in the luminance plane

 Motion vectors for chrominance are not computed separately, simply scaled as needed

31

Motion Estimation/Compensation

Summary

 At the encoder:

For each block in the frame being coded, examine the search window(s) in the reference frame to find the best match block (do this for luminance only)

Form the MC difference image = original image minus motion compensated image

Scale the motion vectors for the chrominance, form the motion compensated chrominance frames, and form chrominance difference image

32

Motion Estimation/Compensation

Summary

 At the decoder:

Decode the reference frames (Y,Cr,Cb)

For each block in a temporally coded Y frame, use the motion vector to select a block from the reference frame and glue it in place

Add the Y difference image

For each block in temporally coded Cr,Cb frames, first scale the motion vector, then do the previous

2 steps with Cr and Cb data

33

Progress of Video Compression

34

Progress of Video Compression

35

Progress of Video Compression

36

Temporal Location of Reference

 The reference frame need not occur before the temporally coded frames which use it

 Why? Scene changes, allow better matches

37

Flavors of Motion Estimation

1. Forward predicted blocks: the best-match block occurs in the reference frame before the block’s frame

2. Backward predicted blocks: the best-match block occurs in the reference frame after the block’s frame

3. Interpolatively predicted blocks: the best-match block is the average of the best-match blocks from reference frames before & after

The motion compensation direction can be selected independently for each block in a frame.

38

MPEG Frame Types

 Intra (I) pictures: coded by themselves, as still images. No temporal coding. No motion vectors.

39

MPEG Frame Types

 Forward Motion Compensated predicted (P) pictures – forward motion compensated from the previous I or P frame

40

MPEG Frame Types

 Motion Compensated interpolated (B) pictures – forward, backward, and interpolatively motion compensated from previous/next I/P frames

41

Motion Vector Coding

 How are the motion vectors actually encoded for transmission to the decoder?

Start by taking the difference between the current motion vector and the most recent previous one of the same type (forward/backward/interpolative)

Encode the difference using variable length coding

Horizontal and vertical components coded separately

42

MPEG Frame Structure Terminology

 A block contains 8x8 pixels

The DCT unit

 A macroblock (MB) contains 4 blocks from the luminance, plus the corresponding chrominance blocks

4 blocks from each of Cr/Cb if 4:4:4 format

2 blocks from each of Cr/Cb if 4:2:2 format

1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format

The motion compensation unit

43

MPEG Frame Structure Terminology

 A slice is a collection of macroblocks, tracing in a raster scan from upper left to lower right

The resynchronization unit

 A picture is a frame, either progressive (noninterlaced) or interlaced

The primary coding unit

A Group of Pictures (GOP) contains ≥ 1 frame.

The unit for random access into the sequence

44

MPEG GOP Structure

 A Group of Pictures (GOP) may contain

All I pictures

I & P pictures only

I, P, & B Pictures

 A common GOP format for 30 frames/sec:

I-picture spacing 15 frames (1/2 second)

P-picture spacing 3 frames (1/10 second)

I B B P B B P B B P B B P B B I

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

45

Frame Ordering

Display order (encoder input order):

B B I B B P B B P B B P B B P B B I

-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

But consider coding dependencies:

 Frame 2 (B) needs frame 4 (P) to be decoded first, etc.

 So better transmit frame 4 before frame 2

I B B P B B P B B P B B P B B I B B

1 -1 0 4 2 3 7 5 6 10 8 9 13 11 12 16 14 15

46

Types of Coding Modes

 What if the best-match block in the reference frame is a great match ?

Then the motion vector is all you need to send

 What if it is a terrible match ?

Then don’t use the motion vector at all, just code the block by itself, with something like JPEG

(called intra mode coding)

 What if it is a so-so match ?

Then you can send the MV, and also send the frame difference information for that macroblock

47

Coding Mode I (Inter-Coding)

 Inter coding refers to coding with motion vectors

Macro

Block

Previous

Frame

Motion Vector

Current

Frame

48

Coding Mode II (Intra-Coding)

INTRA coding refers to coding without motion vectors

The MB is coded all by itself, in a manner similar to JPEG

Macro

Block

Previous

Frame

Current

Frame

49

I-Picture Coding

 Two possible coding modes for macroblocks in I-frames

Intra - code the 4 blocks with the current quantization parameters

Intra with modified quantization : scale the quantization matrix before coding this MB

 All macroblocks in intra pictures are coded

 Quantized DC coefficients are losslessly

DPCM coded, then Huffman as in JPEG

50

I-Picture Coding

 Use of macroblocks modifies block-scan order:

8 8

8

8

Quantized coefficients are zig-zag scanned and runlength/Huffman coded as in JPEG

Very similar to JPEG except (1) scaling Q matrix separately for each MB, and (2) order of blocks

51

P-Picture Coding: many coding modes

Motion compensated coding: Motion Vector only

Motion compensated coding: MV plus difference macroblock

Motion

Vector

DCT

Motion compensation: MV

& difference MB with modified quant. scaling

MV = (0,0), just send difference block

MV=(0,0), just send diff block, with modified quantization scaling

Intra: the MB is coded with DCTs (no difference is computed)

Intra with modified quantization scaling

52

How to choose a coding mode?

 MPEG does not specify how to choose mode

Full search = try everything…the different possibilities will lead to different rate/distortion outcomes for that macroblock distortion •

MV, no difference

MV, plus difference

Intra • rate

53

How to choose a coding mode?

Tree search: use a decision tree

For example:

First find the best-match block in the search window. If it’s a very good match, then use motion compensation.

Otherwise, don’t.

If you decided to use motion compensation, then need to decide whether or not to send the difference block as well. Make decision based on how good a match it is.

If you decided to send the difference block, then have to decide whether or not to scale the quantization parameter… check the current rate usage…

54

B-Picture Coding

 B pictures have even more possible modes:

Forward prediction MV, no difference block

Forward prediction MV, plus difference block

Backward prediction MV, no difference block

Backward prediction MV, plus difference block

Interpolative prediction MV, no difference block

Interpolative prediction MV, plus difference block

Intra coding

Some of above with modified Quant parameter

55

Group of Pictures

IIIII…: Every picture is intra-coded.

Fully decodable without reference to any other picture

Editing is straightforward

Requires about 2.5 more bit rate than bidirectional

IBBPBBPB…: Forward and bidirectional

Best compression factor

Needs large decoder memory

Hard to edit

Most useful for final delivery of post-produced material

(e.g., broadcast) because no editing requirement

56

Group of Pictures

IPPPPIPP…: Forward predicted only.

Needs less decoder memory

IBIBIB…: bidirectional compromise

Some of the bit rate advantage of bidirectional coding

Not nearly the full latency penalty of bidirectional

Editable with moderate processing.

For example, if the video after a B picture needs to be deleted, the B frame would not be decodable.

Solution is to decode the B frame first, re-encode it using forward prediction only. Some quality loss.

57

Download