Lecture 6 ppt

MULTIMEDIA SIGNAL PROCESSING
ALGORITHMS
PART II – MINIMIZATION OF THE AMOUNT OF
INFORMATION TO BE PROCESSED AND
BASIC ALGORITHMS
The second principle of biological processing
SEEMS TO BE :
MINIMIZATION OF THE AMOUNT OF
INFORMATION TO BE PROCESSED
THAT IS THE PROCESSING SYSTEM ELIMINATES
AS MUCH INFORMATION AS POSSIBLE AND USES
ONLY ABSOLUTELY NECESSARY MINIMUM TO
ACHIEVE ITS TASKS
Why is this principle reasonable? Minimizing the information
to be processed saves energy, increases speed, reduces
effort and is overall the logical thing to do. This is not limited to
biology but also applies to technical systems.
IN PREVIOUS LECTURES THIS PRINCIPLE
WAS EVIDENT SEVERAL TIMES:
WE ARE ABLE TO RECOGNIZE OBJECTS
BASED ON VERY MINIMAL INFORMATION
THIS MEANS PROCESSING SYSTEM IS
ABLE TO REDUCE INFORMATION TO
MINIMUM OR IN OTHER WORDS TO
EXTRACT THE NECESSARY MINIMUM
SO WE CAN STATE THE MAIN PRINCIPLE FOR THIS
COURSE: FOR EFFECTIVE MULTIMEDIA SIGNAL
PROCESSING ONE HAS TO MINIMIZE THE AMOUNT
OF INFORMATION PROCESSED AND EXTRACT
THE ABSOLUTELY NECESSARY MINIMUM
FOR THE PROCESSING TASK. HOW TO DO THIS IS
NOT ALWAYS CLEAR AND EASY; WE NEED TO
STUDY THIS.
The second principle, as indicated before, can be statistical
processing: producing results matched to the most likely
signals occurring in the real world. But this principle also
has to be applied correctly.
NOW LET US GO TO TECHNOLOGY
ASSUME WE HAVE A COMPUTER SYSTEM:
A COMPUTER WITH A CAMERA
AND DIGITIZER CARD, AND WE WOULD LIKE
TO EXTRACT VISUAL INFORMATION ABOUT THE
ENVIRONMENT LIKE OUR EYES DO (OR WE HAVE
MICROPHONES AND WE WOULD LIKE TO EXTRACT
ACOUSTICAL INFORMATION LIKE OUR EARS DO)
HOW SHOULD WE PROGRAM THE COMPUTER?
Let’s think about a typical example which is already
becoming popular in cameras:
We would like to implement algorithms which will mark
faces in pictures and recognize familiar faces. This may of
course be extended to other objects and complete scenes; for
example, the camera would recognize whether the picture is taken
of a familiar building or landscape. The problem is not easy
since objects can be seen from different viewpoints, under
different lighting and at different times.
But the input which the algorithm has is the digitized picture
• WHAT IS THE PICTURE AFTER DIGITIZATION?
IT IS A MATRIX OF NUMBERS. THE
MATRIX SIZE CAN BE E.G. 256X256 OR
720x576 – TELEVISION PICTURE
1024X768 - COMPUTER MONITOR
1920x1080- HIGH DEFINITION
TELEVISION
PICTURE MATRIX ELEMENTS ARE
USUALLY 8-BIT NUMBERS, THIS
CORRESPONDS TO 256 LEVELS OF LIGHT
WHICH IS ENOUGH.
COLOR PICTURES ARE DESCRIBED BY
THREE SUCH MATRICES FOR EACH
BASIC COLOR
HERE IS A PICTURE FROM MARS LANDER
AND PART OF THE MATRIX NEAR
THE OBJECT
WHAT WILL HAPPEN WHEN THE
PICTURE RESOLUTION IS TOO
SMALL?
RESOLUTION WILL BE IMPAIRED
LESS DETAILS VISIBLE
HERE WE SEE WHAT WILL HAPPEN
WHEN RESOLUTION IS
REDUCED FROM 512X512
TO 32X32
WHAT IS THE SIZE OF
ONE TV PICTURE IN BITS?
720x576x3x8-bit = about 10 Mbits
• TOPIC: COLOR PROCESSING
IMAGES ARE REGISTERED
IN THREE BASIC COLOR
COMPONENTS:
RGB=RED, GREEN, BLUE
MIXTURE OF THESE COLORS
PROVIDES OTHER COLORS
WE HAVE TO USE THREE
IMAGE MATRICES TO
REPRESENT ONE COLOR
PICTURE
RGB REPRESENTATION IS
USED FOR DISPLAY, E.G.
COMPUTER MONITORS OR
TELEVISION PANELS
ARE DRIVEN BY R, G, B
SIGNALS
• COLOR IMAGE AND RGB COMPONENTS
• WE OFTEN PERFORM CONVERSION TO MORE
SUITABLE COLOR SPACE
TWO SUCH SPACES ARE VERY USEFUL:
YUV SPACE AND HSV SPACE
YUV SPACE :
Y – INTENSITY OF (WHITE) LIGHT
U, V – COLOR CHROMINANCES
TO OBTAIN YUV REPRESENTATION
WE TAKE THE R,G,B COLOR MATRICES
FOR A PICTURE AND CONVERT THEM BY ->
• RGB->YUV TRANSFORMATION

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.148 & -0.289 & 0.437 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
NOTE: Y IS BLACK AND WHITE COMPONENT, THAT IS MIXTURE OF
R, G, B WHICH GIVES GRADATIONS OF WHITE COLOR, FROM BLACK TO
GREY TO WHITE.
U AND V ARE COLOR COMPONENTS – THEY DO NOT HAVE A DIRECT PHYSICAL MEANING.
THUS HERE THE INTENSITY OF LIGHT IS SEPARATED FROM THE COLOR INFORMATION
• AFTER THIS TRANSFORMATION
INSTEAD OF THREE R,G,B MATRICES
WE GET THREE MATRICES Y, U, V
TRANSFORMATION IS INVERTIBLE SO ALL
INFORMATION IS PRESERVED
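As a concrete illustration, here is a minimal sketch of the conversion (numpy assumed; the matrix uses the coefficients above with the standard sign convention):

```python
import numpy as np

# RGB -> YUV matrix with the slide's coefficients (BT.601-style)
M = np.array([[ 0.299,  0.587,  0.114],
              [-0.148, -0.289,  0.437],
              [ 0.615, -0.515, -0.100]])

def rgb_to_yuv(rgb):
    """rgb: H x W x 3 float array. Returns an H x W x 3 YUV array."""
    return rgb @ M.T

def yuv_to_rgb(yuv):
    """The transformation is invertible, so all information is preserved."""
    return yuv @ np.linalg.inv(M).T

rgb = np.random.rand(4, 4, 3)
assert np.allclose(yuv_to_rgb(rgb_to_yuv(rgb)), rgb)
```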
BUT NOW WE CAN PLAY A TRICK:
HUMAN VISUAL PROCESSING IS MUCH LESS
SENSITIVE TO COLOR INFORMATION THAN TO
BLACK AND WHITE LIGHT INTENSITY
INFORMATION
THUS, MATRICES U,V CAN BE REDUCED IN
SIZE
• SUBSAMPLING OF MATRICES U AND V
FOR 4 ELEMENTS OF Y (Y1 Y2 / Y3 Y4) ONLY ONE
ELEMENT OF U AND ONE OF V WILL BE TAKEN:
EACH 2x2 BLOCK OF U VALUES AND OF V VALUES
IS REPLACED BY A SINGLE VALUE
THE KEPT U AND V ELEMENTS CAN BE E.G.
THE AVERAGE VALUE OF THE ORIGINAL
4 ELEMENTS OF U AND V
THUS MATRICES U, V CAN BE REDUCED
IN SIZE BY A FACTOR OF 4
CONVERTING BACK TO RGB FORM WILL
NOT CHANGE THE PICTURE VISUALLY (see the sketch below)
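A minimal sketch of this subsampling, averaging each 2x2 block of a chrominance matrix (numpy assumed, even dimensions assumed):

```python
import numpy as np

def subsample_2x2(c):
    """Replace each 2x2 block of a chrominance matrix by its average."""
    h, w = c.shape
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

U = np.random.rand(8, 8)
U_small = subsample_2x2(U)   # 4x4 - the matrix is reduced by a factor of 4
```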
• THE RGB->YUV TRANSFORMATION
DIRECTLY USES A PROPERTY OF HUMAN
VISION WHICH ALLOWS:
- TO REDUCE THE SIZE OF COLOR IMAGES
(IMPORTANT FOR COMPRESSION)
- TO USE ONLY LIGHT INTENSITY WITHOUT COLOR
INFORMATION (FOR E.G. RECOGNITION OF
OBJECTS)
• ANOTHER TRANSFORMATION IS HSI
HSI IS MORE RELATED TO HUMAN PERCEPTION, WHERE WE
CAN SEE SATURATION OF COLORS, THAT IS WE CAN TELL THE
”REDNESS”, ”BLUENESS” OF COLORS AND SO ON.
TO GET THE HSI REPRESENTATION WE MAP RGB INTO
H – HUE (COLOR)
S – SATURATION (AMOUNT OF WHITE MIXED WITH COLOR)
I – INTENSITY (AMOUNT OF GREY LEVEL)
EQUATIONS FOR HSI FROM RGB AND VICE VERSA:
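The slide's equations are not reproduced in the text; here is a sketch using the standard Gonzalez-Woods RGB->HSI relations, which the slide presumably follows:

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """Standard RGB -> HSI conversion for scalar values in [0, 1]."""
    i = (r + g + b) / 3.0                                 # intensity
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b + 1e-12)    # saturation
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    h = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    if b > g:                                             # hue angle in degrees
        h = 360.0 - h
    return h, s, i
```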
BASIC ASPECTS OF THE HSI REPRESENTATION:
ON THE CUBE THERE ARE SOME
OTHER ’BASIC’ COLORS
APART FROM RGB; THE MAIN
DIAGONAL IS THE AMOUNT
OF WHITE
ON THE DIAMOND WE SEE
COLORS AROUND A HEXAGON;
HEIGHT IS THE AMOUNT OF
WHITE, SATURATION IS THE X-AXIS
LOOK WHERE THE
I (V) AXIS, THE S AXIS AND THE
HUE ANGLE ARE
• HSI TRANSFORMATION IS USEFUL SINCE WE GET A
REPRESENTATION IN A COLOR SPACE WHICH
CORRESPONDS TO THE PROPERTIES OF HUMAN
VISION, THAT IS, THE INTENSITY LEVEL, THE
COLOR SATURATION, AND THE
COLOR ITSELF CAN BE ESTIMATED.
DIGRESSION ON COLOR SENSORS
ASSUME YOU BUY A DIGITAL CAMERA WITH E.G.
5 MEGAPIXELS.
WHAT DOES THIS MEAN?
IT TURNS OUT THAT THE PIXEL DEFINITION IS
DIFFERENT FOR DIFFERENT APPLICATIONS.
TRADITIONALLY
1 PIXEL = R, G, B COLOR COMBINATIONS
SO WE NEED 3 COLOR SENSORS FOR
CAMERA OR
3 COLOR ELEMENTS FOR DISPLAY
FOR EXAMPLE:
LCD COMPUTER MONITOR WITH RESOLUTION OF
1280X1024 PIXELS
HAS 1280X1024 ELEMENTS FOR EACH R, G, B COLOR,
THAT IS IT HAS 1280X1024X3 DISPLAY ELEMENTS.
THE DISPLAY ELEMENTS ARE CALLED
SUBPIXELS, ONE PIXEL IS COMPOSED OF THREE
SUBPIXELS R G B
IN DIGITAL CAMERAS THIS IS DIFFERENT
SENSOR IN DIGITAL CAMERAS LOOKS LIKE
THIS:
IN DIGITAL CAMERAS EVERY
COLOR SUBPIXEL COUNTS AS
”PIXEL”
THE PIXELS ARE ARRANGED IN
A MATRIX CALLED BAYER SENSOR
EACH ”CAMERA” PIXEL IS MADE
OF 4 COLOR PIXELS: 1 RED,
2 GREEN, 1 BLUE
(REMEMBER THAT MOST OF
VISIBLE LIGHT IS GREEN)
WE CAN NOTICE THAT A ”FULL” COLOR PIXEL CAN
BE MADE FROM OVERLAPPING SQUARES BY A HALF
SHIFT
SO THE E.G. 5 MILLION PIXELS IN A DIGITAL
CAMERA ARE NOT EXACTLY 5 MILLION
IN THE DISPLAY SENSE.
THE NUMBER SHOULD BE DIVIDED BY FOUR, OR BY TWO IF
WE TAKE INTERPOLATION INTO ACCOUNT (see the sketch below)
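A toy sketch of how full-color pixels can be interpolated from a Bayer mosaic (simplified bilinear demosaicing; real cameras use more elaborate methods, and the RGGB layout here is an assumption):

```python
import numpy as np

def demosaic_rggb(raw):
    """Toy demosaicing of an RGGB Bayer mosaic: each missing color
    sample is the mean of that color's samples in the 3x3
    neighbourhood (borders wrap around for brevity)."""
    h, w = raw.shape
    rgb = np.zeros((h, w, 3))
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True          # red sites
    masks[0::2, 1::2, 1] = True          # green sites, even rows
    masks[1::2, 0::2, 1] = True          # green sites, odd rows
    masks[1::2, 1::2, 2] = True          # blue sites
    for c in range(3):
        vals = np.where(masks[..., c], raw, 0.0)
        cnt = masks[..., c].astype(float)
        num = sum(np.roll(vals, (dy, dx), axis=(0, 1))
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1))
        den = sum(np.roll(cnt, (dy, dx), axis=(0, 1))
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1))
        interp = num / np.maximum(den, 1.0)
        rgb[..., c] = np.where(masks[..., c], raw, interp)
    return rgb
```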
BUT THERE ARE TWO
EXCEPTIONS:
THERE ARE VIDEO
CAMERAS WHICH
HAVE 3 CCD SENSORS,
ONE SEPARATELY
FOR EACH OF THE R, G, B COLORS
IN 3-CCD VIDEO CAMERAS THE OPTICAL SYSTEM SPLITS
LIGHT ONTO 3 SENSORS WHICH PICK UP THE R, G, B COLORS.
THE TOTAL NUMBER OF PIXELS CORRESPONDS TO THE
NUMBER OF PIXELS IN A DISPLAY
ANOTHER EXCEPTION IS FOVEON SENSOR
IN FOVEON, THERE IS ONE SENSOR
BUT IT MEASURES ALL 3 RGB COLORS
IN ONE AREA. THIS IS BASED ON THE
FACT THAT PHOTONS PENETRATE TO
DIFFERENT DEPTHS IN THE
SEMICONDUCTOR DEPENDING ON
THEIR WAVELENGTHS www.foveon.com
COMPARISON:
WE CAN SEE THAT SINGLE-SENSOR DEVICES
HAVE LOWER RESOLUTION THAN 3-SENSOR
DEVICES OR FOVEON,
BUT THEY ARE THE EASIEST TO PRODUCE,
SO THE NUMBER OF THEIR COLOR PIXELS IS
INCREASING ALL THE TIME AND THE RESOLUTION
PROBLEM IS BEING SOLVED.....
• The elimination of information based on color
is an example of much more general principle:
Input signal -> [Elimination of information] -> Output signal:
a representation of the input signal which is
”just good enough” for a specific task
How to produce the ”good enough” representation is the essential problem
to solve.
Next we will show an example of representation by edges.
• EDGE DETECTION BY LINEAR FILTERING: THE AREA
AROUND EVERY POINT x IN THE IMAGE MATRIX

z l m
u x v
n p q

IS MULTIPLIED ELEMENT BY ELEMENT BY VALUES FROM
ANOTHER (FILTER) MATRIX AND THE
RESULT IS SUMMED UP
• DEPENDING ON THE MATRIX BY WHICH
WE MULTIPLY WE HAVE SEVERAL TYPES
OF FILTERS:
LOWPASS – SUM OF FILTER COEFFICIENTS
IS ONE
BANDPASS – SUM OF FILTER COEFFICIENTS
IS ZERO
HIGHPASS – SUM IS BETWEEN ZERO AND
ONE
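A minimal sketch of this multiply-and-sum filtering for one pixel and its 3x3 neighbourhood (numpy assumed; the example kernels illustrate the sum-of-coefficients rule above):

```python
import numpy as np

def filter_pixel(img, i, j, kernel):
    """Multiply the 3x3 area around pixel (i, j) element by element
    by the filter matrix and sum up the result."""
    return np.sum(img[i-1:i+2, j-1:j+2] * kernel)

lowpass  = np.ones((3, 3)) / 9.0            # coefficients sum to one
bandpass = np.array([[ 0, -1,  0],
                     [-1,  4, -1],
                     [ 0, -1,  0]]) / 4.0   # coefficients sum to zero

img = np.random.rand(16, 16)
print(filter_pixel(img, 5, 5, lowpass))     # local average
print(filter_pixel(img, 5, 5, bandpass))    # responds only to changes
```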
• WE SAID THAT IN THE HUMAN VISUAL SYSTEM
THE PROCESSING ELEMENTS IN THE RETINA
ARE SENSITIVE TO CHANGES IN LIGHT
LEVEL.
THIS IS EQUIVALENT TO BANDPASS
FILTERING
A SPECIAL CLASS OF BANDPASS FILTERS
IS CALLED EDGE DETECTORS SINCE THEY
ARE DESIGNED TO DETECT SHARP CHANGES IN
IMAGE LIGHT INTENSITY
• LET US CONSIDER THE FOLLOWING
SITUATION – A WHITE BAR ON A BLACK
BACKGROUND OR THE OPPOSITE
OUR VISUAL SYSTEM – AND
WE HERE – ARE INTERESTED
MOSTLY IN AREAS
WHERE LIGHT IS CHANGING
ITS VALUE; SHARP CHANGES
IN LIGHT VALUE ARE CALLED
EDGES
HOWEVER, THERE IS A PROBLEM
HERE: WHAT EXACTLY IS A SHARP
CHANGE IN INTENSITY?
THIS IS NOT WELL DEFINED
ON THE RIGHT WE SEE SOME
EXAMPLES OF LIGHT CHANGE:
RAMP EDGE – LIGHT INCREASING
GRADUALLY
STEP EDGE – SHARP TRANSITION
NARROW LINE
ROOF EDGE
THERE COULD BE MANY MORE
SUCH EXAMPLES!
• EDGE DETECTION IS EQUIVALENT
TO DIFFERENTIATION IN THE
CONTINUOUS FUNCTION DOMAIN:

$$\frac{\partial F(x,y)}{\partial x} = 0 \quad \text{if} \quad F(x,y) = \text{const}$$
BUT IN IMAGES WE HAVE LIMITED NUMBER
OF PIXELS SO WE CAN PERFORM ONLY
APPROXIMATE DIFFERENCING
• EDGE DETECTORS
HERE WE HAVE TWO MATRICES
OF FILTERS FOR DIFFERENCING
NOTE THAT THE FIRST ONE WILL
PROVIDE ZERO OUTPUT
WHEN THERE ARE CONSTANT
VALUES IN THE VERTICAL DIRECTION,
AND THE SECOND ONE WHEN THERE
ARE CONSTANT VALUES IN THE HORIZONTAL
DIRECTION
• NOW LET’S TAKE THE OUTPUTS OF
BOTH FILTERS AND COMBINE THEM
TOGETHER, FOR EXAMPLE BY
$$Z = \sqrt{H^2 + V^2}$$
THE OUTPUT WILL NOW
BE QUITE INDEPENDENT
FROM THE DIRECTION
OF EDGES
NOTE THAT
THE RATIO GC/GR DETERMINES
THE DIRECTION
OF AN EDGE
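A sketch of this combination using Sobel-style kernels (an assumption; the slide's exact matrices may differ), with the thresholding described later included as the final step:

```python
import numpy as np
from scipy.ndimage import convolve

def edge_magnitude(img):
    """Combine horizontal and vertical difference filters into an
    edge measure roughly independent of edge direction."""
    gr = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])        # responds to horizontal changes
    gc = gr.T                          # responds to vertical changes
    h = convolve(img.astype(float), gr)
    v = convolve(img.astype(float), gc)
    z = np.sqrt(h**2 + v**2)           # Z = sqrt(H^2 + V^2)
    theta = np.arctan2(v, h)           # edge direction from the ratio
    return z, theta

img = np.random.rand(32, 32)
z, theta = edge_magnitude(img)
edges = z > 1.0                        # classify as edge where Z > T
```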
• HERE WE HAVE EXAMPLE OF
RESULTS:
- ORIGINAL PICTURE
- HORIZONTAL DETECTOR
- VERTICAL DETECTOR
- BOTH COMBINED
AS WE CAN SEE, THE COMBINED OUTPUT
GIVES THE BORDERS OF OBJECTS, SO
WE CAN RECOGNIZE THEM EVEN IF
THERE IS LITTLE INFORMATION
THIS MAY CORRESPOND IN SOME WAY
TO HOW THE HUMAN SYSTEM WORKS
• WHY DID WE USE JUST SUCH A MATRIX FOR
EDGE DETECTION?
THERE CAN BE MANY SUCH
MATRICES USED, SOME OF
THEM ARE SHOWN HERE,
AND MANY OTHERS ARE
KNOWN
THEY DIFFER IN PROPERTIES
AND OPERATION IN NOISE
E.G. PREWITT, SOBEL ARE GOOD
• IF WE TALK ABOUT OPERATION IN NOISY
IMAGES, THRESHOLDING IS IMPORTANT
AFTER RUNNING A DETECTOR WE GET AN
OUTPUT SIGNAL. UNFORTUNATELY THIS
CAN BE CAUSED BY NOISE, NOT BY AN EDGE.
EDGE DETECTORS CAN BE SENSITIVE TO
NOISE.
WE THRESHOLD THE OUTPUT SIGNAL:
IF IT IS > THAN SOME VALUE T,
IT IS CLASSIFIED AS AN EDGE
HERE THE OPERATION OF AN EDGE
DETECTOR IN NOISY CONDITIONS
WITH THRESHOLDING IS SHOWN:
AT A LOW NOISE LEVEL IT IS GOOD
AT A HIGHER NOISE LEVEL WE GET
SOME NOISE POINTS CLASSIFIED
AS EDGES, AND SOME EDGE
POINTS ARE MISSING (WHILE WE STILL
SEE A GOOD EDGE)
AT A VERY HIGH NOISE LEVEL
THE DETECTOR OPERATION
BREAKS DOWN COMPLETELY AND
NO EDGE IS DETECTED –
NOTE THAT WE CAN STILL SEE SOME
EDGE IN THIS PICTURE
SO IN NOISY CONDITIONS THERE ARE PROBLEMS
WITH EDGE DETECTORS BUT SOMEHOW IN HUMAN
VISION THEY WORK VERY WELL – HOW???
RESEARCHERS MOTIVATED BY HUMAN VISION
NOTICED THAT FILTERING ELEMENTS IN HUMAN
RETINA AT THE BACK OF THE EYE ARE MORE
COMPLICATED THAN SIMPLE DETECTORS HERE.
• MOTIVATED BY OBSERVATION OF THE HUMAN SYSTEM
AND SOME CONSIDERATIONS OF OPTIMAL NOISE
ATTENUATION, A ZERO-CROSSING, OR LAPLACIAN-OF-GAUSSIAN, DETECTOR WAS DESIGNED
THIS DETECTOR IS OBTAINED
BY TAKING THE SECOND
DERIVATIVE OF THE GAUSSIAN CURVE:

$$\frac{1}{s^4}\left[1 - \frac{x^2 + y^2}{2s^2}\right] e^{-(x^2+y^2)/2s^2}$$

The resulting curve has the
characteristic ’Mexican hat’ shape
NOW IF WE TAKE THE SECOND DERIVATIVE OF THE OUTPUT,
WE NOTICE THAT AN EDGE IS WHERE THE SIGNAL CROSSES ZERO!
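A sketch of such a detector: build the ’Mexican hat’ kernel from the formula above, filter the image, and mark where the output crosses zero (the scale s and kernel size are free parameters):

```python
import numpy as np
from scipy.ndimage import convolve

def log_kernel(s, size=9):
    """Laplacian-of-Gaussian ('Mexican hat') kernel at scale s."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    k = (1.0 / s**4) * (1 - r2 / (2 * s**2)) * np.exp(-r2 / (2 * s**2))
    return k - k.mean()            # force zero response to constant areas

def zero_crossing_edges(img, s=1.4):
    out = convolve(img.astype(float), log_kernel(s))
    pos = out > 0
    zc = np.zeros_like(pos)
    zc[:-1, :] |= pos[:-1, :] != pos[1:, :]   # sign change downwards
    zc[:, :-1] |= pos[:, :-1] != pos[:, 1:]   # sign change to the right
    return zc                      # True where the signal crosses zero
```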
• THE ZERO-CROSSING EDGE DETECTOR WILL
BE BETTER IN NOISY CONDITIONS, BUT IT
IS MORE COMPLICATED SINCE IT
REQUIRES MANY MORE OPERATIONS TO
CALCULATE
Assuming that we have such a detector, the next problem is how
to build a representation based on edges; this is shown next
• LINKING EDGE POINTS TO FORM CONTOURS OF
OBJECTS:
WE LINK OUTPUT POINTS FROM THE EDGE DETECTOR
WHEN THEIR VALUES ARE SIMILAR.
SIMILARITY MEANS:
- AMPLITUDE DIFFERENCE IS SMALLER
THAN SOME THRESHOLD
- ANGULAR DIRECTION IS SIMILAR
LINKED EDGES ARE THOUGHT TO BELONG TO THE
SAME OBJECT (a sketch of the rule follows)
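A minimal sketch of the linking rule (the thresholds t_mag and t_ang are assumptions, not values from the slide):

```python
import numpy as np

def linked(mag, ang, p, q, t_mag=25.0, t_ang=np.pi / 8):
    """Neighbouring edge points p and q belong to the same contour if
    their amplitudes and angular directions are similar."""
    return (abs(mag[p] - mag[q]) < t_mag and
            abs(ang[p] - ang[q]) < t_ang)

# usage: linked(mag, ang, (10, 10), (10, 11))
```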
• EXAMPLE
ORIGINAL
PICTURE
VERTICAL
DETECTOR
HORIZONTAL
DETECTOR
RESULT
OF EDGE
LINKING
• SEGMENTATION
HOW TO EXTRACT OBJECTS FROM PICTURES?
THIS CAN BE DONE BASED ON
FEATURES SUCH AS INTENSITY OR COLOR
• WE CAN GROUP AREAS WITH SPECIFIC
FEATURES BY LINKING THEM TOGETHER:
IF TWO AREAS HAVE THE SAME FEATURE
WE LINK THEM TOGETHER
SEGMENTATION ALGORITHM:
START WITH SOME AREA AND DIVIDE IT
IN FOUR PARTS; CONTINUE THE DIVISION UNTIL ONLY PARTS
WITH THE SPECIFIC FEATURE ARE KEPT (see the sketch below)
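A sketch of this split-type segmentation, with the 'specific feature' taken to be nearly uniform intensity (an assumption for illustration; the input size is assumed to be a power of two):

```python
import numpy as np

def split(img, y=0, x=0, size=None, min_size=4, tol=10.0):
    """Divide an area in four parts recursively; keep only parts whose
    intensity range is within tol (the 'specific feature')."""
    if size is None:
        size = img.shape[0]
    region = img[y:y+size, x:x+size]
    if region.max() - region.min() <= tol:
        return [(y, x, size)]               # uniform enough: keep it
    if size <= min_size:
        return []                           # smallest parts without the feature are dropped
    half = size // 2
    parts = []
    for dy in (0, half):
        for dx in (0, half):
            parts += split(img, y + dy, x + dx, half, min_size, tol)
    return parts
```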
• THRESHOLDING
WE NEED TO DIFFERENTIATE BETWEEN THE
’USEFUL’ DATA AND THE ’NON-USEFUL’.
THRESHOLDING WORKS ON THE PRINCIPLE
THAT THE USEFUL SIGNAL IS STRONGER:
IF SIGNAL < T WE SET IT TO ZERO.
HOW TO SELECT T?
FOR THRESHOLDING, THE
HISTOGRAM CAN BE USED SINCE
IT OFTEN PROVIDES A VIEW OF HOW
OBJECT AND BACKGROUND CAN
BE SEPARATED
HOWEVER, FULLY AUTOMATIC
THRESHOLDING IS DIFFICULT
SINCE NOISE AND OBJECT
LIGHT INTENSITIES MAY NOT BE
COMPLETELY SEPARATED
IF THE THRESHOLD
IS SELECTED HERE
WE CAN SEPARATE
BACKGROUND AND
OBJECT (a sketch of automatic selection follows)
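A sketch of automatic threshold selection from the histogram; Otsu's criterion is used here as one standard choice (the slide does not name a specific method):

```python
import numpy as np

def otsu_threshold(img):
    """Pick the T that best separates the two histogram modes
    (maximum between-class variance)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2      # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```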
• FEATURE DETECTION
FEATURES ARE SMALL PARTS OF OBJECTS
WHICH ARE CRITICAL FOR RECOGNITION
AND REPRESENTATION
FEATURES
• HOW TO DETECT FEATURES?
THIS IS QUITE A DIFFICULT PROBLEM.
FEATURES ARE OFTEN COMPOSED OF SHORT
LINE SEGMENTS, E.G. CORNERS
(THIS CORNER IS COMPOSED OF TWO LINES)
WE CAN THINK OF APPLYING AN EDGE
DETECTOR AND THRESHOLDING FOR
FINDING FEATURES
[Figure: a corner and an edge]
• FOR A COMPACT REPRESENTATION WE HAVE
TO ELIMINATE ALL NONRELEVANT
SIGNAL ELEMENTS. THIS TASK IS SIMILAR TO
MEDIA COMPRESSION
MEDIA COMPRESSION HAS THE GOAL OF
MINIMIZING THE DESCRIPTION OF MEDIA WHILE
PRESERVING PERCEPTUAL QUALITY.
THIS IS ALSO IMPORTANT FOR GENERAL
MULTIMEDIA SIGNAL PROCESSING SINCE IT
MINIMIZES THE AMOUNT OF INFORMATION
TO BE PROCESSED.
A MEDIA SIGNAL IS A STREAM OF BITS.
HOW TO REDUCE THE NUMBER OF BITS NEEDED
FOR THE DESCRIPTION?
THIS CAN BE DONE IN 2 WAYS:
- MORE EFFICIENT DESCRIPTION OF THE BITSTREAM
- ELIMINATING PERCEPTUALLY INSIGNIFICANT INFORMATION
Technically this is called compression
of information
COMPRESSION CAN BE DONE ON
BIT LEVEL -> BIT STREAM
BLOCK-LEVEL -> SMALL BLOCKS
OBJECT-LEVEL -> OBJECTS IN PICTURES
PICTURE-LEVEL -> SAME PICTURE IN
DIFFERENT SIZES IS VERY SIMILAR
COMPRESSION IS ALSO RELATED TO
REPRESENTATION OF VISUAL INFORMATION
LET’S TAKE THE FOLLOWING EXAMPLE:
a b c
d e f
g h i
This is a matrix of 3x3 points taken from a
picture. Each point represents a number from
0-255, that is an 8-bit number.
How many different signal matrices can be
constructed out of these numbers?
(2^8)^9 = 2^72 – this is a huge number
ONLY THE MEANINGFUL INFORMATION FROM THESE
MATRICES MUST BE EXTRACTED. BUT WHAT IS THIS
INFORMATION? IT IS ABOUT SPECIFIC SIGNAL
CHANGES....
What, then, are those changes in small areas of pictures
which might be of interest?
1. We were talking until now about edges
We also mentioned that there can be different types of
edges in pictures
2. There can be also other types of information in these
small areas (e.g. lines)
3. The question is how to account for this information?
Let’s see some examples: What is there?
Dark line?
Plus grey dots?
Plus black dots?
Dark Line?
Roof edge?
Edge?
Edge with white
dot?
We see here that the interpretation of small areas of pictures
is ambiguous; several interpretations are possible.
Sometimes a feature looks nonideal or contaminated
by another feature
Dots? Line?
Diagonal edge?
So how to interpret such real signals?
There has to be a very efficient extraction mechanism allowing for
- extraction of multiple features
- dealing with imperfect features
What seems to be very important is that features are made by
grouping pixels which are touching and have similar values.
Second, sometimes features might be imperfect. Thus, we have to try
to assign each pixel to where it might belong – to some feature(s) or not.
We take the center pixel and try to find a group of pixels to which it
belongs. A pixel belongs if it has the same value, a similar value, or its
value can be INTERPOLATED from neighbouring pixels.
Where does the center pixel belong?
It belongs to the vertical grey line
because the pixel values are the same;
it belongs to the diagonal edge
if its value can be interpolated
from the neighbouring pixels, that
is, the pixel values change in a linear
way
[Figure: pixel intensity values; the center pixel
value is the average of the other two]
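A one-line sketch of the interpolation test (the tolerance is an assumption):

```python
def belongs_between(a, center, b, tol=4):
    """The center pixel fits between neighbours a and b if its value is
    (close to) their average, i.e. the values change in a linear way."""
    return abs(center - (a + b) / 2.0) <= tol
```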
So we can try to assign a pixel to neighbouring
pixels. But there will be a problem if we look
into a larger area: pixels may belong to many
different areas
It would be good to detect
regularity in the areas
When areas are irregular
they may be random and
thus not interesting
How to find regularity?
By transforming area of a picture using periodic
(orthogonal) basis, e.g. Fourier Transform.
But the Fourier transform has complex values, which is
not the most efficient (2 real numbers per value)
In practice there are two other transforms used:
Discrete Cosine Transform, DCT and hierarchical
4x4 transform related to it
DCT TRANSFORMATION
• DCT: Discrete Cosine Transform
• Reduction of spatial redundancy
• Transform block size: 8 x 8 in our case

$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} c_u c_v F(u,v)\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]$$

where $u,v,x,y = 0,1,\ldots,7$ and $c_k = 1/\sqrt{2}$ for $k = 0$, $c_k = 1$ for $k \neq 0$
For color pictures we take blocks of
16 pixels x 16 lines: four 8x8 Y (black and white)
blocks (1, 2, 3, 4) plus the color blocks
Cb (5) and Cr (6)
DCT in the matrix form
One dimension:

$$H_{kn} = H(k,n) = c_k \sqrt{\frac{2}{N}} \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N}\right]$$

Two dimensions:

$$H_{kl}(n,m) = c_k c_l \frac{2}{N} \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N}\right] \cos\left[\left(m + \frac{1}{2}\right)\frac{l\pi}{N}\right]$$
• DCT BASIS VECTORS
[Figures: DCT basis vectors for N=4 and for N=8]
Basis vectors are obtained by multiplying vertical and horizontal
cosine functions
• Example of DCT calculation
[Figure: enlarged picture with the selected block and its values]
The 2-D DCT is computed separably:
1. Calculation of the 1-D DCT for the columns of the input matrix
2. Calculation of the 1-D DCT for the rows of the previous result,
giving the DCT values (see the sketch below)
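A sketch of this separable computation: build the N=8 DCT matrix from the matrix-form formula above and apply the 1-D DCT first to the columns, then to the rows:

```python
import numpy as np

N = 8
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
c = np.where(k == 0, 1 / np.sqrt(2), 1.0)
H = c * np.sqrt(2.0 / N) * np.cos((n + 0.5) * k * np.pi / N)

def dct2(block):
    """2-D DCT of an 8x8 block: 1-D DCT on columns, then on rows."""
    return H @ block @ H.T

def idct2(coeff):
    return H.T @ coeff @ H          # H is orthogonal, so H^-1 = H^T

block = np.random.rand(8, 8)
assert np.allclose(idct2(dct2(block)), block)
```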
THE DCT TRANSFORM IS A MAPPING
FROM A PICTURE BLOCK INTO THE
FREQUENCY DOMAIN
SINCE NORMALLY THERE WILL BE FEW HIGH
FREQUENCIES, THERE
WILL BE MANY ZEROS OR SMALL
NUMBERS IN THE DCT MATRIX
• EXAMPLE OF THE DCT CALCULATION
140 144 147 140 140 155 179 179
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 140 136 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136
ORIGINAL PICTURE BLOCK
IN PRACTICE, SINCE PICTURE
VALUES ARE IN (0, 255),
WE SHIFT THEM TO (-128, 127)
12 16 19 12 11 27 51 47
16 24 12 19 12 20 39 51
24 27 8 39 35 34 24 44
40 17 28 32 24 27 8 32
34 20 28 20 12 8 19 34
19 39 12 27 27 12 8 34
8 28 –5 39 34 16 12 19
20 27 8 27 24 19 19 8
SHIFTED BLOCK
• BLOCK AFTER DCT
185 –17 14 –8 23 –9 –13 –8
20 –34 26 –9 –10 10 13 6
-10 –23 –1 6 –18 3 –20 0
-8 –5 14 –14 –8 –2 –3 8
-3 9 7 1 –11 17 18 15
8 0 –2 3 –1 –7 –1 –1
0 –7 –2 1 1 4 –6 0
MANY SMALL NUMBERS
The DCT values allow us to detect and evaluate
periodic structures in small areas.
Sometimes this may be very useful.
The DCT has some drawbacks: it requires real
numbers (cosine functions) and high precision
of calculation.
Another transform was introduced recently
to improve on the DCT. This transform is
obtained by rounding the scaled coefficients of the
DCT matrix:

$$H = \text{round}\{\alpha H_{DCT}\}$$

When $\alpha$ = 2.5 the following transform is obtained:
$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

This transform
has extremely simple
coefficients; no
multiplications are
involved
This transformation matrix is very simple.
We can see that the rows of the matrix
correspond to calculations detecting:
- the average value of four signal samples
- a periodical function with period 1
- a periodical function with period 2 (row 4)
Thus we get a signal decomposition into
periodical functions. (A sketch deriving H follows.)
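A sketch confirming the coefficients above by rounding the scaled N=4 DCT matrix:

```python
import numpy as np

N = 4
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
c = np.where(k == 0, 1 / np.sqrt(2), 1.0)
H_dct = c * np.sqrt(2.0 / N) * np.cos((n + 0.5) * k * np.pi / N)

H = np.round(2.5 * H_dct).astype(int)    # alpha = 2.5
print(H)
# [[ 1  1  1  1]
#  [ 2  1 -1 -2]
#  [ 1 -1 -1  1]
#  [ 1 -2  2 -1]]
```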
ENERGY IN THE DCT DOMAIN
[Figure: a picture block at 8 bit/pel has large entropy and needs
equal bit allocation; after the DCT the coefficients, ordered from the
lowest frequency (DC) to the highest frequency, have small entropy,
so unequal bit allocation (e.g. 10, 8, 4, 2 bits per coefficient) gives
an average of 3.2 bit/pel after compression instead of 8 bit/pel;
the inverse DCT restores the block]
QUANTIZATION
Quantization means removing information which is
not relevant.
Example: rounding of numbers,
round(4.076756) = 4
It turns out that high-frequency information is not
very relevant for human vision. It can thus be
removed.
QUANTIZATION
High frequencies in the DCT can be removed by
quantizing. Let K be a value; we perform the
operation

$$n \times \text{round}(K/n)$$

This will round K to the nearest multiple of n, i.e. to a value
in the interval delimited by K - n/2, K + n/2.
We can round numbers in such intervals (a sketch follows):
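A minimal sketch of this rounding:

```python
import numpy as np

def quantize(k, n):
    """Round k to the nearest multiple of n."""
    return n * np.round(k / n)

print(quantize(4.076756, 1))   # -> 4.0
print(quantize(37, 8))         # -> 40.0 (37 lies within 40 - 8/2 .. 40 + 8/2)
```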
QUANTIZATION INTERVALS
[Figure: quantizer characteristics, reconstructed value f̂ versus input f:
uniform symmetric mid-tread and uniform symmetric mid-rise]
QUANTIZATION MATRICES FOR DCT

16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

For luminance Y
17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99

For chrominance U, V
Each number in the DCT matrix is quantized (divided and rounded)
by a number in the quantization matrix above. Notice that high
frequencies have much higher quantization values.
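A sketch of this element-wise quantization step (q_matrix would be one of the tables above):

```python
import numpy as np

def quantize_block(dct_block, q_matrix):
    """Divide each DCT value by the corresponding entry of the
    quantization matrix and round."""
    return np.round(dct_block / q_matrix).astype(int)

def dequantize_block(q_block, q_matrix):
    """Approximate reconstruction; the rounding error is lost."""
    return q_block * q_matrix
```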
• QUANTIZATION
THE DCT VALUES ARE DIVIDED BY
SPECIAL CONSTANTS AND ROUNDED

QUANTIZATION TABLE:
3 5 7 9 11 13 15 17
5 7 9 11 13 15 17 19
7 9 11 13 15 17 19 21
9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
17 19 21 23 25 27 29 31

AFTER QUANTIZATION OF THE DCT BLOCK
FROM THE EARLIER EXAMPLE:
61 –3 2 0 2 0 0 –1
4 –4 2 0 0 0 0 0
-1 –2 0 0 –1 0 –1 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 –1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Another example – reconstruction of a block from quantized DCT coefficients.
We see that the approximation is better when more coefficients
are taken
THE ROLE OF DCT AND QUANTIZATION
Quantized DCT coefficients preserve the content of small
picture blocks very effectively. That is, the relevant perceptual
information is well preserved and the nonrelevant is eliminated.
The DCT is thus very good at representing image features
with minimized information. This is confirmed in practice,
since the DCT is used in the image and video compression standards
called JPEG and MPEG.
These standards are used in digital cameras, digital television,
DVD discs and internet media players.
• Minimization of information in video
Video is composed of picture sequences,
25-30 pictures per second
One can observe that video is composed
of ’shots’ or ’scenes’. These are short segments
which have the same content. In a single shot
the difference between two subsequent pictures
(taken at a 40 ms interval) is very small
Information representing a video scene can
be minimized as follows:
- Pick and compress the first picture
- Calculate the motion-compensated difference
between the second picture and the first one
- Calculate the motion-compensated difference
between the restored second picture and the third
one
- Continue for all pictures in the scene
So we only need information about the first (compressed)
picture and the differences between the other pictures to
preserve the initial information from all pictures. This
will result in a huge saving of information (a sketch of the loop follows)
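A sketch of this scheme with motion compensation omitted for brevity (plain frame differences against the restored previous picture; compression stands in as simple quantization):

```python
import numpy as np

def encode_scene(frames, q=8):
    """Send the first picture compressed, then compressed differences
    against the previously *restored* picture, as described above."""
    recon = q * np.round(frames[0] / q)        # compressed first picture
    stream = [recon.copy()]
    for f in frames[1:]:
        diff = q * np.round((f - recon) / q)   # compressed difference
        stream.append(diff)
        recon = recon + diff                   # decoder-side restoration
    return stream

def decode_scene(stream):
    recon = stream[0].copy()
    out = [recon.copy()]
    for diff in stream[1:]:
        recon = recon + diff
        out.append(recon.copy())
    return out
```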
• Example
The difference is mostly caused by motion of objects
• Movement of objects – there is a problem with
object borders; to avoid it we consider
movements of small picture blocks and try to
detect if they moved
• The difference between two pictures can be
reduced if the motion vector of objects is found
and the motion is compensated, that is, an object
which moved in the second picture is moved
back by its motion vector.
[Figure: motion-compensated difference images for 16x16, 8x8
and 4x4 blocks]
The error is lower when the blocks are smaller (see the sketch below)
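A sketch of block-matching motion estimation: for each block of the current picture, search a small window in the previous picture for the best match; the offset is the motion vector (full search with the sum-of-absolute-differences criterion, a common but not the only choice):

```python
import numpy as np

def motion_vector(prev, cur, y, x, bs=8, search=7):
    """Find the displacement of the bs x bs block at (y, x) by full
    search with the sum of absolute differences (SAD)."""
    block = cur[y:y+bs, x:x+bs].astype(float)
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if (yy < 0 or xx < 0 or
                    yy + bs > prev.shape[0] or xx + bs > prev.shape[1]):
                continue
            sad = np.abs(prev[yy:yy+bs, xx:xx+bs] - block).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv   # moving the block back by this vector compensates the motion
```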
• It is also possible to detect movements of blocks
with greater accuracy than 1 pixel, by
interpolation between pixels.
The difference images will be
smaller
[Figure: half-pixel and quarter-pixel interpolation]
Video information reduction
• Instead of having information about all
pictures it is enough to have:
1. The first picture
2. The motion-compensated difference between
subsequent pictures
3. Motion vectors representing the
movements of picture blocks
This is a very significant reduction of information; it also provides
information about the movement of objects, which is very important