EE569 Introduction to Digital Image Processing
HOMEWORK #3 REPORT
JINGWEI LIU
9319448676
Email: jingwei@usc.edu
Problem 1: Geometric Modification
1.1 Motivation
Geometric modification is a very common operation in image processing that includes translation, scaling, rotation and warping. As a popular image processing technique, it is widely used in computer graphics and medical image processing. In this part of the homework, I implemented basic geometric modifications to perform puzzle matching and face warping. In the extra-credit part, I applied warping and morphing at the same time.
1.2 Approach
1.2. (a) Puzzle Matching
I implemented translation, rotation and scaling separately to perform the puzzle matching. The overall flow chart of the geometric modification is as follows.
The details of the implementation are as follows:
(1). Find the coordinates of the corners of the puzzle piece.
First, I converted the color image into a gray-level image and then into a binary image with a threshold of 250. Then I scanned the binary image four times to find the four corners of the tiger piece. The coordinates of the four corners are:
(2). Calculate the Cartesian coordinates of every pixel in both the input image and the output image:
u = p + 0.5, v = q + 0.5
where (p, q) is the integer pixel index and (u, v) is its Cartesian coordinate.
(3). Calculate the rotation angle θ from the coordinates of the four corners. The rotation angle is
(4). Choose the top-left corner as the origin of the rotation, i.e. perform a translation:
u = x − 49.5
v = y − 115.5
(5). Rotation. Calculate the width and height of the tiger piece from the four corner coordinates, then apply the following rotation formula. In my program, the implementation is:
u = 49 + cosθ * x + sinθ * y
v = 115 − sinθ * x + cosθ * y
So
p = u − 0.5
q = v − 0.5
(6). Bilinear interpolation. Since p and q are generally not integers, use bilinear interpolation to obtain the desired pixel value (see the sketch after this list). Use the following formula:
(7). Scaling. After scaling, the size of the tiger piece is (122, 122). The scaling formula is
u = x / Sx
v = y / Sy
where Sx = Sy = 150/122 (150 is the width and height of the original tiger piece).
(8). Bilinear.
(9). Find the coordinates of the hole in “tiger.raw”. I converted the tiger image into a binary image and scanned it to find the upper-left corner of the hole. Since I already know the hole is 120*120, the other three corners follow from the upper-left coordinate.
(10). Put the puzzle piece back into “tiger.raw”. The size of the puzzle piece is 122*122, which is larger than 120*120, so the lines surrounding the hole in “tiger.raw” are set to the average of the original pixel value and the puzzle-piece value.
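As referenced in step (6), here is a minimal Python/NumPy sketch of the inverse-mapping idea behind steps (5)–(8): every output pixel is mapped back into the input image and its value is read with bilinear interpolation. The function names, the NumPy usage and the generic pivot/angle arguments are my own illustration of the technique, not the report's actual code.

```python
import numpy as np

def bilinear_sample(img, u, v):
    """Bilinearly interpolate img (H x W array) at the real-valued point (u, v) = (row, col)."""
    h, w = img.shape
    r0, c0 = int(np.floor(u)), int(np.floor(v))
    a, b = u - r0, v - c0
    r0, c0 = np.clip(r0, 0, h - 1), np.clip(c0, 0, w - 1)
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    return ((1 - a) * (1 - b) * img[r0, c0] + (1 - a) * b * img[r0, c1]
            + a * (1 - b) * img[r1, c0] + a * b * img[r1, c1])

def rotate_about_pivot(img, theta, pivot, out_shape):
    """Inverse-map every output pixel through a rotation about `pivot`, then sample bilinearly."""
    out = np.zeros(out_shape, dtype=float)
    pr, pc = pivot
    ct, st = np.cos(theta), np.sin(theta)
    for x in range(out_shape[0]):            # output row
        for y in range(out_shape[1]):        # output column
            # Inverse rotation mirroring step (5): u = pr + cos(t)*x + sin(t)*y, v = pc - sin(t)*x + cos(t)*y
            u = pr + ct * x + st * y
            v = pc - st * x + ct * y
            if 0 <= u < img.shape[0] and 0 <= v < img.shape[1]:
                out[x, y] = bilinear_sample(img, u, v)   # step (6): bilinear interpolation
    return out
```

The same inverse-mapping-plus-bilinear pattern also covers the scaling of step (7); only the coordinate mapping changes.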
1.2. (b) Face Warping
I choose nose both of the man and the tiger to be the only control point of the warping operation. The
coordination of the nose of man is (x1, y1)=(317, 195), and that of the tiger is (x2, y2)=(385, 195). So
the align point is ((x1+x2)/2, (y1+y2)/2)=(351, 195).
The steps to implement the warping operation are as follows:
(1). Find the aligned point, which is (351, 195).
(2). Warp using 3 control points per triangle:
u = a0*x + a1*y + a2
v = b0*x + b1*y + b2
where a0~a2, b0~b2 are unknown.
(3). Using the 5 control points in the following image and the above formula, compute the 4 inverse warping matrices (a, b), one for each triangle (see the sketch after this list). The control points are (351, 195), (0, 0), (0, 399), (499, 0) and (499, 399).
(4). Form the four triangles in the image and compute the 4 lines that separate the whole image into 4 regions, as in the image above.
(5). Reverse mapping: for each pixel of the output image, decide which triangle it belongs to and use the corresponding warping matrix to find (u, v) in the original image.
(6). Bilinear interpolation.
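As referenced in step (3), here is a minimal sketch of solving the six affine coefficients of one triangle from its three control-point correspondences, using NumPy's linear solver. The function and variable names are my own; the report's solver may differ.

```python
import numpy as np

def affine_from_triangle(dst_pts, src_pts):
    """
    Solve u = a0*x + a1*y + a2, v = b0*x + b1*y + b2 for one triangle.
    dst_pts: three (x, y) control points in the output (warped) image
    src_pts: the corresponding three (u, v) points in the input image
    Returns (a, b), each a length-3 coefficient vector.
    """
    dst = np.asarray(dst_pts, dtype=float)
    src = np.asarray(src_pts, dtype=float)
    A = np.column_stack([dst, np.ones(3)])   # one row [x, y, 1] per control point
    a = np.linalg.solve(A, src[:, 0])        # coefficients for u
    b = np.linalg.solve(A, src[:, 1])        # coefficients for v
    return a, b

# Example with one of the four triangles used in the report: the aligned nose point
# plus two image corners map back to the original nose position and the same corners.
a, b = affine_from_triangle([(351, 195), (0, 0), (499, 0)],
                            [(317, 195), (0, 0), (499, 0)])
```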
1.2. (c) Face Morphing
Do cross-dissolving between the two warped images (the warped tiger and the warped man) according to the following formula:
I_out = (1 − α) * I_1 + α * I_2
where 0 < α < 1 and I_1, I_2 are the colors of the two warped images.
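A short sketch of this cross-dissolve step, assuming the two warped images are already aligned and stored as equally sized NumPy arrays (the array names in the usage comment are illustrative):

```python
import numpy as np

def cross_dissolve(img1, img2, alpha):
    """Blend two equally sized warped images; alpha in (0, 1) weights img2."""
    out = (1.0 - alpha) * img1.astype(float) + alpha * img2.astype(float)
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage (with the two warped images loaded as arrays of the same shape):
# frames = [cross_dissolve(warped_man, warped_tiger, a) for a in (0.2, 0.5, 0.8)]
```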
1.2. (d) Back to 27
Apply the methods of 1.2.(b) and 1.2.(c) to this part.
First do warping, using the nose as the control point, whose coordinate is (314, 195) for the young man and (328, 169) for the old man; the aligned point is (321, 182).
After warping, do morphing on the two warped images.
1.2. (e) Video Morphing
In this extra part, I perform warping and morphing at the same time and split the whole process into 16 steps.
I use only one control point for the warping. The control point is the chin, which differs from the previous parts. The coordinates of the chin in the two images are (239, 131) and (173, 101), which leads to an aligned point of (206, 116) after the 16 steps.
The morphing is applied right after the warping at each of the 16 steps.
The video consists of 16 images and lasts 11 seconds.
1.3 Results
1.3. (a) Puzzle Matching
After converting the puzzle piece and tiger.raw into binary images, the result is
Fig 1.1 Coordinate of puzzle piece and hole in tiger.raw
The result of rotation and scaling of piece.raw is in Fig1.2.
Fig 1.2 Output of rotation and scaling
The result of hole-filling algorithm is in Fig 1.3.
Fig 1.3 Output of hole-filing of tiger.raw
1.3. (b) Face Warping
The result of warping for the tiger and the man is shown in Fig 1.4 and Fig 1.5.
Fig 1.4 (a) Original tiger (b) Warping tiger
Fig 1.5 (a) Original man (b) Warping man
1.3. (c) Face Morphing
The result of face morphing is shown in Fig 1.6.
Fig 1.6 (a) ๐›ผ=0.2 (b) ๐›ผ=0.5 (c) ๐›ผ=0.8
1.3. (d) Back to 27
The result of warping for the old man and young man is shown in Fig 1.7 and 1.8.
Fig 1.7 (a) Original man (b) Warping man
Fig 1.8 (a) Original old man (b) warping old man
The result of morphing is in Fig 1.9.
Fig 1.9 (a) Original young man (b) warping young man
1.3. (e) Video Morphing
The video is in extra.avi. Part of the result is shown in Fig 1.10. The control point is the chin.
Fig 1.10 (a) Original Fiona (b) ~ (d) warping Fiona (e) warped Fiona
1.4 Discussion
(1). Why must the geometric operations be performed in the order of rotation, translation and scaling?
Ans:
I chose the upper-left corner of the puzzle piece as the rotation pivot, and every pixel in the puzzle piece is mapped to the new coordinate system. I did not apply the translation in the output image, so the piece in the rotation result sits at the top of the image, which I think makes the later operations easier.
After rotation the piece is horizontal, and I only need to locate points within the piece instead of the whole image. If I applied the scaling before the rotation, I would have to scale the whole image first and then rotate, because I could not locate the points of the piece in the tilted output image.
(2). In Puzzle Matching, why use 122*122 instead of 120*120?
Ans:
The scaling result is slightly larger than the hole in tiger.raw, which is 120*120. The value of the overlapping surrounding part is the average of the puzzle-piece value and the tiger.raw value. The purpose of this is to get a smooth boundary instead of an obvious square boundary:
Overlap pixel value = (value of border pixel + value of piece pixel) / 2
(3). In face warping, what type of feature did you select and what is the total number of control points?
Ans:
In the man-tiger problem, I chose the nose as the control point because the position of the nose in the tiger and in the young man is obviously different. The coordinates of the control point are (317, 195) and (385, 195), and the final aligned point is (351, 195).
In the old-young problem, I also chose the nose as the control point to test the robustness of my program. Because the noses of the young man and the old man differ only slightly, the result is not as obvious as in the man-tiger problem. The coordinates of the control point are (314, 195) and (328, 169), and the aligned point is (321, 182).
I also tried two control points for this problem; see discussion (6).
In the extra part, I chose the chin as the control point to obtain an obvious gradual change across the sixteen images. The coordinates of the control point are (173, 101) and (239, 131), and the final aligned point is (206, 116). In each of the sixteen iterations, the height changes by 2 pixels and the width changes by only 1 pixel.
(4). Discuss your morphing result in part b. Is the transformation smooth and reasonable? Why?
Ans:
It is smooth and reasonable. But because the two images have already been warped, the change comes only from the color intensity. A smoother and more reasonable result can be obtained by doing warping and morphing together, so that the gradual change looks smoother, as in the extra part. To get a more reasonable image, more control points should be chosen: in the current result only the noses of the two images are aligned, while the eyes are not, and since the eyes are a defining feature of the face, this makes the gradual change less reasonable.
(5). How could the morphing result be further improved?
Ans:
The morphing result can be improved in two ways: (1) morph while warping; (2) choose more than one control point to make the gradual change between the two images smoother.
(6). Tried to choose 2 control points when warping in part d.
Ans:
I chose the chin as the second control point, applied after the first warping operation that used the nose as the control point. As Fig 1.11 shows, the chins of the two images now align with each other while the noses no longer do. Thus, applying warping more than once destroys the previous result and makes the image blurrier.
To use more than one control point in a single warp, I would have to use second-order warping instead of first-order warping. Then more than one control point can be used to compute the warp matrix, so the warping result has more than one aligned point.
Fig 1.11 (a) nose as control point, α=0.5 (b) nose and chin as control points, α=0.5
(7). Implementation of extra part. Why I choose chin as the control point instead of nose.
Ans:
The two images of Fiona that I chose do not differ much around the nose. Considering the sixteen warping iterations, I chose the chin as the control point, which makes the gradual change more obvious and smoother.
(8)What can you further improve?
Ans:
I only used first-order warping with one control point. For further improvement, I could use several control points at the same time to represent the changes between the two images. To handle more than one control point, first-order warping is no longer enough, so a higher-order warp, such as second-order warping, should be used.
Problem 2: Texture Analysis and Segmentation using Laws Filters
2.1 Motivation
Texture is a distinguishing feature of an image that reflects its visual appearance and structure. Based on texture features we can perform texture classification and image segmentation. In this part, I classify different texture images and segment different texture regions within one image.
2.2 Approach
2.2. (a) Texture Image Classification
In this part, I applied nine 5*5 Laws filters to classify the ten texture samples. The flow chart of the classification is shown below.
The implementation steps are as follows:
(1). Histogram equalization. Apply the digital histogram equalization that I wrote in homework 1 to equalize the brightness of the ten texture samples and increase their contrast.
(2). Feature extraction. Apply the three one-dimensional Laws filters to create nine 5*5 two-dimensional filters. The three one-dimensional filters are
L5 = (1/16) [ 1  4  6  4  1 ]
E5 = (1/6) [ −1  −2  0  2  1 ]
S5 = (1/2) [ −1  0  2  0  −1 ]
The nine two-dimensional filters are formed as the outer products of these kernels.
Apply the nine filters to every pixel of the ten sample textures and store the resulting nine-dimensional vector L1 ~ L9 for each texture.
(3). Feature averaging. I used a 13*13 window to compute the energy of each pixel in every sample texture. The energy of feature k at pixel (i, j) is
E_k(i, j) = Σ_{m=−6..6} Σ_{n=−6..6} [F_k(i+m, j+n)]^2
where the window size is 13 and F_k is the k-th filter response.
Then average each energy map over the whole sample:
T_k = (1/N^2) Σ_{i,j} E_k(i, j)
where N = 512.
The resulting nine-component average vectors for the ten samples are
(4). Classification. I used nearest neighbors and second-nearest neighbors to classify the 10 samples by computing the distance between each pair of samples' feature vectors T.
With nearest neighbors, if two samples are each other's nearest neighbor, they are put in the same class. The second-nearest neighbor is then considered to classify the samples that have only one incoming arrow in the nearest-neighbor graph. A small sketch of steps (2)–(4) follows.
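The following is a minimal Python/NumPy sketch of the pipeline in steps (2)–(4), under my own assumptions about array layout and using scipy.signal.convolve2d and scipy.ndimage.uniform_filter for the filtering and windowed averaging; the report's own implementation may differ in detail.

```python
import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import uniform_filter

# 1-D Laws kernels as given in step (2)
L5 = np.array([1, 4, 6, 4, 1]) / 16.0
E5 = np.array([-1, -2, 0, 2, 1]) / 6.0
S5 = np.array([-1, 0, 2, 0, -1]) / 2.0
KERNELS_1D = [L5, E5, S5]

def laws_feature_vector(img, window=13):
    """Return the 9-dimensional averaged Laws energy vector of a grayscale image."""
    img = img.astype(float)
    features = []
    for row_k in KERNELS_1D:
        for col_k in KERNELS_1D:
            filt = np.outer(row_k, col_k)                    # one of the nine 5x5 filters
            resp = convolve2d(img, filt, mode='same')        # filter response F_k
            # 13x13 local energy; uniform_filter gives the windowed mean, a constant
            # factor away from the sum, which does not change the relative distances.
            energy = uniform_filter(resp ** 2, size=window)
            features.append(energy.mean())                   # global average T_k
    return np.array(features)

def pairwise_distances(vectors):
    """Euclidean distance between every pair of sample feature vectors."""
    v = np.asarray(vectors)
    return np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)

# Usage sketch: feats = [laws_feature_vector(s) for s in samples]   # samples: list of 10 images
#               d = pairwise_distances(feats); np.fill_diagonal(d, np.inf)
#               print(d.argmin(axis=1))   # index of each sample's nearest neighbor
```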
2.2. (b) Texture Segmentation
In this part, I applied twenty-five 5*5 Laws filters to extract a texture feature vector for every pixel. The flow chart of the segmentation is shown below.
The implementation steps are as follows:
(1). Feature extraction. Apply the five one-dimensional Laws filters to create twenty-five 5*5 two-dimensional filters. The five one-dimensional filters are L5, E5 and S5 as above, plus the wave and ripple kernels W5 = [ −1  2  0  −2  1 ] and R5 = [ 1  −4  6  −4  1 ].
Apply the 25 filters in the following table to every pixel of the image and store the 25-dimensional vector L1 ~ L25 for each pixel.
L5*L5 (feature 0)    L5*E5 (feature 1)    L5*S5 (feature 2)    L5*W5 (feature 3)    L5*R5 (feature 4)
E5*L5 (feature 5)    E5*E5 (feature 6)    E5*S5 (feature 7)    E5*W5 (feature 8)    E5*R5 (feature 9)
S5*L5 (feature 10)   S5*E5 (feature 11)   S5*S5 (feature 12)   S5*W5 (feature 13)   S5*R5 (feature 14)
W5*L5 (feature 15)   W5*E5 (feature 16)   W5*S5 (feature 17)   W5*W5 (feature 18)   W5*R5 (feature 19)
R5*L5 (feature 20)   R5*E5 (feature 21)   R5*S5 (feature 22)   R5*W5 (feature 23)   R5*R5 (feature 24)
(2). Energy computation. I used a 39*39 window to compute the energy of each pixel in the image. The energy of feature k at pixel (i, j) is
E_k(i, j) = Σ_{m=−19..19} Σ_{n=−19..19} [F_k(i+m, j+n)]^2
where the window size is 39 and −19 <= m, n <= 19.
(3). Normalization: use the L5*L5 energy as the divisor and normalize the other 24 features to obtain a 24-dimensional vector for each pixel.
(4). K-means segmentation. The steps of the K-means algorithm are as follows (a small sketch follows this list):
(4-1). By observation, initialize K = 5 for comb1.raw with the 5 centers at {(20, 20), (20, width−20), (height/2, width/2), (height−10, 20), (height−20, width−20)}. For comb2.raw, let K = 6 and initialize the 6 centers at {(20, 20), (20, width/2), (20, width−20), (height−20, 20), (height−20, width/2), (height−20, width−20)}.
(4-2). Calculate the distance from the feature vector of pixel E(i, j) to each of the centroids C1 to CK.
(4-3). Find the nearest centroid Cn to pixel E(i, j) and set the label of E(i, j) to n.
(4-4). After all pixels have been labeled, obtain the new cluster centers by summing the vectors of the pixels with label n and setting the corresponding center to their average.
(4-5). Repeat steps (4-2) to (4-4) until the change in the cluster centroids is less than 1%.
(4-6). According to the label n of each pixel, assign it the gray value
v = n * (256 / K)
where n is the label of the pixel and K is the number of centers.
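As referenced in step (4), here is a minimal sketch of K-means over the per-pixel feature vectors, assuming the 24-dimensional normalized features are stored in an (H, W, 24) NumPy array and the initial centers are given as pixel coordinates. The names (feat24, H, W) and the exact stopping rule are my own illustration.

```python
import numpy as np

def kmeans_segment(features, init_coords, max_iter=100, tol=0.01):
    """
    features: (H, W, D) array of per-pixel texture features.
    init_coords: list of (row, col) pixel positions used as initial centers.
    Returns an (H, W) label map in {0, ..., K-1}.
    """
    h, w, d = features.shape
    X = features.reshape(-1, d)
    centers = np.array([features[r, c] for r, c in init_coords], dtype=float)
    for _ in range(max_iter):
        # Assign each pixel to its nearest centroid (steps 4-2 and 4-3).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its pixels (step 4-4).
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(len(centers))])
        # Stop when every centroid moves by less than about 1% (step 4-5).
        if np.all(np.linalg.norm(new_centers - centers, axis=1)
                  <= tol * (np.linalg.norm(centers, axis=1) + 1e-12)):
            centers = new_centers
            break
        centers = new_centers
    return labels.reshape(h, w)

# Example for comb1 (feat24, H, W assumed to hold the feature array and image size):
# labels = kmeans_segment(feat24, [(20, 20), (20, W - 20), (H // 2, W // 2), (H - 10, 20), (H - 20, W - 20)])
# gray = (labels * (256 // 5)).astype(np.uint8)   # step (4-6)
```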
2.3 Results
2.3. (a) Texture Image Classification
After histogram equalization, the ten samples are shown in Fig 2.1.
Fig 2.1 Histogram equalization result of sample 1~ sample 10
The result of the distance table and nearest neighbor is
According to the nearest neighbors, the classification is
So the classification is
According to the second nearest neighbors, the classification is
So the classification is
2.3. (b) Texture Segmentation
The best texture segmentation result is obtained with window size 39, as shown in Fig 2.2.
Fig 2.2 output of segmentation of (a) comb1 (b) comb2
2.4 Discussion
(1). What is the result if histogram equalization is not used in texture classification?
Ans:
The result without histogram equalization is shown below.
So the classification is
According to the nearest-neighbor algorithm, I can only form two classes, {1, 2} and {4, 6}, and this result is completely wrong.
By observation, samples 1 and 2 are the two darkest of the ten samples, and samples 4 and 6 are the two brightest, so the textures are being classified by brightness. Since brightness should not contribute to texture classification, I applied digital histogram equalization to remove this effect.
(2). Why do histogram equalization before feature extraction in texture classification?
Ans:
According to the analysis in discussion (1), the brightness differences among the ten samples strongly affect the classification result. So I preprocessed the ten samples with digital histogram equalization, which spreads the output brightness evenly over 0 to 255 and removes the effect of brightness.
(3). After histogram equalization, does subtracting the mean affect the result of texture classification?
Ans: There is not much difference between the two scenarios, as shown in Fig 2.3 and Fig 2.4. After digital histogram equalization the image brightness is already almost evenly distributed, so the difference between the two scenarios is not significant.
Fig 2.3 Result of distance table and nearest neighbor after subtracting the mean brightness
Fig 2.4 Result of distance table and nearest neighbor before subtracting the mean brightness
(4). Compare the result with reality and discuss the discrepancies and conflicts.
Ans:
The ground-truth classification of the ten samples should be
(1, 3, 7), (2, 5, 9), (4, 8) and (6, 10).
My result is (1, 3, 7), (2, 4, 8) and (5, 9) according to the nearest-neighbor and second-nearest-neighbor rules; the remaining samples could not be assigned to any class.
Comparing the ground truth with my result, class (1, 3, 7), the wheat texture, is correct.
Sample 2 should belong with samples 5 and 9, but my program puts it with samples 4 and 8. This is because sample 2 is a horizontal-brick texture, different from the other two vertical-brick textures, and the E5*L5 and L5*E5 filters are sensitive to horizontal and vertical edges respectively, which drives the wrong decision that separates sample 2 from the other brick textures. On the other hand, samples 4 and 8 are tilted-line textures with similar edge orientation, so my program classifies sample 2 together with 4 and 8.
Samples 6 and 10 are not classified correctly because sample 10 contains too much noise. Sample 10 finds 6 as its nearest neighbor, but sample 6 cannot find 10 because of the noise; instead, sample 6 finds sample 3 as its nearest neighbor. By observation, samples 6 and 3 do look quite similar if sample 10 is set aside.
(5). Why is the texture classification result not perfect? How can it be improved?
Ans:
According to the analysis in discussion (4), the classification is not perfect because of edge orientation and noise. We could therefore discard the E5*L5 and L5*E5 features to remove the contribution of edge orientation, and we could preprocess the sample images with denoising to reduce the effect of noise.
(6). What is the effect of the window size when computing the energy in texture segmentation?
Ans:
The larger the window size, the better the segmentation result, but once the window is large enough the result changes little with further growth. In my program I finally chose a window size of 39. I think 39 works well because a 39*39 window is large enough to contain representative features of the different textures, such as the bricks, tomatoes and grass. In Fig 2.4(a), the brick and tomato regions are not separated because the window is too small for these two textures and the algorithm cannot tell them apart. When the window is large enough to contain one representative element, such as a whole brick or a whole tomato, the algorithm can separate them into two different textures.
On the other hand, the boundary between two different textures becomes wider as the window grows, because a sharp boundary cannot be localized by a large window.
Fig 2.4 Result of segmentation when window size is (a) 13 (b) 17 (c) 31 (d) 39
Fig 2.5 Result of segmentation when window size is (a) 13 (b) 17 (c) 31 (d) 39
(7). How to improve the result of texture segmentation?
Ans:
According to the analysis in discussion (6), the segmentation is better when the window is relatively large, while the boundaries become wider as the window grows.
This conflict can be eased with a dynamic window size. We can run edge detection on the whole image and then choose the window size according to the result: if a pixel is not on an edge, use a larger window; otherwise, use a smaller window for the energy computation.
We can also exploit the fact that pixels of the same texture lie in the same connected region.
Furthermore, we can weight the 24 features according to their importance when computing the distances.
(8). Reduce the feature of E5*L5 and L5* E5. Discuss your choices and compare your results.
Ans: The direction of a texture is not important and sometimes leads to wrong decisions, such as with the vertical and horizontal bricks. I therefore removed the filters that are sensitive to horizontal and vertical edges. Fig 2.6 shows the texture segmentation result after removing the E5*L5 and L5*E5 filters; this does not affect the result significantly.
Fig 2.6 Result of (a) 24 features (b) remove E5*L5 and L5* E5 (c) 24 features (d) remove E5*L5 and L5* E5
(9). Reduce the feature of ripple. Discuss your choices and compare your results.
Ans:
In the 24-component feature vector, not every component is equally important for texture detection. I removed all the components involving R5, and the result is not significantly worse than the result derived from all 24 features, as shown in Fig 2.7.
Fig 2.7 Result of (a) 24 features (b) remove R5 (c) 24 features (d) remove R5
(10). Why use energy to segment different texture?
Ans:
By summing the squares of the neighbors around a pixel, the inter-class distance becomes larger while the intra-class distance becomes smaller, which makes the segmentation more compact. That is why we use energy to characterize the texture of every pixel and segment the different textures.
(11). What is the effect of the K-means initialization?
Ans:
The choice of K-means initialization may lead to different results for the same image, so the number and locations of the initial K centers are crucial. In my program I chose the K centers manually, by eye, as follows.
For comb1, the 5 initial centers are {(20, 20), (20, width−20), (height/2, width/2), (height−10, 20), (height−20, width−20)}.
For comb2.raw, K = 6 and the 6 initial centers are {(20, 20), (20, width/2), (20, width−20), (height−20, 20), (height−20, width/2), (height−20, width−20)}.
Problem 3: Optical Character Recognition
3.1 Motivation
OCR, which stands for Optical Character Recognition, is widely used today to translate printed and handwritten text into electronic documents. To implement OCR, I designed a decision tree from the training characters and tested it on the test data and on license plate recognition.
3.2 Approach
3.2. (a) OCR Training
The decision tree for Optical Character Recognition is as follows.
The steps to implement the decision tree are as follows (a sketch of the Euler-number computation is given after this list):
(1). Calculate the Euler number.
Scan the whole image with a 2*2 window (bit quads). Whenever the window matches one of the patterns in Fig 3.1, the corresponding counter is incremented. The Euler number E is computed from these pattern counts, and the decision is:
8                   if E = −1
9, R, P             if E = 0
5, S, 2, Z, M, N    if E = 1
Fig 3.1 Patterns of Bit Quad
(2). Calculate the bounding box.
Scan the whole image and record the topmost, bottommost, leftmost and rightmost "1" pixels. These four pixels define a bounding box around the character.
(3). For P, R and 9.
Within the bounding box, scan two vertical lines and count how many times each line crosses a "1" pixel whose top neighbor is not "1", as shown in Fig 3.2.
Fig 3.2
9         if count = 3
P or R    if count != 3
(4). For P and R.
Within the bounding box, scan one horizontal line and count how many times the line crosses a "1" pixel whose left neighbor is not "1", as shown in Fig 3.3.
Fig 3.3
P    if count = 1
R    if count = 2
(5). For M, N, 5, S, 2, Z.
Within the bounding box, scan one vertical line and count how many times the line crosses a "1" pixel whose top neighbor is not "1".
M, N          if count = 1
5, S, 2, Z    otherwise
(6). For M and N.
Use symmetry to distinguish M and N: M has more than 80% of its pixels symmetric about the middle vertical line of the bounding box, while N has less than 80%.
M    if symmetry > 0.8
N    if symmetry < 0.8
(7). For 5, S, 2, Z.
Scan the last few lines of the bounding box and record how many "1" pixels each line contains. If the count in any of these lines exceeds 0.8 of the bounding-box width, the character is 2 or Z; otherwise it is 5 or S, as shown in Fig 3.4.
Fig 3.4
S or 5    if width' < 0.8 * width
2 or Z    if width' > 0.8 * width
(8). For 5 and S.
Within the bounding box, scan one horizontal line and count how many times the line crosses a "1" pixel whose left neighbor is not "1".
5    if count = 1
S    if count = 2
(9). For 2 and Z.
Scan the first few lines of the bounding box and record how many "1" pixels each line contains. If the count in any of these lines exceeds 0.8 of the bounding-box width, the character is Z; otherwise it is 2.
2    if width' < 0.8 * width
Z    if width' > 0.8 * width
3.2. (b) OCR Testing: ideal case
The given test image is not binary, so I chose a threshold of 250 to obtain a binary image and then ran the same OCR program to recognize the characters.
3.2. (c) OCR Application: license plate recognition
The given test image is not binary, so I used two thresholds, 20 and 100, to obtain a binary image: when a pixel value is less than 20 or greater than 100 it is set to "0"; pixels between the two thresholds are set to "1".
The binary image obtained this way has small holes, as shown in Fig 3.5. These must be removed because they would affect the Euler number.
Fig 3.5
To remove the holes in the characters, I scan the image: if a pixel is "0" and more than 5 of its surrounding pixels are "1", I treat it as part of a hole.
I then scan the image again: if a pixel is "1" and more than 5 of its surrounding pixels are "0", I treat it as part of a hole or a glitch. A small sketch of this cleanup is given below.
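The following is a minimal sketch of the hole and glitch removal described above, assuming an 8-neighborhood and a 0/1 NumPy image; the report's exact border handling may differ.

```python
import numpy as np

def clean_binary(binary):
    """Flip isolated pixels: a '0' with more than 5 '1' neighbors becomes '1' (hole fill),
    then a '1' with more than 5 '0' neighbors becomes '0' (glitch removal)."""
    img = binary.astype(np.uint8).copy()
    for target, fill in ((0, 1), (1, 0)):               # first fill holes, then remove glitches
        padded = np.pad(img, 1)
        # Sum of the 8 neighbors of every pixel.
        neigh = sum(np.roll(np.roll(padded, dr, 0), dc, 1)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))[1:-1, 1:-1]
        opposite = (8 - neigh) if target == 1 else neigh  # count of neighbors with the other value
        img[(img == target) & (opposite > 5)] = fill
    return img
```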
After this preprocessing, I apply the same OCR program to recognize the characters.
3.2. (d) OCR Application: real-life case
The license plate is not in good condition, so I implement the following steps to preprocess the plate
before apply the OCR program.
(1). Translate the color image into gray level image.
(2). Histogram Equalization.
(3). Translate the gray level image into binary image. I choose threshold of 70 and 160 to get the
binary image. When the pixel value less than 70 or more than 160, change the pixel to “0”; the pixels
between the two threshold set it to “1”.
(4). Rotation. Rotate the whole license plate.
(5). After the rotation, the image is not binary any more. As a result, translate the image into binary
image with the threshold of 150.
(6). Segmentation. Assume the seven characters in the license plate are equally distributed in the plate
and segment the whole plate into 7 same size images with each of (153,282).
(7). The seven binary character images obtained this way contain small holes, which must be removed because they would affect the Euler number.
To remove the holes, I scan each image: if a pixel is "0" and more than 5 of its surrounding pixels are "1", I treat it as part of a hole.
I then scan the image again: if a pixel is "1" and more than 5 of its surrounding pixels are "0", I treat it as part of a hole or a glitch.
(8). Run the OCR program.
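As referenced in step (6), a minimal sketch of cutting the rotated binary plate into seven equal character slots. The even-spacing assumption and the (153, 282) slot size come from the report; the slicing itself and the names are my own illustration.

```python
import numpy as np

def split_plate(plate, n_chars=7):
    """Cut the plate image into n_chars equal-width character images (step 6)."""
    h, w = plate.shape
    slot = w // n_chars
    # Each character image is h x slot, e.g. roughly 153 x 282 for this plate.
    return [plate[:, k * slot:(k + 1) * slot] for k in range(n_chars)]

# chars = split_plate(binary_plate)   # then clean_binary() and the OCR decision tree per character
```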
3.3 Results
3.3. (a) OCR Training
The result of OCR training is shown in Fig 3.6:
Fig 3.6 Result of training
3.3. (b) OCR Testing: ideal case
The result of OCR testing in ideal case is in Fig 3.7:
Fig 3.7 Result of testing
3.3. (c) OCR Application: license plate recognition
Some samples of the result after preprocessing are shown in Fig 3.8.
Fig 3.8 (top) binary result (bottom) preprocessed result
The result of license plate recognition is shown in Fig 3.9:
Fig 3.9 Result of license plate recognition
3.3. (d) OCR Application: real-life case
The gray-level image and the image after histogram equalization are shown in Fig 3.10 and Fig 3.11.
Fig 3.10 Gray level image
Fig 3.11 Histogram equalization result
Binary result is shown in Fig 3.12.
Fig 3.12 Binary result
Rotation result is shown in Fig 3.13.
Fig 3.13 Rotation result
Binary result is shown in Fig 3.14.
Fig 3.14 Binary result
Segmentation result is shown in Fig 3.15.
Fig 3.15 Segmentation result
The result of removing holes in the character image is shown in Fig 3.16.
Fig 3.16 Result after removing holes
The result of preprocessing is shown in Fig 3.17.
Fig 3.17 Result after preprocessing
The result of the OCR program on the real-life case is as follows:
Fig 3.18 Result of OCR program
3.4 Discussion
(1). What about the result of applying Area and Perimeter?
Ans:
I implemented the Area and Perimeter measurements, which use the following formulas:
But the result is not reliable enough to base a decision on.
Fig 3.19 Result of Area and Perimeter
From Fig 3.19, the normalized area and perimeter values are shown in Table 1.

Table 1
Character   Area    Perimeter
5           0.49    7.36
M           0.51    8.52
N           0.49    7.64
S           0.44    7.83
2           0.43    7.46
P           0.38    6.13
Z           0.36    6.72
R           0.45    7.30
9           0.47    8.03
The differences in Area and Perimeter are not large enough to discriminate between the characters, even when the font is the same, so I did not use these two parameters in my OCR program.
(2). Does the thinning contribute to OCR?
Ans:
I tried a thinning operation before building the decision tree, but unfortunately the result was disappointing. For the training image, thinning works well and makes the decision easier, but for the test image the thinning result is weird and cannot be recognized because of glitches. Fig 3.20 shows some thinning results for the test image.
Fig 3.20 Result of thinning for testing image
(3). What parameters are robust to the font and to non-ideal scenarios?
Ans:
The Euler number is relatively robust compared with the others: it does not change with glitches, rotation, scaling or the font of the character. But if the binary image has holes or noise, the Euler number is no longer reliable.
The normalized Area and Perimeter do not change with rotation and scaling, but they are sensitive to the font.
The symmetry parameter is not affected by scaling, but it changes with rotation and font.
Finally, the aspect ratio and circularity are both insensitive to scaling because they are normalized by the bounding box.
(4). What are the assumptions of real life license plate recognition?
Ans:
(a). The symbols are relatively dark and appear on a bright background, so I can choose appropriate thresholds to get a good binary image.
(b). The characters on the plate are evenly spaced, so I can segment the whole plate into equally sized pieces.
(c). The holes inside the binary characters are small, so I can remove them to get the correct Euler number.
(5). How did I improve the OCR program?
Ans:
When I first finished the OCR program based on the training data, the results on the test data were not good. I then adjusted some parameters in my decision tree, such as the thresholds, until the OCR worked well on both the test data and the training data, and I repeated this process for parts c and d.
(6). How robust is my OCR program, and how can the robustness be improved?
Ans:
My OCR works fine on the training data, the test data and the ideal license plate recognition. But in the extra part it cannot distinguish between 2 and Z, which I suspect is caused by the poor original input image and by the preprocessing.
In the input image, 2 and Z are very similar and are hard to distinguish even by eye.
After my preprocessing, the binary images of 2 and Z are nearly identical, as shown in Fig 3.21, so the distinguishing features between them are lost.
Fig 3.21 Z and 2 after preprocessing
To improve the OCR program, I think a larger amount of training data is needed and more robust parameters should be used.