EE569 Introduction to Digital Image Processing
HOMEWORK #3 REPORT
JINGWEI LIU 9319448676
Email: jingwei@usc.edu

Problem 1: Geometric Modification

1.1 Motivation
Geometric modification is a common image processing operation that includes translation, scaling, rotation and warping. It is widely used in computer graphics and medical image processing. In this part of the homework, I implemented basic geometric modifications to perform puzzle matching and face warping. In the extra-credit part, I applied warping and morphing together to build a short morphing video.

1.2 Approach

1.2 (a) Puzzle Matching
I implemented translation, rotation and scaling separately to perform the puzzle matching. The overall flow chart of the geometric modification is shown in the flow-chart figure, and the details of the implementation are as follows (a small code sketch of the reverse rotation and bilinear interpolation is given after these steps):
(1). Find the coordinates of the corners of the puzzle piece. First, I converted the color image into a gray-level image and then into a binary image with a threshold of 250. I then scanned the binary image four times to find the four corners of the tiger piece; the resulting corner coordinates are shown in Fig 1.1.
(2). Convert every pixel index (p, q), in both the input image and the output image, to Cartesian coordinates:
u = p + 0.5, v = q + 0.5
(3). Compute the rotation angle θ from the coordinates of the four corners.
(4). Choose the top-left corner as the origin of the rotation, i.e. translate:
u = x − 49.5, v = y − 115.5
(5). Rotation. Compute the width and height of the tiger piece from the four corner coordinates, then apply the inverse rotation. In my program the mapping from output coordinates (x, y) back to input coordinates is
u = 49 + cosθ·x + sinθ·y
v = 115 − sinθ·x + cosθ·y
so that p = u − 0.5 and q = v − 0.5.
(6). Bilinear interpolation. Since p and q are generally not integers, bilinear interpolation is used to obtain the desired pixel value from the four neighboring pixels.
(7). Scaling. After scaling, the size of the tiger piece is 122×122. The scaling formula is
u = x / Sx, v = y / Sy
with Sx = Sy = 150/122 (150 is the width and height of the original tiger piece).
(8). Bilinear interpolation again.
(9). Find the coordinates of the hole in "tiger.raw". I converted the tiger image into a binary image and scanned it to find the upper-left corner of the hole. Since the hole is known to be 120×120, the other three corners follow from the upper-left coordinate.
(10). Put the puzzle piece back into "tiger.raw". The puzzle piece is 122×122, which is larger than the 120×120 hole, so the pixels of "tiger.raw" surrounding the hole are set to the average of the original pixel value and the puzzle-piece value.
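For reference, the reverse-mapping rotation of step (5) combined with the bilinear interpolation of step (6) can be sketched as below. This is only an illustrative Python/NumPy sketch of the idea, not the exact program used for this homework; the pivot (49.5, 115.5) and the angle theta follow steps (4)-(5), and the function name is my own.

import numpy as np

def rotate_bilinear(src, theta, pivot=(49.5, 115.5), out_shape=None):
    """Reverse-map each output pixel through a rotation about `pivot`
    and sample the source image with bilinear interpolation."""
    h, w = src.shape
    if out_shape is None:
        out_shape = src.shape
    dst = np.zeros(out_shape, dtype=src.dtype)
    c, s = np.cos(theta), np.sin(theta)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            # Cartesian coordinates of the output pixel, relative to the pivot
            x, y = i + 0.5 - pivot[0], j + 0.5 - pivot[1]
            # inverse rotation back into the source image (step (5))
            u = pivot[0] + c * x + s * y
            v = pivot[1] - s * x + c * y
            p, q = u - 0.5, v - 0.5
            p0, q0 = int(np.floor(p)), int(np.floor(q))
            if 0 <= p0 < h - 1 and 0 <= q0 < w - 1:
                a, b = p - p0, q - q0
                # bilinear interpolation of the four neighbours (step (6))
                dst[i, j] = ((1 - a) * (1 - b) * src[p0, q0]
                             + a * (1 - b) * src[p0 + 1, q0]
                             + (1 - a) * b * src[p0, q0 + 1]
                             + a * b * src[p0 + 1, q0 + 1])
    return dst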
1.2 (b) Face Warping
I chose the nose of both the man and the tiger as the only control point of the warping operation. The nose of the man is at (x1, y1) = (317, 195) and that of the tiger is at (x2, y2) = (385, 195), so the aligned point is ((x1+x2)/2, (y1+y2)/2) = (351, 195). The steps of the warping operation are as follows:
(1). Find the aligned point, which is (351, 195).
(2). Warp using 3 control points per triangle with a first-order (affine) mapping:
u = a0 + a1·x + a2·y
v = b0 + b1·x + b2·y
i.e. [u, v]^T = [a0 a1 a2; b0 b1 b2]·[1, x, y]^T, where a0~a2 and b0~b2 are unknown.
(3). Using the 5 control points (351, 195), (0, 0), (0, 399), (499, 0) and (499, 399) and the formula above, compute the 4 inverse mapping matrices (one set of a and b coefficients per triangle).
(4). Find the four triangles in the image, i.e. the 4 lines that separate the whole image into 4 regions.
(5). Reverse mapping: for each pixel of the output image, decide which triangle it belongs to and use the corresponding warping matrix to find (u, v) in the original image.
(6). Bilinear interpolation.

1.2 (c) Face Morphing
Cross-dissolve the two warped images, the warped tiger and the warped man, according to
I_out = (1 − α)·I_1 + α·I_2
where 0 < α < 1 and I denotes the color of the image (I_1 and I_2 are the two warped images). A small code sketch of the affine-coefficient computation and of this cross-dissolve is given at the end of this Approach section.

1.2 (d) Back to 27
This part reuses 1.2 (b) and 1.2 (c). First do the warping, using the nose as the control point: its coordinate is (314, 195) for the young man and (328, 169) for the old man, so the aligned point is (321, 182). After warping, morph the two warped images.

1.2 (e) Video Morphing
In this extra part, warping and morphing are performed together, and the whole process is divided into 16 steps. Only one control point is used for warping; this time it is the chin, which differs from the previous parts. The chin coordinates in the two images are (239, 131) and (173, 101), so the aligned point reached after the 16 steps is (206, 116). The morphing is done right after the warping at every step, also over 16 steps. The resulting video consists of 16 images and lasts 11 seconds.
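The sketch below illustrates the two building blocks described above: solving the affine coefficients a0~a2, b0~b2 for one triangle from its three control-point correspondences (1.2 (b), steps (2)-(3)), and cross-dissolving two warped images (1.2 (c)). It is a minimal NumPy sketch under the formulas of this section, not the exact homework code; the function and variable names are my own.

import numpy as np

def affine_coefficients(dst_pts, src_pts):
    """Solve u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y for one triangle.
    dst_pts: three (x, y) control points in the output image.
    src_pts: the corresponding (u, v) points in the input image."""
    X = np.array([[1.0, x, y] for (x, y) in dst_pts])     # 3x3 system matrix
    a = np.linalg.solve(X, np.array([u for (u, v) in src_pts]))
    b = np.linalg.solve(X, np.array([v for (u, v) in src_pts]))
    return a, b   # apply as u = a @ [1, x, y], v = b @ [1, x, y]

def cross_dissolve(img1, img2, alpha):
    """Cross-dissolve of 1.2 (c): I_out = (1 - alpha)*I_1 + alpha*I_2."""
    return ((1.0 - alpha) * img1.astype(np.float64)
            + alpha * img2.astype(np.float64)).astype(img1.dtype)

For example, the triangle containing the aligned nose point would use (351, 195), (0, 0) and (0, 399) as dst_pts and the corresponding original nose and corner positions as src_pts.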
1.3 Results

1.3 (a) Puzzle Matching
After converting the puzzle piece and tiger.raw into binary images, the detected coordinates are shown in Fig 1.1.
Fig 1.1 Coordinates of the puzzle piece and of the hole in tiger.raw
The result of rotating and scaling piece.raw is shown in Fig 1.2.
Fig 1.2 Output of rotation and scaling
The result of the hole-filling step is shown in Fig 1.3.
Fig 1.3 Output of hole-filling of tiger.raw

1.3 (b) Face Warping
The warping results for the tiger and the man are shown in Fig 1.4 and Fig 1.5.
Fig 1.4 (a) Original tiger (b) Warped tiger
Fig 1.5 (a) Original man (b) Warped man

1.3 (c) Face Morphing
The face morphing result is shown in Fig 1.6.
Fig 1.6 (a) α = 0.2 (b) α = 0.5 (c) α = 0.8

1.3 (d) Back to 27
The warping results for the young man and the old man are shown in Fig 1.7 and Fig 1.8.
Fig 1.7 (a) Original man (b) Warped man
Fig 1.8 (a) Original old man (b) Warped old man
The morphing result is shown in Fig 1.9.
Fig 1.9 (a) Original young man (b) Warped young man

1.3 (e) Video Morphing
The video is in extra.avi. Part of the result is shown in Fig 1.10; the control point is the chin.
Fig 1.10 (a) Original Fiona (b)~(d) Warping Fiona (e) Warped Fiona

1.4 Discussion
(1). Why must the order of the geometric operations be rotation, translation and then scaling?
Ans: I chose the upper-left corner of the puzzle piece as the rotation pivot, and every pixel of the puzzle piece is transformed into this new coordinate system. I did not translate the result inside the output image, so after rotation the piece sits at the top of the image, which makes the later operations easier: the rotated piece is axis-aligned and I only need to search for points inside the piece rather than over the whole image. If I performed scaling before rotation, I would have to scale the whole image first and only then rotate, because I cannot locate the points of the piece in the tilted output image.
(2). In puzzle matching, why use 122×122 instead of 120×120?
Ans: The scaling result is slightly larger than the 120×120 hole in tiger.raw, and the value of the overlapping border is the average of the puzzle-piece value and the tiger.raw value. The purpose of this choice is to get a smooth boundary instead of an obvious square boundary:
overlap pixel value = (value of border pixel + value of piece pixel) / 2
(3). In face warping, what type of feature did you select, and how many control points in total?
Ans: In the man-tiger problem, I chose the nose as the control point because the nose positions of the tiger and the young man are clearly different. The control point coordinates are (317, 195) and (385, 195), and the final aligned point is (351, 195). In the old-young problem, I also chose the nose as the control point to test the robustness of my program. Since the noses of the young man and the old man differ only slightly, the result is not as obvious as in the man-tiger problem. The control point coordinates are (314, 195) and (328, 169), and the aligned point is (321, 182). I also tried using two control points for this problem; see discussion (6). In the extra part, I chose the chin as the control point to obtain an obvious, gradual change across the sixteen images. The control point coordinates are (173, 101) and (239, 131), and the final aligned point is (206, 116). In each of the sixteen iterations, the height changes by 2 pixels and the width by only 1 pixel.
(4). Discuss your morphing result in part (b). Is the transformation smooth and reasonable? Why?
Ans: It is smooth and reasonable, but because the two images have already been warped, the change comes only from the color intensity. A smoother and more reasonable result can be obtained by warping and morphing together, so that the gradual change looks smoother, as in the extra part. To get a more reasonable result, more control points should also be chosen: in the current result only the noses of the two images are aligned while the eyes are not, and since the eyes are a distinctive feature of the whole face, this makes the gradual change less reasonable.
(5). How could the morphing result be further improved?
Ans: The morphing result can be improved in two ways: (1) morphing while warping; (2) choosing more than one control point so that the gradual change between the two images is smoother.
(6). Trying 2 control points when warping in part (d).
Ans: I chose the chin as the second control point and applied it after the first warping operation that used the nose. As Fig 1.11 shows, the chins of the two images are then aligned but the noses no longer are: a second pass of warping destroys the previous alignment and makes the image blurrier. To use more than one control point properly, I have to switch from first-order to second-order warping, so that several control points are used together to compute the warping matrix and the result has more than one aligned point.
Fig 1.11 (a) Nose as control point, α = 0.5 (b) Nose and chin as control points, α = 0.5
(7). Implementation of the extra part: why I chose the chin as the control point instead of the nose.
Ans: The two images of Fiona that I chose do not differ much around the nose. Considering the sixteen warping iterations, I chose the chin as the control point, which makes the gradual change more obvious and smoother.
(8). What can you further improve?
Ans: I only used first-order warping with one control point. For further improvement, more than one control point could be used at the same time to represent the changes between the two images. To support multiple control points, first-order warping is no longer sufficient, so a higher-order warp such as a second-order warp should be used (a small illustrative sketch is given below).
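As a concrete illustration of the second-order warping mentioned in discussions (6) and (8), the coefficients of the mapping u = a0 + a1·x + a2·y + a3·x² + a4·x·y + a5·y² (and similarly for v) could be fitted from several control-point correspondences by least squares. The sketch below is only a hypothetical illustration of that idea, not part of the submitted homework code.

import numpy as np

def second_order_warp_coefficients(dst_pts, src_pts):
    """Fit u = a0 + a1*x + a2*y + a3*x^2 + a4*x*y + a5*y^2 (and the same for v)
    from >= 6 control-point pairs by least squares.
    dst_pts: (x, y) points in the output image; src_pts: matching (u, v) points."""
    X = np.array([[1.0, x, y, x * x, x * y, y * y] for (x, y) in dst_pts])
    u = np.array([p[0] for p in src_pts])
    v = np.array([p[1] for p in src_pts])
    a, _, _, _ = np.linalg.lstsq(X, u, rcond=None)
    b, _, _, _ = np.linalg.lstsq(X, v, rcond=None)
    return a, b   # evaluate as a @ [1, x, y, x*x, x*y, y*y]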
Problem 2: Texture Analysis and Segmentation using Laws Filters

2.1 Motivation
Texture is a distinctive feature of an image that reflects its visual appearance and surface characteristics. Based on texture features, we can perform texture classification and image segmentation. In this part, I classify ten texture samples and segment the different texture regions within a single image.

2.2 Approach

2.2 (a) Texture Image Classification
In this part, I apply nine 5×5 Laws filters to classify the ten texture samples. The flow chart of the classification is shown in the flow-chart figure, and the implementation steps are as follows:
(1). Histogram equalization. Apply the digital histogram equalization from Homework 1 to equalize the brightness of the ten texture samples and increase their contrast.
(2). Feature extraction. Use the three one-dimensional Laws kernels to create nine 5×5 two-dimensional filters. The three one-dimensional kernels are
L5 = (1/16)·[1 4 6 4 1]
E5 = (1/6)·[−1 −2 0 2 1]
S5 = (1/2)·[−1 0 2 0 −1]
The nine two-dimensional filters are the outer products of these kernels. Apply the nine filters to every pixel of the ten sample textures and store the nine-dimensional response vector L1~L9 for each texture.
(3). Feature averaging. I use a 13×13 window to compute the energy of each pixel in every sample texture:
E_k(i, j) = Σ_{m=−6..6} Σ_{n=−6..6} [F_k(i+m, j+n)]², window size = 13,
where F_k is the response of the k-th filter. Then average each energy map over the whole sample:
T_k = (1/(N×N)) · Σ_i Σ_j E_k(i, j), where N = 512.
The nine averaged feature values for each of the ten samples form the feature vectors used for classification.
(4). Classification. I use nearest neighbors and second-nearest neighbors to classify the 10 samples by computing the distance between each pair of sample feature vectors T. With nearest neighbors, if two samples are each other's nearest neighbor, they are put in the same class. The second-nearest neighbor is then considered to classify a sample that has only one incoming arrow in the nearest-neighbor graph.
A short sketch of the filter bank, the energy computation and the feature averaging is given at the end of this subsection.
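The following sketch shows how the 5×5 Laws filter bank, the 13×13 energy maps and the per-sample averages described above could be computed with NumPy/SciPy. It is only an illustrative sketch of the procedure, not the submitted code; the kernel normalization factors follow step (2), and uniform_filter averages rather than sums the squared responses, which differs from the sum in step (3) only by a constant factor.

import numpy as np
from scipy.ndimage import convolve, uniform_filter

# One-dimensional Laws kernels from step (2)
L5 = np.array([1, 4, 6, 4, 1]) / 16.0
E5 = np.array([-1, -2, 0, 2, 1]) / 6.0
S5 = np.array([-1, 0, 2, 0, -1]) / 2.0

def laws_bank(kernels):
    """Build the 2-D Laws filters as outer products of the 1-D kernels."""
    return [np.outer(a, b) for a in kernels for b in kernels]

def average_energy_features(img, kernels=(L5, E5, S5), window=13):
    """Filter the image, square the responses, average them over a
    window x window neighbourhood (the energy of step (3)), then average
    over the whole sample to get one feature value per filter."""
    feats = []
    for f in laws_bank(kernels):
        response = convolve(img.astype(np.float64), f, mode='reflect')
        energy = uniform_filter(response ** 2, size=window)   # local energy map
        feats.append(energy.mean())                           # per-sample average
    return np.array(feats)   # nine-dimensional feature vector T1~T9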
2.2 (b) Texture Segmentation
In this part, I apply twenty-five 5×5 Laws filters to extract a texture feature vector for every pixel. The flow chart of the segmentation is shown in the flow-chart figure, and the implementation steps are as follows:
(1). Feature extraction. Use the five one-dimensional Laws kernels L5, E5, S5, W5 and R5 to create twenty-five 5×5 two-dimensional filters. Apply the 25 filters in the following table to every pixel of the image and store the 25-dimensional response vector (features 0~24) for each pixel.

L5*L5 (feature 0)  L5*E5 (feature 1)  L5*S5 (feature 2)  L5*W5 (feature 3)  L5*R5 (feature 4)
E5*L5 (feature 5)  E5*E5              E5*S5              E5*W5              E5*R5
S5*L5              S5*E5              S5*S5              S5*W5              S5*R5
W5*L5              W5*E5              W5*S5              W5*W5              W5*R5
R5*L5              R5*E5              R5*S5              R5*W5              R5*R5 (feature 24)

(2). Energy computation. I use a 39×39 window to compute the energy of each pixel:
E_k(i, j) = Σ_{m=−19..19} Σ_{n=−19..19} [F_k(i+m, j+n)]², window size = 39.
(3). Normalization. Use the L5*L5 energy as the divisor and normalize the other 24 features, giving a 24-dimensional feature vector for each pixel.
(4). K-means segmentation. The steps of the K-means algorithm are as follows (a small code sketch of this clustering loop is given after these steps):
(4-1). By observation, initialize K = 5 for comb1.raw, with the 5 centers at {(20, 20), (20, width−20), (height/2, width/2), (height−10, 20), (height−20, width−20)}. For comb2.raw, let K = 6 and initialize the 6 centers at {(20, 20), (20, width/2), (20, width−20), (height−20, 20), (height−20, width/2), (height−20, width−20)}.
(4-2). Compute the distances from the feature vector E(i, j) of each pixel to the centroids C1 to CK.
(4-3). Find the nearest centroid Cn to pixel E(i, j) and set the label of E(i, j) to n.
(4-4). After all pixels have been labeled, obtain the new cluster centers by summing the feature vectors of the pixels with label n and taking the average as the new center.
(4-5). Repeat steps (4-2) to (4-4) until the change in the cluster centroids is less than 1%.
(4-6). According to the label n of each pixel, assign it the gray value v = n·(256/K), where n is the label of the pixel and K is the number of centers.
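Below is a minimal NumPy sketch of the K-means loop of step (4), operating on the per-pixel feature vectors. It is illustrative only; the seed pixels, the 1% stopping rule and the final gray-level assignment follow steps (4-1)~(4-6), while the function name and helper details are my own.

import numpy as np

def kmeans_segment(features, init_centers, max_iter=100, tol=0.01):
    """features: (H, W, D) array of per-pixel feature vectors.
    init_centers: (row, col) seed pixels chosen by observation (step (4-1)).
    Returns a gray-level label image as in step (4-6)."""
    H, W, D = features.shape
    X = features.reshape(-1, D).astype(np.float64)
    centers = np.array([features[r, c] for (r, c) in init_centers], dtype=np.float64)
    K = len(centers)
    for _ in range(max_iter):
        # steps (4-2)/(4-3): label each pixel with its nearest centroid
        dist = np.stack([np.linalg.norm(X - c, axis=1) for c in centers], axis=1)
        labels = np.argmin(dist, axis=1)
        # step (4-4): new centroid = mean feature vector of its pixels
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        # step (4-5): stop when the centroids change by less than 1%
        change = np.linalg.norm(new_centers - centers) / (np.linalg.norm(centers) + 1e-12)
        centers = new_centers
        if change < tol:
            break
    # step (4-6): gray value n * (256 / K) for label n
    return (labels.reshape(H, W) * (256 // K)).astype(np.uint8)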
2.3 Results

2.3 (a) Texture Image Classification
After histogram equalization, the ten samples are shown in Fig 2.1.
Fig 2.1 Histogram equalization result of samples 1~10
The resulting distance table and nearest-neighbor assignments are shown in the corresponding figure. According to the nearest neighbors and second-nearest neighbors, the samples are grouped into the classes (1, 3, 7), (2, 4, 8) and (5, 9), while samples 6 and 10 cannot be assigned to any class (see discussion (4)).

2.3 (b) Texture Segmentation
The best texture segmentation result, obtained with a window size of 39, is shown in Fig 2.2.
Fig 2.2 Output of segmentation of (a) comb1 (b) comb2

2.4 Discussion
(1). What is the result if histogram equalization is not used in texture classification?
Ans: The result without histogram equalization is shown in the corresponding distance table. According to the nearest-neighbor algorithm, I can only form two classes, {1, 2} and {4, 6}, and this result is completely wrong. By observation, samples 1 and 2 are the two darkest of the ten samples and samples 4 and 6 are the two brightest, so the classification is driven by brightness. Since brightness should not contribute to texture classification, I applied digital histogram equalization to remove this effect.
(2). Why perform histogram equalization before feature extraction in texture classification?
Ans: According to the analysis in discussion (1), the brightness differences among the ten samples badly affect the classification result. I therefore preprocess the ten samples with digital histogram equalization, which spreads the brightness of each output image evenly over 0~255 and removes the effect of brightness.
(3). After histogram equalization, does subtracting the mean affect the texture classification result?
Ans: There is not much difference between the two scenarios, as shown in Fig 2.3 (a) and (b). After digital histogram equalization the brightness of the image is already evenly distributed, so the difference between the two scenarios is not significant.
Fig 2.3 Distance table and nearest neighbors (a) after subtracting the mean brightness (b) before subtracting the mean brightness
(4). Compare the result with the ground truth and discuss the discrepancies.
Ans: The true classification of the ten samples should be (1, 3, 7), (2, 5, 9), (4, 8) and (6, 10). My result, using nearest neighbors and second-nearest neighbors, is (1, 3, 7), (2, 4, 8) and (5, 9), and the remaining samples cannot be assigned to any class. Comparing the two, class (1, 3, 7), the wheat texture, is correct. Sample 2 should belong with samples 5 and 9, but my program groups it with samples 4 and 8. This is because sample 2 is a horizontal-brick texture, different from the other two vertical-brick textures, and the filters E5*L5 and L5*E5 are sensitive to horizontal and vertical edges respectively, which leads to the wrong decision when discriminating sample 2 from the other brick textures. Samples 4 and 8, on the other hand, are tilted-line textures with a similar edge orientation, so my program groups sample 2 with 4 and 8. Samples 6 and 10 are not classified correctly because sample 10 contains too much noise: sample 10 finds 6 as its nearest neighbor, but sample 6 does not find 10 because of the noise and instead picks sample 3 as its nearest neighbor. By observation, samples 6 and 3 are indeed quite similar if sample 10 is ignored.
(5). Why is the texture classification result not perfect, and how can it be improved?
Ans: According to the analysis in discussion (4), the classification result is imperfect because of edge orientation and noise. We can therefore discard the E5*L5 and L5*E5 features to remove the contribution of edge orientation, and preprocess the sample images by denoising to reduce the effect of noise.
(6). What is the effect of the window size when computing the energy in texture segmentation?
Ans: The larger the window size, the better the segmentation result, but once the window is large enough the result no longer changes much with the window size. In my program I finally chose 39 as the window size. I think 39 works well because such a window is large enough to cover representative elements of the different textures, such as a brick, a tomato or a patch of grass. In Fig 2.4(a) the brick and tomato regions are not separated because the window is too small for these two textures and the algorithm cannot tell them apart; once the window is large enough to contain a single representative element such as one brick or one tomato, the algorithm can separate them into two different textures. On the other hand, the boundary between two different textures becomes wider as the window size grows, because a sharp boundary cannot be localized by a large window.
Fig 2.4 Result of segmentation when window size is (a) 13 (b) 17 (c) 31 (d) 39
Fig 2.5 Result of segmentation when window size is (a) 13 (b) 17 (c) 31 (d) 39
(7). How could the texture segmentation result be improved?
Ans: According to the analysis in discussion (6), the segmentation is better with a relatively large window, but the boundaries become wider as the window grows. This conflict could be resolved with a dynamic window size: run edge detection over the whole image and choose the window size according to the result, using a larger window for non-edge pixels and a smaller window near edges when computing the energy. We can also exploit the fact that pixels of the same texture sit in the same connected region. In addition, different weights could be put on the 24 features according to their importance when computing the distances.
(8). Remove the E5*L5 and L5*E5 features. Discuss your choice and compare the results.
Ans: The orientation of a texture is not important and sometimes leads to wrong decisions, for example between vertical and horizontal bricks. I therefore removed the filters that are sensitive to horizontal and vertical edges. Fig 2.6 shows the texture segmentation result after removing the E5*L5 and L5*E5 filters; this operation does not affect the result significantly (a small sketch of this feature removal is given below).
Fig 2.6 Result of (a) 24 features (b) removing E5*L5 and L5*E5 (c) 24 features (d) removing E5*L5 and L5*E5
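As a small illustration of the normalization of step (3) in 2.2 (b) together with the feature removal discussed in (8), the sketch below divides the energy maps by the L5*L5 energy and then drops chosen feature columns before clustering. It is a hypothetical sketch of the idea, not part of the submitted program; the indices passed in `drop` are placeholders.

import numpy as np

def normalize_and_select(energy, drop=()):
    """energy: (H, W, 25) stack of Laws energy maps, feature 0 being L5*L5.
    Divides the other 24 features by the L5*L5 energy (step (3) of 2.2 (b)),
    then optionally removes feature indices such as the E5*L5 / L5*E5
    responses discussed in (8)."""
    l5l5 = energy[:, :, 0:1]
    feats = energy[:, :, 1:] / (l5l5 + 1e-12)      # 24 normalized features
    keep = [k for k in range(feats.shape[2]) if (k + 1) not in set(drop)]
    return feats[:, :, keep]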
(9). Remove the ripple features. Discuss your choice and compare the results.
Ans: Among the 24 feature vectors, not every feature is equally important for texture detection. I removed all the features involving R5, and the result is not significantly worse than the one derived from all 24 features, as shown in Fig 2.7.
Fig 2.7 Result of (a) 24 features (b) removing R5 (c) 24 features (d) removing R5
(10). Why use energy to segment different textures?
Ans: By summing the squares of the responses in a neighborhood around each pixel, the inter-class distance becomes larger while the intra-class distance becomes smaller, which makes the segmentation more compact. We therefore use the energy to characterize the texture of every pixel and to segment the different textures.
(11). What is the effect of the K-means initialization?
Ans: The choice of initialization can lead to different results for the same image, so the number and location of the initial K centers are crucial for the K-means algorithm. In my program I chose the K centers manually by eye: for comb1, the 5 initial centers are {(20, 20), (20, width−20), (height/2, width/2), (height−10, 20), (height−20, width−20)}; for comb2.raw, K = 6 and the 6 initial centers are {(20, 20), (20, width/2), (20, width−20), (height−20, 20), (height−20, width/2), (height−20, width−20)}.

Problem 3: Optical Character Recognition

3.1 Motivation
OCR (Optical Character Recognition) is widely used today to convert printed and handwritten text into electronic documents. To implement OCR, I designed a decision tree from the training characters and tested it on the testing data and on license plate recognition.

3.2 Approach

3.2 (a) OCR Training
The decision tree of the optical character recognition is shown in the corresponding figure. The steps to implement it are as follows (a small sketch of the bit-quad counting of step (1) is given after these steps):
(1). Compute the Euler number. Scan the whole image with a 2×2 window (bit quads). Whenever the window matches one of the patterns in Fig 3.1, the corresponding counter is incremented, and the Euler number E is computed from these bit-quad counts. The decision is:
8 if E = −1
9, P, R if E = 0
5, M, 2, N, S, Z if E = 1
Fig 3.1 Patterns of the bit quads
(2). Compute the bounding box. Scan the whole image and record the topmost, bottommost, leftmost and rightmost "1" pixels; these four pixels define a bounding box around the character.
(3). For P, R and 9. Based on the bounding box, scan two vertical lines and count how many times each line crosses a "1" pixel whose top neighbor is not "1", as shown in Fig 3.2.
Fig 3.2
9 if count = 3; P or R if count != 3
(4). For P and R. Based on the bounding box, scan one horizontal line and count how many times it crosses a "1" pixel whose left neighbor is not "1", as shown in Fig 3.3.
Fig 3.3
P if count = 1; R if count = 2
(5). For M, N, 5, S, 2, Z. Based on the bounding box, scan one vertical line and count how many times it crosses a "1" pixel whose top neighbor is not "1".
M, N if count = 1; 5, S, 2, Z if count = 3
(6). For M and N. Use symmetry to distinguish M from N: M has more than 80% of its pixels symmetric about the middle vertical line of the bounding box, while N has less than 80%.
M if symmetry > 0.8; N if symmetry < 0.8
(7). For 5, S, 2, Z. Scan the last few lines of the bounding box and count the "1" pixels in each line. If the count in any of these lines exceeds 0.8 of the bounding-box width, the character is 2 or Z; otherwise it is 5 or S, as shown in Fig 3.4.
Fig 3.4
5 or S if run < 0.8 × width; 2 or Z if run > 0.8 × width
(8). For 5 and S. Based on the bounding box, scan one horizontal line and count how many times it crosses a "1" pixel whose left neighbor is not "1".
5 if count = 1; S if count = 2
(9). For 2 and Z. Scan the first few lines of the bounding box and count the "1" pixels in each line. If the count in any of these lines exceeds 0.8 of the bounding-box width, the character is Z; otherwise it is 2.
2 if run < 0.8 × width; Z if run > 0.8 × width
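Below is a minimal sketch of the bit-quad counting used in step (1). The counting of the Q1, Q3 and QD patterns follows Fig 3.1; the final formula E = (n(Q1) − n(Q3) + 2·n(QD)) / 4 is the standard 4-connectivity bit-quad expression (Pratt) and is an assumption here, since the exact formula used in the report is not reproduced.

import numpy as np

def euler_number(binary):
    """binary: 2-D array of 0/1 pixels.
    Counts the 2x2 bit-quad patterns (Fig 3.1) and returns the Euler number
    using the standard 4-connectivity formula (an assumption; see above)."""
    n1 = n3 = nd = 0
    h, w = binary.shape
    padded = np.pad(binary, 1)           # pad so border quads are counted too
    for i in range(h + 1):
        for j in range(w + 1):
            q = padded[i:i + 2, j:j + 2]
            s = int(q.sum())
            if s == 1:
                n1 += 1                  # Q1: one '1' pixel
            elif s == 3:
                n3 += 1                  # Q3: three '1' pixels
            elif s == 2 and q[0, 0] == q[1, 1]:
                nd += 1                  # QD: two diagonal '1' pixels
    return (n1 - n3 + 2 * nd) // 4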
3.2 (b) OCR Testing: ideal case
The given testing image is not binary, so I binarize it with a threshold of 250 and then run the same OCR program to recognize the characters.

3.2 (c) OCR Application: license plate recognition
The given testing image is not binary, so I binarize it with two thresholds, 20 and 100: pixels with values below 20 or above 100 are set to "0", and pixels between the two thresholds are set to "1". The resulting binary image contains small holes, as shown in Fig 3.5, which must be removed because they would corrupt the Euler number.
Fig 3.5
To remove the holes in the characters, I scan the image: if a pixel is "0" and more than 5 of its surrounding pixels are "1", it is treated as part of a hole and filled. I then scan the image again: if a pixel is "1" and more than 5 of its surrounding pixels are "0", it is treated as part of a hole or glitch and removed. After this preprocessing, the same OCR program is applied to recognize the characters. (A small sketch of this hole and glitch removal is given at the end of this Approach section.)

3.2 (d) OCR Application: real-life case
The license plate is not in good condition, so the following preprocessing steps are applied before running the OCR program:
(1). Convert the color image into a gray-level image.
(2). Histogram equalization.
(3). Convert the gray-level image into a binary image using two thresholds, 70 and 160: pixels below 70 or above 160 are set to "0", and pixels between the two thresholds are set to "1".
(4). Rotation. Rotate the whole license plate.
(5). After the rotation the image is no longer binary, so binarize it again with a threshold of 150.
(6). Segmentation. Assume the seven characters are evenly distributed across the plate and cut the plate into 7 images of equal size, 153×282 each.
(7). The seven binary character images contain small holes that must be removed, otherwise the Euler number would be wrong. The same removal rule as in 3.2 (c) is applied: a "0" pixel with more than 5 surrounding "1" pixels is treated as part of a hole, and a "1" pixel with more than 5 surrounding "0" pixels is treated as part of a hole or glitch.
(8). Run the OCR program.
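The hole and glitch removal used in 3.2 (c) and in step (7) of 3.2 (d) can be sketched as below. This is an illustrative NumPy version of the rule described in the report (flip a pixel when more than 5 of its neighbors disagree with it); treating "surrounding" as the 8-neighborhood is my assumption.

import numpy as np

def fill_holes_and_remove_glitches(binary):
    """binary: 2-D array of 0/1 pixels.
    Pass 1: a '0' pixel with more than 5 of its 8 neighbours equal to '1'
            is considered part of a hole and set to '1'.
    Pass 2: a '1' pixel with more than 5 of its 8 neighbours equal to '0'
            is considered a glitch or hole remnant and set to '0'."""
    def neighbour_sum(img):
        p = np.pad(img, 1)
        # sum of the 8 neighbours of every pixel
        return (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
                p[1:-1, :-2]               + p[1:-1, 2:] +
                p[2:, :-2]  + p[2:, 1:-1]  + p[2:, 2:])

    out = binary.copy()
    ones = neighbour_sum(out)
    out[(out == 0) & (ones > 5)] = 1      # pass 1: fill holes
    zeros = 8 - neighbour_sum(out)
    out[(out == 1) & (zeros > 5)] = 0     # pass 2: remove glitches
    return out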
3.3 Results

3.3 (a) OCR Training
The OCR training result is shown in Fig 3.6.
Fig 3.6 Result of training

3.3 (b) OCR Testing: ideal case
The OCR testing result for the ideal case is shown in Fig 3.7.
Fig 3.7 Result of testing

3.3 (c) OCR Application: license plate recognition
Some samples of the preprocessing result are shown in Fig 3.8.
Fig 3.8 (top) Binary result (bottom) Preprocessed result
The license plate recognition result is shown in Fig 3.9.
Fig 3.9 Result of license plate recognition

3.3 (d) OCR Application: real-life case
The gray-level image and the image after histogram equalization are shown in Fig 3.10 and Fig 3.11.
Fig 3.10 Gray-level image
Fig 3.11 Histogram equalization result
The binary result is shown in Fig 3.12.
Fig 3.12 Binary result
The rotation result is shown in Fig 3.13.
Fig 3.13 Rotation result
The re-binarized result after rotation is shown in Fig 3.14.
Fig 3.14 Binary result
The segmentation result is shown in Fig 3.15.
Fig 3.15 Segmentation result
The result of removing holes in the character images is shown in Fig 3.16.
Fig 3.16 Result after removing holes
The overall preprocessing result is shown in Fig 3.17.
Fig 3.17 Result after preprocessing
The OCR result for the real-life case is shown in Fig 3.18.
Fig 3.18 Result of OCR program

3.4 Discussion
(1). What about the results when using Area and Perimeter?
Ans: I implemented the normalized area and perimeter features, but the results are not reliable enough for making decisions.
Fig 3.19 Result of Area and Perimeter
From Fig 3.19, the normalized area and perimeter values are listed in Table 1.

Table 1
Character   Area   Perimeter
5           0.49   7.36
M           0.51   8.52
N           0.49   7.64
S           0.44   7.83
2           0.43   7.46
P           0.38   6.13
Z           0.36   6.72
R           0.45   7.30
9           0.47   8.03

The differences in area and perimeter between the characters are not significant enough to distinguish them reliably, even when the font is the same, so I did not use these two parameters in my OCR program.
(2). Does thinning contribute to OCR?
Ans: I tried thinning before building the decision tree, unfortunately with disappointing results. For the training images, thinning works well and makes the decision easier, but for the testing images the thinning result is distorted and cannot be recognized because of glitches. Fig 3.20 shows some thinning results for the testing images.
Fig 3.20 Result of thinning for testing images
(3). Which parameters are robust to the font and to non-ideal scenarios?
Ans: The Euler number is relatively robust compared with the others: it does not change with glitches, rotation, scaling or the font of the character. However, if the binary image contains holes or noise, the Euler number is no longer reliable. The normalized area and perimeter are invariant to rotation and scaling but sensitive to the font. The symmetry parameter is unaffected by scaling but changes with rotation and font. Finally, the aspect ratio and circularity are both insensitive to scaling because they are normalized by the bounding box.
(4). What are the assumptions of the real-life license plate recognition?
Ans: (a) The symbols are relatively dark on a bright background, so an appropriate threshold can be chosen to obtain a good binary image. (b) The characters are evenly distributed across the plate, so the plate can be segmented into pieces of the same size. (c) The holes inside the binary characters are small, so they can be removed to obtain the correct Euler number.
(5). How did I improve the OCR program?
Ans: When I first finished the OCR program based on the training data, I did not get a good result for the testing data.
I then adjusted some parameters in my decision tree, such as the thresholds, until the OCR worked well for both the testing data and the training data, and I repeated this procedure for parts (c) and (d).
(6). The robustness of my OCR program, and how to improve it.
Ans: My OCR works correctly for the training data, the testing data and the ideal license plate recognition. For the extra part, however, it cannot distinguish between 2 and Z, which I believe is caused by the poor original input image and by the preprocessing. In the input image, 2 and Z look very similar and are hard to distinguish even by eye, and after my preprocessing the binary images of 2 and Z are nearly identical, as shown in Fig 3.21, so the distinguishing features between them are lost. To improve the OCR program, a larger amount of training data is needed and more robust parameters should also be used.
Fig 3.21 Z and 2 after preprocessing

References:
1. EE569_Fall2011_HW3_v6.pdf
2. EE569_Fall_2011_Discussion_7.pdf
3. EE569_Fall_2011_Discussion_8.pdf
4. EE569_Fall_2011_Discussion_9.pdf
5. EE569_Fall_2011_Discussion_10.pdf
6. Digital Image Processing, 4th Ed., Pratt
7. Class notes
Reference images: given by the assignment EE569_Fall_2011_HW3 and USC DEN EE569_ASSIGNMENT.