Levit Gregory
323723270
1.
Introduction
2.
The Statistical Approach
3.
Principles of my Method
4.
Implementation details
5.
Examples and Results
6.
Conclusions
7.
Bibliography
The subject of “Handwritten digit recognition” is of great concern and has many applications in various fields, from zip code recognition(in the post office) to robotics.
My work is based on statistical approach to that subject, without using a large training database and without resorting to neural-nets...
The only real reasons I am avoiding the obvious choice of Neural Nets are:
1) It has been done before for character recognition.
2) It does not seem to be in the same vein as our class topics and lectures, and methods.
3) Neural nets, in my experience, don't seem to produce scalable, re-usable information about the underlying vision problem/solution... as far as I can tell they just prove that some systematic weighing of layers of perceptions yield the correct results most of the time.
Only few articles about handwriting recognitions problems, that I have read, refer to statistical methods for solving it. I have taken my idea from them, but I haven’t found any exact reference of my approach to the subject. I will conduct a review of various statistical methods in the next chapter.
Since every electronic image of a digit consists of pixel values that are represented by a spatial configuration of "0"s and "1"s, a statistical approach to image character recognition would suggest that one look for a typical spatial distribution of the pixel values that characterize each digit. In general, one is searching for the statistical characteristics of various digits.
These characteristics could be very simple, like the ratio of black pixels to white pixels, or more complex, like higher order statistical parameters such as the third moments of the image.
For Example:
Typically, an image of the digit "1" will have relatively fewer black pixels than an image of the digit "8". In the following illustration, there almost twice as many black pixels in the "8" image than in the "1", though both are drawn to the same scale.
Continuing the same approach, cursory analysis shows that the ratio of height to width for the digit "0" is less than the same ratio for the digit "6"
More advanced algorithms usually are based on the one dimensional histograms that can be extracted from digitized images. Such approach is carried out by producing a histogram that reflects graphically the number of black pixels in each line and in each row. By projecting the black pixel count horizontally and vertically, it is possible to differentiate between many typical cases of digits. Such a projection is demonstrated in the following figure.
In short, by careful analysis of the histograms of various digits, it is possible to differentiate between them.
Thus, the general flow of statistic based character recognition algorithms is as follows:
Compute the relevant statistics for a digitized image
Compare the statistics to those from a predefined database.
Assumptions:
I suppose that the image of digit is already extracted from background and contain only real contours of digit.
A digitized image is, after all, just a collection of numbers. For binary images, every point or pixel is assigned a value of either 0 or 1; for gray level images pixel values range from 0 to 255, and for color images pixel values usually consist of three numbers, each in the range of 0 to 255. While the ensuing discussion is valid for any type of image, for the sake of simplicity, only binary images will be addressed.
Idea:
The idea is to project input digitized image into "i-space". "i-space" is what I refer to as an "intersection-space". "i-space" is a compact representation of what I consider to be the defining qualities of a digit.
These qualities which I refer to are:
1) The number of intersections along various horizontal and vertical rays.
2) The Euler number of the image (Scalar whose value is the total number of objects in the image minus the total number of holes in those objects)
These qualities build particular profile for each of the digit images. Same digits, which differ a little in image, will have very similar profile. The opposite is not always true, i.e. if 2 profiles are very similar then digits on both images are probably equal, but not always. Thus relaying on that fact won’t give 100% accuracy, but still quite high success percentage, making the algorithm effective.
I-space :
For example, take the following binary image of the number 8.
First I draw 5 horizontal rays through the image (spacing is determined on the fly, depending on size of cropped image).
The number of intersections along each ray is recorded...
Similarly 5 vertical rays are shot and recorded...
Finally the Euler number is recorded in the last array entry
This array is then thought of as an 11-dimentional vector in "i-space".
Similarity between profiles, which was mentioned before, can be defined as the distance between vectors.
Training set:
The single precomputation that the algorithm has to perform is to prepare a base set of all 10 digit profiles. We can get that set by projection of each digit’s example-image to the i-space, one by one. Afterwards we will have 10 vectors in "i-space", each of them represents specific digit.
The answer for a specific input test image will be determined by checking which of those 10 vectors is found on the most minimal Euclidean distance to the vector of the test image.
I have implemented this algorithm in Matlab, because it has a lot of tools for image processing.
Program consists of 3 logical parts. Initializer, Profile Computator, Digit
Recognizer .
Initializer:
This part projects the training images into i-space so we can have database in ispace of digits.
First it reads from the disk 10 example images of digits, that must be prepared before, called ‘0.jpg’, ‘1.jpg’….’9.jpg’. Then it computes profile for each one of images, using Profile Computator part.
Profile Computator:
This part handles an input image. The purpose of this part is to count the number of intersections between digit contours and vertical and horizontal rays.
First, it do some preprocessing by cropping the blank space around the digit object. Input image aren’t obligatory binary, thus it converts them to binary with treshholding 0.7
There are 5 horizontal and 5 vertical rays, which equally distributed along the image.
It creates empty profile array of 11 cells, and it is filled according to the count of each ray. The 11 th cell is assigned by Euler number of the input image (as described earlier).
The count process of each ray is: scanning the pixels on its trace and whenever it meets transition from white pixel to black pixel, it increases appropriate cell in the profile.
Digit Recognizer:
The purpose is to recognize the input digit, by comparison of input digit’s profile with training set profiles (which obtained by the Initializer ).
The comparison between two profile vectors is done by Euclidian distance
(square root of differences squares). Then it finds the training set profile with minimum distance from input profile.
For training set I created an example of each digit 0-9 using ms-paint.
For testing I created 3 groups of 0-9 digit images also in ms-paint. I tried to make each group more different, and assigned different image characteristics to each of them; like image size, marker size and handwriting style.
Below is an example of a 3 testing image inputs, each one from another group.
Values in table are Euclidean distance between 2 vectors, these vectors represent training image and test image, in i-space.
From group A From group B From group C Test Samples
Training set
2.2361 3.4641 3.0000
3.4641 3.3166 2.0000
2.0000 3.3166 4.0000
2.2361 2.8284 3.8730
3.3166 3.4641 1.0000
3.1623 3.8730 4.8990
2.6458 4.0000 5.0000
2.8284 2.2361 2.8284
4.1231 5.8310 6.8557
2.0000 3.8730 4.6904
My algorithm’s answer
2* 7 4
* - My algorithm choosed 2 and not 9(which has same distance), due to digit order. I couldn’t find good solution for that problem.
All results for 30 tested images are in the following table.
Results real value group a group b group c
4
5
6
0
1
2
3
7
8
9 correct/total in each group
4
8
6
1
5
6
0
4
2
3
6/10
7
6
9
4
5
6
0
1
2
7
8/10
1
6
9
4
5
6
0
4
2
3
7/10 agenda
- wrong
- correct
Out of the 30 digit inputs 21 were correctly classified.
21/30 = 70 % correctness.
70% are quite surprising for me, because I didn’t expect more than 65% from a statistical approach method to “Handwritten digit recognition” problem.
Of course advanced Hybrid Methods and Voting Algorithms are showing 90-
95% in nowadays, but they contain bulky database and take long time to compute.
Main problem of my method is similar profiles of few pairs of digits in training set, they can affect algorithm which results in wrong answer, due to little shift in the contour of the test image. As can be seen from results table; ‘4’ and ‘1’,
‘7’ and ‘1’, ‘6’ and ‘8’.
In recognition of digit ‘1’, in all of the 3 groups, algorithm gave wrong answer
(‘4’ instead of ‘1’) in 2 out of the 3 groups. In recognition of digit ‘8’ algorithm also showed 33% correctness (‘6’ instead of ‘8’).
Limitations of this algorithm:
NOT Orientation/Rotation invariant.
Writing styles matter (weird 7's or incomplete zeros).
Salt and pepper noise can throw off results.
Algorithm results have great dependence on the training set, thus weird digits in training set may cause many wrong answers in test images. I don’t think my training set is very representative, and that is additional source of mistakes in my tests. For future work, I suggest it would be more effective to create training image profiles using average of many training images projections instead of just 1 per digit.
D.A. Forsyth and J. Ponch Computer Vision: A Modern Approach, , Prentice Hall, 2003.
Dori, Dov, and Alfred Bruckstein, ed. Shape, Structure and Pattern Recognition . New
Jersey: World Scientific Publishing Co., 1995.
Impedovo, Sebastiano. "Frontiers in Handwriting Recognition." Fundamentals in
Handwriting Recognition . Ed. Sebastiano Impedovo. New-York: Springer-Verlag, 1994.
Young, Tzay Y., and King-Sun Fu, ed. Handbook of Pattern Recognition and Image
Processing . New York: Academic Press, Inc., 1996.
N. Arica and F. Yarman-Vural. An overview of character recognition focused on off-line handwriting. Systems, Man,and Cybernetics , 2001.
Yehezkel Yeshurun “Principles of Intelligent Character Recognition” 1999