THE HONG KONG POLYTECHNIC UNIVERSITY
Final Year Project Progress Report
Optical Character Recognition and Chinese 2D Code Applications
Student Name: YANG Fan
Student ID: 06846354d
Supervisor: Prof. Henry Chan

This document serves as the progress report for the final year project supervised by Prof. Henry Chan. It describes the current progress of the project.

Table of Contents
Problem Statement
Objectives and Outcome
    Objectives
    Outcome
Previous and On-going Work
    Image Preprocessing
        Edge Detection
        Line Detection
        Perspective Transformation
    OCR Engine using Fourier Descriptor
    OCR Engine using Artificial Neural Networks
    Front-end GUI program for character recognition
Overall Progress Summary
    System Architecture
    Development Environment
    OCR Engine
    Preliminary Experiment Result
    Problems to Tackle in the Future
        Result Unstable
        Interference from icon
        Camera capability
References

Problem Statement
There are already many applications focusing on 2D codes built from English characters. An English 2D code can easily be sent through SMS messages, and unlike a QR code, such a text-based 2D code does not require an internet connection. A Chinese 2D code application will serve the same purpose, but the system uses Chinese characters as the primary encoding characters.
In order to recognize a code distributed through SMS, there must be an application that utilizes optical character recognition (OCR) technology to read the SMS: it takes a picture of the receiving mobile phone's screen and translates the image back into the code. More specifically, the following problems are tackled in this project: Which algorithm is the most effective and efficient for recognizing a small subset of Chinese characters in a 2D code? Are there other potential uses for this OCR system?

Objectives and Outcome

Objectives
The objectives of this project primarily contain the following goals:

Design and implement an OCR system
Design and implement a mobile OCR system that can successfully and efficiently recognize characters from an image. An effective and efficient OCR system will become the foundation of other OCR-related applications. This OCR system will most likely be based on artificial neural networks, and it need only recognize a small subset of Chinese characters.

Implement an application that utilizes the OCR system
Design and implement a Chinese 2D code system that utilizes the OCR system to recognize, on mobile phones, codes distributed through SMS.

Outcome
The output of this project will potentially benefit all mobile users. The possible outcomes include:
• An OCR service that runs on a PC and can convert a subset of Chinese characters contained in an image into text.
• An application that can recognize and decode a Chinese 2D code (to be decided).
• A program to encode a Chinese 2D code.

Previous and On-going Work

Image Preprocessing
An image frame retrieved from a digital camera is usually not suitable for direct OCR, since many factors, such as the background behind the text, the phone frame, and light reflection, may reduce the quality of the result. Image preprocessing is therefore essential to the OCR process. The following techniques are utilized in the system.

Edge Detection
The first step of preprocessing is edge detection.
Edge detection generates a single-channel grayscale image containing only the edges of the objects in the image. There are several algorithms for this task; in this system the Canny algorithm is used. The algorithm consists of several stages.

Noise reduction
Because the Canny edge detector uses a filter based on the first derivative of a Gaussian, it is susceptible to noise present in raw unprocessed image data, so to begin with, the raw image is convolved with a Gaussian filter. The result is a slightly blurred version of the original which is not affected by a single noisy pixel to any significant degree. An example of a 5x5 Gaussian filter with σ = 1.4 is:

          1   | 2  4  5  4  2 |
B   =   ----- | 4  9 12  9  4 |
         159  | 5 12 15 12  5 |
              | 4  9 12  9  4 |
              | 2  4  5  4  2 |

Finding the intensity gradient
An edge in an image may point in a variety of directions, so the Canny algorithm uses four filters to detect horizontal, vertical and diagonal edges in the blurred image. The edge detection operator returns a value for the first derivative in the horizontal direction (Gx) and in the vertical direction (Gy). From these, the edge gradient magnitude and direction can be determined:

G = sqrt(Gx^2 + Gy^2)
Θ = atan2(Gy, Gx)

The edge direction angle is rounded to one of four angles representing vertical, horizontal and the two diagonals (0, 45, 90 and 135 degrees, for example).

Non-maximum suppression
Given estimates of the image gradients, a search is then carried out to determine whether the gradient magnitude assumes a local maximum in the gradient direction.
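As an illustration (a minimal sketch, not the project's actual code), the gradient magnitude and the rounding of the gradient direction to the four canonical angles might look like this in C++:

```cpp
#include <cassert>
#include <cmath>

// Combine the horizontal derivative Gx and vertical derivative Gy at one
// pixel into a gradient magnitude, and round the gradient direction to one
// of the four canonical angles used by the non-maximum suppression stage.
struct Gradient {
    double magnitude;
    int direction;  // rounded to 0, 45, 90 or 135 degrees
};

Gradient gradientAt(double gx, double gy) {
    const double PI = 3.14159265358979323846;
    Gradient g;
    g.magnitude = std::sqrt(gx * gx + gy * gy);
    double angle = std::atan2(gy, gx) * 180.0 / PI;  // in degrees
    if (angle < 0) angle += 180.0;                   // fold into [0, 180]
    if (angle < 22.5 || angle >= 157.5)      g.direction = 0;
    else if (angle < 67.5)                   g.direction = 45;
    else if (angle < 112.5)                  g.direction = 90;
    else                                     g.direction = 135;
    return g;
}
```

A purely horizontal derivative (Gx = 1, Gy = 0) rounds to the 0-degree bin, while equal derivatives round to 45 degrees, matching the four cases used next.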
So, for example:
• if the rounded angle is 0 degrees, the point is considered to be on the edge if its intensity is greater than the intensities in the west and east directions;
• if the rounded angle is 90 degrees, the point is considered to be on the edge if its intensity is greater than the intensities in the north and south directions;
• if the rounded angle is 135 degrees, the point is considered to be on the edge if its intensity is greater than the intensities in the north-west and south-east directions;
• if the rounded angle is 45 degrees, the point is considered to be on the edge if its intensity is greater than the intensities in the north-east and south-west directions.

From this stage, referred to as non-maximum suppression, a set of edge points is obtained, in the form of a binary image. These are sometimes referred to as "thin edges".

Tracing edges through the image and hysteresis thresholding
Large intensity gradients are more likely to correspond to edges than small ones. In most cases it is impossible to specify a threshold at which a given intensity gradient switches from corresponding to an edge to not doing so. Therefore Canny uses thresholding with hysteresis, which requires two thresholds: high and low. Making the assumption that important edges should lie along continuous curves in the image allows us to follow a faint section of a given line and to discard a few noisy pixels that do not constitute a line but have produced large gradients. We therefore begin by applying the high threshold. This marks out the edges we can be fairly sure are genuine. Starting from these, using the directional information derived earlier, edges can be traced through the image. While tracing an edge, we apply the lower threshold, allowing us to trace faint sections of edges as long as we find a starting point.
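The two-threshold tracing step above can be sketched as follows (an illustrative example, not the report's code): pixels above the high threshold seed the edge map, and pixels above the low threshold are kept only if they connect back to a seed.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hysteresis thresholding on a small gradient-magnitude grid. Pixels with
// magnitude >= high seed the edge map; pixels >= low are kept only if they
// are 8-connected to a seed, so faint edge sections survive as long as
// they touch a strong section somewhere.
std::vector<std::vector<int>> hysteresis(
    const std::vector<std::vector<double>>& mag, double low, double high) {
    int rows = (int)mag.size(), cols = (int)mag[0].size();
    std::vector<std::vector<int>> edge(rows, std::vector<int>(cols, 0));
    std::queue<std::pair<int, int>> q;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (mag[r][c] >= high) {          // confident edge pixels
                edge[r][c] = 1;
                q.push(std::make_pair(r, c));
            }
    while (!q.empty()) {                      // trace through weak pixels
        int r = q.front().first, c = q.front().second;
        q.pop();
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                int nr = r + dr, nc = c + dc;
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (!edge[nr][nc] && mag[nr][nc] >= low) {
                    edge[nr][nc] = 1;
                    q.push(std::make_pair(nr, nc));
                }
            }
    }
    return edge;
}
```

In a row such as {9, 5, 0, 5} with low = 4 and high = 8, the first weak pixel is kept because it touches the strong one, while the last weak pixel is discarded as isolated.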
Once this process is complete we have a binary image where each pixel is marked as either an edge pixel or a non-edge pixel. Using complementary output from the edge tracing step, the binary edge map obtained in this way can also be treated as a set of edge curves, which after further processing can be represented as polygons in the image domain.

Figure: Image before/after Canny edge detection

Line Detection
A straight line can be described as y = mx + b and can be graphically plotted for each pair of image points (x, y). A main idea of the Hough transform is to consider the characteristics of the straight line not in terms of image points (x1, y1), (x2, y2), etc., but in terms of its parameters, i.e., the slope parameter m and the intercept parameter b. Based on this, the straight line y = mx + b can be represented as a point (b, m) in the parameter space. However, vertical lines give rise to unbounded values of the parameters m and b. For computational reasons, it is therefore better to use a different pair of parameters, denoted r and θ (theta), for the lines in the Hough transform. The parameter r represents the distance between the line and the origin, while θ is the angle of the vector from the origin to the closest point on the line. Using this parametrization, the equation of the line can be written as

r = x cos θ + y sin θ

It is therefore possible to associate with each line of the image a pair (r, θ), which is unique if θ ∈ [0, 180°) and r ∈ R, or if θ ∈ [0, 360°) and r ≥ 0. The (r, θ) plane is sometimes referred to as the Hough space for the set of straight lines in two dimensions. This representation makes the Hough transform conceptually very close to the two-dimensional Radon transform. (They can be seen as different ways of looking at the same transform.) For an arbitrary point on the image plane with coordinates, e.g., (x0, y0), the lines that go through it are the pairs (r, θ) satisfying r = x0 cos θ + y0 sin θ, where r (the distance between the line and the origin) is determined by θ.
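A minimal voting sketch (hypothetical, not the report's implementation) makes this concrete: for each point, sweeping θ and computing r = x cos θ + y sin θ places one vote per angle into an accumulator, and collinear points stack their votes in the cell of their shared line.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Each image point (x, y) votes, for every angle theta in [0, 180) degrees,
// into the accumulator cell (theta, r) with r = x*cos(theta) + y*sin(theta).
// Collinear points pile their votes into the cell of their common line.
std::vector<std::vector<int>> houghVotes(
    const std::vector<std::pair<double, double>>& points, int rMax) {
    const double PI = 3.14159265358979323846;
    // rows: theta in degrees [0, 180); cols: r shifted by rMax to stay non-negative
    std::vector<std::vector<int>> acc(180, std::vector<int>(2 * rMax + 1, 0));
    for (size_t i = 0; i < points.size(); ++i)
        for (int t = 0; t < 180; ++t) {
            double theta = t * PI / 180.0;
            int r = (int)std::lround(points[i].first * std::cos(theta) +
                                     points[i].second * std::sin(theta));
            if (r >= -rMax && r <= rMax) acc[t][r + rMax]++;
        }
    return acc;
}
```

For the three points (1,1), (2,2), (3,3) on the line y = x, whose normal form is θ = 135°, r = 0, the cell (135, 0) collects all three votes.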
This corresponds to a sinusoidal curve in the (r, θ) plane, which is unique to that point. If the curves corresponding to two points are superimposed, the location (in Hough space) where they cross corresponds to a line (in the original image space) that passes through both points. More generally, a set of points that form a straight line will produce sinusoids which cross at the parameters of that line. Thus, the problem of detecting collinear points can be converted into the problem of finding concurrent curves. Using the Hough transform we can find all the lines in an image; this helps the system recognize the phone screen area where the OCR should happen.

Figure: Line detection

Perspective Transformation
The polygon extracted from the image is usually not a rectangle, and therefore cannot be copied directly into a new image. A perspective transformation is applied in order to obtain a rectangular result image. Given the four corner correspondences dst(i) = (xi', yi') and src(i) = (xi, yi), i = 0, 1, 2, 3, the perspective map is

xi' = (m11*xi + m12*yi + m13) / (m31*xi + m32*yi + m33)
yi' = (m21*xi + m22*yi + m23) / (m31*xi + m32*yi + m33)

From these correspondences we can calculate the map_matrix (mjk), and then apply the transform to each pixel of the original image. After the transformation, a rectangular image is obtained. This image is then sent to the OCR engine.

Figure: Polygon extracted from the original image

OCR Engine using Fourier Descriptor

Fourier Descriptor
The term "Fourier Descriptor" describes a family of related image features. Generally, it refers to the use of a Fourier transform to analyze a closed planar curve. Much work has been done studying the use of the Fourier descriptor as a mechanism for shape identification, and some work has also been done using Fourier descriptors to assist in OCR. In the context of OCR, the planar curve is generally derived from a character boundary. Since each of a character's boundaries is a closed curve, the sequence of (x, y) coordinates that specifies the curve is periodic. This makes it ideal for analysis with a Discrete Fourier Transform.
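The boundary-to-descriptor step can be sketched as follows (a hypothetical illustration, not the engine's actual code): the boundary's (x, y) samples are treated as a periodic complex sequence z_k = x_k + i*y_k, and the magnitudes of its DFT coefficients, with the DC term dropped, form a translation-invariant shape signature.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Compute Fourier descriptors of a closed boundary: the DFT coefficient
// magnitudes for u = 1..n-1. Skipping u = 0 (the centroid/DC term) makes
// the descriptor invariant to translation of the whole boundary.
std::vector<double> fourierDescriptor(
    const std::vector<std::complex<double>>& boundary) {
    const double PI = 3.14159265358979323846;
    size_t n = boundary.size();
    std::vector<double> desc;
    for (size_t u = 1; u < n; ++u) {
        std::complex<double> c(0.0, 0.0);
        for (size_t k = 0; k < n; ++k) {
            double ang = -2.0 * PI * (double)(u * k) / (double)n;
            c += boundary[k] * std::complex<double>(std::cos(ang), std::sin(ang));
        }
        desc.push_back(std::abs(c));
    }
    return desc;
}
```

Shifting every boundary point by the same offset leaves the descriptor unchanged, which is exactly why the DC term is discarded before comparing characters.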
In this project, the Fourier descriptor approach will be the primary way of character recognition, due to its claimed efficiency and ease of use.

Figure: A single connected component (left) and its boundary curves and centroids (right)

In this project, a simple OCR engine using the Fourier descriptor has been developed. Some experimental results will be presented in a later chapter.

OCR Engine using Artificial Neural Networks

ANN
The Artificial Neural Network (ANN) is a tool well suited to this kind of problem. An ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an ANN is its topology: the ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons, and this is true for ANNs as well. Learning typically occurs by example through training, or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems. Originating in the late 1950s, neural networks did not gain much popularity until the 1980s, a computer-booming era. Today ANNs are mostly used to solve complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution, or for which an algorithmic solution is too complex to find) and are often well suited to problems that people are good at solving but for which traditional methods are not.
They are good pattern recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They offer ideal solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn. However, an ANN is potentially more complex than the Fourier descriptor approach, so it will serve as a point of comparison to the Fourier descriptor approach unless it proves efficient enough to be feasible to deploy on a mobile device. A basic ANN OCR engine has been successfully developed. It outperforms the Fourier descriptor OCR engine and will be the main engine in this project in the future. Some experimental results will be presented in the next chapter.

Front-end GUI program for character recognition
The front-end GUI program invokes the OCR engine to generate the result text. Before doing this, the front-end program also preprocesses the image in order to extract the text area. This is a demo of the program GUI; the GUI is still preliminary at this stage. The central part is the camera view, and the text box under the camera view is where the result is displayed.

Overall Progress Summary

System Architecture
Camera
• Get frame from camera buffer
Preprocessing
• Make a copy of the frame
• Convert the copy to a grayscale image
• Edge detection
• Line detection
• Search for a polygon
• Get the polygon corner points
• Transform the polygon in the original image to a new rectangular image
ANN OCR Engine
• Run OCR on the image
• Output the result on the UI if the result text contains a valid header and tail

Development Environment
The system has been developed on a 32-bit Ubuntu 10.10 PC. All code is written in C++. The front-end program utilizes the Qt UI library to render the GUI.

OCR Engine
Two OCR engines have been developed.
One uses the Fourier descriptor and the other uses an ANN. Although both are still fairly shallow, they can generate very accurate results under good conditions. The ANN gives better results and is therefore chosen for future development.

Preliminary Experiment Result
The following image demonstrates a trial run of the system. The result is not always correct, due to the icon shown on the Android device.

Problems to Tackle in the Future

Result Unstable
Since the system runs fully in real time, one code will be processed many times, and the results from a single scenario are not all the same: the result changes from time to time. A probabilistic approach may be applicable to tackle this issue.

Interference from icon
The Android system message and chat programs display an icon next to each message. This icon introduces interference when recognizing the text. The icon might be removed by pattern recognition, since it is always square.

Camera capability
The camera used does not have auto-focus capability, which introduces significant inaccuracy in the result because the image is not well focused.

References
1. Canny, J., "A Computational Approach to Edge Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.
2. Deriche, R., "Using Canny's Criteria to Derive a Recursively Implemented Optimal Edge Detector," Int. J. Computer Vision, Vol. 1, pp. 167–187, April 1987.
3. Shapiro, Linda and Stockman, George, Computer Vision, Prentice-Hall, Inc., 2001.
4. Duda, R. O. and Hart, P. E., "Use of the Hough Transformation to Detect Lines and Curves in Pictures," Comm. ACM, Vol. 15, pp. 11–15, January 1972.
5. Hough, P.V.C., "Machine Analysis of Bubble Chamber Pictures," Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959.
6. Shapiro, Linda and Stockman, George, Computer Vision, p. 257, Prentice Hall, Upper Saddle River, 2001. ISBN 0130307963.
7. Moravec, H., "Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover," Tech Report CMU-RI-TR-3, Carnegie-Mellon University, Robotics Institute, 1980.
8. Harris, C. and Stephens, M., "A Combined Corner and Edge Detector," Proceedings of the 4th Alvey Vision Conference, pp. 147–151, 1988.
9. Shi, J. and Tomasi, C., "Good Features to Track," 9th IEEE Conference on Computer Vision and Pattern Recognition, June 1994.
10. Tomasi, C. and Kanade, T., "Detection and Tracking of Point Features," Pattern Recognition, 2004.
11. Lindeberg, T., Scale-Space Theory in Computer Vision, Springer, 1994. ISBN 0-7923-9418-6.