Final Year Project Progress Report

THE HONG KONG POLYTECHNIC UNIVERSITY
Final Year Project
Progress Report
Optical Character Recognition and Chinese 2D
Code Applications
Student Name: YANG Fan
Student ID: 06846354d
Supervisor: Prof. Henry Chan
This document is the progress report for the final year project supervised by Prof. Henry Chan. It describes the progress made on the project so far.
Table of Contents
Problem Statement
Objectives and Outcome
  Objectives
  Outcome
Previous and On-going Work
  Image Preprocessing
    Edge Detection
    Line Detection
    Perspective Transformation
  OCR Engine using Fourier Descriptor
  OCR Engine using Artificial Neural Networks
  Front-end GUI program for character recognition
Overall Progress Summary
  System Architecture
  Development Environment
  OCR Engine
  Preliminary Experiment Result
  Problems to Tackle in the Future
    Result Unstable
    Interference from icon
    Camera capability
Reference
Problem Statement
Many applications focus on English 2D codes. An English 2D code can easily be sent in an SMS message and, unlike a QR code, it does not require an internet connection. A Chinese 2D code application serves the same purpose, but uses Chinese characters as the primary encoding characters.
To recognize a code distributed through SMS, an application is needed that uses optical character recognition (OCR) to read the SMS: it takes a picture of the mobile phone screen and translates the image into the code. More specifically, this project tackles the following problems:
• Which algorithm is most effective and efficient for recognizing a small subset of Chinese characters in a 2D code?
• Are there other potential uses for this OCR system?
Objectives and Outcome
Objectives
The project has two main objectives:
• Design and Implement OCR System
Design and implement a mobile OCR system that recognizes characters in an image accurately and efficiently. An effective and efficient OCR system will be the foundation of other OCR-related applications. This OCR system will most likely be based on artificial neural networks, and it only needs to recognize a small subset of Chinese characters.
• Implement Application that utilizes the OCR system
Design and implement a Chinese 2D code system that uses the OCR system to recognize codes distributed through SMS and displayed on mobile phones.
Outcome
The output of this project could benefit mobile users in general. Possible outcomes include:
• An OCR service running on a PC that converts a subset of Chinese characters contained in an image into text.
• An application that recognizes and decodes Chinese 2D codes.
• (To be decided) A program that encodes Chinese 2D codes.
Previous and On-going Work
Image Preprocessing
Image frames captured by a digital camera are usually not suitable for direct OCR, because many factors can reduce the quality of the result, such as the background of the text, the phone frame and light reflections. Image preprocessing is therefore essential to the OCR process. The following techniques are used in the system.
Edge Detection
The first step of preprocessing is edge detection, which produces a single-channel grayscale image containing only the edges of the objects in the image. Several algorithms exist for this task; the system uses the Canny algorithm, which consists of the following stages.
Noise reduction
The Canny edge detector uses a filter based on the first derivative of a Gaussian. Because this filter is susceptible to noise present in raw, unprocessed image data, the raw image is first convolved with a Gaussian filter. The result is a slightly blurred version of the original that is not significantly affected by any single noisy pixel.
Here is an example of a 5x5 Gaussian filter with σ = 1.4, applied to a source image A to produce the smoothed image B:
$$B = \frac{1}{159}\begin{bmatrix} 2 & 4 & 5 & 4 & 2 \\ 4 & 9 & 12 & 9 & 4 \\ 5 & 12 & 15 & 12 & 5 \\ 4 & 9 & 12 & 9 & 4 \\ 2 & 4 & 5 & 4 & 2 \end{bmatrix} * A$$
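A 5x5 Gaussian kernel with σ = 1.4 can also be computed directly. The C++ sketch below samples a 2D Gaussian on a 5x5 grid, normalizes it so the weights sum to 1, and convolves a row-major grayscale image with it. The helper names `gaussianKernel` and `convolve` are illustrative and do not come from the project code, and the border policy of clamping coordinates to the image edge is an assumption. Multiplying the normalized weights by 159 and rounding them gives the integer weights commonly quoted for this filter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Builds a (2*radius+1) x (2*radius+1) Gaussian kernel with standard
// deviation sigma, normalized so that its weights sum to 1.
std::vector<std::vector<double>> gaussianKernel(int radius, double sigma) {
    assert(radius >= 0 && sigma > 0.0);
    const int size = 2 * radius + 1;
    std::vector<std::vector<double>> k(size, std::vector<double>(size));
    double sum = 0.0;
    for (int y = -radius; y <= radius; ++y) {
        for (int x = -radius; x <= radius; ++x) {
            double w = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            k[y + radius][x + radius] = w;
            sum += w;
        }
    }
    for (auto& row : k)
        for (double& w : row) w /= sum;  // normalize so overall brightness is preserved
    return k;
}

// Convolves a row-major grayscale image (width x height) with a square kernel.
// Pixels outside the image are replaced by the nearest edge pixel (clamping).
std::vector<double> convolve(const std::vector<double>& img, int width, int height,
                             const std::vector<std::vector<double>>& k) {
    assert(static_cast<int>(img.size()) == width * height);
    const int radius = static_cast<int>(k.size()) / 2;
    std::vector<double> out(img.size(), 0.0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            double acc = 0.0;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int sx = std::min(std::max(x + dx, 0), width - 1);
                    int sy = std::min(std::max(y + dy, 0), height - 1);
                    acc += k[dy + radius][dx + radius] * img[sy * width + sx];
                }
            out[y * width + x] = acc;
        }
    return out;
}
```

Because the Gaussian is symmetric, convolution and correlation give the same result here, so the kernel does not need to be flipped.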
Finding the intensity gradient
An edge in an image may point in a variety of directions, so the Canny algorithm uses four filters to detect horizontal, vertical and diagonal edges in the blurred image. The edge detection operator returns the first derivative in the horizontal direction (Gx) and in the vertical direction (Gy). From these, the edge gradient magnitude and direction can be determined:
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \Theta = \operatorname{atan2}(G_y, G_x)$$
The edge direction angle is rounded to one of four angles representing vertical, horizontal and the two diagonals (for example 0°, 45°, 90° and 135°).
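A minimal C++ sketch of this stage follows. It assumes the common 3x3 Sobel operator as the derivative filter, because the report does not say which filter the system uses. It also assumes a row-major image in which y grows downward. `sobelGradient`, `quantizeAngle` and the `Gradient` struct are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Gradient {
    std::vector<double> magnitude;  // G = sqrt(Gx^2 + Gy^2)
    std::vector<int> direction;     // quantized angle: 0, 45, 90 or 135 degrees
};

// Rounds an angle in radians (as returned by atan2) to 0, 45, 90 or 135
// degrees. Angles 180 degrees apart describe the same edge orientation.
int quantizeAngle(double thetaRadians) {
    const double kPi = std::acos(-1.0);
    double deg = thetaRadians * 180.0 / kPi;  // in [-180, 180]
    if (deg < 0.0) deg += 180.0;              // fold into [0, 180]
    if (deg < 22.5 || deg >= 157.5) return 0;
    if (deg < 67.5) return 45;
    if (deg < 112.5) return 90;
    return 135;
}

// Computes Sobel derivatives Gx (horizontal) and Gy (vertical, y downward)
// of a row-major grayscale image. Border pixels are left at zero.
Gradient sobelGradient(const std::vector<double>& img, int width, int height) {
    assert(static_cast<int>(img.size()) == width * height);
    Gradient g{std::vector<double>(img.size(), 0.0), std::vector<int>(img.size(), 0)};
    auto at = [&](int x, int y) { return img[y * width + x]; };
    for (int y = 1; y + 1 < height; ++y)
        for (int x = 1; x + 1 < width; ++x) {
            double gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1))
                      - (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1));
            double gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1))
                      - (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1));
            g.magnitude[y * width + x] = std::hypot(gx, gy);
            g.direction[y * width + x] = quantizeAngle(std::atan2(gy, gx));
        }
    return g;
}
```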
Non-maximum suppression
Given estimates of the image gradients, a search is then carried out to determine whether the gradient magnitude has a local maximum in the gradient direction. For example:
• if the rounded angle is 0 degrees, the point is considered to be on an edge if its gradient magnitude is greater than the magnitudes in the west and east directions;
• if the rounded angle is 90 degrees, the point is considered to be on an edge if its gradient magnitude is greater than the magnitudes in the north and south directions;
• if the rounded angle is 135 degrees, the point is considered to be on an edge if its gradient magnitude is greater than the magnitudes in the north-west and south-east directions;
• if the rounded angle is 45 degrees, the point is considered to be on an edge if its gradient magnitude is greater than the magnitudes in the north-east and south-west directions.
This stage, referred to as non-maximum suppression, yields a set of edge points in the form of a binary image. These are sometimes referred to as "thin edges".
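The comparison rules above can be sketched in C++ as follows. The sketch works in array coordinates, with x growing to the right, rows growing downward, and the angle taken from atan2(Gy, Gx) in that same frame. For that reason the diagonal neighbour pairs are written as array offsets rather than as compass points. Ties are kept (`>=`), which is a common practical variant of the strict comparison. `nonMaxSuppression` is an illustrative name, and the non-zero entries of its output form the thin-edge map.

```cpp
#include <cassert>
#include <vector>

// Keeps a pixel's magnitude only if it is a local maximum along the quantized
// gradient direction (0, 45, 90 or 135 degrees); all other pixels become 0.
// Array coordinates: x grows to the right, y (the row) grows downward.
std::vector<double> nonMaxSuppression(const std::vector<double>& mag,
                                      const std::vector<int>& dir,
                                      int width, int height) {
    assert(mag.size() == dir.size() && static_cast<int>(mag.size()) == width * height);
    std::vector<double> out(mag.size(), 0.0);
    for (int y = 1; y + 1 < height; ++y)
        for (int x = 1; x + 1 < width; ++x) {
            int dx = 0, dy = 0;
            switch (dir[y * width + x]) {
                case 0:   dx = 1;  dy = 0; break;  // compare right and left
                case 45:  dx = 1;  dy = 1; break;  // compare down-right and up-left
                case 90:  dx = 0;  dy = 1; break;  // compare below and above
                case 135: dx = -1; dy = 1; break;  // compare down-left and up-right
                default:  continue;                // unknown direction: suppress
            }
            double m = mag[y * width + x];
            double a = mag[(y + dy) * width + (x + dx)];
            double b = mag[(y - dy) * width + (x - dx)];
            if (m >= a && m >= b) out[y * width + x] = m;
        }
    return out;
}
```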
Tracing edges through the image and hysteresis thresholding
Large intensity gradients are more likely to correspond to edges than small ones. In most cases, however, it is impossible to specify a single threshold at which a given intensity gradient switches from corresponding to an edge to not doing so. Canny therefore uses thresholding with hysteresis.
Thresholding with hysteresis requires two thresholds, high and low. The assumption that important edges lie along continuous curves in the image allows us to follow a faint section of a given line, and to discard a few noisy pixels that do not form a line but have produced large gradients. We therefore begin by applying the high threshold, which marks the edges we can be fairly sure are genuine. Starting from these, and using the directional information derived earlier, edges are traced through the image. While tracing an edge, we apply the lower threshold. This allows faint sections of an edge to be traced as long as a starting point has been found.
Once this process is complete, we have a binary image in which each pixel is marked as either an edge pixel or a non-edge pixel. Combined with the complementary output of the edge-tracing step, the binary edge map obtained in this way can also be treated as a set of edge curves. After further processing, these curves can be represented as polygons in the image domain.
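A simplified C++ sketch of hysteresis thresholding is shown below. Instead of following the edge direction, it grows edges from the strong pixels through any 8-connected neighbour whose magnitude is at least the low threshold. This is a common simplification. The function name `hysteresis` and the threshold values used with it are assumptions.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hysteresis thresholding on a (thinned) gradient-magnitude image.
// Pixels >= high are seeds ("sure" edges); edges then grow through
// 8-connected neighbours whose magnitude is >= low.
// Returns a binary map: 1 = edge pixel, 0 = non-edge pixel.
std::vector<int> hysteresis(const std::vector<double>& mag, int width, int height,
                            double low, double high) {
    assert(low <= high && static_cast<int>(mag.size()) == width * height);
    std::vector<int> edge(mag.size(), 0);
    std::queue<std::pair<int, int>> frontier;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (mag[y * width + x] >= high) {
                edge[y * width + x] = 1;
                frontier.push({x, y});
            }
    while (!frontier.empty()) {
        int x = frontier.front().first, y = frontier.front().second;
        frontier.pop();
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                int i = ny * width + nx;
                if (!edge[i] && mag[i] >= low) {  // weak pixel connected to an edge
                    edge[i] = 1;
                    frontier.push({nx, ny});
                }
            }
    }
    return edge;
}
```

Weak pixels that are never reached from a strong pixel stay marked as non-edge, which is what suppresses isolated noise.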
Figure: Image before and after Canny edge detection
Line Detection
A straight line can be described as y = mx + b and can be plotted for each pair of image points (x, y). The main idea of the Hough transform is to describe a straight line not by its image points (x1, y1), (x2, y2), etc., but by its parameters, i.e., the slope m and the intercept b. Based on this, the straight line y = mx + b can be represented as a point (b, m) in parameter space. However, vertical lines give rise to unbounded values of the parameters m and b. For computational reasons, it is therefore better to use a different pair of parameters, denoted r and θ (theta), for lines in the Hough transform.
The parameter r is the distance between the line and the origin, while θ is the angle of the vector from the origin to the closest point on the line. Using this parametrization, the equation of the line can be written as
$$y = -\frac{\cos\theta}{\sin\theta}\,x + \frac{r}{\sin\theta},$$
which can be rearranged to
$$r = x\cos\theta + y\sin\theta.$$
It is therefore possible to associate each line of the image with a pair (r, θ), which is unique if $\theta \in [0, \pi)$ and $r \in \mathbb{R}$, or if $\theta \in [0, 2\pi)$ and $r \geq 0$. The (r, θ) plane is sometimes referred to as Hough space for the set of straight lines in two dimensions. This representation makes the Hough transform conceptually very close to the two-dimensional Radon transform (they can be seen as different ways of looking at the same transform).
For an arbitrary point on the image plane with coordinates (x0, y0), the lines that pass through it are
$$r(\theta) = x_0\cos\theta + y_0\sin\theta,$$
where r (the distance between the line and the origin) is determined by θ.
This corresponds to a sinusoidal curve in the (r,θ) plane, which is unique to that point. If the
curves corresponding to two points are superimposed, the location (in the Hough space) where
they cross corresponds to a line (in the original image space) that passes through both points.
More generally, a set of points that form a straight line will produce sinusoids which cross at the
parameters for that line. Thus, the problem of detecting collinear points can be converted to the
problem of finding concurrent curves.
Using the Hough transform, the system can find the lines in an image. This helps it locate the phone screen area where OCR should be performed.
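The voting procedure can be sketched as a small C++ function. It assumes a binary edge map, such as the one produced by the Canny stage. θ is sampled in [0, π) in `thetaSteps` steps, and r is rounded to whole pixels. `strongestLine` is a hypothetical helper that returns only the best-supported line, whereas the system needs several lines to outline the screen.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct HoughLine { double r; double thetaRadians; int votes; };

// Every edge pixel votes for all (r, theta) cells of the lines through it,
// using r = x*cos(theta) + y*sin(theta), theta in [0, pi), r in [-rMax, rMax].
// Returns the accumulator cell with the most votes.
HoughLine strongestLine(const std::vector<int>& edges, int width, int height,
                        int thetaSteps = 180) {
    assert(static_cast<int>(edges.size()) == width * height && thetaSteps > 0);
    const double kPi = std::acos(-1.0);
    const int rMax = static_cast<int>(std::ceil(std::hypot(width, height)));
    const int rBins = 2 * rMax + 1;  // r is rounded to an integer, offset by rMax
    std::vector<int> acc(static_cast<std::size_t>(thetaSteps) * rBins, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (!edges[y * width + x]) continue;
            for (int t = 0; t < thetaSteps; ++t) {
                double theta = kPi * t / thetaSteps;
                int r = static_cast<int>(std::lround(x * std::cos(theta) + y * std::sin(theta)));
                ++acc[static_cast<std::size_t>(t) * rBins + (r + rMax)];
            }
        }
    HoughLine best{0.0, 0.0, 0};
    for (int t = 0; t < thetaSteps; ++t)
        for (int rb = 0; rb < rBins; ++rb) {
            int v = acc[static_cast<std::size_t>(t) * rBins + rb];
            if (v > best.votes) best = {static_cast<double>(rb - rMax), kPi * t / thetaSteps, v};
        }
    return best;
}
```

To find the screen borders, one would instead keep several local maxima of the accumulator above a vote threshold rather than only the single strongest cell.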
Figure: Line detection
Perspective Transformation
The polygon extracted from the image is usually not a rectangle, so it cannot be copied directly into a new image. I applied a perspective transformation to obtain a rectangular result image, using the following equation:
$$\begin{bmatrix} t_i x'_i \\ t_i y'_i \\ t_i \end{bmatrix} = \text{map\_matrix} \cdot \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$
where dst(i) = (x'_i, y'_i), src(i) = (x_i, y_i), i = 0, 1, 2, 3, and t_i is a per-point scale factor.
From the four point pairs we can calculate the map_matrix, and then apply the transform to each pixel of the original image. After the transformation, a rectangular image is obtained, which is then sent to the OCR engine.
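The map_matrix can be computed from the four point pairs by fixing its bottom-right entry to 1 and solving an 8x8 linear system. The C++ sketch below solves that system with Gauss-Jordan elimination. `perspectiveMatrix`, `applyPerspective`, `Pt` and `Mat3` are illustrative names; they are not the library routine the project may actually call.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Pt { double x, y; };

// Computes the 3x3 perspective matrix mapping src[i] to dst[i] (i = 0..3).
// The bottom-right element is fixed to 1, leaving 8 unknowns that are found
// by Gauss-Jordan elimination with partial pivoting.
Mat3 perspectiveMatrix(const std::array<Pt, 4>& src, const std::array<Pt, 4>& dst) {
    double A[8][9] = {};  // augmented matrix [M | rhs]
    for (int i = 0; i < 4; ++i) {
        double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        double r0[9] = {x, y, 1, 0, 0, 0, -x * u, -y * u, u};  // equation for x'
        double r1[9] = {0, 0, 0, x, y, 1, -x * v, -y * v, v};  // equation for y'
        for (int j = 0; j < 9; ++j) { A[2 * i][j] = r0[j]; A[2 * i + 1][j] = r1[j]; }
    }
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int row = col + 1; row < 8; ++row)
            if (std::fabs(A[row][col]) > std::fabs(A[pivot][col])) pivot = row;
        assert(std::fabs(A[pivot][col]) > 1e-12 && "degenerate point configuration");
        for (int j = 0; j < 9; ++j) std::swap(A[col][j], A[pivot][j]);
        for (int row = 0; row < 8; ++row) {
            if (row == col) continue;
            double f = A[row][col] / A[col][col];
            for (int j = col; j < 9; ++j) A[row][j] -= f * A[col][j];
        }
    }
    double h[8];
    for (int i = 0; i < 8; ++i) h[i] = A[i][8] / A[i][i];
    return {{{h[0], h[1], h[2]}, {h[3], h[4], h[5]}, {h[6], h[7], 1.0}}};
}

// Applies the matrix to a point: (t*x', t*y', t) = M * (x, y, 1).
Pt applyPerspective(const Mat3& m, Pt p) {
    double tx = m[0][0] * p.x + m[0][1] * p.y + m[0][2];
    double ty = m[1][0] * p.x + m[1][1] * p.y + m[1][2];
    double t  = m[2][0] * p.x + m[2][1] * p.y + m[2][2];
    return {tx / t, ty / t};
}
```

To fill the rectangular output image without holes, a common approach is to compute the matrix in the other direction, from the destination corners to the source corners. Each output pixel is then looked up in the original image.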
Figure: Polygon extracted from the original image
OCR Engine using Fourier Descriptor
Fourier Descriptor
The term "Fourier descriptor" describes a family of related image features. Generally, it refers to the use of a Fourier transform to analyze a closed planar curve. Much work has been done on using Fourier descriptors as a mechanism for shape identification, and some work has also used them to assist OCR. In the context of OCR, the planar curve is generally derived from a character boundary. Since each character boundary is a closed curve, the sequence of (x, y) coordinates that specifies the curve is periodic, which makes it ideal for analysis with a discrete Fourier transform. In this project, the Fourier descriptor approach will be the primary means of character recognition because of its claimed efficiency and ease of use.
Figure: A single connected component (left) and its boundary curves and centroids (right).
A simple OCR engine based on Fourier descriptors has been developed in this project. Experimental results are presented in a later chapter.
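The following C++ sketch is a minimal example of Fourier descriptors. It assumes each character boundary has already been traced as an ordered list of points, with all boundaries traced in the same direction (counter-clockwise here). Each point is treated as a complex number, a naive DFT is taken, and the coefficient magnitudes are normalized. `fourierDescriptors` is a hypothetical name, and the normalization shown is one common choice that is not necessarily the one used by the project's engine.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Computes translation-, scale-, rotation- and start-point-invariant Fourier
// descriptors of a closed boundary given as ordered points z = x + iy.
//  - coefficient 0 (related to the centroid) is dropped -> translation invariance
//  - only magnitudes are kept                            -> rotation/start-point invariance
//  - magnitudes are divided by |Z[1]|                    -> scale invariance
std::vector<double> fourierDescriptors(const std::vector<std::complex<double>>& boundary,
                                       int count) {
    const int n = static_cast<int>(boundary.size());
    assert(n >= 3 && count >= 1 && count < n);
    const double kPi = std::acos(-1.0);
    std::vector<std::complex<double>> Z(n);  // zero-initialized
    for (int k = 0; k < n; ++k)
        for (int t = 0; t < n; ++t)  // naive O(N^2) DFT of the periodic sequence
            Z[k] += boundary[t] * std::polar(1.0, -2.0 * kPi * k * t / n);
    const double scale = std::abs(Z[1]);
    assert(scale > 0.0);  // fails for clockwise boundaries with this normalization
    std::vector<double> fd;
    for (int k = 1; k <= count; ++k) fd.push_back(std::abs(Z[k]) / scale);
    return fd;
}
```

Two boundaries of the same character shape should then give nearly identical descriptor vectors, so a simple nearest-neighbour comparison against stored templates could serve as the classifier.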
OCR Engine using Artificial Neural Networks
ANN
The artificial neural network (ANN) is a powerful tool for this kind of problem. An ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on analogies with adaptive biological learning. The key element of an ANN is its topology: it consists of a large number of highly interconnected processing elements (nodes) tied together by weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections between neurons, and the same is true for ANNs. Learning typically occurs by example, through training on a set of input/output data (patterns), during which the training algorithm adjusts the link weights. The link weights store the knowledge needed to solve specific problems.
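The idea of learning by adjusting link weights can be illustrated with a very small network. The C++ sketch below is a single-layer network with no hidden layer and sigmoid outputs, trained by stochastic gradient descent on a cross-entropy loss. The 3x3 binary glyphs, the three classes and the names `SingleLayerNet`, `train` and `classify` are hypothetical. This report does not describe the project's actual network topology.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int kInputs = 9;   // hypothetical 3x3 binary glyph, row-major
constexpr int kClasses = 3;  // hypothetical number of characters

// Every input pixel is linked to every output node by a weight; each output
// node applies a sigmoid. Training adjusts the link weights.
struct SingleLayerNet {
    double w[kClasses][kInputs + 1] = {};  // last column holds the bias weight

    std::array<double, kClasses> forward(const std::array<double, kInputs>& x) const {
        std::array<double, kClasses> out{};
        for (int c = 0; c < kClasses; ++c) {
            double s = w[c][kInputs];
            for (int i = 0; i < kInputs; ++i) s += w[c][i] * x[i];
            out[c] = 1.0 / (1.0 + std::exp(-s));
        }
        return out;
    }

    // One stochastic gradient step on a labelled example (cross-entropy loss):
    // each weight moves against (output - target) * input.
    void train(const std::array<double, kInputs>& x, int label, double rate) {
        assert(label >= 0 && label < kClasses);
        auto out = forward(x);
        for (int c = 0; c < kClasses; ++c) {
            double err = out[c] - (c == label ? 1.0 : 0.0);
            for (int i = 0; i < kInputs; ++i) w[c][i] -= rate * err * x[i];
            w[c][kInputs] -= rate * err;
        }
    }

    // Returns the class whose output node is most active.
    int classify(const std::array<double, kInputs>& x) const {
        auto out = forward(x);
        int best = 0;
        for (int c = 1; c < kClasses; ++c)
            if (out[c] > out[best]) best = c;
        return best;
    }
};
```

Consider training it for a couple of thousand passes on three 3x3 bitmaps: a horizontal stroke, a vertical stroke and a cross. That is enough for `classify` to tell them apart, because each class is linearly separable from the other two. Real character images would need far more inputs, more training data and, most likely, a hidden layer.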
Neural networks originated in the late 1950s but did not gain much popularity until the 1980s, a period of rapid growth in computing. Today ANNs are mostly used to solve complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that have no algorithmic solution, or whose algorithmic solution is too complex to be found). They are also often well suited to problems that people are good at solving but traditional methods are not. They are good pattern recognition engines and robust classifiers, able to generalize when making decisions based on imprecise input data. They offer ideal solutions to a variety of classification problems such as speech, character and signal recognition. They are also used for functional prediction and system modeling where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn.
However, an ANN is potentially more complex than the Fourier descriptor approach. It will therefore serve as a point of comparison for the Fourier descriptor approach, unless it proves efficient enough to be deployed on a mobile device.
A basic ANN OCR engine has been successfully developed. It outperforms the Fourier descriptor OCR engine and will become the main engine of this project. Experimental results are presented in the next chapter.
Front-end GUI program for character recognition
The front-end GUI program invokes the OCR engine to generate the result text. Before doing so, it preprocesses the image to extract the text area. The following is a demo of the program GUI.
The GUI is still preliminary at this stage. The central part is the camera view, and the text box under the camera view is where the result is displayed.
Overall Progress Summary
System Architecture
Camera
• Get frame from camera buffer
Preprocessing
• Make a copy of the frame
• Convert the copy to a grayscale image
• Edge detection
• Line detection
• Search for the polygon
• Get the polygon corner points
• Transform the polygon in the original image into a new rectangular image
ANN OCR Engine
• Run OCR on the image
• Output the result on the UI if the result text contains a valid header and tail
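The last step of the pipeline is to show a result only when it has a valid header and tail. That step can be sketched as below. This report does not specify the actual header and tail of the Chinese 2D code, so they are passed in as parameters. `extractPayload` is a hypothetical name. Because it compares bytes, it also works for UTF-8 encoded Chinese markers.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the text between `header` and `tail` if the recognized text starts
// with the header and ends with the tail; otherwise returns std::nullopt so
// that this frame's result is not shown on the UI.
std::optional<std::string> extractPayload(const std::string& text,
                                          const std::string& header,
                                          const std::string& tail) {
    assert(!header.empty() && !tail.empty());
    if (text.size() < header.size() + tail.size()) return std::nullopt;
    if (text.compare(0, header.size(), header) != 0) return std::nullopt;
    if (text.compare(text.size() - tail.size(), tail.size(), tail) != 0) return std::nullopt;
    return text.substr(header.size(), text.size() - header.size() - tail.size());
}
```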
Development Environment
The system is being developed on a 32-bit Ubuntu 10.10 PC. All code is written in C++, and the front-end program uses the Qt UI library to render the GUI.
OCR Engine
Two OCR engines have been developed: one based on Fourier descriptors and one based on an ANN. Although both are still basic, they produce very accurate results under good conditions. The ANN engine produces better results and has therefore been chosen for future development.
Preliminary Experiment Result
The following image shows a trial run of the system. The result is not always correct because of the icon displayed on the Android device.
Problems to Tackle in the Future
Result Unstable
Because the system runs in real time, the same code is processed many times. The results obtained from the same scene are not always identical and change from frame to frame. A probabilistic approach may help to tackle this issue.
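This report leaves the probabilistic approach open. A simpler frequency-based stand-in, which is not the approach proposed here, would combine the results from consecutive frames by majority vote. A result is accepted only after it has been seen a minimum number of times. The function name `majorityResult` and its `minVotes` parameter are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Combines the OCR results of several frames of the same scene by majority
// vote. The string that first reaches the highest count wins; an empty string
// is returned if no result has been seen at least `minVotes` times.
std::string majorityResult(const std::vector<std::string>& frameResults, int minVotes) {
    assert(minVotes >= 1);
    std::map<std::string, int> counts;
    std::string best;
    int bestCount = 0;
    for (const auto& r : frameResults) {
        int c = ++counts[r];
        if (c > bestCount) { bestCount = c; best = r; }
    }
    return bestCount >= minVotes ? best : std::string();
}
```

A per-character vote over results of equal length, or weighting each frame by a recognition confidence, would be natural refinements toward a probabilistic scheme.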
Interference from icon
The Android system messaging program and chat programs display an icon next to each message. This icon interferes with text recognition. Since the icon is always square, it might be detected and removed by pattern recognition.
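One possible way to recognize the square icon is a geometric test. This assumes the icon's four corner points have already been found with the same polygon search used for the screen area. The test checks for four nearly equal sides and nearly equal diagonals, since a rhombus with equal diagonals is a square. The sketch below is a hypothetical heuristic (`isRoughlySquare` is an illustrative name), and the tolerance would need tuning on real images.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Returns true if the quadrilateral whose corners are given in order is close
// to a square: all sides within `tolerance` (relative) of their mean length,
// and the two diagonals within `tolerance` of each other.
bool isRoughlySquare(const std::array<Point, 4>& c, double tolerance) {
    assert(tolerance >= 0.0);
    auto dist = [](Point a, Point b) { return std::hypot(a.x - b.x, a.y - b.y); };
    double sides[4];
    for (int i = 0; i < 4; ++i) sides[i] = dist(c[i], c[(i + 1) % 4]);
    double mean = (sides[0] + sides[1] + sides[2] + sides[3]) / 4.0;
    if (mean <= 0.0) return false;
    for (double s : sides)
        if (std::fabs(s - mean) > tolerance * mean) return false;  // not a rhombus
    double d1 = dist(c[0], c[2]), d2 = dist(c[1], c[3]);
    return std::fabs(d1 - d2) <= tolerance * std::max(d1, d2);  // equal diagonals
}
```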
Camera capability
The camera I am using does not have auto-focus, so the images are often out of focus, which greatly reduces the accuracy of the results.