Character Recognition Technique using Feature Extraction

1 UdayaTheja.V, 2 Sangamesh, 3 Dr. Rajshekhar Ghogge
1 M.Tech (CNE), Dept of ISE, Dr.AIT, Bangalore, India, udayatheja.v@gmail.com
2 M.Tech (CNE), Dept of ISE, Dr.AIT, Bangalore, India, sangamesh.kollur@gmail.com
3 Associate Professor, Dept of ISE, Dr.AIT, Bangalore, India, rajsgsrm@yahoo.co.in
ABSTRACT- Character recognition has long been a critical area of artificial intelligence. Recognition is a trivial task for humans, but writing a computer program that recognizes characters is extremely difficult. Recognizing patterns is one of those things humans do well and computers do not. The reasons are the many sources of variability, the high level of abstraction, and the absence of hard-and-fast rules that define the appearance of a visual character. Hence rules need to be heuristically deduced from samples. Feature extraction is an essential step in recognizing a character through offline methods. This paper gives a detailed overview of different feature extraction techniques for character recognition.
Index Terms- Character Recognition, Feature Extraction.

I. INTRODUCTION
Character recognition is the process of classifying an input character according to a predefined character class. With the growing interest in computer applications, modern society needs computers to read text. The text may be a scanned handwritten document, typed text in various fonts, or a combination of both. A character recognition system makes communication between a human and a computer easier. Classical recognition methods are not well suited to visual characters for the following reasons [9]:
1. The “same” characters differ in size, shape and style from person to person, and even from time to time for the same person. The source of confusion is the high level of abstraction: there are thousands of type styles in common use, and a character recognition program must recognize most of them.
2. Like any image, visual characters are subject to spoilage due to noise. Noise consists of random changes to a pattern, particularly near the edges. A character with much noise may be interpreted as a completely different character by a computer program.
3. There are no hard-and-fast rules that define the appearance of a visual character. Hence rules need to be heuristically deduced from samples.
Character recognition systems are useful in license plate recognition, smart card processing, automatic data entry, bank cheque/DD processing, money counting machines, postal automation, address and zip code recognition, writer identification, etc.
There exist several different techniques for recognizing characters. One distinguishes characters by the number of loops in a character, and another by the direction of their concavities. These methods can be used one after the other to increase the accuracy and speed of recognition.
Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into computer-readable text. OCR is used as a form of data entry from some sort of original paper data source, whether passport documents, bank statements, receipts, mail, business cards, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically searched, edited, stored more compactly, displayed on-line, and used in machine processes such as machine translation, key data extraction, text-to-speech and text mining. OCR is a field of research in pattern recognition and artificial intelligence.
Optical Character Recognition is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. It is often used to convert paper books and documents into electronic files. When one scans a paper page into a computer, the result is just an image file, a photo of the page. The computer cannot understand the letters on the page, so you cannot search for words, edit the text and have the words re-wrap as you type, or change the font, as you could in a word processor. OCR software converts the image into a text or word processor file so that you can do those things. The result is much more flexible and compact than the original page photo.

Different Areas of Character Recognition:
Optical Character Recognition deals with the problem of recognizing optically processed characters. Optical recognition is performed off-line, after the writing or printing has been completed, as opposed to on-line recognition, where the computer recognizes the characters as they are written. Both hand-printed and printed characters may be recognized, but the performance depends directly on the quality of the input documents.
Fig. 1: Areas of Character Recognition
The more constrained the input, the better the performance of the OCR system. However, when it comes to totally unconstrained handwriting, OCR machines still fall short of human readers. Nevertheless, the computer reads fast, and technical advances are continually bringing the technology closer to its ideal.
Fig. 1 shows the two main types of character recognition, off-line and on-line. Off-line character recognition captures data from documents through optical scanners or cameras, whereas on-line character recognition uses digitizers that directly capture the writing, including the order of the strokes and speed information.
Off-line character recognition itself has two main types: single characters and handwritten script. Optical Character Recognition can be used for:
• Data entry for business documents, e.g. check, passport, receipt and bank statement.
• Automatic number plate recognition.
• Automatic key information extraction from insurance documents.
• Extracting business card information into a contact list.
• Making textual versions of printed documents more quickly.
• Making electronic images of printed documents searchable.
• Converting handwriting in real time to control a computer (pen computing).

II. OFF-LINE RECOGNITION
Off-line recognition operates on pictures generated by an optical scanner. The data is two-dimensional and space-ordered, which means that overlapping characters cannot be separated easily. Off-line handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained in this form is regarded as a static representation of handwriting. Off-line handwriting recognition is comparatively difficult, as different people have different handwriting styles. As of today, OCR engines are primarily focused on machine-printed text, and ICR on hand-"printed" text.
Cursive handwriting recognition can use the Hough transform and a neural network [4]. The Hough transform is a line detection technique that tolerates deformation, disconnections and noise. Instead of searching for linear strokes in the image, the global directional information at each pixel of the image is computed. This information is stored in several feature maps. Assigning a single orientation to each pixel is avoided in order to preserve useful information. Each feature map is then processed by zones in order to estimate the local orientation of the strokes. Finally, the image is recognized by a neural network classifier. Such systems work for the recognition of segmented cursive characters, cursive words, and the first letter of cursive words.
There are simple and fast algorithms for detecting italic, bold and all-capital words without doing actual character recognition [3]. Researchers present a statistical study which reveals that the detection of such words may play a key role in automatic information retrieval from documents. Moreover, detection of italicized words can be used to improve the recognition accuracy of a text recognition system. A considerable number of document images have been tested; these algorithms give accurate results on all the tested images and are easy to implement. Feature extraction is one of the important methods in off-line recognition.

III. FEATURE EXTRACTION
The idea of the feature point extraction algorithm is to identify characters based on features that are somewhat similar to the features humans use to identify characters [1][8][11]. Programmers must manually determine the properties they feel are important. Example properties include the aspect ratio, the percentage of pixels above the horizontal half point, the percentage of pixels to the right of the vertical half point, the number of strokes, the average distance from the image centre, and whether the character is unchanged when reflected about the y axis or about the x axis. Researchers have used many methods of feature extraction for handwritten characters [5]. Shadow code, fractal code, profiles, moments, templates, structural features (points, primitives), wavelets, directional features, etc., have been addressed in the literature. From the literature survey of existing work on character recognition, it was evident that not much effort has been given to feature enhancement to remove the confusion between similarly shaped characters.
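As an illustration of the hand-picked properties listed above, the following is a minimal sketch and not the method of any cited work. It assumes a binary image given as a list of rows in which any nonzero value is an ink pixel. The function name `simple_features`, the dictionary keys, the choice of width divided by height for the aspect ratio, and the use of fractions instead of percentages are all assumptions made for this example. The number of strokes is left out because it needs stroke analysis beyond simple pixel counting.

```python
from math import hypot


def simple_features(image):
    """Compute a few hand-picked properties of a binary character image.

    `image` is a list of rows; any nonzero entry counts as an ink pixel.
    All measurements are taken inside the bounding box of the ink.
    """
    ink = [(r, c) for r, row in enumerate(image)
           for c, v in enumerate(row) if v]
    if not ink:
        raise ValueError("image contains no ink pixels")

    top = min(r for r, _ in ink)
    bottom = max(r for r, _ in ink)
    left = min(c for _, c in ink)
    right = max(c for _, c in ink)
    height, width = bottom - top + 1, right - left + 1

    # Ink coordinates relative to the bounding box.
    pts = {(r - top, c - left) for r, c in ink}
    n = len(pts)
    cy, cx = (height - 1) / 2, (width - 1) / 2  # centre of the box

    return {
        # Assumed definition: width divided by height.
        "aspect_ratio": width / height,
        # Fractions (0..1) strictly above / strictly right of the centre.
        "frac_above_half": sum(r < cy for r, _ in pts) / n,
        "frac_right_of_half": sum(c > cx for _, c in pts) / n,
        "mean_dist_from_centre":
            sum(hypot(r - cy, c - cx) for r, c in pts) / n,
        # True when mirroring left-right (or top-bottom) changes nothing.
        "symmetric_about_y_axis":
            pts == {(r, width - 1 - c) for r, c in pts},
        "symmetric_about_x_axis":
            pts == {(height - 1 - r, c) for r, c in pts},
    }
```

For a small letter "T" (three ink pixels in the top row and a one-pixel stem below), this sketch reports left-right symmetry but no top-bottom symmetry, which matches how a person would describe the shape.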
This approach gives the recognizer more control over the properties used in identification. Yet any system using this approach requires substantially more development time than a neural network, because the properties are not learned automatically. Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance in character recognition systems.
1. Projection Method
The projection method [10] compresses the data through a projection. Black pixel counts are taken along parallel lines through the image area to generate marginal distributions. The direction of projection can be the horizontal axis, the vertical axis, a diagonal axis, or all of the above. Furthermore, the character can be divided vertically and horizontally into four parts and the same projection applied to each quarter, which improves the recognition rate. Fig. 3.1 [10] shows the horizontal and vertical projections of a character.
Fig. 3.1 Projection method
2. Border Transition Technique (BTT)
The border transition technique assumes that all characters are oriented vertically. Each character is partitioned into four equal quadrants. Each quadrant is scanned, and the zero-to-one transitions are counted in both the vertical and horizontal directions. Fig. 3.2 shows the partition and transitions of the character 6 using BTT.
Fig. 3.2 Border Transition Technique
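The two techniques above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation from [10]: the image is a rectangular list of rows with nonzero values for black pixels, only horizontal and vertical projections are computed, and the names `projection_profiles`, `count_transitions`, `quadrants` and `border_transition_features` are invented for this example. A scan line that already starts on a black pixel is assumed not to count as a zero-to-one transition, and for odd sizes the bottom and right quadrants simply receive the extra row or column.

```python
def projection_profiles(image):
    """Black-pixel counts per row and per column (marginal distributions)."""
    row_counts = [sum(1 for v in row if v) for row in image]
    col_counts = [sum(1 for v in col if v) for col in zip(*image)]
    return row_counts, col_counts


def count_transitions(line):
    """Number of zero-to-one transitions along one scan line."""
    bits = [1 if v else 0 for v in line]
    return sum(1 for a, b in zip(bits, bits[1:]) if a == 0 and b == 1)


def quadrants(image):
    """Split the image into top-left, top-right, bottom-left, bottom-right."""
    mid_r, mid_c = len(image) // 2, len(image[0]) // 2
    return [
        [row[:mid_c] for row in image[:mid_r]],
        [row[mid_c:] for row in image[:mid_r]],
        [row[:mid_c] for row in image[mid_r:]],
        [row[mid_c:] for row in image[mid_r:]],
    ]


def border_transition_features(image):
    """Per quadrant: (horizontal transitions, vertical transitions)."""
    features = []
    for quad in quadrants(image):
        horizontal = sum(count_transitions(row) for row in quad)
        vertical = sum(count_transitions(col) for col in zip(*quad))
        features.append((horizontal, vertical))
    return features
```

The per-quarter projection mentioned in the text can be obtained in the same way by calling `projection_profiles` on each element of `quadrants(image)`.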
A. Zoning
Zoning divides the character into smaller areas (zones) [10]. The black pixels in each zone are counted, and features are extracted by accumulating or averaging the profiles in each zone. Fig. 3.3 shows 16X16 and 8X8 zoning.
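A minimal sketch of zoning follows, assuming a rectangular binary image (list of rows, nonzero means black) whose height and width are exact multiples of the zone grid. The function name `zoning_features` and its parameters are hypothetical, and the averaging variant is used here: each feature is the fraction of black pixels in its zone.

```python
def zoning_features(image, zone_rows, zone_cols):
    """Black-pixel density of each zone, listed row by row."""
    height, width = len(image), len(image[0])
    if height % zone_rows or width % zone_cols:
        raise ValueError("image size must be a multiple of the zone grid")
    zone_h, zone_w = height // zone_rows, width // zone_cols

    features = []
    for zr in range(zone_rows):
        for zc in range(zone_cols):
            # Count black pixels inside this zone.
            black = sum(
                1
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
                if image[r][c]
            )
            # Averaging: divide the count by the zone area.
            features.append(black / (zone_h * zone_w))
    return features
```

With a 16X16 image and `zone_rows = zone_cols = 8`, each zone covers 2X2 pixels and the feature vector has 64 entries. Dropping the division gives the accumulating (raw count) variant instead.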
Fig. 3.3 16X16 to 8X8 zoning
B. Graph Matching Method
A graph matching method [4] uses structural features of a character. It is robust to changes of font and to rotation. Three features are defined. First, an end point is connected to only one pixel and carries position information. A branch point is connected to three or more pixels. It carries information about the features connected to the branch point, including the kind of feature, position and direction. A curve point is connected to two pixels. However, a point on a straight line is also connected to two pixels. In order to discriminate between a curve point and a straight line, direction information is used.
Fig. 3.4 Graph matching method
Fig. 3.4 shows the character 3, to which the graph matching method is applied; it is described using end points, branch points and curve points.

IV. CONCLUSION
Character recognition methods have developed remarkably in the last decade. A variety of techniques have emerged, influenced by developments in related fields such as image recognition and face recognition. It is hoped that this comprehensive discussion will provide insight into the concepts involved, and perhaps provoke further advances in the area. The difficulty of performing accurate recognition is determined by the nature of the text to be read and by its quality. Generally, improper segmentation rates for unconstrained material increase progressively from machine print to handprint to cursive writing. We believe that wise use of feature extraction has led to improved accuracies. Classifying a character requires features of that character. Two or more techniques can be combined to improve the accuracy of the system. We have included a list of references sufficient to provide a more detailed understanding of the approaches described.

REFERENCES
[1] Dr. P. S. Deshpande, Mrs. Latesh Malik, Mrs. Sandhya Arora, “Handwritten Devanagari Character Recognition Using Connected Segments and Minimum Edit Distance”, IEEE 2007.
[2] Rókus Arnold, Póth Miklós, “Character Recognition Using Neural Networks”, CINTI 2010, pp 311-314, 978-1-4244-9280-0/10/$26.00 ©2010 IEEE.
[3] Feng Yang, Fan Yang, “Character Recognition Using Parallel BP Neural Network”, ICALIP 2008, pp 1595-1599, 978-1-4244-1724-7/08/$25.00 ©2008 IEEE.
[4] Jieun Kim, Ho-sub Yoon, “Graph Matching Method for Character Recognition in Natural Scene Images”, INES 2011, pp 347-350, 978-1-4244-8956-5/11/$26.00 ©2011 IEEE.
[5] T. Wakabayashi, U. Pal, F. Kimura and Y. Miyake, “F-ratio Based Weighted Feature Extraction for Similar Shape Character Recognition”, ICDAR 2009, pp 196-200, 978-0-7695-3725-2/09 $25.00 ©2009 IEEE.
[6] E. Kavallieratos, N. Antoniades, N. Fakotakis and G. Kokkinakis, “Extraction and recognition of handwritten alphanumeric characters from application forms”.
[7] Rumiana Krasteva, “Bulgarian Hand-Printed Character Recognition Using Fuzzy C-Means Clustering”, “Problems of engineering and robotics”, pp 112-117.
[8] Mohammed Abu Ayshi, M. Jay Kimmel, Diane C. Simmons, “Character recognition system using spatial and structural features”, US 7,010,166 B2.
[9] Shashank Araokar, “Visual Character Recognition using Artificial Neural Networks”.
[10] Attaullah Khawaja, Shen Tingzhi, Noor Mohammad Memon, Altaf Rajpa, “Recognition of printed Chinese characters by using Neural Network”, 1-4244-0794-X/06/$20.00 ©2006 IEEE, pp 169-172.
[11] Yuk Ying Chung, Man To Wong, “Handwritten Character Recognition By Fourier Descriptors And Neural Network”, 1997 IEEE TENCON, pp 391-394.