Uploaded by mail

An Efficient OCR System based on the Regional Feature using the ASVM as Classifier

International Journal of Trend in Scientific
Research and Development (IJTSRD)
International Open Access Journal
ISSN No: 2456 - 6470 | www.ijtsrd.com | Volume - 1 | Issue – 5
An Efficient OCR System based on the Regional Feature
using the ASVM as Classifier
Maninder Kaur
Research Scholar, ECE Department,
Rayat and Bahra University, Mohali
Ms. Manjeet Kaur
Assistant Professor, ECE Department,
Rayat and Bahra University, Mohali
ABSTRACT
In Image Processing, sometimes due to poor
handwriting, the writer left some gap between
diacritics and character or between diacritics and
header line due to which small text blocks gets
created which leads to improper text line
segmentation and hence leads to wrong results and
overlapping. As a result accuracy of the algorithm
degrades. In proposed work Adaptive SVM will be
used to improve accuracy of the system.
INTRODUCTION:
Optical character recognition which is also commonly
known as optical character reader is the process of
converting the mechanical and electronic images into
the handwritten, printed text etc.OCR is a course by
which focused software is used to change the
skimmed pictures of manuscript to electronic text so
that digitized data can be examined, indexed and
recovered. The OCR are basically design to settled
and improved the multiple real world applications
such as mining data from business documents, checks,
passports, invoices, bank statements, insurance
documents, license plates etc. Each and every
application contains the processing data sets that
contains the hundreds and thousands of scanned
documents of the images in order to train and enhance
the systems and the processing of the drill data set is
naturally done by humans in order to provide accurate
data that can be used by the engine to learn and apply,
makes it smarter. OCR is mainly used as a form of
information entry from printed paper data records,
whether passport documents, invoices, bank
statements, computerized receipts, business cards,
mail, printouts of static-data, or any suitable
documentation. OCR is mostly used in the area of the
computer vision, artificial intelligence, and pattern
recognition. Handwriting text recognition (HTR) can
be defined as the ability of a computer to transform
handwritten input represented in its spatial form of
graphical
marks
into
equivalent
symbolic
representation as ASCII text. Usually, this
handwritten input comes from sources such as paper
documents, photographs or electronic pens and touchscreens.
Fig.1 Handwritten document Convert into Text
Approaches for learning Optical character
recognition:-The following are the approaches for
learning Optical character recognition.
➢ Histogram Approach:- This method is based on
the pixel histogram in which a Y-histogram
forecast is achieved which results in text line
position and to divide the line into different areas
a threshold is applied.
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
Page: 1226
➢
➢
➢
➢
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
Header line and base line detection based OCR involved various steps to read the characters
method: -This method calculates the header line from a scanned Image. In proposed research, a model
and base line of a text document for line has been built for handwritten images. The system
segmentation.
extracts the characters from handwritten images and
Line Segmentation: This algorithm is based on writes into text file.
the projection profile method. This algorithm
professionally pacts with skewed text as well as Flow Chart OF OCR Model
with the overlapped and touched text lines.
Character Segmentation: To extract characters
Pre-processing
with overlapping, this method helps to removes
the vowel converters and consonant transformers.
By removing the consonant and vowel converters,
Segmentation
the word image contain only the base characters
with clear paths between them.
Normalization
Overlapping and Touching of Characters:- Due
to overlapping and touching of characters, there
may contains no important break between text
Feature extraction
lines and when the two or more text lines comes in
a same text block then they leads to wrong results.
Classification
Post -processing
Fig.3 Steps of OCR System
Fig.2 Overlapping of Character
Machine learning techniques used in hand written
recognition:
Artificial Neural Network (ANN) An Artificial
Neuron is basically an engineering approach of
biological neuron. ANN consists of a number of
nodes, called neurons. Neural networks are typically
organized in layers. In neural network each neuron in
hidden layer receives signals from all the neurons in
the input layer. The strength of each signal and the
biases are represented by weights and constants,
which are calculated through the training phase. After
the inputs are weighted and added, the result is then
transformed by a transfer function into the output.
Support Vector Machine: SVM is a non-linear
classifier which is now mostly new in the machine
learning which is used to solve the texture
classification and pattern recognition problems.SVM
is designed to work with only two classes by
determining the hyper plane to divide two classes and
the separation of these classes is performed by using
different Kernels.
PROPOSED METHODOLOGY
Data Acquisition: Most Important initial phase in
OCR is to gather the image from either device sensor
like PDA or tablets in case on online recognition or
getting the images containing characters directly for
offline recognition. The image should have a specific
format such as JPEG, BMP etc.
Pre Processing: The goal of pre-processing is to
simplify the pattern recognition problem without
missing any vital information. It reduces the noises
and inconsistent data. It enhances and prepares it for
the next steps.
Segmentation: Segmentation is an integral part of
any text based recognition system. It assures
efficiency of classification and recognition. Accuracy
of character recognition heavily depends upon
segmentation phase.
Normalization: The results of segmentation process
provides isolated characters which are ready to pass
through feature extraction stage, thus the isolated
characters are reduced to a specific size depending on
the methods used. The segmentation process
essentially renders the image in the form of m*n
matrix.
Feature Extraction: Feature extraction is the process
of extracting the relevant
features from
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
Page: 1227
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
objects/alphabets to form a feature vectors. These Post Processing: The goal of post processing is the
feature vectors is then used by classifiers to recognize incorporation of context and shape information in all
the input unit with target output unit Feature the stages of OCR systems is necessary for
extraction methods are based on 3types of features:
meaningful improvements in recognition rates.
➢ Statistical
➢ Structural
➢ Global transformations and moments
Classification: The results Classification is the last
stage where we train the neural net using the feature
vectors obtained during feature extraction method
against the required targets
Feature extraction: The following is the feature
matching and classification algorithm for matching
the extracted plant disease image with the different
images of same plant, which are taken at different
times, from different viewpoints, or by different
sensors.
RESULT AND DISCUSSION:
No.
1
2
3
4
Images
Real Text
Recognized Text
The electrical
resistance of an
electrical conductor is
a measure of the difficulty to pass an
electric current
through that conoctor.
Difficult roads often
lead to beautiful
destinations
The electrical
resistance of an
electrical conductor is
a measure of the oiffi .
Culty to pass an
electric current
through that conoctor
Difficult roads often
lead to beautiful
destinations
It is a field of reserch
in pattern recognition,
artificial intelligence
and machine vision.
It is a field of reserch
in pattern recognition,
artificial intelligence
and machine vistion
Morning walk is a
good exercise. An
early-riser can be a
regular morning
walker.The benefit of
morning walk are
manifold.
Morning walk is a
gooo exercise. An
early-rtser can be a
regular morning
walker the benefii of
morning walk are
manifold
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
Accuracy
(%)
96.52
100
97.50
95.09
Page: 1228
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
5
6
7
The future is what will
happe in the time after
the present.Its arrival
is considerd inevitable
due to the existence of
time and the laws of
physics.
The future is what will
happe in the time after
the present its arrival is
considerd inevitable
oue to the extstence of
time and the laws of
physics.
Teamwork makes the
dreamwork
Teamwoik mapeo the
dremwoik
Neural networks can
be used , if we have a
suitable dataset for
taining and learning
purposes. dataset are
one of most important
things when
constructing new
neural network.
Neural networks can
be used , if we have a
sutable dataset for
taining and learning
purposes dataset are
one of most important
things when
constructing new
neural network
Good things take time
Goool things tare time
8
9
10
97.56
80.00
98.62
83.34
Older ocr systems
match these images
against stored bitmaps
based on specific
fonts.
Older ocr systems
match these images
against stored 8itmaps
based on specific
fonts.
Social media are
computer-mediated
technologies that
facilitate the creation
and sharing of
informaton ideas ,
career interest and
other forms of
expession via virtual
communities and
networks .
Social media are
computer-medtated
technoloc-ies that
facilitate the creation
and sharting of
informaton ideas ,
career interest and
other forms of
expession via virtual
communities and
networrs .
Overall Accuracy of Proposed work
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
98.61
97.60
94.48
Page: 1229
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
Fig.4 Accuracy of proposed work
In above figure, the accuracy of proposed work is represented in the form of graph. In graph, X-axis denotes the
number of samples which are included in the proposed work for the testing and Y-axis denotes the accuracy of
proposed work in percentage. Form the above graph; it has been observed that the average percentage of
accuracy is more than 94% with handwritten images.
Classifier
Accuracy (%)
Comparison between Classifier
ANN
SVM
86
93
ASVM
94
Fig.5 Accuracy comparison between Classifier
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
Page: 1230
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
In above figure, the accuracy comparison between
artificial neural network (ANN), support vector
machine (SVM) and adaptive support vector machine
(ASVM) is represented in the form of bar graph. From
the figure we observe the accuracy of proposed
character recognition system with ASVM is better
than ANN and SVM classifier due to the best training.
Conclusion: Due to overlapping and touching of
characters, there remains no significant gap between
the text lines and hence two or more text lines comes
in a same text block which leads to wrong results. The
main focus in this research project is to experiment
deeply with, and find alternative solutions to the
image segmentation and character recognition
problems within the Overlapped Character
Recognition. In the existing work, SVM classifies is
applied but it has less accuracy. So in future, Adaptive
SVM will be applied to improve better accuracy of
the system.
Future work: In future, we can use the artificial
neural network along with the optimization algorithm
to achieve the better results by minimizing the more
noisy data from the images for the character
recognition system. The combination of the artificial
neural network as classifier instead of SVM with
optimization technique the precision of character
recognition will have to increase and the rate of noise
will decreases.
REFERENCES
1) Chame, Shivadatt D., and Anil Kumar.
"Overlapped
Character
Recognition:
An
Innovative Approach." Advanced Computing
(IACC), 2016 IEEE 6th International Conference
on. IEEE, pp.464-469, 2016.
2) Garg N.K., Kaur L., Jindal M.K. “The
segmentation of half characters in Handwritten
Hindi Text”, Springer Verlag Berlin Heidelberg,
pp.48-53, 2011.
3) Bansal G., Sharma D., “Isolated Handwritten
Words Segmentation Techniques in Gurmukhi
Script”, International Journal of Computer
Applications, vol.1, issue.24, pp. 104-111, 2010.
4) Kumar M., Jindal M.K., Sharma R.K.,
“Segmentation of Isolated and Touching
Characters in Offline Handwritten Gurmukhi
Script Recognition”, International Journal
Information Technology and Computer Science,
pp.58- 63, Feb, 2014.
5) Kumar R., Singh A., “Algorithm to Detect and
Segment Gurmukhi Handwritten Text into Lines,
Words and Characters”, IACSIT International
Journal of Engineering and Technology, vol.3,
issue.4, 2011.
6) Kumar R., Singh A., “Detection and Segmentation
of Lines and Words in Gurmukhi Handwritten
Text” Institute of Electrical and Electronics
Engineers, pp.353-356, 2010.
7) Mangla P., Kaur H., “An End Detection
Algorithm for segmentation of Broken and
Touching characters in Gurumukhi Word”,
Handwritten Institute of Electrical and Electronics
Engineers, pp.1-4, 2014.
8) Mehta B., Rani S., “Segmentation of Broken
Characters of handwritten Gurmukhi Script”,
International Journal of Engineering Sciences,
vol.3 pp.95-105, 2014.
9) Kumar R., Singh A., “Challenges in Segmentation
of Text in Handwritten Gurmukhi Script”
Proceedings in BAIP 2010, CCIS 70, SpringerVerlag Berlin Heidelberg, pp. 388-392, 2010
10) BinnyThakral, Manoj Kumar, “Devanagari
Handwritten Text Segmentation for Overlapping
and Conjunct Characters- A Proficient
Technique”, pp.1-4, IEEE 2014.
11) M. A. Massoud, M. Sabee, M. Gergais, R. Bakhit,
“Automated new license plate recognition in
Egypt”, Alexandria Engineering Journal, vol.5,
issue.2, pp.319-326, Science Direct, 2013.
12) Ching-Liang Su, “Car Plate recognition by whole
2-D image”, Expert Systems with Applications,
vol.38, pp.7195-7200, Science Direct, 2011.
13) Dening Jiang, Tulu Muluneh Mekonnen, Tiruneh
Embiale Merkebu, Ashenafi Gebrehiwot, "Car
Plate Recognition System", ICINIS, vol.1, pp.912, IEEE, 2012.
14) Setumin, U.U. Sheikh, S.A.R Abu-Bakar, "Car
Plate Character Extraction and Recognition using
Stroke Analysis", SITIBS, vol.1, pp.30-34, IEEE,
2010.
15) Pei-Chen Tseng, Jiun-Kuei Shiung, Chun-Ting
Huang, Shih-Mine Guo, Wen-Shyang Hwang,
"Adaptive Car Plate Recognition in QoS-Aware
Security Network", SSIRI, vol.1, pp.120-127,
2008.
16) Ping Wang, Wei Zhang, "Research and
Realization of Improved Pattern Matching in
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
Page: 1231
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
License Plate Recognition", ISIITAW, vol.1,
pp.1089-1092, 2008.
17) Ratree Juntanasub, Nidapan Sureerattanan, "Car
License Plate Recognition through Hausdorff
Distance Technique", IICTAI, vol.1, pp.645-651,
IEEE, 2005.
18) Benjapa Ratchata sriprasert, Kittawee Kongpan,
Paruhat Punyarprateep, "License Plate detection
Based on Template Matching Algorithm", ICCCT,
vol.1, pp.139-143, 2012.
19) Clemens Arth, Florian Limberger and Horst
Bischof, “Real-Time License Plate Recognition on
an Embedded DSP-Platform”, Proceedings of
IEEE conference on Computer Vision and Pattern
Recognition, pp.1-8, June 2007.
20) Halina Kwasnicka and Bartosz Wawrzyniak,
“License plate localization and recognition in
camera pictures”, AI-METH 2002, November
pp.13-15, 2002.
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 1 | Issue – 5 | July-Aug 2017
Page: 1232