Challenges in Online Handwritten Character Recognition in Punjabi

ONLINE HANDWRITTEN GURMUKHI
SCRIPT RECOGNITION AND ITS
CHALLENGES
R. K. SHARMA
THAPAR UNIVERSITY, PATIALA
Handwriting Recognition System
A handwriting recognition (HWR) system is a computer system that
recognizes characters and other symbols written by hand in natural
handwriting.
Types of HWR systems
HWR
Off-line HWR
On-line HWR
• Off-line handwriting recognition: the handwritten document is scanned and then recognized by the machine.
• On-line handwriting recognition: the handwriting is recognized while it is being written.
[Diagram arrow: increasing complexity]
Handwriting Recognition System
• Writer dependent
• Writer independent
• Closed-vocabulary
• Open-vocabulary
A general recognition procedure for On-line HWR
Data Collection & Preprocessing
Feature Extraction & Segmentation
Recognition Methods & Post-processing
Data Collection
Input Pen Writing
Store pen movements
Text/Other file created
Text/Other file to be converted to a suitable format
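The capture flow above (pen input, stored pen movements, a file in a suitable format) could be sketched as follows. This is only an illustration: the class names `PenPoint`, `Stroke`, and `Sample`, the use of JSON as the "suitable format", and the assumption that the device supplies a timestamp with each point are all hypothetical and not taken from the original system.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class PenPoint:
    x: float
    y: float
    t: float  # timestamp in seconds (assumed to be supplied by the device)


@dataclass
class Stroke:
    points: List[PenPoint] = field(default_factory=list)


@dataclass
class Sample:
    strokes: List[Stroke] = field(default_factory=list)

    def pen_down(self):
        """Start a new stroke when the pen touches the surface."""
        self.strokes.append(Stroke())

    def pen_move(self, x, y, t):
        """Store one pen position in the current stroke (call pen_down first)."""
        self.strokes[-1].points.append(PenPoint(x, y, t))

    def delete_stroke(self, index):
        """Remove a stroke, e.g. when the writer wants to redo it."""
        del self.strokes[index]

    def to_json(self):
        """Serialize all strokes and points to a JSON string."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text):
        """Rebuild a Sample from the JSON produced by to_json."""
        data = json.loads(text)
        return Sample([Stroke([PenPoint(**p) for p in s["points"]])
                       for s in data["strokes"]])
```

Keeping every point of every stroke, rather than a rendered image, is what later makes stroke deletion, scaling, and time-based features possible.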
Need for an application for the selected hardware device
• Pre-developed applications do not support the features that users
require, e.g., storing all pixel information for the written text,
deleting and adding strokes as required, and scaling the written text.
• A custom GUI that meets these requirements therefore needs to be developed.
Preprocessing
• Size Normalization
• Centering of text
• Interpolating missing points
• Smoothing of Text
• Slant Correction
• Resampling of points
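Three of the steps listed above (size normalization with centering, resampling, and smoothing) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the original work: a stroke is assumed to be a list of `(x, y)` tuples, the function names are hypothetical, resampling uses linear interpolation along the arc length (which also fills gaps between sparse points), and smoothing is a plain three-point moving average. Slant correction is not covered.

```python
import math


def normalize_and_center(points, size=1.0):
    """Scale a stroke to fit a size x size box (aspect ratio kept), centered at the origin."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    scale = size / extent if extent > 0 else 1.0
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    return [((x - cx) * scale, (y - cy) * scale) for x, y in points]


def resample(points, n):
    """Return n points spaced equally along the stroke's arc length."""
    if len(points) < 2 or n < 2:
        return list(points)
    # Cumulative arc length at each original point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        return [points[0]] * n
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target length.
        while j < len(points) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        a = (target - cum[j]) / seg if seg > 0 else 0.0
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + a * (x1 - x0), y0 + a * (y1 - y0)))
    return out


def smooth(points):
    """Three-point moving average; the two endpoints are kept unchanged."""
    if len(points) < 3:
        return list(points)
    middle = [((points[i - 1][0] + points[i][0] + points[i + 1][0]) / 3,
               (points[i - 1][1] + points[i][1] + points[i + 1][1]) / 3)
              for i in range(1, len(points) - 1)]
    return [points[0]] + middle + [points[-1]]
```

A typical order would be to resample first, then smooth, then normalize, but the best order and parameters depend on the device and the script.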
Feature Extraction
• A feature extractor designed by Govindaraju converts a chain-code image
into feature vectors, which are then used in the recognition phase.
• Hu et al. worked with point-oriented features, such as stroke tangents, for
handwriting recognition.
• Hu et al. also proposed a method in which high-level features were extracted
and then combined with local features at each sample point. The features
introduced could cover large input patterns and had invariance properties.
• Rocha designed a feature extractor that reduced the dimension of the problem
and provided a structural description of a character shape, consisting of a
specification of its features and their spatial inter-relations.
• A feature extractor designed by S.W. Lee extracted four directional feature
vectors with Kirsch masks and one global feature vector linearly compressed
from the normalized input image.
• Kirsch masks were also used by Chaos in the recognition of handwritten
numerals.
• Blumenstein introduced a feature extraction technique for the recognition of
segmented handwritten characters.
• A hybrid feature extraction method proposed by PiFuei was capable of
providing an effective feature set of full dimension for multiclass cases.
Feature Categories
Features
• Low-Level or Local (directions, positions, slope, area, slant, etc.)
• High-Level or Global (loops, crossings, headline, straight lines, dots, etc.)
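To make the two categories concrete, the sketch below computes one local feature (an 8-direction code for each pen segment) and one crude global feature (whether a stroke looks like a closed loop). Both functions are hypothetical illustrations, not features taken from the cited works. They assume a stroke is a list of `(x, y)` tuples with the y-axis pointing up; many tablets report y pointing down, in which case the codes are mirrored.

```python
import math


def direction_codes(points):
    """Local feature: 8-direction code per segment (0 = east, counted counter-clockwise)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # repeated points have no direction
        angle = math.atan2(y1 - y0, x1 - x0)  # in (-pi, pi]
        codes.append(int(round(angle / (math.pi / 4))) % 8)
    return codes


def looks_like_loop(points, tolerance=0.1):
    """Global feature (crude): the stroke ends close to where it started,
    measured relative to its total path length."""
    length = sum(math.hypot(x1 - x0, y1 - y0)
                 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    if length == 0:
        return False
    gap = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    return gap / length <= tolerance
```

The local feature is available at every sample point, whereas the global feature summarizes the whole stroke in one value, which is the distinction the slide draws.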
Device-based features
• The time the pen device takes to capture a stroke is one feature, since
each stroke has its own complexity. If suitable information is collected
about the time span of each stroke, it may help in the recognition process.
• The density of points in a stroke is device dependent.
• The directions of pen movement in a stroke might be helpful in
recognition.
• The area covered by a stroke.
• The pressure of the pen movements.
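The device-based quantities listed above could be computed roughly as in this sketch. It assumes, hypothetically, that the device reports each sample as an `(x, y, t, pressure)` tuple; the function name, the dictionary keys, and the choice of bounding-box area as "stroke area" are illustrative assumptions only.

```python
import math


def device_features(stroke):
    """Compute simple device-based features for one stroke.

    stroke: list of (x, y, t, pressure) samples in capture order (assumed format).
    """
    xs = [s[0] for s in stroke]
    ys = [s[1] for s in stroke]
    # Path length of the stroke, used to express point density.
    length = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(stroke, stroke[1:]))
    return {
        # Time taken to write the stroke.
        "duration": stroke[-1][2] - stroke[0][2],
        # Sampled points per unit of path length (device dependent).
        "points_per_unit_length": len(stroke) / length if length > 0 else float("inf"),
        # Overall direction from the first to the last point, in degrees.
        "net_direction_deg": math.degrees(
            math.atan2(stroke[-1][1] - stroke[0][1], stroke[-1][0] - stroke[0][0])),
        # Area of the axis-aligned bounding box of the stroke.
        "bbox_area": (max(xs) - min(xs)) * (max(ys) - min(ys)),
        # Average pen pressure over the stroke.
        "mean_pressure": sum(s[3] for s in stroke) / len(stroke),
    }
```

Because point density and pressure scales differ between devices, such features would normally be normalized per device before being compared across data sets.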
Features’ Properties
• The features that give the best results may vary from one script to another.
• A method that gives good results for one script may not do so for other scripts.
• There is no standard method for computing the features of a language.
• Features should vary to a reasonable extent.
• Features must be available from different users' handwriting.
• Features should be measurable through algorithms.
• Features are selected so that they represent the handwriting well
and emphasize inter-class differences and intra-class similarities.
Recognition methods
Category | Method | Researchers
Statistical | Hidden Markov Model, Support Vector Machine | Amlan Kundu and Parambir Bahl (1988); Beigi (1994); Bellegarda (1994); Beim (2001); Connell and Jain (2002); Rigoll (1996); Subrahmonia (1996)
Neural Network | TDNN | Guyon (1992); Schomaker (1993); Morasso (1995); Yeager (1998)
Syntactical and Structural | Decision Tree | Kerrick and Bovik (1988); Chan and Yeung (1999); Jung and Kim (2000)
Elastic Matching | Dynamic Programming | Palvidis (1997); Wakahara and Odaka (1997); Webster and Nakagawa (1998)
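As an illustration of the elastic matching category, the sketch below uses dynamic time warping (DTW), one common dynamic-programming formulation of elastic matching. It is not claimed to be the method of any of the researchers listed; the function names and the template dictionary layout are assumptions, and strokes are taken to be lists of `(x, y)` tuples that have already been preprocessed.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i points of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # Extend the cheapest of: skip in a, skip in b, or match both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the nearest template.

    templates: dict mapping a label to a list of template point sequences.
    """
    best = min((dtw_distance(sample, t), label)
               for label, ts in templates.items() for t in ts)
    return best[1]
```

The quadratic cost table lets sequences of different lengths be aligned point by point, but every template must be compared with every input.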
Advantages and disadvantages of Recognition methods
Category | Advantages | Disadvantages
Statistical | Models temporal relationships well. | Requires a very large amount of training data.
Neural Network | Classification time is fast. | Does not model temporal relationships well.
Syntactical and Structural | Needs less training data and is robust for writer-independent (WI) systems. | Feature choice is manual and highly script dependent.
Elastic Matching | Powerful high-level features. | Not good for systems where large variations exist in handwriting.
Post Processing
Other important aspects
• Language rules: Ravinder Kumar and R.K. Sharma, “An Efficient Post Processing Algorithm for Online Handwritten Gurmukhi Character Recognition using Set Theory”, International Journal of Pattern Recognition and Artificial Intelligence, 27(4), 1353002 (1-17), 2013.
• Language models
Challenges
• Reverse Handwriting
• Zone-wise stroke predictions
• Confusing Strokes
• Prediction of half Akshras, for example Pairi ‘ਹ’ and Pairi ‘ਵ’
• New Classes in Handwritten Words
• New Features, Selection from existing features
• New Classifiers / Hybrid Classifiers
THANK YOU ALL !!!!!