
Loop Investigation for Cursive
Handwriting Processing and Recognition
By Tal Steinherz
Advanced Seminar (Spring 05)
Outline
Background on cursive handwriting
Introduction to loops
Pattern recognition and machine learning conflicts
Feature extraction solutions
Demonstrations and experimental results
Cursive Handwriting (J. C. Simon)
“Displacing a pen from left to right in an oscillating movement, with loops, descendants (legs), and ascendants (poles).”
Cursive vs. Character
Cursive – a continuous, concatenated set of strokes.
  produced by a human being in a free style.
Character – a single standalone symbol.
  produced by a machine, subject to numerous alternative fonts.
Online vs. Offline
Online – captured by pen-like devices.
  the input format is a two-dimensional signal of pixel locations as a function of time, (x(t), y(t)).
Offline – captured by scanning devices.
  the input format is a two-dimensional m × n image of gray levels as a function of pixel location.
  strokes have significant width.
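The two input formats can be related by rendering an online trajectory into an offline image. The sketch below is only an illustration under stated assumptions: the function name `rasterize`, the square pen tip, and the binary (rather than grayscale) output are hypothetical choices, not part of the original work.

```python
def rasterize(trajectory, width, height, pen_radius=1):
    """Render an online pen trajectory into an offline binary image.

    trajectory: list of (x, y) integer pen positions ordered by time.
    Returns a height x width list of lists; 1 = ink, 0 = background.
    """
    image = [[0] * width for _ in range(height)]

    def stamp(cx, cy):
        # Assumption: a square pen tip, so that strokes get a visible width.
        for y in range(cy - pen_radius, cy + pen_radius + 1):
            for x in range(cx - pen_radius, cx + pen_radius + 1):
                if 0 <= x < width and 0 <= y < height:
                    image[y][x] = 1

    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            # Linear interpolation between consecutive time samples.
            stamp(round(x0 + (x1 - x0) * i / steps),
                  round(y0 + (y1 - y0) * i / steps))
    return image


# A short stroke sampled at three time steps.
online = [(2, 2), (6, 2), (6, 6)]
offline = rasterize(online, width=10, height=10)
```

Note that the rendering discards the time ordering: the offline image no longer says which pixel was drawn first, which is the information that temporal recovery tries to regain.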
Online vs. Offline (demo)
A Loop (T. Steinherz)
“A set of neighboring foreground pixels surrounding a hole, i.e., a connected blocked group of background pixels in the word’s image, where all foreground pixels are within stroke width distance from the hole.”
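The "hole" part of this definition can be sketched as a search for background regions that are fully enclosed by ink. This is a minimal illustration, assuming a binary image and 4-connected background; the name `find_holes` is hypothetical, and the stroke-width condition on the surrounding foreground pixels is not checked here.

```python
from collections import deque


def find_holes(image):
    """Return the holes of a binary image (1 = ink, 0 = background).

    A hole is taken to be a 4-connected group of background pixels
    that does not touch the image border, i.e. it is enclosed by ink.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    holes = []
    for sy in range(height):
        for sx in range(width):
            if image[sy][sx] != 0 or seen[sy][sx]:
                continue
            # Flood-fill one background component.
            component, touches_border = [], False
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                component.append((x, y))
                if x in (0, width - 1) or y in (0, height - 1):
                    touches_border = True
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if not touches_border:
                holes.append(component)
    return holes


# A small "o": a ring of ink around a single background pixel.
letter_o = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
holes = find_holes(letter_o)  # one hole, made of the pixel (2, 2)
```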
Ascending (Descending) Loops
Axial (of the middle zone) Loops
The importance of loops
Shared by many letters (especially a, d, e, g, o, p, q)
Byproduct of the continuous nature of cursive handwriting (as with b, f, h, j, k, l, s, t, y, z)
Elementary and prominent features
Carry additional information given by a set of descriptive parameters
The motivation to investigate loops
Character recognition
  supports discrimination between letters.
Writer modeling
  Identification
  Examination
  contributes to applications in forensic science and graphology.
The output of loop investigation
Incomplete (open) loop identification
Hidden (collapsed) loop tracking - locating blobs that correspond to online loops
Multi (encapsulated) loops understanding - distinguishing natural from artificial loops
Temporal information recovery - retracing the original path of a pen
The Engineering Approach (J. C. Simon & T. Pavlidis)
“Requires understanding the structure of the objects to be recognized and apply the appropriate combination of (pattern recognition) techniques.”
Feature extraction dilemmas
Offline cursive word signal representation
Loop identification
Signal to noise ratio
Feature vector translation
The difficulties lie in feature extraction and preprocessing rather than in the machine learning / recognition engine phase.
Offline cursive word signal representation
We use the external upper and lower contours in conjunction with the internal contour of all visible loops.
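As a rough illustration of the external contours, the sketch below takes the topmost and bottommost ink pixel of every column of a binary word image. This column-profile shortcut is an assumption made for brevity: the slides do not say how the contours are traced, and a real contour follower would walk the stroke boundary instead. The name `external_contours` is hypothetical.

```python
def external_contours(image):
    """Per-column upper and lower contour of a binary word image.

    image: list of rows, 1 = ink, 0 = background; row 0 is the top.
    Returns two lists of (x, y) points, skipping columns without ink.
    """
    upper, lower = [], []
    for x in range(len(image[0])):
        ink_rows = [y for y, row in enumerate(image) if row[x] == 1]
        if ink_rows:
            upper.append((x, ink_rows[0]))    # topmost ink pixel
            lower.append((x, ink_rows[-1]))   # bottommost ink pixel
    return upper, lower


word = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
upper, lower = external_contours(word)
```

The internal contours of visible loops would come from a separate step, for example by tracing the boundary of each enclosed background region.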
Loop identification
Given a set of singular points, loops are identified by correlating pieces of the same contour (around anchor points), pieces of opposite contours, and/or subsets of the internal contours.
Signal to noise ratio
In order to improve the signal’s parametric quantifiability and reduce noisy artifacts, the contour is transformed into a polygon.
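One common way to turn a contour into a polygon is the Ramer-Douglas-Peucker algorithm, sketched below. The slides do not name the method actually used, so this is a stand-in technique; the `tolerance` parameter and the function names are assumptions for illustration, and the sketch handles an open contour piece only.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker polygonal approximation of an open contour."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the point farthest from the chord joining the endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        # Every intermediate point is noise at this tolerance.
        return [first, last]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
polygon = simplify(contour, tolerance=0.5)
# polygon == [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

Small wiggles below the tolerance disappear, while the sharp turn near (2, -0.1) and (3, 5) survives as polygon vertices, which is the kind of noise reduction the slide describes.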
Hidden loop tracking - the mutual distance principle
Multi loops understanding - the continuity principle
Temporal information recovery - the matching principle
Hidden loop tracking - an application to ascending (descending) loops
                             Writer#1  Writer#2  Writer#3  Writer#4  Writer#5  Writer#6  Total
Number of words                   223       219       223       170       215       223   1273
Number of characters             1130      1113      1130       835      1083      1130   6421
Number of loops (all kinds)      1039      1272      1013       745      1332      1146   6547
Hidden loop tracking - an application to ascending (descending) loops (cont.)
Real Loops   Online Loops   Offline Loops
                            Encapsulated   Disqualified   Found   Total
Number               1006            259            186     519     964
Rate                 100%          25.7%          18.5%   51.6%   95.8%
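The rates in this table appear to be each offline count divided by the number of online loops, with the offline total being the sum of the three categories. The short check below reproduces the figures under that reading; the variable names are illustrative only.

```python
online_loops = 1006
offline = {"Encapsulated": 259, "Disqualified": 186, "Found": 519}

# Offline total is the sum of the three offline categories.
total = sum(offline.values())

# Each rate is a count relative to the number of online loops, in percent.
rates = {name: round(100 * count / online_loops, 1)
         for name, count in offline.items()}
total_rate = round(100 * total / online_loops, 1)
# total == 964, total_rate == 95.8
```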
Hidden loop tracking - an application to ascending (descending) loops (cont.)
Large Loops (> 8)   Online Loops   Offline Loops
                                   Encapsulated   Disqualified   Found   Total
Number                       856            233            147     341     721
Rate                        100%          27.2%          17.2%   39.8%   84.2%

Large Loops (> 6)   Online Loops   Offline Loops
                                   Encapsulated   Disqualified   Found   Total
Number                      1105            288            177     390     855
Rate                        100%          26.1%          16.0%   35.3%   77.4%
Hidden loop tracking - an application to ascending (descending) loops (cont.)
Threshold   Small Loops   No Loops   Total
8                   180        209     389
6                   131        209     340
Multi loops understanding - a classifier of beginning a-s
More than 40 writers, with 1-4 samples per writer.
Multi loops understanding - a classifier of beginning a-s
         Total Loops   Type A    Type B    Error     Questionable
Number   81/93         32/36     26/28     16/21     7/8
Rate     100%          39%/38%   30%/32%   19%/22%   7.5%/8%