ALFRED P. SLOAN SCHOOL OF MANAGEMENT

WORKING PAPER
AN IMPROVED STRUCTURAL TECHNIQUE FOR AUTOMATED RECOGNITION OF HANDWRITTEN INFORMATION

Patrick S-P Wang
Amar Gupta

August 1990
WP# 3162-90

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
IFSRC No. 123-90
Comments and suggestions relating to the contents of this paper should be directed to the principal investigator:

Dr. Amar Gupta
Principal Research Associate
Sloan School of Management
MIT Room E40-265
Cambridge, MA 02139
Telephone: 617-253-8906
Abstract

This paper examines several line-drawing pattern recognition methods for handwritten character recognition, such as picture descriptive language (PDL), Berthod and Maroy's method (BM), extended Freeman's chain code (EFC), the error transformation (ET) method, tree grammar (TG), and array grammar (AG). Then a new character recognition scheme that uses improved extended octal codes as the primitives is introduced. This scheme offers the advantages of handling flexible size, orientation, and variations, the need for fewer learning samples, and a lower degree of ambiguity. Finally, the simulation of off-line character recognition by its real-time on-line counterpart is investigated.

Keywords: pattern recognition, character recognition, syntactic approach, structural approach, grammars, on-line vs off-line, learning, degrees of recognizability, learnability and ambiguity, artificial intelligence

1. Introduction
During the past quarter century there has been a growing interest among scientists, engineers and researchers in problems related to the replication of the human reading function and its simulation by computer. Intensive research has been done in this area, and a large number of technical papers and reports on line-drawing pattern recognition (including character recognition) can be found in the literature [1,2,3,8,10,12-16,23-25,27-32]. This subject has attracted such an immense interest and effort not only because it is very interesting and challenging, but also because it provides a strategy for automated postal code reading, check verification, office automation and a large variety of banking, business, industrial, engineering and scientific applications [8,25,27].
Recently, the state of the art in line-drawing pattern recognition has advanced from
the use of primitive schemes for the recognition of machine-printed alphanumerals to the
application of sophisticated techniques for the recognition of a wide variety of handwrit-
ten characters and symbols [1,27,32,34].
It is interesting to see that even human beings, who possess the best trained optical reading devices (eyes), make about 5-percent mistakes when reading in the absence of context [24]. These errors are mainly caused by infinite variations of shapes resulting from the writing habits, styles, education, region of origin, social environment, mood, health and other conditions of the writer. For character recognition by computers, the error rate could be even higher because of the reasons described above, as well as other factors such as the writing instrument, writing surface, scanning mechanisms, learning techniques and machine recognition algorithms [3,8].
This paper begins with an analysis and comparison of several line-drawing pattern recognition methods. These approaches can be basically classified into two categories, namely (1) the syntactic approach (such as PDL, tree grammars and array grammars) and (2) the structural approach (such as the BM method and the extended Freeman chain code method). Their advantages and disadvantages are discussed. The reason we concentrate on syntactic and structural methods rather than conventional statistical methods is that they are hierarchical in nature, i.e. they divide a large, complicated pattern into smaller subpatterns according to its structure, which in turn can be further divided into smaller subpatterns and so on, until primitives are found which can no longer be divided. Using such structural and syntactic approaches, one can directly take advantage of powerful data structures using grammatical rules, trees and directed labeled graphs, which are widely used in dealing with linguistic problems [6,7,11,18,20,21]. Such approaches are similar to the powerful artificial intelligence problem-solving strategies in which a large or complicated problem is reduced to smaller subproblems [17,19,26,35,36]. Further, it has been shown that syntactic and structural approaches can overcome some disadvantages of the classic decision-theoretic (statistical) approach, which has difficulty in distinguishing between two very similar patterns (characters) [6,18].

Among the various structural and syntactic approaches, we believe that those with a higher degree of recognizability are normally more accurate and more natural, in that they are closer to the intuition of human beings, who possess the best pattern recognition skill for dealing with handwritten symbols.
A new character recognition scheme using improved extended octal codes as primitives is discussed. This scheme provides certain advantages such as flexible size, orientation and variations, the need for fewer learning samples, a lower degree of ambiguity and a higher degree of recognizability. Finally, an off-line character recognition technique that simulates on-line real-time techniques is studied.

2. Notations, Terminologies, Definitions, and Fundamentals
"
Frequently we hear people say,
But what constitutes a
level of character,
category
1
Your handwriting
It is
hard
to recognize."
well recognizable or a hardly recognizable handwriting at the
Looking at Table
word or even a sentence?
recognizable, category 2
is
is terrible.
is
2.1,
one can see that
recognizable with some effort, category 3
is
recognizable with great difficulty (and perhaps with a high risk of mis-recognition) and
category 4
is
beyond
more concrete
of
human
While
recognition.
this is intuitively true,
it is
necessary to define a
criterion (or criteria) of "recognizable" characters for
machine simulation
reading. Figure 2.1 shows a block diagram of a typical character recognition
scheme.
Definition 2.1 The degree of ambiguity of a character B can be defined in terms of the problem domain D and the encoding scheme E as follows:

DegAmb(B,D,E) = the maximum number of interpretations of B in D under E.

Definition 2.2 The degree of recognizability of a character B, in the domain D, under the encoding scheme E, can be defined as follows:

DegRec(B,D,E) = 1/DegAmb(B,D,E) if B is in the dictionary, and 0 otherwise.
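To make these two definitions concrete, the following is a minimal sketch, not from the paper, that computes DegAmb and DegRec for a toy dictionary mapping code words to candidate characters. The few code words used here are borrowed from the EF dictionary of Table 5.1 later in the paper; everything else is a hypothetical illustration.

```python
# Hypothetical illustration of Definitions 2.1-2.2.
toy_dictionary = {
    "3456701": {"C", "O"},   # one code word, two interpretations
    "61630":   {"H"},
    "6275":    {"D", "P"},
}

def deg_amb(code: str) -> int:
    """Degree of ambiguity: number of interpretations of the code in the domain."""
    return len(toy_dictionary.get(code, set()))

def deg_rec(code: str) -> float:
    """Degree of recognizability: 1/DegAmb if the code is in the dictionary, else 0."""
    return 1.0 / deg_amb(code) if code in toy_dictionary else 0.0

print(deg_amb("3456701"), deg_rec("3456701"))  # 2 0.5
print(deg_amb("61630"), deg_rec("61630"))      # 1 1.0
print(deg_rec("99999"))                        # 0.0 (not in the dictionary)
```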
Conventionally, one would conjecture that the easier a character is to learn, the easier it is to recognize, and vice versa. However, this is not always true. We do find examples, like 'i' and '0', which are easier to learn but harder to recognize, and vice versa. Take '0' and 'Q' for instance: while '0' is comparatively easier to learn (syntactically) than 'Q', it is harder to recognize (semantically).

Table 2.1 Several levels of recognizability (handwriting samples of "The party begins." and "2 drinks later." at four levels of legibility)

Figure 2.1 Block diagram of a typical character recognition scheme

Figure 2.2 Continuous transformation between I and Y, C and e

Notice that Definitions 2.1-2.2 are valid not only for well behaved printed letters, but also for handwriting variations. For instance, in Figure 2.2, DegAmb(X,D,E) >= 2 and DegAmb(0,D,E) >= 2. For more examples, please see Figure 2.3. In general, there are 8 possible relations between learning and recognition, as shown in Figure 2.4.

3. Syntactic Methods: PDL, Tree Grammar and Array Grammar
This section investigates several syntactic methods for line-drawing pattern recognition, namely picture descriptive language, tree grammar and array grammar, with an analysis and comparison between them.

3.1. The Picture Descriptive Language (PDL)

The picture descriptive language (PDL) was developed by Shaw [7,20,21]. Each primitive is labeled at two distinguished points, a tail and a head. A primitive can be linked or concatenated to other primitives only at its tail and/or head. The four binary primitive structures are interpreted as follows:

Primitive structure    Description
a + b                  head(a) CAT tail(b)
a x b                  tail(a) CAT tail(b)
a - b                  head(a) CAT head(b)
a * b                  (head(a) CAT head(b)) and (tail(a) CAT tail(b))
The grammar that generates sentences in PDL is a context-free grammar G = (Vn, Vt, P, S), where Vn = {S, SL}, Vt = {b, +, x, -, *, ~, /, (, ), λ}, b may be any primitive (including the "null point primitive" λ, which has identical tail and head), and P consists of:

S → b,  S → STS,  S → /SL,  S → SL,  S → ~S,
SL → /SL,  SL → S,  SL → SLTSL,  SL → ~SL,
T → +,  T → x,  T → -,  T → *.

The primitives can be as follows:
h1, h2: horizontal line primitives (-)
d1, d2, d3: diagonal line primitives (/)
g1, g2, g3: diagonal line primitives (\)
v1, v2, v3: vertical line primitives (|)
Then, we can express the English characters as follows [6]:

A → (d2+((d2+g2)*h2)+g2)
B → ((v2+((v2+h2+g1+(~(d1+v1)))*h2)+g1+(~(d1+v1)))*h2)
C → (((~g1)+v2+d1+h1+g1+(~v1))x(h1+((d1+v1)xλ)))
D → (h2*(v3+h2+g1+(~(d1+v2))))
E → ((v2+((v2+h2)xh1))xh2)
F → ((v2+((v2+h2)xh1))xλ)
G → (((~g1)+v2+d1+h1+g1+(~v1))x(h1+((d1+v1+v1-h1)xλ)))
H → (v2+(v2x(h2+(v2x(~v2)))))
I → (v3xλ)
J → ((((~g1)+v1)xh1)+((d1+v3)xλ)
K → (v2+(v2xd2xg2))
L → (v3xh2)
M → (v3+g3+d3+(~v3))
N → (v3+g3+(v3xλ))
O → (h1*((~g1)+v2+d1+h1+g1+(~(d1+v2))))
P → ((v2+((v2+h2+g1+(~(d1+v1)))*h2))xλ)
Q → (h1*((~g1)+v2+d1+h1+g1+(~d1+((~g1)xg1)+v2))))
R → (v2+(h2*(v2+h2+g1+(~(d1+v1)))+g2)
S → ((((~g1)+v1)xh1)+((d1+v1+(~(g1+h1+g1))+v1+h1+g1+(~v1))xλ))
T → ((v3+(h1x(~h1)))xλ)
U → ((((~g1)+v3)xh1)+((d1+v3)xλ))
V → ((~g3)xd3xλ)
W → (((~g3)+d3+g3)+(d3xλ))
X → (d2+((~g2)xd2xg2))
Y → ((v2+((~g2)xd2))xλ)
Z → ((d3-h2)xh2)
For example, the character "A" can be described as a triangle: with a, b and c denoting its three stroke primitives, the triangle is expressed as (a + b) * c.
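As a rough illustration, not from the paper, of how PDL primitives and the + operator could be represented, the sketch below models a primitive as a directed segment with a tail and a head; the class and function names are hypothetical.

```python
# Hypothetical sketch of PDL-style primitives: each primitive is a directed
# segment with a tail and a head; '+' concatenates tail(b) onto head(a).
from dataclasses import dataclass

@dataclass
class Prim:
    tail: tuple      # (x, y) of the tail point
    head: tuple      # (x, y) of the head point
    name: str

def plus(a: Prim, b: Prim) -> Prim:
    """a + b: attach tail(b) to head(a); the result runs from tail(a) to head(b)."""
    dx = a.head[0] - b.tail[0]
    dy = a.head[1] - b.tail[1]
    shifted_head = (b.head[0] + dx, b.head[1] + dy)
    return Prim(a.tail, shifted_head, f"({a.name}+{b.name})")

# Triangle example: a rising stroke followed by a falling stroke gives the peak.
a = Prim((0, 0), (1, 1), "a")    # "/"
b = Prim((0, 0), (1, -1), "b")   # "\"
peak = plus(a, b)
print(peak.name, peak.tail, peak.head)   # (a+b) (0, 0) (2, 0)
```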
3.2. Tree Grammars [6,11,20]

By extension of one-dimensional concatenation to multidimensional concatenation, strings are generalized to trees. Naturally, if a pattern can be conveniently described by a tree, it can be easily generated by a tree grammar.

Let N+ be the set of strictly positive integers. Let U be the universal tree domain (the free semigroup with identity element "0" generated by N+ and a binary operation "."). The depth of a ∈ U, denoted d(a), is defined as d(0) = 0 and d(a.i) = d(a)+1, i ∈ N+. For a, b ∈ U, a ≤ b if and only if there exists x ∈ U such that a.x = b; a and b are incomparable if and only if neither a ≤ b nor b ≤ a. Figure 3.1 represents the universal tree domain U.

Figure 3.1 Universal tree domain (the root 0 with children 1, 2, 3, ...; node 1 with children 1.1, 1.2, ...; node 1.1 with children 1.1.1, 1.1.2, ...; and so on)

A tree domain D is a finite subset of U satisfying (1) b ∈ D and a ≤ b implies a ∈ D, and (2) a.j ∈ D and i ≤ j in N+ implies a.i ∈ D.

A ranked alphabet is a pair (Σ, r), where Σ is a finite set of symbols and r: Σ → N+ ∪ {0}. For a ∈ Σ, r(a) is called the rank of a. Let Σn = {a ∈ Σ | r(a) = n}.

A tree over Σ (i.e., over (Σ, r)) is a function α: D → Σ such that D is a tree domain and r[α(a)] = max{i | a.i ∈ D}; that is, the rank of the label at "a" must be equal to the number of branches in the tree at a. The domain of a tree α is denoted by D(α) or Dα. The depth of a tree α is defined as d(α) = max{d(a) | a ∈ D(α)}. Let TΣ be the set of all trees over Σ. Let α be a tree and "a" be a member of D(α). Then α/a, the subtree of α at "a", is defined as α/a = {(b,x) | (a.b,x) ∈ α}.

For example, please refer to Fig. 3.2. The tree grammar GA = (VA, rA, PA, A) for the character "A" is defined as follows:

VA = {A, A1, A2, Na, Nd, Nc},  VTA = {$, a, c, d},
rA($) = {2},  rA(a) = {0, 2},  rA(c) = {0, 1},  rA(d) = {0},

and PA consists of the tree productions shown in Figure 3.2: the start symbol A is rewritten into a tree with root $ and subtrees headed by A1 and A2; A1 and A2 are in turn expanded into subtrees built from the primitives a, c and d; and the terminal rules are Na → a, Nd → d and Nc → c.

Figure 3.2 A tree grammar for character "A"
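As a small illustration, not from the paper, of the subtree operation α/a defined above, a tree over a ranked alphabet can be stored as a mapping from tree-domain addresses to labels; the particular tree below is hypothetical.

```python
# Hypothetical sketch: a tree over a ranked alphabet stored as a dict mapping
# tree-domain addresses (e.g. "0", "1", "1.2") to labels.
alpha = {
    "0": "$",      # root, rank 2
    "1": "a",      # left subtree
    "2": "c",      # right subtree
    "1.1": "a",
    "1.2": "c",
}

def subtree(tree: dict, a: str) -> dict:
    """alpha/a = {(b, x) | (a.b, x) in alpha}: the subtree of alpha rooted at address a."""
    out = {}
    for addr, label in tree.items():
        if addr == a:
            out["0"] = label                   # a itself becomes the new root
        elif addr.startswith(a + "."):
            out[addr[len(a) + 1:]] = label     # strip the prefix "a."
    return out

print(subtree(alpha, "1"))   # {'0': 'a', '1': 'a', '2': 'c'}
```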
3.3. Array Grammar (AG) [4,20,22,33]

Definition 3.1 An isometric array grammar is a quintuple G = (VN, VT, P, S, #), where VN is a finite nonempty set of nonterminal symbols, VT is a finite nonempty set of terminal symbols, VN ∩ VT = ∅, # ∉ (VN ∪ VT) is the background or blank symbol, and P is a finite nonempty set of rewriting rules of the form α → β, where array α and array β are geometrically identical over VN ∪ VT ∪ {#}. We say that array x directly generates array y, denoted x ⇒ y, if there is a rewriting rule α → β, x contains α as a subarray, and y is identical to x except that the subarray α is replaced with the corresponding symbols of the array β. S ∈ VN is the starting symbol. Let ⇒* be the reflexive transitive closure of ⇒. The language generated by an array grammar G, denoted L(G), is the set of all arrays of terminal symbols and #'s that can be generated from the starting symbol S in a field of #'s.
Definition 3.2 The distance between two arrays X and Y over VT, denoted d(X,Y), can be defined as the smallest number of error transformations required to derive Y from X.

Example 3.1: Given two arrays X and Y over {a, b} (two small arrays of a's and b's that differ in a few cells), we have

X |-Ti ... |-Td ... |-Ts ... = Y,

where the symbol |- means "becomes". The minimum number of error transformations required to derive Y from X is 3; therefore, d(X,Y) = 3.

Example 3.2: Given the three arrays (a), (b) and (c) of Figure 3.3, it can be seen that d[(a),(b)] = 6 and d[(a),(c)] = 3. Therefore one would classify (a) and (c), instead of (a) and (b), into one cluster.

Figure 3.3 Three arrays (a), (b) and (c) over {*}

Definition 3.2 is not very satisfactory, as can be seen from the following example. Anyone who knows English will easily recognize that (a) is more similar to (b) rather than to (c). In fact, a survey of 18 persons showed that all of them thought that (a) is more similar to (b), while (c) is probably an F or an E.

The above example shows that the direct distance measure between two arrays is not satisfactory for clustering analysis. This motivates trial of the following method.
Assume that every array ℜ is a variation or a noisy counterpart of a pure, noiseless array generated by an array grammar Gc, called the 'core grammar'. If Gc is augmented with some additional rules, resulting in Ga, called the 'augmented grammar', then Ga can also generate ℜ. To see the motivation for this model, consider Fig. 3.3: why does a person recognize (b), instead of (c), as a T? Because the person must have learned a typical, pure or noiseless T before, and one can assume that this T was used as a model when he or she learned it. A description could be: "T has a horizontal stroke with a vertical stroke attached below the center of the horizontal one, and both strokes are of similar length." In other words, when a T is learned, it is learned as if there were an array grammar characterizing T. This grammar can be considered as a core grammar built into one's brain. The core grammar is learned with a certain kind of flexibility, that is, with some extra rules attached to it, resulting in a new augmented grammar. With this augmented grammar, one can also recognize some variations or noisy T's.
Definition 3.3 The distance between an array ℜ and a group of arrays characterized by an augmented array grammar Ga is da(Ga, ℜ) = the minimum number of symbols of ℜ not covered by the parsing of ℜ.

Definition 3.4 The distance between an array ℜ and a group of arrays characterized by a core array grammar Gc is dc(Gc, ℜ) = the minimum number of parsing steps of ℜ using non-core rules of Ga, plus da(Ga, ℜ).

Notice that if there does not exist a parsing sequence for ℜ, then both the da and dc values are either infinite or undefined. In Fig. 3.3, for instance, da(Ga,(a)) = 0 and dc(Gc,(a)) = 0; da(Ga,(b)) = 0 and dc(Gc,(b)) = 9; da(Ga,(c)) and dc(Gc,(c)) are undefined.

We now propose an algorithm that will produce an augmented grammar from a core grammar.
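As a rough illustration, not from the paper, of Definitions 3.3-3.4: the sketch below evaluates the two distances for a single given parse, represented simply by the set of cells it covers and the number of non-core rules it used. The definitions themselves take the minimum over all parses, and the reading of dc as non-core steps plus uncovered symbols follows the statement of Definition 3.4 above; the data structures are hypothetical.

```python
# Hypothetical illustration of the augmented/core grammar distances for one parse.
def d_a(array_cells: set, covered_cells: set) -> int:
    """Distance to the augmented grammar: symbols of the array not covered by the parse."""
    return len(array_cells - covered_cells)

def d_c(array_cells: set, covered_cells: set, noncore_steps: int) -> int:
    """Distance to the core grammar: non-core parsing steps plus the uncovered symbols."""
    return noncore_steps + d_a(array_cells, covered_cells)

pattern = {(0, 0), (1, 0), (2, 0), (1, 1), (1, 2)}   # a small T-like pattern
parse_cover = pattern                                 # a parse covering every cell
print(d_a(pattern, parse_cover))                      # 0
print(d_c(pattern, parse_cover, noncore_steps=2))     # 2
```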
Without loss of generality, one can assume that each core grammar is in 2-point normal form [4,9]. Notice that because a different definition of connectedness is being used, we have a slightly different Chomsky normal form, in that every rule obeys one of the following forms: either a nonterminal A together with an adjacent blank # is rewritten into B and C along one of the eight octal directions (horizontally, as in A# → BC or #A → CB, or in the analogous vertical and diagonal arrangements), or A → a, where A ∈ VN, B, C ∈ VN ∪ VT and a ∈ VT.

In general, a rule can therefore be represented by a triple (A, BC, d), where 0 ≤ d ≤ 7 gives the direction. For instance, A# → BC can be represented by (A, BC, 0), the corresponding rule in the next direction by (A, BC, 1), etc., and A → a is represented by (A, a, -). Notice that this representation coincides with Freeman's chain code octal primitives, as shown in Fig. 4.2 of the next section.
Algorithm A1 (Core-to-Augmented)
Input: k core grammars Gi = (VNi, VTi, Pi, Si, #), i = 1, ..., k.
Output: k augmented grammars.
Step 1. For i = 1 to k do Step 2.
Step 2. Pi^a := Pi ∪ {(A, BC, (d+1) mod 8), (A, BC, (d+7) mod 8) | (A, BC, d) ∈ Pi}. For convenience, if the j-th rule in Pi is (A, BC, d), then add the rules j.(d+1) mod 8 and j.(d+7) mod 8 into Pi^a.
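A minimal sketch, not from the paper, of the core-to-augmented expansion of Step 2, assuming rules are stored as (A, BC, d) triples as in the text; the example grammar is hypothetical.

```python
# Hypothetical sketch of Algorithm A1: each rule is a triple (A, BC, d),
# with d in 0..7 the octal direction; terminal rules use d = None.
def core_to_augmented(core_rules):
    """For every directional core rule, also allow the two neighbouring
    directions (d+1) mod 8 and (d+7) mod 8."""
    augmented = set(core_rules)
    for (A, BC, d) in core_rules:
        if d is None:              # terminal rule (A, a, -): copied unchanged
            continue
        augmented.add((A, BC, (d + 1) % 8))
        augmented.add((A, BC, (d + 7) % 8))
    return augmented

core = {("S", "aS", 0), ("S", "aS", 2), ("S", "a", None)}
for rule in sorted(core_to_augmented(core), key=str):
    print(rule)
# the rule with d=0 now also appears with d=1 and d=7, and so on
```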
We are now ready to propose a two-pass clustering procedure.

Cluster Pass I
Input. A set of n digitized patterns X = {x1, x2, ..., xn}, and a set of k core grammars Z = {G1, G2, ..., Gk}, where L(Gi) ∩ L(Gj) = ∅ for i ≠ j, i, j = 1, ..., k.
Output. The assignment of xi, i = 1, ..., n, to k clusters characterized by Gi, i = 1, ..., k.
Step 1. Call CORE-TO-AUGMENTED.
Step 2. For i = 1 to n do Steps 3-5.
Step 3. For j = 1 to k do Steps 4-5.
Step 4. Compute da(Gj, xi).
Step 5. Assign xi to the cluster p with da(Gp, xi) = min over j = 1, ..., k of da(Gj, xi).

Notice that if more than one cluster has the same minimum distance to xi, then xi should be assigned to all these clusters. If xi has been assigned to more than one cluster, then use Cluster Pass II.
Cluster Pass II
Input. All xi1, xi2, ..., xim that were assigned to more than one cluster, 1 ≤ i1 < ... < im ≤ n, together with their assigned grammars.
Output. Assignment of each such xij to a proper cluster.
Step 1. Compute dc(Gik, xij) for all Gik to which xij has been assigned.
Step 2. Assign xij to the cluster q for which dc(Giq, xij) = min dc(Gik, xij) over all Gik to which xij has been assigned.
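A schematic sketch, not from the paper, of the two-pass procedure; the distance functions d_a and d_c stand for the grammar-based distances of Definitions 3.3 and 3.4 and are passed in as parameters, and all names are hypothetical.

```python
# Hypothetical skeleton of the two-pass clustering procedure.
def two_pass_cluster(patterns, grammars, d_a, d_c):
    assignment = {}
    # Cluster Pass I: assign each pattern to all clusters at minimum d_a.
    for x in patterns:
        dists = [d_a(G, x) for G in grammars]
        m = min(dists)
        assignment[x] = [j for j, dist in enumerate(dists) if dist == m]
    # Cluster Pass II: break ties using the core-grammar distance d_c.
    for x, clusters in assignment.items():
        if len(clusters) > 1:
            assignment[x] = [min(clusters, key=lambda j: d_c(grammars[j], x))]
    return assignment
```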
The set of English characters adapted from Fu and Lu [7] is used to illustrate the above two-pass clustering procedure. There are 51 characters from nine different classes, P, X, D, U, F, Y, K, H and V, characterized by G0, G1, G2, G3, G4, G5, G6, G7 and G8, respectively. Eight of the nine classes are selected from four groups, each with similar structures; that is, D and P, H and K, U and V, and X and Y. The F class is different from the other eight classes. Each character is an array on a 20*20 grid, as shown in Fig. 3.4. Generated in a top-down fashion, each pattern is parsed upwards from the bottom. The results of the 9 clusters are satisfactory, as shown in [29].

Figure 3.4 The 51 sample patterns adapted from Fu and Lu [7]

In Fu and Lu [7], the use of a similarity measure based on grammar transformations for syntactic patterns in terms of tree grammars is proposed, and a nearest neighborhood rule and a clustering procedure are described. This clustering procedure is quite successful for two-dimensional pattern analysis, especially for English characters. However, in this procedure, every character must first be encoded into a one-dimensional string via Freeman's chain code [5] and Shaw's PDL [21]. Further, pattern 47 of Figure 3.4 is not accurately classified into group 2, D, even when weighted transformations are used with a threshold value as high as 3.

In Lu [11], a tree-to-tree distance measurement algorithm is proposed and first applied to clustering analysis for two-dimensional patterns. Again, in this algorithm, every pattern must first be encoded into a tree representation and, further, there are some confusions between certain patterns (for example between A5 and E5, and between E4 and C5).

Please note that although the purposes of [29] and [7,11] are somewhat similar, in the sense that they all aim at the same target (i.e. they all try to solve the clustering problem for two-dimensional patterns), the model used in [29] employs an entirely different approach. For instance, [29] is based on the "isometric array grammar"; this is the first time such a model has been tried for two-dimensional pattern clustering analysis. Also, the definition of distance between two patterns introduced in Section 3 of this paper does not use the conventional error transformation method. Instead, we use the concept of distance between an array and a class of arrays characterized by a Context-Free Array Grammar (CFAG) [4]. Even though the same input data adapted from Fu and Lu [7,11] is used to test our two-pass clustering algorithm, because of the differences discussed above, we obtain a different result. We also notice that pattern 7 of Fig. 3.4 is classified as an X through our algorithm, in contrast to an F through the algorithm of Fu and Lu [7,11]. A survey of 18 persons showed that 7 of them thought of it as an X, 8 of them thought it was an F, while 3 of them thought it was neither. Clearly, it depends on how one is trained to write, and the array grammar depends on that training.

4. Structural Methods: BM Approach and Extended Freeman Approach (EF)

a. Berthod and Maroy's primitives

We denote Berthod and Maroy's primitives [3] and define the encoding scheme as BM. Every character is encoded into a string of the five primitives shown in Table 4.1.
Table 4.1 The five Berthod and Maroy primitives
T: straight line
P: positive curve (counterclockwise)
M: minus curve (clockwise)
L: penlift
R: cusp

Examples of codes corresponding to various characters are shown in Figure 4.1. Let the domain D be all English characters. A dictionary consisting of about 26 characters (strings of primitives), resulting from pre-processing of characters drawn on a graphic tablet, is compiled.

Figure 4.1 Examples of codes corresponding to characters (e.g. PT, TRTT, TRMLT, TTTT, PRT)
With the BM encoding, an obviously unacceptable symbol could be recognized as an "E", since both have the code "TTLTLT", which is in the dictionary and is unambiguous.

Figure 4.2 Freeman's octal chain code primitives (the eight direction vectors 0-7)

Figure 4.3 (a) Coded word of "C" is 3456701; (b) coded word of "H" is 6165, with a bar over the penlift primitive

b. Extended Primitives
We adapt Freeman's chain code [5] and use the 8 direction vectors shown in Figure 4.2 as primitives. Each character is encoded into a string of primitives, but only the local extrema points (points which are tangent to one of the 8 vectors) are extracted. The letter "C" could be encoded as 3456701, as shown in Figure 4.3(a). To handle penlift, we consider the vector between 'pen-up' and its successive 'pen-down' and choose the closest octal primitive, with a bar on it. Therefore the letter "H" could be encoded as 6165 (with the penlift primitive barred), as shown in Figure 4.3(b).

Let us call the above described encoding scheme EF (for Extended Freeman) and let the domain D be all English characters. Using the same sample data, a dictionary consisting of about 48 words is compiled in the ordering shown in Table 5.1. In this dictionary, only 7 words are ambiguous, and DegAmb(B, D, EF) = 2 for B ∈ {C, O, D, P, X, Y, T}. Further, this method eliminates the possibility of misrecognizing a number of obviously unacceptable symbols; for example, the scribble shown in the original example will not be interpreted as the valid character "E".
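A small sketch, not from the paper, of the quantization step used by such an encoding: mapping a displacement to the closest of the eight octal primitives, as is done both for stroke segments and for the pen-up/pen-down vector. The numbering (0 = east, counterclockwise) and the upward-pointing y-axis are assumptions of this illustration.

```python
# Hypothetical helper: quantize a displacement (dx, dy) to the closest of
# Freeman's eight octal directions (0 = east, counterclockwise, y pointing up).
import math

def closest_octal(dx: float, dy: float) -> int:
    angle = math.atan2(dy, dx)               # radians, -pi..pi
    sector = round(angle / (math.pi / 4))    # nearest multiple of 45 degrees
    return sector % 8

print(closest_octal(1, 0))     # 0 (east)
print(closest_octal(0, 1))     # 2 (north)
print(closest_octal(-1, -1))   # 5 (southwest)
```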
Table 5.1 An EF dictionary

Code            Symbol      Code            Symbol
0462050         E           61              L
076526          P           602             U
046             T           6157            K
04620           F           620657          R
0462050         E           62065765        B
0464            J           6275            D, P
04640           I           6420            J
161             N           670             L
21              V           67012           U
2121            W           6157            K
245670123406    G           61630           H
2761            M           620657          R
3456701         C, O        62065765        B
4560            C           620765          D, P
4620            F           6271            N
0462050         E           62715           M
4675            S           6275            D, P
560             C           627575          B
5670123         C, O        630             T
567012367       Q           7171            W
575             S           72              V
51630           A           715             Y
51715           N           725             X
51730           A
527             X, Y

6. Off-Line Character Recognition
In this section, off-line character recognition simulating the real-time on-line technique is discussed, and an algorithm for transforming two-dimensional line-drawing patterns to parsing sequences based on the "universal array grammar" is constructed.

In conventional syntactic pattern recognition, each class of patterns is represented by a grammar [6]. In order to classify or recognize an input pattern given different classes of patterns characterized by different grammars, the input pattern is normally transformed to a pattern representation through preprocessing. Then it is compared with all given grammars and classified into the class with the minimum distance, provided such a distance is within a certain threshold. Such an approach is basically structural and hierarchical: it divides a rather complicated pattern into subpatterns, which in turn are further divided into subpatterns and so on, until primitives are found which can no longer be divided. It is structurally sound and can overcome some of the difficulties of the decision theoretic (classic) approach discussed earlier.

However, when the number of classes in a pattern recognition problem is very high, pattern matching and classification involve too many grammars, and it becomes too time consuming to be practical. Besides, grammatical inference has been shown to be NP-hard. Mainly because of this, the concept of a "universal grammar" was proposed in [32] for on-line line-drawing patterns, with classification determined by the parsing sequence rather than by the grammars. Such a concept of "universal grammar" is explored here for off-line line-drawing patterns. We use the model of "array grammar" because it offers several advantages over other methods in dealing with two-dimensional patterns, as described in Section 3.
Figure 6.1 Patterns 'c' and '2' digitized as arrays of * cells: (a) character 'c'; (b) digit '2'
Example 6.1. G1 = (Vn, Vt, P1, S, #), with Vn = {S, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10}.

Example 6.2. G2 = (Vn, Vt, P2, S, #), with Vn = {S, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11}.

P1 and P2 consist of numbered isometric rewriting rules (0., 1., 2., ...), each of which rewrites the current nonterminal together with an adjacent blank # into the terminal * and the next nonterminal in the appropriate direction (for example, # S → S1 *), so that the derivation traces out the pattern cell by cell.

It can be seen from Examples 6.1 and 6.2 that the arrays generated by G1 and G2 are the character 'c' and the digit '2', respectively. Generally speaking, one should construct different array grammars to represent different patterns, and it is extremely difficult, if not impossible, to construct adequate grammars for all patterns. Therefore, in [32] a universal line array grammar (ULAG) was proposed to generate all on-line patterns. For recognition, it utilizes the parsing sequence, not the grammar itself, to distinguish between different classes of patterns.

ULAG Gu = (Vn, Vt, P, S, #), where Vn = {S}, Vt = {a}, and P consists of eight numbered rules (0.-7.), one for each octal direction, each of which rewrites S together with an adjacent blank # into a and S in that direction (for example, 0. S # → a S), together with the terminal rule 8. S → a.

In this universal line array grammar, a code is associated with each production rule except the terminal rule, and the code is produced as part of the parsing sequence whenever the corresponding rule is successfully applied. Using the ULAG, the parsing sequence of 'c' is 4, 5, 5, 6, 6, 6, 7, 0, 0, 0, 8, and the parsing sequence of '2' is 2, 0, 7, 6, 5, 5, 5, 6, 0, 0, 0, 8.
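A rough sketch, not from the paper, of how such a parsing sequence could be obtained for an on-line stroke, assuming the stroke is given as an ordered list of grid points one unit step apart and that the direction numbering is Freeman's (0 = east, counterclockwise); the example stroke is hypothetical.

```python
# Hypothetical sketch: derive a ULAG-style parsing sequence from an on-line
# stroke given as an ordered list of grid points, each one step from the last.
STEP_TO_CODE = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
                (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def parsing_sequence(points):
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(STEP_TO_CODE[(x1 - x0, y1 - y0)])
    codes.append(8)            # terminal rule code ends the sequence
    return codes

# A short hook-shaped stroke: west, west, southwest, south, east.
stroke = [(4, 3), (3, 3), (2, 3), (1, 2), (1, 1), (2, 1)]
print(parsing_sequence(stroke))   # [4, 4, 5, 6, 0, 8]
```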
The above concept can be extended to off-line character recognition. The universal array grammar (UAG) is G = (Vn, Vt, P, S, #), where Vn = {S, S1}, Vt = {a}, and P consists of ten numbered rules (0.-9.) over S, S1, a and #: directional rules that extend the pattern by rewriting S and an adjacent # into S1 and S in each of the eight octal directions, a rule S1 → S, and the terminal rule S → a.
Example 6.3. We can use the UAG above to get the parsing sequence for the English letter E. The step-by-step derivation builds the letter as an array of a's, starting from # S and emitting the code of each rule as it is applied.

In the above example, we used a UAG parsing sequence to represent a pattern, but these sequences are rather long because some nonterminal-to-nonterminal parsing rules were involved. We now propose an algorithm, PATSEQ, which produces a unique, shorter parsing sequence from a pattern. In order to describe this algorithm, some definitions are required.
Definition 6.1. The neighbors of a pixel p0 are identified by the eight directions p1, p2, ..., p8 shown in Figure 6.2.

Figure 6.2 Pixel p0 and its neighbors:
p5 p6 p7
p4 p0 p8
p3 p2 p1

Definition 6.2. A segment of a pixel p0 is a set of neighbors of p0 which are consecutive. In the example below, there are two segments, S1 and S2, for the pixel p0, where S1 = (p5, p6, p7, p8) and S2 = (p2, p3). Later, we use the number i to indicate the segment Si. The length of a segment, length(Si), is equal to the number of elements in the segment Si. For example, length(S1) = 4 and length(S2) = 2.
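A small sketch, not from the paper, of how the segments of a pixel could be computed from a binary image stored as a set of black-pixel coordinates; the neighbor numbering follows Figure 6.2 and the y-axis is assumed to point up.

```python
# Hypothetical sketch of Definition 6.2: group the black neighbors of a pixel
# into maximal runs ("segments") of consecutive positions p1..p8.
OFFSETS = {1: (1, -1), 2: (0, -1), 3: (-1, -1), 4: (-1, 0),
           5: (-1, 1), 6: (0, 1), 7: (1, 1), 8: (1, 0)}   # (dx, dy) per Fig. 6.2

def segments(image: set, p0: tuple):
    """Return the segments of p0 as lists of neighbor indices (1..8)."""
    x, y = p0
    present = [i for i in range(1, 9)
               if (x + OFFSETS[i][0], y + OFFSETS[i][1]) in image]
    segs, current = [], []
    for i in range(1, 9):
        if i in present:
            current.append(i)
        elif current:
            segs.append(current)
            current = []
    if current:
        segs.append(current)
    # indices 8 and 1 are circularly adjacent: merge the first and last runs
    if len(segs) > 1 and segs[0][0] == 1 and segs[-1][-1] == 8:
        segs[0] = segs[-1] + segs[0]
        segs.pop()
    return segs

img = {(0, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (0, -1), (-1, -1)}
print(segments(img, (0, 0)))   # [[2, 3], [5, 6, 7, 8]]
```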
Definition 6.3. A segment Si is perfect if it satisfies the following conditions: i) 1 ≤ length(Si) ≤ 3, and ii) pi and pj are not both in one segment, where i = 2, 4, 6, 8 and j = (i+2) mod 8.

Definition 6.4. The center pixel Ci of a perfect segment Si is defined as follows: i) if length(Si) = 1, then Ci is the only element in Si; ii) if 2 ≤ length(Si) ≤ 3, then Ci = pk, where pk (k = 2 or 4 or 6 or 8) is one of the elements of Si. We also call Ci a next pixel of p0.

Definition 6.5. The parsing code from the current pixel p0 to one of its next pixels is the number of the rule successfully applied.
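A small sketch, not from the paper, of Definitions 6.3-6.4, with a segment given as a list of consecutive neighbor indices 1..8. The wraparound of "(i+2) mod 8" to index 8 rather than 0 is an interpretation made for this illustration.

```python
# Hypothetical sketch of "perfect segment" and "center pixel".
def is_perfect(seg: list) -> bool:
    if not 1 <= len(seg) <= 3:
        return False
    evens = [i for i in seg if i % 2 == 0]
    # pi and p((i+2) mod 8) must not be in the same segment (wrapping 0 back to 8)
    return all((i + 2 - 1) % 8 + 1 not in seg for i in evens)

def center_pixel(seg: list) -> int:
    if len(seg) == 1:
        return seg[0]
    return next(i for i in seg if i % 2 == 0)   # the even-numbered element p2/p4/p6/p8

print(is_perfect([5, 6, 7]), center_pixel([5, 6, 7]))   # True 6
print(is_perfect([2, 3, 4]))                            # False (p2 and p4 together)
```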
Algorithm PATSEQ: transform a pattern into a parsing sequence.
Input: a) the initial position (X0, Y0); b) a digitized pattern in which all segments are perfect.
Output: a parsing sequence.
Method:
Step 1: Q := empty (Q stores the passed pixels); push(X0); push(Y0);
Step 2: repeat
Step 3: if the stack is empty then go to Step 7; Y := pop; X := pop; if (X,Y) is in Q then go to Step 3;
Step 4: push all next pixels of (X,Y) and their parsing codes into the stack; if there is no next pixel for p0 then push(-1);
Step 5: delete the current pixel from the pattern;
Step 6: z := pop; if z = -1 then go to Step 6 else print(z);
Step 7: until the stack is empty.
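A loose, simplified skeleton, not from the paper, of the traversal idea behind PATSEQ: a depth-first walk over the black pixels that emits one code per move. The perfect-segment and center-pixel machinery and the -1 dead-end markers are omitted, and the emitted code is simply the neighbor index of Figure 6.2 rather than the paper's rule number.

```python
# Hypothetical skeleton of a PATSEQ-like traversal.
OFFSETS = {1: (1, -1), 2: (0, -1), 3: (-1, -1), 4: (-1, 0),
           5: (-1, 1), 6: (0, 1), 7: (1, 1), 8: (1, 0)}   # neighbor numbering of Fig. 6.2

def patseq_skeleton(image: set, start: tuple):
    sequence, visited, stack = [], set(), [(start, None)]
    while stack:
        pixel, code = stack.pop()
        if pixel in visited:
            continue
        visited.add(pixel)
        if code is not None:
            sequence.append(code)             # parsing code of the move taken
        x, y = pixel
        for k, (dx, dy) in OFFSETS.items():
            nxt = (x + dx, y + dy)
            if nxt in image and nxt not in visited:
                stack.append((nxt, k))
    return sequence

# A small L-shaped pattern, starting at its top pixel.
img = {(0, 2), (0, 1), (0, 0), (1, 0), (2, 0)}
print(patseq_skeleton(img, (0, 2)))   # [2, 2, 8, 8]
```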
The following example illustrates the use of the algorithm PATSEQ.
On a 7 x 6 grid, we use the characters a, b, ..., to indicate the positions (4,2), (4,3), (5,3), ... of the pixels of the pattern. The parsing process of the character is then traced step by step, recording at each step the current pixel, the stack contents and the output.
7. Discussions and Conclusions

We introduced the basic concepts of degrees of recognizability and ambiguity, and their relationship has been discussed. Although there are many other factors, such as human factors, social environment, writing instruments, and software and hardware environment, as well as algorithm-oriented characteristics such as segmentation, data reduction, resolution, primitive selection, thresholding and quantization, the degree of ambiguity nevertheless plays an inherent role as an important criterion of recognizability for handwritten symbols.

Two different categories of experiments representing different recognition schemes for on-line handwritten symbols were conducted. The first used basic structural shapes as primitives, while the second used octal chain codes as primitives. The second method provided the advantages of flexible sizes, orientation and variations, and the need for fewer learning samples. It also possesses an inherently lower degree of ambiguity. Besides, with this method it is less likely to mis-recognize an obviously unacceptable symbol as a valid one. A summary is depicted in Table 7.1.

Table 7.1 Summary of the two recognition schemes (grammar, structure, and accuracy in terms of degrees of ambiguity and recognizability)
References

1. Ahmed, P. and Suen, C.Y., Computer Recognition of Totally Unconstrained Handwritten Zip Codes, Int. J. of Pattern Recognition and Artificial Intelligence (IJPRAI), vol. 1, no. 1, 1-15 (1987).
2. Baird, H., Simon, K., and Pavlidis, T., Components of an Omnifont Reader, Int. Symp. on Pattern Recognition, John Wiley and Sons, 343-362 (1986).
3. Berthod, M. and Maroy, J.P., Learning in Syntactic Recognition of Symbols Drawn on a Graphic Tablet, Comput. Graphics and Image Processing 9, 166-182 (1979).
4. Cook, C.R. and Wang, P.S.P., A Chomsky Hierarchy of Isotonic Array Grammars and Languages, Comput. Graphics Image Process., vol. 8, no. 4, 144-152 (1978).
5. Freeman, H., On the Encoding of Arbitrary Geometric Configurations, IRE Trans. Electronic Comput. 10, 260-268 (1961).
6. Fu, K.S., Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, N.J. (1982).
7. Fu, K.S. and Lu, S.Y., A Clustering Procedure for Syntactic Patterns, IEEE Trans. Systems Man Cybernet. 7, 734-742 (1977).
8. Gupta, A., Hazarika, S., Kallel, M., and Srivastava, P., Optical Image Scanners and Character Recognition Devices: A Survey and New Taxonomy, MIT IFSRC Report No. 107-89 (1989).
9. Hopcroft, J.E. and Ullman, J.D., Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, New York (1979).
10. Kushnir, M. et al., Recognition of Handprinted Hebrew Characters Using Features Selected in the Hough Transform Space, Pattern Recognition, vol. 18, no. 2, 103-114 (1985).
11. Lu, S.Y., A Tree-to-Tree Distance and Its Application to Cluster Analysis, IEEE Trans. Pattern Anal. Mach. Intell., PAMI-1, no. 2, 219-224 (1979).
12. Mantas, J., An Overview of Character Recognition Methodologies, Pattern Recognition, vol. 19, no. 6, 122-134 (1986).
13. Masuda, I. et al., Approach to Smart Document Reader System, Proc. IEEE CVPR, 550-557 (1985).
14. Mori, S. et al., Research on Machine Recognition of Handprinted Characters, IEEE-PAMI, vol. 6, no. 4, 386-405 (1984).
15. Morishita, T., Ooura, M. and Ishii, Y., A Kanji Recognition Method which Detects Writing Errors, Int. J. of Pattern Recognition and Artificial Intelligence (IJPRAI), vol. 2, no. 1, 181-195 (1988).
16. Nagahashi, H. and Nakatsugama, M., A Pattern Description and Generation Method of Structural Characters, IEEE-PAMI, vol. 8, no. 1, 112-117 (1986).
17. Nandhakumar, N. and Aggarwal, J., The Artificial Intelligence Approach to Pattern Recognition - A Perspective and Overview, Pattern Recognition, vol. 18, no. 6, 383-389 (1985).
18. Pavlidis, T., Structural Pattern Recognition, Springer-Verlag, New York (1980).
19. Rich, E., Artificial Intelligence, McGraw-Hill (1983).
20. Rosenfeld, A., Picture Languages: Formal Models for Picture Recognition, Academic Press, New York (1979).
21. Shaw, A.C., A Formal Picture Description Scheme as a Basis for Picture Processing Systems, Inf. Control, vol. 14, 9-52 (1969).
22. Siromoney, R., Array Languages and Lindenmayer Systems - A Survey, in The Book of L, Rozenberg, G. and Salomaa, A. (Eds.), Springer-Verlag (1985).
23. Suen, C.Y., Factors Affecting the Recognition of Handprinted Characters, Proc. Int. Conf. Cybernetics and Society, 174-175 (1973).
24. Suen, C.Y., Berthod, M. and Mori, S., Automatic Recognition of Handprinted Characters - The State of the Art, Proc. IEEE, vol. 68, no. 4, 469-487 (1980).
25. Stern, R., From Intelligent Character Recognition to Intelligent Document Processing, Proc. of Int. Electronic Imaging Exposition and Conference, Prentice-Hall, 236-245 (1987).
26. Tanimoto, S., The Elements of Artificial Intelligence, Computer Science Press (1987).
27. Tappert, C.C., Suen, C.Y. and Wakahara, T., On-Line Handwriting Recognition - A Survey, Proc. ICPR'88, Rome, 1123-1132 (1988).
28. Ullmann, J., Advances in Character Recognition, in Applications of Pattern Recognition, Fu, K.S. (Ed.), CRC, 197-236 (1982).
29. Wang, P.S.P., An Application of Array Grammars to Clustering Analysis, Pattern Recognition, vol. 17, 403-410 (1984).
30. Wang, P.S.P., Dual Level Pattern Recognition System - A New OCR Technique, US Patent PN 4521909 (1985).
31. Wang, P.S.P., A New Character Recognition Scheme With Lower Ambiguity and Higher Recognizability, Pattern Recognition Letters 3, 431-436 (1985).
32. Wang, P.S.P., On-line Chinese Character Recognition, International Conference on Electronic Imaging, 209-214 (1988).
33. Wang, P.S.P. (Ed.), Array Grammars, Patterns and Recognizers, World Scientific Publishing Co. (1989).
34. Wang, P.S.P., A More Natural Approach for Recognition of Line-Drawing Patterns, in Image Pattern Recognition: Algorithm Implementations, Techniques and Technology, Proc. SPIE, vol. 755, 141-160 (1987).
35. Winston, P.H., Artificial Intelligence, Addison-Wesley (1984).
36. Winston, P.H. (Ed.) with Shellard, S., Artificial Intelligence at MIT - Expanding Frontiers, MIT Press, to appear (1990).