A Formal Description for Two-Dimensional Patterns

advertisement
A FORMAL DESCRIPTION FOR TWO-DIMENSIONAL PATTERNS*
C. T. Zahn
Stanford Linear Accelerator Center
Stanford University, Stanford, California
Summary
A " s t r u c t u r a l description" for two-dimensional
black and white patterns is defined as the set of contour
lines for an appropriate function which fits the binary
pattern. These contour lines are mutually nonintersecting closed polygonal curves with edges in only
eight different directions and they represent the bound­
aries between connected black and white areas of the
pattern. Rigorous procedures are described to t r a n s ­
f o r m a " m a t r i x p a t t e r n " into a " s t r u c t u r a l description"
and vice versa. Advantages of this method for d e s c r i b ­
ing patterns previous to pattern recognition are d i s ­
cussed at some length.
description" we employ consists of a set of simple polyg­
onal closed curves whose vertices are ordered in either
clockwise or counter-clockwise sense. These curves
represent the continuous boundaries between connected
sets of black points which we c a l l "objects" and con­
nected sets of white points which we call " h o l e s . " F i g .
2 depicts the structural description of the pattern in
Introduction
This paper presents a method for obtaining a " s t r u c ­
t u r a l description" of any two-dimensional picture whose
elements are a finite set of points in the plane each hav­
ing value zero (white) or one (black) and such that the
points are arranged in the f o r m of a square l a t t i c e .
Figure 1 shows one such p i c t u r e . The " s t r u c t u r a l
FIG. 2
F i g . 1. Our structure also includes information i n d i ­
cating which curves are inside each other as shown
schematically by dotted lines in F i g . 2. The points of a
curve are ordered so that the curve is traced keeping
black points to the right and white to the left. As a
r e s u l t , curves which define the outside boundary of
black areas (objects) proceed clockwise and those for
white areas (holes) proceed counter-clockwise. It
should be noticed that the curves do not go through the
outermost picture points of an area but rather go b e ­
tween these outermost points and the adjacent points of
opposite c o l o r .
Our selection of this particular format to describe
the information content of a black/white pattern was
motivated by a belief that the sequential trace of the
boundary of an object contains the most useful data for
FIG. 1
♦Work supported by U.S. Atomic Energy Commission.
-621-
recognizing objects one f r o m another in a great many a p p l i ­
cations . Experimental evidence f r o m the psychology of human visual perception and the neurophysiology of animal
visual perception support this belief to an amazing extent.
For a more detailed description of this " s t r u c t u r a l
d e s c r i p t i o n " and an extended discussion of i t s uses, the
reader is r e f e r r e d to Z a h n . 1
Structural Description as Contour
The closed curves of the " s t r u c t u r a l d e s c r i p t i o n "
a r e , in fact, the contour lines at a height of 1/2 for a
continuous function F(x,y) defined over the square area
of the picture and such that F takes value 1 where the
picture has a black point and 0 where it has a white
point. The precise construction of F is as follows:
Consider the subdivision of the picture area into
s m a l l e r squares defined by the square m a t r i x of picture
points by drawing horizontal and v e r t i c a l lines through
the points of the p i c t u r e . It should be clear that a t r i angulation* of the picture area can now be effected by
separately triangulating each smaller square, as shown
in F i g . 3. We shall describe rules for determining which
diagonal w i l l be added to each square to complete the
triangulation.
to connect two black points or two white points, choose
the black and t r y to have as few curvaturepoints as
possible. The triangulation r u l e embodied in (d) of
F i g . 3 assures that black points have precedence, while
rules (b) and (c) prevent a straight edge at 45° f r o m
being encoded as a saw-tooth contour line. The reader
may v e r i f y these statements by choosing the other d i a g ­
onal and then constructing the contour. The actual c o n ­
tour for the top pattern in F i g . 3 is presented at the
bottom of F i g . 3 and hatch m a r k s have been appended to
show which side of a contour line is the high or black
side.
There are only six essentially different combina­
tions of values possible for the corners of a square.
These are shown in F i g . 3 along with the proper d i a g ­
onal. Where no diagonal is shown it means that either
one may be chosen.
Now that the picture area has been triangulated, we
define our function F separately f o r each t r i a n g l e .
Supposing some triangle to have vertices A, B, and C,
we let P A , PB, P c denote the picture values at points
A, B, C which are indeed points of the " m a t r i x p i c t u r e . "
There is a unique linear function F*(x,y) = ax -t by + c
which takes on values PA, P g , PC at points A, B, C
Each triangle of the triangulation determines uniquely
an F*ABC and F is defined so that F(x,y) = F ^ g ^ x . y )
whenever (x,y) is inside or on triangle ABC.
The proofs that F is well-defined and continuous are
easily derived f r o m the exemplary behavior of linear
functions. In fact, the only way discontinuity could pos­
sibly occur is by way of F being double defined for some
point on a common edge SB between two triangles A^
and A2* The reason that this cannot occur is because
F1* and F2* (the linear functions for A^ and A 2 ) when
r e s t r i c t e d to the edge AB are both found to vary l i n e a r l y
between A and B and to have the same values at these
two points; hence F1* = F2* on the edge AB and F is w e l l defined .
Having defined F over the picture area as a c o n t i n ­
uous piecewise-linear approximation to the o r i g i n a l p i c ­
t u r e , we can simply repeat that the " s t r u c t u r a l d e s c r i p ­
t i o n " of the picture is the set of contour lines of F drawn
at a height of 1/2. This approach certainly corresponds
w e l l with any intuitive notion of edges in a binary p i c t u r e .
Curvaturepoints
We use the t e r m "curvaturepoints" to describe the
points where the contour lines (called "edges") change
d i r e c t i o n . For example, in the bottom of F i g . 3 the
numbered points 1 - 1 0 are the curvaturepoints whereas
E and F are points on the contour but not at bends. This
is because the two sections of contour line which meet
at E are both in the same d i r e c t i o n , t Referring to F i g .
4, we see that at E there is an incoming edge and an
outgoing edge both in d i r e c t i o n 5, whereas at point 8
there is an incoming edge in direction 6 and an outgoing
edge in direction 5.
It turns out that curvaturepoints can only occur at
points midway between two picture points. F u r t h e r ­
m o r e , the question of whether or not a given point is a
tContours are directed so that the hatch m a r k is on the
right.
-622-
FIG. 5
curve is not disturbed at a l l . The bend at point 5 in
F i g . 6 is not exactly +1 as can r e a d i l y be seen, but as
an approximation it does not r e a l l y go f a r wrong. F u r ­
t h e r m o r e , if m o r e precision is r e q u i r e d , it can be c a l ­
culated f r o m the X and Y coordinate values of the 3
points 1, 5 and 9.
The main advantages of p e r f o r m i n g this smoothing
of inflections are the reduction in the amount of data to
be processed in subsequent d i s c r i m i n a t i o n algorithms
and the elimination of some sequences of edge bends
which we may c a l l "spurious wiggles" since the c o r ­
responding edge in a typical o r i g i n a l continuous pattern
was probably straight. Figure 5 shows a "spurious
wiggle" of 4 p a i r s of points. In the example depicted
the reduction in data is substantial, amounting to 40%,
although the picture pattern contains a great deal of
real detail.
As a f i n a l point we want to make it perfectly clear
that this smoothing procedure is not an absolutely nec­
essary part of the o v e r a l l method. Depending on the
particular application, it may or may not be a p p r o p r i ­
ate. We present it here in place of several other
smoothing algorithms which we have experimented with
because it has been found to accomplish a substantial
reduction in data without eliminating many m a j o r d e ­
t a i l s and while r e q u i r i n g only simple decision c r i t e r i a .
SMOOTHED
EDGE CONTOUR
FIG. 7
FIG. 6
Recovery A l g o r i t h m
Bend Groups
When the sequence of curvaturepoints describing a
single closed contour has been constructed and the " i n flections" have been deleted as described in the previous
section, then a very compact but extremely characterizing signature can be computed by merging c u r v a t u r e points according to the following r u l e s :
T r a n s f o r m the closed curve into a sequence of i n tegers representing amounts of bend by adding together
the values of bend for two adjacent curvaturepoints of
like sign whenever their distance apart is less than some
threshold which is appropriate to a given application.
Figure 7 depicts how the grouping of like points
would be made f o r our sample curve of the last section,
assuming a threshold distance of approximately 4 . 0 .
Once again we must emphasize that special a p p l i cations w i l l generally suggest variations in the merging
of bends in addition to guiding the determination of the
distance threshold. Some applications may, of course,
r e q u i r e the retention of m o r e detailed edge length i n f o r mation in which case this simple scheme would not be
appropriate.
-625-
An important property of " s t r u c t u r a l descriptions"
is that an algorithm exists which w i l l recover the o r i g inal digitized binary m a t r i x pattern given only the struct u r a l description. We shall give a brief outline of such
an algorithm and discuss the theoretical foundation for
its validity.
The famous mathematical theorem known as the
"Jordan Curve Theorem" states that any simple closed
curve in the plane divides the remainder of the plane
into two distinct sets called the " i n s i d e " and " o u t s i d e , "
such that any two points in the same set can always be
connected with a continuous curve lying entirely in that
same set; any continuous plane curve joining a point
" i n s i d e " the curve to one "outside" must have at least
one point in common with the closed curve. The proof
of this theorem is much simpler for polygons than for
general curves; Courant and Robbins2 give a proof f o r
the polygonal case which depends on the fact that if a
point "outside" a closed curve
is connected by a
straight line L to another point P, then P is also " o u t side"
if and only if L intersects " an even number
of t i m e s . If L intersects
an odd number of t i m e s ,
then P is " i n s i d e "
The algorithm we propose is based on the latter
fact, which is also proved in Ref. 2. Referring to
F i g . 9, we shall show how it is possible to determine
while ranges a,
and
are outside. The proper
assignment of values then depends only on the status of
The determination that ranges p and 5 are inside
proceeds as follows:
Curve r is traced f r o m i t s top a l l the way around
and back to the top again. Whenever the curve
crosses the v e r t i c a l line
, we r e c o r d the fact by
making a m a r k in a l l points below the c r o s s i n g . For
example,
crosses f i r s t at
and a l l points in ranges,
and c are marked + as shown in the table.
T h i s means that each of these marked points can be c o n ­
nected by a straight line to the top of the picture area
only by a line crossing
at
The table is f i l l e d in
f o r a n d
i n a n entirely s i m i l a r manner. After
64, m e r e are no more crossings and the curve returns
to the starting point. We then determine which points
have received an odd number of m a r k s and find that a l l
points of ranges B and 5 are inside
When this procedure is c a r r i e d out f o r a l l v e r t i c a l
lines
then the " i n s i d e " points f o r w i l l have been
determined. In any actual implementation, we assume
that a l l
would be handled simultaneously while
was being traced only once.
If the same process is repeated for a l l curves
in
a " s t r u c t u r a l d e s c r i p t i o n " then the result is Just as valid
as f o r one curve because curve crossings always r e p r e ­
sent a color change.
No claim is made for the efficiency of this procedure;
we m e r e l y wanted to show w i t h some degree of mathe­
m a t i c a l r i g o r that the " s t r u c t u r a l d e s c r i p t i o n " contains
a l l the information of the o r i g i n a l digitized pattern and it
can be mechanically recovered if necessary.
Advantages of the Curvaturepoint Method
The generality of the curvaturepoint method is one
of i t s most important p r o p e r t i e s . The method applies to
any "black and w h i t e " pattern whose significant content
consists of connected sets of s i m i l a r points. Hence, it
also applies to input pictures which can be made to c o m ­
p l y w i t h this condition by suitable preprocessing; f a i r l y
simple local transformations on binary pictures have
been found to be very successful at r e g u l a r i z i n g pictures
in this way. Our method can be used in some cases to
extract contrast information from g r e y - s c a l e * pictures
by converting the picture to binary several times using
different thresholds. This idea is more f u l l y elaborated
in Zahn. 1 T h i s applicability to m u l t i l e v e l pictures
should come as no surprise when it is remembered that
the method is simply a contouring of a two-dimensional
d i s t r i b u t i o n of n u m e r i c a l values.
Simplicity and mathematical r i g o r are properties of
the method which we feel have been largely overlooked in
most "edge-detection" schemes. The two properties are
closely r e l a t e d , f o r the mathematical r i g o r w i t h which
"curvaturepoint e x t r a c t i o n " and " l i n k a g e " are performed
is a d i r e c t consequence of the simple and straightforward
definition of the " s t r u c t u r a l d e s c r i p t i o n . " Being contour
♦Picture values range over an ordered finite set such as
(0, 1 , 2, 3).
-626-
lines of a simple function, the " s t r u c t u r a l description"
is constrained to consist of non-intersecting closed
polygonal curves whose edges are directed in only eight
different ways and whose edge bends or "curvaturepoints"
are also tightly constrained. It is precisely because of
such constraints that the "linkage" of widely spaced
"curvaturepoints" can be accomplished in a completely
assured way. Not only is the transformation f r o m b i n ­
ary pattern to structural description rigorous and
uniquely defined; the reverse transformation (see Re­
covery Algorithm) exists as w e l l , proving a unique oneto-one correspondence between a binary pattern and its
s t r u c t u r a l description and also showing that the s t r u c ­
t u r a l description contains total information f r o m the
binary pattern.
One of the most serious obstacles to the further
development of digital picture processing is the volume
of data implied by the two-dimensionality of p i c t u r e s .
When the resolution is doubled the data volume is quad­
rupled; a picture 1,000 x 1,000 has one m i l l i o n data
points, an amount which is s t i l l prohibitively high for
even the largest computers. The " s t r u c t u r a l d e s c r i p ­
t i o n " on the other hand contains one-dimensional i n f o r ­
mation (contour lines) and therefore the data volume
increases only l i n e a r l y with resolution. This means
that if "curvaturepoint extraction" can be done in special
d i g i t a l hardware then the storage requirement for i m p l e ­
menting the method on a general purpose computer w i l l
be greatly reduced and more r i c h l y detailed patterns can
be handled. The "linkage" would be accomplished by
p r o g r a m m i n g . It turns out very happily that " c u r v a t u r e ­
point e x t r a c t i o n " is defined by extremely simple Boolean
logic (see Curvaturepoints) and its digital hardware
implementation could be accomplished with the addition
of two long shift r e g i s t e r s . We consider this efficient
implementation to be an extremely vital aspect of the
method.
The intuitive character of our method should prove
of considerable benefit in constructing recognition a l ­
g o r i t h m s . In a recent a r t i c l e U h r 3 claims that edges,
angles and the interrelations between lengths and slopes
are the important and meaningful properties for human
pattern perception and recognition. If this is true then
it is reasonable to expect that algorithms based on
" s t r u c t u r a l descriptions" w i l l be highly intuitive in na­
t u r e and hence less mysterious than those based on i n ­
formation other than edges. In support of U h r ! s c l a i m
we shall cite evidence f r o m psychology and neurophysi­
ology which tends to indicate the predominance of edge
information in animal visual perception. Attneave 4 in
experiments on human subjects found that the number of
bends in polygonal shaped objects accounted for 80% of
the difference between objects when rated by subjects
according to judged complexity. This c l e a r l y suggests
importance of curvaturepoints. Other researchers have
found that an object's v i s i b i l i t y is related to the length
of its boundary, strongly suggesting that for purposes of
perception the edge is the predominant information c o n ­
tent.
The research of L e t t v i n et ah 5 on the optic nerve
cells in the f r o g indicates that signals reaching the
f r o g ' s b r a i n f r o m its eyes are highly contrast-oriented.
Hubel and W i e s e l 6 determined that nerve cells in the
cat's brain are specific to the existence of edges in the
visual field of given slope. This neurophysiologyal
evidence if extrapolated to the human case lends further
support to the c l a i m that edge information is the raw
data for visual perception mechanisms in man.
Although the method is clearly directed toward r e c ­
ognition via edge-bend-sequence, nevertheless f a m i l i a r
quantitative properties such as p e r i m e t e r , area, m o ­
ments, height and width are easily calculated. In addi­
t i o n , we can define and compute reasonably intuitive
quantitative measures of oblongness, compactness,
wigglyness and total absolute curvature.
For some pattern recognition applications (character
recognition especially) it is useful to have a method
which is position and size invariant. "Structural d e ­
s c r i p t i o n s " contain position and size information but in
such a way that it is quite easily disregarded. Rotation
invariance is also easy to achieve by simply considering
the sequence of curvaturepoints without paying attention
to the i n i t i a l edge d i r e c t i o n . The information is always
there when needed, however, so that recognition methods
sensitive to position, size and rotation are not hampered
in the least.
A pattern description method is greatly enhanced if
it is compatible with some f a i r l y elegant language for
p i c t u r e s . This is p a r t i c u l a r l y true if the language has
a f o r m a l phrase structure grammar which allows pat­
t e r n recognition to be accomplished by the f o r m a l p a r ­
sing procedures which have been developed for such
g r a m m a r s . The survey paper of M i l l e r and Shaw 13
discusses this aspect of picture processing quite c o m ­
prehensively. Ledley 7 for example, has shown that
f o r m a l parsing techniques can be used to recognize d i f ­
ferent chromosome shapes by transforming their edge
sequences into a s t r i n g of p r i m i t i v e s and then parsing
this s t r i n g . For example, in F i g . 8 O means no bend,
E means sharp 180° convexity, Y means sharp 180° con­
cavity, etc. These are the p r i m i t i v e s of the language.
The f o r m a l g r a m m a r would define " a r m " as OEO,
"double a r m " as a r m Y a r m , etc.
encodings" of the curves defining boundaries in a binary
p i c t u r e . For example, the Freeman chain-encoding of
the closed curves of F i g . 3 would be (00766555333111)
and (5713). Each digit represents a unit vector as
shown in F i g . 4 and the curve is traced sequentially.
Our " s t r u c t u r a l descriptions" vary f r o m this format
only to the extent of merging l i k e - d i r e c t i o n contiguous
unit vectors into a single vector with length. The
" s t r u c t u r a l description" for the outer curve of F i g . 3 is
essentially
These curve
matching algorithms are capable 12 of putting together
an "apictorial jigsaw puzzle" which attests to the sub­
tlety of their shape d i s c r i m i n a t i o n .
Acknowledgements
The author would like to thank Alan Shaw of Cornell
University* and Jeno Gazdag of I B M for many hours of
very useful discussion on pattern recognition methods;
Robert H. Penny of General Electric Co. for introducing
us to and stimulating our interest in digital picture
processing and W i l l i a m F. M i l l e r of Stanford University
for his continuing encouragement and support of this
research. We also thank the referee for several con­
structive suggestions.
An essentially one-dimensional data format seems
to be advantageous for linguistic processing of pictures
because phrase-structure g r a m m a r s describe sets of
" s t r i n g s . " When the data is not in a t r s t r i n g " format
there are some subtleties involved in making the c o r ­
respondence between the data and linguistic f o r m a l i s m .
The recent work of Shaw 8 shows that automatic parsing
recognizers can be quickly implemented when the s t r u c ­
ture of the picture can be represented by a suitable onedimensional g r a m m a r . Syntax directed parsing methods
are employed so that new recognition tasks require only
a new syntax table and specially w r i t t e n p r i m i t i v e r e c ­
ognizers.
In addition to f o r m a l language parsing methods, our
" s t r u c t u r a l d e s c r i p t i o n " can be used very readily in a
decision tree approach to recognition. In fact, the cyclic
l i s t data f o r m a t of the " s t r u c t u r a l description" lends itself naturally to sequential decisions made as the l i s t is
t r a v e r s e d . As with a l l pattern description schemes, it
is possible to reduce to a vector of quantifiable p r o p e r ­
ties and then use one of the many procedures based on
the property vector representation of a pattern. The
algorithms of Freeman 9 , 10,11 for "curve segment
matching" are applicable with almost no change since
" s t r u c t u r a l descriptions" are essentially "Freeman
-627-
References
1.
C. T. Zahn, "Two-dimensional pattern description
and recognition via curvaturepoints," Report No.
S LAC-70, Stanford Linear Accelerator Center,
Stanford University, Stanford, California (1966).
2.
R. Courant and H. Robbins, What is Mathematics ?
(Oxford University Press, London, 1941); pp. 267269.
3.
L.Uhr, "Pattern r e c o g n i t i o n , " in Pattern Recogni­
tion, ed. Leonard Uhr (John Wiley & Sons, New
Y o r k , 1966); pp. 365-381.
4.
F. Attneave, "Physical determinants of the judged
complexity of shapes," J. E x p t l . Psychol. 53, 221
(1957).
5.
J. Y. Lettvin, H. R. Maturana, W. S. McCulloch
and W. H. P i t t s , "What the frog's eye tells the
frog's b r a i n , " P r o c . IRE 47, (November 1959);
pp. 1940-1951.
6.
D. H. Hubel and T. N. Wiesel, "Receptive f i e l d s ,
binocular interaction, and functional architecture
in the cat's visual c o r t e x , " J. Physiol. 160, 106
(1962).
7.
R. S. Ledley, "High-speed automatic analysis of
biomedical p i c t u r e s , " Science 146, 216 (1964).
8.
A. C. Shaw, "The f o r m a l description and parsing
of pictures, Report No. SLAC-84, Stanford Linear
Accelerator Center, Stanford University, Stanford
California (1968).
9.
H. Freeman, "On the encoding of a r b i t r a r y geo­
m e t r i c configurations," IRE T r a n s , on Electronic
Computers EC-10, (1961); pp. 260-268.
♦ f o r m e r l y Stanford University.
-628-
Download