A FORMAL DESCRIPTION FOR TWO-DIMENSIONAL PATTERNS* C. T. Zahn Stanford Linear Accelerator Center Stanford University, Stanford, California Summary A " s t r u c t u r a l description" for two-dimensional black and white patterns is defined as the set of contour lines for an appropriate function which fits the binary pattern. These contour lines are mutually nonintersecting closed polygonal curves with edges in only eight different directions and they represent the bound­ aries between connected black and white areas of the pattern. Rigorous procedures are described to t r a n s ­ f o r m a " m a t r i x p a t t e r n " into a " s t r u c t u r a l description" and vice versa. Advantages of this method for d e s c r i b ­ ing patterns previous to pattern recognition are d i s ­ cussed at some length. description" we employ consists of a set of simple polyg­ onal closed curves whose vertices are ordered in either clockwise or counter-clockwise sense. These curves represent the continuous boundaries between connected sets of black points which we c a l l "objects" and con­ nected sets of white points which we call " h o l e s . " F i g . 2 depicts the structural description of the pattern in Introduction This paper presents a method for obtaining a " s t r u c ­ t u r a l description" of any two-dimensional picture whose elements are a finite set of points in the plane each hav­ ing value zero (white) or one (black) and such that the points are arranged in the f o r m of a square l a t t i c e . Figure 1 shows one such p i c t u r e . The " s t r u c t u r a l FIG. 2 F i g . 1. Our structure also includes information i n d i ­ cating which curves are inside each other as shown schematically by dotted lines in F i g . 2. The points of a curve are ordered so that the curve is traced keeping black points to the right and white to the left. As a r e s u l t , curves which define the outside boundary of black areas (objects) proceed clockwise and those for white areas (holes) proceed counter-clockwise. It should be noticed that the curves do not go through the outermost picture points of an area but rather go b e ­ tween these outermost points and the adjacent points of opposite c o l o r . Our selection of this particular format to describe the information content of a black/white pattern was motivated by a belief that the sequential trace of the boundary of an object contains the most useful data for FIG. 1 ♦Work supported by U.S. Atomic Energy Commission. -621- recognizing objects one f r o m another in a great many a p p l i ­ cations . Experimental evidence f r o m the psychology of human visual perception and the neurophysiology of animal visual perception support this belief to an amazing extent. For a more detailed description of this " s t r u c t u r a l d e s c r i p t i o n " and an extended discussion of i t s uses, the reader is r e f e r r e d to Z a h n . 1 Structural Description as Contour The closed curves of the " s t r u c t u r a l d e s c r i p t i o n " a r e , in fact, the contour lines at a height of 1/2 for a continuous function F(x,y) defined over the square area of the picture and such that F takes value 1 where the picture has a black point and 0 where it has a white point. The precise construction of F is as follows: Consider the subdivision of the picture area into s m a l l e r squares defined by the square m a t r i x of picture points by drawing horizontal and v e r t i c a l lines through the points of the p i c t u r e . It should be clear that a t r i angulation* of the picture area can now be effected by separately triangulating each smaller square, as shown in F i g . 3. We shall describe rules for determining which diagonal w i l l be added to each square to complete the triangulation. to connect two black points or two white points, choose the black and t r y to have as few curvaturepoints as possible. The triangulation r u l e embodied in (d) of F i g . 3 assures that black points have precedence, while rules (b) and (c) prevent a straight edge at 45° f r o m being encoded as a saw-tooth contour line. The reader may v e r i f y these statements by choosing the other d i a g ­ onal and then constructing the contour. The actual c o n ­ tour for the top pattern in F i g . 3 is presented at the bottom of F i g . 3 and hatch m a r k s have been appended to show which side of a contour line is the high or black side. There are only six essentially different combina­ tions of values possible for the corners of a square. These are shown in F i g . 3 along with the proper d i a g ­ onal. Where no diagonal is shown it means that either one may be chosen. Now that the picture area has been triangulated, we define our function F separately f o r each t r i a n g l e . Supposing some triangle to have vertices A, B, and C, we let P A , PB, P c denote the picture values at points A, B, C which are indeed points of the " m a t r i x p i c t u r e . " There is a unique linear function F*(x,y) = ax -t by + c which takes on values PA, P g , PC at points A, B, C Each triangle of the triangulation determines uniquely an F*ABC and F is defined so that F(x,y) = F ^ g ^ x . y ) whenever (x,y) is inside or on triangle ABC. The proofs that F is well-defined and continuous are easily derived f r o m the exemplary behavior of linear functions. In fact, the only way discontinuity could pos­ sibly occur is by way of F being double defined for some point on a common edge SB between two triangles A^ and A2* The reason that this cannot occur is because F1* and F2* (the linear functions for A^ and A 2 ) when r e s t r i c t e d to the edge AB are both found to vary l i n e a r l y between A and B and to have the same values at these two points; hence F1* = F2* on the edge AB and F is w e l l defined . Having defined F over the picture area as a c o n t i n ­ uous piecewise-linear approximation to the o r i g i n a l p i c ­ t u r e , we can simply repeat that the " s t r u c t u r a l d e s c r i p ­ t i o n " of the picture is the set of contour lines of F drawn at a height of 1/2. This approach certainly corresponds w e l l with any intuitive notion of edges in a binary p i c t u r e . Curvaturepoints We use the t e r m "curvaturepoints" to describe the points where the contour lines (called "edges") change d i r e c t i o n . For example, in the bottom of F i g . 3 the numbered points 1 - 1 0 are the curvaturepoints whereas E and F are points on the contour but not at bends. This is because the two sections of contour line which meet at E are both in the same d i r e c t i o n , t Referring to F i g . 4, we see that at E there is an incoming edge and an outgoing edge both in d i r e c t i o n 5, whereas at point 8 there is an incoming edge in direction 6 and an outgoing edge in direction 5. It turns out that curvaturepoints can only occur at points midway between two picture points. F u r t h e r ­ m o r e , the question of whether or not a given point is a tContours are directed so that the hatch m a r k is on the right. -622- FIG. 5 curve is not disturbed at a l l . The bend at point 5 in F i g . 6 is not exactly +1 as can r e a d i l y be seen, but as an approximation it does not r e a l l y go f a r wrong. F u r ­ t h e r m o r e , if m o r e precision is r e q u i r e d , it can be c a l ­ culated f r o m the X and Y coordinate values of the 3 points 1, 5 and 9. The main advantages of p e r f o r m i n g this smoothing of inflections are the reduction in the amount of data to be processed in subsequent d i s c r i m i n a t i o n algorithms and the elimination of some sequences of edge bends which we may c a l l "spurious wiggles" since the c o r ­ responding edge in a typical o r i g i n a l continuous pattern was probably straight. Figure 5 shows a "spurious wiggle" of 4 p a i r s of points. In the example depicted the reduction in data is substantial, amounting to 40%, although the picture pattern contains a great deal of real detail. As a f i n a l point we want to make it perfectly clear that this smoothing procedure is not an absolutely nec­ essary part of the o v e r a l l method. Depending on the particular application, it may or may not be a p p r o p r i ­ ate. We present it here in place of several other smoothing algorithms which we have experimented with because it has been found to accomplish a substantial reduction in data without eliminating many m a j o r d e ­ t a i l s and while r e q u i r i n g only simple decision c r i t e r i a . SMOOTHED EDGE CONTOUR FIG. 7 FIG. 6 Recovery A l g o r i t h m Bend Groups When the sequence of curvaturepoints describing a single closed contour has been constructed and the " i n flections" have been deleted as described in the previous section, then a very compact but extremely characterizing signature can be computed by merging c u r v a t u r e points according to the following r u l e s : T r a n s f o r m the closed curve into a sequence of i n tegers representing amounts of bend by adding together the values of bend for two adjacent curvaturepoints of like sign whenever their distance apart is less than some threshold which is appropriate to a given application. Figure 7 depicts how the grouping of like points would be made f o r our sample curve of the last section, assuming a threshold distance of approximately 4 . 0 . Once again we must emphasize that special a p p l i cations w i l l generally suggest variations in the merging of bends in addition to guiding the determination of the distance threshold. Some applications may, of course, r e q u i r e the retention of m o r e detailed edge length i n f o r mation in which case this simple scheme would not be appropriate. -625- An important property of " s t r u c t u r a l descriptions" is that an algorithm exists which w i l l recover the o r i g inal digitized binary m a t r i x pattern given only the struct u r a l description. We shall give a brief outline of such an algorithm and discuss the theoretical foundation for its validity. The famous mathematical theorem known as the "Jordan Curve Theorem" states that any simple closed curve in the plane divides the remainder of the plane into two distinct sets called the " i n s i d e " and " o u t s i d e , " such that any two points in the same set can always be connected with a continuous curve lying entirely in that same set; any continuous plane curve joining a point " i n s i d e " the curve to one "outside" must have at least one point in common with the closed curve. The proof of this theorem is much simpler for polygons than for general curves; Courant and Robbins2 give a proof f o r the polygonal case which depends on the fact that if a point "outside" a closed curve is connected by a straight line L to another point P, then P is also " o u t side" if and only if L intersects " an even number of t i m e s . If L intersects an odd number of t i m e s , then P is " i n s i d e " The algorithm we propose is based on the latter fact, which is also proved in Ref. 2. Referring to F i g . 9, we shall show how it is possible to determine while ranges a, and are outside. The proper assignment of values then depends only on the status of The determination that ranges p and 5 are inside proceeds as follows: Curve r is traced f r o m i t s top a l l the way around and back to the top again. Whenever the curve crosses the v e r t i c a l line , we r e c o r d the fact by making a m a r k in a l l points below the c r o s s i n g . For example, crosses f i r s t at and a l l points in ranges, and c are marked + as shown in the table. T h i s means that each of these marked points can be c o n ­ nected by a straight line to the top of the picture area only by a line crossing at The table is f i l l e d in f o r a n d i n a n entirely s i m i l a r manner. After 64, m e r e are no more crossings and the curve returns to the starting point. We then determine which points have received an odd number of m a r k s and find that a l l points of ranges B and 5 are inside When this procedure is c a r r i e d out f o r a l l v e r t i c a l lines then the " i n s i d e " points f o r w i l l have been determined. In any actual implementation, we assume that a l l would be handled simultaneously while was being traced only once. If the same process is repeated for a l l curves in a " s t r u c t u r a l d e s c r i p t i o n " then the result is Just as valid as f o r one curve because curve crossings always r e p r e ­ sent a color change. No claim is made for the efficiency of this procedure; we m e r e l y wanted to show w i t h some degree of mathe­ m a t i c a l r i g o r that the " s t r u c t u r a l d e s c r i p t i o n " contains a l l the information of the o r i g i n a l digitized pattern and it can be mechanically recovered if necessary. Advantages of the Curvaturepoint Method The generality of the curvaturepoint method is one of i t s most important p r o p e r t i e s . The method applies to any "black and w h i t e " pattern whose significant content consists of connected sets of s i m i l a r points. Hence, it also applies to input pictures which can be made to c o m ­ p l y w i t h this condition by suitable preprocessing; f a i r l y simple local transformations on binary pictures have been found to be very successful at r e g u l a r i z i n g pictures in this way. Our method can be used in some cases to extract contrast information from g r e y - s c a l e * pictures by converting the picture to binary several times using different thresholds. This idea is more f u l l y elaborated in Zahn. 1 T h i s applicability to m u l t i l e v e l pictures should come as no surprise when it is remembered that the method is simply a contouring of a two-dimensional d i s t r i b u t i o n of n u m e r i c a l values. Simplicity and mathematical r i g o r are properties of the method which we feel have been largely overlooked in most "edge-detection" schemes. The two properties are closely r e l a t e d , f o r the mathematical r i g o r w i t h which "curvaturepoint e x t r a c t i o n " and " l i n k a g e " are performed is a d i r e c t consequence of the simple and straightforward definition of the " s t r u c t u r a l d e s c r i p t i o n . " Being contour ♦Picture values range over an ordered finite set such as (0, 1 , 2, 3). -626- lines of a simple function, the " s t r u c t u r a l description" is constrained to consist of non-intersecting closed polygonal curves whose edges are directed in only eight different ways and whose edge bends or "curvaturepoints" are also tightly constrained. It is precisely because of such constraints that the "linkage" of widely spaced "curvaturepoints" can be accomplished in a completely assured way. Not only is the transformation f r o m b i n ­ ary pattern to structural description rigorous and uniquely defined; the reverse transformation (see Re­ covery Algorithm) exists as w e l l , proving a unique oneto-one correspondence between a binary pattern and its s t r u c t u r a l description and also showing that the s t r u c ­ t u r a l description contains total information f r o m the binary pattern. One of the most serious obstacles to the further development of digital picture processing is the volume of data implied by the two-dimensionality of p i c t u r e s . When the resolution is doubled the data volume is quad­ rupled; a picture 1,000 x 1,000 has one m i l l i o n data points, an amount which is s t i l l prohibitively high for even the largest computers. The " s t r u c t u r a l d e s c r i p ­ t i o n " on the other hand contains one-dimensional i n f o r ­ mation (contour lines) and therefore the data volume increases only l i n e a r l y with resolution. This means that if "curvaturepoint extraction" can be done in special d i g i t a l hardware then the storage requirement for i m p l e ­ menting the method on a general purpose computer w i l l be greatly reduced and more r i c h l y detailed patterns can be handled. The "linkage" would be accomplished by p r o g r a m m i n g . It turns out very happily that " c u r v a t u r e ­ point e x t r a c t i o n " is defined by extremely simple Boolean logic (see Curvaturepoints) and its digital hardware implementation could be accomplished with the addition of two long shift r e g i s t e r s . We consider this efficient implementation to be an extremely vital aspect of the method. The intuitive character of our method should prove of considerable benefit in constructing recognition a l ­ g o r i t h m s . In a recent a r t i c l e U h r 3 claims that edges, angles and the interrelations between lengths and slopes are the important and meaningful properties for human pattern perception and recognition. If this is true then it is reasonable to expect that algorithms based on " s t r u c t u r a l descriptions" w i l l be highly intuitive in na­ t u r e and hence less mysterious than those based on i n ­ formation other than edges. In support of U h r ! s c l a i m we shall cite evidence f r o m psychology and neurophysi­ ology which tends to indicate the predominance of edge information in animal visual perception. Attneave 4 in experiments on human subjects found that the number of bends in polygonal shaped objects accounted for 80% of the difference between objects when rated by subjects according to judged complexity. This c l e a r l y suggests importance of curvaturepoints. Other researchers have found that an object's v i s i b i l i t y is related to the length of its boundary, strongly suggesting that for purposes of perception the edge is the predominant information c o n ­ tent. The research of L e t t v i n et ah 5 on the optic nerve cells in the f r o g indicates that signals reaching the f r o g ' s b r a i n f r o m its eyes are highly contrast-oriented. Hubel and W i e s e l 6 determined that nerve cells in the cat's brain are specific to the existence of edges in the visual field of given slope. This neurophysiologyal evidence if extrapolated to the human case lends further support to the c l a i m that edge information is the raw data for visual perception mechanisms in man. Although the method is clearly directed toward r e c ­ ognition via edge-bend-sequence, nevertheless f a m i l i a r quantitative properties such as p e r i m e t e r , area, m o ­ ments, height and width are easily calculated. In addi­ t i o n , we can define and compute reasonably intuitive quantitative measures of oblongness, compactness, wigglyness and total absolute curvature. For some pattern recognition applications (character recognition especially) it is useful to have a method which is position and size invariant. "Structural d e ­ s c r i p t i o n s " contain position and size information but in such a way that it is quite easily disregarded. Rotation invariance is also easy to achieve by simply considering the sequence of curvaturepoints without paying attention to the i n i t i a l edge d i r e c t i o n . The information is always there when needed, however, so that recognition methods sensitive to position, size and rotation are not hampered in the least. A pattern description method is greatly enhanced if it is compatible with some f a i r l y elegant language for p i c t u r e s . This is p a r t i c u l a r l y true if the language has a f o r m a l phrase structure grammar which allows pat­ t e r n recognition to be accomplished by the f o r m a l p a r ­ sing procedures which have been developed for such g r a m m a r s . The survey paper of M i l l e r and Shaw 13 discusses this aspect of picture processing quite c o m ­ prehensively. Ledley 7 for example, has shown that f o r m a l parsing techniques can be used to recognize d i f ­ ferent chromosome shapes by transforming their edge sequences into a s t r i n g of p r i m i t i v e s and then parsing this s t r i n g . For example, in F i g . 8 O means no bend, E means sharp 180° convexity, Y means sharp 180° con­ cavity, etc. These are the p r i m i t i v e s of the language. The f o r m a l g r a m m a r would define " a r m " as OEO, "double a r m " as a r m Y a r m , etc. encodings" of the curves defining boundaries in a binary p i c t u r e . For example, the Freeman chain-encoding of the closed curves of F i g . 3 would be (00766555333111) and (5713). Each digit represents a unit vector as shown in F i g . 4 and the curve is traced sequentially. Our " s t r u c t u r a l descriptions" vary f r o m this format only to the extent of merging l i k e - d i r e c t i o n contiguous unit vectors into a single vector with length. The " s t r u c t u r a l description" for the outer curve of F i g . 3 is essentially These curve matching algorithms are capable 12 of putting together an "apictorial jigsaw puzzle" which attests to the sub­ tlety of their shape d i s c r i m i n a t i o n . Acknowledgements The author would like to thank Alan Shaw of Cornell University* and Jeno Gazdag of I B M for many hours of very useful discussion on pattern recognition methods; Robert H. Penny of General Electric Co. for introducing us to and stimulating our interest in digital picture processing and W i l l i a m F. M i l l e r of Stanford University for his continuing encouragement and support of this research. We also thank the referee for several con­ structive suggestions. An essentially one-dimensional data format seems to be advantageous for linguistic processing of pictures because phrase-structure g r a m m a r s describe sets of " s t r i n g s . " When the data is not in a t r s t r i n g " format there are some subtleties involved in making the c o r ­ respondence between the data and linguistic f o r m a l i s m . The recent work of Shaw 8 shows that automatic parsing recognizers can be quickly implemented when the s t r u c ­ ture of the picture can be represented by a suitable onedimensional g r a m m a r . Syntax directed parsing methods are employed so that new recognition tasks require only a new syntax table and specially w r i t t e n p r i m i t i v e r e c ­ ognizers. In addition to f o r m a l language parsing methods, our " s t r u c t u r a l d e s c r i p t i o n " can be used very readily in a decision tree approach to recognition. In fact, the cyclic l i s t data f o r m a t of the " s t r u c t u r a l description" lends itself naturally to sequential decisions made as the l i s t is t r a v e r s e d . As with a l l pattern description schemes, it is possible to reduce to a vector of quantifiable p r o p e r ­ ties and then use one of the many procedures based on the property vector representation of a pattern. The algorithms of Freeman 9 , 10,11 for "curve segment matching" are applicable with almost no change since " s t r u c t u r a l descriptions" are essentially "Freeman -627- References 1. C. T. Zahn, "Two-dimensional pattern description and recognition via curvaturepoints," Report No. S LAC-70, Stanford Linear Accelerator Center, Stanford University, Stanford, California (1966). 2. R. Courant and H. Robbins, What is Mathematics ? (Oxford University Press, London, 1941); pp. 267269. 3. L.Uhr, "Pattern r e c o g n i t i o n , " in Pattern Recogni­ tion, ed. Leonard Uhr (John Wiley & Sons, New Y o r k , 1966); pp. 365-381. 4. F. Attneave, "Physical determinants of the judged complexity of shapes," J. E x p t l . Psychol. 53, 221 (1957). 5. J. Y. Lettvin, H. R. Maturana, W. S. McCulloch and W. H. P i t t s , "What the frog's eye tells the frog's b r a i n , " P r o c . IRE 47, (November 1959); pp. 1940-1951. 6. D. H. Hubel and T. N. Wiesel, "Receptive f i e l d s , binocular interaction, and functional architecture in the cat's visual c o r t e x , " J. Physiol. 160, 106 (1962). 7. R. S. Ledley, "High-speed automatic analysis of biomedical p i c t u r e s , " Science 146, 216 (1964). 8. A. C. Shaw, "The f o r m a l description and parsing of pictures, Report No. SLAC-84, Stanford Linear Accelerator Center, Stanford University, Stanford California (1968). 9. H. Freeman, "On the encoding of a r b i t r a r y geo­ m e t r i c configurations," IRE T r a n s , on Electronic Computers EC-10, (1961); pp. 260-268. ♦ f o r m e r l y Stanford University. -628-