PROGRESS I N EXTENDING THE VIRGIN PROGRAM VISION FLASH #20 Mark Dowson Massachusetts I n s t i t u t e o f Techno1ogy A r t i f i c i a l I n t e l 1igence Laboratory Vision Group September 1971 ABSTRACT The VIRGIN program w i l l i n t e r p r e t pictures o f simple scenes. This paper describes a program, SINNER, which w i l l deal with pictures which cont a i n cracks and shadows. In addition t o handling pictures o f t h i s r i c h e r world, SINNER employs heuristics which use know1edge about the s t r u t t u r e o f the three dimensional world t o reduce the number o f interpretations o f some pictures and t o augment the e f f i c i e n c y o f the parsing process. Work reported herein was conducted a t the A r t i f i c i a l Intelligence Laboratory, a Massachusetts I n s t i t u t e o f Technology research program supported i n p a r t by the Advanced Research Projects Agency o f the Department of Defense and monitored by the Offjce of Naval Research under Contract Number NO0014-70-A-0362-0002. introduction V i s i o n F l a s h #14 (1) discusses extensions o f t h e V I R G I N p i c t u r e p a r s i n g program ( 2 1 , whlch assigns I n t e r p r e t a t i o n s t o simple s t r a i g h t l i n e drawings, and "shadows". First, t o handle p i c t u r e s which i n c l u d e "cracks" T h i s proposed e x t e n s i o n c o n s i s t s of two p a r t s , the se.t o f j u n c t i o n theorems i s expanded t;o I n c l u d e l a b e l l n g s of "crack" and "shadow'' j u n c t i o n s . Second, a number o f 11 h e u r i s t i c s " a r e described which w i 1 1 a c t t o reduce the number o f p o s s i b l e i n t e r p r e t a t ions o f each p l c t u r e . B o t h these types o f exterlsion correspond t o a d d i t i o n a l knowledge about the v i s u a l w o r l d and i t s representat ion i n pictures. The e x t r a j u n c t i o n theorems c a r r y i n f o r m a t i o n about how the l o c a l f e a t u r e s o f a r i c h e r t h r e e dlmensional w o r l d a r e represented I n a two dlrnenslonal p i c t u r e ; the h e u r i s t i c s embody knowledge of a more g l o b a l n a t u r e about t h e s t r u c t u r e o f t h e t h r e e dimensional w o r l d so represented. The SINNER Program A program has been w r i t t e n , c a l l e d SINNER, which incl.udes a subset o f t h e a d d i t i o n a l j u n c t i o n theorems and soma o f the h e u r i s t i c s d e s c r i b e d i n F l a s h 114. Even t h i s p a r t i a l e x t e n s i o n r a i s e s enough issues t o be w o r t h d i s c u s s i o n a t t h l s stage. F i g u r e 2 shows the j u n c t i o n l a b e l t n g s which a r e embodied as j u n c t i o n theorems i n the o r i g i n a l V I R G I N program w h i l e f i g u r e 3 shows the a d d i t i o n a l shadow and crack l a b e l i n g s i n c l u d e d I n PAGE 2 SINNER, I n these labelings, an a r r o w a c r o s s a l i n e i n d i c a t e s t h a t t h e 1 i n e r e p r e s e n t s a shadow edge w i t h t h e shadow or) t h e s i d e o f t h e a r r o w head. Note that, A 1 i n e marked ' C R ' represents a crack. u n d e r t h e assumptfons g i v e n I n F l a s h #14, r e g i o n s on e i t h e r s i d e o f a " c r a c k " o r "shadow" t h e two 1 ine represent coplanar surfaces. F i g u r e 1 shows a p i c t u r e o f a scene w i t h b o t h c r a c k s and shadows. I t i s l a b e l e d w i t h one p o s s i b l e i r l t e r p r e t a t i o n t o i l l u s t r a t e t h e s e c o n v e n t ions. FIGURE 1 PAGE 3 FIGURE 2 The V I R G I N J u n c t i o n Label lngs PAGE 4 FIGURE 3 A d d i t i o n a l J u n c t i o n L a b e l i n g s I n c l u d e d i n SINNER ELLS The S I N N E R Heuristics The t e r m " h e u r i s t i c " i n t h i s context; i f i t does. c a r r i e s a c o n n o t a t i o n which i s u n f o r t u n a t e a t r i c k which may o r may n o t work b u t 1s u s e f u l I n general h e u r i s t i c s are, In a d i s g u i s e d form, the progranimer's u r l a r t f c u l a t e d knowledge about the p r o b a b l e s t r u c t u r e o f t h e w o r l d h i s program l i v e s i n . below a r e q u i t e d i f f e r e n t . explicitly, program. The h e u r ~ s t i c sd e s c r i b e d They a r e intended t o a r t l c u l a t e , knowledge about t h e s t r u c t u r e of t h e domaln o f t h e They can be thought o f as the p r o c e d u r a l analogues o f theorems i n p r o j e c t i v e geometry, o p t i c s and mechanics. The 'IShadow Source" H e u r t s t i c One c o n s t r a i n t on the scenes and v i e w i n g p o s i t i o n s which y i e l d p i c t u r e s t h a t SINNER w i l l I n t e r p r e t 1s t h a t small changes i n t h e v i e w i n g p o s i t i o n o r i n t h e p o s i t i o n o f the s i n g l e l i g h t source i l l u m i n a t i n g t h e scene w i l l n o t a l t e r t h e t o p o l o g y o f t h e resulting picture. This, I n p a r t lcular, excludes p i c t u r e s 1 i k e t h a t shown I n f i g u r e 4a where one body i n t h e scene c a s t s a shadow which f a l l s p r e c i s e l y on a v e r t e x o f a n o t h e r body. PAGE 6 FIGURE 4 Thus, no 1 i n e j o i n i n g a p a i r o f j u n c t i o n s which must r e p r e s e n t real, physical, v e r t i c e s may be i n t e r p r e t e d as a shadow edge. w When such a j u n c t i o n (Any ARROW, FORK, KAY , PEAK o r MULTl) r e c e I v e s an I n t e r p r e t a t i o n which l a b e l s one o f I t s l i n e s as a shadow , an a s s e r t i o n i s made whlch marks the j u n c t l o n a t the f a r end o f t h e 1 I n e as a "SHADOWEND". I n addition, each such j u n c t f o n theorem t e s t s t h a t t h e jtunctlon I t I s about t o l a b e l I s n o t so marked. F i g u r e 4b shows a n o t h e r w i s e a c c e p t a b l e l a b e l i n g o f a fragment o f a p i c t u r e e x c l u d e d by th1s h e u r i s t i c . PAGE 7 The S H A D D W ~ ~ E G I U NH e u r i s t i c Tho a d j acerit r e g i o n s may r e p r e s e n t . t h e shadowed p o r t f ons o f a p a i r of surfaces. s u r f a c e s nus; I f t h e r e I s h u t a s i n g l e 1 I g h t source t h e t w o -- be p h y s i c a l l y d i s t i n c t whtch I s t o say t h a t t h e l i n e separating t h e r e g i o n s cannot be a shadow l'ine. constraint This i s embodied I n S l NNER by marking as a "SHADOWREGION" t h e reglon on t h e a p p r o p r i a t e s i d e o f each l i n e l a b e l e d as a shadow edge; then, as each j u n c t i o n l a b e l i n g I s completed, a test i s made o f any new SHADOWREGION t o make sure t h a t i t does n o t share a shadow l i n e w i t h any o t h e r SHADOWREGI.ON. F i e u r e 5 shows a l a b e l trig o f a p i c t u r e fragment excluded b y t h l s t e s t . The procedure which marks r e g i o n s as SHADOWREGIONS i s an antecedant theorem, one o f a f a m i l y invoked by each l a b e l I n g a s s e r t i o n ~ l t h i neach j u n c t i o n theorem. Other members o f t h i s f a m i l y o f antecedant theorems make a s s e r t i o n s which a r e used i n t h e h e u r i s t i c s n e x t described. FIGURE 5 PAGE 8 Region P r e d i c a t e H e u r i s t i c s A s s i g n i n g an I n t e r p r e t a t i o n , as an edge, t o a 1 i n e which associates a p a i r o f regions a r t i c u l a t e s the r e l a t i v e i n c l i n a t i o n of t h e two s u r f a c e s t h a t t h e r e g i o n s represent; t h a t is, it tells whether t h e s u r f a c e s a r e c o p l a n a r o r meet a t a r e l a t i v e angle o f g r e a t e r o r l e s s than 180 degrees. Whenever a l l n e a s s o c i a t i n g a p a i r o f r e g l o n s (as do a l l l i n e s ) I s labeled, one o f t h e f a m i l y o f antecedant theorems mentioned above a s s e r t s a predicate o f the, form: (PRED J RA < p r e d i c a t e > R B I Where RA and RB a r e t h e names o f t h e two regions, J t h e name o f t h e j u n c t i o n b e i n g l a b e l e d and <pred.tcate> i s COPLANAR, V X o r CV appropriately. L i n e s l a b e l e d "crack" o r "shadow" COPLANAR p r e d i c a t e , 1 i n e s l a b e l e d "VX" o r "CV" give r i s e t o a t o VX o r C V p r e d i c a t e s r e s p e c t i v e l y. NOW s h o u l d a p a l r o f r e g l o n s have more than one l l n e -In common, t h e i n t e r p r e t a t i o n s t h a t t h e l i n e s may r e c e i v e a r e c o n s t r a i n e d . No two r e g l o n s may be a s s a c f a t e d by more t h a n one o f t h e p r e d i c a t e s above a t a time snd, if two o r more 1 Ines a t 1 g i v e r l se t o V X p r e d i c a t e s o r a1 1 g i v e r i s e t o CV p r e d i c a t e s , the 1 i n e s must be c o l l n e a r . F l g u r e 6 i l l u s t r a t e s these c o n s t r a i n t s . The l a b e l i n g g i v e n I n f i g , 6a i s unacceptable as the l i n e s AB and CD a r e b o t h common t o t h e p a l r o f r e g l o n s RO and R 1 so a t l e a s t one o f them must t a k e an g c > c c l u d i n gedge' fig. i n t e r p r e t a t i o n as i n 6b u n l e s s t h e y a r e c o l i n e a r as I n f i g . 6c. PAGE 9 FIGURE 6 PAGE 1 0 ' A s i m p l e e x t e n s i o n o f these c o n s t r a i n t s i s t o make t h e P r e d i c a t e COPLANAR t r a n s 1 t l v e and t r e a t any p a l r o f adjacent, coplanar, s u r f a c e s as a s i n g l e s u r f a c e f o r t e s t s based on t h e above c o n s t r a i n t s . More complex e x t e n s i o n s a r e p o s s i b l e . For i n s t a r i c e ( r e p r e s e n t l n g t h e f u l l p r e d i c a t e as ( R A < p r e d i c a t e > R B I for clarity) ( R A VX RB).(RB and s o on. VX R C ) I, W ( R A COPLANAR R C ) These e x t e n s i o n s have n o t y e t been I n v e s t i g a t e d systematical ly. A t e s t t o see whether any r e g i o n p r e d i c a t e c o n s t r a i n t s have been v i o l a t e d i s made a f t e r each j u n c t i o n has been l a b e l e d . The p r e s e n t v e r s i o n o f S I N N E R mere1 y t e s t s t o see t h a t no more than one o f t h e l t n e s common t o t h e same p a i r o f r e g i o n s has r e c e i v e d a V X o r CV l a b e l , b u t e x t e n d i n g t h i s t o i n c l u d e the whole range o f r e g i o n p r e d t c a t e c o n s t r a i n t s w i l l be a s i m p l e m a t t e r . PAGE 11 Rssul t s A1 thol..~p;fat h e p r i m a r y i n t e n t i o n o f t h e h e u r i s t i c s d e s c r i b e d above f s t o e x c l u d e some I n t e r p r e t a t i o n s o f p i c t u r e s which would o t t ~ e r w l : ;he ~ admissable, .[,rIthm r)arC,1n7 they augment t h e e f f l c l e n c y o f the i n a t h e r ways, Some J u n c t i o n I n t e r p r e t a t i o n s wf~lr:tl .. -4 or11 y he d i s c o v e r e d t o he i r ~ c o r r e c tl a t e r I n t h e parsIr~l *; :: r e j e c t e d Imniedlately b y the h e u r l s t i c s . I n the c u r r e r r t Ir~!plerbtentatI o n t h i s prov Ides o n l y a mare1n a l s a v i n g I n corrtputation t i m e s i n c e the r~echanisms f o r making t h e t e s t s a r e relatfvely "expensive'' and o n l y a sirhset o f the p o s s i b l e h e u r l s a i c s a r e a c t u a l l y applied; b u t l t i s expected t h a t , w i t h a r u l l e r s e t of heuristics, the savtng I n t i m e w i l l be more s l g n f f lcant, p a r t i c u l a r l y w i t h more compl i c a t e d p i c t u r e s . For comparlson, f I g u r e 7 i l l u s t r a t e s t h e performance of V I R G I N on a s i m p l e p l c t u r e , F i g u r e 7a I s the p i c t u r e p r e s e n t e d as d a t a t o V I R G I N ( I n t h e form o f a l i s t o f angle and j u n c t i o n t y p e a s s e r t i o n s ) and f i g u r e s 7 b t o 7e show the r e s u l t i n g p a r s i n g s . The time g i v e n b e s l d e each i n t e r p r e t a t i o n I s t h e processor t i m e taken, a f t e r s t a r t i n g t h e program, t o produce t h e I n t e r p r e t a t i o n . These times a r e g i v e n f o r comparlson o n l y s i n c e t h e a b s o l u t e t i m e s depend e n t i r e l y on t h e ImplementatIon. between t h e l a s t o f these and t h e t ime taken, ' The discrepancy t o t a l time' represents the a f t e r p r o d u c i n g the l a s t poss I b l e i n t e r p r e t a t i o n , to d i s c o v e r t h a t no o t h e r I n t e r p r e t a t i o n i s possible. Time taken up i n garbage c o l l e c t i o n has been s u b t r a c t e d throughout, PAGE 1 2 FIGURE 7 ' The f o u r f n t e r p r e t a t i o n s correspond t o e l t h e r a cube I n v i s i b l v supported c l e a r o f t h e s u r f a c e r e p r e s e n t e d b y t h e background o f t h e p i c t u r e o r s t u c k t o i t along some p a l r o f edges o f the cube. T h e p i c t u r e o f f i g u r e 8 has o n l y a s i n g l e I n t e r p r e t a t i o n w i t h t h e r e g i o n R 4 r e p r e s e n t i n g a shadow c a s t by the cube on I t s s l ~ p p a r Irlg t surface. S l N N E R takes 1 5 seconds t o produce t h i s t o t e r p r e t a t i a n and a f u r t h e r 65 seconds t o d i s c o v e r t h a t no f u r t h e r tnterpretatlons are possible. FIGURE 8 Note t h a t no e x p l i c i t I n f o r m a t i o n t e l l s SINNER t h a t R4 Is a shadow; t h e r e I s no o t h e r p o s s l h l e i n t e r p r e t a t i o n under t h e constraints we have s e t up. ambiguous. The p i c t u r e o f f i g u r e 9 I s more F i g u r e s 10a through 1 0 f g i v e t h e i n t e r p r e t a t i o n s t h a t SINNER r e t u r n s . t h e J u n c t i o n J27, Here t h e r e a r e f o u r d i f f e r e n t I n t e r p r e t a t ions o f two o f t h e J u n c t i o n J 2 6 and two o f J25. a l l o f t h e 16 combinations a r e p o s s i b l e , however, Not as Interdependencies amongst t h e I n t e r p r e t a t i o n s o f the j u n c t i o n s reduce the number o f p o s s i b l e t n t e r p r e t a t l o n s t o s i x . PAGE 14 PAGE 15 FIGURE 10a a '~hadocydirection' hevristie w o u l d eduk t k ~ i n kerprotat;oh ~ FIGURE IQe FIGURE 105 PAGE 1 8 . f t I s now p o s s i b l e t o see how more knowledge about t h e w o r l d c a n reduce t h e number o f i n t e r p r e t a t i o n s s t i l l f u r t h e r . The i n t e r p r e t a t i o n g i v e n i n 1 0 f would n o t appear i f the r e g i o n p r e d i c a t e s were made t r a n s f t i v e a s ' dtscussed above I n the s e c t i o n o n Regiarl P r e d i c a t e heuristics. with RO R 7 would be t r e a t e d as coplanar b o t h o f whlch a r e concavely r e l a t e d by non-col I n e a r lines to R6. The two o t h e r i n t e r p r e t a t i o n s which g i v e R 7 as a shadow region, l o b and 10d, c o u l d be e x c l u d e d by an h e u r i s t i c whlch t o o k i n t o account d i r e c t i o n s o f shadows s i n c e i f R 4 and R 5 a r e shadow r e g i o n s t h e d i r e c t i o n o f t h e 1 i g h t source must be such t h a t t h e r e i s no o b j e c t I n t h e scene t o c a s t a shadow I n the p o s i t i o n o f R7. T h i s leaves t h r e e i n t e r p r e t a t i o n s which c o r r e s p o n d t o the intuitively acceptable ones o f the bottom b l o c k b e i n g a t t a c h e d t o a w a l l (lOa), t n v t s l b l y supported ( 1 0 ~ )o r a t t a c h e d t o t h e f l o o r (10e). The p l c t u r e o f f i g u r e 11 has o n l y two I n t e r p r e t a t i o n s shown as l l a and l a b , and i t I s c l e a r t h a t t h e i n t e r p r e t a t i o n o f Ilb can be e x c l u d e d by t h e same h e u r i s t i c as \vould exclude 1 0 f above. However, t h e v e r y l o n g t i m e t h a t S I N N E R takes ( o f the 'order o f f o r t y m i n u t e s ) t o produce these i n t e r p r e t a t i o n s exposes a m a j o r d e f i c i e n c y I n t h e b a s i c s t r u c t u r e o f the SINNER program. PAGE 19 FIGURE lla FIGURE llb PAGE 20 What happens I s t h a t , I s R12, as one o f t h e f i r s t regtons t o be t a c k l e d t h e T - j u n c t i o n J l I s one o f the f i r s t t o be l a b e l e d . Due t o t h e o r d e r o f j u n c t i o n theorems i n t h e database t h e s h a f t o f t h e T g e t s l a b e l e d as an o c c l u d i n g edge. Now t h l s i s i n c o r r e c t as t h e o n l y p o s s i b l e l n t e r p r e t a t i o n o f R 1 Is as a shadow region. However, s i n c e R1 Is t h e l a s t r e g i o n t o be d e a l t w i t h , t h l s error can o n l y be c o r r e c t e d a f t e r b a c k i n g up through almost t h e whole i n t e r p r e t a t i o n t r e e t r y i n g d i f f e r e n t s i d e branches, u n s u c c e s s f u l 1 y, a t each l e v e l . What i s worse i s t h a t when backup f i n a l l y reaches J l , , t h e I n t e r p r e t a t i o n n e x t chosen f o r t h e s h a f t o f t h e TEE i s n o t t h e r l g h t one, Interpretatton. b u t a n o t h e r wrong I n a l l t h l s process i s repeated f i v e times b e f o r e the r l g h t i n t e r p r e t a t i o n surfaces. s t u p i d way t o I n t e r p r e t t h i s p i c t u r e . Clearly t h i s I s a For t h i s p a r t i c u l a r p i c t u r e t h e t r o u b l e c o u l d be c u r e d by r e o r d e r i n g the theorems i n the data-base, b u t t h e n t h e same t r o u b l e would be encountered on some d i f f e r e n t p i c t u r e . There a r e o t h e r p o s s i b i l l t l e s . One m i g h t be t o d e f e r t h e I n t e r p r e t a t i o n o f t h e s h a f t s o f TEE'S u n t i l t h e end o f an i n t e r p r e t a t i o n . worst trouble, Although t h l s would e l i m i n a t e some o f t h e i t would d e s t r o y t h e u n l f o r m l t y wi'th which t h e t o p l e v e l program handles v a r l o u s types o f p l c t u r e fragment. A mare g e n e r a l approach would be f o r the program t o t a k e some ftweral l v i e w a f t h e p i c t u r e i n advance o f t r y i n g t o make a r r e t a f l e d interpretation so as t o f o r m u l a t e same s t r a t e g i c plam ;3?mut how t o conduct t h e i n t e r p r e t a t Ian F l r s t , and s o on. ::.srnbit~nfior; o f t h i s -- whfch reglon t o t a c k l e The f i n a l approach 1s l i k e l y t o use some and o t h e r methods, i l ! v c > l v ~': a r e rlot even w e l l formulated, b u t as y e t t h e problenis much less c l e a r l y (ir-rd~a ~"s.tr.~~d. Cnncluslcns E x t e n d f n g t h e V I R G I N program t o deal w l t h a r i c h e r w o r l d and t o deal w i t h t h a t w o r l d l e s s n a i v e l y has answered some q u e s t i o n s about s h i s approach t o p i c t u r e I n t e r p r e t a t i o n and has opened many We have seen t h a t the approach f i r s t avenues f o r f u t u r e work. delfneated by Clowes ( 3 ) and Huffman ( 4 ) i s s t i l l a p p l i c a b l e In a ~ t c h e rv l s u a l world. The e f f o r t r e q u i r e d t o parse a p i c t u r e grows bv!th t h e tncreased v a r i e t y of p i c t u r e f e a t u r e s and w l t h the complexity o f the plcture, approach I m p r a c t l c a b l e , b u t n o t a t a r a t e which makes the Knowledge about t h e w o r l d l e s s l o c a l than t h a t c a r r i e d by a t a b l e o f permissable j u n c t i o n l a b e l l n g s w t l l reduce t h e a m b i g u i t y o t h e r w i s e i n h e r e n t I n some p l c t u r e s and decrease t h e e f f o r t requ l r e d t o I n t e r p r e t o t h e r s , Work i n t h e immediate f u t u r e w i l l c o n c e n t r a t e on I n c r e a s i n g t h e amount o f such knowledge a v a i l a b l e t o the SINNER program. T h i s vdi 1 1 i n c l u d e some knowledge o f geometry; here the t h e o r e t f c a l PAGE 2 2 framework p r o v l d e d by Clowes e t a1 ( 5 ) w i l l be o f use. long, however, Before t h e more c h a l l e n g i n g and d l f f l c u l t problem w i l l have t o be f a c e d o f g e t t i n g the program t o f o r m u l a t e some s t r a t e g i c p l a n i n advance o f a c t u a l l y making an i n t e r p r e t a t t o n o f t h e p i c t u r e . This, t o g e t h e r w i t h o t h e r v e r y g l o b a l methods o f irnpt.ov i n g t h e performance o f the pars l n g a1 g o r 1 thm, t o a more ' g e n e r a l ' kind o f Intelligence. corresponds I t seems l i k e l y t h a t t h e r e s u l t s o b t a i n e d I n the course o f t h l s e x e r c f s e w i l l be o f w i d e r a p y l i c a b i l I t y than t o j u s t t h l s s i n g l e v i s i o n program and w i l l open up many more avenues f o r f u t u r e research. ' PAGE 2 3 REFERENCES (1) Mark ti Hal tz, Oav f d Dnwsoti, ' c r a c k s and Shadows, A.I. Lab. Vlslon F l a s h 1 4 , 1971 (2) [lowson, Hark ' V I R G I N : a program which i n t e r p r e t s 1 ine drawings. I A r t l f l c l a l I n t e l l i g e n c e Lab, M.I.T, Memo Ne.244. ( I n progress) (3) Clowes, M.B, ' o n Seeing T h i n g s ' Artificial Intel1 igence Vo1.2, 1 Huffrt~an D.A. (5) Clowes M,B, t!ackwart:h A . K . Star] t a n R . 6.. 1971 ' I m p o s s i b l e O b j e c t s as Nonsense sentences' Machine I n t e l 1 Igence 6. Ed. C a l l I n s b M t c h i e 1970. ' P i c t u r e I n t e r p r e t a t I o n as a problem-:-olving process' Lab, o f E x p e r i m e n t a l Psychology U n i v e r s i t y o f Sussex June 1971.