From: AAAI Technical Report SS-02-08. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Perceptual Organization as a Foundation for Intelligent
Sketch Editing
Eric Saund, James Mahoney, David Fleet, Dan Larner, Edward Lank
Palo Alto Research Center
3333 Coyote Hill Rd., Palo Alto, CA 94304
{saund, mahoney, fleet, larner, elank}@parc.com
Abstract
This paper discusses the design of intelligent sketch editing tools exploiting intermediate levels of visual interpretation known as Perceptual Organization. Sketches are much more than an accumulation of strokes: they convey information largely through spatial configurations and patterns among markings. We propose that the detection of this visual structure, based on the principles of Perceptual Organization, will make sketch editing tools more useful for a wide variety of hand-drawn and formatted diagrammatic material. We review our efforts, which are embodied in a perceptually supported sketch editing tool called ScanScribe.
Introduction
One application of sketch understanding is to facilitate the entry and editing of hand-drawn and formatted figures. Through recognition of image structure, intelligent editors can make it easier for users to select collections of markings that correspond to salient or semantically meaningful entities. Additionally, recognized image structure can be used to modulate the functioning of edit operations, such as by snapping or otherwise enforcing geometric constraints.
Most existing sketch-based systems rely heavily on prior constraints imposed from a targeted application domain. Domain-dependent strategies have been adopted to various degrees in, for example, the design of web sites [Lin et al 2000], the drawing of architecture sketches [Gross 1996], and the entry of mechanical diagrams [Alvarado 2001]. By reducing the space of possible interpretations, domain constraint serves to simplify the system's problem of recognizing structure in raw content and command data. While highly constrained systems do support editing and entry tasks in the domain for which they were devised, these constraints become barriers to sketching outside narrow limits. In a sketch system that interprets and re-renders all four-sided figures as rectangles, it can become impossible to draw a rhombus.
Conversely, unconstrained tools for drawing and editing sketches do not deal with sketch data in meaningful ways.
Paint-style programs enable unlimited editing on a pixel-by-pixel basis, but offer little support for selecting salient collections of pixels in sketches beyond rectangle drag and perhaps lasso selection commands. Digital ink editors (e.g., [Pedersen et al 1993]) sometimes manage to perform simple temporal grouping of strokes, but few of them break strokes into salient fragments or form more complex groups.
To build intelligent sketch editing tools for situations in which there is no a priori identified application domain, we believe that the early levels of Perceptual Organization in the human visual system provide inspiration for an approach to the extraction of salient visual structure. Classically, Perceptual Organization has been concerned with the Gestalt Laws of Grouping, such as smooth continuation, common fate, symmetry, similarity, proximity, and closure [Koffka 1922, Wertheimer 1923, Kanizsa 1979, Witkin and Tenenbaum 1983]. More recently, Computer Vision researchers have attempted to model these and related processes computationally (see e.g. [Stevens 1978, Lowe and Binford 1983, Zucker 1983, Boyer and Sarkar 1999]). Although Perceptual Organization for general visual scenes will remain a daunting challenge for quite some time, intermediate visual structure in many sketches and diagrams is more readily accessible.
Application and Test Platform: The
ScanScribe Document Image Editor
Our group has been developing this idea in the context of a perceptually-supported document image editor, called ScanScribe, which builds on our earlier work in this area while extending the range of image material that can be manipulated and the kinds of image structure recognized. Like PerSketch [Saund and Moran 1994], ScanScribe is designed to manipulate primitive, or prime, objects, as well as groupings or collections of these, called composite objects. These relations can occur as lattice structures and are not limited to tree hierarchies, as shown in Figure 1. While PerSketch was targeted at digital ink manipulation, ScanScribe is designed to work also with image regions as manipulable and groupable objects.
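The lattice relation between prime and composite objects can be pictured as a many-to-many mapping. The sketch below is illustrative only and is not taken from ScanScribe: the class name `Lattice` and its methods `add_composite` and `groups_of` are hypothetical, and prime objects are stood in for by string identifiers.

```python
from collections import defaultdict


class Lattice:
    """Many-to-many support relation between prime and composite objects."""

    def __init__(self):
        self.members = {}                   # composite id -> frozenset of prime ids
        self.supports = defaultdict(list)   # prime id -> composite ids, in order added

    def add_composite(self, composite_id, prime_ids):
        prime_ids = frozenset(prime_ids)
        self.members[composite_id] = prime_ids
        for prime_id in sorted(prime_ids):
            self.supports[prime_id].append(composite_id)

    def groups_of(self, prime_id):
        """Composite objects that a given prime object belongs to."""
        return list(self.supports[prime_id])


lattice = Lattice()
lattice.add_composite("row", ["s1", "s2", "s3"])
lattice.add_composite("column", ["s2", "s4"])
# "s2" supports both groups, which a strict tree hierarchy could not express.
shared = lattice.groups_of("s2")
```

Keeping an ordered list per prime object is one simple way to give each group a priority among the groups that the same prime supports.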
By virtue of placing users in the loop, in an interactive editing application, some burden of accuracy is removed from recognition technologies. Accordingly, UI techniques must enable users to visualize or quickly access visual structures inferred by the system, and to repair or work around mistakes in recognition. To the extent that automatic structure recognition can succeed, we aim to endow the ScanScribe image editor with WYPIWYG (What You Perceive Is What You Get) capability that will make editing complex collections of perceptually salient markings as effortless as pointing and clicking.
Figure 1: A lattice structure enables prime objects to belong to multiple composite object groups. a. Image. b. Sensible groups. c. Grouping structure.
Key Problem: Facilitating Selection of
Meaningful Image Material
ScanScribe is designed to facilitate selection of salient image material. Once selected, standard operations of cut, copy, delete, translate, rotate, scale, etc. can all be invoked. We regard the difficulty of selecting image objects that are perceptually meaningful to the user as the primary shortcoming of existing image editors.
By default, when digital ink or a scanned document is loaded into ScanScribe, an initial processing step separates foreground markings from background. Thereafter, the background is treated as transparent so that foreground objects may be moved in proximity to one another without occlusion by background white pixels. Users may select image objects by any of four means: rectangle drag, lasso, polygon enclosure, and clicking on established image objects. These selection methods are integrated and may be used seamlessly without the user having to appeal to a menu or otherwise specify any particular selection mode.
Selection of composite objects, or collections of image primitives that form groups, is done by repeatedly clicking on any foreground object. Repeated clicks cycle through the groups that the clicked prime object belongs to. The general question of how to present multiple interpretation options in a graphical user interface remains open. Our experience to date suggests this method to be quite adequate for allowing the user to access and select among multiple interpretations of image structure, as long as the number of groups is not too great and they are all perceptually salient and meaningful to a reasonable degree.
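The click-cycling behavior described above can be sketched as a small piece of state kept by the editor. This is a minimal illustration, not ScanScribe's code: the class `ClickCycler` is hypothetical, and it assumes that the first click selects the prime object itself and that the cycle wraps around after the last group.

```python
class ClickCycler:
    """Cycles the selection through the groups supported by a clicked prime."""

    def __init__(self, groups_by_prime):
        # prime id -> list of composite ids, ordered by priority (assumed given)
        self.groups_by_prime = groups_by_prime
        self.last_prime = None
        self.index = 0

    def click(self, prime_id):
        # Assumption: the prime object itself is the first option offered.
        options = [prime_id] + self.groups_by_prime.get(prime_id, [])
        if prime_id == self.last_prime:
            # Repeated click on the same object: advance, wrapping at the end.
            self.index = (self.index + 1) % len(options)
        else:
            # Click on a different object: restart the cycle there.
            self.last_prime = prime_id
            self.index = 0
        return options[self.index]


cycler = ClickCycler({"s2": ["row", "column"]})
picks = [cycler.click("s2") for _ in range(4)]
```

Because the cycle length grows with the number of groups a prime supports, this style of access stays comfortable only while that number is small, which matches the caveat in the text.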
This basic interface supports the use of composite groups even in the absence of any automatic recognition. Once a collection of prime objects has been selected and acted upon, a new composite object is automatically formed to represent this group. The group now becomes available and can be selected by clicking some number of times on any of its prime constituents, depending on its priority in each prime object's list of supported composite objects. Conversely, a group, represented by a composite object, is automatically broken up when a subset of its constituent prime objects are removed from proximity to their former locations, for example by deleting them or dragging them far away.
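One way to picture the automatic break-up rule is as a check run after each edit. The sketch below is an assumption-laden illustration rather than the actual ScanScribe rule: the function `surviving_groups`, the `tolerance` parameter, and the use of a single reference point per prime object are all hypothetical. It treats a group as intact when every member still exists and all members have undergone roughly the same displacement, so that moving a whole group together does not dissolve it.

```python
import math


def surviving_groups(groups, old_pos, new_pos, tolerance=20.0):
    """Return the groups whose members all remain and moved together.

    groups:  {group id: non-empty list of prime ids}
    old_pos: {prime id: (x, y)} before the edit
    new_pos: {prime id: (x, y)} after the edit; deleted primes are absent
    """
    kept = {}
    for group_id, members in groups.items():
        if any(m not in new_pos for m in members):
            continue  # a constituent was deleted
        shifts = [
            (new_pos[m][0] - old_pos[m][0], new_pos[m][1] - old_pos[m][1])
            for m in members
        ]
        reference = shifts[0]
        # Intact only if no member strayed far from the common displacement.
        if all(math.dist(s, reference) <= tolerance for s in shifts):
            kept[group_id] = members
    return kept


groups = {"g1": ["a", "b"], "g2": ["b", "c"]}
old = {"a": (0, 0), "b": (10, 0), "c": (20, 0)}
dragged = {"a": (0, 0), "b": (10, 0), "c": (300, 0)}   # "c" dragged far away
kept = surviving_groups(groups, old, dragged)
```

Here only `g1` survives, because `c` has left the neighborhood of its former group `g2`.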
Groups can also be built and destroyed manually by invocation of explicit "Group" and "Ungroup" commands. Thus the basic ScanScribe editor provides means for users to take ultimate control over any image structure maintained by the underlying program. With this platform we are now prepared to explore and introduce recognition methods that will form groupings automatically on the basis of perceptual organization or other principles.
Finding Visual Structure at the Level of
Perceptual Organization
From Computer Vision we discern two basic approaches to beginning to identify visual structure in images. The first is to apply linear or nonlinear operators, such as tuned filters of varying scales, exhaustively across the image. Examples are edge and bar detectors [Freeman and Adelson 1991], and spatiotemporal filters for motion detection [Barron et al 1994]. The second approach is to decompose the image into geometric primitives, then perform grouping and matching operations on these to identify composite features and objects. The latter approach leads more directly to workable solutions for a sketch input domain consisting of thin, relatively sparsely distributed curvilinear markings on a uniform background. We note however that some "sketchy" styles of drawing, such as those illustrated in Figure 2, would be better suited to filter-style detection techniques, and fall outside the capabilities of our current approach.
Segmentation to form Prime Image Elements
Aside from reliance on relatively clean linework, our initial processing stage is indifferent to whether the input is a scanned image or digital ink. This stage is designed to segment prime image objects of two types: (1) relatively straight thin curvilinear fragments unbroken by junctions, and (2) spatially extended regions which are to be treated as unarticulated two-dimensional "blobs". This roughly corresponds to distinguishing the stroke elements of line art from the distinct characters and words of printed and cursive text. While this distinction is not a fundamental one, it is convenient to make and arguably reflects a categorization into the natural kinds that occur in hand-drawn material. People readily decompose sketching activity into diagrammatic "drawing" and symbolic "writing". The differing appearance of text and line art pops out to human observers at a glance. From a scene analysis standpoint, different kinds of grouping operations are suited to recognizing textual versus diagrammatic structure; the rules and procedures underlying the analysis of figural shapes, encirclings, arrows, etc., can be rather different from those needed to detect the words, lines, and columns by which text is structured.
At least two challenging research questions confront the segmentation stage. The first concerns the definitional boundary between curvilinear prime objects and blob prime objects. By and large, handwritten and printed text consist of complexly shaped agglomerations of relatively short strokes or stroke segments, while line art consists of relatively longer strokes. Frequently, the shape features of text are commensurate in scale with the width of strokes or font size, while the characteristic radii of curvature, distances between junctions, and spacing of linework in diagrams are substantially greater than the stroke width of lines. Most diagrammatic material satisfies a "thin-line" model: neither visual appearance nor geometric structure and relationships are substantially changed by thinning foreground objects down to skeletons. The thin-line model is manifestly violated by most handwritten and printed text, as witnessed by the degradation of image quality, often to the point of unreadability, when thinning is performed on text images. The intention of segmentation is to deliver as curvilinear prime objects that image material for which a thin-line model is appropriate, and leave as blob objects that material for which it is violated.
While frequently the curve fragments comprising line art are readily distinguished from blob prime objects, some image objects are ambiguous. In Figure 3 the same image object serves in one context as a fragment of line art, and in a different context as part of a character. It is critical that any categorization be soft enough for classified prime image objects to participate under multiple kinds of structure identification processes.
Conversion of line art into curve objects such as line segments, arcs, splines, or chain code curves is known, in the field of graphics recognition, under the general rubric of "vectorization". Separation of line-art from blob image objects is handled by ad hoc algorithms. ScanScribe employs a conglomeration of techniques including standard methods of image morphology, thresholding, boundary tracing, statistical classification, thinning, junction detection, tracing, and corner detection.
Figure 2: "Sketchy" styles of linework are not segmented into sensible primitives through simple line thinning and tracing techniques.
Figure 3: The same image primitive can serve as part of line art or part of a text string in different contexts.
A related challenge in segmentation of prime image elements concerns touching or overlapping perceptually distinct image objects, for example a long curvilinear line that touches text. This remains an outstanding problem in the field of document image analysis. The approach we are investigating involves iteration on the steps of fragmentation, then classification of the resulting parts, until prime fragments are returned that appear not to involve mixtures of types as indicated by statistical measures on contour properties.
Grouping to form Composite Structures
In ScanScribe, significant perceptual structure in sketches, handwritten notes, or for that matter any document image, is captured as groups of prime image objects represented by composite objects in the structure lattice. While our particular focus is on groups representing structure at the level of Perceptual Organization, this framework supports many recognition methodologies and representational ontologies, including matching to domain-specific databases of shapes, relations, and semantic interpretations.
Some structure recognition algorithms may operate in stages, forming more complex or "higher level" composite objects with the help of intermediate level objects found in turn by grouping primitives. As such, an ontological choice is to be made about the hierarchical structure of support relations. Figure 4 illustrates a simple example in which a "rectangle" is defined as comprising four "line segments". If the objects at the prime layer themselves do not fulfill the requirements for the rectangle's "line segment" slots, then some additional computation may be required to introduce an intermediate object creating a line segment out of smaller pieces. There are two ways of handling this, "hierarchical" versus "flat". In the hierarchical approach the support lattice can grow to any depth. In a flat approach, links maintain relations directly between composite objects and their prime level support. In our design of ScanScribe we found it much simpler to manage these as flat relations.
Our approach to detecting perceptual structure follows a basic two-stage strategy employed widely in the Computer Vision field. Within this strategy fall many technical and design options, among which notable aspects of our approach are discussed below.
Figure 4: a. "Rectangle" model. b. Prime objects labeled in image data. c. Hierarchical lattice structure showing support relationships between composite objects and prime objects. d. Flat lattice structure.
The first generic stage is construction of a graph of object relations. Pairs or n-tuples of image objects are identified based on proximity and/or internal attributes. These relations give rise to graph structures, where the nodes are image objects and the links indicate relations. The links may be attributed with properties of the relation, such as distance, relative orientation, etc. Initially these computations are performed on prime image objects, but in subsequent grouping stages these can be composite objects as well. The graph may be used immediately, or it may be massaged based on its local and aggregate properties. For example, two very different forms of this graph massaging are iterative relaxation [Hummel and Zucker 1983] and spectral graph partitioning [Weiss 1999].
The second generic stage is traversal of the graph to collect groups of transitively linked image objects satisfying certain properties. This traversal may include various forms of tracing, coloring, matching, and search.
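The two generic stages can be sketched in their simplest form as follows. This is a minimal illustration under stated assumptions, not the ScanScribe implementation: the function names `build_links` and `collect_groups` are hypothetical, each image object is reduced to a center point, the only relation is Euclidean proximity, and the traversal is plain connected-component collection with no graph massaging step.

```python
import math
from itertools import combinations


def build_links(centers, max_dist):
    """Stage 1: link pairs of objects whose centers lie within max_dist.

    Each link carries an attribute of the relation (here, the distance).
    """
    links = {}
    for a, b in combinations(sorted(centers), 2):
        d = math.dist(centers[a], centers[b])
        if d <= max_dist:
            links[(a, b)] = {"distance": d}
    return links


def collect_groups(centers, links):
    """Stage 2: traverse the graph, collecting transitively linked objects."""
    neighbors = {name: set() for name in centers}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(centers):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


centers = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (10, 10)}
links = build_links(centers, max_dist=1.5)
groups = collect_groups(centers, links)
```

Note that `a` and `c` are not directly linked but end up in the same group through `b`, which is the sense of "transitively linked" used above.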
To date we have applied this framework in separate processes dedicated to identifying visual structure in line art and in text.
Line Art Analysis
Line art analysis operates primarily on curvilinear type prime image objects, or curve stroke fragments. Many of the relevant spatial relations among stroke fragments occur at and among their endpoints. Sets of nearby curve fragment ends are found by clustering. Links are established pairwise among proximal curve fragment ends and labeled with measures of the geometric configuration between them. A score is given for how well each pair forms a corner configuration, and for how well the curve fragments align with one another. This score is based on the relative orientations, distances, and other aspects of the local geometry.
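The paper does not give the scoring formula, so the following is only a plausible sketch of an alignment score built from relative orientation and distance. Everything specific here is an assumption: the function name `alignment_score`, the representation of a fragment end as `(x, y, theta)` with `theta` the outward tangent direction, the `max_gap` cutoff, and the product of two linear terms.

```python
import math


def alignment_score(end_a, end_b, max_gap=10.0):
    """Score in [0, 1] for how well two fragment ends continue one another.

    Each end is (x, y, theta), where theta is the direction (radians) in
    which the curve would continue if extended past that end.
    """
    (xa, ya, ta), (xb, yb, tb) = end_a, end_b
    gap = math.hypot(xb - xa, yb - ya)
    if gap > max_gap:
        return 0.0
    # Perfectly aligned ends have opposite outward tangents (pi apart),
    # so measure the deviation from that configuration, wrapped to [0, pi].
    deviation = ta - tb + math.pi
    turn = abs(math.atan2(math.sin(deviation), math.cos(deviation)))
    angle_term = 1.0 - turn / math.pi
    gap_term = 1.0 - gap / max_gap
    return angle_term * gap_term


straight = alignment_score((0, 0, 0.0), (0, 0, math.pi))        # collinear ends
corner = alignment_score((0, 0, 0.0), (0, 0, math.pi / 2))      # right angle
distant = alignment_score((0, 0, 0.0), (50, 0, math.pi))        # too far apart
```

A corner score could be built the same way by rewarding a large turn at a small gap instead of penalizing it.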
We have explored ways of searching for three kinds of perceptual level structure over the link graph of stroke ends. These are based on the Gestalt principles of smooth continuation, continuity, and closure.
Figure 5: a-b. Hypothetical alignment scores for pairs of curve fragments. 1.0 is perfect alignment. c. Lack of ambiguity makes it easy to decide which pairs should be grouped into composite objects (solid curves).
Smooth Continuation Paths. Smooth continuation search involves tracing paths through the link graph, accumulating stroke fragments that align with one another, end-to-end. These paths reflect the path of the pen across other image material, and commonly reflect semantically significant as well as perceptually salient imagery. The interesting questions are how to decide which paths to follow, and how to handle ambiguity. Figure 5 illustrates. Suppose that alignment score varies from 0 (poor alignment) to 1 (perfect alignment). The pairwise alignment scores shown indicate affinity for each curve fragment to join with the others, that is, to establish a local path through the link graph. Clearly, the pairwise score alone does not dictate whether a particular potential join will be perceptually preferred. The context of other curve fragments must be taken into account. In certain situations a robust strategy is available whereby pairs of curves are joined whose preference for joining with one another exceeds their preference for joining with any other. The clear X in Figure 5a offers such a case. Figure 5b poses a somewhat more equivocal choice. Here, one pair of curves mutually prefer to join with each other, while a third comes into play. However, the alignment score for this third curve is much worse than for the mutually preferring curves, and this join can be safely ruled out. Conditions such as these enable global search routines to construct extended curves by straightforward tracing, yielding the composite curve objects shown in Figure 5c.
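The mutual-preference strategy can be sketched as a filter over the pairwise scores at a junction. This is an illustration under assumptions, not the published procedure: the function name `mutual_best_joins` and the thresholds `min_score` and `min_margin` are hypothetical, and the scores in the example are invented in the spirit of Figure 5.

```python
def mutual_best_joins(scores, min_score=0.5, min_margin=0.2):
    """Return joins whose two ends prefer each other by a clear margin.

    scores: {(end_a, end_b): alignment score in [0, 1]} for candidate joins.
    """
    ranked = {}
    for (a, b), s in scores.items():
        ranked.setdefault(a, []).append((s, b))
        ranked.setdefault(b, []).append((s, a))
    for options in ranked.values():
        options.sort(reverse=True)  # best-scoring partner first

    def unambiguous(end):
        options = ranked[end]
        runner_up = options[1][0] if len(options) > 1 else 0.0
        best = options[0][0]
        return best >= min_score and best - runner_up >= min_margin

    joins = set()
    for end, options in ranked.items():
        partner = options[0][1]
        mutual = ranked[partner][0][1] == end
        if mutual and unambiguous(end) and unambiguous(partner):
            joins.add(tuple(sorted((end, partner))))
    return sorted(joins)


# A clear X crossing: each end has one strong partner and weak alternatives.
x_crossing = {
    ("a", "c"): 0.95, ("b", "d"): 0.90,
    ("a", "b"): 0.20, ("a", "d"): 0.20, ("b", "c"): 0.20, ("c", "d"): 0.20,
}
clear = mutual_best_joins(x_crossing)

# A truly ambiguous fork: "a" aligns almost equally well with "b" and "c".
fork = {("a", "b"): 0.80, ("a", "c"): 0.75}
ambiguous = mutual_best_joins(fork)
```

Raising or lowering `min_margin` is one simple handle on how aggressive or conservative the join decisions are.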
Figures 6a and 6b however pose truly ambiguous situations. Local evidence offers no indication as to which of several possible paths through the link graph gives rise to a perceptually salient curve continuation. Because ScanScribe's prime/composite object lattice supports multiple overlapping interpretations, one approach to these situations is to permit search routines to construct multiple composite curve objects accepting alternative paths, as in Figure 6c. The drawback of this approach is that it can lead to combinatorial explosion in the number of global paths constructed, as in Figure 6d. Heuristic termination criteria offer a stopgap solution. A more principled basis for handling curvilinear paths might involve management of how aggressive or conservative the search is in evaluating local joins as unequivocal versus ambiguous.
Figure 6: a-b. Hypothetical alignment scores for pairs of curve fragments. 1.0 is perfect alignment. c. Local ambiguity supports alternative composite objects with overlapping support. d. Compounding of local ambiguity can lead to combinatorial explosion in possible number of alternative smooth-continuation paths.
Figure 7: We observe two kinds of figural closure. A maximally-turning closure path traces the smallest figure possible. A smooth-continuation closure path prefers smooth continuation traces through junctions. Other closed paths through the seed (thick contour fragment) are perceptually insignificant.
Stroke Continuity Paths. A second kind of perceptually salient line-art object occurs as a coherent stroke that happens to contain sharp corners. Where such a change of direction presents a clear continuation of the curve, it is meaningful to construct a global path proceeding through such a corner. However, because the corner is in and of itself a perceptually salient potential stopping point for the curve, it is sensible to construct, and make available to users, alternative composite curvilinear objects both bounded by sharp corners and proceeding through sharp corners. This follows the strategy used in the PerSketch system [Saund and Moran 1994].
Closed Contour Paths. A third kind of important perceptual structure in line art is based on the principle of closure. Closed regions are known psychophysically to be perceptually salient to human observers. In sketches and hand-drawn diagrams, closed regions represent individual physical objects; conceptual objects; groupings or collections; logical or other abstract relations; emphasis; looping paths or circuits; symbols and characters (or fragments thereof); and tabular cells.
Our detection of closed curvilinear paths reflects an observation that there are actually two kinds of perceptually significant closed path: those that tend to obey the rule of smooth continuation, and those that tend to take a turn at every opportunity to enclose the smallest possible region. See Figure 7. Accordingly, we construct additional links in the link graph reflecting two kinds of preference for tracing. One kind of link scores the local preference for tracing from one node (curve fragment) to the next under a search for smooth closed paths. A second kind of link scores local preferences reflecting search for maximally turning paths. A bidirectional search procedure conducts best-first search that starts from a seed and explores the link graph in both directions until either a geometrically closed path is found or termination criteria are met. See [Saund 2001] for details.
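The effect of the two kinds of link preference can be illustrated with a much simplified search. The sketch below is not the bidirectional procedure of [Saund 2001]: it is a single-direction best-first (uniform-cost) search, closure is topological rather than geometric, and the function name `best_closed_path`, the cost values, and the toy graph of a rectangle split by a bisector `M` are all hypothetical.

```python
import heapq


def best_closed_path(links, seed, max_expansions=1000):
    """Cheapest closed path through seed, or None if none is found in budget.

    links: {node: [(next_node, cost), ...]}; lower cost = more preferred step.
    """
    frontier = [(0, [seed])]
    for _ in range(max_expansions):          # heuristic termination criterion
        if not frontier:
            break
        cost, path = heapq.heappop(frontier)
        if len(path) > 1 and path[-1] == seed:
            return cost, path                # cheapest closure is popped first
        for nxt, step in links.get(path[-1], []):
            closes = nxt == seed and len(path) >= 3
            if closes or nxt not in path:
                heapq.heappush(frontier, (cost + step, path + [nxt]))
    return None


# Rectangle with sides L, R, top T1+T2, bottom B1+B2, and a bisector M.
EDGES = [
    ("L", "T1", "corner"), ("L", "B1", "corner"),
    ("R", "T2", "corner"), ("R", "B2", "corner"),
    ("T1", "T2", "straight"), ("B1", "B2", "straight"),
    ("M", "T1", "turn"), ("M", "T2", "turn"),
    ("M", "B1", "turn"), ("M", "B2", "turn"),
]


def make_links(costs):
    links = {}
    for a, b, kind in EDGES:
        links.setdefault(a, []).append((b, costs[kind]))
        links.setdefault(b, []).append((a, costs[kind]))
    return links


turning = make_links({"corner": 1, "straight": 5, "turn": 1})  # prefers turns
smooth = make_links({"corner": 1, "straight": 1, "turn": 5})   # prefers going straight
small_cost, small_path = best_closed_path(turning, "L")
large_cost, large_path = best_closed_path(smooth, "L")
```

With the turn-preferring costs the search closes around the left square; with the smooth-continuation costs it traces the full rectangle, mirroring the two kinds of closure in Figure 7.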
Interaction of Curvilinear Path Types. The most straightforward option for making available curvilinear paths found by the three preceding tracing/search strategies is to make available to the ScanScribe user all constructed composite curvilinear path objects. However, there can be interactions among these such that some paths become less perceptually apparent than others. The recognition of this fact permits either pruning of the options, thereby removing perceptually insignificant choices, or at least demotion of less salient options in the ordering in which they are offered to users by repeated clicking on foreground prime objects.
Figure 8 offers an example. In Figure 8a, the two squares are visually prominent, as is the rectangle. A click on the X would sensibly select either the square or the rectangle. In Figure 8c, although closed path search will deliver an analogous set of segments forming squares, the fact that the rectangle's bisector participates in a smooth continuation path reduces its perceived membership as the side of a square. The larger rectangle should be preferred by a user's click on the X. Figure 8e shows that this effect becomes even stronger if the placement of Segment A gives it no special geometric status, e.g. as a perpendicular bisector of the sides of the rectangle. The analysis of such considerations can become quite involved and to date we have not undertaken to formulate a comprehensive approach within ScanScribe.
Figure 8: a. Both a rectangle and two squares are visually apparent in this figure; all can be made available as composite objects (dotted lines in b). These are found by smooth-continuation search and maximally-turning search in our closed path finding algorithm. c, d. When the bisector becomes part of an extended smooth continuation path the prominence of the squares is reduced. e. The maximally-turning closed paths become even less salient when the bifurcating segment lacks special geometric placement.
Text Layout Analysis
Page layout analysis for printed documents is a major topic of investigation in the field of document image analysis [Nagy 2000]. Universally, strong assumptions are made about the uniform rectilinear arrangement of printed pages, which do not hold for hand-drawn sketches and notes. Much work has also been done on the recognition of hand-printed and cursive characters and words, known in the applications trade as ICR ("Intelligent Character Recognition"). Typically this is done after some preprocessing step or user interface constraint has acted to isolate words.
Our concern in ScanScribe is with the organization of markings into words, lines, columns, tables, and other arrangements of textual information (regardless of script or language), leaving any actual character or word recognition to third-party modules. The difficulty of this problem ranges from relatively easy, for neat, cleanly written, well separated, horizontally aligned lines of text, to nearly indecipherable for cramped, skewed scratchings with ascenders and descenders overlapping multiple lines. See Figure 9. An issue often neglected is the identification of text groups regardless of the size at which they occur. Size variation arises both from differences in the physical size of writing, and from sensor characteristics such as scanner resolution or spatial sampling of a stylus.
For expediency, the current version of ScanScribe is designed with the assumption that blob prime image objects roughly correspond to individual characters, groups of touching characters within a word, or entire words (e.g. cursive words). In other words, we assume that the segmentation stage has not delivered blob objects corresponding to multiple words on different text lines linked by touching ascenders and descenders.
Figure 9: Samples of handwritten text that is moderate (a) and high (b) in difficulty of grouping into words and text lines. Difficulty in a is due to similarity in between-word and within-word character spacing.
Following the generic framework outlined above, links are formed between blob prime objects, in this case based on a scale-normalized measure of proximity. Next, grouping takes place through a series of steps involving hierarchical agglomerative clustering, formation of intermediate stage "proto-word" objects from stable clusters, formation of links among these using local proximity and orientation information, and finally assembling collections of proto-words into extended structures that most often correspond to lines of text. The details of this process are continuously in flux as we attempt to improve its performance. A sample result is shown in Figure 10.
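The first two steps of this pipeline can be sketched as follows. Since the paper states that the details are in flux, this is only one plausible reading: the names `normalized_gap` and `proto_words`, the choice of mean blob height as the scale, single-link merging, and the `max_gap` cutoff are all assumptions, and blobs are reduced to bounding boxes with positive height.

```python
def normalized_gap(a, b):
    """Gap between two boxes (x0, y0, x1, y1), divided by their mean height."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    scale = ((a[3] - a[1]) + (b[3] - b[1])) / 2.0
    return (dx * dx + dy * dy) ** 0.5 / scale


def proto_words(boxes, max_gap=0.5):
    """Single-link agglomerative clustering of blobs, stopped at max_gap."""
    clusters = [[i] for i in range(len(boxes))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    normalized_gap(boxes[p], boxes[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_gap:
            break  # remaining clusters are too far apart to merge
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Four character blobs forming two words, then the same layout scanned at 3x.
small = [(0, 0, 8, 10), (10, 0, 18, 10), (30, 0, 38, 10), (40, 0, 48, 10)]
large = [tuple(3 * v for v in box) for box in small]
words_small = proto_words(small)
words_large = proto_words(large)
```

Because gaps are divided by blob height, the same grouping is obtained at both sizes, which is the point of using a scale-normalized proximity measure.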
As occurs within the realm of line art grouping, text layout analysis raises issues of ambiguous interpretations, multiple valid interpretations, and the interaction of structure found within and between different interpretation modules. Figure 11a shows a case where the context of one text line influences the membership of a blob in another. In Figure 11b a slightly different configuration of these elements yields overlapping, equally salient groups. Figure 11c presents a case in which, by its shape, a prime object demands to be considered for both line art grouping and text grouping. The context of surrounding line art weakens its perceived membership in a text group.
These examples illustrate the kinds of complexities that our future research must grapple with in order to offer users just those groupings, represented by composite objects in the structure lattice, that correspond to the image objects they might want to select, and no spurious others.
Conclusion
At this writing ScanScribe is just beginning to make its way into the hands of (non-researcher) users. While our current line art and text structuring operations are incomplete and of variable performance, they represent an initial attempt to integrate a suite of systematic perceptual grouping facilities into a useful document image editor capable of supporting everyday work. ScanScribe is positioned to become a platform and testbed for further progress on fundamental sketch recognition problems. As such it could amplify knowledge-intensive sketch-based applications for targeted domains [Forbus et al 2001].
By focusing on the detection of visual structure at the level of Perceptual Organization, we propose that intelligent sketch editing tools will become useful for a wide variety of hand-drawn and formatted diagrammatic material. Moreover, we suggest that this step will also lead to a solid foundation for recognizing targeted, domain-specific visual objects and syntax. By endowing sketch editors with perceptual level WYPIWYG abilities, we recognize that sketches are much more than an accumulation of strokes. They are a particular form of visual scene in which significant structure does not occur on an element-by-element basis, but is found in a rich array of emergent spatial relationships and patterns abstracted from the basic data level.
Figure 10: Groups found by ScanScribe's current text grouping algorithm are shown by their bounding boxes.
Figure 11: The salience of different groups formed by blob prime objects depends on surrounding context.
Acknowledgments. We thank Thomas P. Moran for substantial contributions in the conceptual and initial design stages of this project.
References
Alvarado, C., and Davis, R. 2001. Resolving ambiguities to create a natural computer-based sketching environment. Proc. IJCAI, Seattle, Vol. 2: 1365-1371.
Barron, J.L., Fleet, D.J., and Beauchemin, S.S. 1994. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43-77.
Boyer, K., and Sarkar, S., eds. 1999. Computer Vision and Image Understanding Special Issue on Perceptual Organization, 76(1).
Forbus, K., Ferguson, R.W., and Usher, J.M. 2001. Towards a Computational Model of Sketching. IUI '01, Santa Fe, New Mexico.
Freeman, W., and Adelson, E.H. 1991. The design and use of steerable filters. IEEE Trans. PAMI, 13:891-906.
Gross, M. 1996. The electronic cocktail napkin - computer support for working with diagrams. Design Studies, 17(1):53-70.
Hummel, R.A., and Zucker, S.W. 1983. On the Foundations of Relaxation Labeling Processes. IEEE Trans. PAMI, 5(3):267-287.
Kanizsa, G. 1979. Organization in Vision: Essays on Gestalt Perception, Praeger, New York.
Koffka, K. 1922. Perception: An introduction to Gestalt-theorie. Psychological Bulletin, 19:531-585.
Lin, J., Newman, M., Hong, J., and Landay, J. 2000. DENIM: Finding a tighter fit between tools and practice for web site design. Proc. ACM CHI, The Hague, 510-517.
Lowe, D., and Binford, T. 1983. Perceptual Organization as a Basis for Visual Recognition. Proc. AAAI-83.
Nagy, G. 2000. Twenty years of document image analysis in PAMI. IEEE Trans. PAMI, 22(1):38-62.
Pedersen, E., McCall, K., Moran, T., and Halasz, F. 1993. Tivoli: An electronic whiteboard for informal workgroup meetings. Proc. ACM CHI, 391-398.
Saund, E., and Moran, T. 1994. A perceptually supported sketch editor. Proc. ACM UIST, Marina del Rey, 175-184.
Shi, J., and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Trans. PAMI, 22(8):888-905.
Stevens, K. 1978. Computation of locally parallel structure. Biological Cybernetics, 29:19-26.
Weiss, Y. 1999. Segmentation using eigenvectors: a unifying view. Proceedings IEEE International Conference on Computer Vision, Corfu, Greece, 975-982.
Wertheimer, M. 1923. Laws of Organization in Perceptual Forms, in Ellis, W., ed., A Source Book of Gestalt Psychology, Routledge & Kegan Paul, London, 1938.
Witkin, A.P., and Tenenbaum, J.M. 1983. On the role of structure in vision, in Beck, J., Hope, B., and Rosenfeld, A. (eds.), Human and Machine Vision, Academic Press, 481-543.
Zucker, S. 1983. Computational and Psychophysical Experiments in Grouping: Early Orientation Selection, in Beck, J., Hope, B., and Rosenfeld, A. (eds.), Human and Machine Vision, Academic Press, 545-567.