Simulating Perceptual Clustering by Gestalt Principles Erich Rome

25th Workshop of the Austrian Association for Pattern Recognition, ÖAGM/AAPR 2001
Berchtesgaden, Germany, June 7–8, 2001
OCG, Vienna, 2001, ISBN 3-85403-147-5, S. Scherer (ed.), pp. 191–198
Simulating Perceptual Clustering by Gestalt Principles
Erich Rome
GMD – German National Research Center for Information Technology
Institute for Autonomous intelligent Systems
Schloss Birlinghoven, D-53754 Sankt Augustin, Germany
rome@gmd.de, http://ais.gmd.de/
Abstract:
In this paper we propose a method for the detection of salient non-local structures in vector
graphics. Non-local structures may consist of similar graphical objects—the constituents of a
vector graphics—or of objects which are orderly arranged. They may be perceived immediately,
but they are not explicitly represented in the internal description of a graphics. Information
on such cognitively relevant structures may serve as additional indices to the graphics data
base of a graphics retrieval system or may guide higher scene interpretation routines. Non-local structures emerge as a result of grouping processes of visual perception. The method
used to detect non-local structures is the simulation of models of organizing phenomena of
human visual perception. In particular, we use Treisman’s feature map model and Palmer’s
transformational approach to human visual perception.
1 Introduction
Non-local relational image structures, such as three rectangles of equal dimensions and colors
that are equidistantly positioned on a horizontal line (fig. 1), may be perceived immediately as
a whole, but they usually are not explicitly represented as an entity in the internal description
of a graphics or scene segmentation. Such non-local structures can be found in photographs,
bitmapped and graphical images. They may be salient and of strong visual appeal, thus they
seem to be of some cognitive relevance to a viewer of such images. It seems desirable to
develop methods to detect these structures automatically in order to aid scene interpretation
or to assist a user in designing or retrieving graphical images.
Figure 1: A non-local structure: Perceivable, but not explicitly represented.
The process of perceiving non-local relational structures as entities is called perceptual grouping. At the beginning of the 20th century, adherents of the Gestalt school of psychology
investigated perceptual grouping processes and found a number of principles that guide them
[13, 25]. They qualitatively formulated about a dozen grouping principles. Among them are
the well-known Gestalt “laws” of grouping by proximity, good continuation, and similarity
[9, 3]. Examples are shown in figure 2.
Figure 2: Gestalt principles of proximity, good continuation, and similarity (by color).
But it was not until the late 20th century that perceptual psychologists and cognitive scientists
came up with promising quantitative models of Gestalt grouping. Anne Treisman’s Feature Map Model [23] and Palmer’s Transformational Approach [17] to human visual perception
provide a basis for the computer simulation of grouping processes and are used in our
own MAX Gestalt grouping simulation. MAX has been incorporated in an innovative Graphics Design Assistant System, the GraphicsDesigner [11]. The first application has been the
graphics retrieval component SkaGra [10], which enables a user to search for Gestalt-like arrangements of graphical elements in a data base of over 300 vector graphics slides. MAX simulates
the clustering of primitive graphical objects according to the Gestalt grouping principles of
proximity, similarity, and good continuation.
The paper is organized as follows. First, the state of the art in computational approaches to
perceptual grouping is critically reviewed. A brief survey of modern models and theories of
perceptual grouping follows. We then describe how MAX employs ideas from two of these
models and present test results of MAX’s performance on Gestalt test patterns and on presentation
graphics. We conclude by summarizing the main points of MAX, remarking on
purely computational approaches, and pointing out promising research trends.
2 State of the Art in Perceptual Grouping Simulation
Computational approaches to perceptual grouping have been investigated for a long time. The
earliest relevant paper found by the author is a 1961 article by Guiliano et al. [7]. Many
approaches modify well-known techniques like the Hough transform [4], Simulated Annealing
[12], or methods from Spectral Graph Theory [21, 22] to construct groups of pictorial elements.
Some perceptual grouping algorithms themselves have become classic techniques, like Zahn’s
Minimum Spanning Tree method [27].
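The minimum spanning tree idea can be illustrated with a small sketch. It is a simplified variant and not Zahn's full method [27]: an MST edge is treated as "inconsistent" and cut when it is longer than a fixed multiple of the mean MST edge length. The function names and the `factor` parameter are assumptions made for this illustration.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    # best[j] = (distance to the growing tree, nearest tree vertex) for each outside vertex
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])   # closest vertex outside the tree
        d, i = best.pop(j)
        edges.append((d, i, j))
        for k in best:                            # relax distances via the new tree vertex
            dk = math.dist(points[j], points[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges

def mst_clusters(points, factor=2.0):
    """Cut MST edges longer than factor * mean edge length (assumed criterion)."""
    edges = mst_edges(points)
    if not edges:
        return [[i] for i in range(len(points))]
    mean_len = sum(d for d, _, _ in edges) / len(edges)
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d, i, j in edges:
        if d <= factor * mean_len:                # keep only "consistent" edges
            parent[find(i)] = find(j)
    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())

# Two rows of dots separated by a wide gap fall into two clusters.
dots = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
print(mst_clusters(dots))
```

Zahn's original criterion compares an edge with the edges in its neighborhood rather than with a global mean; the global threshold is used here only to keep the sketch short.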
In the literature on perceptual grouping, one of the most cited publications is the classic 1984 dissertation
of David G. Lowe [15]. This work stands out for its then-unique combination of a perceptual
grouping stage, model-based form and viewpoint matching, and a component for reasoning
under uncertainty. Lowe adopts Witkin and Tenenbaum’s [26] hypothesis that non-accidental
patterns are of significance for visual perception, and he applies this principle in his vision
system. Lowe’s grouping component clusters points by proximity, and then tries to construct
continuous curved line segments from proximity groups at increasingly larger scales. A small
gap at a large scale is more likely to be accidental than at a small scale. Thus, at a large scale
a continuation will be constructed to fill the gap, whereas a gap of the same size will not be
bridged at a small scale. Compared with other work on perceptual grouping methods, Lowe’s
grouping component alone does not stand out.
A number of computational approaches employ minimizing or optimizing techniques. In [28],
a distance function is constructed that weighs grouping terms. The “best” grouping has a
minimum total weight. A similar idea can be found in [16] and [12], where modern energy
minimization techniques are used to construct groups. The basic idea that a “good”
grouping is simpler or has less energy than alternative groupings has a long tradition and
goes back to the original Gestalt principle of Prägnanz or figural goodness. In the 1950s,
several information processing definitions of the Prägnanz principle were proposed, “...
suggesting that figural goodness may be inversely proportional to the amount of information
necessary to describe or specify a figure” [8, p. 194]. Critics objected that this definition
depends strongly on the coding of the describing data.
Computational approaches differ in several aspects: the number of phenomena
they simulate, the kind of pictorial data on which they operate, the mode of operation
(supervised versus unsupervised), the use of 3D cues, attentive or preattentive grouping,
hierarchical versus flat grouping, the kind of algorithm employed, and the application area.
Some methods have been shown to improve certain grouping tasks in their
application area, whereas other methods are mere conceptual studies. A feature that
is common to most of these approaches is that they are usually only inspired by Gestalt
grouping phenomena, but do not rely on contemporary models and theories of human visual
perception. It is the author’s belief that contemporary results of cognitive psychology are well
suited for designing improved grouping methods.
3 Theories and Models of Perceptual Grouping
The recent research interest in Gestalt phenomena, a phenomenon in its own right, has been
called Neo-Gestaltism. Neo-Gestaltist research of the last decade has produced new Gestalt
principles (“connectedness” and “common region” [19]), psychophysical investigations of the
quantification of proximity, similarity, and temporal grouping [14], and investigations of the mutual
effects of different grouping principles and of the influence of 3D cues on grouping [19].
In psychophysical experiments, Kubovy and Gepshtein [14] recently found the pure distance
law that quantitatively describes proximity grouping. Incorporating this law in proximity
grouping algorithms should yield more reasonable grouping results than other methods.
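As an illustration of how such a law could enter a grouping algorithm, the sketch below assumes one commonly cited form of the pure distance law, in which the strength of grouping along a direction with inter-element distance d, relative to the shortest available distance d_min, decays as exp(-alpha (d / d_min - 1)). This formula is not stated in the present paper, and the value of the constant `alpha` as well as all function names are hypothetical.

```python
import math

def relative_strength(distance, shortest, alpha=3.0):
    """Grouping strength relative to the shortest distance (assumed exponential form)."""
    return math.exp(-alpha * (distance / shortest - 1.0))

def grouping_probabilities(distances, alpha=3.0):
    """Normalize the relative strengths of competing groupings to probabilities."""
    shortest = min(distances)
    strengths = [relative_strength(d, shortest, alpha) for d in distances]
    total = sum(strengths)
    return [s / total for s in strengths]

# Two competing organizations of a dot pattern: neighbors at distance 1.0
# in one direction and at distance 1.5 in the other.
print(grouping_probabilities([1.0, 1.5]))
```

Under this assumption only the ratio of the distances matters, not their absolute values, so scaling the whole pattern leaves the predicted grouping unchanged.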
With the emergence of Artificial Neural Networks, several connectionist approaches to perceptual grouping have been proposed. One of the most elaborate connectionist approaches is
Grossberg’s FACADE theory of visual perception, which has been developed for more than
a decade [6, 2, 18, 5]. This body of work is too large to be described in this article, but it
should be noted that FACADE models various Gestalt phenomena, such as figure-ground separation, Gestalt grouping, and subjective contours. This work is clearly a valuable
source of quantitative models of Gestalt phenomena.
In the next section, we describe briefly those aspects of two models [23, 17] of human visual
perception that we have used for our own grouping simulator MAX.
4 Perceptual Grouping Simulator MAX
The grouping simulator MAX operates on vector representations of 2.5D graphical images.
The primitive graphical objects of its representation language EPICT are dots, lines, arrows,
rectangles, rounded rectangles, ellipses, polygons, splines, text objects, and bitmap objects.
Typical attributes of these objects include fill color, line color, position, and size. For grouping
purposes, additional attributes such as orientation and brightness are computed.
In Treisman’s Feature Map model [23], a perceived picture is split up into a number of intermediate pictures along several feature dimensions. These intermediate pictures are called
feature maps. Each feature map consists of those parts of the picture that contain a certain
feature. There are two basic questions concerning this model: 1. What are the features that
are relevant for human vision? and 2. How are scene elements assigned to feature maps
(how are feature values “coded”)? Treisman found some evidence that brightness, color, size,
orientation, curvature, blobness, and line discontinuities may be relevant visual features. In
each dimension, feature values are coded relative to a given standard value.
The feature maps may be conceived as similarity classes, thus we have adapted the preattentive
part of the model for our simulation of similarity grouping. During this simulation, all EPICT
objects constituting a presentation graphics are assigned to feature maps (at most five) in
the feature dimensions of form, color, brightness, orientation, and size. In other words, if
a graphics contains only red and blue squares and circles of similar size, orientation, and
brightness, then there is one feature map that contains all the circles, one that contains all
the squares, one that contains all the red objects, and one that contains all the blue objects.
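The red and blue example can be written down as a minimal sketch. The dictionary-based object representation and the function name are assumptions made for illustration and do not reflect the EPICT format; feature values are compared by exact equality here, whereas a real system would have to bin continuous values such as size or brightness.

```python
from collections import defaultdict

def build_feature_maps(objects, dimensions=("form", "color")):
    """Assign every object to one feature map per feature dimension.

    A feature map is keyed by (dimension, value) and lists the ids of all
    objects that carry this value.
    """
    maps = defaultdict(list)
    for obj in objects:
        for dim in dimensions:
            maps[(dim, obj[dim])].append(obj["id"])
    return dict(maps)

# Hypothetical picture with red and blue squares and circles.
picture = [
    {"id": "sq_red", "form": "square", "color": "red"},
    {"id": "sq_blue", "form": "square", "color": "blue"},
    {"id": "ci_red", "form": "circle", "color": "red"},
    {"id": "ci_blue", "form": "circle", "color": "blue"},
]
for key, members in build_feature_maps(picture).items():
    print(key, members)
```

With two dimensions and two values each, the sketch yields four feature maps, and every object appears in exactly two of them.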
Since the same object may be contained in several feature maps, it may also belong to several
found groupings, if any. The overall procedure for the simulation of Gestalt grouping of a
vector graphics is organized roughly according to the following steps.
1. Grouping by similarity of form, color, brightness, size, and orientation: Put each constituent graphical
object into five similarity classes according to its actual feature values for the above-mentioned
features. Here, we use absolute feature value coding according to psychophysical laws and findings.
2. For each object in each similarity class perform the following steps:
a) Proximity grouping: Construct a local environment with a fixed radius that contains a maximum of
eight nearest neighbors.
b) Continuity grouping: Construct good continuations between neighboring local environments (this
idea stems from [17]).
3. Repeat Step 2 for all objects that remained ungrouped, using a larger radius.
4. Stop if all objects are grouped or if the radius of the local environments has reached its maximum, i.e.
the maximum extent of the presentation graphics in any direction.
5. Collect the groupings found in all similarity classes, unite the results, if possible, and sort them according
to some heuristic saliency criteria.
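Steps 2a, 3, and 4 of the procedure above can be sketched for a single similarity class as follows. This is a sketch under stated assumptions, not the MAX implementation: objects are reduced to center points, local environments are merged transitively with a union-find structure, the radius grows by a constant factor, and the maximum radius is the larger side of the bounding box of the points. Continuity grouping (step 2b) and the saliency ordering (step 5) are omitted. All names and parameters are hypothetical.

```python
import math

def local_environment(i, points, radius, max_neighbors=8):
    """Indices of at most max_neighbors nearest points within radius of point i."""
    near = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i and math.dist(points[i], points[j]) <= radius
    )
    return [j for _, j in near[:max_neighbors]]

def proximity_groups(points, start_radius, growth=2.0, max_neighbors=8):
    """Group points by proximity, enlarging the radius for ungrouped points."""
    if start_radius <= 0 or growth <= 1:
        raise ValueError("start_radius must be > 0 and growth must be > 1")
    n = len(points)
    if n == 0:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    max_radius = max(max(xs) - min(xs), max(ys) - min(ys))  # assumed maximum extent
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def components():
        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        return sorted(groups.values())

    radius = start_radius
    while True:
        ungrouped = [g[0] for g in components() if len(g) == 1]
        if not ungrouped:                 # step 4: every object is grouped
            break
        for i in ungrouped:               # steps 2a and 3: local environments
            for j in local_environment(i, points, radius, max_neighbors):
                parent[find(i)] = find(j)
        if radius >= max_radius:          # step 4: radius reached its maximum
            break
        radius = min(radius * growth, max_radius)
    return components()

centers = [(0, 0), (1, 0), (2, 0), (10, 0), (13, 0)]
print(proximity_groups(centers, start_radius=1.5))
```

In this example the three close points are grouped in the first pass, and the two remaining points are joined only after the radius has been doubled. Whether an ungrouped object may attach to an already formed group at a larger radius is a design choice that the step list leaves open; the sketch allows it.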
Groups of primitive objects may themselves be grouped in a hierarchical fashion. One important aspect of the form of a group is the spatial arrangement of its constituent parts. Such
arrangements are represented using polygons, their vertices being the center points of group
elements. Arrangement polygons are compared by a matching algorithm that computes a
similarity metric for polygonal shapes [1]; this makes it possible to retrieve similarly arranged
groups in a data base of presentation graphics. For finding classes of similar polygonal forms,
the metric is combined with an unsupervised numerical clustering algorithm [24].
5 Experimental Results
MAX has been tested with classical Gestalt test patterns, variations of those, and with presentation graphics. In the three patterns of figure 2, MAX correctly constructs the three
proximity pairs of circles, the two crossing continuations in the second pattern and, in the
third pattern, the five dominant vertical columns based on similar color and good continuation.
Figure 3: Grouping by a) similar size and b) orientation. c) No dominant similarity grouping.
d) Continuity grouping with varying distances.
Similarly, MAX generates the five dominant row structures in figure 3 a and b, based on
continuity and similar size and orientation, respectively. The pattern in figure 3 c, however, does not
exhibit a dominant similarity grouping. All elements look alike, and only the positions differ.
Based on proximity and continuity, MAX generates five row and five column groups of five
circles each.
The pattern in figure 3 d can be perceived as two continuity groups. Here, we have varied the
distances between group elements so that they lie within the tolerance of proximity grouping,
except for the circles close to the junction of the bowed-T-shaped pattern. This yields two
disjoint groups. Narrowing the gap would make the junction circle belong to both groups.
Figure 4 shows a typical presentation graphics generated with the GraphicsDesigner. The
dominating global structure is the diagonal arrangement of shadowed text boxes. They form
a group according to the Gestalt principles of similarity and of good continuation.
[Figure 4 image: presentation slide headed “Assisting Computer (AC)” with diagonally arranged shadowed text boxes; slide text not reproduced.]
Figure 4: A vector graphics example: Presentation slide exhibiting a salient Gestalt structure.
For the graphics of fig. 4, MAX found 10 groups, among them the most salient diagonal
arrangement of text boxes. Of the other groups, four could be considered salient, among them
the text rows in the upper left corner and the footnote-like subtitles. The remaining five are of
rather low saliency. More details on the algorithms, summaries of the models used, and an
examination of the state of the art in Gestalt clustering by machine can be found in [20].
In general, MAX’s grouping results can be roughly divided into three categories. Salient groups
that can be easily perceived belong to the first category. Groups of the second category are
perceivable, but usually not considered to be salient. The third category contains groups that
can only be explained with knowledge of MAX’s grouping algorithm.
6 Conclusion
A method has been presented that is able to find perceptually salient non-local structures
in 2.5D vector graphics. Primitive graphical objects are grouped according to the Gestalt
principles of proximity, good continuation, and similarity of form, color, size, brightness, and
orientation. The method employs graph-based techniques and is mainly based on the Feature
Map Model of preattentive visual perception [23]. For the implementation, some ideas have
been adopted from the Transformational Approach to human visual perception [17]. The
presented method MAX differs from many other related approaches in the following aspects:
1) MAX works without tuning or supervision by a user; 2) MAX provides some grouping
criteria that are rarely found in other approaches, namely grouping by similarity of form and
by similar orientation, and a basic hierarchical grouping feature; 3) MAX’s application area
is graphical search in 2.5D vector graphics, whereas all other known approaches perform
grouping on dot patterns or on preprocessed or raw bitmapped images; 4) MAX is based on
modern psychological models of preattentive perceptual grouping processes that can be quantified, whereas the original Gestalt theory offered only qualitative descriptions of grouping phenomena.
Many other computational approaches modify well-known techniques like energy minimization or
the Hough transform in order to achieve Gestalt-like grouping of pictorial elements, but lack the
foundation of contemporary psychological models and findings.
Experimental results have shown that many of the classical test patterns of Gestalt Psychology
are correctly grouped by MAX. Good results have also been achieved with varied patterns
and combinations of pattern elements. In a test data base of 16 presentation graphics, MAX
formed almost all of the groups that may be perceived as salient non-local structures, but also
a number of non-salient groups. This can be attributed to the aimless preattentive grouping
method and the lack of appropriate “goodness” criteria for the found groups. While this is
of less importance for the application area of content-based retrieval (except that it increases
search time), it might be an undesired feature for other application areas.
The presented work and the reviewed related work show that there are special application
areas where Gestalt-like grouping aids image analysis. It is the author’s opinion that the
most successful grouping procedures do not rely on computational approaches alone, but
have a sound foundation in contemporary theories or psychological models of human visual
perception. With more interdisciplinary approaches, we will see better and more general
artificial vision systems in the future. Approaches that incorporate attentional mechanisms
and exploit 3D cues seem to be a step in the right direction.
References
[1] E.M. Arkin, P. Chew, D.P. Huttenlocher, K. Kedem, and J.S.B. Mitchell. An efficiently computable metric
for comparing polygonal shapes. Technical Report TR 89-1007, Cornell University, Dept. of Comp. Sci.,
Ithaca, NY, 1989.
[2] G.A. Carpenter and S. Grossberg. The art of adaptive pattern recognition by a self-organizing neural
network. In J. Diederich, editor, Artificial Neural Networks — Concept Learning, pages 69–80. IEEE
Computer Society Press, Washington D.C., 1990.
[3] W.D. Ellis, editor. A Source Book of Gestalt Psychology. Lowe & Brydone, Thetford, UK, 1974. 5th
impression, first published 1938.
[4] D.F. Gillies and G.N. Khan. Perceptual grouping and the Hough transform. In Proceedings of the SPIE
- The International Society for Optical Engineering, volume 1607, pages 188–196, 1992. Conference:
Intelligent Robots and Computer Vision X: Algorithms and Techniques. Boston, MA, USA, 11-13 Nov
1991.
[5] S. Grossberg. Publications, 2001. URL: http://cns-web.bu.edu/Profiles/Grossberg/abstracts.html.
[6] S. Grossberg and E. Mingolla. Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. In Stephen Grossberg, editor, The Adaptive Brain II — Vision, Speech, Language,
and Motor Control, volume II, pages 144–210. North-Holland, Amsterdam, 1987.
[7] V.E. Guiliano, P.E. Jones, G.E. Kimball, R.F. Meyer, and B.A. Stein. Automatic pattern recognition by
a Gestalt method. Information and Control, 4:332–345, 1961.
[8] R.N. Haber and M. Hershenson. The Psychology of Visual Perception. Holt, Rinehart and Winston, Inc.,
New York (NY), 1973.
[9] H.H. Helson. The fundamental propositions of Gestalt psychology. Psychological Review, 40:13–32, 1933.
[10] P. Henne and G. Schmitgen. Graphisches Suchsystem SkaGra – Konzepte und Realisierung. TASSO-Report 48, GMD, Sankt Augustin, Germany, 1993.
[11] P. Hoschka, editor. Computers as Assistants. Lawrence Erlbaum Associates, Hillsdale, NJ, 1996.
[12] P. Kahn, A. Winkler, and C.Y. Chong. Perceptual grouping as energy minimization. In 1990 IEEE
International Conference on Systems, Man and Cybernetics, pages 542–546, New York, NY, USA, 1990.
IEEE. Conference: Los Angeles, CA, USA, Nov 4-7, 1990.
[13] W. Köhler. Gestalt Psychology. Liveright, New York (NY), 1929.
[14] M. Kubovy and S. Gepshtein. Gestalt: From phenomena to laws. In K.L. Boyer and S. Sarkar, editors,
Perceptual Organization for Artificial Vision Systems, pages 69–80. Kluwer Academic Publishers, Boston,
2000.
[15] David G. Lowe. Perceptual Organization and Visual Recognition. PhD thesis, Stanford University,
Stanford (CA), September 1984. Report No. STAN-CS-84-1020.
[16] J.D. McCafferty. Human and machine vision: computing perceptual organisation. Ellis Horwood, Chichester, UK, 1990.
[17] S.E. Palmer. The psychology of perceptual organization: A transformational approach. In J. Beck,
B. Hope, and A. Rosenfeld, editors, Human and Machine Vision, pages 269–339, Orlando, FL, 1983.
Academic Press.
[18] R. Raizada and S. Grossberg. Context-sensitive bindings by the laminar circuits of V1 and V2: A unified
model of perceptual grouping, attention, and orientation contrast. Visual Cognition, 2001. in press.
[19] I. Rock and S.E. Palmer. The legacy of Gestalt psychology. Scientific American, pages 84–90, December
1990.
[20] E. Rome. Simulierte Gestalt-Erkennung in Präsentationsgrafiken. PhD thesis, University of Bremen,
Bremen, Germany, 1995.
[21] Sudeep Sarkar. Supervised learning of large perceptual organization: Graph spectral partitioning and
learning automata. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(5):504–525, May 2000.
[22] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 22(8):888–905, Aug 2000.
[23] A. Treisman. Preattentive processing in vision. Computer Vision, Graphics, and Image Processing,
31:156–177, 1985.
[24] R.M. Umesh. A technique for cluster formation. Pattern Recognition, 21(4):393–400, 1988.
[25] Max Wertheimer. Untersuchungen zur Lehre von der Gestalt II. Psychologische Forschung, (4):301–350,
1923.
[26] A. P. Witkin and J. M. Tenenbaum. On the role of structure in vision. In J. Beck, B. Hope, and
A. Rosenfeld, editors, Human and Machine Vision, pages 481–543, Orlando, FL, 1983. Academic Press.
[27] C.T. Zahn. Graph-theoretical methods for detecting and describing Gestalt clusters. IEEE Transactions
on Computers, C-20(1):68–86, January 1971.
[28] A. L. Zobrist and W.B. Thompson. Building a distance function for Gestalt grouping. IEEE Transactions
on Computers, C-24(7):718–728, July 1975.