Emergent features and feature combination

James R. Pomerantz and Anna I. Cragin
Rice University, Houston, Texas, USA

To appear in: Oxford Handbook of Perceptual Organization, Oxford University Press, edited by Johan Wagemans

Abstract

We propose that when a whole differs from the sum of its parts by the appearance of certain Emergent Features (EFs), the result is a unitary object or Gestalt rather than a mere collection of parts. We hypothesize that the essence of a Gestalt is the presence of those EFs, and that EFs both lead to, and can then be diagnosed from, configural superiority, wherein combinations of parts are perceived more quickly and accurately than are the parts alone. Using this approach, we confirm a number of EFs at the simplest level of static visual forms whose parts are dots or line segments.

1. Introduction to Emergent Features (EFs)

1.1. Emergence

The idea of emergence lies at the heart of perceptual organization. Since the earliest scientific approaches to perception, the notion has persisted that percepts are composed of sensations as a wall is made of bricks. If we could determine how those sensations – features, in contemporary parlance – are detected, we could understand how we perceive the world, namely by adding up or otherwise integrating those features into wholes. Emergence provides a challenge to this linear, feedforward view of perception because when certain features are close in time and space, novel, unexpected, and salient properties may arise. Those properties – emergent features – behave as though they were elementary themselves, sometimes even being detected far more efficiently than the nominally more basic features from which they arise. What are these emergent features (EFs), and how are they detected and employed in perception?

1.2. Philosophical issues and reductionism

Most of us are familiar with emergence, although perhaps not by that name.
Our first encounter may come in chemistry when we see two clear liquids poured together to form a dark mixture, perhaps accompanied by smoke or an explosion. Or it may come when we discover that hydrogen and oxygen gases may combine to form water, a liquid with a host of properties possessed by neither of its constituents separately. Chemistry provides examples of the emergence of new phenomena not present in the descriptions and models from the underlying physics, just as biology provides examples not present in chemistry. These phenomena form the primary challenge to reductionism in the physical sciences.

Emergence is also a key concept in philosophy and cognitive science (Stephan 2003), and its central tenet is not merely quantitative non-additivity, wherein the combination of two parts does not add up to the resulting whole. Most sensory processes are non-linear above threshold, after all: the brightness of two superimposed lights does not equal the sum of the two lights alone. Emergence also requires novelty, unpredictability, and surprise that make the whole qualitatively different from the sum of its parts.

1.3. Emergence in perception

The Gestalt psychologists’ key claim was that a whole is perceived as something other than the sum of its parts, a claim still often misquoted as “more than the sum of its parts.” Indeed, the Gestalt psychologists argued such summing was meaningless (Pomerantz and Kubovy 1986; Wagemans et al. 2012b). That elusive “something other” they struggled to define can be regarded as emergence: those properties that appear, or sometimes disappear, when stimulus elements are perceived as a unitary configuration. To take the example of apparent motion with which Wertheimer (1912) launched the Gestalt school (Wagemans et al.
2012a, b): if one observes a blinking light that is then joined by a second blinking light, depending on their timing, one may then see not two blinking lights but a single light in apparent (beta) motion, or even just pure (phi) motion itself. What is novel, surprising, and super-additive with the arrival of the second light is motion. What disappears with emergence is one or both of the lights, because when beta motion is seen we perceive only one light, not two, and with phi we may see only pure, disembodied motion; in this respect the whole is less than the sum of its parts.

1.3.1. Basic features and feature integration

The reigning general view of perception today derives from a two-stage model best associated with Neisser (1967) and with Treisman and Gelade (1980) involving so-called basic features (what in an earlier day Structuralists such as Titchener might have called “sensations”) and their subsequent integration (see also Feldman, in press). For visual perception, in the first stage, basic features are detected simultaneously and effortlessly, in parallel across the visual field. The criteria for a feature to count as basic are several but include popout, rapid texture segmentation, illusory conjunctions, and search asymmetry (Treisman and Gelade 1980; Treisman and Gormican 1988; Treisman and Souther 1985). Considering popout as a prototypical diagnostic, a red square will pop out from a field of green squares virtually instantaneously, irrespective of the number of green squares; thus, color (or some particular wavelength combinations) qualifies as a basic feature. Similarly, a blinking light will pop out from a field of non-blinking lights, a large object from a field of small objects, a moving object from a field of stationary ones, a tilted line from a field of verticals, a near object from a field of far ones, and so on. One current estimate (Wolfe and Horowitz 2004) holds that there are perhaps 20 such basic features.
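
As a rough illustration of the popout diagnostic, the sketch below fits a straight line to mean response time as a function of display size. A slope near zero corresponds to popout (detection time irrespective of the number of distractors), whereas a steep slope corresponds to search that slows as items are added. The function name and all response times are hypothetical values chosen for illustration; they are not data from any study cited here.

```python
def search_slope(set_sizes, mean_rts_ms):
    """Least-squares slope (ms per item) of mean response time against display size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx


# Hypothetical display sizes and mean response times (ms), for illustration only.
set_sizes = [4, 8, 16, 32]
popout_rts = [452.0, 455.0, 451.0, 456.0]        # e.g., red target among green items
conjunction_rts = [520.0, 610.0, 790.0, 1150.0]  # e.g., target defined by color AND orientation

popout_slope = search_slope(set_sizes, popout_rts)
conjunction_slope = search_slope(set_sizes, conjunction_rts)
print(f"popout slope: {popout_slope:.2f} ms/item")
print(f"conjunction slope: {conjunction_slope:.2f} ms/item")
```

With these invented numbers the first slope is close to zero and the second is much steeper, which is the qualitative pattern the popout criterion looks for.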
In the second stage of the standard two-stage model, basic features detected in the first stage are combined or integrated. This process is both slow and attention-demanding. Originally, the second stage was dubbed “serial,” in contrast to the “parallel” first stage; but in light of rigorous analyses by Townsend (1971), this language was replaced by the more process-neutral terms “efficient” and “inefficient.” Either way, the combination of basic features is thought to take place within a “spotlight” of attention that covers only a portion of the visual field at one time. This spotlight can be moved, but that requires time and effort. Thus the time to detect a target defined by a combination of basic features is long and rises with the number of items in the field: a red diagonal in a field of mixed green diagonals and red verticals does not pop out but must be searched for attentively. Among the other diagnostics for basic features is spontaneous texture segregation (Julesz 1981): if a texture field contains vertical elements on its left and diagonal on its right, observers will detect a “seam” down the middle where the two textures meet. A similar outcome results with red vs. green or large vs. small. But if the texture contains clockwise spirals on the left and counterclockwise on the right, observers will not perceive the seam because this feature is not basic. Regarding search asymmetry, it is easier to find a target containing a basic feature in a field of distractors lacking it than vice versa; thus it is easier to find an open circle in a field of closed circles than vice versa, suggesting that terminators may be the basic feature whose presence is detected in open circles. 
Finally, basic features may lead to illusory conjunctions, particularly in the visual periphery when attentional load is high: in a field of red squares and green circles, observers will sometimes report seeing an illusory red circle, suggesting that both the color and the shape distinctions are basic features.

1.3.2. Gestalts arise from Emergent Features (EFs)

In the strongest version of the argument we outline here, Gestalts are configurations or arrangements of elements that possess EFs. Three closely and evenly spaced points arranged in a straight line will form a salient Gestalt, as with Orion’s Belt in the night sky where three stars group by virtue of their proximity, symmetry, nearly equal brightness, and linearity. Three stars more widely and unevenly spaced, varying in brightness, and not forming any regular geometric arrangement would thus contain no EFs and are unlikely to be seen as grouping into a Gestalt. The parallelism of two lines, the symmetry of a snowflake, and the good continuation of the two diagonals crossing to form an X are all emergent features, as detailed below.

From the viewpoint of the Theory of Basic Gestalts (Pomerantz and Portillo 2011) and related approaches, Gestalts, grouping, and EFs are inseparable concepts; when we say that two elements group, we mean that salient, novel features emerge from their juxtaposition in space or time. If a collection of elements contains no EFs (using the definition below), that collection is not a perceptual group.

The essence of Gestalts is their primacy in perception: EFs are perceived more accurately and rapidly than are the basic features from which they emerge. Below we discuss in detail the Configural Superiority Effect by which EFs are diagnosed, but for now it is illustrated in Figure 1. Panel A shows four line segments: three positive diagonals and one negative diagonal. These line segments differ in the classic basic feature of orientation.
Panel B shows these same diagonals each accompanied by identical horizontal/vertical pairs forming Ls. Subjects are much faster and more accurate at finding the triangle that has emerged from a field of arrows in Panel B (as fast as telling black from white) than at finding the negative diagonal in Panel A, even though the Ls add no discriminative information, only homogeneous “noise” with the potential to impair perception through masking and crowding. Panels D and E show a similar configural superiority effect involving line curvature rather than orientation. This configural superiority effect shows better processing of wholes – Gestalts – than of their parts, and we show below how it may arise from the EFs of closure, terminator count, and intersection type.

EFs and configural superiority pose challenges for the standard two-stage model of perception. If the integration of basic features is slow and requires attention, why are Gestalts so salient and so quickly perceived if they too require feature integration? How can EFs be more basic than the more elementary features from which they arise? First we review the evidence that Gestalts are in fact highly salient, and then we consider how their existence can be reconciled with perceptual theory.

Figure 1. Configural Superiority and Inferiority Effects. Panel A: Base odd quadrant display of diagonals; B: Composite display with L-shaped context elements added, with arrows and triangles emergent to create configural superiority; C: Composite display with slightly different Ls added, yielding forms lacking emergent features and producing configural inferiority; D: Base display of parentheses; E: Composite display with a left parenthesis added to create emergent features and configural superiority; F: Composite display with rotated parentheses yielding forms lacking emergent feature differences and producing configural inferiority.

1.4.
Emergent Features are not just perceptual anchors

Because EFs necessarily entail relationships among parts, could configural superiority simply reflect our superiority at relative judgments over absolute judgments? For example, we can better judge whether one line is longer than another than identify the length of either, and we can better tell whether two tones match in pitch than identify either as a middle C. This explanation cannot work, however, because for every configural superiority effect, there are far more configural inferiority effects. Panel C of Figure 1 shows configural inferiority when the L-shaped context is shifted relative to the diagonal to eliminate EF differences. This demonstrates that making a judgment easier merely by providing a comparison, contextual stimulus cannot explain configural superiority; instead the context must mix with the target to create highly specific EFs for this effect to arise. Panel F provides another illustration of inferiority with curves.

1.5. Not all relational properties qualify as emergent

EFs abound in perception: from a few squiggles on paper, a face emerges; from three Pac-man figures, a Kanizsa triangle emerges (Kanizsa 1979). Are there constraints on what can and cannot be regarded as an EF? Certainly there are. One might claim that any arbitrary relationship may constitute an EF; e.g., the ratio of the diameter of the left eye to the length of the right foot. To establish this unlikely whole as emerging from those two parts, one must find empirical confirmation through a configural superiority effect or other converging operation. Below we consider several possibilities, ranging from whether “wordness” emerges as a salient feature from sequences of letters to whether topological properties arising from arrangements of geometrical forms are similarly salient. When the Dalmatian dog first pops out of the famous R. C.
James photograph, it is certainly a surprise for the perceiver, meeting that criterion for a Gestalt. But should we claim that any and all acts of recognition constitute emergence, or are some of them the result of more conventional (albeit complex) processes of recognition through parts, as with Feature Integration Theory? As we shall see, there are as yet only a few hypothesized EFs that have passed the initial tests to be outlined here, so it seems likely that conventional feature integration may be the norm.

2. Candidate EFs in human vision

2.1. The classic Gestalt “laws”

If the human visual system perceives only certain special relationships as Gestalts – if wholes emerge from only certain configurations of parts – what are the top EF candidates we should consider? The Gestaltists themselves generated hundreds of “laws” (principles) of grouping, although some of these are vague, others may be merely confounded with other, genuine grouping principles, and yet others may simply be minor variants of each other. According to our view, each of the remaining laws could potentially be linked to a testable EF.

Figure 2 shows a classic example of a configuration typically seen as a curvy X: two lines that intersect to form a cross. The same configuration could be seen instead as two curvy, sideways Vs whose vertices are coincident (“kissing fish”), but this is rarely perceived, arguably because of the law of good continuation: perception favors alternatives that allow contours to continue with minimal changes in direction.

Figure 2. Ambiguous figure: crossing lines or kissing fish?

As Figure 2 illustrates, candidates for EFs often are tied to non-accidental properties (Biederman 1987; Rock 1983), i.e., image properties that are unlikely to arise from mere accidents of viewpoint. Exceptions to this rule will be noted below.
For the curvy Vs interpretation to be correct, not only would the two vertices have to be superimposed perfectly from the given viewing angle, but both pairs of line segments making up the Vs would have to be oriented perfectly to continue smoothly into one another. This interpretation is exceptionally unlikely and so perception rejects it as highly improbable.

Below we identify a number of plausible EFs in vision underlying the classic Gestalt laws. Historically, support for these EFs, in the form of grouping laws, came largely from phenomenology. In the subsequent section we consider rigorous methodologies that go beyond simple phenomenology to confirm the psychological reality of certain of these potential EFs. The resulting advantage over time-honored Gestalt grouping principles would be a systematic approach to those principles, not only introducing a single method for confirming their existence but perhaps a uniform scale on which they can be measured.

2.2. Possible EFs in human vision

Figure 3 illustrates 17 potential EFs in vision, properties that emerge from parts that meet at least the test of phenomenology. We start in Panel A with potential EFs that emerge from the simplest possible stimuli: dot patterns.

Figure 3. Potential basic EFs in human vision created from simple configurations of dots (Panel A) or line segments (B). Panel C depicts five other EFs arising from elements more complex than dots or lines. The pair of figures on the left of each row shows a Base discrimination with dots or lines differing in location and/or orientation. The middle pair shows two identical Context elements, one of which is added to each base to form the Composite pairs on the right that contain potential EFs. In actual experiments, these stimulus pairs were placed into odd-quadrant displays with one copy of one of the two base stimuli and three copies of the other.
Note that many of the rows contain additional EFs besides the primary one labeled at the far right.

2.2.1. Proximity

If the field of vision contains just a point or dot, as in Panel A’s Base displays, that dot’s only functional feature is its location (x, y coordinates in the plane). If a second dot is added from the Context displays to create the Composite display, we have its position too, but new to emerge is the distance or proximity between the two. (This is separate from Gestalt grouping by proximity, which we address below.) Note that proximity is affected by viewpoint and thus is a metric rather than a non-accidental property.

2.2.2. Orientation

In this two-dot stimulus, a second candidate EF is the angle or orientation between the two dots. Orientation too is an accidental property in that the angle between two locations changes with perspective and with head tilt.

2.2.3. Linearity

Stepping up to three-dot configurations, all three dots may fall on a straight line, or they may form a triangle (by contrast, two dots always fall on a straight line). Linearity, as with all the potential EFs listed below, is a non-accidental property in that if three points fall on a straight line in the distal stimulus, they will remain linear from any viewpoint.

2.2.4. Symmetry (axial)

Three dots may be arranged symmetrically or asymmetrically about an axis (by contrast, two dots are necessarily symmetric). More will be said about other forms of symmetry in a subsequent section.

2.2.5. Surroundedness

With four-dot configurations, one of the dots may fall inside the convex hull (shell) defined by the other three, or it may fall outside (consider snapping a rubber band around the four dots and seeing whether any dot falls within the band’s boundary).

We now consider the EFs in Panel B, which require parts that are more complex than dots to emerge. Here we use line segments as primitive parts.

2.2.6.
Parallelism

Two line segments may be parallel or not, but a minimum of two segments is required for parallelism to appear.

2.2.7. Collinearity

Again, two line segments are the minimal requirement. Items that are not fully collinear may be relatable (Kellman and Shipley 1991), or at least show good continuation, which are weaker versions of the same EF.

2.2.8. Connectivity

Two line segments either do or do not touch.

2.2.9. Intersection

Two line segments either intersect or do not. Two lines can touch without intersecting if they are collinear and so form a single, longer line segment.

2.2.10. Lateral endpoint offset

If two line segments are parallel, their terminators (endpoints) may lie perpendicular to each other such that connecting them either would or would not form right angles with the lines (if not, they may look like shuffling skis).

2.2.11. Terminator count

This is not an emergent feature in the same sense as the others, but when two line segments configure, their total terminator count is not necessarily four; if the two lines form a T, it drops to three. This would illustrate an eliminative feature (Kubovy and Van Valkenburg 2002), where the whole is less than the sum of its parts in some way.

2.2.12. Pixel count

This too is not a standard EF candidate, but the total pixel count (or luminous flux or surface area) for a configuration of two lines is sometimes less than the sum of all the component lines’ pixel counts; if the lines intersect or if they superimpose on each other, the pixel count will fall, sometimes sharply.

Finally, Figure 3 Panel C depicts five other EFs arising from elements more complex than dots or lines. These EFs can be compelling phenomenally even though their key physical properties and how they might be detected are less well understood.

2.2.13. Topological properties

When parts are placed in close proximity, novel topological properties may emerge, and these are often salient to humans and other organisms.
Three line segments can be arranged into a triangle, adding the new property of a hole, a fundamental topological property (Chen 2005) that remains invariant over so-called rubber sheet transformations. If a dot is added to this triangle, it will fall either inside or outside that triangle; this inside-outside relationship is another topological property.

2.2.14. Depth

Depth differences often appear as EFs from combinations of elements that are themselves seen as flat. Enns (1990) demonstrated that a flat Y shape inscribed inside a flat hexagon yields the perception of a cube. Binocular disparity, as with random dot stereograms, is another classic example of emergence (Julesz 1971). Ramachandran (1988) presented a noteworthy demonstration of depth emerging from the combination of shading gradients and the shape of apertures.

2.2.15. Motion and flicker

Wertheimer’s (1912) initial demonstrations may rank motion as the quintessential EF, arising as it does from static elements arranged properly in time and space. When noninformative (homogeneous) context elements are delayed in time from a base display such that motion is seen in the transition to the composite, huge configural superiority effects (CSEs) result using the same method otherwise as described above. Flicker behaves similarly and, like motion, is so salient that both are standard methods for attracting attention in visual displays. Higher-order motion phenomena too suggest further EFs, as with Duncker’s (1929) demonstration of altered perceived trajectories when lights are attached to the hub and wheel of a moving bicycle.

2.2.16. Faces

A skilled artist can draw just a few lines that viewers will group into a face. We see the same, less gracefully, in emoticons and smiley faces. Does ‘faceness’ constitute its own EF, or is it better regarded as only a concatenation of simpler, lower-level grouping factors at work, including closure, symmetry, proximity, etc.? This question encounters methodological challenges that will be considered below.
2.2.17. Subjective (Kanizsa) figures

With the arrangement of three suitably placed Pac-man figures, a subjective triangle emerges that is convincing enough that viewers believe it is physically present (Kanizsa 1979; Kogo and van Ee, in press). Certainly this demonstration passes the phenomenological test for EFs. Remaining to be resolved is whether the subjective triangle is a unique EF in its own right or whether it results merely from conventional (non-Gestalt) integration of more primitive EFs; e.g., subjective lines could emerge from the collinear contours of the Pac-man figures, but the appearance of a whole triangle from three such emergent lines might not be a proper Gestalt.

2.3. Similarity and proximity as special EFs

Two well-known Gestalt principles, grouping by similarity and by proximity, merit further discussion. Similarity is excluded from this chapter because it often refers to a psychological concept of how confusable or equivalent two stimuli appear to be rather than to the physical concept of objective feature overlap or equivalence. The existence of metamers and of multistable stimuli forms a double dissociation between perceptual and physical similarity that may help clarify this distinction. Also, the term similarity can be overly broad; proximity, for example, could be seen as similarity of position; parallelism or collinearity could be viewed as similarity of orientation, etc. The limiting case of similarity is physical identity. It’s true that the same-different distinction is highly salient in vision, but it can be regarded as a form of symmetry, viz. translational symmetry (see below on symmetry).

Above we present proximity as the first on our list of potential EFs in vision, and below we present evidence confirming this possibility.
We believe proximity may be a qualitatively different property from the others in the sense that it appears to work in conjunction with, or to modulate the effects of, other principles listed above (like parallelism and symmetry) rather than being a grouping principle in its own right. For example, collinearity will be salient between two lines if they are proximal, and thus they will group; but not if they are separated further. Proximity alone doesn’t force grouping: attaching a door key to a coffee cup does not make them group into a single object despite the zero distance separating them. Unrelated objects piled together may form a heap, but they usually will create no emergence or Gestalt.

2.4. A note on symmetry

Symmetry has been a pervasive property underlying Gestalt thinking from its inception (van der Helm in press A, this volume). From its links with Prägnanz and the minimum principle (van der Helm in press B, this volume) to its deep involvement with aesthetics, symmetry appears to be more than just another potential EF in human perception. And well it might be, given the broad meaning of symmetry in its formal sense in the physical and mathematical sciences. In the present chapter, we focus on axial (mirror image) symmetry, but rotational and translational symmetry may be considered as well. Formally, symmetry refers to properties that remain invariant under transformation, and so its preeminence in Gestalt theory may come as no surprise. We could expand our list of potential EFs to include the same versus different distinction as a form of translational symmetry. We have only begun to explore the full status of symmetry, so defined, using the approaches described here.

3. Establishing and quantifying Emergent Features via configural superiority

With this long list of potential EFs in vision, how can we best determine which of them have psychological reality for human perceivers?
How can we tell that a Gestalt has emerged from parts, as opposed to a structure perceived through conventional, attention-demanding feature integration? A start would be finding wholes that are perceived more quickly than their parts. If people perceive triangles or arrows before perceiving any of their component parts (e.g., three line segments or their vertices), that suggests the whole shapes are Gestalts; otherwise it would be more prudent to claim that triangles and arrows are assembled following the detection and integration of their parts in a conventional feedforward manner.

3.1. Configural superiority, the odd quadrant task, and the superposition method

We start with the odd quadrant paradigm: Subjects are presented with displays like those shown in Figure 1 to measure how quickly and accurately they can locate the odd quadrant. (Although we typically use four-quadrant stimuli for convenience, there is nothing special about having four stimuli or about arranging them into a square. In some experiments we use three in a straight line or eight in a circle.) No recognition, identification, description, or naming is required. As noted, people are much faster and more accurate at finding the arrow in a field of triangles in Panel B than at finding the negative diagonal in a field of positive diagonals in Panel A. The diagonal’s orientation is the only element differentiating the arrow from the triangle, so it follows that “arrowness vs. triangularity” must not be perceived following perception of the diagonals’ orientations. Instead, this whole apparently registers before the parts, thus displaying configural superiority.

The simplicity of this superposition method – overlaying a context upon a base discrimination – and its applicability to almost any stimuli are what make it attractive. Returning to Figure 3, we see several base and composite stimuli that have been tested using the odd quadrant task.
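
The odd quadrant task and the superposition method can be sketched in a few lines. The sketch below is only an illustration under simple assumptions: stimuli are represented as text labels rather than images, the function names are hypothetical, and the response times in the usage example are invented rather than measured.

```python
import random


def make_base_display(standard, odd, n_positions=4, rng=None):
    """Place one odd stimulus among copies of the standard; return (display, odd_index)."""
    if rng is None:
        rng = random.Random()
    odd_index = rng.randrange(n_positions)
    display = [(odd,) if i == odd_index else (standard,) for i in range(n_positions)]
    return display, odd_index


def superimpose(display, context):
    """Add the same context element to every position, leaving the base parts unchanged."""
    return [parts + (context,) for parts in display]


def configural_effect_ms(base_rt_ms, composite_rt_ms):
    """Positive: composite faster (superiority). Negative: composite slower (inferiority)."""
    return base_rt_ms - composite_rt_ms


# Base display in the style of Figure 1, Panel A: three positive diagonals, one negative.
base, odd_index = make_base_display("/", "\\", rng=random.Random(1))
# Composite display: the identical L-shaped context is added to all four positions.
composite = superimpose(base, "L")

# Hypothetical mean response times (ms), for illustration only.
effect = configural_effect_ms(base_rt_ms=1900.0, composite_rt_ms=750.0)
print(odd_index, composite, effect)
```

Because the context is identical in every position, it adds no discriminative information; a positive effect therefore has to come from how the context combines with the base parts, not from the context alone.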
The discriminative information in each base is the same as in its matching composite displays: We start with a fixed Base odd quadrant display and place one of the two base stimuli into one quadrant and the other into the remaining three quadrants. We then create the Composite display by superimposing an identical context element in each of the four quadrants of the Base. Any context can be tested. In the absence of EFs, the context should act as noise and make performance worse in the composite. The logic behind this superposition method follows from the eponymous superposition principle common to physics, engineering, and systems theory.

Again, the composite is far superior to the base with the arrow and triangle displays in Figure 1, indicating a configural superiority effect (CSE). But it remains unclear which EF is responsible for this CSE – it could involve any combination of closure, terminator count, or intersection type because arrows differ from triangles in all three whereas positive diagonals differ from negatives on none of them. As Panel C shows, shifting the position of the superimposed Ls eliminates all three potential EFs and eliminates the CSE as well.

Panels D and E show another CSE using base stimuli varying in direction of curvature rather than in orientation. Here again, discriminating pairs of curves such as (( and () is easier than discriminating single curves, a result that could be due to any combination of parallelism, symmetry, or implied closure, all of which emerge in the composite panel. Panel F shows that rotating the context curve eliminates both the EF differences and the CSE, indicating that it is not just any inter-curve relationship from which a CSE arises but rather only special ones giving rise to EFs.

3.2.
Confirmation of proximity, orientation, and linearity as EFs

Figure 3 shows a large number of base and composite stimuli, each of which suggests some potential EF or EF combination that has been evaluated using this criterion of CSEs (Pomerantz and Portillo 2011). A future goal will be disentangling these CSEs to show what EFs appear with the simplest stimuli. For now, with the dots in Panel A, observers are faster to find the quadrant containing dot pairs differing in proximity than to find the single dot oddly placed in its quadrant, even though that odd placement is solely responsible for the proximity difference. Stated differently, viewers can tell the distance between the dots better than the positions of the individual dots, implying that proximity is computed before, not after, determination of the dots’ individual positions. This in turn indicates that proximity is an EF in its own right, a Gestalt of the most elementary sort, emerging as it does from just two dots.

The next row in Panel A shows that viewers can similarly tell the orientation or angular difference between two dots better than the position of either dot. Again, this indicates that orientation is not derived from those positions but is registered directly as an EF. Subsequent panels of three-dot patterns similarly show CSEs where the EFs at work appear to be symmetry and linearity.

The sets in Figure 3 Panel B show CSEs for selected EF candidates from two-line stimuli (Stupina [Cragin] 2010), which allow for additional EF candidates beyond those possible with just dots. The number of configurations possible from two line segments varying in position and orientation is huge, but Cragin sampled that stimulus space using the odd quadrant paradigm. Her results confirmed several candidate EFs working in combination: parallelism, collinearity, connectivity, and others shown in Figure 3 Panel B.
For example, people are faster to discriminate parallel line pairs from nonparallel pairs than they are to discriminate a single line of one orientation from lines of another orientation, even though that orientation difference is all that makes the parallel pair differ from the non-parallel pair. Stated differently, people apparently know whether two lines are parallel before they know the orientation of either. This again is a CSE, and it confirms parallelism as an EF.

Although these results confirm EFs arising with two-line stimuli, they do not provide independent confirmation for each individual EF because EFs often co-occur, making it hard to isolate and test them individually. Just as the arrow-triangle (three-line) example showed a confounded co-occurrence of closure, terminator count, and intersection type, it can be challenging to separate individual EFs even with two-line stimuli. For example, it is difficult to isolate the feature of intersection without engaging the feature of connectivity, because lines must be connected to intersect (albeit not vice versa). Stupina ([Cragin] 2010) has shown that our ability to discriminate two-line configurations in the odd quadrant task can be predicted well from their aggregate EF differences. As noted below, however, further work is needed to find independent confirmation of some of these EF candidates. For now, it is clear there are multiple, potent EFs lurking within these stimuli.

Panel C of Figure 3 shows additional EFs involving a number of topological features (which often yield very large CSEs), depth cues (Enns 1990), Kanizsa figures, and faces. Yet more cannot be displayed readily in print because they involve stereoscopic depth, motion, or flicker. To date, no experiments using the measurements described above have found clear EFs appearing in cartoon faces or in words, but future work may change that, since such stimuli seem to have Gestalt properties.

3.3. Converging Operations from Garner and Stroop Interference

If configural superiority as measured by the odd quadrant task is a good method for detecting EFs, it is still only a single method. Converging operations (Garner, Hake, and Eriksen 1956) may help separate EFs from the particular method used to detect them. One such converging measure is selective attention as measured by Garner Interference (GI), the interference observed in speeded classification tasks from variation on a stimulus dimension not relevant to the subject’s task. When subjects discriminate an arrow from a triangle differing from it only in the orientation of its diagonal, they are slower and less accurate if the position of the superimposed L context also varies, even though logically that variation is irrelevant to their task. This interference from irrelevant variation is called GI, and it indicates subjects are attending to the L even though it is not required. This in turn suggests the diagonals and Ls are grouping into whole arrows and triangles, and that it is those wholes, or the EFs they contain, that capture Ss’ attention.

Similarly, if subjects discriminate rapidly between (( and (), logically they need attend only to the right-hand member of each pair. But if the left-hand member varies from trial to trial, such that they should make one response to either (( or )( and another response to () or )), they become much slower and more error-prone than when the left element remains fixed. This indicates again that Ss are attending to both members of the pair, suggesting the two curves grouped into a single stimulus and Ss were attending to the whole or EF. If the irrelevant parenthesis is rotated 90 degrees so that no identifiable EFs arise, GI disappears. Cragin et al. (2012) examined various configurations formed from line segments and found broad agreement between the CSE and GI measures of grouping, with the latter also being well predicted by the number of EFs distinguishing the stimuli to be discriminated. These results agree with the CSE data and so converge on the idea that both CSE and GI reveal the existence of EFs.

If GI converges well with CSEs, will Stroop Interference (SI) converge as well? Unlike GI, which taps interference from variation between trials on an irrelevant dimension, SI taps interference from the content on an irrelevant dimension on any one trial. In classifying pairs of curves such as (( or () from )( or )), will subjects be faster on the pairs (( and )) because their two curved elements are congruent, but slower on pairs () and )( where the curves are incongruent, curving in opposite directions? That too might indicate that the curves had grouped and that either both were processed or neither was. In general, however, little or no SI arises with these stimuli or with most other stimuli that are known to yield GI (see Pomerantz, Carson, and Feldman 1994 for dozens of examples).² Why might this contradiction exist between GI and SI, two standard methods for assessing selective attention?

In brief, GI occurs for the reason given above: the two elements group, and Ss attend to the EFs arising between the elements, EFs that necessarily span the irrelevant parts. However, with SI, the same grouping of the elements precludes interference: for any two elements to conflict or be congruent, there must of course be two elements. If the two elements group into one unit, there are no longer two elements and thus no longer an opportunity for the two to be congruent or incongruent. Perceivers are looking at EFs, not elements. There is an alternative explanation for the lack of SI when parts group.
The two elements in the stimulus (( may seem congruent in that they both curve to the left; but when considered as a whole, the left element is convex and the right is concave. Thus the two agree in direction of curvature but disagree in convexity. The conclusion: when Gestalts form, the nature of the coding may change radically, and a measure like SI that presumes separate coding of elements is no longer appropriate. In sum, GI provides a strong converging operation for confirming EFs, but SI does not.

3.4. Converging operations from redundancy gains and losses

Stimuli can often be discriminated from one another more quickly if they differ redundantly in two or more dimensions. Thus red versus green traffic lights are made more discriminable by making them different in their position as well as color; coins are made more discriminable by differing in diameter, color, thickness, etc. When two configurations are made to differ in multiple parts rather than just one, do they too become more discriminable? Not necessarily; sometimes the opposite happens. Consider a square in Figure 4 whose width is increased significantly to create a rectangle. If that rectangle is increased in height, this may not create even greater discriminability from the original because the shape goes back to being a square, albeit a larger one. Or consider the triangle in the lower part that is made into an arrow by changing the orientation of its diagonal. If that arrow is then changed by moving its vertical from the left to the right side of the figure, will the result be even more different from the original triangle? No, we will have returned to another triangle, which – while different in orientation from the original triangle – is harder to discriminate from the original than was the arrow.
The conclusion is that just as the arrow and triangle stimuli show CSEs and GI, they also show “redundancy losses,” a third converging operation that taps into EFs: by changing the diagonal and then the vertical of a triangle, the EFs end up unchanged.

² Exceptions to this generalization may occur when EFs happen to be correlated with congruent vs. incongruent pairs. For example, with the four-stimulus set “( (, ( ), ) (, ) )” congruent stimuli such as (( contain the EF of parallelism but lack symmetry about the vertical axis, whereas incongruent stimuli like ( ) contain symmetry but lack parallelism. This set yields Garner but no Stroop. With the stimulus set “ | | , | / , / / , / | ” however, congruent stimuli such as | | contain symmetry and parallelism whereas incongruent stimuli such as | / lack both. This set yields both Garner and Stroop. The key factor determining whether Stroop arises is the mapping of salient EFs onto responses; configurations by themselves yield no Stroop.

Figure 4. Two progressions in which an original form A is modified in one way to create a different form B, but a second modification results in a form C that is more similar to the original than is B.

3.5. Theory of basic Gestalts, EF hierarchies, and the Ground-Up Constant Signal Method

Disentangling multiple potential EFs remains a challenge because it is difficult or impossible to alter any aspect of a form without inadvertently altering others; for example, altering the perimeter of a form generally alters its area. As a result, we face the challenge of confounded potential EFs. The Theory of Basic Gestalts (Pomerantz and Portillo 2011) addresses this challenge by combining the Ground-Up Method for constructing configurations from the simplest possible elements (Figure 6) with a Constant Signal Method that minimizes these confounds by adding context elements incrementally to a fixed base discrimination. This allows EFs to reveal their presence through new CSEs in the composites.
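The constant-signal logic can be illustrated with a small sketch. It is not the authors' stimulus-generation code: the integer grid coordinates, the class `OddQuadrantDisplay`, and all helper names are assumptions made for illustration. The sketch builds a one-dot base display, adds the same context dot to all four quadrants twice, and checks that the dots distinguishing the odd quadrant from the others never change.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Point = Tuple[int, int]  # (x, y) on a hypothetical grid inside one quadrant


@dataclass(frozen=True)
class OddQuadrantDisplay:
    """Four quadrants of dots; the quadrant at odd_index differs from the rest."""
    quadrants: Tuple[Tuple[Point, ...], ...]
    odd_index: int


def base_display(odd_dot: Point, common_dot: Point, odd_index: int = 0) -> OddQuadrantDisplay:
    """One dot per quadrant: odd_dot in the odd quadrant, common_dot elsewhere."""
    quadrants = tuple(
        (odd_dot,) if i == odd_index else (common_dot,) for i in range(4)
    )
    return OddQuadrantDisplay(quadrants, odd_index)


def add_context(display: OddQuadrantDisplay, context_dot: Point) -> OddQuadrantDisplay:
    """Superimpose the same context dot on every quadrant."""
    quadrants = tuple(q + (context_dot,) for q in display.quadrants)
    return OddQuadrantDisplay(quadrants, display.odd_index)


def signal(display: OddQuadrantDisplay) -> FrozenSet[Point]:
    """Dots present in the odd quadrant or in a standard quadrant, but not both."""
    odd = set(display.quadrants[display.odd_index])
    standard = set(display.quadrants[(display.odd_index + 1) % 4])
    return frozenset(odd ^ standard)


def is_collinear(dots: Tuple[Point, ...]) -> bool:
    """True if three dots lie on one line (zero cross product)."""
    (ax, ay), (bx, by), (cx, cy) = dots
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax) == 0


# Assumed layout: odd dot at lower left, standard dot at lower right,
# first context dot at upper right, second context dot at the center.
panel_a = base_display(odd_dot=(1, 1), common_dot=(3, 1))
panel_b = add_context(panel_a, (3, 3))
panel_c = add_context(panel_b, (2, 2))

# The signal is the same pair of dot positions in all three panels.
print(signal(panel_a) == signal(panel_b) == signal(panel_c))
# Only the odd quadrant of the three-dot panel forms a straight line.
print([is_collinear(q) for q in panel_c.quadrants])
```

Because every context dot is identical across quadrants, any improvement in finding the odd quadrant cannot come from a change in the signal itself; under these assumptions it has to come from relations among the dots.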
Figure 5. Ground-Up Constant Signal Method for revealing hierarchies of EFs. Top row shows how novel features emerge as additional dots are added to a stimulus, while the bottom row shows the same for line segments. Adapted with changes from Pomerantz & Portillo 2011.

Figure 6 Panel A shows a baseline odd quadrant display containing one dot per quadrant, with one quadrant’s dot placed differently than in the other three quadrants. In Panel B, a single, identically located dot is added to each quadrant, which nonetheless makes locating the odd quadrant much faster. This is a CSE demonstrating the EF of proximity (Pomerantz and Portillo 2011). In Panel C, another identically located dot is added again to make a total of three per quadrant, and again we see a CSE in yet faster performance in Panel C than in the baseline Panel A. This second CSE could be taken as confirmation of the EF of linearity, in that it is so easy to find the linear triplet of dots in a field of nonlinear (triangular) configurations. But first we must rule out the possibility that the CSE in Panel C relative to Panel A is merely the result of the already-demonstrated EF of proximity in Panel B. Dot triplets do indeed contain the potential EF of linearity vs. triangularity but they also contain EFs of proximity and/or orientation arising from their component dot pairs, so the task is to tease these apart.

Figure 6. Building EFs with the Ground-Up Constant Signal method. Panel A shows the base signal, with the upper left quadrant having its dot at the lower left, versus the lower right in the other three quadrants. Panel B adds a first, identical context dot to each quadrant in the upper right, yielding a composite containing an EF of the orientation between the two dots now in each quadrant, a diagonal versus vertical angle. Panel C adds an identical, third context dot to each quadrant, near to the center, yielding a composite containing an EF of linearity versus nonlinearity/triangularity. Speed and accuracy of detecting the odd quadrant improve significantly from Panel A to B to C, although the signal being discriminated remains the same.

The first key to dissociating these two is that the stimulus difference between the odd quadrant and the remaining three is identical in Panels A, B, and C of Figure 6. This is the unique contribution of the Ground-Up Constant Signal Method: the signal that Ss must detect remains the same as new context elements are added. The second key is that Panel C shows a CSE not only with respect to Panel A but also with respect to Panel B. This indicates that the third dot does indeed create a new EF over and above the EF that already had emerged in Panel B. That in turn supports linearity’s being an EF in its own right, over and above proximity. It shows how EFs may exist in a hierarchy, with higher-order EFs like linearity arising in stimuli that contain more elements.

Pomerantz and Portillo (2011) used this Ground-Up Constant Signal method to demonstrate that linearity is its own EF with dot triplets whether the underlying dot-pair signal contained a proximity difference or an orientation difference. They also showed that the EF of proximity is essentially identical in salience to the EF of orientation in that the two show comparably sized CSEs compared with the same base stimulus with just one dot per quadrant. Over the past 100 years, it has been difficult to compare the strengths of different Gestalt principles of grouping because of “apples vs. oranges” comparisons, but because the Ground-Up Constant Signal Method measures the two on a common scale, their magnitudes may be compared directly and fairly. To date this method has confirmed that the three most basic or elemental EFs in human vision are proximity, orientation, and linearity.
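The two-step comparison described above can be written out as simple arithmetic. In this sketch the response times are invented placeholders, not data from Pomerantz and Portillo (2011), and the function name `cse_ms` is an assumption; a real analysis would also need accuracy data and inferential statistics.

```python
from statistics import mean


def cse_ms(base_rts, composite_rts):
    """CSE in ms: positive when the composite is answered faster than the base."""
    return mean(base_rts) - mean(composite_rts)


# Invented response times (ms) for one-, two-, and three-dot panels.
rts_ms = {
    "A": [1900, 2000, 2100],  # base: one dot per quadrant
    "B": [1400, 1500, 1600],  # base plus one context dot
    "C": [1100, 1200, 1300],  # base plus two context dots
}

cse_b_over_a = cse_ms(rts_ms["A"], rts_ms["B"])  # first EF (dot pairs)
cse_c_over_a = cse_ms(rts_ms["A"], rts_ms["C"])  # could still reflect the first EF alone
cse_c_over_b = cse_ms(rts_ms["B"], rts_ms["C"])  # gain beyond the first EF

# A new higher-order EF is supported only if C beats B, not merely A.
third_dot_adds_new_ef = cse_c_over_b > 0
print(cse_b_over_a, cse_c_over_a, cse_c_over_b, third_dot_adds_new_ef)
```

The comparison of Panel C with Panel B is the critical one: a gain of C over A alone would leave open the possibility that the two-dot EF is doing all the work.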
They are most basic in the sense that they emerge from the simplest possible stimuli and that they do not appear to be reducible to anything more elemental (i.e., the CSE for linearity occurs over and above the CSEs for the proximity or orientation EFs it necessarily contains). Axial symmetry has yielded mixed results; further tests will be needed to determine whether it is or is not a confirmed EF. The results for surroundedness have been somewhat less ambiguous: it does not appear to be an EF, although the evidence is not totally conclusive (Portillo 2009). Work is ongoing to test additional potential EFs using the same Ground-Up Constant Signal Method to ensure fair comparisons and to isolate the unique contribution made by each EF individually, given that they often co-occur. As a lead-up to that, Stupina ([Cragin] 2010) has explored several regions of two-line stimulus space using this method, and she has found up to 8 EFs there.

3.6. Strengths and limitations of the method

The primary strengths of the Ground-Up Constant Signal Method are allowing an objective measurement of EF (grouping) strength; ensuring this strength can be compared fairly across different EFs on the same scale of measurement; and ensuring that the EFs it detects cannot be reduced to more elementary EFs. The method has limitations, however. It is almost certainly an overly conservative method that is more likely to miss genuine EFs than to issue false positives. This is because as context elements are added to the base signal discrimination – added dots or line segments – deleterious consequences will accumulate, thus making it harder for a CSE to appear. Besides allowing EFs to arise, the superimposed context elements could mask or crowd the targets (Levi 2008), making performance worse. Moreover, because the added context elements are always identical, they should dilute the dissimilarity of the target to the distracters (Tversky 1977).
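The dilution point can be made concrete with the ratio model of similarity from Tversky (1977), S(a, b) = f(A∩B) / (f(A∩B) + αf(A−B) + βf(B−A)). The sketch below is only illustrative: it assumes each dot counts as one feature, takes f to be a simple count, and sets α = β = 1. None of these choices comes from the chapter, and the dot labels are hypothetical.

```python
def tversky_ratio(a, b, alpha=1.0, beta=1.0):
    """Ratio-model similarity of two feature sets, with f taken as set size."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0


# Hypothetical feature sets: each dot position is treated as one feature.
target = {"dot_lower_left"}
distracter = {"dot_lower_right"}

similarities = []
for context in ([], ["ctx_upper_right"], ["ctx_upper_right", "ctx_center"]):
    # The same context dots are added to target and distracter alike.
    t = frozenset(target | set(context))
    d = frozenset(distracter | set(context))
    similarities.append(tversky_ratio(t, d))

# Similarity rises as shared context is added, so dissimilarity is diluted.
print(similarities)
```

On this element-wise account, each identical context dot makes the odd quadrant more similar to the others, so any CSE must arise in spite of that rising similarity.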
Adding context elements also increases the chances that perceivers will attend to the irrelevant and non-informative contexts rather than to the target signal, and it increases the overall informational load – the total stimulus ensemble – that must be processed. When CSEs are detected, they occur in spite of these five factors, not because of them. And with the Ground-Up Constant Signal Method where new context elements are piled on top of old, it becomes less and less likely that any benefit from new EFs would suffice to overcome the resulting mountain of negatives. For this reason, efforts are underway to measure the adverse effects of these five factors separately and to correct our CSE measurements for them. If this effort succeeds, more CSEs – and thus EFs – may become apparent.

4. Other types of Emergent Features

This review has focused on EFs underlying classic Gestalt demonstrations that have received wide attention over the last 100 years since their introduction. All of them so far have been in the visual domain, but EFs likely abound in other modalities. There are other likely EFs in vision too that are not normally associated with Gestalt phenomena but might as well be.

4.1. Color as a Gestalt.

Color is usually treated as a property of the stimulus and in fact makes the list of “basic features” underlying human vision (Wolfe and Horowitz 2004). However, color is not a physical feature but rather a psychological one; wavelength is the corresponding physical feature, and color originates “in the head,” from interactions of units that are sensitive to wavelength. Color certainly meets the criterion of a non-linear, surprising property emerging when wavelengths are mixed: combining wavelengths seen as red and green on a computer monitor to yield yellow is surely an unexpected outcome (Pomerantz 2006)!
What is more, even color fails to qualify as a basic feature in human vision, because it is color contrast to which we are most sensitive; colors in a Ganzfeld fade altogether. Moving (non-stabilized) edges providing contrast are required for us to see color.

4.2. EFs in other sensory modalities.

Potential EFs arise in modalities other than vision, possibly in all modalities. In audition, when two tones of similar but not identical frequency are sounded together, one hears beats or difference tones, which are so salient that musicians use them to tune their instruments. With other frequency relationships, one may experience chords if the notes are separated harmonically; lowering one of the three tones in a triad of a major chord by a semitone can convert it into a minor chord that, phenomenally, leads to a vastly different percept. Whether this major-minor distinction qualifies as an EF by the CSE criterion advanced here remains to be determined; that would require the major-minor difference to be more salient than the frequency difference separating the two tones that make a chord sound major versus minor. Other potential EFs with simple tone combinations might involve dissonance and the octave relationship.

Gestalt grouping arises in the haptic senses, as has been recently demonstrated (Overvliet, Krampe & Wagemans 2012), suggesting that EFs may be found in that modality. Potential EFs may abound in the chemical senses as well; after all, a chef’s final creation is clearly different from the mere sum of its ingredients. Human tasters are notoriously poor at identifying the ingredients in foods, as the long-held secret of Coca-Cola’s formula attests. This suggests that what people perceive through smell and taste are relational properties that emerge when specific combinations of odorants or tastants are combined.
Future research may identify configural properties in our chemical senses that lead to superiority effects; if so, this should identify the core EFs that guide our perception of taste and odors.

4.3. Hyper-Emergent Features?

If novel features can emerge from combinations of more elementary, “basic” features, then can novel features arise from combinations of EFs too, creating something we may call hyper-emergent features? Given that our ultimate goal is to understand how we perceive complex objects and scenes, these may play an essential role there.

5. Conclusions

This chapter has aimed to define EFs, to explain how they are identified and quantified, and to enumerate those that have been confirmed to date. The Gestalt psychologists struggled to define grouping, likening it variously to a belongingness or to a glue binding parts together, and advancing ambiguous claims such as, “A strong form coheres and resists disintegration by analysis into parts or by fusion with another form” (Boring 1942). Working from the Theory of Basic Gestalts (Pomerantz and Portillo 2011), we view grouping neither as a coherence, a glue, or a belongingness, nor as a loss of independence when two items form a single perceptual unit. Instead we see grouping as the creation of novel and salient features – EFs – to which perceivers can and do preferentially attend. When we view an isolated stimulus such as a dot, we can roughly determine its x and y coordinates in space, but we are much better at determining the distance and angle between two dots than we are at determining the position of either dot. This superiority of configurations, even simple ones, is the defining feature of EFs, and we have uncovered more than a dozen that meet this criterion. The goal of future work is to explore additional EFs meeting this criterion and to ensure that these new EFs are detectable through other, converging operations such as those derived from selective attention tasks.

5.1. Unresolved issues and challenges

One current challenge to this method is that it is probably overly conservative, and so is more likely to miss a genuine EF than to falsely identify one, as noted above. Determining a correction for this is an immediate challenge.

A second challenge will be to develop neural and computational models to explain configural superiority. When perceivers view a triangle, we have a fairly clear idea how its three component line segments may be detected by the simple and complex cells discovered decades ago by Hubel and Wiesel (1962). We know less well how a feature such as closure is processed; we know neither how the closure of three lines is detected nor how that detection occurs more quickly than detection of the orientations of its three component line segments. A major advance on this problem was made recently by Kubilius et al. (2011), who showed that brain area LOC is best able to tell arrows from triangles but that V1 is best able to distinguish line orientations. But how is it that people can respond more quickly to the arrows and triangles, if those are processed in LOC, than they can respond to oriented line segments that can be processed in V1? A possible explanation is that V1 can detect but cannot compare line orientations; LOC handles the latter, but more slowly with line segments than with whole arrows and triangles.

6. References

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115-147.
Boring, E. G. (1942). Sensation and Perception in the History of Experimental Psychology. New York: Appleton-Century-Crofts.
Chen, L. (2005). The topological approach to perceptual organization. Visual Cognition, 12, 553-637.
Stupina, A. I. (now Cragin, A. I.) (2010). Perceptual Organization in Vision: Emergent Features in Two-Line Space. (Unpublished master’s thesis) Rice University, Houston, Texas, USA.
Cragin, A. I., Hahn, A. C., and Pomerantz, J. R. (May 2012). Emergent Features Predict Grouping in Search and Classification Tasks. Talk presented at the 2012 Annual Meeting of the Vision Sciences Society, Naples, FL, USA. In: Journal of Vision, 12(9), article 431. doi:10.1167/12.9.431
Duncker, K. (1929). Über induzierte Bewegung. Ein Beitrag zur Theorie optisch wahrgenommener Bewegung [On induced motion. A contribution to the theory of visually perceived motion]. Psychologische Forschung, 12, 180–259.
Garner, W. R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum.
Garner, W. R., Hake, H. W., and Eriksen, C. W. (1956). Operationism and the concept of perception. Psychological Review, 63(3), 149-156.
Enns, J. T. (1990). Three dimensional features that pop out in visual search. In D. Brogan (Ed.), Visual Search (pp. 37–45). London: Taylor and Francis.
Feldman, J. (in press). Probabilistic models of perceptual features. Chapter to appear in J. Wagemans, Oxford Handbook of Perceptual Organization. London: Oxford University Press.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. Journal of Physiology, 160, 106-154.
Julesz, B. (1981). Textons, the elements of texture perception, and their interaction. Nature, 290 (March 12, 1981), 91–97.
Kanizsa, G. (1979). Organization in Vision: Essays on Gestalt Perception. New York: Praeger Publishers.
Kellman, P. J., & Shipley, T. F. (1991). A theory of visual interpolation in object perception. Cognitive Psychology, 23, 141-221.
Kogo, N., & van Ee, R. (in press). Neural mechanisms of figure-ground organization: Border-ownership, competition and perceptual switching. Chapter to appear in J. Wagemans, Oxford Handbook of Perceptual Organization. London: Oxford University Press.
Kubilius, J., Wagemans, J., & Op de Beeck, H. P. (2011). Emergence of perceptual Gestalts in the human visual cortex: The case of the configural superiority effect. Psychological Science, 22, 1296-1303.
Kubovy, M. and Van Valkenburg, D. (2002). Auditory and visual objects. In B. J. Scholl (Ed.), Objects and Attention (pp. 97-126). Cambridge, MA: MIT Press.
Levi, D. M. (2008). Crowding--an essential bottleneck for object recognition: a mini-review. Vision Research, 48(5), 635-654.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Overvliet, K. E., Krampe, R. T., & Wagemans, J. (2012). Perceptual Grouping in Haptic Search: The Influence of Proximity, Similarity, and Good Continuation. Journal of Experimental Psychology: Human Perception and Performance, 38(4), 817-821.
Pomerantz, J. R., Carson, C. E., and Feldman, E. M. (1994). Interference effects in perceptual organization. In S. Ballesteros (Ed.), Cognitive Approaches to Human Perception (pp. 123-152). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pomerantz, J. R. (2006). Color as a Gestalt: Pop out with basic features and with conjunctions. Visual Cognition, 14, 619-628.
Pomerantz, J. R., & Kubovy, M. (1986). Theoretical approaches to perceptual organization. In K. R. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of Perception and Human Performance (pp. 36-1 – 36-46). New York: John Wiley & Sons.
Pomerantz, J. R. & Portillo, M. C. (2011). Grouping and emergent features in vision: Toward a theory of basic Gestalts. Journal of Experimental Psychology: Human Perception and Performance, 37, 1331-1349.
Pomerantz, J. R. & Portillo, M. C. (2012). Emergent Features, Gestalts, and Feature Integration Theory. In J. Wolfe & L. Robertson (Eds.), From Perception to Consciousness: Searching with Anne Treisman (pp. 187-192). New York: Oxford University Press.
Pomerantz, J. R., & Pristach, E. A. (1989). Emergent features, attention, and perceptual glue in visual form perception. Journal of Experimental Psychology: Human Perception and Performance, 15, 635-649.
Pomerantz, J. R., Sager, L. C., & Stoever, R. J. (1977). Perception of wholes and their component parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception and Performance, 3, 422-435.
Portillo, M. C. (2009). Grouping and Search Efficiency in Emergent Features and Topological Properties in Human Vision. (Unpublished doctoral dissertation) Rice University, Houston, Texas, USA.
Ramachandran, V. S. (1988). Perception of shape from shading. Nature, 331, 14, 163-166.
Rock, I. (1983). The Logic of Perception. Cambridge, MA: MIT Press.
Stephan, A. (2003). Emergence. Encyclopedia of Cognitive Science. London, UK: Nature Publishing Group/Macmillan Publishers.
Townsend, J. T., & Wenger, M. J. (2004). A theory of interactive parallel processing: New capacity measures and predictions for a response time inequality series. Psychological Review, 111, 1003–1035.
Townsend, J. T. (1971). A note on the identifiability of parallel and serial processes. Perception & Psychophysics, 10, 161-163.
Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136.
Treisman, A., & Gormican, S. (1988). Feature analysis in early vision: evidence from search asymmetries. Psychological Review, 95, 15-48.
Treisman, A. & Souther, J. (1985). Search asymmetry: a diagnostic for preattentive processing of separable features. Journal of Experimental Psychology: General, 114, 285-310.
Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327-352.
Van der Helm, P. A. (in press A). Symmetry perception. Chapter to appear in J. Wagemans, Oxford Handbook of Perceptual Organization. London: Oxford University Press.
Van der Helm, P. A. (in press B). Simplicity in perceptual organization. Chapter to appear in J. Wagemans, Oxford Handbook of Perceptual Organization. London: Oxford University Press.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R. (2012a). A century of Gestalt psychology in visual perception I: Perceptual grouping and figure-ground organization. Psychological Bulletin, 138(6), 1172-1217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., & van Leeuwen, C. (2012b). A century of Gestalt psychology in visual perception II: Conceptual and theoretical foundations. Psychological Bulletin, 138(6), 1218-1252.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung [Experimental studies on seeing motion]. Zeitschrift für Psychologie, 61, 161-265. (Translated extract reprinted as “Experimental studies on the seeing of motion”. In T. Shipley (Ed.), (1961). Classics in psychology (pp. 1032-1089). New York, NY: Philosophical Library.)
Wolfe, J. M. & Horowitz, T. S. (2004). What attributes guide the deployment of visual attention and how do they do it? Nature Reviews: Neuroscience, 5, 1-7.
