In Press, Psychological Science Common fate grouping as feature selection Brian R. Levinthal and Steven L. Franconeri Department of Psychology, Northwestern University Address correspondence to either author: b-levinthal@northwestern.edu, franconeri@northwestern.edu Northwestern University Department of Psychology 2029 Sheridan Rd Evanston, IL 60208 Phone: 847-497-1259 Fax: 847-491-7859 The visual system groups elements within the visual field that are physically separated, yet similar to each other. Despite intense study for a century, mechanisms for many of these grouping processes remain elusive. Here, we propose that a primary mechanism for grouping by common fate is attentional selection of a direction of motion. This account makes a unique prediction – that we must be limited to forming only a single group at a time. If so, then finding a particular group among other groups, or among non-groups, should be highly inefficient. We show that this is true in searches for moving dots for a vertical among horizontal groups (Experiment 1) and motion-linked groups among non-linked objects (Experiment 2). Feature selection may limit the visual system to one common fate group at a time, exposing the experience of simultaneous grouping as an illusion. Key Words: Visual Perception, Perceptual Organization, Attention, Selection The human visual system breaks the incoming image into separate elements, but also reassembles these elements in the form of groups. Grouping processes have been studied for a century, and much of this work has focused on classifying different types of groups (Wertheimer, 1923), determining when grouping occurs (Palmer, 2002), and measuring the relative strength of one form of grouping compared to another (Kubovy & van den Berg, 2008). Although neurally plausible mechanisms have been proposed for some types of grouping (e.g., contour grouping; Roelfsema, 2006), for many other types of grouping such explanations remain elusive. Here we propose a mechanism for the Gestalt principle of common fate, where objects appear grouped when they display the same pattern of motion. We argue that a primary mechanism underlying this powerful form of grouping is the selection of a direction of motion. The parsimony of this explanation rests on previous demonstrations that the selection of a direction of motion (as well as features such as color and orientation) can occur in parallel across the visual field (Andersen, Müller, & Hillyard, 2009; Maunsell & Treue, 2006; Saenz, Buracas, & Boynton, 2002). Selecting a direction of motion could produce a spatially organized map of visual field locations corresponding to the selected direction (see Figure 1), with a ‘bump’ on the map for each element that also moved in that direction (see Huang & Pashler, 2007, for a similar account). The initial motion direction could be extracted from a statistical summary of motion patterns within a display (Williams & Sekuler, 1984), or by selecting a single object and then encoding its motion direction. As the group's motion direction changes, a simple feedback loop could update the selected direction to match that of the group (see Martinez-Trujillo, Cheyne, Gaetz, Simine, & Tsotsos, 2007, for evidence of motion direction updating while tracking translating objects). This map would provide two critical components of the grouping process. First, it would help isolate processing of features and identities to the objects at those locations (Treisman & Gelade, 1980). Simultaneously selecting these locations may lead them to appear to ‘belong’ together, even though they are spatially separated (Xu, 2008). This simultaneous selection may also provide a summary representation (e.g., Alvarez & Oliva, 2009) of other features available at those locations, leading to a holistic representation of these otherwise separate objects. For example, selecting 1 In Press, Psychological Science a set of rightward moving objects could also facilitate judgments about the number of objects (Halberda, Sires, & Feigenson, 2006) or the distribution of sizes of those objects (Chong & Treisman, 2005). Second, because any area of the visual field containing that feature would result in a ‘bump’ in the spatially organized map, the complex distribution of the selected locations would convey the shape of the group (Huang & Pashler, 2007). By enhancing target activity and attenuating distractors in early visual areas (e.g., area V1; Saenz, Buracas, & Boynton, 2002), shape processing could operate as if only those target locations were present in the display. Alternatively, higher-level areas such as the inferior intraparietal sulcus (Xu, 2008) may represent these spatially organized maps. The IPS is associated with tracking of selected locations in the visual field Culham, et al., 1998), and is sensitive to grouping among those locations (Xu, 2008). Figure 1. Without selecting a direction of motion, all moving elements in a display would be processed together, and there would be no common fate grouping. As a consequence of selecting a direction of motion, target elements are enhanced, forming a common fate group. Changes in the direction of selected motion could be updated via feedback connections within motionsensitive areas of the visual system (e.g., MartinezTrujillo et al., 2007). The proposed mechanism is consistent with a powerful property of common fate grouping: a group can be constructed in a massively parallel operation where all elements across the visual field sharing a feature are selected. However, it also makes a unique prediction about a salient weakness – ‘bumps’ on a spatially organized map would be distinguishable only by their location, so we should only be able to construct one group at a time. This contrasts with our phenomenology of seeing multiple common fate groups simultaneously. Here, we report the results of two visual search experiments that demonstrate that this intuition is an illusion. There are severe limitations on the number of common fate groups that can be constructed at once. Experiment 1 In Experiment 1 we asked participants to perform a visual search for a vertical line among horizontal lines, with search objects defined by pairs of moving dots. Because searches for odd-oriented lines typically occur efficiently (e.g., Treisman & Gelade, 1980), this search task is well suited for determining whether common fate groups can be formed in parallel. A control condition verified this efficiency in our displays, by demonstrating parallel search after each common fate group was connected by a 1-pixel-thin line. Methods Participants. Ten participants, either volunteers from the Northwestern University student population (ages 18-21), or paid participants from the Evanston, IL community (ages 18 – 21), participated in this experiment. Apparatus. Stimuli were presented on a 17-inch CRT monitor with 1024x768 resolution and 85HZ refresh rate, and were generated using MATLAB and the Psych Toolbox (Brainard, 1997; Pelli, 1997). Stimuli and Procedure. Displays consisted of pairs of moving dots against a static background with 250 randomly-positioned dots (approximately .3 dots per square degree). Dot pairs moved at a constant velocity (2.65°/sec), and each pair was assigned an initial random angle of motion. Dot pairs moved in a straight-line path until reaching the boundary of an invisible 5.2° square region, where the angle of motion was reflected. This motion continued indefinitely until participants made a present/absent response via key press. Each dot maintained a constant distance (2°) from its pair throughout each trial. In target-absent trials (50%), all dot-pairs were arranged horizontally. In targetpresent trials (50%), a single pair of dots was arranged vertically. In a connected control condition, a single pixel line connected each dot in a pair. Either 2, 4 or 8 dot-pairs were present in each trial, and were placed in adjacent positions along an imaginary 8° radius circle. Movies of samples trials are available at http://viscog.psych.northwestern.edu/demos/ Results Figure 2 depicts the task and results. If only one common fate group is created at a time, adding more groups should substantially increase response times. Response times revealed a significant increase in search slope (55ms/item; t(9) = 17.3, p < .001) as more 2 In Press, Psychological Science distractor objects were present in the display. When common fate groups were replaced by connectivity groups, which can be processed in parallel (Franconeri, Bemis, & Alvarez, 2009; Rensink & Enns, 1995), adding more groups to the display did not substantially affect response times (6ms/item; t(9) = 2.7, p = .023). Furthermore, search through common fate groups was significantly less efficient than search through connectivity groups (t(9) = 12.3, p < .001). Thus, the costs of searching through common fate groups were specific to the construction of those groups, and not to increases in display complexity or shape identification requirements of the search task. Figure 2. In each experiment, participants indicated the presence or absence of a target via keypress. Experiment 1 – Vertical targets and horizontal distractors moved in straight-line paths and bounced off of invisible walled regions around fixation. Dot pairs either formed common-fate groups (shown) or were connected by onepixel thin lines. When common fate grouping was required to perform the task searches were inefficient (55ms/group) compared to when the elements were connected by lines (6ms/group). Experiment 2 – dot pairs moved in orbiting paths; the target was a dot pair that moved in-phase among dot pairs that moved out of phase. In a control condition, target dots were indicated by a valid color cue. When participants searched for a common fate group among non-groups, their searches were inefficient (82ms/group) compared to when the target group was cued (7ms/group). Experiment 2 Experiment 1 showed a severe capacity limitation on extracting the shape of a common fate group. However, even if shape information is not available, the visual system may be able to detect coarse details of groups in the display (e.g., the number of ‘clusters’ of grouped elements; see Trick & Enns, 1997). Whereas discrimination of the shape of common fate groups may be inefficient, observers may nonetheless be able to efficiently detect the existence of such groups. To test this possibility, we altered the visual search task so that participants judged whether a single common fate group was present in a display filled with non-grouped objects (see Figure 2). Methods Participants. Twelve participants (ages 18 – 21) took part in this experiment, either volunteers from the Northwestern University student population or paid participants from the Evanston, IL community. Stimuli and Procedure. Dot-pairs were initially arranged similarly to Experiment 1, except that 5 or 9 pairs were used in this experiment. All dots followed an orbiting motion at a rate of 6.28°/sec around an imaginary circle with a 1° radius. In distractor dot pairs, each dot revolved around a separate locus, and 180° out of phase. Distractor dots never approached closer than 0.5°, and were never separated by more than 3.5°. Target dots moved in an identical fashion, but revolved around separate loci in phase, and therefore always maintained a constant 2° distance. Target dots were assigned a unique phase and distractor dots were always at least 75° out of phase from the target group. The motion persisted until participants indicated a target present/absent response via key press. Participants were instructed to determine whether a target was present (50%) or absent (50%). Movies of samples trials are available at http://viscog.psych.northwestern.edu/demos/ Results Response times revealed significant costs (82ms/item; t(11) = 12.3, p<.001) as more distractor pairs were present, suggesting that even detecting the presence of common fate groups shows severe capacity restrictions. To ensure that the costs of additional distractors were not simply due to increased display complexity, a control condition confirmed that when the target group was cued with a unique color, adding more distractors to the display did not affect response times (7ms/item; t(11) = 1.49, p=.16). These two slopes also differed significantly from each other, t(11) = 7.45, p < .001. 3 In Press, Psychological Science While the present experiment shows that detecting groups is severely capacity-limited, there may be ‘tricks’ that would allow an observer to detect them efficiently in other displays. For example, consider a common fate group consisting of coherently moving elements placed among randomly moving elements. As the number of objects within the group increases (relative to the total number on the screen), that group’s motion direction would begin to dominate the ‘global’ distribution of motion directions available from the entire display. In this case, competition among multiple motion directions may be resolved through simple feedback mechanisms in the visual system (Chey, Grossberg, & Mingolla, 1997; Tsotsos, et al., 1995), such that the dominant motion direction would tend to be selected first. This would result in what appears to be an efficient and seemingly automatic representation of a common fate group. In contrast, Experiment 2 shows that when a common fate group’s motion direction does not dominate others by a significant ratio, that group cannot be efficiently detected. Discussion We have argued that a core mechanism for common fate grouping is the selection of a direction of motion. The selection of a motion direction would enhance the neural activity associated with similarly-moving elements across the visual field, providing the visual system with a map of locations for further processing. This account makes the surprising prediction that only one common fate group can be formed at a time. We found evidence consistent with this prediction across two experiments. In Experiment 1, participants searched among groups for a vertical line among horizontal lines. When common fate grouping was required to perform the task, searches were highly inefficient. But when elements were connected by lines, the vertical group could be instantly identified. Similarly, in Experiment 2, when participants searched for a single common fate group among nongrouped objects, searches were highly inefficient. We argue that these inefficient searches were due to the need to sequentially select the current motion direction of each group. We note that two alternatives are theoretically possible. First, common fate groups might be formed in parallel, but the search process cannot efficiently access that representation. This alternative is less likely, because previous studies show visual search tasks to be a sensitive measure of visual grouping processes (e.g., Trick & Enns, 1997; Rensink & Enns, 1995; Wolfe & Bennett, 1997). Second, the inefficient search for common fate groups might stem from a limited-capacity parallel grouping process (see Palmer, 1995, for discussion). This alternative is also less likely, because it is unclear why such a resource limitation might arise, and this type of solution would require additional mechanisms that maintain the representations of a potentially unrestricted number of common fate groups. Compared to each of these alternatives, the sequential selection account has the powerful advantage of parsimony. Because previous work shows that motion directions can be selected in a global fashion (Saenz, Buracas, & Boynton, 2002), we can concretely specify how this known mechanism could support common fate grouping. The burden of proof therefore falls on alternative accounts to demonstrate experimental effects that cannot be explained by the serial feature selection account. The feature selection account, at first glance, appears to contradict prior observations that feature-based grouping can be accomplished ’outside the focus of attention‘ (Kimchi & Razpurker-Apfeld, 2004; Moore & Egeth, 1997; Russell & Driver, 2005). Those prior studies showed that when participants perform a demanding primary task at fixation, the perceptual organization of an irrelevant background (groupings of luminance, color, or orientation) influenced performance, even when participants reported no awareness or memory for the background's content. However, any ostensible conflict with our results would be due to different definitions of 'attention.' Prior studies show a processing bottleneck where perceptual groups do not reach the processing stages required for awareness or memory. Grouping by feature selection would still be possible, especially if that type of selection were not incompatible with the primary task. Although the present study has focused on common fate grouping, feature selection may be a more generalized mechanism for any type of similarity grouping, including not just similarity of motion, but also similarity of color, shape, or orientation (Huang & Pashler, 2007). If so, similarity grouping may be subject to the same one-group limits as common fate grouping: only one group can be created at a time. This prediction finds support in recent studies –when observers are asked to determine whether multicolored patterns are symmetrical, judgments are fast when the displays consist of few colors, and slower when displays consist of several colors, suggesting that symmetry judgments can be made for only a single color subpattern at a time (Morales & Pashler, 1999; Huang & Pashler 2002). The proposal that similarity grouping stems from feature selection implies a substantial difference between similarity grouping and other types of grouping. Specifically, grouping by proximity, connectivity, or 4 In Press, Psychological Science common region may rely on a different mechanism that can produce discrete units in parallel (Franconeri Bemis, & Alvarez, 2009; Palmer & Rock, 1994; Rensink & Enns, 1995). The term ‘grouping’ is likely too broad to capture the diversity of mechanisms that cause some elements of the visual field to associate with others. Selection of a motion direction may not be the only mechanism available for grouping moving objects. For some types of frequently encountered moving patterns, such as bodies walking, wings flapping, or mouths moving during speech, there are almost certainly longterm representations for more specific patterns of motion (Cavanagh, Labianca, & Thornton, 2001) that could facilitate perception. A challenge for our proposed mechanism, which predicts the availability of only a single surface at a time, is to explain more complex grouping abilities related to common fate such as the appearance of rigid three-dimensional objects from numerous and dense patterns of moving dots (structurefrom-motion, Wallach & O’Connell, 1953). While such percepts might require complex local processing of the motion pattern (Ullman, 1979) we argue that the simpler mechanism described here may produce surprisingly rich percepts, especially when combined with other cues such as statistical information (Alvarez & Oliva, 2009; Balas, Nakano, & Rosenholtz, 2009) about the distribution of motion directions, and edges created by motion contrast (Regan & Beverley, 1984). For both simple common fate groups, and possibly more complex motion patterns, feature selection presents a parsimonious account of grouping by motion. We may construct a single such group at a time, and anything more may be an illusion of perceptual detail. Acknowledgements. We thank Hyunyoung Park for assistance in data collection. This work was partially supported by NSF CAREER grant BCS-1056730 (S.F), and NIH grant T32NS047987(B.L.). References Alvarez, G.A., & Oliva, A. (2009). Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proceedings of the National Academy of Sciences, 106, 7345-7350. Andersen, S.K., Müller, M.M., & Hillyard, S.A. (2009). Color-selective attention need not be mediated by spatial attention. Journal of Vision, 9(6):2, 1-7. Balas, B., Nakano, L., & Rosenholtz, R. (2009). A summary-statistic representation in peripheral vision explains visual crowding. Journal of Vision, 9(12):13, 1-18. Brainard, D.H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433-436. Cavanagh, P., Labianca, A.T., Thornton, I.M. (2001). Attention-based visual routines: sprites. Cognition, 80 (1-2), 47-60. Chey, J., Grossberg, S., & Mingolla, M. (1997). Neural dynamics of motion grouping: from aperture ambiguity to object speed and direction. Journal of the Optical Society of America, 14, 2570-2594. Chong, S.C., & Treisman, A. (2005). Statistical processing: computing the average size in perceptual groups. Vision Research, 45, 891-900. Culham, J.C., Brandt, S.A., Cavanagh, P., Kanwisher, N.G., Dale, A.M., & Tootell, R.B.H. (1998). Cortical fMRI activation produced by attentive tracking of moving targets. Journal of Neurophysiology, 80(5), 2657-2670. Franconeri, S.L, Bemis, D.K., & Alvarez, G.A. (2009). Number estimation relies on a set of segmented objects. Cognition, 113, 1-13. Halberda, J., Sires, S.F., & Feigenson, L. (2006). Multiple spatially overlapping sets can be enumerated in parallel. Psychological Science, 17(7), 572 – 576. Huang, L., & Pashler, H. (2002). Symmetry detection and visual attention: a “binary map” hypothesis. Vision Research, 42, 1421-1430. Huang, L., & Pashler, H. (2007). A Boolean map theory of visual attention. Psychological Review, 114, 599631. Kimchi, R., & Razpurker-Apfeld, I. (2004). Perceptual grouping and attention: Not all groupings are equal. Psychonomic Bulletin & Review, 11(4), 687-696. Kubovy, M., & van den Berg, M. (2008). The whole is equal to the sum of its parts: a probabilistic model of grouping by proximity and similarity in regular patterns. Psychological Review, 115(1), 131 – 154. Martinez-Trujillo, J.C., Cheyne, D., Gaetz, W., Simine, E., & Tsotsos, J.K. (2007). Activation of area MT/V5+ and the right inferior parietal cortex during the discrimination of transient direction changes in translational motion. Cerebral Cortex, 17(7), 17339. Maunsell, J.H.R., & Treue, S. (2006). Feature-based attention in visual cortex. Trends in Neuroscience, 29, 317-322. Moore, C.M., & Egeth, H.E. (1997). Perception without attention: Evidence of grouping under conditions of inattention. Journal of Experimental Psychology: Human Perception and Performance, 23, 339-352. Morales, D., & Pashler, H. (1999). No role for colour in symmetry perception. Nature, 399, 115-116. 5 In Press, Psychological Science Palmer, S., & Rock, I. (1994). Rethinking perceptual organization: The role of uniform connectedness. Psychonomic Bulletin & Review 1(1), 29-55. Palmer, J. (1995). Attention in visual search: Distinguishing four causes of set-size effects. Current Directions in Psychological Science, 4, 118123. Palmer, S.E. (2002). Perceptual grouping: It’s later than you think. Current Directions in Psychological Science, 11(3), 101-106. Pelli, D.G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437-442. Regan, D., & Beverley, K. I. (1984). Figure-ground segregation by motion contrast and by luminance contrast. Journal of the Optical Society of America, 1, 433 – 442. Rensink, R.A., & Enns, J.T. (1995). Preemption effects in visual search: Evidence for low-level grouping. Psychological Review, 102, 101-130. Roelfsema, P.R. (2006). Cortical algorithms for perceptual grouping. Annual Review of Neuroscience, 29, 203-227. Russell, C., & Driver, J. (2005). New indirect measures of “inattentive” visual grouping in a change-detection task. Perception & Psychophysics. 64(4), 606-623. Saenz, M., Buracas, G. T., & Boynton, G. M. (2002). Global effects of feature-based attention in human visual cortex. Nature Neuroscience, 5, 631-632. Trick, L. M., & Enns, J. T. (1997). Clusters precede shapes in perceptual organization. Psychological Science, 8 (2), 124-129. Treisman A.M., & Gelade, G. (1980). A featureintegration theory of attention. Cognitive Psychology, 12(1), 97-136. Ullman, S. (1979). The interpretation of structure from motion. Proceedings of the Royal Society B. 203, 405-426. Wallach, H., & O’Connell, D. N. (1953). The kinetic depth effect. Journal of Experimental Psychology, 45, 205-217. Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. Psychologishe Forschung, 4, 301-350. Williams, D.W., & Sekuler, R. (1984). Coherent global motion percepts from stochastic local motions. Vision Research, 24(1), 55-62. Wolfe, J.M. & Bennett, S.C. (1997). Preattentive Object Files: Shapeless bundles of basic features. Vision Research, 37(1), 25-44. Xu, Y. (2008). Representing connected and disconnected shapes in intraparietal sulcus. Neuroimage, 40, 1849 – 1856. 6