Introduction:
Psychology has demonstrated a long-standing interest in the remarkable ease with which humans recognise faces. It is no mean feat to discriminate perceptually within an object-class that is largely homogeneous, or to associate and retrieve large amounts of – sometimes arbitrary, e.g. names – information for each recognisable face. The published literature in face-research until now can be crudely dichotomised into two areas: i) the nature of perceptual expertise in face-discrimination; and ii) the decoding of this perceptual input for information such as person-identity, race and social communication. Models have been conceived for these two broad stages of face-processing (e.g. {Valentine, 1991 #201;Bruce, 1986 #202}) and more recently, efforts have been made to reconcile the two in a unified computational model of face processing – from 'pixels to person' {Burton, 1999 #200}. What is often neglected in models of face-processing, however, is a stage that can reasonably be assumed to take place even before face-perception. Before a face can be processed, it must be attended to. A face is not always presented at the centre of fixation, as an isolated percept, as most models of face-processing seem implicitly to assume. A naturalistic visual scene is often "cluttered" with a variety of non-face images. Thus, a face must first be detected and oriented towards before it can be processed. It is this stage of face-processing that this dissertation concerns itself with. More specifically, the experiments reported here aim to investigate the relationship between spatial-attention and face-detection. To do so, face-detection shall be studied within experimental designs more traditionally associated with spatial-attention. In this chapter, the nature of face-detection will first be illustrated through a pioneering study on the topic; that is, the face-detection effect.
Converging findings on the same theme will be presented and their implications discussed. Following this, two current face-detection models will be considered. Having discussed face-detection as a demonstrable process, as well as its role in face-processing models, an attempt will be made to understand the mechanics of this process. It has been suggested that face-detection occurs easily because spatial attention automatically orients to face-stimuli (de Gelder & Rouw, 2001). This claim will be investigated through a selective review of experiments that cover the broad themes of spatial-attention and face-processing: visual-search, ERP studies, change-detection and neuropsychological patient testing.

What is face-detection?
The term face-detection implies several things. Firstly, it refers to the cognitive stage at which a person is only just conscious of the presence of a face. At this point, there is no recognition of the person or, arguably, even the object-type. Also implicit in the term "face-detection" is the assumption that this process is special to faces as a visual object; or at least, that a face-like visual pattern fulfils a set of criteria that grants it privileged access to conscious awareness. Succinctly, it is the ability to be aware of the presence of a face prior to registering that the detected image is a face, or any particular person's face for that matter. Last but not least, it suggests that the visual system can register the presence of complex configurations – as opposed to line orientations and blobs {Hubel, 1963 #150;Hubel, 1968 #151} – even early in the visual processing stream, depending on the perceived configuration's identity. This runs counter to the popular notion that perception is entirely bottom-up; that is, that humans reconstruct their visual environment by binding simple features into complex configurations.
The following paragraphs will present experimental support for the claim that the structural configuration of a face can facilitate its detection even prior to classification.

Are Faces Special?
The human face is a visual pattern with obvious biological significance. Whether or not the perceptual processes pertaining to a face-pattern are special continues to be much debated. There are certainly reasons to believe that faces might be perceived differently from other objects, and a brief selection will be presented here. The following is only meant to introduce the reader to a selective background of the issues; primarily, to explain why face-processing is considered by some to be distinct from general object processing. Not everyone is in agreement with this view, and comprehensive reviews of the topic are readily available elsewhere in the literature (e.g. {Kanwisher, 2000 #170;Tovee, 1998 #172}). To justify this claim, it is necessary to show that there exist processes supporting face-perception that are not utilised in the perception of other objects. Evidence supporting the position that faces are special stems mainly from three lines of enquiry. Among behavioural findings, a common result is that recognition accuracy for faces suffers a disproportionate decrement with vertical image-inversion, in comparison to other objects that are similarly mono-oriented, e.g. houses {Yin, 1969 #157}. This has led to the claim that the processes driving face-recognition are more sensitive to the configural layout of the image's component features than those driving general object-recognition. This is not to say that face-recognition does not rely on featural information at all. Rather, face-processing relies so heavily on the configural information present in a face-image, such as the spatial relations between features, that an entirely unfamiliar face-percept can be derived merely by fusing the top and bottom halves of two highly familiar faces (cf.
{Young, 1987 #175}). This effect can be negated simply by inverting the 'chimeric' face, validating both the assumption that inversion specifically distorts the configural information present in a face-image and the reliance of face-processing on this information. Even individual face-features, e.g. the nose, were reported to be better learnt when presented within the context of an upright face than within an inverted face {Tanaka, 1993 #177;Tanaka, 1997 #178}. Given the vital relevance of such studies to this dissertation, the effects of face-inversion and how inversion affects configural processing will be covered in greater detail later. Suffice it to say for now that face-processing differs from object-processing in its particular reliance on configural information. Evidence that face-processing is special can also be obtained at the neuronal level, from a host of neurophysiological techniques. Functional neuroimaging investigations have reported regions in the right fusiform gyrus that respond selectively to face-images, at least twice as strongly for faces as for a variety of non-face object stimuli {Kanwisher, 1997 #179;Sergent, 1992 #180}. Scalp ERP readings and MEG studies, which benefit from a finer temporal resolution of event-related response, corroborate these results by similarly showing selective responses to the presentation of upright faces {Liu, 2000 #182;Bentin, 1996 #181}. Single-cell recordings of the primate brain also reveal cells in the temporal cortex that fire exclusively to faces and not other objects, lending further support to the specificity of face mechanisms {Perrett, 1982 #164;Perrett, 1984 #165}. The existence of specific neural mechanisms that respond preferentially to the presence of faces accords with the notion that face-processing is special and different from object recognition (however, see {Tarr, 2000 #183} for an alternative view).
In fact, Haxby and colleagues {Haxby, 2000 #184} conducted a detailed review of neurophysiological studies pertaining to face-processing and concluded that there was sufficient information to propose a plausible model of a face-specific neural system, fully illustrating the relationships of specific neural regions to corresponding aspects of face-perception. Finally, the double dissociation between neuropsychological patients with specific impairments of prosopagnosia and of general object agnosia distinguishes face-processing as an independent process. Prosopagnosia was first reported by Bodamer {Bodamer, 1947 #171} and refers to a neurological condition whereby sufferers experience a specific deficit in identifying familiar faces (for a recent review, see {De Renzi, 1997 #152}). By comparing different patient groups that suffer from selective cognitive impairments, a case can be made for separable processes. For example, the lesion patient C.K. suffers from severe object agnosia and cannot recognise simple line drawings of everyday objects, yet retains the ability to recognise familiar faces. This is in stark contrast to a prosopagnosic such as patient L.H., who demonstrates the reverse pattern of cognitive performance {Moscovitch, 1997 #153}. The existence of such double dissociations can be treated as evidence that face-processes are distinct from general object recognition. Later in the chapter, we shall examine how further testing of prosopagnosic patients on a variety of face-processing tasks has allowed theoretical models incorporating the process of face-detection to be formed. The findings presented here are merely a sample of the reasons that have motivated the study of face-perception as a system of processes distinct from general object recognition. To summarise, face-perception is believed to be more reliant on the abstract configural information contained within the visual image than general object recognition is.
In addition, there exist specific neural substrates that respond selectively to faces and not other objects. Lesions to these regions can result in a specific neurological disorder, i.e. prosopagnosia, that exhibits face-specific processing deficits. Hence, the visual image of a face is commonly believed to be processed differently from that of any other object. Still, the question remains as to where specifically, and how early in the visual processing stream, faces are accorded their special status. Next, we will discuss how the face-pattern can be treated preferentially, even at an early perceptual stage.

Face Detection Effect
While there are those who believe that face-recognition is special and different from the recognition of other object stimuli (e.g. {Ellis, 1989 #141}), the distinction is commonly thought to involve higher cognitive processes of recognition rather than low-level visual processes such as reflexive orienting. The traditional viewpoint, as implied by Bruce & Young's (1986) popular model, is that face-processing becomes a special topic of discussion only after the face has been structurally encoded, and that the process of structurally encoding a face is no different from that of any other object. In the late 1980s, several studies mooted the possibility that faces could enjoy privileged status at an early visual processing stage, prior to being structurally encoded and identified. In the general experimental paradigm, participants were presented with a target stimulus that appeared on either side of a fixation cross and were expected to make a two-alternative forced choice (2AFC) response corresponding to the target's position whenever they detected the presence of a visual stimulus. By introducing a backward visual mask shortly after target presentation, presentation times of the target stimulus could be varied for each participant until a consistent accuracy of 75% was achieved. This measure was termed the detection threshold.
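The threshold procedure just described can be approximated with a simple adaptive staircase. The sketch below is illustrative only: Purcell and Stewart's exact algorithm is not specified here, so the three-down/one-up rule, step size, reversal count and the `run_trial` hook are all assumptions (a three-down/one-up rule converges near 79% correct, close to the 75% criterion described above).

```python
def detection_threshold(run_trial, start_ms=100, step_ms=4, n_reversals=8):
    """Estimate a masked-detection threshold with a three-down/one-up
    staircase: shorten the target duration after three consecutive correct
    2AFC localisations, lengthen it after any error, and average the
    durations at which the direction of change reversed.

    `run_trial(ms)` is a hypothetical hook that presents one masked trial
    at the given target duration and returns True for a correct response.
    """
    duration, streak, going_down = start_ms, 0, None
    reversals = []
    while len(reversals) < n_reversals:
        if run_trial(duration):
            streak += 1
            if streak == 3:                 # three in a row: make it harder
                streak = 0
                if going_down is False:     # direction changed: record a reversal
                    reversals.append(duration)
                going_down = True
                duration = max(step_ms, duration - step_ms)
        else:                               # any error: make it easier
            streak = 0
            if going_down is True:
                reversals.append(duration)
            going_down = False
            duration += step_ms
    return sum(reversals) / len(reversals)
```

For a deterministic observer who responds correctly at durations of 40ms or more, the staircase settles into an oscillation around that value, and the mean of the reversal durations estimates the threshold.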
Comparing the detection thresholds for different visual patterns, the main finding was that participants had lower detection thresholds for a normal face stimulus (38ms) than for an equally complex visual target; that is, a scrambled face comprising the same parts (56ms) {Purcell, 1986 #27}. A later study replicated the same findings using an inverted face as a comparison stimulus, showing that it is specifically the upright, normal configuration of facial features that enhances detection {Purcell, 1988 #24}. In a later section of this chapter, it will be seen that the face-detection effect can be demonstrated even in prosopagnosic patients who suffer from specific deficits in face-recognition and were previously believed to have lost all face-related processing {de Gelder, 2001 #33;de Gelder, 2000 #34}. The crucial point to note in these experiments is that successful task completion did not depend on identifying the target. In fact, when participants were presented with the same experimental trials and asked to classify the targets according to identity, the classification threshold for upright faces was significantly longer than the corresponding detection threshold (Experiment 5: Purcell & Stewart, 1988). The significance of these findings is that the configural structure of a face facilitates its detection even before that structure is fully perceived to allow for identification. In a more recent experiment, the same findings were replicated with the use of smiley faces (see Fig. 1.2 for examples), demonstrating that this effect is not dependent on a detailed or veridical face-pattern (Experiment 1: {Shelley-Tremblay, 1999 #121}). Target stimuli were presented for a fixed duration (16ms) before the application of a backward visual mask. This experiment differed from those conducted by Purcell & Stewart (1986; 1988) in several ways.
Firstly, trials varied for target presence and participants were only required to indicate whether or not a stimulus was presented between the start of the trial and the backward visual mask. Hence, no localisation decision was necessary. Also, participants were requested to provide a subjective rating of each target stimulus's clarity on a scale of 1-5, whenever it was deemed present. Finally, fixed measurements of detection accuracy were taken across 5 different stimulus onset asynchronies (SOAs) between target stimulus and mask, instead of the detection threshold measure. In support of previous findings, measures of detection accuracy were higher for normal upright smiley faces than for scrambled and inverted faces. In addition, subjective ratings of clarity were similarly higher for normal upright smiley faces than for scrambled and inverted faces. To understand the face-detection effect, it is important to understand its time-course. One way of doing so is by directly measuring single-cell responses of face-sensitive neurons during this task. Preceding the study conducted by Shelley-Tremblay & Mack (1999), the above behavioural findings were extended by a series of experiments that utilised the same experimental procedure, but with both macaques and humans {Rolls, 1994 #7}. Using macaques allowed single-cell recordings to be retrieved from a neuronal population in the macaques' superior temporal sulcus already known to show preferential firing to faces. The primary aim of this experiment was to compare differences in neural firing rates across variations of the experimental parameters, so as to better understand how neural events corresponded with the behavioural findings of Purcell & Stewart's (1986; 1988) experiments. Thus, the firing rates of face-selective neurons were compared across a range of SOAs between a photograph of an upright/scrambled face (16ms) and a visual mask.
The main findings were as follows: i) the introduction of a backward visual mask served to attenuate neural firing, such that the neuron would stop firing shortly after the introduction of the mask; ii) at short SOAs (i.e. 20ms), neural firing rates did not discriminate between upright and scrambled faces; iii) gradually, the neurons fired more responsively to the upright face than to the scrambled face as the SOA was increased towards 60ms. When the same experiment was conducted with human participants, the findings coincided with Shelley-Tremblay and Mack's (1999) for measures of judged clarity; more importantly, upright faces were judged to be significantly clearer than scrambled faces only within the same critical SOA range of 40-60ms. This also corresponds with the psychophysical findings of Purcell & Stewart (1986; 1988), which demonstrated the emergence of the FDE in the temporal window of 40-60ms. Considering the neurophysiological and behavioural data in concert, we can surmise that the rapid timing of face-detection (that is, the point in time when the image is deemed sufficiently clear to register in consciousness) bears a relation to when the firing rates of face-specific neurons begin to discriminate in favour of normal upright faces. The face-detection effect claims that the early stages of the perceptual system are sensitive to the structure of a face. No doubt, this sounds like a contradiction in itself. After all, why should, and indeed how could, the identity or object-class of a visual pattern, i.e. a face, facilitate its detection prior to its identification? In other words, how can a face speed along its own detection by virtue of its structure before it is registered as a face, at least at the conscious level? Neurophysiological studies describing the architecture of the visual cortex claim that the early stages of the primate visual pathway (i.e.
V1) are only known to be responsive to edges and blobs, not complex patterns such as faces {Dow, 2002 #142;Hubel, 1968 #8}. Traditional accounts of visual object recognition would argue that a complex figure, e.g. a face, can be perceived only when mentally reconstructed from these basic sensory inputs (e.g. Marr, 1985). The face-detection effect, however, implies that the face-pattern exerts an influence much earlier in the visual pathway than previously assumed. To summarise, the mere configuration of a face-pattern enjoys certain privileges in the early stages of visual processing, specifically prior to its identification. This implies an early face-specific process in the visual stream that precedes face identification. Moreover, this effect operates even with basic face-schemas, i.e. smiley faces, that are not, strictly speaking, faces. The effect also has a specified time-course of 40-60ms, during which a face-pattern is judged to be significantly clearer than other images comprising the same visual components.

1.2 Corroborating Evidence
In the previous section, the face-detection effect was examined in detail. Essentially, it raises the possibility that human perceivers can utilise the structural configuration of a face to facilitate its detection even before the same image is fully identified as a face. This effect is believed to reflect face-selective processes that emerge early in the visual processing stream. Bearing this in mind, we will now consider other findings in the face-research literature, spanning a broader range of methodologies, that lend support to the same claim.

Developmental studies
Another worthwhile question to ask is whether this privileged status is conferred on faces as a result of object familiarity or whether it is, unlike the process of face-recognition that requires learning, an innate function.
Studies that investigate neonatal visual preferences are one way of testing the independence of the face-processes that motivate detection and identification. Neonates are not expected to possess the extensive experience with faces that adults do. Despite this, Goren, Sarty & Wu {Goren, 1975 #155} have reported that newborns (3 – 27 mins old) tend to track simple face-like patterns over scrambled images comprising the same elements and controlled for symmetry. More speculatively, this sensitivity enables infants (as young as 12.5 – 201 hrs) to recognize their mother's face {Bushnell, 1989 #203} and, by the age of 17-22 weeks, to discriminate between different faces {Fagan, 1972 #204}. Thus, it can be assumed that prior to the acquisition of any extensive experience with face-patterns, infants do in fact orient towards face-patterns. Some authors might argue that perceptual precocity of this nature reflects an innate selectivity for socially significant stimuli, i.e. face-like patterns (Bowlby, 1969; Fantz, 1961; Gibson, 1969). After all, there is an ecological benefit in doing so, and it is not unreasonable to argue for an evolved predisposition towards faces that is hard-wired into the neural system. Proponents of this view, termed the structural hypothesis, argue that face-like patterns have privileged access to the visual system, perhaps due to their social significance. There is, however, a more parsimonious explanation, known as the energy hypothesis, that focuses on the limitations of the early visual system and how an infant's visual preference is more tightly governed by what it is, effectively, capable of sensing {Kleiner, 1987 #156}. Kleiner (1987) argues that early affinity to a face-like pattern can be explained, without recourse to the stimulus's social significance, simply by the fact that the visual properties of a face-like stimulus (measured in terms of spatial frequencies) are in sync with what young infants are able to see.
The early visual system can be characterised as particularly responsive to patterns of low spatial frequencies and high contrasts {Banks, 1985 #145;Banks, 1978 #143;Banks, 1983 #144}. Kleiner (1987) does not dispute that an infant is likelier to orient towards a face-like pattern than towards a simple lattice pattern. However, she argues that this is by virtue of the amount of visual information that 'survives' the filtration imposed by the infant's visual limitations in acuity and contrast. A face-like pattern is claimed to survive this process far better than other patterns. However, Kleiner (1987) does not venture to suggest whether there might be comparable image-patterns with the same level of 'energy' as face-patterns that could produce comparable preferential viewing in infants. In Kleiner's (1987) study, preferential viewing levels of infants (mean age = 1.7 days) were compared on a set of images that were carefully controlled for levels of amplitude and phase (see Fig 1.2 for examples). Phase levels determine the configural layout of an image, while amplitude levels determine the level of image contrast or stimulus 'energy'.

[Fig 1.2: Experimental stimuli used in Kleiner (1987): A – schematic face figure; B – complex lattice figure; C – hybrid image of image A's phase and B's amplitude levels; D – hybrid image of A's amplitude and B's phase levels]

In her study, infants preferred the images in the order A > D > (B = C). Firstly, it should be noted that her findings replicate earlier findings that infant observers tend to prefer a schematic face image above all other images. What is interesting, though, is that the hybrid image D was viewed preferentially over images B and C, when most adult observers would rate hybrid image C as being more face-like. These findings are not easily interpretable.
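Hybrid stimuli of this kind can be reproduced computationally by swapping the Fourier phase and amplitude spectra of two images. The sketch below is a generic reconstruction of the technique, not Kleiner's original procedure; it assumes two equally sized grayscale arrays as input, and the function name is illustrative.

```python
import numpy as np

def hybrid_image(phase_src, amp_src):
    """Combine the Fourier phase of `phase_src` (the configural layout)
    with the Fourier amplitude of `amp_src` (the contrast 'energy')."""
    phase = np.angle(np.fft.fft2(phase_src))       # layout information
    amplitude = np.abs(np.fft.fft2(amp_src))       # contrast / 'energy'
    combined = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(combined))

# Given a schematic face A and a lattice B (hypothetical arrays):
# image_C = hybrid_image(A, B)   # A's phase, B's amplitude
# image_D = hybrid_image(B, A)   # B's phase, A's amplitude
```

Because phase and amplitude are manipulated independently in the frequency domain, a hybrid can carry one image's configural layout while retaining the other's 'energy' level, which is exactly the dissociation Kleiner's stimuli C and D exploit.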
Kleiner (1987) accounts for these findings by claiming that sensitivity to a face-configuration is a secondary criterion in the prioritisation of viewing preference. She claims that infants are first sensitive to the level of perceptual 'energy' contained within an image but that, thereafter, they attend preferentially to the face-configuration. The debate concerning the precedence of the face-configuration in infant visual preference was more recently resolved. A series of experiments was designed with the specific aim of directly comparing the influence of a normal face configuration versus optimal spatial frequencies on infant viewing preferences {Valenza, 1996 #207}. In the first few experiments, the above findings were replicated: infants were found to prefer a schematic face configuration over an equivalent image with misplaced features, as well as patterns containing spatial frequencies tailored to fit an infant's early visual system over those that did not. By subsequently juxtaposing the same images that elicited these visual preferences, it was found that it was the face-like pattern that captured the infants' attention, rather than the pattern of optimal spatial frequency. Thus, there is a strong claim for the existence of an early face-monitoring system that primes infants to the presence of a face-image, independent of the limited range of spatial frequencies that characterise the early visual system. Overall, the evidence presented so far supports the existence of an early sensitivity to an upright face-configuration in infants that cannot be attributed to perceptual face-learning. The simple explanation for this phenomenon is to postulate that humans possess an innate visual affinity to faces and that this corresponds with a built-in neural correlate. Certain neurological studies support this view and have found hemispheric specialisation for face-processing within the first few months of infancy {de Schonen, 1986 #206;de Schonen, 1987 #205}.
Despite this evidence of a precocious sensitivity to the face configuration, studies that compare different age-groups for identification skills have demonstrated that the perceptual expertise commonly associated with face-recognition does not fully develop until around 10-11 years of age {Feinman, 1976 #209;Ellis, 1990 #208}. In fact, young children are not believed to capitalise on the subtler but more discriminative configural differences between highly similar faces as a means of identification until puberty, relying instead on inconsistent featural differences such as hairstyles {Flin, 1980 #211;Carey, 1980 #210} (although see {Flin, 1985 #212} for a slightly revised account). Hence, even in developmental studies we notice this distinction between the early visual preference for the face-configuration, i.e. face-detection, and the ability to identify and discriminate a face from highly similar counterparts, i.e. face-recognition.

Change detection
It is sometimes believed that conscious perception cannot occur without attention. A group of phenomena that support this notion includes: inattentional blindness {Mack, 1998 #194}, the attentional blink (Raymond, Shapiro, & Arnell, 1992; Shapiro, 1994), repetition blindness (Kanwisher, 1987; Kanwisher & Potter, 1990), change blindness {Simons, 1997 #195;Rensink, 1997 #193} and the visual-neglect that occurs in neuropsychological patients (Rafal, 1998; Rafal & Robertson, 1995; Bisiach, Luzzatti, & Perani, 1979). Generally, these studies operate on the assumption that visual attention can be circumscribed by a zone of finite spatial parameters. Thus, participants can be empirically tested for awareness or change-detection of stimuli that fall outside this "zone of attention". Experiments – such as those on inattentional blindness {Mack, 1998 #194} – usually involve participants attending to a specified spatial region, where a task-relevant stimulus is located.
During the experiment, an unexpected (or critical) stimulus appears and participants are post-experimentally quizzed about their conscious awareness of this unexpected event. The consistent finding was that participants did not notice the appearance of the critical stimulus when it appeared in a different region from the studied stimulus, but did so when it was close to the attended region. In order for a visual object to reach consciousness, it is necessary first to attend to it. This ties in with our current interest in the face-detection effect. By virtue of their configuration, perceivers are highly conscious of a face's presence in a visual scene, a well-replicated finding in tests of the face-detection effect {Purcell, 1986 #27;Purcell, 1988 #24;Shelley-Tremblay, 1999 #121}. Concurring with the face-detection effect was Mack and Rock's (1998) lack of success in replicating findings of inattentional blindness with upright faces. Despite their best efforts to draw attention away from the critical stimulus, i.e. faces, their participants' attention was always inevitably drawn to the faces' presence. The only manipulation that rendered faces susceptible to inattentional blindness was an image-inversion of their natural configuration. Such findings indicate that there might be a spatial component to the face-detection effect. The upright face-configuration is particularly resilient to a good variety of experimental paradigms designed to induce change blindness. Another good example is the use of the flicker paradigm introduced by Rensink and colleagues {Rensink, 1995 #197;Rensink, 1997 #193}. In the flicker paradigm, the presentation of an original image repeatedly alternates with a modified version of the same image. Interspersed between the two presentations is a blank field interval that prevents change-detection through the use of low-level motion signals.
During each trial, the observer is allowed to view this repeated sequence for as long as is required to identify the differences between the original image and its modified counterpart. Traditionally, such experiments are used with real-world scenes, and the main finding is that the flicker across the two disparate images can obscure even large visual changes, such that it takes a surprisingly long time for obvious changes to be registered. Predominantly, changes involving objects of marginal interest are more susceptible to the flicker effect than objects of central interest {Rensink, 1997 #193}. Unfortunately, there has not been a systematic study to qualify what counts as an object of marginal or central interest. Objects classified as being of central interest are simply those that were mentioned in the verbal description of each scene by at least 3 participants, and objects of marginal interest are those that did not meet this criterion. This definition is circular and does not indicate why certain objects should be of central or marginal interest. The face-detection effect would predict that an upright face configuration should be of central interest and more resilient to the flicker paradigm than an equally complex image, e.g. an inverted or scrambled face. Using face-stimuli as the sole critical image in the flicker paradigm could yield interesting results concerning the aspects of faces that are most resilient to change-blindness. In one such experiment, the main modification between the original face-image and the transformed comparison stimulus was in the spatial relationships between features {Davies, 2002 #196}. Either the eye or the mouth position was shifted a fixed and minute distance, either upwards or downwards. Thus, the change that had to be detected was of a configural nature, as the particular features were no different across the two sequentially-presented images.
In addition, the paired images were presented either in their normal upright orientation or inverted. The results of this experiment were in strict correspondence with what we would expect from previous studies. Visual salience of a face was a privilege associated with an upright face configuration, such that change-detection performance for configural changes was consistently better for upright than for inverted faces, on both measures of detection accuracy and latency. Hence, an upright face-configuration automatically receives more attention than an inverted face-configuration, particularly for configural changes. This finding is reiterated in another study, wherein it was found that an inversion-transformation of the upright configuration severely impaired change-detection for changes in eye and mouth positions {Barton, 2003 #198}. The performance findings for the change-detection of featural modifications across inversion transformations are less clear. Davies and Hoffman (2002) found that the upright configuration sensitised observers to featural changes such as the localised inversion of features, i.e. eyes and mouth, compared to inverted faces. However, it could be argued that localised inversions of the eye and mouth regions are, in fact, configural rather than featural modifications. As Davies and Hoffman admit in the same paper, localised inversions of eyes and mouth can also disrupt configural information, as demonstrated in Thompson's {Thompson, 1980 #199} classic illusion. In his illustration, he uses these local inversions to confer disproportionate grotesqueness upon Margaret Thatcher's face, which is immediately negated upon inverting the face. Hence, Davies and Hoffman (2002) recommended the use of featural changes such as the lightening of mouth colour or a change in eye-colour instead, a suggestion that was promptly taken up in Barton et al.'s (2003) paper.
In their experiment, face-inversion had no significant effect in reducing change-detection performance for featural changes that modified only eye-colour and the lightness-contrast of the mouth. To summarise, the high visual salience of an upright face is particular to its configuration and not its features. Inversion of the face-configuration removes this privileged status and renders changes at the configural level just as susceptible to change-blindness as those at the featural level. So far, the findings under discussion in this section have dealt with the detection of changes within a single face-image, upright or inverted. The main finding has been that there is a strong sensitivity to changes in the configural information that defines a face-pattern, and that this sensitivity depends on the upright face-orientation. Using the same flicker paradigm, other investigators have also questioned whether an upright face-pattern is more salient than other non-face objects contained within the same visual scene. In a recent study, the flicker paradigm was conducted with a circular array of six different objects positioned equidistant from and evenly spaced around a fixation point. Each item was a member of a different object category, namely: faces, food, clothes, musical instruments, appliances and plants. Half the trials involved a change in one object item between the two repeated serial visual presentations whilst the other half did not depict a change at all. Overall, a change in the face item resulted in more accurate change-detection as well as shorter detection latencies, compared to all other object items (Expt 1: {Ro, 2001 #73}). Yet again, any benefit for the change-detection of face-items was completely eradicated when the objects were presented as inverted images (Expt 2b: {Ro, 2001 #73}). Thus, the key to a face's visual salience clearly lies in its upright configuration.
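The circular layout used in Ro et al.'s study — items evenly spaced and equidistant from fixation, so that no item enjoys an eccentricity advantage — amounts to placing stimuli at equal angular steps on a circle. A sketch of how such positions might be generated (the eccentricity value is an arbitrary assumption for illustration):

```python
import math

def circular_array(n_items, radius, centre=(0.0, 0.0)):
    """Return (x, y) coordinates for n_items evenly spaced on a circle
    around the fixation point, starting at the 12 o'clock position and
    proceeding clockwise. All items are equidistant from fixation."""
    cx, cy = centre
    positions = []
    for i in range(n_items):
        theta = math.pi / 2 - 2 * math.pi * i / n_items
        positions.append((cx + radius * math.cos(theta),
                          cy + radius * math.sin(theta)))
    return positions

# Six object positions, e.g. at 5 deg of visual angle from fixation
pos = circular_array(6, radius=5.0)
```

Equating eccentricity in this way matters because visual acuity falls off with distance from fixation; any detection advantage for the face item can then be attributed to the stimulus, not its placement.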
Interestingly, when participants were asked to rate each object-category for change-detection difficulty, participants were not conscious of their performance advantage and did not rate faces as an easier object-category to detect changes in. It can be derived from this that the participants in this study were not relying upon their perceptual expertise in face-recognition to detect the changes in the visual array. Rather, their privileged sensitivity to the face-configuration precedes explicit awareness and is more related to the mechanism of face-detection.

Visual neglect

The difficulties posed to participants in change-detection experiments are often attributed to inattention.

Prosopagnosia & Face-detection

Given the disproportionate importance of configural processing to successful face recognition as compared to object recognition, it has also been suggested that prosopagnosia reflects a general loss of configural processes and a regression to the use of part-based processing in face recognition {Levine, 1989 #161}. In fact, the presence of the face-inversion effect – described in the preceding section – has been offered as a diagnostic marker of normal face-processing {Yin, 1970 #158}. This argument, however, is circular and, as we shall soon discover, fails to fully appreciate the complexity of prosopagnosic deficits. Furthermore, there is evidence that under certain conditions, certain prosopagnosic patients do possess a sensitivity to configural information and, in fact, cannot adopt a feature-based processing strategy even if there is a task-benefit in doing so. Levine and Calvanio (1989) based their claim that prosopagnosia was a general impairment of configural processing on the finding that their patient L.H., a prosopagnosic, performed badly on a battery of standardised configural processing tasks. Nonetheless, these findings were rendered inconclusive when subsequent testing of the same patient (L.H.), but only on basic perceptual tasks that also required configural processing, e.g.
Kanizsa-type visual illusions, resulted in performance equivalent to normal controls {Etcoff, 1991 #185}. Unfortunately, both studies were concerned with L.H.'s general configural processing abilities and only tested him with line drawings, abstract figures and Kanizsa-type visual illusions, instead of with faces. All that can be concluded from these studies is that L.H. does not suffer from general perceptual deficits such as those experienced in apperceptive and integrative agnosia (e.g. {Riddoch, 1987 #186}). It would have been more relevant to have tested L.H.'s configural processing in a way that was relevant to face-images. In truth, testing prosopagnosic patients on face-configuration tasks reveals a far more complex picture than one would imagine. If it were true that the emergence of prosopagnosia simply signalled the complete loss of configuration-based processing, at least with regard to faces, then we ought to expect prosopagnosic sufferers to process inverted faces no differently from upright faces. Specifically, prosopagnosia should result in the loss of the inversion-inferiority effect that is well-documented in normal face-processing {Valentine, 1988 #154}. Instead, certain prosopagnosic patients have been reported to show the reverse pattern of performance; that is, an inversion-superiority effect (for example: patient L.H., {de Gelder, 2000 #35;Farah, 1995 #176;de Gelder, 2000 #36}; patient R.P., {de Gelder, 2000 #34}). In these reports, prosopagnosic patients performed better with inverted faces than upright faces on sequential matching tasks of unfamiliar faces. Furthermore, prosopagnosic participants are also impaired on face feature-matching tasks when the feature is found within the context of a face. Again, this is a reversal of another well-known advantage, i.e. the face-context effect, related to the use of configuration-based processing with upright faces {Homa, 1976 #167}.
Even when given the explicit instruction to attend only to face-features, prosopagnosic patients are detrimentally affected by the presence of the upright face-configuration. Clearly, prosopagnosic patients do not suffer from a complete loss of sensitivity to a face-configuration. In addition, rather than demonstrating a general regression to the exclusive use of feature-based processing, certain prosopagnosic patients are, in fact, unable to avoid using configuration-based processes even when it is task-inefficient to do so. Still, there is a class of prosopagnosics who do not demonstrate the inversion-superiority effect. Known as developmental prosopagnosia (DP), this syndrome differs from the more common form of acquired prosopagnosia (AP) in that its sufferers have no medical history of sudden neural injury despite sharing a face-specific deficit. Thus, DPs' face-specific impairment is believed to be of a purely developmental origin. Unlike APs, DPs do not possess pre-trauma face-learning experience, as they have never been able to recognise faces. Thus, a comparison study of a DP (patient A.V.) and an AP (patient R.P.) was carried out to find out whether the reverse pattern of inversion-superiority in APs had anything to do with pre-trauma face-learning {de Gelder, 2000 #34}. First of all, it must be noted that the face-detection effect was observed in both patients, A.V. and R.P., as well as in the 15 normal participants recruited as controls. Thus, neither class of patients is completely insensitive to configurations; at least, not to that of a face-like pattern. This bolsters the supposition that face-detection is a process that is independent of learning and served by early perceptual processes. On simultaneous and delayed matching tasks of unfamiliar faces, it was found that R.P. replicated the previous performance of other APs (e.g. L.H.
{de Gelder, 2000 #35;de Gelder, 2000 #36;Farah, 1995 #176}) by showing better performance when faces were inverted compared to when they were in their normal upright position. Normal controls, on the other hand, replicated the classical finding of inversion-inferiority and did poorer with inverted faces instead (cf. {Yin, 1969 #157}). Patient A.V., however, displayed performance superiority for neither upright nor inverted face-images, displaying a pure insensitivity to face-orientation on matching tasks. On a different task, participants were required to match facial features contained within either an upright or an inverted face-context. Again, only patient R.P. and the normal controls showed an orientation-effect, but with opposite trends of performance. Whilst normal controls benefited from seeing the feature in an upright face-configuration, patient R.P. showed the reverse pattern and performed better when the feature was contained within an inverted face-configuration. In accordance with the earlier results, patient A.V. did no better or worse when matching face-features that were enclosed by either an upright or an inverted face. When considered in concert, the opposing trends in performance displayed by patient R.P. and the normal controls can only be attributed to the face-learning experience that patient A.V. lacks. There are developmental theories that could explain this (e.g. see {Diamond, 1986 #188}). Briefly summarised, this account claims that extensive experience with faces results in perceptual expertise that is primarily characterised by an over-reliance on the configural information contained within a face, which the image-transformation of inversion specifically distorts. It is the ability to retrieve such abstract information that enables fine within-class object discrimination to take place. There is data to support this perspective. Configural processing, as measured by the inversion effect, is known to increase with age {Carey, 1994 #190;Carey, 1977 #189}.
Therefore, patient A.V., who does not possess the extensive experience with face-processing that normal controls do, has to rely entirely on a feature-based strategy in the matching tasks and remains completely unaffected by face-orientation. While this explains the difference in performance between the normal controls and the developmental prosopagnosic A.V., it does not offer a ready explanation for patient R.P.'s paradoxical inversion-superiority effect. Interestingly, both developmental and acquired prosopagnosics exhibit the face-detection effect, which is central to this dissertation. This has allowed de Gelder and Rouw {de Gelder, 2001 #33;de Gelder, 2000 #34} to conceive a model to explain the effects of inversion-superiority noted in acquired prosopagnosia, but not in developmental prosopagnosia (see Fig. 1.3). In this model, the perceiver is able to utilise a part-based or whole-based approach to analyse any visual image presented, depending on whichever is the more effective for accurate object identification. This mirrors the holistic hypothesis presented by Farah and colleagues, which argues for a whole-based processing bias for faces because faces are represented in memory as a whole Gestalt unit {Farah, 1998 #192;Farah, 1992 #191;Farah, 1995 #176}. However, they did not fully explain why faces, and not other objects, should be represented in memory as a holistic representation to begin with. The involvement of an innate face-detection process in the early stages of the development of object representation offers a reasonable explanation. The precocious face-detection ability allows infants to perceive and encode faces as a Gestalt unit. In time, this leads to a shift towards whole-based processing for faces that develops with face-learning experience (cf. {Flin, 1980 #211;Carey, 1980 #210}).
Associative learning allows the innate face-detection module to act like a switch that automatically alerts the face-identification system to the presence of a face, involuntarily triggering the use of whole-based object processing. This provides an explanation of the inversion-inferiority effect in normal controls, as well as of why the inversion-inferiority effect increases with age {Flin, 1985 #212}. Acquired prosopagnosic patients do not suffer from a general loss of configural processing, though their injury prevents them from recognising or matching faces on the basis of configural information. Despite this, their pre-trauma experience now prevents them from utilising anything other than whole-based encoding processes when presented with an upright face-pattern. Developmental prosopagnosics, however, do not suffer from such a learnt disadvantage. Having never possessed the ability to encode faces as Gestalt units, their intact face-detection abilities remain a completely independent process from general object processing and do not bias object processing towards the use of configural or featural processes. Hence, developmental prosopagnosics remain completely unaffected by manipulations of face-orientation.

Figure 1.3 is a brief illustration adapted from de Gelder and Rouw's (2001) first publication of this theoretical model. Essentially, de Gelder and Rouw (2001) claimed that the face-detection system ought to be considered autonomous from the face-identification system (and related effects). In addition, they came up with a list of attributes that ought to characterise each system (see Table 1).

[Fig. 1.3. Dual-route account of face recognition, comprising independent face-detection and face-identification processes: an upright face image feeds both the face-detection route and general object recognition, with part-based and whole-based processing leading to face-identification. Adapted from de Gelder & Rouw (2001).]

Table 1. List of theoretical attributes associated with the autonomous systems of face-detection and face-identification.

Face detection:
1. Fast
2. Based on exogenous attention
3. Based on coarse-grained representations/processes
4. Requires limited stimulus exposure
5. Category specific/unique
6. Neuronal basis is distributed across a variety of brain areas that contain face-sensitive cells
7. Ontogenetically primitive

Face identification:
1. Slower
2. Under the influence of endogenous attention and perceptual strategies
3. Requires fine-grained representations
4. Depends on extensive learning between ages 0 and 12 years
5. Shares resources with the object recognition system
6. In the FFA and overlapping with object recognition areas
7. Ontogenetically complex, as assembled from more primitive components

Given this dissertation's particular interest in the spatial components of face-detection, we shall limit our discussion to the claim that the face-detection system is based on exogenous attention. In the next section, we shall look at some relevant findings in the attentional literature and discuss how they might relate to face-detection, if it is true that face-detection influences exogenous attention.

Features and Conjunctions: Searching for a face

The visual search paradigm was designed to understand how the human visual system selects an item in a cluttered scene for further processing. In a standard visual search task, participants have to locate a pre-specified target item amongst distractor items {Treisman, 1985 #162;Treisman, 1988 #122}. Efficiency of visual search is gauged by plotting measures of reaction time (RT) and accuracy as a function of the number of items in the display (set-size). A much-replicated finding is that targets that can be differentiated from distractors by a single basic feature, e.g. red from green, produce RT x set-size functions with slopes nearing zero.
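The slope of the RT x set-size function is simply the gradient of the best-fitting line relating mean RT to display size. A minimal least-squares sketch of this computation follows; the RT values are fabricated for illustration, and the 6 ms-per-item cut-off is the criterion quoted in this literature:

```python
def search_slope(set_sizes, mean_rts):
    """Ordinary least-squares slope (ms per item) of mean RT
    regressed on set-size."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts) / n
    num = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
    den = sum((x - mx) ** 2 for x in set_sizes)
    return num / den

# Fabricated data: a near-flat 'pop-out' search vs a steep serial search
flat = search_slope([4, 8, 16, 32], [500, 502, 503, 505])
steep = search_slope([4, 8, 16, 32], [520, 620, 820, 1220])
efficient = flat < 6  # parallel / 'pop-out' by the 6 ms-per-item criterion
```

A near-zero slope means adding distractors costs essentially nothing, the operational signature of parallel processing; a slope well above the criterion indicates that each additional item adds a fixed inspection cost.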
Such searches are termed efficient or parallel; that is, analyses of both the target and distractors for criterion-suitability can proceed in parallel, such that the number of display items is not a limiting factor. In contrast, targets discriminable from their distractors only through a conjunction of features – e.g. a red oblique amongst green obliques and red vertical lines – return steep linear search slopes, exceeding the 6 msec per item criterion {Treisman, 1985 #162}. Such searches are described as serial and self-terminating, whereby each item in the display has to be individually assessed for target-suitability. With parallel searches, the target item is said to 'pop out', such that the time taken to detect it remains the same regardless of the number of items in the visual array {Treisman, 1980 #70;Treisman, 1985 #162}. In this light, the parallel search process resembles the face-detection effect. This naturally raises the question as to whether upright faces do pop out in a visual search task. Treisman and colleagues explain that parallel search occurs pre-attentively at the level of topological feature maps {Treisman, 1998 #65;Treisman, 1980 #70;Quinlan, 2003 #72}. If a target can be identified on the basis of a single feature, its presence can be detected almost automatically. The need for serial search arises only when each item must be checked for whether its spatial location 'lights up' across two or more feature maps. For this interpretation to be upheld, it is vitally important to define what constitutes a feature. One way of doing so is by inferring, from neurophysiological evidence, special-purpose neural systems that are primarily dedicated to processing a single visual attribute (see, e.g., {Zeki, 1976 #163}). By this criterion, the list of features could include orientation, colour, spatial frequency, and movement.
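The serial self-terminating account also makes a quantitative prediction that recurs later in this chapter: on target-present trials the target is found, on average, after inspecting (N+1)/2 items, whereas target-absent trials require an exhaustive check of all N items, so absent slopes should be roughly double present slopes. A sketch under these standard assumptions (the base RT and per-item inspection time are arbitrary illustrative values):

```python
def predicted_rt(set_size, target_present, base_ms=400, per_item_ms=40):
    """Expected RT under a serial self-terminating search model."""
    if target_present:
        inspections = (set_size + 1) / 2  # on average, half the display
    else:
        inspections = set_size            # exhaustive check
    return base_ms + per_item_ms * inspections

present_slope = (predicted_rt(32, True) - predicted_rt(4, True)) / (32 - 4)
absent_slope = (predicted_rt(32, False) - predicted_rt(4, False)) / (32 - 4)
```

The resulting 2:1 absent-to-present slope ratio is the diagnostic that investigators use to infer serial self-terminating search from empirical data.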
By the same token, the configuration of a face-pattern might qualify as a feature, despite being a complex image that could also be described in terms of the preceding list of features. Primate studies indicate the presence of neurons in the temporal lobe that respond selectively to faces, with response rates greater by a factor of 2 to 10 than those obtained for other stimuli {Rolls, 1984 #166;Perrett, 1982 #164;Perrett, 1984 #165}. These cells are also selective for the spatial configuration of features making up a face, giving weaker responses to scrambled photographs of faces. Also, their firing responses are unaffected by unnatural featural transformations such as colour, suggesting that responses from these face-selective neurons could be separable from established features and that the face-configuration could qualify as a feature in itself. Referring back to the face-detection effect, it shares the same temporal parameters as those at which face-specific neurons start to respond preferentially to images containing a normal face-configuration as opposed to those with a scrambled configuration {Rolls, 1994 #7}. There is also some support from the visual search literature for this proposition. When requested to detect non-oblique lines in an array comprising oblique distractors, non-oblique lines that formed a face-configuration were more readily detected than if they made up other complex configurations (see Fig. 1.4 for examples) {Gorea, 1990 #21}. Yet, participants were hardly aware of the presence of these configurations and, if asked to classify the configurations, performed no better for a face-like configuration compared to either symmetrical or asymmetrical configurations.

[Fig. 1.4: Examples of search displays utilised by Gorea & Julesz (1990) that require detection of non-oblique lines that make up: a) a face configuration; b) a symmetrical non-face configuration; c) an asymmetrical non-face configuration; d) a face configuration amongst highly similar distractors.]

Therefore, we can infer from these findings that the configuration of a face can work towards increasing the salience of its component features, without the conscious knowledge of the perceiver. These results could explain other face-superiority effects wherein sequential matching of a face feature is better if the feature is first presented within the context of a normal upright face {van Santen, 1978 #168;Homa, 1976 #167}. As in the Gorea and Julesz experiment, the critical stimuli that matter are not the configuration itself but the configuration's components. Nonetheless, having an upright face for a configuration, even a schematic one, greatly facilitates the processing of the critical components. Presumably, this is because the face-configuration is particularly salient and draws attention to itself and its components. Besides this study, there have been several studies that have applied the visual search paradigm directly to the search for faces. A logical application would be to investigate whether upright faces 'pop out' in a crowded scene of non-face distractors, producing a flat RT x set-size function. In one such experiment, the target stimulus was a simple line-drawn face amongst distractors identical to itself except with the internal features jumbled {Nothdurft, 1993 #136}. The task in each trial was to determine whether the target was present or absent in a crowded array varying in set-size, up to 48 items. Contrary to expectations, RTs for face-detection varied as a linear function of set-size. The slope relating RTs for target-present trials to array size was found to be, on average, 113 ms per item. This same finding was replicated with the use of inverted faces as distractor items.
Similar experiments were conducted by Kuehn and Jolicoeur {Kuehn, 1994 #135}, using schematic faces derived from Photofit kits that included basic skin and hair tones. Their experiments also differed from Nothdurft's by using much smaller arrays, to a maximum set-size of only twelve items. Their predictions followed the same premise: that search for an upright face amongst inverted distractor faces would be efficient and result in a shallow search slope of no more than 6 msec per item. Conversely, search for an inverted target amongst upright face-distractors was expected to occur in a serial fashion, with steeper search slopes. Unfortunately, this prediction was not borne out. The search slope for an upright face was by no means flat and, in fact, search times for target-present trials were not even consistently faster for upright faces than inverted faces. Furthermore, search slopes for target-absent trials were double those of target-present trials for upright faces, indicating that searches proceeded in a serial and self-terminating fashion. Arguably, Nothdurft's (1993) and Kuehn and Jolicoeur's (1994) failure to find a 'pop-out' effect could have resulted from their use of abstract schematic faces as target stimuli. However, this is highly unlikely when we consider how the face-detection effect was demonstrated even with the use of a smiley-face {Shelley-Tremblay, 1999 #121}, and also the parsimonious configural make-up of the face-pattern in Gorea and Julesz's (1990) featural search experiment. Nonetheless, the search for upright-face 'pop-out' in peripheral vision has since been attempted with the use of actual face-photographs instead of schematic faces {Brown, 1997 #132}. Instead of RT measures, face 'pop-out' was measured in terms of the probability that the first eye-saccade would be towards the upright face instead of the non-face distractors, i.e. inverted/scrambled faces, equidistant from the point of fixation. Measurements were obtained using an eye-tracker.
Thus, participants were presented on each trial with a circular array of an upright face-target and inverted-face distractors, each equidistant from a fixation point, and requested to make an eye-movement towards the upright face-target. Unfortunately, saccades towards the upright face were neither more probable than saccades towards a distractor item, nor even above chance. Measures of saccade latency were no different when a saccade towards an upright face was made compared to when it was not. Even changing the nature of the distractors to scrambled faces failed to make a difference. This inability to find a 'pop-out' effect for an upright face-configuration is disappointing and contradicts the proposition that the face-configuration can be processed pre-attentively and orients attention. Still, it is particularly puzzling why a feature contained within a face-configuration should be any easier to detect than one that is not {Gorea, 1990 #21}. Even more surprising is that these target features were contained within a visual depiction of a face-configuration that was far more parsimonious and abstract than those utilised in subsequent visual search studies that employed an actual face as the target stimulus (cf. {Nothdurft, 1993 #136;Brown, 1997 #132;Kuehn, 1994 #135}). Careful examination of these studies raises the possibility that task demands might be key in explaining the failure to find a 'pop-out' effect with upright faces. The main difference between Gorea and Julesz's (1990) study and subsequent studies {Nothdurft, 1993 #136;Brown, 1997 #132;Kuehn, 1994 #135} lies in the nature of the pre-specified target. The search task that did find a facilitative effect for the presence of a face-configuration required participants to search for a feature that, unbeknownst to them, made up an upright face-configuration.
Participants in this task were not consciously in search of an upright face, unlike those in the studies that actually failed to find a benefit for the upright face-configuration. In fact, asking participants in the same experimental paradigm to consciously identify the configurations yielded percentage accuracies much as one might expect from the 'failed' findings of the visual search studies; that is, participants were no better at identifying an upright face than an equally complex non-face stimulus. Whilst it might seem counter-intuitive to presume that searching for an upright face could actually block its detection privileges, it is not theoretically implausible. Firstly, the face-detection effect is repeatedly reported to occur without the subject's full knowledge of the detected item's identity {Purcell, 1988 #23;Purcell, 1986 #27}. Furthermore, de Gelder and Rouw's (2001) dual-route account of face-processing is one whereby the automatic stream of face-detection is subordinate to, and can be overridden by, the conscious processing stream of face-identification. The experiments in Chapter 2 have been designed to test whether faces can attract spatial attention and result in pop-out when participants are not consciously in search of an upright face.

Final Conclusions

In this chapter, we have presented the experimental phenomenon of the face-detection effect {Purcell, 1986 #27;Purcell, 1988 #24;Shelley-Tremblay, 1999 #121}. The face-detection effect refers to how human observers are likelier to detect the presence of an upright face-configuration in a visual array than a non-face pattern of equivalent complexity, such as inverted/scrambled faces. This effect reveals itself when the image is presented under limited viewing conditions, i.e. backward visual masking, such that face-identification is not reliably accurate. For this reason, face-detection is argued to be a process autonomous from face-identification, and one that logically precedes it.
Taking into account the face-detection effect, we are better equipped to understand why upright face-configurations are conferred visual privileges within the context of other experimental paradigms. By assuming that face-detection is an innate and perhaps sub-cortical mechanism {Valenza, 1996 #207}, we can understand why neonates prefer looking at face-like patterns despite their apparent lack of perceptual and social learning experience {Goren, 1975 #155;Kleiner, 1987 #156}. Also, by assuming that this early detection of faces entails a sensitivity to changes in the face-image, we can better explain why the face-configuration repeatedly defies various experimental attempts to induce change-blindness to it (e.g. {Ro, 2001 #73;Mack, 1998 #194}). Furthermore, neuropsychologists have been able to explain anomalies in their prosopagnosic data by devising a more comprehensive model of the face-processing system that considers the role of face-detection in the learnt acquisition of face-identification expertise {de Gelder, 2001 #33}. This model is an improvement on traditional accounts (e.g. Bruce and Young, 1986) for two main reasons. Firstly, the model is a developmental account that not only attempts to explain adult face-identification expertise, but also takes into account the qualitative differences between children's and adults' face-processing performance; that is, how adult face-processing relies more on the configural information present in a face than a child's does {Flin, 1980 #211;Carey, 1980 #210}. In addition, this recent model presents a unified and more detailed account of how a face-image is treated by early perceptual processes. Bruce and Young's (1986) model did not consider the visual processing of faces to be any different or special compared to that of general objects prior to its identification. The face-detection effect, however, contests this assumption.
In this more recent model by de Gelder and Rouw (2001), we are presented with a detailed exposition of what the process of 'structural encoding' might entail and how the reported face-detection effect might be involved. The biggest research contribution such a model offers is in its theoretical predictions. One of its predictions is that face-detection is based on exogenous visual attention. In other words, the upright face-image can act as a salient cue that automatically captures visual attention. This prediction has had minimal support in the visual search literature. While it is true that a face-configuration can facilitate searches for target stimuli within itself {Gorea, 1990 #21}, it has also been repeatedly shown that an upright face does not pop out from a cluttered visual array, as one would expect {Nothdurft, 1993 #136;Kuehn, 1994 #135;Brown, 1997 #132}. Still, it must be noted that these visual search tasks for upright faces placed an implicit demand for face-identification on their participants. Hence, their participants were actively searching for upright faces and were likely to have utilised an endogenous strategy in the task. All the experiments presented in this dissertation have been specifically designed to avoid this confound. Chapter 2 will present a series of visual search experiments whereby the presence of an upright face is entirely inconsequential to the task. The premise is that if upright faces do capture visual spatial attention, their presence will severely impair performance on the primary visual search task. In addition, the elements comprising the visual search array have been chosen so as to remove any possibility that strategies pertaining to the face-image might be endogenously applied. Chapter 3 describes a dot-probe experiment that assesses how an upright face-image directly compares to a non-face image, i.e. an inverted face, in terms of cue validity.
Again, the presence of the face-image is completely irrelevant to successful completion of the task. Nonetheless, the upright face-configuration is expected to be a better spatial cue than an inverted face if automatic face-detection is capable of priming observers to the spatial location of the upright face. To summarise, this dissertation sets out to investigate the primary claim that face-detection results in the early spatial monitoring of an upright face-configuration. This consequence is supposed to be involuntary and automatic. Furthermore, it is subsumed into the broader processes of face-identification. We propose that this claim has implications for how visual searches might be conducted in the absence of face-identification search strategies. Hence, the presence of an upright face can have an impact on visual searches that have to rely entirely on exogenous information.
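The cue-validity logic of the Chapter 3 dot-probe task reduces to a simple contrast: if the upright face captures attention, probes appearing at the face's location (valid trials) should be responded to faster than probes appearing at the other location (invalid trials). A sketch of the validity-effect computation, with RT values fabricated purely for illustration:

```python
def validity_effect(valid_rts, invalid_rts):
    """Mean invalid-trial RT minus mean valid-trial RT, in ms.
    A positive value indicates that the cue (here, the upright face)
    drew spatial attention to its own location."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(invalid_rts) - mean(valid_rts)

# Fabricated RTs: probe at the upright face's location vs at the
# inverted face's location
effect = validity_effect([310, 322, 305, 318], [348, 340, 355, 337])
```

Under the face-detection hypothesis, a reliably positive effect for upright relative to inverted face cues would be the predicted signature of automatic spatial orienting.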