The psychology of face construction: giving evolution a helping hand
Charlie D. Frowd (1*)
Melanie Pitchford (2)
Vicki Bruce (3)
Sam Jackson (1)
Gemma Hepton (1)
Maria Greenall (1)
Alex H. McIntyre (4)
Peter J.B. Hancock (4)
(1) School of Psychology
University of Central Lancashire, PR1 2HE
* Corresponding author: Charlie Frowd, Department of Psychology, University of
Central Lancashire, Preston PR1 2HE, UK. Email: Phone:
(01772) 893439.
(2) Department of Psychology
University of Lancaster, LA1 4YF
(3) School of Psychology
Newcastle University, NE1 7RU
(4) Department of Psychology
University of Stirling, FK9 4LA
Running head: Evolving human faces
Applied Cognitive Psychology
Face construction by selecting individual facial features rarely produces recognisable
images. We have been developing a system called EvoFIT that works by the repeated
selection and breeding of complete faces. Here, we explored two techniques. The first
blurred the external parts of the face, to help users focus on the important central
facial region. The second manipulated an evolved face using psychologically useful
‘holistic’ scales: age, masculinity, honesty, etc. Using face construction procedures
that mirrored policework, a large benefit emerged for the holistic scales; the benefit of
blurring accumulated over the construction process. Performance was best using both
techniques: EvoFITs were correctly named 24.5% of the time on average, compared with 4.2% for
faces constructed using a typical ‘feature’ system. It is now possible, therefore, to
evolve a fairly recognisable composite from a 2 day memory of a face, the norm for
real witnesses. A plausible model to account for the findings is introduced.
The research was gratefully supported by grants from the Engineering and Physical
Sciences Research Council (EP/C522893/1) and Crime Solutions at the University of
Central Lancashire, Preston, UK.
Keywords: facial composite, EvoFIT, PRO-fit, recall, recognition.
Witnesses to and victims of crime are confronted with a series of daunting tasks to
bring a criminal to justice. They normally are required to describe the crime and those
involved. If the police have a suspect, witnesses (and victims) may be asked to try to
identify the offender from an identity parade. If not, and in the absence of other
evidence such as DNA or CCTV footage, they may be asked to build a likeness of the
offender’s face. This is a visual representation known as a facial composite and is
traditionally produced by witnesses describing the offender’s face and then selecting
individual facial features from a kit of parts: hair, face shape, eyes, nose, mouth, etc.
There are several methods that the police use to construct composites in this
way. These include sketch artists, people skilled in portraiture who use pencils and
crayons to draw the face by hand, and computer software systems such as E-FIT and
PRO-fit in the UK, and FACES and Identikit 2000 in the US. In all these methods,
witnesses select from a collection of facial features to build the face.
The techniques have been the subject of considerable research. The general
conclusion is that they are capable of producing good likenesses of a face in certain
circumstances when a person works directly from a photograph (Davies et al., 2000;
Frowd et al., 2007b). When relying on a fairly recent memory of a face, composite
quality is worse and the ability to name such images is about 20% correct (Brace et
al., 2000; Bruce et al., 2002; Davies et al., 2000; Frowd et al., 2004, 2005b, 2007b).
Image quality is considerably worse, however, when the memory is several days old,
which is the normal situation for real witnesses; composite naming levels are typically
only a few percent correct (Frowd et al., 2005a, 2005c, 2007b, 2007c).
The reason for this poor performance has been known for over 30 years
(Davies, Shepherd & Ellis, 1978). Face recognition essentially emerges from the
parallel processing of individual facial features and their spatial relations on the face
(see Bruce & Young, 1986, for a review). In contrast, face production is traditionally
based more on the recall of information: the description and selection of individual
features. While we are excellent at recognising a familiar face, and quite good at
recognising an unfamiliar one, we are generally poor at describing individual features
and selecting facial parts (for a recent review, see Frowd et al., 2008a).
Several research laboratories have been working on alternative software
systems. These include EFIT-V (Gibson et al., 2003) and EvoFIT (Frowd et al., 2004)
in the UK, and ID in South Africa (Tredoux et al., 2006). The basic operation of
these ‘recognition-based’ systems is similar. They present users with a range of
complete faces to select. The selected faces are then ‘bred’ together, to combine
characteristics, and produce more faces for selection. When repeated a few times, the
systems converge on a specific identity and a composite is ‘evolved’ using a
procedure that is fairly easy to do: the selection of complete faces.
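The select-and-breed cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used by EvoFIT or the other systems: each face is represented here as a hypothetical list of model coefficients, and uniform crossover with occasional Gaussian mutation is an assumed choice of breeding operators. The function names, the mutation parameters and the population sizes are all invented for the example.

```python
import random


def breed(parent_a, parent_b, rng, mutation_rate=0.1, mutation_sd=0.5):
    """Combine two faces by uniform crossover, then mutate (assumed operators)."""
    child = []
    for a, b in zip(parent_a, parent_b):
        # Each coefficient is inherited from one parent chosen at random.
        gene = a if rng.random() < 0.5 else b
        # Occasionally perturb the coefficient to keep variation in the population.
        if rng.random() < mutation_rate:
            gene += rng.gauss(0.0, mutation_sd)
        child.append(gene)
    return child


def next_generation(selected, size, rng):
    """Produce `size` new faces by breeding random pairs of the selected faces."""
    faces = []
    for _ in range(size):
        a, b = rng.sample(selected, 2)
        faces.append(breed(a, b, rng))
    return faces


rng = random.Random(0)
# Hypothetical data: 6 user-selected faces, each described by 10 coefficients.
selected = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(6)]
# One new screen of faces for the next round of selection (size is illustrative).
screen = next_generation(selected, size=18, rng=rng)
```

Repeating `next_generation` on each new set of user selections is what would allow such a loop to converge on a region of the search space.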
One problem with the evolutionary systems is the complexity of the search
space. They contain a set of face models, each capable of generating plausible but
different looking faces. The models, which are described in detail in Frowd et al.
(2004), capture two aspects of human faces: shape information, the outline of features
and head shape, and pixel intensity or texture, the greyscale colouring of the
individual features and overall skin tone. The number of faces that can be generated
from these models is huge, as is the search space. The goal then is to converge on an
appropriate region of space before a user is fatigued by being presented with too many faces.
The current authors have been working on one of these ‘recognition’ based
systems called EvoFIT that presents users with screens of 18 such faces. Users select
from screens of face shape, facial textures, and then combinations thereof before the
selected faces are bred together using a Genetic Algorithm, to produce more faces for
selection. This process is normally repeated twice more to allow a composite to be
‘evolved’. This version of EvoFIT was compared with a standard ‘feature’ system,
PRO-fit, in the laboratory under simulated real life procedures (Frowd et al., 2007b).
People were recruited to act as ‘witnesses’ and shown an unfamiliar face, in this case
a UK footballer. Two days later, they described the face and used either EvoFIT or
PRO-fit. The resulting composites were given to football fans to name. Composites
from EvoFIT were correctly named at 11%, those from PRO-fit at 5%.
Since then, two main improvements to EvoFIT have been proposed and
evaluated in separate studies. The first was based on research that has found
fundamental differences in our perception of familiar and unfamiliar faces (Ellis et al.,
1979; Young et al., 1985). This research compared the internal facial features – the
region including eyes, brows, nose and mouth – with the external facial features –
hair, face shape, ears and neck. For a familiar face, the internal facial features were
more important than the external; when unfamiliar, the external part played a more
equal role. The consequence is less accurate face recognition for unfamiliar faces –
for a review, see Hancock et al. (2000). The implication of this research is that
witnesses will tend to focus too much on the external features during the construction
of an offender, which is a problem since the internal features are more important
when the composite is shown later to other people for recognition (Frowd et al.,
2007a). Therefore, different facial regions have different salience depending on
whether the task is to construct or to recognise a composite.
Our solution was to reduce the perceptual impact of external features, to allow
people building the face to make choices more on the internal region (Frowd et al.,
2008b). This should allow that part of the face to be more accurately constructed, and
produce an overall more recognisable image. A Gaussian or low-pass filter was
applied to the faces’ external features at 8 cycles per face width, a level that renders
recognition difficult when applied to the whole image (Thomas & Jordan, 2002).
While it is possible to mask the external features completely, this would remove the
context for selecting the inner face – the importance of which is established for
unfamiliar face recognition (e.g. Davies & Milne, 1982; Malpass, 1986; Memon &
Bruce, 1983, 1985; Tanaka & Farah, 1993) – and probably hinder recognition. In
practice, the blurring was applied at the start of face construction with EvoFIT,
following the initial selection of hair, and disabled at the end, prior to saving the
composite to disk. We note that distorting selective regions of a face like this has been
carried out previously (Campbell et al., 1999).
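As a rough illustration of selective blurring, the sketch below low-pass filters a greyscale image and then restores the original pixels inside an internal-features mask, so that only the external region appears blurred. It is a toy under explicit assumptions: the image is a small list-of-lists grid, the rectangular mask and the value of sigma are hypothetical, and no attempt is made to map 8 cycles per face width onto a particular filter width. It is not the filter used in EvoFIT.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def blur_externals(image, internal_mask, sigma):
    """Keep original pixels where the mask is True; use blurred pixels elsewhere."""
    blurred = gaussian_blur(image, sigma)
    return [
        [orig if keep else soft for orig, soft, keep in zip(irow, brow, mrow)]
        for irow, brow, mrow in zip(image, blurred, internal_mask)
    ]


# Hypothetical 8x8 checkerboard "face" with a central 4x4 internal region.
image = [[255.0 if (x + y) % 2 else 0.0 for x in range(8)] for y in range(8)]
mask = [[2 <= x < 6 and 2 <= y < 6 for x in range(8)] for y in range(8)]
result = blur_externals(image, mask, sigma=1.0)
```

Blurring the whole image first and compositing with a mask keeps the internal region pixel-identical, which matches the idea that only the external context is degraded rather than removed.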
Using a design similar to Frowd et al. (2007b), blurring was evaluated:
participants looked at a photograph of a target face and 3 to 4 hours later constructed a
composite with EvoFIT with or without external features blurring. Frowd et al.
(2008b) found that composites evolved using blurred externals were correctly named
significantly more often than those that were not, with a small-to-medium effect size.
Sometimes a user may think their composite could be improved, for example
by changing its apparent age. The second development was to design a set of
‘holistic’ tools to alter a face, for example to make it look older or plumper, without
changing apparent identity. Rather than leaving this entirely to the users, however,
they were shown the effects of each facial transform in turn on their evolved face.
This addresses a theoretical and practical problem: a user might know that something
was inaccurate with the face, but could not verbalise what, making it hard to improve
the likeness (Frowd et al., 2005a). This issue is one of face recall, and particularly
affects ‘feature’ systems. In practice, users adjust each scale in turn searching for a
better likeness, with the best face being carried forward. As with EvoFIT as a whole,
this allows users to engage in face recognition rather than face recall to improve the likeness.
The holistic manipulations chosen were those thought to be most useful: age,
face weight, attractiveness, extroversion, health, honesty, masculinity and threatening.
The development of the scales is described fully in Frowd et al. (2006). The evaluation
in that work did not involve participant-witnesses constructing faces
from memory, but it did confirm that the scales appeared to operate appropriately
and that a more identifiable face could be produced from their use.
The current work tested these two developments under more realistic
construction procedures. The performance of EvoFIT was compared with and without
external features blurring and holistic tools, and against the PRO-fit feature system.
The expectation was that both techniques would be effective on their own, but would
be best when used in conjunction with each other; also, that EvoFIT would be
superior to PRO-fit. If the developments were effective, they could be used with real
witnesses when evolving an EvoFIT and/or as part of other composite systems.
The experiment required two stages. In the first, participants constructed a single
composite of an unfamiliar target face using EvoFIT, with or without external features
blur, or with a standard ‘feature’ system. In the second, different participants
evaluated the quality of the composites by attempting to name them.
Stage 1: Composite Construction
Participants were randomly assigned to one of three conditions to construct a single
composite. In the first two conditions, they used EvoFIT either with or without
external features blurring; all used holistic tools at the end of evolving the face.
Participants in the third group constructed a face using the PRO-fit ‘feature’ system.
While it would have been preferable to implement a balanced experimental design,
with blurring and holistic tools used throughout, this was not possible due to the absence
of such functionality in a feature system. For the EvoFITs constructed, the design was
within-subjects for holistic tools and between-subjects for blur. Composite system
was between-subjects (EvoFIT with blur / EvoFIT without blur / PRO-fit).
Frowd et al. (2005a) developed a ‘gold standard’ for evaluating facial
composite systems. This standard involved procedures that mirror construction by
eyewitnesses as far as possible in the laboratory; because of its forensic value, it was
adopted here. One of the fundamental design elements is that the faces used as targets
should be unfamiliar to those who construct the composites, but familiar to other
people who would later evaluate them. Here, photographs of international snooker
players were chosen as targets; this would enable people who did not follow the game
to construct the composites, and snooker fans to evaluate them.
A second important consideration is the time interval between a person seeing
a target face and constructing a composite. As mentioned above, when this delay is
short, up to a few hours in duration, all systems appear to perform fairly well (e.g.
Brace et al., 2000; Bruce et al., 2002; Frowd et al., 2005b). When the delay is much
longer, for example 2 days, performance is much worse, especially for the ‘feature’
systems. In the current design, 2 days was chosen to mirror police work.
Thirdly, participants should be interviewed using a Cognitive Interview (CI) to
elicit the most accurate recall of the target face. The CI is particularly important for
the traditional systems, since the description of the face produced is used to locate a
subset of facial features within the system itself, for presentation to the witness (or the
participant here). The CI is normally administered by the police interviewer, or in this
case the Experimenter, immediately before starting to construct the face. A good
review of the CI may be found in Wells et al. (2007). Here, to mirror police work, we
follow the ‘gold’ standard procedure specified in Frowd et al. (2005a).
Fourthly, composite systems should be used as specified by manufacturers.
For PRO-fit, this includes the opportunity for enhancement using an artwork package,
to add shading, lines, wrinkles, etc., and a software tool called PRO-warp to
manipulate feature shapes. These enhancement tools are used in police work and were
followed here. The same opportunities were made available to participants who used
EvoFIT, achieved by transferring the final EvoFIT image into PRO-fit for rework. To
facilitate this design issue, the software was controlled by a suitably experienced
experimenter (the second author on this paper). This person was trained ‘in house’
and had previously made in excess of 30 composites with participant-witnesses. We
note that her role was to guide participant-witnesses in the face construction process
and to control the software under their direction; she was unaware of the identity of
the composite being constructed.
On a final note, EvoFIT’s holistic tools have been expanded since Frowd et al.
(2006), to allow adjustment of greyscale colourings of individual features, thereby
potentially reducing the amount of artwork required. It is possible now to lighten and
darken the irises, eyebrows, mouth and laughter lines; also, to add a beard, moustache,
overall stubble and deep-set eyes. While the new scales are more ‘featural’ in nature,
for simplicity, we refer to the complete set under the umbrella term ‘holistic tools’.
Twelve front face photographs of international snooker players, ranked in the top 40
players in the world during the 2007-8 season, were located on the Internet. These
individuals were white male and photographed clean shaven (or with minor stubble)
and without jewellery or spectacles. Included were Mark Allen, Ken Doherty, Peter
Ebdon, John Higgins, Stephen Maguire, Alan McManus, Shaun Murphy, Ronnie
O’Sullivan, Joe Perry, Neil Robertson, Mark Selby and Mark Williams. Three sets of
12 photographs were printed at approximately 8 cm (wide) x 10 cm (high) on A4
paper in colour using a good quality printer. Each set was given a different random
order and placed in separate envelopes for presentation to participant-witnesses.
In a G*Power analysis, this number of targets is sufficient in 2x2 repeated-measures by-item analyses (of the type planned to be carried out) to detect practically useful, medium effect sizes, f = 0.28. This was based on the following parameter
settings: power, 1 – β = .8; α = .05; and correlation among measures, r = 0.7.
Verbal description sheets were used to record participants’ recall of targets.
These contained prompts for each feature – overall observations, hair, face shape,
brows, eyes, nose, mouth and ears – with space underneath for making written notes.
Software versions were 1.3 for EvoFIT and 3.4 for PRO-fit.
Participants were 31 female and 5 male staff and students at the University of Central
Lancashire. Their age ranged from 19 to 61 (M = 27.6, SD = 10.4) years. Recruitment
was via a global email requesting non-snooker fans to construct a composite for a £10
reward. It was specified that no one should have constructed a composite previously.
Participant-witnesses made two visits to the laboratory. In the first, the Experimenter
explained that they would be shown a picture of a snooker player to be used to
construct a composite two days later, and that she herself must not see the targets, to mirror
real life. Participants were randomly assigned to construct a composite using EvoFIT
with blur, EvoFIT without blur or PRO-fit, and were given an envelope containing the
relevant set of target photographs. While the Experimenter kept her back turned, each
participant was instructed to remove a photograph and say if the face was familiar. All targets were
reported to be unfamiliar (had any been familiar, the participant would have been asked to select another). Participants
were then given 60 seconds to inspect the face. Participants wrote their initials on the
back of the photograph along with the date, for future identification, and placed it in a
second (‘used targets’) envelope (i.e. non-replacement sampling was used).
Each participant returned to the laboratory after 46 to 50 hours. An initial
briefing was given explaining that two stages would follow. Firstly, they would be
asked to describe the appearance of their target face. To do this, a Cognitive Interview
would be used, a set of techniques to help them recall as much accurate information as
possible about the face. Secondly, a composite would be constructed. For those
assigned to EvoFIT, they would repeatedly select from arrays of alternative faces and
a composite would be ‘evolved’; for PRO-fit, this would involve selecting facial
features to build up a likeness of the face. Participants were encouraged to ask
questions throughout. When ready, the session moved on to the Cognitive Interview.
Participants were told that, in a few moments' time, they would be asked to
think back to when the target had been seen and to visualise the face. They would
then describe the face in as much detail as possible. While doing this, the
Experimenter would not interrupt, but would take notes. This procedure was carried
out to recover a description of the face. Next, the Experimenter repeated the
description given for each facial feature and prompted for more information: e.g.,
“You mentioned that the brows were brown, can you recall anything further?” The
Experimenter explained that a composite would be constructed next.
For those assigned to PRO-fit, it was explained that the facial description
would be entered into PRO-fit, to enable appropriate features to be located. They
would then be able to select the best examples, and resize and reposition each on the
face to create the best likeness. Also, as the features were cut from photographs, an
exact likeness was unlikely, but an artwork paint package was available to add
shading, wrinkles, etc. Such changes were normally carried out towards the end of the
session, along with the PRO-warp tool to manipulate the shape of any feature.
The Experimenter entered the description into PRO-fit, to locate about 20
examples per facial feature, and presented an ‘initial’ composite, a face with features
to match the description. Using this face, the Experimenter demonstrated how features
were selected, resized and positioned on the face, along with the effect of changing
the feature brightness and contrast levels. It was explained that witnesses could work
on any facial feature at a time, though most people elected to start with the hair and
face shape. The Experimenter thus assisted participant-witnesses to construct a
composite using this procedure, including the use of the artwork package and PRO-warp if required. When complete, the final image was saved to disk as the composite.
Participants assigned to construct an EvoFIT (in either of the blur conditions)
were told that the correct age database for their target would be selected and then an
appropriate hairstyle. Afterwards, they would be shown four screens of facial shape,
and select two per screen up to a maximum of six, and four screens of facial texture,
or colourings, and similarly select six. They would next select the best combination of
shape and texture, and all selected faces would be bred together to combine facial
characteristics. This process would be repeated once or twice more to allow a
composite to be evolved. Afterwards, the face would be enhanced using ‘holistic
tools’, to change the age, weight, and other aspects of the face to improve the overall
likeness. Finally, an artwork package would be made available to add shading and
wrinkles, and a further software tool called PRO-warp to change the shape of features.
A composite was evolved using this procedure. This was the same for all
participants, except for those assigned to the blurred condition. These participants
were informed that faces would be seen with the hair, ears and neck blurred, to allow
them to focus on the central part of the face that is important for later recognition of
the composite; also, after evolving, the blurring would be removed prior to holistic
tool use. At the end of the second and third generation, all witnesses were given the
opportunity to resize and reposition features on their ‘best’ face using EvoFIT’s Shape
Tool. Composites were saved at three stages during construction: at the end of
evolving, after holistic tool use, and after any artwork and PRO-warp changes. For
consistency, artwork and warp changes were carried out in PRO-fit.
Composites took about 1 hour to construct using PRO-fit and 1.5 hours using
EvoFIT. Example constructions are presented in Fig. 1.
Fig. 1 about here
Stage 2: Composite Evaluation
Evaluation of the composites was planned in two parts, to coincide with two major
snooker championships in the UK. Evaluation involved asking attendees of these
tournaments to name a set of composites. A fairly arbitrary decision was made to
evaluate intermediate images first, then final composites. Thus, the first part
considered the effectiveness of blurring and holistic tools, and used intermediate (non-final) images created by the blur and non-blur groups before and after holistic tool
use. The second part used finished images, those resulting from artwork and feature
warp. This part is important as it assesses the effectiveness of finished images – facial
composites – as would be produced in police work. It also considered the
effectiveness of artwork/warping, the effect of which was unknown for EvoFIT.
The first part contained 4 sets of 12 images: with and without blurring, and
before and after holistic tools. They were divided into four booklets of 12 composites,
with each containing one composite of each snooker player, with three examples
taken from each condition and rotated around booklets. As participants inspected
images from each condition, the design was within-subjects for blur and holistic tools.
The design for the second part was also within-subjects and the 36 final composites (3
sets of 12) were similarly rotated around three booklets.
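The rotation of conditions around booklets for the intermediate images can be illustrated with a simple cyclic (Latin-square style) assignment. This is a sketch of one scheme that satisfies the constraints stated above, namely one composite per player in each booklet and three composites per condition; the exact rotation used in the study is not specified, so the offset rule, the function name and the condition labels are assumptions.

```python
from collections import Counter


def rotate_booklets(n_targets, conditions):
    """Assign each target a condition per booklet by cyclic rotation (assumed scheme)."""
    n = len(conditions)
    return [
        [(target, conditions[(target + booklet) % n]) for target in range(n_targets)]
        for booklet in range(n)
    ]


# Hypothetical labels for the four intermediate-image conditions.
conditions = [
    "no blur, before holistic tools",
    "no blur, after holistic tools",
    "blur, before holistic tools",
    "blur, after holistic tools",
]
booklets = rotate_booklets(12, conditions)

# Each booklet holds 12 composites, one per target, with 3 from each condition.
counts_per_booklet = [Counter(cond for _, cond in booklet) for booklet in booklets]
```

Under this scheme every target-by-condition composite appears in exactly one booklet, so no observer sees the same player twice.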
In both tasks, it was important that participants were very familiar with the
snooker targets; if not, then poor composite naming would result. To control for this,
participants were asked to name the target photographs after naming the composites.
An a priori rule was applied such that data were only included from participants who
correctly named 10 or more targets. As a result, about twice as many people were
recruited as are recorded in the Participants section below.
EvoFIT and PRO-fit images were printed at approximately 8 cm (wide) by 10 cm
(high) on A4 paper in greyscale using a good quality printer. A set of target
photographs was also printed (the same as for the participant-witnesses).
Participants were volunteer snooker fans attending the Snooker World Championship
at the Crucible Theatre in Sheffield and the Snooker Grand Prix at the SECC in
Glasgow. The evaluation using intermediate images involved 4 females and 28 males,
from 17 to 67 (M = 38.8, SD = 14.2) years. For the finished composites, there were 1
female and 23 males, from 25 to 55 (M = 40.1, SD = 9.7) years.
Participants were tested individually and told that facial composites would be seen of
snooker players ranked in the top 40 during the 2007-8 season. Their task was to name as
many as possible. For those in the first part of the evaluation, they were randomly
assigned to one of four booklets, with equal sampling, each of which contained 12
images from EvoFIT (see Design for details). Composites were presented sequentially
and participants provided a name where possible (but they were not coerced to do so).
Afterwards, the target photographs were similarly presented and participants
attempted to name those. The task was self paced and the order of presentation of
composites was randomised for each person. Participants in the second part followed
the same procedure, except that only three sets of booklets were used which contained
the final EvoFITs from the blur and non-blur conditions, and the PRO-fits.
Participant responses for composites and target photographs were scored correct or
incorrect with reference to the relevant identity. ‘Conditional’ naming scores were
used to analyse composite quality, as in Frowd et al. (2005a). These by-item scores
were calculated by dividing the number of times a composite was correctly named by
the number of times the relevant target photograph was correctly named.
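The conditional naming score defined above is a simple ratio, sketched below for one composite. The data layout (one pair of booleans per observer), the function name and the example counts are hypothetical and are used only to make the calculation concrete.

```python
def conditional_naming_score(responses):
    """Correct composite namings divided by correct target-photograph namings.

    `responses` holds one (composite_correct, target_correct) pair per observer.
    """
    composite_hits = sum(1 for composite_ok, _ in responses if composite_ok)
    target_hits = sum(1 for _, target_ok in responses if target_ok)
    if target_hits == 0:
        raise ValueError("target photograph was never named; score is undefined")
    return composite_hits / target_hits


# Hypothetical data: 8 observers all named the target photograph,
# and 2 of them also named the composite.
responses = [(True, True), (True, True)] + [(False, True)] * 6
score = conditional_naming_score(responses)  # 2 / 8 = 0.25
```

Dividing by target namings rather than by the number of observers means a composite is not penalised when an observer did not know the player in the first place.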
Three main analyses are presented below. Firstly, the effect of blur and
holistic tools (intermediate EvoFIT images). Secondly, the effect of blurring at the
final construction stage as well as ‘finishing’ techniques, warp and artwork. This used
intermediate images after holistic tools and final EvoFIT composites. Note that while
a combined analysis would be preferable for all intermediate and finished EvoFITs,
the design at face construction was not fully crossed (to do so would have necessitated
artwork/warping applied directly to images after evolving). Thirdly, finished EvoFITs
constructed under the optimal blur condition and finished PRO-fits. Parametric
statistics were used throughout, an established method for analysing composite data
(e.g. Brace et al., 2006; Davies et al., 2000; Frowd et al., 2005b). By-items analyses
are presented as we are more interested in results that generalise across composites
than across observers (by-subjects analyses do show the same pattern of results).
Due to the a priori screening, naming of the target photographs was
appropriately very high, specifically at over 90% in all cells of the design for both
intermediate images and finished composites.
Blur and holistic tools (intermediate EvoFITs). Item means for naming of the intermediate
images are presented to the left and centre of Fig. 2. Correct naming was slightly higher
with blurring (M = 9.5%, SD = 14.3%) than without (M = 6.3%, SD = 8.4%); holistic
tools substantially increased naming both without blur (M = 11.9%, SD = 15.5%) and
with blur (M = 20.1%, SD = 21.5%). Naming was thus maximal when both techniques
were used. The conditional naming scores were analysed using a two-way repeated-measures
Analysis of Variance (ANOVA). This was significant for holistic tools,
F(1, 11) = 9.1, p = .012, ηp2 = 0.45, but not blur, F(1, 11) = 1.6, p = .24, ηp2 = 0.13, or
the interaction, F(1, 11) = 0.9, p = .37, ηp2 = 0.07.
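The effect sizes reported with these F ratios can be cross-checked using the standard identity that recovers partial eta squared from F and its degrees of freedom, ηp2 = (F x df_effect) / (F x df_effect + df_error). The short sketch below applies it to two of the rounded values above; the function name is illustrative, and small discrepancies are to be expected because the published F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Holistic tools: F(1, 11) = 9.1
eta_holistic = round(partial_eta_squared(9.1, 1, 11), 2)  # 0.45
# Blur: F(1, 11) = 1.6
eta_blur = round(partial_eta_squared(1.6, 1, 11), 2)      # 0.13
```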
An analysis was conducted on the incorrect naming data, to provide an
indication of guessing (bias). In this case, these data differed little by condition and
were not reliable, Fs < 2.2, p > .17.
Fig. 2 about here
Blurring and finished EvoFITs. As illustrated in Fig. 2 (right), finishing (artwork and
warp) made little difference overall to correct naming scores (Before: M = 16.0%, SD
= 14.7%; After: M = 14.3%, SD = 21.1%). However, naming was much higher with
blurring (M = 22.3%, SD = 20.3%) than without (M = 8.0%, SD = 9.2%). The
ANOVA was significant for blur, F(1, 11) = 5.4, p = .041, ηp2 = 0.33, not significant
for finishing, F(1, 11) = 0.2, p = .68, ηp2 = 0.02, but marginally significant for the
interaction, F(1, 11) = 4.1, p = .068, ηp2 = 0.27. Simple main effects revealed that
blurring was only effective for finished composites, p = .014; other contrasts were ns.
The above analyses suggest that blurring has a benefit on correct naming at the
end of the process. As Fig. 2 illustrates, however, the increase in naming with blur is
small after evolution (M = 3.3%, SD = 11.0%), larger after holistic tools (M = 8.2%,
SD = 23.3%) and largest after warp/artwork (M = 20.3%, SD = 24.1%). Simple
contrasts of an ANOVA of these difference data confirm the reliability of a linear
trend, F(1, 11) = 8.6, p = .014, indicating the increasing value of this technique as the
construction process unfolds.
Incorrect naming scores of EvoFITs before finishing (M = 44.8%, SD =
12.7%) were somewhat higher than those after (M = 32.8%, SD = 17.3%); and
similarly, somewhat higher without blur (M = 44.3%, SD = 12.9%) than with (M =
33.3%, SD = 19.5%). The ANOVA was significant for finishing, F(1, 11) = 6.3, p =
.029, ηp2 = 0.26, approached significance for blur, F(1, 11) = 3.3, p = .099, ηp2 = 0.23,
and was not significant for the interaction, F(1, 11) = 0.61, p = .45, ηp2 = 0.05.
Finished composites. This part compares conditional naming scores of PRO-fits (M =
4.2%, SD = 8.1%) with finished EvoFITs in the optimal condition, blur and holistic
tools (M = 24.5%, SD = 22.8%). These data clearly favour the EvoFITs, as confirmed
by a paired-samples t-test, t(11) = 2.6, p = .024, d = 0.75. The incorrect naming data
were not significant by system, t(11) = 0.58, p = .58.
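For the paired-samples comparison of systems, the reported effect size is consistent with one common convention for paired designs, d = t / sqrt(n), where n is the number of paired items (here 12 targets, so df = 11). That this is the convention the authors used is an assumption; the sketch below simply shows that it reproduces the reported value.

```python
import math


def cohens_d_from_paired_t(t_value, n_pairs):
    """Cohen's d for a paired design, computed as t / sqrt(n) (assumed convention)."""
    return t_value / math.sqrt(n_pairs)


# EvoFIT (blur plus holistic tools) versus PRO-fit: t(11) = 2.6, so n = 12 items.
d_system = round(cohens_d_from_paired_t(2.6, 12), 2)  # 0.75
```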
Table 1 about here
Scale use. Referring to Table 1, 15 out of the 17 holistic scales were used at some
stage by participants using EvoFIT; only the moustache and beard scales were not,
reflecting the absence of these features in the targets. Overall, each scale changed the
face to some extent on 52.2% of occasions, features scales on 25.2%; there was little
change in scale use for composites constructed with and without blur (M = 40.7% vs.
37.5% respectively). Greatest use was for weight, masculinity and brow-colouring.
Composite construction by selecting individual features results in faces that are rarely
recognised. Recent alternatives are based on the more natural process of selecting and
breeding complete faces. For EvoFIT, this basic method did not reliably converge on
a specific identity (Frowd et al., 2004, 2005a, 2007b), but two recent developments
emerging from the psychology of face perception offer promise (Frowd et al., 2006,
2008b). While one development blurred the external parts of faces presented to users,
the other built a set of image manipulation scales to rework an evolved face.
Here, we explored the effectiveness of these developments using a design that
reflected real-life construction procedures in the laboratory. Participant-witnesses
looked at a photograph of an unknown snooker player and 2 days later constructed a
single composite using either EvoFIT, with or without blurring, and with holistic
tools, or PRO-fit. The resulting composites were named by snooker fans. The correct
naming data revealed a clear benefit for holistic tools and an increasingly important
one for blurring during the construction process; final artwork/warping changes also
reduced the number of incorrect names elicited from the EvoFITs.
In Frowd et al. (2008b), participants inspected a target face and evolved a
composite 3 to 4 hours later; images were correctly named better when a Gaussian
blur filter had been applied to the external features during construction than when it
had not. The current work used the same number of composites per condition, but the
delay was longer. In the evaluation using intermediate images, blurring increased
naming scores during evolution, but not significantly. A failure to replicate this effect
is likely to be an issue of experimental power; for the small effect size found here, the
power was not sufficient to detect a reliable benefit for blurring (1 – β = 0.21). Although
not significant, the effect is of broadly similar magnitude to, and in the same direction
as, that reported before. Holistic tools, though, produced a significant increase in correct
naming scores, and with a large effect size. The data thus provide evidence for the
benefit of holistic scales to face production; they lend some support to the previous
studies (Frowd et al., 2006, 2008b).
The second part of the evaluation found that artwork and feature warping
techniques did not reliably increase correct naming scores. They did, though, decrease
the number of incorrect names produced. These additional changes tended to involve
the warp tool (rather than the artwork package), but were quite subtle, including
shortening of the brows, enlarging the nostrils and increasing the down-turn of a nose;
see Fig. 1 for example hairline changes. The consequence is an overall more accurate
image, one that is not more recognisable as the target (due to the non-significant
difference by correct naming) but has an appearance less like other potential targets,
in this case, other snooker players. Incorrect names tend to waste police time –
investigating an innocent suspect – and a reduction thereof is of practical value.
The analysis also revealed that external features blurring had an increasingly
important role in correct identification of composites over the three stages of
construction: evolving, holistic tool use and finishing; in the last stage, this benefit
was statistically reliable, and there was also a trend towards fewer incorrect
names (the marginally significant effect of blur). This is a curious finding as blurring
was turned off after evolving the face: participant-witnesses assigned to the blur
condition saw intact (non-blur) faces thereafter. Their composites continued to
improve in quality relative to the non-blur group, but why should this be?
There appear to be two explanations, which are not mutually exclusive. The first is
based on process: having focused on the internal features during evolution, witnesses
continued to do so. This is simply a carry-over effect, of the type seen in repeated-measures
designs, of which face construction is an example. The other is related to
memory: blurring helped to recover a mental image of the internal features, which
was then of value for the ensuing tasks. The data available to date support the
former explanation; they come from research in which participants built pairs
of composites (rather than just one). In the initial work for Frowd et al. (2008b), an
EvoFIT constructed after another EvoFIT was inferior when the initial face had been made
without blur rather than with it. In more recent work, currently in preparation for journal
submission, PRO-fits constructed after EvoFITs were superior, but naming levels
from a person’s composites did not significantly correlate with each other, also
indicating a process effect. Further research is currently exploring this issue.
The next part of the evaluation involved finished composites: PRO-fits and
EvoFITs constructed with blur. Correct naming was found to be low for the
former, at 4.2%, and substantially lower than for the EvoFITs, at 24.5%; the effect
was reliable and large in size. The data clearly indicate that the combination of blur
and holistic tools, with subsequent image enhancement, produces composites with
fairly good naming levels. The result is likely to be of interest to police practitioners
who are involved with composite construction. Here, the two developments together
produce composites with correct naming levels that are useful practically.
Theoretically, it supports the notion that EvoFIT, together with these techniques, forms
an appropriate interface to human memory. It also supports the accumulating research
that indicates poor performance for feature systems when the delay to construction is
several days in duration (e.g. Frowd et al., 2005a, 2005c, 2007b, 2007c).
A potentially surprising result was the apparent decrease in correct naming
scores for EvoFITs made without blur, from the holistic-tools stage (M = 11.9%) to
finishing (M = 4.2%). Artwork and feature warp enhancements
had been carried out on 11 of these 12 composites, but note that the decrease was not
significant, and is thus consistent with random variation. There was, however, a
marginally significant decrease in incorrect names produced and therefore the overall
representation of the face was more accurate.
A Gaussian blur of 8 cycles/face width was used, as this level, when applied to
the entire face, renders face recognition difficult (Thomas & Jordan, 2002). But, was
this setting optimal? Clearly, using a less intense level of blur is likely to reveal the
external features to a greater extent, and be potentially more distracting for a user; a
more intense blur is likely to lessen the context in which the internal features are
perceived. Past research has shown that removing the external features completely
reduces unfamiliar face recognition relative to a complete face (e.g. Ellis et al., 1979),
so this would argue for preserving some contextual information in the exterior face.
Recent work has found that very accurate hair helps users to produce superior quality
EvoFITs relative to hair that is only slightly inferior (Frowd et al., in press), which
argues for a less intense distortion. Ongoing research is exploring this issue, along
with the effectiveness of blurring for the traditional feature systems.
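The blur manipulation itself can be sketched in a few lines. The Python below is a minimal illustration and not the EvoFIT implementation: it assumes that "8 cycles/face width" denotes the frequency at which the Gaussian filter passes half the amplitude, that the face is 64 pixels wide, and that a hypothetical rectangular mask marks the internal features. All function names are invented for the example.

```python
import math

def sigma_for_cutoff(cycles_per_face, face_width_px):
    """Gaussian SD (pixels) whose filter passes half the amplitude at the cutoff.
    Assumption: this is how '8 cycles/face width' is interpreted here."""
    cutoff = cycles_per_face / face_width_px  # cycles per pixel
    return math.sqrt(math.log(2) / 2) / (math.pi * cutoff)

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian weights, truncated at about three SDs."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, repeating edge pixels at the borders."""
    radius = len(kernel) // 2
    last = len(image[0]) - 1
    return [
        [sum(k * row[min(max(x + j - radius, 0), last)] for j, k in enumerate(kernel))
         for x in range(last + 1)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def blur_external(image, internal_mask, sigma):
    """Blur the whole image, then restore the original pixels inside the mask."""
    kernel = gaussian_kernel(sigma)
    blurred = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    return [
        [original if keep else soft
         for original, soft, keep in zip(image_row, blurred_row, mask_row)]
        for image_row, blurred_row, mask_row in zip(image, blurred, internal_mask)
    ]

# Toy 64 x 64 "face": a fine checkerboard stands in for image detail.
SIZE = 64
image = [[255.0 if (x + y) % 2 == 0 else 0.0 for x in range(SIZE)] for y in range(SIZE)]
# Hypothetical internal-features region: the central 32 x 32 pixels.
internal_mask = [[16 <= x < 48 and 16 <= y < 48 for x in range(SIZE)] for y in range(SIZE)]

sigma = sigma_for_cutoff(8, SIZE)
result = blur_external(image, internal_mask, sigma)
print(f"sigma = {sigma:.2f} px")
print("internal pixel:", result[32][32], "external pixel:", round(result[8][8], 1))
```

Under these assumptions, a lower cutoff (fewer cycles per face width) gives a larger standard deviation and hence a more intense blur, which is the trade-off discussed above: detail inside the mask is untouched, while fine detail outside it is averaged away.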
The holistic tools were designed originally to allow users to describe what
aspects of their evolved face needed changing, and to easily implement those changes
on the face. The procedure has since changed: all holistic scales are presented in sequence
and recall is no longer required; instead, users are asked to recognise when a more
identifiable face is
seen. So, how much of the current utility relies on face recall and how much on face
recognition? There is perhaps some anecdotal evidence to start to answer this
question. While the majority of our EvoFIT participants did comment upon the need
to change the age and/or weight of their evolved face, none mentioned changes to
other holistic properties, but then frequently used them (refer to Table 1). It may
simply be that some transforms are fairly easy to verbalise, such as age and weight,
while others are more difficult. Note that the ‘feature’ scales do require recall, and
seven of those were fairly well-used, especially for iris and brow colour, and so recall
is still potentially of value here. Ongoing research is exploring this issue, along
with the general applicability of the scales to other composite systems.
On a practical note, it is perhaps worth commenting on the degree to which the
size of the potential pool of faces might influence the evaluation of the composites –
i.e. for the participants evaluating the composites. Recall that these participants were
told that the snooker players were in the top 40, but could this knowledge have
inflated naming responses? This is of course possible, but likely to be somewhat
limited, since the composites produced in the PRO-fit condition were only named at
4.2% correct. Thus, in spite of knowledge of the target set, correct naming was still very
poor; the levels are also very similar to other composite research that has used target
sets with a similar or larger pool of candidates (e.g. Frowd et al., 2005a, 2007b).
The benefit of blurring and holistic tools to the process of face construction
can be explained by an exemplar-based (absolute-coding) model of face recognition
originally proposed by Valentine (1991). In such models, faces are stored and
retrieved within a multi-dimensional space, which is similar to that constructed for an
EvoFIT face model. Somewhat average-looking faces tend to have small values along
each dimension and are thus stored close to the centre of the space (i.e. near the
origin); more distinctive faces have larger values and are located more distally.
There are good candidates for what these dimensions might be. Examples
include hair, face shape and age, all of which are important when processing an
unfamiliar face (e.g. Ellis, 1986). Similarly, there is evidence for the importance of
the eye and brow region (e.g. Shepherd, Davies & Ellis, 1981) and the spatial relation
between features (e.g. Davies & Christie, 1982; Leder & Bruce, 1998; Tanaka &
Farah, 1993). Thus, some dimensions appear relevant to the external features (hair,
face shape) and others to the internal features (age, spatial relationships, eye/brow
region). The current finding, involving blurring of the external features, can be
explained in an exemplar-based model by assuming that the dimensions responsible
for the external features are suppressed in the cognitive system of the composite
constructor. This would allow dimensions for the internal features to exert a greater
weight in the model’s (or user’s) response, thereby improving the effectiveness of
unfamiliar face selection and the ensuing tasks. For the holistic tools, manipulating a
face along multiple (internal and external features) dimensions is likely, some of the
time, to provide a probe that is closer to a target face, thus offering the potential to
improve the accuracy of the representation.
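The suppression account can be made concrete with a toy calculation. The sketch below is a hypothetical illustration rather than a fitted model: the four dimensions, the coordinates of the remembered target and of the two candidate faces, and the weight of 0.1 on the external dimensions are all invented for the example. It treats face selection as choosing the candidate nearest to the remembered target under a weighted Euclidean distance.

```python
import math

# Hypothetical face-space dimensions, in order:
# (hair, face shape, eye/brow region, feature spacing).
# The first two are treated as external, the last two as internal.

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two points in face space."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest(probe, candidates, weights):
    """Name of the candidate face closest to the probe under the given weights."""
    return min(candidates, key=lambda name: weighted_distance(probe, candidates[name], weights))

# Invented coordinates for the remembered target and two faces in an array.
target_memory = (0.2, -0.1, 1.4, -0.9)
candidates = {
    "A": (0.3, 0.0, 0.4, 0.0),    # close on external dimensions, poor on internal
    "B": (1.7, -1.6, 1.2, -0.7),  # poor on external dimensions, close on internal
}

equal_weights = (1.0, 1.0, 1.0, 1.0)        # intact faces: all dimensions count
external_suppressed = (0.1, 0.1, 1.0, 1.0)  # blurred exterior: external weight reduced

print("intact:", nearest(target_memory, candidates, equal_weights))
print("blurred:", nearest(target_memory, candidates, external_suppressed))
```

With equal weights, the candidate that resembles the target mainly in hair and face shape is chosen; once the external dimensions are down-weighted, the candidate with the better-matching internal features is chosen instead.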
In summary, it is difficult to externalise an unfamiliar face seen several days
previously. The traditional method involves the selection of individual facial features
and does not work well. Our EvoFIT alternative asks users to repeatedly select
complete faces from an array, with breeding. Two recent developments are potentially
valuable to face construction: face selection with external features blurring, and face
manipulation using a set of psychologically-useful scales. In the current paper, both
developments were effective at improving the quality of EvoFIT composites. While
the latter successfully improved the correct naming levels of an evolved composite,
the former had an accumulating benefit over the whole procedure. It was argued that
blurring resulted in users continuing to focus on the internal facial features. When the
new developments were used together, the resulting images were named about 25%
correct, compared to about 5% for those constructed with a feature system. The work
reveals that it is now possible to construct a composite with a fairly good naming
level after a 2 day retention interval. The version of EvoFIT with blur and holistic
tools would appear to be valuable for detecting those who commit crime.
Brace, N., Pike, G.E., Allen, P., & Kemp, R. (2006). Identifying composites of
famous faces: investigating memory, language and system issues. Psychology, Crime
and Law, 12, 351-366.
Brace, N., Pike, G., & Kemp, R. (2000). Investigating E-FIT using famous faces. In
A. Czerederecka, T. Jaskiewicz-Obydzinska & J. Wojcikiewicz (Eds.). Forensic
Psychology and Law (pp. 272-276). Krakow: Institute of Forensic Research.
Bruce, V. (1986). Influences of familiarity on the processing of faces. Perception, 15,
Bruce, V., Ness, H., Hancock, P.J.B., Newman, C., & Rarity, J. (2002). Four heads are
better than one. Combining face composites yields improvements in face likeness.
Journal of Applied Psychology, 87, 894-902.
Bruce, V., & Young, A.W. (1986). Understanding face recognition. British Journal of
Psychology, 77, 305-327.
Campbell, R., Coleman, M., Walker, J., Benson, P. J., Wallace, S., Michelotti, J., &
Baron-Cohen, S. (1999). When does the inner-face advantage in familiar face
recognition arise and why? Visual Cognition, 6, 197-216.
Davies, G.M., & Christie, D. (1982). Face recall: an examination of some factors
limiting composite production accuracy. Journal of Applied Psychology, 67, 103-109.
Davies, G.M., & Milne, A. (1982). Recognizing faces in and out of context. Current
Psychological Research, 2, 235-246.
Davies, G.M., Shepherd, J., & Ellis, H. (1978). Remembering faces: acknowledging
our limitations. Journal of Forensic Science, 18, 19-24.
Davies, G.M., van der Willik, P., & Morrison, L.J. (2000). Facial Composite
Production: A Comparison of Mechanical and Computer-Driven Systems. Journal of
Applied Psychology, 85, 119-124.
Ellis, H. D. (1986). Face recall: A psychological perspective. Human Learning, 5, 1-8.
Ellis, H.D., Shepherd, J., & Davies, G.M. (1979). Identification of familiar and
unfamiliar faces from internal and external features: some implications for theories of
face recognition. Perception, 8, 431-439.
Frowd, C.D., Bruce, V., McIntyre, A., & Hancock, P.J.B. (2007a). The relative
importance of external and internal features of facial composites. British Journal of
Psychology, 98, 61-77.
Frowd, C.D., Bruce, V., McIntyre, A., Ross, D., Fields, S., Plenderleith, Y., &
Hancock, P.J.B. (2006). Implementing holistic dimensions for a facial composite
system. Journal of Multimedia, 1, 42-51.
Frowd, C.D., Bruce, V., Ness, H., Bowie, L., Thomson-Bogner, C., Paterson, J.,
McIntyre, A., & Hancock, P.J.B. (2007b). Parallel approaches to composite
production. Ergonomics, 50, 562-585.
Frowd, C.D., Bruce, V., & Hancock, P.J.B. (2008a). Changing the face of criminal
identification. The Psychologist, 21, 670-672.
Frowd, C.D., Carson, D., Ness, H., McQuiston, D., Richardson, J., Baldwin, H., &
Hancock, P.J.B. (2005a). Contemporary Composite Techniques: the impact of a
forensically-relevant target delay. Legal & Criminological Psychology, 10, 63-81.
Frowd, C.D., Carson, D., Ness, H., Richardson, J., Morrison, L., McLanaghan, S., &
Hancock, P.J.B. (2005b). A forensically valid comparison of facial composite
systems. Psychology, Crime & Law, 11, 33-52.
Frowd, C.D., Hancock, P.J.B., & Carson, D. (2004). EvoFIT: A holistic, evolutionary
facial imaging technique for creating composites. ACM Transactions on Applied
Perception (TAP), 1, 1-21.
Frowd, C.D., & Hepton, G. (in press). The benefit of hair for the construction of facial
composite images. British Journal of Forensic Practice.
Frowd, C.D., McQuiston-Surrett, D., Anandaciva, S., Ireland, C.E., & Hancock,
P.J.B. (2007c). An evaluation of US systems for facial composite production.
Ergonomics, 50, 1987–1998.
Frowd, C.D., McQuiston-Surrett, D., Kirkland, I., & Hancock, P.J.B. (2005c). The
process of facial composite production. In A. Czerederecka, T. Jaskiewicz-Obydzinska,
R. Roesch & J. Wojcikiewicz (Eds.). Forensic Psychology and Law (pp.
140-152). Krakow: Institute of Forensic Research Publishers.
Frowd, C.D., Park, J., McIntyre, A., Bruce, V., Pitchford, M., Fields, S., Kenirons,
M. & Hancock, P.J.B. (2008b). Effecting an improvement to the fitness function.
How to evolve a more identifiable face. In A. Stoica, T. Arslan, D. Howard, T.
Higuchi, and A. El-Rayis (Eds.) 2008 ECSIS Symposium on Bio-inspired, Learning,
and Intelligent Systems for Security (pp. 3-10). NJ: CPS. (Edinburgh).
Gibson, S.J., Solomon, C.J., & Pallares-Bejarano, A. (2003). Synthesis of
photographic quality facial composites using evolutionary algorithms. In R. Harvey
and J.A. Bangham (Eds.) Proceedings of the British Machine Vision Conference (pp.
Hancock, P.J.B., Bruce, V., & Burton, A.M. (2000). Recognition of unfamiliar faces.
Trends in Cognitive Sciences, 4(9), 330-337.
Leder, H., & Bruce, V. (1998). Local and relational aspects of face distinctiveness.
Quarterly Journal of Experimental Psychology, 51A, 449-473.
Memon, A., & Bruce, V. (1983). The effects of encoding strategy and context change
on face recognition. Human Learning, 2, 313-326.
Memon, A., & Bruce, V. (1985). Context effects in episodic studies of verbal and
facial memory: A review. Current Psychological Research & Reviews, Winter 1985-86, 349-369.
O'Donnell, C., & Bruce, V. (2001). Familiarisation with faces selectively enhances
sensitivity to changes made to the eyes. Perception, 30, 755-764.
Malpass, R.S. (1996). Enhancing eyewitness memory. In S.L. Sporer, R.S. Malpass &
G. Koehnken (Eds.). Psychological issues in eyewitness identification. (pp. 177-204).
Hillsdale, NJ: Lawrence Erlbaum.
Rhodes, G., Carey, S., Byatt, G., & Proffitt, F. (1998). Coding spatial variations in
faces and simple shapes: a test of two models. Vision Research, 38, 2307–2321.
Shepherd, J.W., Davies, G.M., & Ellis, H.D. (1981). Studies of cue saliency. In G.
Davies, H. Ellis & J. Shepherd (Eds.). Perceiving and Remembering Faces (pp. 105-132).
London: Academic Press.
Tanaka, J.W., & Farah, M.J. (1993). Parts and wholes in face recognition. Quarterly
Journal of Experimental Psychology: Human Experimental Psychology, 46A, 225-245.
Thomas, S.M., & Jordan, T.R. (2002). Determining the influence of Gaussian blurring
on inversion effects with talking faces. Perception & Psychophysics, 64, 932-944.
Tredoux, C.G., Nunez, D.T., Oxtoby, O., & Prag, B. (2006). An evaluation of ID: an
eigenface based construction system. South African Computer Journal, 37, 1-9.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion and
race in face recognition. Quarterly Journal of Experimental Psychology, 43A, 161-204.
Wells, G., Memon, A., & Penrod, S.D. (2007). Eyewitness evidence: improving its
probative value. Psychological Science in the Public Interest, 7, 45-75.
Young, A.W., Hay, D.C., McWeeny, K.H., Flude, B.M., & Ellis, A.W. (1985).
Matching familiar and unfamiliar faces on internal and external features. Perception,
14, 737-746.
Figure captions
Fig. 1. Composites constructed in the current study of the snooker player, Mark
Williams. EvoFITs are presented on the first two rows: the left image is after
evolution; centre is after holistic tool use; right is after final enhancements with the
artwork package and/or feature warp. The images on the top row were constructed
using EvoFIT with blur; those on the second row using EvoFIT without blur; the
PRO-fit is shown on the third row.
Fig. 2. Correct composite naming when evolving using EvoFIT with and without blur,
at each stage in the experiment. Levels are calculated as the number of correct names
given for the composites divided by the relevant correct names for the targets
(expressed in percent). Error bars are SE of the means.
Table 1. Holistic tool use for holistic scales, top row, and for the feature scales,
bottom row. Figures are overall percent scale use for all participant-witnesses who
constructed an EvoFIT.
List of figures and tables
Fig. 1. [Composite images; caption as given under Figure captions.]
Fig. 2. [Plot. Vertical axis: Correct naming (percent). Horizontal axis: Construction stage
(After evolving; After holistic tools; After artwork (finished)). Surviving legend entry:
No blur. Caption as given under Figure captions.]
Table 1. [Table body not recovered. Surviving scale labels: Deepset eyes; Laughter lines.
Caption as given under Figure captions.]