The evolution of face construction

Evolving facial composite systems
Charlie Frowd (1*)
Vicki Bruce (2)
Peter J.B. Hancock (3)
(1) School of Psychology
University of Central Lancashire, PR1 2HE
* Corresponding author: Charlie Frowd, Department of Psychology, University of
Central Lancashire, Preston PR1 2HE, UK. Email: Phone:
(01772) 893439.
(2) School of Psychology
Newcastle University, NE1 7RU
(3) Department of Psychology
University of Stirling, FK9 4LA
(3,421 words)
Journal: Forensic Update
There are various systems available to construct faces of people who commit crime.
The traditional method is for an eyewitness to select individual facial features from a
kit of parts. There is good evidence, however, that this method does not produce a
recognisable image when used the way that police typically do. While some progress
has been made to improve the effectiveness of these techniques, a new system is
required if composites are to be accessible to all. The new EvoFIT approach is
described here, which is based on the repeated selection and breeding of complete
faces. The system in its basic form did not work well, but a combined set of
developments has enabled good-quality composites to be produced.
Eyewitnesses are often asked to describe the events of a crime and those involved. In
the absence of other identifying evidence, these observers may also be asked to
describe the criminal’s appearance and to construct a picture of the face. The picture
is known as a facial composite and is often seen in the newspapers and on TV crime
programmes to allow members of the public to name the face to the police.
There are many systems available for constructing composites. Until recently,
all of these worked in a similar way: by the selection of individual features – hair,
eyes, nose, mouth, etc. The earliest method involved a sketch artist drawing the face
by hand using pencils or crayons, but techniques were later developed to allow use by
those with less artistic skill. The first of these were Photofit and Identikit. More
recent systems are computerised and include E-FIT and PRO-fit in the UK, and
FACES and Identikit 2000 in the US. In essence, each contains a large database of
facial features and computer graphics technology to produce realistic-looking images.
These ‘feature’ systems have been the subject of considerable research. The
procedures required to properly evaluate them are complicated, and have been
developed into a ‘gold’ standard (Frowd et al., 2005b). In brief, people are first
recruited to act as ‘witnesses’ and are shown an unfamiliar target face. After a
specified delay, they work with an experienced interviewer to provide a detailed
description of the face and to construct the best face possible using one of the
composite systems. Later, the composites are shown to other people who are familiar
with the targets and attempt to name them.
Using this standard, when the delay is up to a few hours after seeing a target,
composites from the computerised feature systems – such as E-FIT and PRO-fit – are
correctly named about 20% of the time (Frowd et al., 2004, 2005b, 2007a, 2007b);
other research laboratories have found similar results (Brace et al., 2000; Davies et al.,
2000). However, when the delay is a day or two, the norm in police work, naming
levels fall to just a few percent correct (e.g. Frowd et al., 2005a, 2007b, submitted-b).
This body of research suggests that there is a low chance of detecting criminals from
these modern facial composites.
Part of the problem is that we are ineffective at the tasks required (Ellis,
1986): faces are perceived as whole entities (e.g. Tanaka & Sengco, 1997), and so we
struggle to describe and to select another person’s facial features. There is also a
problem concerning the visual focus of attention. It is known that the internal features
– the region comprising the eyes, brows, nose and mouth – are very important for
recognising a familiar face, but the external features – the hair, face shape and ears –
take on a larger role when the face is unfamiliar (e.g. Ellis et al., 1979). Thus, the
external features of composites tend to be constructed well, but composite recognition
(as carried out by members of the public) is poor due to the inferiority of the inner
face (Frowd et al., 2007a).
Improving the ‘feature’ systems
Good progress has been made to rectify this situation (e.g. Bruce et al., 2002; Frowd
et al., 2004, 2007b, 2007c, 2008a, submitted-b). In one strand of research, we
focussed on the interview. The police use a Cognitive Interview (CI) to obtain an
accurate description of a criminal’s face (see Wells, Memon & Penrod, 2007). This
part is important as it allows a police officer to locate a subset of features within the
composite system; without such, there would be too many examples. However,
recalling a person’s face shifts attention to individual features, at the expense of face
recognition, or processing the face as a whole. During face construction, therefore, a
witness’s recognition ability is reduced, as is the quality of their composite (Frowd et
al., under revision).
This problem can be overcome by asking a witness to make a number of
personality judgements about the face after describing it. This procedure has now
been incorporated into a ‘holistic’ CI (H-CI). The H-CI is very effective at improving
composite quality (Frowd et al., 2008a) and several police forces are using it.
In another strand of research, we explored the use of caricature to improve the
recognition of a finished composite. Facial caricature exaggerates the distinctive
shapes and positions of features, making the face as a whole more individuated;
caricatures can easily be produced by computer software. In tests, while a fixed level of distortion
was not effective, presenting a range of caricature states was: naming improved from
a few percent correct to about 25% (Frowd et al., 2007c). An animated GIF is the
most practical format to view the transform for TV crime programmes and wanted
persons’ websites; several police forces are using it. An example image is available at
A different system
The above research has improved the effectiveness of feature-based composites.
Nevertheless, about 70% of witnesses are denied the opportunity of constructing a
composite as they are unable to describe the criminal’s face in detail. The limitation
concerns the method of face construction, and so the system itself must change if
more witnesses are to be helped.
Ten years ago, we began to design a new software system called EvoFIT. It is
based on our fairly good ability to recognise unfamiliar faces. The system presents
screens of complete faces and a witness selects those that resemble the criminal’s. The
selected items are bred together, to combine characteristics, and another set is
produced for selection. Repeated a few times, the faces gravitate towards the
witness’s memory of the face and, ultimately, the face with the best likeness is saved
to disk. Hence, a composite is produced by the selection of complete faces rather than
by facial parts; EvoFIT is an example of ‘evolution by artificial selection’.
The basic EvoFIT system
At the heart of EvoFIT is a model that can generate realistic-looking faces (Frowd et
al., 2004). The focus was initially on adult white males, arguably the most useful in
the UK for detecting serious criminals. The model is typically built from 70 complete faces
using a statistical technique called Principal Components Analysis (PCA), and
contains two types of information. The shape part describes the shapes of the features
and their inter-relations; the texture, the colour of the eyes, nose, mouth and overall
skin tone. PCA is often used for image compression applications, but here allows
novel (random) faces to be synthesised. The technique results in 70 coefficients
(numbers) that uniquely describe the shape and texture properties of each face.
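As a rough illustration of how a PCA model of this kind can both describe and synthesise faces, the following minimal sketch may help. It is not the EvoFIT implementation: it assumes each face has already been flattened into a vector of numbers, treats shape and texture identically, and draws random coefficients from a normal distribution scaled to the training set. All function names are hypothetical.

```python
import numpy as np


def build_pca_model(faces):
    """Fit a PCA model to an (n_faces, n_values) array, one flattened face per row."""
    mean = faces.mean(axis=0)
    centred = faces - mean
    # The rows of vt are the principal components of the centred data.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    # Standard deviation of the training faces along each component.
    std = singular_values / np.sqrt(len(faces) - 1)
    return mean, vt, std


def encode(face, mean, components):
    """Describe a face as one coefficient per component."""
    return components @ (face - mean)


def decode(coeffs, mean, components):
    """Rebuild a face vector from its coefficients."""
    return mean + coeffs @ components


def random_face(mean, components, std, rng):
    """Synthesise a novel face from random coefficients (assumed normally distributed)."""
    coeffs = rng.normal(0.0, std)
    return decode(coeffs, mean, components), coeffs
```

With 70 training faces that each contain more than 70 values, this sketch yields 70 coefficients per face, in line with the description above.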
PCA does a poor job of generating images of hair. So, along with the ears and
neck, the hair is treated as an independent component and is selected at the start. As
illustrated in Figure 1, a random face is produced by blending a random texture into
the selected external features and then by distorting that face by a random shape.
Figure 1 about here
EvoFIT initially presented screens of such random faces, as illustrated in Figure 2.
Often, however, a face was generated with a good match to a target for shape but not
for texture, or vice versa, which made selection difficult for a user. We now present
this information separately: four screens of facial shape are presented first, followed
by four screens of facial texture; users select the best two examples per screen.
Finally, the face with the closest overall match is selected, referred to as the ‘best’.
Figure 2 about here
The selected items are then bred together: pairs of faces are chosen and their shape
and texture coefficients mixed together randomly to produce an ‘offspring’. Each face
is given the same chance of being a ‘parent’, except the ‘best’, which, due to its
preferential likeness, enjoys twice the number of breeding opportunities. In addition,
to ensure the best face is not ‘lost’ through the breeding process, it is carried forward
without change into the next generation. Finally, to help maintain variability within
the population of faces, 5% of the coefficients are ‘mutated’, by changing them to
random values. Thus, a new set of faces is produced that contains a mixture of
characteristics based on witness selections, plus some mutation.
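The breeding step described above can be sketched as follows. This is an illustrative reading rather than the actual EvoFIT code. It assumes that each face is a vector of coefficients, that ‘twice the number of breeding opportunities’ means double the selection weight, that the two parents of an offspring are distinct, and that mutation replaces a coefficient with a normally distributed random value with probability 0.05. The function name and parameters are hypothetical.

```python
import numpy as np


def breed(selected, best_index, population_size, rng,
          mutation_rate=0.05, mutation_scale=1.0):
    """Produce the next generation of coefficient vectors from a witness's selections."""
    selected = np.asarray(selected, dtype=float)
    n_selected, n_coeffs = selected.shape

    # Every selected face is equally likely to be a parent,
    # except the 'best', which is given twice the weight.
    weights = np.ones(n_selected)
    weights[best_index] = 2.0
    weights /= weights.sum()

    # The 'best' face is carried forward unchanged so that it cannot be lost.
    offspring = [selected[best_index].copy()]

    while len(offspring) < population_size:
        i, j = rng.choice(n_selected, size=2, replace=False, p=weights)
        # Mix the parents' coefficients at random, one coefficient at a time.
        take_first = rng.random(n_coeffs) < 0.5
        child = np.where(take_first, selected[i], selected[j])
        # Mutate a small proportion of coefficients to random values.
        mutate = rng.random(n_coeffs) < mutation_rate
        child = np.where(mutate, rng.normal(0.0, mutation_scale, n_coeffs), child)
        offspring.append(child)

    return np.array(offspring)
```

In this sketch the same routine would be run once on the shape coefficients and once on the texture coefficients.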
Witnesses normally require three complete breeding cycles to evolve a face.
The system is supplemented by a software utility called the Shape Tool to resize and
reposition facial features on demand. An example evolved from the basic system is
presented in Figure 3.
Figure 3 about here
Early evaluations
This version of EvoFIT was evaluated (Frowd et al., 2005b) using the gold standard
procedure. A person looked at an unfamiliar face and then described and constructed
it after three to four hours. Performance was disappointingly poor, with EvoFIT
composites correctly named 2% of the time, compared to about 20% from the
computerised feature systems. In a follow-up study with a two-day delay (Frowd et al.,
2005a), EvoFITs were named only slightly better, at 4%.
Improving convergence
The problem was that EvoFIT only produced an identifiable face occasionally; the
system converged on a specific identity, but this was generally not close enough to
promote recognition, in spite of all our efforts. A breakthrough came when we
improved the selection of the ‘best’ face: users chose the closest match from all
possible combinations of their selected shape and texture. Given the large impact the
best face has on the breeding process, system convergence improved.
In a further evaluation (Frowd et al., 2007b), UK international football players
were used as targets and 48 non-football fans were recruited as witnesses. Using the
‘gold’ standard procedure and a two-day delay, composites from EvoFIT were correctly
named 11% of the time, and those from PRO-fit 4% (see Figure 4). In spite of fairly low naming
levels from EvoFIT, the study demonstrated the potential of the technique.
Figure 4 about here
Blurring the external features
EvoFIT faces tend to look very similar to each other, leading to difficulty with face
selection. This is illustrated in Figure 2 and is caused by the same external features
appearing throughout. The problem relates to biases in our face perception system:
since the faces are unfamiliar, observers will tend to focus more on the external parts
to the detriment of the important central region.
Our solution was to apply a Gaussian (blur) filter to the external features. The
level of blur chosen was 4 cycles per face width, a setting that renders face
recognition difficult if applied to the entire face. As can be seen in Figure 5, the
selective distortion allows the internal features to appear more salient. In use, blurring
is enabled after the external features have been selected, but disabled at the end of
Figure 5 about here
In a small study, blurring improved composite naming by about 5% (Frowd et al.,
2008b); it also substantially reduced the number of incorrect names that people gave,
by 20%. Thus, while the distortion promoted a composite that looked a little more like
the intended person, it looked considerably less like anyone else.
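The selective blur can be illustrated with a short sketch. It is not the filter used in EvoFIT: it assumes a greyscale image in which the face spans the full image width, a Boolean mask marking the internal-features region, and a Gaussian low-pass filter whose amplitude falls to one half at the stated cut-off (one of several possible conventions). The function name is hypothetical.

```python
import numpy as np


def blur_external(image, internal_mask, cutoff_cycles=4.0):
    """Low-pass filter the image, then restore the unblurred internal region."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape

    # Spatial frequencies expressed in cycles per image (face) width.
    fy = np.fft.fftfreq(height) * width
    fx = np.fft.fftfreq(width) * width
    radius = np.hypot(fy[:, None], fx[None, :])

    # Gaussian transfer function reaching half amplitude at the cut-off
    # (an assumed convention, not taken from the source).
    sigma = cutoff_cycles / np.sqrt(2.0 * np.log(2.0))
    transfer = np.exp(-0.5 * (radius / sigma) ** 2)

    # Filtering in the frequency domain treats the image as periodic at its edges.
    blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

    # Keep the internal features sharp; show everything else blurred.
    return np.where(internal_mask, image, blurred)
```

A hard-edged mask is used here for brevity; a practical system would probably feather the boundary so that the transition between sharp and blurred regions is not visible.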
Holistic tools
A second problem was that faces would sometimes be evolved with a noticeably
incorrect age. This issue was rectified in two ways. Firstly, five models were built,
each of a different age range, to allow this aspect to be approximately correct from the
start. This endeavour was generally successful, but some age inaccuracies remained.
Secondly, a software tool was designed to allow manipulation of the perceived age
and other ‘holistic’ properties. These are characteristics of the whole face, rather than
of a particular feature: masculinity, health, threatening, attractiveness, extroversion
and face weight. They were built by asking volunteers to make holistic judgements on
a 200-item face set. For each scale, the faces with the highest and lowest average
ratings were extracted and averaged to give a ‘high’ and a ‘low’ face; these
provided mathematical vectors along which to manipulate a given face. See Figure 6 for
an example. A series of tests indicated that the scales were operating appropriately
(Frowd et al., 2006).
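One way to picture a holistic scale is as a direction in the space of face coefficients. The sketch below is an assumption-laden illustration, not the published tool: it assumes each rated face is represented by a coefficient vector, that a fixed number of extreme faces (the hypothetical parameter n_extreme) is averaged at each end of the scale, and that a face is adjusted by moving it along the difference between the two averages.

```python
import numpy as np


def holistic_direction(coeffs, ratings, n_extreme=10):
    """Return the vector from the average low-rated face to the average high-rated face."""
    coeffs = np.asarray(coeffs, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    order = np.argsort(ratings)  # indices from lowest to highest rating
    low_mean = coeffs[order[:n_extreme]].mean(axis=0)
    high_mean = coeffs[order[-n_extreme:]].mean(axis=0)
    return high_mean - low_mean


def adjust(face_coeffs, direction, amount):
    """Move a face along a holistic scale; a negative amount reduces the property."""
    return np.asarray(face_coeffs, dtype=float) + amount * direction
```

For example, a direction computed from age ratings could be applied with a positive amount to make an evolved face appear older, or with a negative amount to make it appear younger.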
Figure 6 about here
Combining techniques
Both blur and Holistic Tools independently allowed users to evolve a more
identifiable face, but would their effects be additive? This was explored in a recent
study (Frowd et al., submitted-a) using the gold standard, snooker players as targets
and a 2 day delay. Naming was consistently better when the techniques were used on
their own, but much better when used together for the same witness: 25% correct
compared with 5% from a modern ‘feature’ system.
The interview
Only age, gender and race of the criminal are required for EvoFIT, to load the
appropriate face model, but developments to date have involved a Cognitive Interview
(CI) since this is the normal police procedure. As mentioned above, a CI is best when
followed by character attribution for the feature systems, but what about for EvoFIT?
EvoFIT was designed to be based more on recognition than on recall, and so the
H-CI should be ideal. However, in a formal test, EvoFITs constructed after an H-CI
were named worse than those after a CI. This appeared to be due to the H-CI
encouraging users to focus on the overall (holistic) aspects of the face to the detriment
of individual features. We next compared a CI with one that did not involve face
recall (NI); the latter interview promoted 12% better naming. Frowd et al. (submitted-a)
argued that the H-CI promoted a strong holistic bias, the CI a strong featural bias,
but a more balanced processing style (NI) was best.
Police use
The focus was to make one face model work effectively, for the white males. More
models have now been added to EvoFIT, including white female and black male. The
system is being used in several forces including Lancashire and Derbyshire. An
EvoFIT constructed in Lancashire is shown in Figure 7. This image led directly to the
arrest of the person shown; with other evidence, he was convicted of indecent assault.
Figure 7 about here
The feedback from the police has been positive, with EvoFITs reported as valuable
between 20 and 30% of the time, a level similar to that found in the laboratory when
tested using a CI.
The future
Considerable work has been necessary to produce an effective evolving system. In the
latest evaluation described above, the male composites produced were named at 32%
correct using a 24 hour delay, blur, Holistic Tools and the non-face-recall interview
(Frowd et al., submitted-a). That study involved static composites, but the evidence
suggests that naming should increase by a further 15% using animated caricatures.
Current research is attempting to improve the position of the features on the
face before the facial shapes and textures are selected. This type of holistic
information (positional) is important for face recognition (Tanaka & Sengco, 1997)
and getting it right at the start should improve the context in which the shape and
texture information is selected. This is being explored using a whole face image blur.
List of figures
Figure 1. Stages in the synthesis of an EvoFIT face: (a) selected external features,
shown with average internal shape and texture; (b) random facial texture; (c) blend of
(a) and (b), still with average shape; (d) random facial shape; and (e) application of
shape (d) to (c) to give final face.
Figure 2. Screenshot of EvoFIT faces, which may at first sight seem more similar than
they are, because they all have the same external features (see section on Blurring the
external features).
Figure 3. Sequence of ‘best’ faces produced during the construction of pop singer
Robbie Williams. The composite emerged during the third generation (far right).
Figure 4. Example composites constructed in the study of the footballer, Wayne
Rooney: EvoFIT (left) and PRO-fit (right).
Figure 5. Example of randomly-generated faces (top row), and after external features
blurring (bottom row).
Figure 6. An EvoFIT of footballer, David Beckham, left, and after increasing the
perceived health and attractiveness, right.
Figure 7. An EvoFIT constructed by Lancashire Constabulary and a photograph of the
person convicted in the case.