Title: Situational Judgment Tests and Adverse Impact in the UK: A Large Sample Study
Type: Full Individual Paper
Category: Personnel selection and assessment
Authors: Chris Dewbury, Veronika Solloway, Peter Burnham
Objectives
This paper has two principal objectives. The first is to provide the first large-scale (N = 68,821) analysis of the degree to which the use of situational judgement tests (SJTs) in the UK results in adverse impact. Adverse impact is analysed in relation to three types of social grouping: ethnicity (White versus Asian and White versus Black), gender, and age. It is assessed against both statistical (chi-square and associated effect sizes) and non-statistical (four-fifths rule) criteria. The second objective is to explain how differential item functioning (DIF) analysis can be used to identify SJT items that make an above-average contribution to the adverse impact of SJTs, and how this process can be used to reduce that adverse impact.
Background
Situational judgement tests involve presenting job applicants with a series of job-related, “real-life” scenarios and requiring them to choose from a list of alternative actions or responses. Applicants are asked either to indicate the action they would take or the action they believe they should take. SJTs are based on the premise that responses to situations similar to those encountered on the job will provide good predictions of actual job behaviour (Lievens & Coetsier, 2002). Some scholars suggest that SJTs should be viewed as a method rather than as a construct (McDaniel & Nguyen, 2001): they function much like an interview and, like interviews, they can be used to measure a variety of constructs, including various types of knowledge, skills, and abilities (Chan & Schmitt, 1997) and personality traits (McDaniel et al., 2001).
This paper responds to the call for more research on the adverse impact of SJTs (Lievens, Peeters & Schollaert, 2008) and, more generally, to Hunter and Schmidt’s (1990) call for high-quality estimates of important social phenomena. Adverse impact is an important social phenomenon because it can lead to differential career and educational opportunities for certain demographic groups. McDaniel and Nguyen (2001) discuss the legal issues surrounding SJTs in North America and urge organisations to obtain evidence about both the effectiveness and the fairness of this technique to ensure it is legally defensible. These issues are clearly of considerable importance in the UK as well.
The majority of existing research on SJTs has been carried out in North America. A recent meta-analysis by Whetzel et al. (2008) identified the following effect sizes for the adverse impact of SJT scores in relation to ethnicity in North America: Black versus White d = .38, Hispanic versus White d = .24, and Asian versus White d = .29. With respect to gender, the same authors report an effect size of d = .11 for adverse impact against men, and a study in Belgium by Lievens and Coetsier (2002) produced a similar figure in the same direction (d = .18).
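The d values quoted above are standardised mean differences between groups. As a minimal sketch, assuming the common pooled-standard-deviation form of Cohen's d (the meta-analysis may have used corrected or weighted variants), d can be computed from each group's mean, standard deviation, and sample size. The function name, the `higher_is_better` flag, and the input values are hypothetical and for illustration only.

```python
from math import sqrt


def cohens_d(mean_ref: float, sd_ref: float, n_ref: int,
             mean_focal: float, sd_focal: float, n_focal: int,
             higher_is_better: bool = True) -> float:
    """Standardised mean difference between a reference and a focal group.

    Uses the pooled standard deviation. A positive result always means the
    reference group (e.g. White applicants) performed better; set
    higher_is_better=False for tests where a lower score is better.
    """
    pooled_var = ((n_ref - 1) * sd_ref ** 2 + (n_focal - 1) * sd_focal ** 2) / (n_ref + n_focal - 2)
    diff = mean_ref - mean_focal if higher_is_better else mean_focal - mean_ref
    return diff / sqrt(pooled_var)


# Hypothetical inputs: equal SDs of 10 and a 3.8-point gap favouring the reference group.
d = cohens_d(mean_ref=52.0, sd_ref=10.0, n_ref=500,
             mean_focal=48.2, sd_focal=10.0, n_focal=500)
print(round(d, 2))  # 0.38
```

The sign convention is made explicit because some SJTs, including those in Table 1, are scored so that a lower score is better.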
Turning to age, Smith and McDaniel (1998) reported that SJT performance is more strongly associated with age (and length of job experience) than with personality traits. Adverse impact against younger age groups might occur because older applicants have more experience of solving real-world problems and hold greater levels of relevant tacit and declarative knowledge, advantages that could enable them to perform better than their younger counterparts on SJTs (Weekley & Jones, 1999).
In summary, there is evidence that the use of SJTs in personnel selection can produce adverse impact against ethnic minority groups and men, and reason to suspect that they may also produce adverse impact against younger age groups. However, almost all of the data relevant to these issues have been collected in North America, and we have not been able to identify any studies of the degree to which the use of SJTs produces adverse impact in the UK. Therefore, the primary objective of this study is to measure the amount of adverse impact produced by SJTs in relation to ethnicity, gender, and age when they were used in real personnel selection systems designed to select applicants for eight roles in the UK. As a practical step, the second objective is to explain how adverse impact in SJT items can be identified and eliminated.
Design
SJT data were obtained from Sainsbury’s Supermarkets Limited. Sainsbury’s currently use
SJTs to screen approximately 300,000 job applicants per year (i.e. about 1% of the entire UK
working population). These SJTs are written rather than video recorded and require
applicants to indicate what they should (rather than would) do across 30 job-related
scenarios. Information was also obtained about the ethnicity, age, and gender of the job applicants, and about the cut-off scores used to discriminate between selected and non-selected applicants. The proportions of selected and non-selected applicants were then compared across the groups within each of these three social groupings.
Method
Job applicants complete Sainsbury’s SJTs online, and their scores are automatically recorded in the organization’s human resources database along with various demographic and personal information. It was therefore possible to extract the SJT scores of all online job applicants over a given period (June 2008 to April 2009), together with information about each applicant’s ethnicity (White, Asian, or Black), gender, and age (under 18, 18-24, 25-50, and over 50). This information was available for the eight job roles shown in Table 1, which also gives the sample size for each role. The mean score on each SJT was computed for each social group, as was the number of people in each group who were selected or rejected by the cut-off score used for that job. The resulting selection ratios were then used to examine adverse impact against statistical criteria (chi-square test and effect size) and a non-statistical criterion (the four-fifths rule).
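Both the statistical and the non-statistical criteria can be computed from the numbers of selected and rejected applicants in each group. The sketch below is one possible implementation, not the authors' code. It assumes a Pearson chi-square on the 2 x 2 group-by-outcome table with no continuity correction, and uses the phi coefficient as the associated effect size (the paper does not say which effect size it used). The function name `adverse_impact` and the counts are hypothetical.

```python
from math import erfc, sqrt


def adverse_impact(ref_pass: int, ref_fail: int, focal_pass: int, focal_fail: int) -> dict:
    """Adverse impact statistics for one job and one reference/focal group pair.

    Returns the impact ratio (focal pass rate / reference pass rate), whether the
    four-fifths rule is violated, the Pearson chi-square for the 2 x 2 pass/fail
    table (1 df, no continuity correction), its p-value, and the phi coefficient.
    """
    n_ref = ref_pass + ref_fail
    n_focal = focal_pass + focal_fail
    n = n_ref + n_focal
    impact_ratio = (focal_pass / n_focal) / (ref_pass / n_ref)

    # Expected cell counts under independence of group and outcome.
    total_pass = ref_pass + focal_pass
    total_fail = ref_fail + focal_fail
    observed = [ref_pass, ref_fail, focal_pass, focal_fail]
    expected = [n_ref * total_pass / n, n_ref * total_fail / n,
                n_focal * total_pass / n, n_focal * total_fail / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # With 1 df, chi-square = z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return {
        "impact_ratio": impact_ratio,
        "violates_four_fifths": impact_ratio < 0.8,
        "chi2": chi2,
        "p": p_value,
        "phi": sqrt(chi2 / n),
    }


# Hypothetical counts: 760 of 1,000 reference applicants pass (76%), 49 of 100 focal applicants pass (49%).
result = adverse_impact(ref_pass=760, ref_fail=240, focal_pass=49, focal_fail=51)
print(f"impact ratio {result['impact_ratio']:.2f}, four-fifths violated: {result['violates_four_fifths']}")
print(f"chi-square {result['chi2']:.1f}, p = {result['p']:.1e}, phi = {result['phi']:.2f}")
```

With very large samples such as the ones analysed here, the chi-square test will flag even small differences in pass rates, which is one reason to report the four-fifths rule and an effect size alongside it.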
Results
Given space limitations, only the results for ethnicity are reported here; the presentation will also include results for age and gender. Table 1 shows the mean scores obtained by White, Asian, and Black applicants for each of the eight job roles, together with the effect size (d) of adverse impact against each ethnic minority group. Planned comparisons were undertaken to compare (a) the White versus Asian means and (b) the White versus Black means; their results are also shown in Table 1. Table 2 indicates whether the four-fifths rule was violated for Asian and Black applicants for each job type.
Table 1
Mean SJT Scores and Adverse Impact Effect Sizes for Eight Roles: Whites, Asians, and Blacks

Job type and SJT          N         Mean score                 d
                                    White   Asian    Black     Asian   Black
Bakery                    2,058     14.25   17.00*   16.24*    .70     .55
Clerical                  1,723     12.23   15.87*   13.68*    .38     .24
Convenience shop floor    10,115    14.98   17.25*   16.40*    .67     .49
Department managers       747       13.79   15.11*   15.18*    .43     .49
Online drivers            3,852     14.69   16.08*   15.41*    .49     .26
Operations colleague      2,756     15.27   17.12*   16.17*    .59     .28
Shop floor                45,082    14.95   15.83*   15.94*    .29     .33
Warehouse assistant       2,488     16.78   19.29*   18.08*    .86     .47

* p < .05 for planned comparison between White and ethnic minority score.
Note: A lower score indicates more correct responses to the SJT.
Table 2
SJT Adverse Impact for Eight Roles: Four-Fifths Rule Violated

                          Percentage passing SJT (%)   Chi-square   Four-fifths of White   Four-fifths rule violated
Job type and SJT          White   Asian   Black        p            selection ratio        Asian   Black
Bakery                    76      49      57           <.001        60                     Yes     Yes
Clerical                  90      58      77           <.001        72                     Yes     No
Convenience shop floor    72      47      54           <.001        58                     Yes     Yes
Department managers       72      59      53           <.001        58                     No      Yes
Online drivers            85      70      79           <.001        68                     No      No
Operations colleague      92      77      83           <.001        74                     No      No
Shop floor                81      70      70           <.001        65                     No      No
Warehouse assistant       84      51      70           <.001        67                     Yes     No
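As a check on Table 2, the four-fifths rule can be re-applied to the published pass percentages. This minimal sketch uses the rounded percentages from the table, so its thresholds can differ slightly from the "four-fifths of White selection ratio" column (for Bakery, 0.8 x 76 = 60.8 rather than the reported 60, presumably because the table was computed from unrounded rates); the Yes/No outcomes nevertheless match Table 2 for every role. The names `PASS_RATES` and `four_fifths` are hypothetical.

```python
# Percentage of applicants passing each SJT (White, Asian, Black), copied from Table 2.
PASS_RATES = {
    "Bakery": (76, 49, 57),
    "Clerical": (90, 58, 77),
    "Convenience shop floor": (72, 47, 54),
    "Department managers": (72, 59, 53),
    "Online drivers": (85, 70, 79),
    "Operations colleague": (92, 77, 83),
    "Shop floor": (81, 70, 70),
    "Warehouse assistant": (84, 51, 70),
}


def four_fifths(white: float, asian: float, black: float) -> tuple[float, bool, bool]:
    """Return the four-fifths threshold and whether each minority pass rate falls below it."""
    threshold = 0.8 * white
    return threshold, asian < threshold, black < threshold


for job, rates in PASS_RATES.items():
    threshold, asian_violated, black_violated = four_fifths(*rates)
    asian_flag = "Yes" if asian_violated else "No"
    black_flag = "Yes" if black_violated else "No"
    print(f"{job:<24} threshold {threshold:4.1f}  Asian {asian_flag:<3}  Black {black_flag}")
```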
Conclusions
The mean effect size for the adverse impact of the SJTs across all eight jobs examined in this study was d = .55 for Asian applicants and d = .39 for Black applicants. The equivalent figures for North America reported in the meta-analysis by Whetzel et al. (2008) are d = .29 for Asian and d = .38 for Black applicants. This suggests that, for ethnicity, the effect size of the difference between the scores obtained by White and Black candidates in the UK is almost identical to that found in North America. In contrast, the effect size of the difference between the scores of White applicants and applicants from an Asian background appears to be considerably larger in the UK than in North America.
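The mean effect sizes above are simple unweighted averages of the eight d values in Table 1, as the following sketch reproduces. Meta-analyses usually weight each effect size by its sample size; because Table 1 reports only the total N for each role and not the subgroup Ns, the weighted figure computed here (weighting by total N) is a rough alternative under that assumption, not a result reported in this study. The variable names are illustrative.

```python
# d values and total sample sizes copied from Table 1, in table order:
# Bakery, Clerical, Convenience shop floor, Department managers,
# Online drivers, Operations colleague, Shop floor, Warehouse assistant.
D_ASIAN = [0.70, 0.38, 0.67, 0.43, 0.49, 0.59, 0.29, 0.86]
D_BLACK = [0.55, 0.24, 0.49, 0.49, 0.26, 0.28, 0.33, 0.47]
N_TOTAL = [2058, 1723, 10115, 747, 3852, 2756, 45082, 2488]


def unweighted_mean(values: list[float]) -> float:
    return sum(values) / len(values)


def weighted_mean(values: list[float], weights: list[int]) -> float:
    # Assumption: weights are total applicants per role, not subgroup sizes.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(f"Unweighted mean d: Asian {unweighted_mean(D_ASIAN):.2f}, Black {unweighted_mean(D_BLACK):.2f}")
# Unweighted mean d: Asian 0.55, Black 0.39
print(f"N-weighted mean d: Asian {weighted_mean(D_ASIAN, N_TOTAL):.2f}, "
      f"Black {weighted_mean(D_BLACK, N_TOTAL):.2f}")
# N-weighted mean d: Asian 0.41, Black 0.36
```

The weighted figures are pulled towards the shop floor role, which accounts for about two-thirds of all applicants.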
However, although the number of job applicants examined in this UK analysis (68,821) is larger than that examined in Whetzel et al.’s meta-analysis (42,178), the data analysed by Whetzel et al. are more representative with respect to the types of SJTs, job types, organizations, and industrial sectors examined. The Whetzel et al. analysis was based on data from 62 different studies involving different types of SJT in different organizations and job domains, whereas this study was based on just eight job types, all involving SJTs designed and implemented by a single organization in one industrial sector (food retailing). Despite this limitation, the present study covered a range of job types, from the relatively simple (e.g. warehouse assistant) to the relatively complex (e.g. department manager), and the very large sample size made it possible to estimate population effect sizes for this setting with a high level of confidence. Furthermore, despite the difference between the effect sizes for Asian applicants in the Whetzel et al. analysis and in the analysis presented here, the effect sizes are broadly similar in being medium in size. This suggests that, in comparison with other selection methods, SJTs produce an average level of adverse impact: more than integrity tests, work samples, and personality (conscientiousness) measures (Ones & Viswesvaran, 1998; Schmidt, Ones, & Hunter, 1992; Schmitt, Clause, & Pulakos, 1996), and less than cognitive ability tests (Hunter & Hunter, 1984).
What steps can be taken to reduce the adverse impact of SJTs? The analysis of the data presented here demonstrated that differential item functioning (DIF) analysis offers a practical and potentially powerful way of identifying SJT items that are particularly problematic with respect to adverse impact. Analysing such items can also lead to a better theoretical understanding of adverse impact in SJTs. For example, practitioners can conduct a content analysis of the SJT items identified by DIF and remove culturally biased vocabulary. In addition, cognitive interviews can be conducted with representatives of adversely affected groups, who ‘talk through’ how they perceive and understand these items; the items can then be edited accordingly to reduce adverse impact.
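The paper does not specify which DIF procedure was used. One widely used option is the Mantel-Haenszel procedure, sketched below under the assumptions that each SJT item is scored correct/incorrect and that applicants are matched on total SJT score. The function computes the common odds ratio alpha_MH across score strata and converts it to the ETS delta scale, MH D-DIF = -2.35 ln(alpha_MH); under the usual ETS convention, items whose absolute MH D-DIF is at least 1.5 and statistically significant are treated as showing large DIF. The significance test is omitted for brevity, and the function names and data are hypothetical.

```python
from collections import defaultdict
from math import log


def mantel_haenszel_dif(item_correct, total_score, is_focal):
    """Mantel-Haenszel DIF statistics for one dichotomously scored item.

    item_correct: 1 (correct) or 0 (incorrect) per applicant on the studied item
    total_score:  matching variable per applicant (here, total SJT score)
    is_focal:     True for the focal group (e.g. Asian), False for the reference group (White)
    Returns (alpha_mh, mh_d_dif). alpha_mh > 1 (negative mh_d_dif) means the item
    favours the reference group after matching on total score.
    """
    # Per score stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for correct, score, focal in zip(item_correct, total_score, is_focal):
        strata[score][(2 if focal else 0) + (0 if correct else 1)] += 1

    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    alpha_mh = numerator / denominator  # raises ZeroDivisionError if every b*c term is zero
    return alpha_mh, -2.35 * log(alpha_mh)


def make_rows(score, focal, n_correct, n_incorrect):
    """Hypothetical helper: build (correct, score, focal) rows for one cell."""
    return [(1, score, focal)] * n_correct + [(0, score, focal)] * n_incorrect


# Hypothetical data: two score strata of 20 applicants each.
rows = (make_rows(10, False, 6, 4) + make_rows(10, True, 3, 7)
        + make_rows(20, False, 8, 2) + make_rows(20, True, 6, 4))
item_correct, total_score, is_focal = zip(*rows)
alpha_mh, mh_d_dif = mantel_haenszel_dif(item_correct, total_score, is_focal)
print(f"alpha_MH = {alpha_mh:.2f}, MH D-DIF = {mh_d_dif:.2f}")  # alpha_MH = 3.08, MH D-DIF = -2.65
```

Items flagged in this way would then be candidates for the content analysis and cognitive interviews described above.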
References
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job
performance. Psychological Bulletin, 96, 72-98.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Weekley, J. A., & Jones, C. (1999). Further studies of situational tests. Personnel Psychology, 52, 679-700.
Lievens, F., & Coetsier, P. (2002). Situational tests in student selection: An examination of predictive validity, adverse impact, and construct validity. International Journal of Selection and Assessment, 10(4), 245-257.
Lievens, F., Peeters, H., & Schollaert, E. (2008). Situational judgment tests: A review of recent
research. Personnel Review, 37(4), 426-441.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86(4), 730-740.
McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9(1-2), 103-113.
Ones, D. S., & Viswesvaran, C. (1998). Gender, age, and race differences on overt integrity tests: Results across four large-scale job applicant data sets. Journal of Applied Psychology, 83(1), 35-42.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43, 627-670.
Schmitt, N., Clause, C., & Pulakos, E. D. (1996). Subgroup differences associated with different measures of some common job-relevant constructs. In C. Cooper & I. Robertson (Eds.), International review of industrial and organizational psychology (pp. 115-140). New York: Wiley.
Whetzel, D. L., McDaniel, M. A., & Nguyen, N. T. (2008). Subgroup differences in situational
judgment test performance: A meta-analysis. Human Performance, 21(3), 291-309.
Adverse Impact and Situational Judgment Tests in the UK: The Problem and Steps Towards a
Solution