Title: Situational Judgment Tests and Adverse Impact in the UK: A Large Sample Study
Type: Full Individual Paper
Category: Personnel selection and assessment
Authors: Chris Dewbury, Veronika Solloway, Peter Burnham

Objectives

This paper has two principal objectives. The first is to provide the first large-scale (N = 68,821) analysis of the degree to which the use of situational judgement tests (SJTs) in the UK results in adverse impact. Adverse impact is analysed in relation to three types of social grouping: ethnicity (White versus Asian and White versus Black), gender, and age. It is considered in relation to both statistical criteria (chi-square and associated effect sizes) and a non-statistical criterion (the four-fifths rule). The second objective is to explain how differential item functioning (DIF) analysis can be used to identify SJT items that make an above-average contribution to the adverse impact of SJTs, and how this process can be used to reduce that adverse impact.

Background

Situational judgement tests involve presenting job applicants with a series of job-related, “real-life” scenarios and requiring them to choose from a list of alternative actions or responses. Applicants are asked to indicate either the action they would take or the action they believe they should take. SJTs are based on the premise that responses to situations similar to those encountered on the job will provide good predictions of actual job behaviour (Lievens & Coetsier, 2002). Some scholars suggest that SJTs should be viewed as a method rather than as a construct (McDaniel & Nguyen, 2001) because they function much like an interview and, like interviews, can be used to measure a variety of constructs, including various types of knowledge, skill, and ability (Chan & Schmitt, 1997) and personality traits (McDaniel et al., 2001).
This paper is a response to the call for more research on the adverse impact of SJTs (Lievens, Peeters & Schollaert, 2008), and more generally to Hunter and Schmidt’s (1990) call for high quality estimates of important social phenomena. Adverse impact is an important social phenomenon because it can lead to differential career and educational opportunities for certain demographic groups. McDaniel and Nguyen (2001) discuss legal issues in North America surrounding SJTs, and urge organisations to obtain evidence about both the effectiveness and the fairness of this technique to ensure it is legally defensible. Clearly these issues are of considerable importance in the UK also.

The majority of existing research on SJTs has been carried out in North America. A recent meta-analysis by Whetzel et al. (2008) identified the following effect sizes for the adverse impact of SJT scores in relation to ethnicity in North America: Black versus White d = .38, Hispanic versus White d = .24, and Asian versus White d = .29. With respect to gender, the same authors report an effect size for the adverse impact of SJTs against men of d = .11, and a study in Belgium by Lievens and Coetsier (2002) produced a similar figure in the same direction (d = .18). Turning to age, Smith and McDaniel (1998) reported that SJT performance is more strongly associated with age (and length of job experience) than it is with personality traits. Adverse impact against younger age groups might occur because older applicants have more experience of solving real-world problems and hold greater levels of relevant tacit and declarative knowledge, advantages which could enable them to perform better on SJTs than their younger counterparts (Weekley & Jones, 1999).
In summary, there is evidence that the use of SJTs in personnel selection can produce adverse impact against ethnic minority groups and men, and reason to suspect that they may also produce adverse impact against younger age groups. However, almost all of the data relevant to these issues have been collected in North America, and we have not been able to identify any studies of the degree to which the use of SJTs produces adverse impact in the UK. Therefore a primary objective of this study is to measure the amount of adverse impact produced by SJTs in relation to ethnicity, gender, and age when they were used in real personnel selection systems designed to select applicants for eight roles in the UK. As a practical step, a second objective is to explain how adverse impact in SJT items can be identified and eliminated.

Design

SJT data were obtained from Sainsbury’s Supermarkets Limited. Sainsbury’s currently uses SJTs to screen approximately 300,000 job applicants per year (i.e. about 1% of the entire UK working population). These SJTs are written rather than video recorded and require applicants to indicate what they should (rather than would) do across 30 job-related scenarios. Information was also obtained about the ethnicity, age, and gender of the job applicants, and about the cut-off scores used to discriminate between selected and non-selected applicants. The proportion of selected and non-selected applicants was compared for each of these three social groupings.

Method

Job applicants complete Sainsbury’s SJTs online, and their scores are automatically recorded in the organization’s human resources database along with various demographic and personal information. It was therefore possible to extract the SJT scores obtained by all online job applicants over a given period (June 2008 to April 2009) together with relevant information about each applicant’s ethnicity (White, Asian, or Black), gender, and age (under 18, 18-24, 25-50, and over 50).
This information was available for the eight job roles shown in Table 1. Table 1 also shows the sample size for each job type. The mean score obtained on a given SJT by people in each of the social groups was then computed, as was the number of people in each social group selected and rejected as a result of the cut-off score used for each job. The selection ratios were then used to examine adverse impact in relation to statistical criteria (chi-square test and effect size) and a non-statistical criterion (the four-fifths rule).

Results

Given the space limitation, only the results for ethnicity are given here; the presentation will include results for age and gender also. Table 1 shows the mean score obtained by White, Asian, and Black applicants for each of the eight job roles, together with the effect size (i.e. d) of adverse impact against ethnic minorities. Planned comparisons were undertaken to compare (a) the White versus Asian means and (b) the White versus Black means, and the results of these are also shown in Table 1. Table 2 indicates whether or not the four-fifths rule was violated for Asians and Blacks for each job type.
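The four-fifths criterion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis code: the function name `four_fifths_violated` is hypothetical, and the example pass rates are the rounded Bakery percentages reported in Table 2, so borderline cases could come out differently with unrounded selection ratios.

```python
def four_fifths_violated(majority_rate, minority_rate):
    """Return True if the minority pass rate is below four-fifths (80%)
    of the majority pass rate. Rates may be proportions or percentages,
    as long as both use the same scale."""
    if majority_rate <= 0:
        raise ValueError("majority pass rate must be positive")
    return minority_rate / majority_rate < 0.8


# Rounded pass rates (%) for the Bakery role, taken from Table 2.
bakery = {"White": 76, "Asian": 49, "Black": 57}

for group in ("Asian", "Black"):
    ratio = bakery[group] / bakery["White"]
    violated = four_fifths_violated(bakery["White"], bakery[group])
    print(group, round(ratio, 2), violated)
```

For the Bakery role both ratios (about 0.64 for Asian applicants and 0.75 for Black applicants) fall below 0.8, which agrees with the "Yes" entries for that role in Table 2.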
Table 1
Mean SJT Scores and Adverse Impact Effect Sizes for Eight Roles: Whites, Asians, and Blacks

Job type and SJT          Number    Mean score                    d
                                    White    Asian     Black      Asian    Black
Bakery                     2,058    14.25    17.00*    16.24*     .70      .55
Clerical                   1,723    12.23    15.87*    13.68*     .38      .24
Convenience shop floor    10,115    14.98    17.25*    16.40*     .67      .49
Department managers          747    13.79    15.11*    15.18*     .43      .49
Online drivers             3,852    14.69    16.08*    15.41*     .49      .26
Operations colleague       2,756    15.27    17.12*    16.17*     .59      .28
Shop floor                45,082    14.95    15.83*    15.94*     .29      .33
Warehouse assistant        2,488    16.78    19.29*    18.08*     .86      .47

* p < .05 for planned comparison between White and ethnic minority score
Note: a lower score indicates more correct responses to the SJT

Table 2
SJT Adverse Impact for Eight Roles: Four-Fifths Rule Violated

Job type and SJT          Percentage passing SJT    Chi square    Four-fifths of white    Four-fifths rule violated
                          White   Asian   Black     (p)           selection ratio         Asian   Black
Bakery                    76      49      57        <.001         60                      Yes     Yes
Clerical                  90      58      77        <.001         72                      Yes     No
Convenience shop floor    72      47      54        <.001         58                      Yes     Yes
Department managers       72      59      53        <.001         58                      No      Yes
Online drivers            85      70      79        <.001         68                      No      No
Operations colleague      92      77      83        <.001         74                      No      No
Shop floor                81      70      70        <.001         65                      No      No
Warehouse assistant       84      51      70        <.001         67                      Yes     No

Conclusions

The mean effect size for the adverse impact of the SJTs across all eight jobs examined in this study was d = .55 for Asian applicants and d = .39 for Black applicants. The equivalent figures for North America reported in the meta-analysis by Whetzel et al. (2008) are Asian d = .29 and Black d = .38. This suggests that, in relation to ethnicity, the effect size of the difference between the scores obtained by White and Black candidates in the UK is almost identical to that found in North America. In contrast, it would appear that the effect size of the difference in scores between Whites and people from an Asian background is considerably higher in the UK than in North America.
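The mean effect sizes quoted above can be checked directly against the d columns of Table 1. The short sketch below assumes a simple unweighted mean across the eight roles, which is the calculation that reproduces the reported figures; the variable names are illustrative only.

```python
from statistics import mean

# Effect sizes (d) for the eight roles, in the row order of Table 1.
asian_d = [.70, .38, .67, .43, .49, .59, .29, .86]
black_d = [.55, .24, .49, .49, .26, .28, .33, .47]

# Unweighted means across roles (an assumption about how the
# summary figures were obtained; it matches the reported values).
mean_asian_d = mean(asian_d)
mean_black_d = mean(black_d)

print(f"Asian d = {mean_asian_d:.2f}, Black d = {mean_black_d:.2f}")
```

Because the roles differ greatly in sample size, a sample-size-weighted mean would give different values, so these summary figures are best read as averages over job types rather than over applicants.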
However, it should be noted that although the number of job applicants examined in this UK analysis (68,821) is larger than that examined in Whetzel et al.’s meta-analysis (42,178), the data analysed by Whetzel et al. are more representative in relation to the types of SJT, the job types, the organizations, and the industrial sectors examined. That is, the Whetzel et al. analysis was based on data from 62 different studies involving different types of SJT in different organizations and different job domains, whereas this study was based on just eight job types, all involving SJTs designed and implemented by one organization in one industrial sector (food retailing). Despite this limitation, the present study covered a range of job types from the relatively simple (e.g. warehouse assistant) to the relatively complex (e.g. department manager), and the very large sample size made it possible to estimate population effect sizes with a high level of confidence. Furthermore, despite the difference between the effect size for the adverse impact of SJTs on Asians in the Whetzel et al. analysis and that in the analysis presented here, the effect sizes are broadly the same, being medium in size. This suggests that in comparison with other selection methods SJTs are around the average, producing more adverse impact than integrity tests, work samples, and personality (conscientiousness) measures (Ones & Viswesvaran, 1998; Schmidt, Ones, & Hunter, 1992; Schmitt, Clause, & Pulakos, 1996), and less than cognitive ability tests (Hunter & Hunter, 1984).

What steps can be taken to reduce the adverse impact of SJTs? The analysis of the data presented here demonstrated that differential item functioning (DIF) analysis offers a practical and potentially powerful way of identifying SJT items that are particularly problematic with respect to adverse impact. An analysis of such items can also lead to a better theoretical understanding of adverse impact in SJTs.
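As a rough illustration of how DIF screening might work in practice, the sketch below computes a Mantel-Haenszel common odds ratio for a single item. The abstract does not state which DIF procedure was used, so the Mantel-Haenszel approach, the function name `mantel_haenszel_dif`, the dichotomous (keyed versus not keyed) item scoring, and the example counts are all assumptions made for illustration, not the authors' method or data.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, total_score, correct) tuples, where group
    is "reference" or "focal", total_score is the matching variable, and
    correct is 1 if the keyed response was chosen and 0 otherwise.
    Returns (odds_ratio, delta), with delta = -2.35 * ln(odds_ratio).
    """
    # One 2x2 table per score level:
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        offset = 0 if group == "reference" else 2
        strata[score][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these data")
    odds_ratio = num / den
    return odds_ratio, -2.35 * math.log(odds_ratio)


def expand(group, score, n_correct, n_incorrect):
    """Build individual records from hypothetical cell counts."""
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * n_incorrect


# Hypothetical counts for one item at two matched total-score levels.
records = (
    expand("reference", 10, 30, 10) + expand("focal", 10, 20, 20)
    + expand("reference", 20, 36, 4) + expand("focal", 20, 30, 10)
)
odds_ratio, delta = mantel_haenszel_dif(records)
print(f"MH odds ratio = {odds_ratio:.2f}, MH delta = {delta:.2f}")
```

An odds ratio above 1 (a negative delta) indicates that, among applicants matched on total score, the reference group is more likely than the focal group to choose the keyed response. Under a commonly used convention, items with an absolute delta of about 1.5 or more would be candidates for closer review.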
For example, practitioners can conduct a content analysis of the SJT items identified by DIF analysis and remove culturally biased vocabulary. In addition, cognitive interviews can be conducted with representatives of adversely affected groups: they can ‘talk through’ how they perceive and understand these SJT items, and the items can then be edited accordingly to reduce adverse impact.

References

Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Lievens, F., & Coetsier, P. (2002). Situational tests in student selection: An examination of predictive validity, adverse impact, and construct validity. International Journal of Selection and Assessment, 10(4), 245-257.
Lievens, F., Peeters, H., & Schollaert, E. (2008). Situational judgment tests: A review of recent research. Personnel Review, 37(4), 426-441.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Use of situational judgement tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86(4), 730-740.
McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9(1-2), 103-113.
Ones, D. S., & Viswesvaran, C. (1998). Gender, age, and race differences on overt integrity tests: Results across four large-scale job applicant data sets. Journal of Applied Psychology, 83(1), 35-42.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43, 627-670.
Schmitt, N., Clause, C., & Pulakos, E. D. (1996). Subgroup differences associated with different measures of some common job-relevant constructs. In C. Cooper & I. Robertson (Eds.), International review of industrial and organizational psychology (pp. 115-140). New York: Wiley.
Weekley, J. A., & Jones, C. (1999). Further studies of situational tests. Personnel Psychology, 52, 679-700.
Whetzel, D. L., McDaniel, M. A., & Nguyen, N. T. (2008). Subgroup differences in situational judgment test performance: A meta-analysis. Human Performance, 21(3), 291-309.