Are Employers on Safe Grounds Using Validity Generalization (VG) in Making a Title VII Defense?

2006 SWARM Regional Conference
Little Rock, Arkansas

Biddle Consulting Group, Inc.
193 Blue Ravine, Ste 270
Folsom, CA 95630
1-800-999-0438
www.biddle.com
Copyright © 2006

Contact Information

Dan A. Biddle, Ph.D.
CEO, Biddle Consulting Group, Inc.
193 Blue Ravine, Ste 270
Folsom, CA 95630
1-800-999-0438
www.biddle.com
Email: Dan@Biddle.com

Overview of Biddle Consulting Group, Inc. (BCG)

• Since 1974
• More than 200 cases in the EEO/AA area (both plaintiff and defense cases)
• Pioneers in the EEO/AA field
• Administrative Skills Testing (OPAC)
• 911 Dispatcher Testing (CritiCall)
• AAP Software and Services
• EEO Litigation Assistance (expert consulting and witness services)

Agenda

• Criterion-Related Validity
• Validity Generalization (VG)
• Title VII Requirements for Tests that Exhibit Adverse Impact
• VG, Title VII, and the Courts
• Recommendations
• Q&A

The Building Blocks for VG: Criterion-Related Validation Studies

Criterion-Related Validity

• Demonstrated by empirical data showing that the selection procedure is predictive of, or significantly correlated with, important elements of work behavior
• Relies on “correlations” between tests and job criteria

Criterion Validity

[Diagram: Test linked to Job Performance]
The strength of this relationship is reported as a “Validity Coefficient”

Criterion-Related Study

[Figure: scatterplot. X-axis: Test Score, 0 to 100. Y-axis: Performance Measure, 0 to 70, i.e., the score on some “criteria” (e.g., job performance, days missed work, etc.).]
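Worked example: computing a validity coefficient

As a rough illustration of the criterion-related validity slides above, the Python sketch below computes a Pearson correlation between test scores and a performance measure, then labels it with the U.S. DOL (1999) interpretation bands quoted in this deck. The data are hypothetical: only the points (22, 31) and (85, 55) are taken from the deck's scatterplot. The function names pearson_r and dol_band are made up for this example, and rounding to two decimals before banding is an assumption, since the published bands leave gaps such as .205.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dol_band(r):
    """Label a validity coefficient using the DOL (1999) bands from the deck.

    Assumption: the coefficient is rounded to two decimals first, because
    the bands are published to two decimals.
    """
    r = round(r, 2)
    if r > 0.35:
        return "very beneficial"
    if r >= 0.21:
        return "likely to be useful"
    if r >= 0.11:
        return "depends on circumstances"
    return "unlikely to be useful"


# Hypothetical data: ten incumbents' test scores and performance ratings.
test_scores = [22, 35, 40, 48, 55, 61, 70, 78, 85, 92]
performance = [31, 28, 40, 35, 47, 38, 52, 44, 55, 60]

validity = pearson_r(test_scores, performance)
print(f"validity coefficient r = {validity:.2f} ({dol_band(validity)})")
```

The made-up data are far cleaner than real validation data, so the resulting coefficient is much higher than the values typically reported in validation studies. The sketch shows the mechanics only; it is not a realistic effect size.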
[Figure: Criterion-Related Study scatterplot, correlation demo. X-axis: Test Score (score on a “test”), 0 to 100. Y-axis: Performance Measure, 0 to 70. Two labeled points: Test Score = 22, Performance = 31; Test Score = 85, Performance = 55.]

Interpreting Correlation Coefficients

[Scale: +1.00, +0.50, 0.00, -0.50, -1.00]
• The closer to +1.00 or -1.00, the stronger the relationship between the variables
• The stronger the relationship between two variables, the better the ability to predict one if given the other

Guidelines for Interpreting Validity Coefficients

Validity Coefficient    Interpretation
> .35                   very beneficial
.21 - .35               likely to be useful
.11 - .20               depends on circumstances
< .11                   unlikely to be useful

Source: Testing and Assessment: An Employer's Guide to Good Practices (U.S. DOL, 1999).

CRV and Statistical Power

• Power = the ability of a statistical study to find “statistical significance” if it exists
• Power is determined by:
  – Sample size (N)
  – Effect size (r)
  – “1 tail” or “2 tail” tests, and
  – Statistical significance level (p)

Statistical Power for Criterion-Related Validity Studies

[Figure: power curves. X-axis: Sample Size, 30 to 250. Y-axis: Statistical Power, 20% to 100%. One curve each for r = .20, r = .25, and r = .30.]

Validity Generalization (VG): A Brief Overview

VG = Meta Analysis Applied to Test Validation Research

• VG applies meta-analysis techniques to combine the results of several validation studies to form general theories about relationships between variables across different situations
• Schmidt & Hunter (1977) opened the gate to VG techniques in the personnel testing field

VG Uses and Applications

• VG is typically used to answer questions about how:
  – Specific Tests and/or
  – Constructs (traits or abilities)
• Predict across:
  – Criteria
  – Occupations
  – Settings

Meta-analysis Example: Results for Cognitive Ability for Police Officer Occupation (Aamodt, 2004)

Criterion                                     K    N        r       ρ
Academy                                       61   14,437    0.41    0.62
Supervisor Ratings                            61   16,231    0.16    0.27
Commendations                                 7    2,015    -0.01   -0.02
Activity                                      6    656       0.19    0.33
Absenteeism                                   5    1,402    -0.03   -0.05
Injuries                                      3    1,891    -0.06   -0.08
Discipline Problems                           13   4,850    -0.06   -0.11
Discipline Problems: Fired or Suspended       7    3,019    -0.12   -0.21
Discipline Problems: Complaints/Reprimands    6    1,831    -0.03    0.06

K = number of studies, N = sample size, r = mean correlation, ρ = mean correlation corrected for range restriction.

Study #   Validity Coefficient   Sample Size   Power (1-tail)   p-value   Valid?
1         0.030                  120           87%              0.37      No
2         0.135                  130           89%              0.06      No
3         0.180                  140           91%              0.02      Yes
4         0.290                  150           93%              0.00      Yes
5         0.340                  120           87%              0.00      Yes
6         0.180                  130           89%              0.02      Yes
7         0.150                  140           91%              0.04      Yes
8         0.110                  150           93%              0.09      No
9         0.090                  120           87%              0.16      No
10        0.126                  130           89%              0.08      No
11        0.210                  140           91%              0.01      Yes
12        0.390                  150           93%              0.00      Yes
13        0.198                  120           87%              0.02      Yes
14        0.164                  130           89%              0.03      Yes
15        0.109                  140           91%              0.10      No
16        0.094                  150           93%              0.13      No
17        0.020                  120           87%              0.41      No
18        0.114                  130           89%              0.10      No
19        0.164                  140           91%              0.03      Yes
20        0.070                  150           93%              0.20      No
21        0.010                  120           87%              0.46      No
22        0.010                  130           89%              0.46      No

• 90% power to detect r = .25 using a sample of 134 (the mean sample size across the 22 studies)
• 12 studies (over half) showed no validity in local settings
• 8 studies had low correlations (< .11)
• VG output, corrected for unreliability and range restriction (RR):
  – Direct RR: .24
  – Indirect RR: .48

Factors That Can Influence Validity From “Moving” Between Situations

Factors Before/At Testing Situation
• Sample Size
• Base Rate (% of applicants who “show up qualified”)
• Competitive Environment
• Other Selection Procedures Used Before/After the Test
• Test Content
• Test Administration Conditions (proctoring, time limits, etc.)
• Test Administration Modality (e.g., written vs. online)
• Test Use (ranked, banded, cutoffs used)
• Test Reliability (e.g., internal consistency)
• Test Bias (e.g., culturally-loaded content)

Factors Occurring After Testing
• Job Content Comparability
• Job Performance Criteria
• Reliability of Job Performance Criteria
• Level of Supervision/Autonomy
• Level/Quality of Training Provided
• Org./Unit Demands & Constraints
• Job Satisfaction
• Management Styles and Role Clarity
• Reward Structures and Processes
• Organizational Citizenship, Morale, and Commitment of the Workforce
• Organizational Culture, Norms, Beliefs, Values, Expectations Surrounding Loyalty and Conformity
• Organizational Socialization Strategies for New Employees
• Formal and Informal Communication (Style, Levels, and Networks)
• Centralization and Formalization of Decision-Making
• Organization Size
• Physical Environment

Title VII Requirements for Tests that Exhibit Adverse Impact

How Can Testing Practices Be Challenged?
Title VII Disparate Impact Discrimination Flowchart

[Flowchart, starting from TEST:]
1. Adverse Impact? NO: END. YES: go to step 2.
2. Is the PPT Valid? NO: Plaintiff Prevails. YES: go to step 3.
3. Alternative Employment Practice? NO: Defendant Prevails. YES: Plaintiff Prevails.

Test Validation & Adverse Impact

Civil Rights Act of 1991 Amends Section 703 of the 1964 Civil Rights Act (Title VII) (k)(1)(A).
An unlawful employment practice based on disparate impact is established under this title only if:

• A(i) a complaining party demonstrates that a respondent uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin, and the respondent fails to demonstrate that the challenged practice is job-related for the position in question and consistent with business necessity; OR,
• A(ii) the complaining party makes the demonstration described in subparagraph (C) with respect to an alternate employment practice, and the respondent refuses to adopt such alternative employment practice.

Uniform Guidelines Transportability (7B)

[Diagram: Job Duties Performed By Incumbents In Original Validation Study, compared with Job Duties Performed By Incumbents In New Local Situation]
Validity Can be “Transported”

EEOC v. Atlas Paper (1989, 6th Circuit)

• “. . . the expert failed to visit and inspect the Atlas office and never studied the nature and content of the Atlas clerical and office jobs involved. The VG theory utilized by Atlas with respect to this expert testimony under these circumstances is not appropriate. Linkage or similarity of jobs in dispute in this case must be shown by such on site investigation to justify application of such a theory.”
• “The premise of the VG theory . . . is that intelligence tests are always valid. The first major problem with a VG approach is that it is radically at odds with Albemarle Paper v. Moody, Griggs v. Duke Power, relevant case law within this circuit, and the EEOC Guidelines, all of which require a showing that a test is actually predictive of performance at a specific job. The VG approach simply dispenses with that similarity or manifest relationship requirement . . .” (emphasis added) (EEOC v. Atlas Paper, 868 F.2d at 1499).

VG, Title VII, and the Courts

• When the courts evaluate criterion-related validity evidence, four basic elements are typically inspected:
  – Statistical significance
  – Practical significance
  – Type and relevance of the job criteria
  – Evidence to support the specific use of the test
• VG has a difficult time answering these questions…

Recommendations for Applying VG in Personnel Testing Research

• Recommendation #1: Address the evaluation criteria provided by the Uniform Guidelines, Joint Standards, and SIOP Principles regarding the internal quality of the VG study. This will help ensure that the VG study itself can be relied upon for drawing inferences.
• Key Factors:
  – Publication Bias
  – Corrections Made and Underlying Assumptions/Justifications
  – Similarities of Tests and Criteria

• Recommendation #2: Address the criteria provided by the Uniform Guidelines, Joint Standards, and SIOP Principles regarding the similarity between the VG study and the local situation.
  – Helps to ensure that the VG study can be relied upon and that the research is relevant to the local situation (similarities between tests, jobs, job criteria, etc.).
  – The most critical factor evaluated by courts when considering VG evidence is the similarity between jobs (see also 7B of the Uniform Guidelines).
  – VG evidence is strongest where there is clear evidence, shown by a job analysis in both situations, that the job duties of the target position and of the positions in the VG study are highly similar.

• Recommendation #3: Use VG evidence only to supplement other sources of validity evidence (e.g., content validity or local criterion-related validation studies), not as the sole source.
  – Supplementing a local criterion-related validity study with evidence from a VG study may be useful if an employer has evidence that statistical artifacts (not situational moderators) suppressed the actual validity of the test in the local situation (provided that the job comparability criteria of 7B UGESP have been met).

• Recommendation #4: Evaluate the test fairness evidence from the VG study using the methods outlined by the Uniform Guidelines, Joint Standards, and SIOP Principles.
• Recommendation #5: Evaluate and consider using “alternate employment practices” that are “substantially equally valid” (as required by the 1991 Civil Rights Act Section 2000e-2[k][1][A][ii] and Section 3B of the Uniform Guidelines).

Thank you!
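
Backup: checking the power and p-values in the 22-study example

The deck does not say how the power and p-value columns of the 22-study table were computed. The Python sketch below is one plausible reconstruction, not the author's method. It assumes a one-tailed test at the .05 level and uses the Fisher z normal approximation for a correlation. The function names are hypothetical, and the (r, N) pairs are a handful of rows copied from the table. Under those assumptions the results should land close to the table's figures, including roughly 90% power to detect r = .25 at N = 134.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_one_tailed(rho, n, alpha=0.05):
    """Approximate power to detect a true correlation rho with n cases.

    Assumption: one-tailed test of H0: rho = 0, Fisher z approximation.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    expected_z = math.atanh(rho) * math.sqrt(n - 3)
    return STD_NORMAL.cdf(expected_z - z_crit)


def p_value_one_tailed(r, n):
    """Approximate one-tailed p-value for an observed correlation r."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 1 - STD_NORMAL.cdf(z)


# Power to detect r = .25 at the sample sizes used in the table, plus the
# mean sample size of 134.
for n in (120, 130, 134, 140, 150):
    print(f"N = {n}: power = {power_one_tailed(0.25, n):.0%}")

# A few (study #, r, N) rows copied from the 22-study table.
studies = [(1, 0.030, 120), (2, 0.135, 130), (3, 0.180, 140), (7, 0.150, 140)]
for study, r, n in studies:
    p = p_value_one_tailed(r, n)
    verdict = "Yes" if p < 0.05 else "No"
    print(f"Study {study}: r = {r:.3f}, N = {n}, p = {p:.2f}, significant: {verdict}")
```

Comparing p with .05 appears to be the rule behind the table's “Valid?” column, although the deck does not state this. An exact t-test would give slightly different p-values from this normal approximation.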