Test Validation 101_7-6-2012

Test Validation 101

2012 NILG Conference

August 29: 2:45 p.m. - 3:45 p.m.

Presenters: Dan Biddle, Ph.D., and Heather Patchell, M.A.

Overview of Biddle Consulting Group, Inc.

Affirmative Action Plan (AAP) Consulting and Fulfillment

• Thousands of AAPs developed each year

• Audit and compliance assistance

• AutoAAP™ Enterprise software

HR Assessments

• AutoGOJA™ online job analysis system

• TVAP™ test validation & analysis program

• CritiCall™ pre-employment testing for 911 operators

• OPAC™ pre-employment testing for admin professionals

• Video Situational Assessments (General and Nursing)

EEO Litigation Consulting/Expert Witness Services

• 200+ cases in EEO/AA (both plaintiff and defense)

• Focus on disparate impact/validation cases

Compensation Analysis

• Proactive and litigation/enforcement pay equity studies

• COMPare™ compensation analysis software

Publications/Books

• EEO Insight™: Leading EEO Compliance Journal

• Adverse Impact (3rd ed.) / Compensation (1st ed.)

BCG Institute for Workforce Development

• 4,000+ members

• Free webinars, EEO resources/tools

Nation-Wide Speaking and Training

• Regular speakers on the national speaking circuit

Biddle Consulting Group Institute for Workforce Development (BCGi)

BCGi Standard Membership (free)

– Online community

– Monthly webinars on EEO compliance topics

– EEO Insight Journal (e-copy)

BCGi Platinum Membership

– Fully interactive online community

– Includes validation/compensation analysis books

– EEO Tools including validation surveys and AI calculator

– EEO Insight Journal (e-copy and hardcopy)

– Members-only webinars, training, and much more…

www.BCGinstitute.org

Your Presenters Today…

• Dan Biddle, Ph.D., CEO

– Over 20 years' experience in EEO/AA & Testing

– Experience in over 100 cases

– Author of Test Validation & Adverse Impact (3rd ed.)

– dan@biddle.com

• Heather Patchell, M.A.

– EEO/AA Consultant

– Executive Director of BCGi

– Master's in I/O Psychology

– hpatchell@biddle.com

Presentation Overview

• Our goal:

– Review “high level” validation criteria for four common assessment devices

– Provide basic and practical steps for validating each

– Equip you with take-home tools for validation

– Provide convincing evidence that validation produces both qualified applicants and defensible PPTs (practices, procedures, and tests)

• The assessment devices we’ll be covering include:

– Basic Qualification (BQ) screens

– Physical Ability Tests

– Interviews

– Written Tests

• Review the “Test Validation Checklist” for validating each type of device

Adverse Impact: The Trigger for the Validation Requirement

A Brief Review

Before looking at validation…when is validation required?

• Whenever your “PPT” exhibits adverse impact

• Single Event: Adj-FET / Chi-Square p < .05

• Multiple Event: Mantel-Haenszel / MEEP p < .05

• Particular PPT

• Overall Selection Process

          Pass   Fail   Totals
Women      40     60     100
Men        60     40     100

Passing Odds of Women: 67%

Passing Odds of Men: 150%

Odds Ratio: 2.25

P = .006

SD = 2.747
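The single-event figures above can be approximated with a short script. This is a minimal sketch, not BCG's own tooling. It assumes that "Adj-FET" refers to a Fisher exact test with a mid-p adjustment, and it doubles the smaller tail to get a two-tailed value, so the result may differ slightly from the slide's P = .006. The pooled two-sample Z is one common way to express a difference in standard deviations and is not necessarily how the slide's 2.747 was derived. All function names are illustrative.

```python
from math import comb, sqrt

def odds_ratio(pass_a, fail_a, pass_b, fail_b):
    """Odds of passing in group B divided by odds of passing in group A."""
    return (pass_b / fail_b) / (pass_a / fail_a)

def pooled_z(pass_a, fail_a, pass_b, fail_b):
    """Difference in pass rates (B minus A) in pooled standard-error units."""
    n_a, n_b = pass_a + fail_a, pass_b + fail_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (pass_b / n_b - pass_a / n_a) / se

def midp_fisher_two_tailed(pass_a, fail_a, pass_b, fail_b):
    """Fisher exact test with a mid-p adjustment (assumed meaning of Adj-FET)."""
    n_a, n_b = pass_a + fail_a, pass_b + fail_b
    k = pass_a + pass_b                      # total number of passers
    denom = comb(n_a + n_b, k)

    def pmf(x):
        # Hypergeometric probability that exactly x passers come from group A
        return comb(n_a, x) * comb(n_b, k - x) / denom

    lo, hi = max(0, k - n_b), min(n_a, k)
    half = 0.5 * pmf(pass_a)                 # mid-p: count half of the observed table
    lower = sum(pmf(x) for x in range(lo, pass_a)) + half
    upper = sum(pmf(x) for x in range(pass_a + 1, hi + 1)) + half
    return min(1.0, 2 * min(lower, upper))   # doubling is one two-tailed convention

women = (40, 60)   # (pass, fail) from the table
men = (60, 40)
print(f"odds ratio: {odds_ratio(*women, *men):.2f}")
print(f"pooled Z:   {pooled_z(*women, *men):.3f}")
print(f"mid-p FET:  {midp_fisher_two_tailed(*women, *men):.4f}")
```

A p-value below .05 (roughly two standard deviations or more) is the threshold the slide names for a single event.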

Adverse Impact in Context

How selection processes are challenged . . .

1. Plaintiff Burden: Does the Practice, Procedure, or Test (PPT) show a difference in rates?

– NO: END

– YES: go to step 2

2. Defense Burden: Is the PPT valid?

– NO: Plaintiff Prevails

– YES: go to step 3

3. Plaintiff Burden: Is there an Alternative Employment Practice?

– NO: Defendant Prevails

– YES: Plaintiff Prevails

**OFCCP Insight(s)**

1. The OFCCP (typically) uses overall adverse impact as a “red-flag” to identify where/when to investigate further.

2. If there is overall adverse impact, the OFCCP will investigate the PPTs in the selection process.

3. It is absolutely imperative that the employer have the data and the ability to analyze the individual steps in the overall process.

4. If the necessary data is not available to perform step analyses, the OFCCP can make an “adverse inference” (i.e., they can infer impact because the employer did not collect the data they are required to collect).

A Brief Overview of Validation

Before Discussing Particular Types of PPTs, Let’s Review Validation in General

• What is validity?

– Legally… “job related for the position in question and consistent with business necessity”

– Practically… in jury trials, the test must somehow rationally connect with the job

– With the OFCCP and other FEAs, it must comply with UGESP, the Uniform Guidelines on Employee Selection Procedures (see www.uniformguidelines.com)

– We’ll focus on just two validation methods:

o Content validation

o Criterion-related validation

Guides Related To Validation Techniques

[Figure: four sources that feed into “Validity”: Principles (SIOP), Uniform Guidelines, Joint Standards, and Court Precedence. One of them is flagged “Key!!”.]

Content Validation Process

[Figure: Job Duties → Operationally defined KSAOs (alongside Other KSAOs) → Selection Devices (e.g., application form, tests, interviews) → Content Valid!]

Criterion-related Validity

[Figure: diagram relating Job Requirements, Job Performance, and Test Score; criterion-related validity is the link between Test Score and Job Performance.]

Criterion-Related Study

[Figure: scatter plot. X-axis: score on a “test” (0 to 100). Y-axis: score on some “criteria” (e.g., job performance, days missed work, etc.) (0 to 70).]
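A criterion-related study of this kind comes down to correlating test scores with a criterion measure. The sketch below computes a Pearson correlation (the validity coefficient) from paired scores. The data are hypothetical and the function name is illustrative; a real study would also need an adequate sample and a test of statistical significance.

```python
from math import sqrt

def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between paired test and criterion scores."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data: one test score and one job performance rating per person
test_scores = [20, 40, 60, 80, 100]
criterion_scores = [10, 25, 30, 50, 60]

r = pearson_r(test_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f}")
```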

Basic Qualification Screens & Validation Requirements

Validating Basic Qualification Screens

• What are BQs? Some examples…

– “Must be able to lift and carry XX pounds for YY feet”

– College degree in XX field

– Certificate in YY field

• Basic qualifications can:

– Save the employer’s money and personnel resources

– Reduce the size of the applicant pool

– Allow qualified applicants to rise to the top

– Reduce the amount of time it takes to fill job openings

– Show applicants that the employer is serious about job standards

Questions to ask about your BQs…

• Is the BQ likely to:

– Save your employer’s money and personnel resources?

– Result in an actual benefit to the target positions?

– Have adverse impact?

– Be perceived as a form of intentional discrimination?

– Survive an OFCCP Review as:

o Noncomparative?

o Objective?

o “Job relevant” and/or “job related and consistent with business necessity”?

Before Launching the BQ, Ask:

• Is the BQ likely to:

– Represent a true “minimum baseline” needed for the first day on the job?

– Be clearly understood by applicants?

– Be uniformly applied to all applicants?

– Discriminate (distinguish between qualified and unqualified applicants)?

– Allow an equal opportunity for all applicants to demonstrate that they possess the required levels?

Two Really Important BQ Concepts!

Important Concept #1:

If BQs have Adverse Impact, they Need to be “Validated”

Important Concept #2:

“Validation” is a DIFFERENT STANDARD than the “job relevant” BQ requirement in the Internet Applicant (IA) Regulations

Validation sometimes requires a different development process than what might be used to set up “job relevant” BQs under the IA Regulations

Review Standard for BQs Depends on Whether they have Adverse Impact!

STANDARD 1: Int. App. Regs

– Is the Basic Qualification noncomparative, objective, and “job relevant”? (YES / NO)

o NO = Int. App. Regulation Violation

AND

STANDARD 2: Title VII (e.g., Guidelines, 14C6)

– Does the Basic Qualification have adverse impact? (YES / NO)

o If YES: is it “Job Related & Consistent with Bus. Necessity”?

o NO = Disp. Imp. Discrimination

o YES = Defensible

Clarification on the “Two Standards” Offered in the IA Regulations

• “That standard [the Title VII standard] is applicable as a defense where a disparate impact has already been proven” (p. 58957).

• By including the “relevant to performance of the particular position” standard in the final rule as a limitation on qualifications that could qualify as 'basic qualifications,' OFCCP intends to provide a reasonable limit on the nature of the qualifications used only to define recordkeeping obligations. OFCCP does not intend to define recordkeeping obligations through a presumption that every putative 'basic qualification' involves a disparate impact.

• Of course, once it is established that a criterion caused a disparate impact, the contractor has the burden of justifying that the criterion is job related and consistent with business necessity (p. 58957).

What Review Standards Apply to BQs?

OFCCP’s Definition of an Internet Applicant

There are no record retention obligations at this stage

Records must be retained for all job seekers during the following steps in the process.

Only job seekers who meet all 4 requirements will be analyzed in your Personnel Transactions and Adverse Impact Analyses

BQ Development & Validation Survey

• Use this survey for validating BQs

• Each row should contain incrementally higher levels of the BQ

• See Biddle (2010) Test Validation & Adverse Impact book for details

Weight Handling BQs and Physical Ability Tests

A Worked Example… Establishing Defensible BQs for Weight Handling Requirements

• Common Weight Handling BQs:

– Must be able to lift up to 50 pounds daily.

– Must be able to lift/carry 20-30 pounds routinely for an 8-hour shift.

– May be required to carry, push, pull, drag or hold up to 50 pounds.

– Person must be in excellent physical condition; be able to lift and carry 80 pounds; and be able to work under adverse conditions.

• Best Example:

– Must be able to lift and carry 54 pound boxes 100-150 times/8-hour shift for 10-30 feet each carry.

When it Comes to Setting Weight Handling BQs for Your Job Postings . . .

Honest and qualified applicants may self-select out of your hiring process!

One Method for Developing Weight Handling BQs

• Step 1: Meet with management staff and create a list of the common items that are physically handled by incumbents.

• Step 2: Obtain weights for each item.

• Step 3: Survey job experts regarding:

– the frequency with which they handle (i.e., push/pull, lift/carry, etc.) the items, and

– how they handle the items (e.g., how far, how long, etc.)

One Method for Developing Weight Handling BQs (cont.)

• Step 4: Analyze the survey data:

– Remove “outliers” (using 1.65 SD rule) and/or raters with low inter-rater reliability

– Establish “frequent” and “occasional” requirements for various physical activities (push/pull, lift/carry, and other physical requirements)

– Establish weight handling BQs for each position at a level where at least 70% of job experts agree (e.g., “70% of job experts surveyed agreed that they must be able to lift and carry at least 50 pounds 10 times a day or less”)

– Final BQ should include weight, how handled (lift, carry, push, pull, drag, roll), and duration
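Step 4 can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' procedure: it applies the 1.65 SD rule once around the mean using the sample standard deviation, reads the “70% rule” as the largest weight that at least 70% of the remaining job experts rated at or above, and uses hypothetical survey ratings. The function names are made up for the example.

```python
from statistics import mean, stdev

def remove_outliers(ratings, sd_limit=1.65):
    """Drop ratings more than sd_limit sample SDs from the mean (single pass)."""
    m, s = mean(ratings), stdev(ratings)
    if s == 0:
        return list(ratings)
    return [r for r in ratings if abs(r - m) <= sd_limit * s]

def weight_with_agreement(ratings, agreement=0.70):
    """Largest rated weight that at least `agreement` of experts rated at or above."""
    if not ratings:
        raise ValueError("no ratings supplied")
    n = len(ratings)
    for weight in sorted(set(ratings), reverse=True):
        if sum(r >= weight for r in ratings) / n >= agreement:
            return weight
    return min(ratings)  # not reached: everyone agrees on the lowest rating

# Hypothetical survey: pounds each job expert says must be lifted and carried
survey = [40, 45, 50, 50, 50, 55, 55, 60, 60, 65, 150]

trimmed = remove_outliers(survey)
bq_weight = weight_with_agreement(trimmed)
print(f"ratings kept: {trimmed}")
print(f"BQ weight supported by at least 70% of experts: {bq_weight} pounds")
```

With these made-up ratings the 150-pound response is dropped as an outlier, and 50 pounds is the highest weight that at least 70% of the remaining experts support.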

One Method for Developing Weight Handling BQs (cont.)

• Questions:

– Why establish the BQ weight using “at least 70% of job experts agreed on a weight of X”?

– Doesn’t that set the weight cutoff too high?

– Why not just use the average of their responses?

• Answers:

– After removing outliers, the dataset should represent opinions from the “normal range” of job experts

– Using the 70% rule will help ensure that at least the majority of job applicants should be able to handle that weight

– The 70% rule “trims” the highest 30% of the ratings, ensuring that the benchmark is set at a reasonable level

– Using the average could possibly set the weight requirement at a level that 50% of the job experts thought was too low

What about Jobs that have Rigorous and/or Regular Weight Handling Requirements?

• Use a physical ability test!

– Key Point: BQ screens are only self-reports!

• Rigorous physical ability tests will typically have adverse impact on women . . . therefore:

– They must be validated!

– Don’t rely on “abstract strength tests” or “body measurement methods” without statistical validity!

– Sometimes it’s better to measure physical abilities using “work sample” tests

o This helps ensure that applicants can perform the actual job, not just the “inferred” job requirements

o Applicant perception of fairness is the first trigger for lawsuits!

Validating Interviews

Interviews and the Courts

• The question is still sometimes asked…

– “Are interviews really tests?”

– Yes, they are really tests

• Any Practice, Procedure, or Test (PPT) that separates two groups (e.g., men/women) based on two possible outcomes (e.g., pass/fail) is classified as a “test” under the Uniform Guidelines.

Interview Defensibility & Validity: Some General Characteristics…

Least Defensible               Most Defensible
Unstructured                   Structured
Single Rater                   Multiple Raters
Generic “one size fits all”    Job Specific
Open Scoring/No Scoring        BARS (behaviorally anchored rating scales)
Low Validity                   High Validity

Validity (r): Unstructured = .11 - .18; Structured = .24 - .34

Litigation Involving Interviews

• Is there a connection between Interview type and success in court?

• Williamson et al. (1997). Employment interview on trial: Linking interview structure with litigation outcomes. Journal of Applied Psychology, 82(6), 900-912.

– Study involving 84 disparate treatment and 46 disparate impact cases where interviews were litigated

– 17 interview characteristics were evaluated (e.g., objective, subjective, standardized, etc.).

– Study resulted in clear findings that revealed the three primary ingredients for successful interview validity defense

Key Interview Defensibility Characteristics

• The Three Primary Factors Are…

– Interview objectivity and job relatedness, such as:

o Objective and specified criteria

o Trained interviewers

o Validation evidence

– Standardized administration, including:

o Scoring guidelines

o Minimal rater discretion

o Common questions

o Consistency

– Multiple Interviewers

o Implies a shared decision making process

o Rater reliability

Interview Rating Systems

• Rating scale: avoid 3-point; use 7- or 9-point

• Benchmark answers

• Compare responses to benchmarks

• Consider more points for certain questions

Rating Errors

• Halo/Horn Effect

• Leniency/Severity/Central Tendency

• Contrast Effect

• Biases and Stereotyping

• Fatigue

Using a Panel of Assessors

• Essential investment

• Staff and stakeholder morale

• Diverse perspectives

• Increased defensibility

• Shared responsibility in decision-making

Validating Some Common Written Tests

Types of Written Tests

Skill / Ability Tests

– Can typically be content validated

– Examples include:

o Math

o Reading Comprehension

o Language Arts

Job Knowledge Tests

– Almost always content validated

– Examples include:

o Promotional movements

o Licensure / Certification

Cognitive Ability / Personality

– Typically require criterion-related validity

Some Factors to Consider for Any Type of Written Test…

• Are we measuring KSAs that are needed on the first day of the job?

• If the test is based on content validity, are the KSAs operationally defined?

• Do we have a job analysis that can be linked to the test?

• What is the reliability of the test?

• How will the scores be used?

• Does our use of test scores exhibit adverse impact?

• If so, do we have a validation report that addresses 15B (criterion) or 15C (content) of the Guidelines?

– For commercially available tests, have we conducted a local validation study, or a 7B transportability study?

“Using” Test Scores in a Valid/Defensible Manner

How You Plan To Use Your Test Is Critical!

Pass/Fail Cutoffs:

– “Normal Expectations of Acceptable Proficiency in the Workplace”

(Guidelines, 5H)

– Modified Angoff (U.S. v. South Carolina, USSC)

Banding:

– Substantially Equally Qualified Applicants

– Statistically Driven (use Std. Error of Difference)

Ranking. For content validity:

– Is there adequate score dispersion?

– Does the test have high reliability?

– Is the KSA performance differentiating?

Weighted/combined with other tests

– How are the weights related to the job?

– Do they come from the job analysis or SME ratings?

How Tests Can Be Used

Applicant   Score
Tom         100
Stacy       100
Bob         100
Frank       100
Julie       99
Rozanne     99
Mark        98
Luke        98
Henry       97
Paul        97
Peter       96
Rebecca     96
Alyssa      95
Matthew     94
John        93
Annette     93
Ray         92
Thomas      91
Julissa     90

• Ranking assumes one applicant is reliably more qualified than the other

• Banding considers the unreliability of the test battery and “ties” applicants

• Pass/fail cutoffs treat all applicants as either “qualified” or “not qualified”

• Weighting/combining test scores can be done using a “compensatory” approach, or by applying a cutoff on each test and then weighting the results
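Banding based on the standard error of difference can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed method: it uses the standard formulas SEM = SD × sqrt(1 − reliability) and SED = SEM × sqrt(2), a 1.96 multiplier (a common but not mandatory choice), and a single band anchored at the top score. The score list comes from the slide; the SD and reliability values are hypothetical.

```python
from math import sqrt

def standard_error_of_difference(sd, reliability):
    """SED = SEM * sqrt(2), where SEM = SD * sqrt(1 - reliability)."""
    sem = sd * sqrt(1 - reliability)
    return sem * sqrt(2)

def top_band(scores, sd, reliability, multiplier=1.96):
    """Scores treated as 'substantially equally qualified' to the top score."""
    width = multiplier * standard_error_of_difference(sd, reliability)
    top = max(scores)
    band = [s for s in scores if s >= top - width]
    return band, width

scores = [100, 100, 100, 100, 99, 99, 98, 98, 97, 97,
          96, 96, 95, 94, 93, 93, 92, 91, 90]

# Hypothetical test statistics, chosen only to illustrate the arithmetic
band, width = top_band(scores, sd=5.0, reliability=0.90)
print(f"band width: {width:.2f} points")
print(f"applicants in the top band: {len(band)} of {len(scores)}")
```

A less reliable test produces a wider band, so more applicants are treated as tied; a highly reliable test narrows the band toward strict ranking.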

Characteristics of Pass/Fail Cut Scores

– NOT TYPICALLY DEFENSIBLE WHEN:

o Using an arbitrary cutoff (e.g., 70%)

o Using applicant scores to benchmark (e.g., setting cutoff scores at mean-SD of applicant scores)

– TYPICALLY DEFENSIBLE WHEN:

o Consider “Normal expectations of acceptable proficiency in the workplace” (Guidelines, 5H)

o Usually requires SME-level data or ratings

o Tied to job performance

– FACTORS TO CONSIDER:

o Is the test supported by content validity information or criterion-related information?

o How critical are the KSAs measured?

o Does the test measure “baseline” or “differentiating” KSAs?

o How would current incumbents perform on this test?

Comparison Score Uses

Factor                     Ranking                    Banding                  Pass/Fail Cutoffs
Validation Requirements    High                       Moderate                 Low
Adverse Impact             High                       Moderate                 High
Defensibility              Low                        High                     Low
Litigation "Red Flag"      High                       Moderate                 Low
Utility                    High                       Moderate                 Low
Cost                       Low                        Moderate                 High
Applicant Flow             Restrictive/Controllable   Moderate/Controllable    Low
Development Time           Low                        Moderate                 Low
Reliability Requirements   High                       Moderate                 High
# Item Requirements        High                       Moderate                 High

Setting Validated Cutoff Scores Using the “Modified Angoff” Method

[Table: ratings from 10 raters (Rater ID 1-10) on 10 test items (Item Number 1-10), each rating between 50 and 100, with a mean and SD for each rater and each item. Item summary rows:]

Item Number   1       2       3       4       5       6       7       8       9       10
Mean          79      74      71      77      82      67      64      64      64      78
SD            17.29   16.47   15.24   20.58   18.74   16.36   14.30   14.30   18.38   15.49

Overall mean: 72. Average item SD: 16.71. Rater means range from 63 to 80; average rater SD: 17.08.
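The core arithmetic of an Angoff-style cutoff is an average of averages: each rater estimates, item by item, the percentage of minimally qualified candidates expected to answer correctly; the ratings are averaged per item, and the item means are averaged again. The sketch below shows only that basic step. Any further adjustments that make the method “modified” are not covered, the small rating matrix is hypothetical, and the function name is illustrative.

```python
from statistics import mean

def angoff_cutoff(ratings):
    """ratings[r][i] is rater r's estimate (0-100) for item i.

    Returns (item_means, cutoff), where cutoff is the mean of the item means.
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every rater must rate every item")
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    return item_means, mean(item_means)

# Hypothetical ratings: 3 raters x 4 items
ratings = [
    [80, 70, 90, 60],
    [70, 60, 80, 70],
    [90, 80, 100, 50],
]
item_means, cutoff = angoff_cutoff(ratings)
print(f"item means: {item_means}")
print(f"unadjusted cutoff: {cutoff}%")

# Averaging the ten item means reported on the slide gives the slide's overall mean
slide_item_means = [79, 74, 71, 77, 82, 67, 64, 64, 64, 78]
print(f"overall mean from the slide's item means: {mean(slide_item_means)}")
```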

Test Validation Checklists

• Use these review checklists to determine the validity of your PPTs under the requirements of the Guidelines

Questions

Answers

Copyright © 2012 BCG, Inc.
