ability tests

Schedule
Today and Wednesday: Lecture
Monday, 4/08: Exam
1

Mental Measurements Yearbook (1938)
 Now in its 19th edition, updated in 2012
 You can access it online for free through our
library
 I included the Szostek & Hobson article so you can
get an idea of how important this resource is from
a legal perspective – article was published in 2011
and is up-to-date
▪ Courts have acknowledged it as the “bible of testing”
and the “authoritative source on testing”
(Just a brief mention of the MMY – you should always check the reviews for any off-the-shelf test an organization is
planning on using – not going over the other study objectives on the article)
2


For decades, tests have been classified as either
achievement tests or aptitude tests
Definitions of “achievement” and “aptitude”
 Achievement
The act of accomplishing or finishing something successfully,
especially by means of skill, practice, or perseverance
 Aptitude
A natural or acquired talent or ability or inclination; quickness in
learning and understanding - intelligence

The distinction represents the mind-body dualism typical of
traditional testing
(in this material, GFB argue that the terms “achievement test” and “aptitude test”
are inappropriate and should be replaced with the term “ability test” - and I agree with them; excellent material)
3

Achievement tests (supposedly) measure
 What a person learned as a result of a specific structured
educational/training experience/course
 Scores are interpreted to be a measure of how much an individual
knows as a result of the education or training
 English grammar, math, science, social studies, etc.
 These are the types of tests used in grade school and high school to
measure student learning/proficiency
▪ In Michigan, MEAP tests: Michigan Educational Assessment Program
4

Aptitude tests (supposedly) measure
 Accumulation of learning from a number of diverse and usually
informal learning experiences
 Although not emphasized by GFB, there is a genetic implication
▪ You have artistic ability or you don’t
▪ You have mechanical aptitude or you don’t
▪ Women don’t have an aptitude for math
▪ Men don’t have good spatial aptitude
 Said to measure potential to learn, or the potential to develop new
skills and acquire new knowledge
▪ If you don’t have the aptitude you can’t be a good artist, mechanic, mathematician
 Intelligence tests, SATs, GREs, Artistic Aptitude
 These are the tests that you are told you can’t study for (hogwash -
most people don’t say that any more)
(Olympic athletes and musicians have “natural” ability – then we learn the parent was an Olympic athlete or musician –
both parents were musicians…)
5
All tests measure what a person has learned up to the
time he or she takes the test and that is the only thing a
test can measure
 They cannot and do not measure innate or unlearned
potential (even if that existed)
 Thus, the distinction between achievement tests and
aptitude tests is arbitrary
 We should use the term “ability tests” for both types of
tests

 Ability in the sense of competence or proficiency, regardless of
how you have acquired the ability/skill
6

Tests can and do measure the prerequisites that are
necessary for further learning in a specified area, and
thus can predict future learning/performance
 If students do not do well in PSY 3600, Concepts and Principles of
Behavior Analysis, they cannot do well in PSY 4600, Survey of
Behavior Analysis Research, thus a student’s grade in PSY 3600
can predict his or her performance in PSY 4600
 You can’t balance an equation in chemistry unless you know
algebra, thus a test of algebra can predict performance in a
chemistry class
(not in text, but important to understand)
7






Mental ability tests were at the center of early critical
Supreme Court decisions regarding unfair discrimination
Thus, many companies stopped using them
However, there is a lot of research in selection that
indicates that mental ability tests are related to performance in almost all
jobs
Validity correlations are often quite high, and higher than
correlations of other tests
Many companies are now using them again
Remember, however, if you use one of these, you must
conduct an empirical validity study (or use validity
generalization - risky)
(as a behavior analyst, I still have trouble with the term “mental” ability since it still implies mind-body dualism; I’m
more comfortable, but not completely, with “cognitive” ability; I haven’t been able to come up with anything
different, and I certainly like those terms better than “intelligence” tests)
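A validity coefficient of the kind mentioned above is a correlation between test scores and a measure of job performance. As a minimal sketch of the calculation an empirical validity study rests on, the Python below computes a Pearson correlation. The scores and ratings are made-up illustration values, and `pearson_r` is a hypothetical helper name, not something from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: one mental ability test score and one supervisor
# rating per employee (illustration only, not from any real study).
test_scores = [18, 22, 25, 27, 30, 33, 36, 40]
performance_ratings = [2.9, 3.1, 3.0, 3.6, 3.4, 3.9, 4.1, 4.4]

validity_coefficient = pearson_r(test_scores, performance_ratings)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

A real study would also need an adequate sample size and a defensible performance criterion; the sketch only shows what the coefficient itself is.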
8

A rose is not a rose is not a rose
 A mental ability test is not a mental ability test is not a mental ability
test

Mental ability tests measure a collection of abilities - a
learned repertoire that typically includes:
 Verbal, math, memory, and reasoning abilities



14 different abilities are often measured in some
combination by mental ability tests (next slide)
Different mental ability tests often measure a different set
of these abilities
Thus a person may score differently on different tests of
mental ability
(FE: main abilities include some form of verbal, math, memory, and reasoning abilities)
9







Memory span
Numerical fluency
Verbal comprehension
Conceptual classification
Semantic relations
General reasoning
Conceptual foresight
Figure classification
Spatial orientation
Visualization
Intuitive reasoning
Ordering
Figure identification
Logical evaluation and deduction
(that is why if you use the PAQ you must take great care in selecting tests that are similar to the GATB tests
that are recommended)
10



The term mental ability makes it explicit that these tests
measure various cognitive abilities of the applicant (and
not some innate, unlearned, hypothetical construct called
“intelligence”)
These cognitive abilities are most directly identified by
what is measured (some combination of the 14 abilities
listed earlier) and from the content of the items
themselves
They should be thought of the same way the other
abilities discussed in the book are thought of
 e.g., mechanical ability, clerical ability

In other words, the authors are resisting the traditional
view that there is something called “intelligence”
11
I am going to show you some examples of mental
ability tests at the end of class, just to “de-mystify”
them a bit
 The authors describe the Wonderlic Personnel Test
which is probably the most popular

 Given to all players at the NFL Scouting Combine and
scores are reported to NFL teams before the annual draft

For a moment, look at items in the text that are
similar to the ones on the Wonderlic Personnel Test
12
1. Which of the following months has 30 days?
(a) February (b) June (c) August (d) December
2. Alone is the opposite of:
(a) happy (b) together (c) single (d) joyful
3. Which is the next number in this series:
1, 4, 16, 4, 16, 64, 16, 64, 256,
(a) 4 (b) 16 (c) 64 (d) 1024
(Two slides - Note: all six items are different types of items: general knowledge, opposites - verbal
comprehension and vocabulary, numerical reasoning and ordering)
13
4. Twilight is to dawn as autumn is to:
(a) winter (b) spring (c) hot (d) cold
5. If Bob can outrun Junior by 2 feet in every 5 yards of a race, how
much ahead will Bob be at 45 yards?
(a) 5 yards (b) 6 yards (c) 10 feet (d) 90 feet
6. The two words relevant and immaterial mean:
(a) the same (b) the opposite (c) neither same nor opposite
(again, notice the type of questions: semantic or verbal reasoning, numerical fluency/reasoning,
verbal comprehension - opposites)
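To de-mystify the two numerical items (3 and 5), here is a small worked check in Python. It is only a sketch: the reading of item 3 as overlapping triples, each starting four times higher than the last, is an interpretation of the printed series, and the function names are hypothetical.

```python
FEET_PER_YARD = 3


def lead_in_feet(race_yards, gain_feet=2, per_yards=5):
    """Item 5: Bob gains `gain_feet` on Junior in every `per_yards` of the race."""
    return (race_yards / per_yards) * gain_feet


def overlapping_triples(n_terms, start=1, factor=4):
    """Item 3: triples (a, 4a, 16a), where each new triple starts at 4 times the last start."""
    terms = []
    first = start
    while len(terms) < n_terms:
        terms.extend([first, first * factor, first * factor ** 2])
        first *= factor
    return terms[:n_terms]


lead_feet = lead_in_feet(45)               # nine 5-yard stretches, 2 feet gained in each
lead_yards = lead_feet / FEET_PER_YARD     # convert feet to yards
print("Item 5 lead:", lead_feet, "feet =", lead_yards, "yards")

series = overlapping_triples(10)           # the nine printed terms plus the next one
print("Item 3 series:", series)
```

Under these readings, item 5 works out to 18 feet, or 6 yards (choice b), and the tenth term of the item 3 series is 64 (choice c).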
14

What have the validity studies uniformly
concluded?
Mental ability tests are among the most
valid of all selection instruments
(work samples are the only tests that seem to be as valid, recent data suggest they have just as much adverse impact;
next slide on validity of mental ability tests as well)
15
Differences in the actual tasks that a person
performs as part of a job have very little effect on
the magnitude of the validity coefficients for
mental ability tests
 In other words, mental ability tests are valid
predictors of performance for a wide variety of jobs

16
They have repeatedly been shown to have adverse
impact on protected classes, particularly blacks and
Hispanics
 This led to the notion that these types of tests might
have differential validity - next

17

14A: What is meant by differential validity?
 Notion/hypothesis that tests are less valid for
minority groups than for non-minorities
▪ That is, a test may be significantly more valid for whites
than for blacks
▪ Term is related to test bias regarding ability tests,
particularly mental ability tests
▪ This claim is made over and over again with respect to
SATs and GREs - that those tests are more predictive of
the performance of white students than they are of the
performance of minority students
(extremely important, and mentioned often in selection as well as admissions to colleges and universities; it is
still very controversial)
18

The argument is that the content of ability tests is based
on content/items related to the white middle-class (e.g.,
vocabulary and grammar), and thus the scores of the
minorities are lower than what they should be
19

The data are very clear about this issue
Differential validity does not exist
• That is, tests are equally valid for whites and other
ethnic/racial groups
• It makes sense
– Verbal comprehension skills are verbal comprehension skills
– Verbal reasoning skills are verbal reasoning skills
– Math skills are math skills, etc.
• Thus if any of these skills are required by the job, they
should be “equally required” by whites and members of
other ethnic/racial groups
20
Meta-analyses have been consistent – there are
significant differences in mean test scores among
racial/ethnic groups
 Ranking:
Asians
whites
Hispanics
blacks

21
Cognitive ability tests have a high correlation
with job performance and academic performance
 They have a disproportionate impact on
Hispanics and blacks
 Often result in adverse impact as legally defined
when used for selection

(important, difficult issue arises)
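The slide does not spell out the legal definition. One common screen, assumed here, is the four-fifths rule of thumb from the federal Uniform Guidelines on Employee Selection Procedures: adverse impact is indicated when one group's selection rate is less than 80% of the rate for the group with the highest rate. A minimal sketch with hypothetical applicant counts (the function names are also hypothetical):

```python
def selection_rate(hired, applicants):
    """Proportion of applicants in a group who were selected."""
    return hired / applicants


def impact_ratio(focal_rate, reference_rate):
    """Focal group's selection rate relative to the highest-rate group."""
    return focal_rate / reference_rate


def four_fifths_flag(focal_rate, reference_rate, threshold=0.8):
    """True when the ratio falls below the four-fifths (80%) threshold."""
    return impact_ratio(focal_rate, reference_rate) < threshold


# Hypothetical counts for illustration only.
reference = selection_rate(hired=60, applicants=100)  # group with the highest rate
focal = selection_rate(hired=30, applicants=100)      # group being compared

ratio = impact_ratio(focal, reference)
print(f"impact ratio = {ratio:.2f}, adverse impact indicated: {four_fifths_flag(focal, reference)}")
```

A ratio below the threshold only signals adverse impact; whether the test is defensible then depends on job-relatedness.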
22
Remember, however, that adverse impact does not mean that
unfair discrimination has occurred; if the tests are job
related, then fair discrimination has occurred
SO16: Three things that make a defense against adverse
impact likely:
 Their overall validity – they are among the most valid
and least expensive tests
 Differential validity does not exist
 Adverse impact cannot be overcome by using any other
measure
23

It is not appropriate to conclude from these studies
that differences are due to
 genetic differences
 educational differences
 cultural differences

Studies do not address the reasons
(the authors want to caution anyone against drawing general conclusions as to why differences exist; there is particular concern about
race-based genetic arguments as advanced in The Bell Curve, published a number of years ago, which re-opened the debate
about race-based genetic intelligence.)
24
Cognitive ability tests are among the most valid
tests for a large number of jobs (and some selection
specialists would say for all jobs)
 Evidence also indicates that adverse impact is highly
likely with these tests

(skipping to SO19; cont. on next slide)
25


Because they are so valid, some selection specialists believe
cognitive ability tests should be used extensively in selection
Some, however, have expressed deep reservations about
using them because of the social implications of the
disqualification of larger proportions of minorities
(very nice discussion of this in text; directly quoting GFB here; cont. on next slide)
26

To some extent, the decision may reflect the
values/goals of the organization
 If goal is to maximize individual performance with minimal cost,
cognitive ability tests will do this
 If the organization has multiple goals of sustaining high performance
while maintaining a broad representation of minorities, then it would
be better to limit the use of cognitive ability tests and use other,
generally more expensive and almost equally valid instruments
▪ biodata inventories (I don’t like these as you will see next unit)
▪ structured interviews
▪ assessment centers
*The authors include work samples in their list but later in this chapter present recent data that
indicate work samples appear to have as much adverse impact as cognitive ability tests.
(that’s the rub - the expense of those other instruments)
27

If an organization has diversity as a selection goal and
wants to use cognitive ability tests because of their validity
and the fact that other options are much more expensive,
what is the main/best option?
Vigorous recruitment of minority applicants
(now back to SO17: remember race norming is not legal; often a problem because selection specialists are typically not
the ones who are responsible for recruitment –selection specialists really need to work with the HR staff)
28
The authors describe several very popular tests
Refer to this material if you are ever looking for
tests in these categories
 I am not going to have you learn anything specific
about these tests


29

Height and weight requirements have often been
challenged in court
 Adverse impact on females and Asians
The courts have rarely let them stand
The rationale for using these measures is that they
are substitute measures for strength
 But courts have consistently held that if strength is
the job requirement, then it should be measured
directly (physical ability test)


(a lot of organizations in the past; police and fire)
30


The data and information on personality tests are difficult to interpret
For many years, companies used personality tests that were
developed by clinical psychologists, and some of those tests
are still popular and being used by organizations




One is the California Psychological Inventory
Have not had good validity historically
In prior editions of the book, GFB advised against their use
They remain cautious in this one, but “cautiously optimistic”
31

There is some good work going on right now; however, the
field is in a bit of flux
 Intuitively we know that “personality” influences how effective a
person is at work, we just haven’t tapped into what the relevant KSAs
really are, or what the relevant clusters of behaviors are
 Even with the recent work, validity coefficients tend to be low, but
they do appear to add independent predictive power (above and
beyond cognitive ability tests and other types of ability tests)
32

There is some agreement in the field that personality characteristics can
be grouped into five broad dimensions called the Five-Factor Model or
Big Five
 Conscientiousness
▪ Being responsible, organized, dependable, planful, willing to achieve, and
persevering
 Emotional stability (only one described in negative terms)
▪ Being emotional, tense, insecure, nervous, excitable, apprehensive, and easily upset
 Agreeableness (relevant for team work)
▪ Being courteous, flexible, trusting, good natured, cooperative, forgiving,
softhearted, and tolerant
 Extraversion
▪ Being sociable, gregarious, assertive, talkative, and active
 Openness to experience (also called intellect or culture)
▪ Being imaginative, cultured, curious, intelligent, artistically sensitive, original and
broad minded
33

Good news: to date there has been little or no adverse impact
(a) across racial and ethnic groups and (b) between males and
females
34

Two traits have been shown to be universal
predictors, that is, valid across jobs
 Conscientiousness
 Emotional stability

The other three were found to be valid for only a
few jobs or specific criteria
 Extraversion (managers and training criteria)
 Agreeableness (team work)
 Openness to experience (training criteria)
35

If you do use a personality test, you must use a criterion-related
validity study to support it because personality
traits cannot be directly observed
 Concurrent validity
 Predictive validity
 Validity generalization
(in other words you cannot use content validity: also have some legal issues to be aware of)
36

ADA (dealt with this previously in U3)
 If a test can be, and is, used to diagnose mental/psychiatric disorders,
then it will probably be considered a medical examination under
ADA
 If it deals with other personality traits (the Big 5, for example) then
it probably will not be considered a medical examination although I
don’t know how courts would/will handle “emotional stability” as it
relates to ADA
 Nonetheless, my strong advice to you is to treat every personality
test as a medical examination until things are clarified more by the
courts
 Which means you should administer personality tests only post-offer
and keep the results in a file that is separate from the
personnel file
37

Clarifying court case, 2005, 7th Circuit Court
 MMPI is a medical examination and thus illegal for pre-
employment use (certainly that was expected)
 Psychological tests that measure personal traits such as
honesty, integrity, preferences and habits do not
constitute medical examinations
38

Right to privacy (be able to explain this as well)
 Although a right to privacy is not explicitly guaranteed under the US
Constitution, individuals are protected from unreasonable
intrusions and surveillance
 Personality tests, by their nature, reveal an individual’s thoughts
and feelings
 Several states have laws that explicitly guarantee a right to privacy
▪ To date, litigation has occurred about questions relating to sexual
inclinations and orientation and religious views
(second thorny issue)
39

Soroka v. Dayton Hudson (1991)
 California Court of Appeals stopped Dayton Hudson’s Target stores
from requiring applicants for store security positions to take a
personality test that contained questions about sexual practices and
religious beliefs
 The court also stated that employers must restrict psychological
testing to job-related questions
 The ruling was later dismissed because the parties reached a court-approved settlement
▪ Dayton Hudson agreed to stop using the personality test
▪ Divided $1.3 million among the estimated 2,500 members of the
plaintiff class who had taken the test
40

Performance or work sample tests are excellent and I
highly recommend their use when you can do them
 Typing test
 Having candidates write a computer program to solve a specific
problem
 Role playing a sales situation with an applicant for a sales position
 Having mechanics troubleshoot a problem with an engine

You are getting an actual sample of behavior under
controlled testing conditions (which permits you to easily
compare performance across applicants)
(this slide NFE)
41


From a technical perspective, they have high validity
They reduce two limitations of other selection procedures,
and both are related to verbal behavior
 Most selection procedures rely heavily on verbal behavior
▪ Written answers to questions (ability tests)
▪ Oral descriptions of abilities/skills (interviews, training and
experience evaluations)
(This slide NFE)
42

Willful distortion and faking (people want to look good)
 This varies depending on the selection procedure
▪ Reports about past experiences (interviews, T&Es) where the
information is difficult to confirm - most susceptible
▪ Personality and honesty inventories, next susceptible
▪ Ability tests, least susceptible
43

Relationship between verbal behavior and actual behavior
is not perfect (as we behavior analysts well know)
 Much of our behavior is contingency-shaped, not rule-governed
 This is particularly a problem for exemplar performers who are not
verbally fluent
▪ Automobile mechanics
▪ Plumbers
▪ Machine operators
 It can also be a problem for employees who are exemplar
performers but can’t describe what makes them exemplary
performers – sales representatives
44
Difficulty of accurately simulating job tasks that are representative
of the job
 Applicants must already have the KSAs being tested – they cannot
cover specialized things that must be learned on the job

 General sales skills OK, but questions that deal with specific company-related
products and pricing will not be

Very costly to develop and often to administer (many must be
done one-on-one)
45


Many consulting firms use stress interviews
Stress interviews
Interviewer creates a stressful situation, often by asking many questions
rapidly, not allowing much time for the applicant to respond, interrupting
the applicant frequently, acting in a semi-hostile manner, or in a cool aloof
manner

Why bad?
 Even if the job is one of high work demands that produce stress, rarely is the
situation staged in the interview representative of the actual work demands that
produce the stress
 In very few jobs is the stress related to a semi-hostile or cool/aloof stranger
rapidly firing questions
 The behavior of the applicant doesn’t readily generalize to the job and thus
should not be used as a predictor
(maybe OK for a press secretary for a politician)
46

Validity
 They both have high validity: they are two of the most
valid types of selection instruments

Adverse impact
 Equal adverse impact

Cost
 Performance tests cost much more to develop and
administer
47

Assessment centers or even the use of some of the exercises
often included in assessment centers have been highly
successful
 In-basket tests
 Leaderless group interaction tests
 Case analyses


Main problem is their time and expense to both develop and
administer
You are unlikely to become involved in designing an
assessment center, thus I am skipping them for the sake of
time
48

Refer you to the Minnich & Komaki article in U7 in the course
pack from the OBM Network News
The article describes the use of a validated in-basket test to assess the
effectiveness of managers based on Komaki’s Operant Supervisory
Taxonomy and Index
This is one of the best examples I have ever seen of the intersection of
behavior analysis and traditional I/O Psychology

Operant supervisory taxonomy and index
 Assessed the difference between high performing and low performing
managers
 Found that work sampling and type of consequence following
performance distinguished between high and low performing
managers
(Gives a detailed description of the instrument, some of the actual items, and responses, along with analysis of responses
Unfortunately, it is not commercially available – done as Minnich’s dissertation)
49

During the introduction to the course, I provided some information
about graphology
 Used as a selection tool in/by (very popular in Europe):
▪ 5,000 US companies
▪ 68% of Swiss companies
▪ 50% of French companies
▪ 80% of French selection consultants
▪ 80% of Western European countries
I am appalled, as are the authors, that a section on graphology has to
be included in a legitimate text on personnel selection and placement
but the good news is that its use appears to be declining, at least in
this country
(couldn’t resist including this; this slide NFE)
50

Graphology has no validity whatsoever as a
selection tool or,
as GFB state, “it flat out doesn’t work.”
 For validity studies, see pp. 593-594
51


(NFE) Just for fun, look at Table 14.4
Gatewood sent a handwriting sample to a graphologist
who graduated from the program conducted by the
International Graphoanalysis Society
 Four times (for each edition of the book), they
calculated reliability (same graphologist) and reported
the results with commentary by GFB
 Read pages 594-596
(love the way the authors handle this - humor and irony; OK, moving on...)
52

For all practical purposes, using the polygraph for selection is illegal
 Federal law, Employee Polygraph Protection Act of 1988
 It can be used in some specific employment situations for selection (there
are other requirements for use with current employees)
▪ Private employers whose primary business purpose is to provide security services
(e.g., protection of nuclear power facilities, public water supply facilities,
shipments or storage of radioactive or other toxic waste materials, public
transportation)
▪ Employers involved in the manufacture, distribution, or dispensing of controlled
substances
▪ Federal, state and local government employers; also private consultants or
experts under contract to governmental depts. and agencies (e.g., Defense
Dept., Energy Dept., National Security Agency, CIA, FBI)
(spies and spooks)
53

Frequency of false positives; that is, there is a high degree of error
with respect to finding that an individual is lying when in fact, the
individual is not (details below, NFE)
 Assume 90 percent accuracy (high end estimate)
 Assume rate of stealing is 5% of the working population
 If 1,000 polygraphs were given, we would expect 50 individuals would be
lying, and given 90% accuracy, 45 of those would be detected
 However, the problem lies with the other 950 individuals
 95 (950 × .10) would be identified as lying when they had not
 Thus, 140 individuals would be identified as having lied, with 68% of them
being false positives
 Not good
(text actually gives 3, I am asking you to learn the major one; (a) other reactions than guilt can trigger an
emotional response; (b) there are countermeasures that can be used to avoid detection - I am sure you can find
them on the web)
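The false-positive arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch using the slide's own assumptions (1,000 people tested, a 5% base rate of stealing, 90% accuracy applied equally to liars and non-liars); the function name and dictionary keys are hypothetical.

```python
def polygraph_outcomes(n_tested, base_rate, accuracy):
    """Expected counts when the same accuracy applies to liars and non-liars."""
    liars = n_tested * base_rate
    honest = n_tested - liars
    true_positives = liars * accuracy            # liars correctly identified
    false_positives = honest * (1 - accuracy)    # honest people identified as lying
    flagged = true_positives + false_positives
    return {
        "liars": liars,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "flagged": flagged,
        "false_positive_share": false_positives / flagged,
    }


result = polygraph_outcomes(n_tested=1000, base_rate=0.05, accuracy=0.90)
for name, value in result.items():
    if name == "false_positive_share":
        print(f"{name}: {value:.0%}")
    else:
        print(f"{name}: {value:.0f}")
```

With the slide's numbers this gives 45 detected liars, 95 false positives, 140 people flagged, and a false-positive share of about 68%. Lowering the base rate makes the share worse, which is why even an accurate test misfires so often when the behavior being screened for is rare.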
54

There are two basic types of paper and pencil
integrity tests
 Overt integrity tests
▪ Self-report inventories that measure a job applicant’s
“attitudes” and “cognitions” toward theft that might predispose
him/her to steal at work
 Personality-based measures
▪ Self-report inventories that measure integrity as part of a larger
syndrome of antisocial behavior or organizational delinquency
and thus not only measure theft but things like drug and alcohol
abuse, vandalism, sabotage, assaultive actions, insubordination,
absenteeism, excessive grievances, bogus worker
compensation claims and violence
(this slide NFE)
55




Pencil and paper integrity tests were developed to replace
polygraph testing after the Employee Polygraph
Protection Act was passed in 1988
A few states have passed laws against the use of these
tests as well, so be careful and check the state laws
Once again the reason for concern is the high number of
false positives that occur
Because of the concern about theft by employers, the use
of integrity tests is on the rise and thus more validity
studies have been conducted recently
(this slide NFE)
56


These studies indicate that these measures do correlate with
measures of theft, general counterproductive behaviors
(grievances filed, absenteeism, disciplinary actions, etc.),
and various types of job performance
They appear to be OK to use in a selection program,
however, at the current time, many still oppose their use
 False positives and the social implications of that – how would you
like to be identified as a liar and cheater when you were not?
 Frequency of false positives is unknown
57


Use of drugs and alcohol has been a major concern
since the 1960s (casting aspersions on my generation,
the hippie generation)
NFE, but paper and pencil drug tests - see items
▪ Do you think that it is OK for workers to use “soft” drugs at work if this does not
cause poor performance?
▪ In the past six months, how often have you used marijuana at work?
▪ In the past six months, have you brought cocaine to work even though you did
not use it at work?
 No public studies have evaluated either the reliability or
validity of these tests
 In one court case, the court ruled that these were illegal based on
the Fifth Amendment’s prohibition against involuntary self-incrimination
(I find it hard to believe anyone would answer these types of questions honestly!)
58





The legal status of drug testing is unclear
Organizations face less risk using drug testing for pre-employment selection (testing individuals who are
applying for a job)
They face considerably more risk if they test existing
workers for promotions (or transfers) or test workers to
detect drug users for disciplinary or counseling purposes
NFE, but why? Applicants cannot take advantage of
collective bargaining or challenge employment at-will
principles, as can employees who feel they have been
wrongly treated
NFE, Consult a good lawyer in employment law before
implementing drug testing for selection purposes!
(last slide – distribute EAS tests)
59
Questions???
60

Drug testing is not considered a medical test under
ADA
 You can administer a drug test before an offer is made

Why?
Those using illegal drugs are excluded from
coverage under ADA. Thus, while many would
consider drug testing a medical test, it is not
considered a medical test under ADA
61
15 states and the District of Columbia have passed medical marijuana
laws
 If a person has a disability and uses medical
marijuana, what about drug testing?
 Many laws protect employers with clauses like
“employers are not required to accommodate the
medical use of marijuana in any workplace.”
 However, laws are varied and there have not yet
been many cases

62

California Supreme Court, 2008
 OK to fire a worker after drug test
 Employers are under no obligation to accommodate
medical marijuana on or off the job
 The law protects the individual from criminal prosecution
but provides no protection on the job
 Why? Marijuana remains classified as an illegal substance
under federal law
(I dealt with this previously, but want you to learn this point now; so drug test away)
63

Agreed to review a case in which a customer
service consultant was “fired” (not hired) for
her legal, at-home use of marijuana
 Applicant disclosed her use during the hiring
process
 Gave the company a copy of her physician’s
authorization
 Was not hired after a pre-employment drug
screen when she tested positive for THC.
64

Don’t think so, but who knows?
(no one knows where this is going)
65