# Unit 5 Lecture

## PSY 6430 Unit 5: Validity

Determining whether the selection instruments are job-related

- Lecture: Wednesday, 3/04
- Lecture: Wednesday, 3/18
- Exam: Monday, 3/23
- Spring break: 3/09 and 3/11
- ME1: Monday, 3/16
- Last day to withdraw: Monday, 3/23
## SO1: NFE, Validity, a little review

- Predictor = test/selection instrument
- Use the score from the test to predict who will perform well on the job
- Possible confusion (again)
  - You need to determine the validity of the test based on current employees
  - Then you administer it to applicants and select employees based on the score

(A few students had a problem distinguishing between validity and reliability on E4; example next.)
## SO1: NFE, Validity, example

- Administer a test to current employees
- Obtain measures of how well they perform on the job
- Correlate the test scores with the performance measures
- Assume: The correlation is statistically significant
- Assume: Current employees who score 50-75 also are performing very well on the job
- Now you administer the exam to applicants, predicting that those who score 50-75 will also perform well on the job

(main point next slide)
## SO1: NFE, Validity main point

- You determine the validity of a selection test or instrument based on your current employees
- Then, after establishing the validity or job-relatedness of the test
- Give the test to applicants and select them on the basis of their test scores
## SO2: Reliability vs. Validity

- Reliability
  - Operational definition: Is the score on the measure stable, dependable, and/or consistent?
  - Conceptual definition: Are you actually measuring what you want to be measuring?
- Validity
  - Is the measure related to performance on the job?
## SO3: Relationship between reliability and validity

- A measure can be reliable, but not valid
- However, a measure cannot be valid unless it is reliable
- *Reliability is a necessary but not sufficient condition for validity
- Text gives a perfect example
  - You can reliably measure eye color; however, it may not be related to job performance at all

*key point
## Types of validation procedures

- Content: expert judgment
- Criterion-related: statistical analyses (concurrent & predictive)
- Construct (but not practical - not covering this)
- Validity generalization (transportable, no local validity study - jobs are similar)
- Job component validity (not covering this in this unit; elements/components based on all possible jobs)
- Small businesses: Synthetic validity (not covering it, not very relevant now - content validity)

(Main types are the two kinds of criterion-related validity and content validity. Construct validity is really a holdover from test construction - not very relevant; I have only seen it used by a few organizations that create their own tests. Will cover validity generalization, but right now, while validity generalization has excellent professional support, it may not be legal - professional guidelines depart from legal; in one case, the 6th Circuit Court ruled it illegal as a matter of law based on Griggs/Duke and Albemarle - 1987.)
## SO5 NFE, but 7B is: Difference between content and criterion-related validity

- Criterion-related validity is also called "empirical" validity
  - Concurrent validity
  - Predictive validity
- This type of validity relies on statistical analyses (correlation of test scores with measures of job performance)
  - Measures of job performance = criterion scores

(content next slide)
## SO5 NFE, but related to 7B, which is: Difference between content and criterion-related validity

- Content validity, in contrast, relies on expert judgment and a match between the "content" of the job and the "content" of the test
- Expert judgment refers to
  - the determination of the tasks and KSAs required to perform the job via a very detailed type of job analysis
  - linking the KSAs to selection procedures that measure them
## Intro to content validity

- You do NOT use statistical correlation to validate
  - Validation is based "only" on your job analysis procedures and descriptively linking the KSAs to selection measures
- It is much more widely used than criterion-related validity
  - Particularly since the Supreme Court ruled it was OK to use

(again, to emphasize)
## SO6: Two reasons why content validity is often used

- It can be used with small numbers of employees
  - Large sample sizes are required to use criterion-related validity due to the correlation procedures
  - The text later, when talking about criterion-related validity, indicates you may need over several hundred
  - Dickinson: usually 50-100 is adequate
  - How many companies have that many current employees in one position?

(small number of incumbents)
## SO6: Two reasons why content validity is often used

- Many organizations do not have good job performance measures
- You need good performance criterion measures to do a criterion-related validity study because you correlate the test scores with job performance measures
## SO7A: Content vs. criterion-related validity and the type of selection procedure

- If you use content validity you should write the test, not select an off-the-shelf test
- If you use criterion-related validity, you can do either
  - It is much easier and less time consuming to use an off-the-shelf test than to write one!

(VERY IMPORTANT! The book waffles on this a bit, indicating that emphasis should be placed on constructing a test, but only in rare situations would I recommend selecting an off-the-shelf test with content validity - legally too risky; why, next slide.)
## SO7A: Why should you write the test if you use content validity? (this slide, NFE)

- Content validity relies solely on the job analysis
- The KSAs must be represented proportionately on the selection test as indicated in the job analysis in terms of:
  - Their relative importance to the job
  - The percentage of time they are used by the employees
- It is highly unlikely that an off-the-shelf test will proportionately represent the KSAs as determined by your job analysis
- In some discrimination court cases, the judge has gone through the test item by item to determine whether the items were truly proportional to the KSAs as determined by the job analysis
- There are both professional measurement and legal reasons to write the test rather than using an off-the-shelf test
## SO7B: Content vs. criterion-related validity: Differences in the basic method used to determine validity (review)

- Content validity
  - Relies solely on expert judgment - no statistical verification of job-relatedness
- Criterion-related validity
  - Relies on statistical verification to determine job-relatedness

(I am not going to talk about SO8, face validity; very straightforward)
## SO9: What is the "heart" of any validation study and why?

- Job analysis
- The job analysis determines the content domain of the job - the tasks and KSAs that are required to perform the job successfully
## SO10: Major steps of content validity - very, very specific requirements for the job analysis

- *Determine the criticality and/or importance of each task
  - *Now, because of ADA, is it an essential function?
- Specify the KSAs required for EACH task

(cont. next slide)
## SO10: Major steps of content validity, cont.

- Determine the criticality and/or importance of each KSA*
- Operationally define each KSA
- Describe the relationship between each KSA and each task statement
  - You can have KSAs that are required for only one or two tasks, or you can have KSAs that are required to perform several tasks
  - The more tasks that require the KSAs, the more important/critical they are
- Describe the complexity or difficulty of obtaining each KSA (formal degree, experience)
- Specify whether the employee must possess each KSA upon entry or whether it can be acquired on the job (cannot test for a KSA if it can be learned within 6 months)
- Indicate whether each KSA is necessary/essential for successful performance of the job

*Only the first major point will be required for the exam, but I want to stress how detailed your job analysis must be for content validity

(cont. on next slide)
## SO10: Major steps of content validity, cont.

- Reverse analysis: you have linked the KSAs to the tasks; now you must link each KSA back to the tasks for which it is relevant
  - KSA #1 may be relevant to Tasks 1, 6, 7, 10, 12, & 22
  - KSA #2 may be relevant to Tasks 2, 4, & 5
  - Etc.
- (NFE) Develop a test matrix for the KSAs
- If you want to see how you go from the task analysis to the actual test, turn ahead to Figures 7.12, 7.13, 7.14, 7.15, and 7.16 on pages 283-286 and Figure 7.17 on page 290
## SO11: When you can't use content validity according to the Uniform Guidelines

- When assessing mental processes, psychological constructs, or personality traits that cannot be directly observed, but are only inferred
  - You cannot use content validity to justify a test for judgment, integrity, dependability, extroversion, flexibility, motivation, conscientiousness, adaptability, or any personality characteristic
  - The reason is that you are basing your job analysis on expert judgment - and judgment is only going to be reliable if you are dealing with concrete KSAs such as mechanical ability or arithmetic
  - The more abstract the KSA, the less reliable judgment becomes
  - If you can't see it, if you can't observe it, then the leap from the task statements to the KSAs can result in a lot of error

(The text mentions three; I am having you learn the first one and one I added in the SOs - these are the two that are most violated in practice. The second one is relevant to BOTH content and criterion-related validity, so it shouldn't be listed under when you can't use content validity: you cannot test for KSAs that can be learned on the job.)
## SO11: When you can't use content validity according to the Uniform Guidelines, cont.

- When selection is done by ranking test scores or banding them (from U1)
  - If you rank order candidates based on their test scores and select on that basis, you cannot use content validity - you must use criterion-related validity
  - If you band scores together, so those who get a score in a specified range of scores are all considered equally qualified, you cannot use content validity - you must use criterion-related validity
  - Why? If you use ranking or banding, you must be able to prove that individuals who score higher on the test will perform better on the job - the only way to do that is through the use of statistics
- The only appropriate (and legally acceptable) cut-off score procedure to use is a pass/fail system where everyone above the cut-off score is considered equally qualified
## Criterion-related validity studies: Concurrent vs. predictive

- SO13A: Concurrent validity
  - Administer the predictor to current employees and correlate scores with measures of job performance
  - Concurrent in the sense that you have collected both measures at the same time for current employees
- SO18A: Predictive validity
  - Administer the predictor to applicants, hire the applicants, and then correlate scores with measures of job performance collected 6-12 months later
  - Predictive in the sense that you do not have measures of job performance when you administer the test - you collect them later

(comparison of the two; SO13A, describe concurrent validity; SO18A, describe predictive validity)
## Predictive Validity: Three basic ways to do it

- Pure predictive validity: by far the best
  - Administer the test to applicants and randomly hire
- Current system: next best, more practical
  - Administer the test to applicants, use the current selection system to hire (NOT the test)
- Select by test score: unsound professionally and legally
  - Administer the test, and use the test scores to hire applicants

(going to come back to these and explain the evaluations; the text lists the third as an approach! Click: NO!!)
## SO13B: Steps for conducting a concurrent validity study

- Job analysis: Absolutely a legal requirement
  - Discrepancy between law and profession (learn for exam)
    - Law requires a job analysis (if adverse impact & challenged)
    - Profession does not, as long as the test scores correlate significantly with measures of job performance
- Determine KSAs and other relevant requirements from the job analysis, including essential functions for purposes of ADA
- Select or write test based on KSAs (learn for exam)
  - May select an off-the-shelf test or
  - Write/construct one
## SO13B: Steps for conducting a concurrent validity study

- Select or develop measures of job performance
  - Sometimes a BIG impediment because organizations often do not have good measures of performance
- Administer the test to current employees and collect job performance measures for them
- Correlate the test scores with the job performance measures
- (SO14: add this step) Determine whether the correlation is statistically significant at the .05 level
- You can then use the test to select future job applicants
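The SO14 significance check can be sketched with the standard t-test for a Pearson correlation. The sample values and the tabled critical value below are my own illustration, not from the text.

```python
# Hypothetical sketch of the SO14 step: is a validity coefficient
# statistically significant at the .05 level? Uses the usual t-test
# for a Pearson r; the critical value comes from a t table.
import math

def t_for_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given r from n paired scores."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.40, 50                 # invented example: r = .40 from 50 employees
t = t_for_r(r, n)               # df = n - 2 = 48
t_critical = 2.011              # two-tailed .05 critical t for df = 48
print(f"t = {t:.2f}, significant at .05: {t > t_critical}")
```

Here t is about 3.0, which clears the .05 bar, so this hypothetical test would be declared valid and could be used with applicants.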
## Advantage of concurrent validity over predictive validity

- Because you are using the test data and performance data from current employees, you can conduct the statistical validation study quickly - in a relatively short period of time
- Remember that with predictive validity, you must hire applicants and then wait 6-12 months to obtain measures of job performance (post-training, after they have learned the job)
## SO15B&C: The basic reason that accounts for all of the weaknesses with concurrent validity

- All of the weaknesses have to do with differences between your current employees and applicants for the job
- You are conducting your study with one sample of the population (your employees) and assuming conceptually that your applicants are from the same population
- However, your applicants may not be from the same population - they may differ in important ways from your current employees
- Ways that would cause them (as a group) to score differently on the test or perform differently on the job, affecting the correlation (job relatedness) of the test

*The first point is related to B; the other points are related to and essential to C.

(The text lists several weaknesses and all of them really relate to one issue; dealing with inferential statistics here.)
## SO15D: Some specific differences

- Job tenure: If your current employees have been on the job a long time, it is likely to affect both their test scores and job performance measures
- Age & education: Baby boomers vs. Generation Xers vs. millennials; high school vs. college vs. graduate degree
- Different motivational level: Employees already have a job, thus they may not be as motivated to perform well on the test; on personality measures, applicants may be more motivated to alter their responses to make themselves look good
- Exclusiveness of current employees: The sample doesn't include those who were rejected, those who were fired, those who left the organization, and employees who were promoted, which can affect both test and performance scores

(SO asks you to learn any three)
## SO16: Restriction in range

- This is the term used for the statistical/mathematical reason why the differences between your current employees and applicants affect validity
- It also explains, from the last unit, why reliability is generally higher when
  - Your sample consists of individuals who have greater differences in the ability for which you are testing
    - High school students and community college students vs. engineering majors in college who take a math test
  - The questions are moderately difficult - about 50% of test takers answer the questions correctly - rather than when the questions are very easy or very difficult
## SO16: Restriction in range

- With criterion-related validity studies, the ultimate proof that your selection test is job related is that the correlation between the test scores and job performance measures is statistically significant
- A high positive correlation tells you
  - People who score well on the test also perform well
  - People who score middling on the test are also middling performers
  - People who score poorly on the test also perform poorly on the job
- In order to obtain a strong correlation you need
  - People who score high, medium, and low on the test
  - People who score high, medium, and low on the performance measure

(Before really understanding the weaknesses related to concurrent validity and why pure predictive validity is the most sound type of validation procedure, you need to understand what "restriction in range" is and how it affects the correlation coefficient; related to some of the material from the last unit on reliability - so if you understood it in that context, this is the same conceptual issue.)
## SO16: Restriction in range, cont.

- That is, you need a range of scores on BOTH the test and the criterion measure in order to get a strong correlation
  - If you only have individuals who score about the same on the exam, regardless of whether some perform well, middling, or poorly, you will get a zero correlation
  - Similarly, if you have individuals who score high, medium, and low on the test, but they all perform reasonably the same, you will get a zero correlation
- Any procedure/factor that decreases the range of scores on either the test or the performance measure
  - Reduces the correlation between the two and, hence,
  - Underestimates the true relationship between the test and job performance
  - That is, you may conclude that your test is NOT valid when, in fact, it may be
## SO16: Restriction in range, cont.

- Restriction in range is the technical term for the decrease in the range of scores on either or both the test and criterion
- Concurrent validity tends to restrict the range of scores on BOTH the test and criterion, hence underestimating the true validity of a test

(stress the either or both; cont. on next slide)
## SO16: Restriction in range, cont. Also related to SO17A&B

- Why? You are using current employees in your sample
  - Your current employees have not been fired because of poor performance
  - Your current employees have not voluntarily left the company because of poor performance
  - Your current employees have been doing the job for a while and thus are more experienced
- All of the above would be expected to
  - Result in higher test scores than for the population of applicants
  - Result in higher performance scores than for the population
  - Thus, restricting the range of scores on both the test and the performance criterion measure

(diagrams on next slide)
## SO16: Restriction in range, cont.

- Top diagram (scatterplot, test scores on the x-axis, performance on the y-axis): no restriction in range; strong correlation
- Bottom diagram (same axes): restriction in range on both test scores and performance scores; zero correlation

(extreme example, but demonstrates the point - concurrent validity is likely to restrict range on both, underestimating true validity)
## SO18: Predictive validity

- SO18A: Predictive validity (review)
  - Administer the predictor to applicants, hire the applicants, and then correlate scores with measures of job performance collected 6-12 months later
  - Predictive in the sense that you do not have measures of job performance when you administer the test - you collect them later; hence, you can determine how well your test actually predicts future performance
## SO18B: Steps for a predictive validity study

- Job analysis: Absolutely a legal requirement
- Determine KSAs and other relevant requirements from the job analysis, including the essential functions for purposes of ADA
- Select or write test based on KSAs*
  - You may select an off-the-shelf test or
  - Write/construct one
- Select or develop measures of job performance

*Learn this point for the exam

(first four steps are exactly the same as for a concurrent validity study)
## SO18B: Steps for a predictive validity study

- Administer the test to job applicants and select randomly or using the existing selection system
  - Do NOT use the test scores to hire applicants (I'll come back to this later)
- After a suitable time period, 6-12 months, collect job performance measures
- Correlate the test scores with the performance measures
- (SO18B: add this step) Determine whether the correlation is statistically significant; if it is, your test is valid
## SO19: Two practical (not professional) weaknesses of predictive validity

- Time it takes to validate the test
  - Need an appropriate time interval after applicants are hired before collecting job performance measures
  - If the organization only hires a few applicants per month, it may take months or even a year or more to obtain a large enough sample to conduct a predictive validity study (N=50-100)
## SO19: Two practical (not professional) weaknesses of predictive validity

- Very, very difficult to get managers to ignore the test data (politically very difficult)
  - Next to impossible to get an organization to randomly hire - some poor employees ARE going to be hired
  - Also difficult to convince them to hire using the existing selection system without using the test score (but much easier than getting them to randomly hire, and doable)

(I don't blame them; it would be like us randomly accepting students into the graduate program)
## SO20A&B: Predictive validity designs

- Figure 5.5 lists 5 types of predictive validity designs
- Follow-up: Random selection (pure predictive validity)
  - Best design
  - No problems whatsoever from a measurement perspective; completely uncontaminated from a professional perspective
- Follow-up: Use present system to select
  - OK and more practical, but
  - It will underestimate validity if your current selection system is valid; and the more valid it is, the more it will underestimate
  - And why will it underestimate the validity?
## SO20C: Predictive validity, selection by scores

- Select by test score: Do NOT do this!!!
- Professional reason:
  - If your selection procedure is job related, it will greatly underestimate your validity - and the more job related the selection procedure is, the greater it will underestimate validity
  - In fact, you are likely to conclude that your test is not valid when in fact it is
  - Why? If your test is valid, you are severely restricting the range of both the test scores and the performance measures!

(professional and legal reasons not to do this)
## SO20C: Predictive validity, selection by scores

- Legal reason:
  - If adverse impact occurs, you open yourself up to an unfair discrimination lawsuit
  - You have adverse impact, but you do not know whether the test is job related
- There is a caveat (NFE): Some courts have ruled that adverse impact is OK if a validation study is in progress. However, I see this as being way too risky legally (particularly given the technical problems with this method).
## SO20: NFE, Further explanation of types of predictive validity studies

- Hire, then test, and later correlate test scores and job performance measures
  - If you randomly hire, this is no different than pure predictive validity: #1 previously, Follow-up: Random selection
  - If you hire based on the current selection system, this is no different than #2 previously, Follow-up: Select based on current system

(one more slide on this)
## SO20: NFE, Further explanation of types of predictive validity studies

- Personnel file research: Applicants are hired and their personnel records contain test scores or other information that could be used as a predictor (e.g., perhaps from a formal training program). At a later date, job performance scores are obtained.
## For exam: Rank order of criterion-related validity studies in terms of professional measurement standards

- 1: Predictive validity (pure) - randomly hire
- 2.5 (tie): Predictive validity - use current selection system
- 2.5 (tie): Concurrent validity
- 4: Predictive validity - use test scores to hire
## Which is better: Predictive vs. concurrent, research results (NFE)

- Data that exist suggest that:
  - Concurrent validity is just as good as predictive validity for ability tests (most data)
  - May not be true for other types of tests, such as personality and integrity tests
    - Studies have shown differences between the two for these types of tests - so proceed with caution!
  - Perhaps not too surprising - as discussed earlier, applicants may falsify their answers more to look better than current employees

(Conceptually, predictive validity is better; it has more fidelity with, is more similar to, the actual selection procedure: test applicants, select, and see how well they do on the job later.)
## SO21: Sample size needed for a criterion-related validity study (review)

- Large samples are necessary
  - The text indicates that over several hundred employees are often necessary
  - Dickinson maintains that a sample of 50-100 is usually adequate
- What do companies do if they do not have that many employees?
  - They use content validity
  - They could possibly also use validity generalization, but even though this would be professionally acceptable, at the current time it is still legally risky
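A back-of-the-envelope sketch (my illustration, not from the text) shows why sizable samples matter: the smallest n at which a given validity coefficient clears the two-tailed .05 bar, using the normal approximation (critical value 1.96) to the t-test for a correlation.

```python
# Smallest sample size at which a correlation of size r reaches
# two-tailed .05 significance, using t = r*sqrt(n-2)/sqrt(1-r^2)
# with the normal-approximation critical value 1.96.
import math

def min_n(r: float, critical: float = 1.96) -> int:
    """Smallest n whose t statistic for r exceeds the critical value."""
    n = 3
    while r * math.sqrt(n - 2) / math.sqrt(1 - r * r) <= critical:
        n += 1
    return n

for r in (0.20, 0.30, 0.50):
    print(f"r = {r:.2f} needs roughly n = {min_n(r)}")
```

Under this approximation, a validity of .30 needs roughly 40 employees while .20 needs closer to 100, which is consistent with the 50-100 rule of thumb for typical validities; note that statistical significance alone says nothing about whether the correlation is large enough to be useful.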
## SO23: NFE, Construct validity

- Every selection textbook covers construct validity
- I am not covering it for reasons indicated in the SOs, but will talk about it at the end of class if I have time
- The basic reason for not covering it is that while construct validity is highly relevant for test construction, very, very few organizations use this approach - it's too time consuming and expensive
  - First, the organization develops a test and determines whether it is really measuring what it is supposed to be measuring
  - Then, they determine whether the test is job related
## SO27: Validity generalization, what it is

- Validity generalization is considered to be a form of criterion-related validity, but you don't have to conduct a "local" validity study - that is, you don't have to conduct a validity study in your own organization
- Rather, you take validity data from other organizations for the same or very similar positions and use those data to justify the use of the selection test(s)
  - Common jobs: computer programmers and systems analysts, set-up mechanics, clerk typists, sales representatives, etc.

(I am skipping to SO27 for the moment; SOs 24-26 relate to statistical concepts about correlation. The organization of this chapter is just awkward. I want to present all of the validity procedures together, and then compare them with respect to when you should/can use one or the other. Then, I'll return to SOs 24-26; cont. on next slide.)
## SO27: Validity generalization, what it is

- The assumption is that those data will generalize to your position and organization
- Thus, you can use this approach if you have a very small number of employees and/or applicants*

*Note this point well
## SO28: Validity generalization, cont.

- Testing experts completely accept the legitimacy of validity generalization
  - Primarily based on the stellar work of Schmidt and Hunter (who was a professor at MSU until he retired)
  - Gatewood, Feild, & Barrick believe this has a bright future
  - Frank Landy (also a legend in traditional I/O) is more pessimistic
- Wording of the CRA of 1991 may have made this illegal
  - There has not been a test case
  - No one wants to be the test case (you should not be the test case)

(this slide, NFE; cont. on next slide)
## SO28: Validity generalization, cont.

- We have actually come full circle with respect to validity generalization and its acceptance by testing specialists
  - In the early days of testing, validity generalization was accepted
    - If a test was valid for a particular job in one organization, it would be valid for the same or a similar position in another organization
  - It then fell into disfavor, with testing specialists reversing their position and adhering to situational specificity
  - Now, based on Schmidt and Hunter's work, it is again embraced by testing specialists

(this slide, also NFE)
## SO29 FE: Two reasons why CRA 1991 may make validity generalization illegal

Both reasons relate to the wording in the CRA that the only acceptable criterion measure (job performance measure) is actual job performance.

1. Criterion-related validity studies have often included the use of personnel data such as absenteeism, turnover, accident rates, training data, etc. as the criterion, or in multiple regression/correlation studies as one or more of the criteria - this may not be considered job performance under CRA 1991
2. If courts interpret "actual" in actual job performance literally, then the courts could maintain that only the performance of the particular organization's workers would be an acceptable criterion measure. That is, courts could require local validity studies, maintaining that the performance criteria data from other organizations are not the "actual" performance of employees in a particular organization.
## SO31: Interesting fact (and for the exam)

- In a 1983 random survey of 1,000 organizations listed in Dun's Business Rankings with 200 or more employees, the percentage of firms indicating that they had conducted validation studies of their selection measures was: 24%
- In today's legal environment, the other organizations could find themselves in a whole world of hurt!

(granted, old data and I couldn't find any new data; click, click!)
## Factors that affect the type of validity study: When to use which validity strategy

- Four main factors that influence the type of validity study you can do
  - Sample size
  - Cut-off score procedures
  - Type of attribute measured: observable or not
  - Type of test: write or off-the-shelf

(On the exam, I am likely to give you situations and ask you, given the situation, what type of validity strategy could you use and why: that is, what options do you have? That's exactly the type of decision you are going to have to make in organizations. So, to make it
## Factors that affect the type of validity study: When to use which validity strategy

- Sample size
  - Large # of employees (all forms OK): Concurrent, Predictive, Content, Validity generalization
  - Small # of employees: Content, Validity generalization

(it's OK to use content and validity generalization with large sample sizes; many orgs do use content!)
## Factors that affect the type of validity study: When to use which validity strategy

- Cut-off score procedures
  - Minimum (pass/fail) (all forms OK): Concurrent, Predictive, Content, Validity generalization
  - Ranking or banding (only criterion-related - all but content): Concurrent, Predictive, Validity generalization

(validity generalization is based on correlation, even if you don't do the study yourself, so remember it is considered a type of criterion-related study)
## Factors that affect the type of validity study: When to use which validity strategy

- Attribute being measured
  - Observable (all forms OK): Concurrent, Predictive, Content, Validity generalization
  - Not observable (only criterion-related - all but content): Concurrent, Predictive, Validity generalization

(personality, extraversion, social sensitivity, flexibility, integrity, etc.)
## Factors that affect the type of validity study: When to use which validity strategy

- Type of test
  - Write/construct (all forms OK): Concurrent, Predictive, Content, Validity generalization
  - Off-the-shelf (only criterion-related - all but content): Concurrent, Predictive, Validity generalization

(next slide, back to SO24; interpretation of validity correlation)
## SO24: Statistical interpretation of a validity coefficient

- Recall, r = correlation coefficient
- r² = coefficient of determination
- Coefficient of determination: the percentage of variance on the criterion that can be explained by the variance associated with the test
- r = .50; to statistically interpret it:
  - r² = .25
  - 25% of the variance on job performance can be explained by the variance on the test
- Less technical, but OK:
  - 25% of the differences between individuals on the job performance measure can be accounted for by differences in their test scores

(back to stats: SO24&25)
SO25: Validity vs. reliability correlations
61

You interpret a validity correlation coefficient
very differently than a reliability correlation
coefficient
You square a validity correlation coefficient
You do NOT square a reliability correlation coefficient
Why?
With a reliability correlation coefficient you are basically
correlating a measure with itself
Test-retest reliability
Parallel or alternate form reliability
Internal consistency reliability (split-half)
(I am not going to go into the math to prove that to you)
SO25B: Validity vs. reliability correlations,
examples for test
62
You correlate the test scores from a mechanical ability test
with a measure of job performance
The resulting correlation coefficient is .40
How would you statistically interpret that?
16% of the differences in the job performance of
individuals can be accounted for by the differences
in their test scores
(note carefully, you do not multiply it by two, you square it!)
SO25B: Validity vs. reliability correlations,
examples for test
63
You administer a computer programming test to a group of
individuals, wait 3 months and administer the same test to
the same group of individuals.
The resulting correlation coefficient is .90
How do you statistically interpret that correlation
coefficient?
90% of the differences in the test scores between
individuals are due to true differences in computer
programming skill and 10% of the differences are due
to error
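The two interpretation rules from SO25 can be contrasted in a short sketch (function names and output strings are mine; the .40 and .90 values come from the slides):

```python
# Contrast of SO25's rules: square a validity coefficient, but do NOT
# square a reliability coefficient. Function names are invented.

def interpret_validity(r: float) -> str:
    pct = round((r ** 2) * 100)  # square it: coefficient of determination
    return f"{pct}% of the differences in job performance accounted for by test scores"

def interpret_reliability(r: float) -> str:
    true_pct = round(r * 100)    # do NOT square it
    return f"{true_pct}% true differences, {100 - true_pct}% error"

print(interpret_validity(0.40))     # 16% ... (squared, not doubled!)
print(interpret_reliability(0.90))  # 90% true differences, 10% error
```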
Different types of correlation coefficients: or why it is a
good idea to take Huitema’s correlation and regression
64
The most common type of correlation to use is the
Pearson product moment correlation
However, you can only use this type of correlation if:
You have two continuous variables, e.g., a range of
scores on both x and y
The relationship between the two variables is linear
Some have shown a curvilinear relationship between
intelligence test scores and performance of sales
representatives
(NFE, I think)
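A minimal, from-scratch Pearson r for two continuous variables (the data are invented for illustration; in practice you would use a stats package):

```python
# Sketch: Pearson product-moment correlation, which assumes two
# continuous variables and a roughly linear relationship. Data invented.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

test_scores = [50, 55, 60, 70, 75]       # continuous x (predictor)
performance = [3.1, 3.4, 3.0, 4.2, 4.5]  # continuous y (criterion)
print(round(pearson_r(test_scores, performance), 2))  # ~0.90 on these toy data
```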
Different types of correlation coefficients: or why it is a
good idea to take Huitema’s correlation and regression
65
Point biserial coefficient is used when one variable is continuous
and the other is dichotomous
High school diploma vs. no high school diploma (X)
Number of minutes it takes a set-up mechanic to set up a manufacturing
line (Y)
x is dichotomous, y is continuous
Phi coefficient is used when both variables are dichotomous
High school diploma or no high school diploma (X)
Pass or fail performance measure (Y)
Both x and y are dichotomous
(NFE, I think, one more slide on this)
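Since phi is just Pearson r computed on 0/1 codes, the diploma/pass example can be sketched directly (the eight cases below are invented):

```python
# Sketch: phi coefficient for two dichotomous variables, computed as
# Pearson r on 0/1 codes. The eight cases are invented for illustration.

def phi(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    varx = sum((a - mx) ** 2 for a in x)
    vary = sum((b - my) ** 2 for b in y)
    return cov / (varx * vary) ** 0.5

diploma = [1, 1, 1, 1, 0, 0, 0, 0]  # X: diploma (1) vs. no diploma (0)
passed  = [1, 1, 1, 0, 1, 0, 0, 0]  # Y: pass (1) vs. fail (0)
print(phi(diploma, passed))  # 0.5 on these invented cases
```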
Different types of correlation coefficients: or why it is a
good idea to take Huitema’s correlation and regression
66
Rho coefficient - Spearman’s rank order correlation - when
you rank order both x and y, and then correlate the ranks
Rank order in test scores
Rank order number of minutes it takes set-up mechanics to set up a
manufacturing line
Use rank order when your x or y scores are not normally
distributed - that is, when there are a few outliers (very
high or very low scores on either variable)
(NFE, I think, last slide)
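Spearman's rho is just Pearson r on the ranks; with no ties it reduces to the shortcut formula 1 - 6*sum(d²)/(n(n² - 1)). A sketch on invented data with one outlier:

```python
# Sketch: Spearman's rank-order correlation (rho). Ranks both variables,
# then applies the no-ties shortcut formula. Data are invented; the 300
# is exactly the kind of outlier that motivates ranking in the first place.

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))  # valid when there are no ties

test_scores   = [88, 72, 95, 60, 300]  # one extreme outlier
setup_minutes = [12, 18, 10, 25, 9]    # better scorers set up faster
print(spearman_rho(test_scores, setup_minutes))  # -1.0: perfectly inverse ranking
```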
END OF UNIT 5
Questions?
67
NFE: Back to construct validity
68
Construct validity:
Does the test actually measure the “construct” you think it is
measuring?
This is a hold-over from the more traditional cognitive
psychology and psychometrics field that philosophically
believes in mind-body dualism (mentalism)
That is, there really is something called “general
intelligence” that is more than just the sum of what you ask
on an exam and it is different than a behavioral repertoire
One of the reasons I like this text so much is that it is clear
that the authors are not from this old school
This will become more obvious when you read the material related
to ability testing
NFE: Back to construct validity
69
But, back to the question you are asking with construct
validity:
Does the test actually measure the “construct” you think it is
measuring?
Is your measure of extroversion really measuring
extroversion?
Is your measure of creativity really measuring
creativity?
Is your measure of ability to work with others
(agreeableness) really measuring the ability to work with
others?
NFE: Construct validity, cont.
70
You construct a test
You correlate your test with other tests that supposedly
measure the same thing (or a very similar construct) and
other measures that might get at that construct
Correlations are not going to be perfect because your
measure is not measuring exactly the same thing as those
other measures, but should be reasonably correlated with
those measures
Continue to do that until you have pretty good evidence
that your test is indeed measuring what it is supposed to be
measuring
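The iterative convergent-evidence check described above can be caricatured as a tiny rule (the 0.5 threshold, names, and correlations are all invented; real construct validation is far more involved):

```python
# Toy sketch of the convergent-evidence step: does the new test correlate
# "reasonably" with established measures of the same construct? The 0.5
# threshold and every number/name here are invented for illustration.

def converges(correlations, threshold=0.5):
    """True if every convergent correlation clears the (arbitrary) threshold."""
    return all(r >= threshold for r in correlations.values())

evidence = {"established extraversion scale": 0.62,
            "peer ratings of sociability": 0.55}
print(converges(evidence))  # True -> keep gathering evidence, then still do
                            # a criterion-related study (next slide)
```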
NFE: Construct validity, cont.
71
But notice, for validation purposes, you are NOT done yet
You have evidence that the test is supposedly measuring
what you say it is, but
You still need to conduct a criterion-related validity study to
determine whether the test is related to the job
Thus, you end up doing a lot of time-consuming work
The ONLY reason you would do this is if you could not
locate a test that measured what you want and had to
create your own (not likely, by the way)
```