Assessing the Assessment

Reliability. Am I measuring something?
Test-retest
Interobserver agreement
Parallel forms
Split-half (internal consistency)
Validity. Am I measuring what I think I am measuring?
Content
Criterion
Construct
Reliability is a necessary prerequisite for validity.
Reliability = True Score Variability / (True Score Variability + Error Variability)
Reliability
Reliability refers to the consistency of a measure across:
Time
Versions
Raters
And so on
A reliable test has little measurement error.
Observed Score = True Score + Error
Reliability
True score: the true or perfectly accurate value (e.g., the time)
Often a fiction in psychology
Based on multiple measurements
Aggregation = averaging a number of imprecise measurements to increase reliability (a simulation sketch follows)
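A minimal simulation sketch (not part of the original slides; all numbers are invented) of the Observed Score = True Score + Error idea: averaging several noisy measurements of the same people yields a score that tracks the hypothetical true score much more closely, i.e., aggregation increases reliability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_measurements = 500, 8

true_score = rng.normal(100, 15, n_people)              # hypothetical "true" scores
error = rng.normal(0, 15, (n_people, n_measurements))   # independent measurement error
observed = true_score[:, None] + error                  # Observed = True + Error

# Reliability of one measurement vs. the average of all 8.
# Because reliability = true-score variance / observed variance, the squared
# correlation with the simulated true score serves as the reliability estimate here.
r_single = np.corrcoef(true_score, observed[:, 0])[0, 1] ** 2
r_mean   = np.corrcoef(true_score, observed.mean(axis=1))[0, 1] ** 2

print(f"single measurement vs. true score: R^2 = {r_single:.2f}")
print(f"average of 8 measurements        : R^2 = {r_mean:.2f}")
```

With an error standard deviation equal to the true-score standard deviation, a single measurement is only about 50% reliable, while the 8-measurement average is close to 90%, in line with the aggregation point above.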
Reliability
Test-retest
Administer same measure at two points in time
Interobserver agreement
Multiple observers/judges/raters/scorers rate same target
Parallel forms
Compare alternate forms of same test
Split-half reliability
Split test into two halves and compare scores across halves
Coefficient alpha: the average of all possible split-half reliabilities (a computation sketch follows)
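A sketch of the two internal-consistency indices just listed, computed on invented item-level data with numpy only; the item model and sample size are assumptions for illustration, not anything from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 300, 10

# Hypothetical test: each item = common trait + item-specific noise
trait = rng.normal(0, 1, n_people)
items = trait[:, None] + rng.normal(0, 1.5, (n_people, n_items))

# Split-half reliability: correlate the odd-item total with the even-item total,
# then step up with the Spearman-Brown correction to full test length
odd, even = items[:, 0::2].sum(axis=1), items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r_half / (1 + r_half)

# Coefficient (Cronbach's) alpha, conceptually the average of all possible split-halves
k = n_items
sum_item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - sum_item_var / total_var)

print(f"split-half (Spearman-Brown corrected): {split_half:.2f}")
print(f"coefficient alpha                    : {alpha:.2f}")
```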
Validity
Is the test measuring what I think it is?
This requires empirical demonstration
There are three types of validity
Content Validity
Criterion Validity
Construct Validity
Validity
Content Validity
A test has content validity if it adequately covers the area
of content it is supposed to cover.
Difficult to examine statistically
Content validity typically must be built in at the beginning
Course exams are the best examples
Validity
Criterion Validity
For criterion validity, tests are evaluated against some
criterion
Often called predictive validity
Most at issue for tests employed to make decisions
Selection of students
Parole decisions
Jobs
Criterion Validity - Concurrent
Concurrent validity: does my measure correlate highly with an established measure?
Can my measurement instrument predict a criterion that occurs at the same point in time?
Can my measure (i.e., my operationalization) distinguish between two groups that it should be able to distinguish between? (A correlation sketch follows.)
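A sketch of how the two concurrent-validity questions above might be checked in practice, using entirely simulated scores; the "established measure", the group labels, and all effect sizes are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Simulated known-groups structure (e.g., a clinical vs. non-clinical group)
clinical = rng.random(n) < 0.3
established = rng.normal(50, 10, n) + 8 * clinical      # clinical group scores higher here
new_measure = 0.8 * established + rng.normal(0, 6, n)   # made-up relationship

# (1) Does the new measure correlate highly with the established measure?
r = np.corrcoef(new_measure, established)[0, 1]
print(f"correlation with established measure: r = {r:.2f}")

# (2) Known-groups check: does it separate the two groups it should separate?
diff = new_measure[clinical].mean() - new_measure[~clinical].mean()
d = diff / new_measure.std(ddof=1)
print(f"group difference (in SD units): d = {d:.2f}")
```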
Criterion Validity - Predictive
Can my measure predict future behavior?
– If yes, it has predictive validity (a type of criterion validity)
Predictive Validity of the GRE
Kuncel, N.R., Hezlett, S.A., & Ones, D.S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate school student selection and performance. Psychological Bulletin, 127, 162-181.
Graduate Record Examination
Originally designed to measure “basic developed abilities relevant
to performance in graduate studies”
Verbal measure: analogy, antonym, sentence completion,
reading comprehension
Quantitative measure: quantitative, quantitative comparison,
data interpretation
Analytic measure: analytical and logical reasoning
Subject test: acquired knowledge in particular area
Used often and weighted heavily in admissions decisions
Predictive validity of the GRE
Want to establish the predictive validity of the GRE
What will my criterion of graduate school performance be?
Use several indicators of "performance":
– Graduate GPA
– 1st-year graduate GPA
– Comprehensive exam scores
– Publication citation counts
– Faculty ratings
(these are the criteria; a correlation sketch follows)
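In principle, a predictive-validity check like this reduces to correlating the admission-time predictor with each later criterion. The sketch below uses fully simulated data; the weights are arbitrary placeholders, not the meta-analytic estimates from Kuncel et al. (2001).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Entirely simulated admission-time predictor and later criteria
gre_total = rng.normal(0, 1, n)
criteria = {
    "graduate GPA":        0.35 * gre_total + rng.normal(0, 1, n),
    "1st-year GPA":        0.40 * gre_total + rng.normal(0, 1, n),
    "comprehensive exams": 0.40 * gre_total + rng.normal(0, 1, n),
    "faculty ratings":     0.30 * gre_total + rng.normal(0, 1, n),
}

# Predictive validity = correlation between the predictor and each criterion
for name, crit in criteria.items():
    r = np.corrcoef(gre_total, crit)[0, 1]
    print(f"GRE vs. {name:<20}: r = {r:.2f}")
```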
Predictive Validity of the GRE [results tables from Kuncel et al. (2001) not reproduced here]
Summary
All areas of the GRE were found to be valid predictors of graduate GPA, 1st-year graduate GPA, faculty ratings, and comprehensive exam scores.
GRE subject tests were consistently better predictors of the criteria than the quantitative or verbal tests, and also better than undergraduate GPA (UGPA).
Construct Validity
Most important type of validity
"If this were a measure of …, what would it look like?"
Depends heavily on theory:
– How is this construct related to other constructs?
– Requires broad thinking
– In validating my construct, I am validating my theory
Steps to establish construct validity
1. Need to establish convergent correlations: measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other (that is, you should be able to show a correspondence or convergence between similar constructs)
2. Need to establish divergent correlations: measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other (that is, you should be able to discriminate between dissimilar constructs)
3. Build a nomological net
Convergent validity
Measures that should be related are related
These 4 items are converging on the same thing (don't know for sure that it is "self-esteem" yet)
Divergent Validity
Self-esteem measures do not correlate with locus of control measures
These measures seem to be tapping different things
Establishing convergent and divergent validity
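A sketch of the convergent/divergent pattern on invented data: four hypothetical self-esteem items should correlate substantially with one another (convergent) but only weakly with four hypothetical locus-of-control items (divergent). Nothing here reproduces the correlation matrices shown in the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400

# Two independent hypothetical constructs
self_esteem   = rng.normal(0, 1, n)
locus_control = rng.normal(0, 1, n)

# Four items per construct = construct + item noise (all invented)
se_items = self_esteem[:, None]   + rng.normal(0, 0.7, (n, 4))
lc_items = locus_control[:, None] + rng.normal(0, 0.7, (n, 4))

R = np.corrcoef(np.hstack([se_items, lc_items]), rowvar=False)

# Convergent block: self-esteem items with each other (off-diagonals of the SE block)
convergent = R[:4, :4][np.triu_indices(4, k=1)].mean()
# Divergent block: self-esteem items with locus-of-control items
divergent = R[:4, 4:].mean()

print(f"mean correlation among self-esteem items   : {convergent:.2f}")  # high
print(f"mean correlation SE items x LOC items      : {divergent:.2f}")   # near zero
```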
Nomological Network
Must develop a "lawful network" for your measure in order to establish construct validity.
Includes:
– Theoretical framework
– Empirical framework
– Observables
Childhood Psychopathy Scale
Lynam, D.R. (1997). Pursuing the psychopath: Capturing the fledgling psychopath in
a nomological net. Journal of Abnormal Psychology, 106, 425-438.
“The construct of psychopathy and attendant personality information
might profitably be used at the childhood level to identify a more
homogeneous group of antisocial children.”
Psychopathy
The [psychopath] is unfamiliar with the primary facts or data of what might be called personal values and is altogether incapable of understanding such matters.
It is impossible for him to take even a slight interest in the tragedy or joy or the striving of humanity as presented in serious literature or art. He is also indifferent to all these matters in life itself. Beauty and ugliness, except in a very superficial sense, goodness, evil, love, horror, and humour have no actual meaning, no power to move him.
He is, furthermore, lacking in the ability to see that others are moved. It is as though he were colour-blind, despite his sharp intelligence, to this aspect of human existence. It cannot be explained to him because there is nothing in his orbit of awareness that can bridge the gap with comparison. He can repeat the words and say glibly that he understands, and there is no way for him to realize that he does not understand (Cleckley, 1941, p. 90, quoted in Hare, 1993, pp. 27-28).
• Developed the Child Psychopathy Scale (CPS)
• Principles of rational scale construction
• Working from the Psychopathy Checklist-Revised (PCL-R), identified mother-reported items that assessed PCL-R constructs
• Operationalized 13 of the 20 PCL-R constructs as 3- to 4-item scales (a scoring sketch follows this list):
– glibness, untruthfulness, manipulation, lack of guilt, poverty of affect, callousness, parasitic lifestyle, behavioral dyscontrol, lack of planning, impulsiveness, unreliability, failure to accept responsibility, criminal versatility
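A sketch of rational scale scoring under assumptions of my own (the actual CPS items, response scales, and scoring rules are not reproduced here): standardize the mother-reported items, average the items within each construct into a subscale, and average the subscales into a total score.

```python
import numpy as np

rng = np.random.default_rng(5)
n_children = 250

# Invented item-level data: construct name -> (n_children, n_items) array of 0-2 ratings
constructs = {
    "lack_of_guilt": rng.integers(0, 3, (n_children, 3)),
    "impulsiveness": rng.integers(0, 3, (n_children, 4)),
    "callousness":   rng.integers(0, 3, (n_children, 3)),
}

def zscore(x):
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Subscale = mean of standardized items; total score = mean of the subscales
subscales = {name: zscore(items.astype(float)).mean(axis=1)
             for name, items in constructs.items()}
cps_total = np.mean(list(subscales.values()), axis=0)

print("subscales:", list(subscales))
print("first five total scores:", np.round(cps_total[:5], 2))
```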
Items on the CPS
Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS
should be positively related to serious delinquency
Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS
should be positively related to stable delinquency
Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS
should be positively related to impulsivity
Construct Validity of the CPS
If the CPS is assessing psychopathy, scores on the CPS should be positively
related to externalizing problems and negatively related to internalizing
problems
Construct Validity of the CPS
If the CPS is assessing psychopathy, scores on the CPS should predict
delinquency above and beyond other well known predictors
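"Above and beyond" is usually tested as incremental validity via hierarchical regression: does adding CPS scores to a model that already contains the other predictors raise R-squared? The sketch below uses simulated data and arbitrary weights; it makes no claim about the actual predictors or effect sizes in Lynam (1997).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500

# Simulated predictors and outcome (weights are arbitrary, for illustration only)
prior_record = rng.normal(0, 1, n)
ses          = rng.normal(0, 1, n)
cps          = 0.3 * prior_record + rng.normal(0, 1, n)
delinquency  = 0.5 * prior_record - 0.2 * ses + 0.4 * cps + rng.normal(0, 1, n)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared(np.column_stack([prior_record, ses]), delinquency)
r2_full = r_squared(np.column_stack([prior_record, ses, cps]), delinquency)

print(f"R^2 without CPS: {r2_base:.3f}")
print(f"R^2 with CPS   : {r2_full:.3f}")
print(f"incremental R^2: {r2_full - r2_base:.3f}")   # CPS adds predictive power here
```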