Psychometrics 101: Foundational
Knowledge for Testing Professionals
Steve Saladin, Ph.D.
University of Idaho
Criterion-referenced vs norm-referenced
• Is performance rated against pre-established cut
points, or is it based on comparisons with others?
 Classroom grading is generally criterion-based
• 90% right=A, 80%=B, 70%=C, etc.
• Typically reported as a percentage correct or P/F
 Grading on the curve means the grade is based on
comparison with the rest of the class (norm-referenced)
• 80% might be a B, an A, a C or something else.
Criterion-referenced vs norm-referenced
• Standardized tests are typically norm-referenced
 SAT, ACT, GRE, IQ test
 Typically reported as percentile or standard score
• Certification exams are often criterion-referenced
 Proctor certification, licensing exams
 Typically reported as percentage correct or P/F
• Sometimes you get a mix
 GED uses norms to establish cut-scores
• Important to note difference between percentile
and percentage correct
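The difference between the two kinds of score can be sketched in a few lines. This is only an illustration: the class scores are made up, and the percentile rank here is taken as the share of the group scoring strictly below the given score (other definitions exist).

```python
def percentage_correct(items_right, items_total):
    """Criterion-style score: share of items answered correctly."""
    return 100.0 * items_right / items_total


def percentile_rank(score, group_scores):
    """Norm-style score: share of the comparison group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical comparison group (percentage-correct scores of ten test-takers).
class_scores = [70, 75, 78, 82, 85, 88, 90, 92, 95, 98]

my_score = percentage_correct(40, 50)              # 40 of 50 items right
my_rank = percentile_rank(my_score, class_scores)  # standing within the group
print(f"{my_score:.0f}% correct, percentile rank {my_rank:.0f}")
```

In this made-up class, 80% correct sits at only the 30th percentile, which is why the two numbers should never be read interchangeably.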
Damn the Statistics & full speed
ahead!
• Testing is all about quantifying something about
people (skills, knowledge, behavior, etc.)
• Stats are just a way to describe the numbers
 Make it more understandable
 Reveal relationships
• To understand norm-referenced test scores,
you need to know two general things
 What is the typical score?
 To what degree did others score differently?
What’s typical?
10 10 20 20 30 30 40 40 40 40 50
• Mean = arithmetic average = 30
• Median = # in the middle = 30
• Mode = most frequently occurring # = 40
How different are the scores?
• Range = highest – lowest = 40
• Variance = average of squared differences from
mean = 163.6
• Standard Deviation = square root of Variance =
12.8
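The figures on these two slides can be checked with Python's standard `statistics` module. A minimal sketch, assuming the variance is the population variance (dividing by N rather than N - 1), which is what reproduces the 163.6 shown:

```python
import statistics

scores = [10, 10, 20, 20, 30, 30, 40, 40, 40, 40, 50]

mean = statistics.mean(scores)           # arithmetic average
median = statistics.median(scores)       # middle value of the sorted scores
mode = statistics.mode(scores)           # most frequently occurring value
score_range = max(scores) - min(scores)  # highest minus lowest
variance = statistics.pvariance(scores)  # average squared difference from mean
sd = statistics.pstdev(scores)           # square root of the variance

print(mean, median, mode, score_range, round(variance, 1), round(sd, 1))
```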
Standard Normal Distribution
• Normal Curve
• Assumes trait is normally distributed in
population
• Described by its mean and standard deviation
The Normal Curve
%tile    GRE    SAT    IQ     ACT
<1%      200    200    55     1
2.5%     300    300    70     6
16%      400    400    85     12
50%      500    500    100    18
84%      600    600    115    24
97.5%    700    700    130    30
99.5%    800    800    145    36
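Each row of the Normal Curve slide sits a whole number of standard deviations from the mean, so the scores and percentiles can be approximated from a z-score. The sketch below assumes the means and standard deviations implied by the slide (500/100 for GRE and SAT, 100/15 for IQ, 18/6 for ACT); these are read off the slide, not taken from the test publishers. The percentiles it prints come from the exact normal curve, so they differ slightly from the rounded values on the slide.

```python
from statistics import NormalDist

# (mean, SD) per scale, as implied by the slide; treat these as assumptions.
SCALES = {"GRE": (500, 100), "SAT": (500, 100), "IQ": (100, 15), "ACT": (18, 6)}


def scaled_score(z, scale):
    """Convert a z-score (SDs from the mean) to a score on the named scale."""
    mean, sd = SCALES[scale]
    return mean + z * sd


def percentile(z):
    """Percent of a normally distributed population scoring below z."""
    return 100 * NormalDist().cdf(z)


for z in range(-3, 4):
    row = "  ".join(f"{name}={scaled_score(z, name)}" for name in SCALES)
    print(f"z={z:+d}  {percentile(z):5.1f}%  {row}")
```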
How are these things related?
 GRE scores and Grad School grades
 CLEP scores and final exam scores
 Compass/Accuplacer scores and success in
entry classes
 Motivation and cheating
• Correlation tells us if things vary or change in a
related way
 Higher GRE scores mean higher grades
 Lower motivation suggests higher levels of
cheating
Some Facts About Correlation
• Ranges from +1.0 to -1.0
• Sign tells you direction of correlation
 + as A gets bigger so does B
 - as A gets bigger, B gets smaller
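A small sketch of how a correlation coefficient is computed and how its sign behaves. It uses the Pearson product-moment formula, which is an assumption here since the slides do not name a specific coefficient, and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Made-up data: B tends to rise with A in the first case, fall in the second.
a = [1, 2, 3, 4, 5]
b_up = [2, 4, 5, 4, 6]
b_down = [10, 8, 6, 4, 2]

r_up = pearson_r(a, b_up)      # positive: as A gets bigger, so does B
r_down = pearson_r(a, b_down)  # negative: as A gets bigger, B gets smaller
print(round(r_up, 2), round(r_down, 2))
```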
How To Lie With Statistics!
• Test Taking Linked to Longevity! A recent study found that people who
had taken more tests during early adulthood tended to live longer. The
number of tests taken between the ages of 16 and 30 correlated strongly
with the age of death. The more tests you take, the longer you will live!
Some Facts About Correlation
• It is not causation, but can be used to predict
• Small samples may miss relationship
• Heterogeneous samples may miss relationship
[Figure: example correlations of 0.42, 0.87, and 0.78]
Error, Error Everywhere
• No test is perfect, no measurement is perfect
• Get more precise, but never get exact
• Score = Truth + Error
Error, Error Everywhere
• Error can be lots of things including
 The environment
 The test-taker
 Procedural variations
 The test itself
• Since error makes scores inconsistent or
unreliable, a measure of reliability of scores is
important
Reliability
• Test-Retest
 Test group on two different occasions and
correlate the results
 Are results stable over time
• Internal Consistency
 Correlate score on each item to total
 Are they all measuring the same thing
• Alternate Forms
 Develop two versions of same test and correlate
scores on each
 Are your versions comparable
• All are correlations, so all are subject to the same problems
So what’s good?
• GRE has reported reliability of 0.89
(Quantitative), 0.92 (Verbal)
 GRE Guide to Use of Scores, 2007-2008
• ACT Technical Manual reports Composite
score reliability of .97
• SAT reports reliabilities of .89-.93
 Test Characteristics of the SAT on
http://professionals.collegeboard.com/data-reports-research/sat/data-tables
• COMPASS alternate forms reliability reported to
be .73-.90
 http://www.nationalcommissiononadultliteracy.org/content/assessmentmellard.pdf
Reliability & Error
• Can’t totally get rid of Error, but can estimate
how much is there
• Using reliability, you can estimate how much a
person's score would vary due to error.
• Standard Error of the Measurement
$SEM = SD \times \sqrt{1 - r}$, where $r$ is the reliability
 an index of the extent to which an individual’s
scores vary over multiple administrations
 gives the range within which the true score is
likely to exist
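The standard error of measurement is the score standard deviation times the square root of one minus the reliability. A minimal sketch of that calculation follows; the function name and the example values (SD of 15, reliabilities of 0.91 and 0.75) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r the reliability coefficient (0 to 1)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Hypothetical test with SD = 15: higher reliability gives a smaller SEM.
sem_high = standard_error_of_measurement(15, 0.91)  # about 4.5
sem_low = standard_error_of_measurement(15, 0.75)   # 7.5
print(round(sem_high, 2), round(sem_low, 2))
```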
SEM for some tests
• GRE Verbal SEM is 34, Quantitative 51, so the 68%
confidence interval for a score of 500 is 470-530
for Verbal, 450-550 for Quantitative
 Scores are only reported in increments of 10
 GRE Guide to Use of Scores, 2007-2008
• ACT Composite SEM is .91, so the 68% confidence
interval for a score of 20 is 19-21
 ACT Technical Manual
• WAIS-IV FSIQ SEM is 2.16, so the 68%
confidence interval for a score of 100 is 98-102
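The 68% intervals on this slide are the observed score plus and minus one SEM, rounded to the scale's reporting increment. The sketch below reproduces them. It takes the GRE SEMs as 34 and 51 scale points, an assumption that matches the intervals quoted for a score of 500, and the function name `band_68` is hypothetical.

```python
def band_68(score, sem, step=1):
    """Score +/- 1 SEM, rounded to the nearest reporting increment `step`."""
    low = round((score - sem) / step) * step
    high = round((score + sem) / step) * step
    return low, high


gre_verbal = band_68(500, 34, step=10)  # GRE reports in increments of 10
gre_quant = band_68(500, 51, step=10)
act = band_68(20, 0.91)
wais = band_68(100, 2.16)
print(gre_verbal, gre_quant, act, wais)
```

Using two SEMs instead of one would give roughly a 95% interval, at the price of a wider band.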
Does Reliability = Validity?
• NO!
• Getting a consistent result means reliability
• Having that result be meaningful is validity
• Validity is based on inferences you make from
results
 Test has to be reliable to be valid
 Test does not have to be valid to be reliable
Validity
• Any evidence that a test measures what it says
it is measuring
• Any evidence that inferences made from the
test are useful and meaningful
• 3 types of evidence
 Content
 Criterion-Related
 Construct
Content Validity
• Think of a test as a sample of possible
problems/items
 4th grade spelling test should be a representative
sample of 4th grade spelling words
 GRE Quantitative should be a representative
sample of the math problems a grad school
applicant might be expected to solve
• Should be part of design
 Identify how many algebra, trig, calculus, etc. items
should be on the test (table of specifications)
• Frequently evaluated by item analysis or expert
opinions
Criterion-Related Validity
• How does test score correlate with some
external measure (criterion)
 Placement test score and performance in class
 Admission test score and GPA for first semester
• Sometimes called Predictive or Concurrent
Validity
• The correlation is affected by error in the test
and error in the criterion
 Only top students take the GRE
 Graduate school grades fall in a restricted range
To use or not to use….
• Depends on the question….
 What is impact of decision?
 What is cost of using? Of not using?
• Decision Theory can be a guide to determining
incremental validity
 Net gain in using scores
Decision Theory
Maximize success
[Figure: GPA (A, B, C) plotted against GRE score (200 to 800), with regions labeled false negative, true positive, true negative, and false positive]
Decision Theory
Maximize opportunity
[Figure: GPA (A, B, C) plotted against GRE score (200 to 800), with regions labeled false negative, true positive, true negative, and false positive]
Predictive Utility
• $\text{Effectiveness} = \dfrac{\text{True Pos} + \text{True Neg}}{\text{True Pos} + \text{False Pos} + \text{True Neg} + \text{False Neg}}$
• Have to weigh effectiveness against cost
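The effectiveness ratio can be computed by sorting each past decision into one of the four outcomes for a given cut score. The sketch below is illustrative only: the applicant records are invented, "admitted" is assumed to mean a score at or above the cut, and the function names are hypothetical.

```python
def classify(records, cut_score):
    """Count true/false positives/negatives for (score, succeeded) records."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, succeeded in records:
        admitted = score >= cut_score
        if admitted and succeeded:
            counts["TP"] += 1  # admitted and did well
        elif admitted:
            counts["FP"] += 1  # admitted but did not do well
        elif succeeded:
            counts["FN"] += 1  # turned away but would have done well
        else:
            counts["TN"] += 1  # turned away and would not have done well
    return counts


def effectiveness(counts):
    """(True Pos + True Neg) divided by all decisions."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Invented records: (test score, whether the person later succeeded).
records = [(300, False), (350, False), (400, True), (450, False), (500, True),
           (550, False), (600, True), (650, True), (700, True), (750, True)]

for cut in (350, 500, 650):
    c = classify(records, cut)
    print(cut, c, effectiveness(c))
```

With these made-up records, a high cut of 650 removes every false positive but creates three false negatives, while a low cut of 350 does the reverse. That is the trade-off between maximizing success and maximizing opportunity.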
Construct Validity
• Most important for psychological tests, where
what you are measuring is abstract or
theoretical
 Intelligence
 Personality characteristics
 Attitudes and beliefs
• Usually involves multiple pieces of evidence
Construct Validity
• Convergent—correlates with measures of same
thing
• Divergent—does not correlate with measures of
something else
• Scores show expected changes after treatment,
education, maturation, etc.
• Factor analysis supports expected factor
structure
Things to remember
• The normal curve
• Correlation
• Reliability
• Standard Error of the Measurement
• Validity
• Decision Theory