What is Test Theory?

*What is Test Theory?
• The study of measurement
problems, the influence of these
problems on psychological
inventories, and the creation of
methods to minimize them
1
UNIT I
INTRODUCTION TO
MEASUREMENT THEORY
CHAP 1: WHAT IS TEST THEORY
CHAP 2: STATISTICAL CONCEPTS FOR TEST
THEORY
CHAP 3: INTRODUCTION TO SCALING
CHAP 4: PROCESS OF TEST CONSTRUCTION
CHAP 5: TEST SCORES AS COMPOSITES
2
UNIT II
RELIABILITY
CHAP 6: RELIABILITY AND THE CLASSICAL
TRUE SCORE MODEL
CHAP 7: PROCEDURES FOR ESTIMATING
RELIABILITY
CHAP 8: INTRODUCTION TO
GENERALIZABILITY THEORY
CHAP 9: RELIABILITY COEFFICIENTS FOR
CRITERION-REFERENCED TESTS
3
UNIT III
VALIDITY
CHAP 10: INTRODUCTION TO
VALIDITY
CHAP 11: STATISTICAL
PROCEDURES FOR PREDICTION
AND CLASSIFICATION
CHAP 12: BIAS IN SELECTION
CHAP 13: FACTOR ANALYSIS
4
UNIT IV
ITEM ANALYSIS IN TEST
DEVELOPMENT
CHAP 14: ITEM ANALYSIS
CHAP 15: INTRODUCTION TO
ITEM RESPONSE THEORY
CHAP 16: DETECTING ITEM
BIAS
5
UNIT V
TEST SCORING AND INTERPRETATION
CHAP 17: CORRECTING FOR
GUESSING AND OTHER SCORING
METHODS
CHAP 18: SETTING STANDARDS
CHAP 19: NORMS AND STANDARD
SCORES
CHAP 20: EQUATING SCORES FROM
DIFFERENT TESTS
6
Introduction to Classical
and Modern Test Theory
Chapter 1
7
Historic Origins
Pioneer countries in test
theory are:
Germany, England,
France, and the United
States
8
Germany
• Wilhelm Wundt, Ernst Weber, and
Gustav Fechner used procedures
for collecting observations in a
standard way for all subjects, such
as reading the instructions at the
top of the test page
(see next slide).
9
Germany (cont.)
• Multiple Choice
• Identify the choice that best completes
the statement or answers the question.
1. The type of sensation you experience
depends on which area of the brain is
activated. This is known as
a. sensory localization.
b. transduction.
c. sensory adaptation.
d. cerebralization.
2. A hypnic jerk usually occurs during
a. light sleep.
b. deep sleep.
c. episodes of hypersomnia.
d. episodes of sleep apnea.
• See p.14 Exercise 4-b
10
Germany
p.14 Exercise 4-b
• 4. Consider the following testing
practices and indicate which
nineteenth-century psychological
researcher should probably be credited
with the origin of each.
b. A teacher about to give a test reads
aloud from the test manual: “Please
read the instructions at the top of the
page silently while I read them
aloud…” (see previous slide)
England
• Karl Pearson: Pearson correlation
• Charles Spearman: Spearman correlation.
Used factor analysis in his
“Theory of Intelligence.”
• Francis Galton: categorizing;
half cousin to Darwin
France
Alfred Binet & Theodore
Simon (1905) developed
the first IQ test.
IQ = (MA / CA) × 100
MA = Mental Age
CA = Chronological Age
 *The Difference between Ratio IQ and
Deviation IQ or Normative IQ
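The contrast between the two kinds of IQ can be sketched in a few lines. This is only an illustration: the ratio formula is the one on the slide, while the deviation IQ convention used here (mean 100, SD 15, applied to a z-score within the person's own age group) is an assumption that the slide does not state. The function names and sample values are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ from the slide: IQ = (MA / CA) x 100."""
    return mental_age / chronological_age * 100


def deviation_iq(z_score, mean=100, sd=15):
    """Deviation (normative) IQ: a standard score relative to same-age peers.

    mean=100 and sd=15 are a common convention, assumed here.
    """
    return mean + sd * z_score


# A child with a mental age of 10 and a chronological age of 8
print(ratio_iq(10, 8))     # 125.0

# A person scoring one standard deviation above the age-group mean
print(deviation_iq(1.0))   # 115.0
```

The ratio IQ depends on age directly, whereas the deviation IQ depends only on where the person stands in the norm group.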
13
United States
• James McKeen Cattell:
“Mental Testing”
• Thorndike: An Introduction
to the Theory of Mental and
Social Measurement
• Trial and Error: A Theory of Learning
14
Key Terms
• Test
• Optimal Performance
• Typical Performance
• Observable Performance
• Constructs
• Measurement
15
Key Terms
• Test:
A test is a procedure for obtaining a
sample of an individual’s
performance.
• Optimal Performance:
Refers to performance on
aptitude tests (GRE, SAT, ACT) or
achievement tests (WRAT, WIAT).
16
Key Terms
• Typical Performance:
Refers to performance on questionnaires
and inventories in which people report their feelings,
attitudes, interests, or reactions to a
situation.
• Observable Performance:
Refers to performance shown in observable behavior
(e.g., watching children interact with each
other; naturalistic observation).
17
Key Terms
• Measurement:
Quantifying an observable behavior,
that is, assigning a quantitative
value to a behavior.
See Exercises 1 & 2 on p.14
18
19
20
Heavy drinkers die at a younger age
21
Confounding Variables
• Confounding variables are variables
that the researcher failed to control or
eliminate, damaging the internal
validity of an experiment. Also known
as a third variable, a confounding
variable can distort the apparent
relation between the independent
variable and the dependent variable.
• Ex. Next
22
 Ex. A
research group might
design a study to determine if
heavy drinkers die at a younger
age. Heavy drinkers may be
more likely to smoke, or eat junk
food, all of which could be
factors in reducing longevity. A
third variable may have
adversely influenced the results.
23
Intervening Variables
 A variable that explains a
relation or provides a causal
link between other variables.
 Also called “Mediating
Variable” or “intermediary
variable.”

Ex. Next slide
24
Intervening Variables
 Ex: The statistical association between
income and longevity needs to be
explained because just having money
does not make one live longer. Other
variables intervene between money and
long life. People with high incomes tend
to have better medical care than those
with low incomes. Medical care is an
intervening variable. It mediates the
relation between income and longevity.
25
Key Terms
 Constructs:
Constructs are hypothetical
concepts or psychological
attributes/traits, such as personality,
anxiety, depression etc.
They are difficult to measure.
Constructs are not physical
attributes such as height and weight.
26
*Why do we have Measurement
Problems in Psychology?
• 1. There is no single universal way of defining
a psychological construct.
• 2. Psychological measurements are based on
samples of behavior.
• 3. Sampling of behavior results in errors of
measurement.
• 4. The units (scales) of measurement are not
well defined.
• 5. The measurements must have a demonstrated
relationship to other variables to have
meaning.
27
Role of Test Theory in Research &
Evaluation
 Selecting a Problem
 Operational Definitions of
Variables
 Instruments
 Accuracy of the Instruments
 Data Collection
 Use of Statistics
28
Chapter 2
Statistical Concepts
for
Test Theory
29
Population
Sample
30
Population and Sample
 Population:
 Population is the set of all
individuals of interest for a
particular study. Measurements related
to Population are PARAMETERS.
 Sample:
 Sample is a set of individuals
selected from a population.
Measurements related to sample are STATISTICS.
31
Statistics
 The people chosen for a
study are its subjects or
participants, collectively
called a sample
–The sample must be
representative
32
Statistics
Descriptive
Describes the distribution of scores
and values such as mean, median, and
mode
Inferential
Infer or draw a conclusion from a
sample.
33
Key Terms
• Constant, e.g., temperature in a study of learning and hunger
• Variable
• IV (independent variable): manipulated
• DV (dependent variable): measured
• Discrete numbers: 1, 2, 3, 14
• Continuous numbers: 1.3, 3.6
34
CONTINUOUS VERSUS
DISCRETE VARIABLES
 Discrete variables (categorical)
– Values are defined by category
boundaries
– E.g., gender
 Continuous variables
– Values can range along a
continuum
– E.g., height
35
Statistics
• Scales of Measurement
• Frequency Distributions and Graphs
• Measures of Central Tendency
• Standard Deviations and Variances
• Z Score
• Correlations: 1- Pearson, 2- Spearman
36
Scales of Measurement (NOIR)
Nominal Scale
Qualities: Assignment of labels
Example: Gender (male or female); Preference (like or dislike); Voting record (for or against)
What You Can Say: Each observation belongs in its own category
What You Can’t Say: An observation represents “more” or “less” than another observation
37
ORDINAL SCALE
Qualities: Assignment of values along some underlying dimension (order)
Example: Rank in college; Order of finishing a race
What You Can Say: One observation is ranked above or below another.
What You Can’t Say: The amount that one variable is more or less than another
38
INTERVAL SCALE
Qualities: Equal distances between points; arbitrary zero
Example: Number of words spelled correctly; Intelligence test scores; Temperature
What You Can Say: One score differs from another on some measure that has equal-appearing intervals
What You Can’t Say: The amount of difference is an exact representation of differences of the variable being studied
39
40
RATIO SCALE
Qualities: Meaningful and nonarbitrary zero (absolute zero)
Example: Age; Weight; Time?
What You Can Say: One value is twice as much as another, or no quantity of that variable can exist
What You Can’t Say: Not much!
41
LEVELS OF MEASUREMENT
Level of Measurement / For Example / Quality of Level
Ratio / Rachael is 5’ 10” and Gregory is 5’ 5” / Absolute zero
Interval / Rachael is 5” taller than Gregory / An inch is an inch is an inch
Ordinal / Rachael is taller than Gregory / Greater than
Nominal / Rachael is tall and Gregory is short / Different from
 Variables are measured at one of these four levels
 Qualities of one level are characteristic of the next level up
 The more precise (higher) the level of measurement, the
more accurate is the measurement process
42
WHAT IS ALL THE FUSS?
 Measurement should be as precise
as possible
 In psychology, most variables are
probably measured at the nominal
or ordinal level
 But—how a variable is measured
can determine the level of
precision
43
Frequency Distributions and Graphs
44
histogram
45
Polygon
46
Frequency Distributions and Graphs
47
48
49
50
51
52
53
Platykurtic, Mesokurtic, Leptokurtic
54
Frequency Distributions
• Frequency Distributions (ƒ)
Scores: 2, 4, 3, 2, 5, 3, 6, 1, 1, 3, 5, 2, 4, 2
Σƒ = N = 14
P = ƒ/N (P = proportion)
% = P × 100
55
Frequency Distributions
• Frequency Distributions (ƒ)
X   f   fX   P = ƒ/N      % = P × 100   Cum%
6   1   6    1/14 = .07   7%
5   2
4   2
3   3
2   4
1   2
56
Frequency Distribution Table
X   f   fX   P = f/N      % = P × 100   Cumulative %
6   1   6    1/14 = .07   7%            7%
5   2   10   2/14 = .14   14%           21%
4   2   8    2/14 = .14   14%           35%
How do you Calculate Cumulative Percent?
• Add each new individual percent to the running
tally of the percentages that came before it.
• For example, if your dataset consisted of the four
numbers 100, 200, 150, and 50, then their individual
values, expressed as a percent of the total (in this
case 500), are 20%, 40%, 30%, and 10%.
• To get each percent: 1. find the proportion, 2. convert it to a
percentage (e.g., 100/500 = 0.2, and 0.2 × 100 = 20%). The cumulative percents are then:
• 100: 20%
• 200: (i.e. 20% from the step before + 40%) = 60%
• 150: (i.e. 60% from the step before + 30%) = 90%
• 50: (i.e. 90% from the step before + 10%) = 100%
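The running-tally procedure above can be sketched as follows. This is a minimal illustration using the four numbers from the example; the variable names are hypothetical.

```python
from itertools import accumulate

values = [100, 200, 150, 50]
total = sum(values)  # 500

# Step 1 and 2: proportion of the total, expressed as a percentage
percents = [v * 100 / total for v in values]

# Running tally: each percent added to everything that came before it
cumulative = list(accumulate(percents))

for v, p, c in zip(values, percents, cumulative):
    print(f"{v}: {p:.0f}% (cumulative {c:.0f}%)")
```

The last cumulative value should always be 100%, which is a quick check that no category was missed.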
Frequency Distributions
• X = 2, f = 4, N = 14
• P = ƒ/N
P = 4/14 = .29
• % = P × 100 = 29%
• X = 3, f = 3, N = 14
• P = 3/14 = .21
• % = 21%
• μ = ΣƒX/Σƒ
59
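The frequency table and the formula μ = ΣƒX/Σƒ can be worked through for the 14 scores listed earlier. The sketch below is one possible way to build the table; the variable names are hypothetical.

```python
from collections import Counter

scores = [2, 4, 3, 2, 5, 3, 6, 1, 1, 3, 5, 2, 4, 2]
n = len(scores)            # N = sum of f = 14
freq = Counter(scores)     # f for each distinct X

rows = []
for x in sorted(freq, reverse=True):   # highest X first, as in the table
    f = freq[x]
    rows.append((x, f, f * x, f / n))  # X, f, fX, P = f/N

for x, f, fx, p in rows:
    print(f"X={x}  f={f}  fX={fx}  P={p:.2f}  %={p * 100:.0f}%")

sum_fx = sum(fx for _, _, fx, _ in rows)
mean = sum_fx / n                      # mu = sum(fX) / sum(f)
print(f"sum fX = {sum_fx}, mean = {mean:.2f}")   # sum fX = 43, mean = 3.07
```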
Mean
Measures of Central Tendency
• Mean: interval or ratio scale
– The sum of the values divided by the number of
values, often called the "average": μ = ΣX/N
– Add all of the values together. Divide by the total
number of values to obtain the mean.
– Example:
X: 7, 12, 24, 20, 19
Mean = ????
60
Statistics
The Mean is:
μ=ΣX/N= 82/5=16.4
(7 + 12 + 24 + 20 + 19) / 5 =
16.4.
61
Median
• Measures of Central Tendency
• Median or middle: ordinal scale
– Divides the values into two equal halves, with
half of the values being lower than the median
and half higher than the median.
• Sort the values into ascending order.
• If you have an odd number of values, the
median is the middle value.
• If you have an even number of values, the
median is the arithmetic mean (see above) of
the two middle values.
– Ex: The median of the same five numbers (7, 12,
24, 20, 19) is ???.
62
Mode
• The median is 19.
• Mode: nominal scale
– The most frequently occurring value (or
values).
• Calculate the frequencies for all of the
values in the data.
• The mode is the value (or values) with
the highest frequency.
– Example: For individuals having the
following ages (18, 18, 19, 20, 20, 20, 21,
and 23), the mode is ????
63
CHARACTERISTICS OF MODE
Nominal Scale
Discrete Variable
Describing Shape
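The three measures of central tendency can be checked with Python's standard `statistics` module. This is a small sketch using the example values from the slides; the variable names are hypothetical.

```python
import statistics

values = [7, 12, 24, 20, 19]
ages = [18, 18, 19, 20, 20, 20, 21, 23]

mean_value = statistics.mean(values)      # 82 / 5
median_value = statistics.median(values)  # middle of 7, 12, 19, 20, 24
mode_value = statistics.mode(ages)        # most frequent age

print(mean_value)    # 16.4
print(median_value)  # 19
print(mode_value)    # 20
```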
64
The Range
• The Mode is 20
• The Range:
For discrete (whole-number) scores, the range is
highest number − lowest number + 1.
2, 4, 7, 8, and 10 -> discrete numbers: range = 10 − 2 + 1 = 9
2, 4.6, 7.3, 8.4, and 10 -> continuous numbers
For continuous scores, the range is the
difference between the upper real
limit of the highest number and the
lower real limit of the lowest number.
Variability
66
Variability
Range, Interquartile Range, Semi-Interquartile
Range, Standard Deviation, and Variance are the
Measures of Variability
 Variability is a measure of
dispersion or spreading of
scores around the mean, and
has 2 purposes:
 1. Describes the distribution
Next slide
67
Variability
 2. How well an individual score (or
group of scores) represents the
entire distribution. i.e. in Z Score
 Ex. In inferential statistics we
collect information from a small
sample then, generalize the results
obtained from the sample to the
entire population.
Next slide
68
Variability
SS, Standard Deviations and Variances
• X: 1, 2, 4, 5
SS = sum of squared deviations from the mean
• Definition formula: SS = Σ(X − μ)²
• Computation formula: SS = ΣX² − (ΣX)²/N
Population: variance σ² = SS/N; standard deviation σ = √(SS/N)
Sample: variance s² = SS/(n − 1) = SS/df; standard deviation s = √(SS/df)
Variance (σ²) is the mean of the squared deviations (mean square, MS)
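As a worked example, the formulas for SS, variance, and standard deviation can be applied to the four scores X = 1, 2, 4, 5. The sketch below is only illustrative; the variable names are hypothetical.

```python
import math

x = [1, 2, 4, 5]
n = len(x)
mu = sum(x) / n                                   # 3.0

# Definition formula: sum of squared deviations from the mean
ss_definition = sum((xi - mu) ** 2 for xi in x)   # 4 + 1 + 1 + 4 = 10

# Computation formula: sum of X squared minus (sum of X) squared over N
ss_computation = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n   # 46 - 36 = 10

pop_var = ss_definition / n              # sigma squared = SS / N
pop_sd = math.sqrt(pop_var)              # sigma
sample_var = ss_definition / (n - 1)     # s squared = SS / df
sample_sd = math.sqrt(sample_var)        # s

print(ss_definition, ss_computation)             # 10.0 10.0
print(f"{pop_var:.2f} {pop_sd:.2f}")             # 2.50 1.58
print(f"{sample_var:.2f} {sample_sd:.2f}")       # 3.33 1.83
```

Both SS formulas give the same value; the sample statistics are larger because they divide by df = n − 1 rather than N.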
 Suppose you earned a score of
 X = 54 on an exam. Which set of
parameters would give you the
highest grade?
 a. μ= 50 and σ= 2 σ²=4
 b. μ= 50 and σ= 4 σ²=16
 c. μ= 54 and σ= 2 σ²=4
 d. μ= 54 and σ= 4 σ²=16
70
 Suppose you earned a score of
 X = 46 on an exam. Which set of
parameters would give you the
highest grade?
 a. μ= 50 and σ= 2 σ²=4
 b. μ= 50 and σ= 4 σ²=16
 c. μ= 54 and σ= 2 σ²=4
 d. μ= 54 and σ= 4 σ²=16
71
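One way to reason about the two exam questions is to convert the score to a z-score under each set of parameters and pick the largest. The sketch below assumes that "highest grade" means the highest standing relative to the class, which is an interpretation rather than something the slides state; the names are hypothetical.

```python
def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma: distance from the mean in SD units."""
    return (x - mu) / sigma


# The four answer options as (mu, sigma) pairs
options = {"a": (50, 2), "b": (50, 4), "c": (54, 2), "d": (54, 4)}


def best_option(x):
    """Return the option label that gives score x the highest z-score."""
    return max(options, key=lambda label: z_score(x, *options[label]))


for x in (54, 46):
    for label, (mu, sigma) in options.items():
        print(f"X={x}, option {label}: z = {z_score(x, mu, sigma):+.1f}")
    print(f"X={x}: best option is {best_option(x)}")
```

For a score above the mean a small σ helps, while for a score below the mean a large σ keeps the score closer to the center.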
Covariance
• Correlation is based on a statistic called
covariance (COVxy or Sxy):
COVxy = SP/(N − 1)
Correlation: r = SP/√(SSx · SSy)
• Covariance is a number that reflects
the degree to which 2 variables vary
together.
• Original Data
X Y
8 1
1 0
3 6
0 1
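For the four X, Y pairs above, SP, the covariance, and the correlation can be computed as follows. This is a sketch that takes SP to be the sum of products of deviations, Σ(X − Mx)(Y − My); the variable names are hypothetical.

```python
import math

x = [8, 1, 3, 0]
y = [1, 0, 6, 1]
n = len(x)
mx = sum(x) / n   # 3.0
my = sum(y) / n   # 2.0

# SP: sum of products of deviations (needs both variables)
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# SS: sum of squared deviations (needs one variable at a time)
ssx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)

cov_xy = sp / (n - 1)
r = sp / math.sqrt(ssx * ssy)

print(sp, ssx, ssy)                 # 2.0 38.0 22.0
print(f"{cov_xy:.2f} {r:.2f}")      # 0.67 0.07
```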
72
Covariance

73
Spearman Correlation
Rank-order the data, then proceed
X Y
1 1
2 3
3 2
4 4
74
Ranking/Monotonic Transformation
Score   Rank position   Final Rank
3       1               1.5
3       2               1.5
5       3               3
6       4               5
6       5               5
6       6               5
12      7               7
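The final ranks in the table follow a simple rule: tied scores share the mean of the rank positions they occupy. A minimal sketch of that rule, with a hypothetical helper name, is:

```python
def average_ranks(scores):
    """Rank scores (1 = lowest); tied scores get the mean of their positions."""
    positions = {}
    for position, score in enumerate(sorted(scores), start=1):
        positions.setdefault(score, []).append(position)
    return [sum(positions[s]) / len(positions[s]) for s in scores]


scores = [3, 3, 5, 6, 6, 6, 12]
print(average_ranks(scores))   # [1.5, 1.5, 3.0, 5.0, 5.0, 5.0, 7.0]
```

The two 3s occupy positions 1 and 2, so each gets 1.5; the three 6s occupy positions 4, 5, and 6, so each gets 5.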
75
76
Z Scores
Z = (X − μ)/σ   (single score)
Z = (M − μ)/σm   (sample mean, used in research)
σm = σ/√n   (standard error)
We use the Z score when σ is known.
77
Z-Scores
 X= σ(Z)+µ
 µ= X- σZ
 σ= (X-µ)/Z
 If X=60
 µ=50
 σ=5
Z=?
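The rearranged formulas on this slide can be checked with the values X = 60, µ = 50, σ = 5. The function names below are hypothetical; this is only a sketch of the arithmetic.

```python
def z_from_x(x, mu, sigma):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma


def x_from_z(z, mu, sigma):
    """X = sigma * Z + mu"""
    return sigma * z + mu


def mu_from_x_z(x, z, sigma):
    """mu = X - sigma * Z"""
    return x - sigma * z


def sigma_from_x_z(x, z, mu):
    """sigma = (X - mu) / Z (undefined when Z is 0)"""
    return (x - mu) / z


z = z_from_x(60, 50, 5)
print(z)                          # 2.0
print(x_from_z(z, 50, 5))         # 60.0
print(mu_from_x_z(60, z, 5))      # 50.0
print(sigma_from_x_z(60, z, 50))  # 5.0
```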
78
Computations/ Calculations or Collect
Data and Compute Sample Statistics
Z Score for Research M=115, n=25

79
Z Score for Research
Standard Error (σm )
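A short sketch of the Z score for a sample mean follows. M = 115 and n = 25 come from the slide; the population values µ = 100 and σ = 15 are assumptions chosen only for illustration, because the slide does not give them.

```python
import math

M = 115        # sample mean (from the slide)
n = 25         # sample size (from the slide)
mu = 100       # ASSUMED population mean, not given on the slide
sigma = 15     # ASSUMED population SD, not given on the slide

standard_error = sigma / math.sqrt(n)   # sigma_m = sigma / sqrt(n)
z = (M - mu) / standard_error           # Z = (M - mu) / sigma_m

print(standard_error)   # 3.0
print(z)                # 5.0
```

With other population values the result changes, so the output above only illustrates the procedure under the stated assumptions.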

80
81
82

Stanines
• Stanines are used to compare an individual
student’s achievement with the results
obtained by a national reference sample
chosen to represent a certain year level, e.g.
2nd level, 3rd level.
• A stanine is a nine-point standard-score scale used for normalized
test scores, with 1-3 below average, 4-6
average, and 7-9 above average. It has a
mean of 5 and an SD of 2.
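One common way to assign stanines is to cut the z-score scale into bands half a standard deviation wide, centered on z = 0. The cut points below follow that convention and are an assumption, since the slide gives only the mean and SD; the function names are hypothetical.

```python
from bisect import bisect_right

# ASSUMED cut points: half-SD-wide bands centered on z = 0
Z_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine_from_z(z):
    """Map a z-score to a stanine from 1 to 9."""
    return bisect_right(Z_CUTS, z) + 1


def stanine_band(stanine):
    """Verbal label used on the slide: 1-3, 4-6, 7-9."""
    if stanine <= 3:
        return "below average"
    if stanine <= 6:
        return "average"
    return "above average"


for z in (-2.0, 0.0, 0.5, 2.0):
    s = stanine_from_z(z)
    print(z, s, stanine_band(s))
```

A z-score of 0 lands in stanine 5, the middle of the scale, which matches the stated mean of 5.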
83
The Correlational Method
• Correlational data can be graphed and a
“line of best fit” can be drawn
• Correlations: 1- Pearson, 2- Spearman
84
The Correlational Method
• Correlation is the degree to which
events or characteristics vary with
each other
–Measures the strength of a
relationship
–Does not imply cause and effect
85
The Correlational Method
Correlational data
can be graphed
and a “line of best
fit” can be drawn
86
Positive Correlation
Positive correlation
= variables change
in the same
direction
87
Positive Correlation
88
Negative Correlation
–Negative correlation =
variables change in
the opposite direction
89
Negative Correlation
90
No Correlation
–Unrelated = No
consistent
relationship
91
No Correlation
92
The Correlational Method
 The magnitude (strength) of a
correlation is also important
–High magnitude = variables which
vary closely together; fall close to
the line of best fit
–Low magnitude = variables which do
not vary as closely together; loosely
scattered around the line of best fit
93
The Correlational Method
 Direction and magnitude of a
correlation are often calculated
statistically
–Called the “Correlation Coefficient,”
symbolized by the letter “r”
 Sign (+ or -) indicates direction
 Number (from 0.00 to 1.00) indicates magnitude
0.00 = no consistent relationship
 +1.00 = perfect positive correlation
 -1.00 = perfect negative correlation
 Most correlations found in
psychological research fall far short of
“perfect”
94
The Correlational Method
• Whether a correlation can be trusted is judged by
statistical probability
– “Statistical significance” means that the
finding is unlikely to have occurred by
chance
• By convention, if a result this strong would occur
by chance less than 5% of the time (p < 0.05), results
are considered “significant” and
thought to reflect the larger population
–Generally, confidence increases with
the size of the sample (n) and the
magnitude of the correlation (r)
95
The Correlational Method
 Advantages of correlational studies:
– Have high external validity
 Can generalize findings
– Can repeat (replicate) studies on other
samples
 Difficulties with correlational studies:
– Lack internal validity
 Results describe but do not explain a
relationship
96
External & Internal Validity
 *External Validity
External validity addresses the ability to generalize
your study to other people and other situations.
 *Internal Validity
Internal validity addresses the "true" causes of the
outcomes that you observed in your study. Strong
internal validity means that you not only have
reliable measures of your independent and
dependent variables BUT a strong justification that
causally links your independent variables (IV) to
your dependent variables (DV).
97
The Correlational Method
Pearson
• r = SP/√(SSx · SSy)
• Original Data
X Y
1 3
2 6
4 4
5 7
SP requires 2 sets of data
SS requires only one set of data
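The Pearson formula can be applied to the four pairs of scores above. This is a minimal sketch, taking SP as the sum of products of deviations; the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """r = SP / sqrt(SSx * SSy), using deviation scores."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssx = sum((xi - mx) ** 2 for xi in x)
    ssy = sum((yi - my) ** 2 for yi in y)
    return sp / math.sqrt(ssx * ssy)


x = [1, 2, 4, 5]
y = [3, 6, 4, 7]
r = pearson_r(x, y)      # SP = 6, SSx = 10, SSy = 10
print(f"r = {r:.2f}")    # r = 0.60
```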
98
The Correlational Method
Spearman
• r = SP/√(SSx · SSy), computed on the ranks
• Original Data and Ranks
X Y   Rank X   Rank Y
1 3   1        1
2 6   2        3
4 4   3        2
5 7   4        4
SP requires 2 sets of data
SS requires only one set of data
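The Spearman procedure can be sketched as two steps: rank each variable, then apply the same SP and SS formula to the ranks. The ranking helper below assumes there are no tied scores, which holds for this data set; both function names are hypothetical.

```python
import math


def ranks_no_ties(values):
    """Rank values from 1 (lowest); assumes every value is distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pearson_r(x, y):
    """r = SP / sqrt(SSx * SSy), using deviation scores."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssx = sum((xi - mx) ** 2 for xi in x)
    ssy = sum((yi - my) ** 2 for yi in y)
    return sp / math.sqrt(ssx * ssy)


x = [1, 2, 4, 5]
y = [3, 6, 4, 7]
rank_x = ranks_no_ties(x)
rank_y = ranks_no_ties(y)
r_s = pearson_r(rank_x, rank_y)   # SP = 4, SS = 5 for each set of ranks

print(rank_x, rank_y)             # [1, 2, 3, 4] [1, 3, 2, 4]
print(f"r_s = {r_s:.2f}")         # r_s = 0.80
```

With tied scores, the ranks would first need to be averaged before this formula is applied.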
99
Regression and Prediction
Y=bX+a
Regression Line
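A regression line Y = bX + a can be fitted by least squares with b = SP/SSx and a = My − b·Mx. These two formulas are the standard least-squares solution and are not shown on the slide; the data reuse the X, Y pairs from the Pearson example, and the names are hypothetical.

```python
x = [1, 2, 4, 5]
y = [3, 6, 4, 7]
n = len(x)
mx = sum(x) / n   # 3.0
my = sum(y) / n   # 5.0

sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # 6.0
ssx = sum((xi - mx) ** 2 for xi in x)                     # 10.0

b = sp / ssx          # slope
a = my - b * mx       # Y-intercept


def predict(new_x):
    """Predicted Y for a given X on the fitted line."""
    return b * new_x + a


print(f"Y = {b:.1f}X + {a:.1f}")   # Y = 0.6X + 3.2
print(f"{predict(3):.1f}")         # 5.0
```

The line always passes through the point (Mx, My), which is why predicting at X = 3 returns the mean of Y.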
100
101
Three Levels of Analysis for Prediction
INPUTS
PROCESSES
OUTCOMES
Ex. Stress (INPUT) is an unpleasant psychological
state (PROCESS) that occurs in response to
environmental pressures (e.g., a job) and can lead to
withdrawal (OUTCOME).
prognosis