CHAPTER 14 ITEM ANALYSIS

UNIT IV
ITEM ANALYSIS IN TEST DEVELOPMENT
CHAP 14: ITEM ANALYSIS
CHAP 15: INTRODUCTION TO ITEM RESPONSE THEORY
CHAP 16: DETECTING ITEM BIAS
*The goal of test construction is to create a
test with minimum length and good
reliability and validity.
*Item Analysis is the computation and
examination of any statistical property of an
item response distribution.
*Item Analysis is a process that we go
through when constructing a new test or
subtests from a pool of items with good
reliability and validity.
*Categories of Item Parameters
*Item parameters fall into 3 categories of indices.
1. Indices that describe the distribution of responses to a single item (e.g., the mean and variance of item responses).
2. Indices that describe the degree of relationship between the response to the item and some criterion of interest.
Ex. The relationship between the questions (items) and the criterion of interest (e.g., depression), as examined in Factor Analysis.
3. Indices that are a function of both the item's variance or mean and its relationship to a criterion of interest.
Ex. First, find the variance and mean of your items; then calculate the relationship between each item's variance and the criterion of interest (e.g., depression) for two groups.
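The three kinds of indices can be sketched in Python. This is only an illustration under stated assumptions: the data are made up, the item is scored 0/1, the relationship to the criterion is measured with a Pearson correlation, and the combined index shown for category 3 is the item standard deviation times the item-criterion correlation (often called an item validity index), which the slides do not name.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: 1 = endorsed the item, 0 = did not; criterion = depression score
item_responses = [1, 0, 1, 1, 0, 0, 1, 0]
criterion_scores = [22, 9, 25, 18, 11, 7, 20, 12]

# Category 1: distribution of responses to a single item
item_mean = sum(item_responses) / len(item_responses)
item_variance = item_mean * (1 - item_mean)  # variance of a 0/1 item

# Category 2: relationship between the item and the criterion
r_item_criterion = pearson_r(item_responses, criterion_scores)

# Category 3: a function of both (assumed form: item SD times item-criterion r)
item_validity_index = sqrt(item_variance) * r_item_criterion

print(item_mean, item_variance, r_item_criterion, item_validity_index)
```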
ITEM DIFFICULTIES (P)
It is one of the 7 steps in
Item Analysis.
We use Item difficulties
to select the best items.
P = f/N, i.e., the number of examinees who answered an item correctly divided by the total number of examinees (see your midterm item analysis and Chap 5).
The higher the P value, the easier the item.
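As a minimal sketch of P = f/N, the following Python computes item difficulties from a small response matrix. The data and the function name are hypothetical; items are assumed to be scored 1 (correct) or 0 (incorrect).

```python
def item_difficulty(item_scores):
    """P = f / N for one item scored 1 (correct) or 0 (incorrect)."""
    f = sum(item_scores)   # number of examinees who answered correctly
    N = len(item_scores)   # total number of examinees
    return f / N

# Hypothetical responses: rows = examinees, columns = items
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

# zip(*responses) turns rows of examinees into columns of items
p_values = [item_difficulty(column) for column in zip(*responses)]
print(p_values)
```

In this made-up data the third item has the lowest P, so it is the hardest item.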
*Steps in Item Analysis
In a typical item analysis
the test developer will take
7 steps (they are similar to
the process of test
construction in Chapter 4).
FYI PROCESS OF TEST CONSTRUCTION CHAP IV
1- Identifying the purposes of test score use
2- Identifying behaviors to represent the construct
3- Preparing test specifications, e.g., Bloom's Taxonomy
4- Item construction
5- Item review
6- Preliminary item tryouts
7- Field test
8- Statistical analysis
9- Reliability and validity
10- Guidelines
7 STEPS IN ITEM ANALYSIS (P)
1. Decide which properties of the test scores are of greatest importance.
Ex. When I select questions for your midterm/final exam, I look for similarities between the questions and those on qualifying/comprehensive or EPPP exams.
2. Identify the item parameters (e.g., mean, variance) most relevant to these properties.
3. Administer the items to a sample of examinees representative of those for whom the test is intended.
Ex. An IQ test for children or a depression test for adults.
4. Estimate for each item the parameters identified in step 2 (e.g., variance).
5. Establish a plan for item selection.
Ex. Use item difficulties (P) from the item analysis to select the items.
6. Select the final subset of items, or use the data (the items in your item analysis) for test revision.
Ex. Take out all questions with very high or very low item difficulties.
7. Conduct a cross-validation (validity) study.
Ex. Use SPSS to compare the results of 2 tests or 2 classes (e.g., this year's class and last year's class), i.e., Confirmatory Factor Analysis.
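Steps 5 and 6 can be sketched as a simple filter on item difficulties. The cutoffs of 0.20 and 0.90 below are assumptions chosen only for illustration (the slides say "very high or very low" without giving numbers), and the item labels and P values are hypothetical.

```python
def select_items(p_by_item, low=0.20, high=0.90):
    """Keep items whose difficulty P lies within [low, high].

    The default cutoffs are illustrative assumptions, not values from the course.
    """
    return [item for item, p in p_by_item.items() if low <= p <= high]

# Hypothetical item difficulties from an item analysis
p_by_item = {"Q1": 0.97, "Q2": 0.62, "Q3": 0.10, "Q4": 0.45, "Q5": 0.88}

kept = select_items(p_by_item)
dropped = [item for item in p_by_item if item not in kept]
print(kept, dropped)
```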
UNIT V
TEST SCORING AND INTERPRETATION
CHAP 17: CORRECTING FOR GUESSING AND OTHER SCORING METHODS
CHAP 18: SETTING STANDARDS
CHAP 19: NORMS AND STANDARD SCORES
CHAP 20: EQUATING SCORES FROM DIFFERENT TESTS
CHAP 19: NORMS AND STANDARD SCORES
1895
*Alfred Binet (1910)
Ratio IQ = ratio of MA/CA
1912
In 1912 in Germany, William (Wilhelm) Stern proposed and standardized the following formula:
IQ = [Mental age / Chronological age] × 100
This formula works fairly well for children but not for adults.
*The abbreviation "IQ" was coined by the psychologist William Stern for the German term Intelligenz-quotient.
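A short Python sketch of the ratio IQ formula follows. The ages are hypothetical. The adult case assumes, for illustration only, that measured mental age stops rising in adulthood, which is one way to see why the ratio breaks down for adults.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = (mental age / chronological age) * 100."""
    return (mental_age / chronological_age) * 100

# Hypothetical child: mental age 10, chronological age 8
child_iq = ratio_iq(10, 8)

# Hypothetical adult: mental age assumed to level off at 20 while
# chronological age keeps increasing, so the ratio shrinks with age
adult_iq = ratio_iq(20, 40)

print(child_iq, adult_iq)
```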
1916
*Lewis Terman of Stanford University publishes the Stanford-Binet Intelligence Test.
He used the standardized version:
IQ = [Mental age / Chronological age] × 100
*Deviation IQ: uses norms to estimate the IQ.
We use norms when we want to compare an examinee’s raw score on a test to the distribution of scores (scaled or standard scores) for a sample from a well-defined population.
Ex. When we want to estimate the IQ of a 20-year-old person, we compare their raw score on the subtests of an IQ test with the scores of people their age, which is “their norm” (standard score). This technique tells us where they stand among people of their age.
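The comparison with an age norm group can be sketched as a z-score conversion. This is a minimal illustration, not the scoring procedure of any particular test: the norm-group mean and standard deviation are made-up numbers, and the IQ scale of mean 100 and standard deviation 15 is an assumption based on the common deviation-IQ convention.

```python
def deviation_iq(raw_score, norm_mean, norm_sd, iq_mean=100, iq_sd=15):
    """Place a raw score on an IQ scale relative to the examinee's age norm group.

    iq_mean=100 and iq_sd=15 are assumed scale values (common convention).
    """
    z = (raw_score - norm_mean) / norm_sd   # standing within the age group
    return iq_mean + iq_sd * z

# Hypothetical 20-year-old: raw score 58; age-group norms: mean 50, SD 8
iq = deviation_iq(raw_score=58, norm_mean=50, norm_sd=8)
print(iq)
```

A score one standard deviation above the age-group mean lands one standard deviation above the IQ scale mean, regardless of the examinee's age.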
*9 BASIC STEPS IN CONDUCTING A NORMING STUDY (P.432)
1. Identify the population of interest.
Ex. Students, employees of a company, inmates, patients, etc.
2. Identify the most critical statistics that will be computed for the sample data.
Ex. Standard deviation σ, σ², M, SS, p
3.
Decide on the tolerable amount of
sampling error
That is the discrepancy between the
sample statistic (M) and population
parameter, (µ) (Central Tendency M=µ).
The Central Limit Theorem has 3
characteristics;
1. Central Tendency 2.The Shape of the
Distribution (normal) and 3. Variability or
Standard Error of Mean (σm). M-µ
4. Devise a procedure for drawing a sample from the population of interest.
There are 4 types of probability sampling:
I Simple Random Sampling
Give everyone in the population an equal chance of being selected. Ex. Draw names from a hat.
II Systematic Sampling (k = N/n)
Select every kth name on the list. Ex. CAU population N=1500 and your sample size n=150:
N/n = 1500/150 = 10, so select every 10th student.
III Stratified Sampling “Strata” means different layers. We use Stratified Sampling when we want to compare 2 different groups (e.g., male and female CAU doctoral students).
First we randomly select males, then we randomly select females.
IV Cluster Sampling We use cluster sampling when the population consists of units rather than individuals, such as classes. Ex. The Miami Dade school district. If we want to conduct research with Miami Dade 2nd graders (1000 2nd-grade classes), we randomly select about 10 of these 1000 2nd-grade classes to be in our sample and then conduct the research.
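The four sampling procedures can be sketched with Python's standard `random` module. All names and data here are hypothetical. Two details are assumptions added for the sketch: the systematic sample begins at a random start within the first k names, and the stratified sample draws the same number from each stratum.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection (names from a hat)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Select every k-th member, k = N // n, from a random start (assumed)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate random sample from each stratum (layer)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole units (e.g., classes), keeping all their members."""
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

students = [f"student_{i}" for i in range(1500)]      # N = 1500
srs = simple_random_sample(students, 150, rng)         # n = 150
systematic = systematic_sample(students, 150, rng)     # every 10th student

strata = {"male": students[:700], "female": students[700:]}  # made-up split
stratified = stratified_sample(strata, 75, rng)

classes = {f"class_{c}": [f"class_{c}_pupil_{p}" for p in range(20)]
           for c in range(1000)}                        # 1000 2nd-grade classes
clustered = cluster_sample(classes, 10, rng)            # about 10 classes
```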
5. Estimate the minimum sample size (n) required to hold the sampling error within the specified limits.
There are different statistical procedures to estimate n. n should be ≥ 30 (law of large numbers).
1. n = (σ/d)²
d = effect size, d = (M - µ)/σ
2. n = (σ/σm)²
σm = σ/√n Standard error of the mean for a population. Ex. z-score
Sm = S/√n Estimated standard error of the mean for a sample. Ex. t-distribution
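The second formula, n = (σ/σm)², follows from solving σm = σ/√n for n. A minimal Python sketch, assuming a hypothetical population standard deviation of 15 and a tolerable standard error of 1.5 (both values are made up for illustration):

```python
import math

def standard_error_of_mean(sigma, n):
    """sigma_m = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def min_sample_size(sigma, tolerable_se):
    """n = (sigma / sigma_m)^2, rounded up to a whole examinee."""
    return math.ceil((sigma / tolerable_se) ** 2)

# Hypothetical values: sigma = 15, tolerable standard error = 1.5
n_required = min_sample_size(sigma=15, tolerable_se=1.5)
check = standard_error_of_mean(sigma=15, n=n_required)
print(n_required, check)
```

Halving the tolerable standard error quadruples the required sample size, because n grows with the square of σ/σm.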
THE EFFECT SIZE
EX. TWO INDEPENDENT T-TEST
6.
Draw the Sample and collect the Data
7. Compute the Values of the Group
Statistics of interest and their standard
error. Sm=S/√n or σm = σ/√n
Calculate the standard error of
measurement, which is the difference
between M and µ. Also known as
sampling error.
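For step 7, a minimal Python sketch of the estimated standard error Sm = S/√n computed from sample data. The scores are hypothetical; `statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation S.

```python
import math
import statistics

# Hypothetical norming-sample scores
scores = [52, 47, 55, 49, 51, 46, 53, 50, 47]

n = len(scores)
M = statistics.mean(scores)     # sample mean
S = statistics.stdev(scores)    # sample standard deviation (n - 1 denominator)
Sm = S / math.sqrt(n)           # estimated standard error of the mean

print(M, S, Sm)
```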
8.
Identify the Types of Normative
Scores that will be needed, and
prepare the Normative Score
Conversion table (see next 2 slide).
9. Prepare written documentation of
the Normative Scores.
Types of Normative Scores
Raw Score: Score on a subtest or a test.
Scaled Score: Normative score for a specific age.
NORMATIVE SCORES (Wechsler)
*Usefulness of Scaled Scores
Scaled scores are useful for two purposes:
1. Scaled scores relate the examinee’s performance to the percentile rank scores of the norm group and their grade level.
2. In evaluation and research, the mean scaled score is a better estimate of average group performance than the mean raw score.
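The link between a scaled score and a percentile rank can be sketched with the normal distribution. This is an illustration under assumptions not stated in the slides: scaled scores are taken to be normally distributed in the norm group with mean 10 and standard deviation 3 (a common subtest convention), and percentile rank is approximated by the normal cumulative distribution.

```python
from statistics import NormalDist

def percentile_rank(scaled_score, norm_mean=10, norm_sd=3):
    """Approximate percentile rank of a scaled score in a normal norm group.

    norm_mean=10 and norm_sd=3 are assumed scale values, not taken from the slides.
    """
    return 100 * NormalDist(mu=norm_mean, sigma=norm_sd).cdf(scaled_score)

# Hypothetical examinee with a scaled score of 13 (one SD above the mean)
pr = percentile_rank(13)
print(round(pr))
```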
