CEA Assessment Tool

Session 2
Classical Test Theory
Martin Walker
Normal distribution (figure labels: Mean, X)
True Score Model
Premise
There is a true score for the knowledge/skills in any given area, e.g. your own knowledge of French or maths
Domain
Trait
Knowledge of Maths
Algebra
Geometry
Calculus
How many items?
Test
The greater the number of items, the more reliable the test
BUT
Human beings will be taking the test
There is a limit to how many items a test taker can tolerate
This tension is always present when designing a test.
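One way to put numbers on this tension is the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened with comparable items. The slides do not name this formula, so treat the sketch below as an illustration. The 10-item test and its starting reliability of 0.60 are made-up values.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is made length_factor times longer
    using comparable items (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a 10-item test with reliability 0.60.
base_items = 10
base_reliability = 0.60

for factor in (1, 2, 4, 8):
    predicted = spearman_brown(base_reliability, factor)
    print(f"{base_items * factor:3d} items -> predicted reliability {predicted:.2f}")
```

Under these assumptions each doubling of the test adds less reliability than the one before, while the burden on the test taker keeps growing at the same rate.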
How do you address this tension in your work?
Is there a common approach?
Is there a shared and understood policy?
Do you feel there is a clear rationale behind assessment?
How do you deal with consistency?
Do you have processes to quality assure assessment?
(Do you train, standardise and moderate?)
Testing
The scale must be proportionate
Scale
Each item measures a specific element
Test is a sample of items we need in order to test the trait
Test
Domain
Trait
Other skills (e.g. reading skill) may influence the test score
Measurement Error
Degrees of Precision
Types of error
Measurement error
Systematic error: affects validity
Random error (inconsistency): affects reliability
Sampling error
Random Error
Systematic Error
Validity
Assessing what you set out to assess
The driving test
Maths questions
Reliability
Consistency of a test
The test gives the same result each time it is used (repeatable)
Reliability measures
• Inter-Rater Reliability
Percent of agreement between raters
• Test-Retest Reliability
Correlation between two scores
• Parallel-Forms Reliability
Correlation between two parallel forms of a test
• Internal Consistency Reliability
Average inter-item correlation
Average item-total correlation
Cronbach’s Alpha
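Three of the measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method. The function names are hypothetical, all data are made up, and Cronbach's alpha uses the standard variance form (number of items, sum of item variances, variance of total scores).

```python
from statistics import mean, variance


def percent_agreement(rater_a, rater_b):
    """Inter-rater: share of cases on which two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def pearson(x, y):
    """Test-retest or parallel forms: correlation between two sets of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(item_scores):
    """Internal consistency. item_scores holds one list per item,
    each with one score per test taker."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Made-up data for five test takers.
rater_a = ["pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]
first_sitting = [12, 15, 9, 20, 17]
second_sitting = [13, 14, 10, 19, 18]
items = [[1, 0, 1, 1, 0], [1, 0, 1, 1, 1], [1, 0, 0, 1, 0]]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Test-retest correlation: {pearson(first_sitting, second_sitting):.2f}")
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```

With only five test takers these figures are unstable. Real estimates need far larger samples.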
Measurement Error
• T – True value
• X – Measured value
• E – Error
Repeated tests
– Systematic
– Random
True score model:
X=T+E
Average of random errors is zero
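A small simulation can illustrate why the average of random errors tends towards zero over repeated tests. It is a sketch under stated assumptions: the true score of 60, the error spread of 5, and the choice of normally distributed errors are all made up for the example.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 60.0  # T: hypothetical true score
ERROR_SD = 5.0     # assumed spread of the random error E


def measure(true_score, error_sd):
    """One administration of the test: X = T + E, with E purely random."""
    return true_score + random.gauss(0.0, error_sd)


for n in (5, 50, 5000):
    scores = [measure(TRUE_SCORE, ERROR_SD) for _ in range(n)]
    avg_error = mean(scores) - TRUE_SCORE
    print(f"{n:5d} repeated tests: mean X = {mean(scores):.2f}, mean E = {avg_error:+.2f}")
```

As the number of repeated tests grows, the mean of X settles near T. A systematic error would not cancel out in this way, however many times the test is repeated.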
Systematic and Random Errors
High reliability, low validity
Low reliability, high validity
Low reliability, low validity
High reliability, high validity
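The four combinations can be mimicked numerically by following the slides' pairing of systematic error with validity and random error with reliability. The sketch below is illustrative only: the true score, the bias of 8 points, and the error spreads of 1 and 8 are invented values, and `simulate` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

TRUE_SCORE = 60.0  # hypothetical true score T


def simulate(bias, error_sd, n=2000, seed=0):
    """Return (average error, spread) over n repeated measurements,
    where each measurement is T + systematic bias + random error."""
    rng = random.Random(seed)
    scores = [TRUE_SCORE + bias + rng.gauss(0.0, error_sd) for _ in range(n)]
    return mean(scores) - TRUE_SCORE, stdev(scores)


# (systematic error, random error SD): illustrative values only
cases = {
    "High reliability, low validity": (8.0, 1.0),
    "Low reliability, high validity": (0.0, 8.0),
    "Low reliability, low validity": (8.0, 8.0),
    "High reliability, high validity": (0.0, 1.0),
}

for label, (bias, error_sd) in cases.items():
    avg_error, spread = simulate(bias, error_sd)
    print(f"{label:32s} average error {avg_error:+5.2f}, spread {spread:4.2f}")
```

A large average error signals a validity problem (scores are consistently off target), while a large spread signals a reliability problem (scores scatter from one test to the next).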
Task
Describe one assessment that you carry out.
Identify
One element that could have a negative effect on validity
One element that could have a negative effect on reliability
Suggest one measure that could improve validity
Suggest one measure that could improve reliability