Session 2: Classical Test Theory
Martin Walker

Normal distribution
[Figure: normal distribution curve, with the mean marked]

True Score Model
Premise: there is a true score for the knowledge/skills in any given area, e.g. your own knowledge of French or maths.

Domain and trait
[Diagram: the trait "Knowledge of Maths" and its domain, covering areas such as Algebra, Geometry and Calculus]

How many items?
The greater the number of items, the more reliable the test.
BUT human beings will be taking the test, and there is a limit to how many items a test taker can tolerate.
This tension is always present when designing a test.

How do you address this tension in your work?
• Is there a common approach?
• Is there a shared and understood policy?
• Do you feel there is a clear rationale behind assessment?
• How do you deal with consistency?
• Do you have processes to quality assure assessment? (Do you train, standardise and moderate?)

Testing
The scale must be proportionate.
Scale: each item measures a specific element.
The test is a sample of the items we would need in order to test the whole trait.
[Diagram: test, domain and trait]
Other skills (e.g. reading skill) may influence the test score.

Measurement Error
[Figure: degrees of precision in measurement]

Types of measurement error
• Systematic error: affects validity
• Random error (inconsistency): affects reliability
• Sampling error

[Figure: random error vs systematic error]

Validity
Assessing what you set out to assess.
Examples: the driving test; maths questions.

Reliability
The consistency of a test: it gives the same result each time it is used (it is repeatable).

Reliability measures
• Inter-Rater Reliability: percentage of agreement between raters
• Test-Retest Reliability: correlation between the scores from two administrations of the same test
• Parallel-Forms Reliability: correlation between two parallel forms of a test
• Internal Consistency Reliability: average inter-item correlation; average item-total correlation; Cronbach’s Alpha

Measurement Error
• T: true value
• X: measured value
• E: error
Repeated tests reveal two kinds of error: systematic and random.
True score model: X = T + E
The average of the random errors is zero.

Systematic and Random Errors
[Figure: four panels]
• High reliability, low validity
• Low reliability, high validity
• Low reliability, low validity
• High reliability, high validity

Task
Describe one assessment that you carry out. Identify:
• one element that could have a negative effect on validity
• one element that could have a negative effect on reliability
Suggest one measure that could improve validity.
Suggest one measure that could improve reliability.
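The claim on the "How many items?" slide (the more items, the more reliable the test) can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from the session: each simulated test taker has a true score drawn from a standard normal distribution, every item measures that same trait, and each item adds independent random error with a hypothetical standard deviation `error_sd`. Reliability is estimated with Cronbach's alpha. The function names, the 500 simulated test takers and the item counts are illustrative choices.

```python
import random
from statistics import variance


def cronbach_alpha(scores):
    """scores[p][i] is the score of test taker p on item i."""
    k = len(scores[0])
    # Sum of the variances of each item (column) across test takers
    item_var = sum(variance(col) for col in zip(*scores))
    # Variance of each test taker's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def simulate_test(n_people, n_items, error_sd, seed=1):
    """Hypothetical helper: every item score is true score + independent random error."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        t = rng.gauss(0, 1)  # this person's true score on the trait
        scores.append([t + rng.gauss(0, error_sd) for _ in range(n_items)])
    return scores


# Estimated reliability for tests of increasing length
for k in (2, 5, 10, 20, 40):
    alpha = cronbach_alpha(simulate_test(500, k, error_sd=1.5))
    print(k, round(alpha, 2))
```

Under these assumptions the printed alpha rises with the number of items, but each extra item adds less than the one before, which is one way to see the trade-off against how many items a test taker can tolerate.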
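The statistics named on the "Reliability measures" slide can be computed directly from marks. The sketch below assumes marks are held in plain Python lists (one list per rater or per test occasion, or a table of test takers by items for internal consistency); the function names and the sample marks are hypothetical. The item-total correlation here is the simple (uncorrected) version, in which each item is also counted in the total.

```python
from itertools import combinations
from statistics import mean, variance


def percent_agreement(rater_a, rater_b):
    """Inter-rater reliability: percentage of items given the same mark by both raters."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must mark the same items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


def pearson(x, y):
    """Pearson correlation, as used for test-retest and parallel-forms reliability.
    Undefined (ZeroDivisionError) if either set of scores has no spread."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_inter_item_correlation(scores):
    """scores[p][i] is the score of test taker p on item i."""
    items = list(zip(*scores))
    return mean(pearson(a, b) for a, b in combinations(items, 2))


def average_item_total_correlation(scores):
    items = list(zip(*scores))
    totals = [sum(row) for row in scores]
    return mean(pearson(item, totals) for item in items)


def cronbach_alpha(scores):
    k = len(scores[0])
    item_var = sum(variance(col) for col in zip(*scores))
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical marks from two raters on four scripts
print(percent_agreement([2, 3, 1, 4], [2, 3, 2, 4]))  # 75.0
```

For test-retest reliability, `pearson` would be applied to the same test takers' scores on the two occasions; for parallel-forms reliability, to their scores on the two forms.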
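The true score model X = T + E, and the point that the average of random errors is zero, can be checked with a short simulation of one person sitting the same test many times. This is a sketch under assumptions chosen for illustration: the true score `T = 60`, the random error standard deviation and the size of the systematic error (`bias`) are hypothetical values, and random error is modelled as normally distributed with mean zero.

```python
import random
from statistics import mean


def administer(true_score, n_repeats, random_sd, bias=0.0, seed=0):
    """Simulate repeated testing of one person under X = T + E.
    Here E = bias (systematic: the same every time) + random noise with mean zero."""
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0, random_sd) for _ in range(n_repeats)]


T = 60.0
random_only = administer(T, 10_000, random_sd=5.0)
with_bias = administer(T, 10_000, random_sd=5.0, bias=4.0)

# Random errors cancel out over repeated tests, so the average is close to T
print(round(mean(random_only), 1))
# A systematic error does not cancel: the average stays shifted by the bias
print(round(mean(with_bias), 1))
```

Averaging repeated measurements reduces the effect of random error (a reliability problem) but leaves systematic error untouched (a validity problem), which matches the split on the "Types of measurement error" slide.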