Topics: Quality of Measurements • Reliability • Validity

The Quality of Measuring Instruments: Definitions
• Reliability: Consistency - the extent to which the data are consistent
• Validity: Accuracy - the extent to which the instrument measures what it purports to measure

Hitting the Bull's Eye
[Figure: target diagram contrasting reliability (how tightly the shots cluster) with validity (whether they hit the intended target).]

The Questions of Reliability
• To what degree does a subject's measured performance remain consistent across repeated testings - how consistently will results be reproduced if we measure the same individuals again?
• What is the equivalence of results across two measurement occasions using "parallel" tests?
• To what extent do the individual items that make up a test or inventory consistently measure the same underlying characteristic?
• How much consistency exists among the ratings provided by a group of raters?
• When we have obtained a score, how precise is it?

True and Error Score: Parallel Tests

Individual   Test 1   Test 2   Test 3   ...   Test n
1            X11      X12      X13      ...   X1n
2            X21      X22      X23      ...   X2n
3            X31      X32      X33      ...   X3n
4            X41      X42      X43      ...   X4n
...
N            XN1      XN2      XN3      ...   XNn

For each individual i: True Score i = the mean of that individual's scores across the parallel tests; Error i = the variance of those scores.

Sources of Error: Conditions of Test Administration and Construction
• Changes in time limits
• Changes in directions
• Different scoring procedures
• Interrupted testing session
• Qualities of the test administrator
• Time the test is taken
• Sampling of items
• Ambiguity in wording of items/questions
• Ambiguous directions
• Climate of the test situation (heating, light, ventilation, etc.)
• Differences in observers

Sources of Error: Conditions of the Person Taking the Test
• Reaction to specific items
• Health
• Motivation
• Mood
• Fatigue
• Luck
• Memory and/or attention fluctuations
• Attitudes
• Test-taking skills (test-wiseness)
• Ability to understand instructions
• Anxiety

Reliability
• Reliability: the ratio of true-score variance to observed-score variance, r = σ²(true) / σ²(observed), where σ²(observed) = σ²(true) + σ²(error)
• Reliability coefficient: a numerical index that takes a value between 0 and +1.00

Relation between Reliability and Error
[Figure: two bars partitioning observed-score variability. In a reliable measure (A), true-score variability makes up most of the bar and error is small; in an unreliable measure (B), error makes up a much larger share.]

Methods of Estimating Reliability
• Test-Retest: repeated measures with the same test (coefficient of stability)
• Parallel Forms: repeated measures with equivalent forms of a test (coefficient of equivalence)
• Internal Consistency: repeated measures using the items on a single test
• Inter-Rater: judgments by more than one rater

Reliability Is the Consistency of a Measurement

Repeated measurements/observations (reliable):
Person    X1   X2   X3   ...   Xk→∞
Charlie   20   19   21   ...   20
Harry     15   17   16   ...   16

Repeated measurements/observations (unreliable):
Person    X1   X2   X3   ...   Xk→∞
Charlie   20   10    8   ...   23
Harry      2   11    4   ...   15

Test-Retest Reliability
• Situation: the same people take two administrations of the same test
• Procedure: correlate the scores on the two administrations, which yields the coefficient of stability (see the sketch below)
• Meaning: the extent to which scores on a test can be generalized over different occasions (temporal stability)
• Appropriate use: information about the stability of the trait over time
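To make the test-retest procedure concrete, here is a minimal sketch in Python. The scores and variable names (time1, time2) are hypothetical illustration data, not from the slides; the correlation is computed with scipy.stats.pearsonr.

# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations of the same test.
from scipy.stats import pearsonr

# Hypothetical scores for the same ten examinees on two occasions.
time1 = [20, 15, 31, 22, 18, 27, 24, 19, 30, 25]
time2 = [19, 17, 30, 24, 18, 25, 26, 20, 29, 24]

# The correlation between the two occasions is the coefficient of stability.
r_stability, _ = pearsonr(time1, time2)
print(f"Coefficient of stability: {r_stability:.3f}")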
Parallel (Alternate) Forms Reliability
• Situation: testing the same people on different but comparable forms of the test
• Procedure: correlate the scores from the two forms, which yields a coefficient of equivalence
• Meaning: the consistency of response to different item samples (where testing is immediate) and across occasions (where testing is delayed)
• Appropriate use: to provide information about the equivalence of forms

Internal Consistency Reliability
• Situation: a single administration of one test form
• Procedure: divide the test into comparable halves and correlate the scores from the two halves, or index the consistency among the items directly (see the alpha sketch at the end of this section)
  – Split-half with Spearman-Brown adjustment
  – Kuder-Richardson #20 and #21
  – Cronbach's alpha
• Meaning: consistency across the parts of a measuring instrument ("parts" = individual items or subgroups of items)
• Appropriate use: where the focus is on the degree to which the same characteristic is being measured; a measure of test homogeneity

Inter-rater Reliability
• Situation: having a sample of test papers (e.g., essays) scored independently by two examiners
• Procedure: correlate the two sets of scores
  – Kendall's coefficient of concordance
  – Cohen's kappa
  – Intraclass correlation
  – Pearson product-moment correlation
• Meaning: a measure of scorer (rater) reliability (consistency, agreement), such as the coefficient of concordance
• Appropriate use: for ensuring consistency between raters

When is a reliability satisfactory?
• Depends on the type of instrument
• Depends on the purpose of the study
• Depends on who is affected by the results

Factors Affecting Reliability Estimates
• Test length
• Range of scores
• Item similarity

Standard Error of Measurement
• All test scores contain some error
• For any test, the higher the reliability estimate, the lower the error
• The standard error of measurement is the standard deviation of the error component of scores; for a test with reliability r and observed-score standard deviation s, it is estimated as SEM = s√(1 − r)
• Can be used to estimate a range within which a true score would likely fall (see the SEM sketch at the end of this section)

Use of Standard Error of Measurement
• We never know the true score
• By knowing the SEM and by understanding the normal curve, we can assess the likelihood of the true score being within certain limits
• The higher the reliability, the lower the standard error of measurement, and hence the more confidence we can place in the accuracy of a person's test score

Normal Curve: Areas Under the Curve
[Figure: normal curve centered at X = the observed test score, marked at ±1, ±2, and ±3 standard errors.]
• X ± 1 SE covers about 68% of cases (.3413 on each side of the score)
• X ± 2 SE covers about 95% (a further .1359 on each side)
• X ± 3 SE covers about 99% (a further .0214 on each side, with .0013 left in each tail)

Warnings about Reliability
• There is no such thing as "the" reliability; different methods assess consistency from different perspectives
• Reliability coefficients apply to the data, NOT to the instrument
• Any reliability is only an estimate of consistency
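The internal consistency slide above lists Cronbach's alpha among its procedures. The sketch below computes alpha from an item-score matrix using the standard formula, alpha = k/(k−1) × (1 − Σ item variances / total-score variance); the data matrix (five examinees by four items) and the function name cronbach_alpha are hypothetical.

# Minimal sketch of Cronbach's alpha for internal consistency.
# Rows = examinees, columns = items; the scores are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)"""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.3f}")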
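The standard error of measurement slides can be made concrete the same way. Assuming hypothetical numbers (s = 10, reliability = 0.91, observed score = 75), the sketch computes SEM = s√(1 − r) and the approximate 68% / 95% / 99% bands from the normal-curve slide.

# Minimal sketch: standard error of measurement and confidence
# bands for the true score. All numbers are hypothetical.
import math

s = 10.0      # standard deviation of observed test scores
r_xx = 0.91   # reliability estimate for the test
x = 75.0      # one examinee's observed score

sem = s * math.sqrt(1 - r_xx)  # SEM = s * sqrt(1 - reliability)
print(f"SEM = {sem:.2f}")

# Bands from the normal-curve slide: roughly 68% of true scores lie
# within 1 SEM of the observed score, 95% within 2, 99% within 3.
for z, pct in [(1, 68), (2, 95), (3, 99)]:
    print(f"~{pct}% band: {x - z * sem:.1f} to {x + z * sem:.1f}")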