Reliability

- Refers to the degree to which test scores are consistent, dependable, or repeatable. It is a function of the degree to which test scores are free from errors of measurement (Drummond, 2000).
- Refers to the consistency of test scores obtained by the same persons when they are re-examined with the same test on different occasions, with different sets of equivalent items, or under other variable examining conditions (Anastasi & Urbina, 1997).
- Underlies the error of measurement of a single score, from which we can predict the range of fluctuation likely to occur in a single individual’s scores as a result of irrelevant chance factors.
- Also refers to the internal consistency of a test, which is based on the number of items in the test and the average intercorrelation among all items.

Some Sources of Inconsistency (Error Variance)

Characteristics of an individual
- General skills and techniques of taking tests (test-wiseness or test naivete)
- Stable response sets (e.g., marking option A more frequently than the other options of multiple-choice items, or marking true-false items “true” when undecided)
- Level of practice on the specific skills involved (especially psychomotor skills)

Factors affecting performance on many or all tests at a particular time
- Health
- Fatigue
- Noises and distraction
- Motivation
- Emotional strain
- Anxiety

Administration of the test / Appraisal of test performance
- Conditions of testing: adherence to time limits, freedom from distraction, clarity of instructions, etc.
- Interaction of personality, sex, or race of examiner with that of examinee that facilitates or inhibits performance
- Unreliability or bias in grading or rating performances

Others
- Luck in selection of answers by sheer guessing
- Momentary distraction

Psychological Assessment for the Guidance Practitioner by Alexa PrielaAbrenica (2009)

Key Concepts
- Sources of Error
- Time Sampling: Test-Retest Method
- Item Sampling: Parallel Forms Method
- Internal Consistency
  ◦ Split-half Method
  ◦ KR20 Formula
  ◦ Coefficient Alpha
- Inter-rater Reliability

Measurement error will have an effect on reliability.
- True scores vs. observed scores
- Reliability coefficient: the ratio of the variance of true scores on a test to the variance of observed scores

Source of Error: Time

Test-retest reliability
- The same test is given twice
- Find the correlation between the scores on the two occasions
- Disadvantage: carryover effect

Source of Error: Items

Parallel Forms
- Different items are used to measure the same attribute
- Pearson Product-Moment Correlation Coefficient

Source of Error: Internal Consistency

Split-half
- A test is given once, divided into halves, and the halves are scored separately
- Compare scores on the first half with scores on the second half

Cronbach’s coefficient alpha
- Generally treated as the lowest estimate of reliability that can be expected

Kuder-Richardson 20 (dichotomous)

KR20 (Kuder-Richardson Formula 20)
- An index of the internal consistency reliability of a measurement instrument, such as a test, questionnaire, or inventory
- Can be applied to any test item responses that are dichotomously scored
- Values of KR-20 generally range from 0.0 to 1.0, with higher values representing a more internally consistent instrument
- In very rare cases, typically with very small samples, values less than 0.0 can occur, which indicates an extremely unreliable measurement
- 0.7 is an acceptable value; 0.8 is expected for longer tests of 50 items or more

http://knowledge.sagepub.com/view/researchdesign/n205.xml

Methods of assessing reliability (Drummond, 2006)

Test-Retest
- Procedure: same test given twice with a time interval between testings
- Coefficient: stability
- Problems: memory effect, practice effect, change over time

Alternate Forms, Version 1
- Procedure: equivalent tests given one right after the other
- Coefficient: equivalence
- Problems: hard to develop equivalent tests

Alternate Forms, Version 2
- Procedure: equivalent tests given with time between testings
- Coefficient: equivalence and stability
- Problems: hard to develop equivalent tests, may reflect change in behavior over time

Internal Consistency
- Procedure: one test given at one time only (test divided into parts in split-half)
- Coefficient: equivalence and internal consistency
- Problems: uses shortened forms (split-half), only good if traits are unitary or homogeneous, gives high estimates on a speeded test, hard to compute by hand

Psychological Assessment for the Guidance Practitioner by Alexa PrielaAbrenica (2009)

Inter-Rater/Inter-Observer Reliability

The degree of agreement among raters
- Percentage of agreement
- Correlations
- Cohen’s Kappa

Factors Affecting Reliability
- Length increases reliability
- Homogeneity increases reliability
- Shorter time interval between testings, higher reliability
- Types of reliability estimate

How reliable is reliable?
- Reliability coefficients of at least 0.70 to 0.80 are usually considered acceptable
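
The verbal definitions in these notes (true vs. observed scores, the reliability coefficient, KR20, coefficient alpha) can be written out as formulas. The block below is a sketch in standard classical test theory notation. The symbols are assumptions, not taken from the slides: X is the observed score, T the true score, E the error, k the number of items, p_i the proportion answering item i correctly, and sigma^2 a variance.

```latex
% Observed score = true score + error (errors assumed uncorrelated with true scores)
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2

% Reliability coefficient: ratio of true-score variance to observed-score variance
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2}

% Standard error of measurement for a single score
\mathrm{SEM} = \sigma_X \sqrt{1 - r_{XX}}

% KR-20 for k dichotomously scored items, with q_i = 1 - p_i
\mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right)

% Coefficient alpha, with \sigma_i^2 the variance of item i
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
```

For dichotomous items the item variance equals p_i q_i, so KR-20 is the special case of coefficient alpha for items scored 0 or 1.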
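
The internal consistency methods described above (split-half, KR20, coefficient alpha) can be illustrated with a small Python sketch. The function names and the toy data are hypothetical and chosen only for illustration. Population variances are used throughout, and the Spearman-Brown step in `split_half` is a standard correction for the shortened half-tests that the notes themselves do not name.

```python
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(scores):
    """Coefficient alpha; scores holds one row of item scores per examinee."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def kr20(scores):
    """KR-20 for dichotomously scored (0/1) items."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]  # proportion correct per item
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

def split_half(scores):
    """Correlate first-half and second-half scores.

    Returns (half-test correlation, Spearman-Brown corrected estimate).
    The correction is an assumption added here, not part of the notes.
    """
    half = len(scores[0]) // 2
    first = [sum(row[:half]) for row in scores]
    second = [sum(row[half:]) for row in scores]
    r = pearson_r(first, second)
    return r, 2 * r / (1 + r)

# Toy data (made up): 5 examinees x 4 items, 1 = correct, 0 = incorrect.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(data), 3))            # 0.8
print(round(cronbach_alpha(data), 3))  # 0.8, same as KR-20 for 0/1 items
print(split_half(data))
```

The same `pearson_r` helper would also serve for test-retest and parallel-forms estimates, where the two lists are the scores from the two administrations or the two forms.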
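
For inter-rater reliability, percentage of agreement and Cohen’s Kappa can be computed directly from two raters’ category labels. The sketch below is a minimal illustration for two raters. The function names and the ratings are invented for the example. Kappa adjusts the observed agreement for the agreement expected by chance from each rater’s category frequencies.

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    """Proportion of cases on which the two raters assign the same category."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    p_e = sum(c1[c] * c2[c] for c in set(rater1) | set(rater2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Made-up ratings of 10 performances by two raters.
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass"]

print(percent_agreement(rater_a, rater_b))       # 0.8
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.583
```

Here the raters agree on 80% of cases, but because both call "pass" about 60% of the time, much of that agreement could arise by chance, which is why kappa is noticeably lower than the raw percentage.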