MGTO 324 Recruitment and Selection
Reliability
Kin Fai Ellick Wong, Ph.D.
Department of Management of Organizations
Hong Kong University of Science & Technology

Prologue
• A quick test of your understanding of reliability
  – Which of the following is the most reliable thermometer?

      Trial   Thermometer A   Thermometer B   Thermometer C
        1          101             103             108
        2           99              97             108
        3          100             102             107
        4          101             103             108
        5           99              95             108

• Tests that are relatively free of random measurement errors are deemed to be reliable.
• Reliability refers to consistency of measurement, or repeatability.

Prologue
• How reliable are your eyes? How many digits are there?
  – "456"
  – "8734868412473"
  – "24368414863247821475235874123547532547"

Prologue
• How reliable are your eyes? How many digits are there?
  – "888"
  – "88888888888888"
  – "88888888888888888888888888888888888"

Prologue
• Some items are more reliable than others.
• Which "item" is more reliable?
  – "456" vs. "888"
  – "8734868412473" vs. "88888888888888"
  – "24368414863247821475235874123547532547" vs. "88888888888888888888888888888888888"

Outline
• Reliability
  – Part I: Measurement errors
  – Part II: Estimating reliability

Part I: Measurement errors
• Reliability
  – Consistency of measurement of an attribute
  – When we want to assess how good your eyes are…
    • "8734868412473" vs. "88888888888888"
• Measurement errors
  – Error is a form of inaccuracy and variability
  – Error can be reduced if the measuring device is precise, but it cannot be completely eliminated

Part I: Measurement errors
• True score theory (see the simulation sketch below)
  – Observed score = true score + systematic errors + random errors
  – Systematic errors
    • A constant is added to every measure
    • A constant CANNOT make a test/score "inconsistent"
    • Systematic errors have no effect on reliability
  – Random errors
    • A random value is added to every measure
    • The random values are normally distributed, with mean = 0
    • Random errors are measurement errors that lower consistency (i.e., lower reliability)

Part I: Measurement errors
• Reliability and measurement errors
  – A test that truly reflects the true score is a reliable test
  – For a perfectly reliable test
    • The correlation between the true score and the observed score is 1
    • There is no random error; the test measures something meaningful
  – For a perfectly unreliable test
    • The correlation between the true score and the observed score is 0
    • The test is "full" of random error; it measures something meaningless
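The true score model above can be illustrated with a small simulation. This is a minimal sketch (the course workshop uses SPSS; Python with numpy is used here purely for illustration), with hypothetical score distributions and illustrative parameter values: a constant systematic error shifts every observed score by the same amount but leaves reliability untouched, while random error lowers the correlation between true and observed scores.

import numpy as np

# Minimal simulation of true score theory:
# observed = true score + systematic error + random error
rng = np.random.default_rng(0)
n = 10_000

true_score = rng.normal(loc=100, scale=15, size=n)   # hypothetical true scores
systematic_error = 5.0                               # constant added to every measure
random_error = rng.normal(loc=0, scale=10, size=n)   # mean-0 random error

observed = true_score + systematic_error + random_error

# Theoretical reliability = var(T) / var(X); it also equals the squared
# correlation between true and observed scores.
r = np.corrcoef(true_score, observed)[0, 1]
print("corr(true, observed)  :", round(r, 3))
print("corr squared          :", round(r**2, 3))
print("var(T) / var(X)       :", round(true_score.var() / observed.var(), 3))

# The constant systematic error shifts the mean of every observed score
# equally, so it changes the mean but not the correlation (reliability).
print("mean shift from T to X:", round(observed.mean() - true_score.mean(), 2))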
Part I: Measurement errors
• Interpretation of reliability – consistency
  – Theoretical reliability = the variance of the true score / the variance of the observed score
    • reliability = σ²T / σ²X
  – When the reliability coefficient of an exam = 0.4
    • 40% of the variance in the obtained scores is due to the true score
    • 60% of the variance in the obtained scores is due to random errors
  – When the reliability coefficient = 1
    • All observed-score (X) variance reflects true-score (T) variance (σ²X = σ²T)
  – When the reliability coefficient = 0
    • All observed-score variance reflects error variance (σ²X = σ²E)

Outline
• Reliability
  – Part I: Measurement errors
  – Part II: Estimating reliability

Part II: Estimating reliability
• Three common methods
  – Time sampling
    • Same test given at two points in time
    • Test-retest reliability
  – Item sampling
    • Different items used to assess the same attribute
    • Parallel/alternate forms reliability
  – Internal consistency
    • Consistency of items within the same test
    • Split-half reliability
    • Coefficient alpha

Part II: Estimating reliability
• Estimating reliability: Test-Retest, Parallel/Alternate, Split Half, Coefficient Alpha

Part II: Estimating reliability
• Test-retest reliability
  – Same test given at two points in time
  – How to assess?
    • The correlation between scores obtained on the two occasions (see the sketch after this part)

               S001  S002  S003  S004  S005  S006  S007  S008  S009  S010
      Time 1     5     6     5     4     2     1     3     1     7     2
      Time 2     4     6     4     4     1     1     2     1     6     1

  – SPSS output (Correlations table): Pearson correlation between Time 1 and Time 2 = .970**,
    Sig. (2-tailed) = .000, N = 10
    **. Correlation is significant at the 0.01 level (2-tailed).

Part II: Estimating reliability
• Considerations before using the test-retest method
  – Carryover effects
    • The first testing session influences scores in the second session
  – Test-retest time interval
    • When the two sessions are too close together, the influence of carryover effects increases
    • When the two sessions are too far apart, the influence of other factors increases

Part II: Estimating reliability
• When should I use test-retest reliability?
  – For traits or characteristics that do not change over time
    • Intelligence, personality
  – Test-retest is not appropriate for changing characteristics
    • Knowledge, mood, emotion, motivation
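For readers without SPSS, the test-retest correlation reported above can be reproduced in a few lines. This is a sketch using scipy's pearsonr on the S001–S010 scores as tabulated in the slide.

import numpy as np
from scipy.stats import pearsonr

# Scores for S001-S010 from the test-retest example above
time1 = np.array([5, 6, 5, 4, 2, 1, 3, 1, 7, 2])
time2 = np.array([4, 6, 4, 4, 1, 1, 2, 1, 6, 1])

r, p = pearsonr(time1, time2)
print(f"Test-retest reliability: r = {r:.3f}, p = {p:.3f}, N = {len(time1)}")
# Expected: r ≈ .970, matching the SPSS Correlations output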
Part II: Estimating reliability
• Estimating reliability: Test-Retest, Parallel/Alternate, Split Half, Coefficient Alpha

Part II: Estimating reliability
• Parallel/alternate forms reliability
  – Different items used to assess the same attribute
  – Two sets of tests are developed

               S001  S002  S003  S004  S005  S006  S007  S008  S009  S010
      Form 1     5     6     5     4     2     1     3     1     7     2
      Form 2     4     6     4     4     1     1     2     1     6     1

  – How to assess?
    • The correlation between equivalent forms of the test that have different items

Part II: Estimating reliability
• Use of parallel/alternate forms
  – The two forms are constructed with an effort to make them parallel
    • Equal (or very similar) observed means, SDs, and correlations with other measures
  – The correlation between the two forms indicates
    • How reliable they are
    • How parallel they are
  – The two forms may be administered in the same or in different periods

Part II: Estimating reliability
• Practical constraints
  – Developing two forms is time- and resource-consuming
  – It may be difficult to retest the same group of individuals
  – Test developers usually prefer to base their estimate of reliability on a single form of the test

Part II: Estimating reliability
• Estimating reliability: Test-Retest, Parallel/Alternate, Split Half, Coefficient Alpha

Part II: Estimating reliability
• Split-half reliability
  – Dividing a test into two parts
    • First half and second half – subject to carryover effects
    • Odd-even system
    • Random split
  – How to assess?
    • The correlation between the two halves
    • Other things being equal, a test with more items is more reliable than one with fewer items
    • Splitting a test in half therefore lowers/underestimates the reliability of the full test
  – Spearman-Brown method of split-half reliability
    • Adjusts for this underestimation

Part II: Estimating reliability

                                                  S001  S002  S003  S004  S005  S006  S007  S008  S009  S010
      Mean score of all odd items (1,3,5,7,9)       5     6     5     4     2     1     3     1     7     2
      Mean score of all even items (2,4,6,8,10)     4     6     4     4     1     1     2     1     6     1

Part II: Estimating reliability
• Split-half reliability
  – Spearman-Brown method of split-half reliability
    • Adjusts for the underestimation
  – Example
    • The correlation between the scores of the two halves is 0.5; what is the split-half reliability?
    • (2 x 0.5) / (1 + 0.5) = 1 / 1.5 = 0.67

Part II: Estimating reliability
• Split-half reliability
  – Other applications of the Spearman-Brown formula (see the sketch below)
  – Estimate the reliability of a lengthened test
    • A test with 20 items has a reliability coefficient of 0.4. If we increase the length of the test to 40 items, what is the reliability of the lengthened test?
  – Estimate the reliability of a shortened test
    • A test with 10 items has a reliability coefficient of 0.8. If we decrease the length of the test to 2 items, what is the reliability of the shortened test?
  – Estimate how many items are needed to increase the reliability to a specific value
    • A test with 20 items has a reliability coefficient of 0.6. If we want to increase the reliability to 0.9, how many items do we need?

Part II: Estimating reliability
• Split-half reliability
  – Advantages
    • One test administration is enough
    • No carryover or practice effects (for the random and odd-even splitting methods)
  – Disadvantages
    • Different splitting methods yield different estimates
  – Solution
    • Coefficient alpha (Cronbach's alpha)
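The Spearman-Brown calculations above generalize to any change in test length. Below is a minimal Python sketch of the standard Spearman-Brown prophecy formula, r_new = (n · r_old) / (1 + (n − 1) · r_old), where n is the factor by which the test is lengthened; the function names are mine, and the printed values work through the split-half example and the three exercise questions from the slides, assuming that standard formula.

def spearman_brown(r_old: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened (or shortened) by length_factor."""
    return (length_factor * r_old) / (1 + (length_factor - 1) * r_old)

def length_factor_needed(r_old: float, r_target: float) -> float:
    """How many times longer the test must become to reach r_target."""
    return (r_target * (1 - r_old)) / (r_old * (1 - r_target))

# Split-half example from the slides: the two halves correlate at 0.5
print(round(spearman_brown(0.5, 2), 2))             # 0.67

# Exercise questions from the slides
print(round(spearman_brown(0.4, 2), 2))             # 20 -> 40 items: about 0.57
print(round(spearman_brown(0.8, 2 / 10), 2))        # 10 -> 2 items: about 0.44
print(round(length_factor_needed(0.6, 0.9) * 20))   # items needed to reach 0.9: 120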
Part II: Estimating reliability
• Estimating reliability: Test-Retest, Parallel/Alternate, Split Half, Coefficient Alpha

Part II: Estimating reliability
• Cronbach's alpha
  – Can be viewed as the average of all split-half reliability coefficients resulting from the different possible splits of a test
  – Used when test items are scored on a continuous scale

Part II: Estimating reliability
• Cronbach's alpha
  – You are not required to memorize the formula
  – Alpha ranges from 0 to 1
  – How reliable is reliable?
    • For basic research
      – 0.7 to 0.8 is usually acceptable
      – Refining tests up to 0.9 or above may waste resources
      – High reliability might be expected for very focused tests
    • For clinical settings
      – High reliability is extremely important
      – Decisions might affect one's future
      – 0.9 or even 0.95
    • For selection purposes
      – 0.7 to 0.8 is reasonable
      – Selection is not determined by the test score alone

Part II: Estimating reliability
• Cronbach's alpha
  – The most popular method of assessing reliability with continuous variables (see the sketch below)
  – I will show you how to assess the different types of reliability using SPSS in the next workshop
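Ahead of the SPSS workshop, here is a minimal Python sketch of how Cronbach's alpha is computed from an item-score matrix, using the standard formula α = k/(k−1) · (1 − Σσ²_item / σ²_total). The six-respondent, five-item data set is purely illustrative and is not from the course materials.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-item test answered by 6 respondents (illustrative data only)
scores = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 5, 5, 4, 5],
    [3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2],
    [4, 4, 5, 4, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")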