ITEM ANALYSIS
Prof. Osama Kamhawy

Lecture Outline
Where does item analysis fit in the education process?
• Simple summary statistics
• Facility / test difficulty
• Discrimination
  – Point-biserial correlation
  – Discrimination index
• Looking at options analysis
• Tests of internal consistency
  – Split halves
  – KR-20 index
  – Cronbach's alpha

Item Analysis
For any well-written item:
• a greater proportion of students in the upper group should have selected the correct answer.
• a greater proportion of students in the lower group should have selected each of the distractor (incorrect) answers.

Using item analysis - for staff
• What have students learned
• Gaps in learning – vital gaps
• Easy / difficult / confusing questions
• How did the new questions 'do'
• Did all distractors do their work
• Which items are confusing or have a disputed answer
• Did this year's group learn the same as last year's

Using item analysis - for staff
• Student responses to individual questions
• Quality of the items and of the test as a whole
• Improves items for later use
• Increases skills in test construction
• Identifies areas of the course that need clarity
• Identifies weak questions to be managed

For students

Kinds of Item Analysis
• Item difficulty or facility
• Item discrimination
• Internal consistency of the exam

Facility / Item Difficulty
Facility = % of students getting the correct answer (if only one correct option)
Facility = (mean score for the item) / (total score possible for the item)

Easy & Hard Exams
[Figure: number of students achieving each score for a hard, normal, and easy exam]

Item Difficulty (P)
The percentage of students who answered the item correctly.

Difficulty level     P (% correct)
High (difficult)     <= 30%
Medium (moderate)    > 30% and < 80%
Low (easy)           >= 80%

Item Difficulty Level: Examples
Number of students who answered each item = 50

Item No.   No. of correct answers   % correct   Difficulty level
1          15                       30          High
2          25                       50          Medium
3          35                       70          Medium
4          45                       90          Low

Examples of Facility Results
[Figures: examples of facility results]

Facility / Item Difficulty
For dichotomous items (e.g. right/wrong), facility plays a role in the item's ability to discriminate:
• if no one or everyone gets it right, it provides no information
• too easy / too difficult = low discrimination
• items with P = 45-55% are more discriminating

Item Discrimination
Tests of item discrimination:
❑ Point-biserial coefficients
❑ Item discrimination index (using top/bottom 33%, 27%, quartiles or quintiles)
❑ Cronbach's alpha (total scores with/without the item)

Item Discrimination - Point-Biserial Coefficient
[Diagram: the item response (correct/incorrect) is correlated with the total exam score minus the item score]

Point-biserial coefficient
Correlation between the item (correct or incorrect) and the total exam score minus the item score.
For each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer.

Item Discrimination - Point-Biserial Coefficient
rpbi = ((Mp − Mq) / St) × √(p × q)
rpbi = point-biserial correlation coefficient
Mp = 85 (mean score of students who answered the item correctly)
Mq = 60 (mean score of students who answered the item incorrectly)
St = standard deviation for the whole test
p = 0.75 (proportion answering correctly; 300 students)
q = 0.25 (proportion answering incorrectly; 100 students)
rpbi = ((85 − 60) / St) × √(0.75 × 0.25)
(see the short computational sketch below)

Ideal Discrimination
➢ As a general rule, PBCC ≥ +0.20
➢ The higher, the better.
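A minimal computational sketch of the point-biserial coefficient as defined on the slide above (plain Python; the function name is mine, and since the slide leaves St symbolic, the standard deviation of 25 below is an assumed value purely for illustration):

# Point-biserial coefficient as defined on the slide:
#   r_pbi = ((Mp - Mq) / St) * sqrt(p * q)
# Mp / Mq: mean total score of students who answered the item correctly / incorrectly
# St: standard deviation of total scores for the whole test
# p / q: proportion of students answering correctly / incorrectly
from math import sqrt

def point_biserial(m_p: float, m_q: float, s_t: float, p: float) -> float:
    q = 1.0 - p
    return (m_p - m_q) / s_t * sqrt(p * q)

# Slide's worked example: Mp = 85, Mq = 60, p = 0.75 (300 of 400 students correct).
# St = 25 is an assumed value for illustration only.
print(round(point_biserial(85, 60, 25, 0.75), 2))   # -> 0.43, above the 0.20 rule of thumb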
Item Discrimination "d"
Check the effectiveness of test items:
▪ Score the exam and sort the results by score.
▪ Select an equal number of students from each end, e.g. the top 25% (upper 1/4) and the bottom 25% (lower 1/4).
▪ Compare the performance of these two groups on each of the test items.

Discrimination Index
Item Discrimination: Examples
Number of students per group = 100

Item No.   Correct answers, upper 1/4   Correct answers, lower 1/4   Discrimination index
1          90                           20                           0.7
2          80                           70                           0.1
3          100                          0                            1.0
4          100                          100                          0
5          50                           50                           0
6          20                           60                           -0.4

Discrimination Index
For dichotomous items (right/wrong):
• Those with ability should be more likely to get each item correct than those with little ability.
• Compare the proportion of the upper group for the TEST who answer the item correctly with the proportion of the lower group for the TEST who answer the item correctly:
  di = (Ui − Li) / ni
  (Ui / Li = number in the upper / lower group answering item i correctly; ni = number of students per group)
• di should not fall below 0.3

Ideal Discrimination
– The higher, the better.
– As a general rule: PBCC ≥ 0.2, di ≥ 0.3
1. Very easy or very difficult test items have little discrimination.
2. Items of moderate difficulty (45% to 55%) are generally more discriminating.

PBCC vs DI
• DI is based on fixed upper and lower groups.
• PBCC is not based on fixed upper and lower groups; for each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer.

Reasons for Low Item Discrimination
• Questions too easy / too difficult
• Wrong answer keyed
• Ambiguous question
• Potentially correct alternatives
• Information not learned well, e.g. learning outcome not clear or inadequate emphasis

Reasons for Low Item Discrimination
• Unhelpful choice of question format
• Biased questions, e.g. UTI or oral contraceptives
• Small sample size
• Undefined reasons

Options Analysis Exercises

Distractor Analysis
• Distractors based on a student misconception should attract more students from the lower group.
• Functioning vs non-functioning distractors (a distractor is functioning when it attracts >5% of students).
• Functioning distractors are important for the reliability of SBA (single best answer) questions.

Distractor Analysis
Compare the performance of the highest- and lowest-scoring 25% of the students on the distractor options.
Fewer of the top performers should choose each of the distractors compared with the bottom performers.

Options Analysis Exercise
(Hi / Lo = highest- / lowest-scoring 25% of students)

Q1 (Facility 0.34, DI 0.3)
Option   A    B    C    D    E    F
Hi       44   1    50   2    1    2
Lo       20   15   21   22   20   2
Total    32   7    34   14   11   2

Q2 (Facility 0.34, DI 0.3)
Option   A    B    C    D    E    F
Hi       18   10   51   17   2    2
Lo       24   24   21   25   4    2
Total    32   17   34   22   3    2

Distractor Analysis
[Figure: DI for the correct response vs each distractor]

Reliability in MCQs - Internal Consistency
• Internal consistency describes the extent to which all the items in a test measure the same concept or construct; it is therefore connected to the inter-relatedness of the items within the test.
• It assumes that the MCQ exam measures the same thing across items, and that students' performance across the whole test will therefore be reflected in each item.

Reliability in MCQs - Internal Consistency
Tests of internal consistency:
• Split halves - correlation between randomly divided halves
• Kuder-Richardson 20 (KR-20) - useful if items are dichotomous
• Cronbach's alpha - a more generalised form of KR-20, used when the scale is not dichotomous
Cronbach's alpha or KR-20 should be ≥ 0.7.

Summary
• Item analysis is useful to staff, students, the BoE and the programme as a whole.
• Facility = % of students getting the correct answer; acceptable range 0.3-0.8, but 0.45-0.55 gives high discrimination.
• Item discrimination: PBCC ≥ +0.20, di ≥ 0.3.

Summary (cont.)
Tests of internal consistency:
• Split halves - correlation between randomly divided halves
• Kuder-Richardson 20 (KR-20) - useful if items are dichotomous
• Cronbach's alpha - a more generalised form of KR-20, used when the scale is not dichotomous
Cronbach's alpha or KR-20 should be ≥ 0.7.
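To make the internal-consistency thresholds above concrete, here is a minimal sketch of KR-20 for dichotomously scored items (plain Python; the function name and the 6-student by 4-item response matrix are invented purely for illustration). For 0/1 items KR-20 coincides with Cronbach's alpha; for non-dichotomous items, alpha replaces p*q with each item's score variance.

def kr20(matrix):
    # matrix: rows = students, columns = items, entries 1 (correct) / 0 (incorrect)
    k = len(matrix[0])                                            # number of items
    n = len(matrix)                                               # number of students
    totals = [sum(row) for row in matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n    # population variance of totals
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n                     # item facility
        sum_pq += p * (1 - p)                                     # p*q = variance of a 0/1 item
    return k / (k - 1) * (1 - sum_pq / var_total)

# Invented toy data: 6 students x 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))   # -> 0.67, just below the 0.7 rule of thumb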