IN THE NAME OF GOD

TEST CONSTRUCTION WORKSHOP
J. KOOHPAYEHZADEH, M.D., MPH
Education Development Center, Iran University of Medical Sciences
4/13/2015

“Tell me, I forget. Ask me, I remember. Involve me, I understand.”

Why Test?
Testing is 50% of teaching.

Well-defined educational objectives: a prerequisite for assessment
Example for this session. At the end of this session participants will be able:
- To name at least three differences between summative and formative assessment
- To list at least three written AM
- To name the most effective AM to assess clinical skills
- To describe the most effective AM to assess attitudes

Evaluating Students: Tests Are Not the Only Way!
- Tests
- Projects
- Performance
- Participation

Assessment
- How?
- Why?
- When?
- What?

When do we assess?
- Summative: at the end of instruction
- Formative: during instruction
- Pre-test: before instruction

Why do we assess?
1. To encourage learning
2. To inform the student
3. To inform the teacher
4. To correct learning activities
5. To select students
6. To certify
7. To gain readiness for promotion

Why Evaluate Students?
- To help students improve
- To assess student learning
- To determine if the teacher is teaching
- Motivation tool
- To communicate with others, such as parents

What?
- Knowledge
- Skills
- Attitude

Who Should Assess? (360°)
- Faculty
- Self
- Peers
- Tutors
- Other team members
- Standardized patients, patients
- External and internal examiners
- Public, society, …

Where?
- Does: workplace assessment
- Shows how: test center / skill lab
- Knows how: examination hall
- Knows: examination hall

How to use assessment?
Summative: usually undertaken at the end of a training programme; it determines whether the educational objectives have been successfully achieved. With summative assessment the student usually receives a grade or a mark. Example: an exam.
Formative: testing that is part of the developmental or ongoing teaching/learning process. It should include delivery of feedback to the student.

Formative assessment
Feedback

THANK YOU
ANY QUESTIONS?

Stages of test development
- Conceptualization
- Construction
- Tryout
- Item analysis
- Revision

Conceptualization
An idea…
- What will it measure?
- What is the objective?
- Is there a need?
- Who will use it?
- Etc.

Test Construction Principles
- Adequate provision should be made for evaluating all the teacher's objectives for the instruction.
- The test should reflect the approximate proportion of emphasis in the course.

Preparing the test
- The preliminary draft of the test should be prepared as early as possible.
- As a rule, the test should include more than one type of item.
- The content of the test should range from very easy to very difficult for the group being measured.
- The items in the test should be arranged in order of difficulty.
- The items should be phrased so that the content, rather than the form of the statement, determines the answer.
- A regular sequence in the pattern of responses should be avoided.
- The directions to the pupils should be as clear, complete and concise as possible.
- One question should not provide the answer to another question.

Item Analysis
The process of determining which items are “good”.
Tools in item analysis:
- Item difficulty index
- Item reliability index
- Item validity index
- Item discrimination index

Characteristics of assessment tools

Reliability
If an assessment is repeated with the same trainees, they should get the same results.

Validity
What is it? The degree to which a measurement instrument truly measures what it is intended to measure.
Importance: if the test does not test what it is meant to test, the test is useless.
Reliability is a prerequisite for validity but is not sufficient by itself.

Standardization
What is it? All students are tested on the same test items, patients and tasks, and according to the same criteria.
Importance: no student gets easier or more difficult questions than another (fairness).

Feasibility
What is it?
Importance

Objectivity
What is it?
The level of agreement among independent assessors (experts) about the right answer to a given question.
Importance: decreases intra-rater and inter-rater bias.

Characteristics of a test
- Validity: how accurately a measuring instrument measures the subject it is intended to measure.
- Reliability: how stable a measuring instrument is when measuring a variable.
- Objectivity: the degree of agreement among the independent judgments of a number of expert examiners about the good answers to each component of the measuring instrument.
- Practicability: the overall ease of using a test, for both the test constructor and the students.

Relationship between validity and reliability
[Figure: dot diagrams showing combinations of validity (+/-) and reliability (+/-)]

Table of specifications
A two-dimensional table:
1. Horizontal dimension: the instructional content to be covered
2. Vertical dimension: levels of the cognitive domain (knowledge, comprehension, application, analysis, ...)

Example:

Instructional content    Knowledge     Comprehension   Application   Analysis
Heart failure            2 questions   1 question      0 questions   0 questions
Shock                    2 questions   1 question      1 question    1 question
Digoxin poisoning        1 question    1 question      1 question    0 questions

Table of specifications (blank template)
- Rows (content dimension): topics 1, 2, 3; total number of questions
- Columns (objective dimension): knowledge, comprehension, analysis, synthesis, evaluation; total number of questions; percentage of questions

Allocating questions by teaching time

Proportion of teaching hours for a topic (section) = hours spent teaching the topic / total teaching hours of the course (credit unit)
Percentage of questions for a section = 100 × proportion of teaching hours for the topic

Topics of a course (2 credit units, 36 hours)   Teaching hours   Percentage of questions   Number of questions
1.                                              4                11%                       6
2.                                              2
3.                                              8
Total                                           36               100%                      50

Proportion of teaching hours for section one = 4/36 = 0.11, so the percentage of questions for section one = 0.11 × 100 = 11%.
If a 50-question test is to be prepared for this course, the number of questions for section one is $50 \times \frac{11}{100} = 5.5 \approx 6$.

Thank you for your time
Any questions or comments?
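
The allocation rule above (questions in proportion to teaching hours) can be sketched in Python. This is only an illustration, not part of the original slides: the names `round_half_up` and `allocate_questions` are hypothetical, and rounding halves upward is an assumption chosen to match the slide's rounding of 5.5 to 6.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 rounded up (assumed rule)."""
    return math.floor(x + 0.5)


def allocate_questions(topic_hours: float, total_hours: float, total_items: int):
    """Return (percent_of_items, number_of_items) for one topic.

    Mirrors the two steps on the slide: the hours ratio becomes a
    whole-number percentage, which is then applied to the test length.
    """
    if total_hours <= 0 or topic_hours < 0:
        raise ValueError("total_hours must be positive and topic_hours non-negative")
    percent = round_half_up(100 * topic_hours / total_hours)
    items = round_half_up(total_items * percent / 100)
    return percent, items


# Worked example from the slide: 4 of 36 teaching hours, 50-item test.
percent, items = allocate_questions(4, 36, 50)
print(percent, items)
```

Because each topic is rounded separately, the per-topic counts may not add up exactly to the planned test length, so a final manual adjustment may be needed.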

Types of tests
1. Written
   - Objective: MCQ
   - Non-objective: essay
2. Oral
3. Practical
   - Log book
   - Portfolio
   - Mini-CEX
   - MSF
   - OSCE
   - DOPS

What are assessment tools?
- Written
  - Open-ended
    - Essay: restricted response, extended response
    - Short answer
  - Closed-ended
    - True-false
    - Matching
    - Multiple-choice
- Assignments

Types of essay tests
- Extended response: synthesis and evaluation levels
- Restricted response: comprehension, application and analysis levels

Types of short-answer tests
For the lower levels of the cognitive domain (up to the application level at most):
- Question
- Completion
- Identification (association)

Types of objective tests
- True-false
- Matching
- Multiple-choice

Assessment tools and Miller's pyramid (Miller 1990)
- Does (Action): 1. Professionalism evaluation form 2. End-of-rotation evaluation 3. 360° evaluations 4. Mini-CEX 5. Critical incident reports 6. Record reviews
- Shows how (Decision making): 1. OSCE 2. SP exam 3. Computer-simulated patient
- Knows how (Reasoning): 1. Oral exam 2. Essay 3. MCQ
- Knows (Awareness): 1. Oral exam 2. Essay 3. MCQ

How to assess knowledge, skills and attitudes

                      Written exams   Clinical exams   Viva
Knowledge             ++++            +                ++
Psychomotor skills    -               ++++             -
Attitude              -               +                +

Notes on constructing written tests
- Place the questions in the following order: 1. true-false 2. matching 3. multiple-choice 4. short answer 5. essay.
- Order the questions from easy to difficult.
- Arrange the questions to follow the original organization of the material.

MCQ
- Stem: the main body of the question
- Option: a choice or answer
- Distractor: an incorrect answer
- Key: the correct answer

Types of multiple-choice items
- Single correct answer
- Best answer
- Negative

Millman's 21 rules for MCQs
1. The stem should contain the main problem and the quantities.
2. Each item should be as short as possible (while keeping the sentences clear).
3. Avoid negative questions in the stem as far as possible. If one is used, underline the negative phrase or write it in bold.
4. The stem should be written so that it states the main problem without help from the options. The options should also be as independent of one another as possible.
5. Ask for the best answer, or use wording such as "most" or "first" (when more than one answer is partly correct).
6. In stems that contain a blank, the omitted part to be filled in should, where possible, not be placed at the beginning of the sentence.
7. The linguistic difficulty of the options should be low.
8. Each option should test a single point.
9. Avoid repeating words in the options where possible, unless there is a logical sequence.
10. Distractors should be plausible and attractive (provided the stem measures real understanding).
11. All options should agree grammatically with the stem; for example, if the stem is plural, all the options should be plural.
12. Options should be similar in sentence length, technical difficulty and applicability.
13. The stem and options should be uniform and homogeneous in grammar, subject content and form.
14. Avoid a regular sequence of correct answers across the exam questions.
(The correct answers should not run in the order a, b, c, d, and most of them should not be c.)
15. Provide at least 4 options for each item.
16. Avoid phrases that create a resemblance between the stem and the options.
17. Avoid using the exact wording of the textbook.
18. Avoid stems that give away the answer to a later question.
19. Options should not include one another or effectively mean the same thing.
20. Avoid specific determiners such as "always" and "never".
21. When asking about the understanding of a term or concept, present the term first and then offer the options as a set of characteristics and definitions.

Calculating the Minimum Pass Level (M.P.L.)
- MPL for each item = value assigned to the correct option / total points assigned to all options
- MPL for the exam = sum of the MPLs of the exam items / number of items

Item Analysis
The main purpose of item analysis is to improve the test.
Analyze items to identify:
- Potential mistakes in scoring
- Ambiguous or tricky items
- Alternatives that do not work well
- Problems with time limits

Types of tests: criterion-referenced and norm-referenced
- Criterion-referenced tests
- Norm-referenced (competitive) tests

TYPES OF TESTS BY PURPOSE
1. Norm-referenced tests
   a. Discrimination is the most important aspect
   b. Easy items are eliminated
2. Criterion-referenced tests
   a. Discrimination is not of critical importance
   b. Items are not altered or eliminated because of difficulty

Criterion-referenced
Before the test is administered, specific criteria are set to ensure that a minimum of particular knowledge and abilities has been acquired. Success or failure on the test is judged by comparing the student's performance with these criteria.
This method is used mostly for final examinations and for awarding certificates.
Examples: the entrance exam of a flight school; the specialty board exam.

Norm-referenced
The results of all students are compared with one another. The pass mark is set by convention or according to the scores the students obtained.
This method is used mostly for entrance and diagnostic examinations.
Example: the university entrance exam.

Item analysis in norm-referenced tests

ITEM ANALYSIS
Item analysis of an assessment tool has 3 parts:
1. Item difficulty
2. Item discrimination
3. Distractor analysis

Steps in item analysis
1. Determine the score of each student.
2. Rank the students by merit.
3. Determine the upper and lower groups.
4. Calculate the difficulty index for each item.
5. Calculate the discrimination index for each item.
6. Critically evaluate the items.

Item analysis card
Test date: 2/11/73
Test title: Inferential statistics
Item topic: Correlation coefficient
Which of the following figures represents the greater correlation coefficient?
a) 0.55
*b) 0.61
c) 0.49
d) 0.23

Group        a    b*   c    d    No answer   Total
Upper 25%    0    5    3    0    2           10
Lower 25%    5    2    3    0    0           10

Difficulty index = 35%
Discrimination index = 0.3

Tests of individual differences
Two groups of individuals:
- U: upper group, the 27% of highest scorers
- L: lower group, the 27% of lowest scorers
- U = L
- $U_p$: upper-group individuals who got the item right
- $L_p$: lower-group individuals who got the item right

Item difficulty index: $p = \frac{U_p + L_p}{U + L}$
Item discrimination index: $D = \frac{U_p - L_p}{U}$

Example (cont.)
60 students took the test. Item 14: among the 16 upper scorers, 5 have the item right. Among the 16 lower scorers, only 1 has the item right.
$p = \frac{5 + 1}{32} = .19$
$D = \frac{5 - 1}{16} = .25$

ITEM ANALYSIS: difficulty index
Range 0 to 1: 0 = hard, 0.5 = moderate, 1.0 = easy.

Example: 30 students in class
- 5 of the top 10 scorers got the item correct
- 3 of the bottom 10 scorers got the item correct
Difficulty = $\frac{5 + 3}{10 + 10} = \frac{8}{20} = .4$ (moderate difficulty)

ITEM ANALYSIS: discrimination index
Range 0 to 1: 0 = no discrimination, 0.5 = moderate, 1.0 = excellent.
A negative value means something is wrong.

Example: 30 students in class
- 10 of the top 10 scorers got the item correct
- 2 of the bottom 10 scorers got the item correct
Discrimination = $\frac{10 - 2}{(10 + 10)/2} = \frac{8}{10} = .8$ (good discrimination)

Interpreting the item discrimination index
The larger the discrimination index, the greater the item's power to discriminate; the smaller the index, the weaker that power.
The good items of a test are therefore those with moderate difficulty and a high discrimination index.
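
The upper/lower-group calculations in the examples above can be reproduced with a short Python sketch. It assumes equal-sized upper and lower groups (U = L), as the slides do; the function names are hypothetical and chosen only for illustration.

```python
def item_difficulty(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Difficulty p = (U_p + L_p) / (U + L), with U = L = group_size."""
    _check(upper_correct, lower_correct, group_size)
    return (upper_correct + lower_correct) / (2 * group_size)


def item_discrimination(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Discrimination D = (U_p - L_p) / U, with U = L = group_size."""
    _check(upper_correct, lower_correct, group_size)
    return (upper_correct - lower_correct) / group_size


def _check(upper_correct: int, lower_correct: int, group_size: int) -> None:
    """Reject counts that cannot occur in a group of the given size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for n in (upper_correct, lower_correct):
        if not 0 <= n <= group_size:
            raise ValueError("correct counts must lie between 0 and group_size")


# Item 14 from the slides: 16 per group, 5 upper and 1 lower correct.
print(item_difficulty(5, 1, 16), item_discrimination(5, 1, 16))
```

A negative return value from `item_discrimination` corresponds to the "something is wrong" case on the discrimination scale: more low scorers than high scorers answered the item correctly.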

D Index: Rule of Thumb for Classroom Tests

D index          Interpretation
40% and above    excellent discrimination
25% to 39%       acceptable discrimination
below 25%        poor discrimination

Summary of Standards of Acceptance
- Item difficulty (P): 30% to 90%
- Item discrimination (by D): 25% and above

Difficulty Index
- Below 0.3: too difficult
- 0.3 to 0.7: acceptable
- 0.5 to 0.6: recommended
- Above 0.7: too easy

Format                                           Ideal difficulty
Five-response multiple-choice                    70
Four-response multiple-choice                    74
Three-response multiple-choice                   77
True-false (two-response multiple-choice)        85

Discrimination Index
- Below 0.15: throw out
- 0.15 to 0.25: check
- 0.25 to 0.35: good
- Above 0.35: excellent

Be aware
- Very easy or very difficult test items have little discrimination.
- Items of moderate difficulty (60% to 80% answering correctly) are generally more discriminating.

Point-biserial correlation
- Used to correlate a dichotomous variable with a continuous variable.
- In testing, used to correlate a person's performance on an item (correct, incorrect) with their total test score.
- Used as an index of item discrimination.
- The point-biserial ranges from -1.00 to +1.00.
- The higher, the better.
- As a general rule, a value above +0.20 is desirable.

Point-biserial formula
$r_{pb} = \frac{\bar{X}_{correct} - \bar{X}_{incorrect}}{SD_{test}} \sqrt{IF \,(1 - IF)}$
- $\bar{X}_{correct}$: mean on the test for people who got the item correct
- $\bar{X}_{incorrect}$: mean on the test for people who got the item incorrect
- $SD_{test}$: standard deviation for the test
- $IF$: IF for the item (the proportion answering it correctly)

Item analysis in criterion-referenced tests

Criterion-referenced tests
Two groups of individuals:
- U: upper group (above criterion)
- L: lower group
- $U_p$: upper-group individuals who got the item right
- $L_p$: lower-group individuals who got the item right

Item difficulty index: $p = \frac{U_p + L_p}{U + L}$
Item discrimination index: $D = \frac{U_p}{U} - \frac{L_p}{L}$

Example
A test of mastery of Istanbul geography. The outcome is that 60 individuals are "masters" and 20 failed the test. Item 3: 45 "masters" and 10 who failed got the item right. What are the item difficulty and item discrimination indices?
$p = \frac{45 + 10}{60 + 20} = .69$
$D = \frac{45}{60} - \frac{10}{20} = .75 - .50 = .25$

Item analysis in criterion-referenced tests
- Purpose: to determine how far individuals have attained the intended knowledge after completing the course.
- Depending on the educational objective, an item may be difficult or easy.
- The difficulty index has a different value in this kind of exam: very easy or very difficult items do not necessarily need to be changed or removed (provided they have sufficient validity).
- To examine the items in these tests, a pretest and a post-test are used and their results are compared.
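
The Istanbul geography example above can be checked with a small Python sketch. Unlike the norm-referenced case, the two groups may differ in size, so each group's proportion correct is computed separately. The function name and argument names are hypothetical.

```python
def criterion_item_indices(upper_correct: int, upper_total: int,
                           lower_correct: int, lower_total: int):
    """Return (p, D) for a criterion-referenced item.

    p = (U_p + L_p) / (U + L)
    D = U_p / U - L_p / L
    """
    if upper_total <= 0 or lower_total <= 0:
        raise ValueError("both groups must contain at least one person")
    p = (upper_correct + lower_correct) / (upper_total + lower_total)
    d = upper_correct / upper_total - lower_correct / lower_total
    return p, d


# Item 3: 45 of 60 masters and 10 of 20 non-masters answered correctly.
p, d = criterion_item_indices(45, 60, 10, 20)
print(round(p, 2), round(d, 2))
```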

Pretest and post-test results by item
(B = pretest, A = post-test; + = correct, - = incorrect)

            Item 1    Item 2    Item 3    Item 4    Item 5
Name        B    A    B    A    B    A    B    A    B    A
H.D.        -    +    +    +    -    -    +    -    -    +
S.N.        -    +    +    +    -    -    +    -    +    +
Kh.P.       -    +    +    +    -    -    +    -    -    +
Sh.F.       -    +    +    +    -    -    +    -    -    +
D.H.        -    +    +    +    -    -    -    -    +    +
F.P.        -    +    +    +    -    -    +    -    -    -

$S = \frac{R_a - R_b}{T}$
- $S$ = sensitivity to instructional effects
- $R_a$ = number of people who answered the item correctly after instruction
- $R_b$ = number of people who answered the item correctly before instruction
- $T$ = number of people who answered the item both before and after instruction

For the best item in a criterion-referenced test, S equals one.
Items with an S of zero or less cannot measure the effect of instruction.

Analysis of essay and performance tests
- Difficulty index = mean score on the item / possible range of item scores = $\frac{4.2}{6 - 1}$
- Discrimination index = difference between the mean item scores of the upper and lower groups / possible range of item scores = $\frac{5.3 - 2.8}{6 - 1}$

Distractor analysis
- Each distractor should attract at least one person from the weak group.
- A distractor should attract weak examinees more than strong ones.

Two issues in using instruments...
1. Validity: the degree to which the instrument measures what it purports to measure
2. Reliability: the degree to which the instrument consistently measures what it purports to measure

Types of reliability...
1. Stability
2. Equivalence
3. Internal consistency

1. Stability ("test-retest"): the degree to which two scores on the same instrument are consistent over time
2. Equivalence ("equivalent forms"): the degree to which identical instruments (except for the actual items included) yield identical scores
3. Internal consistency ("split-half" reliability with the Spearman-Brown correction formula, Kuder-Richardson and Cronbach's alpha reliabilities, scorer/rater reliability): the degree to which one instrument yields consistent results

RELIABILITY
- Test-retest (coefficient of stability)
- Parallel form (coefficient of equivalence)
- Internal consistency

INTERNAL CONSISTENCY
- Split-half method
- Spearman-Brown prophecy formula
- Kuder-Richardson method
- Coefficient alpha

KR20
$KR_{20} = \frac{K}{K - 1} \times \frac{S_x^2 - \sum pq}{S_x^2}$
- $K$ = number of trials or items
- $S_x^2$ = variance of scores
- $p$ = proportion answering the item right
- $q$ = proportion answering the item wrong
- $\sum pq$ = sum of the pq products for all K items

KR20 Example
If Mean = 2.45 and SD = 1.2, what is KR20?

Item    p      q      pq
1       .50    .50    .25
2       .25    .75    .1875
3       .80    .20    .16
4       .90    .10    .09

$\sum pq = 0.6875$
$KR_{20} = \frac{4}{3} \times \frac{1.44 - 0.6875}{1.44} = .70$

KR21
If all test items are assumed to be equally difficult, KR20 can be simplified to KR21:
$KR_{21} = \frac{K S^2 - \bar{X}(K - \bar{X})}{(K - 1) S^2}$
- $K$ = number of trials or items
- $S^2$ = variance of the test
- $\bar{X}$ = mean of the test

RELIABILITY OF CRITERION-REFERENCED TESTS
Lindman and Merenda

Rule of Thumb for Acceptable Reliability Coefficients for Classroom Tests

Reliability coefficient    Interpretation
.70 or higher              acceptable reliability

Characteristics of the assessment method
Types of validity:
- Face
- Content: 1. item validity 2. sampling validity (determined by expert judgment; blueprinting)
- Predictive
- Concurrent
- Construct

Types of validity...
1. Content validity
2. Criterion-related validity
3. Construct validity

1. Content validity: the degree to which an instrument measures an intended content area

3. Construct validity: a series of studies validates that the instrument really measures what it purports to measure

Forms of content validity…
- Sampling validity: does the instrument reflect the total content area?
- Item validity: are the items included on the instrument relevant to the measurement of the intended content area?

2. Criterion-related validity: an individual takes two forms of an instrument, which are then correlated to discriminate between individuals who possess a certain characteristic and those who do not

Forms of criterion-related validity…
- Concurrent validity: the degree to which scores on one test correlate with scores on another test when both tests are administered in the same time frame
- Predictive validity: the degree to which a test can predict how well an individual will do in a future situation

Types of Validity
1. Content validity
   - Face validity
   - Sampling validity (content validity)
2. Empirical validity
   - Concurrent validity
   - Predictive validity
3. Construct validity

Item discrimination
- How well does the item separate those who know the material from those who do not?
- In LXR, it is measured by the point-biserial (rpb) correlation, which ranges from -1 to 1.
- rpb is the correlation between item performance and exam performance.
- A positive rpb means that those scoring higher on the exam were more likely to answer the item correctly (better discrimination).
- A negative rpb means that high scorers on the exam answered the item wrong more frequently than low scorers (poor discrimination).
- A desirable rpb correlation is +0.20 or higher.
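
A minimal Python sketch of the point-biserial calculation follows. It assumes the common form r_pb = (mean total score of those correct - mean total score of those incorrect) / SD × sqrt(p(1 - p)), with the population standard deviation of the total scores and p the proportion answering the item correctly. The function name is hypothetical, and how LXR itself computes rpb is not stated in the slides.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Correlate a 0/1 item result with the total test score.

    item_correct: sequence of 1 (correct) / 0 (incorrect), one per examinee
    total_scores: sequence of total test scores, in the same order
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("both sequences must have one entry per examinee")
    right = [t for c, t in zip(item_correct, total_scores) if c]
    wrong = [t for c, t in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("rpb is undefined when everyone is right or everyone is wrong")
    sd = pstdev(total_scores)  # population SD of the total scores
    if sd == 0:
        raise ValueError("rpb is undefined when all total scores are equal")
    p = len(right) / len(total_scores)  # proportion answering correctly
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))


# Hypothetical data: the two highest scorers got the item right.
print(point_biserial([1, 1, 0, 0], [4, 3, 2, 1]))
```

With this form, the result equals the ordinary Pearson correlation between the 0/1 item column and the total score, so values of +0.20 or higher can be read against the guideline above.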

Evaluation of Distractors
- Distractors are designed to fool those who do not know the material.
- Those who do not know the answer guess among the choices.
- Distractors should be equally popular (number expected per distractor = number who answered the item wrong / number of distractors).
- Distractors ideally have a low or negative rpb.

LXR Example 1 (* correct answer)

                            A*       B       C       D        E
N                           86       0       0       1        0
%                           99%      0%      0%      1%       0%
Avg % correct on exam       85.3%    0%      0%      82.0%    0%
rpb                         +.06     ---     ---     -.06     ---

Very easy item. Would probably review the alternatives to make sure they are not ambiguous and do not provide clues that they are wrong.

LXR Example 2 (* correct answer)

                            A       B        C*       D        E
N                           0       21       65       2        0
%                           0%      24%      74%      2%       0%
Avg % correct on exam       0%      80.7%    87.2%    78.7%    0%
rpb                         ---     -.33     +.36     -.13     ---

Three of the alternatives are not functioning well; would review them.

LXR Example 3 (* correct answer)

                            A        B        C*       D        E
N                           3        1        15       5        66
%                           3%       1%       17%      6%       76%
Avg % correct on exam       83.0%    80.0%    83.4%    82.2%    86.8%
rpb                         -.07     -.09     -.15     -.12     +.23

Probably a miskeyed item. The correct answer is likely option E.

LXR Example 4 (* correct answer)

                            A        B*       C        D        E
N                           11       43       3        22       8
%                           13%      49%      3%       25%      9%
Avg % correct on exam       81.5%    87.4%    82.3%    84.5%    82.4%
rpb                         -.24     +.35     -.09     -.08     -.15

Relatively hard item with good discrimination. Would review alternatives C and D to see why they attract a relatively low and a relatively high number of students, respectively.

LXR Example 5 (* correct answer)

                            A        B*       C        D        E
N                           3        60       1        5        18
%                           3%       69%      1%       6%       21%
Avg % correct on exam       83.0%    85.3%    80.0%    82.2%    86.8%
rpb                         -.07     +.002    -.09     -.12     +.13

Poor discrimination for the correct choice "B". Choice "E" actually does a better job of discriminating. Would review the item for proper keying, ambiguous wording, proper wording of the alternatives, etc. This item needs revision.
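
The "equally popular" guideline for distractors can be sketched in Python using the response counts from LXR Example 4. The function name `distractor_report` is hypothetical; the expected count simply applies the rule stated above (number who answered wrong divided by the number of distractors).

```python
def distractor_report(counts: dict, key: str):
    """Return (expected_per_distractor, deviations).

    counts: number of examinees choosing each option, e.g. {"A": 11, ...}
    key:    the option keyed as correct
    deviations maps each distractor to (observed - expected).
    """
    if key not in counts:
        raise ValueError("the keyed option must appear in counts")
    distractors = [opt for opt in counts if opt != key]
    if not distractors:
        raise ValueError("at least one distractor is required")
    wrong_total = sum(counts[opt] for opt in distractors)
    expected = wrong_total / len(distractors)
    deviations = {opt: counts[opt] - expected for opt in distractors}
    return expected, deviations


# LXR Example 4: B is the keyed answer.
expected, deviations = distractor_report(
    {"A": 11, "B": 43, "C": 3, "D": 22, "E": 8}, key="B"
)
print(expected, deviations)
```

In this example the large negative deviation for C and the large positive deviation for D are the same two options the slide singles out for review.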
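
As a closing illustration, the KR20 and KR21 formulas from the reliability slides can be written as a short Python sketch and checked against the four-item KR20 example (SD = 1.2, so the variance is 1.44). The function names are hypothetical, and p is treated as a proportion between 0 and 1.

```python
def kr20(p_values, score_variance: float) -> float:
    """KR20 = K/(K-1) * (S^2 - sum(p*q)) / S^2, with q = 1 - p."""
    k = len(p_values)
    if k < 2 or score_variance <= 0:
        raise ValueError("need at least two items and a positive variance")
    sum_pq = sum(p * (1 - p) for p in p_values)
    return (k / (k - 1)) * (score_variance - sum_pq) / score_variance


def kr21(k: int, mean_score: float, score_variance: float) -> float:
    """KR21 = (K*S^2 - mean*(K - mean)) / ((K-1)*S^2).

    Assumes all items are equally difficult.
    """
    if k < 2 or score_variance <= 0:
        raise ValueError("need at least two items and a positive variance")
    return (k * score_variance - mean_score * (k - mean_score)) / ((k - 1) * score_variance)


# KR20 example from the slides: p = .50, .25, .80, .90 and SD = 1.2.
print(round(kr20([0.50, 0.25, 0.80, 0.90], 1.2 ** 2), 2))
```

Applying `kr21` to the same test (K = 4, mean = 2.45) gives a lower value than KR20, which is expected here because the four items are far from equally difficult.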