Instructor: Faculty of Educational and Behavioral Sciences, BDU

Course Objectives
At the conclusion of this course you are expected to:
• Understand concepts related to student learning assessment.
• Develop techniques for assessing the performance of students based on sound principles and educational objectives.
• Analyze items to increase the fitness for purpose of classroom assessment tools.
• Interpret assessment results to understand their implications and thereby make appropriate decisions.
• Conduct self-assessment of your teaching in classrooms in view of student learning and standards of teacher professionalism.
• Adhere to professional and ethical assessment standards in assessing student learning, handling records, using or communicating assessment results, and making decisions.

Chapter 1
Assessment: Concept, Purpose, and Principles

Definitions
Test
• A procedure in which a sample of an individual's behavior is obtained, evaluated and scored (AERA et al., 1999).
• A process of presenting a series of questions that students must answer (Nitko, 1996), for example an achievement test.
Measurement
• A set of rules for assigning numbers to represent objects, traits, attributes, or behaviors (Reynolds, 2006).
• A process of quantifying or assigning a number to performance (Nitko, 1996).
Assessment
• Any systematic procedure for collecting information that can be used to make inferences about the characteristics of people or objects (AERA et al., 1999).
• A general term that includes all the different ways teachers gather information in their classroom (Nitko, 1996; Airasian, 1996).
Evaluation
• The process of making judgments about pupils' performance, instruction, or classroom climate (Airasian, 1996).

Frames of Reference for Evaluating Assessment Results
1. Criterion-referenced (criteria vs. performance): the interpretation is provided by describing what the student can and cannot do.
2. Norm-referenced (one's performance vs. others'): the interpretation is provided by comparing the student's performance to the performance of others or to the typical performance for that student.
3. Growth-referenced (present vs. past performance): the present performance of a student is compared to his/her prior performance.
4. Ability-referenced (performance vs. potential): a student's performance is interpreted in light of that student's maximum performance (potential).

Purposes/functions of educational assessment
What do you think are the purposes of assessment?

Figure 1. Basic Teaching Model (Instructional Objectives, Entering Behaviors, Instructional Procedures, Performance Assessment, connected by a feedback loop)

Instructional Functions
• Tests help to determine what to teach, how to teach it, and how effective instruction has been.
• Tests encourage clarification of meaningful objectives.
• Tests provide feedback to the teacher and to the learner.
• Properly constructed tests can motivate learning.
• Tests can facilitate learning.
• Tests are a useful means of overlearning.

Administrative Functions
• Tests provide a mechanism of "quality" control: policy decisions on curriculum and instructional practices.
• Tests are useful for placement decisions: assigning individuals to various categories that represent different educational tracks or levels ordered in some way (remedial, regular, honors).
• Tests are useful for classification decisions: assigning individuals to different categories that are not ordered in any way (learning disabled, emotionally disturbed, etc.).
• Tests are useful for selection decisions: determining those who are or are not likely to succeed in subsequent learning tasks.
• Tests are useful for accreditation/certification: providing formal credit for demonstrated knowledge or proficiency.
Counseling and guidance functions
• To provide information that promotes self-understanding and helps students plan for the future, for example to select careers that best match a student's abilities and interests.

Types of Assessment on the Basis of Purpose

Preliminary (prognosis): before instruction
• Carried out during the first days of school; it provides a base for expectations throughout the school year.
• Concerned with students' skills, attitudes, and physical characteristics, which are essential to guide our interactions with others and with students.

Formative: during instruction
• Based primarily on continuous informal assessments such as oral questions and observation.
• Also based on formally developed assessments such as quizzes, seatwork and homework.
• The purpose is to know whether or not students have achieved sufficient mastery of skills, whether further instruction on these skills is appropriate, and what adjustments to instruction should be made.

Types of Formative Assessment
• Observations during in-class activities of students; non-verbal feedback during lectures
• Homework exercises as review for exams and class discussions
• Question and answer sessions, both formal and informal
• Conferences between the instructor and student at various points in the semester
• In-class activities where students informally present their results
• Student feedback collected by periodically answering specific questions about the instruction and their self-evaluation of performance and progress

Summative: after instruction
• Based primarily on formally developed assessments like quizzes, tests, project works, term papers, and lab works.
• The purpose is:
  - to certify student achievement and assign end-of-term grades;
  - to promote and sometimes group students;
  - to determine whether teaching procedures should be changed before the next school year.
Types of Summative Assessment techniques
• Examinations (major, high-stakes exams)
• Final examination (a truly summative assessment)
• Term papers (drafts submitted throughout the semester would be a formative assessment)
• Projects (project phases submitted at various completion points could be formatively assessed)
• Portfolios (could also be assessed during their development as a formative assessment)

Diagnostic: before or, more typically, during instruction
• When implemented before instruction, it is used to anticipate conditions that will negatively affect learning.
• During instruction, it is used to establish the underlying causes of a student failing to learn a skill or of recurrent learning difficulties.

Assumptions and principles of educational assessment
• Psychological and educational constructs exist. A construct is a trait or characteristic that a test is designed to measure (e.g., achievement).
• Psychological and educational constructs can be measured. According to Cronbach (1990), "if a thing exists, it exists in some amount. If it exists in some amount, it can be measured." Assessment experts believe that educational and psychological constructs can be measured.
• Although we can measure constructs, our measurement is not perfect. Some degree of error is inherent in all measurement.
• The assessment of student learning begins with educational values. Educational values should drive not only what we choose to assess but also how we do so.
• Assessment is most effective when it reflects an understanding of learning as multidimensional, integrated, and revealed in performance over time. Learning involves not only knowledge and abilities but also values, attitudes, and habits of mind that affect both academic success and performance beyond the classroom.
• Assessment works best when the programs it seeks to improve have clear, explicitly stated purposes. Assessment is a goal-oriented process.
It entails comparing educational performance with educational purposes and expectations.
• Assessment works best when it is ongoing, not episodic. Assessment is a process whose power is cumulative. Although an isolated, "one-shot" assessment can be better than none, ongoing assessment is more effective.
• Assessment serves as a means to gather information to make decisions; it is not an end in itself.
• Through assessment, teachers meet responsibilities to students and to the public.

Basic Assumptions of Assessment
• The quality of students' learning is directly, although not exclusively, related to the quality of teaching.
• To improve their effectiveness, teachers need first to make their goals and objectives explicit and then to get specific feedback on the extent to which they are achieving them.
• To make decisions about students' learning achievement, the use of various classroom assessments is vital.
• Teachers should understand that almost all assessment techniques have their own weaknesses and strengths. The use of a combination of assessment techniques increases the validity and reliability of the data obtained.
• To improve their learning, students need to receive appropriate and focused feedback early and often; they also need to learn how to assess their own learning.
• Systematic inquiry and intellectual challenge are powerful sources of motivation, growth, and renewal for teachers, and classroom assessment can provide such challenge.
• Classroom assessment does not require specialized training; it can be carried out by dedicated teachers from all disciplines.
• By collaborating with colleagues and actively involving students in classroom assessment efforts, teachers (and students) enhance learning and personal satisfaction.
• Clarification of what to assess/evaluate must be given priority in the evaluation process.
• An assessment/evaluation procedure must be selected because of its relevance to the identified characteristic or behavior.
• There are different ways to measure any given construct.
• Comprehensive assessment/evaluation requires a variety of evaluation techniques. No single mechanism is adequate to appraise the learners' progress toward all of the important learning outcomes.
• Proper utilization of assessment/evaluation mechanisms requires an awareness of their limitations.
• Assessment/evaluation is a means to an end and not an end in itself. The results obtained from an evaluation procedure should lead to various sorts of educational decisions.

Continuous assessment
• It is the daily process by which teachers gather information about learners' progress in achieving the learning targets.
• It makes use of formal/structured (test, exam, assignment, etc.) and informal/less structured (observation, oral questioning, etc.) mechanisms of assessment.
• It is meant to be integrated with teaching in order to improve learning and to help shape and direct the teaching-learning process.

The assessment is continuous because:
• it occurs at various times as a part of instruction,
• it may occur following a lesson,
• it usually occurs following a topic, and
• it frequently occurs following a theme.

Why continuous assessment? Benefits
• It provides regular information about teaching, learning and the achievement of learning objectives and competencies.
• It allows teachers to assess, in a classroom environment, performance-based activities that cannot be assessed, or are difficult to assess, in an examination (project works, model development, etc.).
• It is a powerful diagnostic tool that enables pupils to understand the areas in which they are having difficulty and to concentrate their efforts in those areas.
• It allows teachers to evaluate the effectiveness of their teaching strategies.
The role of educational objectives in assessment

Objectives / instructional objectives / educational objectives
• Represent what we hope students will learn or accomplish.
• Objectives are stated desirable outcomes of education.
• They give direction for education.
• They help teachers to plan instruction, guide students' learning and provide criteria for evaluating learning outcomes.

Relevance refers to whether the objective is based on the needs of the society and the learner.
Feasibility (realism of objectives) refers to whether the objectives are achievable or not.

In terms of content
1. Objectives should be appropriate for the level of difficulty and prior learning experiences.
2. Objectives should be "real" in the sense that they describe behaviors that the teacher actually intends to act on in the classroom situation.

Objectives should be made specific and measurable, even in terms of attitude and appreciation.
Example
Poor: Students will learn to love science.
Better: After a visit to a local hospital, pharmacy students will better appreciate the importance of scientific experimentation.

An objective should describe the overt behavior expected and the content.
Example
Poor: Students will know the world capitals.
Better: Students will be able to recall the capitals of the countries in eastern Africa.

In terms of form
Objectives should be stated in the form of expected student behavior, not in terms of the teacher's activities.
Example
Poor: The teacher will describe the major events in the Ethio-Italian war.
Better: The student will recall the military event that directly led to the outbreak of war between Ethiopia and Italy.
Poor: "The teacher will show the students how to solve quadratic equations."
Better: "The student will be able to solve quadratic equations."

Objectives should be stated in behavioral or performance terms.
Example
Poor: The student will see the importance of education (implicit).
Better: The student will be able to identify three major benefits of education (instructional objective).

Objectives can be general or specific.

General objectives
• Are broader in scope.
• Do not explicitly indicate what a student will be able to do.
• Begin each general objective with a verb (e.g., knows, applies, interprets).
Example: The student will be able to understand Newton's second law.

Specific objectives
• Explicitly indicate what a student will be able to do.
• Clearly express our instructional intent.
• Precisely specify the student performance we are willing to accept as evidence that the general objectives have been attained.
• Help us select appropriate assessment techniques.
Example: The student will be able to state Newton's second law.

2. Don't state instructional objectives in terms of the learning process.
Example
Poor: "The student will study a diagram showing the human circulatory system."
Better: "The student will identify the parts of the human circulatory system."

3. Don't include two objectives in one statement.
Example
Poor: "The student will be able to list and describe the fundamental causes of World War II."
Better: "The student will be able to describe the fundamental causes of World War II."

4. Specific objectives should be directly relevant to the general objective from which they are derived. For example, consider the following general objectives:
i. Students will know basic terms: "The students will write the textbook definition of each term."
ii. Students will understand basic terms: "The students will paraphrase the definition of basic terms in their own words."

A = The Audience for whom the objective is written. It should be referred to as the learner or the student, not as the learners or the students.
B = The Behavior or the type of change the learner is expected to acquire. This should be an overt, observable behavior.
C = The Condition under which the behavior will be demonstrated.
D = The Degree of proficiency or the amount of learning behavior the learner should display.

Bloom's Taxonomy of objectives
Three domains
• The cognitive domain: emphasis on understandings, awareness, insights.
• The affective domain: emphasis on attitudes, appreciations, etc.
• The psychomotor domain: emphasis on practical skills.
Each of these domains has its own levels.

Cognitive Domain

Knowledge
Objectives at the knowledge level require the students to remember or recall information such as facts, terminology, problem-solving strategies, and rules.
Verbs: define, select, identify, state, outline, recite, recall, match, list, name
Example: The student will be able to name each state capital.

Comprehension
Objectives at this level require some level of understanding. Students are expected to be able to translate or restate what has been read, see connections or relationships among parts of a communication (interpretation), or draw conclusions or consequences from information (inference).
Verbs: defend, summarize, predict, estimate, convert, distinguish, discriminate, explain, infer, extend, paraphrase
Example: The student will be able to explain how interest rates affect unemployment.

Application
Objectives written at this level require the student to use previously acquired information in a setting other than the one in which it was learned.
Verbs: change, employ, organize, transfer, modify, compute, prepare, use, produce, solve, demonstrate, develop, relate, operate
Example: The student will be able to apply multiplication of double digits in applied math problems.

Analysis
Objectives written at the analysis level require the student to identify logical errors (e.g., point out a contradiction or an erroneous inference) or to differentiate among facts, opinions, assumptions, hypotheses, and conclusions.
Verbs: break down, differentiate, illustrate, distinguish, subdivide, relate, point out, diagram, deduce, outline, separate out
Example: The student will distinguish the different approaches to establishing validity and illustrate their relationship to each other.

Synthesis
Objectives written at the synthesis level require the student to produce something unique or original.
Verbs: compile, create, categorize, compose, rewrite, design, summarize, devise, formulate
Example: Given a short story, the student will write a different but plausible ending.

Evaluation
Objectives written at this level require the student to form judgments and make decisions about the value or worth of methods, ideas, people, or products that have a specific purpose.
Verbs: appraise, contrast, interpret, criticize, compare, justify, support, defend, conclude, validate
Example: The student will judge the quality of validity evidence for a specified assessment instrument.

Affective Domain
Levels of affective domain from simple to complex

1. Receiving: one is expected to be aware of or to passively attend to certain stimuli or phenomena.
Verbs: listen, attend, share, look, notice, be aware, control, hear, etc.
Example: The student will listen actively when the teacher explains the difference between formative and summative evaluation.

2. Responding: one is required to comply with given expectations by attending or reacting to certain stimuli.
Verbs: follow, play, practice, participate, discuss, applaud, comply, obey
Example: The student will submit the assignments by the deadline.

3. Valuing: one displays behaviour consistent with a single belief or attitude in situations where one is neither forced nor asked to comply.
Verbs: help, express, act, argue, display, debate, organize, convince, prefer
Example: The student will express his support for or opposition to the nation's stand against religious fundamentalism.

4. Organisation: commitment to a set of values.
This level involves 1) forming a reason why one values certain things and not others, and 2) making appropriate choices between things that are and are not valued.
Verbs: select, formulate, balance, abstract, decide, systematize, compare, define
Example: By the end of the class, the student will reconcile his spiritual and economic views on helping beggars.

5. Characterization: all behaviour displayed is consistent with one's value system. One has developed a consistent philosophy of life.
Verbs: display, manage, exhibit, require, internalize, avoid, resolve, revise, resist
Example: During peer evaluation time, the student will evaluate his teammates objectively.

Psychomotor Domain
Levels of psychomotor domain from simple to complex

1. Imitation: the learner observes and then imitates an action. These behaviours may be crude and imperfect.
Verbs: repeat, hold, follow, place, grasp, balance
Example: The student will assemble the mobile phone after observing the technician's demonstration.

2. Manipulation: performance of an action with written or verbal directions but without a visual model or direct observation.
Example: The student will assemble the mobile phone by listening to the instructions given by the technician.

3. Precision: requires performance of some action independent of either written instructions or a visual model.
Descriptors: accurately, with control, proficiently, independently, without error, with balance
Example: The student will be able to assemble the mobile phone appropriately at least 3 times, given 5 chances.

4. Articulation: requires the display of coordination of a series of related acts by establishing the appropriate sequence and performing the acts accurately, with control as well as with speed and timing.
Descriptors: harmony, speed, confidence, coordination, proportion, integration, timing, stability, smooth, mass
Example: The student will be able to assemble the phone perfectly in less than 5 minutes.

5. Naturalization: a high level of proficiency is necessary.
The behaviour is performed with the least expenditure of energy and becomes routine, automatic, and spontaneous.
Descriptors: naturally, effortlessly, professionally, routinely, with ease, automatically, with perfection, spontaneously

UNIT TWO
ASSESSMENT STRATEGIES, METHODS AND TOOLS

2.1 Assessment strategies include:
• Quizzes, Tests, Examinations
• Anecdotal Records
• Interview
• Teacher Observation
• Performance Task
• Exhibitions/Demonstrations
• Checklists, Scales or Charts
• Classroom Presentations
• Diagnostic Inventories
• Peer Evaluation
• Self-Evaluation
• Portfolios
• Rubrics
• Simulation
• Student Journals
• Student-led Conferences

Quizzes, tests, examinations
A quiz, test, or examination requires students to respond to prompts in order to demonstrate their knowledge (orally or in writing) or their skills (e.g., through performance). Quizzes are usually short; examinations are usually longer.

Anecdotal records: objective narrative records of student performances, strengths, needs, progress and negative/positive behavior.

Interviews
An interview is a face-to-face conversation in which teacher and student use inquiry to share their knowledge and understanding of a topic or problem. It can be used by the teacher to explore the student's thinking; assess the student's level of understanding of a concept or procedure; and gather information, obtain clarification, determine positions, and probe for motivations.

Teacher observations: regular, first-hand observations of students, documented by the teacher.

Performance tasks
During a performance task, students create, produce, perform, or present works on "real world" issues. The performance task may be used to assess a skill or proficiency, and provides useful information on the process as well as the product.
Exhibitions/Demonstrations
An exhibition/demonstration is a performance in a public setting, during which a student explains and applies a process, procedure, etc., in concrete ways to show individual achievement of specific skills and knowledge.

Checklists, scales or charts: identification and recording of students' achievement, which can be through rubric levels, letter grade or numerical value, or simply by acceptable/unacceptable.

Classroom presentations
A classroom presentation is an assessment strategy that requires students to verbalize their knowledge, select and present samples of finished work, and organize their thoughts about a topic in order to present a summary of their learning. It may provide the basis for assessment upon completion of a student's project or essay.

Diagnostic inventories: student responses to a series of questions or statements in any field, either verbally or in writing. These responses may indicate an ability or interest in a particular field.

Peer evaluation: assessment by students of one another's performance relative to stated criteria and program outcomes.

Self-evaluations: student reflections about her/his own achievements and needs relative to program goals.

Portfolios: collections of student work that exhibit the students' efforts, progress and achievements in one or more areas.

Rubrics: a set of guidelines for measuring achievement. Rubrics should state the learning outcome(s) with clear performance criteria and a rating scale or checklist.

Using one assessment for a multitude of purposes is like using a hammer for everything from brain surgery to pile driving.

Simulations: the use of problem-solving, decision-making and role-playing tasks.

Diary/student journals: personal records of, and responses to, activities, experiences, strengths, interests and needs.

Student-led conferences: where the student plans, implements, conducts and evaluates a conference regarding their learning achievements.
The purpose of the conference is to provide a forum in which students can talk about their school work with parents/carers and demonstrate their growth towards being self-directed lifelong learners.

2.2. Planning for Assessment tools (planning and preparing an achievement test)
Steps involved
1. Defining the purpose of the test
The purpose of the test will determine the kind of test to be used. This in turn will determine the score reporting and interpretation, breadth and depth of the test coverage, item difficulty, item size, etc.
2. Preparing a table of specification (test blueprint)
At this stage the number and types of items to be constructed are decided based on the stated instructional objectives and the content delivered.
3. Selecting an appropriate item format
The item format is selected considering the following preconditions:
a) The purpose of the test
b) The time available to prepare and score the test
c) The number of pupils to be tested
d) The physical facilities available for reproducing the test
e) Age and other characteristics of students
f) Your skill in constructing the different types of items
4. Writing and piloting the initial draft of the test
At this stage, the teacher writes the test items, improves them using comments from self and colleagues, and tries them out on a sample of students.

2.3. Preparing Table of specification
Preparing a table of specification involves:
1. Listing all specific instructional objectives treated in the class
2. Listing all content areas to be covered in the class
3. Preparing a two-way grid/chart that shows how many questions are to be drawn from each content area and objective listed

When deciding on the relative distribution of questions for each content and objective area, one must consider:
1. amount of content contained
2. amount of instructional time devoted
3. role as a future prerequisite
4. other opportunities to evaluate

Table of specification developed for General Psychology Final Exam taken out of 60%
[Two-way grid. Columns (contents): Perception, Memory, Learning, Emotion, Motivation, Personality, Total. Rows (objectives): Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation, Total. Column totals: Perception 5, Memory 6, Learning 21, Emotion 1, Motivation 15, Personality 12; grand total 60. The individual cell values are not recoverable.]

The use of Table of specification
Generally, the use of a table of specification or test blueprint in test development will help ensure:
• that only those objectives actually pursued in instruction will be measured;
• that each objective will receive the appropriate relative emphasis in the test;
• that, by using subdivisions based on content and behaviours, no important objectives will be overlooked or misrepresented.

2.4 Selecting and developing assessment methods and tools
2.4.1. Assessment made in the course of teaching

1. Class Work and Homework
Class works are tasks that are given during the teaching-learning process.
Homeworks are tasks assigned to students by their teachers to be completed outside of class.

2. Observation
Refers to watching the learner while he/she performs the necessary skills (performance tasks).
Observational techniques/tools include: anecdotal records, checklists, rating scales, and socio-metric techniques.

Anecdotal records
Anecdotal records provide the least structured method of recording behavioral observation. An anecdotal record is merely a brief description of some observed behavior which appeared significant for evaluation purposes.

Checklist
A checklist is a prepared list of statements relating to behavior, traits and performance in some area, or to a product of some performance. Each statement in the list is checked in some way to indicate the presence or absence of a particular quality. It is frequently used to evaluate aspects of pupils' interests, attitudes, activities, skills, and personal characteristics.
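The presence/absence marking described for a checklist can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the statements, the observed marks, and the helper name `summarize_checklist` are hypothetical and not part of the course material.

```python
# Hypothetical checklist: each statement maps to True (observed) or False (not observed).
# The statements and marks below are made up for illustration only.
checklist = {
    "Listens to the lecture": True,
    "Answers easy questions": True,
    "Takes lecture notes": False,
    "Leads group discussion": False,
}


def summarize_checklist(marks):
    """Split statements into present/absent and return the share marked present."""
    present = [statement for statement, seen in marks.items() if seen]
    absent = [statement for statement, seen in marks.items() if not seen]
    share = len(present) / len(marks) if marks else 0.0
    return present, absent, share


present, absent, share = summarize_checklist(checklist)
print("Present:", present)
print("Absent:", absent)
print("Share present:", share)
```

A checklist records only whether a quality was seen, so the summary is a simple count; it says nothing about how often or how well the behavior occurred.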
A student's class participation checklist

Class participation                  Yes    No
Listening to the lecture
Answering easy questions
Answering all questions
Answering difficult questions
Taking lecture notes
Leading group discussion
...

Rating Scale
It is a device for systematically recording observers' judgments concerning the degree to which a quality or trait is present.

Rating scale for a student's class participation

Class participation                  Never seen (1)   Sometimes (2)   Usually (3)   Always (4)
Listening to the lecture
Answering easy questions
Answering all questions
Answering difficult questions
Taking lecture notes
Leading group discussions
...

Socio-Metric Technique
The socio-metric technique is a method for evaluating the social relationships existing in a group. Each group member is asked to indicate those individuals they would prefer as associates for some group situation or activity. The number of choices each pupil receives serves as an indicator of his/her social acceptance.

Strengths and weaknesses of observation
Strengths
• It enables skills to be seen live.
• It enables mistakes to be easily identified so learners can learn more.
• It is reliable since the evidence has been seen; first-hand information is best gathered through observation.
Weaknesses
• Timing has to be arranged to suit all learners; it is time-consuming.
• The assessor might not be objective in his/her decisions.

2.4.2. Periodic Assessment method
Test/examination
Types of tests
Tests can be categorized in different ways based on different criteria.

1. The kind of answer required
On the basis of this criterion, tests are classified as selection test items and supply (constructed response) test items.
• Selection type item (selected-response item format): requires the student to select the correct answer from the given alternatives, e.g. multiple choice, true/false, matching.
• Supply type item (constructed-response item format): requires the student to write the answer for the item, e.g. short answer, completion, essay.
2. The nature of item scoring: objective and subjective
• Objective test items have clear and unambiguous scoring criteria. Because of this, different scorers can give the same result to a student's answer.
• Subjective test items are scored differently by different scorers because the scoring criteria are unclear and ambiguous.

3. Degree of standardization: standardized and non-standardized
Standardized tests
• Are constructed by test specialists with curriculum experts and teachers.
• Can be used to compare the performance of one school's students with that of other schools.
• Are administered at the same time in different places, e.g. the General School Leaving Examination.
Non-standardized tests (teacher-made tests)
• Are constructed by classroom teachers.
• Are used to gather information about the progress of students in the classroom.
• Are administered at any time.
• Cannot be used to compare students of different schools.

4. Number of testees: individual and group tests
• Individual test: is designed to be administered to one person at a time. It requires more training, time, money, and experience to administer than a group test.
• Group test: is administered to a group of persons at the same time and place.

5. Nature of responses required by the item (language emphasis of the response): verbal, non-verbal and performance tasks
• Verbal: mostly uses written language to respond to items.
• Non-verbal: de-emphasizes the role of reading and writing. The responses to the items are presented in the form of pictures, figures, musical records, etc.
• Performance tasks: require students to perform a task rather than to answer questions.

6. The speed of students in completing the test items: speed and power
Speed test: requires students to complete the items as fast as possible within a short period of time.
The items are easy. Here only the most exceptional students win the competition.
Power test: gives enough time to complete the exam. It focuses on the amount of knowledge and comprehension students possess rather than on speed. In comparison to speed tests, power tests are more difficult.

7. Scheme of interpretation of results: norm-referenced tests and criterion-referenced tests

8. Attributes/behavior being measured: cognitive and non-cognitive tests
• Cognitive tests are used to measure intelligence, reasoning ability, and academic achievement. They have a correct or best answer. Achievement tests and aptitude tests are included under cognitive tests.
• Non-cognitive tests are used to measure non-academic or affective behaviors, e.g. personality tests such as emotional adjustment tests, and tests used to measure interpersonal relationships, motivation, interest, attitude, etc.

9. In terms of content: survey and mastery
• A survey test seeks a broad approximation of students' achievement by measuring attainment of a sample of the objectives in one or more levels of a curriculum.
• A mastery test is usually employed to get more detailed information about students' achievement over a short range of objectives.

Selected Response Item Format
True/false, matching, multiple choice and interpretive exercise
Advantages
• Reduces the time needed to mark students' responses; assessment is speedy.
• Wider sampling of content areas.
• Provision of automatic feedback to students, particularly when used in computer-based assessment.
• Questions can be pre-tested in order to evaluate their effectiveness.
Limitations
• A significant amount of time is required to construct good questions.
• Writing questions that test higher-order skills requires much effort compared to constructed-response ones.
• Cannot easily and directly assess written expression, creativity and performance.
• Problem of guessing
True/False (Alternative Response) Item Format
A statement is given, and students express their agreement or disagreement with its truthfulness (correctness) by choosing one of two mutually exclusive options. The options can be given as: True or False, Correct or Incorrect, Yes or No, Right or Wrong, Valid or Invalid, etc.
Students may also be required to judge whether the converse of the statement is correct. In that case the additional options are Converse True (CT) and Converse False (CF).
E.g. T F CT CF All Ethiopians are Africans.
First, the student has to judge whether the direct statement "All Ethiopians are Africans" is True or False. Second, he/she is also required to judge whether the converse statement is True (CT) or False (CF) by mentally reversing the direct statement from "All Ethiopians are Africans" to "All Africans are Ethiopians".
Advantages
1. They are comfortable for young children or pupils.
2. They can cover a larger amount of subject matter in a given testing period than any other selected response item format.
3. Misspellings and the legibility or neatness of students' handwriting are not an issue in this item format.
4. They are appropriate when 3 or 4 plausible (equally attractive) distracters cannot be found for a multiple choice item.
5. If carefully constructed, they measure the higher mental processes of understanding, application, and interpretation. For instance, the following item was developed at the application level of the cognitive domain.
T F 1. If X + 3X = 9, then the value of X is 2.
To answer the above item the student must work through the appropriate mathematical algorithm.
Limitations
1. Highly susceptible to guessing (this item format gives a 50% probability of guessing correctly).
2. Highly susceptible to cheating.
3. It is difficult to write statements that are unequivocally true or false.
4.
True-False items tend to rely heavily upon rote memorization of isolated facts, thereby trivializing the importance of understanding those facts.
Guidelines for constructing True/False items
1. Test significant content of the course and avoid trivial statements.
T F Benjamin Bloom was of Jewish origin.
What is the significance of this item for an educational assessment and evaluation course?
2. Write items that can be classified unequivocally as either true or false; if the statement is an opinion or an arguable theory, name its source. Look at the following example:
Poor: T F Females' feeling of inferiority results from their lack of a penis.
Better: T F According to Freud, females' feeling of inferiority results from their lack of a penis.
3. Avoid taking statements verbatim from the textbook, because:
• It pushes students to engage in rote learning rather than grasping the gist.
• The context in which the statement is used in the textbook (or exercise book) may not be available in the exam, so the item becomes ambiguous.
For example:
Poor: T F The square of the hypotenuse of a right triangle equals the sum of the squares of the other two sides.
Better: T F If the hypotenuse of an isosceles right triangle is 7 inches, each of the two equal sides must be more than 5 inches.
4. Ask only a single major idea in each item, and avoid compound statements unless the item measures a cause-effect relationship.
Poor: T F Ethiopia is the oldest origin of human civilization and currently it is among the middle income countries of the world.
Better: T F Ethiopia is the origin of human civilization.
Better: T F Currently Ethiopia is grouped among the middle income countries of the world.
5. Avoid tricky questions. Tricky questions cheat students, e.g. writing an item with a deliberately misspelled word.
Poor: T F The largest sports kit producer is Addidos.
Better: T F The largest sports kit producer is Adidas.
6.
Avoid using absolute terms like "always," "all," or "never" in false statements and relative terms like "usually," "often," or "many" in true statements.
T F All Americans are educated.
T F Most Americans are educated.
7. Avoid negatively worded statements; if a negative is unavoidable, write it in italics or bold, or underline it.
Poor: T F Animals do not produce their own food.
Better: T F Animals produce their own food.
8. Put the items in random order and make the number of true and false statements approximately equal. This minimizes the problem of response set.
9. Try to avoid long drawn-out statements or complex sentences with many qualifiers. This helps students understand what is asked and reduces the language barrier.
10. Make the length of true and false statements approximately equal. Avoid making items that are true consistently longer than those that are false.
Matching exercise
A matching exercise typically consists of a list of questions or premises to be answered along with a list of responses. The examinee is required to make an association between each entry in the premise column and an entry in the response column (alternatives).
Advantages
• It is compact
• It measures a large number of objectives stated at the knowledge level in a minimum amount of time (item sampling is higher)
Limitations
• It is generally restricted to the measurement of factual information based on rote learning
• It is difficult to find homogeneous material
Guidelines for Constructing Matching Exercise
1. Use homogeneous material in each list of a matching exercise.
Column A                    Column B
1. Abebe Bikila             A. Atlanta
2. Fatuma Roba              B. Barcelona
3. Mirutse Yefitir          C. Gura
4. Mamo Wolde               D. Mexico
5. Derartu Tulu             E. Moscow
6. Ras Alula Abba Negga     F. Munich
2. Include directions that clearly state the basis for the matching.
3. Keep the lists of items to be matched unequal.
4.
Put the questions or premises (typically longer than the responses) in a numbered column at the left, and the response choices in a lettered column at the right.
5. Arrange the list of responses in alphabetical or numerical order if possible in order to save reading time. Look at Column B under Table 4: the names of the Olympic cities are arranged in alphabetical order of their initial letters.
6. Limit the list of premises to between 5 and 10. Furthermore, put all the responses to be matched on the same page, as this prevents the noise produced when students flip back and forth through the test booklet.
Multiple Choice
The multiple choice item (MCQ) consists of two distinct parts:
1. The first part, which contains the task or problem, is called the stem of the item. The stem may be presented either as a question or as an incomplete statement. The form makes no difference as long as it presents a clear and specific problem to the examinee.
2. The second part presents a series of options or alternatives. Each option represents a possible answer to the question. In the standard form, one option is the correct or best answer, called the keyed response, and the others are misleads or foils, called distracters.
The number of options used differs from one test to another. An item must have at least three answer choices to be classified as a multiple choice item. The typical pattern is to have four or five choices to reduce the probability of guessing the answer. In a good item, all the presented options look like plausible answers, at least to those examinees who do not know the answer.
Multiple...
• Knowledge of terminology
• Knowledge of specific facts
• Knowledge of principles
• Knowledge of methods and procedures
Advantages of Multiple Choice Items
• It is flexible
• It avoids the problem of spelling errors by students
• Multiple choice items have greater reliability per item
• The need for homogeneous material is minimized or avoided
• It is relatively free from response sets
• Using a number of plausible alternatives makes the results useful in diagnosing students' learning errors
• It is free from the common weaknesses of other item types
• Students should know the answer
Example
T F The African Union was established in Algeria.
The African Union was founded in _______.
A. South Africa    B. Kenya    C. Ethiopia    D. Algeria
Limitations of Multiple Choice Items
1. It is limited to the measurement of verbal material
2. It is unsuitable for measuring the synthesis and evaluation levels of the cognitive domain
3. Difficulty of getting plausible distracters
Suggestions for Constructing Multiple Choice Items
Constructing a good item involves:
• a clearly stated problem,
• the identification of plausible alternatives, and
• removing irrelevant clues to the answer.
Specific suggestions:
• The stem of the item should be meaningful by itself and should present a definite problem.
• State the stem of the item in positive form wherever possible.
• Put as much of the wording as possible in the stem of the item.
• All the alternatives should be grammatically consistent with the stem of the item.
• All items should contain only one correct or clearly best answer.
• Make the distracters plausible and attractive to the uninformed.
• The correct answer should appear in each of the alternative positions an approximately equal number of times, but in random order.
• Use "none of the above" and "all of the above" carefully as alternatives; it is better to use combinations such as "A and B" or "B and C".
• Make certain each item is independent of the other items in the test.
• Avoid unintentional clues to the correct answer. Items with irrelevant clues, such as specific determiners, correct answers that are consistently longer, or grammatical inconsistencies between the stem and the wrong alternatives, tend to be easier than items without these faults.
Subjective Items Format
Essay items enable students to select, organize, interpret, and present ideas in their own ways. They give the student more freedom to respond than objective test formats.
A. Restricted Response Questions
• They limit both the content and the form of the pupil's response.
Example: Write in a short paragraph the reasons multiple choice items are widely used.
B. Extended Response Essay Questions
• They allow the student to determine the length and complexity of the response.
• They are most useful at the synthesis or evaluation levels of the cognitive taxonomy.
• They are used to determine whether students can organize, integrate, express, and evaluate information, ideas, or knowledge.
Examples:
Evaluate the effects of globalization.
Identify as many different ways to generate electricity as you can. Give the advantages and disadvantages of each and how each might be used to meet the electrical requirements of a medium-sized city.
Advantages of Essay Items
• Most effective in assessing complex learning outcomes
• Relatively easy to construct
• Emphasize essential communication skills in complex academic disciplines
• Guessing is eliminated
Limitations of Essay Tests
• Difficult to score
• Scores are unreliable
• The limited sampling they provide
• Bluffing
Suggestions for Constructing Essay Questions
• Restrict the use of essay questions to those learning outcomes which cannot be satisfactorily measured by objective items
• Formulate questions that will call forth the behavior specified in the learning outcomes
• Phrase each question so that the pupil's task is clearly indicated
• Indicate an approximate time limit for each question
• Avoid the use of optional questions
Suggestions for Scoring Essay Items
• Prepare an outline of the expected answer in advance
• Use the scoring method which is most appropriate. There are two formats: I. the point method and II. the rating method.
A. The point method
The criteria may include content, organization, word selection, accuracy/reasonableness, completeness, originality, and so on.
B. The rating method
The teacher generally is more interested in the overall quality of the answer than in specific points.
General recommendations for scoring essay items
• Decide on provisions for handling factors that are irrelevant to the learning outcomes being measured, i.e. handwriting, language, ...
• Evaluate all answers to one question before going on to the next question.
• Evaluate the answers without looking at the pupil's name.
• If important decisions are to be based on the results, obtain two or more independent ratings.
Assembling the Classroom Test
• Prepare exams early
• Extra items make it easy to eliminate those items found to be defective
i. Recording of Test Items
• It is desirable to write each item on a separate index card.
• The card should contain information concerning the instructional objective and the specific learning outcome; space should also be reserved on the card for item analysis information.
ii. Review of Test Items
a. Reviewing the items after they have been set aside for a few days, and
b. Asking a fellow teacher to review and criticize the items
iii.
Arranging of Items in the Test
Consider:
• the types of items used
• the learning outcomes measured
• the difficulty of the items, and
• the subject matter measured
Based on test items:
• True-false
• Matching items
• Supply type (short answer and completion)
• Multiple-choice
• Interpretive exercises
• Essay questions
Based on learning outcomes: for example, the items in the multiple-choice section might be arranged in the following order:
• knowledge of terms
• knowledge of specific facts
• knowledge of principles
• application of principles
iv. Preparation of Directions for the Test
• purpose of the test
• time allowed for completing the test
• basis for answering
• procedure for recording the answers
v. Reproducing the Test
• The test items should be spaced and arranged.
• It is desirable to proofread the entire test before it is administered.
• Charts, graphs, and other pictorial material must be checked.
Administering of the Tests
• Adequate working space
• Quiet room
• Appropriate light
• Ventilated room
• Comfortable seat, and so on
Test anxiety
Excessive test anxiety can be caused by:
• threatening pupils with tests,
• warning pupils to do their best "because this test is important",
• telling pupils they must work fast to complete the test on time,
• threatening terrible consequences if they fail the test.
Other psychological factors to be considered by the teacher are:
• Time of testing: if tests are administered just before "the big game" or "the big holiday", the results may not be representative.
• Individual pupil fatigue, the onset of illness, or worry about a particular problem may prevent maximum performance.
Things we need to avoid during test administration
• Do not talk unnecessarily before letting students start working
• Keep interruptions during the test to a minimum
• Avoid giving hints to pupils who ask about individual items
• Discourage cheating
• Activities that do not match with test administration
Item Analysis Procedures
Steps of Item analysis
1.
Arrange the scored test papers in order from the highest to the lowest score, or in reverse order.
2. From the arranged test papers, form two groups: the upper group and the lower group.
• If there are 40 students or fewer, all papers are included in the analysis.
• If there are more than 40 students, the upper group is the top N × 27/100 papers and the lower group is the bottom N × 27/100 papers, where N is the number of students.
Difficulty level of the item (P)
P = (R_U + R_L) / T
where R_U is the number of students in the upper group who answered the item correctly, R_L is the number in the lower group who answered it correctly, and T is the total number of students in the two groups.
• When P levels are less than about 25%, the item is considered relatively difficult.
• When P levels are above 70%, the item is considered relatively easy.
• Test construction experts try to build tests in which most items have P levels between 20% and 80%, with an average P level of 50%.
Discrimination power of the item (D)
D = (R_U - R_L) / (T/2)
If the value of D is:
• ≥ 0.40, the item is very good
• 0.30 – 0.39, the item is reasonably good but subject to improvement
• 0.20 – 0.29, the item is moderately good but needs revision
• < 0.20, the item is poor and needs serious revision or should be rejected
Evaluating the effectiveness of distracters
Distracter effectiveness is determined by inspection or observation. A good or effective distracter is one that attracts more students from the lower group than from the upper group.
              Alternatives
Group         A*   B    C    D
Upper (10)    5    4    0    1
Lower (10)    3    2    0    5
(* marks the keyed response.)
• Option "B" is a poor distracter because it attracts more pupils from the upper group.
• Option "C" is completely ineffective as a distracter because it attracted no one.
• Alternative "D" is functioning as intended.
INTERPRETATION OF SCORES
Once test items are scored, the teacher should organize the results and give meaning to the scores. A student's score has no meaning by itself. It has meaning when compared with other students' scores or with a certain criterion. In this part we will address some of the concepts of basic descriptive statistics, such as measures of central tendency, variability, and relationship.
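The item analysis indices described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, R_U and R_L are taken to be the numbers of upper-group and lower-group students who chose the keyed answer, T is the combined size of the two groups, and the rule that a distracter chosen equally often by both groups counts as "poor" is an assumption, since the slides do not cover that case. The data come from the distracter table above, with A as the keyed response.

```python
def item_difficulty(r_upper, r_lower, total):
    """Difficulty level P = (R_U + R_L) / T, as a proportion between 0 and 1."""
    return (r_upper + r_lower) / total


def item_discrimination(r_upper, r_lower, total):
    """Discrimination power D = (R_U - R_L) / (T / 2)."""
    return (r_upper - r_lower) / (total / 2)


def rate_discrimination(d):
    """Verbal rating of D using the cut-offs listed in the slides."""
    if d >= 0.40:
        return "very good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "moderately good"
    return "poor"


def rate_distracter(upper_count, lower_count):
    """Judge a wrong option by who it attracts (ties treated as poor: an assumption)."""
    if upper_count == 0 and lower_count == 0:
        return "ineffective"
    if lower_count > upper_count:
        return "functioning"
    return "poor"


# Counts per option as (upper group, lower group); 10 students in each group.
responses = {"A": (5, 3), "B": (4, 2), "C": (0, 0), "D": (1, 5)}
key = "A"
total_students = 20

r_u, r_l = responses[key]
p = item_difficulty(r_u, r_l, total_students)
d = item_discrimination(r_u, r_l, total_students)
print(f"P = {p:.2f}, D = {d:.2f} ({rate_discrimination(d)})")

distracter_ratings = {
    option: rate_distracter(upper, lower)
    for option, (upper, lower) in responses.items()
    if option != key
}
for option, rating in distracter_ratings.items():
    print(f"Option {option}: {rating}")
```

For this item the sketch gives P = 0.40 and D = 0.20, so the item is of moderate difficulty but only moderately good at discriminating, which is consistent with options B and C not working as distracters.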
Interpretation
Table 1: Scores of 25 grade nine students on a mid exam of mathematics out of 40%
25 24 26 28 28
28 26 27 20 27
26 25 31 26 14
26 32 28 36 27
27 29 29 26 27
The following scores are geography final examination results out of 60 for 40 students:
30 17 38 46 29 20 16 39 26 29
50 27 36 29 30 51 27 36 22 31
16 28 28 18 32 39 44 49 56 13
22 21 24 27 39 12 10 12 10 20
The Use of Summary Statistics
• The statistic used to report the typical score is called a measure of central tendency. It includes the mean, median, and mode.
• To describe the amount of score differences among students, use a measure of variability. It includes the range, standard deviation, and variance.
• Use measures of relationship.
Measures of Central Tendency
These describe points on a distribution that represent the average or typical values.
The Mode
The mode is defined as the score value which is obtained most often. It is the most frequently occurring value.
Example: The following are the scores of 10 students on a 25 item spelling test.
12, 18, 16, 20, 13, 19, 19, 19, 20, 17
Here the mode is 19, which occurs three times.
The Median (Mdn)
The median is the point that divides the ordered or ranked scores in a distribution into two equal parts. It is determined by arranging the scores in order of magnitude and selecting the value that separates the scores into equal parts.
Example: To calculate the median of the above scores, arrange them like this:
12, 13, 16, 17, 18, 19, 19, 19, 20, 20
Since the number of scores is even, the median is the average of the two middle scores:
Mdn = (18 + 19) / 2 = 18.5
The Mean
The mean is the average score, obtained by adding all the scores and dividing the sum by the total number of scores.
Measures of Variability
The Range
The range is the difference between the highest score and the lowest score in the distribution.
Standard Deviation
The standard deviation indicates the average of the distances of all the scores around the mean. It is the most common and useful measure of variability.
More formally, the standard deviation is the square root of the variance (S²).
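As a worked illustration, the summary statistics above can be computed for the ten spelling-test scores with Python's standard `statistics` module. This is a minimal sketch under one stated assumption: the slides do not say whether the variance divides by N or by N - 1, so the population versions (`pvariance`, `pstdev`, dividing by N) are used here; `statistics.variance` and `statistics.stdev` would give the sample versions instead.

```python
import math
import statistics

# Scores of 10 students on the 25-item spelling test from the example.
scores = [12, 18, 16, 20, 13, 19, 19, 19, 20, 17]

# Measures of central tendency
mode = statistics.mode(scores)        # most frequently occurring score
median = statistics.median(scores)    # middle point of the ordered scores
mean = statistics.mean(scores)        # sum of scores / number of scores

# Measures of variability
score_range = max(scores) - min(scores)    # highest score minus lowest score
variance = statistics.pvariance(scores)    # population variance (assumption: divide by N)
std_dev = statistics.pstdev(scores)        # square root of the variance

print(f"Mode = {mode}, Median = {median}, Mean = {mean}")
print(f"Range = {score_range}, Variance = {variance:.2f}, SD = {std_dev:.2f}")
```

The mode (19) and median (18.5) agree with the values worked out by hand in the examples, and the standard deviation is simply the square root of the computed variance.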