Test Writing Basics
Molly Baker, Ph.D.

Planning Good Tests
1. Determine the amount of time available, the location of the test, the level of difficulty desired, and the weight the test will carry relative to the other assessments in the course.
2. Create test specification charts to help plan the test. See the examples below, which assume that the test creator has determined that an objective test with some constructed items is best.
3. Write sample items and ask a colleague to critique them. Revise.
4. Administer the test.
5. Evaluate the test by conducting an item analysis and collecting feedback from the students on items that were frequently missed. Was the question confusing, or is the content it was attempting to measure confusing? (A simple item-analysis sketch follows the example charts below.)

Example 1: Content areas crossed with levels of Bloom's taxonomy (percentage of the test and number of items per cell; 60 items in all)

Content Area 1: Knowledge 15% (9 items); Comprehension 5% (3 items); Application 15% (9 items); row total 35%
Content Area 2: Knowledge 5% (3 items); Analysis 5% (3 items); Synthesis 5% (3 items); row total 15%
Content Area 3: Comprehension 5% (3 items); Analysis 5% (3 items); Synthesis 5% (3 items); Evaluation 5% (3 items); row total 20%
Content Area 4: Comprehension 20% (12 items); Application 5% (3 items); Evaluation 5% (3 items); row total 30%
Column totals: Knowledge 20% (12 items); Comprehension 30% (18 items); Application 20% (12 items); Analysis 10% (6 items); Synthesis 10% (6 items); Evaluation 10% (6 items); overall 100% (60 items)

Example 2: Objectives crossed with item types (points per cell, with item counts noted for some cells; 100 points in all)

Obj 1 (knowledge): MC 5 pts; TF 10 pts; Short answer 10 pts (5 items); row total 25% (25 pts)
Obj 2 (application): MC 5 pts; TF 5 pts; Short answer 5 pts (1 item); row total 15% (15 pts)
Obj 3 (comprehension): MC 10 pts; Short answer 5 pts (1 item); row total 15% (15 pts)
Obj 4 (synthesis): MC 5 pts; TF 5 pts; Essay 10 pts (1 item); row total 20% (20 pts)
Obj 5 (application): MC 15 pts; Matching 10 pts (5 items); row total 25% (25 pts)
Column totals: MC 40 pts; TF 20 pts; Matching 10 pts; Short answer 20 pts; Essay 10 pts; overall 100 pts
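The handout does not prescribe a particular item-analysis procedure. As a minimal sketch, two standard indices from classical test theory can be computed from a simple 0/1 score matrix: item difficulty (the proportion of students who answered the item correctly) and a discrimination index (how much better the top-scoring students did on the item than the bottom-scoring students). The data, the 27 percent grouping rule, and the function name below are illustrative assumptions, not part of the handout.

```python
# Minimal item-analysis sketch (illustrative data and conventions, not part of
# the handout). responses[s][i] is 1 if student s answered item i correctly.

def item_analysis(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    difficulty     = proportion of all students who answered the item correctly
    discrimination = proportion correct in the top-scoring group minus the
                     proportion correct in the bottom-scoring group
    """
    n_students = len(responses)
    n_items = len(responses[0])
    ranked = sorted(range(n_students), key=lambda s: sum(responses[s]))
    k = max(1, round(group_fraction * n_students))
    low, high = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        p_all = sum(responses[s][i] for s in range(n_students)) / n_students
        p_high = sum(responses[s][i] for s in high) / k
        p_low = sum(responses[s][i] for s in low) / k
        results.append((p_all, p_high - p_low))
    return results

# Made-up results for a 4-item quiz taken by 6 students.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
for item, (p, d) in enumerate(item_analysis(scores), start=1):
    print(f"Item {item}: difficulty={p:.2f}, discrimination={d:+.2f}")
```

Items that most students miss, or whose discrimination is near zero or negative, are the ones worth the follow-up question in the planning list: was the item confusing, or was the content?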
Objective Test Items
Objective items assess content tied to the instructional objectives. Plan on about 1 minute per MC question, 2 minutes per TF item, and 3-5 minutes per short-answer item. Common types:
- Multiple choice
- True-false
- Fill-in and short answer
- Matching
Good objective items are more challenging to write, but they are easier to grade, and it is easier to select a representative sample of questions from all of the subject areas to be tested. It is also easier for the test taker to guess. Multiple-choice questions tend to be less ambiguous and less subject to misinterpretation than true-false questions.

Multiple Choice Items
How to develop them:
1. Write propositions that represent important content the students have been learning: facts, concepts (abstract or concrete), principles (cause and effect, a relationship between two concepts, laws of probability, an axiom), or procedures (a sequence of mental or physical acts leading to a result). These are often based on the instructional objectives you have already written. Students should feel they are being tested on material that is important to know, not trivial content.
2. Convert these propositions into questions (called the "stem").
   a. Each question should address one idea at an appropriate reading level.
   b. The central idea should be in the stem rather than in the choices. Sample: What best defines photosynthesis? NOT: Photosynthesis is:
   c. Each question should be a complete sentence; do not use an incomplete statement. Sample: Which tool is recommended for taking blood pressure readings? NOT: Taking blood pressure readings uses _______ tool:
   d. Try not to use negatives (not, except). If a negative is unavoidable, put the negative word in CAPS. If it is important to measure a student's knowledge of all of the possibilities, convert the question into multiple questions rather than one question with a negative term. Never use double negatives.
   e. Put the important part of the question near the beginning.
3. Develop a correct answer, and vary the position of the right answer across items.
4. Develop plausible distractors:
   a. Select distractors from common misconceptions and misunderstandings about the central idea of the question, perhaps drawn from errors students have made. A higher-level-thinking set of distractors includes other correct answers as well; the test taker is then asked to select the "best answer," which requires evaluating all of the correct answers. Sample: Which approach is most effective for reducing fevers in young children?
   b. Try to make the answer choices equal in length (preferably short), similar in complexity, and grammatically parallel to the correct answer.
   c. If the answer choices have a natural order (dates, ages), arrange them in that order; otherwise, arrange them randomly.
   d. Use "all of the above" or "none of the above" infrequently and not as a filler. It is OK to have 3-6 distractors in various items throughout an exam. An alternative is to use the "best answer" approach described above, or to construct multiple true-false items (see the True-False section).
   e. Move any repetitive words or phrases that appear in all of the choices into the stem.
5. Check for correct grammar, punctuation, spelling, and capitalization.
6. Format the item vertically instead of horizontally.

HOTS (higher-order thinking skills) recommendations:
1. To test understanding (the ability to explain a term, concept, or principle beyond rote memory), the question can ask students to identify a correct nonverbatim definition, identify characteristics or noncharacteristics, or identify examples or nonexamples. Samples: Which best defines ___________? Which is (un)characteristic of ___________? Which of the following is an example of ____________?
2. To test critical-thinking prediction, the question requires the student to predict what will happen given certain information, or what caused something to happen (usually based on understanding of particular concepts and principles). Samples: What would happen if _______? If this happens, what should you do? On the basis of _______, what would you do? Given __________, what is the primary cause of __________?
3. To test critical-thinking evaluation, the question asks the student to select a criterion or criteria, use a criterion or criteria, or both (usually based on understanding of particular principles and the application of a procedure). Samples: What is most effective (appropriate) for ________________? Which is better (worse), ______________? What is the most effective method for _______________? What is the most critical step in this procedure? What is (un)necessary in a procedure?
4. To test critical-thinking problem solving, multiple steps must be required of the student, so objective questions of this sort are usually presented in sets (usually requiring understanding of concepts, principles, or procedures and the mental skills to select among them; this is the most difficult kind of thinking to teach and test). Sample set: What is the nature of the problem? What do you need to solve this problem? What is a possible solution? Which is a solution? Which is the most effective (efficient) solution? Why is _______ the most effective (efficient) solution?

True-False Questions
True-false items are less reliable than multiple choice and are best used when a question has only two plausible answers. Sample: Increasing parental involvement with math homework reduces student performance on math tests. They are also good for testing the ability to apply a principle (HOTS). Sample: It is easier for a poor student to get a good score (80 percent correct) on a true-false test if the test includes only 50 items than if it includes 100 items.
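A quick way to see why the second sample statement above is defensible: under a simple binomial model, a student with a fixed chance of getting any single item right is more likely to luck into a high score on a short test than on a long one. The per-item probability of 0.7 used below is an illustrative assumption, not a figure from the handout.

```python
from math import comb

def prob_at_least(n_items, min_correct, per_item_p):
    """P(at least min_correct of n_items answered correctly) under a simple
    binomial model: each item is independently correct with per_item_p."""
    return sum(
        comb(n_items, k) * per_item_p**k * (1 - per_item_p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )

# Chance of scoring 80% or better for a student whose per-item probability of
# a correct answer is 0.7 (an illustrative figure, not from the handout):
print(f"50-item test  (needs 40 right): {prob_at_least(50, 40, 0.7):.3f}")
print(f"100-item test (needs 80 right): {prob_at_least(100, 80, 0.7):.3f}")
```

Running the sketch shows a noticeably higher probability of a lucky 80 percent on the 50-item test.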
How to Develop Them
1. Select a single idea from the proposition, such as a concept or principle, that is important and worth saying.
2. Write a true statement, based on the idea, that would be easy to defend to an expert but not obvious to just anyone.
3. Write a false statement based on the same idea, usually a parallel but opposite statement. Sample: More salt can be dissolved in a pint of warm water than in a pint of cold water. / More salt can be dissolved in a pint of cold water than in a pint of warm water.
4. Use determiners (all, never, some, few) in a variety of ways throughout the exam; avoid a pattern such as statements with "never" always being false.
5. Choose more false statements than true ones.
6. To reduce ambiguity, use internal comparisons. Sample: Open-book tests tend to be less efficient than closed-book tests. NOT: Open-book tests tend to be inefficient.
7. Avoid the exact wording of the textbook.
8. Avoid tricky items; make each statement clearly true or false so it can be defended.
9. If multiple answers are correct, a multiple true-false question may be a better format to use. Sample: An ecologist losing weight by jogging and exercising is
   1. increasing maintenance metabolism (T)
   2. decreasing net productivity (T)
   3. increasing biomass (F)
   4. decreasing energy lost to decomposition (F)
   5. increasing gross productivity (F)

Short Answer Questions
Short-answer items are often written to focus the question on recalling facts or applying principles. They are easier to write than a multiple-choice or true-false question, but potentially more difficult or more time-consuming to grade.

How to Develop Them
1. Based on a proposition, write a question rather than an incomplete statement.
2. For fill-in-the-blank statements, avoid more than one blank, and put the blank at the end of the sentence.

Matching
Matching items are best used for lower-order objectives.

How to Develop Them
1. Identify a category of items.
2. Arrange the premises and response options in two columns, with numbered items on the left and lettered items on the right.
3. Clearly specify in the directions whether the matching is one-to-one or one-to-many and whether response options can be used more than once.
4. Use between 6 and 15 premises (question stems) and 2-3 more response options than premises.
5. Put the entire matching set on one page of the test.
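To make guidelines 2-4 above concrete, a small script can assemble a matching set: numbered premises on the left, lettered response options in random order on the right, a couple of extra responses so the last pair cannot be matched by elimination, and an answer key for the instructor. The content pairs are drawn loosely from this handout, and the set uses fewer than the recommended 6-15 premises only to keep the sketch short.

```python
import random
import string

# Premise -> correct response pairs (illustrative content).
pairs = {
    "Multiple choice": "a stem followed by one correct answer and plausible distractors",
    "True-false": "best when a statement has only two plausible answers",
    "Matching": "best used for lower-order objectives",
    "Short answer": "a question answered with a word, phrase, or sentence",
    "Essay": "an extended original response scored against a guide",
}
extra_responses = [
    "an organized collection of student work showing progress",
    "an observed live performance rated with a checklist",
]

premises = list(pairs)                              # numbered, left column
responses = list(pairs.values()) + extra_responses  # lettered, right column
random.shuffle(responses)

letters = string.ascii_uppercase
print("Match each item type (left) to its description (right). "
      "Each response is used once or not at all.\n")
for number, premise in enumerate(premises, start=1):
    print(f"{number}. {premise}")
print()
for letter, response in zip(letters, responses):
    print(f"{letter}. {response}")

# Answer key for the instructor.
key = {n + 1: letters[responses.index(pairs[p])] for n, p in enumerate(premises)}
print("\nKey:", ", ".join(f"{n}-{k}" for n, k in sorted(key.items())))
```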
Essay Tests
Essay tests assess content, process, and/or writing skills, depending on the objectives being measured. Plan on about 5 pages of writing per hour, depending on the amount of thinking the complexity of the problem requires. Essay tests are easier to write, but grading takes more time. It is important to have criteria (for example, a checklist) identified for correct answers to reduce subjectivity, increase the consistency of scoring across student papers, and minimize the potential impact of "bluffing." It is recommended that one item be scored for all students before moving on to the next item. Provide feedback to students on their answers and share your key and model answers with them; these practices will increase performance on future exams. Do not let writing skills affect your evaluation of the answer's content unless the students know they are being evaluated on writing as well. Essay tests can draw on fewer representative subject areas than objective tests. A good essay question requires an original, thoughtful response composed by the examinee in the form of several sentences, not the recall of memorized items or an undefended opinion.

Essay tests do not do a better job of determining how well students can analyze, organize, synthesize, or develop original ideas if the efforts of instruction were not directed toward such goals. The tendency is to focus instruction on establishing a knowledge base while focusing the evaluation on the application of that knowledge; planning can prevent this inconsistency.

Essay Questions: How to Develop Them
1. Select the objective(s) that the essay will measure.
2. Delimit the scope of the content to be covered. One way to do this is to develop the criteria for evaluation and then write a question aimed at those criteria. Avoid making the question so general that any number of right answers is possible; that makes the answers very difficult to evaluate, and it becomes tempting to compare the thoroughness of the answers across students instead of judging the correctness of each answer against the checklist.
3. Define the student's task as clearly and specifically as possible. The verb in the objective may suggest the task (analyze, interpret, explain, predict). Avoid verbs such as discuss, comment on, or elaborate on unless you make it clear what you expect.
4. Avoid using essays to measure objectives that can be measured better by objectively scored items. Use essays to measure the ability to synthesize, integrate, speculate, and perform other higher-level tasks.
5. Use several short essay questions rather than one long one. This makes preparing the scoring guide much easier and your grading more reliable from one test to another.
6. Develop a model answer or scoring guide. Show the objective, the question, and your scoring guide to a colleague for feedback, and revise as needed. (A minimal scoring-guide sketch follows this list.)
7. After the test, look at the range of answers received to determine whether the question was sufficiently delimited and clearly stated.
8. Share sample questions and exemplary answers with the students before you give the exam. Practice essay quizzes are a good idea too.
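A scoring guide does not need to be elaborate. As a minimal sketch of the checklist idea above, the structure below stores criteria and point values for one essay item and applies them paper by paper, in the item-by-item grading order the handout recommends. The criteria, point values, and student judgments are placeholders, not a prescribed rubric.

```python
# Minimal essay scoring-guide sketch (placeholder criteria and point values).
scoring_guide = {
    "Essay 1": [
        ("States the principle being applied", 2),
        ("Applies the principle to the given case", 4),
        ("Defends the conclusion with relevant evidence", 4),
    ],
}

def score_item(item, papers):
    """papers maps student -> set of criteria judged as met for this item."""
    criteria = scoring_guide[item]
    return {
        student: sum(points for crit, points in criteria if crit in met)
        for student, met in papers.items()
    }

# Score Essay 1 for every student before moving on to the next item.
papers = {
    "Student A": {"States the principle being applied",
                  "Applies the principle to the given case"},
    "Student B": {"Applies the principle to the given case",
                  "Defends the conclusion with relevant evidence"},
}
max_points = sum(points for _, points in scoring_guide["Essay 1"])
for student, pts in score_item("Essay 1", papers).items():
    print(f"{student}: {pts} / {max_points}")
```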
Performance Tests
These tests are often created to assess complex thinking skills that can easily be observed (e.g., fixing a cellphone), complex mental or physical behavior that is not easy to observe without the help of a checklist or rating scale (e.g., efficient planning of a lesson, or correct loading of a dishwasher), or complex physical behavior that can only be evaluated through observation (e.g., correct serving of a tennis ball).

Examples:
1. Actual performance (assess content AND process; observe, rating the quality or describing it, often with a rubric)
   Interpretive reading aloud
   Teach a child a concept
   Do CPR
   Demonstrate how to serve a tennis ball
   Conduct an interview for a human relations office
   Perform an original dance
   Play the clarinet
   Conduct an ITV lesson on the system
   Do a water-quality analysis of a nearby stream
   Write an essay (when assessing writing skills)
2. Simulation of performance (assess content AND the process used or proposed)
   Flight simulation
   Design a stock/bond transaction plan based on data provided
   Diagnose an engine problem based on data provided
   Propose clinical decisions to address a presented case
   Propose a management decision in response to a presented scenario
   Outline a strategy for taking a "sample" objective psychology test
   Conduct a mock interview
3. Product development (assess the product, the outcome of the performance; an appropriate process (actual performance) is inferred)
   Compose a nursing care plan based on data
   Make a woodworking project
   Develop a weekly menu
   Write a book review, speech, or lab report
   Outline a chapter in a biology textbook
   Design a lesson plan for teaching ______ to ________
   Write an annual report based on data
   Portfolio (an organized collection of student work, often used to demonstrate progress)
   Oral report
   Research paper
4. Identification (of real objects; "go get a ___" or "go get me a _____ that does _______")
   Carpenter's tools
   Rocks
   Muscles, bones, nerves
   Works of art by particular artists or in a particular style
   Musical instruments
   Leaves
   Lab equipment
   Map locations or routes
   Birds
5. Performance task with prescribed components and criteria for evaluation (assess each component using performance criteria shared with learners ahead of time)
   Online course design
   Research article critique
   Journal of an internship experience
   Cost-benefit analysis of a corporate initiative
   Experiment (set up and complete an experiment with recorded observations and a conclusion)

How to Develop Them
1. Select an objective or objectives for which a performance task is appropriate.
2. Determine the length of time the learners will have to complete the test/task (usually a full period to several days or weeks).
3. Develop one or more potential tests/tasks tied to the objectives.
   a. How structured is the definition of the problem? In other words, will you tell the students what the problem is (structured), or are they to figure out what the problem is and go from there (unstructured)? Is the type of problem meaningful and realistic within the subject taught, i.e., is it contextualized (authentic)?
   b. How much will you tell the students about what materials to use, what proportions to use, where to get the information they need, and so on (scaffolding)? Whatever is decided, are the directions clear?
   c. Are the students free to choose what strategies they will use to solve the problem, are they to choose among ones you describe, or do you expect them to invent one that is consistent with limits you set (alternate strategies)? Do the tasks encourage the learners to draw on a variety of skills, knowledge, and processes? Is your description of the scope of the project clear?
   d. Are alternate solutions possible, or is only one acceptable (alternate solutions)?
4. Decide whether the students are to work alone or in groups and, if the latter, whether their ability to work in the group will be part of the evaluation. If so, develop guidelines and conditions for collaborative work.
5. Decide whether the students can solicit feedback or assistance from others while preparing their "project." How much, and when?
6. Develop the scoring criteria. Will they be based on the product produced, the process used to create the product, or both? Are they specific enough to give guidance to the students without being so rigid that a multitude of product modes are ruled out? Avoid unclear language in the criteria. If the scoring criteria are too complex for the students to understand, translate them into a checklist so that they know what is expected. (A bare-bones checklist sketch appears at the end of this section.) Sample: Parallel park without hitting the curb. NOT: Parallel park precisely.
7. Assemble the necessary materials or equipment for the learners to use.
8. Evaluate what types of assistance will be needed during the project and make appropriate arrangements.

Plan to attend the ICC workshop on designing rubrics, checklists, etc. for performance testing.
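Until that workshop, the checklist idea in step 6 can live in something as simple as the structure below: each criterion phrased in observable terms, as in the parallel-parking sample, and checked off per student during the observation. The task, criteria, and recorded observations are placeholders, not a prescribed instrument.

```python
# Bare-bones observational checklist for a performance task (placeholder task
# and criteria; each criterion is phrased as something an observer can see).
checklist = [
    "Positions the car parallel to the curb before backing in",
    "Completes the park without touching the curb",
    "Finishes within the allotted time",
    "Leaves the car within a legal distance of the curb",
]

def record_observation(student, observed):
    """observed: list of True/False judgments, one per criterion, in order."""
    met = sum(observed)
    print(f"{student}: {met}/{len(checklist)} criteria observed")
    for criterion, ok in zip(checklist, observed):
        print(f"  [{'x' if ok else ' '}] {criterion}")

record_observation("Student A", [True, True, False, True])
```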
References for Item Writing
Baird, H. (1997). "Evaluating higher-order thinking." Performance assessment for science teachers. Accessed: http://www.usoe.k12.ut.us/curr/science/Perform/Past4.htm
Baird, H. (1997). "Performance assessment for science teachers." Performance assessment for science teachers. Accessed: http://www.usoe.k12.ut.us/curr/science/Perform/Past5.htm#Performance
Ebel, R.L., & Frisbie, D.A. (1991). Essentials of educational measurement (5th ed.). Englewood Cliffs, NJ: Prentice-Hall. (Lots of examples.)
Gronlund, N.E. (1991). How to write and use instructional objectives. New York: Macmillan Publishing. (Lots of verbs tied to types of learning in three domains, including Bloom's taxonomy in the cognitive domain.)
Haladyna, T.M. (1997). Writing test items to evaluate higher order thinking. Boston: Allyn and Bacon. (Lots of examples.)