Lecture 8: Subjective Items (Constructed Response) and Essays

Subjective Items

PROS:
- Require RECALL of information (less guessing)
- Help identify unanticipated misconceptions and problems
- May be more convenient for you to write
- Can measure more complex outcomes than most selected-response items
- Can measure outcomes not measured with paper and pencil
- Give students the freedom to express themselves and be creative

CONS:
- Items must be CLEARLY written and unambiguous, so your writing skills are crucial
- Subjectively scored, so less reliable and, therefore, less valid
- May require a rubric of some kind for scoring, which is more time-consuming
- Cover less material, because they take more time to answer
- Difficult to use with large groups
- Some students try to bluff on written items
- Student writing ability may improperly influence your content-based grades

While the constructed-response category also contains Performance Assessments, Portfolio Assessments, and Product Assessments, tonight's discussion focuses on short-answer items (including fill-in-the-blank and keyed response) and essay items.

**As always, the behavior required by the item or assessment must match the behavior described in the instructional objective being measured.
**You must also always remember to consider the previous experience and developmental status of your students.

I. Short-Answer Items
These items require the student to provide a written response. The response may be a word or two (as in fill-in-the-blank), a phrase, a sentence, or several sentences.

A. There Are Two Types of Short-Answer Items
1. Direct Questions
Example: What is the technical name for shortsightedness? ________________
or
In the space below, briefly define myopia.
2. Incomplete Statements
Example: The technical name for shortsightedness is __________________.

VIP things to remember about short-answer items:
1. A major strength of short-answer items is that they provide a vehicle for developing written expression, which will eventually lead to better essay writing.
2. However, remember that these items are not suitable for measuring complex learning outcomes; other types of constructed-response (subjective) items are more appropriate for that purpose. Short-answer items are suitable for assessing relatively lower-level learning outcomes, such as the acquisition of knowledge, basic comprehension, and simple applications of information.

B. Short-Answer Item Writing Guidelines
1. Remove only one or two key words from incomplete statements. **Avoid "Swiss Cheese" items.
Poor Example: __________ is __________ but not sufficient for __________.
Better Examples:
Reliability is necessary but not sufficient for __________.
In the space below, describe the relationship between reliability and validity.
2. Place blanks near the end of the statement for incomplete statements and at the right margin for direct questions. Putting the blank at the end makes the item clearer (less ambiguous). It also saves your time (and vision) during grading.
Poor Example: Warm-blooded animals, called _______________, are born alive and suckle their young.
Better Examples:
Warm-blooded animals that are born alive and suckle their young are called ___________.
What are warm-blooded animals that are born alive and suckle their young called? _______________
Or, use a more advanced version of this item:
In the space below, name three characteristics of mammals.
   1.
   2.
   3.
3. Write items that direct the student to the one correct and concise answer. If your item is poorly worded, vague, or ambiguous, students will project meaning into the question and possibly misinterpret it.
Poor Examples:
An animal that eats the flesh of other animals is ______________. (hungry?)
John Glenn first orbited the Earth in _______________. (a spaceship?)
Better Examples:
Name the term from the text that describes animals that eat the flesh of other animals. _______________
In your own words, in the space below, define the term 'carnivore'.
In what year did John Glenn first orbit the Earth? _______________
4. Keep blanks a uniform length so that the student is not given unintentional clues. Avoid using shorter blanks for short answers and longer blanks for longer answers.
5. Avoid giving grammatical clues to the correct answer. Watch your articles (a/an) so you don't give away whether the response begins with a consonant sound or a vowel sound.
6. If the answer is numerical, indicate the type of measurement units and the degree of accuracy desired.
Examples:
Jill ate one-half of the apple pie. Jack ate one-quarter of the peach pie. How much pie did they eat altogether? Give your answer in decimal notation.
or
Calculate the mean of the following scores. Round your final answer to tenths.
6 7 4 8 2
7. Use direct questions rather than incomplete statements for younger students and students with learning disabilities.
8. Never take statements directly from the textbook to use as fill-in-the-blank items. Statements taken out of context are more likely to be ambiguous.

Caveat for using any constructed-response item: The more complex the student's constructed response, the more attention you will have to give to scoring it. Otherwise, the reliability of the scores, and hence the validity of your decisions, will be seriously compromised.

C. Short-Answer Item Scoring Guidelines
1. Each response in a fill-in-the-blank item should be worth one or two points, awarded consistently across all items.
2. If you accept synonyms for a correct answer, be sure you accept them consistently across the class.
Example: Name the type of item that is scored objectively.
____________________ (I wanted "selected response," but I also accepted the particular types of selected-response items that students named, since those are also scored objectively.)
3. For keyed-response items, you may want to award credit for student-invented responses (not taken from the word bank) that are also correct answers.
4. The score for more extensive short-answer items should be based on how well the student's response meets the criteria specified in the item.
a. For keyed-response items, award credit for the correct answer and separate credit for correct spelling. (Be sure that students know you score this way by stating it in your directions!)
b. For lists, give credit for each component of the list. If it is an ordered list, you must take order into account when scoring; specify the criteria for doing so. For example:
Name the first three presidents of the United States, in correct order. Be sure to spell their names correctly!
Are the correct presidents named? 1 point for each one correctly named (3 max)
Are they in the correct order? Yes = 2, No = 0 (2 max)
Are they spelled correctly? 1 point for each one correctly spelled (3 max)
c. Short-answer items may be scored analytically, as described above, or holistically (all or nothing, or by category), but holistically scored items should never be worth more than 5 points. Do not use this method for scoring complex behaviors. Also, remember that holistic scoring is NOT THE SAME as objective scoring!!
d. Longer written responses (such as essay items) should always be scored analytically, according to content only, written form only, or both. (Remember our discussion about this when we learned about reliability?) You will need to create rubrics that specify what you are scoring (Content? Form? Both?) and describe how points will be awarded. (We will focus on rubrics when we get to Lecture 9.)
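Because the presidents rubric is fully analytic, its arithmetic can be written down exactly. The Python sketch below assumes the teacher has already judged, for each line of the student's list, which president was meant and whether the name was spelled correctly. The `Entry` class, the `score_presidents` function, and the rule that spelling credit goes only to correctly named presidents (each counted once) are illustrative assumptions, not part of the rubric as stated.

```python
from dataclasses import dataclass

# Answer key for the example item: the first three U.S. presidents, in order.
KEY = ("George Washington", "John Adams", "Thomas Jefferson")


@dataclass
class Entry:
    """One line of a student's list, as judged by the teacher (hypothetical).

    president: the president the teacher judges the student meant, so a
               misspelled "Jon Adams" is still recorded as "John Adams".
    spelled_correctly: the teacher's spelling judgment for that line.
    """
    president: str
    spelled_correctly: bool


def score_presidents(entries: list[Entry]) -> dict[str, int]:
    """Apply the three-part analytic rubric and return points per criterion."""
    listed = entries[:3]  # only the three requested lines are scored
    # 1 point for each correct president named (duplicates count once), 3 max.
    named = len({e.president for e in listed} & set(KEY))
    # 2 points only if the whole list matches the key order; otherwise 0.
    order = 2 if tuple(e.president for e in listed) == KEY else 0
    # 1 point per correctly spelled, correctly named president (assumption).
    spelling = len({e.president for e in listed
                    if e.president in KEY and e.spelled_correctly})
    return {"named": named, "order": order, "spelling": spelling,
            "total": named + order + spelling}


# A student who swapped Adams and Jefferson and misspelled "Jefferson".
answer = [
    Entry("George Washington", True),
    Entry("Thomas Jefferson", False),
    Entry("John Adams", True),
]
print(score_presidents(answer))
# {'named': 3, 'order': 0, 'spelling': 2, 'total': 5}
```

Keeping each criterion as a separate sub-score, rather than only a total, preserves the main advantage of analytic scoring: the student can see exactly where the points were lost.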
Essay Items
A limitation of objective items and short-answer items is that they do not test a student's ability to construct responses that reflect complex cognitive processes or behaviors. We need to know that students know how to approach a given problem, can plan and organize their ideas, and can present their responses. These complex goals can often be measured with essay questions, which may call on the student to "discuss, analyze, compare and contrast, synthesize, evaluate," and so on. Essay questions measure students' ability to write, synthesize, and create.

I. Writing Questions and Instructions
A. Examples of Complex Cognitive Behaviors
BEHAVIOR: TERMS THAT CALL OUT THE BEHAVIOR
Analyzing: Break down, diagram, differentiate, explain
Comparing: Compare, contrast, classify, distinguish between
Creating: Compose, devise, propose, design
Evaluating: Critique, choose and defend, evaluate, judge
Inferring: Extend, extrapolate, predict, conclude, project
Interpreting: Illustrate, translate, interpret, convert
Synthesizing: Combine, rearrange, infer, deduce

B. Two Types of Essay Questions
Restricted-Response Essay Questions: limit both the content and the form of the response (usually one paragraph).
Extended-Response Essay Questions: provide the examinee with more latitude to produce a longer response and to vary the context in which content is presented. Typically, the form of the written response is a component of the scoring criteria.

C. Essay Item Writing Guidelines
1. Restrict the use of essay items to those learning outcomes that cannot be satisfactorily measured by other (objective or short-answer) items.
2. Phrase each question (or set of directions) so that the pupil's task is clearly indicated. The more complex the task, the more guidance/direction is required.
Poor Example: What does Newton's third law have to say about the bounce of a rubber ball?
Better Example: Using Newton's third law, explain why a ball bounces higher when dropped from 10 feet than when dropped from 5 feet.
Progressively Improved Examples:
(1) Compare objective and essay tests.
(2) Compare and contrast objective and essay tests, citing the respective strengths and weaknesses of each.
(3) Compare and contrast objective and essay tests, citing the respective strengths and weaknesses of each. Make sure to include the following:
   a. Ease of item construction
   b. Sampling of subject matter
   c. Type of objectives measured
   d. Preparation by student
   e. Nature of student responses
   f. Guessing
   g. Time needed for testing
Another Improved Example:
Poor: What were the causes of the Civil War? (This could be a dissertation topic, and it does not even specify the country.)
Better: Discuss the role of agriculture in the North and South as a factor in the outbreak of the United States' Civil War.
More Poor Essay Items:
*Actual questions from unidentified campuses across the country.
*HISTORY: Describe the history of the papacy from its origins to the present day, concentrating especially, but not exclusively, on its social, political, economic, religious, and philosophical aspects and impact on Europe, Asia, America, and Africa. Be brief, concise, and specific.
*EDUCATION: Develop a foolproof and inexpensive system of education that will meet the needs of all segments of society. Convince both the faculty and the rioting students outside to accept it.
*EPISTEMOLOGY: Take a position for or against truth. Prove the validity of your position.
*GENERAL KNOWLEDGE: Describe in detail your general knowledge. Be objective and specific.
3. Indicate an appropriate time limit for each question.
   a. Specify a length for the desired response.
   b. Tell the point value of the item.
   c. Suggest how long it should take.
4. Don't give optional questions. Students answering different questions are taking different tests, which undermines validity.
5. Judge an item's quality by composing a model response.

II. Developing Scoring Procedures
Writing the essay item is relatively simple compared to scoring it.
A. Two Methods for Scoring Constructed Responses
1. Holistic Scoring: sort students' responses into categories by quality. The categories may be point-based or simply pass/fail (all or no points). This method is best used for pass/fail decisions and for items worth no more than 5 points, when you only want to know, in general, whether the student has achieved the objective. It does not focus on detailed or complex information. It is appropriate for short-answer items and as a component of more complex rubrics.
   a. Establish the scoring categories you will use (pass/fail, good/average/poor, all points or no points).
   b. Characterize a response that fits each category. What characteristics should a response in each category have?
   c. Read each response and form an overall general impression. Don't belabor the issue; look for the overall gist of the response.
   d. Sort the responses into the designated categories.
   e. Reread the papers and re-categorize as needed.
   f. Assign the same score to all papers within a category.
2. Analytical Scoring: systematic scoring that uses specific procedures, such as checklists, rating scales, or both, to assign partial credit more accurately and to indicate where students lost points. The scoring plan or procedure is called a rubric.
Advantages: more specific feedback to students; more reliable scores
Disadvantages: the scoring instrument is time-consuming to construct
   a. Checklists (a.k.a. Item-Based Rubrics): provide two categories for evaluation (present/absent, acceptable/unacceptable), like the one used for scoring your 10 objectives.
Example: Compare and contrast maple and pine trees. Describe the maple and the pine, and tell me what kinds of trees they are. Then tell me how they are alike and how they are different in terms of the shape of their leaves, when they have leaves, and what kinds of products we get from them.

Essay Checklist
I. Content (2 pts each, 16 pts possible)
   Kind of Tree: _____ Pine _____ Maple
   Shape of Leaves: _____ Pine _____ Maple
   Time for Leaves: _____ Pine _____ Maple
   Products: _____ Pine _____ Maple
Comments______________________________________________
II. Structure (1 pt each, 2 pts possible)
   _____ Topic Sentence (present or absent)
   _____ Conclusion (present or absent)
Comments______________________________________________
III. Mechanics (1 pt each, 2 pts possible)
   _____ Grammar (acceptable/unacceptable)
   _____ Spelling (minimal or no errors/many errors)
Comments______________________________________________
_____/20 Total Points

   b. Rating Scales (a.k.a. Descriptive Rubrics): an extension of the checklist that also allows a judgment of quality, not simply whether the criterion is present or absent. Include only as many categories as you can consistently distinguish between.

Essay Rating Scale
I. Content (0 = Absent, 1 = Poor, 2 = Average, 3 = Excellent)
Maple (4 pts possible)
   Description         0  1  2  3
   Kind of tree        0 (absent)  1 (present)
Pine (4 pts possible)
   Description         0  1  2  3
   Kind of tree        0 (absent)  1 (present)
Similarities and Differences (9 pts possible)
   Discuss shape       0  1  2  3
   Discuss timing      0  1  2  3
   Discuss products    0  1  2  3
Comments______________________________________________________________
II. Structure (4 pts possible; 0 = Absent, 1 = Poor, 2 = Good)
   Topic Sentence      0  1  2
   Conclusion          0  1  2
Comments______________________________________________________________
III. Mechanics (2 pts possible; 0 = Poor, 1 = Good)
   Grammar             0  1
   Spelling            0  1
Comments_______________________________________________________________
_____/23 Total Points

*Be sure to evaluate and pilot test your checklists and rating scales before using them to grade all the papers.

B. General Suggestions for Writing Rubrics and Scoring Essays
Protect the reliability of your scores!
1. Prepare an outline or an example of the expected answer in advance. You must have clearly defined scoring criteria.
2. Choose the scoring method that is most appropriate.
3. Decide how to handle factors that are irrelevant to the learning outcomes measured (spelling, grammar, etc.), that is, aspects of the performance that do not apply to your scoring plan.
4. Evaluate all of your students' responses to one question before going on to the next item. This helps you be more consistent (intra-rater reliability).
5. When possible, score student responses anonymously.
6. Do not look at the student's scores on previous items; this avoids bias (positive and negative).
7. If big decisions rest on the results, obtain two or more independent ratings. The raters must use the scoring plan in the same way (inter-rater reliability).
8. Give serious consideration to your point breakdown: is the focus on writing mechanics or on knowledge of content? Stay on target with validity!! What do you intend to measure?

D. Holistic or Analytical
E. Avoid the Horns and Halo Effects
All too often, teachers encourage students to write journals, letters, poems, and stories, and give almost exclusively positive feedback to encourage their writing. However, many of the same teachers rake students over the coals when grading written work on exams. Consider offering both positive and negative feedback on written assignments that students have had time to develop; and, while good test-writing skills are important, consider placing the emphasis on content for exams.

III. Additional Information on Scoring Subjective Items
A. Definition Items
Example: What is a norm-referenced test?
Sample Responses:
Jasmine: "A standardized test" (1)
Hyde: "A test where the scores are reported in standard scores such as percentiles, not percent of information learned" (2)
Homer: "A test that is designed to rank-order students" (2)
Fred: "A test administered under standard conditions" (0)
Rating Scale:
2 pts: indicates the idea of comparison or rank ordering
1 pt: student gives an example
0 pts: wrong answer or missing information
Remember: grading all responses to this item before moving on will help!

B. Lists
Preliminary decisions:
(1) Do we want them to know the entire list?
(2) Does the order matter?
Example: List, in order, the categories in Bloom's taxonomy. (8 pts)
Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation
Simple Rubric:
_____ 1 pt for every category correctly listed (6 points possible)
plus
_____ +2 pts if all are in the right order
or _____ +1 pt if two are out of order
or _____ 0 pts if three or more are out of order
(0, 1, or 2 order points possible)
_____/8 Total points

C. Single-Sentence Responses
Example: Why did Columbus sail west and not east?
Sample Responses:
Jasmine: "Columbus sailed west and not east because he knew the world was round."
Hyde: "Trying to get to China by sailing around Africa was expensive and dangerous; knowing the world was round, Columbus sailed west."
Homer: "west - world round east - too hard"
Scoring: Jasmine gives only one piece of the answer, so 1 point; Hyde gives both pieces, so 2 points; Homer gives both pieces, but not in a complete sentence (use your best judgment, and consider the age and developmental level of the student).

IV. Avoiding Common Errors in Test Development, Scoring, and Grading
A. Development Errors
1. Inappropriate difficulty level
2. Inadequate directions
B. Scoring Errors
1. Inconsistency when scoring
   a. Inter-rater reliability
   b. Intra-rater reliability
2. Bias
   a. Generosity error: This is described as being an "easy grader." This type of bias is applied to the whole class.
You give an overly favorable evaluation of student responses.
   b. Severity error: The opposite of generosity error; it also applies to the whole class. You give an overly critical evaluation of student responses.
   c. Central-tendency error: Also applies to the entire class. Possibly out of fear of being too easy or too hard, you score everyone as average.
   d. Halo effect: This applies to specific individuals. You like the student and let that influence your evaluation of his or her work.
   e. Horns effect: This also applies to specific individuals. You don't like the student and let that influence your evaluation of his or her work.

V. Helping Students Write Better Essay Tests
1. Emphasize vocabulary and logic unique to the discipline.
2. Tell them to read all questions before responding to any of them.
3. Require and reinforce legible penmanship.
4. Communicate the relevance of grammar, punctuation, and spelling.
5. Provide practice in essay writing before the test.
6. Promote study habits appropriate for essay testing.

VI. Comparison of Subjective and Objective Items
CHARACTERISTIC (Subjective items / Objective items)
Writing test items
   Subjective: Relatively easy to construct
   Objective: Relatively difficult to construct
Sampling of subject matter
   Subjective: Limited
   Objective: Extensive
Measurement of knowledge and complex achievement
   Subjective: Can measure both, but complex achievement is reserved for essay, product, and performance assessments
   Objective: Can measure either, depending on the item used
Preparation by student
   Subjective: Emphasis on larger units of material
   Objective: Emphasis is often on details
Nature of student response
   Subjective: Student organizes an original response
   Objective: Student selects a response
Guessing answers
   Subjective: Very difficult to guess
   Objective: Possible to guess
Grading
   Subjective: Difficult, time-consuming, and somewhat unreliable
   Objective: Simple, rapid, and highly reliable
Time needed for testing
   Subjective: Very time-consuming
   Objective: Very quick
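Because analytic rubrics are point-based, the advice in Section II to pilot test checklists and rating scales can include a quick arithmetic check: confirm that the criterion maximums add up to the stated total, and reject any rating that falls outside a criterion's scale. This is a minimal Python sketch; the dictionary keys and the `total_score` function are hypothetical names, and the point values are transcribed from the maple/pine essay rating scale.

```python
# Maximum points per criterion, transcribed from the maple/pine essay
# rating scale in Section II. The key names are illustrative labels.
RATING_SCALE_MAX = {
    "maple description": 3,
    "maple kind of tree": 1,
    "pine description": 3,
    "pine kind of tree": 1,
    "discuss shape": 3,
    "discuss timing": 3,
    "discuss products": 3,
    "topic sentence": 2,
    "conclusion": 2,
    "grammar": 1,
    "spelling": 1,
}


def total_score(ratings: dict[str, int]) -> int:
    """Return one student's total after validating every rating.

    Raises ValueError if a criterion is unscored, is not on the rubric,
    or has a rating outside its 0..max range.
    """
    missing = RATING_SCALE_MAX.keys() - ratings.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    for criterion, points in ratings.items():
        if criterion not in RATING_SCALE_MAX:
            raise ValueError(f"not on the rubric: {criterion!r}")
        top = RATING_SCALE_MAX[criterion]
        if not 0 <= points <= top:
            raise ValueError(f"{criterion!r}: {points} is outside 0..{top}")
    return sum(ratings.values())


# Pilot check: the criterion maximums must add up to the stated total.
assert sum(RATING_SCALE_MAX.values()) == 23

# Example: a paper that earns full credit except for one point each
# on the pine description and the conclusion.
ratings = dict(RATING_SCALE_MAX)
ratings["pine description"] = 2
ratings["conclusion"] = 1
print(total_score(ratings))  # 21
```

Running the same sum check whenever a checklist or rating scale is revised catches a stated total that no longer matches its criteria before any papers are graded.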