Conceptual Issues and the Development and Use of Selected Response Assessments

The following is an excerpt from a paper delivered at the Annual Meeting of the National Council on Measurement in Education (1998), authored by Jeffrey Oescher, Juanita Haydel, and Rick Stiggins. This section describes the development of objective tests in the context of the three conceptual issues described earlier in the paper. Some of this section deals with relatively new ideas, such as changes in terminology, the match between the achievement target and the assessment strategy, and student-centered classroom assessment. Most of the information, however, is a reorganization of the basic ideas surrounding objective test development traditionally discussed in measurement and assessment texts.

Selected Response Assessments

A number of years ago I wrote an item-writing guide intended to help teachers write objective test items. Stiggins (1997) has taken issue with the use of this term and offers an alternative: selected response assessments. The first part of the term is self-explanatory. Selected response refers to item formats that ask students to choose a single best answer or a very limited set of acceptable answers. The evaluation of the response is right or wrong, correct or incorrect, acceptable or unacceptable, etc. Four formats fall into this category: multiple choice, true/false, matching, and fill-in-the-blank. The second part of the term, assessment, refers to the use of these item formats across a multitude of situations other than the traditional test. Classroom teachers use selected response items for homework assignments, classroom exercises, quizzes, and other formative and summative assessment activities. In addition, teachers commonly combine selected response formats with essay and extended response formats.

Matching Selected Response Assessments to Achievement Targets

The idea of matching the type of assessment to the desired achievement target is an explicit example of the conceptual development in the field of classroom assessment. Few of us now remember what a significant step toward ensuring quality classroom assessments this idea represented. There are two purposes for which selected response items are appropriate. The first, as you might assume, is assessing knowledge or mastery of specific content. This encompasses Bloom's taxonomic levels of recall and comprehension as well as Stiggins' level of knowing. While it is possible to address higher-level cognitive targets with this format, it is difficult; other assessment methods, such as essays and performance assessments, often provide a better fit for such targets. The second purpose for which this format is appropriate is assessing affective characteristics such as attitudes, self-efficacy, self-esteem, preferences, and values. The typical response formats, such as Likert, Thurstone, or semantic differential scales, are non-dichotomous in nature. That is, the respondent chooses a response from among several options such as strongly disagree, disagree, neutral, agree, or strongly agree. There is no correct or incorrect answer, only the response that best describes the respondent's perceptions or feelings.

Developing Selected Response Assessments

Stiggins (1997) suggests several contextual factors that should be considered once a match between the achievement target and the assessment strategy has been established. They are listed below.
- Students have an appropriate reading level to understand the exercises.
- Students have time to respond to all items.
- High-quality assessments have been developed and are available from the text, the teacher's manual, other teachers, etc.
- Computer support is available to assist with item development, storage and retrieval, printing, scoring, and analysis.
- The efficiency of the selected response format is important.

If, after considering such factors, a teacher decides to develop or use a selected response assessment, the stages in its development or selection are straightforward. They have been discussed extensively in the classroom measurement and assessment literature, so I will mention them only briefly.

Developing a Test Plan. A test plan systematically identifies an appropriate sample of achievement. This is accomplished through either a table of specifications or a list of instructional objectives. The important issue to emphasize is not the format of the plan, but its existence. Planning is critically important to the development of a test: effective planning saves time and energy, and it contributes to a clear understanding of the intended achievement targets.

A table of specifications relates the content of the test to the level of knowledge at which that content is to be tested. Table 1 is an example of such a table from an introductory graduate research course.

Table 1
Table of Specifications for Unit 1

                                                   Level of Knowledge
Content                                Recall  Comp  Apply  Analyze  Synthesize  Eval  Total
Sources of knowledge                      2      1     0       1         0         1      5
Scientific evidence-based knowledge       3      2     0       1         0         0      6
Research designs                         10      6     2       2         0         0     20
Types of research                         1      2     1       0         0         0      4
Research reports                          1      1     0       0         0         0      2
Total                                    17     12     3       4         0         1     37

The first column lists the content covered prior to the first examination; in this case it is roughly the material covered over a four-week period. The first row lists the levels of knowledge at which the content is to be mastered. While the use of any taxonomy of knowledge will facilitate test development, I have found the taxonomy in Stiggins (2008) quite helpful in differentiating items at the various levels. The numbers within the cells of the matrix represent the number of items written for a specific content area and level of knowledge. The differences in the number of items usually reflect differences in the amount of instructional time spent on each topic, the amount of material in each category, or the relative importance of the material in each category. In Table 1, five (5) items address Sources of Knowledge. Two (2) of these are written at the recall level, one (1) at the comprehension level, one (1) at the analysis level, and one (1) at the evaluation level. Overall, thirty-seven (37) items will be on the test. Obviously, there is greater emphasis on Research Designs (20 items) than on Sources of Knowledge (5 items), Scientific Evidence-Based Knowledge (6 items), Types of Research (4 items), or Research Reports (2 items). Likewise, more emphasis is placed on recall and comprehension items (17 and 12) than on those at the other levels.
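For teachers who keep their test plans electronically, the arithmetic above is easy to automate. The following is a minimal Python sketch, assuming the plan is stored as a simple dictionary; the cell counts transcribe Table 1, and the structure and names are illustrative choices rather than part of any published tool.

    # A table of specifications as a data structure: one row of item
    # counts per content area, one column per level of knowledge.
    LEVELS = ["recall", "comprehension", "application",
              "analysis", "synthesis", "evaluation"]

    spec = {
        "Sources of knowledge":                [2, 1, 0, 1, 0, 1],
        "Scientific evidence-based knowledge": [3, 2, 0, 1, 0, 0],
        "Research designs":                    [10, 6, 2, 2, 0, 0],
        "Types of research":                   [1, 2, 1, 0, 0, 0],
        "Research reports":                    [1, 1, 0, 0, 0, 0],
    }

    def row_totals(spec):
        """Items per content area (the right-hand margin of the table)."""
        return {area: sum(counts) for area, counts in spec.items()}

    def column_totals(spec):
        """Items per level of knowledge (the bottom margin of the table)."""
        return dict(zip(LEVELS, (sum(col) for col in zip(*spec.values()))))

    print(row_totals(spec))     # e.g. Research designs -> 20
    print(column_totals(spec))  # e.g. recall -> 17, comprehension -> 12
    assert sum(row_totals(spec).values()) == 37  # total test length

Recomputing the margins this way catches the most common planning error, a table whose cells no longer sum to the intended test length after revision.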
An alternative to the table of specifications is a list of instructional objectives, each of which acts very much like a cell of a table of specifications. Each objective specifies the knowledge to be brought to bear and the action to be taken. An example of an objective taken from the same introductory research course is as follows.

    Assess the appropriateness of relying on each source of knowledge for specific decisions.

In this objective, the student is asked to evaluate the appropriateness of relying on specific sources of knowledge in specific situations. Table 2 provides additional examples of the objectives for the same unit in the introductory research class discussed above. (The bracketed numbers within each objective refer to an outline discussing the content of that objective.)

Table 2
Learning Objectives for Unit 1

1. Identify four (4) sources used to make decisions in education [1.1] and the limitations associated with each source [1.2]. Assess the appropriateness of relying on each source for specific decisions [1.3].
2. Define the term research [2.1]. Defend research as a valuable source of information and knowledge in education [2.2].
3. Discuss the need for scientific evidence-based inquiry [3.1] in education. Identify six characteristics associated with scientific evidence-based inquiry [3.2].
4. Define theory [4.1] and explain its importance in scientific evidence-based inquiry [4.2].
5. Identify the six (6) steps typically used to conduct research [5.1]. Discuss the way in which each step contributes to the credibility of the results [5.2]. Explain how research results can become scientific evidence-based information [5.3].
6. Compare quantitative and qualitative approaches to research in terms of their goals [6.1]; the research designs used [6.2]; the samples from which information is collected [6.3]; the actual data, data collection techniques, and data analyses [6.4]; the researcher's role [6.5]; context [6.6]; and common terminology [6.7].
7. Distinguish between non-experimental and experimental quantitative research designs [7].
8. Describe the characteristics of the following quantitative research designs: descriptive [8.1], comparative [8.2], correlational [8.3], causal comparative [8.4], true-experimental [8.5], quasi-experimental [8.6], and single subject [8.7].
9. Identify the characteristics of case studies [9].
10. Describe the goals of the following qualitative research designs: phenomenology [10.1], ethnography [10.2], and grounded theory [10.3].
11. Identify the characteristics of analytical research [11.1]. Identify two types of analytical designs [11.2].
12. Identify the characteristics of mixed-methods designs [12].
13. Understand the differences between basic, applied, action, and evaluation research [13].
14. Identify the components of an educational research report [14.1]. Provide a brief description of the function of each component [14.2]. Identify these components in a research report [14.3].
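The parallel between objectives and cells can be made concrete. In this hypothetical Python sketch, each objective is tagged with a content area and a level of knowledge, and tallying the tags reproduces a table of specifications; the tags are my own illustrative readings of three objectives from Table 2, not codings supplied by the authors.

    from collections import Counter

    # (objective text, content area, level of knowledge)
    objectives = [
        ("Identify four sources used to make decisions in education "
         "and the limitations of each", "Sources of knowledge", "recall"),
        ("Assess the appropriateness of relying on each source for "
         "specific decisions", "Sources of knowledge", "evaluation"),
        ("Define the term research",
         "Scientific evidence-based knowledge", "recall"),
    ]

    # Each distinct (content, level) pair is one cell of the table
    # of specifications; the count is the number of objectives in it.
    cells = Counter((content, level) for _, content, level in objectives)
    for (content, level), n in cells.items():
        print(f"{content} / {level}: {n} objective(s)")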
Selecting Material to Test. The second task in the development of a test is to select the material to be tested. This entails sampling, from the host of items that might measure a particular cell of the table of specifications, the specific items to address that cell. For example, we might study five ways of knowing in an introductory educational research course (i.e., experience; custom, tradition, and authority; inductive reasoning; deductive reasoning; and scientific inquiry). The relevant cell of the table of specifications in Table 1 requires the construction of three (3) items. The question becomes, "Which of these knowledge sources are to be selected?" We would like to offer two factors to consider in answering it.

The first is recognition of the subjective nature of selecting a sample of items from a larger body of content and thinking called a domain. While most teachers do not have sophisticated test banks from which they can systematically sample items, they do have a very real sense of what is important in terms of content and cognition. We would like to recognize that expertise in the context of a clearly defined and understood achievement target. For example, in the table of specifications depicted in Table 1, the Sources of Knowledge questions at the analysis (1) and evaluation (1) levels are likely to involve scientific inquiry, as that is the basic source of knowledge being studied during the semester. The discussion of all other sources is relevant only in terms of how their limitations compare to those of scientific inquiry. While a decision to focus these questions on this specific knowledge is ultimately subjective, such subjectivity is not problematic given knowledge of the context within which this content is being studied.

The second method for sampling items involves identifying important learning objectives related to the content. In practice, we identify most, if not all, of the important learning objectives prior to teaching or assessing. Carefully thinking about the emphasis of each objective will determine the specific objectives for which we develop items.

Writing Items. There is a wealth of information available on writing selected response items (Stiggins, 2008; Nitko and Brookhart, 2007; Nitko, 2001; Gronlund, 1998; Gronlund and Brookhart, 2009; Stiggins, Arter, Chappuis, and Chappuis, 2007). Most of it takes the form of suggestions for each specific item format. I can offer the following quality checklist, which summarizes many of these suggestions; the sketch after the list shows how some of them can be checked mechanically.

General Guidelines for All Formats
- Items are clearly written and focused.
- A question is posed.
- The lowest possible reading level is used.
- Irrelevant clues are eliminated.
- Items have been reviewed by a colleague.
- The scoring key has been checked.

Guidelines for Multiple Choice Items
- The item stem poses a direct question.
- Any repetition is eliminated from the response options.
- Only one best or correct answer is provided.
- The response options are brief and parallel.
- The number of response options offered fits the item context.

Guidelines for True/False Items
- The statement is entirely true or entirely false.

Guidelines for Matching Exercises
- Clear directions on how to match are given.
- The list of items to be matched is brief.
- The lists consist of homogeneous entries.
- The response options are brief and parallel.
- Extra response options are offered.

Guidelines for Fill-In Items
- A direct question is posed.
- Only one blank is needed to respond.
- The length of the blank is not a clue to the answer.
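Some of these guidelines lend themselves to checking by rule. The following Python sketch is a minimal, hypothetical illustration for multiple-choice items: it flags a stem that does not pose a direct question, a key with more or fewer than one correct answer, and options of very unequal length. The Item structure and the thresholds are my own illustrative choices, not part of any published checklist tool.

    from dataclasses import dataclass

    @dataclass
    class Item:
        stem: str
        options: list
        keyed: list  # indices of options scored as correct

    def check_item(item):
        """Return a list of checklist violations found in the item."""
        problems = []
        if not item.stem.rstrip().endswith("?"):
            problems.append("stem does not pose a direct question")
        if len(item.keyed) != 1:
            problems.append("more or fewer than one keyed answer")
        longest = max(len(o) for o in item.options)
        shortest = min(len(o) for o in item.options)
        if longest > 2 * shortest:
            problems.append("options differ greatly in length (not parallel)")
        return problems

    item = Item(stem="Which source of knowledge underlies research",
                options=["Tradition", "Authority", "Scientific inquiry"],
                keyed=[2])
    print(check_item(item))  # ['stem does not pose a direct question']

Checks like these supplement, rather than replace, a colleague's review; guidelines such as "irrelevant clues are eliminated" still require human judgment.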
Selecting from Among the Four Selected Response Formats. The last aspect of test development we would like to address is the strengths of each of the four formats. This knowledge can be extremely helpful when deciding which format to use. The multiple-choice format has four distinct advantages. First, most multiple-choice assessments can be scored electronically using optical scanning and test scoring technologies, which is in itself an efficient and effective way to score tests. Second, most scoring programs provide item analysis data, which can offer insights into common misconceptions among students. A third advantage arises in situations where a teacher can identify the single correct or best answer along with a number of viable incorrect responses. A simple computational statistics problem, such as computing the standard deviation, exemplifies this advantage. We know the skills of the student who computes the answer correctly. We might also gain some insight into the skills of students choosing other responses if those responses systematically reflect aspects of the computation. For example, two alternatives might reflect the answers obtained by confusing the sum of squared scores, Σx², with the square of the summed scores, (Σx)², in the computational formula s = √[(Σx² − (Σx)²/n)/(n − 1)]. By examining a student's choice of one of these alternatives, we can understand what mistakes were made. Finally, multiple-choice tests have the advantage of being able to test a broad range of targets with a minimal amount of testing time.

Matching and true/false items also have distinct advantages. The major advantage of the matching exercise is its efficiency; one way to think of it is as a series of multiple-choice items presented at once. The true/false item has the advantage of requiring very short response times, which is extremely valuable when there is a great deal of information to cover and the teacher needs to ask many items. In addition, both formats share two of the advantages of multiple-choice items: they can be machine scored and analyzed, and they can provide insight into common misconceptions. While each of these three formats has unique advantages, they share a common fault: all are susceptible to guessing. Obviously the true/false item is the most problematic in this regard. When guessing is a problem, fill-in-the-blank items have a distinct advantage over the other formats.

Student Involvement in the Assessment Process

Earlier in this paper I described Airasian's assessment model as one in which planning, instruction, and assessment are related in a multi-directional, interactive manner (Airasian, 1994). That is, assessment feeds planning and instruction, and vice versa. I suggested this represents the nature of assessment in the context of current cognitive and instructional theory. Of particular interest in this regard is the belief that assessments can be woven into the learning environment to such an extent that they involve students as well as teachers. This involvement has been termed "student-centered assessment." Some examples of student-centered classroom assessment follow.

- Develop a set of objectives for a test before ever teaching the unit, being sure to identify not only the content but also the cognitive processes required to deal with that content (e.g., a table of test specifications for a final unit test before the unit is ever taught). Share the table with students. Explain your expectations. A clear vision of the valued outcomes will result, and instruction can be tailored to promote student success.
- Involve students in the process of devising a test plan, or involve them from time to time in checking back to the blueprint to see together, as partners, whether you need to make adjustments in the test plan and/or to chart your progress.
- Once you have the test plan completed, write a few items each day as the unit unfolds. Such items will necessarily reflect instructional priorities.
- Involve students in writing practice test items. Students will have to evaluate the importance of the various elements of content and become proficient in using the kinds of thinking and problem solving valued in the classroom.
- As a variation on that idea, provide unlabeled exercises and have students map them into the cells of the table of specifications.
- Have students use the test blueprint to predict how they will do on each part of the test before they take it. Then have them analyze how they did, part by part, after taking it. If the first test is for practice, such an analysis will provide valuable information to help them plan their preparation for the real thing.
- Have students work in teams, where each team has responsibility for finding ways to help everyone in class score high in one cell, row, or column of the table of specifications.
- Use lists of unit objectives and tables of test specifications to communicate with other teachers about instructional priorities, so as to arrive at a clearer understanding of how those priorities fit together across instructional levels or school buildings.
- Store test items by content and reasoning category for reuse, as sketched below. If the item record also includes information on how students did on each item, you can revise the item or the instruction when these analyses indicate trouble.
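As a closing illustration, here is a minimal Python sketch of such an item record, assuming items are stored by content area and reasoning category with a proportion-correct value recorded each time the item is used. The field names and the 0.30 difficulty floor are hypothetical choices, not a published standard.

    from dataclasses import dataclass, field

    @dataclass
    class ItemRecord:
        text: str
        content: str  # content area, e.g. "Research designs"
        level: str    # reasoning category, e.g. "analysis"
        p_values: list = field(default_factory=list)  # proportion correct, per use

        def flag_for_review(self, floor=0.30):
            """True if fewer than `floor` of students have ever
            answered the item correctly in any administration."""
            return any(p < floor for p in self.p_values)

    bank = [
        ItemRecord("Define the term research.",
                   "Scientific evidence-based knowledge", "recall",
                   p_values=[0.85, 0.88]),
        ItemRecord("Distinguish quasi- from true-experimental designs.",
                   "Research designs", "analysis",
                   p_values=[0.25, 0.31]),
    ]
    for item in bank:
        if item.flag_for_review():
            print("revise item or instruction:", item.text)

Keeping the content and reasoning tags on each record also makes it trivial to pull a fresh draft test that matches the cells of an existing table of specifications.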