Chapter 13
Issues in Testing Comprehension and in Evaluating Writing

In this chapter we explore:
  Issues related to testing listening, namely, tasks and language of assessment
  Issues related to testing reading
  Issues in evaluating written work

One issue for all tests: purpose
Bachman (1990) reminds us that not all tests are created for the same purpose. Within an educational setting, tests serve a variety of purposes. A classroom test can indicate progress and achievement. Tests can also be diagnostic, indicating strengths and weaknesses. Entrance tests discriminate among applicants. Placement tests direct learners to particular courses.

Context dependent
The shape of a test is always context dependent, and purpose is one of the major determinants of the context. What a test looks like, then, is a direct function of what that test is supposed to do and for whom it is supposed to do it.

Testing listening comprehension
In some formal testing situations, aural testing of vocabulary or grammar has been equated with listening comprehension. Just a few years ago it was not uncommon to find the following as a standard listening section on a foreign language exam in the United States.

Section 1. Oral questions
Listen to each question carefully and then answer in a complete sentence.
  Did you call your mother last night?
  Did you eat eggs for breakfast this morning?
  What time did you get up today?
  Where did you go last night?
  Did you arrive at class on time today?

Analysis of section 1
Exam sections such as this one (used to test past tense) cannot be classified as “listening comprehension” as that concept was developed in Chapter 10. The section itself bears little resemblance to the kind of listening that happens in real life. And because the section asks for written responses in complete sentences, the nature of performance in listening is severely compromised.

Three factors
A good listening test considers at least the following three factors:
  Content (topic domain)
  Task (how the learner is asked to demonstrate comprehension)
  Language of assessment

Content
For content, we can develop a listening test that is topic specific (listening to a description of someone’s family) or not (listening to a news report). In the typical second language classroom, it is clear that professional concerns can rarely be considered for the purposes of testing. An instructor may have a wide array of future professionals engaged in language learning, and testing cannot be tailored to each individual in any practical manner.

Section 2. Families and relationships
You will hear two people talk about their families. Select one of the two speakers, and after listening, draw his or her family tree using all the information you can. The connection of lines should demonstrate family relationships, and each face should have a name under it.

Tasks
The task that a learner is asked to perform to demonstrate comprehension can fall into one of two categories: tasks that require a linguistic response and those that require a nonlinguistic response. Samples are listed on the next slide.

Assessing listening comprehension
  Linguistic                              Nonlinguistic
  Creating an outline                     Making a graph
  Filling in a chart                      Creating a drawing
  Labeling things in a visual display     Selecting a visual
  Making a table                          Indicating something on a visual

Linguistic vs. nonlinguistic
A linguistic response is any kind of response that requires the use of language on the part of the learner to demonstrate comprehension. In that case, comprehension can be assessed only on the basis of the language that the learner produces. Nonlinguistic responses are those that do not require the production of language for comprehension to be assessed: the learner indicates comprehension visually, not verbally.

Sample linguistic tasks
  1. The learner answers a number of questions about the various family members.
  2. The learner is asked to write a brief paragraph in which he or she describes the family.
  3. The learner is given a list of names and is asked to write next to each the relationship of that person to the speaker.

Sample nonlinguistic tasks
  1. The learner receives names and faces as cut-outs and must place them into the family tree; the lines that represent the tree are already drawn in.
  2. The learner receives the family tree with missing members and must add faces for those who are missing.
  3. The learner receives four different family trees with only slight variations among them and must select the family tree that best represents the description he or she heard.

Selection of task
Selection of task depends mostly on the level of the learners and the point at which the quiz or test is administered. Can the learners handle a summary? Would a word-level linguistic task be the best demonstration of their comprehension? Those are the types of questions that an instructor must ask in developing listening practices and quizzes.

Language of assessment
Isn’t it obvious that learners are listening to something in the second language? The issue of language of assessment is not about the stimulus. It refers to the language used by learners when the task requires a linguistic response.

Research
Research in second language reading (Lee, 1986a; Wolf, 1993b) demonstrates that language of assessment is a significant variable when testing reading comprehension. In Lee’s and Wolf’s studies, comprehension scores were significantly higher for those subjects who were allowed to respond in English (their first language) compared with those who took the test in Spanish (their language of study).

What does this mean?
If the actual test (including instructions and test items) is presented in the second language and if learners also have to perform in the second language (write answers, summarize, and so forth), then the test results are confounded by performance variables.
An instructor needs to be aware of this problem and make judicious decisions based on the purpose of the test and comparison to real-life listening situations.

Testing reading comprehension
As with the testing of listening comprehension, several factors should be taken into consideration when reading comprehension is being tested:
  Type of task
  The language of assessment
  Construction of individual test items

Task type and language of assessment
Wolf (1993a) reviews and interprets selected literature on testing second language reading comprehension. Wolf’s discussion focuses on the effects on learners’ responses of task type and the language of assessment. Research directly comparing task types clearly demonstrates that the task influences the outcome (Shohamy, 1984; Lee, 1987b; Wolf, 1993b): some tasks allow learners to demonstrate their comprehension better than other tasks do.

More research
Research on the language of assessment examines the use of reading test items written in the test takers’ native or target language. The results consistently show that language learners perform better on items and tasks written in their native language (Hock & Poh, 1979; Shohamy, 1984; Lee, 1986a, 1987b; Wolf, 1993b).

Item construction
If a test item can be answered correctly without the test taker reading the passage, then the item is not passage dependent and, thus, not a good test item (Johns, 1978; Perkins & Jones, 1985). If test items encourage test takers to read only sections of a passage or to do only a surface reading of the passage, then the items are not good ones (Cohen, 1984).

Wolf (1993a) recommends…
  (1) that all items be passage dependent;
  (2) that items test information from different levels of the passage, that is, main ideas as well as details;
  (3) that all distractors be plausible;
  (4) that items paraphrase information in the passage so that learners cannot match words and phrases from the item to the passage; and
  (5) that test takers not be allowed to refer to the passage while performing the comprehension tasks, thereby discouraging surface reading of the passage (p. 327).

From classroom activities to reading tests

Processes and products
Comprehension can be defined as the process of relating new or incoming information to information already stored in memory. All attempts to test and evaluate comprehension are problematic because the process is internal to the reader (it happens in the mind), but tests require external manifestation of mental processes.

Not the how but the what!
Testing assesses the accuracy of the result of relating incoming information to information already stored in memory. Although comprehension is a process, the process yields a product. This view holds that what is important in testing is not how a reader comprehends but what is comprehended.

Two perspectives
Lee and VanPatten have advocated that tests reflect classroom activities. They now examine testing reading comprehension from two perspectives, both consistent with this position. The first focuses on content, a product-oriented approach. The second focuses on applying skills learned to a new reading situation, a process-oriented approach.

Focus on content
Reading tests should be constructed to encourage learners to read more. The more language learners read, the better readers they become, and the more language they acquire. When writing a test that focuses on content, you will want to focus on the guided interaction phase, the assimilation phase, and the communicative functions of texts.
Activities illustrated these three aspects of reading in Chapter 11. The slides are reproduced here.

Activity: Guided interaction
Step 1: Since this is a relatively long reading, it would be best to read it section by section. After reading each section fairly quickly, pause to collect your thoughts by writing a sentence that captures the main idea of the section.
Step 2: Go back and reread each section, paying more attention to the details. Using a highlighter, identify key words or phrases that will help you remember what you have read. At the end of each section, look at what you have highlighted. Does it spark your memory?
Step 3: Based on what you have read, check off the statements that are true.
  [ ] From the tone of the article, it is evident that the author is pro-elephant.
  [ ] Even though elephants are normally quite peaceful, they are capable of tremendous violence.
Step 4: Complete the following statement. A herd of elephants is composed of:
  Males and females in more or less equal proportions
  More males than females
  One male and various females, like a harem
Step 5: Working with two classmates, make a list of all the behaviors described in the article. Then share your list with the rest of the class, adding to your list whatever behaviors you might have missed.
Step 6: According to the introductory paragraphs, elephants are intelligent, difficult, active, powerful, and fun-loving animals. As a class, identify the information in the article that supports the idea that elephants really are as they are described.

Section A (Based on activity, steps 3 & 4)
Based on your reading of “The Secret Code of Elephants,” comment on three of the following ideas. Be sure to cite specific information from the passage that supports your statements.
  Tone of the article
  Organization of the herd (leadership)
  Care of the young
  Violence among elephants
  Allegiance to the herd as the young grow older

Section B (Based on activity, steps 5 & 6)
  1. We often hear that animal behavior is instinctive, that animals survive in the wild because they have the instincts to survive. How true is this statement for elephants? Refer to specific information from the article when answering.
  2. According to the authors, elephants are intelligent, difficult, active, powerful, and fun-loving animals. Do you agree or disagree with the authors? Be sure to cite specific information from the article to support your opinion.

Analysis of A and B
Test sections A and B parallel the in-class guided interaction activities. The test requires learners to produce evidence of their comprehension of the passage. In each case, learners must cite specifics from the passage to support their views.

The next part
Recall that in an activity in Chapter 11, learners are asked to personalize the content of the reading about the secret code of elephants, relating it to the world as they know it. Test Section E builds from this activity and demonstrates that both comprehension of the passage and class participation are important.

Activity: Communicative function of a text
Step 1: Working with two classmates, put the number that corresponds to your own opinions next to each of the following sentences.
We believe for the majority of people our age,
  1 = it is important…
  2 = it will be important some day…
  3 = it is not very important…
  ---To have a leadership role in whatever group one is associated with.
  ---To live in a safe and protected area.
  ---To lead an active social life.
  ---To count on child care while at work.
  ---To have various opportunities to find companionship.
  ---To make friends.
  ---To advance professionally.
  ---To have economic security in old age.
Step 2: Compare your answers with those of the rest of the class by indicating how many people responded to each item with a 1, 2, or 3.
Step 3: Which items were most important to the majority of the class? Which were not important? Does the class agree on what to look for in life?
Step 4: Go back over the sentences, but this time indicate with the letter E those statements that can apply to elephants. Then explain what information from the article supports your choices. In what ways are humans and elephants similar?

Section E (Based on previous activity)
  1. Indicate which of the following items were important to the class.
       1. To have a leadership role in whatever group one is associated with.
       2. ….
       3. To have economic security in old age.
  2. The class discussed ways in which elephant and human behaviors are similar. First, summarize both sides of the discussion. Then, state which side you agree with, using specific passage information to support your point of view.

Focus on skills application
The alternative to testing content is to test the application of reading skills to a new reading. The teaching-testing philosophy behind this practice is that the assigned class readings are themselves not important. The act of reading and the accumulation of reading skills should instead be the focus.

Instructions
To focus on the application of reading skills, construct a series of test sections whose structure mirrors that of class activities: preparation, guided interaction, assimilation, and the communicative functions of texts. Section F is an example of how to adapt the preparation-oriented in-class formats from the Chapter 11 activities for a test.

Activity: Brainstorming with the whole class
Step 1: As a class, generate a list of all the ideas you associate with weddings. Come up with as many different ideas as possible in five minutes.
Step 2: As rapidly as possible, skim the text to determine whether or not the ideas on the board are actually in the reading. All you have to do is say whether or not the information is there; you do not have to know what the author says about that information. You have five minutes.
Step 3: Share what you found with the rest of the class. As you do, erase from the board all those ideas that are not in the text.

Activity: Scanning
Step 1: Find the following three words in the text and underline the sentences in which you find them.
  feudalism
  stewardship
  tithes
Step 2: Working with two or three classmates, either write a definition of each word or list as many things as you can think of that you associate with each.

Section F (Based on previous activities)
  1. Find the following three words in the text and underline the sentences in which you find them.
       feudalism
       stewardship
       tithes
  2. Now skim the passage to determine whether or not the following topics are covered in the reading.
       1. Inheritance laws for titles and property
       2. Women’s rights
       3. The effects of war on the economy

Some issues in evaluating writing
In Chapter 12, Lee and VanPatten distinguished between transcription-oriented practices and composition activities in teaching writing in a second language. The evaluation of transcription-oriented practices is a fairly simple, straightforward issue: You grade according to the intent of the practice.

Composition activities
Composition activities, however, engage qualitatively different thinking processes and yield a qualitatively different product than do the transcription-oriented activities. Lee and VanPatten focus their discussion here on issues concerning the evaluation of compositions.

Responding to form
Responding to form, otherwise known as “error correction,” is perhaps the most debated issue in language instruction.
The underlying question is whether corrective feedback is effective: In the case of composition, does corrective feedback improve learners’ writing? The answer is yes and no. Some research supports the idea that responding to form brings about changes in learners’ writing (Lalande, 1982).

Supporting the idea
Lalande (1982) compared two methods of treating errors in the writing of second-year university learners of German. In the first method, instructors corrected errors and learners rewrote their compositions incorporating the corrections. In the second method, instructors coded the errors. Learners then had to rewrite their compositions addressing these errors. Lalande found that learners in the second method improved their linguistic accuracy in writing more, although only to a small extent.

Negating the idea
Semke (1984) compared several methods of providing feedback to first-year university learners of German:
  Commenting on the content
  Correcting errors
  Commenting on the content and correcting errors
  Coding errors for learners to then self-correct
At the end of the quarter, learners who received comments on content only were superior to all other groups. Not only did they write more (produced longer works), they also wrote more accurately (with fewer grammatical errors) than did the other groups.

A middle ground
Robb, Ross, and Shortreed (1986) tracked learners over a year-long period and used multiple methods of feedback:
  Correcting errors
  Coding errors
  Highlighting errors but not correcting or coding them
  Indicating in the margin the number of errors made
They found that writing improved less as a result of feedback on errors than as a result of having additional opportunities to write. Labor-intensive methods of providing feedback, such as correcting and coding errors, did not produce results commensurate with the instructor’s investment of time.
Moreover, when instructors respond to form, so do learners. That is, since instructors were indicating surface errors, rather than errors in meaning, learners responded by focusing their attention on changing the surface features, not their meanings.

Responding to content
Feedback on compositions should include responding to the type of content (the intended meanings), whether or not one responds to form. The type of instructor response should encourage writers to express themselves better. The instructor, acting on behalf of the intended audience, will in effect negotiate written meaning with the writer.

Zamel’s research
Zamel (1985) examined the comments, reactions, and markings that appeared on compositions assigned and evaluated by fifteen instructors teaching their own university-level ESL classes. She found that, by and large, instructors:
  Make vague comments about abstract rules and principles that learners are unable to interpret.
  Correct on a clause-by-clause basis without considering the text as a whole.
  Respond to some problems but not others so that their reactions appear arbitrary and idiosyncratic.
  Tend to give conflicting signals about what to improve when providing overall comments and suggestions.
  Tend not to review their feedback when reviewing a revised composition and so accept revisions that address surface-level language errors.
Overall, Zamel found that the instructors were poor communicators who faulted their students for being imprecise and vague but were themselves no better at communicating their responses.

Responding to drafts
As Zamel found, even instructors who responded to content accepted revisions of the work with only changes in surface errors. This practice is questionable on two levels. If the instructor accepts rewrites that only address grammatical errors, then learners will most likely interpret the intent of the writing to be correct form production.
On the other hand, learners may not know how to address content-related issues in their rewrites. Their practice of correcting only the grammatical errors is a call to the instructor to teach them how to address other issues.

Holistic versus analytical scoring
Whether you use holistic or analytical scoring procedures, you are applying criteria in order to evaluate a composition. Holistic scoring results in an overall assessment of the work, reflected in a single score, rating, or grade based on descriptions of performance. Analytical scoring is analogous to componential scoring discussed in Chapter 5. Each component of the composition is evaluated; the component scores are typically added together to yield a final evaluation.

Lee and Paulson’s criteria
Lee and Paulson (1992) developed the analytical scoring criteria listed on the following slides. As you read them, note that the categories are not weighted equally. The weightings should reflect the importance of the category. One way to determine importance is to consider how it was treated during instruction.
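
The additive procedure behind analytical scoring can be illustrated with a short sketch. It is only an illustration and not part of Lee and Paulson's materials: the point values are the ones given in the Lee and Paulson (1992) criteria reproduced in this chapter, while the names RUBRIC and score_composition and the numbering of performance levels from 1 (weakest) to 4 (strongest) are assumptions made for the example.

```python
# Point values taken from the Lee and Paulson (1992) criteria in this chapter.
# Each category lists the points for four performance levels, weakest first.
# The level numbering (1-4) is an assumption made for this sketch.
RUBRIC = {
    "content": (19, 22, 25, 30),
    "organization": (16, 18, 22, 25),
    "vocabulary": (16, 18, 22, 25),
    "language": (13, 15, 17, 20),
}


def score_composition(levels):
    """Return (points per category, total) for a dict of category -> level (1-4)."""
    missing = set(RUBRIC) - set(levels)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    points = {}
    for category, level in levels.items():
        if category not in RUBRIC:
            raise ValueError(f"unknown category: {category}")
        if level not in (1, 2, 3, 4):
            raise ValueError(f"level for {category} must be 1-4, got {level}")
        # Look up the points that the rubric assigns to this level.
        points[category] = RUBRIC[category][level - 1]
    # Analytical scoring: the component scores are added together.
    return points, sum(points.values())


# Hypothetical composition: adequate content, strong organization,
# weaker vocabulary, mostly accurate language.
points, total = score_composition(
    {"content": 3, "organization": 4, "vocabulary": 2, "language": 3}
)
print(points)
print(f"{total}/100")  # 25 + 25 + 18 + 17 = 85
```

Because content carries up to 30 points and language only 20, the unequal weighting is built into the table itself; changing the emphasis of instruction would mean changing the point values, not the adding procedure.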

Evaluation criteria for compositions: Content
  Points  Description
  19      Minimal information; information lacks substance; inappropriate or irrelevant information; or not enough information to evaluate
  22      Limited information; ideas present but not developed; lack of supporting detail
  25      Adequate information; some development of ideas; some ideas lack supporting detail
  30      Very complete information; no more can be said; thorough; relevant; on target

Evaluation criteria for compositions: Organization
  Points  Description
  16      Series of separate sentences with no transitions; no connected discourse; no apparent order to the content
  18      Limited order to the content; lacks logical sequencing of ideas; ineffective ordering
  22      An apparent order to the content is intended; somewhat choppy; loosely organized but main points do stand out
  25      Logically and effectively ordered; main points and details are connected; fluent

Evaluation criteria for compositions: Vocabulary
  Points  Description
  16      Inadequate; repetitive; incorrect use or nonuse of words studied; literal translations; abundance of invented words
  18      Erroneous word use or choice leads to confused meaning; some literal translations and invented words
  22      Adequate but not impressive; some erroneous word usage; some use of words studied
  25      Broad; impressive; precise and effective

Evaluation criteria for compositions: Language
  Points  Description
  13      One or more errors in use and form of the grammar presented in lesson; frequent errors in subject/verb agreement; non-Spanish sentence structure; erroneous use of language makes the work mostly incomprehensible
  15      No errors in the grammar presented in lesson; some errors in subject/verb agreement; some errors in adjective/noun agreement; work was poorly edited for language
  17      No errors in the grammar presented in lesson; occasional errors in subject/verb agreement; some editing evident for language but not complete
  20      No errors in the grammar presented in lesson; very few errors in subject/verb or adjective/noun agreement; work was well edited for language

Total points ___/100
Source: Lee and Paulson (1992), p. 33

Criteria
Whether you select holistic or analytical scoring criteria, you must ensure that:
  Writers are both aware of and knowledgeable about the criteria.
  The criteria are applied consistently to all writers.
When learners know how they will be evaluated, they can write with the criteria in mind.

Intra-rater reliability
Consistent application of criteria is a fundamental consideration in all testing situations. An issue that arises in composition grading is that of intra-rater reliability, that is, whether the same rater applies the criteria consistently across all the compositions he or she evaluates.

Summary of chapter 13
  Discussed a number of issues in the testing of listening and reading and the evaluation of writing
  Adapted classroom activities as test sections, underscoring the position that instructors should test what and how they teach
  Presented two approaches to testing reading:
    One that focused on content
    Another that focused on the application of skills
  Presented several issues in evaluating writing