MCLTH Chapter 13

Chapter 13
Issues in Testing Comprehension and in Evaluating Writing
In this chapter
we explore:
 Issues related to testing listening,
namely, tasks and language of
assessment
 Issues related to testing reading
 Issues in evaluating written work
One issue for all
tests: purpose
 Bachman (1990) reminds us that not all
tests are created for the same purpose.
 Within an educational setting, tests serve a
variety of purposes.
 A classroom test can indicate progress and achievement.
 Tests can also be diagnostic, indicating strengths and weaknesses.
 Entrance tests discriminate among applicants.
 Placement tests direct learners to particular courses.
Context dependent
 The shape of a test is always context-dependent, and purpose is one of the major determinants of the context.
 What a test looks like, then, is a direct
function of what that test is supposed
to do and for whom it is supposed to
do it.
Testing listening
comprehension
 In some formal testing situations,
aural testing of vocabulary or
grammar has been equated with
listening comprehension.
 Just a few years ago it was not
uncommon to find the following as a
standard listening section on a
foreign language exam in the United
States.
Section 1. Oral
questions
 Listen to each question carefully and
then answer in a complete sentence.
1. Did you call your mother last night?
2. Did you eat eggs for breakfast this morning?
3. What time did you get up today?
4. Where did you go last night?
5. Did you arrive at class on time today?
Analysis of section 1
 Exam sections such as this one (used to test
past tense) cannot be classified as “listening
comprehension” as that concept was
developed in Chapter 10.
 The section itself bears little resemblance to
the kind of listening that happens in real life.
 And because the section asks for written
responses in complete sentences, the nature
of performance in listening is severely
compromised.
Three factors
 A good listening test considers at
least the following three factors:
 Content (topic domain)
 Task (how the learner is asked to demonstrate comprehension)
 Language of assessment
Content
 For content, we can develop a listening test
that is topic specific (listening to a
description of someone’s family) or not
(listening to a news report).
 In the typical second language classroom, it
is clear that professional concerns rarely can
be considered for the purposes of testing.
 An instructor may have a wide array of future
professionals engaged in language learning,
and testing cannot be tailored to each
individual in any practical manner.
Section 2. Families
and relationships
 You will hear two people talk about
their families. Select one of the two
speakers, and after listening, draw
his or her family tree using all the
information you can. The connection
of lines should demonstrate family
relationships, and each face should
have a name under it.
Tasks
 The task that a learner is asked to
perform to demonstrate
comprehension can fall into one of
two categories: tasks that require a
linguistic response and those that
require a nonlinguistic response.
 Samples are listed on the next slide.
Assessing listening
comprehension
Linguistic
 Creating an outline
 Filling in a chart
 Labeling things in a visual display
 Making a table
Nonlinguistic
 Making a graph
 Creating a drawing
 Selecting a visual
 Indicating something on a visual
Linguistic vs.
nonlinguistic
 A linguistic response is any kind of
response that requires the use of language
on the part of the learner to demonstrate
comprehension.

Comprehension can only be assessed based on
the language that the learner produces.
 Nonlinguistic responses are those that do not require the production of language for comprehension to be assessed: the learner indicates comprehension visually, not verbally.
Sample linguistic
tasks
1. The learner answers a number of
questions about the various family
members.
2. The learner is asked to write a brief
paragraph in which he describes the
family.
3. The learner is given a list of names and is
asked to write next to each the
relationship of that person to the speaker.
Sample nonlinguistic
tasks
1. The learner receives names and faces as
cut-outs and must place them into the
family tree; the lines that represent the
tree are already drawn in.
2. The learner receives the family tree with
missing members and must add faces for
those who are missing.
3. The learner receives four different family
trees with only slight variations among
them and must select the family tree that
best represents the description he heard.
Selection of task
 Selection of task depends mostly on the
level of the learners and the point at which
the quiz or test is administered.
 Can the learners handle a summary?
 Would a word-level linguistic task be the
best demonstration of their
comprehension?
 Those are the types of questions that an
instructor must ask in developing listening practices and quizzes.
Language of
assessment
 Isn’t it obvious that learners are
listening to something in the second
language?
 The issue of language of assessment
is not about the stimulus.
 It refers to the language used by
learners when the task requires a
linguistic response.
Research
 Research in second language reading (Lee, 1986a; Wolf, 1993b) demonstrates that language of assessment is a significant variable when testing reading comprehension.
 In Lee’s and Wolf’s studies, comprehension scores were significantly higher for those subjects who were allowed to respond in English (their first language) compared with those who took the test in Spanish (their language of study).
What does this
mean?
 If the actual test (including instructions
and test items) is presented in the second
language and if learners also have to
perform in the second language (write
answers, summarize, and so forth), then
the test results are confounded by
performance variables.
 An instructor needs to be aware of this
problem and make judicious decisions
based on the purpose of the test and
comparison to real-life listening situations.
Testing reading
comprehension
 As with the testing of listening
comprehension, several factors
should be taken into consideration
when reading comprehension is
being tested.
 Type of task
 The language of assessment
 Construction of individual test items
Task type and language of
assessment
 Wolf (1993a) reviews and interprets selected
literature on testing second language
reading comprehension.
 Wolf’s discussion focuses on the effects of task type and the language of assessment on learners’ responses.
 Research directly comparing task types
clearly demonstrates that the task
influences the outcome (Shohamy, 1984;
Lee, 1987b; Wolf, 1993b): some tasks allow learners to demonstrate their comprehension better than other tasks do.
More research
 Research on the language of assessment
examines the use of reading test items
written in the test takers’ native or target
language.
 The results consistently show that
language learners perform better on items
and tasks written in their native language
(Hock & Poh, 1979; Shohamy, 1984; Lee,
1986a, 1987b; Wolf, 1993b).
Item construction
 If a test item can be answered correctly
without the test taker reading the passage,
then the item is not passage-dependent
and, thus, not a good test item (Johns, 1978; Perkins & Jones, 1985).
 If test items encourage test takers to read
only sections of a passage or to do only a
surface reading of the passage, then the
items are not good ones (Cohen, 1984).
Wolf (1993a)
recommends…
(1) That all items be passage dependent; (2)
that items test information from different
levels of the passage, that is, main ideas as
well as details; (3) that all distractors be
plausible; (4) that items paraphrase
information in the passage so that learners
cannot match words and phrases from the
item to the passage; and (5) that test takers
not be allowed to refer to the passage while
performing the comprehension tasks,
thereby discouraging surface reading of the
passage (p. 327).
From classroom activities
to reading tests
Processes and Products
 Comprehension can be defined as the
process of relating new or incoming
information to information already stored in
memory.
 All attempts to test and evaluate
comprehension are problematic because
the process is internal to the reader (it
happens in the mind), but tests require
external manifestation of mental
processes.
Not the how but the
what!
 Testing assesses the accuracy of the
result of relating incoming information to
information already stored in memory.
 Although comprehension is a process, the
process yields a product.
 This view holds that what is important in
testing is not how a reader comprehends
but what is comprehended.
Two perspectives
 Lee and VanPatten have advocated that
tests reflect classroom activities.
 They now examine testing reading comprehension from two perspectives, both consistent with this position.
 The first focuses on content: a product-oriented approach.
 The second focuses on applying skills learned to a new reading situation: a process-oriented approach.
Focus on content
 Reading tests should be constructed to
encourage learners to read more.
 The more language learners read, the better
readers they become, and the more language
they acquire.
 When writing a test that focuses on content, you
will want to focus on the guided interaction
phase, the assimilation phase, and the
communicative functions of texts.
 Activities illustrating these three aspects of reading were presented in Chapter 11. The slides are reproduced here.
Activity: Guided
interaction
 Step 1: Since this is a relatively long reading, it
would be best to read it section by section.
After reading each section fairly quickly,
pause to collect your thoughts by writing a
sentence that captures the main idea of the
section.
 Step 2: Go back and reread each section,
paying more attention to the details. Using a
highlighter, identify key words or phrases that
will help you remember what you have read.
At the end of each section, look at what you
have highlighted. Does it spark your memory?
Activity: Guided
interaction continued…
 Step 3: Based on what you have read, check
off the statements that are true.
❑ From the tone of the article, it is evident that the author is pro-elephant.
❑ Even though elephants are normally quite peaceful, they are capable of tremendous violence.
 Step 4: Complete the following statement.
A herd of elephants is composed of:
❑ Males and females in more or less equal proportions
❑ More males than females
❑ One male and various females, like a harem
Activity: Guided
interaction continued…
 Step 5: Working with two classmates, make a
list of all the behaviors described in the article.
Then share your list with the rest of the class,
adding to your list whatever behaviors you
might have missed.
 Step 6: According to the introductory
paragraphs, elephants are intelligent, difficult,
active, powerful, and fun-loving animals. As a
class, identify the information in the article that
supports the idea that elephants really are as
they are described.
Section A (Based on
activity, steps 3 & 4)
 Based on your reading of “The Secret
Code of Elephants,” comment on three of
the following ideas. Be sure to cite specific
information from the passage that supports
your statements.
 Tone of the article
 Organization of the herd (leadership)
 Care of the young
 Violence among elephants
 Allegiance to the herd as the young grow older
Section B (Based on
activity, steps 5 & 6)
1. We often hear that animal behavior is
instinctive, that animals survive in the
wild because they have the instincts to
survive. How true is this statement for
elephants? Refer to specific information
from the article when answering.
2. According to the authors, elephants are
intelligent, difficult, active, powerful, and
fun-loving animals. Do you agree or
disagree with the authors? Be sure to
cite specific information from the article
to support your opinion.
Analysis of A and B
 Test sections A and B parallel the in-class guided interaction activities.
 The test requires learners to produce
evidence of their comprehension of
the passage.
 In each case, learners must cite
specifics from the passage to support
their views.
The next part
 Recall that in an activity in Chapter
11, learners are asked to personalize
the content of the reading about the
secret code of elephants, relating it
to the world as they know it.
 Test Section E builds from this
activity and demonstrates that both
comprehension of the passage and
class participation are important.
Activity: Communicative
function of a test
 Step 1: Working with two classmates,
put the number that corresponds to
your own opinions next to each of the
following sentences.
 We believe that for the majority of people our age,
1 = it is important…
2 = it will be important some day…
3 = it is not very important…
Activity: Communicative
function of a test continued…
 ---To have a leadership role in whatever
group one is associated with.
 ---To live in a safe and protected area.
 ---To lead an active social life.
 ---To count on child care while at work.
 ---To have various opportunities to find
companionship.
 ---To make friends.
 ---To advance professionally.
 ---To have economic security in old age.
Activity: Communicative
function of a test continued…
 Step 2: Compare your answers with those of the
rest of the class by indicating how many people
responded to each item with a 1, 2, or 3.
 Step 3: Which items were most important to the
majority of the class? Which were not
important? Does the class agree on what to look
for in life?
 Step 4: Go back over the sentences, but this time
indicate with the letter E those statements that
can apply to elephants. Then explain what
information from the article supports your
choices. In what ways are humans and
elephants similar?
Section E (Based on
previous activity)
1. Indicate which of the following items were important to the class.
   1. To have a leadership role in whatever group one is associated with.
   2. ….
   3. To have economic security in old age.
2. The class discussed ways in which elephant and human behaviors are similar. First, summarize both sides of the discussion. Then, state which side you agree with, using specific passage information to support your point of view.
Focus on skills
application
 The alternative to testing content is to test
the application of reading skills to a new
reading.
 The teaching-testing philosophy behind
this practice is that the assigned class
readings are themselves not important.
 The act of reading and the accumulation of
reading skills should instead be the focus.
Instructions
 To focus on the application of reading
skills, construct a series of test sections
whose structure mirrors that of class
activities: preparation, guided interaction,
assimilation, and the communicative
functions of texts.
 Section F is an example of how to adapt
the preparation-oriented in-class formats
from Chapter 11 activities for a test.
Activity: Brainstorming
with the whole class
 Step 1: As a class, generate a list of all the ideas
you associate with weddings. Come up with as
many different ideas as possible in five minutes.
 Step 2: As rapidly as possible, skim the text to
determine whether or not the ideas on the board
are actually in the reading. All you have to do is
say whether or not the information is there; you
do not have to know what the author says about
that information. You have five minutes.
 Step 3: Share what you found with the rest of the
class. As you do, erase from the board all those
ideas that are not in the text.
Activity: Scanning
 Step 1: Find the following three words in the text and underline the sentences in which you find them.
 Feudalism
 Stewardship
 Tithes
 Step 2: Working with two or three classmates, either write a definition of the word or list as many things as you can think of that you associate with each.
Section F (based on
previous activities)
1. Find the following three words in the text
and underline the sentences in which you
find them.
feudalism
stewardship
tithes
2. Now skim the passage to determine whether or not the following topics are covered in the reading.
   1. Inheritance laws for titles and property
   2. Women’s rights
   3. The effects of war on the economy
Some issues in
evaluating writing
 In Chapter 12, Lee and VanPatten distinguished between transcription-oriented practices and composition activities in teaching writing in a second language.
 The evaluation of transcription-oriented practices is a fairly simple, straightforward issue: You grade according to the intent of the practice.
Composition
activities
 Composition activities, however,
engage qualitatively different thinking
processes and yield a qualitatively
different product than do the
transcription-oriented activities.
 Lee and VanPatten focus their
discussion here on issues concerning
the evaluation of compositions.
Responding to form
 Responding to form, otherwise known as “error correction,” is perhaps the most debated issue in language instruction.
 The underlying question is whether corrective feedback is effective: In the case of composition, does corrective feedback improve learners’ writing?
 The answer is Yes and No.
 Some research supports the idea that
responding to form brings about changes
in learners’ writing (Lalande, 1982).
Supporting the idea
 Lalande (1982) compared two methods of
treating errors in the writing of second-year
university learners of German.
 In the first method, instructors corrected
errors and learners rewrote their
compositions incorporating the corrections.
 In the second method, instructors coded the
errors. Learners then had to rewrite their
compositions addressing these errors.
 Lalande found that learners in the second
method improved their linguistic accuracy in
writing more, although only to a small extent.
Negating the idea
 Semke (1984) compared several
methods of providing feedback to
first-year university learners of
German.
 Commenting on the content
 Correcting errors
 Commenting on the content and correcting errors
 Coding errors for learners to then self-correct
Negating the idea
continued…
 At the end of the quarter, learners
who received comments on content
only were superior to all other
groups.
 Not only did they write more
(produced longer works), they also
wrote more accurately (with fewer
grammatical errors) than did the
other groups.
A middle ground
 Robb, Ross, and Shortreed (1986) tracked
learners over a year-long period and used
multiple methods of feedback.
 Correcting errors
 Coding errors
 Highlighting errors but not correcting or coding them
 Indicating in the margin the number of errors made.
A middle ground
continued…
 They found that writing improved less
as a result of feedback on errors than
as a result of having additional
opportunities to write.
 Labor-intensive methods of providing
feedback, such as correcting and
coding errors, did not produce results
commensurate with the instructor’s
investment of time.
A middle ground
continued…
 Moreover, when instructors respond
to form, so do learners.
 That is, since instructors were
indicating surface errors, rather than
errors in meaning, learners
responded by focusing their attention
on changing the surface features, not
their meanings.
Responding to
content
 Feedback on compositions should include responding to content (the intended meanings), whether or not one responds to form.
 The type of instructor response should
encourage writers to express themselves
better.
 The instructor, acting on behalf of the
intended audience, will in effect negotiate
written meaning with the writer.
Zamel’s research
 Zamel (1985) examined the comments,
reactions, and markings that appeared on
compositions assigned and evaluated by
fifteen instructors teaching their own
university-level ESL classes.
 She found that, by and large, instructors:
 Make vague comments about abstract rules and principles that learners are unable to interpret.
 Correct on a clause-by-clause basis without considering the text as a whole.
Zamel’s research
continued…
 Respond to some problems but not others so that their reactions appear arbitrary and idiosyncratic.
 Tend to give conflicting signals about what to improve when providing overall comments and suggestions.
 Tend not to review their feedback when reviewing a revised composition and so accept revisions that address surface-level language errors.
 Overall, Zamel found that the instructors
were poor communicators who faulted their
students for being imprecise and vague but
were themselves no better at communicating
their responses.
Responding to drafts
 As Zamel found, even instructors who
responded to content accepted
revisions of the work with only changes
in surface errors.
 This practice is questionable on two
levels.

If the instructor accepts rewrites that only
address grammatical errors, then learners
will most likely interpret the intent of the
writing to be correct form production.
Responding to drafts
continued…

On the other hand, learners may not
know how to address content-related
issues in their rewrites. Their practice
of correcting only the grammatical
errors is a call to the instructor to teach
them how to address other issues.
Holistic versus
analytical scoring
 Whether you use holistic or analytical scoring procedures, you are applying criteria in order to evaluate a composition.
 Holistic scoring results in an overall assessment of the work, reflected in a single score, rating, or grade based on descriptions of performance.
 Analytical scoring is analogous to the componential scoring discussed in Chapter 5. Each component of the composition is evaluated; the component scores are typically added together to yield a final evaluation.
Lee and Paulson’s
criteria
 Lee and Paulson (1992) developed the
analytical scoring criteria listed on the
following slides.
 As you read them, note that the categories
are not weighted equally.
 The weightings should reflect the
importance of the category.
 One way to determine importance is to
consider how it was treated during
instruction.
Evaluation criteria for compositions: Content
19 points: Minimal information; information lacks substance; inappropriate or irrelevant information; or not enough information to evaluate
22 points: Limited information; ideas present but not developed; lack of supporting detail
25 points: Adequate information; some development of ideas; some ideas lack supporting detail
30 points: Very complete information; no more can be said; thorough; relevant; on target
Evaluation criteria for compositions: Organization
16 points: Series of separate sentences with no transitions; no connected discourse; no apparent order to the content
18 points: Limited order to the content; lacks logical sequencing of ideas; ineffective ordering
22 points: An apparent order to the content is intended; somewhat choppy; loosely organized but main points do stand out
25 points: Logically and effectively ordered; main points and details are connected; fluent
Evaluation criteria for compositions: Vocabulary
16 points: Inadequate; repetitive; incorrect use or nonuse of words studied; literal translations; abundance of invented words
18 points: Erroneous word use or choice leads to confused meaning; some literal translations and invented words
22 points: Adequate but not impressive; some erroneous word usage; some use of words studied
25 points: Broad; impressive; precise and effective
Evaluation criteria for compositions: Language
13 points: One or more errors in use and form of the grammar presented in the lesson; frequent errors in subject/verb agreement; non-Spanish sentence structure; erroneous use of language makes the work mostly incomprehensible
15 points: No errors in the grammar presented in the lesson; some errors in subject/verb agreement; some errors in adjective/noun agreement; work was poorly edited for language
Evaluation criteria for compositions: Language (continued)
17 points: No errors in the grammar presented in the lesson; occasional errors in subject/verb agreement; some editing evident for language but not complete
20 points: No errors in the grammar presented in the lesson; very few errors in subject/verb or adjective/noun agreement; work was well edited for language
Total points ___/100
Source: Lee and Paulson (1992) p. 33
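To make the weighting concrete, here is a minimal sketch of how the four component scores combine into a grade. It is not part of Lee and Paulson's rubric itself: the band values and the 100-point maximum come from the slides above, while the function and variable names are illustrative assumptions. The unequal maxima (30, 25, 25, 20) are what give Content the heaviest weight and Language the lightest.

```python
# Illustrative sketch only: totaling analytical scores from the rubric above.
# Band values are taken from Lee and Paulson (1992) as reproduced in these
# slides; the names and structure below are assumptions for illustration.

RUBRIC_BANDS = {
    "content":      (19, 22, 25, 30),   # maximum 30 points
    "organization": (16, 18, 22, 25),   # maximum 25 points
    "vocabulary":   (16, 18, 22, 25),   # maximum 25 points
    "language":     (13, 15, 17, 20),   # maximum 20 points
}

def total_score(ratings: dict) -> int:
    """Sum one band score per category; the highest possible total is 100."""
    total = 0
    for category, bands in RUBRIC_BANDS.items():
        value = ratings[category]
        if value not in bands:
            raise ValueError(f"{value} is not a rubric band for {category}")
        total += value
    return total

# Example: a composition rated one band below the top in every category.
print(total_score({"content": 25, "organization": 22,
                   "vocabulary": 22, "language": 17}))   # prints 86
```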
Criteria
 Whether you select holistic or
analytical scoring criteria, you must
ensure that:
 Writers are both aware of and knowledgeable about the criteria.
 The criteria are applied consistently to all writers.
 When learners know how they will be
evaluated, they can write with the
criteria in mind.
Intra-rater reliability
 Consistent application of criteria is a
fundamental consideration in all
testing situations.
 An issue that arises in composition grading is that of intra-rater reliability: whether the same rater applies the criteria consistently across all the compositions he or she evaluates.
Summary of
chapter 13
 Discussed a number of issues in the
testing of listening and reading and
the evaluation of writing
 Adapted classroom activities as test
sections, underscoring the position
that instructors should test what and
how they teach
Summary of chapter
13 continued…
 Presented two approaches to testing
reading:
 One that focused on content
 Another that focused on the application of skills
 Presented several issues in
evaluating writing