Testing What You Teach: Eliminating the “Will this be on the final?” Ideology
Dr. Barry Lee Reynolds
National Yang-Ming University
Education Center for Humanities and Social Sciences
Outline
• Introduction
• Backwash
• Validity
• Reliability
Introduction
Why students ask: “Will this be on the final exam?”
The distrust of tests
 Who distrusts tests?
 Language Teachers
 Language Students
 Why?
 Due to their negative effects on learning, tests are often considered more harmful than helpful.
 Sometimes teaching is good, but the test does not
reflect the teaching.
 The effect of testing on teaching and learning is known as backwash; it can be harmful or beneficial (Hughes, 2003).
Tests are often inaccurate measurements
 Testing technique
 e.g., if you want to know how well someone writes, you must ask them to write (this relates to validity).
 e.g., the test must consistently measure its ‘construct’, such as the past tense, vocabulary, or writing (this relates to reliability).
Backwash
How can a teacher achieve beneficial backwash?
Backwash
 Harmful Backwash
 Ex. multiple-choice items to test writing
 Beneficial Backwash
 Ex. writing to test writing
 More contextualized (low-stakes exam)
 Final exam for a course
 More global (high-stakes exam)
 Test used for university admission (e.g., TOEFL)
How can a teacher achieve beneficial backwash? (1/2)
 Test the abilities whose development you want to
encourage
 If you want to encourage oral ability, then test oral
ability.
 Sample widely and unpredictably
 The sample of items should represent, as far as possible, the full scope of what is specified.
 Use direct testing
 If we test directly the skills that we are interested in
fostering, then practice for the test represents
practice in those skills.
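The “sample widely and unpredictably” advice can be sketched as stratified random sampling from an item bank: every specified objective is represented, but which items appear is left to chance. This is a minimal Python sketch, not a procedure from the source; the item bank, its objective labels, the `sample_items` helper, and the default of two items per objective are all hypothetical.

```python
import random

def sample_items(item_bank, per_objective=2, seed=None):
    """Build a test by drawing items at random from every objective.

    item_bank: hypothetical dict mapping each course objective to a list of items.
    per_objective: how many items to draw per objective (an assumption; adjust
    to the weight each objective carries in the specifications).
    """
    rng = random.Random(seed)  # pass a seed only if you need a reproducible draw
    test = []
    for objective, items in item_bank.items():
        k = min(per_objective, len(items))  # never ask for more items than exist
        test.extend((objective, item) for item in rng.sample(items, k))
    rng.shuffle(test)  # mix objectives so the order is unpredictable too
    return test

# Hypothetical item bank for a short grammar-and-reading quiz
bank = {
    "past tense": ["PT-1", "PT-2", "PT-3", "PT-4"],
    "vocabulary": ["V-1", "V-2", "V-3"],
    "reading for gist": ["R-1", "R-2"],
}
for objective, item in sample_items(bank, per_objective=2):
    print(objective, item)
```

Because every objective is sampled, students cannot safely skip any part of the course when preparing, and because the specific items vary, memorizing last year's test does not help.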
How can a teacher achieve beneficial backwash? (2/2)
 Make testing criterion-referenced
 If the test specifications make clear just what students have to
be able to do (and with what degree of success), then
students will have a clear picture of what they have to
achieve.
 Base tests on objectives
 If tests are based on objectives, rather than on detailed
teaching and textbook content, they will provide a truer
picture of what has actually been achieved.
 Ensure the test is known and understood by students and
teachers
 Students need to understand what the test demands of them.
 Explain the rationale for the test and its specifications, and provide sample items.
Validity
How can teachers ensure the validity of an assessment?
Construct validity
 An assessment is said to be valid and have
construct validity if it measures accurately what it
is intended to measure.
 e.g., “reading ability”; “speaking fluency”;
“grammar”
 Does the assessment really test the “construct” it
has set out to test?
 Construct validity is also used to refer to an overarching notion of validity.
 Teachers must ensure that their tests truly assess
the skills they have taught in their classrooms.
Content validity
 If you wish to test “reading ability,” the assessment must be made up of items that test the language skills associated with “reading ability.”
 To ensure content validity, it is not enough just to have
students “read” and require them to answer questions;
the questions must constitute a proper sample of all the
language skills that have been taught in the course.
 Areas that are not tested tend to be ignored by teachers in their teaching and by students in their learning.
 Unfortunately, the content of tests is usually made up of what is easiest to test.
 Match assessment content to specifications written for the
course (i.e., class goals & objectives).
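Matching content to the specifications can be partly checked mechanically: tag each item with the objective it was written for, then look for objectives with no items and items that target nothing in the specifications. A minimal sketch, assuming a hypothetical `check_coverage` helper and invented reading-course objectives; it checks coverage only, not whether each item really measures its objective.

```python
from collections import Counter

def check_coverage(objectives, test_items):
    """Compare a test's items with the course objectives.

    objectives: list of objectives from the course specifications.
    test_items: dict mapping each item ID to the objective it targets.
    Returns (items per objective, untested objectives, off-specification items).
    """
    counts = Counter(test_items.values())
    per_objective = {obj: counts[obj] for obj in objectives}  # Counter gives 0 for missing keys
    untested = [obj for obj, n in per_objective.items() if n == 0]
    off_spec = sorted(item for item, obj in test_items.items() if obj not in objectives)
    return per_objective, untested, off_spec

# Invented objectives and a draft test
objectives = ["skimming for gist", "scanning for detail",
              "inferring meaning", "guessing vocabulary from context"]
draft = {"Q1": "scanning for detail", "Q2": "scanning for detail",
         "Q3": "skimming for gist", "Q4": "spelling"}

per_objective, untested, off_spec = check_coverage(objectives, draft)
print(untested)  # ['inferring meaning', 'guessing vocabulary from context']
print(off_spec)  # ['Q4']
```

Here the draft over-samples scanning, never tests inference or vocabulary-in-context, and includes a spelling item that no objective calls for.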
Criterion-related validity
 Criterion-related validity refers to the degree to which results on an assessment agree with those of another, independent and dependable assessment (the criterion).
 Criterion-related validity includes concurrent validity and
predictive validity.
 Concurrent validity is established when the test and the
criterion are administered at about the same time.
 Example – testing of oral and written language abilities
 Predictive validity concerns the degree to which a test can
predict students’ future performance.
 Example – prerequisite course; internship opportunities
 Criterion-related validity is usually investigated through the
use of correlation coefficients.
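Concurrent validity, for instance, could be estimated by correlating students' scores on a new test with their scores on a criterion assessment taken at about the same time. A minimal sketch of the Pearson product-moment correlation coefficient; the score lists are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two lists of scores for the same students."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two students")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores in a list are equal")
    return cov / math.sqrt(var_x * var_y)

# Invented scores: a new classroom speaking test vs. an established interview (the criterion)
new_test = [12, 15, 9, 18, 14, 11]
criterion = [60, 72, 50, 85, 70, 58]
print(round(pearson_r(new_test, criterion), 2))
```

The same function applies to predictive validity, where the criterion is a later measure such as performance in a subsequent course.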
Validity in scoring
 Scoring should reflect only the ability being tested (unless the assessment was deliberately designed to assess more than one ability!).
 Example – a reading test whose scoring also penalizes spelling and grammar errors; a writing test whose scoring emphasizes punctuation
Face validity
 A test is said to have face validity if it looks as if it
measures what it is supposed to measure.
Reliability
How can teachers ensure the reliability of an assessment?
Reliability
 Reliability refers to the degree to which an
assessment produces stable and consistent results.
 In other words, giving the assessment on day X should produce much the same results as giving it on day Y.
 This is quantified by a reliability coefficient, estimated through, for example:
 the test-retest method
 the split-half method
 Lado (1961) suggests benchmark coefficients:
 vocabulary, grammar, and reading assessments: .90-.99
 listening: .80-.89
 speaking: .70-.79
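The split-half method can be sketched as follows: score two halves of the test separately (here odd- vs even-numbered items), correlate the half-scores, and apply the Spearman-Brown formula r_full = 2r / (1 + r), because each half is only half as long as the real test. The odd/even split and the Spearman-Brown step are standard practice but are not spelled out in the slide; the helper names and item scores are hypothetical, and the benchmark check simply compares the result with the lower end of Lado's ranges above. (Test-retest reliability would instead correlate the same students' totals on two administrations.)

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """item_scores: one row per student, one score per item (e.g. 1 = correct, 0 = wrong)."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown: estimate for the full-length test

# Lower ends of Lado's (1961) benchmark ranges, as listed in the slide
LADO_MINIMUM = {"vocabulary": 0.90, "grammar": 0.90, "reading": 0.90,
                "listening": 0.80, "speaking": 0.70}

def meets_lado_benchmark(skill, reliability):
    return reliability >= LADO_MINIMUM[skill]

# Hypothetical scores for three students on a four-item vocabulary quiz
# (far too few students for a real estimate; this only shows the arithmetic)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
r = split_half_reliability(scores)
print(round(r, 2), meets_lado_benchmark("vocabulary", r))  # 0.67 False
```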
Scorer reliability
 Scorer reliability can be checked by quantifying, with a coefficient, the level of agreement between scores given by the same scorer, or by different scorers, on different occasions.
 Ex. grading essays
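The slide does not name a particular coefficient. One common choice when two scorers award band scores to the same essays is Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance; a correlation coefficient is another option. A minimal sketch with invented band scores:

```python
from collections import Counter

def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa for two scorers who each gave one band score per script."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length lists of scores")
    n = len(scores_a)
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    count_a, count_b = Counter(scores_a), Counter(scores_b)
    bands = set(scores_a) | set(scores_b)
    # Agreement expected if each scorer assigned bands at random with their own frequencies
    expected = sum(count_a[band] * count_b[band] for band in bands) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both scorers used one identical band throughout")
    return (observed - expected) / (1 - expected)

# Invented band scores from two teachers marking the same five essays
teacher_1 = [3, 3, 4, 4, 5]
teacher_2 = [3, 4, 4, 4, 5]
print(round(cohens_kappa(teacher_1, teacher_2), 2))  # 0.69
```

The teachers agree on 80% of the essays, but kappa is lower because some of that agreement would be expected by chance. Plain kappa treats bands as unordered categories, so a one-band disagreement counts the same as a three-band one; for ordered band scales, a weighted kappa or a correlation is often preferred.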
How to make tests more reliable? (1/3)
 Take enough samples of behavior
 It is not enough just to include enough items; each item should also be a “fresh start” for the students.
 Exclude items which do not discriminate well
between weaker and stronger students.
 Do not allow candidates too much freedom.
 Write unambiguous items.
 Provide clear and explicit instructions.
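Whether an item discriminates between weaker and stronger students can be estimated with a simple discrimination index: the proportion of high scorers who answered the item correctly minus the proportion of low scorers who did. A minimal sketch; comparing the top and bottom thirds of the class is an assumption (other splits, such as 27%, are also used), and the results are invented.

```python
def discrimination_index(item_scores, item, fraction=1/3):
    """Proportion correct in the top group minus proportion correct in the bottom group.

    item_scores: one row per student, one 0/1 score per item.
    item: zero-based index of the item to analyse.
    fraction: share of the class in each comparison group (assumed top/bottom third).
    """
    ranked = sorted(item_scores, key=sum, reverse=True)  # strongest students first
    k = max(1, round(len(ranked) * fraction))
    top, bottom = ranked[:k], ranked[-k:]
    p_top = sum(row[item] for row in top) / k
    p_bottom = sum(row[item] for row in bottom) / k
    return p_top - p_bottom  # ranges from -1 to 1; near 0 or negative suggests a weak item

# Invented results for six students on four items
results = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
for i in range(4):
    print(f"item {i + 1}: D = {discrimination_index(results, i):.2f}")
```

Item 3, which every student answered correctly, gets D = 0.00: it tells you nothing about who is stronger, so it is a candidate for revision or exclusion. Where to draw the cut-off is a judgment call.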
How to make tests more reliable? (2/3)
 Ensure that tests are well laid out and perfectly
legible.
 Make students familiar with format and testing
techniques.
 Provide uniform and non-distracting conditions of
administration.
 Use items that permit scoring which is as objective
as possible.
 Make comparisons between students as direct as
possible (similar to not allowing students too much
freedom).
How to make tests more reliable? (3/3)
 Create a detailed scoring key.
 Train scorers (if you are not scoring the tests yourself).
 Agree on acceptable responses and appropriate scores at the outset of scoring.
 Identify candidates by number, not name.
 Employ multiple, independent scoring (if
possible).
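Multiple, independent scoring might be organized as follows: two scorers mark each script separately, scripts are keyed by candidate number rather than name, the two marks are averaged, and large disagreements go to a third scorer. A minimal sketch; the `combine_scores` helper and the two-mark threshold are hypothetical choices, not rules from the source.

```python
def combine_scores(scores_a, scores_b, max_gap=2):
    """Average two independent scorers' marks and flag large disagreements.

    scores_a, scores_b: dicts mapping candidate number -> mark from each scorer
    (both scorers are assumed to have marked every candidate).
    max_gap: hypothetical threshold; a larger gap sends the script to a third scorer.
    """
    combined, flagged = {}, []
    for candidate in sorted(scores_a):
        a, b = scores_a[candidate], scores_b[candidate]
        combined[candidate] = (a + b) / 2
        if abs(a - b) > max_gap:
            flagged.append(candidate)
    return combined, flagged

# Invented essay marks out of 20, identified by candidate number only
scorer_1 = {"017": 14, "018": 9, "019": 12}
scorer_2 = {"017": 15, "018": 13, "019": 12}
combined, flagged = combine_scores(scorer_1, scorer_2)
print(combined)  # {'017': 14.5, '018': 11.0, '019': 12.0}
print(flagged)   # ['018']
```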
Relationship between reliability and validity
 To be valid, an assessment must be reliable; however, an assessment can be reliable without being valid.
 Ex. writing test that actually assesses translation
 Be careful not to sacrifice validity while ensuring
reliability.
Thank You For Your Attention
References
 Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge University Press.
 Lado, R. (1961). Language testing: The construction and use of foreign language tests. A teacher's book.