Ch 4. Validity: What the Test Measures

Chapter 4. Validity
Does the test cover what we are told (or believe) it covers? To what extent?
Is the assessment being used for an appropriate purpose?
1
Validity Topics:
 Definition (usual and refined)
 Categories of validity evidence
   A. face validity
   B. content validity: table of specifications, alignment analysis,
      opportunity to learn
   C. criterion-related validity
   D. construct validity
   E. consequential validity
 Test fairness
2
Introduction
Without good validity, all else is lost. Validity is the most important
characteristic of a test or assessment technique.
 Usual Definition:
  It measures what it purports to measure.
 Refined Definition:
  It involves the interpretation of a score for a particular purpose or use
  (because a score may be valid for one use but not another).
  It is a matter of degree, not all-or-none. As a practical matter, our
  concern is to determine the extent (in non-mathematical terms we might say:
  slight, moderate, considerable).
3
Some Helpful Terms
 Construct:
The trait or characteristic that interests us. We might call it a
“target” or “what we want to get at”. We create a test to
“cover” this attribute.
 Validity addresses how well an assessment technique
provides useful information about the construct / target.
 Construct underrepresentation:
 The test we made is not assessing all of the construct; our
test misses things we should be assessing.
 Construct irrelevant variance:
 The test we made is assessing things that are not really part
of our construct; we are assessing irrelevant stuff that we
don’t want.
[see next two slides for illustrations]

4
The Construct and Valid Measurement
5
Varying Degrees of Construct Underrepresentation and Construct Irrelevant Variance
6
A. Face Validity
Think of the idiom “on the face of it . . .”
 A test is said to have face validity if it "looks like" it is
going to measure what it is supposed to measure
 Face validity is not empirical; one is saying that the test "appears that it
will work," as opposed to saying "it has been shown to work."
 Face validity is often "created" to influence the opinions of participants
who are not expert in testing methodologies, e.g., test takers, parents,
politicians.
7
B. Content Validity
Most often used in achievement tests and employment exams
 Meaning of this type of validity
  there is a good match between the content of the test and some well-defined
  domain of knowledge or behavior. Reference to content defines the
  orientation of the test.
 For teachers, considered the most important type of validity for
   your own classroom tests
   achievement tests
 Where do we find the "well-defined domain"?
   Examination of textbooks in the field, with special attention to the
   learning objectives at the beginning of a chapter and the terms at the end
   Curriculum guides of school districts
   Ohio's Academic Content Standards
So, now we have the content topics identified, but what should we actually
expect "students to know and be able to do" in relation to these topics? This
question deals with "process" or "depth" indicators. How should we make sure
we include both the content and the depth expected in our tests?
8
The Table of Specifications
Building content validity into my own classroom tests
 Table of Specifications – this connects the content determined earlier to
the mental processes students are expected to employ regarding this content
 Two-way table
   Content
   Bloom's taxonomy (simplest mental operation to the most complex)
 Each test item I create then falls into one cell
 By creating the table, I can see the relative weight assigned to each cell.
Is this what I want? (a small sketch of such a table follows below)
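Below is a minimal, hypothetical sketch of such a two-way table in Python.
The topics, Bloom's levels chosen, and item counts are invented for
illustration; the loop simply prints the relative weight each cell carries.

```python
# A minimal, hypothetical sketch of a two-way table of specifications:
# rows are content topics, columns are Bloom's taxonomy levels, and each
# cell holds the number of planned test items. All values are invented.

blooms_levels = ["Remember", "Understand", "Apply", "Analyze"]

table = {
    "Fractions": {"Remember": 2, "Understand": 3, "Apply": 4, "Analyze": 1},
    "Decimals":  {"Remember": 2, "Understand": 2, "Apply": 3, "Analyze": 1},
    "Percents":  {"Remember": 1, "Understand": 2, "Apply": 2, "Analyze": 1},
}

total_items = sum(sum(row.values()) for row in table.values())

# Relative weight assigned to each cell -- is this the emphasis I intend?
for topic, row in table.items():
    for level in blooms_levels:
        weight = row[level] / total_items
        print(f"{topic:10} {level:11} {row[level]:2d} items ({weight:.0%})")
```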

9
Alignment Analysis
Checking content validity in existing tests
 These steps are parallel to building your own good test and constructing the
table of specifications. There are some things to watch for and consider as
you do this:
 Be wary of using the summary outline provided by the test maker; examine the
actual test items
 Match items on the test with the content you are teaching; watch for
mismatches (a minimal sketch of this matching step follows the list)
   items on the test covering content you are not teaching
   content you are teaching that is not tested
 This matching requires considerable judgment
   the test does not have to cover every detail; it could be a representative
   sample
   if stakes are high, use a panel of individuals
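A minimal sketch of the matching step, in Python and with hypothetical topic
labels: each test item is tagged with the topic it assesses, and two set
differences surface the mismatches listed above.

```python
# A minimal sketch of the item-content matching step; topics are invented.
taught_topics = {"photosynthesis", "cell division", "genetics", "ecosystems"}

item_topics = {
    "item_01": "photosynthesis",
    "item_02": "cell division",
    "item_03": "plate tectonics",   # content not taught in this class
    "item_04": "genetics",
}

tested_topics = set(item_topics.values())

# Items on the test covering content I am not teaching
tested_not_taught = tested_topics - taught_topics

# Content I am teaching that the test never samples
taught_not_tested = taught_topics - tested_topics

print("Tested but not taught:", tested_not_taught)   # {'plate tectonics'}
print("Taught but not tested:", taught_not_tested)   # {'ecosystems'}
```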
10
Opportunity to Learn
But was it taught . . .
An emerging idea related to content validity is a concern called
instructional validity. This relates to your behavior as a teacher. The
content may be in the book; the content may be in the state standards . . .
BUT . . . did you actually teach it? Some teachers skip items of instruction
they don't like, don't understand, or don't have time for.
If related items appear on a test, this would reduce the
validity of the test since the students had no opportunity
to learn the knowledge or skill being assessed.
11
C. Criterion-Related Validity
While the term “test” is used, also think “measure” or “procedure”


The basic idea – to demonstrate the degree of accuracy of a test by comparing
it with another “test, measure or procedure which has been demonstrated to be
valid” (i.e. a valued criterion).
 Two general contexts
  predictive validity – one measure is taken now, the other later. The later
test (the criterion) is known to be valid. This approach allows me to show my
current test is valid by comparing it with a future valid test.

For example, a behind-the-wheel driving test has been shown to be an accurate
test of driving skills. By comparing the scores on a written rules-of-the-road test
with the scores from the driving test, the written test can be validated by using a
criterion related strategy.
  concurrent validity – both measures are current. This approach allows me
to show my test is valid by comparing it with an already valid test. I can do
this if I can show my test varies directly with a measure of the same
construct or inversely with a measure of an opposite construct.
The computed statistic in both cases is "r" (which we now call a validity
coefficient), and it has all the characteristics we have already discussed
about correlation coefficients in general.
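As a minimal sketch with invented scores, the validity coefficient for the
written rules-of-the-road test against the behind-the-wheel criterion is
simply the correlation between the two score lists:

```python
# Minimal sketch of a validity coefficient: the correlation "r" between the
# new measure and an already-validated criterion. Scores are invented.
from statistics import correlation  # available in Python 3.10+

written_test = [72, 85, 90, 60, 78, 95, 66, 88]   # new measure to validate
driving_test = [70, 80, 92, 58, 75, 97, 70, 85]   # criterion shown to be valid

r = correlation(written_test, driving_test)       # the validity coefficient
print(f"validity coefficient r = {r:.2f}")
```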


12
Special Considerations for Interpreting
Criterion-Related Validity
 Group Variability
  The greater the variability in the group, the greater the "r".
 Reliability-Validity Relationship
  Reliability limits validity; reliability is a prerequisite to validity
  (see the bound sketched below)
 Validity of the Criterion
  How good is the criterion? Do you agree with the operational definition of
  the criterion?
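One common way to state "reliability limits validity" is the standard
psychometric bound below (a general textbook result, not something specific
to these slides), where r_xy is the validity coefficient and r_xx and r_yy
are the reliabilities of the test and the criterion:

```latex
r_{xy} \le \sqrt{r_{xx}\, r_{yy}}
```

For example, if the test's reliability is .64 and the criterion's reliability
is .81, the validity coefficient can be at most sqrt(.64 x .81) = .72.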
13
D. Construct Validity
 When we ask about a test’s construct validity, we are taking a
broad view of the test. Does the test adequately measure the
underlying, unobserved construct? The question is asked both
in terms of
  convergent validity – are test scores related to the behaviors and tests
they should be related to? and
  divergent validity – are test scores unrelated to the behaviors and tests
they should be unrelated to?
 There is no single measure of construct validity. Construct
validity is based on the accumulation of knowledge about the
test and its relationship to other tests and behaviors.
 To establish construct validity, we demonstrate that the measure
changes in a logical way when other conditions change.
14
E. Consequential Validity
A recent, controversial entry into the assessment lexicon . . .
 Some professionals feel that, in the real world, the consequences that
follow from the use of assessments are important indications of validity.
 Some professionals feel that these consequences are matters of
politics and policymaking; important considerations, yes, but not
matters of validity.
 On which side are we? As educators, we sometimes see the
consequences as more important than the technical validity of the test.
Judgments based on assessments we give and use have value
implications and social consequences.
 What is the intended use of these test scores?
 How are the scores really being used?
 Does this testing lead to educational benefits?
 Are there negative spin-offs?
15
Test Fairness, Test Bias
 Test fairness / test bias refer to the same underlying issue, with
opposite connotations
 Fairness – an assessment or test measures a
trait, construct, or target with equal validity for
different groups.
 Bias – the groups do not differ in terms of real
status on the trait, construct, or target being
assessed; yet, the test suggests they do.
16
Methods of Reviewing Fairness
 Test Companies: (look in the test manual to see what a particular
company did about test fairness issues on this test)
  Panel review - most "popular," but is this just face validity?
  Differential item functioning (DIF) - item-by-item comparison of subgroups
  (a minimal sketch follows this list)
  Criterion-related validity – whole test
 Teacher-Created Assessments: (teachers need to be
knowledgeable about, and sensitive to, issues of test fairness)
 Is there anything about my test that will unfairly advantage or
disadvantage a student or group of students?
 Is there anything about the mechanics of the test that calls
for skills other than those I intend to measure?
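A minimal, illustrative sketch of the DIF idea (not the full Mantel-Haenszel
procedure test companies typically use): examinees are matched on total
score, and an item's proportion-correct is compared between a reference group
and a focal group within each score band. All data and group labels below are
invented.

```python
# Minimal sketch of a DIF check: within each total-score band, compare an
# item's proportion-correct between a reference and a focal group.
from collections import defaultdict

def dif_report(total_scores, item_correct, group):
    """Per-score-band proportion correct on one item, for each group."""
    bands = defaultdict(lambda: {"ref": [0, 0], "foc": [0, 0]})  # [correct, n]
    for score, correct, g in zip(total_scores, item_correct, group):
        bands[score][g][0] += correct
        bands[score][g][1] += 1
    return {
        score: {g: (c / n if n else None) for g, (c, n) in cells.items()}
        for score, cells in sorted(bands.items())
    }

# Hypothetical mini data set: 8 examinees, one item, two groups.
totals = [10, 10, 10, 10, 20, 20, 20, 20]
item   = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["ref", "foc", "ref", "foc", "ref", "foc", "ref", "foc"]

print(dif_report(totals, item, groups))
# Large within-band gaps between the groups would flag the item for review.
```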
17
Practical Advice
1. For building your own tests, think content validity.
2. For judging externally prepared achievement tests, start with a clear
   definition of what's to be covered.
3. For criterion-related validity, take into account group variability, and
   think about the validity of the criterion.
4. For test fairness (bias), distinguish between differences in groups'
   average scores and group status on the trait.
5. For your own assessments, try to eliminate the influence of any factors
   not related to what you want to measure.
18
Terms and Concepts to Review and Study on Your Own (1)
 alignment analysis
 Bloom’s taxonomy
 concurrent validity
 consequential validity
 construct
 construct irrelevant variance
 construct underrepresentation
 construct validity
 content validity
 criterion-related validity
19
Terms and Concepts to Review and Study on Your Own (2)
 differential item functioning (DIF)
 external criterion
 face validity
 fairness (or its opposite, bias)
 instructional validity
 opportunity to learn
 predictive validity
 table of specifications (two-way table)
 validity
 validity coefficient
20