
Validity of Assessments
Exercise: An Officer’s Question
You say that your MCCSSS course teaches students to (reason well) in (a specific
domain). What evidence can you provide that it actually does so?
Three Key Concepts in Judging the Quality of an Assessment
 Validity
 Reliability
 Usability
Why should you be bothered with these concepts, anyway?
 Appreciate why all assessments contain error
 Know the various sources of error
 Understand that different kinds of assessments are prone to different kinds of error
 Build assessments with less error
 Know how to measure error, if need be
 Know what is safe (and not safe) to conclude from assessment results
 Decide when certain assessments should not be used
Validity
Definition: Appropriateness of how scores are interpreted [and used]*
 That is, to what extent does your assessment measure what you say it does [and is as useful as you claim]?
 Stated another way: to what extent are the interpretations and uses of a test justified by evidence about its meaning and consequences?
*Appropriate "use" of tests is a controversial recent addition to the definition of
"validity." That is probably why your textbook is inconsistent in how it defines it.
Validity
Very important points. Validity interpretations:
1. are a matter of degree ("how valid")
2. are always specific to a particular purpose ("validity for…")
3. form a unitary concept (four kinds of evidence combine into one judgment: "how valid?")
4. must be inferred from evidence; they cannot be measured directly
Validity Evidence
Four interrelated kinds of evidence:
1. content
2. construct
3. criterion
4. consequences
Questions Guiding Validation
1. What are my learning objectives?
o Did my test really address those particular objectives?
2. Do the students' test scores really mean what I intended?
o What may have influenced their scores?
 growth
 instruction
 intelligence
 cheating
 etc.
3. Did testing have the intended effects?
o What were the consequences of the testing process and scores obtained?
What is an achievement domain?
A carefully specified set or range of learning outcomes; in short, your set of
instructional objectives.
Content-Related Evidence
Definition: The extent to which an assessment’s tasks provide a relevant and
representative sample of the domain of outcomes you are intending to measure.
The evidence:
1. most useful type of validity evidence for classroom tests
2. the domain is defined by the learning objectives
3. items are chosen with a table of specifications (see the sketch after this list)
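A table of specifications is simply a blueprint that crosses your learning objectives with cognitive levels and says how many items each cell should receive; checking a draft test against it is one way to build content-related validity in from the start. The sketch below is only an illustration: the objectives, cognitive levels, and item counts are made up, not a prescribed procedure.

```python
# Hypothetical sketch: checking a draft test against a table of specifications.
# The objectives, cognitive levels, and item counts are invented for illustration.
from collections import Counter

# Blueprint: (objective, cognitive level) -> number of items planned
blueprint = {
    ("fractions", "knowledge"): 4,
    ("fractions", "application"): 6,
    ("decimals", "knowledge"): 3,
    ("decimals", "application"): 7,
}

# Draft test: each item tagged with the objective and level it is meant to sample
draft_items = [
    ("fractions", "knowledge"), ("fractions", "knowledge"),
    ("fractions", "application"), ("fractions", "application"),
    ("decimals", "knowledge"), ("decimals", "application"),
    # ... remaining items ...
]

drafted = Counter(draft_items)
for cell, planned in blueprint.items():
    written = drafted.get(cell, 0)
    if written != planned:
        print(f"{cell}: planned {planned} items, drafted {written}")
```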
Content-Related Evidence
Important points:
 is an attempt to build validity into the test rather than to assess it after the fact
 the sample can be faulty in many ways:
a. inappropriate vocabulary
b. unclear directions
c. omits higher-order skills
d. fails to reflect the content or weight of what was actually taught
 "face validity" (superficial appearance) or a label does not provide evidence of validity
 assumes that test administration and scoring were proper
What is a construct?
A hypothetical quality or characteristic (e.g., extraversion, intelligence, mathematical reasoning ability) that we use to explain some pattern of behavior (e.g., good at making new friends, learns quickly, does well in all math courses).
Construct-Related Evidence
Definition: The extent to which an assessment measures the construct (e.g., reading ability, intelligence, anxiety) that it purports to measure.
Construct-Related Evidence
Some kinds of evidence:
 see whether the items behave the same way (if the test is meant to measure a single construct)
 analyze the mental processes the tasks require
 compare scores of known groups
 compare scores before and after treatment (do they change in the ways your theory says they will, and not change where it says they will not?)
 correlate scores with measures of other constructs (do they correlate well, and poorly, in the pattern expected? see the sketch after this list)
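As one concrete illustration of the last point, a new test's scores can be correlated with a measure of a construct that theory says should be related (expecting a strong correlation) and with a variable that should be unrelated (expecting a weak one). The sketch below uses fabricated scores and made-up measure names, and it needs Python 3.10+ for statistics.correlation.

```python
# Hypothetical sketch of convergent/discriminant correlation evidence.
# All scores and measure names are invented for illustration only (Python 3.10+).
from statistics import correlation  # Pearson r

new_math_test   = [12, 15, 9, 20, 17, 11, 14, 18]
other_math_test = [30, 34, 25, 41, 38, 27, 31, 39]  # related construct: theory predicts a strong r
shoe_size       = [9, 7, 8, 8, 7, 9, 10, 12]        # unrelated variable: theory predicts a weak r

print("r with a related construct:  ", round(correlation(new_math_test, other_math_test), 2))
print("r with an unrelated variable:", round(correlation(new_math_test, shoe_size), 2))
```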
Construct-Related Evidence
Important points:
 usually assessed after the fact
 usually requires test scores
 is a complex, extended logical process; cannot be quantified
What is a criterion?
A valued performance or outcome (e.g., scores high on a standardized achievement
test in math, later does well in an algebra class) that we believe might—or should—
be related to what we are measuring (e.g., knowledge of basic mathematical
concepts).
Criterion-Related Evidence
Definition: The extent to which a test’s scores correlate with some valued performance
outside the test (the criterion)
The evidence:
 concurrent correlations (relate to a different, current performance)
 predictive correlations (predict a future performance)
Clarification: The word "criterion" is used in a second sense in testing, so don't confuse the two. In this context it means some outcome that we want to predict. In the other sense, it is a performance standard against which we compare students' scores; that sense is used to distinguish "criterion-referenced" interpretations of test scores from "norm-referenced" interpretations. "Susan reads at the proficient level" would be a criterion-referenced interpretation. ("Susan reads better than 65% of other students" would be a norm-referenced interpretation.)
What is a correlation?
A statistic that indicates the degree of relationship between any two sets of scores
obtained from the same group of individuals (e.g., correlation between height and
weight).
Called:
 a validity coefficient when used in calculating criterion-related evidence of validity
 a reliability coefficient when used in calculating the reliability of test scores
Either way it is the same statistic; only its use differs (see the sketch below).
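To make the distinction concrete, here is a minimal sketch with fabricated scores: the same Pearson formula is called a validity coefficient when the second variable is an outside criterion, and a reliability coefficient when it is a second administration of the same test.

```python
# Hypothetical sketch: one correlation formula, two names depending on its use.
# All scores below are fabricated for illustration only.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

entrance_test   = [52, 61, 48, 70, 66, 57, 63, 55]  # scores on the test itself
algebra_grade   = [71, 80, 65, 92, 88, 74, 85, 70]  # later performance (the criterion)
entrance_retest = [50, 63, 47, 72, 64, 58, 61, 57]  # a second administration of the same test

# Correlated with an outside criterion, r is a (predictive) validity coefficient
print("validity coefficient:   ", round(pearson_r(entrance_test, algebra_grade), 2))

# Correlated with a retest of the same instrument, r is a reliability coefficient
print("reliability coefficient:", round(pearson_r(entrance_test, entrance_retest), 2))
```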
Criterion-Related Evidence
Important points:
 always requires test scores
 is quantified (i.e., a number)
 must be interpreted cautiously because:
o irrelevant factors can raise or lower validity coefficients (unreliability, spread of scores, etc.)
o it is often hard to find a good "criterion"
 can be used to create "expectancy tables" (see the sketch after this list)
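An expectancy table cross-tabulates bands of test scores against how often students in each band later reached the criterion, turning the test-criterion relationship into something easy to communicate. The sketch below is hypothetical; the score bands, cut points, and pass/fail outcomes are invented for illustration.

```python
# Hypothetical sketch: building a simple expectancy table from past students.
# Score bands, cut points, and pass/fail outcomes are invented for illustration only.
from collections import defaultdict

# (placement test score, passed the later course?) for a past group of students
history = [(45, False), (48, False), (52, False), (55, True), (58, True),
           (61, False), (64, True), (67, True), (72, True), (78, True)]

def band(score):
    """Assign a score to a coarse band (hypothetical cut points)."""
    if score < 50:
        return "below 50"
    if score < 65:
        return "50-64"
    return "65 and up"

counts = defaultdict(lambda: [0, 0])  # band -> [number passing, number of students]
for score, passed in history:
    counts[band(score)][1] += 1
    if passed:
        counts[band(score)][0] += 1

print("score band   chance of passing the course")
for b in ("below 50", "50-64", "65 and up"):
    passed, total = counts[b]
    print(f"{b:<12} {passed}/{total} = {passed / total:.0%}")
```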
What is a consequence?
Any effect that your assessment has—or fails to have—that is important to you or the
other people involved.
Consequences-Related Evidence
Definition: The extent to which the assessment serves its intended purpose (e.g.,
improves performance) and avoids negative side-effects (e.g., distorts the curriculum)
Possible types of evidence:
 did it improve performance? motivation? independent learning?
 did it distort the focus of instruction?
 did it encourage or discourage creativity? exploration? higher-level thinking?
 etc.
Consequences-Related Evidence
Important points:
 usually gathered after the assessment is given
 scores may be interpreted correctly, but the test may still have negative side-effects
 you have to weigh the consequences of not using the assessment (even if it has negative side-effects); is the alternative any better, or maybe worse?
 judging consequences is a matter of values, not psychometrics
Sources of Threats to Validity: Can you give examples of each?
1. the tests themselves
2. teaching
3. administration and scoring
4. students
5. nature of the group or criterion