Consequentialism

VALIDITY - CONSEQUENTIALISM
Assoc. Prof. Dr. Sehnaz Sahinkarakas

“Effect-driven testing” (Fulcher & Davidson,
2007)
“the effect that the test is intended to have and to
structure the test development to achieve that
effect” (p.144)
 What does this mean?

DEFINITION OF VALIDITY



“Overall judgment of the degree to which empirical
evidence and theoretical rationales support the
adequacy and appropriateness of interpretations and
actions on the basis of test scores or other modes of
assessment” (Messick, 1995, p. 741).
What is a score?
In general it is “any coding or summarization of
observed consistencies or performance regularities on
a test, questionnaire, observation procedure, or other
assessment devices such as work samples, portfolios,
and realistic problem simulations” (p. 741).



Validity, then, concerns the inferences we make from scores;
scores reflect a test taker’s knowledge
and/or skills as elicited by test tasks.
This differs from early definitions of validity as the degree
of correlation between the test and a criterion
(the validity coefficient).
In the early definition:



there is an upper limit on the possible correlation
this limit is directly related to the reliability of the test (without
high reliability a test cannot be valid)
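In classical test theory, the ceiling mentioned above is usually written as r_xy ≤ √(r_xx · r_yy): the observed correlation between a test X and a criterion Y cannot exceed the square root of the product of their reliabilities. The Python sketch below only illustrates that arithmetic; the function name and the reliability values are hypothetical, and the criterion is assumed to be perfectly reliable unless a second value is given.

```python
import math


def max_validity_coefficient(test_reliability: float,
                             criterion_reliability: float = 1.0) -> float:
    """Upper bound on a test-criterion correlation (classical test theory).

    The observed correlation r_xy cannot exceed sqrt(r_xx * r_yy), where
    r_xx and r_yy are the reliabilities of the test and the criterion.
    The default criterion reliability of 1.0 is an illustrative assumption.
    """
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie in [0, 1]")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical test reliabilities, chosen only to show how the ceiling drops.
for rxx in (0.81, 0.64, 0.49):
    ceiling = max_validity_coefficient(rxx)
    print(f"test reliability {rxx:.2f} -> validity ceiling {ceiling:.2f}")
```

A less reliable test thus has a lower possible validity coefficient, which is why, under the early definition, reliability was treated as a precondition for validity.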
In the newer definition (especially after Messick), validity
refers to the meaning of the test scores, not a
property of the test

Final remarks on validity (and reliability,
fairness…):
they are not based on measurement principles alone;
 they are social values
 correlation coefficients and/or content validity
analyses are not enough to establish validity (Messick).


So, “score validation is an empirical evaluation of
the meaning and consequences of measurement”
(Messick)
CONSTRUCT VALIDITY

What is a construct?

A concept defined in such a way that
it becomes measurable (operational definition)
 it can be related to other, different constructs
(e.g., the more anxious, the less self-confident)


Construct validity
is the degree to which inferences can legitimately be made from the
operational definitions to the theoretical constructs on which those
definitions are based
What does this mean?

Two things to consider in construct validation:
Theory (what goes on in our mind: ideas, theories,
beliefs…)
 Observation (what we see happening around us; our
actual program/treatment)




i.e., we develop something (observation) to reflect
what is in our mind (theory)
Construct validity assesses how well we have
translated our ideas/theories into our actual
programs/measures
What does this mean in testing? How do we do it in
testing?
SOURCES OF INVALIDITY



Two major threats:
Construct underrepresentation: the assessment
is too narrow and does not include important
dimensions of the construct
Construct-irrelevant variance: the assessment is
too broad and contains variance associated with
other, distinct constructs
CONSTRUCT-IRRELEVANT VARIANCE




Two kinds
Construct-irrelevant difficulty (e.g., undue
reading comprehension demands in a test of subject-matter knowledge):
leads to invalidly low scores
Construct-irrelevant easiness (e.g., texts that are highly
familiar to some test takers): leads to invalidly high
scores
What do you think about KPDS/YDS in terms of
threats to validity?
SOURCES OF EVIDENCE IN CONSTRUCT
VALIDITY (MESSICK, 1995)
Construct validity = the evidential basis for score
interpretation
 How do we interpret scores?

Construct validity is needed for any score interpretation, not just
for interpretations of ‘theoretical constructs’
 How do we do this?

EVIDENCE-RELATED VALIDITY



Two types:
Convergent validity consists of providing
evidence that two tests that are believed to
measure closely related skills or types of
knowledge correlate strongly. (i.e. The test
MEASURES what it clasims to measure)
Discriminant validity consists of providing
evidence that two tests that do not measure
closely related skills or types of knowledge do not
correlate strongly. (i.e. The test does NOT
MEASURE irrelevant attributes)
ASPECTS OF CONSTRUCT VALIDITY

Validity is a unified concept, but it can be
differentiated into distinct aspects:
Content
 Substantive
 Structural
 Generalizability
 External
 Consequential

CONTENT ASPECT

Content relevance; Representativeness; Technical
quality (to what extent does it represent the
domain?)
It requires identifying the construct DOMAIN to
be assessed
 To what extent does the domain/task cover the
construct?
 All important parts of the construct domain
should be covered

SUBSTANTIVE ASPECT



The processes underlying the construct and the degree to which these
processes are reflected in test performance
It includes the content aspect, but empirical
evidence is also needed.
This evidence can come from a variety of sources, e.g.,
think-aloud protocols


The concept bridging the content and substantive aspects is
representativeness.
Representativeness has two distinct meanings:
Mental representation (cognitive psychology)
 Brunswikian sense of ecological sampling:
correlation between a cue and a property (e.g., the color
of a banana is a cue that indicates the ripeness of the
fruit)

STRUCTURAL ASPECT


Related to scoring
The scoring criteria and rubrics should be
rationally developed (based on the constructs)
GENERALIZABILITY



Interpretations should not be limited to the task
assessed
They should generalize to the broader construct domain
(degree of correlation between the task and the
others)
EXTERNAL VARIABLES


Scores’ relationship with other measures and
nonassessment behaviours
Convergent (correspondence between measures of
the same construct) and Discriminant evidence
(distinctness from measures of other constructs)
are important
CONSEQUENCES



Evaluating the intended and unintended
consequences of score interpretation and use, including both positive
and negative impact
However, negative impact should NOT result from
construct underrepresentation or construct-irrelevant
variance.
Two facets: (a) the justification of the testing, based
on either score meaning or consequences contributing to
score valuation; (b) the function or outcome of the
testing, either interpretation or applied use
FACETS OF VALIDITY AS A PROGRESSIVE
MATRIX (MESSICK, 1995, P. 748)
Crossing these two facets yields a four-fold classification:
                      Test Interpretation           Test Use
Evidential Basis      Construct Validity (CV)       CV + Relevance/Utility (R/U)
Consequential Basis   CV + Value Implication (VI)   CV + R/U + VI + Social Consequences


Construct validity appears in every cell of the matrix.
This means:
Validity issues are unified into a single concept
 but the distinct features of construct validity should also be
emphasized


What is the implication here?


Both meaning and values are intertwined in the validation
process.
Thus,

‘Validity and values are one imperative, not two, and test
validation implicates both the science and the ethics of
assessment, which is why validity has force as a social
value’ (Messick, 1995, p. 749).
CONSEQUENTIAL VALIDITY & WASHBACK


Messick’s unified view of construct
validity = considering the consequences of test
use (e.g., washback)
What does this mean in validity studies?
Washback is a particular instance of the
consequential aspect of construct validity
 Investigating washback and other consequences
is a crucial step in the process of test validation

i.e., washback is one (not the only) indicator of the
consequential aspect of validity
 It is important to investigate washback to
establish the validity of a test





Put differently:
The modern paradigm of validity has a
consequential dimension
Test impact is part of a validation argument
Thus, effect-driven testing should be considered:
testers should build tests with the intended
effects in mind

To put it all together
Value implication + Social consequences
= CONSEQUENTIAL VALIDITY
(two fairness-related elements of Messick’s
consequential validity)
IMPLICATION
Positive washback: consequential validity; promoting learning
Negative washback: lack of validity; unfairness
But who brings about washback (positive or
negative)?
 People in classrooms (teachers/students)?
 Test Developers?

For Fulcher and Davidson, it is the people in
classrooms
 Thus, more attention should be given to teachers’
beliefs about teaching and learning and the
degree of their PROFESSIONALISM

TASK A9.2

Course book (p.143)

Select one large-scale test you are familiar with.
What influence does it have, and on whom?
 Does it seem reasonable to define these tests in terms of
their influence as well?
