Validity
EDUC 307
Chapter 3
What does this test tell me?
 Validity as defined by Chase: "A test is valid to the
extent that it helps educators make an appropriate
inference about a specified quality of a student."
 Validity as defined by Mertler: "... the degree to
which evidence and theory support the
interpretations of test scores entailed by proposed
uses of tests."
 Validity as defined by Linn and Gronlund: "Validity is
an evaluation of the adequacy and appropriateness
of the interpretations and uses of assessment results."
What is Validity?
 Validity is the soundness of your interpretations and
uses of the students’ assessment results.
A. The validity of a test must be stated in terms of
a specific trait; assessments don't have generic
validity.
B. Validity is not an either/or situation. Tests have
varying degrees of validity.
C. To demonstrate validity, test makers, publishers,
and teachers should develop assessment procedures
that ask students to perform acts as similar as
possible to the skill about which educators are making
a decision.
Nature of Validity
When using the term validity in relation to
testing and assessment, keep these cautions
in mind:

A. Validity applies to the ways we interpret
and use the assessment results, not to the
assessment procedure itself. We sometimes
say, "validity of a test", when it is more correct
to say the validity of the interpretation and
use to be made of the results.
Nature of Validity
 B. Validity is a matter of degree; it
does not exist on an all-or-none
basis. Consequently, we should
avoid thinking of assessment results
as valid or invalid. Validity is best
considered in terms of categories
that specify degree, such as high,
moderate, or low validity.
Nature of Validity
 C. Validity is always specific to some
particular use or interpretation.
 No assessment is valid for all purposes.
 For example, the results of an arithmetic test
may have a high degree of validity for
indicating computational skill, a low degree
of validity for indicating arithmetical
reasoning, a moderate degree of validity for
predicting success in future mathematics
courses, and no validity for predicting
success in art or music.
Evidence of Validity
 There are standard procedures for determining the
level of validity of an assessment.
A. 3 types of validity:
 1. Content Validity - how precisely the test samples a
specified body of knowledge
 2. Criterion Related Validity - how closely the test's
results correspond with a defined criterion of
performance.
 3. Construct Validity - how closely the test relates to
behavioral characteristics described by a
psychological theory.
Content Validity
 Content Validity - planned sampling of the content of a
specified instructional program.
 1. To be valid, a classroom test should reflect the
instructional objectives in the unit in proportion to the
emphasis each received in teaching (see the sketch
below).
 2. Validity is determined by the extent to which the
test content samples the content of instruction.
 3. Content validity must be considered for every
assessment device.
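To make the idea of planned sampling concrete, here is a minimal Python
sketch, assuming a hypothetical unit in which instructional emphasis is
measured as hours spent per objective: it allocates test items in
proportion to that emphasis, the arithmetic behind a simple table of
specifications.

```python
# Minimal sketch: allocating test items in proportion to instructional
# emphasis. The objectives, weights, and test length below are hypothetical.

def allocate_items(emphasis: dict[str, float], total_items: int) -> dict[str, int]:
    """Distribute test items across objectives proportional to teaching emphasis."""
    total_weight = sum(emphasis.values())
    return {
        objective: round(total_items * weight / total_weight)
        for objective, weight in emphasis.items()
    }

# Hypothetical unit: hours of instruction spent on each objective.
emphasis = {"fractions": 6, "decimals": 3, "word problems": 3}
print(allocate_items(emphasis, total_items=40))
# -> {'fractions': 20, 'decimals': 10, 'word problems': 10}
```

Because of rounding, the allocations may not sum exactly to the intended
test length; small adjustments by hand are usually enough.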
Criterion Related Validity
Criterion Related Validity - shown in
the relationship between the outcomes
of an assessment device and a
performance criterion.

This relationship is typically shown
by correlation procedures.
Criterion Related Validity
 A. In assessment, correlation is used to show how closely test
scores correspond to the ranking of the skill the test is designed
to predict.
 1. Correlation is a statistical procedure that shows how closely students'
scores on one measure correspond in rank with scores on another.
 a. Correlation is represented by a correlation coefficient, a number that
shows how closely the two sets of data correspond (.00 to 1.0 for positive
correlation and .00 to -1.0 for negative correlation).
 b. Coefficients of 1.0 or -1.0 are almost nonexistent; most run between .25 and .80.
 c. .85 to 1.0 is high correspondence (between test and performance).
 d. .00 to .29 is low correspondence (between the test and the performance).
 e. With negative correlations, as test scores get higher, performance scores
get lower.
 f. Errors in prediction - predicted scores typically do not hit the actual ability
of a student but are likely to be a little above or below it.
 g. Tests with correlation coefficients above .50 are better predictors (see the
sketch after this list).
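As a concrete illustration, here is a minimal Python sketch, using
hypothetical score lists, that computes Pearson's r between scores on a
test and scores on the performance criterion it is meant to predict:

```python
# Minimal sketch: Pearson correlation between test scores and criterion
# performance. The score lists below are hypothetical.
from statistics import correlation  # Pearson's r; available in Python 3.10+

test_scores      = [55, 62, 70, 74, 81, 88, 90]   # predictor (test results)
criterion_scores = [60, 58, 72, 70, 85, 84, 93]   # criterion performance

r = correlation(test_scores, criterion_scores)
print(f"r = {r:.2f}")  # read against the bands above
```

Read the printed coefficient against the bands above: a value in the .85
to 1.0 band suggests high correspondence between test and criterion, while
one in the .00 to .29 band suggests the test tells us little about the
performance it is supposed to predict.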
Construct Validation
Construct Validation - evaluating the
correspondence between
how a test assesses a trait and
how a psychological theory or
construct says it should assess
it.
Finding the Validity of
Published Tests
 Finding the Validity of Published Tests
- publisher's test manuals and the
Mental Measurements Yearbook
 A. Test manuals contain reports on
how tests were developed and the
results when they were tried out.
 B. Mental Measurements Yearbook
is a multi-volume set of reports on
published tests of all types.

Applying Validity
Information
Applying Validity Information - teachers must make practical use of
information about validity to become more intelligent
assessment designers and users.
A. Guidelines for evaluating test validity as reported in
test manuals and journals
 1. Does the validation procedure fit the use to which the
test will be put?
 2. How well was the validation carried out?
 3. Does the evidence reported for the validation support
the use of the test?
 4. Without a reasonable amount of validity, the
assessment procedure is of very limited use to educators.


Consideration of
Consequences
Assessments are intended to contribute to improved student learning.
The question is, do they? And, if so, to what extent? What impact do
assessments have on teaching? What are the possible negative,
unintended consequences of a particular use of assessment results?
Teachers have an excellent vantage point for considering the likely
effects of assessments. First, they know the learning objectives that
they are trying to help their students achieve. Second, they are quite
familiar with instructional experiences that the students have had.
Third, they have an opportunity to observe students while they are
working on an assessment task and to talk to students about their
performances. This first-hand awareness of learning objectives,
instructional experiences, and students can be brought to bear on an
analysis of the likely effects of assessments by systematically
considering questions such as the following:
Consideration of
Consequences
 1. Do the tasks match important learning objectives?
WYTIWYG ("What you test is what you get") has
become a popular slogan. Although it is an
oversimplification, it is a good reminder that
assessments need to reflect major learning
outcomes. Problem-solving skills and complex
thinking skills requiring integration, evaluation, and
synthesis of information are more likely to be fostered
by assessments that require the application of such
skills than by assessments that require students merely
to repeat what the teacher has said or what is stated
in the textbook.
Consideration of
Consequences

2. Is there reason to believe that students
study harder in preparation for the
assessment? Motivating student effort is a
potentially important consequence of tests
and assessments. The chances of achieving
this goal are improved if students have a
clear understanding of what to expect on the
assessment, know how the results will be
used, and believe that the assessment will be
fair.
Consideration of
Consequences
 3. Does the assessment artificially constrain the
focus of students' study? If it is judged important to
be sure that students can solve a particular type of
mathematics problem, for example, then it is
reasonable to focus an assessment on that type of
problem. However, much is missed if such an
approach is the only mode of assessment. In many
cases the identification of the nature of the problem
may be at least as important as facility with
application of a particular formula or algorithm.
Assessments that focus only on the latter skills are not
likely to facilitate development of problem
identification skills.
Consideration of
Consequences
4. Does the assessment encourage or
discourage exploration and creative modes
of expression? Although it is important for
students to know what to expect on an
assessment and have a sense of what to do
to prepare for it, care should be taken to
avoid overly narrow and artificial constraints
that will discourage students from exploring
new ideas and concepts.
Factors Influencing Validity
 A careful examination of test items and
assessment tasks will indicate whether
the test or assessment appears to
measure the subject-matter content and
the mental functions that the teacher is
interested in assessing. The following
factors can prevent the test items or
assessment tasks from functioning as
intended and thereby lower the validity
of the interpretations from the
assessment results.
Factors Influencing Validity
 1. Unclear directions: Directions that do not clearly
indicate to the student how to respond to the tasks
and how to record the responses tend to reduce
validity.
 2. Reading vocabulary and sentence structure too
difficult (construct-irrelevant variance). Vocabulary
and sentence structure that are too complicated for
the students taking the assessment result in the
assessment's measuring reading comprehension and
aspects of intelligence, which distorts the meaning
of the assessment results.
Factors Influencing Validity
 3. Ambiguity. Ambiguous statements in assessment
tasks contribute to misinterpretations and confusion.
Ambiguity sometimes confuses the better students
more than it does the poor students.
 4. Inadequate time limits (construct-irrelevant
variance). Time limits that do not provide students with
enough time to consider the tasks and provide
thoughtful responses can reduce the validity of
interpretation of results. Rather than measuring what a
student knows about a topic or is able to do given
adequate time, the assessment may become a measure
of the speed with which the student can respond.
For some content (e.g., typing), speed may be important.
However, most assessments of achievement should
minimize the effects of speed on student performance.
Factors Influencing Validity
 5. Overemphasis on easy-to-assess aspects of the domain
at the expense of important but hard-to-assess aspects. It is
easy to develop test questions that assess factual recall
and generally harder to develop ones that tap conceptual
understanding or higher-order thinking processes such as
the evaluation of competing positions or arguments.
Hence, it is important to guard against underrepresentation
of tasks getting at the important, but more
difficult to assess, aspects of achievement.
 6. Test items inappropriate for the outcomes being
measured. Attempting to measure understanding, thinking
skills, and other complex types of achievement with test
forms that are appropriate only for measuring factual
knowledge will invalidate the results.
Factors Influencing Validity
 7. Poorly constructed test items. Test items
that unintentionally provide clues to the
answer tend to measure the students'
alertness in detecting clues as well as
mastery of skills or knowledge the test is
intended to measure.
 8. Test too short. A test is only a sample of
the many questions that might be asked. If
a test is too short to provide a representative
sample of the performance we are
interested in, its validity will suffer
accordingly.
Factors Influencing Validity
 9. Improper arrangement of items. Test items are
typically arranged in order of difficulty, with the easiest
items first. Placing difficult items early in the test may
cause students to spend too much time on these and
prevent them from reaching items they could easily
answer. Improper arrangement may also influence
validity by having a detrimental effect on student
motivation. This influence is likely to be strongest with
young students.
 10. Identifiable pattern of answers. Placing correct
answers in some systematic pattern (TT FF TT or ABCD
ABCD) enables students to guess the answers to some
items more easily, and this lowers validity (see the
sketch below).
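One simple safeguard is to let chance place the correct answers. Here is a
minimal Python sketch, assuming four-option items and a hypothetical test
length, that generates a key with randomized answer positions:

```python
# Minimal sketch: randomizing correct-answer positions so the key does not
# form a guessable pattern such as ABCD ABCD. The test length is hypothetical.
import random

def randomized_key(num_items: int, options: str = "ABCD") -> list[str]:
    """Return a correct-answer key with each position chosen at random."""
    return [random.choice(options) for _ in range(num_items)]

random.seed(307)  # fixed seed only so the example is reproducible
print("".join(randomized_key(12)))  # e.g. a non-patterned string of letters A-D
```

In practice this means writing (or rearranging) each item so that its
correct option falls in the randomly chosen position.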
Factors Influencing Validity
In short, any defect in the
construction of the test or
assessment that prevents it from
functioning as intended will
invalidate the interpretations to be
drawn from the results.
Evaluating Your Classroom
Assessment Methods
Content Representativeness
 1. Does my assessment procedure emphasize what I
taught?
 2. Do my assessment tasks accurately represent the
outcomes specified in my school’s and state’s
curriculum framework?
 3. Are my assessment tasks in line with the current
thinking about what should be taught and how it
should be assessed?
 4. Is the content in my assessment important and
worth learning?
Evaluating Your Classroom
Assessment Methods
Thinking Processes:
 5. Do the tasks on my assessment instrument require
students to use important skills and processes?
 6. Does my assessment instrument represent the kinds
of thinking skills that my school’s curriculum
framework and state’s standards view as important?
 7. During the assessment, do students actually use
the types of thinking I expect them to use?
 8. Do I allow enough time for students to
demonstrate the type of thinking I am trying to
assess?
Evaluating Your Classroom
Assessment Methods
Consistency:
9. Is the pattern of results in the
class consistent with what I
expect based on my other
assessments of them?
10. Do I make the assessment
tasks too difficult or too easy for
my students?
Evaluating Your Classroom
Assessment Methods
Reliability and Objectivity:
 Reliability: the consistency of assessment
results.
 Objectivity: the degree to which two or more
qualified evaluators will agree on what quality rating
or score to assign a student’s performance (see the
sketch after this list).
 11. Do I use a scoring guide for obtaining quality
ratings or scores from students’ performance on the
assessment?
 12. Is my assessment instrument long enough to be a
representative sample of the types of learning
outcomes I am assessing?
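To put a number on objectivity, here is a minimal Python sketch, using
hypothetical ratings, that computes the percent agreement between two
raters scoring the same set of student performances:

```python
# Minimal sketch: percent agreement between two raters as a simple index
# of objectivity. The rating lists below are hypothetical.

def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Fraction of performances given the same rating by both raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

rater_a = ["high", "high", "moderate", "low", "moderate", "high"]
rater_b = ["high", "moderate", "moderate", "low", "moderate", "high"]
print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")  # 5/6 ≈ 0.83
```

Higher agreement means the score a student receives depends less on who
happens to do the scoring.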
Evaluating Your Classroom
Assessment Methods
Fairness to Different Types of Students:
 13. Do I word the problems or tasks on my assessment
so students with different ethnic and socioeconomic
backgrounds will interpret them in appropriate ways?
 14. Do I modify the wording or the administrative
conditions of the assessment tasks to accommodate
students with disabilities or special learning
problems?
 15. Do the pictures, stories, verbal statements, or
other aspects of my assessment procedure
perpetuate racial, ethnic, or gender stereotypes?
Evaluating Your Classroom
Assessment Methods
Economy, Efficiency, Practicality, &
Instructional Procedures
16. Is the assessment relatively easy for me to
construct and not too cumbersome to use to
evaluate students?
17. Would the time needed to use this
assessment be better spent directly teaching
my students instead?
18. Does my assessment represent the best
use of my time?
Evaluating Your Classroom
Assessment Methods
Multiple Assessment Usage:
19. Do I use one assessment result
in conjunction with other
assessment results?
Positive Consequences for
Learning:
20. Do my assessments result in
both the students and myself
experiencing positive consequences
for learning?