Validity and reliability

Validity and reliability
In Research
Agenda
At the end of this lesson, you should be able to:
1. Discuss validity
2. Discuss reliability
3. Discuss how to achieve validity and reliability
4. Discuss validity in qualitative research
5. Discuss validity in experimental design
Reliability
The consistency of scores or answers from one administration of
an instrument to another, or from one set of items to another.
 A reliable instrument yields similar results if given to a similar
population at different times.

Validity


Appropriateness, meaningfulness, correctness, and
usefulness of inferences a researcher makes.
Validity of ??

Instrument?

Data?
Validity
• Internal validity is the extent to which research findings are free
from bias and from the effects of extraneous factors.
• External validity is the extent to which the findings can be
generalised.
Validity - Content-related evidence
• Content-related evidence of validity focuses on the content and
format of an instrument.
• Is it appropriate?
• Is it comprehensive?
• Is it logical?
• How do the items or questions represent the content? Is the
format appropriate?
Validity - Criterion-related evidence
This refers to the relationship between the scores obtained using
the instrument and the scores obtained using one or more
other instruments or measures. For example, are students’
scores on teacher-made tests consistent with their scores on
standardized tests in the same subject areas?
Validity - Construct-related evidence
Construct validity is defined as “establishing correct operational
measures for the concepts being studied” (Yin, 1984).
For example, if one is looking at problem solving in leaders, how
well does a particular instrument explain the relationship
between problem-solving ability and effectiveness as a
leader?
ATTAINING VALIDITY AND
RELIABILITY
Elements of content-related evidence
Adequacy : the size and scope of the questions must be large
enough to cover the topic.
 Format of the instrument: Clarity of printing, type size,
adequacy of work area, appropriateness of language, clarity of
directions, etc.

How to achieve content validity



Consult other experts who rate the items.
Rate items, eliminating or changing those that do not meet the
specified content.
Repeat until all raters agree on the questions and answers.
Criterion-related validity
To obtain criterion-related validity, researchers identify a
characteristic, assess it using one instrument (e.g., IQ test) and
compare the score with performance on an external measure,
such as GPA or an achievement test.
Validity coefficient
A validity coefficient is obtained by correlating a set of scores on
one test (a predictor) with a set of scores on another (the
criterion).
 The degree to which the predictor and the criterion relate is the
validity coefficient. A predictor that has a strong relationship to
a criterion test would have a high coefficient.
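As a rough illustration of the correlation described above, the sketch below computes a Pearson correlation between a predictor and a criterion. The slide does not say which correlation statistic is meant, so Pearson's r is an assumption here, and the function name `pearson_r` and the IQ and GPA figures are hypothetical.

```python
from math import sqrt


def pearson_r(predictor, criterion):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(predictor)
    if n != len(criterion) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Sum of cross-products and sums of squares around each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    ss_x = sum((x - mean_x) ** 2 for x in predictor)
    ss_y = sum((y - mean_y) ** 2 for y in criterion)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: IQ-test scores (predictor) and GPA (criterion)
iq_scores = [95, 100, 105, 110, 120, 130]
gpa = [2.4, 2.9, 2.8, 3.2, 3.5, 3.9]

validity_coefficient = pearson_r(iq_scores, gpa)
print(round(validity_coefficient, 2))
```

A value near 1 would indicate that the predictor relates strongly to the criterion. The sketch does not handle the case where every score in a list is identical, since the correlation is undefined there.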

Construct-related validity
This type of validity is more typically associated with research
studies than with testing.
It relates to psychological traits, so multiple sources are used to
collect evidence. Often a combination of observation,
surveys, focus groups, and other measures is used to identify
how much of the trait being measured the person being observed
possesses.

Proactive
Coping Skills
Reliability
The consistency of scores obtained from one
instrument to another, or from the same
instrument over different groups.
Errors of measurement
Every test or instrument has errors of measurement associated
with it.
These can be due to a number of things: testing conditions,
student health or motivation, test anxiety, etc.
Test developers work hard to ensure that these errors are
not grounded in flaws in the test itself.

Reliability Methods
• Test-retest: The same test is given to the same group on two
occasions.
• Equivalent-forms: A different form of the same instrument is
given to the same group of individuals.
• Internal consistency: Split-half procedure.
• Kuder-Richardson: Mathematically computes reliability from
the number of items, the mean, and the standard deviation of the test.
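Two of the methods listed above can be sketched in a few lines. The slide does not give formulas, so this assumes the usual odd/even split with the Spearman-Brown correction for the split-half procedure, and the KR-21 formula for the Kuder-Richardson approach (the variant that needs only the number of items, the mean, and the standard deviation). Function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Split-half reliability with the Spearman-Brown correction.

    item_scores holds one row per person and one score per item.
    """
    # Score the 1st, 3rd, 5th... items and the 2nd, 4th, 6th... items separately
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown: estimate reliability of the full-length test
    return 2 * r_half / (1 + r_half)


def kr21(num_items, mean_score, sd_score):
    """KR-21 estimate from the number of items, the mean, and the SD."""
    k = num_items
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * sd_score ** 2))


# Hypothetical 4-item test taken by 5 people (1 = correct, 0 = incorrect)
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(split_half_reliability(item_scores), 2))

# Hypothetical 50-item test with a mean of 40 and a standard deviation of 4
print(round(kr21(50, 40, 4), 2))
```

KR-21 assumes items of roughly equal difficulty that are scored right or wrong, so it should be read as an approximation rather than an exact figure.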

Reliability coefficient
• Reliability coefficient: a number that tells us how likely an
instrument is to be consistent over repeated administrations.
• Alpha, or Cronbach’s alpha: used on instruments where answers
aren’t scored “right” and “wrong”. It is often used to test the
reliability of survey instruments.
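A minimal sketch of Cronbach's alpha for a small survey follows. It uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), which the slide does not spell out. The function name and the Likert-style responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])  # number of items
    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 3-item survey on a 1-5 scale, answered by 5 respondents
survey = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(survey)
print(round(alpha, 2))
```

Because it works on item variances rather than right/wrong counts, the same calculation applies to rating-scale items such as those on attitude surveys.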
Standard error of the measurement
This is a calculation that shows the extent to which a
measurement would vary under changed circumstances. In
other words, it tells you how much of the error is due to issues
related to measuring.
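The slide gives no formula, but a commonly used one is SEM = SD × sqrt(1 − reliability), where SD is the standard deviation of the test scores. The sketch below assumes that formula. The function name, the SD of 15, the reliability of 0.91, and the observed score of 100 are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


# Hypothetical test: SD of 15 points and a reliability coefficient of 0.91
sem = standard_error_of_measurement(15, 0.91)

# A band of one SEM on either side of a hypothetical observed score of 100
observed = 100
band = (observed - sem, observed + sem)
print(round(sem, 1), tuple(round(b, 1) for b in band))
```

The more reliable the instrument, the smaller the SEM and the narrower the band around an observed score.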
INTERNAL VALIDITY
Validity
The term validity can be used in three ways:
• Instrument or measurement validity
• External or generalization validity
• Internal validity, which means that the relationship a
researcher observes between two variables should
be clear in its meaning rather than due to
something that is unclear (“something else”)
What is “something else”?
Any one (or more) of these conditions:
• Age or ability of subjects
• Conditions under which the study was conducted
• Type of materials used in the study
Technically, the “something else” is called a threat to internal validity.
Threats to internal validity
• Subject characteristics
• Loss of subjects
• Location
• Instrumentation
• Testing
• History
• Maturation
• Attitude of subjects
• Implementation
Subject characteristics
• Subject characteristics can pose a threat if
there is selection bias, or if there are
unintended factors present within or among
groups selected for a study. For example, in
group studies, members may differ on the basis
of age, gender, ability, socioeconomic
background, etc. These differences must be controlled for
to ensure that the key variables in the
study, not these characteristics, explain differences.
Subject characteristics
• Age
• Strength
• Maturity
• Gender
• Ethnicity
• Coordination
• Speed
• Intelligence
• Vocabulary
• Reading ability
• Fluency
• Manual dexterity
• Socioeconomic status
• Religious/political belief
Loss of subjects (mortality)
• Loss of subjects limits generalizability, but it can also
affect internal validity if the subjects who don’t
respond or participate are overrepresented in a
group.
Location
• The place where data collection occurs, or
“location”, might pose a threat. For example,
hot, noisy, unpleasant conditions might affect
test scores; situations where privacy is
important for the results, but where people are
streaming in and out of the room, might also pose a
threat.
Instrumentation
• Decay: If the nature of the instrument or the scoring procedure
is changed in some way, instrument decay occurs.
• Data Collector Characteristics: The person collecting data can
affect the outcome.
• Data Collector Bias: The data collector might hold an opinion
that is at odds with that of respondents, and this can affect the
administration.
Testing
• In longitudinal studies, data are often collected through more
than one administration of a test.
• If the previous test influences subsequent ones by getting the
subject to engage in learning or some other behavior that he or
she might not otherwise have done, there is a testing threat.
History
• If an unanticipated or unplanned event occurs during the course
of a study or intervention, there might be a history threat.
Attitude of subjects
• Sometimes the very fact of being studied influences subjects.
The best-known example of this is the Hawthorne Effect.
Implementation
• This threat can be caused by various things: different data
collectors, teachers, conditions in treatment, method bias, etc.
Minimizing Threats
• Standardize conditions of study
• Obtain more information on subjects
• Obtain as much information as possible on details of the study:
location, history, instrumentation, subject attitude, implementation
• Choose an appropriate design
• Train data collectors
Qualitative Research
Validity and reliability??
Qualitative research
• Many qualitative researchers contend that validity and
reliability are irrelevant to their work because they study one
phenomenon and don’t seek to generalize.
• Fraenkel and Wallen: any instrument or design used to
collect data should be credible and backed by evidence
consistent with quantitative studies.
Trustworthiness
Quantitative vs. Qualitative
Traditional Criteria for Judging Quantitative Research | Alternative Criteria for Judging Qualitative Research
Internal validity | Credibility
External validity | Transferability
Reliability | Dependability
Objectivity | Confirmability
In qualitative research
• Reliability pertains to the extent to which the study is
replicable and to how accurate the research methods and the
techniques used to produce data are.
• Objectivity of the researcher: the researcher must examine her
biases and preconceived notions of what she will find before she
begins her research.
• Objectivity of the interviewee
In qualitative research
• Triangulation
• Member check
• Audit trail
Let’s look at one particular design
Validity in experimental research
Experimental designs should be developed to ensure internal and
external validity of the study
Internal Validity:
• Are the results of the study (the dependent variable, DV) caused
by the factors included in the study (the independent variable, IV),
or are they caused by other factors (extraneous variables, EV)
which were not part of the study?
Threats to Internal Validity
Subject Characteristics (Selection Bias/Differential Selection)
The groups may have been different from the start. If you were
testing instructional strategies to improve reading and one group
enjoyed reading more than the other, that group might improve more
in reading because its members enjoy it, rather than because of the
instructional strategy you used.
Threats to Internal Validity
Loss of Subjects (Mortality)
All of the high- or low-scoring subjects may have dropped out of,
or been missing from, one of the groups. If we collected posttest
data on a day when the debate society was on a field trip, the mean
for the treatment group would probably be much lower than it
really should have been.
Threats to Internal Validity
Location
Perhaps one group was at a disadvantage because of its location.
The city may have been demolishing a building next to one of the
schools in our study, creating constant distractions that interfere
with our treatment.
Threats to Internal Validity
Instrumentation: Instrument Decay
The testing instruments may not be scored consistently.
Perhaps the person grading the posttest is fatigued
and pays less attention to the last set of papers
reviewed. Those papers may be from one
of our groups and will receive different scores than
the earlier group's papers.
Threats to Internal Validity
Data Collector Characteristics
The subjects of one group may react differently to the data collector
than the other group. A male interviewing males and females about
their attitudes toward a type of math instruction may not receive the
same responses from females as a female interviewing females would.
Threats to Internal Validity
Data Collector Bias
The person collecting data may favor one group, or some
characteristic some subjects possess, over another. A principal
who favors strict classroom management may rate students'
attention under different teaching conditions with a bias toward
one of the teaching conditions.
Threats to Internal Validity
Testing
The act of taking a pretest or posttest may influence the results of the
experiment. Suppose we were conducting a unit to increase student
sensitivity to racial prejudice. As a pretest we have the control and
treatment groups watch a movie on racism and write a reaction essay.
The pretest may have actually increased both groups' sensitivity, and we
find that our treatment group didn't score any higher on a posttest given
later than the control group did. If we hadn't given the pretest, we might
have seen differences between the groups at the end of the study.
Threats to Internal Validity
History
Something may happen at one site during our study that influences the results.
Perhaps a classmate was injured in a car accident at the control site for a study
teaching children bike safety. The control group may actually demonstrate more
concern about bike safety than the treatment group.
Threats to Internal Validity
Maturation
There may be natural changes in the subjects that can account for
the changes found in a study. A critical thinking unit may appear
more effective if it is taught during a time when children are
developing abstract reasoning.
Threats to Internal Validity
Hawthorne Effect
The subjects may respond differently just because they are being studied. The
name comes from a classic study in which researchers were studying the effect
of lighting on worker productivity. As the intensity of the factory lights increased,
so did the worker productivity. One researcher suggested that they reverse the
treatment and lower the lights. The productivity of the workers continued to
increase. It appears that being observed by the researchers was increasing
productivity, not the intensity of the lights.
Threats to Internal Validity
John Henry Effect
One group may believe that it is in competition with the other group and may work
harder than it would under normal circumstances. The term is generally applied to
the control group "taking on" the treatment group.
Threats to Internal Validity
Resentful Demoralization of the Control Group
The control group may become discouraged because it is not
receiving the special attention that is given to the treatment
group. Its members may perform below their usual level because of this.
Threats to Internal Validity
Regression (Statistical Regression)
A class that scores particularly low can be expected to score
slightly higher on a retest just by chance. Likewise, a class that
scores particularly high will tend to score slightly lower by
chance. The change in these scores may have
nothing to do with the treatment.
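Statistical regression can be illustrated with a small simulation in which no treatment is given at all. The sketch assumes each class has a stable "true" score plus independent random error on each test. The function name and all of the numbers (2000 classes, a mean of 70, spreads of 5 points) are hypothetical choices for illustration.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n_classes=2000, seed=0):
    """Compare extreme scorers on a first test with their retest scores."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_scores = [rng.gauss(70, 5) for _ in range(n_classes)]
    # Each observed score is the true score plus independent random error
    test1 = [t + rng.gauss(0, 5) for t in true_scores]
    test2 = [t + rng.gauss(0, 5) for t in true_scores]

    # Pick the bottom and top 10% of classes based on the first test only
    order = sorted(range(n_classes), key=lambda i: test1[i])
    cutoff = n_classes // 10
    lowest, highest = order[:cutoff], order[-cutoff:]

    return {
        "low_first": mean([test1[i] for i in lowest]),
        "low_retest": mean([test2[i] for i in lowest]),
        "high_first": mean([test1[i] for i in highest]),
        "high_retest": mean([test2[i] for i in highest]),
    }


result = simulate_regression_to_mean()
for label, value in result.items():
    print(label, round(value, 1))
```

In this setup the lowest-scoring classes score higher on the retest and the highest-scoring classes score lower, even though nothing was done to either group between the two tests.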
Threats to Internal Validity
Implementation
The treatment may not be implemented as intended. A
study where teachers are asked to use student modeling
techniques may not show positive results, not because
modeling techniques don't work, but because the teacher
didn't implement them or didn't implement them as they
were designed.
Threats to Internal Validity
Compensatory Equalization of Treatment
Someone may feel sorry for the control group because they
are not receiving much attention and give them special
treatment. For example, a researcher could be studying the
effect of laptop computers on students' attitudes toward
math. The teacher feels sorry for the class that doesn't have
computers and sponsors a popcorn party during math
class. The control group begins to develop a more positive
attitude about mathematics.
Threats to Internal Validity
Experimental Treatment Diffusion
Sometimes the control group actually implements the treatment.
If two different techniques are being tested in two different
third-grade classes in the same building, the teachers may share
what they are doing. Unconsciously, the control teacher may use
some of the techniques she or he learned from the treatment teacher.
Once the researchers are confident that the outcome (dependent
variable) of the experiment they are designing is the result of
their treatment (independent variable) [internal validity], they
determine to which people or situations the results of their study
apply [external validity].
External Validity:
• Are the results of the study generalizable to other
populations and settings?
• Population
• Ecological
Threats to External Validity (Population)
Population validity is the extent to which the results of a study can be generalized
from the specific sample that was studied to a larger group of subjects. It involves
the extent to which one can generalize from the study sample to a defined
population. If the sample is drawn from an accessible population, rather than the target
population, generalizing the research results from the accessible population to the
target population is risky.
Threats to External Validity (Ecological)
Ecological validity is the extent to which the results of an
experiment can be generalized from the set of environmental
conditions created by the researcher to other environmental
conditions (settings and conditions).
There are 10 common threats to external validity.
Threats to External Validity (Ecological)
Explicit description of the experimental treatment
(not sufficiently described for others to replicate)
If the researcher fails to adequately describe how he or
she conducted a study, it is difficult to determine
whether the results are applicable to other
settings.
Threats to External Validity (Ecological)
Multiple-treatment interference
(catalyst effect)
If a researcher applies several treatments,
it is difficult to determine how well each of the
treatments would work individually. It might be
that only the combination of the treatments is
effective.
Threats to External Validity (Ecological)
Hawthorne effect
(attention causes differences)
Subjects perform differently because they know they
are being studied. "...External validity of the experiment
is jeopardized because the findings might not
generalize to a situation in which researchers or others
who were involved in the research are not present"
(Gall, Borg, & Gall, 1996, p. 475).
Threats to External Validity (Ecological)
Novelty and disruption effect
(anything different makes a difference)
A treatment may work because it is novel and the subjects respond to the
uniqueness, rather than the actual treatment. The opposite may also occur:
the treatment may not work because it is unique, but given time for the
subjects to adjust to it, it might have worked.
Threats to External Validity (Ecological)
Experimenter effect
(it only works with this experimenter)
The treatment might have worked because of the
person implementing it. Given a different person, the
treatment might not work at all.
Threats to External Validity (Ecological)
Pretest sensitization
(pretest sets the stage)
A treatment might only work if a pretest is
given. Because they have taken a pretest, the
subjects may be more sensitive to the
treatment. Had they not taken a pretest, the
treatment would not have worked.
Threats to External Validity (Ecological)
Posttest sensitization
(posttest helps treatment "fall into place")
The posttest can become a learning experience. "For
example, the posttest might cause certain ideas presented
during the treatment to 'fall into place'." If the subjects had
not taken a posttest, the treatment would not have worked.
Threats to External Validity (Ecological)
Interaction of history and treatment effect
(...to everything there is a time...)
Not only should researchers be cautious about generalizing to other
populations, they should also be cautious about generalizing to a
different time period. As time passes, the conditions under which
treatments work change.
Threats to External Validity (Ecological)
Measurement of the dependent variable
(maybe only works with multiple-choice tests)
A treatment effect may only be evident with certain types of
measurements. A teaching method may produce
superior results when its effectiveness is tested with an
essay test, but show no differences when the
effectiveness is measured with a multiple-choice test.
Threats to External Validity (Ecological)
Interaction of time of measurement and treatment effect
(it takes a while for the treatment to kick in)
It may be that the treatment effect does not occur until several weeks after the end
of the treatment. In this situation, a posttest at the end of the treatment would
show no impact, but a posttest a month later might show an impact.
NEXT WEEK
Consultation