Validity of Experimental Designs

Reliability and Validity of Dependent
Measures
Validity of Dependent Variables



Does it measure the concept?
Construct Validity: Does the DV really capture
what you want to measure (good operational
definition?)
Or does it also pick up mood, culture or gender
bias, confusing wording, observational bias,
etc.?
Indicators of Construct Validity



Face Validity: Does it appear to be a good
measure (do experts think so?)
Predictive Validity: Does it predict later behavior?
(GRE = grad school success?)
Concurrent Validity: Do groups known to differ
actually get different scores? (Self Monitoring)
Indicators of Construct Validity



Convergent Validity: do other kinds of
ratings agree? Similar responses to similar
scales
Divergent validity: is it different from other
constructs? (measures intelligence, not SES or
gender bias; shyness isn’t loneliness)
Reactivity- knowing you are being studied
changes behavior
Reliability of DV



Are results repeatable?
All measurement contains true score plus
error of measurement
Not an issue of replication- the same
subjects should get the same scores
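
The "true score plus error" idea can be illustrated with a small simulation. This is only a sketch under assumed numbers: the mean of 100, the SD of 15, the error sizes, and the function names are all hypothetical choices, not values from the lecture. It gives each simulated subject a fixed true score, measures them twice with independent random error, and correlates the two measurements.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def simulate_test_retest(n_subjects, error_sd, seed=0):
    """Observed score = true score + random error, measured twice."""
    rng = random.Random(seed)
    # Hypothetical true scores (mean 100, SD 15); they do not change over time.
    true_scores = [rng.gauss(100, 15) for _ in range(n_subjects)]
    time1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    time2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pearson_r(time1, time2)


low_error = simulate_test_retest(2000, error_sd=5)    # small measurement error
high_error = simulate_test_retest(2000, error_sd=30)  # large measurement error
print(f"small error: r = {low_error:.2f}; large error: r = {high_error:.2f}")
```

With the same subjects and the same true scores, the only thing that lowers the correlation here is the size of the measurement error.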
Types of Reliability




Inter-rater reliability- calculate r for
observers or Cohen’s Kappa
Internal consistency- split half reliability;
Cronbach’s Alpha is roughly the average of all
possible split-half correlations
Temporal consistency- test-retest reliability
with SAME people
Restaurant example
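
Internal consistency can be computed directly from item-level responses. The sketch below uses the standard variance form of Cronbach's Alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response data are made up for illustration only.

```python
import statistics


def cronbach_alpha(responses):
    """responses: one list per respondent, each holding k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup the scores item by item
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering a 3-item scale (1-5 ratings).
noisy = [[4, 5, 4], [2, 3, 3], [5, 4, 5], [1, 2, 1], [3, 3, 4]]
# Hypothetical data where every item agrees perfectly within a respondent.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]

alpha_noisy = cronbach_alpha(noisy)
alpha_consistent = cronbach_alpha(consistent)
print(round(alpha_noisy, 2), round(alpha_consistent, 2))
```

When the items rank respondents identically, as in the second data set, alpha reaches its maximum of 1; disagreement among items pulls it down.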

Can a variable be reliable and not valid?
Valid and not reliable?

How do you know you have a good DV?
– Mental Measurements Yearbook
Validity of Experimental Designs
Survey Design
Internal validity

Does the design test the hypothesis we want
it to test? Did IV manipulation cause change
in DV? Can we infer causality?

What if internal validity is low?
External validity

Does your study represent a broad
population?

Caution with Discussion Section if weak
Random Sampling
– Stratified Sampling
– Block Randomization
Ecological validity
Does study reflect the real world- do people really behave this way?
Can you study anything without changing it?
Threats to Internal Validity:

In pre-post design:
– Test participants
– Administer IV
– Post test for effect of IV
– Compare pre vs. post results to look for effect of IV
History

World events may cause change in attitudes
or behavior over time.

Tests of patriotism pre/post 9/11
Views of President pre/post Katrina
Attitudes of adolescents pre/post Cobain
suicide


Maturation

Individuals change over time as they mature.

Issue for studies of children, but also huge
growth in freshman year- change of attitudes
and behavior.
Testing

Taking part in the study (for example, taking the
pretest) may itself cause differences in behavior.

Similar to REACTIVITY, but for entire study
not just DV. Parenting study for example
Instrumentation

Use of instrument may get better or worse
with time

Observation studies
Testing skill/ interviewing

Regression toward the mean

Extreme scores do not tend to be repeatable- those
who score very high or very low on a test will be
closer to the average if tested again.

A big issue for any study where pretest is
used to select subjects for post test.
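
A short simulation shows why selecting subjects on a pretest is risky. This is a sketch with hypothetical numbers (score mean, error size, a top-10% cutoff) and hypothetical names; no treatment is applied at all, so any pre/post change in the selected group is regression toward the mean.

```python
import random
import statistics


def regression_demo(n=5000, seed=1):
    """Select the top 10% on a noisy pretest, then retest them untreated."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    # Each test = true score + independent measurement error.
    pretest = [t + rng.gauss(0, 10) for t in true_scores]
    posttest = [t + rng.gauss(0, 10) for t in true_scores]
    cutoff = sorted(pretest)[n * 9 // 10]  # score marking the top 10%
    selected = [i for i in range(n) if pretest[i] >= cutoff]
    pre_mean = statistics.fmean([pretest[i] for i in selected])
    post_mean = statistics.fmean([posttest[i] for i in selected])
    grand_mean = statistics.fmean(pretest)
    return pre_mean, post_mean, grand_mean


pre_mean, post_mean, grand_mean = regression_demo()
print(f"selected group: pretest {pre_mean:.1f}, posttest {post_mean:.1f}; "
      f"everyone: {grand_mean:.1f}")
```

The selected group scores lower on the post test even though nothing was done to them, which a pre-post design without a control group could mistake for an effect of the IV.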
Mortality

Those who drop out of your study may differ
from those who choose to continue.
Placebo effect

If given any treatment, behavior may change,
even if treatment was not meaningful. (fake
drugs get some results)
How can we improve internal validity?







History
Maturation
Testing
Instrumentation
Regression toward the mean
Mortality
Placebo effect
Improved Design

In pre-post design:
– Test participants
– Administer IV
– Post test for effect of IV
– Compare pre vs. post results to look for effect of IV

Two Group design:
– Pretest (do you need to do this?)
– RANDOMIZED assignment to levels of IV
– Compare post test results of IV and Control groups
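
Randomized assignment to levels of the IV can be sketched in a few lines. The function name, the condition labels, and the participant IDs below are hypothetical; the approach shown (shuffle, then deal participants out in turn) is one simple way to get random groups of equal size, not the only one.

```python
import random


def randomly_assign(participants, conditions=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, person in enumerate(shuffled):
        # Dealing in turn keeps group sizes equal (or within one of each other).
        groups[conditions[position % len(conditions)]].append(person)
    return groups


# Hypothetical participant IDs.
participants = [f"P{number:02d}" for number in range(1, 21)]
groups = randomly_assign(participants, seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Because chance alone decides who lands in each group, pre-existing differences among participants should be spread evenly across the IV and Control groups on average.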
Extraneous Variables

Any variable that you have not measured or
controlled (RA) that may impact the results of
your study
Demand Characteristics

Participants behave in ways demanded by
the situation or experimental set-up.
Behavior does not reflect actual beliefs or
attitudes.

Issue of Ecological Validity
Subject Bias

Bias brought on by subjects’ beliefs
(Overhead of mood and menstrual cycle)
Social desirability

Subjects want to do the “right thing” and try
to guess what the experimenter wants, and
do not behave naturally.

How to reduce Subject biases?
Experimenter Bias

Experimenters’ behavior and expectations
can sway results of test.

How to reduce these biases?
Floor & Ceiling Effects

If measures are too easy or too difficult you
will not see differences between groups.

Pilot test with similar subjects!
Order effects

When using within subjects designs, order of
presentation can affect results in several ways.
Practice effects: Subjects get better at task with
successive trials
Fatigue effects: Subjects get tired and do worse
or lose interest
Carryover effects: a subject’s experience in one
condition impacts results of another
condition- subject bias or anchoring and
adjustment issues.
How to reduce order effects

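Counterbalancing
– Does not get rid of order effects, it just makes them
equal for all groups.
– Can do complete counterbalancing (every possible
order) if small number of conditions.
– Latin Square counterbalancing
– First row: A, B, skip, C, skip, D, etc., then fill the
skipped slots from the end of the list:
A, B, N, C, N-1, D, N-2, E, etc. Each later row shifts
every condition forward by one letter.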
A Latin Square for 6 conditions

Order   Sequence of conditions
1       A  B  F  C  E  D
2       B  C  A  D  F  E
3       C  D  B  E  A  F
4       D  E  C  F  B  A
5       E  F  D  A  C  B
6       F  A  E  B  D  C
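
The Latin Square construction above can be sketched in code. The function name is hypothetical. The first row follows the A, B, N, C, N-1, D pattern and each later row shifts every condition forward by one. This sketch assumes an even number of conditions; with an odd number, a single square of this kind does not balance which condition follows which.

```python
def balanced_latin_square(conditions):
    """Build one order per row using the A, B, N, C, N-1, D, ... pattern."""
    n = len(conditions)
    # Index positions for the first row: 0, 1, n-1, 2, n-2, 3, ...
    first_row = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first_row) < n:
        if take_low:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
        take_low = not take_low
    # Each later row shifts every condition forward by one (wrapping around).
    return [[conditions[(index + shift) % n] for index in first_row]
            for shift in range(n)]


square = balanced_latin_square(list("ABCDEF"))
for order_number, row in enumerate(square, start=1):
    print(order_number, " ".join(row))
```

For six conditions this yields the same six orders as the table: every condition appears once in each position, and every condition is immediately followed by each other condition exactly once.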
Pretest Vs. Pilot test

When do you use a pilot test?

When do you use a pre test?
Can a DV be reliable but not valid?
Experimental Validity

What to do if low Internal Validity?

What are impacts of low External Validity?

What if Ecological Validity is low?