The Basics of Experimentation
Lecture Outline
I. Independent variables
A. Definition
B. Manipulated vs. Subject
II. Dependent variables
III. Operational definition
A. Experimental operational definitions
B. Measured operational definitions
C. Scales of Measurement
D. Evaluation of operational definitions
1. Reliability
2. Validity
i. face
ii. content
iii. predictive
iv. construct
IV. Evaluation of the experiment
A. Internal validity
1. Extraneous variables
2. Confounding variables
3. Classic threats to internal validity
i. History
ii. Maturation
iii. Testing
iv. Instrumentation
v. Statistical regression
vi. Subject mortality
vii. Selection
viii. Selection interactions
B. External validity
• ----------------------IN CLASS CONFOUNDS PROJECT -------------
• What is an experiment?
• Smart-Pill Study
– Independent Variable?
– Dependent Variable?
• Operational definition
Independent variables vs. Subject variables
• Subject Variables
– characteristics of our subjects that we
might be interested in but cannot randomly
assign
• Age
• Height
• Weight
• Gender
– not a true independent variable
• not manipulated
• AKA predictor variables
• Both independent variables and
dependent variables must be carefully
defined
– Operational Definitions
• Experimental Operational Definitions
– Operational Definition of IV
• Measured Operational Definitions
– Operational Definition of DV
• How we operationally define our constructs can
affect the outcome of a study.
– Schachter (1959)
– The effect of anxiety on the need to affiliate
• Manipulated anxiety by telling subjects different stories about
how much shock will hurt
– Gp1
» Dr Gregor Zilstein
» “It is necessary for the shocks to be intense”
– Gp 2
» The shock will be more like a tickle
• Measured whether the subjects waited with others or alone
• Sarnoff and Zimbardo (1961)
– Does anxiety cause a need to affiliate
– Male subjects
• we are studying the physiology of the mouth
– High anxiety
• Suck on Pacifier
• Baby bottle
• Breast shield
– Low anxiety
• Whistles
• Kazoos
• We also must operationally define
nonconstruct variables
– Drug exposure – not an abstract concept like
anxiety
• Amount
• Route of administration
– intravenous (IV), intraperitoneal (IP), subcutaneous (SC), oral
• Injection volume
• When we operationally define our
measurement (DV) we also consider what
scale of measurement we are using
• Scales of measurement
– Nominal
– Ordinal
– Interval
– Ratio
• Nominal
– This is the simplest level of measurement
– Classification by name only
– Classifies items into two or more distinct categories
• You are either one or the other
– You can be Democratic or Republican,
• we don’t know how democratic or republican you are.
– Gender
– Ethnicity
– Marital status
– Jenni (1976)
• Book carrying techniques
– type1 - wrap one or both arms around the books with the short
edges of the books resting on the hip or in front of the body
– type 2 - support the books by one arm and hand at the side of
the body, with the long edges of the books approximately
horizontal to the ground.
– Observed 2,626 individuals in six colleges in the US,
one in Canada, and one in Central America.
• 82% of females used type 1
• 96% of males used type 2
• Ordinal
• A higher scale value has more of that attribute.
– The intervals between adjacent scale values are
indeterminate.
– Scale assignment is by the property of "greater than,"
"equal to," or "less than."
• Movie ratings (0, 1, or 2 thumbs up)
• U.S.D.A. quality of beef ratings (good, choice, prime)
• Gold, Silver, Bronze – Olympics
• The rank order of anything
• Adds the ordering relations of greater than
and less than
• Interval
– Intervals between adjacent scale values are
equal
• the difference between 8 and 9 is the same as the
difference between 76 and 77.
– Standard operations of addition,
multiplication, subtraction, and division can be
performed
• Fahrenheit and Centigrade
• Ratio
• There is a rational zero point for the scale.
• Ratios are equivalent
– e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4.
• Examples
– Degrees Kelvin.
– Annual income in dollars.
– Length or distance in centimeters, inches, miles, etc.
• Allows Multiplication and division of scale
values.
• Ratio and interval scales tend to be
preferred because the statistics that can
be used on them tend to be more powerful
– t-tests and Anova
– vs. Chi Square
• Evaluating Operational Definitions
– Reliability vs. Validity
• Precision vs. Accuracy
• Throwing Darts - or Sighting a gun
• Forms of Reliability
– Reliable definition of your I.V.
• let’s say you hypothesize that hungry rats will learn
a maze more quickly (when food is the reward)
than rats that aren’t hungry.
• What is the IV?
• What is the DV?
• Hunger is a hypothetical construct
– how can we make a reliable operational definition of this
construct?
• Reliable operational definitions of your DV
– How can we be certain that our measurements
always come out the same?
• This can be examined in multiple ways
– 1) Inter-rater reliability
• Do different observers agree with the measurement
• have 2 people make the observations and see if their
observations are correlated.
– should be high positive correlation
• 2) Test-Retest reliability
– If the same individual is measured twice with the
same instrument – you should get similar results
• GRE
• IQ
• 3) Inter-item reliability
– individual items on a questionnaire designed to
measure the same thing should be highly correlated
• Cronbach’s Alpha you will calculate for the survey project
• Validity
– Are we manipulating the variable we intend
to manipulate?
• Anxiety example: Schachter vs. Sarnoff and Zimbardo
– Are we measuring the variable that we intend
to measure?
• IQ test – does it really measure intelligence?
Forms of validity
• Some Types of Validity
• Face validity
– The degree to which a manipulation or
measurement is self-evident
– Does it look like it should measure what it intends
to? Does it just make sense?
• Heart rate and anxiety?
– This is a weak form of validity.
• May be a problem if face validity is low though
– subjects might not take it seriously
• Content validity
– The degree to which the content of a measure
reflects the content of what is measured.
• Students often question the content
validity of exams
– “You only asked me about the things I didn’t
know.”
– “You didn’t ask me about the stuff I did know.”
• Predictive validity
– The degree to which a measure, definition, or
experiment yields information allowing prediction
of actual behavior or performance
• If people do well on the IQ test and it is really
getting at intelligence, we would expect them
to do well at other tasks that require
intelligence.
– School
– Other tests
– Difficult jobs
• Construct validity
– Is the entire measure valid?
– The degree to which an operational definition
accurately represents the construct it is
intended to manipulate or measure AND
ONLY that construct
Construct validity continued
• Does the test discriminate between constructs
– Let’s say I designed a new test to predict how well
adjusted people are
• The Kaiser attitude adjustment survey
– but it also measures verbal aptitude?
• In other words people’s verbal aptitude score is highly
correlated with their Kaiser attitude adjustment survey score
– Problem?
• Evaluating the validity of the experiment itself
– The forms of validity that we have discussed so far
have been in terms of defining our variables
• operational definitions
• We also need to determine whether the
experiment itself is valid
– Internal validity
• The degree to which a researcher is able to show a causal
relationship between antecedent conditions and subsequent
behavior
– External Validity
• How well the Findings of the Experiment generalize or apply
to situations in real life
• Experiments tend to be high in internal validity and
low in external validity
– contrived (controlled) situations
• rats in cages exposed to cocaine
– far from real life human cocaine use.
• Keep in mind, though, that an experiment that is
low in internal validity (confounded) also has low
external validity.
– The experiment can tell us nothing because it is
confounded.
Confounding
• Experiments are most concerned with achieving
high internal validity.
– External validity is less of a concern.
• Internal validity
– Occurs when we are relatively certain that the
changes in the DV were due to the levels of the IV
– if there could be other reasons for the changes, then
our experiment is not internally valid.
• These other reasons (if they exist) are known as confounds
Extraneous Variables versus Confounds
• The better control we have over an experiment
the more likely our internal validity will be good.
– But – there are often other things that are changing
during an experiment.
• If these other things (variables) are outside the
experimenters control we call them extraneous
variables
– Could be anything.
• changes in noise
• changes in temperature
• equipment failure
• differences in subject preparation (some more alert than others)
• Extraneous variables should always be
avoided if possible
• but they are not a threat to the internal
validity of the experiment unless they vary
systematically with the independent
variable.
– When this occurs it is known as a confound
• Thus, a confound is a type of extraneous variable
• Extraneous variables are not necessarily
confounds
• Confounds are the worst form of
extraneous variable
– If an extraneous variable varies systematically
with the independent variable we cannot know
if the changes we see in the dependent
variable are due to the IV or the confounding
variable.
• we cannot infer causality
• thus, low internal validity
• Extraneous variables alone are not necessarily
confounds and won’t necessarily be a threat to
the internal validity of an experiment
– Alzheimer’s rat experiment
• Morris water maze
– old rats
– young rats
– drug
– no drug
– What is the IV, DV, Subject Variable?
• Jack hammering going on in adjacent building
– All rats experienced the jack hammering
• Extraneous or confound?
• Keep in mind extraneous variables are not good even if they are not
confounding.
– Makes it more likely you will make a Type II error
• Type II error
– Not finding an effect that is actually there
– failing to reject the Null hypothesis when it is actually false
– Because extraneous variables increase the variability of responding
(behavior) it makes it more difficult to find significant effects
• thus, it lowers our power.
• Extraneous variables alone, however, do not cause Type I errors
– Finding an effect that is not really there
– Rejecting the Null hypothesis when it is true
• Confounds can lead to Type I errors
• It would have been a confound if I had only run
the control rats during the jack hammering, and
the drug-exposed rats when it was quiet.
• I didn’t do that, but the jack hammering could
have increased the variability in responding to
make it more difficult to find a difference
– If I find a difference despite this extraneous variable I
would not be concerned about the internal validity of
the study.
– I found an effect under poor conditions.
8 classic threats to Internal Validity
– Donald Campbell identified 8 kinds of
extraneous variables that can threaten the
internal validity of experiments
• Each of these is a potential confound
• 1) History (External Events)
– Refers to the history of the experiment
• Changes in the subject’s environment that might affect the
results
– What if during testing of a new anxiety treatment
• The 9/11 attacks take place
– What if while testing treatments for asthma
• summer smog occurs in the city
– 1970s, when Head Start began
• Sesame Street also began
– proper controls can eliminate this issue
• 2) Maturation
– refers to any internal (physical or psychological) changes in
subjects that could affect the dependent measure
• Performance could get worse
– boredom
– fatigue
• Performance could get better
– Within subject designs always have to worry about this
• Performance at the end of a session could be better or worse
because of the above reasons
• Make sure your IV conditions are evenly distributed across time
– A classic issue in medical research is spontaneous recovery
• people sometimes just get better over time.
• 3) Testing
– This refers to effects on the DV produced
because of previous administration of the
same test.
• single group pretest/posttest design
– Subjects might think back on the initial test and figure out
things they wish they had done differently.
• GRE/Kaplan/GRE
– increase in performance could be the result of taking the
test previously
– Practice effect
• 4) Instrumentation
– When some feature of the measuring instrument itself
changes over the course of an experiment
– Obviously if your equipment breaks that would be a
problem
– What if using human assessment?
• train someone to observe behavior
– aggression
• They could get better at observing aggression through
practice
– If one group is assessed early in the semester and one late in
the semester could be a problem (instrumentation confound)
• 5) Statistical regression
– This can occur whenever people are assigned to conditions
based on extreme scores on a test
– Regression toward the mean
• Draw the normal curve
– IQ – look at the smartest people
– make some manipulation
– test them again
– likely to score more toward average
– The scores will tend to decrease, or regress to the mean, regardless of
the manipulation
– Tony Delk nine 3-pointers
– An early study of Sesame Street indicated that it was especially
helpful to the most disadvantaged kids
• Kids with the lowest scores improved the most
• 6) Subject Mortality
– If more subjects drop out of one condition compared to another
– Can be a clear problem in animal studies
• high dose of cocaine – 50% mortality
– dealing with a “super” subgroup of rats
– in human studies
• dropout of experimental condition because aversive or boring
– those that stay are a special subgroup of people
– not the same as the control group
– Children in Head Start that are having trouble might drop out
• leaving only the better students
• thus, higher scores
• 7) Selection
– Whenever the researcher does not randomly assign
to the different conditions of the experiment
• When we randomly assign to groups we spread individual
differences among our subjects across the conditions
• without random assignment individual differences could
affect the results of our study.
– Self selection bias
• when participants choose which group they will be in
– perhaps parents choose to place their kids in Head Start
because they are more motivated for their kids to learn
• 8) Selection interactions
– A selection threat can combine with another threat to form a selection interaction
• Selection & History
– Head Start kids are more motivated (parents chose the program) and more likely to watch
Sesame Street when it came on
• Selection & Maturation
– Head Start kids are more motivated and thus more likely to develop skills faster than the
control condition
• Selection & Testing
– Head Start kids are more motivated, so they figure out trick questions better, leading to
improved performance on the second test
• Selection & Instrumentation
– Perhaps observers in the control group get bored and are less likely to pick up on kids’
improvement
• Selection & Mortality
– Perhaps you lose more motivated (good) students from the control group to other preschool
programs
• Selection & Regression
– If kids who are the furthest behind are assigned more often to the treatment group, they
would show greater improvement
• Work on CONFOUND WORKSHEETS