Research Methods II Notes
Scientific Thinking & Psychological Science
Scientific Approach is preferred if interested in causality
many ways of fixing beliefs- faith, scientific approach
scientific approach best if looking at whether something causes something else
Historical Examples where people have failed to use the scientific method:
a) Benjamin Rush (1793)- yellow fever research and bloodletting
he thought it was caused by too much blood- he saw many people die
he didn't look for contrary evidence
if he drew blood and it didn't work, then he hadn't drawn enough blood or people hadn't come to
him soon enough- he couldn't lose
b) Joseph Goldberger - Pellagra
illustrates the difference between correlation and true causation
pellagra is a disease characterised by open sores, vomiting, and a runny nose- it was causing 100,000 deaths in the US per year
most physicians noted that the people who got pellagra lived in poor areas- open sewers, etc.
Goldberger didn't believe it- he thought it was due to the poor diet those same poor people were living on- they couldn't afford to eat healthy food
he did an experiment: he and volunteers ingested feces, blood, urine, etc. from pellagra patients, and none of them got pellagra- so it isn't an infection
this supported his diet idea, so he took volunteers from a prison and put them on low-protein diets- they developed pellagra- it showed that it actually was the low-protein diet
Characteristics of a Scientific Experiment
• Manipulation (of the independent variable)
• Control (of extraneous variables - potential confounds)
• Measurement [of the dependent variable(s)]
• Comparison (of the measurements with appropriate statistics)
there will be appropriate and inappropriate statistics in every study
Scientific Observations
Two rules:
1. Operational Definitions: things you are measuring must be operationally defined
2. No Distortion: no distortions introduced by you when you are doing your measurements

Why operational definitions?
- avoids confusion - do we all mean the same thing when we talk about personality or intelligence?
- makes the study reproducible - it's OK to have mistakes; the fact that we use statistics means we accept that we might make a mistake (alpha levels)
- makes measurements reliable (the same results every time you do the study)
But
- trade-off between the precision of an operational definition and its construct validity
- talking to someone in the street, the topics would be more broad
- if you asked "does pornography influence negative views of men towards women?" you would have to operationalize what you are talking about; it's interesting that on TV or in everyday discussions people will have long debates without any operationalization
Sources of Distortion?
a) Instruments: when they were first looking at planets the cameras showed that the
planets had these coloured rings around them but it was just a problem with the camera
b) sampling: it is impossible to get a random sample of people, you will always have some
sort of biases
c) observer bias: big problem; you tend to see what you expect to see
Observer Bias
Two historical examples
• René Blondlot's N-rays
- he was one of the world's preeminent physicists
- French, at the University of Nancy
- shortly after X-rays were discovered
- he wanted to know whether X-rays could be polarized- he thought he had discovered a new kind of ray, which he called N-rays
- he created special materials, and everyone else could see the N-rays using certain methods
- one researcher, Robert Wood, wasn't able to replicate Blondlot's findings, and the review board of Nature didn't believe it either
- Wood secretly took out the aluminum prism (the critical piece) and the N-rays were still "seen"; what the observers reported tracked what they believed was in the apparatus, not what actually was
• Clever Hans
- a horse that could apparently answer mathematical problems
- but what he really knew was how to read people- they would look a different way when he was close to the answer
Current examples
- female mate choice in wolves- researchers didn't believe it before… men doing the research, biases
• Alex - the talking parrot
• Koko - the signing gorilla: uses American Sign Language; IQ estimated between 70-95; about 2000 words
"you can only see what you believe" anonymous
Flotsam and Jetsam Assignment
(stuff to watch for when surfing)
- challenge the scientific basis for claims, not the truthfulness of the claim
- proper attribution of causal relations (control for placebo effects?)
- look at where people got their PhDs from
- data is not always information, which is not always knowledge
Characteristics of a Scientific Experiment
• Manipulation (of the independent variable)
- is the manipulation done within subjects or between subjects?
• Control (of extraneous variables - potential confounds)
• Measurement [of the dependent variable(s)]
• Comparison (of the measurements with appropriate statistics)
Hypothesis generation
- the hardest part can be coming up with the right question
• interested in amnesia (e.g., read about "The Lost Mariner" - Oliver Sacks)- interviewed someone with no memory- a man in the prairies with a memory span of about 1 second
• read "The Man Who Mistook His Wife for a Hat"
• effects of "shock" on memory - evidence from electro-convulsive therapy (ECT)
• Wonder if an emotional shock will cause retrograde amnesia (forgetting things from
before the incident)?
Converting question to an analytical experiment
• operational definitions
• subject selection
• subject assignment
• variety of "mechanical details"
Experimental Hypothesis
• An emotional shock will disrupt memory for events that occur immediately prior to its occurrence- it will influence information you received a few seconds before the emotional shock
• Which words require operational definitions?
• An emotional shock will disrupt memory for events that occur immediately prior to
occurrence.
• Final, operationally valid version:
An unexpected 15-second scene that portrays a mutilated body at the end of a 10-minute travel film will disrupt a person's memory of the prices of 10 items listed in the film during the one-minute period immediately preceding the final scene
- you could come up with other valid operationally defined versions of the same question
Why operational definitions?
• avoids confusion
• makes the study reproducible
• makes measurements reliable- redo it and get similar results
But
- trade-off between the precision of an operational definition and its construct validity
Converting question to an analytical experiment
• operational definitions
• subject selection
- population
- sampling
• subject assignment
• variety of "mechanical details"
Using Introductory Psychology Students as subjects (participants)
1. Conclusions based on undergraduate subjects are not wrong; they just require further tests to ensure generalizability. If the results don't replicate, they are still not wrong, just incomplete (once you have the results from the undergraduate student data you can compare that data to other groups like "retired military people").
2. Much of psychological research is so basic that subject sampling is irrelevant.
3. There is considerable replication among different undergraduate populations. This gives geographical and SES variability that helps support generalizability. (You can generalize to student populations at other universities that might differ in such ways as whether or not students have to pay for university, etc.)
• This is a legitimate problem, but it is often overstated.
Converting question to an analytical experiment
• operational definitions
• subject selection- undergraduates, volunteers (can also have problems)
- population
- sampling
• subject assignment
• variety of "mechanical details"
Converting question to an analytical experiment
• operational definitions
• subject selection
• subject assignment
• variety of "mechanical details"
Final Experiment
- 20 intro psychology subjects, 10 randomly assigned to each group- a bivalent experiment (one group will see the shocking material, one won't)
- with random assignment, people have equal chances of being in each group
- random assignment could still leave a confound: from a typical intro psychology pool you will most likely get about 15 or more females and only a few males
- you could end up with all the males in one group, which could be a confound- a problem with small groups, but we will do random assignment anyway
Experimental group
• Male experimenter
• written instructions
• 10 min travel film
• 1 o'clock testing
• group testing
• 15 s final scene version 'a' - mutilated body seated in chair (operational def'n of emotional shock) - I.V.
• Measure memory - D.V.
- you should make these things consistent- fast and easy is always good to keep in mind

Control group
• Male experimenter
• written instructions
• 10 min travel film
• 1 o'clock testing
• group testing
• 15 s final scene version 'b' - craftsperson weaving a basket - I.V.
• Measure memory - D.V.

YOU WANT TO STATISTICALLY COMPARE DIFFERENCES IN MEASURED MEMORY
Loftus and Burns 1982
Loftus, E. F., & Burns, T. E. (1982). Mental shock can produce retrograde amnesia. Memory & Cognition, 10, 318-323.
- she used a video that told bank tellers what to do in case of a robbery
- the violent version- a boy being shot in the face
- the nonviolent version- a bank teller telling people to remain calm
- she also took confidence ratings- most of the data show that confidence isn't really related to accuracy
- she found that most of the things in the violent video were remembered less well than in the nonviolent version
- what was the number on the jersey?
Remember the model of human memory:
- does the info not get into long-term memory?
- is the information there but we can't bring it back into working memory?
- they might not know the number on the jersey because it was never encoded, or because it was encoded but can't be retrieved- hard to know
- she redid the experiment using a multiple choice question- to see where people are being affected- she used another nonviolent condition of staying in the alley (got the same results as going back into the bank)
- she had another control with a couple walking on the beach at the end, to see if it was surprise alone that caused the memory loss
Difficulties with Probabilistic Reasoning
(from Stanovich: How to think straight about psychology)
1. Salience of atypical cases (“man-who statistics”)
e.g., Hamill, Wilson & Nisbett (1980) prison guard exp.
2. Insufficient use of probabilistic information
e.g., Bayes theorem
3. Cognitive illusions
4. Failure to use sample size information
5. Tendency to explain chance events
6. Gambler's Fallacy
7. Conjunction Fallacy
January 14, 1999
Characteristics of a scientific experiment
Manipulation (of the independent variable)
Control (of extraneous variables- potential confounds)
Measurement (of the dependent variable(s))
Comparison of the measurements with appropriate statistics
Difficulties with probabilistic reasoning
(from Stanovich: How to Think Straight About Psychology)
1. Salience of atypical cases (“man-who statistics”)
e.g. Hamill, Wilson, & Nisbett (1980) prison guard exp.; the media
- you may know people who violate the general rule
- depending on how the information is presented, the salience of the weird cases can override the general way of the world
- statistical summaries of relevant information are less effective in changing opinion than one face-to-face opinion
- prison guard experiment: participants saw a video of either an abusive guard (talking about how prisoners should be punished) or a nice guard
- they were told either that the guard in the video was typical or that he was very unusual in his attitudes- a 2 X 2 factorial design (four different groups)
- the main finding was that when asked what guards were typically like, people's answers reflected the video they saw (even when they were told that what they saw wasn't typical)
- students choosing courses?
- If you have a friend that hated the course but everyone else liked the course, most of
us will be influenced by the friend
- Vietnam- weekly fatality statistics (200-300); Life magazine
- every day the death rate was reported like the stock market, and people tuned it out the way they tune out the daily summary of stocks being up and down
- Life magazine ran a photo spread of as many of the soldiers who had died the week before as it could- it looked like a yearbook
- some people have argued that this was the turning point in the anti-war movement: people became outraged that 250 American men were dying over in Vietnam every week- they had already heard the numbers for years, but they hadn't seen it (people instead of numbers)
- the anecdotes used in media (politicians, etc.)
- politicians will tell little anecdotes like the welfare mom going to the spa with the colour television, etc.- and everyone will think "cut welfare!"
- no images of the Gulf War- no pictures of Americans dying; in the weeks before the Americans entered the war, a woman testified that she was an eye-witness to Iraqis going into hospitals and killing children- this was a major catalyst for going to war (it turned out she was lying, which only came out after the war was already underway)
2. Insufficient use of probabilistic information (e.g. Bayes' theorem)
In a city where 15% of the taxi cabs are blue and the remaining 85% are green, the only eye witness to a hit-and-run accident involving a taxi says the taxi involved was blue. She has been shown to be accurate 80% of the time in making such identifications.
What is the probability that the taxi was in fact blue, given that she said it was?
Bayes Theorem
Assuming there is no reason for one taxi colour to be more likely to have been in an accident than the other:
base rate probability of the taxi being blue =.15
she would identify it as such 80% of the time: .8*.15=.12
base rate probability of the taxi being green =.85
she would erroneously identify it as blue 20% of the time: .2*.85=.17
proportion of taxies identified as blue that actually were = .12/(.12+.17)=.41
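The same calculation as a quick Python sketch (the numbers are just the ones from the taxi example above; nothing new is assumed):

    # Bayes' theorem for the taxi problem
    p_blue, p_green = 0.15, 0.85            # base rates of taxi colours
    p_say_blue_given_blue = 0.80            # witness accuracy
    p_say_blue_given_green = 0.20           # witness error rate

    blue_and_said_blue = p_say_blue_given_blue * p_blue      # 0.12
    green_and_said_blue = p_say_blue_given_green * p_green   # 0.17

    p_blue_given_said_blue = blue_and_said_blue / (blue_and_said_blue + green_and_said_blue)
    print(round(p_blue_given_said_blue, 2))                  # 0.41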
Bayes Theorem (another example)
- Casscells et al. (1978): Harvard Medical School
- suppose there is a disease that occurs at a rate of 1/1000
- a cheap, safe test is developed that has a false positive rate of 5% (falsely identifying someone as having the disease when they do not)
- this test becomes routinely done
- you, with no symptoms of the disease, test positive during a routine screening
- what is the probability that you actually have the disease?
- turns out to be 1/51 or 2% chance you have the disease (p=0.0196)
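A quick Python version of the screening calculation (it assumes the test catches everyone who actually has the disease, the usual simplification in this classic problem):

    base_rate = 1 / 1000          # prevalence of the disease
    sensitivity = 1.0             # assumed: no false negatives
    false_positive_rate = 0.05

    true_positives = sensitivity * base_rate                  # 0.001
    false_positives = false_positive_rate * (1 - base_rate)   # ~0.04995

    p_disease_given_positive = true_positives / (true_positives + false_positives)
    print(round(p_disease_given_positive, 4))                  # 0.0196, about 1 in 51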
3. Cognitive illusions (the line test)
- see handouts
Group A:
Question 1: how many chose option a (route 1) option b (route 2)
Question 2: how many choose option a (sure gain) option b (80% chance to gain)?
Group B:
Question 1: how many choose option a (route 1) option b(route 2)
Question 2: how many choose options a (sure loss) or option b (80% chance to lose)?
- the questions are the same but the wording is different- people can be manipulated depending on how information is worded
- one of the characteristics of humans is that we tend to be loss-averse; we don't like to lose something that we already own
4. Failure to use sample size information
- people do not consider sampling information that much
- sample size example: in which would you expect to have more days during which over
60% of the babies born are female? (large or small hospital)
- it is the small hospital: you are more likely to get extreme results in smaller samples- they are more likely to be biased
- in which case is it more likely that all of the coin tosses will come up heads- two tosses or 10? (two tosses)
5. Tendency to explain chance occurrences
- stock market analysis
- illusion of control (lottery tickets- picking numbers, books, etc.)
- people spend forever trying to pick numbers
- the chance of someone winning the lottery twice was reported as about one in billions
- personal coincidences: we like to explain these, and we tend to ignore all the non-coincidences when they happen
- everyone is connected- seven degrees of separation
- wouldn't it be, like, really weird karma if, like, two people in this class had the same birthday?
- bet that two people in this class were born on the same day of the year (not necessarily
the same year)?
- In a group of 23, chances are 50/50
- coincidences do not need explanation (you forget that everyone else you met didn’t have
your same birthday when you meet someone who does)
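A short Python check of the birthday claim (exact calculation, 365 equally likely birthdays assumed):

    # probability that at least two people in a group of n share a birthday
    def p_shared_birthday(n: int) -> float:
        p_all_different = 1.0
        for i in range(n):
            p_all_different *= (365 - i) / 365
        return 1 - p_all_different

    print(round(p_shared_birthday(23), 3))   # about 0.507, just over 50/50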
6. Gambler's fallacy: the tendency to treat independent chance events as if they were related
- if you have T, H, H, H, H, H, ____? (coin toss)
- people will say the next toss is more likely to be tails
- the probability is always equal, however
- we like to explain chance events
- one of the places this happens is in sports: if a player is making shots in a row, the commentators say they are "hot", etc.- but in reality the chance that the next shot goes in is the same either way
Stanovich’s point is that we are not statistical thinkers, but we are pattern thinkers
Linda
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy and was an activist in the anti-war movement.
Which of the following statements is most likely true about Linda?
- examples like "Linda works in a bookstore and takes yoga", "she is an insurance salesperson", "she is a feminist", "she is a bank teller", and "she is a bank teller and active in the feminist movement"
- this is the conjunction fallacy (Linda- the representativeness heuristic)- the idea is that the last option is the least likely- the overlap of two things can't possibly be more likely than either one by itself
- we are wired to pick things like that
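The probability rule behind the conjunction fallacy, written out (standard notation, not from the slides): for any two statements A and B,

    P(A \cap B) \le P(A) \qquad \text{and} \qquad P(A \cap B) \le P(B)

so "bank teller and active in the feminist movement" can never be more probable than "bank teller" alone.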
Stanovich: while many scientists sincerely wish to make scientific knowledge accessible to
the general public, it is intellectually irresponsible to suggest that a deep understanding of
the particular subject can be obtained by the layperson when that understanding is crucially
dependent on certain technical information that is only available through formal study
- such is the case with statistics and psychology
- no one will be able to teach me, in a reasonable way, why the earth is so old without that formal background… half-lives, etc.
- he says the study of behaviour and thinking is the same way- statistical knowledge comes only through formal study, and psychology is understood through statistics
Innumeracy (John Allen Paulos)
talking about numbers
one of the simplest parts of innumeracy is the problem people have with large numbers
“thousands and thousands of them…almost 300”
we are faced with numbers all the time
we hear numbers and use them but not too many of us have a good understanding of 5,
10, 100- 5 billion?
- 1 million seconds…..11.5 days
- 1 billion seconds…..32 years
- modern Homo sapiens- about 10 trillion seconds old (10,000,000,000,000 seconds)
Stock market scam:
- send out 32,000 letters predicting whether a stock will go up or down (a 50:50 guess), half predicting each direction
- you will have been right for half of the recipients- send your next prediction to those 16,000, and keep going until you have 500 people playing the stock market who have seen 6 correct predictions in a row… the next thing you send out is "for 500 dollars I will send you my next prediction"… you would end up with $250,000!
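A tiny Python sketch of the arithmetic behind the scam (the letter counts are just the halving described above):

    letters = 32_000
    for prediction in range(1, 7):     # six rounds of predictions
        letters //= 2                  # only the half who saw a correct call stay in
        print(prediction, letters)     # ..., round 6 leaves 500 people

    print(500 * 500)                   # 500 believers x $500 = $250,000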
- it is not easy to understand probabilistic reasoning
Inferential Statistics (Ch. 12)
Used to infer the characteristics of the population
Some terminology:
Parameters - a characteristic of the population
Statistic - a characteristic of your sample
Parametric statistics estimate values of the population from characteristics of a sample
These estimates are based on the following assumptions:
1. the population values are normally distributed
2. interval or ratio measurements are made
Nonparametric statistics make no assumptions about the distribution of scores in your
sample and can be used with ordinal or nominal data
Inferential Statistics are used to estimate the probability that observed samples come from the same population. You calculate the observed value of the statistic (based on sample data) and compare this to critical values; with computer programs, you are simply given the probability that the calculated statistic (or an even less likely one) occurred by chance.
Parametric Inferential statistics make assumptions that:
1. scores have been sampled randomly
2. the sampling distributions are normal
3. the within-groups variances are homogeneous
violations of these assumptions will bias the test
Inferential Statistics
this is some of the information from chapter 12
used to “infer” the characteristics of the “population”
some terminology:
- parameters: a characteristic of the population
- statistic: a characteristic of your sample
Parametric statistics
- estimating values of a population from characteristics of a sample
- these estimates are based on several assumptions
1. the population values are normally distributed
2. interval or ratio measurements are made
3. samples were drawn randomly from the population(s)
- this is a nice theory
- we often violate this
- sometimes questionnaires that are treated as interval are really just ordinal
- Beck depression inventory- the difference between a score of 5 and 8 is not really what
you think
4. the samples have equal variances
Nonparametric statistics
- these statistics don’t make assumptions about the distribution of scores in your sample
and can be used with ordinal or nominal data
- we have another set of tools for when we don’t have interval or ratio data
Characteristics of Psychological Science
(Summary of Stanovich)
1. Psychology progresses by investigating empirical problems.
2. Psychologists propose testable hypotheses. (Freud wouldn't be a psychologist or a scientist- he was a medical doctor, a neurologist.)
Falsifiability and Folk Wisdom
- most people don't have formal theories about human behaviour, but many people use folk wisdom- proverbs come up
- "decisions should be made only after consideration of all the alternatives"
- folk wisdom has "look before you leap" and also "he who hesitates is lost"- they cover all the bases
- "out of sight, out of mind" but also "absence makes the heart grow fonder"
- these proverbs will "explain" any human behaviour
- there is a lot of talk in education about co-learning and working together (two heads are better than one)
- money and policies are directed toward this- e.g. in research you might not get money unless you have a partner for collaborative research
- but you also have the proverb "too many cooks spoil the broth"
- "opposites attract" and "birds of a feather flock together"
- do people seek out people who are similar to them? this one seems to be more research-based
- the point of these examples is that folk wisdom tends to be contradictory- you can find a little proverb to explain almost anything
- it would be like the thinking of Benjamin Rush with the bleeding example- it is not falsifiable- the theory can't be proven wrong (if patients don't get better, then you didn't bleed them enough…)
Folk Wisdom & the Benefits of Work Experience for Youth
- there are many folk beliefs about the benefits of youth getting work experience
a) the money can be used to pay for future education (this theory has a lot of face validity- you will be earning money instead of goofing off)
b) the experience will help develop a work ethic
c) working will instill respect for the economy (learn about business, appreciate the value of
a dollar)
d) having been in the “real world” working youth will become more motivated students so
they don’t have to flip burgers for the rest of their life
Psychological Studies on the Effects of Work Experience for Youth
a) earnings are spent on luxury items
b) workers become more cynical and less respectful of work
c) employment has harmful effects on education- it turns out that the working students do less well
d) working also seems to promote some delinquent behaviour
- the folk arguments make sense to people- good face validity- but they really don't pan out
Characteristics of Psychological Science
(Summary of Stanovich)
3. The concepts in these hypotheses are operationally defined.
- e.g. the debate over whether a ban on possession of child pornography violates the Charter of Rights and Freedoms
- the more severe forms of pornography have been associated with bad human behaviour, but there must be clear operational definitions
- "child pornography" has to be operationally defined
4. Psychologists use many different empirical methods.
5. Most conclusions are arrived at only after slowly accumulating data from many experiments. (People want the "eureka!" person who has "found the answer", but this does not happen.)
6. The behavioral principles eventually uncovered are almost always probabilistic.
7. Psychological data and theories are only acceptable after publication in peer-reviewed scientific journals. (Outside peer review you can publish any nonsense you want- some outlets will publish whatever you want as long as you pay them.)
- we have to pay to get the journals containing our own data…?
Stanovich
While many scientists sincerely wish to make scientific knowledge accessible to the general
public, it is intellectually irresponsible to suggest that a deep understanding of a particular
subject can be obtained by the layperson when that understanding is crucially dependent
on certain technical information that is only available through formal study. Such is the case with statistics and psychology.
Characteristics of a Scientific Experiment
• Manipulation (of the independent variable)
• Control (of extraneous variables - potential confounds)
• Measurement [of the dependent variable(s)]
• Comparison (of the measurements with appropriate statistics)
- we are finally on "comparison" (of the measurements with appropriate statistics)
Hypothesis Testing
- in truth, there could be:
1) a true difference between the populations sampled (for instance, with the memory video- there is a true difference in memory recall when you are emotionally shocked)
- but all we have are samples of the populations
- we have a set of data from the experimental and control groups, and those data differ- the samples differ because the populations differ
2) no difference (the samples came from a single population)
- maybe the results would in theory be the same for both groups- they come from one population
- but you could still get samples that end up differing even though everyone was taken from the same population
- you have to decide whether the samples came from a single population or two separate populations
- based on your samples, you have to decide which of these to believe- make your best guess
- four possible outcomes:
a.) you guess (based on your samples) that the observed difference is real, and be correct:
Correct rejection of the null hypothesis
b.) guess (based on your samples) that the observed difference is real, and be wrong:
Type I (alpha) error
c.) guess (based on your samples) that the observed difference is not real, and be correct:
Correct failure to reject the null hypothesis
d.) guess (based on samples) that there is no difference between the groups and you are
wrong: Type II (beta) error
- know the 2 x 2 table: reject or do not reject H0, crossed with H0 true or H0 false
- we can put values on how often we will be wrong or right
- the key to all inferential statistics is the ratio between the between-group (between-sample) variance (the difference in group means) and the within-group variance
- even if the two populations actually overlap completely, you could happen to get one biased sample that is unusually high and another that is unusually low, and wrongly take this as confirming a difference
- we want to look at the variance between groups and the variance within groups
Inferential Statistics
A key aspect of science is that it deals with testable hypotheses
The key tool for testing these hypotheses is inferential statistics
Inferential Statistics can be either parametric or non-parametric
- statistics that assess the reliability of your findings are called inferential statistics
Inferential Statistics
- the key to all inferential statistics is the ratio between the between-group (between-sample) variance (the difference in group means) and the within-group variance
- or, the extent to which the two sample distributions overlap
- this is different from descriptive statistics, which are calculated on your sample distribution (such as the mean and standard deviation)
- descriptive statistics may or may not agree with the population values- i.e. the mean of the sample may or may not agree with the mean of the population (thus inferential statistics are used- the t-test, etc.- which treat the sample as coming from a population and give the chance that your result happened by chance)
- "sampling distribution of the mean": the distribution of the means of all possible samples of a given size from a population
- this distribution has well-defined characteristics- it is normally distributed, and the mean of the sampling distribution is the same as the mean of the population from which the samples are drawn
- the central limit theorem says that even if the population is not normally distributed, the distribution of sample means will tend to be normal- so you can make statements that depend on normality using the samples even if the population is not normally distributed
- standard error of the mean (or "standard error"): an estimate of the amount of variability in the expected sample means across a series of samples (the standard deviation of the sample divided by the square root of the sample n)
- degrees of freedom: for a single sample this will always be n - 1 (because if you have, say, 10 scores and a known mean, once you have selected 9 scores the value of the 10th cannot vary)
- in the analysis of an experiment it will be A - 1 (where A is the # of levels of the independent variable)
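The same definitions in standard notation (nothing here beyond what the notes already say):

    SE_{\bar{X}} = \frac{s}{\sqrt{n}}, \qquad
    df_{\text{single sample}} = n - 1, \qquad
    df_{\text{experiment}} = A - 1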
- inferential statistics are either "parametric" or "nonparametric"
- parameter: characteristic of a population
- statistic: characteristic of a sample
- parametric statistic: estimates the value of a population parameter from the characteristics of a sample (when you use a parametric statistic, you are making certain assumptions about the population from which the sample was drawn- e.g. a key assumption of a parametric test is that the sample was taken from a normally distributed population)
- nonparametric statistics are used if the data do not meet the assumptions of a parametric test
- each sample statistic provides an independent estimate of the population parameter (e.g. the sample mean X̄ is an estimate of the population mean μ)
- if two means for example were drawn from the same population you would expect
them to differ only because of sampling error- you can calculate the probability that
the two sample means would differ as much as or more than they do simply because
of chance factors (this probability is the obtained “p”)
- i.e. in an experiment where a drug actually has no effect on subjects, you can work out what the chance would be of getting the two means you did purely by chance
- as a researcher, you don't know whether a drug will have an effect or not- say you get the following results:
- you must decide whether the two sample means were drawn from the same population (the treatment had no effect on the sample scores) or from two different populations (the treatment shifted the scores relative to the scores from the control group)
there are many possibilities- see hypothesis testing table
hypothesis that means drawn from same population: null hypothesis
hypothesis that means were drawn from different populations: alternative hypothesis
inferential statistics measure how likely it would be to get the different sample means by chance if they really did come from one population- if the probability is small, then the difference between the sample means is said to be "statistically significant" and the null is rejected
know hypothesis testing table: four options
you calculate an observed value of a statistic and compare it to a critical value on a table
(make decision based on whether or not observed value meets or exceeds critical
value)- want to reduce possibility of committing a Type I error
- probability of committing type I error dependent on the alpha level you chose
- alpha level: represents the probability that a difference at least as large as the
observed difference between your sample means could have occurred purely
through sampling error (smaller you make alpha, less likely you are to commit type I
error)
- significance level: particular level of alpha you chose
- if the obtained p is less than or equal to alpha, your comparison is statistically
significant- REJECT NULL
One tailed vs. two tailed
- one tailed test is when you have a directional hypothesis
- two tailed test is when you do not have a directional hypothesis (just that something will
produce a change in either direction of something else)
- in a two-tailed test with an alpha of .05, each critical region (the portion of the sampling distribution within which observed values will be statistically significant) must contain 2.5% of the cases
- implication: you must obtain a greater difference between group means, in either direction, to reach statistical significance with a two-tailed test than with a one-tailed test
Parametric Statistics
- three assumptions for parametric inferential tests:
a) scores have been sampled randomly from population
b) sampling distributions of the mean is normal
c) variances between groups are highly similar
- the t-test
- used when experiment includes only two levels of independent variable
- “t-test of independent samples”: used when you have data from two groups of
participants and those participants were assigned at random to the two groups
- has two versions- pooled (computes error term based on the two samples
combined under the assumptions that both samples come from populations
having the same variance) and unpooled (computes an error term based on the
standard error of the mean provided separately by each sample)
- “t-test for correlated samples”: used when two means being compared come from
samples that are not independent of one another- produces a larger t value than the t
test for independent samples when applied to same data if scores from two samples
are at least moderately correlated
- matched-groups and within-subject designs use this
Factors Contributing to Group Mean Differences
- strength of the I.V.'s influence on the D.V.
- you are hoping that you don't make a Type II error
- level of the treatment of the I.V.
- say you reach a conclusion, but because the level of treatment was wrong, your conclusion would actually be wrong
- sensitivity of the operational definition of the D.V.
- you must have a sensitive measure or you might not be able to detect the influence
- say you were testing a drug: mean differences could be altered because of the strength of the I.V. or because of the level of treatment you give (maybe you didn't give them enough)
- subject differences: if you were doing a memory test and you happened to have the good-memory people in the emotional shock group, you might not find anything even though your hypothesis was right
Factors Contributing to Within-Group Differences
degree of control of extraneous variables
subject differences- if everyone has the same tolerance for emotional shock for
instance, you will probably get pretty consistent results
sample size
Ways of Increasing Power
What can you do?
- note that it is easier to control the factors that influence between-groups variance than the factors that determine within-group variance, but some factors that reduce within-group variance also tend to decrease external validity
Power
- power is the probability of rejecting the null hypothesis when it is false
- more precisely: power is the probability of rejecting the null hypothesis AND correctly identifying a TRUE alternative hypothesis (avoiding a Type III error)
- inferential statistics are designed to help you determine the validity of the null hypothesis- you want statistics to detect differences in your data that are inconsistent with the null (the "power" of a statistical test is its ability to detect these differences)
- power is a statistic's ability to correctly reject the null hypothesis
- the issue of power is important: rejection of the null implies that your independent variable affected your dependent variable- you want to make sure your decision about the null reflects the real state of affairs and not just the power (or lack of power) of the test
- power is affected by…
- chosen alpha level: as you reduce the alpha level you reduce the probability of making a Type I error- but unfortunately you also reduce power, as it becomes more difficult to reject the null hypothesis (you need a larger difference between means to reject it)
- size of sample: power increases as sample size increases, because a larger sample provides more stable estimates of population parameters
- whether you use a one-tailed or two-tailed test: a two-tailed test is less powerful than a one-tailed test
- size of the effect produced by your independent variable:
- "effect size": the degree to which manipulation of the independent variable changes the value of the dependent variable
- effect size estimates the amount of overlap between the two population distributions from which the samples were drawn- large effect sizes indicate little overlap (greater power)
- too much power can be as bad as too little power
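One common way to put a number on effect size for a two-group comparison is Cohen's d (the notes don't name a particular measure, so this is only an illustration):

    d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}

a larger |d| means less overlap between the two distributions and therefore greater power.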
Types of Inferential Statistics
- there are many… very many!
- each is intended, and appropriate, for use in certain specific conditions that depend upon:
- scale of measurement: nominal, ordinal, interval, ratio
- shape of the sample distributions
- experimental design: # of I.V.'s, # of levels of each I.V., how manipulated (within, between, mixed), # of D.V.'s

Nonparametric Statistics
- these make no assumptions about the distribution of scores in your sample and can be used with nominal or ordinal data
- an example…
Bivalent Between-Subjects Memory Experiment
20 intro psychology subjects, 10 randomly assigned to each group

Experimental group
• Male experimenter
• written instructions
• 10 min travel film
• 1 o'clock testing
• group testing
• 15 s final scene version 'a' - mutilated body seated in chair (operational def. of emotional shock) - I.V.
• Measure # items (of 10) correctly remembered - D.V.

Control group
• Male experimenter
• written instructions
• 10 min travel film
• 1 o'clock testing
• group testing
• 15 s final scene version 'b' - craftsperson weaving a basket - I.V.
• Measure # items (of 10) correctly remembered - D.V.

STATISTICALLY COMPARE DIFFERENCES IN MEASURED MEMORY
Hypothetical Results of Bivalent Between-Subjects Memory Experiment #1

Mann-Whitney U-Test
- rank order all memory scores from both groups- in case of ties, assign the average rank
- simple: you take the scores and rank order them- put the best score on top and go down; when you have ties you average the rank (both get 7.5 if ranks 7 and 8 had the same score)
- indicate from which group each score came (circle)
- count the # of scores from the group hypothesized to have lower scores that are higher than scores from the other group (ours is a directional hypothesis)
- the hypothesis is that emotional shock will disrupt memory… we have a directional hypothesis, looking at the extent to which the data fit the hypothesis- if there is a difference between your groups, then the ranks for the scores in one group should be consistently above the ranks from the other group rather than being randomly distributed
- add up these counts
- basic logic: you get a quantitative measure of the mixing of the scores
- in the Mann-Whitney table for these data, all the scores for the control group were above all the scores for the experimental group- that's why you have frequencies of 0- then you add up the five 0's
- from the text:
- the Mann-Whitney U test is used when the dependent variable is scaled on at least an ordinal scale
- it is a good alternative to the t test when the data do not meet the t test's assumptions
- this is analogous to dealing 5 black cards in a row from a randomly shuffled deck of 10 cards, 5 of which are black & 5 red
- this gives us an extreme value of the Mann-Whitney U of 0
- p = (5/10)*(4/9)*(3/8)*(2/7)*(1/6) ≈ 0.00397
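The same extreme case, checked with a hedged Python sketch (the scores are made up so that the two groups separate completely; scipy's exact method gives the 1/252 probability worked out above):

    from scipy.stats import mannwhitneyu

    shock   = [1, 2, 3, 4, 5]     # hypothetical memory scores, experimental group
    control = [6, 7, 8, 9, 10]    # hypothetical memory scores, control group

    # directional hypothesis: shock scores are lower than control scores
    result = mannwhitneyu(shock, control, alternative='less', method='exact')
    print(result.statistic, round(result.pvalue, 5))   # U = 0.0, p ≈ 0.00397 (1/252)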
Hypothetical Results of Bivalent Between Subjects Memory Experiment #2
Mann-Whitney U-test = 8
- what you want to find out is whether or not the results you got from an experiment could have resulted from chance
- the old way of doing things was to use statistical tables
- on the tables: we had two groups of 10, so our critical value was 24; say that in a test on the memory data we added up the ranks and got a value of 33
- the most extreme score on the Mann-Whitney is 0- no mixing of the groups (one group is all high, the other all low)
- we want to find out how unlikely a value of 33 is
- thus we would reject the null
Review of when to accept and reject the null:
- printouts give you the p value- the probability of getting results at least as extreme as yours by chance (i.e. if the null is true); if you had .34, for example, you would accept the null if your alpha was 0.05- you are not willing to be wrong 34 times out of a hundred- support for the experimental hypothesis requires the p value to be less than the alpha level
- the one-tailed probability will always be half of the two-tailed probability
In critical value tables:
- the table tells you the maximum value you can have and still reject the null, or the minimum value you need to reject the null- for the Mann-Whitney, at that number or less you would reject the null (like what we did with the ranks example)
Between-subjects t-test
- when is it appropriate?
- Comparing means from two independent groups
- “means” mean that you have interval level data
- only appropriate for two group comparison
- the groups are independent
- assumes subjects were randomly sampled
- population values are normally distributed- if your data is not normally distributedyou would run Mann-Whitney- the between-subjects t-test is a parametric version of
the Mann-Whitney
Research Methods II Notes
why wouldn’t you do that
anyway?- parametric (in
general) are more
powerful- will give you
smaller values- less likely
to result in type II error
- parametric tests are your
choice
what does it take into account?
- the difference between the groups- how far apart the group means are and how variable the scores are within their own groups
- computation:
- you must be able to understand this, because the computer might prompt you or encourage you to do things wrong
- it is really the "sample mean difference" divided by the "variability (standard error)"
- a bigger number indicates a larger difference between groups (taking into account variability and sample size)
- as you increase the difference in means, t will go up (the numerator gets larger)
- as the denominator goes down (less variance), t will go up
- with a very distinct set of scores- a combination of big differences between groups and low variance within groups- you get a high t; high t values do not typically occur by chance (this is unlike the Mann-Whitney, where low values are the less likely ones)
Between-subjects t-test: calculating the standard error
- we are using the sample to infer population values
- the bigger the sample the better
1) Pooled: assumes both samples are from populations with equal variance. This is a weighted average of the two samples' variances- if your two samples have about the same variance, it is reasonable to assume that the two populations they come from have the same variance- this version is more powerful
- use the definitions from the other sheets / PowerPoints
- the pooled variance is really the weighted sum of the sample variances divided by the sum of the degrees of freedom
- note that it is a weighted average because of sample size (we put more emphasis on bigger samples- you do this by multiplying each sample's variance by its degrees of freedom)
- if you have equal sample sizes it will just be the average of the two- but remember that bigger samples have greater reliability
2) Unpooled: adds the error terms from the two samples. Use this formula when the homogeneity of variance assumption is violated (when the samples have significantly different variances)
- the real-world variances are probably different
- we add the error terms from the two samples together, but we only do this when the samples have significantly different variances
- you are adding the separate sample variances rather than using the pooled variance
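The two error terms described above, written out in standard textbook notation (the exact symbols on the course slides may differ):

    s_{\text{pooled}}^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
    \qquad
    SE_{\text{pooled}} = \sqrt{s_{\text{pooled}}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
    \qquad
    SE_{\text{unpooled}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}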
Evaluating the significance of t
- how big is big enough?
- we know what the t-distribution looks like
- people have calculated the possible t values- you get a roughly normal-looking distribution
- because we know what this distribution looks like, for a given df we know how likely a given t value is to have occurred by chance
- big t’s are more unlikely
- generate a large # of samples; see shape almost normal; can know how likely a specific
t-value (or higher) would occur by chance
Evaluating the significance of t
What counts as chance?
- probability of making Type I error
- you determine it (tradition is .05)
- it is really just tradition- it is a reasonable balance between type I and type II error
- in stats class- you use tables (look for critical values)
- print outs- give probabilities of getting a t-value as big, or bigger, than observed by
chance (i.e. if the null hypothesis is true)
Between-subjects t-test exercise
- independent samples means between-groups- independent people
- the "test variable" in SPSS is the dependent variable
- the Levene test says whether or not the variances are equal enough- its significance value must be bigger than alpha- you don't want this test to be significant; if it is higher than .05 you can accept the null that the two variances are equal- then look at the appropriate line of the output
Group Statistics

GROUP                          N     Mean     Std. Deviation   Std. Error Mean
MEMSCOR   Experimental group   10    3.7000   2.4518           .7753
          Control group        10    7.2000   2.8597           .9043

Independent Samples Test

                                Levene's Test          t-test for Equality of Means
                                F      Sig.     t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower)
MEMSCOR   Equal variances       .901   .355     -2.938   18       .009              -3.5000           1.1912                  -6.0026
          assumed
          Equal variances                       -2.938   17.590   .009              -3.5000           1.1912                  -6.0067
          not assumed
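The same comparison can be reproduced from the summary statistics alone; here is a sketch using scipy (scipy is an assumption here- the function only needs the means, SDs, and ns shown in the output above):

    from scipy.stats import ttest_ind_from_stats

    result = ttest_ind_from_stats(
        mean1=3.70, std1=2.4518, nobs1=10,   # experimental group
        mean2=7.20, std2=2.8597, nobs2=10,   # control group
        equal_var=True)                      # pooled, since Levene's test was non-significant
    print(round(result.statistic, 3), round(result.pvalue, 3))   # -2.938, 0.009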
Types of inferential statistics
- now we are on "experimental design: # of I.V.'s, # of levels of each I.V., how manipulated (within, between, or mixed), # of D.V.'s"
Within-subjects t-test
- also called t-test for correlated samples or matched samples t-test, paired t-test
- when is it appropriate?
- when comparing two means in a within-subjects experimental design- everyone gets both levels of the independent variable- or when you have matched-group designs (subjects put into groups based on pretest scores)
- if you put twins, siblings, roommates, or littermates into different levels- use this
- with married couples or dating couples, the unit might be the couple- split them up across levels
- within-subjects designs
- when is this type of design appropriate?
- when participants are scarce
- when effect size is small
- when you are interested in effects over time
- when error variance (unattributable variability due to subject-related factors- differences between the subjects) is high- you can control for this by using this design
- understand the correlation in the denominator of the within-subjects t-test: it is the correlation between the scores of the people- as it gets bigger, the denominator gets smaller, t gets bigger, and the more likely you are to reject the null hypothesis
- when the differences among subjects are high and consistent, the correlation is bigger; if you drop the correlation term you basically have the between-subjects t-test
- if the correlation is 0, you are basically left with a between-subjects design, but your degrees of freedom will be fewer, so you will have LESS POWER
- disadvantages
- increased fatigue, boredom
- subject attrition
- carryover effects
- if the correlation across treatments is low, these designs will be less powerful (because of the reduced degrees of freedom)
- what does it take into account?
- the mean difference (change scores)
- the variance
- computation:
- the average difference is on top
- the standard error of the average difference is on the bottom
- how do you calculate the standard error? remember that you no longer have the source of error variance where differences between people could have affected the differences in average scores (the problem that all the smartest people might have ended up in the same group)- in this design the same people are in both conditions, so that is eliminated
- calculating the denominator- the "standard error" of the difference- involves summing the differences, etc.- the sigma D (summing the difference scores across all the people)
- don't memorize the computational formulae, but recognize them
- D = the difference scores (husband/wife differences, or the same person's difference across the two conditions)- D-bar is the mean of the difference scores- it plays the role of Ȳ1 - Ȳ2, but since you don't have two independent groups you don't have two different distributions
- the df's adjust for chance
- the df for a between-subjects design would be (n1 - 1) + (n2 - 1) = N - 2, but for a within-subjects design it is n - 1 (where n is the number of pairs)
Another version of the formula:
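The slide with this formula didn't survive in the notes, so here is a reconstruction of the two standard forms described above and in the later section (D = difference scores, r12 = the correlation between the two sets of scores, n = the number of pairs):

    t = \frac{\bar{D}}{s_D / \sqrt{n}}
    \qquad\text{or equivalently}\qquad
    t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\left(s_1^2 + s_2^2 - 2\,r_{12}\,s_1 s_2\right)/n}},
    \qquad df = n - 1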
My phenomenal E.S.P. abilities
- when using SPSS for the paired samples t-test
- you don't need the column telling you which group (the 0 and 1)
- every row is one subject (since every person is in each group, both scores go in the same row)
- we had a p of 0.0075- so we had significance… we reject the null hypothesis
- but he used a biased sample- he just looked at the ones that had done poorly, not everyone, which wasn't random subject selection
- you must have random selection as a criterion for doing these tests
- understand D- it is the difference scores (for husband and wife, for example) instead of group differences
- the computational formula is used to save steps when calculating by hand
Sign Test
- a very simple non-parametric test for within-subjects or situations when you have paired
data
- SPSS: help box: a nonparametric procedure used with two related samples to test the
hypothesis that two variables have the same distribution. The differences between the
variables for all cases are computed and classified as either positive, negative, or tied. If
the two variables are similarly distributed, the numbers of positives and negative
differences will not be significantly different.
- you should get just as many positives as negatives if your treatment was bogus (the E.S.P. intervention, for instance, wouldn't be any more likely to increase than to decrease scores)
- say you had 4 +'s and 1 -: the sign test tells you the probability of getting this by chance
- you get a p of .188- it is greater than the alpha of 0.05- fail to reject the null- ACCEPT THE NULL
- the old way used tables- you look at the # of nontied values you have- along the x is how
many –‘s (violations of experimental hypothesis) and the y is how many +’s- you just go
to the point on the table and find 0.188
- three values to keep in mind- if all five had remembered more items in control than
shock group- the probability would have been 0.031- reject null
- interesting to note that 5 out of 6 isn’t significant even though 5 out of 5 is
- 7 out of 8 makes it- everyone has to be consistent with hypothesis until you have 8
participants (one can violate experimental hypothesis)
- when you have 11 you can have two people violating
- down the y axis is actually the N- not the number of positives
- sign test can be used in the esp data
- every sample should have a mean of about 5
- programs like weight watchers- use the best results on posters
- you can use the sign test with the E.S.P. data- out of 7 subjects, 6 of them did score lower- do a sign test on that and it is not significant- p should be 0.062
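The sign-test probabilities quoted above are just binomial tail probabilities with p = 0.5; a small Python check (scipy assumed):

    from scipy.stats import binom

    # one-tailed sign-test p: P(at least k differences in the predicted direction out of n)
    def sign_test_p(k: int, n: int) -> float:
        return binom.sf(k - 1, n, 0.5)   # survival function gives P(X >= k)

    print(round(sign_test_p(4, 5), 3))   # 0.188 -> the 4 +'s and 1 - example
    print(round(sign_test_p(5, 5), 3))   # 0.031 -> all five consistent: significant
    print(round(sign_test_p(5, 6), 3))   # 0.109 -> 5 out of 6 is not significant
    print(round(sign_test_p(7, 8), 3))   # 0.035 -> 7 out of 8 makes it
    print(round(sign_test_p(6, 7), 3))   # 0.062 -> the E.S.P. example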
Sign-Test Results of Bivalent Within-Subjects E.S.P.
Intervention Experiment
How to use sign test on spss
- it is a nonparametric test for “Related samples”- what are your paired scores- the before
and after test scores
- we had .125 because it was double .0625 (because SPSS ran a two-tailed test)- the one-tailed probability is always going to be half of the two-tailed probability- so we fail to reject the null- they didn't do significantly worse after the intervention
- we could also do a parametric paired t-test
- you don't need the Levene test for paired t-tests
- your t ends up being 3.548
- p was 0.012- their scores did decrease significantly according to this test (we rejected the null)
- you can lie with statistics- it was legitimate to use both of these tests, and the sign test gave a different result from the paired t-test
- in general you want to use parametric tests: they are more powerful- they are more likely to lead you to the conclusion that will support your hypothesis
- with the sign test: we are treating the difference between 7 and 6 the same as
the difference between 7 and 3 (either have plus or minus- no more differences
other than that)
- the one plus was only a difference of 1 but we are treating it as the same as the
minuses of 3 or 4!
- advantage to use interval or ratio scales
- when you take the magnitude of the changes into consideration you get more
sensitive results
- we never set out to test for (prove) null results- it is too easy to get them: sloppy methods, not enough subjects, etc.
Wilcoxon signed-ranks test- looks at ranked values- more powerful than the sign test, less powerful than the paired t-test
- think about the scale of measurement as influencing power
- the sign test is basically dealing with nominal data- the lowest end of the scale
- ranks- dealing with ordinal data- ranked differences- if you use the Wilcoxon you get 0.031- significant- reject the null (people actually did decrease upon intervention)
Power and inferential statistics
- all the factors that are involved in determining which inferential statistic is appropriate
can also influence statistical power
- remember that power is the probability of correctly rejecting the null (1- type II error rate)
- this is why you always use the most powerful test you can
- you want to use the t-test
- if data is not normally distributed, you go to most powerful nonparametric test
- power is ability to correctly support experimental hypothesis- it is 1- type II error rate
- probability of correctly rejecting the null hypothesis when it is false
- power is a probability (between 0 and 1)- it can increase and decrease
- factors that determine which test is appropriate
- scale of measurement: as you go from nominal to ratio you gain statistical power
- shape of the sample distributions: parametric tests are more powerful
- experimental design: within-subjects designs are more powerful than between-subjects designs under certain circumstances
Within subjects design
- when is it advantageous?
- when participants are scarce (people with a rare disorder, for example, or firefighters)
- you want to get a lot of information from them
- when effect size is small- when the independent variable doesn't affect the dependent variable very much; too few measures
- when you are interested in effects over time
- when error variance due to subject-related factors is high: when a lot of the variability in scores is due to differences between people (i.e. people differed in how shocked they were in the experiment, so you had a lot of variance in memory recall)
- with a within-subjects design you control for these differences
- another version of the formula:
- the key thing about this formula is that there is a correlation coefficient in it- 2r12 (the correlation between the two sets of scores)
- as the correlation gets more positive, you subtract more and the denominator gets smaller- t goes up
- if husband and wife scores are not correlated at all, it will be no different from a between-subjects design
- if you have a big difference, t will go up
- if the correlation is 0, the denominator is pretty much the same as it is in the between-groups design
- back in the SPSS output: look at the "paired samples correlations" box- it tells you the correlation between the two sets of scores; it shows you the extent to which these values differ from a between-subjects design
- the big difference between the between- and within-subjects tests is the correlation term
- disadvantages:
- increased fatigue, boredom
- subject attrition- different levels might have to be tested on different days- they might
not come back for the other days and you wasted time; subjects drop out
- carryover effects:
- Learning: if a subject learns how to perform a task in the first treatment,
performance is likely to be better if the same or similar tasks are used in
subsequent treatments.- now they have learned a little bit
- Fatigue: if performance leads to fatigue, then performance in later treatments
may deteriorate regardless of any effects of the independent variable- in all
cases the influence on the dependent variable is not related to the independent
variable (CONFOUNDS)
- Habituation: under some conditions repeated exposure to a stimulus leads to
reduced responsiveness to that stimulus. This reduction is termed habituation
- it is when you get used to the stimulus- it doesn't produce the same effects that it did in the beginning- a concern if you study perception
- Sensitization: sometimes exposure to one stimulus can cause subjects to
respond more strongly to another stimulus (e.g. potentiated startle)
- classic example: potentiated startle: if you shock an animal before they are
given another aversive sound stimulus they will be even more on edge
- Adaptation: if subjects go through a period of adaptation then earlier results may
differ from later results because of the adaptive changes (e.g. adjusting to the
dark)
- example: adjusting to the dark: if you give people a carrot after they have been in the dark for a while, they will see better- but only because it takes about 20 minutes to become dark-adapted, not because of the carrot; alcohol is another example
- and if the correlation between scores across treatments is low, these designs will be less powerful due to the reduced degrees of freedom
Carryover Effects: Contrast
- contrast: because of contrast, exposure to one condition may alter responses of subjects
in other conditions (i.e. if you pay some people 10 first and some others 5 and then ask
them to do a boring task again for 8- groups will respond differently)
Applying What You’ve Learned: Contrast Effects
Dear Mother and Dad:
Since I left for college I have been remiss in writing and I am sorry for my thoughtlessness in not
having written before. I will bring you up to date now, but before you read on, please sit down. You
are not to read further unless you are sitting down, okay?
Well then, I am getting along pretty well now. The skull fracture and the concussion I got when I
jumped out the window of my dormitory when it caught on fire shortly after my arrival here is pretty
well healed now. I only spent two weeks in the hospital and
now I can see almost normally and only get those sick headaches once a day. Fortunately, the fire
at the dorm, and my jump, were witnessed by an attendant at the gas station nearby and he called the
fire department and ambulance. He also visited me in the hospital and since I had nowhere to live
because of the burnt-out dormitory, he was kind enough to invite me to share his apartment with him.
It’s really a basement room, but it’s kind of cute. He is a very fine boy and we have fallen deeply in
love and are planning to get married. We haven’t got the exact date yet, but it will be before my
pregnancy begins to show.
Yes, Mother and Dad, I am pregnant. I know how much you are looking forward to being
grandparents and I know you will welcome the baby and give it the same love and devotion and
tender
care you gave me when I was a child. The reason for the delay in our marriage is that my
boyfriend has a minor infection which prevents us from passing our pre-marital blood test and I
carelessly caught it from him. I know you will welcome him into our family
with open arms. He is kind and, although not well educated, he is ambitious. Although he is of a
different race and religion than ours, I know your often well-expressed tolerance will not permit you
to be bothered by that.
Now that I have brought you up to date, I want to tell you that there was no dormitory fire, I did
not have a concussion or skull fracture, I was not in the hospital, I am not pregnant, I am not engaged,
I am not infected, and there is no boyfriend. However I
am getting a “D” in Statistics and an “F” in Chemistry and I want you to see those marks in their
proper perspective.
Your loving daughter,
Sharon
Carryover Effects can be serious
- whenever you see a within-subjects design: question of possibility of order-effects!
Dealing with them
- counterbalancing: giving different groups of people the treatments in different orders
- complete and partial
- latin square designs
- minimising carryover
- problems of unequal carryover
Example 1: Stress induced analgesia (insensitivity to pain)
- requires
1) stress inducing procedure: ultra-light flying lessons
2) measure of pain sensitivity: finger vice; ratings of how hot water feels; the toe-in-the-bath test
3) willing participants: 10 people taking the lessons; 7 male and 3 female
4) ethical approval
- how can you do this?
- Between-subjects (5/group), gender?- you wouldn’t be able to counterbalance for
gender and you would only have 5 people per group- Not good!
- Within-subjects: increased statistical power (2 times as much data so increased
statistical power); subject variance no longer contributing to between treatment
differences- MORE POWERFUL STUDY
Stress-induced analgesia
- hypothetical data- 6 minutes for no-stress and 10 minutes for stress (no-stress first,
stress next)
- could be carryover effects- they could be more used to the thing on the finger (think they were wimpy in the beginning)
- what you should do is to split subjects into groups and give different groups the
treatments in different orders (half have one first, half have other first)……do
counterbalancing because no one had the stress first- order is confounded
Stress- induced analgesia
- hypothetical data- now order is not a problem- if you have all possible orders it is
complete counterbalancing
- if you get the same data even with counterbalancing- you can conclude more from your test
Another pattern of results: differences- opposite results
- if you are in no stress first- 10 minutes in vice, second trial 6 minutes
- stress first- 10 minutes first, 6 in second trial
- stress vs. no stress doesn’t make any effect- what seems to make a difference is the
order in which you get the trials
- here, counterbalancing you get data suggesting that independent variable is
irrelevant but that order is influential
Complete Counterbalancing- you can ignore order or look at it as an independent variable
- split subjects into groups and give the different groups the treatments in all possible
orders
- assuming the effects of order will be balanced equally across treatment conditions
(levels of the IV) order effects will not confound your results
- if there is some benefit of going first then that will be balanced across the two conditions
- can test this assumption by having order as an independent variable (becomes a mixed factorial design, with order manipulated between- and treatment within-subjects)
  - you can analyze order as a manipulation and see if it actually has an effect- have order as an independent variable- you would have a 2 x 2- two levels of each independent variable- the point is that YOU CAN TEST THE ASSUMPTION
- if you have three treatment conditions- there are 6 possible orders (you will need a multiple of 6 subjects for complete counterbalancing; see the sketch after this list)- if you have 10 subjects you wouldn't have the same number of people in each group- you either ignore some or get more
  - as you keep adding treatment conditions this is not always good, because you probably didn't have that many subjects to begin with
- or you can do a subset of the possible orders (partial counterbalancing)- only use some of the possibilities, but which ones?
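- not from the lecture: a few lines of Python listing all possible orders of three hypothetical treatments, which is exactly what complete counterbalancing requires:

    from itertools import permutations

    treatments = ["A", "B", "C"]  # three hypothetical treatment conditions
    orders = list(permutations(treatments))
    print(len(orders))  # 6 possible orders, so you need a multiple of 6 subjects
    for order in orders:
        print(" -> ".join(order))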
Example 2:
- you only have five participants
- I will give them all the drug twice and the placebo twice- I will get 4 bits of data from each
of my participants
- What order do you give them?
- Assume the following order effects: suppose you knew that if you gave people the finger vice- they would keep it on longer and longer each trial (with less anxiety, getting used to it)
- If you knew this, you could create some treatment orders that balance that out- if you were from a drug company you would want to use the placebo first and then the drug (people would tolerate the vice longer as the trials went on- the drug trials would score higher than the placebo just because of order effects)
- you could have the drug first and it would show no effect
- if you put placebo, drug, placebo, drug:
- you could create two groups- PDDP and DPPD- but the problem is that we usually don't even know what the order effects are
- these would be the best orders to use
Assume the following order effects- what sequence to use?
Order to Use?
What sequence of drug and placebo trials to use? Could do half in each of:
Group 1: P D D P
Group 2: D P P D
This distributes order effects
equally to the two conditions
Problem: we don't usually know for certain if there are order effects, or the exact nature (function) of those order effects.
Partial counterbalancing
- with only a subset of orders, the order effects may not be distributed equally to the two conditions
- partial counterbalancing is risky unless you know the nature of the order effects
- better to randomize treatment order across subjects (this works especially if you have
big numbers- doesn’t work with small numbers using within-subjects designs)
- create all possible orders and randomly use some of them: will assume that order will
be taken care of
- this will also assume that any order effects are equally distributed across treatment
conditions (so order not confounding)
- a disadvantage is that with partial counterbalancing you can no longer analyze order as an independent variable
Latin Square
- one restriction on using it: the # of treatment orders has to equal the number of treatments you have (i.e. with five different doses of a drug, you have five different orders in which subjects can receive the drug)
- this would work well if you had 10 subjects- if you had 12 you would either not use two of them or try to find three more
- so- if you choose to make the number of treatment orders in a partially counterbalanced design equal to the number of treatments, you can use the Latin square design
- a latin square is a k x k table of items in which each item appears exactly once in each
column and row
- use it when you have four or more
treatments
Creating a 4 x 4 Latin Square- steps (put in order of slides)
- each treatment is in each group ONCE (A)
- then shuffle the rows: randomly order the rows, e.g. by drawing the numbers 1-4 out of a hat (or use a die- ignore 5 and 6)- (B)
  - as an illustration, suppose the numbers were drawn in the order 2, 3, 1, 4
  - row 2 becomes 1, row 3 becomes 2, row 1 becomes 3 and row 4 stays 4
- then shuffle the columns: randomly order the columns, e.g. by drawing the numbers i-iv out of a hat- as an illustration, suppose the numbers were drawn in the order ii, i, iv, iii
  - column ii becomes column i, column i becomes column ii, etc.- (C)
- after both columns and rows are randomized- randomly assign treatment conditions to your letter codes
  - plug them into the square- every treatment will appear only once in every trial position; you can assume that order effects will be balanced across treatments
Creating a Latin Square
After you have randomly ordered columns,
and rows, then randomly assign treatment
conditions to your letter codes:
e.g., treatment level 1 is assigned to C
treatment level 2 is assigned to A
treatment level 3 is assigned to B
treatment level 4 is assigned to D
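- an added sketch, not from the lecture: a minimal Python version of the shuffle-rows-then-columns procedure described above, using hypothetical treatment labels and the common trick of starting from a cyclic square:

    import random

    def latin_square(k):
        # (A) start from a cyclic square: each letter appears once in every row and column
        letters = [chr(ord("A") + i) for i in range(k)]
        square = [[letters[(row + col) % k] for col in range(k)] for row in range(k)]
        # (B) randomly reorder the rows (like drawing 1..k out of a hat)
        random.shuffle(square)
        # (C) randomly reorder the columns (like drawing i..iv out of a hat)
        cols = list(range(k))
        random.shuffle(cols)
        return [[row[c] for c in cols] for row in square]

    # finally, randomly assign the real treatment levels to the letter codes
    levels = ["level 1", "level 2", "level 3", "level 4"]
    random.shuffle(levels)
    assignment = dict(zip("ABCD", levels))
    for row in latin_square(4):
        print([assignment[cell] for cell in row])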
Minimizing Carryover
- you should try and minimize carryover effects
- if you do this it will increase the power of your design by decreasing error variance (more
likely you can reject the null hypothesis)
- not always possible to get rid of carryover effects- you can give people practice
sessions, or breaks can reduce some carryover effects (e.g. learning and fatigue or
habituation respectively)
- a learning curve goes up and up and then levels off- if you gave everyone practice trials
you would get up to the level part (learning will not be a problem after the second trial)- if
you are dealing with fatigue or habituation you want to give them a break (if they are
doing physical exercise for example)- give them a break in between doing exercise
- if you can use complete counterbalancing, you can make treatment order an
independent variable- this allows you to MEASURE order effects (but only with
complete counterbalancing or Latin Square)
Counterbalancing
- if the carryover induced by different orders are approximately equal, counterbalancing
can control carryover
- However: if carryover is different for different orders of treatment, then counterbalancing may not work (if, for instance, A coming after B has a different effect than C coming after B)
- also, some treatments are irreversible- teach people how to use a mnemonic device and they will always have the training- you can't go back
- it would be hard to use counterbalancing to see the effects on personality of irreversible interventions like psychosurgery (frontal lobe lobotomy); the logic of experimental designs is relatively new
Inferential Statistics
- remember: the key to all inferential statistics is the RATIO between the between-group
(between sample) variance (difference in group means) and the within-group variance,
or, the extent to which the two sample distributions overlap
- another issue related to power where you look at overlap is one tailed vs. two tailed tests
- if you have a non-directional hypothesis- "the IV will have an influence on the DV"- it will change it in some way; the null that you are testing against is that there is no change- a two-tailed test MUST BE USED
- a two-tailed test divides alpha in half, placing half in each tail. The null hypothesis in this case is no difference, and there are two alternative hypotheses, one positive and one negative. The critical value of t, tcrit, is written with both a plus and a minus sign
- for example, the critical value of t when there are 10 degrees of freedom and alpha is set to .05 is tcrit = ±2.228
- see the "non-directional hypothesis" graph (the sampling distribution model used in a two-tailed t-test)
- if you have a directional hypothesis: it specifies a priori (before you collect data) the direction of change caused by the IV; the null to be tested is then not that there is no difference, but that there is not an increase/decrease
- a one-tailed test may be used- it will be more powerful (see the sketch after this list)
- there are really two different one-tailed t-tests, one for each tail. In a one-tailed t-test, all the area associated with alpha is placed in either one tail or the other. Selection of the tail depends upon whether tobs would be plus or minus if the results of the experiment came out as expected
- the selection of the tail MUST be made before the experiment is conducted and analyzed, and the directional hypothesis must be well justified
- a one-tailed t-test in the + direction is illustrated in the figure: you don't need as big a t to reject the null if you have a directional hypothesis
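- not from the lecture: a quick Python check (assuming scipy) of the critical values quoted above for 10 degrees of freedom and alpha = .05:

    from scipy import stats

    df, alpha = 10, 0.05
    two_tailed = stats.t.ppf(1 - alpha / 2, df)  # 2.228: alpha split across both tails
    one_tailed = stats.t.ppf(1 - alpha, df)      # 1.812: all of alpha in the predicted tail
    print(f"two-tailed tcrit = +/-{two_tailed:.3f}, one-tailed tcrit = {one_tailed:.3f}")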
Multivalent Designs
- why use designs with more than two levels of the IV?
a) when you need more than one control group (multiple control conditions)- a
control group that is really getting nothing and then a placebo group for example
looking at drugs
- i.e. when you consume alcohol, inhibitions are reduced- people rate movies as more erotic; but they also had a placebo control group, which rated the movies the same as the people who actually consumed alcohol
- without the placebo group you could reach different conclusions- it may be a social thing (you are allowed to let your guard down?)
b) you might be interested in more than two qualitatively different groups of
people- like homosexuals, heterosexuals and bisexuals for example
c) when you want to map out the
function of the effect of the IV
- 1986: show graph on the
blood sugar levels of rats and
learning
- with increased sugar you
should increase learning for
mazes
- NO BIVALENT STUDY CAN
SHOW NON-LINEAR
FUNCTIONS
- if you gave the rats too much
glucose they actually became
dumber
- it is the same graph for the effects of alcohol
- depending on the points he picked to use as two points- there is not a single
bivalent study that would tell you the true picture of the effects of glucose on rats
(he would always get a wrong straight line)
- how would you analyze it?- you could use multiple t-tests
- what would be the problem of conducting multiple t-tests?
- Type I error: if you are going to be wrong 1 in 20 times and are making 20 comparisons- you are very likely to make at least one mistake
- "probability pyramiding": as you do more comparisons, the probability that you will have a mistake somewhere gets greater
Analysis of Variance: ANOVA
- is a way of analyzing the total variability among dependent variable scores and dividing or "partitioning" this total variability among the factors assumed to cause it
- tries to account for the fact that everyone will not remember the same amount of numbers, for instance
- tries to figure out how much of the variability is due to different "sources of variance" (factors assumed to cause the variability)
- e.g. we have three groups of emotional shock videos- mild, strong, none
  - the "total variability" is the complete range of scores you found (even though you have three distributions- from the lowest score of the left-most one to the highest score of the right-most one)
- things contributing to sources of variance
1) subject variables
2) experimental error (any variability you can’t account for)- measurement or recording
error; even if we don’t make mistakes our scales of measurements might be too
crude
3) level of the IV
- have to find out how much variability is accounted for by each of these factors
- ANOVA calculates an F-ratio which is the ratio of between-group variability to within-group variability
- within-group variability is due to differences among subjects and/or experimental error
- everyone in the same group got the same level of the IV- but the other two have an impact here- subject factors and error contribute to variability within each level
- between-group variability can also be due to differences among subjects (in a between-subjects design) and/or experimental error and variability caused by treatment effects (the IV)
- when you work out F- the numerator will be the between-group variability (subject variance, error, and IV effects) and the denominator will be the within-group variability (subject variance and error)- the subject variance and error essentially cancel out and so you are left with F reflecting the IV effect
  - the treatment level can only influence the numerator
- use ANOVA when you are comparing three or more means
- a significant F-ratio tells you that at least some of the differences were unlikely to have occurred by chance- among these groups there is at least one difference that is unlikely to have occurred by chance
  - it considers between-groups variation (deviations of the group means from the grand mean) and within-groups variation (deviations of each score from its group mean- an estimate of error variance), and these variances are measured by mean-squares
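- an added sketch, not from the lecture: a minimal one-way ANOVA in Python (assuming scipy, with made-up recall scores); the F it reports is exactly this between-to-within ratio:

    from scipy import stats

    # made-up recall scores (items remembered) for three shock conditions, 5 people each
    none = [9, 8, 10, 7, 9]
    mild = [7, 6, 8, 7, 5]
    strong = [4, 5, 3, 6, 4]

    # F is the ratio of between-group variability to within-group variability
    f_val, p_val = stats.f_oneway(none, mild, strong)
    print(f"F = {f_val:.2f}, p = {p_val:.4f}")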
Assumptions of ANOVA
- it is a parametric test:
- assumptions
1) the population is assumed to be normally distributed (don’t know true population, but
should examine samples, fairly robust but can do data correction or non-parametric
analyses)
- you can have non-normally distributed data and still use it- it is robust
- if you had bimodal graph you can’t use it however
2) homogeneity of variance- robust as long as samples about equal in size (largest no
more than 1.5x smallest), when samples unequal, can be sensitive to this
- if you don't have equal sample sizes, even a small difference in variance can throw your results off
- you want the same number of people in each of your groups
3) independence of error terms- your measurements are independent- assumes that all
measurements are independent, random assignment can start you off that way, watch
for differential “subject mortality”
- it assumes that a measurement you take from a group will not be related to another
measurement you take from the same group
- problem- "subject mortality": subjects drop out of the study; if you start out with 20 people in 3 groups and 10 people drop out of one of the groups (maybe the no-alcohol group), the people that are left might not be a representative sample
Evaluating the Significance of F
How big is big enough?
We know what the F-distribution looks like
Generate a large number of F-values and see the shape (almost normal); you can then know how likely a specific F-value (or higher) would be to occur by chance.
What counts as chance?
- probability of making type I error
- you determine it (tradition is .05)
- critical values: based on the df for numerator and denominator and the alpha level
- print-outs give probabilities of getting an F-value as big, or bigger than, the value observed by chance
ANOVA
- protects against probability pyramiding
- it asks whether, somewhere among the different treatment conditions, there are differences that were unlikely to occur by chance- you don't know where the differences are
- a significant F-ratio tells you that at least some of the differences among means were unlikely to have occurred by chance and you should reject Ho
- you therefore have tentative support for your experimental hypothesis that some differences were caused by the IV
Interpreting ANOVAs
- how to determine where the significant differences lie after getting a significant F-ratio?
  - you don't know which groups are significantly different from each other- only after you have the F-ratio can you go in and compare the different groups- you have either planned or unplanned comparisons (a priori / post hoc)
- with planned comparisons: specific a priori hypotheses (you should have a minimum of two planned comparisons)
  - you can go in and do t-tests only after you have a significant ANOVA
  - use separate (1 degree of freedom) F-ratios or t-tests for the pairs of means you have hypothesized to differ
  - the number of comparisons that convey new information (called orthogonal comparisons) will be equal to the number of groups (means) - 1
- with unplanned (post hoc) comparisons (fishing expeditions)
- you have to consider two types of error: error rate per comparisons and the
familywise error rate (which takes into consideration probability pyramiding)
- see table 12-4 in book: some more conservative than others
- know the formula to calculate familywise error rates- it tells you what the error rate will be for all the t-tests put together- it is roughly like the alpha level added up four times when you are making 4 comparisons- the probability that somewhere in the four comparisons I will make a mistake is greater than the probability of making a mistake in any one by itself
Calculating Familywise Error Rate
αFW = 1 - (1 - α)^c
Where:
c = number of comparisons, and
α = the per-comparison error rate
(e.g., if making 4 comparisons and α is set at .05):
αFW = 1 - (1 - α)^c = 1 - (1 - .05)^4 = 1 - .95^4 = .185
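- the same calculation as a tiny Python function (added for illustration), reproducing the .185 from the example above:

    def familywise_error(alpha, comparisons):
        # alpha_FW = 1 - (1 - alpha)^c
        return 1 - (1 - alpha) ** comparisons

    print(round(familywise_error(0.05, 4), 3))  # 0.185, matching the example above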
- summarize Stanovich's arguments in …list the 7 arguments…etc.
know stuff from chapter 8 even though not covered in class
First Midterm:
Material to be Covered From the Text
Note: not all material is from the text
• Chapter 8 pp. 216-233; 240-245.
• Chapter 9 pp. 246-268; 278-280
• Chapter 12 pp. 361-380 (stop at Two-factor ANOVA); 388-393
First Midterm:
Lectures based on Stanovich
How to think straight about psychology
 using undergraduate psychology students
 human difficulties with probabilistic reasoning (seven listed)
 more general innumeracy problem: e.g., difficulties with large numbers
 characteristics of psychological science
First Midterm:
General Topic Headings
Review stuff: advantages of experimental research, between subjects design
characteristics of experiments
 terminology (IV, DV, control, extraneous, and confounding variables)
 importance of operational definitions
 sources of distortion
 principles of hypothesis testing: types I and type II errors
First Midterm: Inferential Statistics
Parametric and non-parametric tests
logic : how they work etc.
factors determining which one is most appropriate
t-tests, both between and within
Mann-Whitney U-test, Sign-test
one-tail vs. two-tail tests
First Midterm: Within Subject Designs
 advantages/disadvantages
 carryover effects- potential causes
 counterbalancing - complete, partial , Latin squares
First Midterm: Multilevel Designs
Why do them? How to analyze them?
 probability pyramiding
 ANOVA
 interpreting significant F-ratios, a priori and post hoc comparisons
 error rates per comparison and familywise
Class 10
- more complicated experimental designs: Factorial designs
- web page creation
Factorial designs
- where more than one independent variable influences your dependent variable
- if you want to assess the effects of several independent variables on a given dependent
variable, one solution would be to conduct a separate experiment for each independent
variable of interest- but you gain more info using a factorial design
- factorial experiment from the textbook: Glass, Singer, Friedman- measuring sound
intensity and predictability on number of tries to solve an insoluble puzzle (measuring
irritation)
- in a factorial design there is a separate group for each possible combination of the levels
of the independent variable
- when manipulating two independent variables you get four groups (2 X 2)- wanted to
vary the two independent variables in such a way as to identify the separate effects of
each variable on tolerance for frustration- wanted to avoid confounding the two variables
- once you have your four groups, participants are randomly assigned to the different
groups- not confounded because you can statistically separate the effects of the
independent variables
- experiment on sensory modality effects on memory
- 20 subjects, 10 randomly assigned to each group
Auditory Group
- 10 min travel film
- no emotional shock
- written instructions, 1 o’clock group testing, etc.
- IV: Auditory Price Information
- Measure memory- D.V.
Visual Group
- 10 min travel film
- no emotional shock
- written instructions, 1 o’clock group testing, etc.
- IV: visual price information
- Measure memory- D.V.
You want to statistically compare differences in measured memory- answers the question of
whether auditory or visual information is remembered better
-
if we were to do the hypothetical data- we might get data like the following graph:
Hypothetical Data
- notes on drawing graphs:
- look at differences between group means and variability within the groups (those two bits of info are important- one is the numerator and the other the denominator of the t-test)
- you present the means in the graph (graphic representation of the means)- this is not very meaningful on its own- we don't have any information about the denominator (no info about spread or variance)
- so the above graph is not that helpful
- SPSS does not have good graphs- so
we are only presenting the mean values (recognise that they are not good figures but we
are using them anyway)
- for our purposes: to simplify: we will treat the differences that we plot as being real
(significant differences) if they are numerically different (assume that the variances are
not coming into play)
- so for the above graph: 5 items on average for auditory and 7 items on average for
visual (we will assume this is significant)- if you ran this bivalent study
Experiment on Emotional Shock Effects on Memory: Hypothetical Data
- 20 subjects, 10 randomly assigned to each group
Shock Group
• 10 min travel film
• Auditory Price Information
• written instructions, 1 o'clock group testing, etc.
• IV: Emotional Shock
• Measure memory - D.V.
No Shock Group
• 10 min travel film
• Auditory Price Information
• written instructions, 1 o'clock group testing, etc.
• IV: No Emotional Shock
• Measure memory - D.V.
- we are going to measure memory again- compare the difference statistically
- will answer the question of whether emotional shock affects memory
- we have done the second bivalent experiment- does emotional shock disrupt memory for
info presented orally?
Hypothetical Data
here we have results that would lead
you to reject the null hypothesis (only 3
items with shock, 5 items with no shock)
- we have answered two questions at
this point- I wonder if that is the
same effect if you presented the
information visually
Advantages of the Factorial design
- might wonder if the effect of
emotional shock on memory is
equivalent for information presented
aurally or visually
- you could do another bivalent study
- with one factorial design you can answer all three questions- this is why people use
them
- answers three questions:
- is auditory or visual information remembered better?
- Does emotional shock disrupt memory?
- Does the effect of emotional shock depend upon whether the information was received
via auditory or visual channel? (looks at the interaction of the two variables)
- Or does the effects of modality of presentation depend upon the presence of emotional
shock?
- this third question addresses the generalizability of the main effects (you are wondering whether the results for the first group generalize to the second group)
- with one factorial design you can answer all three questions effectively and efficiently
- in psychology, we often have informal hypotheses (we think emotional shock disrupts memory, as opposed to formal hypotheses- anything expressed mathematically), in part because there is more than one causal factor determining behaviour (this is why we couldn't really have a formal hypothesis- factorial designs are necessary here)
- when a combination of IV’s acting together simultaneously determines the outcome
(DV), a factorial design is uniquely suited
- more complex (and accurate) causal explanations can be tested with these designs- in
psychology, the more complicated explanations often work better
Designing a Factorial experiment
- three steps
1) identify each hypothesized causal factor (independent variable) of interest (both shock
and modality of presentation as examples)
2) decide how many levels of each factor you want (simplest case being two levels of
each)- i.e. how many levels of shock (no shock, low shock, high shock)
3) determine all possible combinations (create a matrix of treatment combinations).
Designing a Factorial experiment- tables
- if you have the simplest possible case- two causal factors with two levels each
- we have created four groups- each with one level of each independent variable
- this one with words:
2 x 2 factorial experiment
- still only have 40 subjects, 10
randomly assigned to each of 4
groups
- we can then compare the means from
the 4 treatment combinations
Factorial designs
- it may seem in some ways like this
experiment is confounded (because
altering two things at once), but this is
still an analytic experiment, the “gold
standard” of science
- ideally we still have random
assignment of subjects to treatment
groups (any differences won’t be
due to subject factors)
- good things:
1.) random selection of subjects from
defined population
2.) random assignment of subjects to
treatment groups
3.) concurrent control of extraneous
variables and contrast (statistical
comparison) of the measured DV
Factorial Designs
- all conditions are still being held constant except that in the factorial designed it is the
combination of treatments that are manipulated
- they may seem confounded but we can mathematically look at their effects
- because all possible combinations of levels of the IVs are represented, we can
statistically separate their effects- this is done by ANOVA.
Analysis of Variance: ANOVA
- ANOVA is a way of analyzing the total variability among dependent variable scores and dividing or partitioning this total variability among the factors assumed to have caused it- called sources of variance
Sources of variance
1) subject variables
2) experimental error (measurement or recording error)
3) combination of IV treatments – do this in a two-way anova
2-Way ANOVA
- you will get an F-ratio for each of the main effects and also an F-value for the interaction
- for main effects, F is the ratio of between-group variability to within-group variability
- within-group variability is due to: differences between the subjects and experimental error
- between-group variability can also be due to: differences among subjects, experimental error, and/or variability caused by the combined treatment effects of the IVs
Evaluate the significance of F
- print-outs: give probabilities of getting an F-value as big, or bigger than, the one observed by chance for each main effect and interaction possible in the factorial design
- for a 2-way ANOVA (2-factor design), probabilities are given for both main effects and the interaction
Assumptions of ANOVA
- remember that ANOVA is a parametric test: as such it assumes:
- normally distributed populations
- homogeneity of variance (robust if you have about the same #'s of people in each group)
- independence of error terms
Terminology of Factorial Designs
- the term “factor” is used interchangeably with “independent variable”
- a number is given for each IV (factor)
- for each IV you will have a number- it will be a "number by number by number, etc." factorial design (depending on how many IVs you have)
- each number tells you how many levels that IV has (i.e. 2 x 2- two IVs (two "twos") and each has two levels)
- the number used refers to the number of levels that factor has
Modality of information ex.
- design? 2 X 2 factorial
- the two digits indicate that there are two IVs
- that both digits are "2"s indicates each IV has two levels
Terminology of Factorial Designs
- in addition, how the variables are manipulated can affect the design- within subjects, between subjects, or both (called a mixed design)- one may be manipulated within subjects and the other manipulated between
Types of inferential statistics
- there are many- each intended for and appropriate under specific conditions like shape,
scale, experimental design of study
- remember that the experimental design was important- # of IV, # of levels of IV’s, how
manipulated- this is same as above
Interpreting Factorial data: from tables
Interpreting Factorial data: from tables
(again)
- letters A, B, C, D denote the mean scores for the different groups
- when looking at main effects- you just look at one independent variable at a time (you ignore independent variable 2)- combine the scores of all the groups (collapse across IV2) to see if the first IV had a MAIN effect
- main effects: the separate effects of each independent variable
- how to calculate the main effects of the IVs:
  a) average the group means in the first column and write the result under the first column
  b) do the same for the group means in the second column- you now have two "column means"
  c) average the group means across the first row and write the result to the right of the first row
  d) do the same for the second row- now you have two "row means"
  e) compare the column means- they represent the main effect for one IV averaged over the effects of the other IV
  f) do the same for the row means- they represent the main effect for the other IV averaged over the first IV
  g) a reliable difference in means indicates an effect of that IV independent of the other IV
- the next table looks at all the different main effects- if you don't have equal #'s of people, you get weighted averages
- if you compare the (A+C)/2 and the (B+D)/2 averages then you can determine the main effect of IV #1
- if you compare the other two averages you can determine the main effect of IV #2
- comparing values to determine the main effects of IVs 1 and 2 requires statistical analyses to determine if the observed differences are greater than would be expected by chance for a given, a priori determined, alpha error rate- this is done using ANOVA
- you require stats!
Interpreting factorial data: from tables
- you must also interpret interactions
- an interaction is present when the
effect of one independent variable changes across the levels of another independent
variable
- i.e. Glass et al.: found that the # of attempts to solve the insoluble puzzle was greater
when the noise was soft than when it was loud but this relationship held only when the
noise was unpredictable (when noise was predictable the # of attempts was about 26
regardless of noise intensity)
- parallel lines will not have any interaction
- look at the difference between A and B (A-B) and C and D (C-D)
- if there is a difference in the difference you have a significant interaction
- if you compare the two differences, and there is a difference- you have an interaction effect between IVs 1 & 2
- you can also have A-C and B-D to compare these values to determine interaction- look
for a difference in these differences – you can do it the other way (C-A, etc.)- just be
consistent
- comparing any of the differences in same coloured values determines the interactive
effect of IV’s 1&2
- see table with colours- slide 39
- comparing the values to determine the interaction between IV’s 1 and 2 also requires
statistical analyses to determine if the observed differences in differences are greater
than would be expected by chance…
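- an added sketch, not from the lecture: a small Python example with made-up cell means that follows the table logic above (column and row means for the main effects, a difference of differences for the interaction); an ANOVA would still be needed to call any of these reliable:

    # made-up cell means for a 2 x 2 design, using the A, B, C, D labels from the tables
    # (A and C sit in the first column, B and D in the second; equal group sizes assumed)
    A, B = 5.0, 7.0
    C, D = 3.0, 7.0

    main_iv1 = (A + C) / 2 - (B + D) / 2  # compare the two column means
    main_iv2 = (A + B) / 2 - (C + D) / 2  # compare the two row means
    interaction = (A - B) - (C - D)       # the difference in the differences

    # non-zero values only suggest effects; statistics decide whether they are reliable
    print(main_iv1, main_iv2, interaction)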
Some (many) examples
Interpreting factorial data: from tables
(slide 42)
Interpreting factorial data: from figures
- good idea to graph results (plot
results)
- to answer a question about main effects- do the math
- for one IV, compare the average of its two blocks with the average of the other two- in this table they are the same- we do not have a main effect
- there is a main effect for the manipulation of the second IV- the red bars are higher than
the blue bars
Hypothetical Data: Stress-induced analgesia: Revisited
- using order as another independent variable (other than stress vs. no stress)
- mixed design- order is between, stress is within
Complete Counterbalancing
- split subjects into groups and give the different groups the treatments in all possible orders
- assuming the effects of order will be balanced equally across treatment conditions (levels of the IV), order effects will not confound your results
- you can test this assumption by making the order in which treatments are received an independent variable (becomes a mixed factorial design with order manipulated between and treatment within subjects)
- we have three questions- main effect for stress? main effect for order? interaction?
- Is there a main effect for stress?- yes, when tested under stressful conditions, subjects showed …blah, blah, blah
- No main effect for order- the average of the stress and no-stress scores is the same for both the no-stress-first and stress-first groups
- Is there an interaction?- is the effect of stress the same whether they receive stress or no stress first?- the difference in differences is 0, so there is no interaction
Another example- slide 54
- no main effect for stress
- there is an effect for order
- no interaction
Another example (slide 56)
- no effect for stress- the average of the two blue bars and the average of the two red bars is the same
- no effect for order
- interaction effect- the effect of stress depends upon the order in which they receive the treatment- it will be 2 and -2 between the two no-stress-first and stress-first bars and also 2 and -2 between the two red bars and blue bars (one will be + and one will be -)- either one shows the interaction effect
Interpreting Factorial data: from figures (slide 58-64)
line graphs are not as good to look at
- no effect for IV1
- to find the average for either the blue or red line- it will be the middle of the line- they do not differ so there is not an effect for IV2
- there is an effect
- no interaction effect- if you see parallel lines there will be no interaction- line data help determine whether things are parallel
Interpreting Data from tables- slide 64
Hypothetical data (slide 65)
- is there a main effect for modality?- yes, the average for the top red line is 7 and the average for the bottom is 4
- is there a main effect for shock?- yes, the average of 7 and 5 (6) is different from the average of 7 and 3 (5)
- is there an interaction because the difference between 6 and 5 is different from the difference between 7 and 4?- no! you determine an interaction by looking at the four raw data points- not the averages- the #'s used in determining the main effects are not used to determine the interaction effects
- for this you would look at the difference between 7 and 7 and the difference between 5 and 3
Hypothetical data (slide 66)
- there is a main effect for auditory vs.
visual (difference of two)
- main effect for emotional shock
- no interaction- lines are parallel- people will always remember more in the visual group regardless of whether they are shocked or not (no interaction)
Hypothetical data *67*
-
main effect for modality
no main effect for emotional shock
there is an interaction
Problems with “higher order factorial designs”
1) not enough subjects per group: you should have at least 5 subjects per group for a
reasonable ability to detect the effects of the independent variable
2) numbers and complexity of the resulting interactions- if you have a 2 x 2 x 2 design, for example, you would get three main effects, three 2-way interactions and a three-way interaction…could be a headache after a while with all the numbers
February 9, 1999
-
figure out how to paste the figures into the notes
Main effects & interactions are independent
- are the three questions that you can answer independent?- yes!
- the three questions that can be answered with a two-factor design are independent
- this means that there are 8 (2X2X2) possible outcomes from the data analysis
- main effect for IV1: yes or no
- main effect for IV2: yes or no
- interaction: yes or no
Interpreting factorial data: from tables
- this table shows a main effect for IV1, no main effect for IV2, and no interaction
Interpreting factorial data: from figures
- no main effect for IV2, no interaction (lines are parallel)
- no main effect for IV1, no main effect for IV2, interaction
- when you are looking at the interaction, it is like asking: if you were to do two different bivalent studies, would your data be different depending on what level of one IV you used- like, would you get different data for IV2 depending on whether you used high or low values for IV1
- intersecting lines
Crossover interaction
- a crossover interaction occurs when the first IV has one effect at one level of a second IV but the opposite effect at the other level of the second IV
- see figure with the crossover lines
Glass, Singer, & Friedman, 1969 (from text)
- interested in the effects of exposure to irritating noise on several behavioural measures of tolerance for frustration
- operational definition of D.V.: time spent trying to solve an unsolvable problem (deception)- how long a person will work at it
- factors (IVs):
  - noise predictability (2 levels)
  - noise intensity (2 levels)
- this would be a 2x2 factorial design
Glass et al. 1969
- if the samples are equal, they just added the two numbers together (sums instead of averaging)- you can only use this method with equal numbers of people
- there is a big difference for one factor and not so much for the other (big for predictability and not so big for intensity)
Glass et al. data if you were to plot it- line graph
- use a line graph because soft-loud is continuous
- significant interaction- lines are not parallel
- worst possible scenario is a roommate who plays very loud noise randomly throughout the day (unpredictable and loud)
- predictability is a key aspect of control and this is related to psychological factors
Tomlinson, Hicks & Pellegrini (1978)
- looked at whether pupil size serves a role in communication
- attributions of female college students to variations in pupil size
- there were problems in the literature about this topic: inconsistent findings re. effects of pupil size in non-verbal communication
- Hess talked about innate releaser- things that trigger behaviours that they can’t help
- do humans do the same thing with pupil size?
- Janisse- thought pupil size was a trivial variable
- studies often come about because the literature is confused
- all the studies had used bivalent studies- two types of pupil size
- “digitally retouching” to examine this in a multivalent experiment
Capilano Canyon: North Vancouver
- it is a suspension bridge with a cable- it swings and sways and 30 feet down to rapids
- low light is romantic- why?
- your eyes dilate under low light
- pupils dilate when excited- your pupils will be dilated on the bridge
- the people in the study were approached by the male or female and then later asked by
a confederate to rate how attractive they thought the person was
Digital Manipulation- they manipulated a picture for an ad, etc.- showing something that didn't exist, because they had manipulated the picture
- OJ Newsweek picture
Tomlinson et al.
- the limited previous literature also suggests women rate males and females differently re. pupil size (i.e., there is an interaction)- women rate a woman with big pupils as less attractive and a male with big pupils as more attractive
- subjects: 246 female undergraduates in courses "primarily of interest to women"
- method: 10 photos of males and females of "moderate attractiveness"
- 35 judges rated these (7 pt scale): picked the 1 male and 1 female closest to the median with the least variance
- retouched the photos- made the pupils either smaller or larger
- design?
- 2 x 3 within subjects factorial design
- 2 is the two sexes
- 3 different pupil sizes
- so you have one group- because it is
within groups
- IVs?
- sex of model
- pupil size
- dealing with carryover?
- presentation order randomized
Data
- this is a good way to show data because
it shows variance as well as just the means
- all three differences have to be the same or you might have an interaction
Tomlinson et al.
- line figure
- you have to look at both mean differences and variance
- you can add the variance terms in Excel- indicates the standard deviation value (gives you a better idea about how the distributions are distinct from each other)
- you tend to just show the error bars on one side so it's not as cluttered- you know the error bars are symmetrical
- there is a main effect of gender
- there is an interaction
- it is hard to tell for the other main effect- probably there is one
- when you have 246 subjects- the gender effect and pupil size and the interaction are all significant at the .001 level
- rating of attractiveness depends upon the sex of the person you are rating and how big the pupil size is
- maybe this says that large is TOO large for males
The same graph as above worked out for main effects, etc.
Tutorial interpreting factorial data
- Fazio & Backler: Topics in Research Methods: Main Effects and Interactions
- This program provides an excellent opportunity to practice interpreting main effects and interactions
- I highly recommend everyone do this
Advantages of the Factorial design
- when a combination of IV’s acting together simultaneously determines the outcome
(DV), a factorial design is uniquely suited
- more complex and accurate causal explanations can be tested
- Mischel- history of personality research has increasingly become the study of higher-order interactions
- E.g. delay of gratification: depends upon, age x gender, object x consequences x models
x prior experience x …
- ask a 3-year old if they want three smarties now or 10 smarties when they wait
- when they get to be 4: who's the smartest person in preschool- they will point to a boy- and say that they will do the smart thing, but then they will not choose the smart thing
Tutorial
- At the DOS prompt- type in "effects"
- You will get the main screen
- hit enter and go to the main screen
February 11, 1999
2 X 3 factorial designs
- the question is whether speed increases similarly for both 2D and 3D- whether practice has the same effect for both 2D and 3D
- for the main effect- does it take longer to do a 2D rotation or a 3D rotation
- you can add up the total seconds- 24 for the 2D rotation and 42 for the 3D rotation
- this is the same as the 2 X 2 except that we are averaging across three numbers instead of two numbers
- three-level analysis for practice- for a main effect?- just some have to be different
- you can just look at two and if they are different there is a main effect for practice
- all of them have to be the same for there to be no main effect
- for the interaction- with no practice, 3D tasks take 10 seconds longer, with some practice 3D takes 6 seconds longer, and with lots of practice 3D rotations take only 2 seconds longer
- if there is a difference somewhere in the differences you have an interaction
- this is done with ANOVA, which tells you that there is a difference somewhere although we don't know where
- when you ask a question about a main effect- you are showing that the IV you are talking about has an influence on the DV
- regardless of practice, 3D will take longer
- the interaction looks at whether your performance increases similarly for both
- the difference in the difficulty depends upon the practice that you have
- look at the differences across none, some, lots (10-20, 8-14, 6-8) then the differences between 10, 8, 6 and 20, 14, 8 (do the differences get smaller with practice?)
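- the same arithmetic in a few lines of Python (added for illustration, using the 2D/3D numbers from the notes; descriptive only- the ANOVA still does the inferential work):

    # the 2D / 3D rotation-time numbers from the notes (seconds)
    times_2d = {"none": 10, "some": 8, "lots": 6}
    times_3d = {"none": 20, "some": 14, "lots": 8}

    # main effect of task type: compare the averages (or the totals, 24 vs 42)
    mean_2d = sum(times_2d.values()) / 3
    mean_3d = sum(times_3d.values()) / 3

    # interaction: do the 3D - 2D differences change across practice levels?
    differences = {p: times_3d[p] - times_2d[p] for p in times_2d}

    print(mean_2d, mean_3d)  # 8.0 vs 14.0 -> main effect of task type
    print(differences)       # {'none': 10, 'some': 6, 'lots': 2} -> shrinking gap, an interaction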
2 X 3 Factorial designs (graph)
- we have a main effect for task type, a main effect for practice, and an interaction
2 X 3 factorial designs
- the lines are not parallel…remember that there is probably an interaction then
- main effect for practice: look at the difference for each bar (none, some, lots, etc.)- if there is any difference anywhere you will have a main effect- here, all of them are different
2 X 3 Factorial designs
- no main effect for rotation type- compare (10 + 6 + 8)/3 with the average of the other three
- main effect for practice
- there is an interaction
2 X 3 Factorial designs (graph)
- also look at the line graph:
- when the two lines cross over then you have an interaction
Modification of Stroop Lab
- stroop (1935): Stroop colour and word test
- stroop effect: the difficulty in naming the colour of an object when the colour conflicts
with the name of the object (i.e. when the word blue is written in red ink)
- cognitive interference between the naming process (consciously instructed) and the
reading process (automatic verbal processing response)
- tests the ability to separate the word and colour-naming stimuli
Modification of Stroop
- procedure: within subjects: each individual receives all levels of the IV (within-subjects design)
- counterbalance
- time, record in data table
Class data stuff
Descriptives (output created 11 Feb 99 10:53:46)
- Input: Filter <none>; Weight <none>; Split File <none>; N of Rows in Working Data File: 14
- Missing Value Handling: user-defined missing values are treated as missing; all non-missing data are used
- Syntax: DESCRIPTIVES VARIABLES=reading plussign countdig /STATISTICS=MEAN STDDEV MIN MAX .
- Elapsed Time: 0:00:02.04

Descriptive Statistics
Variable              N    Minimum   Maximum   Mean      Std. Deviation
READING               14   8.00      16.00     10.2857   2.0913
PLUSSIGN              14   9.00      17.00     12.5714   2.1738
COUNTDIG              14   12.00     17.00     15.1429   1.5119
Valid N (listwise)    14
General Linear Model (output created 11 Feb 99 10:54:53)
- Input: Filter <none>; Weight <none>; Split File <none>; N of Rows in Working Data File: 14
- Missing Value Handling: user-defined missing values are treated as missing; statistics are based on all cases with valid data for all variables in the model
- Syntax:
  GLM reading plussign countdig
  /WSFACTOR = factor1 3 Polynomial
  /METHOD = SSTYPE(3)
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = factor1 .
- Elapsed Time: 0:00:00.99
Within-Subjects Factors (Measure: MEASURE_1)
FACTOR1   Dependent Variable
1         READING
2         PLUSSIGN
3         COUNTDIG

Multivariate Tests(b)
Effect: FACTOR1       Value   F           Hypothesis df   Error df   Sig.
Pillai's Trace        .755    18.512(a)   2.000           12.000     .000
Wilks' Lambda         .245    18.512(a)   2.000           12.000     .000
Hotelling's Trace     3.085   18.512(a)   2.000           12.000     .000
Roy's Largest Root    3.085   18.512(a)   2.000           12.000     .000
a Exact statistic
b Design: Intercept; Within Subjects Design: FACTOR1
Mauchly's Test of Sphericity(b) (Measure: MEASURE_1)
Within Subjects Effect: FACTOR1
Mauchly's W   Approx. Chi-Square   df   Sig.   Epsilon(a): Greenhouse-Geisser   Huynh-Feldt   Lower-bound
.807          2.575                2    .276   .838                             .948          .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the layers (by default) of the Tests of Within-Subjects Effects table.
b Design: Intercept; Within Subjects Design: FACTOR1
Tests of Within-Subjects Effects (Measure: MEASURE_1)
Source            Correction            Type III SS   df       Mean Square   F        Sig.
FACTOR1           Sphericity Assumed    165.333       2        82.667        23.533   .000
                  Greenhouse-Geisser    165.333       1.676    98.631        23.533   .000
                  Huynh-Feldt           165.333       1.896    87.208        23.533   .000
                  Lower-bound           165.333       1.000    165.333       23.533   .000
Error(FACTOR1)    Sphericity Assumed    91.333        26       3.513
                  Greenhouse-Geisser    91.333        21.792   4.191
                  Huynh-Feldt           91.333        24.646   3.706
                  Lower-bound           91.333        13.000   7.026
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source            FACTOR1     Type III SS   df   Mean Square   F        Sig.
FACTOR1           Linear      165.143       1    165.143       35.277   .000
                  Quadratic   .190          1    .190          .081     .780
Error(FACTOR1)    Linear      60.857        13   4.681
                  Quadratic   30.476        13   2.344

Tests of Between-Subjects Effects (Measure: MEASURE_1; Transformed Variable: Average)
Source      Type III SS   df   Mean Square   F          Sig.
Intercept   6738.667      1    6738.667      1545.929   .000
Error       56.667        13   4.359
T-Test (output created 11 Feb 99 10:56:40)
- Input: Filter <none>; Weight <none>; Split File <none>; N of Rows in Working Data File: 14
- Missing Value Handling: user-defined missing values are treated as missing; statistics for each analysis are based on the cases with no missing or out-of-range data for any variable in the analysis
- Syntax: T-TEST PAIRS= plussign reading WITH countdig countdig (PAIRED) /CRITERIA=CIN(.95) /MISSING=ANALYSIS.
- Elapsed Time: 0:00:00.22
Paired Samples Statistics
         Variable   Mean      N    Std. Deviation   Std. Error Mean
Pair 1   PLUSSIGN   12.5714   14   2.1738           .5810
         COUNTDIG   15.1429   14   1.5119           .4041
Pair 2   READING    10.2857   14   2.0913           .5589
         COUNTDIG   15.1429   14   1.5119           .4041
Paired Samples Correlations
         Pair                  N    Correlation   Sig.
Pair 1   PLUSSIGN & COUNTDIG   14   .441          .114
Pair 2   READING & COUNTDIG    14   -.427         .127

Paired Samples Test (Paired Differences)
         Pair                  Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
Pair 1   PLUSSIGN - COUNTDIG   -2.5714   2.0273           .5418             -3.7419        -1.4009        -4.746   13   .000
Pair 2   READING - COUNTDIG    -4.8571   3.0598           .8178             -6.6238        -3.0904        -5.939   13   .000
- we got a significant F-ratio so we are now doing paired t-tests (because it is a within-subjects design) to test the orthogonal comparisons
- if it had been a between-subjects design, you would need a fourth column to say which group each person was in (it would be a nominal variable to define group membership)
- then you would have another column with the scores- again we have one row for each person- each person now has only one set of data
- in the dialog window, the DV would be the reaction-time scores and the "factor" (IV) would be the grouping variable
- this also would have been significant, although we would have a lower F-value in the between-groups design
- because in between-groups the differences between the subjects could contribute to the differences in the scores
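- not from the lecture: a minimal Python version (assuming scipy, with made-up times rather than the actual class data) of the follow-up paired t-tests reported in the SPSS output above:

    from scipy import stats

    # made-up Stroop-style completion times (seconds) for 14 people; not the class data
    reading = [8, 9, 10, 11, 9, 12, 8, 10, 13, 11, 9, 10, 12, 11]
    plussign = [10, 12, 13, 12, 11, 14, 10, 13, 15, 13, 12, 12, 14, 13]
    countdig = [13, 15, 16, 15, 14, 17, 13, 15, 17, 16, 15, 14, 16, 15]

    # after a significant within-subjects F, follow up with the planned paired comparisons
    for label, scores in [("plussign vs countdig", plussign), ("reading vs countdig", reading)]:
        t, p = stats.ttest_rel(scores, countdig)
        print(f"{label}: t = {t:.2f}, p = {p:.4f}")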
Maximizing Chances of finding significance
- you can increase between groups differences
- use a stronger variable that has a bigger effect in the real world (like a more
frustrating variable if you are measuring frustration)
- change the level of the treatment of IV (stronger dosage of a drug, etc.)
- use a more sensitive operational definition for the DV (if you had had shorter lists for the Stroop it would have been a less sensitive measure)
- you can also decrease within group variance
- increase control over extraneous factors
- decrease subject variance (change design to matched or within subject design)
- increase sample size
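A rough numerical sketch of these three levers, using the paired-t formula t = mean difference / (SD of differences / √n); all values here are hypothetical:

    from math import sqrt

    def paired_t(mean_diff, sd_diff, n):
        # t for a paired design: mean difference over its standard error
        return mean_diff / (sd_diff / sqrt(n))

    print(paired_t(2.0, 3.0, 14))   # baseline case
    print(paired_t(4.0, 3.0, 14))   # stronger manipulation -> bigger mean difference -> larger t
    print(paired_t(2.0, 1.5, 14))   # tighter control -> less within-group variance -> larger t
    print(paired_t(2.0, 3.0, 56))   # larger sample -> larger t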
Craik & Lockhart ('72): Levels of processing
-
we have discussed memory in the past
these people proposed that the more deeply you think about something, the better you will
remember it
three types of understanding
graphemic level of processing focuses on surface features: 5 letters, 1 vowel, all
uppercase
phonological level of processing is based on sound patterns (e.g. rhymes with hot)
semantic level of processing gets at the meaning of the word
For instance- BEAR
- different people would be asked the different questions that follow that would get at
different things…
- is it all in uppercase? Graphic level of processing
- does it rhyme with chair?
Phonological level of processing
- is it an animal? Semantic level of
processing
- asked to pick out 60 words they had
seen before from a list of 180
Craik & Tulving: processing and memory
- if you understand and process the meaning of the word you will remember it more
Types of memory and how they are assessed
Memory system                       Retrieval system
1. procedural                       implicit
2. priming                          implicit
3. semantic                         implicit
4. primary (working/short term)     explicit
5. episodic                         explicit
people with damage to mammillary bodies have memory problems
explicit memory- you have to consciously remember having seen the word
implicit memory- you are shown a fragment like _L_P_A_T (this would be elephant)
you might not recall having heard the word elephant
you could also have ELE_______ : some people might say elephant and some elegant- you can measure memory for the word even when they don't consciously remember having seen it
kangaroo
elephant
conference
How do amnesics do on tests?
- is amnesics' memory bad for both implicit and explicit effects- would Jimmy show priming effects?
- You could get all sorts of data like crossover or not
- In reality, amnesics show the same priming effects as non-amnesic people
Three-factor designs
- you want to integrate the different information
- with a three-factor design, there are seven questions that can be answered
- there are three main effects (one for each IV)
- there are three 2-way interactions
- the influence of one IV may depend on another IV
- there is one three-way interaction
- this is where the two-way interactions depend upon the level of the third variable you are looking at
- examples based on a study of gender, practice and mental rotation…
Hypothetical higher-order study of memory (graph)
- it is not within-subjects because one subject cannot give you all the bits of data
- whenever you look at two-way interactions, you will collapse across the third IV
- the three-way interaction assesses whether there are differences in the differences of differences
- when looking at main effects, you ignore the other two IVs
- there is a main effect for gender (male vs. female)
- there is a main effect for practice
- also a main effect for task rotation- 3D tasks take longer
- three two-way interactions
- 1: whether males and females improve the same amount with practice (don't care about the 2D vs 3D)
- for gender X dimension- combine (collapsing across practice) the 2D and 3D data for males and compare that difference with the same difference for females- they are the same, so there is no two-way interaction
next graph:
“To steal ideas from one person is plagiarism; to steal from many is research”
Three-factor design
- remember that you have seven questions that you can answer
- with one analysis you end up with 7 F and p values to answer these questions
- there are three main effects (one for each IV)
- there are three 2-way interactions
- there is one three-way interaction
- example he gave was the study on gender, practice, and mental rotation speed
- with a 2x2x2 it is safer to go back to the numbers, rather than tables or bar graphs
-
-
for main effect for gender- doesn’t talk about practice or 2D and 3D differences in
rotation tasks- single score for males and single score for females- just total them up
you get 36 (male) and 52 (female): you must describe: males, overall, are faster than
females (give a description)
practice main effect- collapse across 2D and 3D and male/female
48 vs 40- main effect for practice
then add up all the 2D’s with all the 3D’s- there is a main effect for rotation- 2D tasks do
not take as long as 3D tasks
to do two-way interactions: we are still just dealing with four numbers
there are three of them- G X P, G X D, P X D
for G X P, you will collapse across the variable that is irrelevant- collapse across
dimension,
get a total value for males with practice, males without practice, females with practice,
females without practice
- 18, 18, 30, 22
- you can subtract these values in any way
- 18-18 = 0
- 30-22 = 8
- we have a difference in our
differences (males do not
improve with practice but
females do)
- G X D: don’t care about practice;
collapse across the numbers for
practice
- Male on 2D, males on 3D,
females on 2D, females on 3D
- 16, 20, 24, 28 respectively
- to examine the gender by
dimension interaction, collapse
across practice and examine the
differences in differences
- these differences are not
different- both are (-4);
- third two-way interaction- practice X dimension; collapse across gender
- practice with 2D, no practice with 2D, practice with 3D, no practice with 3D
- we do not have an interaction here (both are (-4))
Interpreting 3-way interactions
- there is one 3-way interaction among gender, practice, and dimension (GxPxD) to be analyzed
- the three-way interaction asks whether the two-way interactions differ depending upon the level of the third variable that is examined
- in other words, three-way interactions require you to compare the difference in the difference of difference scores!
- because there are 3 two-way interactions you can assess the 3-way interaction in three different ways
Interpreting 3-way interactions
- the easiest way to show/do this with a 2x2x2 data table is to simplify the data to create a 2x2 table containing the effects of the third variable at each combination of the two remaining variables
- e.g. create a table of gender x dimension and fill the four cells with the effect of practice on that combination of gender and dimension
- you want to make a table that has only four cells- you put in the effect of the other variable (practice) on males doing 2-dimensional tasks (instead of collapsing across the other variables you put in their effect so that all variables are accounted for)- you put in the difference
- males on 2D tasks: 0 sec (the difference between males with and without practice)
- males on 3D tasks: 0 sec
- females on 2D tasks: 4 sec
- females on 3D tasks: 4 sec
look at the effect of practice at each combination of gender and dimension (similar to
what we did for the GxD interaction but instead of averaging, look at differences)
put in the tables showing how to collapse big one into little ones
now 0-0=0 compared to 4-4=0 or (-4)-(-4)=0
- no significant three-way interaction (sketched numerically below)
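As a sketch of the bookkeeping above, the following uses one set of hypothetical cell values chosen only to be consistent with the marginal totals quoted in the notes (36 vs 52, 48 vs 40, 40 vs 48, and the 0/0/4/4 practice effects); the actual cell means from the lecture example may differ:

    import numpy as np

    # Axes: gender (male, female) x practice (practice, no practice) x dimension (2D, 3D)
    data = np.array([[[8, 10],    # male,   practice:    2D, 3D
                      [8, 10]],   # male,   no practice: 2D, 3D
                     [[14, 16],   # female, practice:    2D, 3D
                      [10, 12]]]) # female, no practice: 2D, 3D

    # Main effects: collapse (sum) across the other two IVs
    print(data.sum(axis=(1, 2)))   # gender totals -> [36 52]
    print(data.sum(axis=(0, 2)))   # practice totals -> [48 40]
    print(data.sum(axis=(0, 1)))   # dimension totals -> [40 48]

    # Two-way interaction (gender x practice): collapse across dimension,
    # then compare the practice differences for males vs females
    gp = data.sum(axis=2)          # [[18 18] [30 22]]
    print(gp[0, 0] - gp[0, 1], gp[1, 0] - gp[1, 1])   # 0 vs 8: a difference in differences

    # Three-way interaction: the practice effect at each gender x dimension combination
    practice_effect = data[:, 0, :] - data[:, 1, :]   # [[0 0] [4 4]]
    print(practice_effect)
    # differences in the differences of differences: (0-0) vs (4-4) = 0 -> no 3-way interaction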
Higher order factorial designs (bar chart) this is a little difficult to see
Has other examples- best to do handout “2x2x2 factorial data interpretation exercises”
Next:
Chapter 10: Single-subject designs
- also called small n designs
- lots of situations where you want to determine causality- this is harder when you are a psychotherapist: you want to know if your individual client will improve, not whole groups
- focus is on the behaviour of a few, even a single, individual
- termed small-n or single-subject designs
- common in clinical research
- generally avoids inferential statistics (different approach to determining reliability)
Small N-designs
- historically: many studies- psychophysics (Fechner and Weber) and memory (Ebbinghaus)- based on very small numbers of subjects, even 1
- Ebbinghaus tried to remember words like BAV and GOM- did free recall to see how
many he got right
- all done prior to the invention of inferential statistics!
- Sir Francis Galton & Karl Pearson- correlation coefficient, late 1800’s
- Sir Ronald Fisher- F test; 1920-1930
- People realised very quickly how important this was
- by 1950’s had to use statistics to get published, animal learning folk (Skinner etc.)
created their own journal
- we still have an obsession with statistics- we get infatuated with p-values; his problem with the editor who wanted stats done comparing animals that didn't live at the same time
Baseline vs. Discrete Trials Designs
- single-subject designs come in a variety of forms- all forms can be categorized into
either baseline designs or discrete trials designs
- today, when we say single-subject designs we usually mean baseline designs
Baseline designs
- becoming more popular again in clinical research
- essential feature is the establishment of a behavioural baseline during the “baseline
phase” of the experiment
- this baseline establishes level of performance on the dependent measure before
introduction of experimental treatment
- following baseline phase, subject exposed to experimental treatment and behaviour
measured again- second phase called “intervention phase”
- requires:
- at least one baseline period and
- at least one intervention period
- you can use biofeedback- if you can calm your physiological arousal at the same time as
having the needle, you can train your physiological responses, maybe you could cure
problems with needles
- you have the phobic needle person come in- rate anxiety over a series of minutes and then start biofeedback training and their anxiety is reduced
example in the book was of a subset from a larger study using rats and electric shocks- subjects tested in an operant conditioning box
baseline phase → each rat got a series of training sessions to familiarize it with the shock schedule and establish a behavioural baseline; either had the light on or off (light on meant shocks were predictable and were preceded by a warning tone; darkness meant unpredictable)- kept doing this until the points on the baseline remained within a 10% range; the bar had no effect on anything; measured # of times the rat pressed the bar
intervention phase → rat could "buy" 1 minute of time in the light (predictable) condition by pressing the bar; continued to record bar pressing until baseline within 10% range
results: bar pressing went from 10 to 85% in intervention phase- had this twice with
baseline phase in between and it jumped down and then back up again
response rates only high when responses produced the signalled schedule
did follow up studies
the baseline data would be called “baseline (A)”
intervention (B)
there is a confound- passage of time
overhead: Fig 10-1: example in book: Badia & Culbertson 1972- shock in rats: either
signal or not- (note- will return to idea of stress and control?)
TET (trained eyeball test): do you need statistical analysis to tell you that there is a change from the baseline to the intervention period?
but there is still the possible confound of the passage of time
table 10-1: more detail about characteristics about the baseline stuff; characteristics of
single-subject designs
Functions of baseline
- feature of the baseline design is its use of the “behavioural baseline” : record of a
subject’s performance across time within a phase
- has two important functions:
1) establishes level of the DV prior to intervention
2) allows assessment of variability in DV
- may involve a “stability criterion” to minimize error variance so that any effect of the
intervention will be more apparent
- stability criterion: defined to identify when the baseline no longer shows any systematic
trends; used to ensure that baseline accurately represents the level of behaviour
produced by a given treatment
- trends of either increasing or decreasing values usually occur right after change to a new
phase- stability criterion used when there are these systematic changes
- keep the baseline plot up to date or you run the risk of having your subject in a certain phase for too long- waste of time, violation of experimental procedure
- you can determine whether baseline has met stability criterion by updating plot after
each session and examining it
- if stability criterion too stringent → baseline may never achieve it; not be able to proceed to the next phase
- too lax → may proceed to the next phase before the subject's performance has stabilized
- to develop a good one may require pilot work; experience with the % measure used indicated that the baseline was likely to remain stable if it varied only within 10% over three successive trials (a minimal check of this kind is sketched after this list)
- the less variance within groups, the easier it is to tell the difference between the two
groups
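A minimal sketch of a stability check of this kind, assuming a "within 10% over the last three sessions" rule like the one mentioned above:

    def baseline_is_stable(points, window=3, tolerance=0.10):
        # Illustrative stability criterion: the last `window` baseline points
        # must all fall within `tolerance` (e.g. 10%) of their own mean.
        if len(points) < window:
            return False
        recent = points[-window:]
        centre = sum(recent) / window
        return all(abs(p - centre) <= tolerance * centre for p in recent)

    # Example: update the plot/check after each session
    sessions = [40, 52, 47, 46, 45]
    print(baseline_is_stable(sessions))   # True if the last 3 points vary within 10%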
Intrasubject replication
- baseline and intervention phases each repeated
- each subject’s performance thus assessed twice under each phase
- look and see whether behaviour was repeatable or “replicable”
- ability to replicate → reliable data
Assessing Generalizability
- we are interested in generalizability
- use intersubject replication (between-subject replication): use two animals instead of
one
- should you have 20 rats…you don’t need to have that many
- example: ABAB design
- these are like within-subjects designs when we don’t counterbalance
- each time you change phases, if the changes are consistent, the chance that the behaviour changed by chance becomes smaller and smaller
- this is different from group designs in which inferential statistics are used to assess
reliability after a fairly complex chain of logical inferences
- note that in both cases, the key is the degree of overlap of distributions
Rationale of the baseline design
- four areas in which the baseline design and group-designs differ
- dealing with random variability
- handling error variance
- assessing the reliability of findings
- determining the generality of results
Dealing with random variability
- when you manipulate IV you hope to show that it causes changes in behaviour
- systematic variance: the changes that vary systematically with the level of the IV
- error variance: changes in behaviour not related to variation of the IV (unsystematic
variation)
- shows up as unsystematic fluctuations of a baseline within a phase
- also shows up as variations in final, stable levels of responding across same treatment
- also occurs between subjects undergoing same treatment
- you control for this in group designs by holding extraneous variables constant and
averaging data across subjects
- with single-subject approach, error variance handled by tight experimental control
- statistical control measures avoided
- researchers make an effort to identify the possible sources of variance- graph data and
look for error variance (high levels of instability)
- must decide how much variance is acceptable as error variance- you never statistically control for error variance, you just continue to try to find sources of it
- by identifying the sources of variance you can better understand the behaviour in question (as the sources of variance are contributing to different behavioural responses)- in the group approach the effects of sources of variance are hidden by the averaging process
Dealing with error variance
- philosophical difference between single-subject approach and group approach in dealing
with error variance
- with a group approach, you control error variance a little by randomization, etc., but there are also statistical ways- if effects are significant (can reject the null) little further effort is made to determine the sources of variance
with small n designs- you try and eliminate it completely; control as much as possible
experimentally
take repeated measures and implement a stability criterion
taking repeated measures of the DV within a treatment allows you to determine how
successful you have been at controlling extraneous variables
to increase stability of measure use more careful control of other variables like food
deprivation, amount of exercise, etc.
stability criterion removes transitional data from analysis and eliminates unnecessary
time spent collecting data after stability achieved
Assessing the reliability of findings
- group approach assesses reliability through chain of logical inferences from data of a
single experiment
- used to estimate population parameters
- single-subject design assesses reliability more directly through actual replication- each
subject experiences each phase two or more times
- performances across multiple exposures are compared- if data replicate then they are
reliable
- whether you can say the data has been replicated or not depends on the degree of
control you have over the DV and on the question the experimenter is looking at
- high degree of control → variation in baseline will be minimal
- low degree of control → baseline will be variable- variability can occur within and between replications of the same conditions
- see pg. 292
Determining the Generality of Findings
- group approach establishes generality by averaging across large numbers of subjects
- assumes that the average performance will be “representative” of the population from
which the subjects of the experiment were sampled
- averages represent a blend of different patterns into an average that is unrepresentative
of the individual behaviour underlying the average
- single-subject approach: avoids problem by using intersubject replication instead of
averaging
- direct comparisons of the behaviour of different subjects provides a measure of
intersubject variation
- if all subjects show similar patterns you can be confident that similar results would be
obtained with most subjects from the same population
- Reynolds: showed failure of intersubject replication; pigeons trained to peck at
translucent disk on which was projected a triangle against a red background
- the response of each bird was controlled by a different stimulus
- each pigeon responded differently to the intervention time
- this showed that there were probably underlying determining factors at work that
wouldn’t have been noticed in a group design when averaging used
- in addition to intersubject replication, the generality of results in single-subject designs
evaluated by double-checking results in new experiments
- these experiments build on previous research while extending the range of variables that
are assessed- use different kinds of reinforcers and subjects
- systematic replication: extensions that incorporate aspects of the original experiment while adding new wrinkles
direct replication: exact replications
Time-series (Small-N) Designs
- don’t involve random assignment
- control and manipulate variables sequentially (not simultaneously as in analytic
experiments)
- anything else that occurred in this time is therefore a potential confound
- remember…
- phrase used to highlight the fact that your manipulation occurs over time- like in ABAB- it takes time to do that
Types of replication
1) intrasubject- to assess reliability (given time confound)
2) intersubject- to assess generalizability
also have…
- distinction between systematic replication (introduces some extension or variation on the original research- often to assess generalizability of the phenomenon) and direct replication
Variability in baseline
- variability can be due to:
a) chance variation
b) carryover effects- can also lead to unstable baselines
- after you have established your baseline for the baseline phase and the intervention
phase you return to the baseline phase to eliminate the confound of time
- i.e. the kids in the grade 10 remedial math class seemed to pay attention more in the
intervention phase but this could also be due to the passage of time (more maturity,
better parental relations)- because of this possibility you go back to the baseline phase
(remove the treatment, intervention, etc.) and watch the baseline- if it returns to normal it
is unlikely that the changes occurred because of the passage of time
- this is when you use an ABAB design or just an ABA design
- when you go back to the baseline phase at any point this is called “reversal strategy”
- if the behaviour returns to baseline levels during the second baseline phase and then
returns to the previous treatment levels during the second intervention phase you can be
confident that the treatment (and not a time variable) caused the observed changes
Problematic baselines
- a good baseline has:
- little unsystematic variation
- no systematic changes with time once stable levels of performance are reached
- this is not always achieved
- problems:
1. unsystematic baseline variability
- if the variability is high within a phase this variability is caused by uncontrolled factors
- if you can’t bring the factors under control, you can deal with the variability by extending
the number of observations you make within a phase
- the more observations made → the closer you are to the "true" baseline
2. drifting baselines- baseline shows a trend (doesn’t go down as low as the first
baseline in reversal time)
it might be impossible to stabilize a baseline against slow, systematic changes (called
“drift”)
- if you can’t control the drift you can effectively subtract it out
- an example of drifting baseline would be when the baseline slowly drifts upward in an
intervention phase
- you can see effect of treatment if you allow for the drift
3. unrecoverable baselines (due to carryover effects)
- this is when baseline levels of performance cannot be recovered during reversal
- these changes are considered “carryover effects”
- you need to use special designs to deal with completely irreversible changes
- i.e. this happens when learning develops during treatment conditions (if you have an
ABAB type of design for example)
- i.e. rats and bar pressing → they are not naive to the fact that in the intervention phase
bar pressing got results so they will never go back to not pressing it like in baseline
phase
4. unequal baselines between subjects
- sometimes the baselines of different subjects can vary right from the beginning or in the
intervention phase (i.e. one rat may press the bar a heck of a lot more than another rat)
- even if they were given all the same conditions
- can result in different functional relationships
- with this you would have intrasubject replication but not intersubject replication
- you can vary the amount of treatment between rats to make them have the same baseline
- you might then be able to achieve intersubject replication
5. inappropriate baselines- floor and ceiling effects
- e.g. ADD: want to see if diet disrupts attentiveness; if they only sit in their seat 20% of the time and you want to see if food can make it worse, maybe they are already at the floor and can't go any lower; or someone who sits in their seat all the time and you want to know if something can improve it- ceiling effect
- low baseline desirable if you expect the treatment to increase the level of responding but
undesirable if you expect the treatment to decrease the level of responding
- solution → adjust experimental conditions to produce the desired baseline levels
- many times you do ABAB studies- baseline, intervention, reversal, intervention again
- plotting every data point for both subjects- helps generalizability if using two subjects
- you can have failure to replicate in small-N designs: the pigeon study
- the Feingold diet: A-baseline, B-placebo cookie, C-artificial colouring in cookie
- this data worked
-
February 18, 1999
Small-n Designs: Terminology
-
terminology is not really standardized, but several distinctions are still important, like:
types of single-subject baseline designs:
a) single-factor (e.g. AB, ABA, ABAB)
- baseline condition (A)
- intervention condition (B)
- ABA design deals with time confound
- you can have multiple levels of the independent variable- “parametric design”
- but you can’t have completely counterbalanced order of treatments
within the same subject you would want to have transitions between close
values (A – B) of the independent variable as well as distant values (A – C) –
if you had A, B, C as levels
- you can also return to the baseline after each level to deal with drift
b) multi-factor designs (> 1 IV)
- when you have more than one independent variable- assess interactions as in
factorial designs
- because it is not subjected to statistical analysis, you can omit some cells of the
factorial matrix and this does not present any analytical problems
- you can just have data points at regular intervals instead of every possible
interval if the functional relationships between IV and DV follow regular patterns
c) multiple baseline designs (> 1 DV)
- these are used when variables cause irreversible changes in behaviour
- these designs simultaneously sample several behaviours within the experimental context → get multiple behavioural baselines
- i.e. people who have two habits they wish to kick- get a baseline for each and
then start treatment- you then introduce treatment but apply it to only one of the
behaviours
- once one behaviour is stabilized at the intervention phase then you attack the
next behaviour
- it doesn’t matter which behaviours are treated first
- the design uses the untreated behaviour as a partial control for time-correlated
changes that could confound the results
- the behaviours should be independent of one another
another way of getting at reliability of data rather than generalizability
you introduce therapy at different times for three behaviours you want to get rid of- you
have three different baselines for different interventions which tells you that it really is
your intervention that is decreasing the behaviour (and baseline)
also used for systematic desensitization for phobias- used when you have gradual introduction to the thing they are afraid of; say someone had four phobias- use a multiple baseline study: get a baseline for each phobia, and each baseline changes only after the systematic desensitization is applied to that phobia (telling you that it is unlikely that her self-reported reduction of fear is caused by anything other than the intervention)- if it was the therapist, for example, that made her less afraid, all of the baselines for the different types of phobias would change at once…
AB (ABA) Type Non-experimental studies
useful when you don’t time the start of the intervention yourself
it is a quasi-analytical study
example: Drunk driving in Michigan
if you change the legal drinking age, would you get more drunk driving accidents
legal age 21 → 18 in 1972, then 18 → 21 in 1978
% alcohol-related traffic accidents: 15% → 22% in '72, decreased again in '78
does a lower drinking age CAUSE alcohol related traffic deaths?
you must still watch for confounds:
Wider alcohol availability in ’72
Oil crisis in ‘78- 55 mph limit imposed in 1973/72 (people driving more carefully)
Other examples
1.) TV effects on children (Tannis MacBeth-Williams- social psychologist)
- very few studies done on this
- Wilson's beach in BC- before they got a satellite they had lots of organized and unorganised physical activities, bingo games, dances, etc.- studied the community before television came, and shortly after lots of the extra-curricular activities stopped
- this was an AB design- didn’t turn off the satellite; helped us get an idea of the effects of
tv on children
2.) David Phillips Research (see overheads):
- motor vehicle accidents after publicized suicides: if people see a well-publicized suicide, more accidents (suicides to get insurance?)- 3 days after (time to think?)
- airplane accidents after publicized murder-suicides (non-equivalent control groups)
- murder-suicides people who want to kill themselves; many people own planes in states;
private, business, corporate-executive, etc.- same as cars but you get more insurance
with airplanes
- also on day 3 (peak)- time to plan again?
- With non-experimental AB designs you always have to be careful interpreting results
- homicides after heavyweight prize fights:
- looked at whether JFK assassination caused more violent crimes
- if it is violence (murder, suicide): lets look at other things that are just violent (heavy
weight prize fights)
- when you measure things you can't control, you can statistically control for them- people are more likely to kill someone on holidays, e.g. Christmas and other holidays
- when you see all three of them showing the same pattern, you become a bit
convinced…
-
Non-Equivalent Control Groups
- different than AB type non-experimental groups…
Mining safety study…
look at scanned graph
- an industrial psychologist was called in to help out at the Lucky Friday Mine – it had a higher rate of accidents than other mines
- collected data on two other mines as well- the intervention effect showed only at the Lucky Friday Mine, not the others; more evidence for the intervention
External and Ecological Validity
Stress and Cancer
looks at psychoneuroimmunology
- ad lib means they can eat whenever they want
between subjects design- it would be multivalent (more than two levels of IV)
you also have a yoked control- who is going to be more stressed?- if the one in charge presses the bar no one gets shocked – if it fails, both get shocked
it is the psychological stress of not having control- at first it was thought that the ones in charge would have more stress
Results
- not being able to get away from it- the anxiety that we put them under could reasonably account for the data
Factors influencing external
validity
- you want to use animal models to generalize to humans
- do they extend beyond the
specific conditions of the rats- we
aren’t really interested in whether
rats get cancer or not…
- external validity: beyond that experiment
Factors influencing external validity
Population sampled
- always be careful generalizing beyond
population studies
- look for converging evidence that there is
nothing importantly unique (e.g. rat vs
human physiology and susceptibility to
cancers)
- look at converging validity- many studies
all pointing in the same direction
- one thing will always be less convincing
than many together
- if there is something unique, the information is still useful
- sometimes unusual populations are sought out (HIV-resistant individuals, spotted hyenas- extreme example: prostitutes in Africa)- 10 or 20 customers a day for 20 years, never using condoms- had thousands of partners and they don't have AIDS
- hyenas- high testosterone, females more aggressive, clitoris as big as penis, give birth
through the elongated clitoris (high mortality rate)- they are bizarre animals and can tell
us about the effects of hormones on behaviour
Operational definitions
- look at construct validity- stress:
inescapable electric shock
- unavoidable shock- unnatural but
are effects unique?- look for
converging evidence
Operational definition of stress:
Phenomena associated with
stress
Parameter values
- values selected for each variable
- both independent and controlled
- i.e. food additives on ADHD kids- you better administer the dyes WITH FOOD so you can have external validity
Demand characteristics
- subtle cues in a research procedure that influence the participants
- if you said "man, this is really hard", etc., you might tell the people what you are looking for
- serious problems in social sciences (characteristics of volunteers)
- can influence both internal and external validity
- students holding “poisonous” snakes throwing “acid” in another’s face
Ecological validity
- how generalizable are the experimental results to the specific set of conditions- those of the natural context in which the phenomenon usually occurs
Scientific explanations
- science is a self correcting process
- single research findings are seldom conclusive
- problem especially in applied areas where answers are needed quickly
March 2, 1999
-
-
motivation is a factor in learning another language
- look at placebo effects in surgery- research being sponsored by the big funding agencies; you need a placebo control group in surgical research; it allows you to have better scientific data
also do this with double-blind research
Stress and the nervous system
- things happen in your body if you read erotic material- blood pressure increase,
pancreas starts secreting hormones, etc.
- can also be frightening thoughts or things you are worried about (exams)
- we have wonderfully responsive response systems
- survival of zebra depends on response system – get serious sympathetic activity
pupils dilate
inhibits saliva
heart rate accelerates
digestion inhibited
sexual arousal goes
steroids released from adrenal gland
remember about type A personality
- think about cardiac disease- heart attacks are the number one killer in the US
once you have heart problems- type A personality just as much of a problem as smoking
influence of social support
- bombing of London- people that were getting bombed every night were less likely to get stomach disorders than the people in the suburbs, who weren't bombed every night
- you need a way of studying things when you can't have the controls, etc. from the beginning
Ex-Post Facto (after the fact) Designs
- much less able to determine causality than true experiments but necessary and
important designs
- necessary for:
1. ethical reasons or
2. an interest in organismic variables (interested in something the person brings to
the study; i.e. gender)
- in either case- common to both reasons is the fact that you can't randomly assign people to treatment conditions
Types of Ex-Post Facto Designs
- two main types
1) Prospective study- looks forward in time, predicting
2) Retrospective study- look backward in time
- in both cases, you are obliged to find naturally occurring groups (thus “after the fact”)
and follow them forward (prospective) or trace their histories (retrospective).
- problem: even if you randomly sample from those populations, they will be confounded
because the populations differ- it is unlikely that the variable you are looking at is the
only difference between the two groups
-
-
-
high stress job- we’ll use air
traffic controller
low stress job- clerks in a
store
in both- this is prospective
design: you wait and look
forward in time
just say you have 20 in one
group with cancer and 5 in
the other group after the fact
in solution 2, it is
retrospective- here you
select people after the fact
on the basis of variabletake people with cancer and
Research Methods II Notes
-
-
77
those without and ask about stress in lives
the air traffic controllers have certain other variables
VDT: video display terminals
Is the difference in cancer rate due to stress; we don’t know; it could be but it could also
be due to the smoking, VDT exposure, etc. even though we have random samples of the
groups
If you did an unethical study
here you could have causal relationships
Problems with Prospective ex-post-facto designs
1.) subjects are not randomly assigned to treatments
- there will be inherent confounds in the populations studied (this is the most serious
problem)
2.) sampling problems (often a convenient sample)
- may be impossible to identify all members of the populations
3.) dropouts in prospective studies
4.) detection bias
- these designs assume that detecting and measuring the variable will be the same for
both groups
- if you visit doctor more, you might be more likely to detect cancer earlier
- all of these influence the internal validity of the study as well as the external validity
- internal is what we are mostly worried about- trying to find out if there is indeed a causal
relationship
- external comes in when you have confounds, etc.
Partial solutions
(1) matching:
- two kinds:
a) subject for subject (preferable but difficult- finding people that are the same in every
way except for the independent variable- as the list gets longer for the criteria that
you have to match for, it gets harder to find people that will match)
b) distribution by distribution- you want the average age, amount of smoking, etc. to be
the same for the two groups- it is only the averages that are the same
- if the risk factors interact- maybe this could be an issue
- in both cases, you will selectively drop individuals and bias the sample further (if you
drop someone who is hard to match you have a biased sample)
- maybe you forgot to match for something
(2) measuring:
a) know of potential confounds (uncontrolled or extraneous variables)- you measure
things that are potential confounds and see if they differ between groups
b) to statistically control for these variables- even if you have confounded variables, if
they have been measured you can mathematically control for the variables
Retrospective studies
- these have additional problems in that they rely on memory so the partial solutions are
more difficult to employ successfully
- the advantage (over prospective designs) is that they are more efficient (cheaper and
faster); may be necessary with very rare groups or variables of interest (e.g. rare
diseases)
- note that even with measurement and matching, internal validity is still questionable
- you don't have dropout
Partial solutions for retrospective designs
- same problems, same (partial) solutions!
- matching and measuring
- they are much more difficult to employ!
- a real example would be a study of life events and the occurrence of cancer in children
- T.J. Jacobs & E. Charles (1980)
- retrospective ex post facto design
- case studies suggested that experiencing separation or loss was related to childhood leukemia
- method: semi-structured interview with children and parents
- had a control group ("variety of physical complaints drawn from a medical facility")- sick kids but not sick with cancer
- relates to specificity of hypothesis
- they matched on age, sex, SES
- measured:
- detailed medical history
- personality measures
- relationships measures
- psychological history
- marriage assessment
- changes in life one year
prior to disease onset
Characteristics of samples
(matched variables)- used
subject by subject matching
Measured variables in
samples
-
these are already
confounds (with
the little square)
Holmes & Rahe: Social Readjustment Rating scale
- have rank order
Paired data on LCU measure
kids actually did have more stressors
Frequency of life-events by group
- had more stressful events in the group of kids with cancer
D.V.s used in ex-post-facto studies
- two special dependent variables used in these studies
a) relative risk ratio (prospective studies)
- it is a dv that you can study from prospective studies only
- illustrated by breast cancer data
b) relative odds ratio (approximates the relative risk)
- only calculated in retrospective studies
- you must be careful
- If you are only presented with the ratio you don’t know what the numerator and
denominator were
- problem with both is that absolute risks are hidden
- both absolute and relative risks should be reported
Calculating relative risk ratio
- prospective ex post facto design
-
convert it into a contingency table
-
relative risk ratio is the risk of
developing cancer in high stress
group over the risk of developing
cancer in the low stress group
you will get a 4 here: indicates that
the risk of
someone
with a highly
stressful life
developing
cancer in the
future is four
times that of
someone
with a low
stress
lifestyle
-
79
Hypothetical data with same relative risk but very different absolute risks
Here you get a 5
- it was interesting that there were just as many men as women that developed toxic shock syndrome- yet they targeted tampons
- if you only give the relative risk ratio you don't get the whole picture
- in both of these cases you get a number of 5- it hides the absolute risk
Calculating relative odds ratio
retrospective ex post facto design
the math has to be a little different- not a representative sample; half of the world does
not have cancer yet half of your study is cancer patients
relative odds ratio = 4.7
this is the odds of people in the cancer group having high stress in their past compared to the odds of people in the control group having high stress in their past. The relative odds ratio provides an estimate of the relative risk ratio (see the sketch below)
know how these are used, when used, how calculated
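A small sketch of both calculations; the relative risk uses the 20-vs-5 hypothetical from earlier in the notes (assuming 100 people per group), and the case-control counts for the odds ratio are made up (they happen to land near the 4.7 quoted above):

    # Relative risk ratio (prospective design): risk of disease in the high-stress
    # group divided by risk in the low-stress group
    risk_high = 20 / 100
    risk_low = 5 / 100
    print(risk_high / risk_low)          # 4.0 -> "you will get a 4 here"

    # Relative odds ratio (retrospective design): you sample cases and controls,
    # so you compare odds of past exposure instead of risks (counts are hypothetical)
    cases_high, cases_low = 35, 15        # cancer patients reporting high vs low past stress
    controls_high, controls_low = 20, 40  # non-cancer controls reporting high vs low past stress
    odds_cases = cases_high / cases_low
    odds_controls = controls_high / controls_low
    print(odds_cases / odds_controls)     # ~4.7, an estimate of the relative risk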
Cautions about D.V.’s used in ex-post facto studies
- statistically, both measures assume random sampling (unlikely or impossible)
- both the relative risk and the relative odds ratios hide absolute risks
Lifetime risks of developing
breast cancer
-
-
some of the types of
cancer- early detection in
some cancers was the
argument being made for
the fact that people
seemed to be getting
cancer
it seems to be that
exposure to estrogens is
a risk factor for cancer
Putting risk of breast cancer in perspective
- the one woman in 9 in whom breast cancer will develop has a 50% chance of receiving the diagnosis after age 65 and a 60% chance of surviving that cancer and dying of other causes
- risk of breast cancer in any given year never exceeds 1 in 34 (at age 30 it is 1 in 250)
-
most of us will die from cardiovascular disorders
March 4, 1999
-
we will be looking at the comparison between prospective and retrospective data
volunteers
review for midterm
How many believe their moods vary with:
- days of the week.?
- Lunar cycle?
- some health care workers say they will be extra busy on days with a full moon
- Menstrual cycle?
- How would you study this?
- psychologists are interested in moods- it is an area of study
- you could do either a prospective or a retrospective study
- an experiment isn’t available to you
Raging female hormones in the courts
Macleans, June 15, 1981
- if a woman was premenstrual she was given lenient sentencing for things even as bad as murder
treatment for PMS ordered as stabber put on probation (Globe and Mail, Feb 10, 1987)
women’s violence blamed on period (Toronto Star, August 25, 1978)
this is interesting for women's groups- could women wait until they are premenstrual, or fake premenstrual symptoms?
women have lost custody because men say their ex’s suffer from wild mood swings
because of premenstrual- isn’t always to the advantage of the women
woman’s syndrome brings leniency
Politics of PMS
1929 term “premenstrual tension” Dr. Robert Frank
since the 1970's Dr. Katharina Dalton- says women are victims of "raging hormones"; she prescribes progesterone therapy
estimates of prevalence rates go from 6-95% (it either exists in no one or everyone it
seems!; bad operational definitions?)
we aren’t talking about premenstrual dysphoric disorder- just the normal pms
150 somatic and psychological symptoms associated with pms
according to some, PMS is a social and political construct- there is no such thing in other
cultures; it seems to be an invention of western societies
Hormonal changes over menstrual cycle
- the sharp drop in progesterone when you first have your period is thought by some to be the cause of PMS
Criteria for premenstrual dysphoric disorder
A: present during last week of luteal phase, remit within days of follicular phase onset and
absent week postmenses
(1) markedly depressed mood, feelings of hopelessness, or self deprecating thoughts
(2) marked anxiety, tension, feelings of being “keyed up” or “on edge”
(3) marked affective lability (feeling suddenly tearful or sad or increased sensitivity to
rejection)
(4) persistent or marked anger or irritability or increased interpersonal conflicts
(5) decreased interest in usual activities (work, school, friends, hobbies)
(6) subjective sense of difficulty in concentrating
(7) lethargy, fatigue, marked lack of energy
(8) marked change in appetite, overeating, or specific food cravings
(9) subjective sense of being overwhelmed or out of control
(10) other physical symptoms such as breast tenderness, headaches, joint or muscle pain, "bloating", weight gain
Criteria B → D
- D is interesting because you have to use prospective data- retrospective data will not do
Jessica McFarland did a study on “women vs men and menstrual versus other cycles”
(mood fluctuations)- comparing retrospective and prospective data
McFarland et al.
- Methodological problems with existing literature:
- demand characteristics (expectations of participants, bias, volunteers)- know
what the researchers want
- measured negative moods (truncated range)- sometimes the only possible
things they could talk about was grumpy or negative moods; didn’t allow them
to talk about happy moods
- early ones all based on retrospective reports- remember two weeks ago on
Monday, how did you feel?
- no control groups- you could compare against contraceptive users
- no assessment of the "normal" range of moods- normal has always been men; do women show the classic mood fluctuations and are they less normal than men?
her subjects were deceived- told them that she was concerned with emotional, physical,
and behavioural patterns- they were blind to the study
followed participants for 70 days to get two cycles
had three groups- normally cycling women, women on contraceptives, and men (men's variations over the month are a good control)
she used a mood grid
Mood pleasantness (graph)
- there seemed to be significant differences- but not in the classic way; this data showed that the women were happier than the rest of us during the follicular and menstrual phases
Arousal levels
- the same women- difference between how they remembered they felt and how they actually felt
- they reported a drop premenstrually that wasn't really there
- difference for arousal and mood pleasantness
- see two graphs
- here we see a retrospective bias
- also did it for days of the week- seems to be more variability across the days of the week than across the menstrual cycle
- important to realize that just because these women didn't show it does not mean that some women do not show the pattern (the dysphoric disorder pattern)
Data on prevalence of PMS symptoms
- in surveys, most women report being more emotional premenstrually
- with prospective studies, most women do not show any relationship between mood and "time of month"
- of those who report PMS symptoms, only 50% actually show these mood fluctuations
- subsequent studies have shown a significant positive correlation between a woman's belief in PMS prevalence and the extent of her retrospective bias- the more a woman believes PMS exists, the bigger the difference between her prospective and retrospective data
- this shows biases
Characteristics of Volunteers
From text book pp. 122-128
- ethical treatment says → participants must be informed of the nature, purpose,
requirements of your study
- must be given opportunity to decline participation
- are there differences between people that agree to participate in research?
- Volunteer bias: validity of experiment can be affected if you have a sample made up
entirely of volunteers
- Issue is whether volunteers differ in meaningful ways from non-volunteers
- Do any differences affect the “external” validity of research?
- Do any differences affect the “internal” validity of research?
Characteristics of Research Volunteers
- you can have maximum confidence that volunteers…
1. tend to be more highly educated than nonvolunteers
2. tend to come from higher social class than nonvolunteers
3. are of higher intelligence in general, but not when volunteering for atypical research
(such as hypnosis, sex research)
4. have a higher need for approval than nonvolunteers
5. tend to be more social than nonvolunteers
- know these characteristics for exam
- other characteristics tend to be true but not always: volunteers…
1) are more “arousal seeking” than nonvolunteers (especially when the research involves
sex)
2) are more likely to be females
3) are more unconventional than nonvolunteers
4) less authoritarian than nonvolunteers
5) Jews more likely to volunteer than Protestants, however Protestants more likely than
Catholics
6) have a tendency to be less conforming than nonvolunteers except where volunteers are
female and the research is clinically oriented
- looking at volunteers and nonvolunteers in people who were profraternity and antifraternity
- had 42 undergraduate women who were given an attitude questionnaire about college fraternities
- had volunteers and nonvolunteers
- a week later the women were randomly assigned to either hear a profraternity, antifraternity, or neutral talk
- the volunteers were more affected by the antifraternity communication than nonvolunteers- they have a higher need for approval; tend to see the experimenter as being antifraternity themselves; more motivated to please the experimenter- this might have caused the observed attitude change and not the content of the persuasive message
- this affects the internal validity of the experiment (an aspect of the differences between volunteers and nonvolunteers)
Rosenthal & Rosnow, 1975
From Horowitz
- looked at psychedelics
- graph "from Horowitz"- this is a 2 x 2 x 2 design
- if you give people negative info about drugs, you get a little change in attitude; big scare- bigger change in attitude
- volunteerism may not only affect the internal validity of any research, but can also affect the ability to generalize (external validity)
- Horowitz: looked at the relationship between the level of fear aroused by a persuasive
communication and attitude change- also examined the impact of fear arousal along a
second variable (whether they had volunteered or not for the study)
- High fear group: read pamphlet; saw film → affected volunteers more than nonvolunteers
- Low fear group: only read pamphlet → affected nonvolunteers more than volunteers
- got a bigger change with low fear than high fear in the nonvolunteers → could this be because they were mad at being forced to participate in the study?
-
Remedies for Volunteerism (p. 127-128)
make the appeal interesting
make the appeal nonthreatening
state the importance of the research
state why the target population is relevant
pay them. Gifts.
have request come from a high status (preferably female) person
avoid stressful research
communicate the normative nature of participating
make the appeal come from a known person if possible
depending, commitment to volunteer might be better made privately or publicly
Midterm review
- on powerpoints
- know mcfarland study- talk about how it illustrates problems with retrospective studies
Midterm Review
• ANOVA
• Factorial Experiments: Advantages
• Factorial designs: terminology
– #levels IV1 x #levels IV2 x #levels IV3 etc.
• Interpreting data from 3 factor experiments and/or experiments with three or more levels of one or more IV (e.g., in class)
• Ex-post-facto designs: prospective and retrospective designs - advantages
disadvantages
• Problems & Partial solutions:
Midterm Review
• DVs used in Ex-post-facto studies
• Problem with both in that absolute risks are hidden, both should be reported
• Time-series designs, small-n designs
– A-B studies (e.g., homicides after prize fights, JFK’s assassination, TV effects etc.)
– multiple baseline designs
• internal validity
– non-equivalent control group
– replication within-subjects
• generalizability
• Assessing external and ecological validity
• Volunteers
March 11, 1999
-
-
-
Correlations and Regressions
in some cases you want to evaluate the direction and degree of relationship (correlation)
between scores in two distributions
for this you must use a “measure of association”
remember Pearson product-moment correlation coefficient or "Pearson r" → used when dependent measures are scaled on an interval or ratio scale (provides an index of the direction of the relationship between two sets of numbers)
can be direct relationships or inverse relationships (+ or -)
magnitude tells you degree of “linear relationship” (straight line) between two variables
factors that affect the magnitude and sign of Pearson correlation coefficient:
- range of scores
- presence of outliers
- shapes of score distributions
can use correlation/regression with either manipulated predictor variables or natural
variation
you can use it to look at true experimental data; isn’t only used in quasi-experimental
studies
correlational statistics can be applied to any type of design (including experimental)
correlational design occurs when we do not randomly assign participants to the level of
either variable- i.e. levels of variables are not manipulated
Steps in conducting correlational designs
- these are quasi-analytic experiments:
1) select population and subjects of interest
2) measure variables of interest (at least two)
3) calculate the extent to which the variables are systematically related
once you have measured your data it is a good idea to plot it (get a scatterplot)
- predictor (assumed causal or IV) variable on abscissa (X-axis)
- criterion or DV on ordinate (Y-axis)
- you "regress Y on X" (predict Y from X)
- you should always look at the scatterplots
Pearson's product moment correlation coefficient
- the basis of almost all statistics including ANOVA
- appropriate for interval or ratio data only- it is like a parametric test
- measures the direction and degree of association (a small example follows)
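A minimal sketch of computing Pearson r on hypothetical interval-scale data (scipy here, not the class software):

    import numpy as np
    from scipy import stats

    # Hypothetical interval-scale scores for the same people on two measures
    x = np.array([2, 4, 5, 7, 9, 10, 12])
    y = np.array([1, 3, 6, 6, 8, 11, 12])

    r, p = stats.pearsonr(x, y)   # direction and degree of linear relationship
    print(r, p)

    # Always look at the scatterplot before trusting r: outliers, restricted range,
    # and heteroscedasticity can all distort it.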
Correlational designs: concern
- Pearson's r (based on means) is very sensitive to the presence of "outliers" (not Spearman's, etc.)
- sensitive to heteroscedasticity (the X-Y relationship may vary across levels of X), and
- can be biased by having a "restricted range"
- combining group data can also influence the size of the correlation (in either direction)
- examine scatterplots to detect these potential problems!
http: //www.ruf.rice.educ/~lane/stat_sim/comp_t/index.html as a reference
- as the slope changes, the blue box gets bigger in either direction- more variability able to be explained by the relationship and less variability left unexplained in Y
- when you have a perfect correlation, you will have no unexplained variability (unexplained variability is when dots are not on the line)
- having a restricted range is not good- just the presence of one outlier can make a relationship nonsignificant
- correlation is a measure of the average relationship between the two variables
- homoscedasticity vs. heteroscedasticity: if you were to take graphs showing the relationship between the two variables, heteroscedasticity shows less variance for one part of the range than another
- if you combined the groups it would be a significant negative correlation even though the groups don't individually have those correlations
- this shows that sometimes combining group data can alter your correlation in other ways as well
Other diagrams:
Statistical inference
- there are tables of critical values (overhead)
- for a given sample size, larger the absolute value of r, the less likely it is to have
occurred by chance
- bigger numbers less likely to have occurred by chance
- for a given value of r, the larger the sample, the less likely it was to have occurred by
chance- you can have significance without something being meaningful
Power
- the power of a correlational design is increased by:
- minimizing error variance
- avoiding restricting the range of scores
- increasing the sample- if you want to make a big deal out of nothing you need a big
sample – i.e. women that have smaller size brains than men (14, 000 participants)
- with very powerful correlational designs (many subjects) significant, yet probably
meaningless results can be found
- e.g. the correlation between height and IQ is about r = .1 and this was found to be statistically significant [study based on 14,000 children; with N = 102 (100 d.f.), an r of about ±.16 is significant at p < .05]
Statistical inference
- r² (coefficient of determination) = estimate of the proportion of variance shared by the two variables; the extent to which they co-vary (can be used as a measure of effect size)
- it is the square of the correlation coefficient
- i.e. if variation in score X actually caused variations to occur in score Y, the coefficient of determination would indicate what proportion of the total variance in score Y was caused by variation in score X
- 1-r²: coefficient of nondetermination (also called coefficient of alienation or error variance)
- continuing about the coefficient of nondetermination:
- gives the proportion of variance in one variable "not accounted for" by variance in the other variable- the "unexplained" variance caused by unmeasured factors (if this is large your measured variables are not having a lot of impact on each other) (a small worked example follows this list)
- have a circle showing total variance in X and another showing total variance in Y
- no shared variance- no overlap
- other examples of venn diagrams
- 90% of variance shared → pretty much overlapped
Drawing conclusions from correlational designs
- all the same concerns as with experiments (valid, reliable measures, etc.)
- additionally, have concerns with directionality, and there are usually many potential confounds (uncontrolled extraneous variables- the 3rd variable problem)- i.e. the lice and health example (it turned out that when you are sick your body is too hot for lice to live on; fever)
- sometimes it's hard to know if they are significant or not
- causality can not be inferred
- correlational designs can be used to:
- discover relations
- to solve ethical and practical problems
- to provide greater external/ecological validity
Linear Regression
- linear regression looks at correlation in terms of predictability
- r2 is a measure of the proportion of variance in Y accounted for (predicted by) X
- idea behind “bivariate linear regression” is to find the line that best fits the data plotted
on a scatterplot:
Y' = a + bX [or X' = aX + bXY]
Y' = predicted score of Y
b = slope of regression line (or "regression weight")
X = value of x-variable
a = y-intercept
- this is called the “least squares” regression line
- this straight line is the one that minimizes the sum of the squared distances between
each data point and the line (as measured on the y-axis)
- minimizes the sum of squared deviations; the sum of deviations between predicted values of Y' and actual observed values of Y = 0
- in other words, at any given value of x found in the data, the position of the line tells you
the predicted y according to the linear relationship- you can then compare these
predicted values to the actual values- best fitting straight line minimizes these
differences between predicted and observed values (as it would fit the points the best)
- the deviations are called "residuals" (residuals will be low when the regression equation predicts scores on Y accurately)
- using your line, you have values of X and then predicted values of Y- but since you have your scatterplot you also have your actual raw scores of Y- you are in a position to see how accurately your regression equation predicts scores on Y (the difference between the values of Y and Yhat (Y – Yhat) is called a "residual")
- you will have lower residuals when the regression equation generates values of Yhat that are close to the actual values of Y
- unless the variables are perfectly correlated, you will get a "standard error of estimate"
- if the regression is based on raw scores, the "regression weight" is known as the "raw score regression weight". If you use standardized scores you will obtain a different regression equation and have a "standardized regression weight" or "beta weight"
- note that if you were to then plot the residuals Yres against X, there would be no linear relation; the correlation would be 0
- the linear regression line, Y'=a+bX, can be thought of as the straight line that summarizes the linear relationship in a scatterplot by, on average, passing through the average of the Y scores for each X
- when variables are perfectly correlated- no error in prediction
- however, when your correlation is less than perfect there will be error in predicting Y from X (estimate the amount of error in prediction by calculating the "standard error of estimate")
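A minimal least-squares sketch on made-up data: fit Y' = a + bX, look at the residuals (Y – Yhat), confirm they sum to 0 and are uncorrelated with X, and compute a standard error of estimate (here using n – 2 in the denominator, one common convention):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(100, 15, 50)                   # IQ-like predictor
    y = 0.03 * x + rng.normal(0, 0.5, 50) + 0.2   # criterion with noise

    b, a = np.polyfit(x, y, 1)                    # slope and intercept
    y_hat = a + b * x                             # predicted values Y'
    residuals = y - y_hat                         # Y - Yhat

    # standard error of estimate
    see = np.sqrt(np.sum(residuals**2) / (len(y) - 2))

    print(f"Y' = {a:.3f} + {b:.3f}X, standard error of estimate = {see:.3f}")
    print("sum of residuals ~ 0:", round(residuals.sum(), 6))
    print("corr(residuals, X) ~ 0:", round(np.corrcoef(residuals, x)[0, 1], 6))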
Lab 2
- it takes time to rotate objects
- we want to look at things in the manner we are used to
- do mental images work the same way?
- measured time
Linear regression: looks at correlation in terms of predictability.
Linear regression finds the best fitting line: Y'=a+bX [or X'=aX+bXY]
This is called the least squares regression line [where Y' is the predicted value of Y]- it minimizes the sum of squared deviations; the sum of deviations between predicted values of Y' and actual observed values of Y = 0. These deviations are called residuals. [Note that if you were to then plot the residuals Yres against X, there would be no linear relation, the correlation would be 0.]
[The linear regression line can be thought of as the straight line that summarizes the linear
relationship in a scatterplot by, on average, passing through the average of the Y scores for
each X.]
For perfect correlations (r = ±1.0):
1) Every participant who obtained a given value of X obtained one, and only one value of Y:
there are no differences in Y scores for a given X
2) Y scores are perfectly predictable from X scores: the data points for a given X are all on
top of one another and all data points fall along the regression line.
For intermediate correlations:
1) There are different values of Y for each X, however these different Ys are relatively close
in value (the variability in Y associated with a given X is less than the overall variability in Y)
2) knowing X allows prediction of approximately what Y will be: data points will fall near the
regression line but not on it.
For zero correlation:
1) Y scores are as variable at a given value of X as in the overall sample
2) The best prediction of Y, regardless of X will be the average of Y and there will be no
regression solution.
Using standard scores and 2 variables [1 IV], the regression coefficient (b) [or raw score regression weight] = standardized regression weight (beta) = correlation coefficient (r).
As the correlation grows less strong, Y' moves less in response to a given change in X (the slope, b, approaches 0). If standard scores (z scores) are plotted, the slope of the least squares regression line = r [r = change in S.D. units in Y' (the predicted value of Y) associated with a change of 1 S.D. in X]. If r=0, the best predictor of Y from X is the mean of Y, and the best predictor of X from Y is the mean of X. If r=±1.0, the regression line from regressing Y on X and the regression line from regressing X on Y are the same (and pass through the point (mean of X, mean of Y)). As the correlation between X and Y weakens, the predicted value of Y' for a Zx=1 will be Zy'<1 and the predicted value of X' for a Zy=1 will be Zx'<1. The regression lines predicting Y' from X and X' from Y diverge with decreasing correlation until at r=0.0, they are perpendicular: horizontal and vertical lines passing through the means of Y and X respectively. This can lead to a regression artifact (e.g., Rushton: women less "brainy" than men).
And remember cautions - same as for correlations: assumes linear relations among
variables, truncated ranges can reduce correlations or regressions, outliers,
heteroscedasticity, etc.
Correlations and Regressions
Can use correlation/regression with either manipulated predictor variables or natural
variation.
Correlational statistics can be applied to any type of design (including experimental) but a
correlational design occurs when we do not randomly assign participants to the level of
either variable - i.e., levels of variables are not manipulated.
Quasi-analytic experiments: Steps in conducting correlational designs
1) select population and subjects of interest;
2) measure variables of interest;
3) calculate the extent to which the variables are systematically related
Pearson's product moment correlation coefficient (for interval or ratio data) measures the direction and degree of association. r is the mean of the z-score cross-products: r = Σ(ZxZy)/N, the extent to which deviations from the average on each measure are similar for each subject sampled.
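A quick check of the cross-products formula on made-up data (dividing by N, i.e. using the population SD for the z-scores, so that the mean of the cross-products equals r exactly):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(0, 1, 200)
    y = 0.6 * x + rng.normal(0, 1, 200)

    zx = (x - x.mean()) / x.std()        # ddof=0 so that sum(ZxZy)/N equals r
    zy = (y - y.mean()) / y.std()
    r_cross_products = np.mean(zx * zy)

    # same value as the built-in correlation
    print(round(r_cross_products, 4), round(np.corrcoef(x, y)[0, 1], 4))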
r2 (coefficient of determination) = estimate of the proportion of variance shared by the two
variables; extent to which they co-vary. (Can be used as a measure of effect size.)
1-r2: coefficient of nondetermination (also called coefficient of alienation or error variance)
Statistical inference: for a given sample size: larger the absolute value of r, the less likely it
is to have occurred by chance, similarly, for a given value of r, the larger the sample, the
less likely it was to have occurred by chance. The power of a correlational design is
increased by minimizing error variance, avoiding restricting the range of scores, and
increasing the sample.
Pearson's r (based on means) is very sensitive to the presence of outliers,
heteroscedasticity (rXY relationship may vary across levels of X), and can be biased by
having a restricted range. Combining group data can also influence the size of the
correlation (in either direction). So: examine scatterplots to detect these potential problems!!
Visual inspection of data
Graph data (scatterplot): predictor (assumed causal or IV) variable on abscissa (X-axis) and
criterion or DV on ordinate (Y-axis)
March 16, 1999
Calculating I.Q. and GPA Correlation
- using the cross products as an average to get the correlation
- we are looking for the relationship between x and y
- the Z score says that it is 1.45 standard deviations below the average IQ- we also have Z scores for the y variable
- if you were to plot it, you can see the correlation- the regression line is drawn on there, given by .0593x – 1.5885
- look at just the data we have- use a truncated range:
- you can see the correlation and the best fit least squares regression line
Statistical inference
- review r2 and 1-r2
- the coefficient of determination is an estimate of the proportion of variance shared by the two variables
- review venn diagrams
Drawing conclusions from correlational designs
- all the same concerns as with experiments (valid, reliable measures, etc.)
- additionally, have concerns with directionality, and there are usually many potential confounds (uncontrolled extraneous variables- the 3rd variable problem)
- causality can not be inferred
- correlational designs can be used to:
- discover relations
- to solve ethical and practical problems
- to provide greater external/ecological validity
Linear Regression
- we are looking at correlation in terms of predictability
- r2 is a measure of the proportion of variance in Y accounted for (predicted by) the variance in X
- remember that we use the least squares regression line
- linear regression finds the best fitting line: this is called the "least squares" regression line → Y' = a + bX [or X' = aX + bXY]
- minimizes the sum of squared deviations; the sum of deviations between predicted values of Y' and actual observed values of Y = 0
- these deviations are called "residuals"
- note that if you were to plot the residuals Yres against X, there would be no linear relation; the correlation would be 0
graph again:
- values are above and below the line (yellow arrows)- if you take the residuals (the differences) and plot them against the predictor, you will not have any relationship- you have wiped out the linear relationship; this is the basis for how we can use measured variables to control for them statistically by removing them
- if you plot the residuals against the predictor and do a regression it should be zero
- see graph: there is no linear regression line
Linear Regression
- it is the straight line that summarizes the linear relationship in a scatterplot by, on average, passing through the average of the Y scores for each X
Formulas
- using standard scores and 2 variables (1 IV), regression coefficient (b) [or raw score regression weight] = standardized regression weight (or beta) = correlation coefficient (r)
- for standardized scores, the slope is equal to the correlation
Implications of formulas
- the correlation coefficient (r) is going to tell you the change in standard deviation units in Y' (the predicted value of Y) associated with a change of 1 S.D. in X
- if standard scores (Z-scores) are plotted, the slope of the least squares regression line = r
graph:
- when you convert the data to standardized scores, the regression slope and the correlation coefficient are the same
For perfect correlations (r = ±1.0):
1) every participant who obtained a given value of X obtained one, and only one value of Y:
there are no differences in Y scores for a given X
2) Y scores are perfectly predictable from X
scores: the data points for a given X are all
on top of one another and all data points fall
along the regression line
Regression Lines: r = 1
- if you know one of the values, you can predict the other variable perfectly with perfect correlations
For intermediate correlations: 0<r<1
1) there are different values of Y for each X, however these different Ys are relatively close in value (the variability in Y associated with a given X is less than the overall variability in Y)
2) knowing X allows prediction of approximately what Y will be: data points will fall near the regression line but not on it
Regression line: 0<r<1
- the best fit regression line for predicting x from y diverges from the best fit regression line for predicting y from x- both lines go through the means, however
For zero correlation: r=0
1) Y scores are as variable at a given value of X as in the overall sample
2) The best prediction of Y, regardless of X, will be the average of Y and there will be no regression solution (i.e. if we were told a student's IQ and told that there was a correlation of 0, you couldn't predict GPA except that they are more likely to be average than nonaverage)
Implications of formulas
- as the correlation grows less strong, Y' moves less in response to a given change in X (the slope, b, approaches 0)
- if r=0, the best predictor of Y from X is the mean of Y, and the best predictor of X from Y is the mean of X
- if r=±1.0: the regression line from regressing Y on X and the regression line from regressing X on Y are the same
- as the correlation between X and Y weakens, the predicted value of Y' for a Zx = 1 will be Zy'<1 and the predicted value of X' for a Zy = 1 will be Zx'<1
- the regression lines predicting Y' from X and X' from Y diverge with decreasing correlation until at r = 0.0 they are perpendicular: horizontal and vertical lines passing through the means of Y and X respectively
- this can lead to a regression artifact: i.e. Rushton- women less "brainy" than men
- question for the exam: suppose we have a mean x score of 8 with a standard deviation of 2, and a mean y score of 10 with sd = 4; the correlation between the two is perfect and positive, and x = 6: what is Y?
- the standardized score for X is -1 (one standard deviation below the mean), so the prediction for Y will also be one standard deviation below its mean (Y = 6)
- if the correlation was perfect but negative, you would have Y going up instead (one standard deviation above its mean, Y = 14)
- another example: xmean = 8, sd = 4, ymean = 10, sd = 3, correlation rxy = +.7, and x = 4: what is y?
- if the correlation was 1, y would be 7
- if the correlation was 0, y would be 10
- the real answer is somewhere in between: Y' = ymean + (r)(Zx)(SDy) = 10 + (.7)(-1)(3) = 7.9
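A worked version of these exam-style questions, using the standardized-score prediction rule Zy' = r·Zx (the numbers are the ones given above; the function name is just for illustration):

    def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
        zx = (x - mean_x) / sd_x          # how many SDs x is from its mean
        zy_pred = r * zx                  # regression toward the mean when |r| < 1
        return mean_y + zy_pred * sd_y

    # mean X = 8, SD = 4; mean Y = 10, SD = 3; X = 4
    print(predict_y(4, 8, 4, 10, 3, 0.7))   # 7.9
    print(predict_y(4, 8, 4, 10, 3, 1.0))   # 7.0  (perfect correlation)
    print(predict_y(4, 8, 4, 10, 3, 0.0))   # 10.0 (r = 0: best guess is the mean)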
- the linear relationship is very unlikely to have occurred by chance
- we have two lines that describe 1000's of data points that say that men's brains seem to be bigger- control for body size
- so you can take any body size you want and take the regression line and see how big a woman's brain will be, then take the same body size and predict the male brain
- for any given brain size, men should have smaller body size than women- if you regress body size on brain size
- with correlations that are not perfect, the two regression lines diverge; try it both ways:
- so, for a given brain size, women have smaller bodies- they are potentially more brainy
- what is going on?
- as correlations weaken, predictions tend to go to the mean- tend to go to average (if someone is 7 feet tall, their predicted brain size is most likely average, not huge)
- it is just the fact that the relationship is not great
Cautions for regression data
- same as correlations:
- regression assumes linear relations
- truncated ranges
- outliers, heteroscedasticity
- combining data from different groups
- also, (if a correlational design)
1) subjects not randomly assigned
2) no attempt (in correlation designs) to control variables
3) different levels of the IV are not contrasted while concurrently holding other variables
constant
Correlation versus ex-post-facto designs
- these are very similar and you can convert one to the other
- e.g. assign dummy coding to the categorical (nominal- 1’s and 0’s) variable (if there is
one) and calculate a “point-biserial correlation” coefficient
- interpretation problems are not related to the statistical choice, rather due to the design
Correlation versus ex-post facto designs
These are very similar quasi-analytic designs and it is possible to convert one to the other
[e.g., assign dummy coding to the categorical (nominal) variable (if there is one and it has 2
levels) and calculate a point-biserial correlation coefficient instead of doing a between
groups t-test]
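A small sketch of this equivalence on hypothetical data: dummy-code the two groups as 0/1, correlate the coding with the DV (a point-biserial correlation), and compare against the between-groups t-test; the p-values agree.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    group = np.repeat([0, 1], 25)                   # dummy coding: 1's and 0's
    score = np.where(group == 1, 12, 10) + rng.normal(0, 3, 50)

    r, p_corr = stats.pearsonr(group, score)        # point-biserial correlation
    t, p_ttest = stats.ttest_ind(score[group == 0], score[group == 1])

    print(f"point-biserial r = {r:.3f}, p = {p_corr:.4f}")
    print(f"t-test t = {t:.3f}, p = {p_ttest:.4f}")  # same p-value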
Interpretation problems are not related to the choice of statistical analysis, rather they
are due to the nature of these designs
Remember that unlike true analytic experiments: 1) subjects are not randomly assigned, 2)
there is no attempt (in correlational designs) to control variables, and 3) different levels of
the IV are not contrasted while concurrently holding all other variables constant.
Drawing conclusions from correlational designs -- we have all the same concerns as with
experiments (valid, reliable measures, etc.) but in addition, have concerns with directionality,
and there are usually many potential confounds (uncontrolled extraneous variables in
correlational designs - the 3rd variable problem).
Although causality can not be inferred from a single correlational design, correlational
designs can be used to discover relations, to solve ethical and practical problems, and to
provide greater external/ecological validity (by being more easily applicable outside
laboratory settings)
Causation is not a simple concept. To infer it from correlational studies, we want to
have:
1) an association between variables that recurs in different contexts (replication,
convergent evidence),
2) have a plausible explanation showing how the predictor variable could cause the
criterion variable, and
3) have no equally plausible 3rd variable that could cause the variance in the criterion
variable.
While correlation doesn't imply causation, causation does imply correlation
Death sentences for murder in southern U.S.
- example of problems with combining group data, etc.
- the white man who lynched the black man was sentenced to death
- there is a paradox- whites are more likely to be sentenced, yet for both black victims and white victims, blacks are more likely to be sentenced to death- but if you combine them together then whites are more likely to be sentenced to death
- if you look at the victim's race- in terms of sentencing, murdering a black person is treated as a less serious crime than murdering a white person- victim data
Paradox
- whites are more likely to be sentenced to death than are blacks once convicted of murder
- yet for both black and white victims, blacks are more likely to be sentenced to death
Explaining the paradox
- how does this help us explain the paradox?
- the victim's race is a confound
- people tend to murder members of their own race
- whites are more likely to murder whites and this is treated as a more serious crime, at least in terms of the death penalty
- relative risk ratio = (30/214) / (6/112) = 2.6
- murderers are 2.6 times as likely to be sentenced to death for killing a white vs. a black victim
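The relative risk calculation above, spelled out (the counts are the ones given in the notes):

    # death sentences per convicted murderer, by victim's race
    white_victim = (30, 214)    # (death sentences, convictions)
    black_victim = (6, 112)

    risk_white = white_victim[0] / white_victim[1]
    risk_black = black_victim[0] / black_victim[1]
    print(round(risk_white / risk_black, 1))   # 2.6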
Simpson’s paradox- 2nd example
- classify two groups with respect to the incidence of one attribute
- if the groups are then separated into several categories or subgroups → the group with the higher overall incidence can have lower incidence within each category or subgroup
- there is a negative correlation between starting salary for people with economics degrees and the level of degree they have obtained (i.e. PhD's earn less than M.A.'s, who earn less than B.A.'s)
- does this make sense?…no!
- break down this data in terms of the type of employment (industry, government, teaching)
- in every type of job- private industry, government, or teaching- there was a positive correlation between degree and starting salary
- employment selection is the confounding third variable influencing these results- teachers get paid less than government workers, who get paid less than those in private industry
- people with higher degrees are more likely to end up teaching and those with B.A.'s are very unlikely to be teachers
- this is similar to the white/black data
- these are all examples of the danger of combining data from several distinct groups (with respect to the relation between two variables) in calculating correlations
- maybe it is that the people with different degrees were choosing different professions- ones with Ph.D.'s were more likely to be teachers?
- one way to have avoided the initial erroneous correlation would have been to use "stratified" sampling
- if equal numbers of people are sampled from the categories, the overall relationship will be an average of the relations in the subcategories
Notes from text about sampling:
- at the heart of all sampling techniques is random sampling
- every member of the population has an equal chance of appearing in your sample
- other methods include
1) Stratified sampling
- divide population into segments or “strata”
- next select a separate random sample of equal size from each stratum- because
individuals are selected from each stratum, you guarantee that each segment of the
population is represented in the sample
2) Proportionate sampling
- variant of stratified sampling
- the problem with stratified sampling is that it may lead to certain groups being over-represented in the sample (e.g. consider a community of 5000 that has 500 Hispanics, 1500 blacks, 3000 whites- if you randomly select 400 people from each segment, Hispanics would be over-represented)
- in this, the proportions of people in the population are reflected in the sample- you would
sample so that there would be 10% Hispanics, 30% blacks, 60% whites just like it is in
reality
3) Systematic sampling
- sampling every kth element after a random start
4) Cluster sampling
- surveying all people within certain clusters
- this is used if populations are too large to allow cost-effective random sampling or even
systematic sampling
- you identify naturally occurring groups and randomly select certain clusters (like one
class within a school)
- advantage: saves time
- but limits sample to those participants found in the clusters
5) Multistage sampling
- First stage: identify large clusters and randomly select from among them
- Second stage: randomly select individual elements
- for all of these sampling procedures, you must take sample size into consideration
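A rough sketch of stratified vs. proportionate sampling using the 5000-person community example from above (the sample size of 400 and the sampling code itself are only illustrative):

    import random

    random.seed(0)
    # label each of the 5000 community members with their group
    population = [("Hispanic", i) for i in range(500)] + \
                 [("black", i) for i in range(1500)] + \
                 [("white", i) for i in range(3000)]
    strata = {g: [p for p in population if p[0] == g]
              for g in ("Hispanic", "black", "white")}

    # stratified: an equal-sized random sample from every stratum
    stratified = {g: random.sample(people, 100) for g, people in strata.items()}

    # proportionate: each stratum's share of a 400-person sample matches its
    # share of the population (10% / 30% / 60%)
    n = 400
    proportionate = {g: random.sample(people, n * len(people) // len(population))
                     for g, people in strata.items()}

    print({g: len(s) for g, s in stratified.items()})     # 100 / 100 / 100
    print({g: len(s) for g, s in proportionate.items()})  # 40 / 120 / 240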
- you want an "economic sample": one that includes enough participants to ensure a valid survey but no more
- take into consideration the amount of acceptable error and the expected magnitude of population parameters
- sampling error: deviation of sample characteristics from those of the population
- survey research is used to evaluate behaviour and attitudes of participants
- it falls into the category of correlational research
- cannot draw causal inferences
Data analysis
- doing two simple regressions- one with each dependent variable
- have to find means for the variables
- within-subjects design- use compare means – choose dependent ones
- give you a table with all means
Simple regression
- use regression equation to predict the outcome of y from score on x
- Y' = bX + a
- Y’ is the predicted value of Y
- b is the slope (amt of change in Y for a unit change in X)
- a is the y –intercept
- things to note on spss
- check that the angle of rotation is a significant predictor
- relationship is linear
- R square (proportion of variance accounted for)
- (Constant) is your y intercept
- how to decipher spss output:
- what do we want to do first: you have anova and the t- these are both the same, you
only have one iv and one dv in each one
- first see if it is significant
- now we want to look at specifics- the y-intercept and slope; the intercept is the number under B in the first row, and the slope is the number under that
- the r2 value is the R Square value- 6.1% of the variability is predicted by variability in the other variable- i.e. angle of rotation is accounting for 6.1% of the variability in reaction time
- what else could account for the variability
- it is significantly predicting 8.1% of the variability in the dependent measure
- error is accounting for other %
discussion: did median RT increase with increased angle of rotation? By how much?
What about errors?
Findings consistent with literature?- not causal relationship, but predictive
Method: see handout for details, describe program used, type of stimuli presented,
procedure followed
- two dependent measures- error and reaction time
- ideas for future research
- if you add up the error % and the rotation % it is about 15% and the change in the
numbers is less than 15%- so the two combined account for all the variance
Class 20
• Sampling techniques
• Part & Partial correlations
• Multiple correlation and regression
• Types of Multiple regression
• Return Midterms
Mental Rotation: Shephard & Metzler
- it seems that we are hard wired
- the 20% improvement might be because with a lot of practice you know exactly what to do, whereas with no practice you might have to wait and make a decision first
- this was done a long time ago- used a tachistoscope
- they don't present error data- i.e. speed-accuracy trade-off
- for our lab, if you had accuracy increasing with bigger rotation angles and reaction time increasing with bigger rotation angles- this is the worst case scenario because you could say nothing about the angle of rotation and you could just be seeing an accuracy-speed trade-off
- some programs just eliminate data that is too sloppy or took too long
- if you had speed and accuracy both with positive slopes- maybe you are not even mentally rotating something?
- you look at the t-value to determine the significance of the slope- it is the t value in the above example- it makes sense that you can use correlations to test the significance of the slope
Simpson's Paradox
Classify two groups with respect to the incidence of one attribute; if the groups are then separated into several categories (subgroups), the group with the higher overall incidence can have lower incidence within each category (sub-group).
Simpson's Paradox - Examples
• Sentencing of blacks and whites in southern U.S.
• Starting salary and level of education
• These examples illustrate the danger of combining data from several distinct groups in calculating correlations.
• Could have avoided the initial erroneous correlation by using stratified sampling.
- you should have had whites killing whites, blacks killing blacks, whites killing blacks, and
blacks killing whites
- unless you have these categories you can think things are ok (no racism, for example)
when there actually is
• If equal numbers of people were sampled from the subcategories, the overall
relationship will be an average of the relations in the subcategories.
- you should look at subgroups for example like gender- they sometimes throw in the fact
of looking for gender differences- sometimes people argue that this is not fair to look at
just later
- Phillip Rushton- should a person be looking for what he is looking for- racial differences
and intelligence- what possible good can come from this- you could find negative
evidence that could argue against racism but what if you find evidence that supports the
view?
- others argue that information is knowledge
Simple Random Sampling
- everyone in the population has equal chance of being in the group
- random is not simple
- mathematicians have a hard time trying to figure out how to generate truly random numbers
- wonderful because it reduces the possibility of bias, but you have more cost
- the bigger your sample, the more random your design- you want many, many people but sometimes this gets ridiculous- do you want to have to subject people to hard procedures even when you already know what the answer is (if you already have significance)?
Randomness?
- which one is random?
- the one on the right is random- the other one is a pattern of bug behaviour (a biological pattern- it is spaced out)- randomness has clusters, etc.
- imagine the dots are incidence rates of cancer in an area- imagine you live where the cluster is; there will always be clusters somewhere just due to random fluctuations
- there will be clusters just due to chance but you can't tell where they are- the chance of there being clusters somewhere is more likely than there being a cluster in a certain spot
- cluster sampling: where you have pre-existing groups and you randomly sample groups
- for example, classes in a school- you don't need to randomly sample from all students, just randomly sample 4 classes and then sample kids out of just those classes
- they do this to test many products- if it flies in Winnipeg it will fly elsewhere
- you can combine this with multistage sampling- say you are interested in high school students' attitudes toward sexual practices; figure out the school boards (10) – randomly select three of the school boards; each school board has 5 high schools- randomly select 2 schools- each has 10 classrooms; randomly select 2 classes- in there you sample everybody- you have more than one "stage"- it can make it more practical
- the problem is if clusters differ- if school boards differ or if classrooms differ in terms of other factors
- stratified sampling: divide the world into strata and ensure you have equal numbers from each stratum (even if there aren't equal numbers in the real world)
- example of blacks and whites killing people- there aren't equal numbers in reality but you sample the same number- gets rid of Simpson's paradox
- i.e. you want male and female- have two strata and want the same #'s of each
Sampling Methods
• Simple random sampling
• Cluster sampling
– can be used with multistage sampling
• Stratified sampling
• Proportional sampling
– very popular, e.g., based on S.E.S. for voting, cultural background etc. (can do later)
- this is where you want the ratio of different subgroups in the sample to reflect the ratio in the real world
- used in survey studies of attitudes
- i.e. who will win the next election?
- if you have 60% upper class- you have to take a sample that has 60% upper class
- if looking at upper and lower class differences in voting and you want to combine to see who will win
- example of the people who didn't have phones, so a phone interview is not a representative sample
Systematic sampling
- every kth person; cheap, easy
- every Xth person from a student list, every 5th person that comes in the sub; convenient, easy, probably fine
- fine unless there is a pattern underlying the survey
Problems in Causal Interpretations
Interpreting the results of this type of research will also be impeded by:
• the third variable problem
• directionality (not always an issue)
• regression artefact (e.g., Rushton)
• limited range (floor and ceiling effects)
- in all of these situations you want to look for converging evidence
Causation
Causation not a simple concept.
To show causation, you want to have:
1) an association between variables that recurs in different contexts (replication,
convergent evidence)
- word recurs is important- you have to have more than one study- no one study will give it
to you
- you need association to appear repeatedly
2) have a plausible explanation showing how the predictor variable could cause the
criterion variable
- a good thing is to have a large sample size
3) have no equally plausible 3rd variable that could cause the variance in the criterion
variable.
While correlation doesn't imply causation, causation does imply correlation
- a cross correlation is a “time lag”- male and female data- smoking causes cancer, the
association keeps recurring
- is there a plausible explanation?- tars, etc. cause cancer in animals so it makes sense
- is there an equally plausible 3rd variable- genetic predisposition to cancer, you can’t pin it
on smoking, added nicotine to cigarettes so that they will be more addictive
Partial Correlations
• Partial correlation allows you to examine the relationship between two variables with the effect of the third (or third & fourth etc.) removed from both.
- very powerful mathematical technique
- used when two variables are both influenced by a third variable- if the third variable was not held constant when the data were collected it could affect the relationship between the two variables of interest, but if it was recorded along with the other two, its impact can be statistically evaluated
- this method determines the correlation between two variables while statistically controlling for the effects of a third
• Can be viewed as the average of the simple bivariate correlations across levels of the third, "nuisance" variable (the one that has been "partialed out").
• rYX2X1- here the 2 and 1 are subscripts and the YX and X are multiplied together
Partial Correlations
• removes the systematic relationship with the 3rd variable statistically, by removing the linear trend, then correlating the residuals
- remember the regression line gives you the best equation that describes the relationship between the variables
- in a nutshell: say you were looking at SAT and GPA scores but you thought that PE (parental education) would be a third, confounded variable
- regress SAT on PE (draw a graph) and then find the residuals for this graph
- regress GPA on PE and then find the residuals- both of these residual sets will have a correlation of 0 with PE (with its effects partialed out you no longer have a correlation with it)
- to get the correlation of just SAT and GPA you correlate the residual scores
- this is done to see if SAT and GPA will still have a significant relationship even when the effects of PE are partialed out
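A minimal sketch of the procedure just described, on hypothetical SAT/GPA/parental-education numbers: regress each variable on PE, keep the residuals, and correlate the two sets of residuals.

    import numpy as np

    rng = np.random.default_rng(5)
    pe = rng.normal(14, 2, 200)                        # parental education (years)
    sat = 30 * pe + rng.normal(0, 80, 200) + 600
    gpa = 0.1 * pe + 0.002 * sat + rng.normal(0, 0.3, 200)

    def residuals(y, x):
        b, a = np.polyfit(x, y, 1)        # remove the linear trend of x from y
        return y - (a + b * x)

    r_simple = np.corrcoef(sat, gpa)[0, 1]
    r_partial = np.corrcoef(residuals(sat, pe), residuals(gpa, pe))[0, 1]

    print(f"simple r(SAT, GPA)       = {r_simple:.3f}")
    print(f"partial r(SAT, GPA | PE) = {r_partial:.3f}")  # usually smaller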
• any number of measured variables can be partialed out as if controlled for experimentally
• partial correlation can be tested for statistical significance with n-j d.f. (where j = number of variables)
- this is easy to use if you have a correlation matrix
- if you have the three correlations from the data you can get the partial correlation, and also compare it with the semi-partial correlation formula
- example from our book: the SAT test is used universally in the States as a measure of knowledge- used to predict how well a student will do (whether they will get into a university)
- we want to know, given the info from the SATs, whether this is good data for determining how well kids will do in university
- we want to control for parental education
- the partial correlation matrix- regress GPA on parental education and then take the residuals- GPA and SAT scores have no relationship with parental education when you have residual scores- the partial correlation looks at the two sets of residual scores and tries to find a correlation → it partials out the effects by regressing on the confounded variable
- instead of the simple two circles, we have three
- the coefficients of determination- the degree to which the two circles you are looking at overlap
- the coefficients of non-determination are 1 minus those values
- mathematically we are getting rid of X2- the amount that is shared between X1 and Y is determined, and the amount that is not shared is also looked at
- we use the formula with Y (a+f) and not (a+g) because we are trying to determine Y
- we are looking at the amount of remaining variance
- so it's not just c- you have to account for the variance that is taken away by the green circle
Partial Correlations - an Example
• Effects of alcohol consumption during pregnancy on fetal outcome
• Significant negative bivariate correlation
• But, alcohol consumption tends to correlate with tobacco (nicotine) and caffeine consumption
• So, partial out the effects of nicotine and caffeine; alcohol still has a significant negative partial correlation with fetal outcome.
• Other hypothetical example? – overhead- smoking, stress, and colds (you want to see if smoking causes colds but it seems to be confounded with stress- use stress as the one that is held constant)
Partial Correlations
• Major improvement (over simple correlation)
• Problem: there can be other unmeasured variables
• random assignment, in theory, gets them all
Semi-partial (Part) Correlations
Allows you to examine the relationship between two variables with the effect of the third
removed from one.
- here the effect of the third is just removed from one of the variables (not both)- the numerator is exactly the same as for the partial correlation
- variables to the right of the dot have been removed from the variables to the left of the dot (the bracket tells you that the effects of X2 have only been removed from X1 and not Y; if you had no brackets- partial)
- doesn't tend to be used a whole lot by itself
- in semipartial you are correlating residuals with raw scores, whereas with partial you are correlating residuals with residuals
- we are only removing the nuisance variable from one variable
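For contrast, a semi-partial sketch on similar hypothetical SAT/GPA/parental-education data: the nuisance variable (PE) is regressed out of SAT only, and those residuals are correlated with the raw GPA scores.

    import numpy as np

    rng = np.random.default_rng(6)
    pe = rng.normal(14, 2, 200)
    sat = 30 * pe + rng.normal(0, 80, 200) + 600
    gpa = 0.1 * pe + 0.002 * sat + rng.normal(0, 0.3, 200)

    b, a = np.polyfit(pe, sat, 1)
    sat_resid = sat - (a + b * pe)                      # PE removed from SAT only

    r_semipartial = np.corrcoef(sat_resid, gpa)[0, 1]   # residuals vs. raw GPA
    print(round(r_semipartial, 3))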
Multiple Regression
• When looking to make predictions, third (or fourth) variables that correlate with the criterion are not nuisances, but may provide additional information.
• With several potential predictor variables, use Multiple Regression analyses
• Built upon multiple correlation (just as bivariate regression and correlation are related)
Multiple Regression Equation
• regardless of the number of predictors, the multiple linear regression equation is:
Y' = a + b1x1 + b2x2 + b3x3 + … + bkxk
where Y' = the predicted value of Y
a = the regression constant
b1 - bk = regression weights or coefficients
and x1 - xk = predictor variables
- it is harder to draw the line- it is hard to illustrate hyperspace
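A minimal multiple-regression sketch on hypothetical IQ/study-time/GPA data, solved by ordinary least squares; multiple R is taken as the correlation between predicted and observed Y, as described later in these notes.

    import numpy as np

    rng = np.random.default_rng(7)
    iq = rng.normal(100, 15, 100)
    study = rng.normal(10, 3, 100)
    gpa = 0.02 * iq + 0.08 * study + rng.normal(0, 0.3, 100)

    X = np.column_stack([np.ones(len(gpa)), iq, study])   # column of 1s gives "a"
    coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
    a, b1, b2 = coef

    gpa_hat = X @ coef
    R2 = np.corrcoef(gpa_hat, gpa)[0, 1] ** 2   # square of multiple R

    print(f"Y' = {a:.2f} + {b1:.3f}*IQ + {b2:.3f}*StudyTime,  R^2 = {R2:.3f}")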
Multiple Regression Plane
- you look for the best plane that goes through the data points, rather than the best line that goes through the data points
An Example: IQ, GPA & Study Time
Multiple Regression
- two variables are unlikely to have a correlation of exactly 0 just because of chance
- how much of the variability in Y can we predict from the variability in both of the X's?
- a + b + c – the proportion of variance you can account for by knowing the values of x1 and x2
- which variable should get credit for b?
- this is a problem
Considerations about R2
• R2 = coefficient of determination (the total proportion of variance you can account for)
• 1-R2: coefficient of non-determination
• R2 cannot be less than the largest single bivariate r2yx
- the amount of variance accounted for by two predictors cannot be less than the variance accounted for by one predictor (it could be the same, but can't be less)
- the extent to which a new variable will improve the ability to predict y will depend on how correlated it is with the variables that are already in the equation
Considerations about R2
With additional predictors, R2 will
increase only to the extent that the new predictor is not correlated with other predictors
already in the equation.
Additional variables entering the equation may simply be fitting noise.
two variables unlikely to have a 0 correlation- on average it would be zero, but for any
one of them it might actually be high
you have to…
Test the significance of improved fit between observed Y and predicted Y’ after each
step in the multiple regression (for most methods)
Shares all the assumptions and potential pitfalls of simple linear regression
Multiple Regression Methods
• Direct (Simple) Regression
• Forward regression
• Backward regression
• Stepwise Regression
• Hierarchical Regression
Multiple Regression methods:
(Note that different terminology can be used by different authors)
Direct (Simple) Regression: All available predictor variables are put into the equation at once and they are assessed as if they had been entered last, i.e., they are assessed on the basis of the proportion of variance in the criterion variable (Y) they uniquely account for. (Called simple regression in Bordens and Abbott.)
Forward regression: sequentially add variables, one at a time based on the strength of
their squared semi-partial correlations (or simple bivariate correlation in the case of the first
variable to be entered into the equation)
Backward regression: start with them all, then delete them on the basis of the smallest change in the R2.
Stepwise Regression: a combination of forward and backward: at each step one can be
entered (on basis of greatest improvement in R2 but one also may be removed if the change
(reduction) in R2 is not significant. (In the Bordens and Abbott text, it sounds like they use
this term to mean Forward regression.)
Hierarchical Regression: The researcher assumes control over the analyses, on the basis of theory or practicality (e.g., economics). Note: this is equivalent to doing semi-partial correlations.
Class 20
- multiple correlation and regression
- types of multiple regression
- other multivariate designs and analyses
Multiple Regression
- the key thing about it (the distinguishing factor): instead of worrying about correlations, we are trying to make the best prediction possible- the more variables the better
- the math is a little different than a simple regression- it builds upon simple regression or simple correlation
- remember that regardless of the number of predictors, the multiple regression equation is Y' = a + b1x1 + b2x2 + etc.
- we want to figure out what proportion of variance in y is shared with the two predictors
Multiple regression coefficient equation
R2
- R2: coefficient of determination (total proportion of variance you can account for)
- 1-R2: coefficient of non-determination
- R2 cannot be less than the largest single bivariate r2yx
Considerations about R2
- with additional predictors, R2 will increase only to the extent that the new predictor is not correlated with predictors already in the equation
- if two are perfectly correlated (x1 and x3)- prediction will not be improved because that
bit of the square is already accounted for in x1- they can’t correlate perfectly if you want
to improve predictability
- increase dependent upon the squared semi-partial correlation coefficient:
R2Y•X1X2X3…Xn= r2YX1+ r2Y(X2•X1) + r2Y(X3•X1X2)+… + r2Y(Xn•X1X2X3…Xn-1)
- additional variables entering the equation may simply be fitting noise
- test the significance of the improved fit between observed Y and predicted Y' after each step (except simple)
- shares all the assumptions that linear regression does
Multiple Regression Methods
• Direct (Simple) Regression: all available predictor variables are put into the equation at once and they are assessed as if they had all been entered last
- how much of the variance in y can this variable, and this variable alone, account for?
- this is the method of choice; it is conservative
- they are assessed on the basis of the proportion of variance in the criterion variable (Y) they uniquely account for (squared semi-partial correlations)
- called simple regression in Bordens and Abbott
• Forward regression: sequentially add variables, one at a time, based on the strength of their squared semi-partial correlations (or the simple bivariate correlation in the case of the first variable to be entered into the equation)
- it is mindless
- enters them into the equation depending on how much of the variance they account for
• Backward regression: start with them all in the equation, then delete them on the basis of the smallest change in the R2
- if the change isn't greater than expected by chance then this variable is just fitting noise, so get rid of it
• Stepwise Regression: a combination of forward and backward
- at each step one variable can be entered (on the basis of the greatest improvement in R2) but one may also be removed if the change (reduction) in R2 is not significant
- in the Bordens and Abbott text, it sounds like they use this term to mean forward regression
- the order in which the variables are entered is based on a statistical decision, not on theory
• Hierarchical Regression: this is the only method with which the researcher assumes control over the analyses
- on the basis of theory or practicality
- you use this if you have a well-developed theory or model suggesting a certain causal order
- this is especially important when "multicollinearity" is a problem (when your predictor variables are highly correlated with each other)- the problem of which one gets credit for the overlap
- when you think that one variable has relative importance over another, use this method
Multicollinearity
- results when variables in analysis are highly correlated
- impact of this is complex and beyond the scope of this chapter
- if two variables are highly correlated- one of them should be eliminated from the analysis
- the high correlation means the two variables are measuring essentially the same thing
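A small illustration of this on made-up data: adding a second predictor that is nearly a copy of the first barely changes R2, because the redundant predictor accounts for almost no new variance in Y.

    import numpy as np

    def r_squared(y, X_cols):
        X = np.column_stack([np.ones(len(y))] + list(X_cols))
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        y_hat = X @ coef
        return np.corrcoef(y_hat, y)[0, 1] ** 2

    rng = np.random.default_rng(8)
    x1 = rng.normal(0, 1, 200)
    x2 = x1 + rng.normal(0, 0.1, 200)        # nearly a copy of x1 (r close to 1)
    y = 0.8 * x1 + rng.normal(0, 1, 200)

    print(round(r_squared(y, [x1]), 3))       # one predictor
    print(round(r_squared(y, [x1, x2]), 3))   # adding the redundant one: ~ no change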
Multiple regression
- increased it so that ab is smaller than bc
- x2 would enter the equation first because it is accounting for more variance in y than x1
- bc is 45% of y
- ab is 15% of y
- a is .06
- c is .36
- at the top, the first .51 says that both of them are important (hierarchical)
- the second one says that only one might be important (stepwise)
Multicollinearity examples
- with one predictor variable you can predict .66, with two it is .67, with three it is still .67 and with a fourth it is still only .67- these new ones have considerable overlap with the first variable
Layton & Swanson
- done a while ago, thousands of school kids in grade 9, trying to predict how well they would do in high school
- verbal reasoning gets put in first- it accounts for .31 of how well they will do in grade 11; numerical ability is less, abstract reasoning accounts for 0%
- if abstract reasoning is put in first, it accounts for 20%
- how can everything change like that?
- what do you think the correlation is between verbal reasoning and abstract reasoning?- it would be high
- final line- if you are second in, the effects of the first one have been partialled out; if you are third, the effects of the second and first are partialled out; if you are last- not much variance left
Note: Give an example like on the final exam - perhaps as homework
Coleman Report
- explain school achievement inequalities:
- 1 important DV was verbal abilities, 60 IV's! 5 chosen (based on assumed importance)- blacks and whites examined separately
- 3 different orders of entry
- if you have a big correlation even if it goes in later in the equation- it would be really important
- for whites, self-concept is consistently important
- for blacks, less so: control of the environment is more important
- interesting that both are quite subjective attitude measures
Warsaw study
- assessing massive efforts by the government to achieve educational equality
- spread individuals around the city (mixing SES groups that might differ in ability and achievement)
- 1300 kids tested on nonverbal IQ
- 3 orders of entry
- successfully removed any school or area effects
- family consistently significant
- concluded that societal changes over a generation failed to override forces that determine the social class distribution in mental performance
MR is powerful and easy to (mis)use
- it is extremely powerful
- Anton de Mann's talk: suicide ideation (how frequently you think about killing yourself- people are unlikely to commit suicide if they haven't talked about it)
- the anonymous support service was n.s., but only after demographics and things that could not be controlled were entered
- he said there were confounds that could not be controlled for- age, SES
- his conclusion was that the services didn't work when there wasn't a big change in the data- a relevant question would be- do these have an effect at all?- if you put it in first, would it account for more?- he didn't even try that- the program put it in 12th
- another colloquium on adolescent diabetics:
- after demographic information was added, education (about the dangers- potentially fatal- of not taking insulin regularly and maintaining strict diets) was n.s. – the programs they had didn't seem to have an effect- but education is the one thing that you can control- if it does help, why get rid of it?- the guy didn't figure out what the % was of just the intervention
- multiple R is the correlation between the predicted values of Y and the observed
values of Y
- R-square: the square of multiple R; provides an index of the amount of variability in the dependent variable accounted for by the predictor variables
Regression Weights
- for each predictor variable, a table of data will provide a raw regression weight and a
standardized regression weight (calculated after values of measures have been
standardized)
- for most research, use standardized regression weights (beta weights) → because they can be compared directly even if the variables to which they apply were measured on different scales- only when variables are measured on the same scale can you use the raw score regression weights
Class 22
• Other Multivariate Designs
• Developmental Designs
• Project data analysis
Multivariate Designs and Analyses
• Multiple Regression: the goal is to explain as much of the variance in the criterion variable (Y - the DV) as possible based on a set of predictor variables (Xs).
• Discriminant Analysis: basically multiple regression, with a categorical dependent variable.
Activism Among Black South Africans: C. Motjuwadi, M.Sc.
- this was how Clement showed the relationships between variables
- like friends' support and social activism- more so than support from family
- did discriminant analysis
- used stepwise multiple regression
Motjuwadi's Discriminant Analyses
- discriminant analysis is a special case of multiple regression
- used when the dependent variable is categorical (i.e. male-female or Democrat-Republican-Independent) and you have several predictor variables
Predicting Protest Participation
• gender, friend support, personal power, perceptions of injustice, & area
Predicting Political Membership (who would join the ANC or other parties?)
• participation, gender
Predicting Detention
• participants, gender, area
- this guy was supposed to get people to sign consent forms and then also fill out forms about political activism → they wouldn't do this, however- they could get thrown in jail
- in the homelands you have a rite of passage at 12- asking a 13 year old man who is head of the household to get permission from his mother? She gets his permission to do things- he had to watch how he treated the people
- his presence on the school grounds was also illegal- he risked his life for the information for his master's degree
Multivariate Designs and Analyses
• Canonical Correlation: looks at the relationship between a set of predictor variables and a set of dependent variables by creating one new predictor variable and one new dependent variable and relating these canonical variates.
- if you have a number of predictor variables and dependent variables- it amalgamates all your predictor variables into one and then correlates that summary variable with the mathematical summary of the dependent variables
- works by creating two new variables for each subject called "canonical variates"- one is computed from the independent variables and one from the dependent variables, and then the two are correlated- this correlation is called the "canonical correlation"
- it tends not to be used a lot- it is not clear what the variates will map on to
- it was developed as a purely descriptive strategy
- can't be used to infer causal relationships
• Multivariate Analysis of Variance (MANOVA). Used when you have more than one independent variable and more than one dependent variable that you believe are related (i.e., not independent).
- this is ANOVA when you have more than one dependent variable
- we do ANOVA to avoid probability pyramiding- if you have 20 dependent variables and do 20 different ANOVA's you will end up with probability pyramiding again- MANOVA controls for the familywise error rates, etc. – it allows you to look for interactions among dependent variables just like interactions between independent variables
- it makes sense statistically, but it is not used that much in practice
- print-outs are awkward – it is easier to do the ANOVAs separately- this is less difficult to interpret
- operates by forming a new linear combination of dependent variables for each effect in your design- a different linear combination of scores is formed for each of the two main effects and for the interaction (examples 469-473)
• Log-linear analysis. This non-parametric statistic is basically a multivariate chi-square.
- you are dealing with frequencies of categories and dealing with contingency tables
- this is like chi-square but you have more than two variables
- chi-square is the nonparametric version of t tests
- log-linear analysis is like a multivariate chi-square
- used when you have variables that are measured categorically
Log-Linear Example
- looking at social behaviour between coyotes
- adult-pup interactions were looked at
- we crudely looked at whether interactions were affiliative, rebuff, aggressive, etc.
- we were interested in whether the quality of the interaction was dependent on whether the adult or the pup initiated it
- it is a 2x2x4 chi-square- everything was significant
- frequency data
Multivariate Designs and Analyses
• Path Analysis. Uses multiple regression methods to examine hypothesized causal relationships among variables with only correlational data. See how well your theoretically derived model describes relationships among variables. Can also compare competing theories about the relationships among variables.
- trying to figure out how correlated variables are causally related
- it isn't taught so much in stats programs
- the idea behind it: create a model of how you think the variables are related and then test it with your correlations- the best model "wins"
- the simplest causal relationships would be A causing B; then A and C both causing B; then A and C both predicting B while A and C are also related
READ PG. 474
Possible Causal Relationships
- this figure: parental education predicts scholastic achievement scores
- one model might be that the messages children get at home could cause motivation, which could be related to SAT scores
- or you could also have work habits directly relating to SAT scores- that is, DIRECTLY related to scholastic achievement- all of these could be tested using correlations
- both A and B relate through C and D, and D influences C, which both influence E
- correlations between A and B, A and C, B and C, C and D, A and D and B and D (these would be weaker because they have to go through C)
Causal Antecedents of Attachment
Cross-correlation in Developmental Research
- preference for TV violence and aggression are correlated in third grade but not in thirteenth grade, but watching it at a younger age is correlated more with aggression in thirteenth grade
Multivariate Designs and Analyses
• Factor analysis is a multivariate form of data reduction. Factor analysis is typically used to extract a relatively small number of underlying dimensions or factors that can account for relationships among measures (see example from text)
- know about the different techniques and know when they are appropriate
- the following table- what makes a person attracted to another person?- physical attractiveness
- had a large number of questions rating individuals; many are correlated with each other
- wanted a simpler version- did factor analysis- it extracts the mathematically created factors and tells you how your measures are related to each other
- factor one- how kind someone is (this was most highly related)
- had three main factors from all the factors to summarize what is going on- how good they think the other person is, how socially vital and how personally strong they are- but still only accounting for 50% of the variance
- this is a method of extracting what is really relevant
Multivariate Designs and Analyses are all very powerful and some are easy statistics to use,
and misuse.
Using these techniques appropriately depends upon careful research design and thought.
Data Collection Methods in Developmental Psychology
- if you are interested in evaluating changes in behaviour that relate to changes in a person's chronological age → use a developmental design
- they are quasi-experimental designs
- the person's age usually serves as the quasi-independent variable- age is often a variable
Naturalistic Observations
Interviews
• structured
– questionnaires
– surveys
• unstructured
– clinical
Case Studies
Experimental:
• lab
• field
Quasi-experimental
• correlational
• ex post facto
Experimental Designs in Developmental Psychology
• Longitudinal Designs: take a group and follow them over time
• Cross-sectional Designs
• Cohort-Sequential (Cross-sequential, time-sequential) Designs
Longitudinal Designs
Examine developmental changes in one cohort followed over time
Cohort: people of the same age who have had the same cultural experience (5-year-olds in Canada and 5-year-olds in Africa are not in the same cohort)
Within-Subjects Quasi-analytic design
Advantages:
• Process of development can be followed with individuals
Disadvantages:
• Large investment of time and money is required (especially if the age span of interest is
large)
• Subject attrition can be a problem – you can lose subjects
• Carryover effects (e.g., learning) can be a problem
• Differences among cohorts are not addressed
Cross-sectional Designs
Examine two (or more) ages (or cohorts) at one time
Between-Subjects Quasi-analytic design
Advantages:
• Fast and cheap
• No subject attrition
Disadvantages:
• Confounds age and cohort effects – “generation” effect (influence of generational
differences in experience which become confounded with the effects of age per se)
• Unable to examine the process of development within individuals
Cohort-Sequential Designs
Combination of cross-sectional & longitudinal designs
• two (or more) cohorts, each studied at two (or more) ages. (Sometimes with additional
groups tested once to "fill in" the design.)
Mixed Quasi-analytic design
Advantages & Disadvantages
• This is a compromise solution with some of the advantages and disadvantages of cross-sectional & longitudinal designs, depending upon the length of the within-cohort component and the number of different cohorts.
- read the textbook, pp. 172-177
- results of a cross-sectional study looking at age and IQ: if you plot IQ as a function of age, it appears that people get "dumber" as they get older (after increasing for a while)
- but the 50-year-olds here could have grown up during the Depression and had less schooling
- the second plot shows years of education and IQ- education is a third variable, and it is almost perfectly correlated with age in this sample
Age, Education and I.Q.
- intelligence increases until 50 or 60 years of age- the results you get depend on the type of measure you use and on cohort effects
Research Projects
Due: Next Wed (A)/Thurs(B)
• .ppt presentations due Monday (A) , Tuesday (B)
• hand in on disk with your names on it to Jill
Research Project Report
• in APA format
• all materials should be included as appendices
• review verb tense, SPSS data file, SPSS output
• All consent forms, raw data and/or coding sheets to be handed in (separate bundle)
Research Project Presentations
• Approx. 10 minutes
• all partners must participate to receive credit
What we want to hear and understand
• Research context - existing relevant literature (your approach to the study, why you are studying what you did)
• your hypotheses and design to test Hexp
– (IV/DVs, how & what you used to measure them)
• results (stats, figure)
• discuss what your results mean, relate to literature
• thoughtful suggestions for improvements/future research
• be ready for questions
Don’t…
- present things that will be consistent across situations (consent, method section recruiting procedures, debriefing)
March 30, 1999
Class 23
Discrete Trials Designs
• Psychophysics
• Signal Detection Theory
• Course evaluations
• Hand in .ppt files for presentations
Characteristics of Discrete Trials Designs
1) individual subjects receive each treatment condition of the experiment dozens (perhaps
hundreds) of times. Each exposure to the treatment, or trial, produces one data point for
each dependent variable measured.
2) Extraneous variables that might introduce unwanted variability in the DV are tightly
controlled.
3) If feasible, the order of presenting the treatments is randomized or counterbalanced to
control order effects.
4) The behaviour of individual subjects undergoing the same treatment may be compared to provide intersubject replication.
Analysis of Data from Discrete Trials Designs
- begins by averaging the responses across the repeated presentations of a particular
treatment
- large number of presentations helps to ensure the resulting mean provides a stable and
representative estimate of the population mean (of the mean that would be obtained if an
infinite number of trials could be given to the subject under the treatment conditions)
- means obtained from different treatment conditions may then be compared to determine
whether they appear to differ (may or may not include inferential statistics)
- analysis usually guided by theory or model of behaviour being examined
- these analyses yield a small number of descriptive statistics, such as d' (a measure of sensitivity) and β (beta, a measure of response bias) in signal detection
Psychophysics
- this is the branch of psychology that is interested in signal detection trials
Concerned with the four perceptual problems of:
1. Detection
2. Identification
3. Discrimination
4. Scaling- how big is it, how many of them are there, how far away is it
- you would be asked: is there something there, what is it, might there be one or two?
- once you get the detection it is easier to continue
- think about what people are doing in the former Yugoslavia- you think you see something (detection), what is it (identification), is it one of ours or one of theirs (discrimination), how far away is it, do we want to fire? (scaling)
Psychophysics
Absolute thresholds are often used as the index of an individual's sensitivity to a specific stimulus, or to differences between stimuli.
Gustav Theodor Fechner (1860) defined the absolute threshold as the stimulus that "lifted
the sensation or sensory difference over the threshold of consciousness"
- the amount of mechanical energy that has to be out there before you are consciously aware of something (visually or auditorily)
- in theory, these absolute thresholds look like this:
The Absolute Threshold
- we could be looking at touch, vision, etc.
- you play a soft note- start out really quiet; if they can't hear it you step it up a notch; at some point she still can't hear it but you will be close to the absolute threshold- bump it up one notch and she can suddenly hear it (grandma at the hearing doctor)
Method of Limits
- you have trials that either increase in stimulus strength or decrease
- on each trial you indicate whether you have heard it or not (no "I think something might be there")
- so at intensity 7, they suddenly say yes
- but you can't stop there- you then run a descending series, starting loud and turning it down
- how can they not hear it at 8 when they could hear it at 7?
- series 4 is a descending one- here they detect it at 7 and 8
- so their data are not always consistent- but we can get an average
- look at just the descending trials
- look at just the ascending trials
- overall mean- of the total six series
- this method is sometimes used by audiologists
- it is inconsistent- the absolute threshold isn't all that absolute
- alternating ascending and descending series is good, but
- you can have anticipation and perseveration errors- if on the weak trials they couldn't hear it until the fourth step, they might keep that in mind for the other series
- when you have many, many trials where they can't hear it- perseveration errors, where they go with what they just said on the last trial when they aren't sure
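A tiny sketch of the averaging step, with made-up transition points (the intensity at which the response first changed) from three ascending and three descending series:

# Hypothetical transition intensities from six series of a method-of-limits run.
ascending = [7, 8, 7]
descending = [6, 7, 7]

asc_mean = sum(ascending) / len(ascending)
desc_mean = sum(descending) / len(descending)
overall = (sum(ascending) + sum(descending)) / (len(ascending) + len(descending))
print(asc_mean, desc_mean, overall)   # three threshold estimates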
Staircase Method
- in the staircase method, once they say no you increase the strength, and once they say yes you decrease the strength
- you are toggling around the absolute threshold
- you can track the person's perceptual sensitivity to the stimulus
- it should track around the person's threshold
- it allows you to do experimental manipulations if you use this as a stable baseline- you could introduce a drug treatment, etc.
- you can track changes over time or with the introduction of drugs or other experimental treatments
- remember that this is all for the same stimulus- a fixed wavelength or tone
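A minimal simulation of a simple up-down staircase (all of the numbers and the internal-noise model below are assumptions for illustration, not from the lecture): the simulated observer says "yes" whenever the stimulus plus some random internal noise exceeds a true threshold, so the track should oscillate around that threshold.

import random

def run_staircase(true_threshold=5.0, noise_sd=1.0, start=10.0,
                  step=1.0, n_trials=40, seed=1):
    # 1-up/1-down rule: decrease intensity after a "yes", increase after a "no".
    random.seed(seed)
    intensity = start
    track = []
    for _ in range(n_trials):
        heard = intensity + random.gauss(0, noise_sd) > true_threshold
        track.append(intensity)
        intensity += -step if heard else step
    return track

track = run_staircase()
# Averaging the intensities over the later trials gives a threshold estimate.
print(sum(track[-20:]) / 20)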
Why do Thresholds Seem to Vary?
- why isn’t it absolute?
The stimulus being presented is not the only one that the subject is experiencing (you are never without sensory activity)- cave example
Constant background stimulation for any signal
Endogenous noise
Noise - any background stimulus other than the one to be detected. Can be visual,
chemical, mechanical, thermal, or auditory.
- in this research there are differences between what you are trying to look at and then
everything else you aren’t looking at (noise)
Can also be lapses of attention, fatigue, and other psychological changes.
Determining the "Absolute" Threshold: Method of Constant Stimuli
- so with real data, trying to determine the absolute threshold
- use the "Method of Constant Stimuli"- a very precise method; gives a good idea of a person's perceptual abilities
- takes a lot of effort
- you have one particular stimulus presented at 8 different intensities
- for each intensity you have a measure of response strength- one way you can do this is how likely the person is to say they heard it
- for intensities 1 and 2 you had 6%- each of the data points might be 100 trials or 50, etc.- so you have 800 trials (for one frequency)
- the plots are called "ogives"- a plot of the probability of perceiving the stimulus against stimulus intensity
- so we've run 800 trials on grandma- what is her absolute threshold? somewhere between 1 and 8
Psychophysics
The basic assumption in doing psychophysics is that any type of behaviour has some strength. In psychophysics the measure of strength most often used is response probability.
p(yes) = #yes responses / (#yes + #no responses)
- we have to assume this because you can't measure the actual brain activity of people (by putting electrodes on the brain, for example)
- we had likelihood of perception on the y-axis, but here you use response probability (the probability with which they will say they detected it- the equation above)
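For example, with made-up counts:

yes_count, no_count = 12, 38
p_yes = yes_count / (yes_count + no_count)   # 12 / 50 = 0.24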
Determining the “Absolute” Threshold: Method of Constant Stimuli
- our absolute threshold will be arbitrary because there is no abrupt increase
- the traditional response is the point at which they say they can detect it 50% of the time (called the 50% threshold)
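A minimal sketch of pulling the 50% threshold off an ogive by linear interpolation; the proportions below are made up (only the 6% values at the two lowest intensities echo the notes), so the result is just an illustration:

intensities = [1, 2, 3, 4, 5, 6, 7, 8]
p_yes = [0.06, 0.06, 0.12, 0.30, 0.55, 0.80, 0.93, 0.98]   # hypothetical ogive

def threshold_50(xs, ps):
    # Find the two intensities whose p(yes) values bracket .5 and interpolate.
    for x0, p0, x1, p1 in zip(xs, ps, xs[1:], ps[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)

print(threshold_50(intensities, p_yes))   # about 4.8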
Approximate Thresholds
Vision: Candle flame from 48km on a dark clear night
Audition: Wristwatch from 6m in a quiet room
Taste: 1 tsp sugar in 7.5 litres water
Olfaction: 1 drop of perfume in a 3 room apartment
Touch: a bee’s wing falling on your cheek from 1cm
Signal Detection Theory
A mathematical, theoretical system that recognises that individuals are not merely passive
receivers of stimuli.
Participants are also engaged in the process of deciding whether they are confident enough
to say "Yes, I detect that stimuli" when engaged in psychophysics experiments.
- participants are engaged in decision making processes
- will they be biased towards saying yes or no
- i.e. fighter pilots- you need perfect vision; you might be biased towards saying yes; or if
you want to be compensated for auditory loss you might be biased towards saying no
- on some trials you are not going to be sure
Signal Detection Theory
Problem: subjects may wish to appear sensitive (or insensitive). Subject bias.
To account for decision making component, can introduce “catch trials”
- they put in catch trials to see if people are lying – if they are saying they are
sensing something when something really wasn’t there
- there are only two options- a catch trial or an experimental trial
- you either say yes or no
- it’s either going to be there or it’s not
With two possible experimental trials
(signal present or absent) and two
possible participant responses
("yes" it is present or "no" it isn't
there) there are four possible
outcomes to each of many trials.
Participants' responses on each trial
are going to be consequences of
both their perceptual sensitivity to
the stimuli presented and their
decision strategy or bias toward
saying something is there or not
when they are in doubt.
Manipulating Bias
By varying the conditions of the
experiment bias can be altered.
• alter expectations
• or alter the relative importance of
the two types of error. (Payoff
matrix)
- I'll give you a loonie every time you get it right, and take back a quarter every time you have a false alarm
- pretty good odds- but if it were the other way around it would be different; it wouldn't change your sensitivity but you would probably get different results
Outcome Matrix: Signal Present 50% of Trials
- you tell the person that there is only going to be a signal 50% of the time
Outcome Matrix: Signal Present 90% of Trials
Outcome Matrix: Signal Present 10% of Trials
- all of this variability is from us messing around with the decision making process
Isosensitivity (ROC) Curve
- each of the data points summarizes one of the outcome matrices- plotting the probability of a hit against the probability of a false alarm
- the guess line is the diagonal- there she'd just be guessing
- d' is the index of sensitivity- it gives an idea of the distance between the guessing line and the observer's curve
- the person who generated the red line wasn't as sensitive as the person who generated the blue data, because they were closer to guessing
- the same person can generate both curves if they were for different frequencies
Calculating d' From a Single Outcome Matrix
The data required for each point on an isosensitivity (ROC) curve require hundreds of trials (to get accurate probabilities for Hits and False Alarms).
With a few assumptions, d' can be calculated from a single outcome matrix using Signal Detection Theory.
- you need many, many trials to get accurate probabilities
- we can calculate d' from one single outcome matrix
Signal Detection Theory Assumptions
1) Noise is normally distributed.
- sometimes it's high, sometimes low; most likely it is in the middle (it follows a normal distribution)
- adding a signal on top of the background noise shifts the amount of sensory activity by a constant amount
Presenting a signal on top of that noise will therefore shift the amount of sensory activity to the right (higher), by an amount equal to that sensory system's sensitivity to that signal.
The difference between the mean amount of sensory activity generated by the noise-alone trials and the signal+noise trials will equal sensitivity (d'), measured in z-score (standard deviation) units.
- some levels of activity are more likely to occur, some less likely (the tops of the curves are most likely)
- you shift the whole distribution to the right- the system's activity increases by an amount proportional to its sensitivity to the signal
- d' becomes the index of the system's sensitivity
Stronger Signal (or More Sensitive Receiver)
- if you crank up the signal strength, use a more sensitive person, or change the frequency, you get this
- the distribution still shifts over by a fixed amount equal to the sensitivity to the signal
Signal Detection Theory Assumptions
2) Participants adopt a criterion (β) for dealing with those values of sensory activity that could result from either noise alone or signal plus noise (the area where the noise and signal+noise distributions overlap).
If the amount of sensory activity exceeds that amount, the participant will say they detected the signal; any amount less than that and they will say they did not detect the signal.
- this adoption of a criterion is not conscious- they are setting a cutoff for the decisions that have to be made in the absence or presence of a signal
- the issue is the region of overlap, where they have to guess
- if the amount of sensory activity is greater than their criterion they will say yes; if lower, they will say no, there is nothing there
- remember, it is not conscious- they are simply acting as if they had set such a cutoff
- if we make this assumption we can save ourselves hundreds of trials
Manipulation of Bias
We can now interpret the manipulation of a receiver's motivation to say "yes" when in doubt (due to either changing expectations or changing payoffs) as affecting the placement of the criterion
- when you do the payoff thing, you are introducing a lax or liberal criterion; you are not influencing the actual sensitivity, only the decision-making process
- all that this manipulation of bias has done is move where you put your criterion in the range of overlap
- we haven't changed sensitivity
- where the criterion is put has no influence on sensitivity- it is only a reflection of bias
Sensitivity
Criterion location has no effect on sensitivity
Sensitivity refers to the average amount of sensory activity generated by a signal compared with the average amount of noise-generated sensory activity
- the difference between the means of the two distributions
- look on the links page about Fechner
Class 24
• Remaining Presentation(s)
• Signal Detection Theory
– conclusion
– tutorial (at the DOS prompt type "percept"; at the menu pick "E. Theory and Methodology", then "B. Signal Detection Theory"; the introduction works, part B usually doesn't)
• Using theory
• Review for Final (April 17th, Auxiliary Gym, 9-12)
• Pick Date for Review Class
Signal Detection Theory
Calculating d' From a Single Outcome Matrix
The data required for each point on an isosensitivity (ROC) curve require hundreds of trials (to get accurate probabilities for Hits and False Alarms).
With a few assumptions, d' can be calculated from a single outcome matrix using Signal Detection Theory.
Signal Detection Theory Assumptions
1) Noise is normally distributed.
Presenting a signal on top of that noise will therefore shift the amount of sensory activity to the right (higher), by an amount equal to that sensory system's sensitivity to that signal.
The difference between the mean amount of sensory activity generated by the noise-alone trials and the signal+noise trials will equal sensitivity (d'), measured in z-score (standard deviation) units.
- d' is the increase in sensory activity between the two distributions- d' is a measure of sensitivity
2) Participants adopt a criterion (β) for dealing with those values of sensory activity that could result from either noise alone or signal plus noise (the area where the noise and signal+noise distributions overlap).
If the amount of sensory activity exceeds that amount, the participant will say they detected the signal; any amount less than that and they will say they did not detect the signal.
- you can't possibly know for certain- there is a range of sensory activity in which you have to guess whether there was a signal
- your system sets the criterion- as long as the level exceeds the criterion you will say yes; if it doesn't, you will say no
- in the orange (overlap) area, you could either be hearing a signal with relatively low background noise, or you could be hearing higher background noise alone
We can now interpret the manipulation of a receiver's motivation to say "yes" when in doubt (due to either changing expectations or changing payoffs) as affecting the placement of the criterion
- if a person is quite willing to say yes, they have a lax or liberal criterion- they say yes on most of the trials when the signal is there, but also when it isn't
- very high false alarms
- people with a strict criterion, by contrast, require more sensory activity to say yes
- they will hardly have any false alarms- only on the signal-absent trials in which the background noise is greatest
- they are conservative in saying yes
Sensitivity
Criterion location has no effect on sensitivity
Sensitivity refers to the average amount of sensory activity generated by a signal compared with the average amount of noise-generated sensory activity
- with these assumptions, the four cells in our matrix can be related to the areas under the
two normal curves with the criterion dividing them
Signal Detection Theory
With two assumptions:
1) Noise is normally distributed,
2) Participants adopt a criterion (β) for dealing with those values of sensory activity that could result from either noise alone or signal plus noise,
The four cells of an outcome matrix (Hits, Misses, False Alarms & Correct Negatives) can be represented as areas under the two normal distributions.
- when you say yes and the signal is there- the proportion of the signal+noise curve to the right of the criterion will be the proportion of hits
- here the person has a miss
- false alarm- this part corresponds to the false alarm rate (the area of the noise-alone curve to the right of the criterion)
- correct negative- the signal is absent and the person says no
- d' is the difference between the two means
- we can measure how far apart the means are
Signal Detection Theory
d' can then be measured in z-score units. We use z-scores because of the areas under the curves- you want to know how many standard deviations apart the two curves are.
d' = ZFA - ZHit
Tables of the z-score distribution or percent area under the normal curve typically present the z-score distances between the mean and the criterion value (β).
If you are using such a table, ZFA can be found by looking up the z-score associated with (50 - False Alarm %).
- such tables give the area under the curve between the mean and the z-score
If this number (50 - FA%) is positive, then the z-score to be put into the formula will also be positive; if it is negative, the z-score value for the formula will also be negative.
It is essential that the proper signs be used.
A good way of checking is to draw the distributions and the criterion and see the relationship between d' and the two z-scores.
Similarly, to find ZHit, look up (50 - Hit %); again, the resulting sign will be the same as the one used for the z-score in the formula.
Example
- the tradition is to look at hits and false alarms
- the hit rate is 60%- over half
- the false alarm rate is 20%
- 20% of the signal-absent curve is to the right of the criterion
- we are interested in finding d'- the difference between the two means

Signal    Yes Response    No Response
Present   .60             .40
Absent    .20             .80

- z-scores to the right of the mean are positive, to the left of the mean negative
d' = ZFA - ZHit = Z(50-20) - Z(50-60)
The z-score associated with 50-20 = 30% of the normal curve is .842; for 50-60 = -10% it is -.253
d' = .842 - (-.253) = .842 + .253 = 1.095 z-score units
Second Example

Signal    Yes Response    No Response
Present   .95             .05
Absent    .75             .25

Did this person have a lax or strict criterion? Lax- pretty willing to say yes.
d' = ZFA - ZHit = Z(50-75) - Z(50-95)
The z-score associated with 50-75 = -25% of the normal curve is -.675; for 50-95 = -45% it is -1.645
d' = -.675 - (-1.645) = .970 z-score units
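The same arithmetic can be done directly from the hit and false-alarm proportions with an inverse-normal function; a minimal Python sketch, equivalent to the table-lookup method above (written here as d' = z(hit rate) - z(false-alarm rate)):

from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    # norm.ppf is the inverse of the cumulative normal; this reproduces
    # the (50 - %) table method used in the worked examples.
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

print(d_prime(0.60, 0.20))   # about 1.095, the first example
print(d_prime(0.95, 0.75))   # about 0.970, the second example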
- start everything with just the graph and beta (the person's criterion), then start putting the distributions in when you know how often they say yes, etc.; d' is the increase in sensory activity level from one distribution to the other; we are looking at normal distributions, in z-score units
- for example, when you have Phit = .5 and Pfa = .3, the z-score associated with Phit is 0 and the z-score associated with Pfa is .524
- let's do another one with someone with a very strict criterion: Phit = .15 and Pfa = .05
- Phit is the proportion of times they say the signal is there when it is- so you would subtract this from 50 (35% must be between the mean and the criterion)
- Pfa is when they say it is there when it isn't
- remember, we are always looking at when they say it is there (the parts of both curves to the right of the criterion)
- you also have to look up 45%
- both numbers are positive- the criterion is located to the right of both means- look up the numbers, etc.
- all of this is a nice solution for dealing with bias- a way to test sensitivity even when observers don't know whether they are right
Sensitivity to Pain: An Experimental Study of Acupuncture
- what happens: Lx is the measure of bias- the person receives the same amount of thermal energy but is less likely to say it is painful
- just remember: if the criterion is to the right of the mean, you subtract the % from 50%
- if the criterion is to the left of the mean, you subtract the % from 50 as well, but you make the number that you eventually get from the table negative
Using Theory: Chapter 15
- a scientific theory is one that goes beyond the level of a simple hypothesis, deals with potentially verifiable phenomena, and is highly ordered and structured
- it consists of a set of interrelated propositions that attempt to specify the relationship between a variable and some behaviour
- characteristics of scientific theories:
a) describes a scientific relationship (one established through observation and logic) that indicates how variables interact within a system to which the theory applies
b) the described relationship cannot be observed directly- its existence must be inferred from the data (if you could observe the relationship directly there would be no need for a theory)
c) the statement is only partially verified (the theory has passed some tests but not all relevant tests have been conducted)
Distinctions among:
• Hypotheses- less complex than theories; look at only one variable at a time, unlike theories, which can look at a system of variables
• Laws- a theory that has been substantially verified; not subject to disconfirmation like theories are
- laws may idealize real-world relationships (such as the law of "ideal" gases, when in reality there are no ideal gases) but they hold well enough for most purposes
• Models- refers to a range of concepts; in most cases "model" refers to a specific implementation of a more general theoretical viewpoint (like classical conditioning)
- a model can represent an application of a general theory to a specific situation
• & Scientific Theories
Ways to Distinguish Among Theories
1) Quantitative vs. Qualitative
- a quantitative theory is expressed in mathematical terms; it specifies the variables and constants with which it deals numerically and relates the numerical states of the variables and constants to one another (e.g. the theory of information integration)
- a qualitative theory is any theory that is not quantitative; these tend to be stated in verbal terms; they state which variables are important and how they interact; they can describe quantitative relationships, but not measured on a scale higher than ordinal (e.g. Noam Chomsky's theory of language acquisition)
2) Levels of Description
- some theories are primarily designed to describe a phenomenon while others attempt to explain relationships among variables
• Descriptive theories: a theory that merely describes a relationship; does not give explanations (e.g. Kepler's theory that the planets move around the sun- it described the motion but did not say why it happened); the trap is to think you have explained a phenomenon when all you have really done is given it a name
• Analogical: explains a relationship through analogy; you must equate each variable in the physical system with a variable in the behavioural system to be modeled; you can then plug in values and apply the rules of the original theory to generate predictions
• Fundamental: really explains what is going on; a theory that proposes a new structure or underlying process to explain how variables and constants relate; includes assumptions; rare in psychology (e.g. Festinger's cognitive dissonance theory- when two attitudes or behaviours are inconsistent, cognitive dissonance is aroused)
3) Domain of a Theory- whether it applies to just certain situations or to all animals
- domain = scope
- this dimension looks at the range of situations to which the theory may be legitimately applied
- a theory with a wide scope can be applied to a wider range of situations than can a theory with a more limited scope
- the chances of dealing adequately with a range of phenomena are better for a small area of behaviour than they are for a large area
- most psychological theories are very specific
Some Roles of Theory in Science
• Understanding: theories represent a particular way to understand the phenomena they are dealing with
• Prediction: even when theories do not provide a fundamental insight into the mechanisms of a behaving system, they can provide a way to predict behaviour
• Organizing & Interpreting data: theories can provide a sound framework for organizing and interpreting research results; research results can be interpreted in light of a theory
• Generating Research: theories provide ideas for new research; this is known as the "heuristic value" of a theory (often independent of validity)
Characteristics of Good Theories
- whether a theory falls by the wayside or stands the test of time is determined by the following:
• Accounts for data: to be of value, a theory must account for most of the existing data within its domain; not all data, because some data might be unreliable
• Has "explanatory relevance": the explanation for a phenomenon provided by a theory must offer good grounds for believing the phenomenon would occur under the specified conditions
• Is testable: a theory is testable if it is capable of failing some empirical test; this is the problem with Freud's theories (they are not testable- they cannot be proven wrong)
• Predicts novel events: a good theory should predict new phenomena beyond those for which the theory was originally designed; e.g. Einstein's theory of relativity accounted for the same data and produced the same predictions for a wide range of phenomena as did Newtonian mechanics, but it went beyond Newton's to predict phenomena not expected to occur from Newton's point of view
• Is parsimonious (simple): a theory that makes relatively few assumptions
Steps in Developing Theories
1. Defining the scope of your theory: defining the domain or scope of the theory; you want to develop a theory that will provide explanations for observed relationships
2. Knowing the literature: become thoroughly familiar with current and past research in the area the theory will cover; know what lawful relationships have already been discovered, etc.
3. Formulating the theory: requires effort, insight, inspiration, luck; you need to make random pieces fit together neatly to reveal a coherent picture
– Preparedness: for theories to come almost by accident you must already be prepared (Archimedes example)
– Using analogy: ideas for a theory may develop from knowledge of other well-understood systems that behave similarly to the system you are trying to understand
– Using introspection: at least when dealing with your own behaviour, you have access to private information concerning perceptions, problem-solving strategies, memories, etc.
– Problems with this method:
– important aspects of what you are looking at might not be conscious
– the act of attempting to examine one's own mental processes might interfere with the processes themselves
– observable mental events may have no causal role in the operation of the system
– for theories used to explain the behaviour of animals, you can't use introspection without "anthropomorphizing" (attributing human characteristics to them)
4. Establishing predictive value: check the theory's predictiveness against existing data; the theory should be adequate to account for the relationships already discovered (you can modify the theory or ignore discrepancies if predictions do not fit); should you ignore discrepancies? (the world is not ideal, experimental conditions only approximate the conditions specified by the theory; use your own judgment)
5. Testing your theory empirically: set up the specified conditions and observe whether the outcome agrees with predictions
What Rule Generates This Sequence?
2 4 8 16 32 ?
Formulate your hypothesis, then test it; I will tell you if it is acceptable as the next number in the sequence.
Numbers get larger.
- you could just say 64, which would be correct, but the actual rule is just that the numbers get larger
Hexp: All “X” also have a star
Testing Theories
Both confirmation and disconfirmational strategies can be used to test theories.
- confirmation of theories
- if relationships predicted by a theory are observed in empirical data, then the theory has been supported
- when a theory is supported you have more confidence in the ability of the theory to explain and predict phenomena within its domain- but this does not mean that the theory has been proven
- disconfirmation of theories
- although you can never prove a theory correct you can prove it wrong
- this is hard, however, because when theories are disproved experimenters can often
blame other factors for the outcome
- strategies for testing theories:
- Kuhn (1970): history of science reveals that most theories continue to be defended and
elaborated by their supporters even after convincing evidence to the contrary has
been amassed
- new view only takes hold when supporters of old view die off
- how can this be avoided?- process of “strong inference”: a strategy for testing a theory in
which a sequence of research studies are systematically carried out to rule out
alternative explanations for a phenomenon
- confirmation strategy: strategy of looking for confirmation of the theory’s predictions
- disconfirmation strategy: strategy of looking for evidence that will disconfirm the
prediction
It is best to use both.
- you need both to adequately test a theory
- usually you pursue a confirmation strategy when a theory is fresh and relatively untested
- if the theory survives these tests you pursue disconfirmation strategies- the object during this phase of testing is to determine whether outcomes that are expected from the point of view of the theory always occur
- you don't want to have only confirmational testing because it is biased- you have to have testing that tries to disprove the theory as well as prove it
Review: Since Last Midterm...
• Quasi-analytic experiments: Bivalent correlation designs
• Calculate the extent to which the two variables are systematically related
• Graph data (scatterplot): predictor (assumed causal or IV) variable on abscissa (X-axis)
and criterion or DV on ordinate (Y-axis)
• Pearson's product moment correlation coefficient (for Interval or ratio data) measures the
direction and degree of association.
• r is the mean of z-score cross-products (see the sketch after this list)
• cautions - assumes linear relations, truncated ranges, outliers, heteroscedasticity,
combining group data
– So: examine scatterplots!!
• Problems interpreting the results of this type of research:
–third variable problem
–directionality (not always an issue),
–regression artifact (e.g., Rushton),
–floor and ceiling effects,
–look for converging evidence
• Correlation versus ex-post facto design
• Interpretation problems are not related to the statistical choice, rather due to the design
• Causation not a simple concept
• Simpson's Paradox.
• Partial correlation
• Remember: can be other confounding variable not measured
• Semipartial correlations (sometimes called Part correlations)
• Multiple Correlation and Regression
–formulae, types, importance of order of entry
–considerations about R2
• Developmental Designs
• longitudinal, cross-sectional, & cohort-sequential
• Cohort: a group with common experiences
• Sampling: types: random, stratified, proportional, systematic, cluster (multistage)
• Volunteers
• Discrete trials designs - Psychophysics
• Signal Detection Theory
• teasing apart sensory ability and decision to say “yes”
• Isosensitivity curves also called ROC (receiver operating characteristic) curves:
• calculate d' from hit and false alarm probabilities (using tables of areas under the normal
curve)
• Scientific theories: types of theories, functions of theories
• Evaluation on the basis of : parsimony, testability, precision
• Confirming vs. disconfirming strategies (confirmational bias)
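A minimal sketch of that definition of r (the x and y values are made up): using population (n-denominator) standard deviations, the mean of the z-score cross-products matches the usual Pearson r.

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical predictor scores
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])    # hypothetical criterion scores

zx = (x - x.mean()) / x.std()   # np.std uses the n denominator by default
zy = (y - y.mean()) / y.std()

r = np.mean(zx * zy)
print(r, np.corrcoef(x, y)[0, 1])   # the two values agree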