Solutions Manual for Fundamental Statistics for the Behavioral

Student Manual for Fundamental Statistics
for the Behavioral Sciences (7th edition)
David C. Howell
The University of Vermont
Contents
Chapter 1  Introduction
Chapter 2  Basic Concepts
Chapter 3  Displaying Data
Chapter 4  Measures of Central Tendency
Chapter 5  Measures of Variability
Chapter 6  The Normal Distribution
Chapter 7  Basic Concepts of Probability
Chapter 8  Sampling Distributions and Hypothesis Testing
Chapter 9  Correlation
Chapter 10 Regression
Chapter 11 Multiple Regression
Chapter 12 Hypothesis Tests Applied to Means: One Sample
Chapter 13 Hypothesis Tests Applied to Means: Two Related Samples
Chapter 14 Hypothesis Tests Applied to Means: Two Independent Samples
Chapter 15 Power
Chapter 16 One-way Analysis of Variance
Chapter 17 Factorial Analysis of Variance
Chapter 18 Repeated-Measures Analysis of Variance
Chapter 19 Chi-Square
Chapter 20 Nonparametric and Distribution-Free Statistical Tests
Chapter 21 Choosing the Appropriate Analysis
Preface
The purpose of this manual is to provide answers to students using the accompanying
text, Fundamental Statistics for the Behavioral Sciences, 7th ed. I have provided complete
answers to all of the odd-numbered questions. I am often asked for answers to even-numbered
exercises as well. I do not provide those because many instructors want to have
exercises without answers. I am attempting to balance the two competing needs.
You may find on occasion that you do not have the same answer that I do. Much of this
will depend on the degree to which you or I round off intermediate steps. Sometimes it
will make a surprising difference. If your answer looks close to mine, and you did it the
same way that I did, then don’t worry about small differences. It is even possible that I
made an error.
I know that there will be errors in some of these answers. There always are. Even the
most compulsive problem solver is bound to make errors, and it has been a long time
since anyone accused me of being compulsive. I do try, honest I do, but something
always slips past—sometimes they even slip past while I am correcting another error. So
I maintain a page on the web listing the errors that I and others have found. If you find an
error (minor and obvious typos don’t count unless they involve numbers), please check
there and let me know if it is a new one. Some classes even compete to see who can find
the most errors—it’s rough when you have to compete with a whole class.
The address for the main web page is
http://www.uvm.edu/~dhowell/fundamentals/ , and the link to the Errata is there.
Important note: Due to the way hypertext links are shown by Microsoft Word, the
underlining often obscures a single underline character, as in “More_Stuff.” If you see a
space in an address, it is often really a “_.”
Chapter 1-Introduction
1.1 A good example is the development of tolerance to caffeine. People who do not
normally drink caffeinated coffee are often startled by the effect of one or two cups of
regular coffee, whereas those who normally drink regular coffee see no such effect. To
test for a context effect of caffeine, you would first need to develop a dependent variable
measuring the alerting effect of caffeine, which could be a vigilance task. You could test
for a context effect by serving a group of users of decaffeinated coffee two cups of
regular coffee every morning in their office for a month, but have them drink decaf the
rest of the time. The vigilance test would be given shortly after the coffee, and tolerance
would be seen by an increase in errors over days. At the end of the month, they would be
tested after drinking caffeinated coffee in the same and in a different setting.
The important points here are:
1. Tolerance is shown by an increase in errors on the vigilance task.
2. To see the effect of context, subjects need to be presented with
caffeine in two different contexts.
3. There needs to be a difference between the vigilance performance in
the two contexts.
1.3 Context affects people’s responses to alcohol, to off-color jokes, and to observed
aggressive behavior.
1.5 The sample would be the addicts that we observe.
1.7 Not all people in the city are listed in the phone book. In particular, women and
children are underrepresented. A phone book is particularly out of date as a random
selection device with the increase in the use of cell phones.
Many telephone surveys really miss the general population, and instead
focus on a restricted population, dominated by male adults.
1.9 In the tolerance study discussed in the text, we really do not care what the mean
length of paw-lick latency is. No one would be excited to know that a mouse can stand on
a surface at 105 degrees for 3.2 seconds without licking its paws. But we do very much
care that the population mean of paw-lick latencies for morphine-tolerant mice is longer
in one context than in another.
1.11 I would expect that your mother would continue to wander around in a daze,
wondering what happened.
1.13 Three examples of measurement data: performance on a vigilance task; typing
speed; blood alcohol level.
1.15 Relationship: The relationship between stress and susceptibility to disease; the
relationship between driving speed and accident rate.
1.17 You could have one group of mice trained and tested in the same condition, one
group trained in one condition and tested in the other, and a group given a placebo in the
training context but given morphine in the testing condition.
1.19 This is an Internet search exercise without a fixed answer. The Statistics Homepage
is an online statistics text. Various departments offer data sets, computing advice, and
clarifying examples.
Chapter 2-Basic Concepts
2.1 Nominal: names of students in the class; Ordinal: the order in which students hand
in their first exam; Interval: the student’s grade on that first exam; Ratio: the amount of
time that the student spent studying for that exam.
2.3 If the rat lies down to sleep in the maze after performing successfully for several
trials, this probably says little about what the animal has learned in the task. It may say
more about the animal’s level of motivation.
In this exercise I am trying to get the students to see that there is often
quite a difference between what you and I think our variable is measuring
and what it actually measures. Just because we label something as a
measure of learning does not make it so. Just because the numbers
increase on a ratio scale (twice as much time in the maze) doesn’t mean
that what those numbers are actually measuring is ratio (twice as much
learning).
2.5 We have to assume the following at the very least (and I am sure I left out some):
1. Mice are adequate models for human behavior.
2. Morphine tolerance effects in mice are like heroin tolerance effects in humans.
3. Time on a warm surface is in some way analogous to a human response to
heroin.
4. A context shift for mice is analogous to a context shift for humans.
5. A drug overdose is analogous to pain tolerance.
2.7 The independent variables are the sex of the subject and the sex of the other person.
2.9 The experimenter expected to find that women would eat less in the presence of a
male partner than in the presence of a female partner. Men, on the other hand, were not
expected to vary the amount that they ate as a function of sex of their partner.
2.11 We would treat a discrete variable as if it were continuous if it had many different
levels and were at least ordinal.
2.13 When I drew 50 numbers 3 times I obtained 29, 26, and 19 even numbers,
respectively. For my third drawing only 38 percent of my numbers were even, which is
probably less than I might have expected—especially if I didn’t have a fair amount of
experience with similar exercises.
2.15 Eyes level condition:
a) X3 = 2.03; X5 = 1.05; X8 = 1.86
b) ∑X = 14.82
c) ∑ from i = 1 to 10 of Xi = 14.82, the same quantity as in (b) written with explicit limits.
2.17 Eyes level condition:
a) (∑X)2 = 14.822 = 219.6324; ∑X2 = 1.652 + ... + 1.732 = 23.22
b) ∑X/N = 14.82/10 = 1.482
c) This is the mean, a type of average.
The above answers are the variance and standard deviation of Y. You
really aren’t going to do much more calculation than this.
2.19 Putting the two sets of data together:
a) Multiply pairwise.
b) ∑XY = 22.7496
c) ∑X∑Y = 14.82 × 14.63 = 216.82
d) ∑XY ≠ ∑X∑Y. They do differ, as you would expect.
e) cov = [∑XY − (∑X∑Y)/N]/(N − 1) = [22.7496 − 216.82/10]/9 = 1.0679/9 = .1187
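The part (e) computation is the sample covariance built entirely from summary sums. A quick sketch of that arithmetic in Python (the function name is just illustrative):

```python
# Sample covariance from summary sums, as in part (e):
# cov = (sum_XY - sum_X * sum_Y / N) / (N - 1)
def cov_from_sums(sum_xy, sum_x, sum_y, n):
    return (sum_xy - sum_x * sum_y / n) / (n - 1)

cov = cov_from_sums(22.7496, 14.82, 14.63, 10)
print(round(cov, 4))  # 0.1187
```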
2.21
X     X + 4
5       9
7      11
3       7
6      10
3       7
∑X = 24     ∑(X + 4) = 44 = (24 + 5×4)
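The table illustrates that adding a constant c to every score adds N × c to the sum. A minimal check in Python using these scores:

```python
# The X column from 2.21; adding 4 to each of the 5 scores adds 5 * 4 = 20.
scores = [5, 7, 3, 6, 3]
shifted = [x + 4 for x in scores]

print(sum(scores))   # 24
print(sum(shifted))  # 44 = 24 + 5 * 4
```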
2.23 In the text I spoke about room temperature as an ordinal scale of comfort (at least
up to some point). Room temperature is a continuous measure, even though with respect
to comfort it only measures at an ordinal level.
2.25 The Beth Perez story:
a) The dependent variable is the weekly allowance, measured in dollars and
cents, and the independent variable is the sex of the child.
b) We are dealing with a selected sample—the children in her class.
c) The age of the students would influence the overall mean. The fact that these
children are classmates could easily lead to socially appropriate responses—or
what the children deem to be socially appropriate in their setting.
d) At least within her school, Beth could randomly sample by taking a student
roster, assigning each student a number, and matching those up with numbers
drawn from a random number table. Random assignment to Sex would
obviously be impossible.
e) I don’t see negative aspects of the lack of random assignment here because
that is the nature of the variable under consideration. It would be better if we
could randomly assign a child to a sex and see the result, but we clearly can’t.
f) The outcome of the study could be influenced by the desire of some children
to exaggerate their allowance, or to minimize it so as not to appear too
different from their peers. I would suspect that boys would be likely to
exaggerate.
g) The descriptive features of the study are her statements that the boys in her
class received $3.18 per week in allowance, on average, while the girls
received an average of $2.63. The inferential aspects are the inferences to the
population of all children, concluding that “boys” get more than “girls.”
2.27 I would record the sequence number of each song that is played and then plot them
on a graph. I can’t tell if they are truly random, but if I see a pattern to the points I can be
quite sure that they are not random.
I think that it is important for students to become involved with the
Internet early on. There is so much material out there that will be helpful,
and you have to start finding it now. I find it impossible to believe that my
explanations of concepts are always the best explanations that could be
given and that they serve each student equally well. If one explanation
doesn’t make sense, you can find others that may.
Chapter 3-Displaying Data
3.1 Katz et al. (1990) No Passage Group:
[Histogram of test scores: frequency (0–4) vs. test score (roughly 34–57)]
There is too little data to say very much about the shape of this distribution, but it
certainly isn’t looking normally distributed.
3.3 I would use stems of 3*, 3., 4*, 4., 5*, and 5. for this display.
3.5 Compared to those who read the passages:
a) Almost everyone who read the passages did better than the best person who
did not read them. Certainly knowing what you are talking about is a good
thing (though not always practiced).
b)
  NoPassage | Stem | Passage
          4 |  3*  |
      68966 |  3.  |
      44343 |  4*  |
    6669697 |  4.  |
      42102 |  5*  |
      57557 |  5.  | 5669
            |  6*  |
            |  6.  | 66
            |  7*  | 21232231
            |  7.  | 5
            |  HI  | 91 93
Notice that I have entered the data in the order in which I encountered
them, rather than in increasing order. It makes it easier.
c) It is obvious that the two groups are very different in their performance. We
would be worried if they weren’t.
d) This is an Internet exercise with no fixed answer. That source is far more
advanced than the students would be at this time, but I think that they should be
able to read it if they just skip over what they don’t understand.
3.7 The following is a plot (as a histogram) of reaction times collapsed across all
variables.
3.9 Histogram of GPA scores
3.11 (1) Mexico has very many young people and very few old people, while Spain has
a more even distribution. (2) The difference between males and females is more
pronounced at most ages in Spain than it is in Mexico. (3) You can see the high infant
mortality rate in Mexico.
3.13 The distribution of those whose attendance is poor is far more spread out than the
distribution of normal attendees. This would be expected because a few very good
students can score well on tests even when they don’t attend, but most of the poor
attenders are generally poor students who would score badly no matter what. The
difference between the average grades of these two groups is obvious.
3.15 As the degree of rotation increases, the distribution of reaction time scores appears
to move from left to right—which is also an increase.
I think it is a good idea to really think through this problem, rather than to
just take the answer as given. It is important to see that looking at data can
lead to conclusions to scientific questions, even without formal statistical
tests. Many students have a hard time seeing the relationship between data
and a question they would like to ask. (Probably many older adults do as
well.)
3.17 The data points are probably not independent in that data set. As time went on, there
would be changes in the subject’s performance. At first he might get better with practice,
but then fatigue would start to set in. Since the data are given in the order in which they
were collected, at least within each condition, data nearer in time should be more similar
than data farther apart in time.
3.19 The amount of shock that a subject delivers to a white participant does not vary as a
function of whether or not that subject has been insulted by the experimenter. However,
the black participants do suffer more shocks when the subject has been insulted.
3.21 Wikipedia gives an excellent set of data on HIV/AIDS prevalence at
http://en.wikipedia.org/wiki/List_of_countries_by_HIV/AIDS_adult_prevalence_rate
3.23 There is a tremendous increase in Down’s Syndrome in children born to older
mothers. This increase doesn’t really take off until mothers are in their 40s, but with
parents delaying having children, this is a potential problem.
[Plot: percent of infants with Down’s syndrome vs. maternal age (15–50), rising sharply
after age 40]
3.25 Smoking and low birthweight:
The data are given as the percentage of births for each group that were less than
2500 grams.
[Line plot: percent of births under 2500 grams by year, 1989–1993, for Smokers and
NonSmokers]
The relationship is unlikely to be a fluke because it is so consistent year after year. You
can see that within each group there is very little variability.
Students often wonder why behavioral scientists care about what appears
to be a public health problem. But public health problems are very often
behavioral problems. Psychologists spend a great deal of time dealing with
the behavioral consequences of low birthweight, and trying to find ways of
lowering the rate, and with addictions such as smoking.
3.27 White females have a longer life expectancy than black females, but the difference
has shrunk considerably since 1920, though recent changes have been modest.
Chapter 4-Measures of Central Tendency
4.1 Mode = 72; Median = 72; Mean = 70.18
4.3 Even without reading the passage, students are still getting about twice as many
items correct as they would by chance. This suggests that the test, while testing reading
comprehension, is also testing something else. I am not surprised at these results because
most students can guess at better than chance levels.
I think it is worth pointing out that these data suggest that the test
measures something other than reading comprehension. Most students just
say “they were able to guess intelligently,” without realizing that this
means that the test is somehow measuring guessing ability. This will
become more obvious when we talk about correlation in Chapter 9. Any
positively skewed distribution will have a mean greater than the median.
4.5 The mean falls above the median.
4.7 Rats running a straight alley maze:
∑X = 320; X̄ = ∑X/N = 320/15 = 21.33
Median = 21
4.9 Multiplying by a constant (5):
Original data: 8 3 5 5 6 2; Mean = 4.833, Mode = 5, Median = 5
Revised data: 40 15 25 25 30 10; Mean = 24.17 = 5 × 4.833, Mode = 25, Median = 25
4.11 Measures of central tendency for ADDSC and GPA:
ADDSC: Mode = 50; Median = 50; Mean = 4629/88 = 52.6
GPA: Mode = 3.00; Median = 2.635; Mean = 216.15/88 = 2.46
4.13 The means are very nearly the same for the two conditions.
4.15 The only measure that is acceptable for nominal data is the mode, because the mode
is the only one that does not depend on the relationships among the points on the scale.
4.17 Class attendance:
Regular Attendees: Mean = 276.42; Median = 276
Poor Attendees: Mean = 248.33; Median = 256
The two groups were 20 points apart in terms of the medians, and about 28 points
apart in terms of means. Clearly, those students who come to class do better.
Because this is not a true experiment (we don’t assign subjects to groups
at random), we don’t know exactly what it means. I would like to think
that students did poorly because they didn’t hear my brilliant
presentations, but it could also be that poorer students in general are less
likely to come to class. This is an issue of confounding, and it is a good
example for making the preference for random assignment apparent in a
situation with which most students can identify.
4.19 This is an Internet activity in which there is no fixed answer.
4.21 a) mean = 46.57; 10% trimmed mean = 46.67.
b) mean = 28.4; 10% trimmed mean = 25.0
c) Trimming was more effective in the second example because the second
distribution was quite positively skewed.
4.23 The Male Optimists had a mean of 1.016, while the Male Pessimists had a mean of
0.945. This difference is very reliable.
Chapter 5-Measures of Variability
5.1 Variability of NoPassage group:
Range = 57 – 34 = 23
St. Dev. = 6.83
Variance = 46.62
5.3 The variability of the NoPassage group is much smaller than the variability of the
Passage group. If this difference turns out to be reliable, it could possibly be explained by
the fact that the questions for the Passage group are asking for more than guessing and
test-taking skills, and there may be greater variability due to variability in knowledge. On
the other hand, it is not uncommon to find one standard deviation equal to two to three
times another in small samples.
5.5 Percentages within two standard deviations in Exercise 5.2:
s = 10.61
X̄ ± 2s = 70.18 ± 2(10.61) = 70.18 ± 21.22 = 48.96 to 91.40
16 scores (or 94%) lie within 2 standard deviations of the mean.
5.7 Multiplying or dividing by a constant:
Original: 2   3   4   4   5    5    9       X̄1 = 4.57; s1 = 2.23
X × 2:    4   6   8   8   10   10   18      X̄2 = 9.14; s2 = 4.45
X / 2:    1   1.5  2   2   2.5  2.5  4.5    X̄3 = 2.29; s3 = 1.11
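These results follow the general rule that multiplying (or dividing) every score by a constant multiplies the mean by that constant and the standard deviation by its absolute value. A quick sketch with the same data:

```python
import statistics

original = [2, 3, 4, 4, 5, 5, 9]
doubled = [x * 2 for x in original]
halved = [x / 2 for x in original]

# Mean and standard deviation of each version of the data
for data in (original, doubled, halved):
    print(round(statistics.mean(data), 2), round(statistics.stdev(data), 2))
# 4.57 2.23
# 9.14 4.45
# 2.29 1.11
```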
5.9 Convert revised data to mean = 0:
Since adding or subtracting a constant will not change the standard deviation, but
will change the mean, I can subtract 3.27 from every score for X2 in Exercise 5.8,
making the mean = 0 and keeping s2 = 1.0. The new values are
X3: -0.889  0.539  -1.842  0.539  -0.413  1.016  1.016    X̄3 = 0; s3 = 1
5.11 Boxplot for Exercise 5.1:
Median location = (N + 1)/2 = 29/2 = 14.5
Median = 46
Hinge location = (median location + 1)/2 = 15/2 = 7.5
Hinges = 43 and 52
H-spread = 52 − 43 = 9
Inner fences = hinges ± 1.5 × H-spread = hinges ± 13.5 = 29.5 and 65.5
Adjacent values = 34 and 57
[Boxplot on an axis running from roughly 30 to 60]
5.13 Boxplot for ADDSC:
Median location = (N + 1)/2 = 89/2 = 44.5
Median = 50
Hinge location = (median location + 1)/2 = 45/2 = 22.5
Hinges = 44.5 and 60.5
H-spread = 60.5 − 44.5 = 16
Inner fences = hinges ± 1.5 × H-spread = hinges ± 24 = 20.5 and 84.5
Adjacent values = 26 and 78
[Boxplot on an axis running from roughly 30 to 90]
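The hinge and fence arithmetic used in Exercises 5.11 and 5.13 is easy to script. A sketch following the text's location rules (the function names are my own):

```python
import math

def locations(n):
    """Median location (N + 1)/2, then hinge location (floor(median loc) + 1)/2."""
    median_loc = (n + 1) / 2
    hinge_loc = (math.floor(median_loc) + 1) / 2
    return median_loc, hinge_loc

def inner_fences(lower_hinge, upper_hinge):
    """Hinges minus/plus 1.5 times the H-spread."""
    h_spread = upper_hinge - lower_hinge
    return lower_hinge - 1.5 * h_spread, upper_hinge + 1.5 * h_spread

print(locations(28))             # (14.5, 7.5)   -> Exercise 5.11
print(inner_fences(43, 52))      # (29.5, 65.5)
print(locations(88))             # (44.5, 22.5)  -> Exercise 5.13
print(inner_fences(44.5, 60.5))  # (20.5, 84.5)
```

This also provides a check on the hand arithmetic: the upper inner fence for the ADDSC data works out to 60.5 + 1.5 × 16 = 84.5.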
5.15 Variance when you add a score equal to the mean:
s² = ∑(X − X̄)²/(N − 1) = 46.6257
∑(X − X̄)² = (N − 1)s² = 27(46.6257) = 1258.8933
Adding a score equal to the mean will not change the sum of squared deviations,
but it will increase the denominator to 28.
s²new = ∑(X − X̄)²/(N − 1) = 1258.8933/28 = 44.96
Note that the new variance is (1 − 1/N) times the old variance.
The point that I was trying to make here is that adding scores that don’t
deviate from the mean actually decreases the variance, because they
decrease the average deviation from the mean.
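The (1 − 1/N) relationship can be verified numerically. A sketch with simulated scores (the actual Exercise 5.1 data are in the text, so random values stand in here):

```python
import random
import statistics

# Appending one score equal to the mean leaves the sum of squared deviations
# unchanged but raises the denominator from N - 1 to N, so the variance is
# multiplied by (N - 1)/N = 1 - 1/N.
random.seed(1)
scores = [random.gauss(50, 10) for _ in range(28)]

old_var = statistics.variance(scores)
new_var = statistics.variance(scores + [statistics.mean(scores)])

print(abs(new_var - old_var * (1 - 1 / 28)) < 1e-9)  # True
```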
5.17 Angle of rotation:
5.19 The following is a cut-and-paste from the JMP help screen. (I don’t expect
students to make all of these distinctions from what they are given, because many of the
lines overlap.)
5.21 Treatment of anorexia:
I would hypothesize that the two treatment groups would show more of a weight gain
than the control group, but I have no reason to predict which treatment group would do
better. I would assume that the variability would be about the same within each group.
Complete (Before and After) data for the three groups—from which difference
scores were derived:
            Cognitive      Family
            Behavioral     Therapy     Control
Mean           3.01          7.26       -.45
Median         1.40          9.00       -.35
St. Dev.       7.31          7.16       7.99
[Histograms of weight change for each group: Cognitive Behavior Therapy (Mean = 3.0,
Std. Dev = 7.31, N = 29), Family Therapy (Mean = 7.3, Std. Dev = 7.16, N = 17), and
Control (Mean = -.4, Std. Dev = 7.99, N = 26)]
If we look at the weight gain or loss, it would appear that the Control group remained
stable, but the two treatment groups gained weight. The gain is greater for the Family
Therapy group.
5.23 The descriptive statistics from SPSS are given below. The variable labels should be
clear.
Notice that the Winsorized variance is considerably greater than the trimmed variance, as
it should be. However, it is lower than the variance of the original data, reflecting the fact
that the extreme values have been replaced. Cognitive behavior scores were positively
skewed, with several quite high values and one or two low values, and trimming and
Winsorizing reduced the influence of those values. The trimmed mean is considerably
smaller than the original mean, but the Winsorized mean is only slightly smaller.
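For readers without SPSS, trimming and Winsorizing can be sketched directly. The data below are illustrative, not the anorexia scores:

```python
import statistics

def trim_and_winsorize(scores, prop=0.10):
    """Trim: drop the extreme prop of scores in each tail.
    Winsorize: replace them with the most extreme remaining scores."""
    s = sorted(scores)
    k = int(len(s) * prop)               # number cut from each tail
    trimmed = s[k:len(s) - k]
    winsorized = [trimmed[0]] * k + trimmed + [trimmed[-1]] * k
    return trimmed, winsorized

data = [2, 4, 5, 5, 6, 6, 7, 7, 8, 35]   # positively skewed: one extreme score
trimmed, winsorized = trim_and_winsorize(data)

print(statistics.variance(data))   # badly inflated by the outlier
# As in the text: trimmed variance < Winsorized variance < original variance
print(statistics.variance(trimmed) < statistics.variance(winsorized))  # True
print(statistics.variance(winsorized) < statistics.variance(data))     # True
```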
Chapter 6-The Normal Distribution
6.1 Distribution of original values:
[Histogram of the seven values: frequency vs. score/deviation/z]
For the first distribution the abscissa would take on the values:
1  2  3  4  5  6  7
For the second distribution the values would be:
-3  -2  -1  0  1  2  3
For the third distribution the values would be:
-1.90  -1.27  -0.63  0  0.63  1.27  1.90
In these calculations I used the parameters as given, rather than the statistics
calculated on the sample.
6.3 Psychology 1 exam grades:
z = (X − μ)/σ = (165 − 195)/30 = −1.0
z = (X − μ)/σ = (225 − 195)/30 = 1.0
a) The percentage between 165 and 225 is the percentage between z = -1.0 and z
= 1.0. This is twice the area between z = 0 and z = 1 = 2×0.3413 = .6826.
b) The percentage below 195 is just the percentage below z = 0 = .500.
c) The percentage below z = 1 is the percentage in the larger portion = .8413.
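These table lookups can be reproduced with Python's standard-library NormalDist; a quick check of all three answers:

```python
from statistics import NormalDist

exam = NormalDist(mu=195, sigma=30)

# a) proportion between 165 and 225 (between z = -1 and z = +1)
print(round(exam.cdf(225) - exam.cdf(165), 4))  # 0.6827
# b) proportion below the mean of 195
print(exam.cdf(195))                            # 0.5
# c) proportion below 225 (z = +1)
print(round(exam.cdf(225), 4))                  # 0.8413
```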
6.5 Guessing on the Psychology 1 exam:
a) We know the mean and standard deviation if the students guess; they are 75
and 7.5, respectively. We also know that a z score of 1.28 cuts off the upper 10%.
We simply need to convert z = 1.28 to a raw score.
1.28 = (X − 75)/7.5
X = 1.28 × 7.5 + 75 = 9.6 + 75 = 84.6
b) For the top 25% of the students the logic is the same except that z = 0.675.
0.675 = (X − 75)/7.5
X = 0.675 × 7.5 + 75 = 5.0625 + 75 = 80.0625
c) For the bottom 5% the cutoff will be z = −1.645.
−1.645 = (X − 75)/7.5
X = 75 − 1.645 × 7.5 = 75 − 12.34 = 62.66
d) I would conclude that students were not just guessing, and could make use of
test-taking skills that they had acquired over the years.
There is a difference between Exercises 6.3 and 6.4 on the one hand, and
6.5 on the other. In the first two we are talking about performance on the
test if students take it normally. There the mean is 195. In Exercise 6.5 we
are talking about performance if the students just guessed purely at
random without seeing the questions, but only the answers. Here the mean
is 75, with a standard deviation of 7.5. These parameters are given by the
binomial distribution with N = 300, p = .25, and q = .75, though the
students would certainly not be expected to know this.
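Working backward from a percentile to a raw score is an inverse-normal problem. A sketch using NormalDist with the guessing parameters (μ = 75, σ = 7.5); exact z values are used here, so the cutoffs may differ slightly in the last digit from answers based on rounded table values:

```python
from statistics import NormalDist

guessing = NormalDist(mu=75, sigma=7.5)

print(round(guessing.inv_cdf(0.90), 2))  # cutoff for the top 10%, about 84.61
print(round(guessing.inv_cdf(0.75), 2))  # top 25%, about 80.06
print(round(guessing.inv_cdf(0.05), 2))  # bottom 5%, about 62.66
```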
6.7 Reading scores for fourth and ninth grade children:
a) [Overlapping normal curves for the fourth-grade (mean 25, SD 5) and ninth-grade
reading-score distributions]
b) To do better than the average 9th grade student, the 4th grader would have to
have a score of 30 or higher.
z = (X − μ)/σ = (30 − 25)/5 = 1.00
The probability that a fourth grader would exceed a score of 30 is the probability
of a z greater than 1.00 = .1587.
c) The probability of a 9th grader doing worse than the average 4th grader is the
probability of a 9th grader getting a score below 25, which is the probability of
being more than half a standard deviation below the mean, which is .3085.
6.9 Diagnostically meaningful cutoffs:
1.28 = (X − μ)/σ = (X − 50)/10
X = 50 + 1.28 × 10 = 62.8
A T score of 62.8 is the score that cuts off the top 10% of the distribution, and is
therefore a diagnostically meaningful cutoff.
6.11 Seat belt study:
a) [Normal curve for the distribution of reported seat belt usage]
b) We need the probability of getting a 62 if the mean is 44 with a standard
deviation of 7.
z = (62 − 44)/7 = 2.57
The probability of z > 2.57 = .0051. This is such a small probability that we will
probably conclude that the student just made up the data, rather than collecting
them honestly.
6.13 Distribution of correct responses
a) Distribution
b) The easiest way to find the cutoff for the lowest 10% is to simply take the
sample data and count them, empirically finding the point with 10% of the scores
below it.
6.15 Reaction time data:
0.675 = (X − 1.626)/0.6373
X = 0.675 × 0.6373 + 1.626 = 2.06
For a normal distribution we would expect 75% of the scores to be equal to or less
than 2.06 seconds. In our data the 75th percentile is 1.88 seconds.
6.17 Identifying the highest 2% of Behavior Problem scores:
The upper 2% is cut off by z = 2.05.
2.05 = (X − 50)/10
X = 2.05 × 10 + 50 = 70.5
The critical cutoff is a score of 70.5.
6.19 The statisticians were upset because, by defining “overweight” as weighing more
than 95% of peers (i.e. above the 95th percentile), the article seemed to be suggesting that
there were 22% of children in the top 5%. Moreover, the article says that in 1986 only
8% of children were in the top 15%. That is just silly—it is analogous to “all of the
children are above average.” I assume that they meant to say that 22% (etc.) were above
what the 95th percentile was some years ago, but that is a different thing. Even if that is
the case, the results still look too extreme to be likely.
6.21 Histogram of combined data on emotional stability
[Histogram of the combined data (N = 10000): Mean = 100.0, Std. Dev = 13.44, with a
superimposed normal curve]
Notice that we have combined two normal distributions with the same mean, but the
resulting distribution is not normal, as can be seen by comparing it to the superimposed
normal curve. If the means were very different the distribution would become bimodal.
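The same effect is easy to simulate; the σ values below are illustrative stand-ins for the two scales, not the exercise's actual parameters:

```python
import random
import statistics

random.seed(42)

# Two normal distributions with the same mean but different spreads
narrow = [random.gauss(100, 5) for _ in range(5000)]
wide = [random.gauss(100, 18) for _ in range(5000)]
combined = narrow + wide

m = statistics.mean(combined)
sd = statistics.stdev(combined)

# A normal distribution has about 38.3% of its area within 0.5 SD of the mean;
# the mixture is more peaked in the center, so its proportion is clearly higher.
inside = sum(abs(x - m) < 0.5 * sd for x in combined) / len(combined)
print(round(m), round(sd, 1), inside > 0.45)
```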
Chapter 7-Basic Concepts of Probability
7.1 Views of probability:
a) Analytic—If two tennis players are exactly equally skillful so that the outcome
of their match is random, the probability is .50 that Player A will win the
upcoming match.
b) Relative Frequency—If in past matches Player A has beaten Player B on 13 of
the 17 occasions they have played, then, unless something has changed, Player
A has a probability of 13/17 = .76 of winning their upcoming match.
c) Subjective—Player A’s coach feels that she has a probability of .90 of
winning her upcoming match with Player B.
7.3 More raffle tickets:
a) The probability of winning second prize given that you did not win first is 1/999
= .001.
b) The probability that mom comes in first and you are second = 1/1000 * 1/999
= .000001.
c) The probability of you first and mom second = 1/1000 * 1/999 = .000001
d) The probability that the two of you will take the top two prizes is .000001 +
.000001 = .000002.
7.5 Part a) of Exercise 7.3 dealt with conditional probabilities.
7.7 What is the probability that you will feel better about your life given that you seek
psychological counseling? The research hypothesis is that those who seek help when they
need it feel better about life than those who refuse to seek help.
7.9 The mother and child are both sleeping for 11 hours, so the probabilities must be
based on the remaining 13 hours.
p(mom looking) = 2/13 = .154; p(baby looking) = 3/13 = .231; p(both looking) =
.154*.231 = .036.
7.11 We would expect 3.33 percent of the fliers to end up in the trash if the message and
the behavior were independent. In fact, Geller et al. found 4.5 percent of those fliers in
the trash. This may look like a very small difference, but given the number of fliers that
were handed out, it is a reliable one. It would appear that having a message on a flier
increases its probability of being disposed of properly.
7.13 A continuous variable that is routinely treated as if it were discrete is children’s
learning abilities, where placement in classes often assumes that the child falls within one
category or another.
7.15 If we assume that we know nothing about the applicant, the probability of their
being admitted is the probability that they fall above the 80th percentile (which equals
.20) times the probability that they will be admitted if they do, which is 10/100 = .10. The
probability is .20*.10 = .02. Alternatively, we know that 10 out of 500 are admitted, so
we could take the probability as being 10/500 = .02, which is the same thing.
7.17 ADDSC N = 88 X = 52.6 s = 12.42
[calculated from data set]
50  52.6
z
 0.21
12.42
The probability associated with z = -.21 is .5832.
7.19 Dropouts with ADDSC > 60:
p(dropout|ADDSC > 60) = 7/25 = .28
7.21 Conditional and unconditional probability of dropping out:
p(dropout) = 10/88 = .11
p(dropout|ADDSC > 60) = .28
Students are much more likely to drop out of school if they scored at or above
ADDSC = 60 in elementary school.
7.23 If there is no discrimination in housing, then a person’s race and whether or not
they are offered a particular unit of housing are independent events. We could calculate
the probability that a particular unit (or a unit in a particular section of the city) will be
offered to anyone in a specific income group. We can also calculate the probability that
the customer is a member of an ethnic minority. We can then calculate the probability of
that person being shown the unit assuming independence and compare that answer
against the actual proportion of times a member of an ethnic minority was offered such a
unit.
7.25 The data again would appear to show that the U.S. Attorneys are more likely to
request the death penalty when the victim was White than when the victim was
Nonwhite. (This finding is statistically significant, though we won’t address that
question until Chapter 19.)
7.27 In this situation we begin with the hypothesis that African Americans are fairly
represented in the population. If so, we would expect 0.43% of the pool of 2124 people
from which juries are drawn to be African American. That comes out to an expectation
of 9.13 people. But the pool actually had only 4 African Americans. We would not expect
exactly 9 people—we might have 7 or 8. But 4 sounds awfully small. That is such an
unlikely event if the pool is fair that we would probably conclude that the pool is not a
fair representation of the population of Vermont. An important point here is that this is a
conditional probability. If the pool is fair the probability of this event is only .05—an
unlikely result.
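The conditional probability quoted above can be checked directly with the binomial distribution, assuming jurors are drawn independently with p = .0043; this sketch uses only the standard library and is my addition, not part of the text.

```python
# Probability of 4 or fewer African Americans in a fair pool of 2124 people,
# assuming independent draws with p = .0043 (0.43%).
from math import comb

n, p = 2124, 0.0043
expected = n * p                                  # about 9.13 people
p_at_most_4 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5))
print(round(expected, 2), round(p_at_most_4, 3))
```

The result is approximately .05, matching the probability cited in the answer.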
Chapter 8-Hypothesis Testing
8.1 Last night’s hockey game:
a) Null hypothesis: The game was actually an NHL hockey game.
b) On the basis of that null hypothesis I expected that each team would earn
somewhere between 0 and 6 points. I then looked at the actual points and
concluded that they were way out of line with what I would expect if this were
an NHL hockey game. I therefore rejected the null hypothesis. Notice that I
haven’t drawn a conclusion about what type of game it actually was, because
that is not what I set out to test.
8.3 A Type I error would be concluding that I was shortchanged when in fact I was not.
8.5 The rejection region is the set of outcomes for which we would reject the null
hypothesis. The critical value would be the minimum amount of change below which I
would reject the null. It is the border of the rejection region.
8.7 For the Mode test I would draw a very large number of samples and calculate the
mode, range, and their ratio (M). I would then plot the resulting values of M.
8.9 Guessing the height of the chapel.
a) The null hypothesis is that the average of two guesses is as accurate as one
guess. The alternative hypothesis is that the average guess is more accurate than
the single guess.
b) A Type I error would be to reject the null hypothesis when the two kinds of
guesses are equally accurate. A Type II error would be failing to reject the null
hypothesis when the average guess is better than the single guess.
c) I would be tempted to use a one-tailed test simply because it is hard to imagine
that the average guess would be less accurate, on average, than the single guess.
8.11 A sampling distribution is just a special case of a general distribution in which the
thing that we are plotting is a statistic which is the result of repeated sampling.
8.13 Magen et al (2008) study
a) The null hypothesis is that the phrasing of the question will not affect the
outcome—the means of the two groups are equal in the population. The
alternative hypothesis is that the mean will depend on which condition the person
is in.
b) I would compare the two group means.
c) If the difference is significant I would conclude that the phrasing of the choice
makes a real difference in the outcome.
8.15 Rerunning Exercise 8.14 for α = .01:
We first have to find the cutoff for α = .01 under a normal distribution. The
critical value of z = −2.33 (one-tailed), which corresponds to a raw score of 42.69
(from a population with μ = 59 and σ = 7).
We then find where 42.69 lies relative to the distribution under H1:

z = (X − μ)/σ = (42.69 − 50)/7 = −1.04

From the appendix we find that 85.08% of the scores fall above this cutoff.
Therefore β = .851.
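The β calculation above can be reproduced numerically: find the α = .01 cutoff under H0, then ask how much of the H1 distribution falls above it. The code (standard library only) is my addition, not part of the text.

```python
# Beta for Exercise 8.15: cutoff under H0 (mu = 59, sigma = 7), then the
# proportion of the H1 distribution (mu = 50, sigma = 7) above that cutoff.
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

cutoff = 59 - 2.33 * 7          # one-tailed z = -2.33  ->  42.69
z_h1 = (cutoff - 50) / 7        # location of the cutoff under H1, about -1.04
beta = 1 - norm_cdf(z_h1)       # proportion of H1 falling above the cutoff
print(round(cutoff, 2), round(z_h1, 2), round(beta, 3))
```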
8.17 To determine whether there is a true relationship between grades and course
evaluations I would find a statistic that reflected the degree of relationship between two
variables. (The students will see such a statistic (r) in the next chapter.) I would then
calculate the sampling distribution of that statistic in a situation in which there is no
relationship between two variables. Finally, I would calculate the statistic for a
representative set of students and classes and compare my sample value with the
sampling distribution of that statistic.
8.19 Allowances for fourth-grade students:
a) The null hypothesis in this case would be the hypothesis that boys and girls
receive the same allowance on average.
b) I would use a two-tailed test because I want to reject the null whenever there
is a difference in favor of one gender over the other.
c) I would reject the null whenever the obtained difference between the average
allowances was greater than I would be led to expect if they were paid the
same in the population.
d) I would increase the sample size and get something other than a self-report of
allowances.
8.21 Hypothesis testing and the judicial system
The judicial system operates in ways similar to our standard logic of hypothesis testing.
However, in a court we are particularly concerned with the danger of convicting an
innocent person. In a trial the null hypothesis is equivalent to the assumption that the
accused person is innocent. We set a very small probability of a Type I error, which is far
smaller than we normally do in an experiment. Presumably the jury tries to set that
probability as close to 0 as they reasonably can. By setting the probability of a Type I
error so low, they knowingly allow the probability of a Type II error (releasing a guilty
person) to rise, because that is thought to be the lesser evil.
Chapter 9-Correlation
9.1 Low birthweight statistics:
The two outliers would appear to have a distorting effect on the correlation
coefficient. However, if you replot the data without those points the relationship is
still apparent and the correlation only drops to -.54.
9.3 With 24 degrees of freedom, a two-tailed test at α = .05 would require |r| ≥ .388.
9.5 We can conclude that infant mortality is closely tied to both income and the
availability of contraception. Infants born to people living in poverty are much more
likely to die before their first birthday, and the availability of contraception significantly
reduces the number of infants put at risk in the first place.
9.7 Because both income and contraception are related to mortality, we might expect that
using them together would lead to a substantial increase in predictability. But note that
they are correlated with each other, and therefore share some of the same variance.
9.9 Psychologists have a professional interest in infant mortality because some of the
variables that contribute to infant mortality are behavioral ones, and we care about
understanding, and often controlling, behavior. Psychologists have an important role to
play in world health that has little to do with pills and irrigation systems.
There is a great deal of data available on these issues, and you can easily
find it on the Internet. If you are interested in this question, you might also
be interested in searching for similar literature on HIV/AIDS.
This question was partly intended to make students think about the
fact that all sorts of things are of interest to psychologists. We don’t just
run animals in a maze or inquire into people’s dirty minds. In addition,
low birthweight is a risk factor for all sorts of infant outcomes.
9.11 The relationship is extremely curvilinear, even though the linear correlation is quite
high. You can see that the best fitting line misses almost all of the data points at each end
of the distribution.
9.13 The relationship between test scores in Katz’ study and SAT scores for application
purposes is a relevant question because we would not be satisfied with a set of data that
used SAT questions and yet gave answers that were not in line with SAT performance.
We want to know that the tests are measuring at least roughly the same thing. In addition,
by knowing the correlation between SATs and performance without seeing the questions,
we get a better understanding of some of what the SAT is measuring.
9.15 Correlation for the data in Exercise 9.14:
SAT: mean = 598.57, ∑X = 16760, St. Dev. = 61.57
Test: mean = 46.21, ∑Y = 1294, St. Dev. = 6.73

cov_YX = (∑XY − (∑X∑Y)/N)/(N − 1) = (780500 − (16760 × 1294)/28)/27 = 220.3175

r = cov_YX/(s_Y × s_X) = 220.3175/(61.57 × 6.73) = .53
With 26 df we would need a correlation of .374 to be significant. Since our value exceeds
that, we can conclude that the relationship between test scores and the SAT is reliably
different from 0.
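The covariance and correlation above can be recomputed from the same summary quantities in a few lines of Python; this is my addition as a check, not part of the text.

```python
# Recomputing cov and r for Exercise 9.15 from the summary statistics.
sum_x, sum_y, sum_xy, n = 16760, 1294, 780500, 28
s_x, s_y = 61.57, 6.73

cov = (sum_xy - sum_x * sum_y / n) / (n - 1)   # about 220.3175
r = cov / (s_x * s_y)                          # about .53
print(round(cov, 4), round(r, 2))
```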
9.17 When we say that two correlations are not significantly different, we mean that they
are sufficiently close that they could both have come from samples from populations with
exactly the same population correlation coefficient.
9.19 The answer to this question depends on the students’ expectations.
9.21 It is sometimes appropriate to find the correlation between two variables even if you
know that the relationship is slightly curvilinear. A straight line often does a remarkably
good job of fitting a curved function, provided that it is not too curved.
9.23 The amount of money that a country spends on health care may have little to do
with life expectancy because to change a country’s life expectancy you have to change
the health of a great many individuals. Spending a great deal of money on one person,
even if it were to extend her life by dozens of years, would not change the average life
expectancy in any noticeable way. Often the things that make a major change in life
expectancy, like inoculations, really cost very little money.
The African Red Cross estimates that there are 300-500 million cases of
malaria each year, resulting in 1.5 to 2.5 million deaths. In particular,
more than 90% of the deaths are in children under 5 years of age, and they
occur predominantly in sub-Saharan Africa. Malaria cases could be cut by
up to a third with insecticide treated bednets, which are very cheap by U.S.
healthcare standards.
9.25 Extremely exaggerated data on male and female weight and height to show a
negative slope within gender but a positive slope across gender:
Height:  68    72    66    69    70    66    60    64    65    63
Weight: 185   175   190   180   180   135   155   145   140   150
Gender: Male  Male  Male  Male  Male  Fem.  Fem.  Fem.  Fem.  Fem.
[Scatterplot of Weight (130–200) against Height (55–73) for all ten cases, showing the overall positive trend.]
What we are effectively plotting here is the relationship between Gender
and Weight, more than between Height and Weight.
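The point of the exaggerated data can be verified numerically: the correlation is negative within each gender but positive when the two groups are pooled. This small sketch (my addition, standard library only) uses a hand-rolled Pearson r.

```python
# Within-gender vs. pooled correlations for the Exercise 9.25 data.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

ht_m, wt_m = [68, 72, 66, 69, 70], [185, 175, 190, 180, 180]
ht_f, wt_f = [66, 60, 64, 65, 63], [135, 155, 145, 140, 150]

r_male = pearson_r(ht_m, wt_m)                   # negative within males
r_female = pearson_r(ht_f, wt_f)                 # negative within females
r_pooled = pearson_r(ht_m + ht_f, wt_m + wt_f)   # positive when pooled
print(round(r_male, 2), round(r_female, 2), round(r_pooled, 2))
```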
9.27 We have confounding effects here. If we want to claim that red wine consumption
lowers the incidence of heart disease, we have a problem because the consumption of red
wine is highest in those areas with the greatest solar radiation, which is another potential
cause of the effect. We would have to look at the relationship between red wine and heart
disease controlling for the effects of solar radiation.
9.29 This is an Internet search with no fixed answer.
Chapter 10-Regression
10.1 Regression equation predicting low birthweight from high-risk fertility.
Y = Low Birthweight Percentage; X = High-risk Fertility
Ȳ = 6.70, s_Y = 0.698, s_Y² = 0.487
X̄ = 46.00, s_X = 6.289, s_X² = 39.553
cov_XY = 2.7245

b = cov_XY/s_X² = 2.7245/39.553 = 0.069

a = Ȳ − bX̄ = 6.70 − (0.069)(46.00) = 3.53

Ŷ = 0.069(X) + 3.53
10.3 If the high-risk fertility rate jumped to 70, we would predict that the incidence of
birthweight < 2500 g would go to 8.35.

Ŷ = bX + a = 0.0689X + 3.53 = 0.0689 × 70 + 3.53 = 8.35
This assumes that there is a causal relationship, which is plausible in some ways,
but not proven.
It may be trivial to point this out, but here we have a real world situation
where we can say something about changing trends in society and their
possible effects.
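The coefficients from 10.1 and the prediction from 10.3 can be checked together from the summary statistics; this short sketch is my addition, not part of the text.

```python
# Slope and intercept for Exercise 10.1, then the Exercise 10.3 prediction
# for a high-risk fertility rate of 70.
cov_xy = 2.7245
var_x = 39.553            # s_X squared
mean_x, mean_y = 46.00, 6.70

b = cov_xy / var_x        # slope, about 0.069
a = mean_y - b * mean_x   # intercept, about 3.53
y_hat_70 = b * 70 + a     # predicted % low birthweight at X = 70
print(round(b, 4), round(a, 2), round(y_hat_70, 2))
```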
10.5 I would be more comfortable speaking about the effects on Senegal because it is
already at approximately the mean income level and we are not extrapolating for an
extreme country.
This may have little to do with a statistics course in psychology, but there have
been some noticeable improvements in infant mortality in Senegal, and one
device that has made a difference is a warm table on which newborn infants can
be placed. This may interest students who probably think of advances in medicine
in terms of MRIs. http://www.usaid.gov/stories/senegal/pc_sn_infant.html
10.7 Prediction of Symptoms score for a Stress score of 45:
Regression equation: Ŷ = 0.7831X + 73.891
If X = 45: predicted Symptoms = 0.7831 × 45 + 73.891 = 109.13
10.9 Subtracting 10 points from every X or Y score would not change the correlation in
the slightest. The relationship between X and Y would remain the same.
10.11 Diagram to illustrate Exercise 10.10:
[Figure: two parallel regression lines plotted for X from 0 to 20, the upper line a constant 10 points above the lower.]
10.13 Adding a constant to Y:
[Figure: the original and shifted regression lines plotted for X from 0 to 20.]
a) From this figure you can see that adding 2.5 to Y simply raised the regression
line by 2.5 units.
b) The correlation would be unaffected.
10.15 Predicting GPA (Y) from ADDSC (X):

b = cov_XY/s_X² = −6.580/154.431 = −0.0426

a = Ȳ − bX̄ = 2.456 − (−0.0426)(52.602) = 4.699

Ŷ = −0.0426X + 4.699
When Hans Huessy and I first collected these data I was somewhat
disheartened by how well we were doing (and to some extent I still am).
We can take a measure in elementary school that is quickly filled out by
the teacher, and make an excellent prediction about how the student will
do in high school. That may be nice statistically, but I don’t think we like
to feel that children are that locked in.
10.17 The correlation dropped to -.478 when I added and subtracted .04 from each Y
value. This drop was caused by the addition of error variance.
One way to solve for the point at which they become equal is to plot a few predicted
values and draw regression lines. Where the lines cross is the point at which they are
equal. A more exact way is to set the two equations equal to each other and solve for X.
0.9X + 31 = 1.5X + 18

Collecting terms we get
31 − 18 = 1.5X − 0.9X
13 = 0.6X
X = 13/0.6 = 21.67

To check this, substitute 21.67 in both equations:
0.9 × 21.67 + 31 = 50.503 ≈ 1.5 × 21.67 + 18 = 50.505
10.19 Weight as a function of height for males:
[Scatterplot of Weight (100–220) against Height (60–76) for males.]
The regression solution that follows is a modification of printout from SPSS.
Equation Number 1    Dependent Variable..  WEIGHT
Variable(s) Entered on Step Number 1..    HEIGHT

Multiple R           .60368
R Square             .36443
Adjusted R Square    .35287
Standard Error     14.99167

Analysis of Variance
               DF    Sum of Squares    Mean Square
Regression      1      7087.79984      7087.79984
Residual       55     12361.25279       224.75005

F = 31.53637       Signif F = .0000

------------------ Variables in the Equation ------------------
Variable             B         SE B       Beta        T   Sig T
HEIGHT        4.355868      .775656    .603680    5.616   .0000
(Constant) -149.933617    54.916943              -2.730   .0085
b) The intercept is given as the “constant” and is -149.93, which has no
interpretable meaning with these data. The slope of 4.356 tells us that a
one-unit increase in height is associated with a 4.356-pound increase in weight.
c) The correlation is .60, telling us that for males 36% of the variability in
weight is associated with variability in height.
d) Both the correlation and the slope are significantly different from 0, as shown
by an F of 31.54 and an (equivalent) t of 5.616.
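The equivalence mentioned in part (d) is easy to verify: with a single predictor, the t for the slope squared equals the F for the regression. This one-line check is my addition, not part of the text.

```python
# With one predictor, t for the slope squared equals F for the regression.
t_slope = 5.616
f_reg = 31.53637
print(round(t_slope ** 2, 2))   # essentially equal to F, apart from rounding
```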
10.21 Predicting my own weight, for which I use the equation from Exercise 10.19:
Yˆ = 4.356*height - 149.93
Yˆ = 4.356*68 - 149.93 = 146.28
a) The residual is Y - Yˆ = 156 - 146.28 = 9.72. (I have gained some weight since
I last used this example.)
b) If the students who supplied the data gave biased responses, then, to the degree
that the data are biased, the coefficients are biased and the prediction will not
apply accurately to me.
10.23 Predictions for a 5’6” male and female
For the male, Yˆ = 4.356*66 - 149.93 = 137.57
For a female, Yˆ = 2.578*66 - 44.859 = 125.29
Difference = 12.28 pounds
10.25 Plot of Reaction Time against Trials for only the Yes/5-stimuli trials:
[Scatterplot of Reaction Time (0–100) against Trial (200–250) for the Yes/5-stimuli trials.]
The following regression solution is a modification of SPSS printout.
Equation Number 1    Dependent Variable..  RXTIME
Variable(s) Entered on Step Number 1..    TRIAL

Multiple R            .01640
R Square              .00027
Adjusted R Square    -.02056
Standard Error      12.76543

Analysis of Variance
               DF    Sum of Squares    Mean Square
Regression      1        2.10363         2.10363
Residual       48     7821.89637       162.95617

F = .01291         Signif F = .9100

------------------ Variables in the Equation ------------------
Variable             B         SE B       Beta        T   Sig T
TRIAL         -.014214      .125100   -.016397    -.114   .9100
(Constant)   67.805186    28.267795               2.399   .0204
The slope is only -0.014, and it is not remotely significant. For this set of data we can
conclude that there is not a linear trend for reaction times to change over time. From the
scatterplot above we can see no hint that there is any nonlinear pattern, either.
10.27 The evils of television:
285
Girls
Boys
NAEP Score
275
265
255
245
235
225
0
1
2
3
4
Hours Spent Watching TV
5
6
Regression equations:
Boys Yˆ = -4.821X + 283.61
Girls Yˆ = -3.460X + 268.39
b) The slopes are roughly equal, given the few data points we have, with a
slightly greater decrease with increased time for boys. The difference in
intercepts reflects the fact that the line for the girls is about 9 points below
that for boys.
c) Television cannot be used as an explanation for poorer scores in girls,
because we see that girls score below boys even when we control for
television viewing.
10.29 Draw a scattering of 10 data points and drop your pencil on it.
b) As you move the pencil vertically you are changing the intercept.
c) As you rotate the pencil you are changing the slope.
d) You can come up with a very good line simply by rotating and raising or
lowering your pencil so as to make the deviations from the lines as small as
possible. (We really minimize squared deviations, but I don’t expect anyone’s
eyes to be good enough to do that.)
10.31 Galton’s data
a) The correlation is .459 and the regression equation is Yˆ = .646×midparent +
23.942. (Remember to weight cases by “freq”.)
b) I reran the regression requesting that SPSS save the Unstandardized prediction
and residual.
c)
d) The children in the lowest quartile slightly exceed their parents’ mean
(67.12 vs 66.66) and those in the highest quartile average slightly shorter
than their parents (68.09 vs 68.31).
e) It is easiest if you force both axes to have the same range and specify
that the regression line is Yˆ = 1×X + 0. (If you prefer, you can use an
intercept of 0.22 to equate the means of the parents and children.)
Chapter 11-Multiple Regression
11.1 Predicting quality of life:
a) All other variables held constant, a difference of +1 degree in Temperature is
associated with a difference of -.01 in perceived Quality of Life. A difference
of $1000 in median income, again with all other variables held constant, is
associated with a +.05 difference in perceived Quality of Life. A similar
interpretation applies to b3 and b4. Since values of 0 cannot reasonably occur
for all predictors, the intercept has no meaningful interpretation.
b) Yˆ = 5.37 - .01(55) + .05(12) + .003(500) - .01(200) = 4.92
c) Yˆ = 5.37 - .01(55) + .05(12) + .003(100) - .01(200) = 3.72
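The two predictions above are straight arithmetic on the regression weights quoted in part (a). In this sketch (my addition, not part of the text) the function name and argument names temp, income, x3, and x4 are hypothetical labels for the four predictors.

```python
# Predictions for Exercise 11.1, parts (b) and (c), using the weights from (a).
# x3 and x4 stand in for the third and fourth predictors, whatever they are.
def quality_of_life(temp, income, x3, x4):
    return 5.37 - .01 * temp + .05 * income + .003 * x3 - .01 * x4

pred_b = quality_of_life(55, 12, 500, 200)   # part (b): 4.92
pred_c = quality_of_life(55, 12, 100, 200)   # part (c): 3.72
print(round(pred_b, 2), round(pred_c, 2))
```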
11.3 Religious Influence and religious Hope contribute significantly to the prediction,
but not religious Involvement.
It is worth pointing out here that even though religious Involvement does
not contribute significantly to the multiple regression, it does have a
significant simple correlation with Optimism. The matrix of correlations
(where N = 600) is
            OPTIMISM   RELINVOL     RELINF    RELHOPE
OPTIMISM      1.0000      .1667      .2725      .2663
              P= .      P= .000    P= .000    P= .000
RELINVOL       .1667     1.0000      .4487      .5439
             P= .000      P= .     P= .000    P= .000
RELINF         .2725      .4487     1.0000      .4187
             P= .000    P= .000      P= .     P= .000
RELHOPE        .2663      .5439      .4187     1.0000
             P= .000    P= .000    P= .000      P= .
11.5 I would have speculated that religious Involvement was not a significant predictor
because of its overlap with the other predictors, but the tolerances kick a hole in that
theory to some extent.
That’s what happens when you ask a question before you are sure of the
answer.
11.7 Adjusted R² for 15 cases in Exercise 11.6:

R²(0.1234) = .173

est. R² = 1 − (1 − R²)(N − 1)/(N − p − 1) = 1 − (1 − .173)(14)/(15 − 4 − 1) = 1 − 1.158 = −.158
Since a squared value cannot be negative, we will declare it undefined. This is all
the more reasonable in light of the fact that we cannot reject H0:R* = 0.
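The adjusted R² shrinkage formula above is a one-liner to check, and the check makes the point of the exercise concrete: with a small R² and four predictors on fifteen cases, the estimate goes negative. The code is my addition, not part of the text.

```python
# Adjusted R-squared for Exercise 11.7: small R-squared, 4 predictors, N = 15.
r2, n, p = .173, 15, 4
est_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(est_r2, 3))   # negative, hence "undefined" in the answer above
```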
11.9 The multiple correlation between the predictors and the percentage of births under
2500 grams is .855. The incidence of low birthweight increases when there are more
mothers under 17, when mothers have fewer than 12 years of education, and when
mothers are unmarried. All of the predictors are associated with young mothers. (As the
question noted, there are too few observations for a meaningful analysis of the variables
in question.)
11.11 The multiple correlation between Depression and the three predictor variables was
significant, with R = .49 [F(3,131) = 14.11, p = .0000]. Thus approximately 25% of the
variability in Depression can be accounted for by variability in these predictors. The
results show us that depression among students who have lost a parent through death is
positively associated with an elevated level of perceived vulnerability to future loss and
negatively associated with the level of social support. The age at which the student lost
his or her parent does not appear to play a role.
11.13 The fact that the frequency of the behavior was not a factor in reporting is an
interesting finding. My first thought would be that it is highly correlated with the
Offensiveness, and that Offensiveness is carrying the burden. But a look at the simple
correlation shows that the two variables are correlated at less than r = .20.
11.15 Using random variables as predictors:
I drew the following data directly from the random number tables in the appendix
(and I didn’t cheat).
Y:   5  2  3  6  9  2  6  3  9  8
X1:  3  1  5  4  1  7  9  7  3  5
X2:  7  6  2  1  0  6  2  3  3  6
X3:  2  0  9  8  2  7  8  0  7  5
X4:  7  9  1  7  9  1  8  4  9  6
X5:  5  5  2  9  4  7  1  9  4  4
The multiple correlation for these data is .739, which is astonishingly high. Fortunately,
the F test on the regression is not significant. Notice that we have only twice as many
subjects as predictors.
This question is bound to lead to the question of how many cases we need
per variable. There is no good answer to this question. Some will tell you
that there should be at least 10 cases per predictor. I know of no argument
in defense of such a rule. Harris (1985) has suggested a rule that says that
N should exceed the number of predictors by at least 50. Cohen (1988) has
argued from the point of view of power, and gives the example that a
population correlation coefficient of .30 would require a sample size of
187 to have power = .80. This latter is sobering, but it is not a good
argument here because we have not yet discussed power in any
meaningful way.
11.17 Predicting weight:
11.19 The weighted average is 3.68, which is very close to the regression coefficient for
Height when we control for Gender.
11.21 Sex is important to include in this relationship because women tend to be smaller
than men, and thus probably have smaller, though not less effective, brains, but we
probably don’t want that contamination in our data. However, note that Sex was not
significant in the previous answer, though the sample size (and hence power) is low.
11.23 I could argue that PctSAT is a nuisance variable because we are not particularly
interested in the variable itself, but only in controlling it to allow us to have a clearer
view of Expend, which is the variable in which we are interested. At the same time, it is
an important contributor to the prediction of Combined, but we are led away from
noticing that because of our predominant interest in Expend.
11.25 The scatterplot follows and shows that the squared correlation is .434, which is
just what we found from the regression solution.
Chapter 12—Hypothesis Tests Applied to
Means: One Sample
12.1 Distribution of 100 random digits:
[Histogram: frequency of each digit 0–9 among the 100 random digits.]
12.3 The mean and standard deviation of the sample are 4.1 and 2.82, respectively, which are
reasonably close to the parameters of the population from which the sample was drawn (4.5 and
2.6, respectively). The mean of the distribution of means is 4.28, which is somewhat closer to
the population mean, and the standard deviation is 1.22.
a) The Central Limit Theorem would predict a sampling distribution of the mean with a
mean of 4.5 and a standard deviation of 2.6/√5 = 1.16.
b) These values are close to the values that we would expect.
12.5 If you had drawn 50 samples of size 15, the mean of the sampling distribution should still
approximate the mean of the population, but the standard error of that distribution would now be
only 2.67/√15 = 0.689.
12.7 Why doesn’t the previous question address the issue of the terrible state of North Dakota’s
educational system? These students are certainly not a random sample of high school students in
North Dakota or elsewhere. Moreover, they scored above the mean of 500, which would
certainly not be expected if North Dakota’s system were inadequate. In addition, there is no
definition of what is meant by “a terrible state,” nor any idea of whether or not the SAT
measures such a concept.
12.9 Unlike the results in the two previous questions, this interval probably is a fair estimate of
the confidence interval for P/T ratio across the country. It is not itself biased by the bias in the
sampling of SAT scores.
12.11 Weight gain exercise:
For these data the mean weight gain was 3.01 pounds, with a standard deviation of 7.3 pounds.
This gives us
t = (X̄ − μ)/(s/√N) = (3.01 − 0)/(7.3/√29) = 3.01/1.357 = 2.22
With 28 df the critical value at α = .05, two-tailed, is 2.048, which will allow us to reject the null
hypothesis and conclude that the girls gained weight at better than chance levels in this
experiment.
There is an important movement within statistics right now in the direction
of laying a much heavier emphasis on confidence limits than on null
hypothesis tests. I think this is a very good example of a place where a
behavioral scientist might make good use of a confidence interval. I
didn’t ask you to calculate these limits, but they are 0.227 and 5.787.
You should think about what these limits mean and about why they are
useful.
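Both the t above and the confidence limits mentioned in the note can be reproduced from the three summary numbers; this sketch (my addition, not part of the text) takes the critical value 2.048 as given.

```python
# One-sample t and 95% confidence limits for the weight-gain data,
# using the critical value t(.05, 28 df) = 2.048 quoted in the answer.
from math import sqrt

mean_gain, sd, n = 3.01, 7.3, 29
se = sd / sqrt(n)                 # about 1.357
t_stat = (mean_gain - 0) / se     # about 2.22
lower = mean_gain - 2.048 * se
upper = mean_gain + 2.048 * se
print(round(t_stat, 2), round(lower, 2), round(upper, 2))
```

The limits land very close to the 0.227 and 5.787 quoted above; the small differences come from rounding the mean and standard deviation.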
12.13 Effect size measure for data in Exercise 12.11:
One effect size measure would simply be the mean weight gain of 3.01 pounds. That statistic has
real meaning to us, especially if we keep the size of a standard deviation in mind. A dubious
alternative method would be to calculate an estimate of d̂ using the standard deviation of the
gain scores as our base.
d̂ = X̄/s = 3.01/7.3 = 0.41
If I knew the standard deviation at baseline, that would make a good denominator. Unfortunately
that information is not available; 7.3 is the standard deviation of the weight gains, and it is
difficult to see how that creates a reasonable metric.
12.15 I needed to solve for t in Exercise 12.14 because I did not know the population variance.
12.17 Testing the null hypothesis that children under stress report lower levels of anxiety:
t = (X̄ − μ)/(s/√N) = (11.00 − 14.55)/(6.085/√36) = −3.55/1.014 = −3.50
With 35 df the critical value of t at α = .05 (two-tailed) is ±2.03. We can reject H0 and conclude
that children under stress show significantly lower levels of anxiety than normal children.
Here is another situation where the data say that children report lower
levels of anxiety, but it was necessary to first verify that their reports could
be relied upon.
12.19 Yes, the results in Exercise 12.18 are consistent with the t test in Exercise 12.17.
The t test showed that these children showed lower levels of anxiety than the normal
population, and the confidence interval did not include 14.55.
Chapter 13—Hypothesis Tests Applied to
Means: Two Related Samples
13.1 Sexual satisfaction of married couples. (Dependent variable = 1 for never fun and 4 for
always fun.)

Husband:    Mean = 2.725,  St. Dev. = 1.165
Wife:       Mean = 2.791,  St. Dev. = 1.080
Difference: Mean = −0.066, St. Dev. = 1.298, St. error diff = 0.136, N = 91

t = (D̄ − 0)/(s_D/√N) = (−0.066 − 0)/(1.298/√91) = −0.066/0.136 = −0.48
With 90 df the critical value of t is approximately ±1.98, so we cannot reject the null hypothesis.
We have no reason to conclude that wives are more or less satisfied, on the average, than their
husbands.
This is a matched-sample t because responses came from married couples. I would hope that
there is some relationship between the sexual satisfaction of one member of the couple and the
satisfaction of the other—but perhaps that is hoping for too much.
13.3 Scatterplot of data from Exercise 13.1:
(The frequencies of each combination are shown above the points.)
The correlation is .33, which is significant at α = .05
This analysis finally addresses the degree of compatibility between couples, rather than mean
differences. The correlation is significant, but it is not very large. That scatterplot is not very
informative because of the discreteness of the scale and hence the overlapping of points.
13.5 The most important thing about a t test is the assumption that the mean (or difference
between means) is normally distributed. Even though the individual values can only range over
the integers 1 – 4, the mean of 91 subjects can take on a large number of possible values between
1 and 4. It is a continuous variable for all practical purposes, and can exhibit substantial
variability.
I drew 10,000 random samples of N = 91, treating the Husband scores as a population. The
distribution of means follows.
13.7 We used a paired-t test for the data in Exercise 13.6 because the data were paired in the
sense of coming from the same subject. Some subjects generally showed more beta-endorphins
at any time than others, and we wanted to eliminate this subject-to-subject variability that has
nothing to do with stress. In fact, there isn’t much of a relationship between the two measures,
but we can’t fairly ignore it anyway. (Even though the correlation is not statistically significant, I
think that we would look foolish if we did not treat these as paired data.)
13.9 If you look at the actual numbers given in Exercise 13.6, you would generally be led to
expect that whatever was used to measure beta-endorphins was only accurate to the nearest half
unit. Fair enough, but then where did values of 5.8 and 4.7 come from? If we can tell the
difference to a tenth of a unit, why are most, but not all, of the scores reported to the nearest .5?
It’s a puzzle.
13.11 You would not want to use a repeated measures design in any situation where the first
measure will “tip off” or sensitize a subject to what comes next. Thus if you are going to show a
subject something and then ask him to recall it, the next time you show any item the subject will
expect to have to recall it. Similarly we should be careful about repeated measures in drug
studies because drugs often last surprisingly long in the body.
13.13 How many subjects do we need?
First of all, in Exercise 13.6 we had 19 subjects, giving us 18 df. This means that for a one-tailed
test at α = .01 we will need a t of at least 2.552 to be significant. So we can substitute everything
we know about the data except for the N, and solve for N.

t = (D̄ − 0)/(s_D/√N)
2.552 = 7.70/(13.52/√N)

Therefore

√N = (2.552 × 13.52)/7.70 = 4.481
N = 4.481² = 20.078 → 21 subjects
This exercise should be a good lead in to power, because you should be
able to see the logic of this without knowing a thing about power. But in
the chapter on power we are really doing the same thing but disguising it
behind a bunch of Greek symbols. (Well, perhaps that’s a bit unfair.)
Notice that we had to guess at N to get the critical value of t before we
could calculate the needed N. Using the sample size they had is a
reasonable approximation.
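The algebra above amounts to a three-line computation, which may make the logic easier to see; the code is my addition, not part of the text.

```python
# Solving for the required N in Exercise 13.13: substitute the known mean
# difference and standard deviation, solve for sqrt(N), and round up.
from math import ceil

t_crit, mean_d, sd_d = 2.552, 7.70, 13.52
sqrt_n = t_crit * sd_d / mean_d   # about 4.481
n = ceil(sqrt_n ** 2)             # 20.078 rounds up to 21 subjects
print(n)
```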
13.15 As the correlation between the two variables increases, the standard error of the difference
will decrease, and the resulting t will increase.
13.17 First guess versus average guess
Notice that this is the same t as we had in Exercise 13.12. This is because there is a perfect linear
relationship between first, second, and average guesses. (If you know the first guess and the
average, you can compute what the second guess must have been.)
13.19 If I had subtracted the Before scores from the After scores I would simply change the sign
of the mean and the sign of the t. There would be no other effect.
13.21 There is no answer I can give for this question because it asks the students to design a
study.
Chapter 14—Hypothesis Tests Applied to
Means: Two Independent Samples
14.1 Reanalysis of Exercise 13.1 as if the observations were independent:
    Males              Females
    Mean = 2.725       Mean = 2.791
    s = 1.165          s = 1.080
    N_M = 91           N_F = 91

    t = (X̄1 - X̄2) / √(s1²/N1 + s2²/N2)
      = (2.725 - 2.791) / √(1.165²/91 + 1.080²/91)
      = -0.066 / √.028 = -0.066/0.166 = -0.40
[t.05(180) = ±1.98] Do not reject the null hypothesis.
We can conclude that we have no reason to doubt the hypothesis that males and females
are equal with respect to sexual satisfaction.
There was no need to pool the variances here because the sample sizes
were equal. If we did pool them, the pooled variance would have been
1.262.
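If SciPy is available, the same t can be computed directly from the summary statistics, with no raw data needed. This sketch is mine, not part of the original solution:

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from Exercise 14.1
t_stat, p_value = ttest_ind_from_stats(mean1=2.725, std1=1.165, nobs1=91,
                                       mean2=2.791, std2=1.080, nobs2=91,
                                       equal_var=True)  # pooling is harmless with equal n
print(round(t_stat, 2), round(p_value, 3))
```

The sign of t depends only on which group is entered first; its magnitude matches the hand calculation of 0.40.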
14.3 The difference between the t in Exercises 13.1 and 14.1 is small because the
relationship between the two variables was so small.
14.5 Random assignment plays the role of assuring (as much as is possible) that there
were no systematic differences between the subjects assigned to the two groups. Without
random assignment it might be possible that those who signed up for the family therapy
condition were more motivated, or had more serious problems, than those in the control
group.
14.7 You cannot use random assignment to homophobic categories in a study like the
homophobia study, because group membership is a property of the participants themselves.
The lack of random assignment here will not invalidate the findings.
14.9 In Exercise 14.8 it could well have been that there was much less variability in the
schizophrenic group than in the normal group because the number of TATs showing
positive parent-child relationship could have had a floor effect at 0.0. This did not
happen, but it is important to check for it anyway.
14.11 Experimenter bias effect:
    Expect Good          Expect Poor
    Mean = 18.778        Mean = 17.625
    s = 3.930            s = 4.173
    N = 9                N = 8

    s²p = [(N1 - 1)s1² + (N2 - 1)s2²] / (N1 + N2 - 2)
        = [8(15.44) + 7(17.41)] / (9 + 8 - 2) = 16.359

    t = (X̄1 - X̄2) / √(s²p/N1 + s²p/N2)
      = (18.778 - 17.625) / √(16.359/9 + 16.359/8)
      = 1.153 / √3.863 = 1.153/1.965 = 0.587
[t.05(15) = ±2.131] Do not reject the null hypothesis.
We cannot conclude that our data show the experimenter bias effect.
14.13 Confidence limits for Exercise 14.8:
Mean difference = 1.45
standard error = 0.545
t.05(38) = 2.02
    CI(.95) = (X̄1 - X̄2) ± t.05(38)·s(X̄1-X̄2)
            = 1.45 ± 2.02(0.545) = 1.45 ± 1.10

    0.35 ≤ μ1 - μ2 ≤ 2.55
Note that the answers to Exercises 14.11 and 14.12 are in line with the
hypothesis tests, in that when we rejected the null hypothesis the
confidence limits did not include 0, and when we did not reject the null,
they did include 0.
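The interval is simple to reproduce in Python. This sketch is mine; it looks up the critical t exactly rather than from the table, which is why the limits agree with the hand calculation only after rounding to two decimals:

```python
from scipy.stats import t

mean_diff = 1.45   # from Exercise 14.13
se_diff = 0.545
df = 38

t_crit = t.ppf(0.975, df)   # two-tailed .05 critical value, about 2.02
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff
print(round(lower, 2), round(upper, 2))
```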
14.15 Comparing GPA for those with low and high ADDSC scores:
    X̄_L = 2.59    s²_L = 0.658    N_L = 75
    X̄_H = 1.68    s²_H = 0.560    N_H = 13

    s²p = [(N1 - 1)s1² + (N2 - 1)s2²] / (N1 + N2 - 2)
        = [74(0.658) + 12(0.560)] / (75 + 13 - 2) = 0.644

    t = (X̄1 - X̄2) / √(s²p/N1 + s²p/N2)
      = (2.59 - 1.68) / √(0.644/75 + 0.644/13)
      = 0.91 / √.058 = 0.91/0.241 = 3.77
[t.05(86) = ±1.98] Reject H0 and conclude that people with high ADDSC scores in
elementary school have lower grade point averages in ninth grade than people with lower
scores.
Here I pooled the variances even though the Ns were substantially
different because the variance estimates were so similar.
14.17 The answer to 14.15 tells you that ADDSC scores have significant predictability of
grade point average several years later. Moreover the answer to Exercise 14.16 tells you
that this difference is substantial.
This is a nice example of a situation in which it is easy to see a test of
means as a test of predictability.
14.19 Anger with a reason is just fine.
    X̄_NoAttrib = 3.40    s_NoAttrib = 2.0736    N_NoAttrib = 41
    X̄_Attrib = 5.02      s_Attrib = 2.7556      N_Attrib = 41

    s²p = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
        = [(41 - 1)(2.0736²) + (41 - 1)(2.7556²)] / (41 + 41 - 2) = 5.9466

    t = (X̄_NoAttrib - X̄_Attrib) / √(s²p/N1 + s²p/N2)
      = (3.40 - 5.02) / √(5.9466/41 + 5.9466/41)
      = -1.62/0.538 = -3.01
The critical value is approximately 2.00, so we will reject the null hypothesis and
conclude that when given a reason for a woman’s anger, she is given more status than
when no reason was given for the anger.
14.21 If the two variances are equal, they will be equal to the pooled variance.
If you have a problem seeing this, you can take any two equal variances
and unequal Ns and try it for yourself. The answer becomes obvious when
you do.
Chapter 15—Power
15.1 The statement on skiing is intended to point out that just because two things are
different doesn’t mean that the larger (better, greater, etc.) one will always come out
ahead. To take a different example, one treatment might be better than another for
anorexia, but I would be very surprised if the difference was statistically significant every
time, or even that its mean was always greater than the other mean. I just hope that it is
significant most of the time.
15.3 Power for socially desirable responses:
Assume the population mean = 4.39 and the population standard deviation = 2.61
a) Effect size:

   d = (μ1 - μ0)/σ = (4.39 - 3.87)/2.61 = 0.52/2.61 = .20

b) delta:

   δ = d√N = .20√36 = 1.20

c) power = .22

Notice that the value of δ here is exactly the same as the value of t in that
example. This is as it should be.
15.5 For Exercise 15.3 we would need δ approximately equal to 2.50, 2.80, and 3.25 for
power of .70, .80, and .90, respectively.

    δ = d√N

    2.50 = .20√N   therefore   N = (2.50/.20)² = 156.25

    2.80 = .20√N   therefore   N = (2.80/.20)² = 196

    3.25 = .20√N   therefore   N = (3.25/.20)² = 264.06
Notice how quickly the required sample sizes increase: as the desired power
increases, the required N grows faster and faster.
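The solve-for-N step generalizes into a one-line function. This sketch is mine; unlike the text, it rounds up to whole subjects, so 156.25 becomes 157:

```python
import math

def n_for_power(delta_needed, d):
    """Solve delta = d * sqrt(N) for N, rounding up to whole subjects."""
    return math.ceil((delta_needed / d) ** 2)

# deltas needed for power = .70, .80, and .90 with d = .20 (Exercise 15.5)
for delta_needed in (2.50, 2.80, 3.25):
    print(delta_needed, n_for_power(delta_needed, 0.20))
```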
15.7 Diagram of Exercise 15.6:

[Figure: the H1 distribution centered at μ1 = 2.75 and the H0 distribution centered at
μ0 = 3.87, with the critical value at 2.987 = 3.87 - 2.03(2.61)/√36.]
15.9 Avoidance behavior in rabbits using a one-sample t test:

a) For power = .50 we need δ = 1.95.

   d = (μ1 - μ0)/σ = (5.8 - 4.8)/2 = 1.0/2 = .50

   δ = d√N   therefore   1.95 = .5√N   and   N = (1.95/.50)² = 15.21

b) For power = .80 we need δ = 2.80.

   δ = d√N   therefore   2.80 = .5√N   and   N = (2.80/.50)² = 31.36
Because subjects come in whole units, we would need 16 subjects for power = .50
and 32 subjects for power = .80
15.11 Avoidance behavior in rabbits with unequal sample sizes:

    d = .50

    N_h = 2N1N2/(N1 + N2) = 2(20)(15)/(20 + 15) = 17.14

    δ = d√(N_h/2) = .5√(17.14/2) = 1.46

With δ = 1.46, power = .31
15.13 Cognitive development of LBW and normal babies at 1 year—modified data:
a) Power calculations

   d = (μ1 - μ2)/σ = (25 - 28)/8 = -.375

   δ = d√(N/2) = -.375√(20/2) = -1.19

   With δ = -1.19, power = .22

b) t test:

   t = (X̄1 - X̄2) / √(s²p/N1 + s²p/N2)
     = (25 - 28) / √(64/20 + 64/20) = -1.19

   [t.05(38) = ±2.02] Do not reject the null hypothesis.

c) The t is numerically equal to δ, although t is calculated from statistics and δ is
calculated from parameters. In other words, δ is equal to the t we would get if the
data came out with statistics equal to the parameters.
15.15 The significant t with the smaller N is more impressive, because that test had less
power than the other, so the underlying difference is probably greater.
The fact that a significant difference with a small N is more impressive
should not lead you to conclude that small sample sizes are to be
preferred.
15.17 Social awareness of ex-delinquents—which subject pool would be better to use?
    X̄_Normal = 38, N = 50     X̄_College = 35, N = 100     X̄_Dropout = 30, N = 25
    σ = 8

For the college sample:

    N_h = 2(50)(100)/(50 + 100) = 66.67

    δ = [(38 - 35)/8]√(66.67/2) = 0.375√33.33 = 2.17

For the dropout sample:

    N_h = 2(50)(25)/(50 + 25) = 33.33

    δ = [(38 - 30)/8]√(33.33/2) = 1.0√16.67 = 4.08

Assuming equal standard deviations, the H. S. dropout group of 25 would result
in a higher value of δ, and therefore higher power.
15.19 Total Sample Sizes Required for Power = .60, α = .05, Two-Tailed (δ = 2.20)

Effect Size     d      One-Sample t    Two-Sample t    Two-Sample t
                                       (per group)     (overall)
Small          0.20        121             242             484
Medium         0.50         20              39              78
Large          0.80          8              16              32
15.21 When can power = .50?

The mean under H1 should fall at the critical value under H0. The question implies a one-
tailed test. Thus the mean is 1.645 standard errors above μ0, which is 100.

    μ = 100 + 1.645·σ/√N = 100 + 1.645(15)/√25 = 104.935

When μ = 104.935, power would equal .50.
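The claim is easy to confirm numerically: if the true mean sits exactly at the critical value, half of the sampling distribution lies beyond it. This sketch is mine, using the exact z of 1.6449 rather than the rounded 1.645:

```python
from scipy.stats import norm

mu0, sigma, n = 100, 15, 25
se = sigma / n ** 0.5                      # 15 / 5 = 3
crit = mu0 + norm.ppf(0.95) * se           # one-tailed .05 cutoff, about 104.93
power = norm.sf(crit, loc=crit, scale=se)  # H1 sampling distribution centered on the cutoff
print(round(crit, 2), power)
```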
15.23 The power of the comparison of TATs of parents of schizophrenic and normal subjects.
    s²p = (3.523 + 2.412)/2 = 2.968;   s_p = √2.968 = 1.723

    d = (μ1 - μ2)/σ = (3.55 - 2.10)/1.723 = 1.45/1.723 = 0.842

    δ = d√(N/2) = 0.842√(20/2) = 2.66

Power = .75
15.25 Aronson’s research on stereotype threat.
    s²p = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
        = [(11 - 1)3.17² + (12 - 1)3.02²] / (11 + 12 - 2)
        = 9.56

    s_p = 3.09

    d̂ = (X̄1 - X̄2)/s_p = (9.64 - 6.58)/3.09 = 0.99

    n_h = 2n1n2/(n1 + n2) = 2(11)(12)/(11 + 12) = 11.48

    δ = d√(n_h/2) = 0.99√(11.48/2) = 0.99(2.396) = 2.37
From Appendix D5 the power of this experiment, if these are accurate estimates of the
parameters, is .658.
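Instead of Appendix D5, power can be read off the normal distribution directly: power = P(z > z_crit - δ) + P(z < -z_crit - δ). This sketch is mine, for the δ = 2.37 just computed; it reproduces the tabled .658 to within rounding:

```python
from scipy.stats import norm

delta = 2.37                      # from Exercise 15.25
z_crit = norm.ppf(0.975)          # two-tailed test at alpha = .05
power = norm.sf(z_crit - delta) + norm.cdf(-z_crit - delta)
print(round(power, 3))
```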
Chapter 16—One-way Analysis of Variance
I am assuming that most people would prefer to see the solutions to these problems as
computer printout. (I will use SPSS for consistency.)
16.1 Analysis of Eysenck’s data:
a) The analysis of variance:
----- ONEWAY -----
Variable RECALL   By Variable GROUP   Group Membership

                         Analysis of Variance
                        Sum of       Mean         F        F
Source          D.F.    Squares      Squares      Ratio    Prob.
Between Groups     1    266.4500     266.4500     25.2294  .0001
Within Groups     18    190.1000      10.5611
Total             19    456.5500

                         Standard   Standard
Group   Count   Mean     Deviation  Error     95 Pct Conf Int for Mean
Grp 1     10    19.3000    2.6687    .8439    17.3909 TO 21.2091
Grp 2     10    12.0000    3.7417   1.1832     9.3234 TO 14.6766
Total     20    15.6500    4.9019   1.0961    13.3558 TO 17.9442
b) t test
t-tests for Independent Samples of GROUP   Group Membership

             Number
Variable     of Cases    Mean      SD      SE of Mean
------------------------------------------------------
RECALL
  Young         10      19.3000   2.669      .844
  Older         10      12.0000   3.742     1.183
------------------------------------------------------
Mean Difference = 7.3000
Levene's Test for Equality of Variances: F= .383   P= .544

               t-test for Equality of Means                 95%
Variances   t-value    df     2-Tail Sig   SE of Diff      CI for Diff
----------------------------------------------------------------------
Equal         5.02     18        .000        1.453       (4.247, 10.353)
Unequal       5.02     16.27     .000        1.453       (4.223, 10.377)
Notice that if you square the t value of 5.02 you obtain 25.20, which is the same as the F
in the analysis of variance. Notice also that the analysis of variance procedure produces
confidence limits on the means, whereas the t procedure produces confidence limits on
the difference of means.
16.3 Expanding on Exercise 16.2:
a) Combine the Low groups together and the High groups together:
Variable RECALL   By Variable LOWHIGH

                         Analysis of Variance
                        Sum of        Mean         F        F
Source          D.F.    Squares       Squares      Ratio    Prob.
Between Groups     1     792.1000     792.1000     59.4505  .0000
Within Groups     38     506.3000      13.3237
Total             39    1298.4000

                         Standard   Standard
Group   Count   Mean     Deviation  Error     95 Pct Conf Int for Mean
Grp 1     20     6.7500    1.6182    .3618     5.9927 TO  7.5073
Grp 2     20    15.6500    4.9019   1.0961    13.3558 TO 17.9442
Total     40    11.2000    5.7699    .9123     9.3547 TO 13.0453
Here we have compared recall under conditions of Low versus High processing,
and can conclude that higher levels of processing lead to significantly better
recall.
b) The answer is still a bit difficult to interpret because both groups contain both
younger and older subjects, and it is possible that the effect holds for one age
group but not for the other.
16.5 2 and 2 for the data in Exercise 16.1:
    SS_group = 266.45     SS_total = 456.55     MS_error = 10.564     k = 2

    η² = SS_group/SS_total = 266.45/456.55 = .58

    ω² = [SS_group - (k - 1)MS_error] / (SS_total + MS_error)
       = [266.45 - (2 - 1)10.564] / (456.55 + 10.564) = 255.886/467.114 = .55
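Both measures are one-liners once the summary table is in hand. This sketch is mine, using the values above:

```python
ss_group, ss_total, ms_error, k = 266.45, 456.55, 10.564, 2

eta_sq = ss_group / ss_total
omega_sq = (ss_group - (k - 1) * ms_error) / (ss_total + ms_error)
print(round(eta_sq, 2), round(omega_sq, 2))
```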
16.7 Foa et al. (1991) study:
Group    n     Mean     S.D.    Variance    Total
SIT     14    11.07     3.95     15.6025     155
PE      10    15.40    11.12    123.6544     154
SC      11    18.09     7.13     50.8369     199
WL      10    19.50     7.11     50.5521     195
Total   45    15.622                         703

    X̄.. = 703/45 = 15.622

    SS_treat = Σ n_j(X̄_j - X̄..)²
             = 14(11.07 - 15.622)² + 10(15.40 - 15.622)² + 11(18.09 - 15.622)²
               + 10(19.50 - 15.622)²
             = 507.840

    MS_error = Σ(n_i - 1)s_i² / Σ(n_i - 1)
             = [13(15.6025) + 9(123.6544) + 10(50.8369) + 9(50.5521)] / 41
             = 55.587

    SS_error = [Σ(n_i - 1)]MS_error = 41(55.587) = 2279.067

From these values we can fill in the complete summary table and compute the F value.

Source       df     SS          MS         F
Treatment     3     507.840     169.280    3.04
Error        41    2279.067      55.587
Total        44    2786.907
[F.05(3,41) = 2.84] We can reject the null hypothesis and conclude that there are
significant differences between groups. Some treatments are more effective than others.
b) [Bar chart: mean number of reported symptoms plotted by treatment condition
(SIT, PE, SC, WL), y-axis running from 0 to 20.]
c) It would appear that the more interventionist treatments lead to fewer
symptoms than the less interventionist ones, although we would have to run
multiple comparisons to tell exactly which groups are different from which other
groups.
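The whole computation in 16.7, with SS_treat built from the group means and MS_error taken as the weighted average of the group variances, can be scripted. This sketch is mine; it keeps full precision throughout, so F comes out 3.05 rather than the hand-rounded 3.04:

```python
# Group summary statistics from Exercise 16.7 (SIT, PE, SC, WL)
ns = [14, 10, 11, 10]
means = [11.07, 15.40, 18.09, 19.50]
variances = [15.6025, 123.6544, 50.8369, 50.5521]

n_total = sum(ns)
grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total

ss_treat = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
ms_treat = ss_treat / (len(ns) - 1)
ms_error = sum((n - 1) * v for n, v in zip(ns, variances)) / (n_total - len(ns))
f_ratio = ms_treat / ms_error
print(round(f_ratio, 2))
```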
16.9 If the sample sizes in Exercise 16.7 were twice as large, that would double the
SStreat and MStreat. However it would have no effect on MSerror, which is simply the
average of the group variances. The result would be that the F value would be doubled.
16.11 Effect size for tests in Exercise 16.10.
It only makes sense to calculate an effect size for significant comparisons in this study, so
we will deal with SIT vs SC.
    d̂ = (X̄_SC - X̄_SIT)/√MS_error = (18.09 - 11.07)/√55.587 = 7.02/7.455 = 0.94
The SIT group is nearly a full standard deviation lower in symptoms when compared to
the SC group, which is a control group.
16.13 ANOVA on GPAs for the ADDSC data:
Variable GPA   By Variable Group

                        Sum of      Mean        F        F
Source          D.F.    Squares     Squares     Ratio    Prob.
Between Groups     2    22.5004     11.2502     22.7362  .0000
Within Groups     85    42.0591       .4948
Total             87    64.5595

                        Standard   Standard
Group   Count   Mean    Deviation  Error     95 Pct Conf Int for Mean
Grp 1     14    3.2536    .5209     .1392    2.9528 TO 3.5543
Grp 2     49    2.5920    .6936     .0991    2.3928 TO 2.7913
Grp 3     25    1.7436    .8020     .1604    1.4125 TO 2.0747
Total     88    2.4563    .8614     .0918    2.2737 TO 2.6388
There is a significant difference between the groups, telling us that there is a relationship
between ADDSC score in elementary school and the GPA the student has in 9th grade.
From the means it is clear that the GPA declines as the ADDSC score increases.
These are real data, and they tell us that a teacher in elementary school can
already pick out those students who will do well and badly in high school.
I have always found these results depressing and worrisome, even though
psychologists are supposed to like to be able to predict. There are some
things I wish weren’t so predictable.
16.15 Analysis of Darley and Latané data:
Group    n     Mean    Total
1       13     0.87    11.31
2       26     0.72    18.72
3       13     0.51     6.63
Total   52             36.66

    X̄.. = 36.66/52 = 0.705

    SS_treat = Σ n_j(X̄_j - X̄..)²
             = 13(0.87 - 0.705)² + 26(0.72 - 0.705)² + 13(0.51 - 0.705)²
             = 0.8541

    MS_error = 0.053 (given in text)

    SS_error = [Σ(n_i - 1)]MS_error = 49(0.053) = 2.597

From these values we can fill in the complete summary table and compute the F value.

Source       df    SS       MS       F
Treatment     2    0.854    0.427    8.06
Error        49    2.597    0.053
Total        51    3.451
[F.05(2,49) = 3.18] We can reject the null hypothesis and conclude that subjects are less
likely to summon help quickly if there are other bystanders around.
16.17 Bonferroni test on data in Exercise 16.2:
Both of these comparisons will be made using t tests, based on the group means from
Exercise 16.2.
    t = (X̄i - X̄j) / √(MS_error/n_i + MS_error/n_j)

For Young/Low versus Old/Low:

    t = (6.5 - 7.0) / √(6.6278/10 + 6.6278/10) = -0.5/1.151 = -0.434

For Young/High versus Old/High:

    t = (19.3 - 12.0) / √(6.6278/10 + 6.6278/10) = 7.3/1.151 = 6.34
For 36 df for error and for 2 comparisons at a familywise error rate of  = .05, the critical
value of t = 2.34. There is clearly not a significant difference between young and old
subjects on tasks requiring little cognitive processing, but there is a significant difference
for tasks requiring substantial cognitive processing. The probability that at least one of
these statements represents a Type I error is at most .05.
It is worth pointing out that when we are using MSerror as our variance
estimate, and have equal sample sizes, the computations are very simple
because we only need to calculate the denominator once.
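The Bonferroni critical value itself can be looked up rather than interpolated from a table. This sketch is mine; the exact value agrees with the 2.34 used above to within rounding:

```python
from scipy.stats import t

familywise_alpha = 0.05
n_comparisons = 2
df = 36

per_test_alpha = familywise_alpha / n_comparisons   # each test run at .025, two-tailed
t_crit = t.ppf(1 - per_test_alpha / 2, df)
print(round(t_crit, 2))
```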
16.19 Effect size for WL versus SIT
    d̂ = (X̄_WL - X̄_SIT)/s_WL = (19.50 - 11.07)/7.11 = 8.43/7.11 = 1.18
The two groups differ by over a standard deviation.
16.21 Spilich et al. data on a cognitive task:
Variable ERRORS   By Variable SMOKEGRP

                         Analysis of Variance
                        Sum of         Mean          F        F
Source          D.F.    Squares        Squares       Ratio    Prob.
Between Groups     2     2643.3778     1321.6889     4.7444   .0139
Within Groups     42    11700.4000      278.5810
Total             44    14343.7778

                         Standard    Standard
Group   Count   Mean     Deviation   Error     95 Pct Conf Int for Mean
Grp 1     15    28.8667   14.6866    3.7921    20.7335 TO 36.9998
Grp 2     15    39.9333   20.1334    5.1984    28.7838 TO 51.0828
Grp 3     15    47.5333   14.6525    3.7833    39.4191 TO 55.6476
Total     45    38.7778   18.0553    2.6915    33.3534 TO 44.2022
Here we have a task that involves more cognitive involvement, and it does show a
difference due to smoking condition. The non-smokers performed with fewer errors than
the other two groups, although we will need to wait until the next exercise to see the
multiple comparisons.
16.23 Spilich et al. data on driving simulation:
Variable ERRORS   By Variable SMOKEGRP

                         Analysis of Variance
                        Sum of       Mean        F        F
Source          D.F.    Squares      Squares     Ratio    Prob.
Between Groups     2     437.6444    218.8222    9.2584   .0005
Within Groups     42     992.6667     23.6349
Total             44    1430.3111

                        Standard    Standard
Group   Count   Mean    Deviation   Error     95 Pct Conf Int for Mean
Grp 1     15    2.3333   2.2887      .5909    1.0659 TO  3.6008
Grp 2     15    6.8000   5.4406     1.4048    3.7871 TO  9.8129
Grp 3     15    9.9333   6.0056     1.5506    6.6076 TO 13.2591
Total     45    6.3556   5.7015      .8499    4.6426 TO  8.0685
Here we have a case in which the active smokers again performed worse than the nonsmokers, and the differences are significant.
16.25 Attractiveness of faces
a) The research hypothesis would be the hypothesis that faces averaged over more
photographs would be judged more attractive than faces averaged over fewer
photographs.
b) Data analysis
c) Conclusions
The group means are significantly different. From the descriptive statistics we can see
that the means consistently rise as we increase the number of faces over which the
composite was created.
Chapter 17—Factorial Analysis of Variance
17.1 Thomas and Wang (1996) study:
a) This design can be characterized as a 3  2 factorial, with 3 levels of Strategy
and 2 levels of delay.
b) I would expect that recall would be better when subjects generated their own
key words, and worse when subjects were in the rote learning condition. I would
also expect better recall for the shorter retention interval. (But what do I know?)
c)
Summaries of RECALL   By levels of STRATEGY, DELAY

Variable                 Mean         Std Dev     Cases
For Entire Population    11.602564    7.843170      78
STRATEGY 1.0000           9.461538    6.906407      26
  DELAY  1.0000          14.923077    5.330127      13
  DELAY  2.0000           4.000000    2.516611      13
STRATEGY 2.0000          11.269231    9.606488      26
  DELAY  1.0000          20.538462    1.983910      13
  DELAY  2.0000           2.000000    1.471960      13
STRATEGY 3.0000          14.076923    6.183352      26
  DELAY  1.0000          15.384615    5.454944      13
  DELAY  2.0000          12.769231    6.796492      13
17.3 Analysis of variance on data in Exercise 17.1:
RECALL by STRATEGY, DELAY
UNIQUE sums of squares; all effects entered simultaneously

                         Sum of             Mean                  Sig
Source of Variation      Squares      DF    Square       F        of F
Main Effects            2510.603       3     836.868    42.992    .000
  STRATEGY               281.256       2     140.628     7.224    .001
  DELAY                 2229.346       1    2229.346   114.526    .000
2-Way Interactions       824.538       2     412.269    21.179    .000
  STRATEGY DELAY         824.538       2     412.269    21.179    .000
Explained               3335.141       5     667.028    34.267    .000
Residual                1401.538      72      19.466
Total                   4736.679      77      61.515
There are significant differences due to both Strategy and Delay, but, more
importantly, there is a significant interaction.
This is a good example for showing all three effects. The Delay and
Interaction effects are obvious, but the overall Strategy effect is harder to
see. You would do well to calculate the Strategy means, which are 9.46,
11.27, and 14.08, respectively. It will help if you draw those means on the
figure for Exercise 17.2.
17.5 Bonferroni tests to clarify simple effects for data in Exercise 17.4:
    t = (X̄i - X̄j) / √(MS_error/n_i + MS_error/n_j)

For Data at 5 Minutes Delay (MS_error = 20.7009):

For Generated versus Provided:

    t = (14.92 - 20.54) / √(20.7009/13 + 20.7009/13) = -5.62/1.784 = -3.15

For Generated versus Rote:

    t = (14.92 - 15.38) / √(20.7009/13 + 20.7009/13) = -0.46/1.784 = -0.26

For Provided versus Rote:

    t = (20.54 - 15.38) / √(20.7009/13 + 20.7009/13) = 5.16/1.784 = 2.89

For Data at 2 Day Delay (MS_error = 18.2308):

For Generated versus Provided:

    t = (4.00 - 2.00) / √(18.2308/13 + 18.2308/13) = 2.00/1.674 = 1.19

For Generated versus Rote:

    t = (4.00 - 12.77) / √(18.2308/13 + 18.2308/13) = -8.77/1.674 = -5.24

For Provided versus Rote:

    t = (2.00 - 12.77) / √(18.2308/13 + 18.2308/13) = -10.77/1.674 = -6.43
For 6 comparisons with 36 df, the critical value of t is 2.80.
For the 5-minute delay, the condition with the key words provided by the experimenter is
significantly better than both the condition in which the subjects generate their own key
words and the rote learning condition. The latter two are not different from each other.
For the 2-day delay, the rote learning condition is better than either of the other two
conditions, which do not differ between themselves.
We clearly see a different pattern of differences at the two delay conditions. The most
surprising result (to me) is the superiority of rote learning with a 2 day interval.
In running these Bonferroni tests, I had a choice. I could have thought of
each simple effect as a family of comparisons, and obtained the critical
value of t with 3 comparisons for each. Instead I chose to treat the whole
set of 6 comparisons as a family and adjust the Bonferroni for 6 tests.
There is no hard and fast rule here, and many might do it the other way.
The results would not change regardless of what I decided.
17.7 The results in the last few exercises have suggested to me that if I were studying for
a Spanish exam, I would fall back on rote learning, painful as it sounds and as much
against common wisdom as it is.
17.9 In this experiment we have as many primiparous mothers as multiparous ones,
which certainly does not reflect the population. Similarly, we have as many LBW infants
as full-term ones, which is certainly not a reflection of reality. The mean for primiparous
mothers is based on an equal number of LBW and full-term infants, which we know is
not representative of the population of all primiparous births. Comparisons between
groups are still legitimate, but it makes no sense to take the mean of all primiparous
moms combined as a reflection of any meaningful population mean.
Many of our experiments are run this way (with equal sample sizes across
groups that are not equally represented in the population), and it is
important to distinguish between the legitimacy of between group
comparisons and the legitimacy of combined means.
17.11 Simple effects versus t tests for Exercise 17.10.
a) If I had run a t test between those means my result would simply be the square root of
the F = 1.328 that I obtained.
b) If I used MSerror for my estimated error term it would give me a t that is the square
root of the F that I would have had if I had used the overall MSerror, instead of the MSerror
obtained in computing the simple effect.
17.13 Analysis of variance for Spilich et al. Study:
The main effect of Task and the interaction are significant. The main effect of Task is of
no interest because there is no reason why different tasks should be equally difficult. We
don't care about the main effect of Smoking either because it is created by large effects
for two levels of Task and no effect for the third. What is important is the interaction.
This is a good example of a situation in which main effects are of little
interest. For example, saying that smoking harms performance is not
really accurate. Smoking harms performance on some tasks, but not on
others. Often main effects are still interpretable in the presence of an
interaction, but not here.
17.15 Simple effects to clarify the Spilich et al. Example.
We have already seen these simple effects in Chapter 16, in Exercises 16.18, 16.19, and
16.21.
17.17 Factorial analysis of the data in Exercise 16.2:
Here we see that we have a significant effect due to age, with younger subjects
outperforming older subjects, and a significant effect due to the level of processing, with
better recall of material processed at a higher level. Most importantly, we have a
significant interaction, reflecting the fact that there is no important difference between
younger and older subjects for the task with low levels of processing, but there is a big
difference when the task calls for a high level of processing—younger subjects seem to
benefit more from that processing (or do more of it).
17.19 Nurcombe et al. study of maternal adaptation.
b) The program worked as intended and there was no interaction between groups and
educational level.
17.21 Effect size for Level of Processing in Exercise 17.17
    d̂ = (X̄_Hi - X̄_Low)/√MS_error = (15.65 - 6.75)/√6.628 = 8.90/2.5739 = 3.46

This is a very large effect size, but the data show an extreme difference between the two
levels of processing.

I used the square root of MS_error here because that was in line with what I did in the text.
But a good case could be made for adding Age and the interaction sums of squares back
in and calculating a new error term. That would produce

    d̂ = (X̄_Hi - X̄_Low)/√MS_error(revised) = (15.65 - 6.75)/√13.323 = 8.90/3.65 = 2.44

which is considerably smaller but still a very large effect.
17.23 Set of data for a 2  2 design with no main effects but an interaction:
[Graph: the two lines cross, with Row-1 rising from 8 to 12 across the columns and
Row-2 falling from 12 to 8.]

Cell means:

          Col-1    Col-2
Row-1       8       12
Row-2      12        8
17.25 Magnitude of effect for Exercise 17.1
Summary table from Exercise 17.1:

Source       df     SS          MS           F
Strategy      2     281.256     140.628       7.224
Delay         1    2229.346    2229.346     114.526
S×D           2     824.538     412.269      21.179
Error        72    1401.538      19.466
Total        77    4736.679

    η²_Strategy = SS_Strategy/SS_total = 281.256/4736.679 = .06

    ω²_Strategy = [SS_Strategy - (s - 1)MS_error] / (SS_total + MS_error)
                = [281.256 - (3 - 1)19.466] / (4736.679 + 19.466) = .05

    η²_Delay = SS_Delay/SS_total = 2229.346/4736.679 = .47

    ω²_Delay = [SS_Delay - (d - 1)MS_error] / (SS_total + MS_error)
             = [2229.346 - (2 - 1)19.466] / (4736.679 + 19.466) = .46

    η²_SD = SS_SD/SS_total = 824.538/4736.679 = .17

    ω²_SD = [SS_SD - (s - 1)(d - 1)MS_error] / (SS_total + MS_error)
          = [824.538 - (3 - 1)(2 - 1)19.466] / (4736.679 + 19.466) = .16
17.27 Magnitude of effect for Exercise 17.13:
Summary table from Exercise 17.13:

Source       df      SS            MS            F
Task           2     28661.526     14330.763     132.895
SmokeGrp       2      1813.748       906.874       8.41
T×S            4      1269.452       317.363       2.943
Error        126     13587.200       107.835
Total        134     45331.926

    η²_Task = SS_Task/SS_total = 28661.526/45331.926 = .63

    ω²_Task = [SS_Task - (t - 1)MS_error] / (SS_total + MS_error)
            = [28661.526 - (3 - 1)107.835] / (45331.926 + 107.835) = .63

    η²_Smoke = SS_Smoke/SS_total = 1813.748/45331.926 = .04

    ω²_Smoke = [SS_Smoke - (s - 1)MS_error] / (SS_total + MS_error)
             = [1813.748 - (3 - 1)107.835] / (45331.926 + 107.835) = .04

    η²_TS = SS_TS/SS_total = 1269.452/45331.926 = .03

    ω²_TS = [SS_TS - (t - 1)(s - 1)MS_error] / (SS_total + MS_error)
          = [1269.452 - (3 - 1)(3 - 1)107.835] / (45331.926 + 107.835) = .02
17.29 The two magnitude of effect measures (2 and 2) will agree when the error term
is small relative to the effect in question, and will disagree when there is a substantial
amount of error relative to the effect. But notice that this is a comparison of MSerror and a
sum of squares, and sums of squares can be large when there are many degrees of
freedom for them. So to some extent, all other things equal, the two terms will be in
closer agreement when there are several degrees of freedom for the treatment effect.
17.31 You should restrict the number of simple effects you examine to those in which
you are particularly interested (on a priori grounds), because the familywise error rate
will increase as the number of tests increases.
Although we routinely talk about familywise error rates with respect to
multiple comparison procedures, they really apply whenever you run more
than one test, whether you consider them tests on main effects and
interactions, or tests on simple effects, or tests on multiple contrasts. A
test is a test as far as the error rate is concerned.
Source       df     SS         MS        F
Gender        1     223.49     223.49    10.78
Condition     1       1.35       1.35     <1
G×C           1       0.69       0.69     <1
Error        56    1161.44      20.74
Chapter 18—Repeated-Measures
Analysis of Variance
18.1 Descriptive statistics on study of migraines:

[Figure: plot of the weekly mean migraine severity across Weeks 1 through 5, y-axis
running from 0 to 30.]
18.3 I would have liked to collect data from students on the use of pain killers and other
ways of dealing with migraines. I might also like to have data on stress levels over time
so that I could possibly rule out the effects of stress.
Here again we are getting into issues of experimental design, which
underlie all meaningful analyses. This design differs from the one in the
“suggestions” section of the Resource Manual for Chapter 16. In that
design we had separate groups tested at the different times.
18.5 Repeated-measures analysis of variance of data used in Exercise 18.4:
Source       df    SS         MS        F
Subjects      8     612.00
Weeks         1     554.50    554.50    14.424
Error         8     302.00     37.75
Total        17    1159.70

[F.05(1,8) = 5.32]

There is a significant decrease in severity over time. F = t² = 3.798² = 14.424.
18.7 Effect size for Exercise 18.4
We will use the square root of MS_error as our estimate of the standard deviation, because
this is a standard deviation corrected for any differences due to subject effects.

    d̂ = (X̄0 - X̄3)/√MS_error = (20.00 - 9.00)/√10.22 = 11.00/3.20 = 3.44

The decrease in severity from baseline to training is a reduction of approximately three and
one half standard deviations. (I used the standard deviation of the baseline scores in line
with what I said in the text.)
18.9 d̂ for difference in Exercise 18.8
I would standardize the difference in means using the square root of the average of the
variances of the two baseline measures. This would leave individual differences as part of
the standard deviation, which seems appropriate. The average variance is 77.97, so the
standard deviation is 8.83.

    d̂ = (X̄_baseline - X̄_training)/s = (20.39 - 7.19)/8.83 = 13.20/8.83 = 1.49

On average, the severity of headaches decreased by nearly 1.50 standard deviations from
baseline to training.
18.11 Exercise 18.10 tested the null hypothesis that condom use did not change over
time. We would have hoped to see that the intervention worked and that condom use
increased, but that was not what we found. There was an increase, but it was not
significant.
18.13 It would appear that without the intervention, condom use would actually have
declined. This suggests that the intervention may have prevented that decline, in which
case that non-significant result is actually a positive finding.
18.15 Bonferroni t tests to compare the beginning and end of Baseline, and the beginning
and end of Training for the data in Table 18.1. We can use a standard t test because the
error term has been corrected by the repeated-measures analysis of variance, which has
already removed between subject variability.
The Bonferroni alpha level would be .05/3 = .01667
We will reject all of the null hypotheses because each p value is less than .0167.
Chapter 19—Chi-Square
19.1 Popularity of Psychology professors:

             Anderson    Klansky    Kamm    Total
Observed        25          32       10       67
Expected       22.3        22.3     22.3      67

    χ² = Σ(O - E)²/E
       = (25 - 22.3)²/22.3 + (32 - 22.3)²/22.3 + (10 - 22.3)²/22.3
       = 11.33

[χ².05(2) = 5.99]

We will reject the null hypothesis and conclude that students do not enroll at
random.
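SciPy reproduces this goodness-of-fit test in one call. This sketch is mine; note that `chisquare` uses the exact expected frequency 67/3 rather than the rounded 22.3, so the statistic comes out 11.31 instead of 11.33:

```python
from scipy.stats import chisquare

observed = [25, 32, 10]              # Anderson, Klansky, Kamm
stat, p_value = chisquare(observed)  # equal expected frequencies by default
print(round(stat, 2), round(p_value, 4))
```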
19.3 Sorting one-sentence characteristics into piles:

              1      2      3      4      5     Total
Observed      8     10     20      8      4      50
Expected      5     10     20     10      5      50
Exp. %      10%    20%    40%    20%    10%    100%

    χ² = Σ(O - E)²/E
       = (8 - 5)²/5 + (10 - 10)²/10 + (20 - 20)²/20 + (8 - 10)²/10 + (4 - 5)²/5
       = 2.4

[χ².05(4) = 9.49]
Do not reject the null hypothesis that my daughter’s sorting behavior is in line
with my theory.
Notice that here is a case where my theory is aligned with the null
hypothesis.
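When the expected frequencies are unequal, they are passed to `chisquare` through `f_exp`. This sketch is mine, for the sorting data above:

```python
from scipy.stats import chisquare

observed = [8, 10, 20, 8, 4]
expected = [5, 10, 20, 10, 5]   # 10%, 20%, 40%, 20%, 10% of 50
stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 2), round(p_value, 2))
```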
19.5 Racial choice in dolls (Clark & Clark, 1939):

             Black    White    Total
Observed       83      169      252
Expected      126      126      252

    χ² = Σ(O - E)²/E = (83 - 126)²/126 + (169 - 126)²/126 = 29.35

[χ².05(1) = 3.84]
We can reject H0 and conclude that the children did not choose dolls at random, but chose
white dolls more often than black.
This may be an old study, but it is a very important one. It played a role in
Brown vs. Board of Education. It was used to argue the case that separate
but equal was not equal, and that it was destructive to the self-esteem of
black children.
19.7 Combining the two experiments:
              1939            1970          Total
Black         83 (106.42)     61 (37.58)     144
White        169 (145.58)     28 (51.42)     197
Total        252               89            341

(Expected frequencies in parentheses.)

χ² = Σ(O − E)²/E
   = (83 − 106.42)²/106.42 + (169 − 145.58)²/145.58 + (61 − 37.58)²/37.58 + (28 − 51.42)²/51.42
   = 34.184

[χ².05(1) = 3.84]
Reject the null hypothesis and conclude that the distribution of choices between Black
and White dolls was different in the two studies. Choice is not independent of the study,
and could easily be related to the time at which the studies were run. We are no longer
asking whether one color of doll is preferred over the other color, but whether the pattern
of preference is constant across studies. In analysis of variance terms we are dealing with
an interaction.
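A small Python sketch reproduces the expected frequencies and the χ² from the marginal totals (unrounded expecteds give 34.17; the 34.184 above reflects expecteds rounded to two decimals):

```python
# Chi-square test of independence for the combined doll-choice data.
# Rows: Black, White choices; columns: 1939, 1970 studies.
observed = [[83, 61],
            [169, 28]]

row_totals = [sum(row) for row in observed]        # 144, 197
col_totals = [sum(col) for col in zip(*observed)]  # 252, 89
n = sum(row_totals)                                # 341

# Expected frequency for each cell = row total * column total / n.
chi2 = sum((observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i in range(2) for j in range(2))
print(round(chi2, 2))  # 34.17
```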
19.9 There are several ways this study could be modified. We could simply rerun the
present analysis by defining smokers and non-smokers on the basis of the partner’s
smoking behavior. Alternatively, we could redefine the Smoker variable as “neither,”
“mother,” “father,” or “both.”
19.11 Howell and Huessy (1985) study of attention deficit disorder:
                    Remedial       Nonremedial
Classification      English        English          Total
Normal              22 (28.374)    187 (180.626)     209
ADD                 19 (12.626)     74 (80.374)       93
Total               41             261               302

χ² = Σ(O − E)²/E
   = (22 − 28.374)²/28.374 + (187 − 180.626)²/180.626 + (19 − 12.626)²/12.626 + (74 − 80.374)²/80.374
   = 5.38

[χ².05(1) = 3.84]
We can reject the null hypothesis and conclude that achievement level during high school
varies as a function of performance during elementary school.
19.13 A one-way chi-square test on the data in the first column of Exercise 19.12 would
be asking if the students are evenly distributed among the eight categories. What we
really tested in Exercise 19.12 is whether that distribution, however it appears, is the
same for those who later took remedial English as it is for those who later took nonremedial English.
19.15 Inescapable shock and implanted tumor rejection:
               Inescapable    Escapable     No
               Shock          Shock         Shock         Total
Reject          8 (14.52)     19 (14.52)    18 (15.97)     45
No Reject      22 (15.48)     11 (15.48)    15 (17.03)     48
Total          30             30            33             93

χ² = Σ(O − E)²/E
   = (8 − 14.52)²/14.52 + (19 − 14.52)²/14.52 + (18 − 15.97)²/15.97
     + (22 − 15.48)²/15.48 + (11 − 15.48)²/15.48 + (15 − 17.03)²/17.03
   = 8.852

[χ².05(2) = 5.99]
The ability to reject a tumor is affected by the shock condition.
I like this example particularly because it makes it clear that psychological
variables have very clear effects on physical health. We often say this, but
here are some quite dramatic data.
19.17 This is another place where we see the important relationship between sample size
and power.
19.19 Testosterone and childhood delinquency:
                 High             Normal
                 Testosterone     Testosterone       Total
Not Delinquent   366 (391.824)    3554 (3528.176)    3920
Delinquent        80 (54.176)      462 (487.824)      542
Total            446              4016               4462

χ² = Σ(O − E)²/E
   = (366 − 391.824)²/391.824 + ... + (462 − 487.824)²/487.824
   = 15.57

[χ².05(1) = 3.84]
a) These results show that there is a significant relationship between the two
variables: χ² = 15.57.
b) Testosterone levels in adults are related to the behavior of those individuals
when they were children.
c) This result shows that we can tie the two variables (delinquency and
testosterone) together historically. I would assume that people who have high
testosterone levels now also had high levels when they were children, but that is
just an assumption.
19.21 We could ask a series of similar questions, evenly split between “right” and
“wrong” answers. We could then sort the replies into positive and negative categories
and ask whether faculty were more likely than students to give negative responses.
19.23 Racial differences in desired weight gain.
For white females, the odds of wishing to lose weight were 352/183 = 1.9235, meaning
that white females were nearly twice as likely to wish to lose weight as to stay the same
or gain weight.
For African-American females, the corresponding ratio is 47/52 = .9038.
The odds ratio is 1.9235/.9038 = 2.1281. This means that the odds of wishing to lose
weight were more than twice as high among white females as compared to African
American females.
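The odds ratio is easy to verify in Python (counts as given above):

```python
# Odds of wishing to lose weight, by group, and the odds ratio.
odds_white = 352 / 183  # ~1.9235
odds_black = 47 / 52    # ~0.9038

odds_ratio = odds_white / odds_black
print(round(odds_ratio, 4))  # 2.1281
```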
19.25 Death penalty study:
The chi-square statistic is clearly significant. Nonwhite defendants are sentenced to death
at a significantly higher rate than white defendants.
Chapter 20—Nonparametric and
Distribution-Free Tests
20.1 Inferences in children’s story summaries (McConaughy, 1980):
a) Analysis using the Mann-Whitney test (also known as Wilcoxon’s rank-sum
test):
Raw data and ranks:

Younger Children (N = 7):
  Raw:    0     1     0     3     2     5     2
  Rank:  1.5    3    1.5    6    4.5    9    4.5     ΣR = 30

Older Children (N = 6):
  Raw:    4     7     6     4     8     7
  Rank:  7.5  11.5   10    7.5   13   11.5           ΣR = 61

Ws = ΣR for the group with the smaller N = 61
Ws′ = 2W̄ − Ws = n1(n1 + n2 + 1) − Ws = 6(14) − 61 = 84 − 61 = 23

Because Ws′ < Ws, we evaluate Ws′ in Appendix E, doubling the probability
level for a two-tailed test.

W.025(6,7) = 27 > 23
b) Reject the null hypothesis and conclude that older children include more
inferences in their summaries.
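The tied ranks and rank sums can be checked with a short Python sketch (midranks for tied scores; data from the table above):

```python
# Wilcoxon rank-sum (Mann-Whitney) pieces for the story-summary data.
younger = [0, 1, 0, 3, 2, 5, 2]
older = [4, 7, 6, 4, 8, 7]

combined = sorted(younger + older)

def midrank(x):
    # Average (mid) rank of score x in the pooled, sorted data.
    first = combined.index(x) + 1
    return first + (combined.count(x) - 1) / 2

ws = sum(midrank(x) for x in older)  # rank sum of the smaller group (n = 6)
ws_prime = 6 * (6 + 7 + 1) - ws      # Ws' = 2*W-bar - Ws
print(ws, ws_prime)  # 61.0 23.0
```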
20.3 The analysis in Exercise 20.2 using the normal approximation:
z = (Ws − n1(n1 + n2 + 1)/2) / √(n1·n2·(n1 + n2 + 1)/12)
  = (53 − 9(9 + 11 + 1)/2) / √(9(11)(9 + 11 + 1)/12)
  = (53 − 94.5)/13.16
  = −3.15

p(z ≤ −3.15) = 2(.0009) = .0018 < .05
We will reject the null hypothesis and come to the same conclusion we came to
earlier.
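The normal approximation takes only a few lines to verify (Ws = 53, n1 = 9, n2 = 11, as above):

```python
import math

# Normal approximation to the rank-sum distribution.
ws, n1, n2 = 53, 9, 11
mean_ws = n1 * (n1 + n2 + 1) / 2                 # 94.5
sd_ws = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # ~13.16

z = (ws - mean_ws) / sd_ws
print(round(z, 2))  # -3.15
```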
20.5 Hypothesis formation in psychiatric residents (Nurcombe & Fitzhenry-Coor, 1979):
a) Wilcoxon's matched-pairs signed-ranks test:

Before   After   Diff   Rank   Signed Rank
  8        7      −1     2        −2
  4        9      +5     8.5      8.5
  2        3      +1     2        2
  2        6      +4     7        7
  4        3      −1     2        −2
  8       10      +2     4.5      4.5
  3        6      +3     6        6
  1        7      +6    10       10
  3        8      +5     8.5      8.5
  9        7      −2     4.5     −4.5

T+ = Σ(positive ranks) = 46.5
T− = Σ|negative ranks| = 8.5
T = smaller of T+ and T− = 8.5
n = 10
T.05(10) = 8 < 8.5   Do not reject H0
b) We cannot conclude that we have evidence supporting the hypothesis that
there is a reliable increase in hypothesis generation and testing over time. (Here is
a case in which alternative methods of breaking ties could lead to different
conclusions.)
Here you might discuss how we could go about deciding how to break
ties, putting the emphasis on a priori decisions.
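The T statistics can be verified with a small Python sketch (midranks of |diff|, breaking ties by averaging as in the table above):

```python
# Wilcoxon matched-pairs signed-ranks T for the residents' data.
before = [8, 4, 2, 2, 4, 8, 3, 1, 3, 9]
after = [7, 9, 3, 6, 3, 10, 6, 7, 8, 7]

diffs = [a - b for a, b in zip(after, before)]
abs_sorted = sorted(abs(d) for d in diffs)

def midrank(v):
    # Average (mid) rank of |diff| = v among all absolute differences.
    first = abs_sorted.index(v) + 1
    return first + (abs_sorted.count(v) - 1) / 2

t_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
t_minus = sum(midrank(abs(d)) for d in diffs if d < 0)
print(t_plus, t_minus)  # 46.5 8.5
```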
20.7 Independence of first-born children:
Wilcoxon's matched-pairs signed-ranks test:

First   Second   Diff   Rank   Signed Rank
 12      10       2      4        4
 18      12       6     17.5     17.5
 13      15      −2      4       −4
 17      13       4     11       11
  8       9      −1      1       −1
 15      12       3      8        8
 16      13       3      8        8
  5       8      −3      8       −8
  8      10      −2      4       −4
  2       7      −5     14      −14
  5       7      −2      4       −4
 15      13       2      4        4
 12       8       4     11       11
 13       8       5     14       14
  5       9      −4     11      −11
 14       8       6     17.5     17.5
 20      10      10     20       20
 19      14       5     14       14
 17      11       6     17.5     17.5
 18      12       6     17.5     17.5

T+ = Σ(positive ranks) = 164
T− = Σ|negative ranks| = 46
T = smaller of T+ and T− = 46
n = 20
T.05(20) = 52 > 46
b) We can reject the null hypothesis and conclude that first-born children are
more independent than their second-born siblings.
Here is a good example of where we would use a “matched sample” test
even though the same children do not perform in both conditions (nor
could they). We are assuming that brothers and sisters are more similar to
each other than they are to other children. Thus if the first-born is
particularly independent, we would guess that the second-born has a
higher than chance expectation of being more independent. They share a
common environment.
20.9 Data in Exercise 20.7 plotted as a function of the first-born’s score:
[Scatterplot: the difference (FirstBorn − SecondBorn) plotted against the FirstBorn score.]
The scatterplot shows that the difference between the pairs is heavily dependent upon the
score of the first-born.
20.11 The Wilcoxon matched-pairs signed-ranks test tests the null hypothesis that paired
scores were drawn from identical populations or from symmetric populations with the
same mean (and median). The corresponding t test tests the null hypothesis that the
paired scores were drawn from populations with the same mean and assumes normality.
This is an illustration of the argument that you buy things with
assumptions. By making the more stringent assumptions of a t test, we
buy greater specificity in our conclusions. However if those assumptions
are false, we may have used an inappropriate test.
20.13 Rejection of the null hypothesis by a t test is a more specific statement than
rejection using the appropriate distribution-free test because, by making assumptions
about normality and homogeneity of variance, the t test refers specifically to population
means—although it is also dependent on those assumptions.
20.15 Truancy and home situation of delinquent adolescents:
Analysis using the Kruskal-Wallis one-way analysis of variance:

Natural Home     Foster Home     Group Home
Score   Rank     Score   Rank    Score   Rank
 15     18        16     19       10      9
 18     22        14     16       13     13.5
 19     24.5      20     26       14     16
 14     16        22     27       11     10
  5      4.5      19     24.5      7      6.5
  8      8         5      4.5      3      2
 12     11.5      17     20        4      3
 13     13.5      18     22       18     22
  7      6.5      12     11.5      2      1
Ri =   124.5            170.5            83

N = 27   ni = 9

H = [12/(N(N + 1))] Σ(Ri²/ni) − 3(N + 1)
  = [12/(27·28)] [124.5²/9 + 170.5²/9 + 83²/9] − 3(27 + 1)
  = 6.757

[χ².05(2) = 5.99]
We can reject the null hypothesis and conclude the placement of these adolescents has an
effect on truancy rates.
This analysis doesn’t directly answer the question the psychologist wanted
answered, because he wanted to show that the group home was better than
the others. He might follow this up with Mann-Whitney tests serving in
the role of multiple comparison procedures, applying a Bonferroni
correction (although it might be difficult to find the necessary critical
values.) Alternatively, he could just run a single Mann-Whitney between
the group home and the combined data of the other two placements.
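The Kruskal-Wallis H can be recomputed directly from the raw scores in a short Python sketch (midranks for ties across the pooled data):

```python
# Kruskal-Wallis H for truancy scores under three placements.
natural = [15, 18, 19, 14, 5, 8, 12, 13, 7]
foster = [16, 14, 20, 22, 19, 5, 17, 18, 12]
group_home = [10, 13, 14, 11, 7, 3, 4, 18, 2]

pooled = sorted(natural + foster + group_home)
N = len(pooled)  # 27

def midrank(x):
    # Average (mid) rank of score x in the pooled, sorted data.
    first = pooled.index(x) + 1
    return first + (pooled.count(x) - 1) / 2

rank_sums = [sum(midrank(x) for x in g)
             for g in (natural, foster, group_home)]  # 124.5, 170.5, 83

H = 12 / (N * (N + 1)) * sum(r ** 2 / 9 for r in rank_sums) - 3 * (N + 1)
print(round(H, 2))  # 6.76 (6.757 above, to three decimals)
```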
20.17 The study in Exercise 20.16 has the advantage over the one in Exercise 20.15 in
that it eliminates the influence of individual differences (differences in overall level of
truancy from one person to another).
20.19 For the data in Exercise 20.5:
a) Analyzed by chi-square:

            More   Fewer   Total
Observed      7      3      10
Expected      5      5      10

χ² = Σ(O − E)²/E = (7 − 5)²/5 + (3 − 5)²/5 = 1.60

[χ².05(1) = 3.84]

We cannot reject the null hypothesis.

b) Analyzed by Friedman's test:

Before          After
Score   Rank    Score   Rank
  8      2        7      1
  4      1        9      2
  2      1        3      2
  2      1        6      2
  4      2        3      1
  8      1       10      2
  3      1        6      2
  1      1        7      2
  3      1        8      2
  9      2        7      1
Totals  13              17

N = 10   k = 2

χ²F = [12/(Nk(k + 1))] ΣRi² − 3N(k + 1)
    = [12/(10(2)(3))] [13² + 17²] − 3(10)(2 + 1)
    = 1.60

[χ².05(1) = 3.84]
These are exactly equivalent tests.
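With k = 2 conditions, Friedman's test reduces to the sign-test chi-square; a Python sketch shows the two statistics agree (pairs from Exercise 20.5):

```python
# Friedman's test with k = 2 is equivalent to a chi-square on the
# counts of increases vs. decreases.
before = [8, 4, 2, 2, 4, 8, 3, 1, 3, 9]
after = [7, 9, 3, 6, 3, 10, 6, 7, 8, 7]
N, k = len(before), 2

# Chi-square on counts of who went up vs. down (expected 5 and 5).
ups = sum(a > b for a, b in zip(after, before))  # 7
chi2 = (ups - 5) ** 2 / 5 + ((N - ups) - 5) ** 2 / 5

# Friedman: rank the two scores within each pair, total ranks by column.
r_before = sum(1 if b < a else 2 for a, b in zip(after, before))  # 13
r_after = N * k * (k + 1) // 2 - r_before                         # 17
chi2_f = (12 / (N * k * (k + 1)) * (r_before ** 2 + r_after ** 2)
          - 3 * N * (k + 1))
print(round(chi2, 2), round(chi2_f, 2))  # 1.6 1.6
```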
20.21 "The mathematics of a lady tasting tea":

First Cup      Second Cup     Third Cup
Score  Rank    Score  Rank    Score  Rank
  8     3        3     2        2     1
 15     3       14     2        4     1
 16     2       17     3       14     1
  7     3        5     2        4     1
  9     3        3     1        6     2
  8     2        9     3        4     1
 10     3        3     1        4     2
 12     3       10     2        2     1
ΣR =   22             16             10

N = 8   k = 3

χ²F = [12/(Nk(k + 1))] ΣRi² − 3N(k + 1)
    = [12/(8(3)(4))] [22² + 16² + 10²] − 3(8)(3 + 1)
    = 9.00

[χ².05(2) = 5.99]
We can reject the null hypothesis and conclude that people don’t really like tea made
with used tea bags.
Chapter 21—Choosing the Appropriate Test
[N.B. Please review the disclaimer concerning these answers at the beginning of Chapter
21. There are many different ways to think about a study.]
21.1 This test involves comparing two proportions, and the easiest way to do that is to
set up a 2 × 2 contingency table with Group on one dimension and Mastery on the other.
21.3 This is a repeated measures analysis of variance with assessment periods as
the repeated measure and condition as the between subject variable. If
measurements were taken on several occasions I would like to see the data plotted
over time, but all we currently have is the data at the end of the treatment phase.
21.5 This is a t test for two independent groups—children of divorced families and
children of intact families.
21.7 This is a standard one-way analysis of variance. I would be most likely to conduct a
planned comparison of the positive and negative conditions.
21.9 This is a two-way analysis of variance, with secure and insecure as one dimension
and mortality vs. dental as the other. No multiple contrasts are called for because there
are only two levels of each variable.
21.11 This could be treated as a two-way analysis of variance if we break the data down
by race and by Afrocentric facial features. A problem with this is that we would
presumably have more Afrocentric features for black inmates, which would lead to
unequal sample sizes (i.e. an unbalanced design).
21.13 This is a regression problem where time is one variable and the difference
in happiness between liberals and conservatives (by year) is the other variable.
21.15 The most important thing to do would be to plot the data over time looking for
trends. A repeated measures analysis of variance would tell you if differences are
significant, but it is the direction of differences, and whether they return to baseline, that
is likely to be most informative. The authors further broke down the participants in terms
of their preoccupation with 9/11 and looked at differences between those groups.
Interestingly, even the least preoccupied group showed changes over time.
21.17 This is a difficult one, partly because it depends on what Payne wants to know. I
assume that she wants to know how rankings of characteristics agree across sexes or
across years. She could first find the mean rank assigned to each characteristic separately
for each sex and year. Because the raw data were originally ranks, I would probably be
inclined to then rank these mean values. She could then calculate Spearman’s rS between
males and females for each year or between years within each sex. The correlations
would be obtained for the ten pairs of scores (one per characteristic).
21.19 This is a 2 × 4 analysis of variance with two levels of sex and four levels of
occupation. The major emphasis is on the occupations, so multiple comparisons of those
means would be appropriate.
21.21 There are two independent groups in this experiment. The authors should use a
Mann-Whitney test to compare average locus of control scores.
21.23 This is a situation for a chi-square goodness-of-fit test. The conditions are Rotated
versus Stationary, and the count is the number of subjects choosing that condition as
giving stronger contours. The expected values would be 37/2 = 18.5. The data are
sufficiently extreme that a test is superfluous.
21.25 This is another complex repeated-measures analysis of variance. The comparison
of recall of the two lists (one learned before administration of the drug and the other
learned after) is a repeated measurement because the same subjects are involved. The
comparison of the Drug versus Saline groups is a between-subjects effect because the
groups involve different subjects.
21.27 This is basically a correlational study, where we separately correlate the two
dependent variables with amount of alcohol consumed. Given the 14 year gap, and all of
the other factors that affect development, we should not expect very strong correlations
even under the best of conditions.