The POWERMUTT Project: Comparing Means

XI. Comparing Means
Subtopics

Introduction
Paired-Samples t Tests
Independent-Samples t Tests
One-way Analysis of Variance (ANOVA) and Eta2
Key Concepts
Exercises
For Further Study

SPSS Tools

New with this topic
o t Tests
o Compare Means

Review
o Getting Started
o Boxplots
o Descriptives
Introduction
If we can calculate a mean score for one group of cases (such as all of the world's countries) on
one variable, we can also 1) compare a group's mean scores on two or more different variables
and, conversely, 2) compare the mean scores of two or more groups (e.g., countries in different
regions of the world) on the same variable. The variables for which means are compared must,
of course, be interval, though the groups that are compared can, and usually will, be nominal or
ordinal.
In this topic, we will first examine a technique called the t test,[1] a measure of whether the
difference between two mean scores is statistically significant. The test is sometimes called
"Student's t" because its inventor, William Gosset, wrote under the pseudonym "Student." One
version of this test (called a paired-samples t test) uses one group of cases and compares the
group's scores on two different variables. Another version (the independent-samples t test)
compares the scores of two different groups on the same variable. The t test is similar in
concept to the chi-square test discussed earlier, except that it is more powerful because it uses
data at a higher level of measurement. You might, therefore, wish to review the section on
statistical significance in the contingency table analysis topic. T tests are often employed in
experimental research: paired-samples t tests are commonly used to compare pre- and
post-experiment scores, while independent-samples t tests are used to compare two groups of
subjects, such as a control group and an experimental group, on the same measure.
We will end the topic with a discussion of a simple form of a very powerful method called
analysis of variance. Using one-way analysis of variance, we will compare several groups in
terms of the same variable. By partitioning the distribution of scores into between-group
variance and within-group variance, we will be able to measure the strength of the differences
between groups using a proportional reduction in error measure called eta2 (η2), and also
determine whether the differences are statistically significant.
Paired-Samples t Tests
How do people feel about the two major political parties? In the 2008 American National
Election Study, respondents were asked to rate the Democratic and Republican parties on a
"feeling thermometer," on which 100 represented the warmest, or most favorable of feelings, and
0 the coldest, or least favorable. When we weight the data by the "weight" variable, and run a t
test to compare the means of the two variables, the output (from SPSS) that we obtain shows that
the Democratic Party comes out somewhat better. The first table below shows a mean score for
the Democratic Party of 56.87, somewhat higher than the 48.15 mean for the Republican Party.
Is this difference sufficiently large that we can reject the null hypothesis that it is simply due to
random sampling error (that is, chance)? The figure in the last column of the second table below
helps us answer this question. The value of t with 2046 degrees of freedom (one less than the
number of cases) has a significance level of .000, that is, it would occur by chance less than one
time in a thousand. The difference is clearly statistically significant. Note: a two-tailed test of
significance is used for "non-directional" hypotheses, in which we suspect that there will be a
difference in scores, but don't know in advance of examining our results which score will be
higher. Normally, hypotheses are "directional," and we have reason to predict not just that there
will be a difference, but also which score will be the higher one. To calculate the one-tailed
probability, simply divide the two-tailed result by two. For example, if a relationship were
significant at the .04 level using a two-tailed test, it would be significant at the .02 level using a
one-tailed test. (Of course, if the difference is not in the direction we predicted, then our
hypothesis is not confirmed regardless of the level of significance.)
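Although this topic works through SPSS's menus, the same analysis can be run from syntax.
Below is a minimal sketch: the file name anes08s.sav comes from the exercises at the end of this
topic, but the thermometer variable names demtherm and reptherm are assumptions to be
checked against the codebook.

* Paired-samples t test on the two party thermometers.
* The variable names demtherm and reptherm are assumptions.
GET FILE='anes08s.sav'.
WEIGHT BY weight.
T-TEST PAIRS=demtherm WITH reptherm (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.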
Independent-Samples t Tests
In 2000, Ralph Nader, running as the Green Party candidate for president, won only about 97,000
votes in Florida (less than two percent of the total), but these votes almost certainly cost
Democrat Al Gore Florida's 25 Electoral College votes and, with them, the election. By 2004,
when Nader again ran for president, many Democrats had developed bitter feelings toward him.
The 2004 American National Election Study asked respondents their feelings about a number of
prominent politicians, including Nader. On the one hand, we might expect that most Democrats
would be closer to Nader philosophically than would most Republicans, and so would have
warmer feelings about him. On the other hand, there were the memories of the 2000 election.
An independent-samples t test will enable us to compare Nader's scores from respondents of both
major parties.
Again weighting by the "weight" variable, we can see, first of all, that respondents of neither
party had particularly warm feelings for Nader, with Republicans averaging 40.88 and
Democrats 39.47 (see the first table below). For the independent-samples t test, there are two
versions of the computational formula, depending on whether we assume that the variances of
the two groups' scores are equal. (The technical name for equal variance is homoscedasticity.)
Before deciding which version to use, we need to determine whether there is a statistically
significant difference (p<.05) between the two variances. The test for this (in SPSS, Levene's
test) uses an F ratio, a measure of statistical significance in the same family as t and
chi-square. In this
case, we can see from the second table below that the F ratio is not statistically significant
(p=.326). We can therefore proceed to use the version of t that assumes equal variances (though
in this case, the results are almost identical regardless of which version we use). Because we
could in advance have made a case either way as to whether Democrats or Republicans would
have warmer feelings about Nader, we will use a two-tailed test. We find that the difference
between Democrats and Republicans could have easily been due to chance (p=.419). The
relationship is not statistically significant.
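The corresponding syntax is sketched below. Here the file name anes04s.sav, the variable names
pid and nadertherm, and the codes 1 (Democrats) and 2 (Republicans) are all assumptions to
verify against the 2004 codebook.

* Independent-samples t test: Nader thermometer by party.
* File, variable names, and group codes are assumptions.
GET FILE='anes04s.sav'.
WEIGHT BY weight.
T-TEST GROUPS=pid(1 2)
  /VARIABLES=nadertherm
  /CRITERIA=CI(.95).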
One-way Analysis of Variance and Eta2 (η2)
The independent-samples t test is a special case of a more general method that allows
comparisons among more than two groups of cases. If we think of group membership as an
independent variable, and the interval or ratio variable as a dependent variable, we might then
ask whether the differences between the groups are statistically significant, and how strong an
indicator group membership is of the value of the dependent variable. We can answer these
questions with one-way "analysis of variance" (ANOVA) and a related proportional reduction in
error measure of association called eta2 (η2).
We will illustrate these ideas by comparing the Gross Domestic Product per capita of countries
in different regions of the world. Boxplots displaying this relationship are shown in the
following figure:
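In SPSS, grouped boxplots like these can be produced with the Examine procedure (see Boxplots
under SPSS Tools above). A minimal sketch, assuming the data file is named countries.sav and
the variables are named gdp_cap and region:

* Boxplots of GDP per capita by region.
* File and variable names are assumptions.
GET FILE='countries.sav'.
EXAMINE VARIABLES=gdp_cap BY region
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.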
Obviously, there are major wealth differences between regions. At the same time, there are
important differences within some regions. European and North American countries, while the
most affluent overall, vary considerably in their wealth. While most Asian countries are poor,
there are a few outliers in this region that are at least as affluent as most in Europe and North
America. But just how good a predictor of wealth is region? Put another way, how much of the
variance in wealth is between regions, and how much is within regions?
For an interval or ratio variable, our best guess as to the score of an individual case, if we knew
nothing else about that case, would be the mean. The variance gives us a measure of the error
we make when we use the mean as our guess, since the greater the variance, the less reliable a
predictor the mean will be. For GDP per capita, we obtain the following parameters for all
countries taken
together:
Note: Because it is such a large number, the variance is written in scientific notation. In the
SPSS output window, you can find the exact number by double-clicking twice on the number as
written in scientific notation. In this case, that number is 324,522,179.51.
How much less will our error be in guessing the value of the dependent variable (in this case,
GDP per capita) if we know the value of the independent variable (region)? We can calculate
the within-group variance in the same way that the total variance is calculated, except that,
instead of measuring each score's deviation from the overall mean, we measure its deviation
from its group mean (that is, the mean for the region in which the country is located). We can
then determine how much less variance there is about the group means than about the overall
mean.
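In symbols, with Σ denoting summation over all cases:

total variance = Σ (score - overall mean)2 / N
within-group variance = Σ (score - group mean)2 / N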
The formula:
η2 = (total variance − within-group variance) / total variance
provides us with the familiar proportional reduction in error. Eta2 thus belongs to the same
“PRE” family of measures of association as Lambda, Gamma, Kendall's tau, and others.
Recall that variance is the sum of squared deviations from the mean divided by N (the number of
cases). The “Sum of Squares” numbers in the ANOVA table refer to sums of squared deviations
from the mean. (These numbers have been converted from scientific notation.) They are, in
other words, the same as the between-groups, within-groups, and total variances, except that
they have not been divided by N. Since N is the same for each, we can omit this last step.
Eta2 is then calculated as follows:
η2 = (62,637,780,452 - 44,214,993,508) / 62,637,780,452 = .294
In other words, by knowing the region in which a country is located, we can reduce the error we
make in guessing its GDP/capita by 29.4 percent.
We can also perform a test of the statistical significance of this measure using the F ratio
(assuming that we wish to calculate statistical significance for population data such as that in
the "countries" file). In this case, the differences between the regions would occur by chance
less than one time in a thousand (see the last column of the first table above). If only two
groups are being compared, the F test is mathematically equivalent to the t test (F = t2).
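In SPSS, the ANOVA table, along with eta and eta2, can be obtained from the Compare Means
procedure (see SPSS Tools above). A minimal syntax sketch, again assuming the file
countries.sav and the variable names gdp_cap and region:

* One-way ANOVA plus eta and eta squared.
* File and variable names are assumptions.
GET FILE='countries.sav'.
MEANS TABLES=gdp_cap BY region
  /CELLS=MEAN COUNT STDDEV
  /STATISTICS=ANOVA.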
The following topic, regression (or ordinary least squares), is another powerful way to analyze
variance.
Key Concepts
ANOVA
eta2
homoscedasticity
independent-samples t tests
one-way analysis of variance
paired-samples t tests
Exercises
1. Start SPSS. Open the anes08s.sav file and the 2008 American National Election Study Subset
codebook. Do paired-samples t tests to compare the feeling thermometers toward "people on
welfare" with "poor people." Weight cases by the "weight" variable.
2. Using the same dataset, and again weighting cases by the "weight" variable, do
independent-samples t tests to compare the feelings of Democrats and Republicans toward Joe Biden and
Sarah Palin. Using boxplots, display these same relationships graphically.
3. Using the same dataset, and again weighting cases by the "weight" variable, do comparison of
means tests, requesting ANOVA and eta2 to see how well party (use “partyid7”) and ideology
explain respondents’ scores on the various “feeling thermometers” included in the file. Using
boxplots, display these same relationships graphically.
4. Open the house.sav file and the house codebook. Do a comparison of means test between the
voting records of Democrats and Republicans, requesting eta2. Repeat with representatives’
gender and with representatives’ ethnicity as your independent variables. Which independent
variable does the best job of explaining voting record? Using boxplots, display these same
relationships graphically.
For Further Study
Fiddler, Linda, Laura Hecht, Edward E. Nelson, Elizabeth Ness Nelson, and James Ross, SPSS
for Windows 13.0: A Basic Tutorial (N.Y.: McGraw-Hill, 2004), chapter 6;
http://www.ssric.org/SPSS_manualV13/Chapter6_v13.pdf. Accessed September 20, 2010.
Stockburger, David W., "One and Two Tailed T Tests," in Introductory Statistics: Concepts,
Models, and Applications, revised February 19, 1998,
http://www.psychstat.missouristate.edu/introbook/sbk25.htm. Accessed September 20, 2010.
Zhang, Ying (Joy), "Confidence Interval and the Student's T Test,"
http://projectile.sv.cmu.edu/research/public/talks/t-test.htm. Accessed September 20, 2010.
[1] For the various formulas used to compute t tests, see Ying (Joy) Zhang, "Confidence Interval
and the Student's T Test," http://projectile.is.cs.cmu.edu/research/public/talks/t-test.htm#types.
Note: what we have, using SPSS's terminology, called "paired-samples t tests," Zhang calls
"paired t tests," and what we have called "independent-sample t-tests," he calls "unpaired t
tests." He also describes "one sample" t tests, a subject not covered in POWERMUTT.