Comparing Two Groups’ Means or Proportions Independent Samples t-tests

Review
Confidence Interval for a Mean
Slap a sampling distribution* over a sample mean to determine a range in which the population mean has a particular probability of being, such as a 95% CI.
If our sample is one of the middle 95%, we know that the mean of the population is within the CI.
95% CI: Y-bar +/- 1.96 * (s.e.)
[Figure: sampling distribution centered on Y-bar, with 2.5% in each tail beyond -1.96z and +1.96z; µ? lies somewhere along the axis]
Significance Test for a Mean
Slap a sampling distribution* over a guess of the population mean to determine if the sample has a very low probability of having come from a population where the guess is true, such as α-level = .05.
If our sample mean is in the outer 5%, we know to reject the guess: our sample has a low chance of having come from a population with the mean we guessed.
z or t = (Y-bar - µo) / s.e.
[Figure: sampling distribution centered on µo = guess, with 2.5% in each tail beyond -1.96z and +1.96z; Y-bar? lies somewhere along the axis]
*Sampling distribution: the way statistics for samples of a certain size would stack up or be distributed after all possible samples are collected.
Review
Let’s collect some data on educational aspirations and produce a 95% confidence interval to tell us
where the population parameter likely falls and then let’s do a test of significance where we
guess that average aspiration will be 16 years.
I collected a sample of 625 kids who reported their educational aspirations where 12 = high school, 16
equals 4 years of college and so forth. The average for the sample was 15 years with a
standard deviation of 2 years.
95% confidence interval: 95% CI = Sample Mean +/- z * s.e.
1. Find the standard error of the sampling distribution:
   s.e. = s/√n = 2/√625 = 2/25 = 0.08
2. Build the width of the interval. 95% corresponds with a z of +/- 1.96.
   +/- z * s.e. = 1.96 * 0.08 = 0.157
3. Insert the mean to build the interval:
   95% CI = Sample Mean +/- z * s.e. = 15 +/- 0.157
The interval: 14.84 to 15.16
We are 95% confident that the population mean falls between these values. (What does this say about my guess???)
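The three steps above can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (a z-based interval with 1.96 for 95%); the function name `mean_confidence_interval` is made up for this example and is not from any library.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) for a z-based confidence interval around a sample mean."""
    se = sd / math.sqrt(n)   # step 1: standard error = s / sqrt(n)
    margin = z * se          # step 2: half-width of the interval
    return mean - margin, mean + margin  # step 3: center it on the sample mean

# Educational aspirations sample from the slide: n = 625, mean = 15, s.d. = 2
lower, upper = mean_confidence_interval(mean=15, sd=2, n=625)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # prints "95% CI: 14.84 to 15.16"
```

Passing `z=2.58` instead would give the 99% interval.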
Review
Significance Test: z or t = (Y-bar - µo) / s.e.
1. Decide α-level (α = .05) and nature of test (two-tailed)
2. Set critical z or t: (+/- 1.96)
3. Make guess or null hypothesis,
   Ho: µ = 16
   Ha: µ ≠ 16
4. Collect and analyze data
5. Calculate z or t: z/t = (Y-bar - µo) / s.e. (s.e. = s/√n = 2/√625 = 2/25 = .08)
   z/t = (15 – 16)/.08 = -1/.08 = -12.5
6. Make a decision about the null hypothesis (reject the null: -12.5 < -1.96)
7. Find the P-value (look up 12.5 in z or t table). P < .0001
It is extremely unlikely that our sample came from a population where the mean is 16.
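As a rough check on steps 5 through 7, here is a small Python sketch. It assumes a z test (normal sampling distribution) and gets the two-tailed P-value from the standard library's `math.erfc` rather than from a printed table; the function name `z_test_one_mean` is hypothetical.

```python
import math

def z_test_one_mean(sample_mean, null_mean, sd, n):
    """Return (z, two-tailed p) for testing Ho: mu = null_mean with a z test."""
    se = sd / math.sqrt(n)                 # standard error of the mean
    z = (sample_mean - null_mean) / se     # how many s.e.'s the sample is from the guess
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p_value = z_test_one_mean(sample_mean=15, null_mean=16, sd=2, n=625)
print(f"z = {z:.1f}")                            # prints "z = -12.5"
print("reject the null:", abs(z) > 1.96)         # prints "reject the null: True"
print("P below .0001:", p_value < 0.0001)        # prints "P below .0001: True"
```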
Comparing Two Groups
We’re going to move forward to more
sophisticated statistics, building on what
we have learned about confidence
intervals and significance tests.
Sociologists look for relationships between
concepts in the social world.
For example:
Does one’s sex affect income?
Focus on the relationship between the
concepts: Sex and Income
Does one’s race affect educational
attainment?
Focus on the relationship between the
concepts: Race and Educational
Attainment
Comparing Two Groups
In this section of the course, you will learn ways to infer from a sample
whether two concepts are related in a population.
Independent variable (X): That which causes another variable to
change when it changes.
Dependent variable (Y): That which changes in response to change in
another variable.
XY
(X= Sex or Race)
(Y= Income or Education)
The statistical technique you use will depend on the level of
measurement of your independent and dependent variables: the
statistical test must match the variables!
Levels of Measurement: Nominal, Ordinal, Interval-Ratio
Comparing Two Groups
The test you choose depends on level of measurement:
Independent                     Dependent                       Statistical Test
Dichotomous                     Interval-ratio, Dichotomous     Independent Samples t-test
Nominal, Ordinal, Dichotomous   Nominal, Ordinal, Dichotomous   Cross Tabs
Nominal, Ordinal, Dichotomous   Interval-ratio, Dichotomous     ANOVA
Interval-ratio, Dichotomous     Interval-ratio                  Correlation and OLS Regression
Comparing Two Groups
Independent                     Dependent                       Statistical Test
Dichotomous                     Interval-ratio, Dichotomous     Independent Samples t-test
An independent samples t-test is concerned with whether a mean or proportion
is equal between two groups. For example, does sex affect income?
[Figure: ♀ Income, µ (women’s mean) = ??? ♂ Income, µ (men’s mean)]
Comparing Two Groups
Independent Samples t-tests:
Earlier, our focus was on the mean. We used the mean of
the sample (statistic) to infer a range for what our
population mean (parameter) might be (confidence
interval) or whether it was like some guess or not
(significance test).
Now, our focus is on the difference in the mean for two
groups. We will use the difference of the sample
(statistic) to infer a range for what our population
difference (parameter) might be (confidence interval) or
whether it is like some guess (significance test).
Comparing Two Groups
The difference will be calculated as such:
D-bar = Y-bar2 – Y-bar1
For example:
Average Difference in Income by Sex =
Male Average Income – Female Average Income
(What would it mean if men’s income minus
women’s income equaled zero?)
Comparing Two Groups
As with the mean, if one were to take random sample after random sample from
two groups and calculate and record the difference between groups each
time, one would see the formation of a Sampling Distribution for D-bar that
is normal and centered on the two populations’ difference.
[Figure: Sampling Distribution of D-bar, a normal curve over z = -3 to +3 with the 95% range marked, centered on the average difference between the two groups’ samples]
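The idea of a sampling distribution for D-bar can be tried out with a small simulation. The sketch below is only an illustration: the two populations (normal, with means 2.9 and 3.1 and standard deviations 0.5 and 0.4), the sample size of 50 per group, and the 2,000 repetitions are all made-up assumptions, and `simulate_d_bars` is a hypothetical helper name.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same draws every run

def simulate_d_bars(mu1, sd1, mu2, sd2, n, reps):
    """Draw `reps` pairs of random samples and record D-bar = Y-bar2 - Y-bar1 each time."""
    d_bars = []
    for _ in range(reps):
        sample1 = [random.gauss(mu1, sd1) for _ in range(n)]
        sample2 = [random.gauss(mu2, sd2) for _ in range(n)]
        d_bars.append(statistics.fmean(sample2) - statistics.fmean(sample1))
    return d_bars

# Hypothetical populations whose true difference is 3.1 - 2.9 = 0.2
d_bars = simulate_d_bars(mu1=2.9, sd1=0.5, mu2=3.1, sd2=0.4, n=50, reps=2000)

center = statistics.fmean(d_bars)   # should sit close to the true difference, 0.2
spread = statistics.stdev(d_bars)   # should sit close to sqrt(0.5**2/50 + 0.4**2/50)
print(f"center of D-bars: {center:.3f}")
print(f"s.d. of D-bars:   {spread:.3f}")
```

The recorded differences pile up around the populations' true difference, and their standard deviation is what the slides call the standard error of D-bar.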
Comparing Two Groups
It’s Power Time Again!
Using just a sample, our statistics will allow us to estimate the
difference between two groups in the population (confidence
interval) or to determine whether our sample could have come from
a population with a difference between two groups that we guessed
(significance test).
Comparing Two Groups
So the rules and techniques we learned for means and proportions apply to the
differences in groups’ means and proportions.
One creates sampling distributions to create confidence intervals and do
significance tests in the same ways.
However, the standard error of D-bar has to be calculated slightly differently.
For Means:
s.e. (s.d. of the sampling distribution) = √[ (s1)²/n1 + (s2)²/n2 ]
For Proportions:
s.e. = √[ π-hat1 (1 - π-hat1)/n1 + π-hat2 (1 - π-hat2)/n2 ]
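The two standard error formulas translate directly into code. This is a minimal sketch; the function names are invented for illustration, and the proportions are assumed to be sample proportions between 0 and 1.

```python
import math

def se_diff_means(s1, n1, s2, n2):
    """Standard error of D-bar for means: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference in proportions:
    sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Two groups of 50 with standard deviations 0.5 and 0.4
print(f"{se_diff_means(0.5, 50, 0.4, 50):.4f}")          # prints "0.0906"
# Two groups of 100, each with a sample proportion of 0.5 (made-up values)
print(f"{se_diff_proportions(0.5, 100, 0.5, 100):.4f}")  # prints "0.0707"
```

Everything else about building the interval or the test statistic stays the same; only the standard error changes with the kind of statistic.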
Comparing Two Groups
Calculating a Confidence Interval for the Difference between Two
Groups’ Means
By slapping the sampling distribution for the difference over our
sample’s difference between groups, D-bar, we can find the values
between which the population difference is likely to be.
95% C.I. = D-bar +/- 1.96 * (s.e.)
         = (Y-bar2 – Y-bar1) +/- 1.96 * (s.e.)
   Or    = (π-hat2 – π-hat1) +/- 1.96 * (s.e.)
99% C.I. = D-bar +/- 2.58 * (s.e.)
         = (Y-bar2 – Y-bar1) +/- 2.58 * (s.e.)
   Or    = (π-hat2 – π-hat1) +/- 2.58 * (s.e.)
Comparing Two Groups
EXAMPLE:
We want to estimate, with 95% confidence, the likely difference between male and female GPAs in a
population of college students.
Sample:
50 men, average GPA = 2.9, s.d. = 0.5
50 women, average GPA = 3.1, s.d. = 0.4
95% C.I. = (Y-bar2 – Y-bar1) +/- 1.96 * s.e., with men as group 1 and women as group 2
1. Find the standard error of the sampling distribution:
   s.e. = √[(.5)²/50 + (.4)²/50] = √(.005 + .003) = √.008 = 0.089
2. Build the width of the interval. 95% corresponds with a z or t of +/- 1.96.
   +/- z * s.e. = +/- 1.96 * 0.089 = +/- 0.174
3. Insert the mean difference to build the interval:
   95% C.I. = (Y-bar2 – Y-bar1) +/- 1.96 * s.e. = (3.1 - 2.9) +/- 0.174 = 0.2 +/- 0.174
The interval: 0.026 to 0.374
We are 95% confident that the difference between women’s and men’s GPAs in the population is
between 0.026 and 0.374.
If we had guessed zero difference, would the difference be a significant difference?
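The GPA interval can be reproduced with a short Python sketch. It assumes the z-based interval used on the slide, with men as group 1 and women as group 2; `diff_means_ci` is a hypothetical function name. Because the code does not round the intermediate terms, its standard error is about 0.0906 rather than the slide's 0.089, so the interval comes out slightly wider.

```python
import math

def diff_means_ci(mean1, s1, n1, mean2, s2, n2, z=1.96):
    """Return (lower, upper) for a z-based CI around D-bar = mean2 - mean1."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of D-bar
    d_bar = mean2 - mean1                    # sample difference between groups
    margin = z * se                          # half-width of the interval
    return d_bar - margin, d_bar + margin

# Group 1 = men (2.9, s.d. 0.5, n 50); group 2 = women (3.1, s.d. 0.4, n 50)
lower, upper = diff_means_ci(2.9, 0.5, 50, 3.1, 0.4, 50)
print(f"95% CI: {lower:.3f} to {upper:.3f}")  # prints "95% CI: 0.023 to 0.377"
print("interval excludes zero:", lower > 0)   # prints "interval excludes zero: True"
```

An interval that excludes zero answers the question on the slide: a guess of zero difference falls outside the range of plausible population differences.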
Comparing Two Groups
Conducting a Test of Significance for the Difference between Two Groups’
Means
By slapping the sampling distribution for the difference over a guess of the
difference between groups, Ho, we can find out whether our sample could
have been drawn from a population where the difference is equal to our
guess.
1. Two-tailed significance test for α-level = .05
2. Critical z or t = +/- 1.96
3. To find if there is a difference in the population,
   Ho: µ2 - µ1 = 0
   Ha: µ2 - µ1 ≠ 0
4. Collect Data
5. Calculate z or t: z or t = [(Y-bar2 – Y-bar1) – (µ2 - µ1)] / s.e.
6. Make decision about the null hypothesis (reject or fail to reject)
7. Report P-value
Comparing Two Groups
EXAMPLE:
We want to know whether there is a difference in male and female GPAs in a population of college
students.
1. Two-tailed significance test for α-level = .05
2. Critical z or t = +/- 1.96
3. To find if there is a difference in the population,
   Ho: µ2 - µ1 = 0
   Ha: µ2 - µ1 ≠ 0
4. Collect Data
   Sample: 50 men, average GPA = 2.9, s.d. = 0.5
           50 women, average GPA = 3.1, s.d. = 0.4
   s.e. = √[(.5)²/50 + (.4)²/50] = √(.005 + .003) = √.008 = 0.089
5. Calculate z or t: z or t = (3.1 – 2.9 – 0)/0.089 = 0.2/0.089 = 2.25
6. Make decision about the null hypothesis: Reject the null. There is enough difference between
   groups in our sample to say that there is a difference in the population. 2.25 > 1.96
7. Find P-value: p or (sig.) = .0244 (two-tailed; .0122 in each tail)
8. If there were no difference between men and women in the population, a sample difference at
   least this large would turn up only about 2.4% of the time. That chance is low enough to
   reject the null at the .05 level.
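The same test can be sketched in Python. This assumes a z test with a two-tailed P-value taken from the normal distribution via `math.erfc`; `z_test_two_means` is a hypothetical function name. The code keeps full precision in the standard error, so it gives z of about 2.21 instead of the slide's 2.25, and a two-tailed P-value a little under .03.

```python
import math

def z_test_two_means(mean1, s1, n1, mean2, s2, n2, null_diff=0.0):
    """Return (z, two-tailed p) for Ho: mu2 - mu1 = null_diff."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of D-bar
    z = ((mean2 - mean1) - null_diff) / se    # distance from the guess in s.e. units
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Group 1 = men, group 2 = women, as in the example
z, p_value = z_test_two_means(2.9, 0.5, 50, 3.1, 0.4, 50)
print(f"z = {z:.2f}")                          # prints "z = 2.21"
print("reject the null:", abs(z) > 1.96)       # prints "reject the null: True"
print("P below alpha (.05):", p_value < 0.05)  # prints "P below alpha (.05): True"
```

The decision matches the slide either way: the test statistic is beyond the critical value of 1.96, so the null of no difference is rejected.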
Comparing Two Groups
The steps outlined above for confidence intervals and significance tests
for differences in means are the same ones you would use for differences
in proportions.
Just note the difference in calculation of the standard error for the
difference.
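To see how little changes for proportions, here is a sketch of a 95% confidence interval for a difference in proportions. The sample proportions (0.40 and 0.50) and group sizes (200 each) are hypothetical numbers chosen only for illustration, not data from these slides, and `diff_props_ci` is a made-up function name.

```python
import math

def diff_props_ci(p1, n1, p2, n2, z=1.96):
    """Return (lower, upper) for a z-based CI around the difference p2 - p1."""
    # Only this line differs from the means case: the s.e. formula for proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    margin = z * se
    return diff - margin, diff + margin

# Hypothetical samples: 40% of 200 in group 1, 50% of 200 in group 2
lower, upper = diff_props_ci(p1=0.40, n1=200, p2=0.50, n2=200)
print(f"95% CI: {lower:.3f} to {upper:.3f}")  # prints "95% CI: 0.003 to 0.197"
```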
Comparing Two Groups
• Now let’s do an example with SPSS, using
the General Social Survey.