Lecture 6
Hypothesis Tests Applied to Means I
Dog Colors

Observed counts (rows = Judge 1, columns = Judge 2):

Observed    Green   Red   Blue   Total
Green          10     1      3      14
Red             2     5      2       9
Blue            0     1      9      10
Total          12     7     14      33

Sum (Agree) = 10 + 5 + 9 = 24
% Agree = 24/33 = 0.727273

Expected counts for the agreement (diagonal) cells, each computed as
row total × column total / N:

Green/Green:  14 × 12 / 33 = 5.090909
Red/Red:       9 ×  7 / 33 = 1.909091
Blue/Blue:    10 × 14 / 33 = 4.242424

Sum (Expected) = 11.24242

kappa (κ) = (Sum Agree − Sum Expected) / (N − Sum Expected)
          = (24 − 11.24242) / (33 − 11.24242) = 0.586351
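A minimal R sketch of this computation, using the observed table above
(base R only; the object names are mine):

# Observed agreement matrix: rows = Judge 1, columns = Judge 2
obs <- matrix(c(10, 1, 3,
                 2, 5, 2,
                 0, 1, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(Judge1 = c("Green", "Red", "Blue"),
                              Judge2 = c("Green", "Red", "Blue")))

N        <- sum(obs)                               # 33 pairs of ratings
agree    <- sum(diag(obs))                         # 24 agreements
expected <- outer(rowSums(obs), colSums(obs)) / N  # chance-expected counts

agree / N                                          # % Agree = 0.727273
(agree - sum(diag(expected))) / (N - sum(diag(expected)))  # kappa = 0.586351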
Hypothesis Tests Applied to Means

Recall what you learned about sampling distributions in Chapter 4:

Sampling distribution: the distribution of the values of a particular statistic
over a very large number of repeated samplings of equal sample size, each
taken from the same population.

Sample statistics: describe characteristics of a sample.

Standard error: the standard deviation of a sampling distribution.
Example:

Descriptive Statistics

                              N       Mean     Std. Error   Std. Deviation   Variance
READING STANDARDIZED SCORE   271   51.82690     .575315        9.470882       89.698
Valid N (listwise)           271
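Note how the columns relate: the standard error is the standard deviation
divided by the square root of N (9.470882 / √271 = 9.470882 / 16.462 = .575315),
and the variance is the standard deviation squared (9.470882² ≈ 89.698).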
Test statistics: describe differences or similarities between samples and allow us
to make inferences about their respective populations.
*As an observed statistic's value falls farther and farther from the center of
this distribution, it becomes less and less plausible that the sample came
from the hypothetical population that the sampling distribution represents.
This is the conceptual framework for hypothesis testing.
Recall the steps in the hypothesis testing process:

1. Generate a research hypothesis: a theory-based prediction.
2. State a null hypothesis (H0), one that, based on our theory, we believe
   to be incorrect. That is, pretend that the data were drawn from a
   population with known and uninteresting characteristics. The alternative
   hypothesis (HA) is the logical converse of the null hypothesis.
3. Obtain the sampling distribution of the statistic assuming that the null
   hypothesis is true.
4. Gather data.
5. Calculate the probability of obtaining a statistic as extreme as or more
   extreme than the one observed, based on that sampling distribution.
6. Decide whether the observed probability is too remote to support the
   null hypothesis. If it is, reject the null and support your theory.
7. Substantively interpret your results.
Also recall that the decision can have several potential outcomes:

                                    Truth
Decision            H0 True                     H0 False
Reject H0           Type I error (α)            Power (1 − β)
Retain H0           Correct decision (1 − α)    Type II error (β)

And recall that a p-value indicates the probability of obtaining the observed
statistic value, or one more extreme, assuming that the null hypothesis is
true (as opposed to alpha (α), which dictates the size of the rejection region
based on the researcher's judgment).
Sampling Distribution of the Mean

One of the most interesting sampling distributions is the sampling
distribution of the mean: the distribution of sample means created by
repeatedly drawing random, equal-sized samples from a population. The
characteristics of this distribution are summarized in the central limit
theorem:

   Given a population with mean μ_X and variance σ²_X, the sampling
   distribution of the mean (the distribution of sample means) will have a
   mean equal to μ_X (i.e., μ_X̄ = μ_X), a variance (σ²_X̄) equal to σ²_X / N,
   and a standard deviation (σ_X̄) equal to σ_X / √N. The distribution will
   approach the normal distribution as N, the sample size, increases.
In English…

Suppose you have a population, and you know the mean (μ_X) and variance
(σ²_X) of that population (recall that we almost never know these parameters).

Now suppose that you collect a very large number of random samples from that
population, each of size N, and compute the mean of each sample. Now you
have a distribution of sample means: the sampling distribution of the mean.
Note that you'd get a slightly different sampling distribution if you selected
a different N.
The mean of the sampling distribution of the mean (μ_X̄) equals the parameter
that you are estimating (μ_X). In addition, the standard deviation of the
sampling distribution of the mean (σ_X̄, a.k.a. the standard error of the mean)
equals the population standard deviation divided by the square root of the
sample size (σ_X̄ = σ_X / √N).

In addition, the sampling distribution will be approximately normally
distributed when the sample size is large.
In R…

To demonstrate the central limit theorem, let's take repeated random draws
from a normal population using several different sample sizes:
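A minimal sketch (the population parameters, sample sizes, and number of
replications are arbitrary choices for illustration):

set.seed(1)       # make the simulation reproducible
mu    <- 50       # hypothetical population mean
sigma <- 10       # hypothetical population SD

for (n in c(5, 30, 100)) {
  # draw 10,000 samples of size n; keep each sample's mean
  means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
  cat(sprintf("N = %3d: mean of means = %.3f, SD of means = %.3f, sigma/sqrt(N) = %.3f\n",
              n, mean(means), sd(means), sigma / sqrt(n)))
  hist(means, main = paste("Sampling distribution of the mean, N =", n))
}

The standard deviation of the sample means should track σ_X / √N, and the
histograms should look narrower and more normal as N grows.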
Think about the following questions:

• What is the value of the mean of a sample of N = the entire population?
• What is the shape of the sampling distribution of the mean when N = the
  entire population?
• What is the standard deviation of the sampling distribution of the mean
  when N = the entire population?
Revisiting the Z Test

Recall that the z-test is an inferential test that allows us to perform a
hypothesis test in situations in which we would like to determine whether
the mean of an observed sample could have come from a population with a
known population mean (μ_X) and standard deviation (σ_X).

Recall that you can standardize scores via: z = (X − μ_X) / σ_X

Also, recall the following about the sampling distribution of the mean:

• It has a mean equal to μ_X, the population mean.
• It has a standard deviation (the standard error of the mean) equal to
  σ_X̄ = σ_X / √N.
• It is normally distributed when the sample size, N, is large.
We can use this information within the hypothesis testing framework in the
following way:

1. Determine which test statistic is required for your problem and data.
   *The z-test is relevant when you want to compare the observed mean of
   a quantitative variable to a hypothetical population mean (theory-based)
   and you know the variance of the population.
2. State your research hypothesis: that the observed mean does not come
   from the population described by your theory.
3. State the alternative hypothesis: that the observed mean is not equal to
   the hypothetical mean (i.e., HA: μ_X ≠ μ_0, or the appropriate one-tailed
   alternative, such as μ_X > μ_0).
4. State the null hypothesis: that the observed mean equals the hypothetical
   mean (i.e., H0: μ_X = μ_0, or the appropriate one-tailed alternative,
   such as μ_X ≤ μ_0).
5. Determine the critical value for your test based on your desired α level.
6. Compute your observed z-test statistic.

   First, identify the location and dispersion of the relevant sampling
   distribution of the mean. The location is dictated by the hypothetical
   population mean (μ_0). The dispersion equals the known population
   standard deviation divided by the square root of the sample size:
   σ_X̄ = σ_X / √N.

   Second, turn the observed sample mean into a z-score from the sampling
   distribution of the mean:

   z = (X̄ − μ_0) / σ_X̄ = (X̄ − μ_0) / (σ_X / √N)

7. Compare the observed z-test statistic value to your critical value and
   make a decision to reject or retain your null hypothesis.
8. Make a substantive interpretation of your test results. (A small R sketch
   of these computations follows this list.)
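Here is that sketch; the function and argument names are my own, not from
the slides:

one_sample_z <- function(xbar, mu0, sigma, n, alpha = 0.05,
                         alternative = "greater") {
  se <- sigma / sqrt(n)                  # standard error of the mean
  z  <- (xbar - mu0) / se                # observed z statistic
  p  <- switch(alternative,
               greater   = pnorm(z, lower.tail = FALSE),
               less      = pnorm(z),
               two.sided = 2 * pnorm(abs(z), lower.tail = FALSE))
  tail_prob <- if (alternative == "two.sided") alpha / 2 else alpha
  list(z = z, p = p, critical.value = qnorm(1 - tail_prob))
}

For the GRE example on the next slide, one_sample_z(565, 500, 100, 300)
gives z ≈ 11.26, a p-value far below .0001, and a critical value of 1.645.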
[Example]

Suppose we want to compare the mean GRE score of graduate students at Loyola
University Chicago to the GRE test-taking population.

We know the mean and standard deviation of that population: 500 and 100,
respectively. Suppose the mean GRE score at our school is 565, based on 300
graduate students last year.

Of course, we'd like to believe that our graduate students are more
academically able than the average graduate student; that is our research
hypothesis. It means we'll use a one-tailed test, so that H0: μ_X ≤ μ_0 and
HA: μ_X > μ_0.

If we adopt α = .05, then our one-tailed critical value (the value to exceed)
equals 1.65 (from the z-table).

We compute our observed z statistic by plugging our known values into the
equation:

z = (X̄ − μ_0) / σ_X̄ = (565 − 500) / (100 / √300) = 65 / 5.77 = 11.27
The z-test statistic (11.27) is clearly larger than the critical value (1.65):
the observed difference between the sample mean and the population mean is
much larger than would be expected due to sampling error. In fact, the p-value
for the observed statistic is less than .0001.
We would interpret this substantively with a paragraph something like this:
The mean GRE score for graduate students at Loyola University Chicago (565) is
considerably larger than the mean for the GRE testing population (500). This
difference is statistically significant (z = 11.27, p < .0001).
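Checking the arithmetic and the p-value in R (numbers from the example above):

z <- (565 - 500) / (100 / sqrt(300))   # 11.258 (the slide's 11.27 rounds the SE to 5.77)
pnorm(z, lower.tail = FALSE)           # one-tailed p, on the order of 1e-29, i.e. < .0001
qnorm(0.95)                            # one-tailed critical value, 1.645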
Graphically, here's what we did:

[Figure: the sampling distribution of the mean under H0, centered at
μ_0 = μ_X̄ = 500 with σ_X̄ = 5.77. The α = .05 rejection region begins at
z_cv = 1.65, i.e., GRE_cv = 509.52. The observed mean X̄ = 565 (z = 11.27,
p < .0001) falls far inside the rejection region.]
One-Sample t Test

The z-test is only useful in somewhat contrived situations: we hardly ever
know the value of the population standard deviation, so we can't compute the
standard error of the mean.

We need a different statistic to apply to most real-world situations. The
appropriate statistical test is the one-sample t-test.

Recall the formula for a z-test:

z = (X̄ − μ_0) / σ_X̄ = (X̄ − μ_0) / (σ_X / √N)

It relies on the sampling distribution of the mean. We can create a parallel
statistic using the sample variance rather than the population variance:

t = (X̄ − μ_0) / s_X̄ = (X̄ − μ_0) / (s_X / √N)
We use a slightly different probability density function for the t-test than we do
for the z-test, because we now use the sample variance as an estimate of the
population variance. Specifically, we rely on the Student’s t distribution for
the t-test.
The feature that differentiates the various t distributions is the degrees of
freedom associated with the test statistic.
The degrees of freedom for a t-test relates to the number of data points that are
free to vary when calculating the variance. Hence, the degrees of freedom
for the t-test equals N – 1, and there is a separate probability distribution for
each number of degrees of freedom—recall that there was a single
probability distribution for the z-test. The lost degree of freedom is
attributed to the fact that the variance is based on the sum of the squared
deviations of observations from the mean of the distribution. Because the
deviations must sum to zero, one of the data values is not free to vary—one
degree of freedom is lost.
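A quick R illustration of that constraint (any data will do):

x <- rnorm(10)      # ten arbitrary observations
d <- x - mean(x)    # deviations from the sample mean
sum(d)              # essentially 0 (up to floating-point error)
# Given the mean and any nine of the deviations, the tenth is determined,
# so only N - 1 = 9 of them are free to vary.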
Let's apply the one-sample t-test to the GRE data. We'd still like to believe
that our graduate students are more academically able than the average
graduate student (i.e., HA: μ_X > μ_0, with H0: μ_X ≤ μ_0), as in the z-test
example, but in this case we don't know the value of the population variance.

The mean GRE score is still 565, based on the same 300 graduate students, and
the sample standard deviation equals 75.

Our degrees of freedom equal 299 for this test (df = 300 − 1). In the t table
(p. 682 of the text), using the column where the "Level of Significance for
One-Tailed Test" equals 0.05 and the row where df = ∞ (since 299 is far beyond
the largest finite tabled value, 100), the critical value for this test at the
α = 0.05 level is 1.645. Our observed t statistic is:

t = (X̄ − μ_0) / s_X̄ = (565 − 500) / (75 / √300) = 65 / 4.33 = 15.01

Since our test statistic (t = 15.01) is larger than the critical t value
(t_cv = 1.645), our decision and interpretation are the same as they were
when we knew the population variance.
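The same computation in R from the summary statistics (a sketch; the numbers
come from above):

t_stat <- (565 - 500) / (75 / sqrt(300))   # 15.011
pt(t_stat, df = 299, lower.tail = FALSE)   # one-tailed p, effectively 0
qt(0.95, df = 299)                         # exact critical value, 1.650
# The table's df = infinity row gives 1.645; with df = 299 the exact
# critical value is 1.650, which changes nothing here.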
SPSS Example:

Go to "Analyze" → "Compare Means" → "One-Sample T Test".

H0: μ_reading = 50

One-Sample Statistics

                              N       Mean     Std. Deviation   Std. Error Mean
READING STANDARDIZED SCORE   271   51.82690       9.470882          .575315

One-Sample Test (Test Value = 50)

                               t     df   Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
READING STANDARDIZED SCORE   3.175   270        .002            1.826904         .69423        2.95958
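The equivalent test in R, assuming the scores live in a numeric vector named
reading (a hypothetical name):

t.test(reading, mu = 50)
# Reports t = 3.175 on 270 df, two-tailed p = .002, and the 95% CI for
# the mean difference, matching the SPSS output above.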
Two Matched-Samples t Test

A more common comparison in research is one in which two samples are
compared to determine whether there is a larger-than-sampling-error
difference between the means of the groups.

There are two common strategies for constructing groups in experimental
research. One involves assigning individuals to groups via randomization
(although this is not strictly necessary). The other involves matching
individuals and assigning one member of each matched pair to each group.
Because the two matched-samples t-test (a.k.a. the two dependent-samples
t-test) is a less complex extension of the one-sample t-test, we'll discuss
it first.
But first, recall the reasons why we might do matching and the two most
common methods of matching.
We typically match cases because there are extraneous variables that are strongly
related to the outcome variable and we want to make sure that observed
differences between the groups on the dependent variable cannot be
attributed to group differences with respect to these extraneous variables.
For example, we may want to ensure that groups are equivalent on SES. We may
match samples by pairing individuals on levels of the extraneous variable
(matched samples), or we may expose the same individual to multiple
conditions (repeated measures).
The groups are compared by examining the difference between the two members
of each pair of individuals. The relevant statistic is the average difference
score (note that N is the number of pairs):

D̄ = Σ(X_1i − X_2i) / N,  summing over the i = 1, …, N pairs.

Although our null, theory-based value for the magnitude of this difference
can be any value, we typically are interested in determining whether the
difference is non-zero. Hence, we state our null hypothesis to be that the
mean difference in the population equals zero (i.e., H0: μ_D = 0):

μ_D = μ_1 − μ_2 = 0 = μ_D0

Formulating the null hypothesis in this way allows us to use a variation of
the one-sample t-test to make the comparison:

t = (D̄ − μ_D0) / s_D̄ = (D̄ − μ_D0) / (s_D / √N)
As an example, consider data from a study of the reading interests of 18 pairs
of college-educated husbands and wives. Each individual in the sample was
interviewed and asked how many books he/she had completed in the past year.
The research question is: do males and females who come from similar
environments engage in similar levels of reading? This implies a two-tailed
null hypothesis (H0: μ_D = 0) and the corresponding alternative hypothesis
(HA: μ_D ≠ 0).

Our degrees of freedom equal 17, so our two-tailed critical value using
α = .05 is 2.11. The mean and standard deviation of the differences in the
sample were 1.16 and 2.88, respectively. So, our t-statistic is:

t = (D̄ − μ_D0) / s_D̄ = (1.16 − 0) / (2.88 / √18) = 1.16 / 0.6788 = 1.71

Because the observed t-statistic is not more extreme than the critical value,
we retain the null hypothesis. That is, we do not have evidence that men and
women read different amounts. Incidentally, the p-value for the observed t
statistic equals .11.
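Checking this in R from the summary statistics (numbers from the example
above):

t_stat <- (1.16 - 0) / (2.88 / sqrt(18))           # 1.709
2 * pt(abs(t_stat), df = 17, lower.tail = FALSE)   # two-tailed p, about .11
qt(0.975, df = 17)                                 # two-tailed critical value, 2.11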
SPSS Example:

Go to "Analyze" → "Compare Means" → "Paired-Samples T Test".

Difference: (Reading Score − Math Score)

H0: μ_D = 0

Paired Samples Statistics

                                       Mean       N    Std. Deviation   Std. Error Mean
Pair 1  READING STANDARDIZED SCORE   51.87816    270      9.450733          .575153
        MATHEMATICS STANDARDIZED     51.71431    270     10.083413          .613657
        SCORE

Paired Samples Correlations

                                          N    Correlation   Sig.
Pair 1  READING STANDARDIZED SCORE &     270      .714       .000
        MATHEMATICS STANDARDIZED SCORE
SPSS Example (cont'd):

H0: μ_D = 0

Paired Samples Test

                                        Paired Differences
                                 Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper     t     df   Sig. (2-tailed)
Pair 1  READING STANDARDIZED   .163848      7.406301          .450733         -.723565       1.051261      .364   269       .717
        SCORE - MATHEMATICS
        STANDARDIZED SCORE

Another way to perform the same analysis: calculate the differences between
pairs and form a new variable (here called "diff") using
"Transform" → "Compute", then run a one-sample t-test on it.
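Both routes are one line each in R, again assuming vectors reading and math
of equal length (hypothetical names):

t.test(reading, math, paired = TRUE)   # paired-samples t-test directly
t.test(reading - math, mu = 0)         # or a one-sample t-test on the differences
# Both report t = .364 on 269 df, two-tailed p = .717, matching SPSS.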
The outputs for this analysis:

One-Sample Statistics

        N     Mean    Std. Deviation   Std. Error Mean
diff   270   .1638       7.40630           .45073

One-Sample Test (Test Value = 0)

         t     df   Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
diff   .364   269        .717             .16385           -.7236         1.0513

Compare them to the previous output:

Paired Samples Test

                                        Paired Differences
                                 Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper     t     df   Sig. (2-tailed)
Pair 1  READING STANDARDIZED   .163848      7.406301          .450733         -.723565       1.051261      .364   269       .717
        SCORE - MATHEMATICS
        STANDARDIZED SCORE

The results are identical.
Try the following questions in our text:

p. 206: 7.6, 7.7, 7.10, 7.13
p. 207: 7.16, 7.17, 7.18