Sampling Distributions (Chapter 4)

STAT 305: Chapter 9 – Sampling Distribution of the Sample Mean (Ȳ)
Spring 2014
Introduction:
When we take a sample of size n from a population and calculate summary statistics
like the sample mean (ȳ), the sample median (m), the sample variance (s²), the
sample standard deviation (s), or the sample proportion (π̂), we must realize that
these quantities will
__________________________________________ and hence are themselves
___________________.
Any random variable in statistics has a probability distribution. We have been
talking about three common probability distributions in statistics. When Y = # of
“successes” in n independent trials we used the binomial distribution to talk about
Y probabilistically. When Y was continuous and had an approximate bell-shaped
distribution we used the normal distribution to calculate probabilities and
quantiles associated with Y.
Because the summary statistics discussed above are random variables, they also
have a probability distribution that determines the likelihood of certain values of
these statistics being obtained. The distribution of a summary statistic, e.g. the
sample mean (ȳ), is
called the ______________________________________.
In this handout we explore the sampling distribution of the sample mean (Ȳ).
Sampling Distribution of Ȳ
The sample mean (Ȳ) is a random quantity that varies from sample to sample. The
probability distribution the sample mean follows is called the sampling distribution of Ȳ.
The sampling distribution demo I showed in class is found at the following web address:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
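The idea that Ȳ varies from sample to sample can also be explored with a short simulation. The sketch below is only an illustration under stated assumptions: the population (an exponential distribution with mean 1 and standard deviation 1), the number of repeated samples, and the seed are all hypothetical choices, not taken from the class demo. It draws many samples of size n = 25, records each sample mean, and compares the spread of those means with σ/√n.

```python
import random
import statistics

random.seed(305)  # fixed seed so the run is repeatable (arbitrary choice)

N_SAMPLES = 5000    # number of repeated samples (assumed for illustration)
SAMPLE_SIZE = 25    # n, the size of each sample

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
POP_MEAN = 1.0
POP_SD = 1.0

# Draw repeated samples and record the sample mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)   # center of the simulated sampling distribution
sd_of_means = statistics.stdev(sample_means)     # spread of the simulated sampling distribution
theoretical_se = POP_SD / SAMPLE_SIZE ** 0.5     # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

Changing SAMPLE_SIZE and rerunning shows how the spread of the sample means shrinks as n grows.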
The Central Limit Theorem (CLT) for the Sample Mean tells us
about the sampling distribution of the sample mean (Ȳ). The CLT for the
sample mean Ȳ says the following:
1.
2.
3. The sampling distribution will be ___________ if either of the two conditions
below is met:
•
or if
•
We now consider applications of the central limit theorem (CLT).
Applications of the CLT to Decision Making
Example 9.1: Cholesterol levels of adult males (50-60 yrs. old)
The mean blood cholesterol level of adult males (50-60 yrs. old) is 200 mg/dl with
a standard deviation of 𝜎 = 40 mg/dl. Assume also that blood cholesterol levels
are approximately normally distributed in this population.
a) What is the probability that a sample of size n = 25 would yield a
sample mean greater than 225 mg/dl?
b) Give a range of values within which we would expect the sample mean to fall
approximately 95% of the time.
c) Suppose we took a sample of adult males between the ages of 50 – 60 who are also
strict vegetarians and obtained a sample mean of ȳ = 188 mg/dl. Does this provide
evidence that the subpopulation of vegetarians has a lower mean cholesterol level
than the greater population of men in this age group? Explain.
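The calculations behind parts (a) and (b) can be sketched in a few lines of Python. This is a minimal illustration, assuming the sampling distribution of Ȳ is normal with mean 200 and standard error σ/√n = 40/√25, as the example's normality assumption implies; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu = 200.0     # population mean (mg/dl), from the example
sigma = 40.0   # population standard deviation (mg/dl), from the example
n = 25         # sample size, from part (a)

se = sigma / sqrt(n)                        # standard error of the sample mean
ybar_dist = NormalDist(mu=mu, sigma=se)     # sampling distribution of the sample mean

# Part (a): P(sample mean > 225)
p_above_225 = 1 - ybar_dist.cdf(225)

# Part (b): central 95% range, mu +/- z * SE with z close to 1.96
z = NormalDist().inv_cdf(0.975)
low, high = mu - z * se, mu + z * se

print(f"SE = {se:.1f} mg/dl")
print(f"P(sample mean > 225) = {p_above_225:.5f}")
print(f"95% range for the sample mean: {low:.2f} to {high:.2f} mg/dl")
```

The same `ybar_dist.cdf` call could be applied to 188 mg/dl for part (c), provided the vegetarian sample size is known; the handout does not state it, so it is left out here.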
Example 9.2: Mercury Levels Found in Boulder Reservoir Walleyes
Fish consumption guidelines suggest you should limit the number of fish you eat
with Hg levels above .25 ppm. Is there evidence to suggest that walleyes from
Boulder Reservoir have a mean Hg content exceeding .25 ppm?
Confidence Intervals for the Population Mean (𝝁)
Example 9.3: Suppose we are trying to estimate the mean protein content of
zebra mussels, which are becoming an increased part of the diet for ducks on the
Mississippi River. A sample of n = 25 zebra mussels is analyzed for
protein content, yielding a sample mean of ȳ = 9.14 units.
This is called a _____________________ for the population mean (𝜇) because it
yields a single value for this unknown quantity.
A better estimate might be 9.14 give or take _____ units, i.e. ______ up to
_______.
This is called an __________________________ as it gives a range or interval of
plausible values for the population mean (𝜇).
How do we know if this is a good interval estimate? __________________
What properties should a good interval estimate have?
• It
•
The central limit theorem states that if our sample size (n) is sufficiently large,
then $\bar{Y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, which also implies that after standardizing,
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
This means that when we collect our data the probability our observed sample
mean will fall within two standard errors of the mean is approximately .95 or a
95% chance, or being more precise we could use ± 1.96 standard errors because
$$P(-1.96 < Z < 1.96) = .9500$$
Which gives,
$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{Y} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .9500$$
For a 99% chance we use _______ and for 90% we use ________ in place of 1.96.
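Multipliers like 1.96 come from quantiles of the standard normal distribution. As a hedged sketch (the helper name `z_multiplier` is my own, not from the handout), the value for any confidence level can be computed by putting half of the leftover probability in each tail:

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Return z such that P(-z < Z < z) equals the given confidence level."""
    upper_tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} chance: use z = {z_multiplier(level):.3f}")
```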
Starting with the statement,
$$P(-1.96 < Z < 1.96) = P\left(-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .9500$$
we will perform algebraic manipulations to isolate the population mean 𝜇 in the
middle of this inequality instead. By doing this we will obtain an interval that
has a 95% chance of covering the true population mean (𝜇).
Algebraic manipulations of the inequality above:
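One way the steps might be written out is sketched below. It uses only the standardized inequality from the statement above; each line is equivalent to the one before it, so the probability stays at .9500 throughout.

```latex
\begin{aligned}
-1.96 &< \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96
  && \text{(starting inequality)} \\
-1.96\,\frac{\sigma}{\sqrt{n}} &< \bar{Y} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply through by } \sigma/\sqrt{n} > 0\text{)} \\
-\bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}} &< -\mu < -\bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(subtract } \bar{Y}\text{)} \\
\bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}} &> \mu > \bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } -1\text{, reversing the inequalities)} \\
\bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}} &< \mu < \bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}}
  && \text{(rewrite in increasing order)}
\end{aligned}
```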
This says that the interval from $\bar{Y} - 1.96 \cdot \frac{\sigma}{\sqrt{n}}$ up to $\bar{Y} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$ has a 95% chance of
covering the true population mean 𝜇. This interval is simply the sample mean plus or
minus roughly two standard errors. However, this interval cannot be calculated in
practice! WHY?
A “simple fix” to this would be to replace ____ by the estimated standard deviation from
our data _____. The problem with our “simple fix” is that the distribution of
$\frac{\bar{Y} - \mu}{s/\sqrt{n}}$ is
not standard normal, i.e. N(0,1), and therefore the 1.96 value will not necessarily produce
the desired level of confidence.
FACT:
If the population we are sampling from is approximately
normal, then
$\frac{\bar{Y} - \mu}{s/\sqrt{n}}$ has a t-distribution with degrees of freedom df = n – 1.
What does a t-distribution look like?
Facts about the t-distribution:
•
•
•
Examples: Using the t-table to find confidence intervals
a) n = 20 and 95% confidence t =
b) n = 20 and 99% confidence t =
c) n = 50 and 90% confidence t =
d) n = 10 and 95% confidence t =
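Table lookups like these can be checked numerically. The sketch below is one possible approach using only the Python standard library: it integrates the t density with Simpson's rule and then searches for the quantile by bisection. The function names, step count, and search bounds are my own choices for illustration; statistical packages (for example SciPy's `scipy.stats.t.ppf`) provide this directly.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson's-rule area from 0 to x."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, n):
    """t multiplier for a confidence interval based on a sample of size n."""
    df = n - 1
    target = 1 - (1 - confidence) / 2   # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0                 # assumed search bounds
    for _ in range(60):                 # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for label, n, conf in [("a", 20, 0.95), ("b", 20, 0.99), ("c", 50, 0.90), ("d", 10, 0.95)]:
    print(f"{label}) n = {n}, {conf:.0%} confidence: t = {t_critical(conf, n):.3f}")
```

Printed t-tables often skip some degrees of freedom (such as df = 49), in which case the nearest listed row is typically used instead.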
The basic form of most confidence intervals is:
(estimate) ± (table value)(SE of estimate)
where the product (table value)(SE of estimate) is the MARGIN OF ERROR.
General Form for a Confidence Interval for the Mean
For the population mean we have,
$$\bar{Y} \pm (t\text{-table value}) \cdot SE(\bar{Y}) \quad \text{or} \quad \bar{Y} \pm t\,\frac{s}{\sqrt{n}}$$
The appropriate columns in the t-distribution table for the different confidence levels are
as follows:
90% Confidence look in the .05 column (if n is “large” we can use 1.645)
95% Confidence look in the .025 column (if n is “large” we can use 1.960)
99% Confidence look in the .005 column (if n is “large” we can use 2.576)
Example 9.3 (cont’d):
Suppose we are trying to estimate the mean protein content of zebra mussels, which are
becoming an increased part of the diet for ducks on the Mississippi River. A sample of n
= 25 zebra mussels is analyzed for protein content, yielding a sample mean of ȳ = 9.14
units with a sample standard deviation of s = 2.98 units.
a) Use this information to find a 95% CI for the mean protein content found in the tissues
of zebra mussels, assuming that protein content of zebra mussels has a normal
distribution.
Suppose a sample of n = 25 freshwater clams was obtained and a similar protein analysis
was conducted, resulting in a sample mean of ȳ = 26.66 units with a standard deviation of
s = 12.12 units.
b) Find a 95% confidence interval for the mean protein content found in the tissue of
freshwater clams.
c) Does this interval in conjunction with the interval obtained for zebra mussels provide
evidence that freshwater clams are richer in protein than zebra mussels?
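The interval arithmetic for parts (a) and (b) can be sketched as follows. This is an illustration only: the helper name `t_interval` is hypothetical, and the multiplier 2.064 is the standard t-table value for df = 24 in the .025 column, hard-coded here rather than computed.

```python
from math import sqrt

# t-table value for df = n - 1 = 24, .025 column (95% confidence).
T_DF24_95 = 2.064

def t_interval(ybar, s, n, t_value):
    """Return (lower, upper) for ybar +/- t * s / sqrt(n)."""
    margin = t_value * s / sqrt(n)   # margin of error
    return ybar - margin, ybar + margin

# Summary statistics as given in Example 9.3.
mussels = t_interval(ybar=9.14, s=2.98, n=25, t_value=T_DF24_95)
clams = t_interval(ybar=26.66, s=12.12, n=25, t_value=T_DF24_95)

print(f"zebra mussels:    {mussels[0]:.2f} to {mussels[1]:.2f} units")
print(f"freshwater clams: {clams[0]:.2f} to {clams[1]:.2f} units")
```

For part (c), comparing the upper limit for mussels with the lower limit for clams shows whether the two intervals overlap, which is one informal way to judge the evidence.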