Section 7.2 Sample Proportions

What Is a Sampling Distribution?
Population Distributions vs. Sampling
Distributions
There are actually three distinct distributions
involved when we sample repeatedly and
measure a variable of interest.
1) The population distribution gives the values
of the variable for all the individuals in the
population.
2) The distribution of sample data shows the
values of the variable for all the individuals in
the sample.
3) The sampling distribution shows the
statistic values from all the possible samples of
the same size from the population.
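A minimal sketch of the three distributions, assuming a hypothetical population of 1000 chips, half of them red. The population, the sample size of 10, and the 2000 repeated samples are illustrative choices, and the repeated samples only approximate the true sampling distribution, which would use every possible sample.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# 1) Population distribution: the value for every individual in the population.
#    Hypothetical population: 1 = red chip, 0 = any other color.
population = [1] * 500 + [0] * 500
population_proportion = sum(population) / len(population)

# 2) Distribution of sample data: the values for the individuals in ONE sample.
sample = rng.sample(population, 10)
sample_proportion = sum(sample) / len(sample)

# 3) Sampling distribution (approximated): the statistic from MANY samples
#    of the same size. The true sampling distribution uses all possible samples.
sample_proportions = [sum(rng.sample(population, 10)) / 10 for _ in range(2000)]

print("population proportion:", population_proportion)
print("one sample's proportion:", sample_proportion)
print("mean of simulated sample proportions:",
      sum(sample_proportions) / len(sample_proportions))
```

Note that items 1 and 2 are lists of individual values, while item 3 is a list of statistics, one per sample.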
Describing Sampling Distributions
Center: Biased and unbiased estimators
In the chips example, we collected many samples of size 10 and
calculated the sample proportion of red chips. How well does the
sample proportion estimate the true proportion of red chips, p = 0.5?
Note that the center of the approximate sampling
distribution is close to 0.5. In fact, if we took ALL
possible samples of size 10 and found the mean of
those sample proportions, we’d get exactly 0.5.
Definition:
A statistic used to estimate a parameter is an unbiased
estimator if the mean of its sampling distribution is equal
to the true value of the parameter being estimated.
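The claim that the mean of all possible sample proportions is exactly 0.5 can be checked numerically. The sketch below assumes the number of red chips in a sample of size 10 follows a binomial distribution with p = 0.5 (reasonable when the population is much larger than the sample); the function names are illustrative, not from the source.

```python
import random
from math import comb

def exact_mean_of_p_hat(n, p):
    """Mean of p-hat over every possible count k, weighted by its binomial probability."""
    return sum((k / n) * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def simulated_p_hats(n, p, num_samples, seed=0):
    """Sample proportions from repeated simulated samples of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

# Exact center of the sampling distribution for the chips example.
print("exact mean:", exact_mean_of_p_hat(10, 0.5))

# Approximate center from 5000 simulated samples of size 10.
p_hats = simulated_p_hats(10, 0.5, 5000)
print("simulated mean:", sum(p_hats) / len(p_hats))
```

The exact mean equals p up to floating-point rounding, while the simulated mean is only close to p, which mirrors the difference between the true and the approximate sampling distribution.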
We examine a sampling distribution the same way we
analyze any other distribution: CENTER, SHAPE,
SPREAD, OUTLIERS.
To get a trustworthy estimate of an unknown population parameter, start by using a
statistic that’s an unbiased estimator. This ensures that you won’t tend to
overestimate or underestimate. Unfortunately, using an unbiased estimator doesn’t
guarantee that the value of your statistic will be close to the actual parameter value.
[Figure: approximate sampling distributions for samples of size n = 100 and n = 1000]
Larger samples have a clear advantage over smaller samples. They are
much more likely to produce an estimate close to the true value of the
parameter.
Spread: Low variability is better!
Variability of a Statistic
The variability of a statistic is described by the spread of its sampling distribution. This
spread is determined primarily by the size of the random sample. Larger samples give
smaller spread. The spread of the sampling distribution does not depend on the size of
the population, as long as the population is at least 10 times as large as the sample.
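A small simulation can illustrate how sample size drives the spread. The sketch below assumes a true proportion of p = 0.5 (an illustrative value, not given in the source) and compares samples of size 100 and 1000; the helper name `spread_of_p_hat` is hypothetical.

```python
import random
from statistics import pstdev

def spread_of_p_hat(n, p, num_samples=1000, seed=0):
    """Standard deviation of simulated sample proportions for samples of size n."""
    rng = random.Random(seed)
    p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]
    return pstdev(p_hats)

sd_100 = spread_of_p_hat(100, 0.5)    # assumed p = 0.5
sd_1000 = spread_of_p_hat(1000, 0.5)

print("spread for n = 100: ", round(sd_100, 4))
print("spread for n = 1000:", round(sd_1000, 4))
print("ratio:", round(sd_100 / sd_1000, 2))
```

With ten times as many observations per sample, the spread should come out roughly three times smaller, not ten times smaller.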
We can think of the true value of the population parameter as the bull’s-eye on a
target and of the sample statistic as an arrow fired at the target. Both bias and
variability describe what happens when we take many shots at the target.
Bias means that our aim is off and
we consistently miss the bull’s-eye
in the same direction. Our sample
values do not center on the
population value.
High variability means that
repeated shots are widely scattered
on the target. Repeated samples do
not give very similar results.
The lesson about center and spread is
clear: given a choice of statistics to
estimate an unknown parameter,
choose one with no or low bias and
minimum variability.
Bias, variability, and shape
Section 7.2
Sampling Distribution of p̂
What’s in Store…
• Today, we’ll focus on one sampling
distribution: the sampling distribution of p̂.
• So, we’re going to talk about the center,
shape, and spread of the sampling
distribution of p̂.
• Center = mean
• Spread = standard deviation
The Sampling Distribution of P-hat
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
In words, the mean of the
sampling distribution of p-hat is
p. That makes p-hat an unbiased
estimator of p.
Let’s take a quick look at
where these formulas
come from.
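One standard route to these formulas, sketched here under the assumption that the count X of successes in the sample is (approximately) binomial with parameters n and p, which is reasonable when the population is at least 10 times the sample size:

```latex
% X = number of successes in the sample, assumed approximately Binomial(n, p)
\mu_X = np, \qquad \sigma_X = \sqrt{np(1 - p)}

% The sample proportion is the count divided by the sample size
\hat{p} = \frac{X}{n}

% Dividing a random variable by the constant n divides its mean and its
% standard deviation by n
\mu_{\hat{p}} = \frac{\mu_X}{n} = \frac{np}{n} = p

\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1 - p)}}{n}
                 = \sqrt{\frac{p(1 - p)}{n}}
```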
Rules to live by
• We learned that a sampling
distribution is approximately normal
IF the sample size is large.
• How large is large enough? Hmmm…
• Also, a population should be at least
10 times the size of the sample.
Rules of Thumb and the Normal
Approximation
• We can use the normal approximation
for p-hat ONLY when
np ≥ 10 AND n(1 – p) ≥ 10.
• We can use the formula for the
standard deviation of p-hat only when
the population is at least 10 times the
sample size. In symbols,
population ≥ 10n.
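The two rules of thumb are easy to check mechanically. The helper below is a sketch; the function name and the population sizes used in the calls are hypothetical values chosen for illustration.

```python
def check_conditions(n, p, population_size):
    """Return (normal_ok, sd_formula_ok) for the two rules of thumb.

    normal_ok:     np >= 10 and n(1 - p) >= 10, so the normal approximation is reasonable
    sd_formula_ok: population >= 10n, so the standard deviation formula is reasonable
    """
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    sd_formula_ok = population_size >= 10 * n
    return normal_ok, sd_formula_ok

# Samples of size 10 with p = 0.5: np = 5, too small for the normal approximation.
print(check_conditions(10, 0.5, population_size=1000))

# n = 1500 and p = 0.35 with an assumed population of one million.
print(check_conditions(1500, 0.35, population_size=1_000_000))
```

The two checks are reported separately because they justify different things: the first supports the normal shape, the second supports the standard deviation formula.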
Example
• A polling organization asks an SRS of
1500 first-year college students
whether they applied for admission to
any other college. In fact, 35% of all
first-year students applied to colleges
besides the one they are attending.
What is the probability that the
random sample of 1500 students will
give a result within 2 percentage
points of the true value?
State, Plan, Do, Conclude =
Parameter, Conditions, Formula, Sentence
• Parameter
  - p = proportion of first-year college
    students who applied to more than one
    college
• Conditions
  - np ≥ 10 and n(1 – p) ≥ 10
  - Population ≥ 10n
• Formula
  - I say “normal,” you say “Z-score!”
• Sentence
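As a sketch of the "Formula" step for the college example, the code below computes P(0.33 ≤ p̂ ≤ 0.37) with n = 1500 and p = 0.35, assuming both rules of thumb hold. It uses Python's standard-library `statistics.NormalDist` in place of Table A; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_p_hat_within(n, p, margin):
    """P(p - margin <= p-hat <= p + margin) under the normal approximation."""
    sd = sqrt(p * (1 - p) / n)           # standard deviation of p-hat
    dist = NormalDist(mu=p, sigma=sd)    # approximate sampling distribution of p-hat
    return dist.cdf(p + margin) - dist.cdf(p - margin)

n, p, margin = 1500, 0.35, 0.02
sd = sqrt(p * (1 - p) / n)
z = margin / sd                          # z-score of the upper bound, 0.37

print("standard deviation of p-hat:", round(sd, 4))
print("z-score for 0.37:", round(z, 2))
print("probability within 2 points:", round(prob_p_hat_within(n, p, margin), 4))
```

The result comes out close to 0.90. A Table A lookup with the z-score rounded to two decimal places can differ slightly in the third decimal.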
Key Points
• Always define the population of interest.
• State the values of n, p, and 1 – p.
• Check BOTH rules of thumb by plugging in
values.
• Graph the distribution you’re interested in.
• Convert to a Z-score. Make sure you know
what the mean and standard deviation are for
the problem.
• State the probability with symbols. Find the
probability using Table A.
• Write your conclusions in words in the context
of the problem.
Next Example
• One way of checking undercoverage and
nonresponse is to compare the sample with
known facts about the population. Suppose
11% of Americans are left-handed. The
proportion p-hat of left-handed people in an
SRS of 1500 adults, therefore, should be close
to 0.11. If a national survey contains only
9.2% left-handed people, should we suspect
that the sampling procedure is somehow
underrepresenting left-handed people? To
answer this, we will find the probability that a
sample of size 1500 contains no more than
9.2% left-handed people.
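A sketch of the calculation for this example, assuming p = 0.11 and n = 1500 as stated and that both rules of thumb hold (np = 165, n(1 – p) = 1335, and the adult population is far more than 15,000). It uses `statistics.NormalDist` from the standard library instead of Table A; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_p_hat_at_most(n, p, cutoff):
    """P(p-hat <= cutoff) under the normal approximation."""
    sd = sqrt(p * (1 - p) / n)           # standard deviation of p-hat
    return NormalDist(mu=p, sigma=sd).cdf(cutoff)

n, p, cutoff = 1500, 0.11, 0.092
sd = sqrt(p * (1 - p) / n)
z = (cutoff - p) / sd                    # z-score of the observed 9.2%

print("standard deviation of p-hat:", round(sd, 4))
print("z-score for 0.092:", round(z, 2))
print("P(p-hat <= 0.092):", round(prob_p_hat_at_most(n, p, cutoff), 4))
```

The probability comes out a little above 1%. A result this low would be unusual for a properly drawn SRS, which gives some reason to suspect that the procedure underrepresents left-handed people.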