AP STATISTICS LESSON 9–1 (DAY 1)

SAMPLING DISTRIBUTIONS
ESSENTIAL QUESTION:
How often would this method give a
correct answer if I used it very many
times?
Objectives:
• To distinguish between parameters and
statistics.
• To define and recognize sampling
distributions, bias, and variability.
• To weigh the advantages and disadvantages of
different sample sizes.
Introduction
The reasoning of statistical inference
rests on asking, “How often would this
method give a correct answer if I used
it many, many times?”
If it doesn’t make sense to imagine
repeatedly producing your data in the
same circumstances, statistical
inference is not possible.
Introduction (continued…)
All agree that inference is most secure
when we produce data by random
sampling or randomized comparative
experiments.
The reason is that when we use
chance to choose respondents or
assign subjects, the laws of probability
answer the question “What would
happen if we did this many times?”
Parameter, Statistic
A parameter is a number that describes the
population. A parameter is a fixed number,
but in practice we do not know its value
because we cannot examine the entire
population.
A statistic is a number that describes a
sample. The value of a statistic is known
when we have taken a sample, but it can
change from sample to sample. We often
use a statistic to estimate an unknown
parameter.
Example 9.1 Page 488
Making Money
The mean income of the sample of
households contacted by the Current
Population Survey was x̄ = $57,045. The number
$57,045 is a statistic because it
describes the Current Population
Survey sample.
The parameter of interest is the mean
income of all households in the population, not
just those sampled. We
don’t know the value of this parameter.
Symbols for Populations and Samples
The symbol for the population proportion is
p.
The symbol for the sample proportion is p̂ (“p-hat”).
Since most of the time the actual
parameters are not known, the mean
and standard deviation of a sample
are used to estimate the population mean and
standard deviation.
Example 9.1 (continued…)
The population mean, a parameter, is
represented by the Greek letter μ
(“mu”).
The sample mean, a statistic, is
represented by the symbol x̄ (“x-bar”).
The basic fact that the value of x̄
will probably differ from sample to sample is called
sampling variability. The value of a
statistic varies in repeated random
samples.
Sampling Variability
If we take many samples:
1. Take a large number of samples from the
same population.
2. Calculate the sample mean x̄ or
proportion p̂ for each sample.
3. Make a histogram of the values of x̄ or p̂.
4. Examine the distribution displayed in the
histogram for shape, center, and spread,
as well as outliers or other deviations.
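The four steps above can be sketched as a short simulation. This is only an illustration under stated assumptions: the population (10,000 made-up, right-skewed "incomes"), the sample size of 100, the 1,000 repetitions, and the helper names sample_means and text_histogram are all hypothetical choices, not part of the lesson.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 right-skewed "incomes" (assumed, not real data).
population = [random.expovariate(1 / 50_000) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter; known here only because we built the population

def sample_means(pop, n, reps):
    """Steps 1-2: draw `reps` SRSs of size n and return each sample mean."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(reps)]

def text_histogram(values, bins=10, width=50):
    """Step 3: print a crude text histogram and return the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    biggest = max(counts)
    for i, c in enumerate(counts):
        left = lo + i * step
        print(f"{left:10.0f} | {'*' * round(width * c / biggest)}")
    return counts

xbars = sample_means(population, n=100, reps=1000)
counts = text_histogram(xbars)

# Step 4: summarize the center and spread of the simulated distribution.
print("population mean:", round(mu))
print("mean of sample means:", round(statistics.mean(xbars)))
print("std. dev. of sample means:", round(statistics.stdev(xbars)))
```

Even though the assumed population is strongly skewed, the histogram of sample means should look far more symmetric, and its center should sit close to the population mean.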
Example 9.3
page 490
Baggage Check!
Simulation is a powerful tool for studying
chance.
It is much faster to use Table B than to
actually draw repeated SRSs, and
much faster yet to use a computer
program to produce random digits.
Sampling Distribution
The sampling distribution of a statistic
is the distribution of values taken by
the statistic in all possible samples of
the same size from the same
population.
Strictly speaking, the sampling
distribution is the ideal pattern that
would emerge if we looked at all
possible samples of the same size
from the population.
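For a very small population, "all possible samples" can be listed exactly instead of simulated. The sketch below assumes a hypothetical population of five values and samples of size 2, drawn without replacement. Both choices are illustrative and do not come from the lesson.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]   # hypothetical population (assumed)
n = 2                            # sample size (assumed)

mu = Fraction(sum(population), len(population))  # parameter: population mean

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))
xbars = [Fraction(sum(s), n) for s in samples]

# The sampling distribution: each possible value of x-bar and how often it occurs.
distribution = Counter(xbars)
for value in sorted(distribution):
    print(f"x-bar = {float(value):4.1f}  in {distribution[value]} of {len(samples)} samples")

mean_of_xbars = sum(xbars) / len(xbars)
print("population mean:", mu, " mean of all sample means:", mean_of_xbars)
```

In this toy case the ten possible sample means average exactly to the population mean of 6, even though no single sample is guaranteed to hit it.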
Describing Sampling Distributions
Describe a sampling distribution by
its shape, center, and spread, as you
would any other distribution.
Example 9.5 page 494
Are You a Survivor Fan?
Figure 9.5 shows the results of drawing 1000
SRSs of size n = 100 from a population with p
= 0.37.
We see that:
• The overall shape of the distribution is
symmetric and approximately normal.
• The center of the distribution is very close to
the true value p = 0.37.
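A simulation along the lines of this example can be sketched as follows. It uses the example's values (p = 0.37, n = 100, 1000 samples) but assumes the population is large enough that each draw can be treated as an independent success or failure. The seed and the helper name sample_proportion are arbitrary, and the results will not match Figure 9.5 exactly.

```python
import random
import statistics

random.seed(9)  # arbitrary seed so the run is repeatable

P_TRUE = 0.37   # population proportion from the example
N = 100         # sample size from the example
REPS = 1000     # number of simulated samples from the example

def sample_proportion(p, n):
    """One simulated sample: the share of successes among n independent draws."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

phats = [sample_proportion(P_TRUE, N) for _ in range(REPS)]

center = statistics.mean(phats)
spread = statistics.stdev(phats)
# Share of p-hat values within two standard deviations of the center;
# an approximately normal shape would put roughly 95% of them there.
within_2sd = sum(abs(ph - center) <= 2 * spread for ph in phats) / REPS

print(f"mean of p-hat values: {center:.3f}")
print(f"std. dev. of p-hat values: {spread:.3f}")
print(f"share within 2 std. dev.: {within_2sd:.3f}")
```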
The Bias of a Statistic
Sampling distributions allow us to describe bias
more precisely by speaking of the bias of a
statistic rather than bias in a sampling method.
[Figure: sampling distributions of a statistic for (a) sample size 100 and (b) sample size 1000]
Bias concerns the center of the sampling
distribution. The statistic from the larger sample
is less variable.
Unbiased Statistics
A statistic used to estimate a parameter is
unbiased if the mean of its sampling
distribution is equal to the true value of the
parameter being estimated.
An unbiased statistic will sometimes fall
above the true value of the parameter and
sometimes below if we take many samples.
Because its sampling distribution is centered
at the true value, however, there is no
systematic tendency to overestimate or
underestimate the parameter.
The Variability of a Statistic
A statistic is unbiased when its sampling
distribution is centered at the true value
of the parameter.
The sample proportion p̂ from a random
sample of any size is an unbiased
estimate of the parameter p.
The Variability of a Statistic
(continued…)
The variability of a statistic is described by the
spread of its sampling distribution. This
spread is determined by the sampling design
and the size of the sample. Larger samples
give smaller spread.
As long as the population is much larger than
the sample (say, at least 10 times as large),
the spread of the sampling distribution is
approximately the same for any population
size.
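Both claims about spread can be checked with a rough simulation. The sketch below is illustrative only: the proportion 0.37, the population sizes of 10,000 and 1,000,000, the 1,000 repetitions, and the helper name phat_spread are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(2)  # arbitrary seed so the run is repeatable

def phat_spread(pop_size, n, p=0.37, reps=1000):
    """Std. dev. of p-hat over `reps` SRSs of size n from a population of pop_size.

    Individuals are numbered 0..pop_size-1, and the first round(p * pop_size)
    of them count as "successes". Sampling is without replacement, as in an SRS.
    """
    n_success = round(p * pop_size)
    phats = []
    for _ in range(reps):
        srs = random.sample(range(pop_size), n)
        phats.append(sum(i < n_success for i in srs) / n)
    return statistics.stdev(phats)

# Same population, two sample sizes: the larger sample should give the smaller spread.
spread_n100 = phat_spread(pop_size=1_000_000, n=100)
spread_n1000 = phat_spread(pop_size=1_000_000, n=1000)

# Same sample size, two population sizes (both at least 10 times the sample).
spread_small_pop = phat_spread(pop_size=10_000, n=100)
spread_large_pop = phat_spread(pop_size=1_000_000, n=100)

print(f"n = 100,  population 1,000,000: spread {spread_n100:.4f}")
print(f"n = 1000, population 1,000,000: spread {spread_n1000:.4f}")
print(f"n = 100,  population 10,000:    spread {spread_small_pop:.4f}")
print(f"n = 100,  population 1,000,000: spread {spread_large_pop:.4f}")
```

The first pair of lines should differ noticeably, while the last pair should be nearly equal apart from simulation noise.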
Bias and Variability
• Bias means that our aim is off and we consistently miss
the bull’s-eye in the same direction.
• Our sample values do not center on the population
value.
• High variability means that repeated shots are widely
scattered on the target.
• Notice that low variability (shots that are close together)
can accompany high bias (shots that are consistently away
from the bull’s-eye in one direction), and low bias (shots
centered on the bull’s-eye) can accompany high
variability (shots that are widely scattered).
• Properly chosen statistics computed from random
samples of sufficient size will have low bias and low
variability.
Figure 9.9
Page 500