The sampling distribution for ȳ

Chapter 23
In order to determine the population mean value µ
of a quantity of interest y, we sample values of y
from the population and compute a sample mean ȳ.
How good is the estimate ȳ? We study the
sampling distribution for ȳ to answer this question.
A model for the sampling distribution is based on
the Central Limit Theorem:
Regardless of the population from which we sample
or the statistic X we may be measuring, the normal
model more closely describes the distribution of X
the larger the sample size gets, and while the mean
E(X) is independent of n, its standard deviation
SD(X) decreases as n gets larger; therefore, X is a
better and better approximation of its true mean
value E(X) for larger and larger n.
In our present situation, we choose for the statistic
X the sample mean ȳ. This leads to the following
observations about its sampling distribution.
• We select a SRS of size n of independent
measurements of the quantity y from a population
with population mean value E(y) = µ and standard
deviation SD(y) = σ.
• the mean of the sampling distribution of the
statistic ȳ is E(ȳ) = E(y) = µ (by the CLT).
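• the variance of the sampling distribution of ȳ is
$$
\begin{aligned}
\mathrm{Var}(\bar{y}) &= \mathrm{Var}\!\left(\frac{\sum y}{n}\right) \\
&= \frac{1}{n^2}\cdot\mathrm{Var}\!\left(\sum y\right) \\
&= \frac{1}{n^2}\cdot n\,\mathrm{Var}(y) \\
&= \frac{\mathrm{Var}(y)}{n} \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$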
• so the standard deviation of the sampling
distribution of ȳ is SD(ȳ) = σ/√n.
• since the population parameter σ is typically
unknown, we must estimate this standard
deviation by the standard error SE(ȳ) = s/√n.
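The observations above can be checked with a small simulation. The sketch below is only illustrative: the Exponential(1) population (mean 1, SD 1), the sample size 25, the 4000 replications and the seed are all assumptions chosen for the example, not part of the notes.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an Exponential(1) population
    (mean 1, SD 1) and return the list of their sample means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(0)      # fixed seed so the run is repeatable
sigma = 1.0                 # population SD of the assumed Exponential(1) population
n = 25                      # assumed sample size
means = simulate_sample_means(n, reps=4000, rng=rng)

observed_mean = statistics.fmean(means)   # should be close to mu = 1
observed_sd = statistics.stdev(means)     # spread of the simulated sample means
predicted_sd = sigma / math.sqrt(n)       # sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.3f}")
print(f"SD of sample means:   {observed_sd:.3f}")
print(f"sigma / sqrt(n):      {predicted_sd:.3f}")
```

Even though the assumed population is strongly skewed, the simulated sample means should center near µ with a spread near σ/√n.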
Student’s t distribution model
We have noted that
• the sampling distribution for ȳ is well described by
a normal model for large sample sizes; and
• its standard deviation is SD(ȳ) = σ/√n, but since
σ is typically unknown, we estimate this standard
deviation in practice by the standard error
SE(ȳ) = s/√n.
These two observations suggest some difficulties with
our set-up so far:
• When the sample size is not large, the normal
model may do a poorer job of describing the
sampling distribution for ȳ; but in practice, we
either choose n large enough that this issue is not a
serious one, or we find that we are sampling from a
population that is nearly normal in shape so that
the sampling distribution for ȳ behaves normally
even for small sample sizes.
• More of a problem, however, is that replacing
SD(ȳ) = σ/√n by SE(ȳ) = s/√n introduces a
significant amount of additional variability due to
the fact that s is a sample statistic with its own
variability that increases as the sample decreases
in size.
This problem was resolved by W. S. Gosset in 1908. He
realized that use of the standard error estimate
SE(ȳ) = s/√n required a new model for the sampling
distribution for ȳ.
The standardized sample mean statistic
$$
t = \frac{\bar{y} - \mu}{s/\sqrt{n}}
$$
follows a distribution called the Student’s t model.
This distribution is not normal, although it is bell-shaped
and symmetric about 0 like the standardized
normal z distribution N(0, 1). However, the Student’s t
distribution
• has thicker tails (larger spread) than the standard
normal distribution (because substituting σ with s
introduces more variation in values of t than
appear in the normal model for z), and
• behaves more like a normal distribution as the
number of degrees of freedom (n – 1) increases
(increasing n leads to a closer estimate of σ by s).
Since the t-distributions have thick tails, outliers tend
to be a bit more common than for normal distributions,
so t procedures are strongly influenced by the presence
of outliers.
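The thicker tails can be seen in a simulation. The sketch below assumes a N(0, 1) population and counts how often the standardized sample mean lands beyond ±1.96, a cutoff that a standard normal z exceeds about 5% of the time. The sample sizes 5 and 50, the replication count and the seed are illustrative assumptions.

```python
import math
import random

def t_statistic(sample, mu):
    """Standardized sample mean: (ybar - mu) / (s / sqrt(n))."""
    n = len(sample)
    ybar = sum(sample) / n
    # Sample standard deviation with divisor n - 1.
    s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
    return (ybar - mu) / (s / math.sqrt(n))

def tail_fraction(n, reps, cutoff, rng):
    """Fraction of simulated t statistics with |t| > cutoff when each
    sample holds n values drawn from an assumed N(0, 1) population."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, mu=0.0)) > cutoff:
            hits += 1
    return hits / reps

rng = random.Random(1)      # fixed seed so the run is repeatable
small_n = tail_fraction(n=5, reps=10000, cutoff=1.96, rng=rng)
large_n = tail_fraction(n=50, reps=10000, cutoff=1.96, rng=rng)

print(f"fraction with |t| > 1.96, n = 5:  {small_n:.3f}")
print(f"fraction with |t| > 1.96, n = 50: {large_n:.3f}")
```

With only 4 degrees of freedom the fraction should sit well above 5%, and it should move back toward 5% as n grows, in line with the two bullet points above.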
One-sample t procedures
Assumptions: independently selected
measurements (hence not more than 10% of the
size of the population when selected without
replacement) form a SRS drawn from a nearly
normal population. (View a histogram or normal
probability plot of the data to check this condition.)
A level C confidence interval for µ is
$$
\bar{y} \pm t^*_{n-1}\,\frac{s}{\sqrt{n}}
$$
where $t^*_{n-1}$ is the appropriate critical value
depending on the level of confidence C for the
t-distribution with n – 1 degrees of freedom.
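As a rough sketch of the interval calculation, the function below takes the data and a critical value that has already been looked up. The function name, the ten measurements and the choice of 95% confidence are hypothetical; 2.262 is the usual table value of the critical value for 95% confidence with 9 degrees of freedom.

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """Return (lower, upper) for ybar ± t_star * s / sqrt(n).

    t_star is the critical value for the chosen confidence level and
    n - 1 degrees of freedom; it is supplied by the caller (from a
    table or software) rather than computed here.
    """
    n = len(data)
    ybar = statistics.fmean(data)
    s = statistics.stdev(data)            # sample SD, divisor n - 1
    margin = t_star * s / math.sqrt(n)    # margin of error
    return ybar - margin, ybar + margin

# Hypothetical measurements: n = 10, so 9 degrees of freedom.
data = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.4, 4.6]
T_STAR_95_DF9 = 2.262                     # table value, 95% confidence, 9 df

low, high = t_confidence_interval(data, T_STAR_95_DF9)
print(f"95% confidence interval for mu: ({low:.3f}, {high:.3f})")
```

Passing the critical value in as an argument keeps the sketch within the standard library; the data should still be checked against the nearly normal condition before the interval is trusted.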
Choosing the sample size
Since
$$
ME = t^*_{n-1}\,\frac{s}{\sqrt{n}}
$$
in the confidence interval for the mean, we can
assure a certain margin of error in our estimate by
setting a desired value for ME, and solving the
above equation for n. Unfortunately, we need to
know s to do this, and we can’t know s until after
we have set the sample size n! Instead, we can
sometimes perform a pilot study to get an initial
value for s and then do the calculation above to find
the appropriate value of n for the main study.
(Remember to round up to determine n.)
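Solving the margin-of-error equation for n gives n = (t*·s / ME)², rounded up. The sketch below shows that arithmetic. The function name and the numbers (pilot s = 10, desired ME = 2) are hypothetical. Because the t critical value itself depends on n, the example starts from 1.96, the normal-model value for 95% confidence, which is an assumption of this sketch rather than a step stated in the notes.

```python
import math

def required_sample_size(s_pilot, margin, crit):
    """Smallest whole n with crit * s_pilot / sqrt(n) <= margin.

    Solving ME = crit * s / sqrt(n) for n gives n = (crit * s / ME)^2,
    which is then rounded up to the next whole number.
    """
    return math.ceil((crit * s_pilot / margin) ** 2)

# Hypothetical pilot-study SD and target margin of error.
n_needed = required_sample_size(s_pilot=10.0, margin=2.0, crit=1.96)
print(f"sample size needed: {n_needed}")
```

Having found a first value of n this way, one could look up the t critical value for n – 1 degrees of freedom and repeat the calculation to refine the answer.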
One-sample t hypothesis test
• State hypotheses:
Null hypothesis
H0: µ = µ0
Alternative hypothesis
HA: µ > µ0, or µ < µ0, or µ ≠ µ0
• Choose the model:
An independently selected SRS drawn from a
nearly normal population satisfying the 10%
Condition, so Student’s t model applies to the
standardized sampling distribution for ȳ
• Mechanics:
Compute the t-statistic based on H0:
$$
t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}
$$
Probability associated with the appropriate HA:
P = P(T ≥ t), or P = P(T ≤ t), or P = 2P(T ≥ |t|)
• Conclusion:
Assess evidence against H0 in favor of HA
depending on how small P is.
[TI-83: STAT TESTS T-Test… ]
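The mechanics can also be sketched in code. The example below computes the t-statistic and then approximates the P-value by numerically integrating the Student's t density with Simpson's rule. This is a stand-in for a table or calculator lookup, chosen so that the sketch needs only the standard library. The function names, the data and the value µ0 = 4.8 are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3                # P(0 <= T <= |t|)
    upper = 0.5 - central                  # P(T >= |t|), by symmetry about 0
    return upper if t >= 0 else 1.0 - upper

def one_sample_t_test(data, mu0, alternative="two-sided"):
    """Return (t, P) for H0: mu = mu0 against the chosen alternative."""
    n = len(data)
    ybar = statistics.fmean(data)
    s = statistics.stdev(data)             # sample SD, divisor n - 1
    t = (ybar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "greater":           # HA: mu > mu0
        p = t_upper_tail(t, df)
    elif alternative == "less":            # HA: mu < mu0
        p = 1.0 - t_upper_tail(t, df)
    else:                                  # HA: mu != mu0
        p = 2.0 * t_upper_tail(abs(t), df)
    return t, p

# Hypothetical measurements and hypothesized mean.
data = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.4, 4.6]
t_stat, p_value = one_sample_t_test(data, mu0=4.8)
print(f"t = {t_stat:.3f}, two-sided P = {p_value:.4f}")
```

In practice the P-value would usually come from the calculator's T-Test or from statistical software; the integration here only makes visible where that number comes from.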
The Sign test
One way to get around having to meet the stringent
conditions underlying the t test is to transform the
problem from a quantitative one into a categorical one:
recharacterize the hypothesized mean value µ as a
hypothesized median value, then ask what proportion of
the data exceed or fall shy of the value of this median?
This replaces the original underlying statistic ȳ with a
proportion
p̂ = proportion of the data greater than
(or less than) the hypothesized median
We then carry out a one-proportion z test with
null hypothesis H0: p = .50
and suitable alternative hypothesis. Of course, we
haven’t escaped the need to meet some conditions: our
data still need to come from a SRS satisfying the 10%
Condition and the Success/Failure Condition.
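A minimal sketch of the sign test as described above: count the data values above the hypothesized median, form p̂, and run a one-proportion z test against p = 0.50. The function name and the data are hypothetical, and the sketch counts any value equal to the hypothesized median as "not greater", which is an assumption the notes do not address.

```python
import math
from statistics import NormalDist

def sign_test_z(data, median0):
    """One-proportion z test of H0: p = 0.50, where p is the proportion
    of the population lying above the hypothesized median median0."""
    n = len(data)
    above = sum(1 for y in data if y > median0)   # ties count as not above
    p_hat = above / n
    sd_p_hat = math.sqrt(0.5 * 0.5 / n)           # SD of p_hat under H0
    z = (p_hat - 0.5) / sd_p_hat
    # Two-sided P-value from the standard normal model.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value

# Hypothetical data: 15 of 20 values lie above a hypothesized median of 50.
data = [52.0] * 15 + [48.0] * 5
p_hat, z, p_value = sign_test_z(data, median0=50.0)
print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, two-sided P = {p_value:.4f}")
```

With n = 20 and p = 0.50, the expected counts of successes and failures are both 10, so the Success/Failure Condition is just met in this example.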