Large sample size notes

Chapter 8
Limit theorems and Statistics
In what follows we assume that we have a sequence Xn of independent, identically distributed random variables (independent trials process), with finite mean µ and finite variance σ². We will be interested in the behavior of the sums or averages

SN = X1 + · · · + XN,    AN = SN/N.
Sometimes such sums or averages arise as we try to improve the behavior
of a random variable, like stock market daily closings. In other cases they
can sneak into natural problems. Suppose the intervals between bus arrivals
are independent, and exponentially distributed with a common mean and
variance. Then the time of arrival of bus N is
TN = (T1 − 0) + (T2 − T1 ) + · · · + (TN − TN −1 ),
so we have a sum of iid random variables.
8.1 Law of Large Numbers
We begin by considering Chebyshev’s inequality in case X has a continuous
density. This is a helpful theoretical tool because it is so general, but the
estimate it gives is not very sharp in many particular cases.
Theorem 8.1.1. (Chebyshev's inequality) Suppose X has a density function f(x), with finite mean µ and finite variance σ². Then for any number ε > 0,

P(|X − µ| ≥ ε) ≤ σ²/ε².
Proof.

P(|X − µ| ≥ ε) = ∫_{−∞}^{µ−ε} f(x) dx + ∫_{µ+ε}^{∞} f(x) dx
             ≤ ∫_{−∞}^{∞} [|x − µ|/ε]² f(x) dx = (1/ε²) E([X − µ]²) = σ²/ε².
The result is the same if X is a discrete random variable.
Sometimes one sees a rephrasing of this theorem, measuring the distance
from the mean as multiples of σ.
Corollary 8.1.2. Suppose X has a density function f(x), with finite mean µ and finite variance σ². Then for any number k > 0,

P(|X − µ| ≥ kσ) ≤ 1/k².
For common examples like the exponential density or the Gaussian density, the rate of decay is exponential. Consider the standard normal, with
µ = 0, σ = 1. If k ≥ 2, then
P(|X| ≥ k) = (1/√(2π)) ∫_{|x|≥k} exp(−x²/2) dx = (2/√(2π)) ∫_{x≥k} exp(−x²/2) dx
           ≤ (√2/√π) ∫_{x≥k} exp(−x) dx = (√2/√π) exp(−k),

since x ≥ 2 implies x ≤ x²/2, and 2/√(2π) = √2/√π.
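These bounds are easy to check numerically. The sketch below, using only the Python standard library, compares the exact standard normal tail probability with Chebyshev's bound 1/k² and the exponential estimate just derived (the function names are ours, not from any library):

```python
import math

def chebyshev_bound(k):
    # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
    return 1.0 / k ** 2

def normal_tail(k):
    # Exact P(|Z| >= k) for a standard normal: erfc(k / sqrt(2))
    return math.erfc(k / math.sqrt(2.0))

def exponential_estimate(k):
    # The estimate derived above, valid for k >= 2:
    # P(|Z| >= k) <= (sqrt(2)/sqrt(pi)) * exp(-k)
    return math.sqrt(2.0 / math.pi) * math.exp(-k)

for k in [2, 3, 4]:
    # The exact tail sits below both bounds, and Chebyshev is the cruder one.
    assert normal_tail(k) <= exponential_estimate(k) <= chebyshev_bound(k)
```

For k = 2 the exact tail is about .046, while Chebyshev only guarantees .25, illustrating how crude the general bound is.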
We can use Chebyshev’s inequality to show that with high probability
AN is close to its mean.
Theorem 8.1.3. (Weak Law of Large Numbers) Let X1, X2, . . . be independent, identically distributed random variables (independent trials process), with finite mean µ and finite variance σ². For any ε > 0,

lim_{N→∞} P(|AN − µ| < ε) = 1.
Proof. Independence gives

E(AN) = µ,    V(AN) = σ²/N.

By Chebyshev's inequality,

P(|AN − µ| ≥ ε) ≤ σ²/(Nε²).

If ε is fixed,

lim_{N→∞} P(|AN − µ| ≥ ε) = 0,
which is equivalent to the stated result.
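The convergence in the Weak Law is easy to watch in simulation. A minimal sketch, standard library only; the exponential distribution with µ = σ = 1, the value ε = 0.5, and the trial counts are our choices for illustration:

```python
import random

def deviation_probability(N, eps, trials=2000, seed=0):
    # Estimate P(|A_N - mu| >= eps) for exponential X_n with mu = sigma = 1
    # by simulating many independent copies of the average A_N.
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        a = sum(rng.expovariate(1.0) for _ in range(N)) / N
        if abs(a - 1.0) >= eps:
            count += 1
    return count / trials

# Chebyshev's bound sigma^2/(N eps^2) also shrinks like 1/N.
for N in [10, 100, 1000]:
    print(N, deviation_probability(N, eps=0.5))
```

The estimated probabilities drop rapidly with N, as the theorem predicts.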
8.2 Central Limit Theorem
The Law of Large Numbers says that the density (or distribution) function
of AN becomes concentrated around the mean as N → ∞. A more detailed
description of the shape of the density is provided by the next result. This
striking result says that whatever the distribution of Xn , if we average a
large enough sample, the result will look Gaussian. To more easily understand the shape of the density, we replace the sum SN or average AN by the
standardized random variable
S∗N = (SN − Nµ)/(√N σ) = (AN − µ)/(σ/√N).

The random variable S∗N has mean E(S∗N) = 0 and variance V(S∗N) = 1. As we saw earlier, we want to recenter and rescale the random variable to have mean 0 and variance 1 so our comparison makes sense.
Theorem 8.2.1. (Central Limit Theorem) The distribution of

S∗N = (SN − Nµ)/(√N σ)

converges to the standard normal distribution n(z; 0, 1) with mean 0 and variance 1 as N → ∞, in the sense that for all a < b,

lim_{N→∞} P(a < S∗N < b) = (1/√(2π)) ∫_a^b e^{−x²/2} dx.
Of course for applications it would be helpful to have a better picture of
how big N should be. A common rule of thumb is that the normal approximation for the sampling distribution of the mean will be accurate if N ≥ 30. This rule hides a number of issues. It is usually adequate if the density function of Xn is roughly normal. In fact, if the Xn are normal, but not necessarily identically distributed, then the sample means will be exactly normal,
and we only have to calculate the mean and variance. On the other hand,
if the density of Xn is quite far from normal, then taking N = 30 may be
inadequate.
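One way to probe the N ≥ 30 rule of thumb is simulation. The sketch below (standard library only) standardizes sums of exponential random variables, a noticeably skewed case with µ = σ = 1, and compares an interval probability with the normal value; the sample size, interval, and trial count are illustrative choices:

```python
import math
import random

def standardized_sum_prob(N, a, b, trials=5000, seed=1):
    # Empirical P(a < S*_N < b), with S*_N = (S_N - N*mu)/(sqrt(N)*sigma),
    # for exponential X_n (mu = sigma = 1).
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.expovariate(1.0) for _ in range(N))
        z = (s - N) / math.sqrt(N)
        if a < z < b:
            hits += 1
    return hits / trials

def normal_prob(a, b):
    # P(a < Z < b) for a standard normal, via the error function
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(b) - phi(a)

# Already close at N = 30, despite the skewness of the exponential.
print(standardized_sum_prob(30, -1, 1), normal_prob(-1, 1))
```

For distributions even further from normal (very heavy skew or outliers), the same experiment with larger N shows how the rule of thumb can fail.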
Example. Suppose patients arrive at the hospital emergency room on average every ten minutes. We use the exponential model with independent intervals between arrivals. Let's see when patient 100 is likely to arrive. In this case we have

λ = 0.1,    µ = E(X) = E(Tn − Tn−1) = 1/λ = 10,    and    σ = 1/λ = 10.

Also TN = SN. Thus we have

P(a < (T100 − 10 ∗ 100)/(10 ∗ 10) < b) ≃ (1/√(2π)) ∫_a^b e^{−x²/2} dx.

If a = −1, b = 1, then

(1/√(2π)) ∫_{−1}^{1} e^{−x²/2} dx ≃ .7.

From

P(−1 < (T100 − 10 ∗ 100)/(10 ∗ 10) < 1) ≃ .7

we conclude that with probability about .7,

900 < T100 < 1100.

If a = −3, b = 3, then

(1/√(2π)) ∫_{−3}^{3} e^{−x²/2} dx ≃ 1 − .0026 = .9974.

So with very high probability

700 < T100 < 1300.
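The arithmetic in this example can be reproduced with a short computation (standard library only; the helper function is ours):

```python
import math

def arrival_window_prob(N, mu, sigma, t_lo, t_hi):
    # Normal approximation for P(t_lo < T_N < t_hi), where T_N = S_N is the
    # arrival time of patient N: standardize, then use the error function.
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    z_lo = (t_lo - N * mu) / (math.sqrt(N) * sigma)
    z_hi = (t_hi - N * mu) / (math.sqrt(N) * sigma)
    return phi(z_hi) - phi(z_lo)

# Patient 100, exponential interarrivals with mean 10 minutes (mu = sigma = 10):
p1 = arrival_window_prob(100, 10, 10, 900, 1100)   # about .68
p3 = arrival_window_prob(100, 10, 10, 700, 1300)   # about .9973
```

The two probabilities match the ≃ .7 and .9974 values quoted above.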
8.3 Large sample statistics

8.3.1 Terminology
Roughly speaking, in statistics we try to estimate characteristics of a population, which could be a population of people, manufactured products, bacteria,
etc. In many cases the population is large, and we want to understand its
characteristics by examining a subset of the population, called a sample. The
characteristics we wish to measure are real numbers (e.g. height, probability
of defects, prevalence of a gene type) which are considered random variables.
That is, we try to draw inferences about a population by examining a set
of random variables X1 , . . . , XN . For our discussions these random variables
are assumed to be independent, with the same probability distribution (identically distributed). Such a collection of random variables is called a random
sample of size N from the population. Any function of the random variables
in a random sample is called a statistic.
Our attention will be focused on two statistics, the sample mean

µ = (1/N) Σ_{n=1}^{N} Xn,

and the sample variance

σ² = (1/(N − 1)) Σ_{n=1}^{N} (Xn − µ)².

The sample standard deviation is σ ≥ 0.
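As a sanity check, both statistics can be computed directly; the sketch below (sample data invented for illustration) agrees with Python's statistics module, which also uses the N − 1 denominator for the variance:

```python
import statistics

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Uses the N - 1 denominator, as in the definition above
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [64.0, 66.5, 67.0, 65.5, 68.0]   # hypothetical heights, in inches
assert abs(sample_mean(data) - statistics.mean(data)) < 1e-12
assert abs(sample_variance(data) - statistics.variance(data)) < 1e-12
```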
Let’s consider trying to estimate the average height of adult males in the
U.S. population. Since it would be an enormous undertaking to measure
150 million people, we’ll settle for a sample, which might have, for example
N = 10, 100, 1000, 10⁴. Of course many outcomes are possible. We might
accidentally sample professional basketball players, or have a sample overly
representing short people. As we take different samples of size N we will
see variation in the results. That is, the statistic we’re measuring will be
a random variable, with its own distribution function, called the sampling
distribution.
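The variation across repeated samples can be simulated directly. A sketch (standard library only; we draw from a hypothetical normal population of heights with mean 68.5 and standard deviation 2.75 inches) showing that the spread of the sampling distribution shrinks like σ/√N:

```python
import math
import random
import statistics

def sampling_distribution_sd(N, trials=2000, seed=2):
    # Draw `trials` samples of size N from a normal population
    # (mean 68.5, sd 2.75) and return the standard deviation of
    # the resulting sample means.
    rng = random.Random(seed)
    means = [statistics.mean(rng.gauss(68.5, 2.75) for _ in range(N))
             for _ in range(trials)]
    return statistics.stdev(means)

for N in [10, 100]:
    # Observed spread of the sample means tracks the theoretical sigma/sqrt(N)
    print(N, sampling_distribution_sd(N), 2.75 / math.sqrt(N))
```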
8.3.2 Confidence intervals
In many cases we are able to design studies with a sample size of our choosing.
The desire for accuracy may be traded off against the cost or inconvenience
of obtaining a large sample, but we will assume that those are secondary,
and a large sample size is used.
It is often the case that merely reporting the sample mean is not adequate
information. For example, suppose we sample the heights of N = 1000 adult
women and find a sample mean of 66 inches. We know that the actual mean
may be different, but how different?
To address this question, we compute a confidence interval. We have
the mean µ = 66. Assuming that N is large and the distribution of Xn is
roughly normal, we may also estimate the (usually unknown) variance σ 2 by
the sample variance σ 2 . Suppose the sample variance in this experiment is
σ 2 = 4.
Construct the interval which is centered at the sample mean µ and extends 1.96σ/√N = 1.96 ∗ 2/√1000 on either side of µ. The number 1.96σ/√N represents a distance of 1.96 standard deviations for the normal approximation of the sample mean. For a normally distributed random variable this interval corresponds to a probability of .95. In our example the interval is

[66 − 1.96 ∗ 2/√1000, 66 + 1.96 ∗ 2/√1000] ≃ [66 − .124, 66 + .124].

We say we are 95% confident that the actual mean height of the population is in the interval [66 − .124, 66 + .124]. If we repeatedly perform the experiment, about 95% of the resulting confidence intervals will contain the actual mean.
It is not hard to change the 95% confidence interval to a different percentage. To treat the problem in general, let α be a positive number. We
will look for the (1 − α) ∗ 100% confidence interval. In the original 95% case
we have α = .05.
Now suppose that Z is a standard normal random variable. Using a table
we find the value zα/2 such that
P (−zα/2 < Z < zα/2 ) = 1 − α.
In the original example z.025 = 1.96 ≈ 2, since one finds a normally distributed random variable within roughly two standard deviations of the mean
95% of the time.
Here is a summary. Suppose X1 , . . . , XN is a large (roughly N > 30)
random sample, so that µ is approximately normal. Assume that Xn has
mean µ and standard deviation σ. Then a level (1 − α) ∗ 100% confidence
interval for µ is
X ± zα/2 σµ,    where    σµ = σ/√N.
When the value of σ is unknown, it can be replaced with the sample standard
deviation σ.
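This summary translates into a few lines of code. A sketch using Python's standard library, where NormalDist.inv_cdf supplies zα/2 in place of the table lookup, and the sample standard deviation stands in for σ:

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, alpha=0.05):
    # Large-sample (1 - alpha)*100% confidence interval for the mean:
    # sample mean +/- z_{alpha/2} * s / sqrt(N).
    N = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: 1.96 when alpha = .05
    half_width = z * stdev(sample) / math.sqrt(N)
    m = mean(sample)
    return (m - half_width, m + half_width)

# Hypothetical sample of heights centered at 66 inches:
lo, hi = confidence_interval([66 + 0.1 * i for i in range(-5, 6)])
```

Passing a different alpha (say .01) widens the interval, since z.005 > z.025.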
8.4 Exercises
A table for calculations with standard normal random variables is on page
499 of Grinstead and Snell.
1. Suppose Xn is the height of the n-th person in a random sample of
adult U.S. males. Assume that Xn has a mean of 68.5 inches and a standard deviation of 2.75 inches.
(a) What are the expected value and standard deviation of
µ = (1/N) Σ_{n=1}^{N} Xn ?
(b) How big should N be so that the standard deviation of µ is smaller
than 0.1 inches?
2. If X is the random variable which is the roll of a single die, then X has
a discrete uniform distribution with 6 values. The Central Limit Theorem
predicts that if we average the throws of several dice, the distribution will
look more Gaussian. Verify this by looking at the average of three dice rolls,
µ = (1/3)[X1 + X2 + X3].
This should be done in several steps.
(a) First count the number of ways to have two dice sum to each of the
possible outcomes, 2 through 12. Now extend the count to the sums of three dice.
(b) Replace the sums by the average of three dice rolls, and compute the
probabilities for each outcome.
(c) Plot the distribution (density) for the three cases, that is the throw
of one die, the average of two dice, and the average of three dice.
3. A manufacturer produces electronic components with a mean lifetime
of one year, with a standard deviation of one month. Assume that the
component lifetime distribution is approximately normal.
(a) Find the probability that a component will fail in fewer than 10
months.
(b) Find the probability that the average lifetime of 10 components will
be more than 13 months.
4. We wish to find the average height of adult U.S. females by averaging
the heights of 100 women. We assume that, like males, the standard deviation
for an individual measurement is 2.75 inches. If the average measured height
is h, what is the 95% confidence interval for this experiment?