
Chapter 5 CLT-2

Central Limit Theorem
STP 231
Alejandro Vidales Aller
March 13, 2024
Outline
Normality
Distribution of the sample mean X̄ of a large sample without normality
Summary
Appendix
Normality
Random variable
Another way to think of a random variable is as one random draw from its distribution, i.e. one draw from its sampling distribution.
Sampling distribution
The sampling distribution (or underlying distribution) is the distribution from which we collect the samples.
Random sample (iid)
A set of random variables {X1, X2, ..., Xn} where
• each Xi is independent of the others, and
• each Xi comes from the same distribution with the same parameters.
In other words, the random sample is independent and identically distributed (iid).
Independent, Identically Distributed
X1, X2, ..., Xn are independent and have the same distribution, with the same parameters.
Example
A random sample of size n from a normal population distribution with mean µ and variance σ² can be described as
X1, X2, ..., Xn are iid N(µ, σ²)
So all Xi are normally distributed with the same mean µ and variance σ².
Motivation
Suppose we have a bunch of researchers interested in the same study, but who all gather independent simple random samples from the same NORMAL population (possibly with different sample sizes).
=⇒ Each sample mean x̄i is going to be different
=⇒ Each sample mean X̄i is a random variable
=⇒ X̄i must have a distribution!
Sampling distribution of the sample mean under NORMALITY
The distribution of X̄ under normality
Process
• For X̄1
  · X1, X2, ..., Xm are random observations from the same normal distribution with the same parameters. From those parameters we can find the mean µ and variance σ².
  · X̄1 is random because we can get a different value for a different sample.
  · Then X̄1 has its own distribution, with a mean and variance that depend on the distribution it was collected from (in this case, normal).
• For X̄2
  · Same story (but could be a different sample size)
⋮
• For X̄n
  · Same story (but could be a different sample size)
σ² is known
Distribution of X̄
If X1, X2, ..., Xn are iid N(µ, σ²), then
X̄ ∼ N(µ, σ²/n)
where
• σ² is the same fixed variance of each Xi,
• µ is the same fixed population mean of each Xi,
• n is the sample size.
We could also standardize this (next slide).
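A quick way to see this result is by simulation. Below is a minimal sketch (assuming NumPy is available; µ = 28, σ = 3, n = 25 are arbitrary illustrative values): the simulated sample means have mean ≈ µ and variance ≈ σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 28.0, 3.0, 25      # arbitrary illustrative values
reps = 100_000                    # number of simulated samples

# Each row is one sample of size n; each row mean is one draw of X-bar
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())   # ~ mu = 28
print(xbars.var())    # ~ sigma^2 / n = 9 / 25 = 0.36
```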
σ² is known
Distribution of X̄ (standardized)
If X1, X2, ..., Xn are iid N(µ, σ²), then
(X̄ − µ) / (σ/√n) ∼ N(0, 1)
Remark
In practice (and in this class),
• µ is unknown but fixed.
• σ² is unknown and fixed.
NOTE: From now on we are going to standardize!
Example where X̄ is exactly normal
Suppose the speed of a professional hockey player on ice is normally distributed with mean 28 mph and standard deviation 3 mph.
a) Find the probability that a hockey player has a speed between 27 and 29 mph.
b) What is the sampling distribution of the sample mean for n = 25?
c) What is the probability that the mean of a sample of size 25 is between 27 and 29?
d) Suppose the underlying distribution is not actually normal. How does this affect the answer to part c?
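The worked-solution slide did not survive extraction; here is a sketch of parts a)–c) using SciPy's normal CDF (an assumption of this rewrite, not the original slide's method).

```python
from scipy.stats import norm

mu, sigma, n = 28, 3, 25

# a) One player's speed: X ~ N(28, 3^2)
p_a = norm.cdf(29, mu, sigma) - norm.cdf(27, mu, sigma)
print(p_a)                          # ~ 0.261

# b) X-bar ~ N(mu, sigma^2/n) = N(28, 0.36), i.e. standard error 0.6
se = sigma / n ** 0.5

# c) P(27 < X-bar < 29), standardizing with sigma/sqrt(n)
p_c = norm.cdf(29, mu, se) - norm.cdf(27, mu, se)
print(p_c)                          # ~ 0.904
```

For part d): with n = 25 < 30, the large-sample rule of thumb does not apply, so without normality the answer to part c) would only be a rough approximation.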
t-distribution
If X1, X2, ..., Xn are iid N(µ, σ²), then
T = (X̄ − µ) / (S/√n) ∼ t(n−1)
That is, T has a t-distribution with ν = n − 1 degrees of freedom (df), where
• σ² is the fixed but unknown variance of each Xi,
• µ is the population mean of each Xi,
• n is the sample size.
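A simulation sketch of this fact (assuming NumPy/SciPy; µ = 0, σ = 1, n = 5 are arbitrary): the studentized mean of normal data has heavier tails than N(0, 1), matching t(n − 1) instead.

```python
import numpy as np
from scipy.stats import t, norm

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 5, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
T = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

print((T > 2.0).mean())          # empirical tail probability
print(1 - t.cdf(2.0, df=n - 1))  # t(4) tail ~ 0.058, matches the simulation
print(1 - norm.cdf(2.0))         # normal tail ~ 0.023, too small for n = 5
```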
Remark about estimating σ²
• We use the sample variance S² as an estimator for σ².
• Note that, just like the sample mean X̄, the sample variance S² is random.
Remarks about the t-distribution
• It is centered at zero.
• The number of degrees of freedom is n minus the number of means estimated from the sample (here one, X̄).
• If Z ∼ N(0, 1), Y ∼ χ²(k), and Z is independent of Y, then
T = Z / √(Y/k) ∼ t(k)
Recall
The sample variance is
S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xi − X̄)²
Jargon
The name is t (or Student's t), and it is a distribution, so we call it the t-distribution or Student's t-distribution.
Distribution of the sample mean X̄
of a large sample without normality
Motivation
Suppose we have a bunch of researchers interested in the same study, but who all gather independent simple random samples from the same population (possibly with different sample sizes).
=⇒ Each sample mean x̄i is going to be different
=⇒ Each sample mean X̄i is a random variable
=⇒ X̄i must have a distribution!
Sampling distribution of the sample mean
The distribution of X̄
Process
• For X̄1
  · X1, X2, ..., Xm are random observations from the same distribution with the same parameters. From those parameters we can find the mean µ and variance σ².
  · X̄1 is random because we can get a different value for a different sample.
  · Then X̄1 has its own distribution, with a mean and variance that depend on the distribution it was collected from.
• For X̄2
  · Same story (but could be a different sample size)
⋮
• For X̄n
  · Same story (but could be a different sample size)
Remark
Check the process: Link
Remark
• X̄1, X̄2, ..., X̄n are random sample means that were collected from the same distribution with the same parameters (iid).
• However, even though it says "were collected from," that does NOT mean that the distribution of the sample means is the same as the distribution it was collected from. As the linked demo shows, it looks normal in all the cases even though the underlying distribution is, e.g., right-skewed.
Again, here, the population distribution is not necessarily normal.
Central Limit Theorem
For a large sample (n ≥ 30), the distribution of X̄ is approximately normal with mean µ and variance σ²/n:
X̄ ≈ N(µ, σ²/n)
OR
(X̄ − µ) / (σ/√n) → N(0, 1)
where
• µ is the population mean of each Xi,
• σ² is the population variance of each Xi,
• n is the sample size.
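To see the theorem in action with a skewed population, here is a minimal simulation sketch (NumPy assumed; the exponential population with mean 0.5 and n = 50 are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 0.5, 50, 100_000    # Exp(scale=0.5): right-skewed, sigma^2 = 0.25

xbars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print(xbars.mean())   # ~ mu = 0.5
print(xbars.var())    # ~ sigma^2 / n = 0.25 / 50 = 0.005
# A histogram of xbars is bell-shaped despite the skewed population.
```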
Remarks
• The central limit theorem works on large samples because that is when the distribution of X̄ converges to a normal distribution.
• It does not matter which distribution the Xi come from (not necessarily normal): as long as all observations Xi come from the same distribution, with the same parameters, and are independent, the CLT works (the CLT works with iid samples).
• Moreover, since all Xi have the same parameters, they all have the same population mean µ and population variance σ².
Coincidence
The normal distribution has parameters µ and σ², but that does not mean that the CLT works only when all Xi are normally distributed. The CLT works with any distribution.
Remarks
• The CLT says that X̄ is approximately normal, NOT exactly normal all the time.
• The big advantage of the CLT is that we can use it with any distribution: the sample mean is approximately normal if the sample is large.
Example
• If each Xi is normal, then the distribution of X̄ is exactly normal.
• If each Xi is exponential with parameter λ (scale parameterization), then the distribution of X̄ is exactly Gamma(n, λ/n), but it is approximately normal if the sample is large.
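A sketch of the exponential example (assuming SciPy; λ = 2 and n = 50 are illustrative), comparing the exact Gamma distribution of X̄ with its CLT normal approximation at one probability.

```python
from scipy.stats import gamma, norm

lam, n = 2.0, 50    # exponential scale (mean) and sample size; arbitrary values

# Exact: X-bar ~ Gamma(shape=n, scale=lam/n); CLT: X-bar ~ N(lam, lam^2/n)
exact = gamma.cdf(2.2, a=n, scale=lam / n)
approx = norm.cdf(2.2, loc=lam, scale=lam / n ** 0.5)

print(exact, approx)    # the two probabilities agree closely for n = 50
```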
Large and small samples
Large sample
Rule of thumb: A sample is considered large if n ≥ 30 (other textbooks say n ≥ 40 or n ≥ 25).
Remark
However, if the sample size is small, then the CLT is not appropriate.
=⇒ We can then only deal with the normal distribution as the underlying distribution.
CLT alternative form
If X1, X2, ..., Xn are independent and identically distributed, then the distribution of Σᵢ₌₁ⁿ Xi is approximately normal with mean nµ and variance nσ²:
Σᵢ₌₁ⁿ Xi ≈ N(nµ, nσ²)
OR
(Σᵢ₌₁ⁿ Xi − nµ) / (σ√n) → N(0, 1)
Remark
Again, if the iid Xi are normal, then the sum of the Xi is exactly normal (not just approximately normal). On the other hand, the sum of iid non-normal random variables (e.g. Binomial, Poisson, etc.) is approximately normal for a large sample.
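A quick check of the sum form (a sketch assuming SciPy; λ = 1.5 and n = 60 are arbitrary): the sum of n iid Poisson(λ) variables is exactly Poisson(nλ), and since a Poisson has µ = σ² = λ, the CLT says the sum is approximately N(nλ, nλ).

```python
from scipy.stats import poisson, norm

lam, n = 1.5, 60          # arbitrary Poisson rate and sample size
mu = var = lam            # for Poisson, mu = sigma^2 = lam

exact = poisson.cdf(100, n * lam)                    # sum ~ Poisson(90) exactly
approx = norm.cdf(100.5, n * mu, (n * var) ** 0.5)   # CLT, continuity-corrected

print(exact, approx)      # ~ 0.86 for both
```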
Example where X̄ is approximately normal
X1, X2, ..., X100 are iid Binom(1, 0.7)
a) Find the approximate distribution of the sample mean.
b) What is the probability that the average is less than 0.75?
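The solution slide is again missing; here is a sketch using the CLT (SciPy assumed): each Xi is Bernoulli(0.7), so µ = 0.7 and σ² = 0.7 · 0.3 = 0.21.

```python
from scipy.stats import norm

p, n = 0.7, 100
mu = p                          # mean of Binom(1, p)
var = p * (1 - p)               # variance of Binom(1, p) = 0.21

# a) By the CLT, X-bar ≈ N(0.7, 0.21/100); standard error ~ 0.0458
se = (var / n) ** 0.5

# b) P(X-bar < 0.75) ≈ Φ((0.75 - 0.7) / se)
print(norm.cdf(0.75, mu, se))   # ~ 0.862
```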
More jargon
• The mean µ of the estimator X̄ is called the mean of X̄.
• The variance σ²/n of the estimator X̄ is called the variance of X̄.
• But the standard deviation σ/√n of the estimator X̄ is called the standard error of X̄.
Standard error
SE(X̄) = σ/√n
where
• σ is the standard deviation of the underlying distribution of each Xi,
• n is the sample size.
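In practice σ is unknown, so the standard error is usually estimated by s/√n (consistent with the next slide, where S² replaces σ²). A minimal sketch, with hypothetical data values:

```python
import numpy as np

x = np.array([27.1, 29.3, 28.0, 26.5, 30.2])   # hypothetical speeds in mph

se_hat = x.std(ddof=1) / np.sqrt(len(x))       # estimated SE of X-bar: s / sqrt(n)
print(x.mean(), se_hat)
```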
CLT when σ² is unknown
For a large sample (n ≥ 30), the distribution of X̄ is approximately normal with mean µ and variance S²/n:
(X̄ − µ) / (S/√n) → N(0, 1)
Remarks
• The more degrees of freedom, the lower the variance of the t-distribution and the better it approximates the standard normal N(µ = 0, σ² = 1).
Let's check it: Link
Recall: Under normality, this statistic is exactly t-distributed.
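A simulation sketch of this statement (assuming NumPy/SciPy; the skewed exponential population and n = 100 are arbitrary choices): even with σ² replaced by S² and non-normal data, the standardized mean behaves approximately like N(0, 1) for large n.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
mu, n, reps = 0.5, 100, 100_000            # Exp(scale=0.5) has mean mu = 0.5

x = rng.exponential(scale=mu, size=(reps, n))
z = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

print((z <= 1.0).mean())                   # empirical P(Z <= 1)
print(norm.cdf(1.0))                       # ~ 0.841; the two are close
```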
Summary
In summary
Small-or-large sample (requires the normality assumption):
• σ is known:
  Z = (X̄ − µ) / (σ/√n) ∼ N(0, 1)
• σ is unknown:
  T = (X̄ − µ) / (S/√n) ∼ t(n−1)
Large sample (n ≥ 30):
• σ is known:
  Z = (X̄ − µ) / (σ/√n) ≈ N(0, 1)
• σ is unknown:
  Z = (X̄ − µ) / (S/√n) ≈ N(0, 1)
Things that we need to check
There are only two things we need to worry about when we do a problem:
1. Sample size: big or small?
=⇒ Check for normality if the sample is small.
2. Do we know σ²?
=⇒ Estimate σ² using the sample variance S² if σ² is not given.
Converging
As the sample size n increases,
• X̄ → µ
• S² → σ²
• S/√n → 0, which means that the variability of our estimator X̄ decreases, so it is more accurate in estimating µ.
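A small numerical sketch of these convergence claims (NumPy assumed; the N(10, 4) population is an arbitrary choice): as n grows, X̄ settles near µ, S² near σ², and S/√n shrinks toward 0.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 10.0, 2.0                      # population mean and SD

for n in (30, 300, 3000, 30000):
    x = rng.normal(mu, sigma, size=n)
    s = x.std(ddof=1)
    # columns: n, X-bar -> mu, S^2 -> sigma^2, S/sqrt(n) -> 0
    print(n, round(x.mean(), 3), round(s ** 2, 3), round(s / np.sqrt(n), 4))
```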
Random and fixed variables
• The sample mean X̄ is a random variable.
• The sample variance S² is a random variable.
• The population mean µ is a fixed (mostly unknown) quantity.
• The population variance σ² is a fixed (mostly unknown) quantity.
Theorem
The sample mean X̄ and the sample variance S² are independent random variables when the underlying distribution is normal.
⋆ We can have two samples with the same sample mean X̄ but different sample variances S².
⋆ We can have two samples with different sample means X̄ but the same sample variance S².
Appendix
Why do we use X̄ to estimate µ? And S² to estimate σ²?
Expectation
EX = Σᵢ₌₁ⁿ xi · P(X = xi)
Expectation with equally likely values
EX = (1/n) Σᵢ₌₁ⁿ xi = x̄
Variance
VarX = E[(X − µ)²] = Σᵢ₌₁ⁿ (xi − µ)² · P(X = xi)
Variance with equally likely values
VarX = (1/n) Σᵢ₌₁ⁿ (xi − µ)²
But if µ is unknown, we plug in x̄:
(1/n) Σᵢ₌₁ⁿ (xi − x̄)²
But this underestimates σ², so we use
s² = (1/(n−1)) Σᵢ₌₁ⁿ (xi − x̄)²
(we lose 1 degree of freedom by estimating µ).
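A simulation sketch of that bias (NumPy assumed; µ = 0, σ² = 4, n = 5 are arbitrary): dividing by n systematically underestimates σ², while dividing by n − 1 is correct on average.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000   # sigma^2 = 4

x = rng.normal(mu, sigma, size=(reps, n))

print(x.var(axis=1, ddof=0).mean())   # divide by n:   ~ (n-1)/n * sigma^2 = 3.2
print(x.var(axis=1, ddof=1).mean())   # divide by n-1: ~ sigma^2 = 4.0 (unbiased)
```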
Can we come up with better estimators?
For the normal distribution (i.e., under normality), the method of maximum likelihood (MLE) leads to x̄ for µ and, after a bias correction, to s² for σ²; these are the standard estimators.
They are best in the sense of being unbiased (E X̄ − µ = 0 and E(S²) − σ² = 0) with minimum variance among all unbiased estimators. (STP 427, 502 for more details.)
Sum of independent normal distributions is normal
X1 + X2 + ... + Xn = Σᵢ₌₁ⁿ Xi ∼ N(µ1 + µ2 + ... + µn, σ1² + σ2² + ... + σn²)
If they are iid, let µ and σ² be the common values of the µi and σi², in other words,
µ = µ1 = µ2 = ... = µn
σ² = σ1² = σ2² = ... = σn²
So
Σᵢ₌₁ⁿ Xi ∼ N(nµ, nσ²)
Divide the sum of normal distributions by n
Σᵢ₌₁ⁿ Xi ∼ N(nµ, nσ²)
(1/n) Σᵢ₌₁ⁿ Xi ∼ N(nµ/n, nσ²/n²)
(1/n) Σᵢ₌₁ⁿ Xi ∼ N(µ, σ²/n)
X̄ ∼ N(µ, σ²/n)
That is, if the Xi are iid normal, X̄ is exactly normal, not approximately normal.
If all Xi are iid but not normal, then change the ∼ to ≈ (CLT).
Another way: sum of iid normal distributions is normal, starting from the sample mean
X̄ ∼ N(µ, σ²/n), so
(X̄ − µ) / (σ/√n) ∼ N(0, 1)
Rewriting the left-hand side:
(X̄ − µ) / (σ/√n) = ((1/n) Σᵢ₌₁ⁿ Xi − (1/n) nµ) / (σ/√n)
= ((1/n)(Σᵢ₌₁ⁿ Xi − nµ)) / ((1/√n) σ)
= (Σᵢ₌₁ⁿ Xi − nµ) / (√n σ)    (multiplying numerator and denominator by n)
So
(Σᵢ₌₁ⁿ Xi − nµ) / (√n σ) ∼ N(0, 1)
which is exactly the standardization of
Σᵢ₌₁ⁿ Xi ∼ N(nµ, nσ²)
X1, X2, ..., Xn are independent and have the same distribution, with the same parameters, so all Xi have the same mean µ and variance σ². Then
X̄ ≈ N(E X̄ = EXi = µ, Var X̄ = VarXi / n = σ²/n)
Mean:
E(X̄) = E((1/n) Σᵢ₌₁ⁿ Xi)
= (1/n) Σᵢ₌₁ⁿ E(Xi)
= (1/n) · n · E(X)
= µ
Variance (using independence):
Var(X̄) = Var((1/n) Σᵢ₌₁ⁿ Xi)
= (1/n²) Var(Σᵢ₌₁ⁿ Xi)
= (1/n²) Σᵢ₌₁ⁿ Var(Xi)
= (1/n²) · n · Var(X)
= σ²/n
Independent
An important remark about the t-distribution is that we need X̄ to be independent of S² and the data to be normal. It turns out that the normality assumption alone is enough, because normality makes X̄ independent of S² (take a mathematical statistics course for details).