Central Limit Theorem
STP 231
Alejandro Vidales Aller
March 13, 2024

Outline
• Normality
• Distribution of the sample mean X̄ of a large sample without normality
• Summary
• Appendix

Normality

Random variable
Another way to think of a random variable is as one random draw from its distribution.

Underlying distribution
The underlying (population) distribution is the distribution from which we collect the samples.

Random sample (iid)
A set of random variables {X1, X2, ..., Xn} where
• each Xi is independent of the others, and
• each Xi comes from the same distribution with the same parameters.
In other words, a random sample is independent and identically distributed (iid).

Independent, Identically Distributed
X1, X2, ..., Xn are independent and have the same distribution, with the same parameters.

Example
A random sample of size n from a normal population with mean µ and variance σ² can be written as
    X1, X2, ..., Xn are iid N(µ, σ²)
So every Xi is normally distributed with the same mean µ and variance σ².

Motivation
Suppose a group of researchers are all interested in the same study, and each gathers an independent simple random sample from the same NORMAL population (possibly with different sample sizes).
⟹ Each observed sample mean x̄i is going to be different.
⟹ Each sample mean X̄i is a random variable.
⟹ X̄i must have a distribution!

Sampling distribution of the sample mean under NORMALITY
The distribution of X̄ under normality.

Process
• For X̄1:
  · X1, X2, ..., Xm are random observations from the same normal distribution with the same parameters, and from those parameters we can find the mean µ and variance σ².
  · X̄1 is random because a different sample can give a different value.
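The "many researchers, many sample means" picture above can be simulated. This is a minimal sketch, not part of the slides: the seed, the number of repetitions, and the population parameters (taken from the later hockey example) are illustrative choices. Each repetition plays the role of one researcher drawing an iid normal sample and reporting its mean.

```python
import random
import statistics

random.seed(0)

mu, sigma, n = 28.0, 3.0, 25   # illustrative population parameters and sample size
reps = 20000                   # number of independent "researchers"

# Each repetition draws an iid sample of size n from N(mu, sigma^2)
# and records its sample mean x-bar.
means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))

# Theory: X-bar ~ N(mu, sigma^2 / n), so sd(X-bar) = sigma / sqrt(n) = 0.6
print(statistics.mean(means))   # close to 28
print(statistics.stdev(means))  # close to 0.6
```

The empirical mean and standard deviation of the 20,000 sample means land on µ and σ/√n, previewing the formula for the distribution of X̄ on the next slide.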
  · Then, X̄1 has its own distribution, with a mean and variance that depend on the distribution it was collected from (in this case, normal).
• For X̄2:
  · Same story (but possibly a different sample size).
  ⋮
• For X̄n:
  · Same story (but possibly a different sample size).

σ² is known

Distribution of X̄
If X1, X2, ..., Xn are iid N(µ, σ²), then
    X̄ ~ N(µ, σ²/n)
where
• σ² is the same fixed variance of every Xi,
• µ is the same fixed population mean of every Xi,
• n is the sample size.
We can also standardize this (next slide).

Distribution of X̄ (standardized)
If X1, X2, ..., Xn are iid N(µ, σ²), then
    (X̄ − µ)/(σ/√n) ~ N(0, 1)

Remark
In practice (and in this class),
• µ is unknown but fixed.
• σ² is unknown but fixed.
NOTE: From now on we are going to standardize!

Example: X̄ is exactly normal
Suppose the speed of a professional hockey player on ice is normally distributed with mean 28 mph and standard deviation 3 mph.
a) Find the probability that a hockey player has a speed between 27 and 29 mph.
b) What is the sampling distribution of the sample mean for n = 25?
c) What is the probability that the mean of a sample of size 25 is between 27 and 29?
d) Suppose the underlying distribution is not actually normal. How does this affect the answer to part c?

t-distribution
If X1, X2, ..., Xn are iid N(µ, σ²), then
    T = (X̄ − µ)/(S/√n) ~ t(n−1)
That is, T has a t-distribution with ν = n − 1 degrees of freedom (df), where
• σ² is the fixed but unknown variance of every Xi,
• µ is the population mean of every Xi,
• n is the sample size.

Remark about estimating σ²
• We use the sample variance S² as an estimator for σ².
• Note that, just like the sample mean X̄, the sample variance S² is random.

Remarks about the t-distribution
• It is centered at zero.
• The number of degrees of freedom is n minus the number of mean estimates used by the statistic.
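The hockey example can be checked numerically with only the Python standard library. This sketch builds the standard normal CDF Φ from math.erf; all numbers come from the problem statement above.

```python
import math

def phi(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 28.0, 3.0

# a) One player: X ~ N(28, 3^2), so standardize with sigma = 3.
p_one = phi((29 - mu) / sigma) - phi((27 - mu) / sigma)

# b), c) Sample of n = 25: X-bar ~ N(28, 3^2/25), so sd(X-bar) = 3/5 = 0.6.
se = sigma / math.sqrt(25)
p_mean = phi((29 - mu) / se) - phi((27 - mu) / se)

print(round(p_one, 4))   # about 0.2611
print(round(p_mean, 4))  # about 0.9044
```

The mean of 25 players is far more likely to fall within ±1 mph of µ than a single player is, because its standard deviation shrinks from 3 to 0.6. For part d): with n = 25 the CLT makes the answer to c) approximately correct even without normality, while a) would need the normal assumption.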
• If Z ~ N(0, 1), Y ~ χ²(k), and Z is independent of Y, then
    T = Z / √(Y/k) ~ t(k)

Recall
The sample variance is
    S² = (1/(n−1)) Σᵢ (Xi − X̄)²

Jargon
The name is t (or Student's t), and it is a distribution, so we call it the t-distribution or Student's t-distribution.

Distribution of the sample mean X̄ of a large sample without normality

Motivation
Suppose a group of researchers are all interested in the same study, and each gathers an independent simple random sample from the same population (possibly with different sample sizes).
⟹ Each observed sample mean x̄i is going to be different.
⟹ Each sample mean X̄i is a random variable.
⟹ X̄i must have a distribution!

Sampling distribution of the sample mean
The distribution of X̄.

Process
• For X̄1:
  · X1, X2, ..., Xm are random observations from the same distribution with the same parameters, and from those parameters we can find the mean µ and variance σ².
  · X̄1 is random because a different sample can give a different value.
  · Then, X̄1 has its own distribution, with a mean and variance that depend on the distribution it was collected from.
• For X̄2:
  · Same story (but possibly a different sample size).
  ⋮
• For X̄n:
  · Same story (but possibly a different sample size).

Remark
Check the process: Link

Remark
• X̄1, X̄2, ..., X̄n are sample means whose observations were collected from the same distribution with the same parameters (iid).
• However, even though they "were collected from" that distribution, this does NOT mean the distribution of the sample means is the same as the distribution they were collected from. As we see, the distribution of the sample mean looks normal in every case, even when the underlying distribution is, e.g., right-skewed. Again, here the population distribution is not necessarily normal.
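The t-statistic defined above can be simulated to confirm it behaves like a t-distribution rather than a standard normal. This sketch is illustrative (the sample size, seed, and repetition count are my choices): it draws many normal samples of size n = 11, forms T = (X̄ − µ)/(S/√n), and checks the mean and variance against the t(10) values, mean 0 and variance df/(df − 2) = 1.25.

```python
import math
import random
import statistics

random.seed(1)

mu, sigma, n = 0.0, 2.0, 11    # illustrative values; df = n - 1 = 10
reps = 20000

t_stats = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample sd, divisor n - 1
    t_stats.append((xbar - mu) / (s / math.sqrt(n)))

# t with df = 10 has mean 0 and variance 10/8 = 1.25: heavier tails
# than N(0, 1), because the random S in the denominator adds spread.
print(statistics.mean(t_stats))      # near 0
print(statistics.variance(t_stats))  # near 1.25
```

The variance exceeding 1 is the whole point of the t-distribution: replacing the fixed σ with the random S fattens the tails, and the effect shrinks as the degrees of freedom grow.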
Central Limit Theorem

Central limit theorem
For a large sample (n ≥ 30), the distribution of X̄ is approximately normal with mean µ and variance σ²/n:
    X̄ ≈ N(µ, σ²/n)
or, equivalently,
    (X̄ − µ)/(σ/√n) → N(0, 1)
where
• µ is the population mean of every Xi,
• σ² is the population variance of every Xi,
• n is the sample size.

Remarks
• The central limit theorem works for large samples because the distribution of X̄ converges to a normal distribution as n grows.
• It does not matter which distribution the Xi come from (not necessarily normal): as long as all observations Xi come from the same distribution, with the same parameters, and are independent, the CLT works (the CLT works with iid samples).
• Moreover, since all Xi have the same parameters, they all have the same population mean µ and population variance σ².

Coincidence
The normal distribution has parameters µ and σ², but that does not mean the CLT works only when the Xi are normally distributed. The CLT works with any distribution.

Remarks

Remark
• The CLT says that X̄ is approximately normal, NOT exactly normal in every case.
• The big advantage of the CLT is that, for any underlying distribution, the sample mean is approximately normal if the sample is large.

Example
• If each Xi is normal, then the distribution of X̄ is exactly normal.
• If each Xi is exponential with parameter λ, then the distribution of X̄ is exactly Gamma(n, λ/n) (scale parameterization), but it approximates a normal if the sample is large.

Large and small samples

Large sample
Rule of thumb: a sample is considered large if n ≥ 30.
(Other textbooks say n ≥ 40 or n ≥ 25.)

Remark
However, if the sample size is small, then the CLT is not appropriate.
⟹ We can only proceed when the underlying distribution is normal.

CLT alternative form
Suppose X1 , X2 , . . . ,
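The CLT claim can be demonstrated on a strongly right-skewed population. This sketch (seed, sample size, and repetition count are illustrative choices) uses the exponential distribution with rate 1, which has µ = 1 and σ = 1, and checks that means of samples of size n = 40 concentrate around µ with spread σ/√n.

```python
import math
import random
import statistics

random.seed(2)

lam, n, reps = 1.0, 40, 20000   # Exp(rate 1): mu = 1, sigma = 1

# Each repetition draws a sample of n right-skewed observations
# and records the sample mean.
means = [statistics.mean(random.expovariate(lam) for _ in range(n))
         for _ in range(reps)]

# CLT: X-bar is approximately N(1, 1/40), so sd(X-bar) = 1/sqrt(40) ~ 0.158,
# even though the underlying Exp(1) distribution is not remotely normal.
print(statistics.mean(means))   # near 1
print(statistics.stdev(means))  # near 0.158
```

Plotting a histogram of `means` (not done here) would show the familiar bell shape, despite the skewness of each individual Xi.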
Xn are independent and identically distributed. Then the distribution of Σᵢ Xi is approximately normal with mean nµ and variance nσ²:
    Σᵢ Xi ≈ N(nµ, nσ²)
or, equivalently,
    (Σᵢ Xi − nµ)/(σ√n) → N(0, 1)

Remark
Again, if the Xi are iid normal, then the sum of the Xi is exactly normal (not just approximately). On the other hand, a sum of iid non-normal random variables (e.g. Binomial, Poisson, etc.) is approximately normal for a large sample.

Example: X̄ is approximately normal
X1, X2, ..., X100 are iid Binom(1, 0.7)
a) Find the approximate distribution of the sample mean.
b) What is the probability that the average is less than 0.75?

More jargon
• The mean µ of the estimator X̄ is called the mean of X̄.
• The variance σ²/n of the estimator X̄ is called the variance of X̄.
• But the standard deviation σ/√n of the estimator X̄ is called the standard error of X̄.

Standard error
    SE(X̄) = σ/√n
where σ is the standard deviation of the underlying distribution of the Xi and n is the sample size.

CLT when σ² is unknown

CLT when σ² is unknown
For a large sample (n ≥ 30), the distribution of X̄ is approximately normal with mean µ and variance S²/n:
    (X̄ − µ)/(S/√n) → N(0, 1)

Remarks
• The more degrees of freedom, the lower the variance of the t-distribution and the better it approximates the standard normal N(µ = 0, σ² = 1). Let's check it: Link
• Recall: under normality, this statistic has exactly a t-distribution.

Summary

In summary
Small-or-large sample (normality assumed):
• σ known:   Z = (X̄ − µ)/(σ/√n) ~ N(0, 1)
• σ unknown: T = (X̄ − µ)/(S/√n) ~ t(n−1)
Large sample (n ≥ 30):
• σ known:   Z = (X̄ − µ)/(σ/√n) ≈ N(0, 1)
• σ unknown: Z = (X̄ − µ)/(S/√n) ≈ N(0, 1)

Things that we need to check
There are only two things to worry about when we do a problem:
1. Sample size: big or small? ⟹ Check for normality if the sample is small.
2. Do we know σ²? ⟹ Estimate σ² with the sample variance S² if σ² is not given.
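The Binomial example above can be worked both ways: via the CLT approximation (X̄ ≈ N(0.7, 0.21/100), since a Bernoulli(0.7) has µ = 0.7 and σ² = 0.7 · 0.3 = 0.21), and exactly, since the sum of 100 iid Bernoulli(0.7) variables is Binomial(100, 0.7). A stdlib-only sketch:

```python
import math

def phi(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 100, 0.7
mu = p                     # E(Xi) = 0.7
var = p * (1 - p)          # Var(Xi) = 0.21

# CLT: P(X-bar < 0.75) ~ Phi((0.75 - 0.7) / sqrt(0.21/100))
approx = phi((0.75 - mu) / math.sqrt(var / n))

# Exact: X-bar < 0.75 means the sum is < 75, i.e. at most 74 successes.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(75))

print(round(approx, 4))  # about 0.8624
print(round(exact, 4))
```

The two answers differ by a couple of percentage points; a continuity correction (using 74.5 instead of 75) would bring the approximation closer to the exact binomial value.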
Converging
As the sample size n increases,
• X̄ → µ
• S² → σ²
• S/√n → 0
which means the variability of our estimator X̄ decreases, so it becomes more accurate at estimating µ.

Random and fixed variables
• The sample mean X̄ is a random variable.
• The sample variance S² is a random variable.
• The population mean µ is a fixed (usually unknown) quantity.
• The population variance σ² is a fixed (usually unknown) quantity.

Theorem
The sample mean X̄ and the sample variance S² are independent random variables when the underlying distribution is normal.
⋆ We can have two samples with the same sample mean X̄ but different sample variances S².
⋆ We can have two samples with different sample means X̄ but the same sample variance S².

Appendix

Why do we use X̄ to estimate µ? And S² to estimate σ²?

Expectation
    EX = Σᵢ xi · P(X = xi)
Expectation with equally likely values
    EX = (1/n) Σᵢ xi = x̄

Variance
    VarX = E[(X − µ)²] = Σᵢ (xi − µ)² · P(X = xi)
Variance with equally likely values
    VarX = (1/n) Σᵢ (xi − µ)²
But if µ is unknown, we plug in x̄:
    (1/n) Σᵢ (xi − x̄)²
However, this underestimates σ², because we lose one degree of freedom by estimating µ with x̄. So we use
    s² = (1/(n−1)) Σᵢ (xi − x̄)²

Can we come up with better estimators?
For the normal distribution (under normality), maximum likelihood (with a bias correction for the variance) leads to x̄ and s² as the best estimators for µ and σ². "Best" because they are unbiased, E(X̄) − µ = 0 and E(S²) − σ² = 0, and have minimum variance among unbiased estimators. (See STP 427, 502 for more details.)

Sum of normal distributions is normal
    X1 + X2 + ... + Xn = Σᵢ Xi ~ N(µ1 + µ2 + ... + µn, σ1² + σ2² + ... + σn²)
If they are iid, let µ and σ² denote the common values of the µi and σi², in other words,
    µ = µ1 = µ2 = ... = µn
    σ² = σ1² = σ2² = ...
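The claim that dividing by n underestimates σ² while dividing by n − 1 is unbiased can be verified by simulation. This sketch (population, seed, and repetition count are illustrative choices) uses σ² = 4 and samples of size n = 5, so the divisor-n estimator should average (n − 1)/n · σ² = 3.2 while S² should average 4.

```python
import random
import statistics

random.seed(3)

mu, sigma2, n, reps = 0.0, 4.0, 5, 40000   # illustrative values

biased, unbiased = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = statistics.mean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    biased.append(ss / n)           # divisor n: biased low
    unbiased.append(ss / (n - 1))   # divisor n - 1: unbiased (this is S^2)

# E(divisor-n estimator) = (n-1)/n * sigma^2 = 3.2, while E(S^2) = 4.
print(statistics.mean(biased))    # near 3.2
print(statistics.mean(unbiased))  # near 4.0
```

The gap is large here because n = 5 is small; as n grows, (n − 1)/n → 1 and the two estimators agree.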
= σn².
So
    Σᵢ Xi ~ N(nµ, nσ²)

Divide the sum of normal distributions by n
    Σᵢ Xi ~ N(nµ, nσ²)
    (1/n) Σᵢ Xi ~ N(nµ/n, nσ²/n²)
    (1/n) Σᵢ Xi ~ N(µ, σ²/n)
    X̄ ~ N(µ, σ²/n)
That is, if the Xi are iid normal, X̄ is exactly normal, not approximately normal. If the Xi are iid but not normal, change ~ to ≈ (CLT).

Another way: from the sample mean back to the sum
    X̄ ~ N(µ, σ²/n)
    (X̄ − µ)/(σ/√n) ~ N(0, 1)
Writing X̄ = (1/n) Σᵢ Xi and simplifying,
    (X̄ − µ)/(σ/√n) = √n (X̄ − µ)/σ = (Σᵢ Xi − nµ)/(σ√n) ~ N(0, 1)
so
    Σᵢ Xi ~ N(nµ, nσ²)

Mean and variance of X̄
X1, X2, ..., Xn are independent and have the same distribution, with the same parameters, so every Xi has the same mean µ and variance σ². Then
    E(X̄) = E((1/n) Σᵢ Xi) = (1/n) Σᵢ E(Xi) = (1/n) · n · µ = µ
    Var(X̄) = Var((1/n) Σᵢ Xi) = (1/n²) Σᵢ Var(Xi) = (1/n²) · n · σ² = σ²/n
(the variance step uses independence), so for a large sample, X̄ ≈ N(µ, σ²/n).

Independent
An important remark about the t-distribution: we need X̄ to be independent of S², and the data to be normal. It turns out that the normality assumption alone is enough, because normality makes X̄ independent of S² (take a mathematical statistics course for details).
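The final remark, that normality makes X̄ and S² independent, can be illustrated empirically. This sketch (seeds, sample size, and repetition count are my illustrative choices) estimates the correlation between X̄ and S² across many samples: near zero for normal data, but clearly positive for right-skewed exponential data, where large means and large variances tend to occur together.

```python
import math
import random
import statistics

random.seed(4)

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

def means_and_vars(draw, n, reps):
    """Draw `reps` samples of size n and collect (X-bar, S^2) pairs."""
    means, variances = [], []
    for _ in range(reps):
        x = [draw() for _ in range(n)]
        means.append(statistics.mean(x))
        variances.append(statistics.variance(x))
    return means, variances

n, reps = 10, 20000
m_norm, v_norm = means_and_vars(lambda: random.gauss(0, 1), n, reps)
m_exp, v_exp = means_and_vars(lambda: random.expovariate(1.0), n, reps)

print(corr(m_norm, v_norm))  # near 0: X-bar, S^2 independent under normality
print(corr(m_exp, v_exp))    # clearly positive: dependent for skewed data
```

Zero correlation alone does not prove independence, but the contrast between the two cases shows that the independence of X̄ and S² is a special property of the normal, not something that holds in general.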