Chapter 8

Limit Theorems and Statistics

In what follows we assume that we have a sequence X_n of independent, identically distributed random variables (an independent trials process), with finite mean µ and finite variance σ². We will be interested in the behavior of the sums or averages

    S_N = X_1 + · · · + X_N,    A_N = S_N / N.

Sometimes such sums or averages arise as we try to improve the behavior of a random variable, like stock market daily closings. In other cases they can sneak into natural problems. Suppose the intervals between bus arrivals are independent and exponentially distributed with a common mean and variance. Then the time of arrival of bus N is

    T_N = (T_1 − 0) + (T_2 − T_1) + · · · + (T_N − T_{N−1}),

so we have a sum of iid random variables.

8.1 Law of Large Numbers

We begin by considering Chebyshev's inequality in the case where X has a continuous density. This is a helpful theoretical tool because it is so general, but the estimate it gives is not very sharp in many particular cases.

Theorem 8.1.1. (Chebyshev's inequality) Suppose X has a density function f(x), with finite mean µ and finite variance σ². Then for any number ε > 0,

    P(|X − µ| ≥ ε) ≤ σ²/ε².

Proof. Since (|x − µ|/ε)² ≥ 1 on the region where |x − µ| ≥ ε,

    P(|X − µ| ≥ ε) = ∫_{−∞}^{µ−ε} f(x) dx + ∫_{µ+ε}^{∞} f(x) dx
                   ≤ ∫_{−∞}^{∞} (|x − µ|/ε)² f(x) dx = (1/ε²) E([X − µ]²) = σ²/ε².

The result is the same if X is a discrete random variable.

Sometimes one sees a rephrasing of this theorem, measuring the distance from the mean in multiples of σ.

Corollary 8.1.2. Suppose X has a density function f(x), with finite mean µ and finite variance σ². Then for any number k > 0,

    P(|X − µ| ≥ kσ) ≤ 1/k².

For common examples like the exponential density or the Gaussian density, the actual rate of decay of the tail probability is exponential, much faster than the 1/k² of Chebyshev's bound. Consider the standard normal, with µ = 0, σ = 1. If k ≥ 2, then

    P(|X| ≥ k) = (1/√(2π)) ∫_{|x|≥k} exp(−x²/2) dx
               = (2/√(2π)) ∫_{x≥k} exp(−x²/2) dx
               ≤ √(2/π) ∫_{x≥k} exp(−x) dx = √(2/π) exp(−k),

since x ≥ k ≥ 2 implies x ≤ x²/2, so exp(−x²/2) ≤ exp(−x).
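Chebyshev's bound can be compared with an empirical tail probability by simulation. A minimal sketch in Python, assuming (for illustration only) an exponential distribution with rate λ = 0.1, so that µ = 10 and σ² = 100; the sample size and seed are arbitrary:

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

# Exponential distribution with rate 0.1: mean 10, variance 100.
mu, var = 10.0, 100.0
samples = [random.expovariate(0.1) for _ in range(100_000)]

# Empirical tail probability P(|X - mu| >= eps) versus Chebyshev's bound.
eps = 20.0
tail = sum(1 for x in samples if abs(x - mu) >= eps) / len(samples)
bound = var / eps ** 2  # Chebyshev gives sigma^2 / eps^2 = 0.25
```

Since X ≥ 0, the event |X − 10| ≥ 20 reduces to X ≥ 30, whose exact probability is e⁻³ ≈ .05, well under the Chebyshev bound of .25; this illustrates how loose the bound can be for a particular distribution.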
We can use Chebyshev's inequality to show that with high probability A_N is close to its mean.

Theorem 8.1.3. (Weak Law of Large Numbers) Let X_1, X_2, . . . be independent, identically distributed random variables (an independent trials process), with finite mean µ and finite variance σ². For any ε > 0,

    lim_{N→∞} P(|A_N − µ| < ε) = 1.

Proof. Independence gives

    E(A_N) = µ,    V(A_N) = σ²/N.

By Chebyshev's inequality,

    P(|A_N − µ| ≥ ε) ≤ σ²/(Nε²).

If ε is fixed,

    lim_{N→∞} P(|A_N − µ| ≥ ε) = 0,

which is equivalent to the stated result.

8.2 Central Limit Theorem

The Law of Large Numbers says that the density (or distribution) function of A_N becomes concentrated around the mean as N → ∞. A more detailed description of the shape of the density is provided by the next result. This striking result says that whatever the distribution of X_n, if we average a large enough sample, the result will look Gaussian. To more easily understand the shape of the density, we replace the sum S_N or average A_N by the standardized random variable

    S*_N = (S_N − Nµ)/(√N σ) = (A_N − µ)/(σ/√N).

The random variable S*_N has mean E(S*_N) = 0 and variance V(S*_N) = 1. As we saw earlier, we recenter and rescale the random variable to have mean 0 and variance 1 so our comparison makes sense.

Theorem 8.2.1. (Central Limit Theorem) The distribution of

    S*_N = (S_N − Nµ)/(√N σ)

converges to the standard normal distribution n(z; 0, 1) with mean 0 and variance 1 as N → ∞, in the sense that for all a < b,

    lim_{N→∞} P(a < S*_N < b) = (1/√(2π)) ∫_a^b e^{−x²/2} dx.

Of course for applications it would be helpful to have a better picture of how big N should be. A common rule of thumb is that the normal approximation for the sampling distribution of the mean will be accurate if N ≥ 30. This rule hides a number of issues. Usually the rule is adequate if the density function of X_n is itself roughly normal.
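The convergence in the Central Limit Theorem is easy to observe numerically. A sketch, assuming (for illustration) that the X_n are fair die rolls, so µ = 3.5 and σ² = 35/12; the sample size N = 100 and number of trials are arbitrary choices:

```python
import math
import random

random.seed(1)  # arbitrary seed, for reproducibility

N, trials = 100, 20_000
mu, sigma = 3.5, math.sqrt(35 / 12)  # mean and sd of a single die roll

inside = 0
for _ in range(trials):
    s = sum(random.randint(1, 6) for _ in range(N))
    z = (s - N * mu) / (math.sqrt(N) * sigma)  # standardized sum S*_N
    if -1 < z < 1:
        inside += 1

frac = inside / trials
```

With N = 100 the fraction landing in (−1, 1) should already be close to the standard normal value P(−1 < Z < 1) ≈ .68.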
In fact, if the X_n are normal, but not necessarily identically distributed, then the sample means will be exactly normal, and we only have to calculate the mean and variance. On the other hand, if the density of X_n is quite far from normal, then taking N = 30 may be inadequate.

Example. Suppose patients arrive at the hospital emergency room on average every ten minutes. We use the exponential model with independent intervals between arrivals. Let's see when patient 100 is likely to arrive. In this case λ = 0.1, and

    µ = E(X) = E(T_n − T_{n−1}) = 1/λ = 10,    σ = 1/λ = 10.

Also T_N = S_N. Since Nµ = 1000 and √N σ = 10 · 10 = 100, we have

    P(a < (T_100 − 10·100)/(10·10) < b) = P(a < S*_100 < b) ≈ (1/√(2π)) ∫_a^b e^{−x²/2} dx.

If a = −1, b = 1, then

    (1/√(2π)) ∫_{−1}^{1} e^{−x²/2} dx ≈ .7.

From

    P(−1 < (T_100 − 10·100)/(10·10) < 1) ≈ .7

we conclude that with probability about .7,

    900 < T_100 < 1100.

If a = −3, b = 3, then

    (1/√(2π)) ∫_{−3}^{3} e^{−x²/2} dx ≈ 1 − .0026 = .9974.

So with very high probability,

    700 < T_100 < 1300.

8.3 Large sample statistics

8.3.1 Terminology

Roughly speaking, in statistics we try to estimate characteristics of a population, which could be a population of people, manufactured products, bacteria, etc. In many cases the population is large, and we want to understand its characteristics by examining a subset of the population, called a sample. The characteristics we wish to measure are real numbers (e.g. height, probability of defects, prevalence of a gene type) which are considered random variables. That is, we try to draw inferences about a population by examining a set of random variables X_1, . . . , X_N. For our discussions these random variables are assumed to be independent, with the same probability distribution (identically distributed). Such a collection of random variables is called a random sample of size N from the population. Any function of the random variables in a random sample is called a statistic.
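The normal approximation in the emergency-room example can be checked by simulating T_100 directly as a sum of exponential interarrival times; a quick sketch (seed and trial count are arbitrary):

```python
import random

random.seed(2)  # arbitrary seed, for reproducibility

trials = 20_000
hits = 0
for _ in range(trials):
    # T_100: arrival time of patient 100, a sum of 100 iid Exp(0.1)
    # interarrival times, each with mean 10 minutes.
    t100 = sum(random.expovariate(0.1) for _ in range(100))
    if 900 < t100 < 1100:
        hits += 1

prob = hits / trials
```

The simulated probability of the event 900 < T_100 < 1100 comes out near .68–.7, matching the normal approximation above.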
Our attention will be focused on two statistics, the sample mean

    µ̂ = (1/N) Σ_{n=1}^{N} X_n,

and the sample variance

    σ̂² = (1/(N − 1)) Σ_{n=1}^{N} (X_n − µ̂)².

The sample standard deviation is σ̂ ≥ 0.

Let's consider trying to estimate the average height of adult males in the U.S. population. Since it would be an enormous undertaking to measure 150 million people, we'll settle for a sample, which might have, for example, N = 10, 100, 1000, or 10⁴. Of course many outcomes are possible. We might accidentally sample professional basketball players, or have a sample overly representing short people. As we take different samples of size N we will see variation in the results. That is, the statistic we're measuring will be a random variable, with its own distribution function, called the sampling distribution.

8.3.2 Confidence intervals

In many cases we are able to design studies with a sample size of our choosing. The desire for accuracy may be traded off against the cost or inconvenience of obtaining a large sample, but we will assume that those concerns are secondary, and a large sample size is used. It is often the case that merely reporting the sample mean is not adequate information. For example, suppose we sample the heights of N = 1000 adult women and find a sample mean of 66 inches. We know that the actual mean may be different, but how different? To address this question, we compute a confidence interval.

We have the sample mean µ̂ = 66. Assuming that N is large and the distribution of X_n is roughly normal, we may also estimate the (usually unknown) variance σ² by the sample variance σ̂². Suppose the sample variance in this experiment is σ̂² = 4. Construct the interval which is centered at the sample mean µ̂ and extends 1.96σ̂/√N = 1.96 · 2/√1000 on either side of µ̂. The number 1.96σ̂/√N represents a distance of 1.96 standard deviations for the normal approximation of the sample mean.
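Both statistics, and the interval construction just described, are easy to compute directly. A minimal Python sketch using made-up height values (the data, and the tiny sample size, are hypothetical and serve only to illustrate the arithmetic; with N = 5 the normal approximation itself would be dubious):

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = sample_mean(xs)
    # Note the N - 1 divisor in the sample variance.
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

heights = [64.0, 66.5, 67.0, 65.5, 68.0]  # hypothetical sample, in inches
mu_hat = sample_mean(heights)             # 66.2
sd_hat = math.sqrt(sample_variance(heights))

# Interval extending 1.96 * sd_hat / sqrt(N) on either side of mu_hat.
half = 1.96 * sd_hat / math.sqrt(len(heights))
interval = (mu_hat - half, mu_hat + half)
```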
For a normally distributed random variable this interval corresponds to a probability of .95. In our example the interval is

    [66 − 1.96 · 2/√1000, 66 + 1.96 · 2/√1000] ≈ [66 − .124, 66 + .124].

We say we are 95% confident that the actual mean height of the population is in the interval [66 − .124, 66 + .124]. If we repeatedly perform the experiment, constructing a new interval each time, the intervals will contain the actual mean about 95% of the time.

It is not hard to change the 95% confidence interval to a different percentage. To treat the problem in general, let α be a positive number. We will look for the (1 − α) · 100% confidence interval. In the original 95% case we have α = .05. Now suppose that Z is a standard normal random variable. Using a table we find the value z_{α/2} such that

    P(−z_{α/2} < Z < z_{α/2}) = 1 − α.

In the original example z_{.025} = 1.96 ≈ 2, since one finds a normally distributed random variable within roughly two standard deviations of the mean 95% of the time.

Here is a summary. Suppose X_1, . . . , X_N is a large (roughly N > 30) random sample, so that µ̂ is approximately normal. Assume that X_n has mean µ and standard deviation σ. Then a level (1 − α) · 100% confidence interval for µ is

    µ̂ ± z_{α/2} σ_{µ̂},    where    σ_{µ̂} = σ/√N.

When the value of σ is unknown, it can be replaced with the sample standard deviation σ̂.

8.4 Exercises

A table for calculations with standard normal random variables is on page 499 of Grinstead and Snell.

1. Suppose X_n is the height of the n-th person in a random sample of adult U.S. males. Assume that X_n has a mean of 68.5 inches and a standard deviation of 2.75 inches.

(a) What are the expected value and standard deviation of

    µ̂ = (1/N) Σ_{n=1}^{N} X_n?

(b) How big should N be so that the standard deviation of µ̂ is smaller than 0.1 inches?

2. If X is the random variable which is the roll of a single die, then X has a discrete uniform distribution with 6 values.
The Central Limit Theorem predicts that if we average the throws of several dice, the distribution will look more Gaussian. Verify this by looking at the average of three dice rolls,

    µ̂ = (1/3)[X_1 + X_2 + X_3].

This should be done in several steps.

(a) First count the number of ways two dice can sum to each of the possible outcomes, 2–12. Then extend the count to the sums of three dice.

(b) Replace the sums by the average of three dice rolls, and compute the probabilities for each outcome.

(c) Plot the distribution (density) for the three cases, that is, the throw of one die, the average of two dice, and the average of three dice.

3. A manufacturer produces electronic components with a mean lifetime of one year, with a standard deviation of one month. Assume that the component lifetime distribution is approximately normal.

(a) Find the probability that a component will fail in fewer than 10 months.

(b) Find the probability that the average lifetime of 10 components will be more than 13 months.

4. We wish to find the average height of adult U.S. females by averaging the heights of 100 women. We assume that, like males, the standard deviation for an individual measurement is 2.75 inches. If the average measured height is h, what is the 95% confidence interval for this experiment?
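As a sanity check on the hand counts asked for in exercise 2, the counts and probabilities can be generated by direct enumeration; a short sketch (one possible approach, not the only one):

```python
from collections import Counter
from itertools import product

def sum_counts(k):
    """Number of ways k fair dice can produce each possible sum."""
    return Counter(sum(roll) for roll in product(range(1, 7), repeat=k))

two = sum_counts(2)    # e.g. two[7] == 6 ways out of 36
three = sum_counts(3)  # sums 3..18, out of 216 equally likely outcomes

# Probabilities for the average of three dice: P(avg = s/3) = three[s] / 216.
probs = {s / 3: c / 216 for s, c in three.items()}
```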