Stat 330 (Spring 2015): Slide set 15
Last update: February 3, 2015

Central Limit Theorem (CLT)

Main Idea: Sums and averages of random variables from arbitrary distributions have approximate normal distributions for sufficiently large sample sizes.

Suppose X1, X2, . . . , Xn are iid random variables with
  E[Xi] = μ,  Var[Xi] = σ²,  i = 1, . . . , n.

Define
  Sample Sum:      Sn = X1 + X2 + · · · + Xn
  Sample Average:  X̄n = (X1 + X2 + · · · + Xn)/n

For large n,
  X̄n ∼̇ N(μ, σ²/n)  and  Sn ∼̇ N(nμ, nσ²),
where ∼̇ denotes "approximately distributed as."

Central Limit Theorem (CLT) (cont'd)

Use of CLT: Calculate probabilities associated with averages or sums of iid random variables using these approximate distributional statements. For example,
  P(a < X̄n < b) ≈ Φ((b − μ)/(σ/√n)) − Φ((a − μ)/(σ/√n)).

Recall: Φ is the cdf of the standard normal distribution, i.e., of Z ∼ N(0, 1).

Some Example Applications of the Central Limit Theorem

Example 1: The time I spend waiting for the bus in a day has a Uniform distribution between 2 minutes and 5 minutes.

(a) How much time do I expect to spend waiting for the bus in one month (30 days)?

Let Xi = time I wait for the bus on day i. Then X1, X2, . . . , X30 ∼ iid U(2, 5), and it follows that
  E[Xi] = (2 + 5)/2 = 3.5 min,  Var(Xi) = (5 − 2)²/12 = 9/12 = 0.75 min².

Let T ≡ the random variable representing the total waiting time for a month. T is the sum of 30 iid random variables, T = X1 + X2 + · · · + X30, so
  E[T] = E[X1 + X2 + · · · + X30] = 30μ, where μ = E[Xi] = 3.5.
Thus the expected waiting time for a month is E[T] = 30 × 3.5 = 105 min.

CLT Examples (cont'd)

(b) Approximately, find the probability that I spend more than 2 hours waiting for a bus in a month.

From the CLT we have T ∼̇ N(30 × 3.5, 30 × 0.75), i.e., T ∼̇ N(105, 22.5).

We need the probability that T is greater than 120 minutes, i.e., P(T > 120):
  P(T > 120) = 1 − P(T ≤ 120)
             ≈ 1 − P(Z ≤ (120 − 105)/√22.5)   (by the CLT)
             = 1 − Φ(3.16) = 1 − 0.9992112 = 0.00079.
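As a quick check of the part (b) calculation, here is a minimal sketch in Python (an assumption on my part: the slides contain no code, and the helper name `phi`, the seed, and the simulation size are illustrative choices, not course material). It evaluates the CLT approximation and compares it with a Monte Carlo estimate of P(T > 120).

```python
import math
import random

def phi(z):
    """Standard normal cdf Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example 1(b): T = X1 + ... + X30 with Xi ~ iid U(2, 5).
n, mu, var = 30, 3.5, 0.75            # per-day mean and variance of U(2, 5)
mean_T, var_T = n * mu, n * var       # E[T] = 105, Var(T) = 22.5

# CLT approximation: P(T > 120) ~= 1 - Phi((120 - 105) / sqrt(22.5))
clt_prob = 1.0 - phi((120 - mean_T) / math.sqrt(var_T))

# Monte Carlo check: simulate the monthly total many times and count exceedances.
random.seed(0)
reps = 200_000
hits = sum(
    sum(random.uniform(2, 5) for _ in range(n)) > 120
    for _ in range(reps)
)

print(f"CLT approximation: {clt_prob:.5f}")    # roughly 0.0008
print(f"Monte Carlo      : {hits / reps:.5f}")
```

Both numbers should land near the .00079 obtained on the slide; the small gap reflects simulation noise and the normal approximation error.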
CLT Example from Baron

Example 4.13 (Allocation of Disk Space): A disk has free space of 330 megabytes. Is it likely to be sufficient for 300 independent images, if each image has expected size of 1 megabyte with a standard deviation of 0.5 megabytes?

We have n = 300, μ = 1 and σ = 0.5. The number of images n is large, so the CLT applies. Then
  P(sufficient space) = P(Sn ≤ 330)
                      = P((Sn − nμ)/(σ√n) ≤ (330 − (300)(1))/(0.5√300))
                      ≈ Φ(3.46) = 0.9997.
Since this probability is very high, the available disk space is very likely to be sufficient.

CLT Big Example

An astronomer wants to measure the distance, d, from the observatory to a star. Due to the variation of atmospheric conditions and imperfections in the measurement method, a single measurement will not produce the exact distance d.

The astronomer takes n measurements of the distance and uses the sample average to estimate the true distance. From past records of these measurements the astronomer knows that the variance of a single measurement is 4 parsec². How many measurements should the astronomer make so that the chance that his estimate differs from d by more than .5 parsecs is at most .05?

Let Xi be the ith measurement. The astronomer assumes that
  X1, X2, . . . , Xn ∼ iid with E[Xi] = d and Var[Xi] = 4.
The estimate of d is X̄n = (X1 + X2 + · · · + Xn)/n. From the CLT we have that
  X̄n ∼̇ N(d, 4/n).
We want to find the number of measurements n so that
  P(|X̄n − d| > .5) ≤ .05.

CLT Big Example (cont'd)

We know that
  P(|X̄n − d| > .5) = P(X̄n − d > .5) + P(X̄n − d < −.5).
We use the CLT to approximate each of the probabilities on the right. Thus
  P(|X̄n − d| > .5) = P((X̄n − d)/√(4/n) > .5/√(4/n)) + P((X̄n − d)/√(4/n) < −.5/√(4/n))
                    ≈ P(Z > .5/√(4/n)) + P(Z < −.5/√(4/n))
                    = 1 − Φ(√n/4) + Φ(−√n/4)
                    = 2(1 − Φ(√n/4)).

CLT Big Example (cont'd)

• We need to find an integer n so that 2(1 − Φ(√n/4)) is just less than or equal to .05.
• We will set 2(1 − Φ(√(n*)/4)) = .05, solve for n*, and take the required number of measurements to be the next integer at or above n*.
• Observe that 2(1 − Φ(√n/4)) = .05 implies that Φ(√n/4) = .975.
• Using the Normal cdf tables, this gives √n/4 = 1.96; thus n* = (1.96 × 4)² = 61.47.
• Thus the astronomer must take at least 62 measurements to have the accuracy specified above.

Normal approximation to the Binomial

For large n, the binomial distribution B(n, p) is approximately normal N(np, np(1 − p)). Why?

Let Y be a variable with a B(n, p) distribution. We know that Y is the number of successes in n independent Bernoulli experiments with P(success) = p. Write Y as the sum of n iid Bernoulli variables, each with μ = E(Xi) = p and σ² = Var(Xi) = p(1 − p):
  Y = X1 + X2 + · · · + Xn.
Applying the CLT result for Sn, we have that Y ∼̇ N(nμ, nσ²) where μ = p and σ² = p(1 − p). That is, Y ∼̇ N(np, np(1 − p)).

Use this approximation only when np and n(1 − p) are both > 5; the approximation is pretty good when np and n(1 − p) are both > 20. When either np or n(1 − p) is < 20, a continuity correction is needed (see Baron p. 94).
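To make the rule of thumb above concrete, here is a small sketch (again an illustrative assumption: plain Python with the standard library; the names `binom_cdf` and `normal_approx` and the chosen n, p, k are hypothetical, not from the slides or Baron) comparing the exact Binomial cdf with its CLT-based normal approximation, with and without the continuity correction.

```python
import math

def phi(z):
    """Standard normal cdf Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_approx(k, n, p, continuity=False):
    """CLT approximation: Y is approximately N(np, np(1-p))."""
    x = k + 0.5 if continuity else k    # continuity correction shifts the cutoff by 1/2
    return phi((x - n * p) / math.sqrt(n * p * (1 - p)))

# np = 30 and n(1 - p) = 70 are both > 20, so the plain approximation should already be good.
n, p, k = 100, 0.3, 35
print(f"exact binomial cdf   : {binom_cdf(k, n, p):.4f}")
print(f"normal approximation : {normal_approx(k, n, p):.4f}")
print(f"continuity-corrected : {normal_approx(k, n, p, continuity=True):.4f}")
```

Rerunning with smaller np or n(1 − p) (say n = 20, p = 0.1) shows the plain approximation drifting from the exact cdf while the continuity-corrected version stays closer, which is the behavior the guideline on the last slide describes.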