Statistics 512 Notes 4: Confidence Intervals Continued

Role of Asymptotic (Large Sample) Approximations in Statistics: It is often difficult to find the finite sample sampling distribution of an estimator or statistic.

Review of Limiting Distributions from Probability

Types of Convergence: Let $X_1, \ldots, X_n$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ denote the CDF of $X_n$ and let $F$ denote the CDF of $X$.

1. $X_n$ converges to $X$ in probability, denoted $X_n \stackrel{P}{\rightarrow} X$, if for every $\epsilon > 0$, $P(|X_n - X| \geq \epsilon) \rightarrow 0$ as $n \rightarrow \infty$.

2. $X_n$ converges to $X$ in distribution, denoted $X_n \stackrel{D}{\rightarrow} X$, if $F_n(t) \rightarrow F(t)$ as $n \rightarrow \infty$ at all $t$ for which $F$ is continuous.

Weak Law of Large Numbers
Let $X_1, \ldots, X_n$ be a sequence of iid random variables having mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then $\bar{X}_n \stackrel{P}{\rightarrow} \mu$.

Interpretation: The distribution of $\bar{X}_n$ becomes more and more concentrated around $\mu$ as $n$ gets large.

Proof: Using Chebyshev's inequality, for every $\epsilon > 0$,
$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2},$$
which tends to 0 as $n \rightarrow \infty$.

Central Limit Theorem
Let $X_1, \ldots, X_n$ be a sequence of iid random variables having mean $\mu$ and variance $\sigma^2$. Then
$$Z_n = \frac{\bar{X}_n - \mu}{\sqrt{Var(\bar{X}_n)}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \stackrel{D}{\rightarrow} Z,$$
where $Z \sim N(0,1)$. In other words,
$$\lim_{n \rightarrow \infty} P(Z_n \leq z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx.$$

Interpretation: Probability statements about $\bar{X}_n$ can be approximated using a normal distribution. It's the probability statements that we are approximating, not the random variable itself.

Some useful further convergence properties:

Slutsky's Theorem (Theorem 4.3.5): If $X_n \stackrel{D}{\rightarrow} X$, $A_n \stackrel{P}{\rightarrow} a$, and $B_n \stackrel{P}{\rightarrow} b$, then $A_n + B_n X_n \stackrel{D}{\rightarrow} a + bX$.

Continuous Mapping Theorem (Theorem 4.3.4): Suppose $X_n$ converges to $X$ in distribution and $g$ is a continuous function on the support of $X$. Then $g(X_n)$ converges to $g(X)$ in distribution.

Application of these convergence properties: Let $X_1, \ldots, X_n$ be a sequence of iid random variables having mean $\mu$, variance $\sigma^2$, and $E(X_i^4) < \infty$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and $S_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. Then
$$T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} \stackrel{D}{\rightarrow} Z,$$
where $Z \sim N(0,1)$.
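The CLT's claim that probability statements about $\bar{X}_n$ are well approximated by the normal distribution can be checked by simulation. The sketch below is a minimal illustration (not part of the original notes; the function name `clt_demo` and the choice of Exponential(1) draws, which have $\mu = \sigma = 1$, are my own). It estimates $P(Z_n \leq 1)$ by Monte Carlo and compares it to $\Phi(1) \approx 0.8413$:

```python
import math
import random

random.seed(0)

def clt_demo(n=100, reps=5000, z=1.0):
    """Estimate P(Z_n <= z), where Z_n = sqrt(n)*(Xbar_n - mu)/sigma,
    for iid Exponential(1) draws (mu = sigma = 1)."""
    count = 0
    for _ in range(reps):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        z_n = math.sqrt(n) * (xbar - 1.0) / 1.0
        if z_n <= z:
            count += 1
    return count / reps

# Phi(1) via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2
phi = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))
est = clt_demo()
print(round(est, 3), round(phi, 4))
```

Even though the Exponential(1) distribution is far from normal, the Monte Carlo estimate lands close to $\Phi(1)$ at $n = 100$, which is the point of the theorem.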
Proof (only for those interested): We can write
$$T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \cdot \frac{\sigma}{S_n}.$$
Using the Central Limit Theorem, which says $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \stackrel{D}{\rightarrow} Z$, and Slutsky's Theorem, to prove that $T_n \stackrel{D}{\rightarrow} Z$ it is sufficient to prove that $\frac{\sigma}{S_n} \stackrel{P}{\rightarrow} 1$.

We can write
$$S_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}_n^2.$$
By the weak law of large numbers, $S_n^2 \stackrel{P}{\rightarrow} E(X_i^2) - [E(X_i)]^2 = \sigma^2$, or equivalently $\frac{S_n^2}{\sigma^2} \stackrel{P}{\rightarrow} 1$; applying the continuous function $g(x) = 1/\sqrt{x}$ then gives $\frac{\sigma}{S_n} \stackrel{P}{\rightarrow} 1$. (The moment condition $E(X_i^4) < \infty$ is what lets the weak law apply to $\frac{1}{n}\sum X_i^2$.)

Back to Confidence Intervals

CI for the mean of an iid sample $X_1, \ldots, X_n$ from an unknown distribution with finite variance and $E(X_i^4) < \infty$: By the application of the central limit theorem above,
$$T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} \stackrel{D}{\rightarrow} Z.$$
Thus, for large $n$,
$$P\left(-z_{1-\alpha/2} \leq \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} \leq z_{1-\alpha/2}\right) \approx 1 - \alpha$$
$$P\left(-z_{1-\alpha/2}\frac{S_n}{\sqrt{n}} \leq \bar{X}_n - \mu \leq z_{1-\alpha/2}\frac{S_n}{\sqrt{n}}\right) \approx 1 - \alpha$$
$$P\left(\bar{X}_n - z_{1-\alpha/2}\frac{S_n}{\sqrt{n}} \leq \mu \leq \bar{X}_n + z_{1-\alpha/2}\frac{S_n}{\sqrt{n}}\right) \approx 1 - \alpha.$$
Thus, $\bar{X}_n \pm z_{1-\alpha/2}\frac{S_n}{\sqrt{n}}$ is an approximate $(1-\alpha)$ confidence interval.

How large does $n$ need to be for this to be a good approximation? Traditionally textbooks say $n > 30$. We'll look at some simulation results later in the course.

Application: A food-processing company is considering marketing a new spice mix for Creole and Cajun cooking. They took a simple random sample of 200 consumers and found that 37 would purchase such a product. Find an approximate 95% confidence interval for $p$, the true proportion of buyers.

Let $X_i = 1$ if the $i$th consumer would buy the product and $X_i = 0$ if not. If the population is large (say 50 times larger than the sample size), a simple random sample can be regarded as a sample with replacement. Then a reasonable model is that $X_1, \ldots, X_{200}$ are iid Bernoulli($p$). We have
$$\bar{X}_n = \frac{37}{200} = 0.185,$$
$$S_n^2 = \frac{\sum_{i=1}^{200}(X_i - \bar{X}_n)^2}{200} = \frac{\sum_{i=1}^{200} X_i^2}{200} - \bar{X}_n^2 = 0.185 - (0.185)^2 \approx 0.151.$$
Thus, an approximate 95% confidence interval for $p$ is
$$\bar{X}_n \pm z_{1-\alpha/2}\frac{S_n}{\sqrt{n}} = 0.185 \pm 1.96\sqrt{\frac{0.151}{200}} = (0.131, 0.239).$$

Note that for an iid Bernoulli($p$) sample, we can write $S_n^2$ in a simple way. In general,
$$S_n^2 = \frac{\sum_{i=1}^n (X_i - \bar{X}_n)^2}{n} = \frac{\sum_{i=1}^n (X_i^2 - 2X_i\bar{X}_n + \bar{X}_n^2)}{n} = \frac{\sum_{i=1}^n X_i^2 - 2n\bar{X}_n^2 + n\bar{X}_n^2}{n} = \frac{\sum_{i=1}^n X_i^2}{n} - \bar{X}_n^2.$$
For an iid Bernoulli sample, let $\hat{p}_n = \bar{X}_n$.
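As a quick check of the arithmetic in the spice-mix example, the fragment below (a sketch added for these notes; the variable names are my own) reproduces the interval $(0.131, 0.239)$ using the Bernoulli shortcut $S_n^2 = \hat{p}_n - \hat{p}_n^2$:

```python
import math

# n = 200 consumers surveyed, 37 would buy the product.
n = 200
phat = 37 / n                 # sample proportion, 0.185
s2 = phat - phat**2           # S_n^2 = phat - phat^2 for Bernoulli data
half = 1.96 * math.sqrt(s2 / n)   # half-width of the approximate 95% CI
lo, hi = phat - half, phat + half
print(round(lo, 3), round(hi, 3))  # -> 0.131 0.239
```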
$\hat{p}_n$ is a natural point estimator of $p$ for the Bernoulli. Note that for a Bernoulli sample, $X_i^2 = X_i$. Thus, for a Bernoulli sample, $S_n^2 = \hat{p}_n - \hat{p}_n^2$, and an approximate 95% confidence interval for $p$ is
$$\hat{p}_n \pm 1.96\sqrt{\frac{\hat{p}_n - \hat{p}_n^2}{n}}.$$

Choosing Between Confidence Intervals

Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ where $\sigma^2$ is known. Suppose we want a 95% confidence interval for $\mu$. Then for any $a$ and $b$ that satisfy $P(a \leq Z \leq b) = 0.95$,
$$\left(\bar{X} - b\frac{\sigma}{\sqrt{n}}, \; \bar{X} - a\frac{\sigma}{\sqrt{n}}\right)$$
is a 95% confidence interval because:
$$0.95 = P\left(a \leq \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \leq b\right) = P\left(a\frac{\sigma}{\sqrt{n}} \leq \bar{X} - \mu \leq b\frac{\sigma}{\sqrt{n}}\right) = P\left(\bar{X} - b\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} - a\frac{\sigma}{\sqrt{n}}\right).$$

For example, we could choose:
(1) $a = -1.96$, $b = 1.96$ [$P(Z < a) = 0.025$, $P(Z > b) = 0.025$; the choice we have used before];
(2) $a = -2.05$, $b = 1.88$ [$P(Z < a) = 0.02$, $P(Z > b) = 0.03$];
(3) $a = -1.75$, $b = 2.33$ [$P(Z < a) = 0.04$, $P(Z > b) = 0.01$].

Which is the best 95% confidence interval? A reasonable criterion is the expected length of the confidence interval: among all 95% confidence intervals, choose the one that is expected to have the smallest length, since it will be the most informative.

Length of the confidence interval
$$= \left(\bar{X} - a\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{X} - b\frac{\sigma}{\sqrt{n}}\right) = (b - a)\frac{\sigma}{\sqrt{n}},$$
so we want to choose the confidence interval with the smallest value of $b - a$. The values of $b - a$ for the three confidence intervals above are: (1) $a = -1.96$, $b = 1.96$, $b - a = 3.92$; (2) $a = -2.05$, $b = 1.88$, $b - a = 3.93$; (3) $a = -1.75$, $b = 2.33$, $b - a = 4.08$. The best of these 95% confidence intervals is (1), with $a = -1.96$, $b = 1.96$. In fact, it can be shown that for this problem the best choice of $a$ and $b$ is $a = -1.96$, $b = 1.96$.
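The three $(a, b)$ choices and their lengths can be reproduced with Python's standard library (a sketch added for these notes; `statistics.NormalDist.inv_cdf` is the standard-normal quantile function, and the tail-probability splits below are the three from the example):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal N(0, 1)

# Three (lower-tail, upper-tail) probability splits, each with total
# tail probability 0.05, i.e. 95% central coverage.
splits = [(0.025, 0.025), (0.02, 0.03), (0.04, 0.01)]
lengths = []
for lower, upper in splits:
    a = nd.inv_cdf(lower)      # P(Z < a) = lower
    b = nd.inv_cdf(1 - upper)  # P(Z > b) = upper
    lengths.append(round(b - a, 2))
    print(round(a, 2), round(b, 2), round(b - a, 2))
print(lengths)  # -> [3.92, 3.93, 4.08]
```

The equal-tails split (1) gives the smallest $b - a$, matching the conclusion above that it yields the shortest interval.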