CHAPTER 6 Point Estimation

6.1 Introduction

Statistics may be defined as the science of using data to better understand specific traits of chosen variables. In Chapters 1 through 5 we have encountered many such traits. For example, in relation to a 2-D random variable $(X,Y)$, classic traits include: $\mu_X$, $\sigma_X$, $f_X(x)$ (marginal traits for $X$); $\mu_Y$, $\sigma_Y$, $f_Y(y)$ (marginal traits for $Y$); and $\rho_{XY}$, $\sigma_{XY}$, $f_{XY}(x,y)$ (joint traits for $(X,Y)$). Notice that all of these traits are simply parameters. This includes even a trait such as $f_{XY}(x,y)$: for each ordered pair $(x,y)$, the quantity $f_{XY}(x,y)$ is a scalar-valued parameter.

Definition 1.1 Let $\theta$ denote a parameter associated with a random variable $X$, and let $\{X_k\}_{k=1}^{n}$ denote a collection of associated data collection variables. Then the estimator $\hat\theta = g[\{X_k\}_{k=1}^{n}]$ is said to be a point estimator of $\theta$.

Remark 1.1 The term 'point' refers to the fact that we obtain a single number as an estimate, as opposed to, say, an interval estimate. Much of the material in this chapter refers to material in the course textbook, so that the reader may go to that material for further clarification and examples.

6.2 The Central Limit Theorem (CLT) and Related Topics (mainly Ch. 8 of the textbook)

Even though we have discussed the CLT on numerous occasions, we present it again here because it plays such a central role in point estimation.

Theorem 2.1 (Theorem 8.3 of the textbook) (Central Limit Theorem) Suppose that $X \sim f_X(x)$ with mean $\mu_X$ and variance $\sigma_X^2 < \infty$, and suppose that $\{X_k\}_{k=1}^{n} \sim iid\ f_X(x)$. Define $\{Z_n\}_{n=1}^{\infty}$, where $Z_n = (\bar X_n - \mu_X)/(\sigma_X/\sqrt{n})$ and $\bar X_n = \frac{1}{n}\sum_{k=1}^{n} X_k$. Then $\lim_{n\to\infty} Z_n = Z \sim N(0,1)$.

Remark 2.1 In relation to point estimation of $\mu_X$ for large $n$, this theorem states that $(\bar X_n - \mu_X)/(\sigma_X/\sqrt{n})$ is approximately $N(0,1)$; equivalently, $\bar X_n$ is approximately $N(\mu_X,\ \sigma_X^2/n)$.

Remark 2.2 The authors state (p. 269) that "In practice, this approximation [i.e. that $\bar X_n \sim N(\mu_X, \sigma_X^2/n)$] is used when $n \ge 30$, regardless of the actual shape of the population sampled." Now, while this is indeed a 'rule of thumb', that does not mean it is always justified. There are two key issues that must be reckoned with, as illustrated in the following example.

Example 2.1 Suppose that $\{X_k\}_{k=1}^{n} \sim iid\ X$ with $X \sim \mathrm{Bernoulli}(p)$. Then $\mu_{\bar X} = p$ and $\sigma_{\bar X}^2 = p(1-p)/n$. In fact, $\bar X \sim (1/n)\,\mathrm{Binomial}(n,p)$; specifically, $S_{\bar X} = \{x_k = k/n\}_{k=0}^{n}$.

Case 1: n = 30 and p = 0.5. Clearly, the shape is that of a bell curve.

[Figure 2.1 The pdf of $\bar X \sim (1/30)\,\mathrm{Binomial}(n=30,\ p=0.5)$.]

Clearly, $\bar X$ is a discrete random variable, whereas to claim that $\bar X_n \sim N(0.5,\ 0.0913^2)$ is to claim that it is a continuous random variable. To obtain this continuous pdf, it is only necessary to rescale Figure 2.1 so that the total area equals one (i.e. divide each probability by the bin width $1/30$, and use a bar plot instead of a stem plot). The result is given in Figure 2.2.

[Figure 2.2 Comparison of the staircase pdf and the normal pdf. The lower plot shows the error (green) associated with using the normal pdf to estimate the probability of the double-arrow region.]
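For readers who wish to reproduce the comparison in Figures 2.1 and 2.2, the following Matlab sketch (not from the textbook; it assumes the Statistics Toolbox functions binopdf and normpdf are available) overlays the scaled binomial pmf of $\bar X$ with its CLT-based normal approximation.

% Sketch: staircase pdf of Xbar vs. its normal approximation (Case 1).
n = 30; p = 0.5;
x = (0:n)/n;                               % sample space of Xbar
pmf = binopdf(0:n, n, p);                  % Pr[Xbar = k/n]
fbar = pmf*n;                              % divide by the bin width 1/n to get a density
xx = 0:0.001:1;
fnorm = normpdf(xx, p, sqrt(p*(1-p)/n));   % CLT approximation
bar(x, fbar, 1); hold on
plot(xx, fnorm, 'r', 'LineWidth', 2); hold off
title('Staircase pdf of Xbar and its normal approximation (n = 30, p = 0.5)')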
Case 2: n = 30 and p = 0.1.

[Figure 2.3 The pdf of $\bar X \sim (1/30)\,\mathrm{Binomial}(n=30,\ p=0.1)$ (top), and the continuous approximations (bottom).]

The problem here is that, by using the normal approximation, we see that $\Pr[0 \le \bar X \le 1]$ is significantly less than one! □

We now address some sampling distribution results given in Chapter 8 of the text.

Theorem 2.2 (Theorem 8.8 in the textbook) For $\{Z_k\}_{k=1}^{n} \sim iid\ N(0,1)$, $Y = \sum_{k=1}^{n} Z_k^2 \sim \chi_n^2$.

Proof: The proof of this theorem relies on two 'facts': (i) for $Z \sim N(0,1)$, the random variable $W = Z^2 \sim \chi_1^2$; and (ii) for $\{W_k\}_{k=1}^{n} \sim iid\ \chi_1^2$, the random variable $Y = \sum_{k=1}^{n} W_k \sim \chi_n^2$.

Claim 2.1 For $Y \sim \chi_n^2$: $\mu_Y = n$ and $\sigma_Y^2 = 2n$. The proof of the claim $\mu_Y = n$ follows directly from the linearity of $E(\cdot)$:

$E(Y) = E\big(\sum_{k=1}^{n} Z_k^2\big) = \sum_{k=1}^{n} E(Z_k^2) = \sum_{k=1}^{n} 1 = n.$

The proof of the claim $\sigma_Y^2 = 2n$ requires a little more work.

Example 2.2 Suppose that $\{X_k\}_{k=1}^{n} \sim iid\ N(\mu, \sigma^2)$. Obtain the pdf for the point estimator of $\sigma^2$ given by $\hat\sigma^2 = S_n^2 = (1/n)\sum_{k=1}^{n}(X_k - \mu)^2$.

Solution: Write

$n\hat\sigma^2/\sigma^2 = nS_n^2/\sigma^2 = \sum_{k=1}^{n}\Big(\frac{X_k - \mu}{\sigma}\Big)^2 = \sum_{k=1}^{n} Z_k^2 \sim \chi_n^2.$

Even though $n\hat\sigma^2/\sigma^2 \sim \chi_n^2$, it is sometimes easier (and more insightful) to say that $\hat\sigma^2 \sim \frac{\sigma^2}{n}\chi_n^2$. In either case, we have

(i) $E(\hat\sigma^2) = E\big(\frac{\sigma^2}{n}\chi_n^2\big) = \frac{\sigma^2}{n}E(\chi_n^2) = \frac{\sigma^2}{n}\,n = \sigma^2$, and

(ii) $Var(\hat\sigma^2) = Var\big(\frac{\sigma^2}{n}\chi_n^2\big) = \frac{\sigma^4}{n^2}Var(\chi_n^2) = \frac{\sigma^4}{n^2}\,2n = \frac{2\sigma^4}{n}$.

Finally, if $n$ is sufficiently large that the CLT holds, then $\hat\sigma^2$ is approximately $N(\sigma^2,\ 2\sigma^4/n)$. □

Proof (of the distributional fact used above): $\dfrac{nS_n^2}{\sigma^2} = \sum_{k=1}^{n}\dfrac{(X_k-\mu)^2}{\sigma^2} = \sum_{k=1}^{n} Z_k^2$, where $\{Z_k\}_{k=1}^{n} \sim iid\ N(0,1)$. □

Theorem 2.3 (Theorem 8.11 of the textbook) Given $\{X_k\}_{k=1}^{n} \sim iid\ N(\mu_X, \sigma_X^2)$, suppose that $\mu_X$ is not known. Let $\hat\mu_X = \bar X$ and $\hat\sigma_X^2 = S_{n-1}^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar X)^2$. Then (i) $\hat\mu_X$ and $\hat\sigma_X^2$ are independent, and (ii) $(n-1)\hat\sigma_X^2/\sigma_X^2 \sim \chi_{n-1}^2$.

Proof: [This is one reason that a proof can be insightful, not simply a mathematical torture.]

$\sum_{k=1}^{n}(X_k - \mu_X)^2 = \sum_{k=1}^{n}[(X_k - \bar X) + (\bar X - \mu_X)]^2 = \sum_{k=1}^{n}(X_k - \bar X)^2 + 2(\bar X - \mu_X)\sum_{k=1}^{n}(X_k - \bar X) + n(\bar X - \mu_X)^2.$

But $\sum_{k=1}^{n}(X_k - \bar X) = \sum_{k=1}^{n} X_k - n\bar X = n\bar X - n\bar X = 0$. And so we have:

$\sum_{k=1}^{n}(X_k - \mu_X)^2 = \sum_{k=1}^{n}(X_k - \bar X)^2 + n(\bar X - \mu_X)^2.$

Dividing both sides by $\sigma_X^2$ gives:

$\sum_{k=1}^{n}\Big(\frac{X_k - \mu_X}{\sigma_X}\Big)^2 = \sum_{k=1}^{n}\Big(\frac{X_k - \bar X}{\sigma_X}\Big)^2 + \Big(\frac{\bar X - \mu_X}{\sigma_X/\sqrt{n}}\Big)^2,$

that is, $\chi_n^2 = \chi_{n-1}^2 + \chi_1^2$, where identifying the middle sum as $\chi_{n-1}^2$ follows from the independence result (not proved here). Hence, we have shown that

$\sum_{k=1}^{n}\Big(\frac{X_k - \bar X}{\sigma_X}\Big)^2 = \frac{(n-1)S_{n-1}^2}{\sigma_X^2} \sim \chi_{n-1}^2.$

From this result, we have

$S_{n-1}^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar X)^2 \sim \frac{\sigma_X^2}{n-1}\chi_{n-1}^2$ and $S_n^2 = \frac{1}{n}\sum_{k=1}^{n}(X_k - \bar X)^2 \sim \frac{\sigma_X^2}{n}\chi_{n-1}^2.$ □

Remark 2.3 A comparison of Theorems 2.2 and 2.3 reveals the effect of replacing the mean $\mu_X$ by its estimator $\hat\mu_X = \bar X$: one loses one degree of freedom in the (scaled) chi-squared distribution for the estimator of the variance $\sigma_X^2$. It is not uncommon for persons not well versed in probability and statistics to use $\bar X$ in estimating the variance even when the true mean is known. The price paid for this is that the variance estimator will not be as accurate.

Theorem 2.4 (Theorem 8.12 in the textbook) Suppose that $Z \sim N(0,1)$ and $Y \sim \chi_v^2$ are independent. Define $T = Z/\sqrt{Y/v}$. Then the probability density function for $T$ is

$f_T(x) = \dfrac{\Gamma\big(\frac{v+1}{2}\big)}{\sqrt{\pi v}\ \Gamma\big(\frac{v}{2}\big)}\Big(1 + \dfrac{x^2}{v}\Big)^{-\frac{v+1}{2}},\quad x \in (-\infty, \infty).$

This pdf is called the (Student) t distribution with $v$ degrees of freedom, and is denoted $t_v$.

From Theorems 2.3 and 2.4 we immediately have

Theorem 2.5 (Theorem 8.13 in the textbook) If $\hat\mu_X = \bar X$ and $\hat\sigma_X^2 = S_{n-1}^2$ are estimators of $\mu_X$ and $\sigma_X^2$, respectively, based on $\{X_k\}_{k=1}^{n} \sim iid\ X \sim N(\mu_X, \sigma_X^2)$, then

$T = \dfrac{\bar X - \mu_X}{\sqrt{S_{n-1}^2/n}} \sim t_{n-1}.$
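The claim of Theorem 2.5 is easy to check by simulation. The following Matlab sketch (not from the textbook; it assumes the Statistics Toolbox function tpdf is available) compares a scaled histogram of simulated values of T to the $t_{n-1}$ pdf.

% Sketch: simulation check that T = (Xbar - mu)/sqrt(S^2/n) ~ t_{n-1}.
nsim = 20000; n = 10; mu = 5; sigma = 2;
X = mu + sigma*randn(n, nsim);           % each column is one sample of size n
T = (mean(X) - mu)./sqrt(var(X)/n);      % var() uses the 1/(n-1) convention
db = 0.2; bctr = -6+db/2 : db : 6-db/2;
fT = (nsim*db)^-1 * hist(T, bctr);       % scaled histogram = pdf estimate
bar(bctr, fT); hold on
tt = -6:0.01:6;
plot(tt, tpdf(tt, n-1), 'r', 'LineWidth', 2); hold off
title('Simulation-based pdf of T and the t_9 pdf')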
Theorem 2.6 (Theorem 8.14 in the textbook) Suppose that $U \sim \chi_{v_1}^2$ and $V \sim \chi_{v_2}^2$ are independent. Then $F = \dfrac{U/v_1}{V/v_2}$ has an F distribution, denoted $F_{v_1, v_2}$.

As an immediate consequence, we have

Theorem 2.7 (Theorem 8.15 in the textbook) Suppose that $\{S_{n_j-1}^2(j)\}_{j=1}^{2}$ are sample variances of $\{X(j) \sim N(\mu_j, \sigma_j^2)\}_{j=1}^{2}$, obtained using $\{X_k(1)\}_{k=1}^{n_1}$ and $\{X_k(2)\}_{k=1}^{n_2}$, respectively; that the elements in each sample are independent; and that the samples are also independent. Then

$F = \dfrac{S_{n_1-1}^2(1)/\sigma_1^2}{S_{n_2-1}^2(2)/\sigma_2^2}$

has an F distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.

Remark 2.4 It should be clear that this statistic plays a major role in determining whether or not two variances are equal. Q: Suppose that $n_1 = n_2 = n$ is large enough to invoke the CLT. How could you use this to arrive at another test statistic for deciding whether two variances are equal? A: Use their DIFFERENCE.

6.3 A Summary of Useful Point Estimation Results

The majority of the results of the last section pertained to the parameters $\mu$ and $\sigma^2$. For this reason, the following summary addresses each of these parameters, individually, as the parameter of primary concern.

Results when $\mu$ is of primary concern. Suppose that $X \sim f_X(x;\mu_X,\sigma_X^2)$, and that $\{X_k\}_{k=1}^{n}$ are iid data collection variables that are to be used to investigate the unknown parameter $\mu_X$. For such an investigation we will use $\hat\mu_X = \bar X$.

Case 1: $X \sim N(x;\mu_X,\sigma_X^2)$.
(Remark 2.1): $Z = (\bar X - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$ for any $n$, when $\sigma_X^2$ is known.
(Theorem 2.5): $T = \dfrac{\bar X - \mu_X}{\sqrt{\hat\sigma_X^2/n}} \sim t_{n-1}$ for any $n$, when $\sigma_X^2$ is estimated by $\hat\sigma_X^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar X)^2$.

Case 2: $X \sim f_X(x;\mu_X,\sigma_X^2)$, with $n$ sufficiently large that the CLT is a good approximation.
(Remark 2.1): $Z = (\bar X - \mu_X)/(\sigma_X/\sqrt{n})$ is approximately $N(0,1)$ when $\sigma_X^2$ is known.
(Theorem 2.5): $Z = \dfrac{\bar X - \mu_X}{\sqrt{\hat\sigma_X^2/n}}$ is approximately $N(0,1)$ when $\sigma_X^2$ is estimated by $\hat\sigma_X^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar X)^2$.

Results when $\sigma^2$ is of primary concern. Suppose that $X \sim f_X(x;\mu_X,\sigma_X^2)$, and that $\{X_k\}_{k=1}^{n}$ are iid data collection variables that are to be used to investigate the unknown parameter $\sigma_X^2$. For such an investigation we will use $\hat\mu_X = \bar X$ when appropriate.

Case 1: $X \sim N(x;\mu_X,\sigma_X^2)$.
(Example 2.2): $n\hat\sigma_X^2/\sigma_X^2 \sim \chi_n^2$, when $\mu_X$ is known and $\hat\sigma_X^2 = \frac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)^2$.
(Theorem 2.3): $(n-1)\hat\sigma_X^2/\sigma_X^2 \sim \chi_{n-1}^2$ for any $n$, when $\mu_X$ is estimated by $\bar X$ and $\hat\sigma_X^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar X)^2$.
(Theorem 2.7): $F = \dfrac{\hat\sigma_{X_1}^2/\sigma_{X_1}^2}{\hat\sigma_{X_2}^2/\sigma_{X_2}^2} \sim F_{n_1, n_2}$ when the means are known and $\hat\sigma_{X_j}^2 = \frac{1}{n_j}\sum_{k=1}^{n_j}(X_k^{(j)} - \mu_{X_j})^2$, $j = 1, 2$.
(Theorem 2.7): $F = \dfrac{\hat\sigma_{X_1}^2/\sigma_{X_1}^2}{\hat\sigma_{X_2}^2/\sigma_{X_2}^2} \sim F_{n_1-1,\, n_2-1}$ when $\hat\sigma_{X_j}^2 = \frac{1}{n_j-1}\sum_{k=1}^{n_j}(X_k^{(j)} - \bar X_j)^2$, $j = 1, 2$.

Case 2: $X \sim f_X(x;\mu_X,\sigma_X^2)$, with $n$ sufficiently large that the CLT is a good approximation.
(Example 2.2): $Z = \dfrac{\hat\sigma_X^2 - \sigma_X^2}{\sigma_X^2\sqrt{2/n}}$ is approximately $N(0,1)$.
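As with the t statistic, the variance-ratio statistic of Theorem 2.7 can be checked by simulation. The sketch below (not from the textbook; it assumes the Statistics Toolbox function fpdf is available) does so under $H_0: \sigma_1^2 = \sigma_2^2$.

% Sketch: simulation check of the two-sample variance-ratio statistic.
nsim = 20000; n1 = 8; n2 = 12; sigma = 3;
X1 = sigma*randn(n1, nsim); X2 = sigma*randn(n2, nsim);
F = var(X1)./var(X2);                    % S1^2/S2^2; the sigma^2's cancel under H0
db = 0.1; bctr = db/2 : db : 8-db/2;
fF = (nsim*db)^-1 * hist(F, bctr);
bar(bctr, fF); hold on
ff = 0.01:0.01:8;
plot(ff, fpdf(ff, n1-1, n2-1), 'r', 'LineWidth', 2); hold off
title('Simulation-based pdf of S_1^2/S_2^2 and the F_{7,11} pdf')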
Application of the above results to (X, Y): For a 2-D random variable $(X,Y)$, suppose first that $X$ and $Y$ are independent, and define the random variable $W = X - Y$. Regardless of the independence assumption, we have $\mu_W = \mu_X - \mu_Y$. In view of that assumption, we have $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2$. Hence, we are now in the setting of a 1-D random variable, $W$, and so many of the above results apply. Even if we drop the independence assumption, we can still apply them. The only difference is that in this situation $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$. Hence, we need to realize that a larger value of $n$ may be needed to achieve an acceptable level of uncertainty in relation to, for example, $\hat\mu_W = \bar W$. Specifically, if we assume the data collection variables $\{W_k\}_{k=1}^{n}$ are iid, then $\sigma_{\bar W}^2 = \sigma_W^2/n = (\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY})/n$.

The perhaps more interesting situation is where we are concerned with either $\sigma_{XY}$ or $\rho_{XY} = \sigma_{XY}/(\sigma_X\sigma_Y)$. Of these two parameters, $\sigma_{XY}$ is the more tractable one to address using the above results. To see this, consider the estimator

$\hat\sigma_{XY} = \frac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)(Y_k - \mu_Y).$

Define the random variable $W = (X - \mu_X)(Y - \mu_Y)$. Then we have

$\hat\sigma_{XY} = \frac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)(Y_k - \mu_Y) = \frac{1}{n}\sum_{k=1}^{n} W_k = \bar W = \hat\mu_W.$

And so, we are now in the setting where we are concerned with $\mu_W = \sigma_{XY}$. In relation to $\hat\mu_W = \hat\sigma_{XY}$, we have:

$E(\hat\mu_W) = E(\hat\sigma_{XY}) = \frac{1}{n}\sum_{k=1}^{n} E[(X_k - \mu_X)(Y_k - \mu_Y)] = \frac{1}{n}\sum_{k=1}^{n}\sigma_{XY} = \sigma_{XY} = \mu_W.$

We also have $Var(\hat\mu_W) = Var(W)/n$. The trick now is to obtain an expression for $Var(W)$. To this end, we begin with:

$Var(W) = E[(W - \mu_W)^2] = E(W^2) - \mu_W^2 = E[\{(X - \mu_X)(Y - \mu_Y)\}^2] - \sigma_{XY}^2.$

This can be written as:

$Var(W) = \sigma_X^2\sigma_Y^2\, E\Big[\Big(\frac{X - \mu_X}{\sigma_X}\Big)^2\Big(\frac{Y - \mu_Y}{\sigma_Y}\Big)^2\Big] - \rho_{XY}^2\sigma_X^2\sigma_Y^2 = \sigma_X^2\sigma_Y^2\,\{E[(Z_1Z_2)^2] - \mu_{Z_1Z_2}^2\} = \sigma_X^2\sigma_Y^2\, Var(Z_1Z_2).$

And so we are led to ask the Question: Suppose that $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$ have $E(Z_1Z_2) = \rho$. Then what is the variance of $W = Z_1Z_2$?

Answer: As noted above, the mean of $W$ is $E(Z_1Z_2) = \rho$. Before getting too involved in the mathematics, let's run some simulations. The following plot is for $\rho = 0.5$.

[Figure 3.1 Plot of the simulation-based estimate of $f_W(w)$ for $\rho = 0.5$. The simulations also resulted in $\hat\mu_W = 0.4767$ and $\hat\sigma_W^2 = 1.1716$.]

The Matlab code is given below.

% PROGRAM name: z1z2.m
% This code uses simulations to investigate the pdf of W = Z1*Z2,
% where Z1 & Z2 are unit normal r.v.s with covariance r.
nsim = 10000;
r = 0.5;
mu = [0 0]; Sigma = [1 r; r 1];
Z = mvnrnd(mu, Sigma, nsim);
plot(Z(:,1), Z(:,2), '.');      % scatter plot of (Z1, Z2)
pause
W = Z(:,1).*Z(:,2);
Wmax = max(W); Wmin = min(W);
db = (Wmax - Wmin)/50;
bctr = Wmin + db/2 : db : Wmax - db/2;
fw = hist(W, bctr);
fw = (nsim*db)^-1 * fw;         % scale the histogram so that it integrates to one
bar(bctr, fw)
title('Simulation-based pdf for W with r = 0.5')

Conclusion: The simulations revealed a number of interesting things. For one, the mean was $\hat\mu_W = 0.4767$. One would have thought that for 10,000 simulations it would have been closer to the true mean 0.5. Also, and not unrelated to this, is the fact that $f_W(w)$ has very long tails. Based on this simple simulation-based analysis, one might be thankful that the mathematical pursuit was not readily undertaken, as it would appear to be a formidable undertaking!

QUESTION: How can knowledge of $f_W(w)$ be used in relation to estimating $\rho_{XY}$ from $\{(X_k, Y_k)\}_{k=1}^{n}$?

Computation of $f_W(w)$:

Method 1: Use THEOREM 7.2 on p. 248 of the book. [A nice example of its application.] Let $f(x_1, x_2)$ be the joint pdf of the 2-D continuous random variable $(X_1, X_2)$. Let $Y_1 = X_1$ and $Y_2 = X_1X_2$. Then $x_1 = y_1$ and $x_2 = y_2/y_1$, and so the Jacobian is

$J = \det\begin{pmatrix}\partial x_1/\partial y_1 & \partial x_1/\partial y_2\\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2\end{pmatrix} = \det\begin{pmatrix}1 & 0\\ -y_2/y_1^2 & 1/y_1\end{pmatrix} = \frac{1}{y_1}.$

Hence, the pdf for $(Y_1, Y_2)$ is

$g(y_1, y_2) = f(x_1, x_2)\,|J| = \frac{1}{|y_1|}\, f(y_1,\ y_2/y_1).$

We now apply this to [see DEFINITION 6.8 on p. 220]

$f(x_1, x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}(x_1^2 + x_2^2 - 2\rho x_1x_2)},$

which gives

$g(y_1, y_2) = \frac{1}{|y_1|}\, f(y_1,\ y_2/y_1) = \frac{1}{|y_1|\, 2\pi\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}\left[y_1^2 + (y_2/y_1)^2 - 2\rho y_2\right]}.$

Hence,

$f_{Y_2}(y_2) = \int_{-\infty}^{\infty} \frac{1}{|y_1|\, 2\pi\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}\left[y_1^2 + (y_2/y_1)^2 - 2\rho y_2\right]}\, dy_1. \quad (***)$

I will proceed with a possibly useful integral, but this will not, in itself, give a closed form for (***). [Table of Integrals, Series, and Products by Gradshteyn & Ryzhik, p. 307, #3.325]:

$\int_0^{\infty} e^{-a x^2 - b/x^2}\, dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}\, e^{-2\sqrt{ab}}.$

Method 2 (use characteristic functions): Assume that $X$ and $Y$ are two independent standard normal random variables, and let us compute the characteristic function of $W = XY$. One knows that $E(e^{i\omega X}) = e^{-\omega^2/2}$. Hence, $E(e^{i\omega XY}\mid Y) = e^{-\omega^2 Y^2/2}$.
And so:

$E(e^{i\omega XY}) = E\big[E(e^{i\omega XY}\mid Y)\big] = E\big[e^{-\omega^2 Y^2/2}\big] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\omega^2 y^2/2}\, e^{-y^2/2}\, dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(1+\omega^2)y^2/2}\, dy = \frac{1}{\sqrt{1+\omega^2}}.$

Hence, $\varphi_W(\omega) = (1+\omega^2)^{-1/2}$, and

$f_W(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\varphi_W(\omega)\, e^{-i\omega w}\, d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega w}}{\sqrt{1+\omega^2}}\, d\omega = \frac{1}{\pi}\int_0^{\infty}\frac{\cos(\omega w)}{\sqrt{1+\omega^2}}\, d\omega = \frac{1}{\pi}K_0(|w|),$

where (G&R p. 419, entry 3.754.2) $K_0$ is the modified Bessel function; it can also be expressed (G&R p. 952, 8.407.1) in terms of $H_0^{(1)}(iw)$, a Bessel function of the third kind, also called a Hankel function. Yuck!!!

For my own possible benefit: The OP mentions the characteristic function of the product of two independent Brownian motions, say the processes $(X_t)_t$ and $(Y_t)_t$. The above yields the distribution of $Z_1 = X_1Y_1$ and, by homogeneity, of $Z_t = X_tY_t$, which is distributed like $tZ_1$ for every $t \ge 0$. However, this does not determine the distribution of the process $(Z_t)_t$. For example, to compute the distribution of $(Z_t, Z_s)$ for $0 \le t \le s$, one could write the increment $Z_s - Z_t$ as $Z_s - Z_t = X_t(Y_s - Y_t) + Y_t(X_s - X_t) + (X_s - X_t)(Y_s - Y_t)$, but the terms $X_t(Y_s - Y_t)$ and $Y_t(X_s - X_t)$ show that $Z_s - Z_t$ is not independent of $Z_t$, and that $(Z_t)_t$ is probably not Markov. [NOTE: Take $X$ and $Y$ to be two Gaussian random variables with mean 0 and variance 1. Since they have the same variance, $X - Y$ and $X + Y$ are independent.] □

6.4 Simple Hypothesis Testing

Example 4.1 (Problem 8.78 on p. 293 of the textbook) A random sample of size 25 from a normal population had mean value 47 and standard deviation 7. If we base our decision on Theorem 2.5 (the t distribution), can we say that this information supports the claim that the population mean is 42?

Solution: While not formally stated as such, this is a problem in hypothesis testing. Specifically, $H_0: \mu_X = 42$ versus $H_1: \mu_X \ne 42$. The natural rationale for deciding which hypothesis to announce is simple: if $\hat\mu_X = \bar X$ is 'sufficiently close' to 42, then we will announce $H_0$. Regardless of whether or not $H_0$ is true, we know that

$T = \dfrac{\bar X - \mu_X}{\sqrt{S_{24}^2/n}} \sim t_{24}.$

Assuming that $H_0$ is indeed true (i.e. $\mu_X = 42$), then

$T = \dfrac{\bar X - 42}{\sqrt{S_{24}^2/n}} \sim t_{24}.$

A plot of $f_T(t)$ is shown below.

[Figure 4.1 A plot of $f_T(t)$ for $T \sim t_{24}$.]

Based on our above rationale for announcing $H_0$ versus $H_1$: if a measurement of this $T$ is sufficiently large in magnitude, we should announce $H_1$, even if $H_0$ is, in fact, true. Suppose that we were to use a threshold value $t_{th}$. Then our decision rule is: $[\,|T| < t_{th}\,]$ = the event that we will announce $H_0$; or, equivalently, $[\,|T| \ge t_{th}\,]$ = the event that we will announce $H_1$. If, in fact, $H_0$ is true, then our false alarm probability is $\alpha = \Pr[\,|T| \ge t_{th}\,]$, where $T = \dfrac{\bar X - 42}{\sqrt{S_{24}^2/n}} \sim t_{24}$.

There are a number of ways to proceed from here.

Case 1: Specify the false alarm probability $\alpha$ that we are willing to risk. Suppose, for example, that we choose $\alpha = .05$ (i.e. we are willing to risk wrongly announcing $H_1$ with a 5% probability). Then $\Pr[\,|T| \ge t_{th}\,] = .05$ results in the threshold $t_{th}$ = tinv(.975, 24) = 2.064. From the above data, we have $t = \dfrac{\bar x - 42}{\sqrt{s^2/n}} = \dfrac{47 - 42}{\sqrt{7^2/25}} = 3.53$. Since this value is greater than 2.064, we should announce $H_1$.
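The Case 1 computation is easily scripted. Here is a minimal Matlab sketch (assuming the Statistics Toolbox functions tinv and tcdf are available):

% Sketch: Example 4.1, Case 1 (two-sided t test at a 5% false alarm probability).
n = 25; xbar = 47; s = 7; mu0 = 42; alpha = 0.05;
t   = (xbar - mu0)/sqrt(s^2/n);      % measured value of the test statistic
tth = tinv(1 - alpha/2, n-1);        % threshold: tinv(.975,24) = 2.064
announceH1 = abs(t) >= tth           % logical 1 => announce H1
pvalue = 2*tcdf(-abs(t), n-1)        % smallest false alarm probability for this data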
Case 2: Use the data to first compute $t = \dfrac{\bar x - 42}{\sqrt{s^2/n}} = \dfrac{47 - 42}{\sqrt{7^2/25}} = 3.53$, and then use this value as our threshold $t_{th}$. Then, because our value was $t = 3.53$, this threshold has been met, and we should announce $H_1$. In this case, the probability that we are wrong becomes $\Pr[\,|T| \ge 3.53\,] = 2\Pr[T \ge 3.53]$ = 2*tcdf(-3.53, 24) = 0.0017.

QUESTION: Are we willing to risk announcing $H_1$ with the probability of being wrong equal to .0017?

ANSWER: If we were willing to risk a 5% false alarm probability, then surely we would risk an even lower probability of being wrong in announcing $H_1$. Hence, we should announce $H_1$.

Comment: In both cases we ended up announcing $H_1$. So what does it matter which case we abide by? The answer, with a little thought, should be obvious. In Case 1 we are announcing $H_1$ with a 5% probability of being wrong, whereas in Case 2 we are announcing $H_1$ with a 0.17% probability of being wrong. In other words, we were willing to take much less risk, and even then we announced $H_1$. □

The false alarm probability obtained in Case 2 of the above example has a name. It is called the p-value of the test. It is the smallest false alarm probability that we can achieve in using the given data to announce $H_1$.

As far as the problem goes, as stated in the textbook, we are finished. However, we have ignored the second type of error that we could make, namely, announcing $H_0$ when, in fact, $H_1$ is true. There is a reason that this error (called the type-2 error) is often ignored: in the situation where $H_0$ is true, we have a numerical value for $\mu_X$ (in this case, $\mu_X = 42$); however, if $H_1$ is true, then all we know is that $\mu_X \ne 42$. And so, to compute a type-2 error probability, we need to specify a value for $\mu_X$.

Suppose, for example, that we assume $\mu_X = 45$, and suppose further that we maintain the threshold $t_{th} = 3.53$. Now we have

$T_{45} = \dfrac{\bar X - 45}{\sqrt{S_{24}^2/n}} \sim t_{24},$

while our chosen test statistic remains $T_{42} = \dfrac{\bar X - 42}{\sqrt{S_{24}^2/n}}$. And so the question becomes: how does the random variable $T_{42}$ relate to the random variable $T_{45}$? To answer this question, write:

$T_{42} = \dfrac{\bar X - 42}{\sqrt{S_{24}^2/n}} = \dfrac{(\bar X - 45) + 3}{\sqrt{S_{24}^2/n}} = T_{45} + \dfrac{3}{\sqrt{S_{24}^2/n}} = T_{45} + W, \qquad (4.1)$

where $W = 3/\sqrt{S_{24}^2/n}$ is a random variable, and not a pleasant one at that! To see why this point is important, let's presume for the moment that we actually know $\sigma_X^2$. In this case, we would have

$Z_{45} = \dfrac{\bar X - 45}{\sqrt{\sigma_X^2/n}} \sim N(0,1).$

Hence, $T_{42} = \dfrac{\bar X - 42}{\sqrt{\sigma_X^2/n}} = Z_{45} + \dfrac{3}{\sqrt{\sigma_X^2/n}} = Z_{45} + c$. In words, $T_{42}$ is a normal random variable with mean $c = 3/\sqrt{\sigma_X^2/n}$ and with variance equal to one. Using the above data, assume $\sqrt{\sigma_X^2/n} = \sqrt{7^2/25} = 1.4$. Then $c = 3/1.4 = 2.143$. This pdf is shown below.

[Figure 4.2 The pdf of $T_{42} \sim N(2.143,\ 1)$.]

Now let's return to the above two cases.

Case 1 (continued): Our decision rule announces $H_1$ (with a 5% false alarm probability) only when $|T_{42}| \ge 2.064$; equivalently, we announce $H_0$ on the event $[\,|T_{42}| < 2.064\,]$, shown as the green double arrow in Figure 4.2. The probability of this event is computed as $\Pr[\,|T_{42}| < 2.064\,]$ = normcdf(2.064, 2.143, 1) - normcdf(-2.064, 2.143, 1) = 0.47. Hence, if the true mean is in fact $\mu_X = 45$, then under the assumption that the CLT is valid we have a 47% chance of (wrongly) announcing that $\mu_X = 42$ using this decision rule.

Case 2 (continued): Our decision rule announces $H_1$ (with false alarm probability 0.17%) only when $|T_{42}| \ge 3.53$; equivalently, we announce $H_0$ on the event $[\,|T_{42}| < 3.53\,]$, shown as the red double arrow in Figure 4.2. The probability of this event is $\Pr[\,|T_{42}| < 3.53\,]$ = normcdf(3.53, 2.143, 1) - normcdf(-3.53, 2.143, 1) = 0.92. Hence, if the true mean is in fact $\mu_X = 45$, we have a 92% chance of (wrongly) announcing that $\mu_X = 42$ using this decision rule.
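These two probabilities come from a one-line normal computation. A minimal Matlab sketch, using the normal approximation $T_{42} \sim N(c, 1)$ introduced above:

% Sketch: type-2 error probabilities for Example 4.1 when the true mean is 45.
n = 25; sigma = 7; c = 3/sqrt(sigma^2/n);              % c = 2.143
beta1 = normcdf(2.064, c, 1) - normcdf(-2.064, c, 1)   % Case 1: ~0.47
beta2 = normcdf(3.53,  c, 1) - normcdf(-3.53,  c, 1)   % Case 2: ~0.92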
In conclusion, we see that the smaller we make the false alarm probability, the greater the type-2 error will be. Taken to the extreme, suppose that, no matter what the data say, we will simply never announce $H_1$. Then, of course, our false alarm probability is zero, since we will never sound that alarm. However, by never sounding the alarm, we are guaranteed, with probability one, to announce $H_0$ when $H_1$ is true.

A reasonable question to ask is: how can one select acceptable false alarm and type-2 error probabilities? One answer is given by the Neyman-Pearson lemma. We will not go into this lemma in these notes. The interested reader might refer to: http://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma

Finally, we are in a position to return to the problem entailed in (4.1), where we see that we are not subtracting a number (which would merely shift the mean), but a random variable. Hence, one cannot simply say that (4.1) is a mean-shifted $t_{24}$ random variable. We will leave it to the interested reader to pursue this further. One simple approach to begin with would use simulations to arrive at the pdf for (4.1). □

Example 4.2 (Problem 8.79 on p. 293 of the textbook) A random sample of size n = 12 from a normal population has (sample) mean $\bar x = 27.8$ and (sample) variance $s^2 = 3.24$. If we base our decision on the statistic of Theorem 8.13 (i.e. Theorem 2.5 of these notes), can we say that the given information supports the claim that the mean of the population is $\mu_X = 28.5$?

Solution: The statistic to be used is $T = \dfrac{\bar X - \mu_X}{\sqrt{S_n^2/n}} \sim t_{11}$. And so the corresponding measurement of $T$ obtained from the data is $t = \dfrac{27.8 - 28.5}{\sqrt{3.24/12}} = -1.347$. Since the sample mean $\bar x = 27.8$ is less than the speculated true mean $\mu_X = 28.5$ under $H_0$, a reasonable hypothesis setting would ask the question: is 27.8 really that much smaller than 28.5? This leads to the decision rule: $[\,T \le -1.347\,]$ = announce $H_1$.

In the event that, indeed, $T = \dfrac{\bar X - 28.5}{\sqrt{S_n^2/n}} \sim t_{11}$, we have $\Pr[T \le -1.347] = 0.1025$. In words, there is a 10% chance that a measurement of $T$ would be contained in the interval $(-\infty, -1.347]$ if, indeed, $\mu_X = 28.5$. And so now one needs to decide whether or not to announce $H_1$. On the one hand, one could announce $H_1$ with a reported p-value of 0.10 and let management make the final announcement. On the other hand, if management has, for example, already 'laid down the rule' that it will not accept a false alarm probability greater than, say, 0.05, then one should announce $H_0$. □

We now proceed to some examples concerning tests for an unknown variance.

Example 4.3 (EXAMPLE 13.6 on p. 410 of the textbook) Let X denote the thickness of a part used in a semiconductor. Using the iid data collection variables $\{X_k\}_{k=1}^{18}$, it was found that $s^2 = 0.68$. The manufacturing process is considered to be in control if $\sigma_X^2 \le 0.36$. Assuming X is normally distributed, test the null hypothesis $H_0: \sigma_X^2 = 0.36$ against the alternative hypothesis $H_1: \sigma_X^2 > 0.36$ at a 5% significance level (i.e. a 5% false alarm probability).

Solution: From Theorem 2.3 we have $Q = \dfrac{17\,\hat\sigma_X^2}{\sigma_X^2} \sim \chi_{17}^2$. From the data, assuming $H_0$ is true, we have $q = \dfrac{(18-1)(0.68)}{0.36} = 32.11$. The threshold value $q_{th}$ is obtained from $\Pr[Q > q_{th}] = .05$; hence, $q_{th}$ = chi2inv(.95, 17) = 27.587. Since $q = 32.11 > 27.587$, we announce $H_1$.

Now let's obtain the p-value for this test. If we use $q_{th} = 32.11$, then $\Pr[Q > 32.11] = .0146$ is the p-value.

Next, suppose that instead of using the sample mean (which is, by the way, not reported) one had used the design mean thickness. Then we would have $Q = \dfrac{18\,\hat\sigma_X^2}{\sigma_X^2} \sim \chi_{18}^2$. In this case, $\Pr[Q > 32.11] = .0213$ is the p-value. And so, at a 5% significance level, we would still announce $H_1$. □
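A minimal Matlab sketch of the Example 4.3 computations (assuming the Statistics Toolbox functions chi2inv and chi2cdf are available):

% Sketch: chi-squared test of H0: sigma^2 = 0.36 vs. H1: sigma^2 > 0.36.
n = 18; s2 = 0.68; sig0sq = 0.36; alpha = 0.05;
q    = (n-1)*s2/sig0sq;             % measured statistic; ~ chi^2_{n-1} under H0
qth  = chi2inv(1-alpha, n-1);       % threshold: chi2inv(.95,17) = 27.587
pval = 1 - chi2cdf(q, n-1);         % p-value of the test
announceH1 = q > qth                % logical 1 => announce H1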
Example 4.4 Suppose that X = the act of recording the number of false statements made by a certain news network in any given day. We will assume that X has a Poisson distribution, with mean $\mu_X = 2$. Let $\{X_k\}_{k=1}^{5} \sim iid\ X$ be the numbers of false statements throughout any 5-day work week. In order to determine whether or not the network has decided to increase the number of false statements, we will test $H_0: \mu_X = 2$ versus $H_1: \mu_X > 2$, where $\hat\mu_X = \bar X = \frac{1}{5}\sum_{k=1}^{5} X_k$ is the average number of false statements per day in any given week. We desire a false alarm probability of ~10%.

QUESTION: How should we proceed?

ANSWER: Simulations!

%PROGRAM NAME: avgdefects.m
% Simulation-based estimate of the pmf of Xbar under H0: mu = 2.
nsim = 40000;
n = 5;
mu = 2;
x = poissrnd(mu, n, nsim);
muhat = mean(x);
db = 1/n;
bctr = 0:db:5;
fmuhat = nsim^-1 * hist(muhat, bctr);
figure(1)
stem(bctr, fmuhat)
hold on
pause
% Overlay the estimates from 10 more batches of simulations to see their variability.
for m = 1:10
    x = poissrnd(mu, n, nsim);
    muhat = mean(x);
    fmuhat = nsim^-1 * hist(muhat, bctr);
    stem(bctr, fmuhat)
end
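To finish the thought, here is one way (a sketch, not worked out in the notes) to turn those simulations into a decision threshold: estimate the distribution of $\bar X$ under $H_0$ and take approximately its 90th percentile. Because $\bar X$ is discrete, the achieved false alarm probability will only be close to, not exactly, 10%.

% Sketch: simulation-based threshold for Example 4.4 (H0: mu = 2, alpha ~ 10%).
nsim = 40000; n = 5; mu0 = 2;
muhat = mean(poissrnd(mu0, n, nsim));    % simulated values of Xbar under H0
xth = prctile(muhat, 90)                 % approximate 90th percentile of Xbar
alphahat = mean(muhat > xth)             % achieved false alarm probability
% Decision rule: announce H1 if the observed weekly average exceeds xth.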
6.5 Confidence Intervals

The concept of a confidence interval for an unknown parameter, $\theta$, is intimately connected to the concept of simple hypothesis testing. To see this, consider the following example.

Example 5.1 Let X = the act of recording the body temperature of a person who is in deep meditation. We will assume that $X \sim N(\mu_X, \sigma_X^2)$. The data collection random variables $\{X_k\}_{k=1}^{n}$ are to be used to investigate the unknown parameter $\mu_X$. We will assume that $\sigma_X = 1.2$ °F is known and that a sample size of n = 25 will be used. The estimator $\hat\mu_X = \bar X$ will be used in the investigation.

Hypothesis Testing: In the hypothesis testing portion of the investigation we begin by assuming that, under $H_0$, the parameter $\mu_X$ is a known quantity. Let's assume that it is 98.6 °F. In other words, we are assuming that deep meditation has no influence on $\mu_X$. Since we are investigating whether or not meditation has any effect on $\mu_X$, our test, formally, is

$H_0: \mu_X = 98.6$ °F versus $H_1: \mu_X \ne 98.6$ °F.

Our decision rule for this test is, in words: if $\hat\mu_X = \bar X$ is sufficiently close to $\mu_X^{(0)} = 98.6$ °F, then we will announce $H_0$. The event, call it A, that is described by these words then has the form

$A = [\,|\bar X - \mu_X^{(0)}| < \epsilon_{th}\,].$

Here, the number $\epsilon_{th}$ is the threshold temperature difference. Suppose that we require that, in the event that $H_0$ is true,

$\Pr(A) = \Pr[\,|\bar X - \mu_X^{(0)}| < \epsilon_{th}\,] = .90. \qquad (5.1)$

In other words, we require a false alarm probability (or significance level) of 0.10.

In the last section we used the standard test statistic $Z = \dfrac{\bar X - \mu_X}{\sigma_{\bar X}}$. Assuming that $H_0$ is indeed true, it follows that $Z = \dfrac{\bar X - \mu_X^{(0)}}{\sigma_{\bar X}} \sim N(0,1)$. To convert (5.1) into an event involving this Z, we use the concept of equivalent events:

$\Pr(A) = \Pr[\,|\bar X - \mu_X^{(0)}| < \epsilon_{th}\,] = \Pr\Big[\,\Big|\dfrac{\bar X - \mu_X^{(0)}}{\sigma_{\bar X}}\Big| < \dfrac{\epsilon_{th}}{\sigma_{\bar X}}\,\Big] = \Pr[\,|Z| < z_{th}\,] = 0.90. \qquad (5.2)$

From (5.2) we obtain

$z_{th}$ = norminv(0.95, 0, 1) = 1.645. $\qquad (5.3)$

Hence, if the magnitude of $Z = \dfrac{\bar X - \mu_X^{(0)}}{\sigma_{\bar X}}$ exceeds 1.645, then we will announce $H_1$.

Now, instead of going through the procedure of using the standard test statistic Z, let's use (5.1) more directly. Specifically,

$\Pr[\,\bar X - \mu_X^{(0)} \ge \epsilon_{th}\,] = \Pr[\,\bar X \ge \mu_X^{(0)} + \epsilon_{th}\,] = \Pr[\,\bar X \ge x_{th}\,] = 0.05. \qquad (5.4)$

Because $\sigma_{\bar X} = \sigma_X/\sqrt{n} = 1.2/\sqrt{25} = 0.24$, from (5.4) we obtain

$x_{th}$ = norminv(0.95, 98.6, 0.24) = 98.995 °F. $\qquad (5.5)$

Hence, there was no need to use the standard statistic! Furthermore, the threshold (5.5) is in units of °F, making it much more attractive than the dimensionless units of (5.3).

QUESTION: Given that using (5.1) to arrive at (5.5) is much more direct than using (5.2) and (5.3), why in the world did we not use the more direct method in the last section?

ANSWER: The method used in the last section has been used for well over 100 years. It was the only method one could use when only a unit-normal z-table was available. Only in recent years has statistical software become available that precludes the need for such a table. However, there are random variables other than the normal type for which a standardized table must still be used. One example is $T \sim t_n$: a linear function of $T$ is NOT a t random variable, whereas a linear function of a normal random variable IS a normal random variable.

Confidence Interval Estimation: We begin here with the same event that we began hypothesis testing with, namely $A = [\,|\bar X - \mu_X| < \epsilon_{th}\,]$, and we form the same probability that we formed in hypothesis testing, namely

$\Pr(A) = \Pr[\,|\bar X - \mu_X| < \epsilon_{th}\,] = .90. \qquad (5.6)$

A close inspection of equations (5.1) and (5.6) will reveal a slight difference. Specifically, in (5.1) we have the parameter $\mu_X^{(0)}$, which is the known value of $\mu_X$ (= 98.6 °F) under $H_0$. In contrast, the parameter $\mu_X$ in (5.6) is not assumed to be known. The goal of this investigation is to arrive at a 90% two-sided confidence interval for this unknown $\mu_X$. From (5.2) we have:

$\Pr(A) = \Pr[\,|\bar X - \mu_X| < \epsilon_{th}\,] = \Pr\Big[-1.645 < \dfrac{\bar X - \mu_X}{0.24} < 1.645\Big] = .90.$

Finally, using the concept of equivalent events, we have

$\Pr[\,\bar X - 1.645(0.24) < \mu_X < \bar X + 1.645(0.24)\,] = \Pr[\,\bar X - .395 < \mu_X < \bar X + .395\,] = .90.$

Notice that the event A has been manipulated into an equivalent event, namely the interval

$[\,\bar X - .395\ ,\ \bar X + .395\,]. \qquad (5.7)$

It is the interval (5.7) that is called the 90% two-sided confidence interval (CI) for the unknown mean $\mu_X$.

Remark 5.1 The interval (5.7) is an interval whose endpoints are random variables. In other words, the CI described by (5.7) is a random-endpoint interval. It is NOT a numerical interval. Even though (5.7) is a random-endpoint interval, its width is 0.79; that is, the width is not random. To complete this example, suppose that after collecting the data we arrive at $\bar x = 97.2$ °F. Inserting this number into (5.7) gives the numerical interval [96.805, 97.595]. This interval is not a CI. Rather, it is an estimate of the CI. Furthermore, one cannot say that $\Pr[96.805 < \mu_X < 97.595] = 0.9$. Why? Because even though $\mu_X$ is not known, it is a number, not a random variable. Hence, this equation makes no sense. Either the number $\mu_X$ is in the interval [96.805, 97.595] or it isn't. Period! For example, what would you say to someone who asked you the probability that the number 2 is in the interval [1, 3]? Hopefully, you would be kind in your response to such a person when saying "Um... the number 2 is in the interval [1, 3] with probability one."

In summary, a 90% two-sided CI estimate should not be taken to mean that the probability that $\mu_X$ is in that interval is 90%. Rather, it should be viewed as one measurement of a random-endpoint interval, such that, were repeated measurements to be conducted again and again, then overall one should expect 90% of those numerical intervals to include the unknown parameter $\mu_X$. □

A General Procedure for Constructing a CI: Let $\theta$ be an unknown parameter for which a $(1-\alpha)\cdot 100\%$ two-sided CI is desired, and let $\hat\theta$ be the associated point estimator of $\theta$.

Step 1: Identify a standard random variable W that is a function of both $\theta$ and $\hat\theta$; say, $W = g(\theta, \hat\theta)$.

Step 2: Write $\Pr[\,w_1 \le W \le w_2\,] = \Pr[\,w_1 \le g(\theta, \hat\theta) \le w_2\,] = 1 - \alpha$. From this, find the numerical values of $w_1$ and $w_2$.

Step 3: Use the concept of equivalent events to arrive at an expression of the form

$\Pr[\,w_1 \le W \le w_2\,] = \Pr[\,w_1 \le g(\theta, \hat\theta) \le w_2\,] = \Pr[\,h_1(w_1, \hat\theta) \le \theta \le h_2(w_2, \hat\theta)\,] = 1 - \alpha.$

The desired CI is then the random-endpoint interval $[\,h_1(w_1, \hat\theta)\ ,\ h_2(w_2, \hat\theta)\,]$. □
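To make the three steps concrete, here is a minimal Matlab sketch for the simplest case, $\theta = \mu_X$ with $\sigma_X$ known; the numerical values are hypothetical, chosen only for illustration.

% Sketch: the general CI procedure applied to theta = mu_X with sigma_X known.
% Step 1: W = (Xbar - mu)/(sigma/sqrt(n)) ~ N(0,1).
% Step 2: Pr[w1 <= W <= w2] = 1-alpha gives w2 = norminv(1-alpha/2), w1 = -w2.
% Step 3: equivalent events give [Xbar - w2*sigma/sqrt(n), Xbar + w2*sigma/sqrt(n)].
alpha = 0.10; n = 16; sigma = 2.0; xbar = 50.3;     % hypothetical values
w2 = norminv(1 - alpha/2);
CI = [xbar - w2*sigma/sqrt(n), xbar + w2*sigma/sqrt(n)]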
The above procedure is now demonstrated in a number of simple examples.

Example 5.2 Arrive at a 98% two-sided CI estimate for $\mu_X$, based on $\bar x = 4.4$, $\hat\sigma_X = .25$, and n = 10. Assume that the data collection variables $\{X_k\}_{k=1}^{10}$ are iid $N(\mu_X, \sigma_X^2)$.

Solution: Even though the sample mean satisfies $\bar X \sim N(\mu_X, \sigma_X^2/10)$, we do not have knowledge of $\sigma_X$, and so we will need to use $\hat\sigma_X$. Since n = 10 is too small to invoke the CLT, we will use the test statistic:

Step 1: $T = \dfrac{\bar X - \mu_X}{\hat\sigma_X/\sqrt{10}} \sim t_9$.

Step 2: $\Pr[\,t_1 \le T \le t_2\,] = .98$ gives $t_2$ = tinv(.99, 9) = 2.82 and $t_1 = -2.82$.

Step 3: $[\,t_1 \le T \le t_2\,] = \Big[-t_2 \le \dfrac{\bar X - \mu_X}{\hat\sigma_X/\sqrt{10}} \le t_2\Big] = [\,\bar X - t_2\,\hat\sigma_X/\sqrt{10} \le \mu_X \le \bar X + t_2\,\hat\sigma_X/\sqrt{10}\,]$.

Hence, the desired CI is the random-endpoint interval $[\,\bar X - 2.82\,\hat\sigma_X/\sqrt{10}\ ,\ \bar X + 2.82\,\hat\sigma_X/\sqrt{10}\,]$. Using the above data, we arrive at the CI estimate $[\,4.4 - 2.82(.25)/\sqrt{10}\ ,\ 4.4 + 2.82(.25)/\sqrt{10}\,] = 4.4 \pm .223$. □

Example 5.3 In an effort to estimate the variability of the temperature, T, of a pulsar, physicists used n = 100 random measurements. The data gave the following statistics: $\bar T = 2.1\times 10^{25}$ °K and $\hat\sigma_T = .0008\times 10^{25}$ °K. Obtain a 95% two-sided CI estimate for $\sigma_T$.

Solution: We will begin by using the point estimator of $\sigma_T^2$: $\hat\sigma_T^2 = \frac{1}{99}\sum_{k=1}^{100}(T_k - \bar T)^2$, where $\hat\mu_T = \bar T$.

Step 1: The most common test statistic in relation to a variance is $\dfrac{(n-1)\hat\sigma_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$ (cf. Theorem 2.3).

Step 2: $\Pr[\,x_1 \le \chi_{99}^2 \le x_2\,] = 0.95$ gives $\Pr[\chi_{99}^2 \le x_1] = 0.025$ and $\Pr[\chi_{99}^2 \le x_2] = 0.975$, so $x_1$ = chi2inv(.025, 99) = 73.36 and $x_2$ = chi2inv(.975, 99) = 128.42.

Step 3:

$\Big[\,x_1 \le \dfrac{(n-1)\hat\sigma_X^2}{\sigma_X^2} \le x_2\,\Big] = \Big[\,\dfrac{1}{x_2} \le \dfrac{\sigma_X^2}{(n-1)\hat\sigma_X^2} \le \dfrac{1}{x_1}\,\Big] = \Big[\,\dfrac{(n-1)\hat\sigma_X^2}{x_2} \le \sigma_X^2 \le \dfrac{(n-1)\hat\sigma_X^2}{x_1}\,\Big].$

Hence, the 95% two-sided CI for $\sigma_X$ is the random-endpoint interval $\Big[\,\sqrt{\dfrac{99\,\hat\sigma_X^2}{128.42}}\ ,\ \sqrt{\dfrac{99\,\hat\sigma_X^2}{73.36}}\,\Big]$. From this and the above data, we obtain the CI estimate

$\Big[\,\sqrt{\dfrac{99(.0008^2)}{128.42}}\ ,\ \sqrt{\dfrac{99(.0008^2)}{73.36}}\,\Big] = [\,0.00070\ ,\ 0.00093\,]$ (in units of $10^{25}$ °K).

Now, because $\hat\sigma_T^2 = \frac{1}{99}\sum_{k=1}^{100}(T_k - \bar T)^2$ is (essentially) an average of iid random variables, and because n = 100 would seem to be large enough to invoke the CLT, let us repeat the above solution with this knowledge.

Solution: $\dfrac{(n-1)\hat\sigma_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$ has mean $n-1$ and variance $2(n-1)$. Writing $\hat\sigma_X^2 \sim \dfrac{\sigma_X^2}{n-1}\chi_{n-1}^2$, it follows that

$\mu_{\hat\sigma_X^2} = \dfrac{\sigma_X^2}{n-1}(n-1) = \sigma_X^2$ and $\sigma_{\hat\sigma_X^2}^2 = \Big(\dfrac{\sigma_X^2}{n-1}\Big)^2\, 2(n-1) = \dfrac{2\sigma_X^4}{n-1}.$

And so, from the CLT, we have

Step 1: $Z = \dfrac{\hat\sigma_X^2 - \sigma_X^2}{\sigma_X^2\sqrt{2/(n-1)}}$ is approximately $N(0,1)$.

Step 2: $\Pr[\,z_1 \le Z \le z_2\,] = .95$ gives $z_2 = 1.96$ and $z_1 = -1.96$.

Step 3:

$[\,z_1 \le Z \le z_2\,] = \Big[-1.96 \le \dfrac{\hat\sigma_X^2 - \sigma_X^2}{\sigma_X^2\sqrt{2/(n-1)}} \le 1.96\Big] = \Big[\,\sigma_X^2\Big(1 - 1.96\sqrt{\tfrac{2}{n-1}}\Big) \le \hat\sigma_X^2 \le \sigma_X^2\Big(1 + 1.96\sqrt{\tfrac{2}{n-1}}\Big)\Big] = \Bigg[\,\dfrac{\hat\sigma_X^2}{1 + 1.96\sqrt{2/(n-1)}} \le \sigma_X^2 \le \dfrac{\hat\sigma_X^2}{1 - 1.96\sqrt{2/(n-1)}}\,\Bigg].$

Hence, the desired CI for $\sigma_X$ is the random-endpoint interval

$\Bigg[\,\dfrac{\hat\sigma_X}{\sqrt{1 + 1.96\sqrt{2/99}}}\ ,\ \dfrac{\hat\sigma_X}{\sqrt{1 - 1.96\sqrt{2/99}}}\,\Bigg] = \Big[\,\dfrac{\hat\sigma_X}{1.13}\ ,\ \dfrac{\hat\sigma_X}{0.85}\,\Big].$

And so the CI estimate is $\Big[\,\dfrac{.0008}{1.13}\ ,\ \dfrac{.0008}{0.85}\,\Big] = [\,.00071\ ,\ .00094\,]$. This interval is almost exactly equal to the interval obtained above.
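A minimal Matlab sketch of the chi-squared-based interval in Example 5.3 (values in units of $10^{25}$ °K; assuming the Statistics Toolbox function chi2inv is available):

% Sketch: Example 5.3, 95% two-sided CI for sigma_T via the chi-squared statistic.
n = 100; sighat = 0.0008;
x1 = chi2inv(0.025, n-1);                % 73.36
x2 = chi2inv(0.975, n-1);                % 128.42
CI = sqrt((n-1)*sighat^2 ./ [x2, x1])    % ~[0.00070, 0.00093]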
Example 5.4 Suppose that it is desired to investigate the influence of the addition of background music in drawing customers into your new café during the evening hours. To this end, you will observe pedestrians who walk past your café while no music is playing, and while music is playing. Let X = the act of noting whether a pedestrian enters (1) or doesn't enter (0) your café when no music is playing, and let Y = the act of noting whether a pedestrian enters (1) or doesn't enter (0) your café when music is playing. Suppose that you obtained the following data-based results: $\hat p_X = 0.13$ for sample size $n_X = 100$, and $\hat p_Y = 0.18$ for sample size $n_Y = 50$. Obtain a 95% one-sided (upper) CI for $\theta = p_Y - p_X$.

Solution: We will assume that the actions of the pedestrians are mutually independent. Then we have that $n_X\hat p_X \sim \mathrm{Binomial}(n_X, p_X)$ is independent of $n_Y\hat p_Y \sim \mathrm{Binomial}(n_Y, p_Y)$.

Assumption: For a first analysis, we will assume that the CLT holds in relation to both $\hat p_X$ and $\hat p_Y$. This would seem to be reasonable, based on the above estimates of $p_X$ and $p_Y$ and the chart included in Homework #6, Problem 1. Then, approximately,

$\hat p_X \sim N\Big(p_X,\ \dfrac{p_X(1-p_X)}{n_X}\Big)$ and $\hat p_Y \sim N\Big(p_Y,\ \dfrac{p_Y(1-p_Y)}{n_Y}\Big)$.

Hence

$\hat\theta = \hat p_Y - \hat p_X \sim N\Big(p_Y - p_X,\ \dfrac{p_X(1-p_X)}{n_X} + \dfrac{p_Y(1-p_Y)}{n_Y}\Big) = N(\theta,\ \sigma_{\hat\theta}^2),$

where we have defined $\sigma_{\hat\theta}^2 = \dfrac{p_X(1-p_X)}{n_X} + \dfrac{p_Y(1-p_Y)}{n_Y}$. The standard approach [see Theorem 11.8 on p. 365 of the text] is to use

$\hat\sigma_{\hat\theta}^2 = \dfrac{\hat p_X(1-\hat p_X)}{n_X} + \dfrac{\hat p_Y(1-\hat p_Y)}{n_Y} = \dfrac{0.13(0.87)}{100} + \dfrac{0.18(0.82)}{50} = 0.0041.$

Hence we have $\hat\sigma_{\hat\theta} = 0.064$, and $Z = \dfrac{\hat\theta - \theta}{\hat\sigma_{\hat\theta}}$ is approximately $N(0,1)$, with $\Pr[\,Z \ge -z_{th}\,] = 0.95$ giving $z_{th} = 1.645$. Then we have the following equivalent events:

$[\,Z \ge -z_{th}\,] = \Big[\,\dfrac{\hat\theta - \theta}{.064} \ge -1.645\,\Big] = [\,\theta \le \hat\theta + 1.645(.064)\,] = [\,\theta \le \hat\theta + .105\,].$

Recall that the smallest possible value of $\theta = p_Y - p_X$ is -1. Hence, the CI estimate is: [-1, .05 + .105] = [-1, .155].

Conclusion: Were you to repeat this experiment many times, you would expect that 95% of the time the increase in clientele due to music would be no more than ~15%.
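A minimal Matlab sketch of these point computations (the simulation study that follows, and the code in the Appendix, examine how reliable they are):

% Sketch: Example 5.4, point estimate and 95% one-sided (upper) CI for pY - pX.
phx = 0.13; nx = 100; phy = 0.18; ny = 50;
thetahat = phy - phx;                               % 0.05
sth = sqrt(phx*(1-phx)/nx + phy*(1-phy)/ny);        % ~0.064
upper = thetahat + norminv(0.95)*sth;               % ~0.155
CI = [-1, upper]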
Now, based on Figure 5.2, one could reasonably argue that the pdf, f (x) , has a normal distribution RIGHT? Well? Let’s use a 50-bin histogram instead of a 10-bin histogram, and see what we get. We are, after all, using 10,000 simulations. And so we can afford finer resolution. The result is shown in Figure 5.3. Simulation-Based Estimate of the pdf of thetahat 10 9 8 7 6 5 4 3 2 1 0 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 Figure 5.3 Simulation-based (nsim = 10,000) estimate of f (x) using a 50-bin scaled histogram. There are two peaks that clearly stand out in relation to an assumed bell curve shape. Furthermore, they real, in the sense that they appear for repeated simulations! Conclusion 1: If one is interested in events that are much larger than the bin width resolution, which in Figure 5.3 is ~0.011, then the normal approximation in Figure 5.2 may be acceptable. However, if one is interested in events on the order of 0.011, then Figure 5.3 suggests otherwise. The fact of the matter is that is discrete random variable. To obtain its sample space, write pY p X Y X 1 50 1 100 Y Xk . k 100 50 k 1 k 1 The smallest element of S is -1. The event {-1} is equivalent to [Y1 Y2 Y50 0 X 1 X 2 X 100 1] . The next smallest event is {-0.99}. This event is equivalent to [Y1 Y2 Y50 0 100 X k 1 k 99] . Clearly, the explicit computation of such equivalent events will become more cumbersome as we proceed. As an alternative to this cumbersome approach, we will, instead, simply use a bin width that is much smaller than 0.01 in computing the estimate of f (x) . Using not 50, but 500 bins resulted in Figure 5.4 below. 27 Simulation-Based Estimate of the pdf of thetahat 50 40 30 20 10 0 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 Simulation-Based Estimate of the pdf of thetahat 50 40 30 20 10 0 0.06 0.065 0.07 0.075 0.08 0.085 0.09 0.095 0.1 Figure 5.4 Simulation-based (nsim = 10,000) estimate of f (x) using a 500-bin scaled histogram. The lower plot includes the zoomed region near 0.08. What we see from Figure 5.4 is that the elements of S are not spaced at uniform 0.01 increments throughout. At the value x 0.08 we see that the spacing is much smaller than 0.01. Hence, it is very likely that the largest peak in Figure 5.3 is due to this unequal spacing in S . Now that we have an idea as to the structures of the marginal pdf’s f (x) and f (x) , we need to investigate their joint probability structure. To this end, the place to begin is with a scatter plot of versus . This is shown in Figure 5.6. 28 Scatter Plot of thetahat vs std(thetahat) 0.085 0.08 0.075 0.07 0.065 0.06 0.055 0.05 0.045 0.04 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 Figure 5.6 A scatter plot of versus for nsim =10,000 simulations. Clearly, Figure 5.6 suggests that and are correlated. We obtained 0.6 and Cov( , ) .0002 ]. Finally, since CI [1 , 1.645 ] , we have: E ( 1.645 )] 0.05 + 1.645(.06) = 0.15 , (i) and Var ( 1.645 ) Var ( ) 1.6452Var ( ) 2(1.645)Cov( , ) (ii) .0632 1.6452 (.0062 ) 2(1.645)(. 00023) .005 or, Std ( 1.645 ) .07 The pdf for assumed CIupper 1.645 ~ N (.06,.07 2 ) is overlaid against the simulation-based pdf in the figure below. 7 6 5 4 3 2 1 0 -0.1 -0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Figure 5.7 Overlaid plots of the pdf of CIupper 1.645 ~ N (.15,.07 2 ) and the corresponding simulation-based pdf, f (x ) , for pY p X YnY 50 X nY 100 . Comment: Now, the truth of the matter in the chosen setting (the value of simulations- you know the truth. !) is that pY pX .18 .13 .05 . 
Comment: Now, the truth of the matter in the chosen setting (you chose the values used in the simulations, so you know the truth!) is that $p_Y - p_X = .18 - .13 = .05$. While this increase due to playing music may seem minor, it is, in fact, an increase of .05/.13, or 38%, in likely customer business! This is significant by any business standard. But is the (upper) random-endpoint interval $CI = [-1,\ \hat\theta + 1.645\,\hat\sigma_{\hat\theta}\,]$, in fact, a 95% CI estimator? Using the normal approximation in Figure 5.7, the probability to the left of the dashed line is $\Pr[\,CI_{upper} < .05\,]$ = normcdf(.05, .15, .07) = 0.077. And so we actually have roughly a 92% CI estimator; that is, somewhat less than the nominal 95% level of confidence.

Summary & Conclusion: This example illustrates how one typically accommodates the unknown standard deviation associated with the estimated difference between two proportions. Because this standard deviation is a function of the unknown proportions, one typically plugs the numerical estimates of those proportions into it to arrive at a numerical value. Hence, the estimator of the standard deviation is itself a random variable. Rather than using a direct mathematical approach to investigate its influence on the CI, it was easier to resort to simulations. In the course of this simulation-based study, it was discovered that the actual pdf of $\hat\theta$ can definitely NOT be assumed to be normal if one is interested in event sizes on the order of 0.01. It was also found that $\hat\theta$ and $\hat\sigma_{\hat\theta}$ are correlated, as opposed to the case of a normal sample, where the sample mean and sample standard deviation are independent. For larger event sizes, it was found that the random variable $CI_{upper} = \hat\theta + 1.645\,\hat\sigma_{\hat\theta}$ is well approximated by a normal pdf. It was also found that the interval behaves not as a 95% CI but as roughly a 92% CI. A one-sided CI was chosen simply to provide contrast to prior examples, which involved two-sided CIs. It would be interesting to revisit this example not only with other types of CIs but also in relation to hypothesis testing.

Finally, to tie this example to hypothesis testing, consider the hypothesis test $H_0: \theta = 0$ versus $H_1: \theta > 0$, and suppose that we choose a test threshold $\theta_{th} = 0.05$. Then if, in fact, $\theta = 0.05$, the type-2 error is 0.5 (see Figure 5.4). [Note: The Matlab code used in relation to this example is included in the Appendix.] □

6.6 Summary

This chapter is concerned with estimating parameters associated with a random variable. Section 6.2 included a number of random variables associated with this estimation problem. Many of these random variables (in this context they are also commonly referred to as statistics) rely on the CLT. Section 6.3 addressed some commonly estimated parameters and the probabilistic structure of the corresponding estimators. The results of Sections 6.2 and 6.3 were brought to bear on the topics of hypothesis testing and confidence interval estimation in Sections 6.4 and 6.5, respectively. In both topics the distribution of the estimator plays the central role. The only real difference between the two topics is that in confidence interval estimation the parameter of interest is unknown, whereas in hypothesis testing it is known under H0.
Appendix: Matlab code used in relation to Example 5.4

%PROGRAM NAME: example54.m
px = 0.13; nx = 100;
py = 0.18; ny = 50;
nsim = 10000;
x = ceil(rand(nx,nsim) - (1-px));
y = ceil(rand(ny,nsim) - (1-py));
mx = mean(x); my = mean(y);
thetahat = my - mx;
%vx = nx^-1 *mx.*(1-mx);
%vy = ny^-1 *my.*(1-my);
vx = var(x)/nx;
vy = var(y)/ny;
vth = vx + vy;
sth = vth.^0.5;
msth = mean(sth)
ssth = std(sth)
vsth = ssth^2;
figure(1)
db = (.09 - .03)/50;
bvec = .03+db/2 : db : .09-db/2;
fsth = (nsim*db)^-1 *hist(sth,bvec);
bar(bvec,fsth)
title('Simulation-Based Estimate of the pdf of std(thetahat)')
grid
pause
figure(2)
db = (.3 + .25)/500;
bvec = -.25+db/2 : db : .3-db/2;
fth = (nsim*db)^-1 *hist(thetahat,bvec);
bar(bvec,fth)
title('Simulation-Based Estimate of the pdf of thetahat')
grid
mthetahat = mean(thetahat)
sthetahat = std(thetahat)
vthetahat = sthetahat^2
varthetahat = px*(1-px)/nx + py*(1-py)/ny
pause
figure(3)
plot(thetahat,sth,'*')
title('Scatter Plot of thetahat vs std(thetahat)')
grid
pause
%Compute mean and std dev of CI
cmat = cov(thetahat,sth);
cov_th_sth = cmat(1,2)
mCI = mthetahat + 1.645*msth
vCI = vthetahat + 1.645^2*vsth + 2*1.645*cov_th_sth;
sCI = vCI^.5
pause
xx = -.1:.0001:.4;
fn = normpdf(xx,mCI,sCI);
CIsim = thetahat + 1.645*sth;
db = (.4 + .1)/50;
bvec = -.1+db/2 : db : .4-db/2;
fCIsim = (db*nsim)^-1 *hist(CIsim,bvec);
figure(4)
bar(bvec,fCIsim)
hold on
plot(xx,fn,'r','LineWidth',2)
grid