CHAPTER 6 Point Estimation
6.1 Introduction
Statistics may be defined as the science of using data to better understand specific traits of chosen
variables. In Chapters 1 through 5 we have encountered many such traits. For example, in relation to a 2-D random variable (X, Y), classic traits include: μ_X, σ_X, f_X(x) (marginal traits for X); μ_Y, σ_Y, f_Y(y) (marginal traits for Y); and σ_XY, ρ_XY, f_XY(x, y) (joint traits for (X, Y)). Notice that all of
these traits are simply parameters. This includes even a trait such as f_XY(x, y): for each ordered pair
(x, y), the quantity f_XY(x, y) is a scalar-valued parameter.
Definition 1.1 Let θ denote a parameter associated with a random variable, X, and let {X_k}_{k=1}^n denote a collection of associated data collection variables. Then the estimator θ̂ = g[{X_k}_{k=1}^n] is said to be a point estimator of θ.
Remark 1.1 The term 'point' refers to the fact that we obtain a single number as an estimate, as opposed to, say, an interval estimate.
Much of the material in this chapter will refer to material in the course textbook, so that the reader may go
to that material for further clarification and examples.
6.2 The Central Limit Theorem (CLT) and Related Topics (mainly Ch.8 of the
textbook)
Even though we have discussed the CLT on numerous occasions, because it plays such a central role in
point estimation we present it again here.
Theorem 2.1 (Theorem 8.3 of the textbook) (Central Limit Theorem) Suppose that X ~ f_X(x) and
that σ_X² < ∞, and suppose that {X_k}_{k=1}^n ~ iid X. Define {Z_n}_{n=1}^∞, where Z_n = (X̄_n − μ_X)/(σ_X/√n),
with X̄_n = (1/n) Σ_{k=1}^n X_k. Then lim_{n→∞} Z_n = Z ~ N(0, 1).
Remark 2.1 In relation to point estimation of μ_X for large n, this theorem states that

(X̄_n − μ_X)/(σ_X/√n) ~ N(0, 1), or, equivalently, X̄_n ~ N(μ_X̄ = μ_X, σ_X̄ = σ_X/√n).
Remark 2.2 The authors state (p.269) that "In practice, this approximation [i.e. that
X̄_n ~ N(μ_X, σ_X²/n)] is used when n ≥ 30, regardless of the actual shape of the population sampled."
Now, while this is, indeed, a ‘rule of thumb’, that does not mean that it is always justified. There are two
key issues that must be reckoned with, as will be illustrated in the following example.
Example 2.1 Suppose that {X_k}_{k=1}^n ~ iid X with X ~ Bernoulli(p). Then μ_X̄ = p and
σ_X̄² = p(1 − p)/n. In fact, X̄ ~ (1/n)·Binomial(n, p). Specifically, S_X̄ = {x_k = k/n}_{k=0}^n.
Case 1: n = 30 and p = 0.5. Clearly, the shape in Figure 2.1 is that of a bell curve.
Figure 2.1 The pdf of X̄ ~ (1/30)·Binomial(n = 30, p = 0.5).

Clearly, X̄ is a discrete random variable; whereas to claim that X̄_n ~ N(0.5, 0.0911²) is to claim that it is
a continuous random variable. To obtain this continuous pdf, it is only necessary to rescale Figure 2.1 to have
total area equal to one (i.e. divide each probability by the bin width 1/30, and use a bar plot instead of a stem plot). The
result is given in Figure 2.2.
[Figure: two panels titled 'Continuous Approximation of X̄'.]
Figure 2.2 Comparison of the staircase pdf and the normal pdf. The lower plot shows the error (green)
associated with using the normal pdf to estimate the probability of the double arrow region.
Case 2: n = 30 and p = 0.1.
[Figure: pmf stem plot (top) and continuous approximation (bottom).]
Figure 2.3 The pdf of X̄ ~ (1/30)·Binomial(n = 30, p = 0.1) (TOP), and the continuous
approximation (BOTTOM).
The problem here is that, under the normal approximation, Pr[0 ≤ X̄ ≤ 1] is significantly
less than one! □
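For the reader who wants to reproduce plots of this kind, here is a minimal Matlab sketch (not the code used for the figures above; n and p match Case 1, and binopdf/normpdf are Statistics Toolbox functions):

% Pmf of Xbar = (1/n)*Binomial(n,p) and its normal approximation.
n = 30; p = 0.5;                     % try p = 0.1 for Case 2
xk = (0:n)/n;                        % sample space of Xbar
pk = binopdf(0:n, n, p);             % Pr[Xbar = k/n]
stem(xk, pk); hold on
x = linspace(0, 1, 200);
plot(x, normpdf(x, p, sqrt(p*(1-p)/n))/n, 'r')  % normal pdf scaled by the bin width 1/n
title('pmf of Xbar vs. scaled normal approximation')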
We now address some sampling distribution results given in Chapter 8 of the text.
Theorem 2.2 (Theorem 8.8 in the textbook) For {Z_k}_{k=1}^n ~ iid N(0, 1), Y = Σ_{k=1}^n Z_k² ~ χ²_n.
Proof: The proof of this theorem relies on two 'facts':
(i) For Z ~ N(0, 1), the random variable W = Z² ~ χ²_1.
(ii) For {W_k}_{k=1}^n ~ iid χ²_1, the random variable Y = Σ_{k=1}^n W_k ~ χ²_n.

Claim 2.1 For Y ~ χ²_n: μ_Y = n and σ_Y² = 2n.
The proof of the claim μ_Y = n follows directly from the linearity of E(·):

E(Y) = E[Σ_{k=1}^n Z_k²] = Σ_{k=1}^n E(Z_k²) = Σ_{k=1}^n 1 = n.

The proof of the claim σ_Y² = 2n requires a little more work.
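For the interested reader, a sketch of that work: using the standard normal moment E(Z⁴) = 3,

Var(Z_k²) = E(Z_k⁴) − [E(Z_k²)]² = 3 − 1 = 2,

and so, since the Z_k² are independent,

σ_Y² = Var(Σ_{k=1}^n Z_k²) = Σ_{k=1}^n Var(Z_k²) = 2n.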
Example 2.2 Suppose that {X_k}_{k=1}^n ~ iid N(μ, σ²), where μ is known. Obtain the pdf for the point estimator of σ² given by

σ̂² = S_n² = (1/n) Σ_{k=1}^n (X_k − μ)².
Solution: Write nσ̂²/σ² = nS_n²/σ² = Σ_{k=1}^n ((X_k − μ)/σ)² = Σ_{k=1}^n Z_k² ~ χ²_n.

Even though nσ̂²/σ² ~ χ²_n, it is sometimes easier (and more insightful) to say that σ̂² ~ (σ²/n)·χ²_n.
In either case, we have

(i) E(σ̂²) = (σ²/n) E(χ²_n) = (σ²/n)·n = σ², and

(ii) Var(σ̂²) = (σ²/n)² Var(χ²_n) = (σ²/n)²·2n = 2σ⁴/n.

Finally, if n is sufficiently large that the CLT holds, then σ̂² ~ N(μ_{σ̂²} = σ², σ²_{σ̂²} = 2σ⁴/n). □
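As a quick sanity check of this result, the following minimal Matlab sketch (the values μ = 0, σ = 2, n = 10 are illustrative choices, not from the text) overlays the simulation-based pdf of σ̂² on the pdf of (σ²/n)·χ²_n:

% PROGRAM name: varhat_check.m
% Simulation check that sigmahat^2 ~ (sigma^2/n)*chi-square(n) when mu is known.
nsim = 20000; n = 10; mu = 0; sigma = 2;   % illustrative values
x = mu + sigma*randn(n,nsim);
vhat = mean((x - mu).^2);                  % sigmahat^2 = (1/n)*sum (Xk - mu)^2
db = (max(vhat)-min(vhat))/50;
bctr = min(vhat)+db/2 : db : max(vhat)-db/2;
fv = (nsim*db)^-1 * hist(vhat,bctr);       % scaled histogram = pdf estimate
bar(bctr,fv); hold on
c = sigma^2/n;                             % if V = c*Q, then f_V(v) = f_Q(v/c)/c
v = linspace(min(vhat),max(vhat),200);
plot(v, chi2pdf(v/c,n)/c, 'r', 'LineWidth',2)
title('Simulation-based pdf of sigmahat^2 vs. (sigma^2/n)*chi-square(n)')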
Proof: nS_n²/σ² = (1/σ²) Σ_{k=1}^n (X_k − μ)² = Σ_{k=1}^n Z_k², where {Z_k}_{k=1}^n ~ iid N(0, 1). □
Theorem 2.3 (Theorem 8.11 of the textbook) Given {X_k}_{k=1}^n ~ iid N(μ_X, σ_X²), suppose that μ_X is
not known. Let μ̂_X = X̄ and σ̂_X² = S²_{n−1} = (1/(n−1)) Σ_{k=1}^n (X_k − μ̂_X)². Then (i) X̄ and σ̂_X² are independent,
and (ii) (n−1)σ̂_X²/σ_X² ~ χ²_{n−1}.
Proof: [This is one reason that a proof can be insightful, and not simply a mathematical torture.]
Σ_{k=1}^n (X_k − X̄)² = Σ_{k=1}^n [(X_k − μ_X) − (X̄ − μ_X)]² = Σ_{k=1}^n (X_k − μ_X)² − 2(X̄ − μ_X) Σ_{k=1}^n (X_k − μ_X) + n(X̄ − μ_X)².

But Σ_{k=1}^n (X_k − μ_X) = Σ_{k=1}^n X_k − nμ_X = nX̄ − nμ_X = n(X̄ − μ_X). And so, we have:

Σ_{k=1}^n (X_k − X̄)² = Σ_{k=1}^n (X_k − μ_X)² − n(X̄ − μ_X)². Dividing both sides by σ_X² gives:

(1/σ_X²) Σ_{k=1}^n (X_k − X̄)² = Σ_{k=1}^n ((X_k − μ_X)/σ_X)² − ((X̄ − μ_X)/(σ_X/√n))² = χ²_n − χ²_1 = χ²_{n−1},

where the rightmost equality follows from the independence result (not proved).

Hence, we have shown that Σ_{k=1}^n (X_k − X̄)²/σ_X² ~ χ²_{n−1}. From this result, we have

S²_{n−1} = (1/(n−1)) Σ_{k=1}^n (X_k − X̄)² ~ (σ_X²/(n−1))·χ²_{n−1}

and

S²_n = (1/n) Σ_{k=1}^n (X_k − X̄)² ~ (σ_X²/n)·χ²_{n−1}.
Remark 2.3 A comparison of Theorems 2.2 and 2.3 reveals the effect of replacing the mean μ_X by its
estimator μ̂_X = X̄; namely, that one loses one degree of freedom in relation to the (scaled) chi-squared
distribution for the estimator of the variance σ_X². It is not uncommon that persons not well-versed in
probability and statistics use X̄ in estimating the variance, even when the true mean is known! The price
paid for this ignorance is that the variance estimator will not be as accurate.
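To see the price in numbers, here is a minimal Matlab sketch (μ = 5, σ² = 4, n = 10 are illustrative choices) comparing, by simulation, the variance of the known-mean estimator (1/n)Σ(X_k − μ)² with that of S²_{n−1}:

% PROGRAM name: varcompare.m
% Compare variance estimators with known vs. estimated mean.
nsim = 20000; n = 10; mu = 5; sigma2 = 4;  % illustrative values
x = mu + sqrt(sigma2)*randn(n,nsim);
v_known = mean((x - mu).^2);               % (1/n)*sum (Xk - mu)^2
v_est   = var(x);                          % S^2_{n-1}, uses the sample mean
[var(v_known)  2*sigma2^2/n]               % simulated vs. theory 2*sigma^4/n
[var(v_est)    2*sigma2^2/(n-1)]           % simulated vs. theory 2*sigma^4/(n-1)

Under the stated assumptions, the known-mean estimator has the smaller variance (2σ⁴/n versus 2σ⁴/(n−1)), consistent with the remark above.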
Theorem 2.4 (Theorem 8.12 in the textbook) Suppose that Z ~ N(0, 1) and Y ~ χ²_v are independent.
Define T = Z/√(Y/v). Then the probability density function for T is:

f_T(x) = [Γ((v+1)/2) / (Γ(v/2)√(πv))] (1 + x²/v)^(−(v+1)/2) ; x ∈ (−∞, ∞).

This pdf is called the (Student) t distribution with v degrees of freedom, and is denoted as t_v.
From Theorems 2.3 and 2.4 we immediately have

Theorem 2.5 (Theorem 8.13 in the textbook) If μ̂_X = X̄ and σ̂_X² = S²_{n−1} are estimators of μ_X and σ_X²,
respectively, based on {X_k}_{k=1}^n ~ iid X ~ N(μ_X, σ_X²), then

T = (X̄ − μ_X)/√(S²_{n−1}/n) ~ t_{n−1}.
Exercise: Explain how this follows from Theorems 2.3 and 2.4, using the independence in Theorem 2.3(i).
Theorem 2.6 (Theorem 8.14 in the textbook) Suppose that U ~ χ²_{v1} and V ~ χ²_{v2} are independent. Then

F = (U/v1)/(V/v2)

has an F distribution, and is denoted as F_{v1,v2}.
As an immediate consequence, we have
Theorem 2.7 (Theorem 8.15 in the textbook) Suppose that {S²_{n_j−1}(j)}_{j=1}² are sample variances of
{X(j) ~ N(μ_j, σ_j²)}_{j=1}², obtained using {X_k(1)}_{k=1}^{n1} and {X_k(2)}_{k=1}^{n2}, respectively; that the elements
in each sample are independent; and that the samples are also independent. Then

F = [S²_{n1−1}(1)/σ1²] / [S²_{n2−1}(2)/σ2²]

has an F distribution with v1 = n1 − 1 and v2 = n2 − 1 degrees of freedom.
Remark 2.4 It should be clear that this statistic plays a major role in determining whether two variances
are equal or not.
Q: Suppose that n1 = n2 = n is large enough to invoke the CLT. How could you use this to arrive at
another test statistic for deciding whether two variances are equal? A: Use their DIFFERENCE.
6.3 A Summary of Useful Point Estimation Results
The majority of the results of the last section pertained to the parameters  and  2 . For this reason, the
following summary will address each one of these parameters, individually, as the parameter of primary
concern.
Results when µ is of Primary Concern- Suppose that X ~ f_X(x; μ_X, σ_X²), and that {X_k}_{k=1}^n are iid
data collection variables that are to be used to investigate the unknown parameter μ_X. For such an
investigation, we will use μ̂_X = X̄.
Case 1: X ~ N(x; μ_X, σ_X²)

(Remark 2.1): Z = (μ̂_X − μ_X)/(σ_X/√n) ~ N(0, 1) for any n, when σ_X² is known.

(Theorem 2.5): T = (X̄ − μ_X)/√(σ̂_X²/n) ~ t_{n−1} for any n, when σ_X² is estimated by σ̂_X² = (1/(n−1)) Σ_{k=1}^n (X_k − μ̂_X)².
Case 2: X ~ f_X(x; μ_X, σ_X²)

For n sufficiently large (i.e. the CLT is a good approximation):

(Remark 2.1): Z = (μ̂_X − μ_X)/(σ_X/√n) ~ N(0, 1) when σ_X² is known.

(Theorem 2.5): Z = (X̄ − μ_X)/√(σ̂_X²/n) ~ N(0, 1) when σ_X² is estimated by σ̂_X² = (1/(n−1)) Σ_{k=1}^n (X_k − μ̂_X)².
Results when σ² is of Primary Concern- Suppose that X ~ f_X(x; μ_X, σ_X²), and that {X_k}_{k=1}^n are iid
data collection variables that are to be used to investigate the unknown parameter σ_X². For such an
investigation, we will use μ̂_X = X̄, when appropriate.
Case 1: X ~ N(x; μ_X, σ_X²)

(Example 2.2): nσ̂_X²/σ_X² ~ χ²_n, when μ_X is known and σ̂_X² = (1/n) Σ_{k=1}^n (X_k − μ_X)².

(Theorem 2.3): (n−1)σ̂_X²/σ_X² ~ χ²_{n−1} for any n, when σ_X² is estimated by σ̂_X² = (1/(n−1)) Σ_{k=1}^n (X_k − μ̂_X)².

(Theorem 2.7): F = (σ̂²_{X1}/σ²_{X1}) / (σ̂²_{X2}/σ²_{X2}) ~ F_{n1,n2} when the means are known and σ̂²_{Xj} = (1/n_j) Σ_{k=1}^{n_j} (X_k^{(j)} − μ_{Xj})²; j = 1, 2.

(Theorem 2.7): F = (σ̂²_{X1}/σ²_{X1}) / (σ̂²_{X2}/σ²_{X2}) ~ F_{n1−1,n2−1} when σ̂²_{Xj} = (1/(n_j−1)) Σ_{k=1}^{n_j} (X_k^{(j)} − μ̂_{Xj})²; j = 1, 2.
Case 2: X ~ f_X(x; μ_X, σ_X²)

For n sufficiently large (i.e. the CLT is a good approximation):

(Example 2.2): Z = (σ̂_X² − σ²)/(σ²√(2/n)) ~ N(0, 1).
Application of the above results to (X, Y)-
For a 2-D random variable (X, Y), suppose that X and Y are independent. Then we can define the random
variable W = X − Y. Regardless of the independence assumption, we have μ_W = μ_X − μ_Y. In view of
that assumption, we have σ_W² = σ_X² + σ_Y². Hence, we are now in the setting of a 1-D random variable, W,
and so many of the above results apply. Even if we drop the independence assumption, we can still apply
them. The only difference is that in this situation σ_W² = σ_X² + σ_Y² − 2σ_XY. Hence, we need to realize that a
larger value of n may be needed to achieve an acceptable level of uncertainty in relation to, for example,
μ̂_W = W̄. Specifically, if we assume the data collection variables {W_k}_{k=1}^n are iid, then

σ²_{μ̂_W} = σ_W²/n = (σ_X² + σ_Y² − 2σ_XY)/n.
The perhaps more interesting situation is where we are concerned with either σ_XY or ρ_XY = σ_XY/(σ_X σ_Y).
Of these two parameters, σ_XY is the more tractable one to address using the above results. To see this,
consider the estimator:

σ̂_XY = (1/n) Σ_{k=1}^n (X_k − μ_X)(Y_k − μ_Y).
Define the random variable W = (X − μ_X)(Y − μ_Y). Then we have

σ̂_XY = (1/n) Σ_{k=1}^n (X_k − μ_X)(Y_k − μ_Y) = (1/n) Σ_{k=1}^n W_k = W̄ = μ̂_W.

And so, we are now in the setting where we are concerned with μ_W = σ_XY. In relation to μ̂_W = σ̂_XY, we have:

E(μ̂_W) = E(σ̂_XY) = E[(1/n) Σ_{k=1}^n (X_k − μ_X)(Y_k − μ_Y)] = (1/n) Σ_{k=1}^n E[(X_k − μ_X)(Y_k − μ_Y)] = σ_XY = μ_W.

We also have Var(μ̂_W) = Var(W)/n. The trick now is to obtain an expression for Var(W). To this end,
we begin with:

Var(W) = E[(W − μ_W)²] = E(W²) − μ_W² = E[{(X − μ_X)(Y − μ_Y)}²] − σ²_XY.
This can be written as (with Z_1 = (X − μ_X)/σ_X and Z_2 = (Y − μ_Y)/σ_Y):

Var(W) = σ_X²σ_Y² E[((X − μ_X)/σ_X)² ((Y − μ_Y)/σ_Y)²] − σ²_XY = σ_X²σ_Y² {E[(Z_1Z_2)²] − μ²_{Z_1Z_2}} = σ_X²σ_Y² Var(Z_1Z_2).

And so, we are led to ask the

Question: Suppose that Z_1 ~ N(0, 1) and Z_2 ~ N(0, 1) have E(Z_1Z_2) = ρ. Then what is the variance of W = Z_1Z_2?

Answer: As noted above, the mean of W is E(Z_1Z_2) = ρ. Before getting too involved in the
mathematics, let's run some simulations. The following plot is for ρ = 0.5:
Figure 3.1 Plot of the simulation-based estimate of f_W(w) for ρ = 0.5. The simulations also resulted in
μ̂_W = 0.4767 and σ̂_W² = 1.1716.
The Matlab code is given below.
% PROGRAM name: z1z2.m
% This code uses simulations to investigate the pdf of W = Z1*Z2,
% where Z1 & Z2 are unit normal r.v.s with Cov(Z1,Z2) = r.
nsim = 10000;
r = 0.5;
mu = [0 0]; Sigma = [1 r; r 1];
Z = mvnrnd(mu, Sigma, nsim);     % nsim draws of (Z1,Z2)
plot(Z(:,1),Z(:,2),'.');
pause
W = Z(:,1).*Z(:,2);
Wmax = max(W); Wmin = min(W);
db = (Wmax - Wmin)/50;           % bin width for a 50-bin histogram
bctr = Wmin+db/2 : db : Wmax-db/2;
fw = hist(W,bctr);
fw = (nsim*db)^-1 * fw;          % scale histogram to unit area (pdf estimate)
bar(bctr,fw)
title(['Simulation-based pdf for W with r = ' num2str(r)])
Conclusion: The simulations revealed a number of interesting things. For one, the mean was
μ̂_W = 0.4767. One would have thought that for 10,000 simulations it would have been closer to the true
mean 0.5. Also, and not unrelated to this, is the fact that f_W(w) has very long tails. Based on this simple
simulation-based analysis, one might be thankful that the mathematical pursuit was not readily
undertaken, as it would appear that it will be a formidable undertaking!
QUESTION: How can knowledge of f_W(w) be used in relation to estimating ρ_XY from {(X_k, Y_k)}_{k=1}^n ?
Computation of f_W(w):
Method 1: Use THEOREM 7.2 on p.248 of the book. [A nice example of its application.]
Let f(x_1, x_2) be the joint pdf of the 2-D continuous random variable (X_1, X_2). Let Y_1 = X_1 and
Y_2 = X_1X_2. Then x_1 = y_1 and x_2 = y_2/y_1. And so, the Jacobian is:

J = det [ ∂x_1/∂y_1 , ∂x_1/∂y_2 ; ∂x_2/∂y_1 , ∂x_2/∂y_2 ] = det [ 1 , 0 ; −y_2/y_1² , 1/y_1 ] = 1/y_1.

Hence, the pdf for (Y_1, Y_2) is:

g(y_1, y_2) = f(x_1, x_2)|J| = f(y_1, y_2/y_1)·(1/|y_1|).
We now apply this to:

f(x_1, x_2) = [1/(2π√(1 − ρ²))] exp{−(x_1² + x_2² − 2ρx_1x_2)/(2(1 − ρ²))}   [see DEFINITION 6.8 on p.220.]

Hence,

g(y_1, y_2) = f(y_1, y_2/y_1)·(1/|y_1|) = [1/(|y_1|·2π√(1 − ρ²))] exp{−(y_1² + (y_2/y_1)² − 2ρy_2)/(2(1 − ρ²))}.
Hence,

f_{Y_2}(y_2) = ∫_{−∞}^{∞} [1/(|y_1|·2π√(1 − ρ²))] exp{−(y_1² + (y_2/y_1)² − 2ρy_2)/(2(1 − ρ²))} dy_1.   (***)
I will proceed with a possibly useful integral, but this will not, in itself, give a closed form for (***).
[Table of Integrals, Series, and Products by Gradshteyn & Ryzhik, p.307, #3.325]:

∫_0^∞ exp{−(ax² + b/x²)} dx = (1/2)√(π/a) e^{−2√(ab)}.
Method 2 (use characteristic functions):
Assume that X and Y are two independent standard normal random variables, and let us compute the
characteristic function of W = XY.

One knows that E(e^{iωX}) = e^{−ω²/2}. Hence, E(e^{iωXY} | Y) = e^{−ω²Y²/2}. And so:

φ_W(ω) = E(e^{iωXY}) = E[E(e^{iωXY} | Y)] = E[e^{−ω²Y²/2}]
= (1/√(2π)) ∫_{−∞}^{∞} e^{−ω²y²/2} e^{−y²/2} dy
= (1/√(2π)) ∫_{−∞}^{∞} e^{−(1+ω²)y²/2} dy
= 1/√(1 + ω²).
Hence,

f_W(w) = (1/2π) ∫_{−∞}^{∞} φ_W(ω) e^{−iωw} dω = (1/2π) ∫_{−∞}^{∞} e^{−iωw}/√(1 + ω²) dω = (1/π) ∫_0^∞ cos(ωw)/√(1 + ω²) dω = (1/π) K_0(|w|),

where (G&R p.419, entry 3.754.2)

(1/π) ∫_0^∞ cos(ωw)/√(1 + ω²) dω = (1/π) K_0(w) = (1/π)(iπ/2) H_0^{(1)}(iw) = (i/2) H_0^{(1)}(iw)   [G&R p.952, 8.407.1],

and H_0^{(1)}(iw) is a Bessel function of the third kind, also called a Hankel function. Yuck!!!
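Before moving on, here is a minimal Matlab check (independent case, i.e. ρ = 0) that the density (1/π)K_0(|w|) matches a simulation; besselk(0,·) is Matlab's modified Bessel function K_0:

% PROGRAM name: besselcheck.m
% Check that f_W(w) = K0(|w|)/pi for W = Z1*Z2 with Z1, Z2 iid N(0,1).
nsim = 100000;
W = randn(nsim,1).*randn(nsim,1);
db = 0.1; bctr = -4+db/2 : db : 4-db/2;   % tail mass piles into the end bins; fine for a visual check
fw = (nsim*db)^-1 * hist(W,bctr);         % scaled histogram = pdf estimate
bar(bctr,fw); hold on
w = linspace(-4,4,400);
w(abs(w) < db/10) = db/10;                % sidestep the log singularity of K0 at w = 0
plot(w, besselk(0,abs(w))/pi, 'r', 'LineWidth',2)
title('Simulated pdf of W = Z1*Z2 (rho = 0) vs. K_0(|w|)/pi')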
For my own possible benefit: The above yields the distribution, at a fixed time, of the product of two
independent Brownian motions, say the processes (X_t)_t and (Y_t)_t. It gives the distribution of
Z_1 = X_1Y_1 and, by homogeneity, of Z_t = X_tY_t, which is distributed like tZ_1 for every t ≥ 0. However, this does
not determine the distribution of the process (Z_t)_t. For example, to compute the distribution of (Z_t, Z_s) for
0 ≤ t ≤ s, one could write the increment Z_s − Z_t as Z_s − Z_t = X_t(Y_s − Y_t) + Y_t(X_s − X_t) + (X_s − X_t)(Y_s − Y_t), but the terms
X_t(Y_s − Y_t) and Y_t(X_s − X_t) show that Z_s − Z_t is not independent of Z_t, and that (Z_t)_t is probably not Markov.
[NOTE: Take X and Y two Gaussian random variables with mean 0 and variance 1. Since they have the
same variance, X − Y and X + Y are independent.] □
6.4 Simple Hypothesis Testing
Example 4.1 (Book 8.78 on p.293): A random sample of size 25 from a normal population had mean
value 47 and standard deviation 7. If we base our decision on Theorem 2.5 (t-distribution), can we say
that this information supports the claim that the population mean is 42?

Solution: While not formally stated as such, this is a problem in hypothesis testing. Specifically,

H_0: μ_X = 42 versus H_1: μ_X ≠ 42.

The natural rationale for deciding which hypothesis to announce is simple:

If μ̂_X = X̄ is 'sufficiently close' to 42, then we will announce H_0.

Regardless of whether or not H_0 is true, we know that T = (X̄ − μ_X)/√(S²_24/25) ~ t_24.

Assuming that, indeed, H_0 is true (i.e. μ_X = 42), then T = (X̄ − 42)/√(S²_24/25) ~ t_24. A plot of f_T(t) is shown
below.
Figure 4.1 A plot of f_T(t) for T ~ t_24.
Based on our above rationale for announcing H_0 versus H_1, if a measurement of this T is sufficiently
large, we should announce H_1, even if H_0 is, in fact, true. Suppose that we were to use a threshold value
t_th. Then our decision rule is:

[|T| ≤ t_th] = The event that we will announce H_0. Or, equivalently:

[|T| > t_th] = The event that we will announce H_1.

If, in fact, H_0 is true, then our false alarm probability is:

Pr[|T| > t_th] = α, where T = (X̄ − 42)/√(S²_24/25) ~ t_24.
There are a number of ways to proceed from here.
Case 1: Specify the false alarm probability, α, that we are willing to risk. Suppose, for example, that we
choose α = .05 (i.e. we are willing to risk wrongly announcing H_1 with a 5% probability). Then

Pr[|T| > t_th] = α, where T = (X̄ − 42)/√(S²_24/25) ~ t_24, results in a threshold t_th = tinv(.975,24) = 2.064.
From the above data, we have t = (x̄ − μ_X)/√(s²/n) = (47 − 42)/√(7²/25) = 3.57. Since this value is greater than 2.064,
we should announce H_1.
Case 2: Use the data to first compute t = (x̄ − μ_X)/√(s²/n) = (47 − 42)/√(7²/25) = 3.57, and then use this as our threshold,
t_th. Then because our value was t = 3.57, this threshold has been met, and we should announce H_1. In
this case, the probability that we are wrong becomes:

Pr[|T| ≥ 3.57] = 2 Pr[T ≤ −3.57] = 2*tcdf(-3.57,24) ≈ 0.0016.
QUESTION: Are we willing to risk announcing H_1 with the probability of being wrong equal to .0016?
ANSWER: If we were willing to risk a 5% false alarm probability, then surely we would risk an even
lower probability of being wrong in announcing H_1. Hence, we should announce H_1.
Comment: In both cases we ended up announcing H_1. So, what does it matter which case we abide by?
The answer, with a little thought, should be obvious. In case 1 we are announcing H_1 with a 5%
probability of being wrong; whereas in case 2 we are announcing H_1 with a 0.16% probability of being
wrong. In other words, we were willing to take much less risk, and even then we announced H_1. □
The false alarm probability obtained in case 2 of the above example has a name. It is called the p-value of
the test. It is the smallest false alarm probability that we can achieve in using the given data to announce
H1 .
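These computations are easily scripted; a minimal Matlab sketch, using the numbers of this example:

% Example 4.1 computations (n, xbar, s, mu0 taken from the example).
n = 25; xbar = 47; s = 7; mu0 = 42;
t = (xbar - mu0)/sqrt(s^2/n)          % measured test statistic (= 3.57)
tth = tinv(0.975, n-1)                % 5% two-sided threshold (= 2.064)
pval = 2*tcdf(-abs(t), n-1)           % p-value of the test (~ 0.0016)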
As far as the problem goes, as stated in the textbook, we are finished. However, we have ignored the
second type of error that we could make; namely, announcing H_0 true when, in fact, H_1 is true. And
there is a reason that this error (called the type-2 error) is often ignored. It is because in the situation
where H_0 is true, we have a numerical value for μ_X (in this case, μ_X = 42). However, if H_1 is true,
then all we know is that μ_X ≠ 42. And so, to compute a type-2 error probability, we need to specify a
value for μ_X.
Suppose, for example, that we assume μ_X = 45. Suppose further that we maintain the threshold
t_th = 3.57. Now we have T_45 = (X̄ − 45)/√(S²_24/25) ~ t_24, while our chosen test statistic remains T_42 = (X̄ − 42)/√(S²_24/25).
And so, the question becomes: How does the random variable T_42 relate to the random variable T_45? To
answer this question, write:

T_42 = (X̄ − 42)/√(S²_24/25) = (X̄ − 45 + 3)/√(S²_24/25) = T_45 + 3/√(S²_24/25) ≜ T_45 + W   (4.1)
where W = 3/√(S²_24/25) is a random variable; and not a pleasant one, at that! To see why this point is
important, let's presume for the moment that we actually know σ_X². In this case, we would have
Z = (X̄ − 45)/√(σ_X²/n) ~ N(0, 1). Hence, T_42 = (X̄ − 42)/√(σ_X²/n) = Z + 3/√(σ_X²/n) ≜ Z_45 + c. In words, T_42 is a normal random
variable with mean 3/√(σ_X²/n) = c and with variance equal to one. Using the above data,
√(σ_X²/n) = 7/√25 = 1.4, and so c = 3/1.4 = 2.143. This pdf is shown below.
Figure 4.2 The pdf for T_42 ~ N(2.143, 1).
Now let’s return to the above two cases.
Case 1 (continued): Our decision rule for announcing H_1 with a 5% false alarm probability relied on the
event [|T_42| > 2.064]. The complementary (announce H_0) event [|T_42| ≤ 2.064] is shown as the green double arrow in Figure 4.2. The probability of
this event is computed as

Pr[|T_42| ≤ 2.064] = normcdf(2.064,2.143,1) – normcdf(-2.064,2.143,1) = 0.47.

Hence, if, indeed, the true mean is μ_X = 45, then under the assumption that the CLT is valid, we have a
47% chance of announcing that μ_X = 42 using this decision rule.
Case 2 (continued): Our decision rule for announcing H_1 with false alarm probability 0.16% relied on
the event [|T_42| > 3.57]. The complementary event [|T_42| ≤ 3.57] is shown as the red double arrow in Figure 4.2. The probability of
this event is computed as

Pr[|T_42| ≤ 3.57] = normcdf(3.57,2.143,1) – normcdf(-3.57,2.143,1) = 0.92.

Hence, if, indeed, the true mean is μ_X = 45, then under the assumption that the CLT is valid, we have a
92% chance of announcing that μ_X = 42 using this decision rule.
In conclusion, we see that the smaller that we make the false alarm probability, the greater the type-2 error
will be. Taken to the extreme, suppose that no matter what the data says, we will simply not announce
H1 . Then, of course our false alarm probability is zero, since we will never sound that alarm. However,
by never sounding the alarm, we are guaranteed that with probability one, we will announce H 0 when
H1 is true.
A reasonable question to ask is: How can one select acceptable false alarm and type-2 error probabilities?
One answer is given by the Neyman-Pearson lemma. We will not go into this lemma in these notes. The
interested reader might refer to:
http://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma
Finally, we are in a position to return to the problem entailed in (4.1), where we see that we are not
subtracting a number (which would merely shift the mean), but we are subtracting a random variable.
Hence, one cannot simply say that (4.1) is a mean-shifted t_24 random variable. We will leave it to the
interested reader to pursue this further. One simple approach to begin with would use simulations to
arrive at the pdf for (4.1), as sketched below. □
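Along the lines just suggested, a minimal Matlab sketch of such a simulation (n = 25 and σ_X = 7 are taken from the example; the normal-approximation values above were ≈ 0.47 and ≈ 0.92):

% Simulate the statistic (4.1) under the assumed true mean mu = 45.
nsim = 20000; n = 25; mu = 45; sigma = 7;
x = mu + sigma*randn(n,nsim);
S2 = var(x);                          % sample variance S^2_{n-1}, 1 x nsim
T42 = (mean(x) - 42)./sqrt(S2/n);     % the statistic of (4.1)
beta1 = mean(abs(T42) <= 2.064)       % type-2 error estimate, Case 1 threshold
beta2 = mean(abs(T42) <= 3.57)        % type-2 error estimate, Case 2 threshold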
Example 4.2 (Problem 8.79 on p.293 of the textbook) A random sample of size n = 12 from a normal
population has (sample) mean x̄ = 27.8 and (sample) variance s² = 3.24. If we base our decision on the
statistic of Theorem 8.13 (i.e. Theorem 2.5 of these notes), can we say that the given information supports
the claim that the mean of the population is μ_X = 28.5?

Solution: The statistic to be used is T = (X̄ − μ_X)/√(S²_11/12) ~ t_11. And so, the corresponding measurement of T
obtained from the data is

t = (27.8 − 28.5)/√(3.24/12) = −1.347.

Since the sample mean x̄ = 27.8 is less than the speculated true mean μ_X = 28.5 under H_0, a reasonable
hypothesis setting would ask the question: Is 27.8 really that much smaller than 28.5? This leads to the
decision rule: [T ≤ −1.347] ⇒ Announce H_1.

In the event that, indeed, T = (X̄ − 28.5)/√(S²_11/12) ~ t_11, Pr[T ≤ −1.347] = 0.1025. In words, there is a 10%
chance that a measurement of T would be contained in the interval (−∞, −1.347] if, indeed, μ_X = 28.5.
And so now, one needs to decide whether or not one should announce H_1. On the one hand, one could
announce H_1 with a reported p-value of 0.10 and let management make the final announcement. On the
other hand, if management has, for example, already 'laid down the rule' that it will not accept a false
alarm probability greater than, say, 0.05, then one should announce H_0. □
We now proceed to some examples concerning tests for an unknown variance.
Example 4.3 (EXAMPLE 13.6 on p.410 of the textbook) Let X denote the thickness of a part used in a
semiconductor. Using the iid data collection variables {X_k}_{k=1}^{18}, it was found that s² = 0.68. The
manufacturing process is considered to be in control if σ_X² ≤ 0.36. Assuming X is normally distributed,
test the null hypothesis H_0: σ_X² = 0.36 against the alternative hypothesis H_1: σ_X² > 0.36 at a 5%
significance level (i.e. a 5% false alarm probability).

Solution: From Theorem 2.3 we have Q = (17)σ̂_X²/σ_X² ~ χ²_17. From the data, assuming H_0 is true, we have

q = (18 − 1)(0.68)/0.36 = 32.11.

The threshold value, q_th, is obtained from Pr[Q > q_th] = .05.
Hence, q_th = chi2inv(.95,17) = 27.587. Since q > q_th, we will announce H_1.
Now let's obtain the p-value for this test. If we use q_th = 32.11, then Pr[Q > q_th] = .0146 is the p-value.
Next, suppose that instead of using the sample mean (which is, by the way, not reported) one had used the
design mean thickness. Then we have Q = (18)σ̂_X²/σ_X² ~ χ²_18. In this case, Pr[Q > 32.11] = .0213 is the p-value.
And so, at a 5% significance level, we would still announce H_1. □
Example 4.4 Suppose that X = The act of recording the number of false statements of a certain news
network in any given day. We will assume that X has a Poisson distribution, with mean μ_X = 2. Let
{X_k}_{k=1}^5 ~ iid X be the numbers of false statements throughout any 5-day work week. In order to
determine whether or not the network has decided to increase the number of false statements, we will test

H_0: μ_X = 2 versus H_1: μ_X > 2,

where μ̂_X = X̄ = (1/5) Σ_{k=1}^5 X_k is the average number of false statements per day
in any given week. We desire a false alarm probability of ~10%.
%PROGRAM NAME: avgdefects.m
% Simulation-based pmf of muhat = mean of n = 5 iid Poisson(2) r.v.s.
nsim = 40000;
n = 5;
mu = 2;
x = poissrnd(mu,n,nsim);
muhat = mean(x);
db = 1/n;                        % muhat lives on a grid of width 1/n
bctr = 0:db:5;
fmuhat = nsim^-1 * hist(muhat,bctr);   % pmf estimate
figure(1)
stem(bctr,fmuhat)
hold on
pause
for m = 1:10                     % repeat to see simulation variability
   x = poissrnd(mu,n,nsim);
   muhat = mean(x);
   fmuhat = nsim^-1 * hist(muhat,bctr);
   stem(bctr,fmuhat)
end
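From the simulated pmf one can then pick the decision threshold; the lines below are an illustrative continuation of the code above (not part of the original program):

% Choose (approximately) the smallest threshold c with Pr[muhat >= c] <= 0.10.
tail = 1 - cumsum(fmuhat);            % tail(i) is approx. Pr[muhat > bctr(i)]
idx = find(tail <= 0.10, 1);          % first grid point meeting the target
c = bctr(idx) + db                    % decision rule: announce H1 when muhat >= c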
6.5 Confidence Intervals
The concept of a confidence interval for an unknown parameter, θ, is intimately connected to the concept
of simple hypothesis testing. To see this, consider the following example.
Example 5.1 Let X = The act of recording the body temperature of a person who is in deep meditation.
We will assume that X ~ N(μ_X, σ_X²). The data collection random variables {X_k}_{k=1}^n are to be used to
investigate the unknown parameter μ_X. We will assume that σ_X = 1.2°F is known and that a sample
size of n = 25 will be used. The estimator μ̂_X = X̄ will be used in the investigation.
Hypothesis Testing:
In the hypothesis testing portion of the investigation we begin by assuming that, under H_0, the parameter
μ_X is a known quantity. Let's assume that it is 98.6°F. In other words, we are assuming that deep
meditation has no influence on μ_X. Since we are investigating whether or not meditation has an effect on
μ_X in any way, our test, formally, is:

H_0: μ_X = 98.6°F versus H_1: μ_X ≠ 98.6°F.

Our decision rule for this test is, in words: If μ̂_X = X̄ is sufficiently close to μ_X^(0) = 98.6°F, then we will
announce H_0. The event, call it A, that is described by these words then has the form:

A = [|μ̂_X − μ_X^(0)| ≤ τ_th].

Here, the number τ_th is the threshold temperature difference. Suppose that we require that, in the event
that H_0 is true, we have

Pr(A) = Pr[|μ̂_X − μ_X^(0)| ≤ τ_th] = .90.   (5.1)

In other words, we require a false alarm probability (or significance level) of size 0.10.

In the last section, we used the standard test statistic: Z = (μ̂_X − μ_X)/σ_{μ̂_X}. Then, assuming that H_0 is, indeed,
true, it follows that

Z = (μ̂_X − μ_X^(0))/σ_{μ̂_X} ~ N(0, 1).

To convert (5.1) into an event involving this Z, we use the
concept of equivalent events:

Pr(A) = Pr[|μ̂_X − μ_X^(0)| ≤ τ_th] = Pr[|μ̂_X − μ_X^(0)|/σ_{μ̂_X} ≤ τ_th/σ_{μ̂_X}] = Pr[|Z| ≤ z_th] = 0.90.   (5.2)

From (5.2) we obtain
z_th = norminv(0.95, 0, 1) = 1.645.   (5.3)

Hence, if the magnitude of Z = (μ̂_X − μ_X^(0))/σ_{μ̂_X} exceeds 1.645, then we will announce H_1.
Now, instead of going through the procedure of using the standard test statistic, Z, let's use (5.1) more
directly. Specifically,

Pr[μ̂_X − μ_X^(0) > τ_th] = Pr[μ̂_X > μ_X^(0) + τ_th] = Pr[μ̂_X > x_th] = 0.05.   (5.4)

Because σ_{μ̂_X} = σ_X/√n = 1.2/√25 = 0.24, from (5.4) we obtain

x_th = norminv(0.95, 98.6, 0.24) = 98.995°F.   (5.5)

Hence, there was no need to use the standard statistic! Furthermore, the threshold (5.5) is in units of °F,
making it much more attractive than the dimensionless units of (5.3).
QUESTION: Given that using (5.1) to arrive at (5.5) is much more direct than using (5.2) and (5.3), then
why in the world did we not use the more direct method in the last section?

ANSWER: The method used in the last section has been used for well over 100 years. It
was the only method that one could use when only a unit-normal z-table was available. Only in relatively
recent years has statistical software become available that makes such tables unnecessary. However, there are
random variables other than the normal type for which a standardized table must still be used. One example is
T ~ t_n: a linear function of T is NOT a t random variable, whereas a linear function of a normal random
variable IS a normal random variable.
Confidence Interval Estimation:
We begin here with the same form of event that we began hypothesis testing with; namely

A = [|μ̂_X − μ_X| ≤ τ_th].

We then form the same probability that we formed in hypothesis testing; namely

Pr(A) = Pr[|μ̂_X − μ_X| ≤ τ_th] = .90.   (5.6)

A close inspection of equations (5.1) and (5.6) will reveal a slight difference. Specifically, in (5.1) we
have the parameter μ_X^(0), which is the known value of μ_X (= 98.6°F) under H_0. In contrast, the parameter
μ_X in (5.6) is not assumed to be known. The goal of this investigation is to arrive at a 90% 2-sided
confidence interval for this unknown μ_X. From (5.2) we have:

Pr(A) = Pr[|μ̂_X − μ_X| ≤ τ_th] = Pr[−1.645 ≤ (μ̂_X − μ_X)/0.24 ≤ 1.645] = .90.

Finally, using the concept of equivalent events, we have

Pr[μ̂_X − 1.645(0.24) ≤ μ_X ≤ μ̂_X + 1.645(0.24)] = Pr[X̄ − .395 ≤ μ_X ≤ X̄ + .395] = .90.

Notice that the event A has been manipulated into an equivalent event, the interval

[X̄ − .395 , X̄ + .395].   (5.7)

It is the interval (5.7) that is called the 90% 2-sided Confidence Interval (CI) for the unknown mean μ_X.

Remark 5.1 The interval (5.7) is an interval whose endpoints are random variables. In other words, the
CI described by (5.7) is a random endpoint interval. It is NOT a numerical interval. Even though (5.7) is a
random endpoint interval, the interval width is 0.79; that is, it is not random.

To complete this example, suppose that after collecting the data, we arrive at μ̂_X = 97.2°F. Inserting
this number into (5.7) gives the numerical interval [96.805 , 97.595]. This interval is not a CI. Rather, it is
an estimate of the CI. Furthermore, one cannot say that Pr[96.805 ≤ μ_X ≤ 97.595] = 0.9. Why?
Because even though μ_X is not known, it is a number, not a random variable. Hence, this equation makes
no sense. Either the number μ_X is in the interval [96.805 , 97.595] or it isn't. Period! For example, what
would you say to someone who asked you the probability that the number 2 is in the interval [1 , 3]?
Hopefully, you would be kind in your response to such a person, when saying "Um… the number 2 is in
the interval [1 , 3] with probability one." In summary, a 90% two-sided CI estimate should not be taken
to mean that the probability that μ_X is in that interval is 90%. Rather, it should be viewed as one
measurement of a random endpoint interval, such that, were repeated measurements to be conducted
again and again, then overall, one should expect that 90% of those numerical intervals will include the
unknown parameter μ_X. □
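This frequency interpretation is easy to check by simulation. A minimal Matlab sketch, using this example's values (the true mean is assumed known here only for the purpose of the check):

% Coverage check for the 90% CI [Xbar - .395, Xbar + .395].
nsim = 10000; n = 25; mu = 98.6; sigma = 1.2;
xbar = mean(mu + sigma*randn(n,nsim));       % 1 x nsim sample means
covered = (xbar - 0.395 <= mu) & (mu <= xbar + 0.395);
coverage = mean(covered)                     % should be close to 0.90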
A General Procedure for Constructing a CI:

Let θ be an unknown parameter for which a (1 − α)·100% 2-sided CI is desired, and let θ̂ be the associated
point estimator of θ.

Step 1: Identify a standard random variable W that is a function of both θ and θ̂; say, W = g(θ, θ̂).

Step 2: Write: Pr[w_1 ≤ W ≤ w_2] = Pr[w_1 ≤ g(θ, θ̂) ≤ w_2] = 1 − α. Then, from this, find the numerical
values of w_1 and w_2.

Step 3: Use the concept of equivalent events to arrive at an expression of the form:

Pr[w_1 ≤ W ≤ w_2] = Pr[w_1 ≤ g(θ, θ̂) ≤ w_2] = Pr[h_1(w_1, θ̂) ≤ θ ≤ h_2(w_2, θ̂)] = 1 − α.

The desired CI is then the random endpoint interval [h_1(w_1, θ̂) , h_2(w_2, θ̂)]. □
The above procedure is now demonstrated in a number of simple examples.


Example 5.2 Arrive at a 98% two-sided CI estimate for μ_X, based on μ̂_X = 4.4, σ̂_X = .25 and n = 10.
Assume that the data collection variables {X_k}_{k=1}^{10} are iid N(μ_X, σ_X²).

Solution: Even though the sample mean μ̂_X = X̄ ~ N(μ_X, σ_X²/10), we do not have knowledge of σ_X, and
so will need to use σ̂_X. Since n = 10 is too small to invoke the CLT, we will use the test statistic

Step 1: T = (μ̂_X − μ_X)/(σ̂_X/√10) ~ t_9.

Step 2: Pr[t_1 ≤ T ≤ t_2] = .98 ⇒ t_2 = tinv(.99,9) = 2.82 and t_1 = −2.82.

Step 3: [t_1 ≤ T ≤ t_2] = [t_1 ≤ (μ̂_X − μ_X)/(σ̂_X/√10) ≤ t_2] = [μ̂_X − t_2 σ̂_X/√10 ≤ μ_X ≤ μ̂_X + t_2 σ̂_X/√10]. Hence, the
desired CI is the random endpoint interval [μ̂_X − 2.82σ̂_X/√10 , μ̂_X + 2.82σ̂_X/√10]. Using the above
data, we arrive at the CI estimate: [4.4 − 2.82(.25)/√10 , 4.4 + 2.82(.25)/√10], i.e.,

4.4 ± .223. □
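In Matlab, the Step 2 and Step 3 computations for this example reduce to a few lines (values from the example):

% 98% two-sided t-based CI estimate for mu (Example 5.2 values).
n = 10; muhat = 4.4; sigmahat = 0.25;
t2 = tinv(0.99, n-1);                        % = 2.82
CI = muhat + [-1 1]*t2*sigmahat/sqrt(n)      % = 4.4 -/+ .223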
Example 5.3 In an effort to estimate the variability of the temperature, T, of a pulsar, physicists used
n = 100 random measurements. The data gave the following statistics:

μ̂_T = 2.1×10²⁵ °K and σ̂_T = .0008×10²⁵ °K.

Obtain a 95% 2-sided CI estimate for σ_T.

Solution: We will begin by using the point estimator of σ_T²: σ̂_T² = (1/99) Σ_{k=1}^{100} (T_k − T̄)², where μ̂_T = T̄.

Step 1: The most common test statistic in relation to variance is χ = (n−1)σ̂_X²/σ_X² ~ χ²_{n−1} (c.f. Theorem 2.3).

Step 2: Pr[x_1 ≤ χ ≤ x_2] = 0.95 gives Pr[χ ≤ x_1] = 0.025 ⇒ x_1 = chi2inv(.025,99) = 73.36
and Pr[χ ≤ x_2] = 0.975 ⇒ x_2 = chi2inv(.975,99) = 128.42.
Step 3:

[x_1 ≤ χ ≤ x_2] = [x_1 ≤ (n−1)σ̂_X²/σ_X² ≤ x_2] = [(n−1)σ̂_X²/x_2 ≤ σ_X² ≤ (n−1)σ̂_X²/x_1].

Hence, the 95% 2-sided CI for σ_X² is the random endpoint interval [99σ̂_X²/128.42 , 99σ̂_X²/73.36]. From this and the
above data, we obtain the CI estimate for σ_T (taking square roots):

[√(99(.0008²)/128.42) , √(99(.0008²)/73.36)] = [0.00070 , 0.00093].

Now, because σ̂_T² = (1/99) Σ_{k=1}^{100} (T_k − T̄)² is an average of iid random variables, and because n = 100 would
seem to be large enough to invoke the CLT, repeat the above solution with this knowledge.
Solution: χ = (n−1)σ̂_X²/σ_X² ~ χ²_{n−1} has mean μ_χ = n − 1 and variance σ_χ² = 2(n − 1). Writing σ̂_X² = σ_X²·(χ/(n−1)),
it then follows that

μ_{σ̂_X²} = (σ_X²/(n−1)) μ_χ = (σ_X²/(n−1))(n − 1) = σ_X²

and

σ²_{σ̂_X²} = (σ_X²/(n−1))² σ_χ² = (σ_X²/(n−1))²·2(n − 1) = 2σ_X⁴/(n − 1).

And so, from the CLT, we have

Step 1: Z = (σ̂_X² − σ_X²)/√(2σ_X⁴/(n−1)) ~ N(0, 1).

Step 2: Pr[z_1 ≤ Z ≤ z_2] = .95 ⇒ z_2 = 1.96 and z_1 = −1.96.

Step 3:
[z_1 ≤ Z ≤ z_2] = [z_1 ≤ (σ̂_X² − σ_X²)/(σ_X²√(2/(n−1))) ≤ z_2]
= [1 + z_1√(2/(n−1)) ≤ σ̂_X²/σ_X² ≤ 1 + z_2√(2/(n−1))]
= [σ̂_X²/(1 + z_2√(2/(n−1))) ≤ σ_X² ≤ σ̂_X²/(1 + z_1√(2/(n−1)))].

Hence, the desired CI for σ_X is the random endpoint interval

[σ̂_X/√(1 + 1.96√(2/99)) , σ̂_X/√(1 − 1.96√(2/99))] = [σ̂_X/1.13 , σ̂_X/0.85].

And so the CI estimate is: [.0008/1.13 , .0008/0.85] = [.00071 , .00094].

This interval is almost exactly equal to the interval obtained above. □
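For reference, both intervals can be computed in a couple of Matlab lines (n = 100 and σ̂ = .0008, as above):

% Exact chi-square CI vs. CLT-based CI for sigma (Example 5.3).
n = 100; sighat = 0.0008;
CI_chi2 = sighat*sqrt((n-1)./[chi2inv(.975,n-1) chi2inv(.025,n-1)])
CI_clt  = sighat./sqrt(1 + [1.96 -1.96]*sqrt(2/(n-1)))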
Example 5.4 Suppose that it is desired to investigate the influence of the addition of background music in
drawing customers into your new café during the evening hours. To this end, you will observe pedestrians
who walk past your café while no music is playing, and while music is playing. Let X = The act of noting
whether a pedestrian enters (1) or doesn't enter (0) your café when no music is playing. Let Y = The act
of noting whether a pedestrian enters (1) or doesn't enter (0) your café when music is playing.

Suppose that you obtained the following data-based results: p̂_X = 0.13 for sample size n_X = 100, and
p̂_Y = 0.18 for sample size n_Y = 50. Obtain a 95% 1-sided (upper) CI for θ = p_Y − p_X.
Solution: We will assume that the actions of the pedestrians are mutually independent. Then we have

n_X p̂_X ~ Binomial(n_X, p_X) independent of n_Y p̂_Y ~ Binomial(n_Y, p_Y).

Assumption: For a first analysis, we will assume that the CLT holds in relation to both p̂_X and p̂_Y. This
would seem to be reasonable, based on the above estimates of p_X and p_Y and the chart included in Homework
#6, Problem 1.



Then p̂_X ~ N(p_X, p_X(1−p_X)/n_X) and p̂_Y ~ N(p_Y, p_Y(1−p_Y)/n_Y). Hence

θ̂ = p̂_Y − p̂_X ~ N(p_Y − p_X, p_X(1−p_X)/n_X + p_Y(1−p_Y)/n_Y) = N(θ, σ²_θ̂),

where we defined σ²_θ̂ = p_X(1−p_X)/n_X + p_Y(1−p_Y)/n_Y. The standard approach [See Theorem 11.8 on p.365
of the text] is to use:

σ̂²_θ̂ = p̂_X(1−p̂_X)/n_X + p̂_Y(1−p̂_Y)/n_Y = 0.13(0.87)/100 + 0.18(0.82)/50 ≅ 0.0041.

Then σ̂_θ̂ = √0.0041 ≅ 0.064, and so Z = (θ̂ − θ)/σ̂_θ̂ ~ N(0, 1), with Pr[Z ≤ z_th] = 0.95 ⇒ z_th = 1.645.

Hence, we have the following equivalent events:

[Z ≥ −z_th] = [(θ̂ − θ)/σ̂_θ̂ ≥ −1.645] = [θ ≤ θ̂ + 1.645(.064)] = [θ ≤ θ̂ + .105].

Recall that the smallest possible value for θ = p_Y − p_X is −1. Hence, the CI estimate is: [−1 , .05 + .105] =
[−1 , .155].
Conclusion: Were you to repeat this experiment many times, you would expect that 95% of the time, the
increase in clientele due to music would be no more than ~15%.





Investigation of σ̂_θ̂ = √(p̂_X(1−p̂_X)/n_X + p̂_Y(1−p̂_Y)/n_Y) and its influence on the CI [−1 , θ̂ + 1.645σ̂_θ̂]-

Note: The primary intent of this section is to illustrate that: (i) the CI is a random variable (i.e. the
obtained CI estimate is simply one measurement of this variable), and (ii) the CLT assumption depends
upon the size of the events being considered in relation to the size of the increments of the actual sample
space for the variable. This section can be skipped with no loss of basic understanding of the above. This
material could serve as the basis of a project, as it demonstrates a number of major concepts that could be
elaborated upon (e.g. sample spaces, events, estimators, simulation-based pdf's of more complicated
estimators).
25
In the above analysis we used estimates of the unknown parameters p_X and p_Y to arrive at an estimate of
the standard deviation σ_θ̂. The value used, .064, is a single measurement of the random variable σ̂_θ̂. In
this section we will investigate the properties of this random variable using simulations. Specifically, we
will use p_X = 0.13 and p_Y = 0.18, and 10,000 simulations of σ̂_θ̂. The resulting simulation-based
estimate of the pdf f_{σ̂_θ̂}(x) is shown in Figure 5.1.

Figure 5.1 Simulation-based (nsim = 10,000) estimate of f_{σ̂_θ̂}(x).

In relation to this figure, we have μ_{σ̂_θ̂} ≅ .064, which is the true value of σ_θ̂. We also have σ_{σ̂_θ̂} ≅ .006.
From Figure 5.1, however, we see that f_{σ̂_θ̂}(x) is notably skewed. Hence, this should be noted if one
chooses to use σ_{σ̂_θ̂} ≅ .006 as a measure of the level of uncertainty of σ̂_θ̂.

We now proceed to investigate the random variable that is the upper random endpoint of the CI
[−1 , θ̂ + 1.645σ̂_θ̂]. To this end, we begin by addressing the probability structure of θ̂. Because
θ̂ = p̂_Y − p̂_X = Ȳ − X̄ is the difference between two averages, and because it would appear that
n_X = 100 and n_Y = 50 are sufficiently large that the CLT holds, θ̂ should be expected to have an
f_θ̂(x) that is bell-shaped (i.e. normal). In any case, we have:

μ_θ̂ = p_Y − p_X = 0.05

and

σ_θ̂ = √(p_X(1−p_X)/n_X + p_Y(1−p_Y)/n_Y) ≅ 0.064.
The simulations that resulted in Figure 5.1 also resulted in Figure 5.2 below.
Figure 5.2 Simulation-based (nsim = 10,000) estimate of f_θ̂(x) using a 10-bin scaled histogram.

Now, based on Figure 5.2, one could reasonably argue that the pdf, f_θ̂(x), has a normal distribution.
RIGHT?
Well? Let's use a 50-bin histogram instead of a 10-bin histogram, and see what we get. We are, after all,
using 10,000 simulations, and so we can afford finer resolution. The result is shown in Figure 5.3.
Figure 5.3 Simulation-based (nsim = 10,000) estimate of f_θ̂(x) using a 50-bin scaled histogram.

There are two peaks that clearly stand out in relation to an assumed bell curve shape. Furthermore, they
are real, in the sense that they appear for repeated simulations!
Conclusion 1: If one is interested in events that are much larger than the bin width resolution, which in
Figure 5.3 is ~0.011, then the normal approximation in Figure 5.2 may be acceptable. However, if one is
interested in events on the order of 0.011, then Figure 5.3 suggests otherwise.

The fact of the matter is that θ̂ is a discrete random variable. To obtain its sample space, write

θ̂ = p̂_Y − p̂_X = Ȳ − X̄ = (1/50) Σ_{k=1}^{50} Y_k − (1/100) Σ_{k=1}^{100} X_k.

The smallest element of S_θ̂ is −1. The event {−1} is equivalent to [Y_1 = Y_2 = ⋯ = Y_50 = 0 ∩ X_1 = X_2 = ⋯ = X_100 = 1].
The next smallest event is {−0.99}. This event is equivalent to [Y_1 = Y_2 = ⋯ = Y_50 = 0 ∩ Σ_{k=1}^{100} X_k = 99].
Clearly, the explicit computation of such equivalent events will become more cumbersome as we proceed.
As an alternative to this cumbersome approach, we will, instead, simply use a bin width that is much
smaller than 0.01 in computing the estimate of f_θ̂(x). Using not 50, but 500 bins resulted in Figure 5.4
below.
Figure 5.4 Simulation-based (nsim = 10,000) estimate of f_θ̂(x) using a 500-bin scaled histogram. The
lower plot includes the zoomed region near 0.08.
What we see from Figure 5.4 is that the elements of S_θ̂ are not spaced at uniform 0.01 increments
throughout. At the value x = 0.08 we see that the spacing is much smaller than 0.01. Hence, it is very
likely that the largest peak in Figure 5.3 is due to this unequal spacing in S_θ̂.

Now that we have an idea as to the structures of the marginal pdf's f_θ̂(x) and f_{σ̂_θ̂}(x), we need to
investigate their joint probability structure. To this end, the place to begin is with a scatter plot of θ̂
versus σ̂_θ̂. This is shown in Figure 5.6.
Figure 5.6 A scatter plot of θ̂ versus σ̂_θ̂ for nsim = 10,000 simulations.

 

Clearly, Figure 5.6 suggests that θ̂ and σ̂_θ̂ are correlated. We obtained ρ̂ ≅ 0.6 and Cov(θ̂, σ̂_θ̂) ≅ .0002.

Finally, since CI = [−1 , θ̂ + 1.645σ̂_θ̂], we have:

(i) E(θ̂ + 1.645σ̂_θ̂) ≅ 0.05 + 1.645(.06) ≅ 0.15, and

(ii) Var(θ̂ + 1.645σ̂_θ̂) = Var(θ̂) + 1.645² Var(σ̂_θ̂) + 2(1.645) Cov(θ̂, σ̂_θ̂)
≅ .063² + 1.645²(.006²) + 2(1.645)(.00023) ≅ .005,

or Std(θ̂ + 1.645σ̂_θ̂) ≅ .07.

The pdf for the assumed CI_upper = θ̂ + 1.645σ̂_θ̂ ~ N(.15, .07²) is overlaid against the simulation-based pdf in
the figure below.
Figure 5.7 Overlaid plots of the pdf of CI_upper = θ̂ + 1.645σ̂_θ̂ ~ N(.15, .07²) and the corresponding
simulation-based pdf, for θ̂ = p̂_Y − p̂_X = Ȳ_{n_Y=50} − X̄_{n_X=100}.
Comment: Now, the truth of the matter in the chosen setting (the value of simulations: you know the
truth!) is that θ = p_Y − p_X = .18 − .13 = .05. While this increase due to playing music may seem
minor, it is, in fact, an increase of .05/.13, or 38%, in likely customer business! This is significant by any
business standards. But is the (upper) random endpoint interval CI = [−1 , θ̂ + 1.645σ̂_θ̂], in fact, a 95%
CI estimator? Using the normal approximation in Figure 5.7, the probability to the left of the dashed line
(i.e. the probability that CI_upper falls below the true θ = .05) is: Pr[CI_upper < .05] = normcdf(.05,.15,.07) ≅ 0.077.
And so, we actually have roughly a 92.3% CI estimator; somewhat less than the nominal 95% level of confidence.
Summary & Conclusion This example illustrates how one typically accommodates the unknown
standard deviation associated with the estimated difference between two proportions. Because this
standard deviation is a function of the unknown proportions, one typically uses the numerical estimates of
the same to arrive at a numerical value for it. Hence, the estimator of the standard deviation is a random
variable. Rather than using a direct mathematical approach to investigating its influence on the CI, it was
easier to resort to simulations. In the course of this simulation-based study, it was discovered that the
actual pdf of θ̂ can definitely NOT be assumed to be normal if one is interested in event sizes on the order
of 0.01. It was also found that θ̂ and σ̂_θ̂ are correlated; as opposed to the case of normal random
variables, where the sample mean and sample standard deviation are independent. For larger event sizes, it was found that the random variable is
well-approximated by a normal pdf. It was also found that the interval corresponds not to a 95% CI but to roughly a 92%
CI. A one-sided CI was chosen simply to illustrate contrast to prior examples that involved 2-sided CI's.
It would be interesting to re-visit this example not only with other types of CI's but also in relation to
hypothesis testing. Finally, to tie this example to hypothesis testing, consider the hypothesis test:

H_0: θ = 0 versus H_1: θ > 0.

Furthermore, suppose that we choose a test threshold θ_th = 0.05. Then if, in fact, θ = 0.05, the type-2
error is 0.5 (see Figure 5.4). [Note: The Matlab code used in relation to this example is included in the
Appendix.] □
6.6 Summary
This chapter is concerned with estimating parameters associated with a random variable. Section 6.2
included a number of random variables associated with this estimation problem. Many of these random
variables (in this context they are also commonly referred to as statistics) relied on the CLT. Section 6.3
addressed some commonly estimated parameters and the probabilistic structure of the corresponding
estimators. The results in sections 6.2 and 6.3 were brought to bear on the topics of hypothesis testing and
confidence interval estimation in sections 6.4 and 6.5, respectively. In both topics the distribution of the
estimator plays the central role. The only real difference between the two topics is that in confidence
interval estimation the parameter of interest is unknown, whereas in hypothesis testing it is known under
H0.
Appendix
Matlab code used in relation to Example 5.4
%PROGRAM NAME: example54.m
px=0.13; nx = 100;
py=0.18; ny = 50;
nsim = 10000;
x = ceil(rand(nx,nsim) - (1-px));
y = ceil(rand(ny,nsim) - (1-py));
mx = mean(x);
my = mean(y);
thetahat = my - mx;
%vx = nx^-1 *mx.*(1-mx);
%vy = ny^-1 *my.*(1-my);
vx = var(x)/nx;
vy = var(y)/ny;
vth = vx + vy;
sth = vth.^0.5;
msth = mean(sth)
ssth = std(sth)
vsth = ssth^2;
figure(1)
db = (.09 - .03)/50;
bvec = .03+db/2 : db : .09-db/2;
fsth = (nsim*db)^-1 *hist(sth,bvec);
bar(bvec,fsth)
title('Simulation-Based Estimate of the pdf of std(thetahat)')
grid
pause
figure(2)
db = (.3 +.25)/500;
bvec = -.25+db/2 : db : .3-db/2;
fth = (nsim*db)^-1 *hist(thetahat,bvec);
bar(bvec,fth)
title('Simulation-Based Estimate of the pdf of thetahat')
grid
mthetahat = mean(thetahat)
sthetahat = std(thetahat)
vthetahat = sthetahat^2
varthetahat = px*(1-px)/nx + py*(1-py)/ny
pause
figure(3)
plot(thetahat,sth,'*')
title('Scatter Plot of thetahat vs std(thetahat)')
grid
pause
%Compute mean and std dev of CI
cmat = cov(thetahat,sth);
cov_th_sth = cmat(1,2)
mCI = mthetahat + 1.645*msth
vCI = vthetahat+1.645^2*vsth + 2*1.645*cov_th_sth;
sCI = vCI^.5
pause
xx = -.1:.0001:.4;
fn = normpdf(xx,mCI,sCI);
CIsim = thetahat +1.645*sth;
db = (.4+.1)/50;
bvec = -.1+db/2:db:.4-db/2;
fCIsim = (db*nsim)^-1 *hist(CIsim,bvec);
figure(4)
bar(bvec,fCIsim)
hold on
plot(xx,fn,'r','LineWidth',2)
grid