Confidence Intervals and Sets

Throughout we adopt the normal-error model, and wish to say some
things about the construction of confidence intervals [and sets] for the
parameters β0 , . . . , βp−1 .
1. Confidence intervals for one parameter
Suppose we want a confidence interval for the jth parameter βj , where
j is fixed. Recall that
\[
\frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2 \left[(X'X)^{-1}\right]_{j,j}}} \sim N(0, 1),
\]
and
\[
S^2 := \frac{\mathrm{RSS}}{n-p} = \frac{1}{n-p} \sum_{i=1}^n \left( Y_i - (X\hat\beta)_i \right)^2 \sim \frac{\sigma^2 \chi^2_{n-p}}{n-p},
\]
and is independent of β̂. Therefore,
\[
\frac{\hat\beta_j - \beta_j}{\sqrt{S^2 \left[(X'X)^{-1}\right]_{j,j}}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-p}/(n-p)}} = t_{n-p},
\]
where the normal and χ² random variables are independent. Therefore,
\[
\hat\beta_j \pm t^{(\alpha/2)}_{n-p}\, S\sqrt{\left[(X'X)^{-1}\right]_{j,j}}
\]
is a (1 − α) × 100% confidence interval for βj . This yields a complete analysis of confidence intervals for a univariate parameter; these confidence
intervals can also be used for testing, of course.
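As a concrete illustration, here is a small numerical sketch of this t-based interval on simulated data (the data, the variable names, and the use of `numpy`/`scipy` are our own choices, not part of the notes):

```python
# Sketch: the t-based interval for a single coefficient, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                    # least-squares estimate
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)  # S^2 = RSS/(n - p)

alpha, j = 0.05, 1
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)   # upper alpha/2 point of t_{n-p}
half = t_crit * np.sqrt(S2 * XtX_inv[j, j])
ci = (beta_hat[j] - half, beta_hat[j] + half)   # beta_hat_j +/- t S sqrt([(X'X)^{-1}]_{jj})
```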
2. Confidence ellipsoids
The situation is more interesting if we wish to say something about more
than one parameter at the same time. For example, suppose we want
to know about (β1 , β2 ) in the overly-simplified case that σ = 1. The
general philosophy of confidence intervals [for univariate parameters]
suggests that we look for a random set Ω such that
\[
P\left\{ \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} - \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \in \Omega \right\} = 1 - \alpha.
\]
And we know that
\[
\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},\ \Sigma \right),
\]
where Σ is a 2 × 2 matrix with
\[
\Sigma_{i,j} = \left[(X'X)^{-1}\right]_{i,j} \qquad (i, j = 1, 2).
\]
Here is a possible method: Let
\[
\hat\gamma := \begin{pmatrix} \hat\beta_1 - \beta_1 \\ \hat\beta_2 - \beta_2 \end{pmatrix}, \qquad \text{so that} \qquad \hat\gamma \sim N_2(0, \Sigma).
\]
Also, recall that
\[
\hat\gamma'\Sigma^{-1}\hat\gamma \sim \chi^2_2.
\]
Therefore, one natural choice for Ω is
\[
\Omega := \left\{ z \in \mathbf{R}^2 : z'\Sigma^{-1}z \le \chi^2_2(\alpha) \right\}.
\]
What does Ω look like? In order to answer this, let us apply the spectral
theorem:
\[
\Sigma = PDP', \qquad \text{so that} \qquad \Sigma^{-1} = PD^{-1}P'.
\]
Then we can represent Ω as follows:
\begin{align*}
\Omega &= \left\{ z \in \mathbf{R}^2 : z'PD^{-1}P'z \le \chi^2_2(\alpha) \right\} \\
&= \left\{ z \in \mathbf{R}^2 : (P'z)'D^{-1}(P'z) \le \chi^2_2(\alpha) \right\} \\
&= P\left\{ y \in \mathbf{R}^2 : y'D^{-1}y \le \chi^2_2(\alpha) \right\} \\
&= P\left\{ y \in \mathbf{R}^2 : \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} \le \chi^2_2(\alpha) \right\},
\end{align*}
where λ1 and λ2 denote the eigenvalues of Σ [the diagonal entries of D].
Consider
\[
E := \left\{ y \in \mathbf{R}^2 : \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} \le \chi^2_2(\alpha) \right\}. \tag{1}
\]
This is the interior of an ellipsoid, and Ω = PE is the image of the
ellipsoid under the "linear orthogonal map" P. Such sets are called
"generalized ellipsoids," and we have found a (1 − α) × 100% confidence
[generalized] ellipsoid for (β1 , β2 )′.
The preceding can be generalized to any number of the parameters
$\beta_{i_1}, \dots, \beta_{i_k}$, but is hard to work with, as the geometry of Ω can be complicated [particularly if k > 2]. Therefore, instead we might wish to look
for approximate confidence sets that are easier to work with. Before we
move on though, let me mention that if you want to test whether or
not $H_0: \beta_1 = \beta_{1,0},\ \beta_2 = \beta_{2,0}$ holds, then we can use these confidence bounds
fairly easily, since it is not hard to check whether or not $(\hat\beta_1 - \beta_{1,0},\ \hat\beta_2 - \beta_{2,0})'$ is in
Ω: You simply compute the scalar quantity
\[
\begin{pmatrix} \hat\beta_1 - \beta_{1,0} \\ \hat\beta_2 - \beta_{2,0} \end{pmatrix}' \Sigma^{-1} \begin{pmatrix} \hat\beta_1 - \beta_{1,0} \\ \hat\beta_2 - \beta_{2,0} \end{pmatrix}
\]
and check to see if it is ≤ χ²₂(α)! But if you really need to imagine
or see the confidence set[s], then this exact method can be unwieldy
[particularly in higher dimensions than 2].
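A quick numerical sketch of this acceptance check, with a made-up 2 × 2 matrix Σ and made-up estimates (σ = 1, as above); everything here is illustrative, not data from the notes:

```python
# Sketch: is the hypothesized point inside the confidence ellipsoid Omega?
# Sigma and the estimates below are made-up illustrative numbers (sigma = 1).
import numpy as np
from scipy import stats

Sigma = np.array([[0.05, 0.02],
                  [0.02, 0.08]])       # relevant 2x2 block of (X'X)^{-1}
beta_hat = np.array([1.3, -0.4])       # (beta_hat_1, beta_hat_2)
beta_null = np.array([1.0, -0.5])      # hypothesized (beta_{1,0}, beta_{2,0})

gamma = beta_hat - beta_null
stat = gamma @ np.linalg.solve(Sigma, gamma)  # gamma' Sigma^{-1} gamma
inside = stat <= stats.chi2.ppf(0.95, df=2)   # compare with the upper 5% point of chi^2_2
```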
3. Bonferroni bounds
Our approximate confidence intervals are based on a fact from general
probability theory.
Proposition 1 (Bonferroni's inequality). Let E1 , . . . , Ek be k events. Then,
\[
P\left( E_1 \cap \cdots \cap E_k \right) \ge 1 - \sum_{i=1}^k P(E_i^c) = 1 - \sum_{i=1}^k \left[ 1 - P(E_i) \right].
\]
Proof. The event $E_1^c \cup \cdots \cup E_k^c$ is the complement of $E_1 \cap \cdots \cap E_k$. Therefore,
\[
P\left( E_1 \cap \cdots \cap E_k \right) = 1 - P\left( E_1^c \cup \cdots \cup E_k^c \right),
\]
and this is $\ge 1 - \sum_{i=1}^k P(E_i^c)$ because the probability of a union is at most
the sum of the individual probabilities. □
Here is how we can use Bonferroni's inequality. Define
\[
C_j := \hat\beta_j \pm t^{(\alpha/4)}_{n-p}\, S\sqrt{\left[(X'X)^{-1}\right]_{j,j}} \qquad (j = 1, 2).
\]
We have seen already that
\[
P\{\beta_j \in C_j\} = 1 - \frac{\alpha}{2}.
\]
[This is why we used α/4 in the definition of Cj .] Therefore, Bonferroni's
inequality implies that
\[
P\{\beta_1 \in C_1,\ \beta_2 \in C_2\} \ge 1 - \left( \frac{\alpha}{2} + \frac{\alpha}{2} \right) = 1 - \alpha.
\]
In other words, (C1 , C2 ) forms a "conservative" (1 − α) × 100% simultaneous confidence interval for (β1 , β2 ). This method quickly becomes very inaccurate as the number of parameters of interest grows. For instance,
if you want Bonferroni confidence sets for (β1 , β2 , β3 ), then you need to
use individual confidence intervals with confidence level 1 − (α/3) each.
And for k parameters you need individual confidence level 1 − (α/k),
which can yield bad performance when k is large.
This method is easy to implement, but usually very conservative.1
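A sketch of the Bonferroni recipe on simulated data (data and names are ours): with k = 2 intervals, each individual interval uses the upper α/4 point of $t_{n-p}$:

```python
# Sketch: Bonferroni simultaneous intervals for two coefficients (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)

alpha, idx = 0.05, [1, 2]            # simultaneous intervals for beta_1, beta_2
k = len(idx)
t_crit = stats.t.ppf(1 - alpha / (2 * k), df=n - p)  # individual level 1 - alpha/k
cis = {j: (beta_hat[j] - t_crit * np.sqrt(S2 * XtX_inv[j, j]),
           beta_hat[j] + t_crit * np.sqrt(S2 * XtX_inv[j, j]))
       for j in idx}
```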
4. Scheffé’s simultaneous conservative confidence bounds
There is a lovely method, due to Scheffé, that works in a similar fashion
to the Bonferroni method, but also has the advantage of often being
[far] less conservative! [We are now talking strictly about our linear
model.] The starting point of this discussion is a general fact from matrix
analysis.
Proposition 2 (The Rayleigh–Ritz inequality). If Lp×p is positive definite, then for every p-vector x,
\[
x'L^{-1}x = \max_{a \ne 0} \frac{(a'x)^2}{a'La}.
\]
The preceding is called an inequality because it says that
\[
\frac{(a'x)^2}{a'La} \le x'L^{-1}x \qquad \text{for all } a_{p\times 1} \ne 0,
\]
and it also tells us that the inequality is achieved for some a ∈ R^p.
1On the other hand, the Bonferroni method can be applied to a wide range of statistical problems
that involve simultaneous confidence intervals [and is not limited to the theory of linear models].
So it is well worth your while to understand this important method.
Proof of the Rayleigh–Ritz inequality. Recall the Cauchy–Schwarz inequality from your linear algebra course: $(a'b)^2 \le \|a\|^2 \cdot \|b\|^2$, with equality if
and only if $a = cb$ for some $c \in \mathbf{R}$. It follows then that
\[
\max_{a \ne 0} \frac{(a'b)^2}{\|a\|^2} = \|b\|^2.
\]
Now write $L = PDP'$ in spectral form, and change variables: $b := L^{1/2}a$,
in order to see that $a'x = b'L^{-1/2}x$ and $a'La = \|b\|^2$. Therefore,
\[
x'L^{-1}x = \left\| L^{-1/2}x \right\|^2 = \max_{b \ne 0} \frac{\left( b'L^{-1/2}x \right)^2}{\|b\|^2} = \max_{a \ne 0} \frac{(a'x)^2}{a'La}.
\]
This does the job. □
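The identity is also easy to check numerically. In the sketch below (an arbitrary made-up positive-definite L and vector x), random directions a never beat the bound, and a = L⁻¹x attains it:

```python
# Numerical sanity check of Rayleigh-Ritz: (a'x)^2 / (a'La) never exceeds
# x' L^{-1} x, and the choice a = L^{-1} x attains the maximum.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
L = A @ A.T + 3.0 * np.eye(3)       # an arbitrary positive-definite 3x3 matrix
x = rng.normal(size=3)

bound = x @ np.linalg.solve(L, x)   # x' L^{-1} x

def ratio(a):
    return (a @ x) ** 2 / (a @ L @ a)

vals = [ratio(rng.normal(size=3)) for _ in range(1000)]  # random directions a
attained = ratio(np.linalg.solve(L, x))                  # the maximizer a = L^{-1} x
```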
Now let us return to Scheffé’s method. Recall that
\[
\hat\beta - \beta \sim N_p\left( 0,\ \sigma^2 (X'X)^{-1} \right),
\]
so that
\[
\frac{1}{\sigma^2} \left( \hat\beta - \beta \right)'(X'X)\left( \hat\beta - \beta \right) \sim \chi^2_p,
\]
and hence
\[
\frac{\left( \hat\beta - \beta \right)'(X'X)\left( \hat\beta - \beta \right)}{pS^2}
= \frac{\left( \hat\beta - \beta \right)'(X'X)\left( \hat\beta - \beta \right)/p}{S^2}
\sim F_{p,n-p}.
\]
We apply the Rayleigh–Ritz inequality with x := β̂ − β and L := (X'X)⁻¹
in order to find that
� �
��2 
� β̂ − β
�
��
�
�
�


β̂ − β (X� X) β̂ − β = max 
�
��=0
�� (X� X)−1 �
Therefore,


� �
��2 


�
 1

 � β̂ − β

P
max
≤
F
(α)
= 1 − α�


���−�
2


�� (X� X)−1 �

 �S ��=0
Equivalently,
\[
P\left\{ \frac{\left| a'\hat\beta - a'\beta \right|}{\sqrt{a'(X'X)^{-1}a}} \le \sqrt{pS^2\, F_{p,n-p}(\alpha)} \ \text{ for all } a \in \mathbf{R}^p \right\} = 1 - \alpha.
\]
If we restrict attention to a subcollection of a's, then the probability is
even larger. In particular, consider only a's that are the standard basis
vectors of R^p, in order to deduce from the preceding that
\[
P\left\{ \frac{\left| \hat\beta_j - \beta_j \right|}{\sqrt{\left[(X'X)^{-1}\right]_{j,j}}} \le \sqrt{pS^2\, F_{p,n-p}(\alpha)} \ \text{ for all } 1 \le j \le p \right\} \ge 1 - \alpha.
\]
In other words, we have demonstrated the following.
Theorem 3 (Scheffé). The following are conservative (1 − α) × 100%
simultaneous confidence bounds for (β1 , . . . , βp )′:
\[
\hat\beta_j \pm \sqrt{pS^2 \left[(X'X)^{-1}\right]_{j,j}\, F_{p,n-p}(\alpha)} \qquad (1 \le j \le p).
\]
When the sample size n is very large, the preceding yields an asymptotic simplification. Recall that χ²_{n−p}/(n − p) → 1 as n → ∞. Therefore, F_{p,n−p}(α) ≈ χ²_p(α)/p for n ≫ 1, and the following are
asymptotically-conservative simultaneous confidence bounds:
\[
\hat\beta_j \pm S\sqrt{\left[(X'X)^{-1}\right]_{j,j}\, \chi^2_p(\alpha)} \qquad (1 \le j \le p) \ \text{ for } n \gg 1.
\]
5. Confidence bounds for the regression surface
Given a vector x ∈ R^p of predictor variables, our linear model yields
EY = x'β. In other words, we can view our efforts as an attempt to
understand the unknown function
\[
r(x) := x'\beta.
\]
And among other things, we have found the following estimator for r:
\[
\hat{r}(x) := x'\hat\beta.
\]
Note that if x ∈ R^p is held fixed, then
\[
\hat{r}(x) \sim N\left( r(x),\ \sigma^2\, x'(X'X)^{-1}x \right).
\]
Therefore, a (1 − α) × 100% confidence interval for r(x) [for a fixed
predictor variable x] is
\[
\hat{r}(x) \pm S\sqrt{x'(X'X)^{-1}x}\ t^{(\alpha/2)}_{n-p}.
\]
That is, if we are interested in a confidence interval for r(x) for a fixed
x, then we have
\[
P\left\{ r(x) = \hat{r}(x) \pm S\sqrt{x'(X'X)^{-1}x}\ t^{(\alpha/2)}_{n-p} \right\} = 1 - \alpha.
\]
On the other hand, we can also apply Scheffé’s method and obtain the
following simultaneous (1 − α) × 100% confidence set:
\[
P\left\{ r(x) = \hat{r}(x) \pm S\sqrt{p\, x'(X'X)^{-1}x\, F_{p,n-p}(\alpha)} \ \text{ for all } x \in \mathbf{R}^p \right\} \ge 1 - \alpha.
\]
Example 4 (Simple linear regression). Consider the basic regression
model
\[
Y_i = \alpha + \beta x_i + \varepsilon_i \qquad (1 \le i \le n).
\]
If x = (1 , x)′, then a quick computation yields
\[
x'(X'X)^{-1}x = \frac{\overline{x^2} - 2x\bar{x} + x^2}{s_x^2},
\]
where $\overline{x^2} := n^{-1}\sum_{i=1}^n x_i^2$ and we recall that $s_x^2 := \sum_{i=1}^n (x_i - \bar{x})^2$. Therefore, a simultaneous confidence interval for all of the α + βx's, as x ranges over R, is
\[
\hat\alpha + \hat\beta x \pm \sqrt{S^2\, \frac{\overline{x^2} - 2x\bar{x} + x^2}{s_x^2}\ 2F_{2,n-2}(\alpha)}.
\]
This expression can be simplified further because
\[
\overline{x^2} - 2x\bar{x} + x^2 = \overline{x^2} - (\bar{x})^2 + (\bar{x})^2 - 2x\bar{x} + x^2 = \frac{s_x^2}{n} + (\bar{x} - x)^2.
\]
Therefore, with probability 1 − α, we have
\[
\alpha + \beta x = \hat\alpha + \hat\beta x \pm \sqrt{S^2 \left( \frac{1}{n} + \frac{(\bar{x} - x)^2}{s_x^2} \right) 2F_{2,n-2}(\alpha)} \qquad \text{for all } x.
\]
On the other hand, if we want a confidence interval for α + βx for a
fixed x, then we can do better using our t-test:
\[
\alpha + \beta x = \hat\alpha + \hat\beta x \pm S\sqrt{\frac{1}{n} + \frac{(\bar{x} - x)^2}{s_x^2}}\ t^{(\alpha/2)}_{n-2}.
\]
You should check the details of this computation. □
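The pointwise and simultaneous bands of this example can be compared numerically. In this sketch (simulated data; names are ours), the Scheffé half-width exceeds the pointwise t half-width at every x on the grid:

```python
# Sketch: pointwise t interval vs. simultaneous Scheffe band for alpha + beta*x
# in simple linear regression (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, level_alpha = 30, 0.05
x = rng.uniform(0.0, 10.0, size=n)
Y = 2.0 + 0.7 * x + rng.normal(size=n)

xbar = x.mean()
sx2 = np.sum((x - xbar) ** 2)                   # s_x^2
beta_hat = np.sum((x - xbar) * Y) / sx2         # slope estimate
alpha_hat = Y.mean() - beta_hat * xbar          # intercept estimate
S2 = np.sum((Y - alpha_hat - beta_hat * x) ** 2) / (n - 2)

grid = np.linspace(0.0, 10.0, 5)
se = np.sqrt(S2 * (1.0 / n + (grid - xbar) ** 2 / sx2))
point_half = stats.t.ppf(1 - level_alpha / 2, df=n - 2) * se           # fixed x
simul_half = np.sqrt(2 * stats.f.ppf(1 - level_alpha, 2, n - 2)) * se  # all x at once
```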
6. Prediction intervals
The difference between confidence and prediction intervals is this: For
a confidence interval we try to find an interval around the parameter
x'β. For a prediction interval we do so for the random variable Y₀ :=
x₀'β + ε₀, where x₀ is known and fixed and ε₀ is the "noise," which is
hitherto unobserved (i.e., independent of the vector Y of observations).
It is not hard to construct a good prediction interval of this type: Note
that
\[
\hat{Y}_0 := x_0'\hat\beta
\]
satisfies
\[
\hat{Y}_0 - Y_0 = x_0'\left( \hat\beta - \beta \right) - \varepsilon_0 \sim N\left( 0,\ \sigma^2\left[ x_0'(X'X)^{-1}x_0 + 1 \right] \right).
\]
Therefore, a prediction interval is
\[
\hat{Y}_0 \pm S\, t^{(\alpha/2)}_{n-p} \sqrt{x_0'(X'X)^{-1}x_0 + 1}.
\]
This means that
\[
P\left\{ Y_0 \in \hat{Y}_0 \pm S\, t^{(\alpha/2)}_{n-p} \sqrt{x_0'(X'X)^{-1}x_0 + 1} \right\} = 1 - \alpha,
\]
but note that both Y₀ and the prediction interval are now random.
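A sketch of this prediction interval on simulated data (data and names are ours); note the extra "+1" term contributed by ε₀:

```python
# Sketch: a (1 - alpha) prediction interval for Y_0 at a new predictor x_0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p, alpha = 50, 2, 0.05
X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)

x0 = np.array([1.0, 2.5])               # new, fixed predictor vector
y0_hat = x0 @ beta_hat
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
half = t_crit * np.sqrt(S2 * (x0 @ XtX_inv @ x0 + 1.0))  # "+1" accounts for eps_0
pi = (y0_hat - half, y0_hat + half)
```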