Confidence Intervals and Sets

Throughout we adopt the normal-error model, and wish to say some
things about the construction of confidence intervals [and sets] for the
parameters β0 , . . . , βp−1 .
1. Confidence intervals for one parameter
Suppose we want a confidence interval for the jth parameter βj , where
j is fixed. Recall that
\[
\frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2 \left[(X'X)^{-1}\right]_{j,j}}} \sim N(0, 1),
\]
and
\[
S^2 := \frac{\mathrm{RSS}}{n-p} = \frac{1}{n-p} \sum_{i=1}^n \left( Y_i - (X\hat\beta)_i \right)^2 \sim \frac{\sigma^2 \chi^2_{n-p}}{n-p},
\]
and is independent of β̂. Therefore,
\[
\frac{\hat\beta_j - \beta_j}{\sqrt{S^2 \left[(X'X)^{-1}\right]_{j,j}}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-p}/(n-p)}} = t_{n-p},
\]
where the normal and χ² random variables are independent. Therefore,
\[
\hat\beta_j \pm t^{(\alpha/2)}_{n-p}\, S\sqrt{\left[(X'X)^{-1}\right]_{j,j}}
\]
is a (1 − α) × 100% confidence interval for βj . This yields a complete analysis of confidence intervals for a univariate parameter; these confidence
intervals can also be used for testing, of course.
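As a concrete illustration, here is a small numerical sketch of this t-based interval on simulated data (the data, the variable names, and the use of `numpy`/`scipy` are our own choices, not part of the notes):

```python
# Sketch: the t-based interval for a single coefficient, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                    # least-squares estimate
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)  # S^2 = RSS/(n - p)

alpha, j = 0.05, 1
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)   # upper alpha/2 point of t_{n-p}
half = t_crit * np.sqrt(S2 * XtX_inv[j, j])
ci = (beta_hat[j] - half, beta_hat[j] + half)   # beta_hat_j +/- t S sqrt([(X'X)^{-1}]_{jj})
```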
2. Confidence ellipsoids
The situation is more interesting if we wish to say something about more
than one parameter at the same time. For example, suppose we want
to know about (β1 , β2 ) in the overly-simplified case that σ = 1. The
general philosophy of confidence intervals [for univariate parameters]
suggests that we look for a random set Ω such that
\[
P\left\{ \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} - \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \in \Omega \right\} = 1 - \alpha.
\]
And we know that
\[
\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},\ \Sigma \right),
\]
where Σ is a 2 × 2 matrix with
\[
\Sigma_{i,j} = \left[(X'X)^{-1}\right]_{i,j} \qquad (i, j = 1, 2).
\]
Here is a possible method: Let
\[
\hat\gamma := \begin{pmatrix} \hat\beta_1 - \beta_1 \\ \hat\beta_2 - \beta_2 \end{pmatrix}, \qquad \text{so that} \qquad \hat\gamma \sim N_2(0, \Sigma).
\]
Also, recall that
\[
\hat\gamma'\Sigma^{-1}\hat\gamma \sim \chi^2_2.
\]
Therefore, one natural choice for Ω is
\[
\Omega := \left\{ z \in \mathbf{R}^2 : z'\Sigma^{-1}z \le \chi^2_2(\alpha) \right\}.
\]
What does Ω look like? In order to answer this, let us apply the spectral
theorem:
\[
\Sigma = PDP', \qquad \text{so that} \qquad \Sigma^{-1} = PD^{-1}P'.
\]
Then we can represent Ω as follows:
\begin{align*}
\Omega &= \left\{ z \in \mathbf{R}^2 : z'PD^{-1}P'z \le \chi^2_2(\alpha) \right\} \\
&= \left\{ z \in \mathbf{R}^2 : (P'z)'D^{-1}(P'z) \le \chi^2_2(\alpha) \right\} \\
&= P\left\{ y \in \mathbf{R}^2 : y'D^{-1}y \le \chi^2_2(\alpha) \right\} \\
&= P\left\{ y \in \mathbf{R}^2 : \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} \le \chi^2_2(\alpha) \right\},
\end{align*}
where λ1 and λ2 denote the eigenvalues of Σ [the diagonal entries of D].
Consider
\[
E := \left\{ y \in \mathbf{R}^2 : \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} \le \chi^2_2(\alpha) \right\}. \tag{1}
\]
This is the interior of an ellipsoid, and Ω = PE is the image of the
ellipsoid under the "linear orthogonal map" P. Such sets are called
"generalized ellipsoids," and we have found a (1 − α) × 100% confidence
[generalized] ellipsoid for (β1 , β2 )′.
The preceding can be generalized to any number of the parameters
$\beta_{i_1}, \dots, \beta_{i_k}$, but is hard to work with, as the geometry of Ω can be complicated [particularly if k > 2]. Therefore, instead we might wish to look
for approximate confidence sets that are easier to work with. Before we
move on though, let me mention that if you want to test whether or
not $H_0: \beta_1 = \beta_{1,0},\ \beta_2 = \beta_{2,0}$ holds, then we can use these confidence bounds
fairly easily, since it is not hard to check whether or not $(\hat\beta_1 - \beta_{1,0},\ \hat\beta_2 - \beta_{2,0})'$ is in
Ω: You simply compute the scalar quantity
\[
\begin{pmatrix} \hat\beta_1 - \beta_{1,0} \\ \hat\beta_2 - \beta_{2,0} \end{pmatrix}' \Sigma^{-1} \begin{pmatrix} \hat\beta_1 - \beta_{1,0} \\ \hat\beta_2 - \beta_{2,0} \end{pmatrix}
\]
and check to see if it is ≤ χ²₂(α)! But if you really need to imagine
or see the confidence set[s], then this exact method can be unwieldy
[particularly in higher dimensions than 2].
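A quick numerical sketch of this acceptance check, with a made-up 2 × 2 matrix Σ and made-up estimates (σ = 1, as above); everything here is illustrative, not data from the notes:

```python
# Sketch: is the hypothesized point inside the confidence ellipsoid Omega?
# Sigma and the estimates below are made-up illustrative numbers (sigma = 1).
import numpy as np
from scipy import stats

Sigma = np.array([[0.05, 0.02],
                  [0.02, 0.08]])       # relevant 2x2 block of (X'X)^{-1}
beta_hat = np.array([1.3, -0.4])       # (beta_hat_1, beta_hat_2)
beta_null = np.array([1.0, -0.5])      # hypothesized (beta_{1,0}, beta_{2,0})

gamma = beta_hat - beta_null
stat = gamma @ np.linalg.solve(Sigma, gamma)  # gamma' Sigma^{-1} gamma
inside = stat <= stats.chi2.ppf(0.95, df=2)   # compare with the upper 5% point of chi^2_2
```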
3. Bonferroni bounds
Our approximate confidence intervals are based on a fact from general
probability theory.
Proposition 1 (Bonferroni's inequality). Let E1 , . . . , Ek be k events. Then,
\[
P\left( E_1 \cap \cdots \cap E_k \right) \ge 1 - \sum_{i=1}^k P(E_i^c) = 1 - \sum_{i=1}^k \left[ 1 - P(E_i) \right].
\]
Proof. The event $E_1^c \cup \cdots \cup E_k^c$ is the complement of $E_1 \cap \cdots \cap E_k$. Therefore,
\[
P\left( E_1 \cap \cdots \cap E_k \right) = 1 - P\left( E_1^c \cup \cdots \cup E_k^c \right),
\]
and this is $\ge 1 - \sum_{i=1}^k P(E_i^c)$ because the probability of a union is at most
the sum of the individual probabilities. □
Here is how we can use Bonferroni's inequality. Define
\[
C_j := \hat\beta_j \pm t^{(\alpha/4)}_{n-p}\, S\sqrt{\left[(X'X)^{-1}\right]_{j,j}} \qquad (j = 1, 2).
\]
We have seen already that
\[
P\{\beta_j \in C_j\} = 1 - \frac{\alpha}{2}.
\]
[This is why we used α/4 in the definition of Cj .] Therefore, Bonferroni's
inequality implies that
\[
P\{\beta_1 \in C_1,\ \beta_2 \in C_2\} \ge 1 - \left( \frac{\alpha}{2} + \frac{\alpha}{2} \right) = 1 - \alpha.
\]
In other words, (C1 , C2 ) forms a "conservative" (1 − α) × 100% simultaneous confidence interval for (β1 , β2 ). This method quickly becomes very inaccurate as the number of parameters of interest grows. For instance,
if you want Bonferroni confidence sets for (β1 , β2 , β3 ), then you need to
use individual confidence intervals with confidence level 1 − (α/3) each.
And for k parameters you need individual confidence level 1 − (α/k),
which can yield bad performance when k is large.
This method is easy to implement, but usually very conservative.1
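A sketch of the Bonferroni recipe on simulated data (data and names are ours): with k = 2 intervals, each individual interval uses the upper α/4 point of $t_{n-p}$:

```python
# Sketch: Bonferroni simultaneous intervals for two coefficients (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)

alpha, idx = 0.05, [1, 2]            # simultaneous intervals for beta_1, beta_2
k = len(idx)
t_crit = stats.t.ppf(1 - alpha / (2 * k), df=n - p)  # individual level 1 - alpha/k
cis = {j: (beta_hat[j] - t_crit * np.sqrt(S2 * XtX_inv[j, j]),
           beta_hat[j] + t_crit * np.sqrt(S2 * XtX_inv[j, j]))
       for j in idx}
```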
4. Scheffé’s simultaneous conservative confidence bounds
There is a lovely method, due to Scheffé, that works in a similar fashion
to the Bonferroni method, but also has the advantage of often being
[far] less conservative! [We are now talking strictly about our linear
model.] The starting point of this discussion is a general fact from matrix
analysis.
Proposition 2 (The Rayleigh–Ritz inequality). If Lp×p is positive definite, then for every p-vector x,
\[
x'L^{-1}x = \max_{a \ne 0} \frac{(a'x)^2}{a'La}.
\]
The preceding is called an inequality because it says that
\[
\frac{(a'x)^2}{a'La} \le x'L^{-1}x \qquad \text{for all } a_{p\times 1} \ne 0,
\]
and it also tells us that the inequality is achieved for some a ∈ R^p.
1On the other hand, the Bonferroni method can be applied to a wide range of statistical problems
that involve simultaneous confidence intervals [and is not limited to the theory of linear models].
So it is well worth your while to understand this important method.
Proof of the Rayleigh–Ritz inequality. Recall the Cauchy–Schwarz inequality from your linear algebra course: $(a'b)^2 \le \|a\|^2 \cdot \|b\|^2$, with equality if
and only if $a = cb$ for some $c \in \mathbf{R}$. It follows then that
\[
\max_{a \ne 0} \frac{(a'b)^2}{\|a\|^2} = \|b\|^2.
\]
Now write $L = PDP'$ in spectral form, and change variables: $b := L^{1/2}a$,
in order to see that $a'x = b'L^{-1/2}x$ and $a'La = \|b\|^2$. Therefore,
\[
x'L^{-1}x = \left\| L^{-1/2}x \right\|^2 = \max_{b \ne 0} \frac{\left( b'L^{-1/2}x \right)^2}{\|b\|^2} = \max_{a \ne 0} \frac{(a'x)^2}{a'La}.
\]
This does the job. □
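The identity is also easy to check numerically. In the sketch below (an arbitrary made-up positive-definite L and vector x), random directions a never beat the bound, and a = L⁻¹x attains it:

```python
# Numerical sanity check of Rayleigh-Ritz: (a'x)^2 / (a'La) never exceeds
# x' L^{-1} x, and the choice a = L^{-1} x attains the maximum.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
L = A @ A.T + 3.0 * np.eye(3)       # an arbitrary positive-definite 3x3 matrix
x = rng.normal(size=3)

bound = x @ np.linalg.solve(L, x)   # x' L^{-1} x

def ratio(a):
    return (a @ x) ** 2 / (a @ L @ a)

vals = [ratio(rng.normal(size=3)) for _ in range(1000)]  # random directions a
attained = ratio(np.linalg.solve(L, x))                  # the maximizer a = L^{-1} x
```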
Now let us return to Scheffé’s method. Recall that
\[
\hat\beta - \beta \sim N_p\left( 0,\ \sigma^2 (X'X)^{-1} \right),
\]
so that
\[
\frac{1}{\sigma^2} \left( \hat\beta - \beta \right)'(X'X)\left( \hat\beta - \beta \right) \sim \chi^2_p,
\]
and hence
\[
\frac{\left( \hat\beta - \beta \right)'(X'X)\left( \hat\beta - \beta \right)}{pS^2}
= \frac{\left( \hat\beta - \beta \right)'(X'X)\left( \hat\beta - \beta \right)/p}{S^2}
\sim F_{p,n-p}.
\]
We apply the Rayleigh–Ritz inequality with x := β̂ − β and L := (X'X)⁻¹
in order to find that
� �
��2 
� β̂ − β
�
��
�
�
�


β̂ − β (X� X) β̂ − β = max 
�
��=0
�� (X� X)−1 �
Therefore,


� �
��2 


�
 1

 � β̂ − β

P
max
≤
F
(α)
= 1 − α�


���−�
2


�� (X� X)−1 �

 �S ��=0
Equivalently,
\[
P\left\{ \frac{\left| a'\hat\beta - a'\beta \right|}{\sqrt{a'(X'X)^{-1}a}} \le \sqrt{pS^2\, F_{p,n-p}(\alpha)} \ \text{ for all } a \in \mathbf{R}^p \right\} = 1 - \alpha.
\]
If we restrict attention to a subcollection of a's, then the probability is
even larger. In particular, consider only a's that are the standard basis
vectors of R^p, in order to deduce from the preceding that
\[
P\left\{ \frac{\left| \hat\beta_j - \beta_j \right|}{\sqrt{\left[(X'X)^{-1}\right]_{j,j}}} \le \sqrt{pS^2\, F_{p,n-p}(\alpha)} \ \text{ for all } 1 \le j \le p \right\} \ge 1 - \alpha.
\]
In other words, we have demonstrated the following.
Theorem 3 (Scheffé). The following are conservative (1 − α) × 100%
simultaneous confidence bounds for (β1 , . . . , βp )′:
\[
\hat\beta_j \pm \sqrt{pS^2 \left[(X'X)^{-1}\right]_{j,j}\, F_{p,n-p}(\alpha)} \qquad (1 \le j \le p).
\]
When the sample size n is very large, the preceding yields an asymptotic simplification. Recall that χ²_{n−p}/(n − p) → 1 as n → ∞. Therefore, F_{p,n−p}(α) ≈ χ²_p(α)/p for n ≫ 1, and the following are
asymptotically-conservative simultaneous confidence bounds:
\[
\hat\beta_j \pm S\sqrt{\left[(X'X)^{-1}\right]_{j,j}\, \chi^2_p(\alpha)} \qquad (1 \le j \le p) \ \text{ for } n \gg 1.
\]
5. Confidence bounds for the regression surface
Given a vector x ∈ R^p of predictor variables, our linear model yields
EY = x'β. In other words, we can view our efforts as an attempt to
understand the unknown function
\[
r(x) := x'\beta.
\]
And among other things, we have found the following estimator for r:
\[
\hat{r}(x) := x'\hat\beta.
\]
Note that if x ∈ R^p is held fixed, then
\[
\hat{r}(x) \sim N\left( r(x),\ \sigma^2\, x'(X'X)^{-1}x \right).
\]
Therefore, a (1 − α) × 100% confidence interval for r(x) [for a fixed
predictor variable x] is
\[
\hat{r}(x) \pm S\sqrt{x'(X'X)^{-1}x}\ t^{(\alpha/2)}_{n-p}.
\]
That is, if we are interested in a confidence interval for r(x) for a fixed
x, then we have
\[
P\left\{ r(x) = \hat{r}(x) \pm S\sqrt{x'(X'X)^{-1}x}\ t^{(\alpha/2)}_{n-p} \right\} = 1 - \alpha.
\]
On the other hand, we can also apply Scheffé’s method and obtain the
following simultaneous (1 − α) × 100% confidence set:
\[
P\left\{ r(x) = \hat{r}(x) \pm S\sqrt{p\, x'(X'X)^{-1}x\, F_{p,n-p}(\alpha)} \ \text{ for all } x \in \mathbf{R}^p \right\} \ge 1 - \alpha.
\]
Example 4 (Simple linear regression). Consider the basic regression
model
\[
Y_i = \alpha + \beta x_i + \varepsilon_i \qquad (1 \le i \le n).
\]
If x = (1 , x)′, then a quick computation yields
\[
x'(X'X)^{-1}x = \frac{\overline{x^2} - 2x\bar{x} + x^2}{s_x^2},
\]
where $\overline{x^2} := n^{-1}\sum_{i=1}^n x_i^2$ and we recall that $s_x^2 := \sum_{i=1}^n (x_i - \bar{x})^2$. Therefore, a simultaneous confidence interval for all of the α + βx's, as x ranges over R, is
\[
\hat\alpha + \hat\beta x \pm \sqrt{S^2\, \frac{\overline{x^2} - 2x\bar{x} + x^2}{s_x^2}\ 2F_{2,n-2}(\alpha)}.
\]
This expression can be simplified further because
\[
\overline{x^2} - 2x\bar{x} + x^2 = \overline{x^2} - (\bar{x})^2 + (\bar{x})^2 - 2x\bar{x} + x^2 = \frac{s_x^2}{n} + (\bar{x} - x)^2.
\]
Therefore, with probability 1 − α, we have
\[
\alpha + \beta x = \hat\alpha + \hat\beta x \pm \sqrt{S^2 \left( \frac{1}{n} + \frac{(\bar{x} - x)^2}{s_x^2} \right) 2F_{2,n-2}(\alpha)} \qquad \text{for all } x.
\]
On the other hand, if we want a confidence interval for α + βx for a
fixed x, then we can do better using our t-test:
\[
\alpha + \beta x = \hat\alpha + \hat\beta x \pm S\sqrt{\frac{1}{n} + \frac{(\bar{x} - x)^2}{s_x^2}}\ t^{(\alpha/2)}_{n-2}.
\]
You should check the details of this computation. □
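The pointwise and simultaneous bands of this example can be compared numerically. In this sketch (simulated data; names are ours), the Scheffé half-width exceeds the pointwise t half-width at every x on the grid:

```python
# Sketch: pointwise t interval vs. simultaneous Scheffe band for alpha + beta*x
# in simple linear regression (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, level_alpha = 30, 0.05
x = rng.uniform(0.0, 10.0, size=n)
Y = 2.0 + 0.7 * x + rng.normal(size=n)

xbar = x.mean()
sx2 = np.sum((x - xbar) ** 2)                   # s_x^2
beta_hat = np.sum((x - xbar) * Y) / sx2         # slope estimate
alpha_hat = Y.mean() - beta_hat * xbar          # intercept estimate
S2 = np.sum((Y - alpha_hat - beta_hat * x) ** 2) / (n - 2)

grid = np.linspace(0.0, 10.0, 5)
se = np.sqrt(S2 * (1.0 / n + (grid - xbar) ** 2 / sx2))
point_half = stats.t.ppf(1 - level_alpha / 2, df=n - 2) * se           # fixed x
simul_half = np.sqrt(2 * stats.f.ppf(1 - level_alpha, 2, n - 2)) * se  # all x at once
```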
6. Prediction intervals
The difference between confidence and prediction intervals is this: For
a confidence interval we try to find an interval around the parameter
x'β. For a prediction interval we do so for the random variable Y₀ :=
x₀'β + ε₀, where x₀ is known and fixed and ε₀ is the "noise," which is
hitherto unobserved (i.e., independent of the vector Y of observations).
It is not hard to construct a good prediction interval of this type: Note
that
\[
\hat{Y}_0 := x_0'\hat\beta
\]
satisfies
\[
\hat{Y}_0 - Y_0 = x_0'\left( \hat\beta - \beta \right) - \varepsilon_0 \sim N\left( 0,\ \sigma^2\left[ x_0'(X'X)^{-1}x_0 + 1 \right] \right).
\]
Therefore, a prediction interval is
\[
\hat{Y}_0 \pm S\, t^{(\alpha/2)}_{n-p} \sqrt{x_0'(X'X)^{-1}x_0 + 1}.
\]
This means that
\[
P\left\{ Y_0 \in \hat{Y}_0 \pm S\, t^{(\alpha/2)}_{n-p} \sqrt{x_0'(X'X)^{-1}x_0 + 1} \right\} = 1 - \alpha,
\]
but note that both Y₀ and the prediction interval are now random.
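A sketch of this prediction interval on simulated data (data and names are ours); note the extra "+1" term contributed by ε₀:

```python
# Sketch: a (1 - alpha) prediction interval for Y_0 at a new predictor x_0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p, alpha = 50, 2, 0.05
X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)

x0 = np.array([1.0, 2.5])               # new, fixed predictor vector
y0_hat = x0 @ beta_hat
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
half = t_crit * np.sqrt(S2 * (x0 @ XtX_inv @ x0 + 1.0))  # "+1" accounts for eps_0
pi = (y0_hat - half, y0_hat + half)
```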