Reversals of Least-Squares Estimates

Brian Knaeble
University of Utah
June 2013

Abstract

An adjusted estimate may be opposite the original estimate. This paper presents necessary and sufficient conditions for such a reversal, in the context of linear modeling, where adjustment is obtained through considering additional explanatory data associated with lurking variables.

Keywords: Yule–Simpson effect, Thales' theorem, Multiple regression, Confounding

1 Introduction

The nature of an association between real-valued random variables $X_1$ and $Y$ can be crudely estimated by observing vectors of data, $x_1$ and $y$, and then computing a correlation coefficient $r(x_1, y)$. In general, however, such an estimate is susceptible to confounding: additional data, $x_2$, may lead to an adjusted estimate that is opposite the original estimate.

In this paper we study such reversals in the context of least-squares fitting of linear models. Given $x_1$, $y$, and the fitted coefficient $\hat\beta_1 \neq 0$, we seek necessary and sufficient conditions on $x_2$ so that the adjusted estimate, denoted ${}_{2,1}\hat\beta_1$, is opposite the original estimate $\hat\beta_1$, which we henceforth denote ${}_{1}\hat\beta_1$. Throughout this paper, left subscripts indicate the explanatory variables used to fit a model. We also assume throughout that all vectors of data are of equal length.

Visual intuition suggests that a reversal occurs if and only if the vector $x_2$ lies "between" the vectors $x_1$ and $y$. Geometric analysis in higher dimensions then leads to the conclusion that this region is bounded by a right circular conical surface of two nappes (this geometric object is described in detail in Section 2). Such an understanding leads quickly to useful results.

The results are stated using "sign", where the sign of a positive real number is one, the sign of a negative real number is negative one, and the sign of zero is undefined; the truth of $\operatorname{sign}({}_{2,1}\hat\beta_1) \neq \operatorname{sign}({}_{1}\hat\beta_1)$ entails that both quantities are defined.
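The notion of a reversal can be made concrete with a small numerical sketch. The following is illustrative only: it assumes NumPy, assumes that every fitted model includes an intercept (which corresponds to working with centered data), and uses a hand-made data set and a hypothetical helper name, `fitted_coefficient`, none of which come from the paper.

```python
import numpy as np

def fitted_coefficient(y, *xs):
    """Least-squares coefficient on xs[0] when y is regressed on an
    intercept and the explanatory vectors xs (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # index 0 is the intercept, index 1 belongs to xs[0]

# Hand-made data: two groups (encoded by x2) with a negative slope
# inside each group but a positive slope when the groups are pooled.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = np.array([3.0, 2.0, 1.0, 6.0, 5.0, 4.0])

b_orig = fitted_coefficient(y, x1)      # original estimate, fitted with x1 only
b_adj = fitted_coefficient(y, x1, x2)   # adjusted estimate, fitted with x1 and x2

# A reversal means both signs are defined and opposite.
reversal = np.sign(b_orig) * np.sign(b_adj) == -1
```

For this data set the original coefficient is 19/35 and the adjusted coefficient is -1, so `reversal` is true.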
Note also that $r(y, x_1 + y) = r(x_1, x_1 + y)$.

Theorem 1.1. When $r(y, x_1) \neq 0$, a reversal, $\operatorname{sign}({}_{2,1}\hat\beta_1) \neq \operatorname{sign}({}_{1}\hat\beta_1)$, can occur only if $|r(x_2, y)|$ and $|r(x_2, x_1)|$ are both greater than $|r(x_1, y)|$.

Theorem 1.2. When $r(y, x_1) \neq 0$, a reversal, $\operatorname{sign}({}_{2,1}\hat\beta_1) \neq \operatorname{sign}({}_{1}\hat\beta_1)$, occurs if and only if $|r(x_2, x_1 + y)| > |r(y, x_1 + y)|$.

These results can be generalized so as to regard the robustness of an (already) adjusted estimate, against confounding due to multiple lurking variables. See Theorems 5.1 and 5.2, in Section 5. In particular, a special case of Theorem 5.1 gives necessary and sufficient conditions for the Yule–Simpson effect. See [14], [5] and [7] (Section 2) for an overview of the Yule–Simpson effect.

Mathematics relating to reversals appears in literature from the 1950s regarding smoking and lung cancer (see [2]), and there remains potential for application of such mathematics within epidemiology today: see [10] and [11] for educational articles, and any of [3], [6], [12], [8] or [1] for scientific papers. For related mathematical papers see [13], [9] or [4].

2 Geometry

This section presents the geometric foundation for the proofs of Theorems 1–4. Notation has been chosen so as to best convey the ideas behind the proofs. Readily-verified lemmas are stated without proof.

2.1 Angles and Objects

Definition 2.1. Let $u_1$ and $u_2$ be two nonzero vectors within an inner product space. Define the angle between them, $\theta$, to be
$$\theta(u_1, u_2) = \cos^{-1}\left(\frac{\langle u_1, u_2\rangle}{|u_1|\,|u_2|}\right).$$

Definition 2.2. Let $u \in \mathbb{R}^p$ be a generic vector with coordinates $(u_1, u_2, \ldots, u_p)$. For a fixed $\theta$, with $0 < \theta < \pi/2$, define the following geometric objects:
$$V_\theta := \left\{u \in \mathbb{R}^p : \frac{\sqrt{u_1^2 + u_2^2 + \cdots + u_{p-1}^2}}{|u_p|} \le \tan(\theta/2)\right\},$$
$$B_\theta := \left\{u \in \mathbb{R}^p : u_p = \pm\cos(\theta/2),\ u_1^2 + u_2^2 + \cdots + u_{p-1}^2 \le \sin^2(\theta/2)\right\}.$$
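Definition 2.2 translates directly into membership tests. The sketch below is a minimal illustration, assuming NumPy; the function names `in_V` and `in_B` and the tolerance `tol` are hypothetical and are not part of the paper.

```python
import numpy as np

def in_V(u, theta):
    """Membership in V_theta: (norm of first p-1 coordinates) / |u_p| <= tan(theta/2)."""
    u = np.asarray(u, dtype=float)
    if u[-1] == 0.0:
        return False  # the defining ratio is undefined on the hyperplane u_p = 0
    return bool(np.linalg.norm(u[:-1]) / abs(u[-1]) <= np.tan(theta / 2))

def in_B(u, theta, tol=1e-12):
    """Membership in B_theta: u_p = +/- cos(theta/2) and radial part within sin(theta/2).
    The tolerance tol for the plane condition is an assumption of this sketch."""
    u = np.asarray(u, dtype=float)
    on_plane = abs(abs(u[-1]) - np.cos(theta / 2)) <= tol
    return bool(on_plane and np.sum(u[:-1] ** 2) <= np.sin(theta / 2) ** 2)

theta = np.pi / 3
axis_up = in_V([0.0, 0.0, 1.0], theta)      # on the axis, upper nappe
axis_down = in_V([0.0, 0.0, -1.0], theta)   # on the axis, lower nappe
far_out = in_V([1.0, 0.0, 0.1], theta)      # nearly horizontal, outside the cone
disk_center = in_B([0.0, 0.0, np.cos(theta / 2)], theta)  # center of the upper disk
```

Both nappes are captured by the absolute value on the last coordinate, which is why `axis_up` and `axis_down` agree.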
The boundary of $V_\theta$ is a right circular conical surface of two nappes, and $B_\theta$ is the intersection of $V_\theta$ with the union of two affine planes: with $A^+ = \{(u_1, u_2, \ldots, u_p) \in \mathbb{R}^p : u_p = \cos(\theta/2)\} \cong \mathbb{R}^{p-1}$ and $A^- = \{(u_1, u_2, \ldots, u_p) \in \mathbb{R}^p : u_p = -\cos(\theta/2)\} \cong \mathbb{R}^{p-1}$, define $B_\theta^+ = V_\theta \cap A^+$ and $B_\theta^- = V_\theta \cap A^-$, so that $B_\theta = B_\theta^+ \cup B_\theta^-$. Similarly, let $V_\theta^+$ and $V_\theta^-$ represent the upper ($u_p > 0$) and lower ($u_p < 0$) portions of $V_\theta$.

We use "$\partial$" to signal the boundary of a set and "$\circ$" to signal the interior of a set. Expressions such as $\partial V_\theta$ and $V_\theta^\circ$ have meanings that are readily understood. For $B_\theta^+$, however, since it is embedded within $A^+$, which itself is embedded within $\mathbb{R}^p$, we should specify that the topological operations are carried out within $A^+$, and likewise for $B_\theta^- \subset A^-$ and $B_\theta \subset A$, where $A = A^+ \cup A^-$. Thus, $\partial B_\theta = A \cap \partial V_\theta$ and $B_\theta^\circ = A \cap V_\theta^\circ$, etc.

The boundary $\partial V_\theta$ separates $\mathbb{R}^p$ into the complement of $V_\theta$, namely $V_\theta^c$, and the interior of $V_\theta$, $V_\theta^\circ$. Furthermore, the interior $V_\theta^\circ$ is separated by the origin into what we will call $V_\theta^{\circ+}$ and $V_\theta^{\circ-}$. In summary, the complement of the boundary of $V_\theta$ consists of three separate regions: $(\partial V_\theta)^c = V_\theta^c \cup V_\theta^{\circ+} \cup V_\theta^{\circ-}$.

2.2 Projections

Definition 2.3. Given an affine subspace $L \subseteq \mathbb{R}^d$, consider $u \in \mathbb{R}^d$, $u \notin L$. The projection of $u$ onto $L$ is
$$p_L(u) = \operatorname*{argmin}_{l \in L} |u - l|.$$

Lemma 2.1. Given an affine subspace $L \subseteq \mathbb{R}^d$, consider $u \in \mathbb{R}^d$, $u \notin L$. A point $p \in L$ is $p_L(u)$ if and only if for all $l \in L$ we have $(l - p) \perp (u - p)$.

Definition 2.4. Given a set of vectors, $\{x_k\}_{k \in K}$, define the vector subspace ${}_{K}L = \operatorname{span}(\{x_k\}_{k \in K})$.

Lemma 2.2. Consider a vector subspace ${}_{K}L \subseteq \mathbb{R}^d$ and a vector $y \in \mathbb{R}^d$, $y \notin {}_{K}L$. If $q \in \mathbb{R}^d$ and $q \notin {}_{K}L$, then $|y - p_{{}_{q,K}L}(y)| < |y - p_{{}_{K}L}(y)|$.

Proposition 2.1. Set $e_d = (0, 0, \ldots, 1) \in \mathbb{R}^d$. Consider $y \in \mathbb{R}^d$ and $x_1 \in \mathbb{R}^d$ with $\langle y, e_d\rangle > 0$ and $\langle x_1, e_d\rangle > 0$. If a nonzero $x_2 \in \mathbb{R}^d$ satisfies $x_2 \perp e_d$, then $\langle p_{{}_{2,1}L}(y), e_d\rangle \neq 0$.

Proof. Define $E_d = \{u \in \mathbb{R}^d : u \perp e_d\}$ so that $y \notin E_d$, $x_1 \notin E_d$ and $x_2 \in E_d$.
Observe that $\operatorname{span}(\{x_1, x_2\}) \cap E_d$ is one-dimensional, and thus by Lemma 2.1, $\langle p_{{}_{2,1}L}(y), e_d\rangle = 0 \implies p_{{}_{2,1}L}(y) = p_{{}_{2}L}(y)$. This, however, contradicts the strict inequality in Lemma 2.2, so we reject $\langle p_{{}_{2,1}L}(y), e_d\rangle = 0$ and conclude $\langle p_{{}_{2,1}L}(y), e_d\rangle \neq 0$.

Theorem 2.1 (Thales' theorem). Consider the geometric sphere
$$S_r^m = \{(u_1, u_2, \ldots, u_m) \in \mathbb{R}^m : u_1^2 + u_2^2 + \cdots + u_m^2 = r^2 > 0\}.$$
For three points on $S_r^m$, namely $t$, its opposite $-t$, and another point $s$, the following holds: $(s - t) \perp (s - (-t))$.

Thales' theorem is typically proved for $m = 2$ using elementary geometry of angles and triangles; the higher-dimensional version that we have stated follows since $\operatorname{span}(\{t, -t, s\}) \cong \mathbb{R}^2$ and $(\operatorname{span}(\{t, -t, s\}) \cap S_r^m) \cong S_r^2$. Note that the theorem also holds for lower-dimensional ($q < m$) embedded spheres ($S_r^q \hookrightarrow \mathbb{R}^m$) that are not necessarily centered at the origin. For example, the theorem holds when applied on $\partial B_\theta$.

3 Zeros

This section builds on the development of the previous section. Fixing $y$ and $x_1$ within $\mathbb{R}^n$, we characterize the level set $\{x_2 \in \mathbb{R}^n : {}_{2,1}\hat\beta_1 = 0\}$, as a step toward identification of those $x_2$ that are associated with reversals. The analysis proceeds cleanly if we assume centered (mean zero) vectors of data. Tildes indicate centered data.

3.1 Space of Centered Vectors

Assume $\tilde y$, $\tilde x_1$ and $\tilde x_2$, each within $\mathbb{R}^n$, with $0 < \theta(\tilde y, \tilde x_1) < \pi/2$. Being centered, each vector is orthogonal to $e = (1, 1, \ldots, 1)$, and thus $\operatorname{span}(\{\tilde y, \tilde x_1, \tilde x_2\})$ is contained within an $(n - 1)$-dimensional vector subspace $E$. Place orthonormal coordinates on $E$ so that $\tilde y/|\tilde y| + \tilde x_1/|\tilde x_1|$ can be expressed as $(0, 0, \ldots, 2\cos(\theta(\tilde y, \tilde x_1)/2))$. Set $p = n - 1$ so as to write $E \cong \mathbb{R}^p$, suggesting and allowing application of the remaining terminology from Section 2. Note that $\tilde y/|\tilde y|$ and $\tilde x_1/|\tilde x_1|$ are opposites on $\partial B^+$, and $\tilde x_2$ is somewhere in $E \setminus 0$.
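Because $0 < \theta(\tilde y, \tilde x_1) < \pi/2$, ${}_{1}\tilde{\hat\beta}_1 > 0$, but depending on the positioning of $\tilde x_2$, ${}_{2,1}\tilde{\hat\beta}_1$ could be positive, negative, or zero.† We specify the "sign" function to be "1" for positives, "−1" for negatives, and undefined for zero, so that $\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1) \neq \operatorname{sign}({}_{1}\tilde{\hat\beta}_1)$ signals a change from a positive to a negative or vice versa: i.e. $\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1) \neq \operatorname{sign}({}_{1}\tilde{\hat\beta}_1) \iff \operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1) + \operatorname{sign}({}_{1}\tilde{\hat\beta}_1) = 0$.

† In general, ${}_{K}\hat\beta_k$ denotes the $k$th coefficient when the projection of $y$ onto the span of the explanatory vectors indexed by $K$ is expressed in terms of those basis vectors.

In case $\tilde x_2/|\tilde x_2| = \pm\tilde x_1/|\tilde x_1|$, think of $\tilde x_2$ as dominating, and define ${}_{2,1}\tilde{\hat\beta}_2$ to be ${}_{2}\tilde{\hat\beta}_2$ while ${}_{2,1}\tilde{\hat\beta}_1 = 0$. For the unrealistic case $\tilde x_2 = 0$, think of arrival along $\operatorname{span}(\{\tilde y\})$, so that by continuity it makes sense to formally define ${}_{2,1}\tilde{\hat\beta}_1 = 0$.

3.2 Geometric Results

Proposition 3.1. Let $\tilde y$ and $\tilde x_1$ be such that $0 < \theta(\tilde y, \tilde x_1) < \pi/2$. For any $\tilde x_2$,
$${}_{2,1}\tilde{\hat\beta}_1 = 0 \iff \tilde x_2 \in \partial V_{\theta(\tilde y, \tilde x_1)}.$$

Proof. First observe that $\tilde x_2 \in \operatorname{span}(\{\tilde x_1\}) \cup \operatorname{span}(\{\tilde y\}) \implies {}_{2,1}\tilde{\hat\beta}_1 = 0$. Also, by Proposition 2.1, ${}_{2,1}\tilde{\hat\beta}_1 \neq 0$ for any $\tilde x_2 \neq 0$ such that $\langle \tilde x_2, (0, 0, \ldots, 1)\rangle = 0$. Furthermore, for $c_1 \neq 0$, the scaling $\tilde x_2 \mapsto c_1\tilde x_2$ leaves the sign of ${}_{2,1}\tilde{\hat\beta}_1$ unchanged. Thus, we assume $\tilde x_2 \in A^+ \setminus \{\tilde y, \tilde x_1\}$.

Additionally, writing $\tilde p$ for the projection of $\tilde y$ onto $\operatorname{span}(\{\tilde x_1, \tilde x_2\})$, note that for $c_2 > 0$ the scaling $\tilde p \mapsto \tilde p^* = c_2\tilde p$ leaves $\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1)$ unchanged, so we may consider ${}_{2,1}\tilde{\hat\beta}_1^*$ (in place of ${}_{2,1}\tilde{\hat\beta}_1$), where ${}_{2,1}\tilde{\hat\beta}_1^*$ is the coefficient of $\tilde x_1$ when $\tilde p^*$ is expressed in the basis $\{\tilde x_1, \tilde x_2\}$.

Also note that $\tilde p^*$, by Lemma 2.1, is the projection of $\tilde y$ onto the space $\{\tilde x_2 + l(\tilde x_1 - \tilde x_2)\}_{l \in \mathbb{R}}$. Denote with $\hat{l}$ the value of the parameter $l$ associated with $\tilde p^*$, and note, since $0 < \theta(\tilde y, \tilde x_1) < \pi/2$, that $\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1^*) = \operatorname{sign}(\hat{l})$.

Finally, thinking of $\hat{l}$ as a function of $\tilde x_2 \in A^+ \setminus \{\tilde y, \tilde x_1\}$, we can conclude that $\hat{l} = 0 \iff \tilde x_2 \in \partial B^+$ by appealing to Thales' theorem.

Proposition 3.2. Let $\tilde y$ and $\tilde x_1$ be such that $0 < \theta(\tilde y, \tilde x_1) < \pi/2$.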
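For any $\tilde x_2$, ${}_{2,1}\tilde{\hat\beta}_1 < 0 \iff \tilde x_2 \in V^\circ_{\theta(\tilde y, \tilde x_1)}$.

Proof. Given our setup, ${}_{2,1}\tilde{\hat\beta}_1$ is a function, $f$, of $\tilde x_2$, and, by Proposition 3.1, the zero set of $f$ is $\partial V_{\theta(\tilde y, \tilde x_1)}$. Also, as mentioned in Section 2.1, $(\partial V_\theta)^c = V_\theta^c \cup V_\theta^{\circ+} \cup V_\theta^{\circ-}$. Since $f$ is continuous, we need only sample one representative point from each of $V_\theta^c$, $V_\theta^{\circ+}$ and $V_\theta^{\circ-}$ in order to determine the sign of ${}_{2,1}\tilde{\hat\beta}_1$ throughout each region. The following points are conveniently selected to avoid projection, and direct computation reveals that $\tilde x_2 = \pm(\tilde y + \tilde x_1) \implies {}_{2,1}\tilde{\hat\beta}_1 < 0$ and $\tilde x_2 = \tilde x_1 - \tilde y \implies {}_{2,1}\tilde{\hat\beta}_1 > 0$.

In case $\pi/2 < \theta(\tilde y, \tilde x_1) < \pi$, replace $\tilde x_1$ with $-\tilde x_1$, so that Proposition 3.2 still applies, effectively extending the range of application and resulting in the following corollary.

Corollary 3.1. Let $\tilde y$ and $\tilde x_1$ be such that $\theta(\tilde y, \tilde x_1) \in (0, \pi/2) \cup (\pi/2, \pi)$. For any $\tilde x_2$,
$$\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1) \neq \operatorname{sign}({}_{1}\tilde{\hat\beta}_1) \iff \tilde x_2 \in V^\circ_{\theta(\tilde y, \tilde x_1)}.$$

4 Correlations

In this section we translate the results of Section 3 into the language of correlations, in order to prove Theorems 1.1 and 1.2. We posit the observation of $y$, $x_1$, and $x_2$, each within $\mathbb{R}^n$. In case $r(y, x_1) = \pm 1$, the truth of the theorems can be verified with the comments written at the end of Section 3.1, and $r(y, x_1) \neq 0$ is assumed in both theorems. Thus, henceforth, we tacitly assume $r(y, x_1) \in (-1, 0) \cup (0, 1)$, ensuring applicability of Corollary 3.1.

Lemma 4.1. Let $u_1$ and $u_2$ be two vectors in $\mathbb{R}^n$, and let their centered versions be denoted with $\tilde u_1$ and $\tilde u_2$. Then
$$\left\langle \frac{\tilde u_1}{|\tilde u_1|}, \frac{\tilde u_2}{|\tilde u_2|} \right\rangle = r(u_1, u_2).$$

Given $y$ and $x_1$, projection determines their centered versions, $\tilde y$ and $\tilde x_1$. Define $\tilde h = \tilde y + \tilde x_1$. Since projection and addition commute, we know that $\tilde h$ is the centered version of $h = y + x_1$. With this notation, Corollary 3.1 asserts that
$$\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1) \neq \operatorname{sign}({}_{1}\tilde{\hat\beta}_1) \iff \left|\left\langle \frac{\tilde x_2}{|\tilde x_2|}, \frac{\tilde h}{|\tilde h|} \right\rangle\right| > \left|\left\langle \frac{\tilde y}{|\tilde y|}, \frac{\tilde h}{|\tilde h|} \right\rangle\right|,$$
and by Lemma 4.1 this becomes $\operatorname{sign}({}_{2,1}\tilde{\hat\beta}_1) \neq \operatorname{sign}({}_{1}\tilde{\hat\beta}_1) \iff |r(x_2, h)| > |r(y, h)|$, thus demonstrating the truth of Theorem 1.2.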
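The two facts used in this translation, Lemma 4.1 and the commuting of centering with addition, can be checked numerically. The following is a minimal sketch assuming NumPy; the helper names `center` and `cos_centered`, the random seed, and the sample size are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def center(u):
    """Centered version of u: subtract the mean, i.e. project onto the
    orthogonal complement of e = (1, 1, ..., 1)."""
    u = np.asarray(u, dtype=float)
    return u - u.mean()

def cos_centered(u1, u2):
    """Inner product of the normalized centered vectors (left side of Lemma 4.1)."""
    c1, c2 = center(u1), center(u2)
    return float(c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2)))

rng = np.random.default_rng(0)      # arbitrary seed, illustrative data only
y, x1 = rng.normal(size=(2, 50))    # two vectors of equal length

# Lemma 4.1: the cosine between centered vectors equals the Pearson correlation.
lemma_gap = abs(cos_centered(y, x1) - np.corrcoef(y, x1)[0, 1])

# Centering commutes with addition: the centered h = y + x1 equals y~ + x1~.
h_gap = float(np.max(np.abs(center(y + x1) - (center(y) + center(x1)))))
```

Both `lemma_gap` and `h_gap` should vanish up to floating-point rounding.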
To prove Theorem 1.1, simply observe that within $A$ the intersection
$$F = \{\tilde r_1 \in A : |\tilde r_1 - \tilde y| < \sin(\theta(\tilde y, \tilde x_1)/2)\} \cap \{\tilde r_2 \in A : |\tilde r_2 - \tilde x_1| < \sin(\theta(\tilde y, \tilde x_1)/2)\}$$
strictly contains $B$. Thus, by Corollary 3.1, the existence of a reversal implies that there exists a $c \neq 0$ such that $c\tilde x_2 \in B^\circ \subset F$. An interpretation of $c\tilde x_2 \in F$ in terms of correlation coefficients produces the conclusion of Theorem 1.1.

5 Generalizations

If we denote with ${}_{J}R$ the positive square root of the coefficient of determination, where $J$ indexes a finite subset of the (perhaps very large) set of all conceivable, explanatory vectors of data, then the following generalization of Theorem 1.2 can be stated.

Theorem 5.1. When $r(y, x_1) \neq 0$, a reversal, $\operatorname{sign}({}_{J,1}\hat\beta_1) \neq \operatorname{sign}({}_{1}\hat\beta_1)$, occurs if and only if ${}_{J}R > |r(y, x_1 + y)|$.

Theorem 5.1 is related to Theorem 1.2 and Proposition 3.2 through the observation that a reversal occurs, in this more general setting, if and only if $p_{{}_{J,1}V}(\tilde y) \in V^\circ_{\theta(\tilde y, \tilde x_1)}$. Alternatively, one can think of reversals as occurring precisely when $\operatorname{span}(\{\tilde x_1, \{\tilde x_j\}_{j \in J}\})$ intersects $V^\circ_{\theta(\tilde y, \tilde x_1)}$.

We can generalize further. Let $I$ index a finite, orthogonal† set of explanatory vectors, and let $J$ index a disjoint, finite set of additional explanatory vectors.

Theorem 5.2.