Reversals of Least-Squares Estimates

Brian Knaeble
University of Utah

June 2013

Abstract

An adjusted estimate may be opposite the original estimate. This paper presents necessary and sufficient conditions for such a reversal, in the context of linear modeling, where adjustment is obtained by considering additional explanatory data associated with lurking variables.

Keywords: Yule–Simpson effect, Thales' theorem, Multiple regression, Confounding

1 Introduction

The nature of an association between real-valued random variables $X_1$ and $Y$ can be crudely estimated by observing vectors of data, $x_1$ and $y$, and then computing a correlation coefficient $r(x_1, y)$. However, in general, such an estimate is susceptible to confounding: additional data, $x_2$, may lead to an adjusted estimate that is opposite the original estimate. In this paper we study such reversals in the context of least-squares fitting of linear models. Given $x_1$, $y$, and the fitted coefficient $\hat{\beta}_1 \neq 0$, we seek necessary and sufficient conditions on $x_2$ so that the adjusted estimate, denoted ${}_{2,1}\hat{\beta}_1$, is opposite the original estimate $\hat{\beta}_1$, which we henceforth denote with ${}_{1}\hat{\beta}_1$. Throughout this paper, left subscripts indicate the explanatory variables used to fit a model. Note also that we always assume that all vectors of data are of equal length (i.e., the same number of observations).

Visual intuition suggests that a reversal occurs if and only if the vector $x_2$ lies "between" the vectors $x_1$ and $y$. Geometric analysis in higher dimensions then leads to the conclusion that this region is bounded by a right circular, conical surface of two nappes (this geometric object is described in detail in Section 2). Such an understanding leads quickly to useful results. The results are stated using "sign", where the sign of a positive real number is one, the sign of a negative real number is negative one, and the sign of zero remains undefined; the truth of $\mathrm{sign}({}_{2,1}\hat{\beta}_1) \neq \mathrm{sign}({}_{1}\hat{\beta}_1)$ entails that both quantities are defined. Note also that $r(y, x_1 + y) = r(x_1, x_1 + y)$.

Theorem 1.1. When $r(y, x_1) \neq 0$, a reversal, $\mathrm{sign}({}_{2,1}\hat{\beta}_1) \neq \mathrm{sign}({}_{1}\hat{\beta}_1)$, can occur only if $|r(x_2, y)|$ and $|r(x_2, x_1)|$ are both greater than $|r(x_1, y)|$.

Theorem 1.2. When $r(y, x_1) \neq 0$, a reversal, $\mathrm{sign}({}_{2,1}\hat{\beta}_1) \neq \mathrm{sign}({}_{1}\hat{\beta}_1)$, occurs if and only if $|r(x_2, x_1 + y)| > |r(y, x_1 + y)|$.

These results can be generalized so as to regard the robustness of an (already) adjusted estimate against confounding due to multiple lurking variables. See Theorems 5.1 and 5.2 in Section 5. In particular, a special case of Theorem 5.1 gives necessary and sufficient conditions for the Yule–Simpson effect. See [14], [5] and [7] (Section 2) for an overview of the Yule–Simpson effect. Mathematics relating to reversals appears in literature from the 1950s regarding smoking and lung cancer (see [2]), and there remains potential for application of such mathematics within epidemiology today: see [10] and [11] for educational articles, and any of [3], [6], [12], [8] or [1] for scientific papers. For related mathematical papers see [13], [9] or [4].

2 Geometry

This section presents the geometric foundation for the proofs of Theorems 1.1, 1.2, 5.1 and 5.2. Notation has been chosen so as to best convey the ideas behind the proofs. Readily verified lemmas are stated without proof.

2.1 Angles and Objects

Definition 2.1. Let $u_1$ and $u_2$ be two nonzero vectors within an inner product space. Define the angle between them to be
$$\theta(u_1, u_2) = \cos^{-1}\!\left(\frac{\langle u_1, u_2 \rangle}{|u_1|\,|u_2|}\right).$$
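As a computational aside, Definition 2.1 can be evaluated directly, and for centered vectors of data the cosine of this angle coincides with the sample correlation coefficient (a fact made precise in Lemma 4.1 below). The following minimal sketch assumes NumPy; the data and names are illustrative only.

```python
import numpy as np

def angle(u1, u2):
    """The angle theta(u1, u2) of Definition 2.1, in radians."""
    c = np.dot(u1, u2) / (np.linalg.norm(u1) * np.linalg.norm(u2))
    return np.arccos(np.clip(c, -1.0, 1.0))  # clip guards against round-off

# Illustrative data: for centered vectors, cos(theta) equals the correlation r.
rng = np.random.default_rng(0)
y = rng.normal(size=20)
x1 = 0.5 * y + rng.normal(size=20)
yc, x1c = y - y.mean(), x1 - x1.mean()                    # centered versions
print(np.cos(angle(yc, x1c)), np.corrcoef(x1, y)[0, 1])   # these agree
```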
Definition 2.2. Let $u \in \mathbb{R}^p$ be a generic vector with coordinates $(u_1, u_2, \ldots, u_p)$. For a fixed $\theta$, with $0 < \theta < \pi/2$, define the following geometric objects:
$$V_\theta := \left\{ u \in \mathbb{R}^p : \frac{\sqrt{u_1^2 + u_2^2 + \cdots + u_{p-1}^2}}{|u_p|} \le \tan(\theta/2) \right\},$$
$$B_\theta := \left\{ u \in \mathbb{R}^p : u_p = \pm\cos(\theta/2),\ u_1^2 + u_2^2 + \cdots + u_{p-1}^2 \le \sin^2(\theta/2) \right\}.$$

The boundary of $V_\theta$ is a right circular, conical surface of two nappes, and $B_\theta$ is the intersection of $V_\theta$ with the union of two affine planes: with $A^+ = \{(u_1, u_2, \ldots, u_p) \in \mathbb{R}^p : u_p = \cos(\theta/2)\} \cong \mathbb{R}^{p-1}$ and $A^- = \{(u_1, u_2, \ldots, u_p) \in \mathbb{R}^p : u_p = -\cos(\theta/2)\} \cong \mathbb{R}^{p-1}$, define $B_\theta^+ = V_\theta \cap A^+$ and $B_\theta^- = V_\theta \cap A^-$, so that $B_\theta = B_\theta^+ \cup B_\theta^-$. Similarly, define $V_\theta^+$ and $V_\theta^-$ to represent the upper ($u_p > 0$) and lower ($u_p < 0$) portions of $V_\theta$.

We use "$\partial$" to signal the boundary of a set and "$\circ$" to signal the interior of a set. Expressions such as $\partial V_\theta$ and $V_\theta^\circ$ have meanings that are readily understood. For $B_\theta^+$, however, since it is embedded within $A^+$, which itself is embedded within $\mathbb{R}^p$, we should specify that the topological operations are carried out within $A^+$, and likewise for $B_\theta^- \subset A^-$ and $B_\theta \subset A$, where $A = A^+ \cup A^-$. Thus $\partial B_\theta = A \cap \partial V_\theta$ and $B_\theta^\circ = A \cap V_\theta^\circ$, etc.

The boundary $\partial V_\theta$ separates $\mathbb{R}^p$ into the complement of $V_\theta$, namely $V_\theta^c$, and the interior of $V_\theta$, $V_\theta^\circ$. Furthermore, the interior $V_\theta^\circ$ is separated by the origin into what we will call $V_\theta^{\circ+}$ and $V_\theta^{\circ-}$. In summary, the complement of the boundary of $V_\theta$ consists of three separate regions: $(\partial V_\theta)^c = V_\theta^c \cup V_\theta^{\circ+} \cup V_\theta^{\circ-}$.

2.2 Projections

Definition 2.3. Given an affine subspace $L \subseteq \mathbb{R}^d$, consider $u \in \mathbb{R}^d$, $u \notin L$. The projection of $u$ onto $L$ is
$$p_L(u) = \operatorname{argmin}_{l \in L} |u - l|.$$

Lemma 2.1. Given an affine subspace $L \subseteq \mathbb{R}^d$, consider $u \in \mathbb{R}^d$, $u \notin L$. A point $p \in L$ is $p_L(u)$ if and only if for all $l \in L$ we have $(l - p) \perp (u - p)$.

Definition 2.4. Given a set of vectors $\{x_k\}_{k \in K}$, define the vector subspace ${}_K L = \operatorname{span}(\{x_k\}_{k \in K})$.

Lemma 2.2. Consider a vector subspace ${}_K L \subseteq \mathbb{R}^d$ and a vector $y \in \mathbb{R}^d$, $y \notin {}_K L$. If $q \in \mathbb{R}^d$ and $q \notin {}_K L$ then $|y - p_{{}_{q,K}L}(y)| < |y - p_{{}_K L}(y)|$.

Proposition 2.1. Set $e_d = (0, 0, \ldots, 1) \in \mathbb{R}^d$. Consider $y \in \mathbb{R}^d$ and $x_1 \in \mathbb{R}^d$ with $\langle y, e_d \rangle > 0$ and $\langle x_1, e_d \rangle > 0$. If a nonzero $x_2 \in \mathbb{R}^d$ satisfies $x_2 \perp e_d$ then $\langle p_{{}_{2,1}L}(y), e_d \rangle \neq 0$.

Proof. Define $E_d = \{u \in \mathbb{R}^d : u \perp e_d\}$, so that $y \notin E_d$, $x_1 \notin E_d$ and $x_2 \in E_d$. Observe that $\operatorname{span}(\{x_1, x_2\}) \cap E_d$ is one-dimensional, and thus by Lemma 2.1, $\langle p_{{}_{2,1}L}(y), e_d \rangle = 0 \implies p_{{}_{2,1}L}(y) = p_{{}_2 L}(y)$. This, however, contradicts the strict inequality in Lemma 2.2, so we reject $\langle p_{{}_{2,1}L}(y), e_d \rangle = 0$ and conclude $\langle p_{{}_{2,1}L}(y), e_d \rangle \neq 0$.

Theorem 2.1 (Thales' theorem). Consider the geometric sphere
$$S_r^m = \{(u_1, u_2, \ldots, u_m) \in \mathbb{R}^m : u_1^2 + u_2^2 + \cdots + u_m^2 = r^2 > 0\}.$$
For three points on $S_r^m$, namely $t$, its opposite $-t$, and another point $s$, the following holds: $(s - t) \perp (s - (-t))$.

Thales' theorem is typically proved for $m = 2$ using elementary geometry of angles and triangles; the higher-dimensional version that we have stated follows since $\operatorname{span}(\{t, -t, s\}) \cong \mathbb{R}^2$ and $\operatorname{span}(\{t, -t, s\}) \cap S_r^m \cong S_r^2$. Note that the theorem also holds for lower-dimensional ($q < m$), embedded spheres ($S_r^q \hookrightarrow \mathbb{R}^m$) that are not necessarily centered at the origin. For example, the theorem holds when applied on $\partial B_\theta$.
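As a computational aside, the defining inequality of $V_\theta$ in Definition 2.2 can be tested directly. The sketch below (NumPy assumed; the function name and tolerance are our own choices) classifies a point of $\mathbb{R}^p$ relative to $\partial V_\theta$, comparing $\sqrt{u_1^2 + \cdots + u_{p-1}^2}$ with $|u_p|\tan(\theta/2)$ so as to avoid dividing by $|u_p|$.

```python
import numpy as np

def classify_V(u, theta, tol=1e-12):
    """Classify u in R^p relative to V_theta of Definition 2.2.

    Returns 'interior', 'boundary', or 'complement', comparing
    sqrt(u_1^2 + ... + u_{p-1}^2) with |u_p| * tan(theta/2).
    """
    u = np.asarray(u, dtype=float)
    radial = np.linalg.norm(u[:-1])        # sqrt(u_1^2 + ... + u_{p-1}^2)
    axial = abs(u[-1])                     # |u_p|
    lhs, rhs = radial, axial * np.tan(theta / 2)
    if abs(lhs - rhs) <= tol * max(1.0, rhs):
        return "boundary"
    return "interior" if lhs < rhs else "complement"

theta = np.pi / 3
print(classify_V([0.0, 0.0, 1.0], theta))                          # on the axis: interior
print(classify_V([np.sin(theta / 2), 0.0, np.cos(theta / 2)], theta))  # on the cone: boundary
print(classify_V([1.0, 0.0, 0.0], theta))                          # perpendicular to the axis: complement
```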
3 Zeros

This section builds on the development of the previous section. Fixing $y$ and $x_1$ within $\mathbb{R}^n$, we characterize the level set $\{x_2 \in \mathbb{R}^n : {}_{2,1}\hat{\beta}_1 = 0\}$ as a step toward identifying those $x_2$ that are associated with reversals. The analysis proceeds cleanly if we assume centered (mean zero) vectors of data. Tildes indicate centered data.

3.1 Space of Centered Vectors

Assume $\tilde{y}$, $\tilde{x}_1$ and $\tilde{x}_2$, each within $\mathbb{R}^n$, with $0 < \theta(\tilde{y}, \tilde{x}_1) < \pi/2$. Being centered, each vector is orthogonal to $e = (1, 1, \ldots, 1)$, and thus $\operatorname{span}(\{\tilde{y}, \tilde{x}_1, \tilde{x}_2\})$ is contained within an $(n-1)$-dimensional vector subspace $E$. Place orthonormal coordinates on $E$ so that $\tilde{y}/|\tilde{y}| + \tilde{x}_1/|\tilde{x}_1|$ can be expressed as $(0, 0, \ldots, 2\cos(\theta(\tilde{y}, \tilde{x}_1)/2))$. Set $p = n - 1$ so as to write $E \cong \mathbb{R}^p$, suggesting and allowing application of the remaining terminology from Section 2. Note that $\tilde{y}/|\tilde{y}|$ and $\tilde{x}_1/|\tilde{x}_1|$ are opposites on $\partial B^+$, and $\tilde{x}_2$ is somewhere in $E \setminus \{0\}$.

Because $0 < \theta(\tilde{y}, \tilde{x}_1) < \pi/2$, ${}_1\tilde{\hat{\beta}}_1 > 0$, but depending on the positioning of $\tilde{x}_2$, ${}_{2,1}\tilde{\hat{\beta}}_1$ could be positive, negative, or zero.† We specify the "sign" function to be $1$ for positives, $-1$ for negatives, and undefined for zero, so that $\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1) \neq \mathrm{sign}({}_1\tilde{\hat{\beta}}_1)$ signals a change from a positive to a negative or vice versa: i.e., $\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1) \neq \mathrm{sign}({}_1\tilde{\hat{\beta}}_1) \iff \mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1) + \mathrm{sign}({}_1\tilde{\hat{\beta}}_1) = 0$.

† In general, ${}_K\hat{\beta}_k$ denotes the $k$th coefficient when the projection of $y$ onto the span of the explanatory vectors indexed by $K$ is expressed in terms of those basis vectors.

In case $\tilde{x}_2/|\tilde{x}_2| = \pm\tilde{x}_1/|\tilde{x}_1|$, think of $\tilde{x}_2$ as dominating, and define ${}_{2,1}\tilde{\hat{\beta}}_2$ to be ${}_2\tilde{\hat{\beta}}_2$ while ${}_{2,1}\tilde{\hat{\beta}}_1 = 0$. For the unrealistic case $\tilde{x}_2 = 0$, think of arrival along $\operatorname{span}(\{\tilde{y}\})$, so that by continuity it makes sense to formally define ${}_{2,1}\tilde{\hat{\beta}}_1 = 0$.

3.2 Geometric Results

Proposition 3.1. Let $\tilde{y}$ and $\tilde{x}_1$ be such that $0 < \theta(\tilde{y}, \tilde{x}_1) < \pi/2$. For any $\tilde{x}_2$,
$${}_{2,1}\tilde{\hat{\beta}}_1 = 0 \iff \tilde{x}_2 \in \partial V_{\theta(\tilde{y}, \tilde{x}_1)}.$$

Proof. First observe that $\tilde{x}_2 \in \operatorname{span}(\{\tilde{x}_1\}) \cup \operatorname{span}(\{\tilde{y}\}) \implies {}_{2,1}\tilde{\hat{\beta}}_1 = 0$. Also, by Proposition 2.1, ${}_{2,1}\tilde{\hat{\beta}}_1 \neq 0$ for any $\tilde{x}_2 \neq 0$ such that $\langle \tilde{x}_2, (0, 0, \ldots, 1)\rangle = 0$. Furthermore, for $c_1 \neq 0$, the scaling $\tilde{x}_2 \mapsto c_1\tilde{x}_2$ leaves ${}_{2,1}\tilde{\hat{\beta}}_1$ unchanged. Thus, we assume $\tilde{x}_2 \in A^+ \setminus \{\tilde{y}, \tilde{x}_1\}$.

Additionally, note that for $c_2 > 0$, the scaling $\tilde{p} \mapsto \tilde{p}^* = c_2\tilde{p}$ leaves $\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1)$ unchanged, so we may consider ${}_{2,1}\tilde{\hat{\beta}}_1^*$ (in place of ${}_{2,1}\tilde{\hat{\beta}}_1$), where ${}_{2,1}\tilde{\hat{\beta}}_1^*$ is the first coefficient when $\tilde{p}^*$ is expressed in the basis $\{\tilde{x}_1, \tilde{x}_2\}$.

Also note that $\tilde{p}^*$, by Lemma 2.1, is the projection of $\tilde{y}$ onto the space $\{\tilde{x}_2 + l(\tilde{x}_1 - \tilde{x}_2)\}_{l \in \mathbb{R}}$. Denote with $\hat{l}$ the value of the parameter $l$ associated with $\tilde{p}^*$, and note, since $0 < \theta(\tilde{y}, \tilde{x}_1) < \pi/2$, that $\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1^*) = \mathrm{sign}(\hat{l})$.

Finally, thinking of $\hat{l}$ as a function of $\tilde{x}_2 \in A^+ \setminus \{\tilde{y}, \tilde{x}_1\}$, we can conclude that $\hat{l} = 0 \iff \tilde{x}_2 \in \partial B^+$ by appealing to Thales' theorem.

Proposition 3.2. Let $\tilde{y}$ and $\tilde{x}_1$ be such that $0 < \theta(\tilde{y}, \tilde{x}_1) < \pi/2$. For any $\tilde{x}_2$,
$${}_{2,1}\tilde{\hat{\beta}}_1 < 0 \iff \tilde{x}_2 \in V_{\theta(\tilde{y}, \tilde{x}_1)}^\circ.$$

Proof. Given our setup, ${}_{2,1}\tilde{\hat{\beta}}_1$ is a function, $f$, of $\tilde{x}_2$, and, by Proposition 3.1, the zero set of $f$ is $\partial V_{\theta(\tilde{y}, \tilde{x}_1)}$. Also, as mentioned in Section 2.1, $(\partial V_\theta)^c = V_\theta^c \cup V_\theta^{\circ+} \cup V_\theta^{\circ-}$. Since $f$ is continuous, we need only sample one representative point from each of $V_\theta^c$, $V_\theta^{\circ+}$ and $V_\theta^{\circ-}$ in order to determine the sign of ${}_{2,1}\tilde{\hat{\beta}}_1$ throughout each region. The following points are conveniently selected to avoid projection, and direct computation reveals that $\tilde{x}_2 = \pm(\tilde{y} + \tilde{x}_1) \implies {}_{2,1}\tilde{\hat{\beta}}_1 < 0$ and $\tilde{x}_2 = \tilde{x}_1 - \tilde{y} \implies {}_{2,1}\tilde{\hat{\beta}}_1 > 0$.

In case $\pi/2 < \theta(\tilde{y}, \tilde{x}_1) < \pi$, replace $\tilde{x}_1$ with $-\tilde{x}_1$, so that Proposition 3.2 still applies, effectively extending the range of application and resulting in the following corollary.

Corollary 3.1. Let $\tilde{y}$ and $\tilde{x}_1$ be such that $\theta(\tilde{y}, \tilde{x}_1) \in (0, \pi/2) \cup (\pi/2, \pi)$. For any $\tilde{x}_2$,
$$\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1) \neq \mathrm{sign}({}_1\tilde{\hat{\beta}}_1) \iff \tilde{x}_2 \in V_{\theta(\tilde{y}, \tilde{x}_1)}^\circ.$$
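As a computational aside, Corollary 3.1 lends itself to a direct numerical check for generic data with $0 < \theta(\tilde{y}, \tilde{x}_1) < \pi/2$: center the vectors, test whether $\tilde{x}_2$ lies strictly inside the double cone of half-angle $\theta(\tilde{y}, \tilde{x}_1)/2$ about the axis $\tilde{y}/|\tilde{y}| + \tilde{x}_1/|\tilde{x}_1|$, and compare with the signs of the fitted coefficients. The sketch assumes NumPy; the simulated data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
y = rng.normal(size=n)
x1 = 0.6 * y + rng.normal(size=n)                    # so that r(y, x1) > 0 here, i.e. theta < pi/2
x2 = 0.7 * y + 0.7 * x1 + 0.3 * rng.normal(size=n)   # a lurking variable lying "between" y and x1

# Centered versions (the tildes of Section 3).
yc, x1c, x2c = (v - v.mean() for v in (y, x1, x2))

# Signs of the fitted coefficients 1_beta_1 and 2,1_beta_1 (no intercept needed after centering).
b1 = np.dot(x1c, yc) / np.dot(x1c, x1c)
b21 = np.linalg.lstsq(np.column_stack([x1c, x2c]), yc, rcond=None)[0][0]
reversal = np.sign(b1) != np.sign(b21)

# Geometric test of Corollary 3.1: is x2c interior to the double cone of
# half-angle theta/2 about the axis yc/|yc| + x1c/|x1c| ?
theta = np.arccos(np.dot(yc, x1c) / (np.linalg.norm(yc) * np.linalg.norm(x1c)))
axis = yc / np.linalg.norm(yc) + x1c / np.linalg.norm(x1c)
cos_to_axis = np.dot(x2c, axis) / (np.linalg.norm(x2c) * np.linalg.norm(axis))
inside_cone = abs(cos_to_axis) > np.cos(theta / 2)

print(reversal, inside_cone)   # by Corollary 3.1 these two booleans agree
```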
4 Correlations

In this section we translate the results of Section 3 into the language of correlations, in order to prove Theorems 1.1 and 1.2. We posit the observation of $y$, $x_1$, and $x_2$, each within $\mathbb{R}^n$. In case $r(y, x_1) = \pm 1$, the truth of the theorems can be verified with the comments written at the end of Section 3.1, and $r(y, x_1) \neq 0$ is assumed in both theorems. Thus, henceforth, we tacitly assume $r(y, x_1) \in (-1, 0) \cup (0, 1)$, ensuring applicability of Corollary 3.1.

Lemma 4.1. Let $u_1$ and $u_2$ be two vectors in $\mathbb{R}^n$, and let their centered versions be denoted with $\tilde{u}_1$ and $\tilde{u}_2$. Then
$$\left\langle \frac{\tilde{u}_1}{|\tilde{u}_1|}, \frac{\tilde{u}_2}{|\tilde{u}_2|} \right\rangle = r(u_1, u_2).$$

Given $y$ and $x_1$, projection determines their centered versions, $\tilde{y}$ and $\tilde{x}_1$. Define $\tilde{h} = \tilde{y} + \tilde{x}_1$. Since projection and addition commute, we know that $\tilde{h}$ is the centered version of $h = y + x_1$. With this notation, Corollary 3.1 asserts that
$$\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1) \neq \mathrm{sign}({}_1\tilde{\hat{\beta}}_1) \iff \left|\left\langle \frac{\tilde{x}_2}{|\tilde{x}_2|}, \frac{\tilde{h}}{|\tilde{h}|} \right\rangle\right| > \left|\left\langle \frac{\tilde{y}}{|\tilde{y}|}, \frac{\tilde{h}}{|\tilde{h}|} \right\rangle\right|,$$
and by Lemma 4.1 this becomes $\mathrm{sign}({}_{2,1}\tilde{\hat{\beta}}_1) \neq \mathrm{sign}({}_1\tilde{\hat{\beta}}_1) \iff |r(x_2, h)| > |r(y, h)|$, thus demonstrating the truth of Theorem 1.2.

To prove Theorem 1.1, simply observe that within $A$ the intersection
$$F = \{\tilde{r}_1 \in A : |\tilde{r}_1 - \tilde{y}| < \sin(\theta(\tilde{y}, \tilde{x}_1)/2)\} \cap \{\tilde{r}_2 \in A : |\tilde{r}_2 - \tilde{x}_1| < \sin(\theta(\tilde{y}, \tilde{x}_1)/2)\}$$
strictly contains $B$. Thus, by Corollary 3.1, the existence of a reversal implies that there exists a $c \neq 0$ such that $c\tilde{x}_2 \in B^\circ \subset F$. An interpretation of $c\tilde{x}_2 \in F$ in terms of correlation coefficients produces the conclusion of Theorem 1.1.

5 Generalizations

If we denote with ${}_J R$ the positive square root of the coefficient of determination, where $J$ indexes a finite subset of the (perhaps very large) set of all conceivable explanatory vectors of data, then the following generalization of Theorem 1.2 can be stated.

Theorem 5.1. When $r(y, x_1) \neq 0$, a reversal, $\mathrm{sign}({}_{J,1}\hat{\beta}_1) \neq \mathrm{sign}({}_1\hat{\beta}_1)$, occurs if and only if ${}_J R > |r(y, x_1 + y)|$.

Theorem 5.1 is related to Theorem 1.2 and Proposition 3.2 through the observation that a reversal occurs, in this more general setting, if and only if $p_{{}_{J,1}L}(\tilde{y}) \in V_{\theta(\tilde{y}, \tilde{x}_1)}^\circ$. Alternatively, one can think of reversals as occurring precisely when $\operatorname{span}(\{\tilde{x}_1\} \cup \{\tilde{x}_j\}_{j \in J})$ intersects $V_{\theta(\tilde{y}, \tilde{x}_1)}^\circ$.

We can generalize further. Let $I$ index a finite, orthogonal† set of explanatory vectors, and let $J$ index a disjoint, finite set of additional explanatory vectors.

† The assumption of orthogonality allows us to disregard the supporting vectors, since they lie orthogonal to the space of interest. For a specific example that shows the importance of the assumption of orthogonality, and for a discussion of the practical applications of Theorem 5.2, see [7] (Table 1 and Sections 1 and 3).

Theorem 5.2. For any $i \in I$, when $r(y, x_i) \neq 0$, a reversal, $\mathrm{sign}({}_{J,I}\hat{\beta}_i) \neq \mathrm{sign}({}_I\hat{\beta}_i)$, occurs if and only if ${}_J R > |r(y, x_i + y)|$.
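As a computational aside, the criterion of Theorem 1.2 can also be checked numerically. The sketch below (NumPy assumed; simulated, illustrative data) standardizes $y$ and $x_1$ before forming $x_1 + y$; this normalization is an assumption of ours, made because the sign reversal is invariant to rescaling of $y$ and $x_1$ while $r(x_2, x_1 + y)$ is not, and because it is the scale on which the identity $r(y, x_1 + y) = r(x_1, x_1 + y)$ noted in Section 1 holds.

```python
import numpy as np

def std(v):
    """Center and scale to unit variance (the correlation scale); an assumption of this sketch."""
    v = v - v.mean()
    return v / v.std()

def r(a, b):
    """Sample correlation coefficient."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(2)
n = 200
y = rng.normal(size=n)
x1 = 0.5 * y + rng.normal(size=n)
x2 = 0.8 * y + 0.8 * x1 + 0.4 * rng.normal(size=n)   # candidate lurking variable

ys, x1s = std(y), std(x1)      # standardized versions (our assumption, see the lead-in)
h = x1s + ys                   # the vector x1 + y of Theorem 1.2, on that scale
criterion = abs(r(x2, h)) > abs(r(ys, h))

# Fitted coefficient on x1 without and with x2 (centering plays the role of an intercept).
b1 = np.dot(x1s, ys)
b21 = np.linalg.lstsq(np.column_stack([x1s, x2 - x2.mean()]), ys, rcond=None)[0][0]
reversal = np.sign(b1) != np.sign(b21)

print(criterion, reversal)   # Theorem 1.2 predicts agreement when r(y, x1) != 0
```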
References

[1] Cervellati et al. "Bone mass density selectively correlates with serum markers of oxidative damage in post-menopausal women." Clinical Chemistry and Laboratory Medicine (2012), Volume 51, Issue 2, Pages 333-338.

[2] Cornfield et al. "Smoking and lung cancer: Recent evidence and a discussion of some questions." Journal of the National Cancer Institute (1959) 22, 173-203.

[3] Davis et al. "Rice Consumption and Urinary Arsenic Concentrations in U.S. Children." Environmental Health Perspectives (October 2012), Vol. 120, Issue 10, p. 1418-1424.

[4] Hosman, Hansen and Holland. "The sensitivity of linear regression coefficients' confidence limits to the omission of a confounder." The Annals of Applied Statistics (2010) Vol. 4, No. 2, 849-870.

[5] Julious and Mullee. "Confounding and Simpson's paradox." British Medical Journal (12/03/1994) 309(6967): 1480-1481.

[6] Jungert et al. "Serum 25-hydroxyvitamin D3 and body composition in an elderly cohort from Germany: a cross-sectional study." Nutrition & Metabolism (2012) 9:42. http://www.nutritionandmetabolism.com/content/9/1/42

[7] Knaeble, Brian. "Simple Conditions to Theoretically Limit the Potential for Confounding of Least-Squares Estimates." Manuscript submitted for publication (2013).

[8] Lignell et al. "Prenatal exposure to polychlorinated biphenyls and polybrominated diphenyl ethers may influence birth weight among infants in a Swedish cohort with background exposure: a cross-sectional study." Environmental Health (2013) 12:44. http://www.ehjournal.net/content/12/1/44

[9] Lin, Psaty and Kronmal. "Assessing the Sensitivity of Regression Results to Unmeasured Confounders in Observational Studies." Biometrics (September 1998) 54, 948-963.

[10] Lu, C.Y. "Observational studies: a review of study designs, challenges and strategies to reduce confounding." The International Journal of Clinical Practice (May 2009) 63(5), 691-697.

[11] McNamee, R. "Regression modeling and other methods to control confounding." Occupational & Environmental Medicine (2004) 62: 500-506. doi:10.1136/oem.2002.001115

[12] Nelson et al. "Daily physical activity predicts degree of insulin resistance: a cross-sectional observational study using the 2003-2004 National Health and Nutrition Examination Survey." International Journal of Behavioral Nutrition and Physical Activity (2013) 10:10. http://www.ijbnpa.org/content/10/1/10

[13] Rosenbaum and Rubin. "Assessing effects of confounding variables." Journal of the Royal Statistical Society, Series B (1983) 11, 212-218.

[14] Wagner, Clifford H. "Simpson's Paradox in Real Life." The American Statistician (February 1982) 36(1): 46-48.