Regression and Random Confounding

Brian Knaeble
University of Utah
August 2013

Abstract

This paper shows how the probability, for a random confounding factor to reverse the estimate of ordinary regression, decreases exponentially with the sample size.

Keywords: Model uncertainty, Complementary error function, Thales' theorem, Confounding

1 Introduction

Writing in the Journal of the Royal Statistical Society, Chris Chatfield broadly states, "model uncertainty is a fact of life and likely to be more serious than other sources of uncertainty which have received far more attention from statisticians [3]." More pointedly, Hosman et al. in the Annals of Applied Statistics claim that "when regression results are questioned, it is often the nonconfounding assumption that is the focus of doubt [6]." Despite these concerns, and even in the presence of clear potential for omitted-variable bias, the results of scientific papers are often presented as if the model were certain. See [4], [7], [11], [10] and [2] for examples.

This paper studies the simple case of confounding of a relationship between two variables through the presence of a third variable. Assume that n 3-dimensional observations have been made, resulting in the vectors of data x, y and w, which we assume have been centered. The observed correlation, r(x, y), provides a crude estimate for an association between X and Y, but it may be susceptible to confounding by the lurking variable W. Utilizing the information contained within w, multiple regression and the principle of least squares lead to an adjusted estimate, β̂x|w, for the unique effect of X on Y while controlling for W. Both β̂x = r(x, y)·sy/sx and β̂x|w estimate a supposedly true, unique effect, βx. While r(y, x) is scale invariant, β̂x and β̂x|w are not; however, it is possible to concern ourselves simply with the direction of the estimates, specifically sign(β̂x|w) and sign(β̂x) = sign(r(y, x)).
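Concretely, both estimates can be computed from centered data by ordinary least squares. The following pure-Python sketch (the function names and the small data set are my own, not the paper's) computes β̂x and the adjusted β̂x|w, the latter by residualizing both y and x against w, and shows a hypothetical confounded data set in which adjustment reverses the sign:

```python
def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def beta_x(y, x):
    # least-squares slope of y on x alone (centered data),
    # equal to r(x, y) * s_y / s_x
    y, x = center(y), center(x)
    return dot(y, x) / dot(x, x)

def beta_x_given_w(y, x, w):
    # coefficient of x when regressing y on (x, w), obtained by
    # partialling w out of both y and x (Frisch-Waugh-Lovell)
    y, x, w = center(y), center(x), center(w)
    gy = dot(y, w) / dot(w, w)
    gx = dot(x, w) / dot(w, w)
    ry = [yi - gy * wi for yi, wi in zip(y, w)]
    rx = [xi - gx * wi for xi, wi in zip(x, w)]
    return dot(ry, rx) / dot(rx, rx)

# hypothetical example: y = 2w - 0.5x exactly, with x tracking w,
# so the marginal slope is positive but the unique effect of x is -0.5
w = [-1, -1, -1, 1, 1, 1]
x = [-1.2, -1.0, -0.8, 0.8, 1.0, 1.2]
y = [2 * wi - 0.5 * xi for wi, xi in zip(w, x)]

print(beta_x(y, x))             # positive marginal estimate (about 1.45)
print(beta_x_given_w(y, x, w))  # -0.5: adjustment reverses the sign
```

Because y here is an exact linear function of x and w, the adjusted coefficient recovers −0.5 exactly, while the marginal slope inherits the positive association that x shares with the confounder w.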
For r(y, x) ≠ 0 we assume without loss of generality that r(y, x) > 0, and then ask for the probability of β̂x|w < 0. In other words, we ask the question: what is the probability that controlling for a random confounding factor leads to an adjusted estimate opposite in direction to the original estimate?

Working with centered y ≠ 0 and standardized x and w, meaning ȳ = x̄ = w̄ = 0 and |x| = |w| = √(n − 1), we fix y and x so that ⟨y/|y|, x⟩ > 0, and we consider a random w on the sphere S^{n−2}_{√(n−1)}, distributed uniformly. By this we mean that it is distributed according to the unique, rotationally invariant, probability measure on that sphere (see [9], Theorem 7.23). Note also that the dimension, n − 2, for the sphere is appropriate, since centered vectors lie orthogonal to a vector of ones, and thus within an (n − 1)-dimensional space.

The probability, P, of w being positioned so that β̂x|w < 0, given 0 < r(y, x) < 1, is a function of the sample size n. With r = r(x, y) and n ≥ 6 the following statement is possible.

Theorem 1.1.
(1/√π)·((1 − r)/2)^(n−2) < P(n) < (1/√π)·((1 − r)/(1 + r))^((n−2)/2)

As the sample size increases, the probability that a random vector causes an estimate to reverse its direction decreases to zero exponentially.

To suggest a rough comparison, suppose the model Y = β0 + βx·X + ε is known with certainty to be true, with βx > 0. If we assume that the errors are unbiased, uncorrelated, and have the same variance σ², then because we are working with standardized explanatory vectors of data, the least-squares estimate β̂x has variance σ²/(n − 1) (see [12], Section 3.2). Assuming normality as well allows us to conclude that β̂x =d N(βx, σ²/(n − 1)) (see [12], Section 3.4). Since for x > 0 the Gaussian complementary error function satisfies

(x/(x² + 1))·(1/√(2π))·e^(−x²/2) < ∫_x^∞ (1/√(2π))·e^(−t²/2) dt < (1/x)·(1/√(2π))·e^(−x²/2),   (1.1)

(see [5]), we can conclude that P(β̂x < 0 | βx > 0) is decreasing exponentially in the sample size as well.
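The bounds of Theorem 1.1 can be checked by simulation. The sketch below (my own code, not the paper's) draws w uniformly at random over directions in the centered subspace — a centered Gaussian vector has the required rotationally invariant direction, and the sign of β̂x|w does not depend on the length of w — and estimates the reversal probability for a fixed pair x, y with r(x, y) = 1/3 and n = 6:

```python
import math
import random

random.seed(0)

def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def beta_adj(y, x, w):
    # coefficient of x in the least-squares regression of y on (x, w),
    # via residualizing y and x against w
    gx = dot(x, w) / dot(w, w)
    gy = dot(y, w) / dot(w, w)
    rx = [xi - gx * wi for xi, wi in zip(x, w)]
    ry = [yi - gy * wi for yi, wi in zip(y, w)]
    return dot(ry, rx) / dot(rx, rx)

# fixed centered x, y in R^6 with r(x, y) = 1/3 and a positive original estimate
x = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
z = [1.0, -1.0, 0.0, 0.0, 1.0, -1.0]         # centered and orthogonal to x
y = [xi + math.sqrt(12) * zi for xi, zi in zip(x, z)]
n = len(x)
r = dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))   # = 1/3 by construction

trials = 20000
reversals = sum(
    beta_adj(y, x, center([random.gauss(0, 1) for _ in range(n)])) < 0
    for _ in range(trials)
)
p_hat = reversals / trials

lower = (1 / math.sqrt(math.pi)) * ((1 - r) / 2) ** (n - 2)
upper = (1 / math.sqrt(math.pi)) * ((1 - r) / (1 + r)) ** ((n - 2) / 2)
print(lower, p_hat, upper)   # the estimate falls between the two bounds
```

With n = 6 and r = 1/3 the upper bound is (1/√π)(1/2)² ≈ 0.14 and the lower bound is (1/√π)(1/3)⁴ ≈ 0.007; the simulated reversal frequency lands between them.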
The upper bound of (1.1) implies

P(β̂x < 0 | βx > 0) < (1/(βx√(n − 1)/σ))·(1/√(2π))·e^(−(βx√(n−1)/σ)²/2),

while the lower bound from Theorem 1.1 ensures

(1/√π)·((1 − r(x, y))/2)^(n−2) < P(β̂x|w < 0 | β̂x > 0),

where we have temporarily written P(β̂x|w < 0 | β̂x > 0) in place of P(n). Thus, asymptotically,

P(β̂x < 0 | βx > 0) < P(β̂x|w < 0 | β̂x > 0),

as long as e^(−βx²/(2σ²)) < (1 − r(x, y))/2. This observation alludes to the dangers of over-fitting: for any r(x, y), situations may exist where, up to sign, upon accounting for a random w, the probability of the adjusted estimate being correct is less than the probability of the original estimate being correct.

Conversely, the upper bound for P(n) has interesting implications as well. Suppose, for example, as part of a model selection procedure, that investigators notice sign(β̂x|w) ≠ sign(β̂x) in the presence of a modest |r(x, y)| = 1/3. According to the theorem, the probability of such an occurrence must be less than 1/2^((n−2)/2), where n is the sample size. For large n this probability may be small enough to warrant a rejection of the assumption of a uniformly random w. Thus, investigators are led to serious consideration of W as a confounding factor.

2 Caps on a Sphere

For the purpose of proving Theorem 1.1 we can assume that all three vectors, y, x and w, have each been geometrically standardized, meaning they are centered and positioned on the unit sphere.

Definition 2.1. The (d − 1)-dimensional sphere of radius a is
S^{d−1}_a = {(u1, u2, ..., ud) ∈ R^d : u1² + u2² + ... + ud² = a²}.

Definition 2.2. The d-dimensional ball of radius a is
B^d_a = {(u1, u2, ..., ud) ∈ R^d : u1² + u2² + ... + ud² < a²}.

Remark 2.1. We use the same notation for such embedded objects.

Remark 2.2. When a = 1 we drop the subscript.

Definition 2.3. For nonzero vectors u1 and u2, define the angle between them, θ, to be
θ(u1, u2) = cos^(−1)(⟨u1, u2⟩/(|u1||u2|)).

Definition 2.4.
Given a point, p, on the sphere S^{k−1}, and an angle φ, where 0 < φ < π/2, define the cap, C(p, φ), as
C(p, φ) = {s ∈ S^{k−1} : θ(s, p) < φ}.

Definition 2.5. Given a cap, C(p, φ), of the sphere S^{k−1}, define its negative, −C(p, φ), as
−C(p, φ) = {s ∈ S^{k−1} : θ(−s, p) < φ}.

Definition 2.6. Given a cap, C(p, φ), of the sphere S^{k−1}, and also the cap's negative, −C(p, φ), define their union, K(p, φ), as
K(p, φ) = C(p, φ) ∪ −C(p, φ).

Proposition 2.1. Let x, y ∈ S^{k−1} with 0 < r(x, y) < 1. Let w ∈ S^{k−1}. Then
β̂x|w < 0 ⟺ w ∈ K(x + y, θ(x, y)/2).

Proof. The truth of Proposition 2.1 follows from Thales' theorem in geometry. For the details consult Section 3 of the paper "Reversals of Least-Squares Estimates" [8].

3 Exponentials

In this section we use Proposition 2.1 to prove Theorem 1.1, by setting k = n − 1 and estimating the proportion, P(k), of the (k−1)-volume of S^{k−1} that is taken up by K(x + y, θ(x, y)/2). This can be done using techniques of modern convex geometry [1]. In what follows "∂" denotes the boundary of a set, "◦" denotes the interior of a set, and µ denotes the measure of a set. Note that ∂B^d_a = S^{d−1}_a.

Lemma 3.1. Let x, y ∈ S^{k−1} with 0 < r(x, y) < 1. For all k ≥ 2,
µ(K(x + y, θ(x, y)))/µ(S^{k−1}) > (1/√π)·((1 − r(x, y))/2)^(k−1).

Proof. Without loss of generality assume that (x + y)/2 = (t, 0, 0, ..., 0) for some t > 0. Elementary trigonometry ensures t = √((r(x, y) + 1)/2), and also that |(x − y)/2| = √((1 − r(x, y))/2). Let s stand for √((1 − r(x, y))/2). Due to the curvature of S^{k−1},
µ(K(x + y, θ(x, y))) > 2µ(B^{k−1}_s).
Also, µ(B^d) = 2π^(d/2)/(dΓ(1 + d/2)), µ(∂B^d) = 2π^(d/2)/Γ(1 + d/2), and µ(B^d_a) = a^d·2π^(d/2)/(dΓ(1 + d/2)) (see [1], Lecture 1). Therefore,

µ(K(x + y, θ(x, y)))/µ(S^{k−1}) > 2µ(B^{k−1}_s)/µ(∂B^k)
= [2s^(k−1)·2π^((k−1)/2)/((k − 1)Γ(1/2 + k/2))] / [2π^(k/2)/Γ(1 + k/2)]
= (2s^(k−1)/(√π(k − 1)))·Γ(1 + k/2)/Γ(1/2 + k/2)
> (2s^(k−1)/(√π(k − 1)))·(3k − 5)/(2k − 2)
= (1/√π)·((3k − 5)/(k − 1)²)·((1 − r(x, y))/2)^((k−1)/2)
≥ (1/√π)·((1 − r(x, y))/2)^((k−1)/2)·((1 − r(x, y))/2)^((k−1)/2)
= (1/√π)·((1 − r(x, y))/2)^(k−1).

The third inequality is due to the hypothesis k ≥ 2. The middle inequality is due to the following argument. In general, Γ(x) = (x − 1)Γ(x − 1), resulting in
Γ(x) − Γ(x − 1) = (x − 1)Γ(x − 1) − Γ(x − 1) = Γ(x − 1)((x − 1) − 1) = Γ(x − 1)(x − 2).
By convexity, Γ(x + 1/2) is thus greater than Γ(x) + Γ(x − 1)(x − 2)/2, and with the substitution of Γ(x)/(x − 1) for Γ(x − 1) in the latter expression, the inequality simplifies to Γ(x + 1/2) > Γ(x) + Γ(x)(x − 2)/(2(x − 1)). Dividing by Γ(x) then results in Γ(x + 1/2)/Γ(x) > 1 + (x − 2)/(2(x − 1)) = (3x − 4)/(2x − 2). With x = 1/2 + k/2, the result is Γ(1 + k/2)/Γ(1/2 + k/2) > (3k − 5)/(2k − 2).

Lemma 3.2. Let x, y ∈ S^{k−1} with 0 < r(x, y) < 1. For all k ≥ 5,
µ(K(x + y, θ(x, y)))/µ(S^{k−1}) < (1/√π)·((1 − r(x, y))/(1 + r(x, y)))^((k−1)/2).

Proof. Without loss of generality assume that (x + y)/2 = (t, 0, 0, ..., 0) for some t > 0. Elementary trigonometry ensures t = √((r(x, y) + 1)/2), and also that |(x − y)/2| = √((1 − r(x, y))/2). Let s stand for √((1 − r(x, y))/(1 + r(x, y))).

Radial projection (from the origin) of C(x + y, θ(x, y)) onto {(u1, u2, ..., uk) ∈ R^k : u1 = 1} ≅ R^{k−1} results in the ball B^{k−1}_s, so that
µ(K(x + y, θ(x, y))) < 2µ(B^{k−1}_s).
Using the same volume formulas as above,

µ(K(x + y, θ(x, y)))/µ(S^{k−1}) < 2µ(B^{k−1}_s)/µ(∂B^k)
= [2s^(k−1)·2π^((k−1)/2)/((k − 1)Γ(1/2 + k/2))] / [2π^(k/2)/Γ(1 + k/2)]
= (2s^(k−1)/(√π(k − 1)))·Γ(1 + k/2)/Γ(1/2 + k/2)
< (2s^(k−1)/(√π(k − 1)))·(k + 3)/4
= (2s^(k−1)/√π)·(k + 3)/(4(k − 1))
≤ (2s^(k−1)/√π)·(2/4)
= s^(k−1)/√π.

The third inequality is due to the hypothesis k ≥ 5. The middle inequality is due to the following argument. In general, Γ(x + 1) = xΓ(x). By convexity, Γ(x + 1/2) < (Γ(x) + Γ(x + 1))/2. Thus, Γ(x + 1/2) < (Γ(x) + xΓ(x))/2. Dividing by Γ(x) results in Γ(x + 1/2)/Γ(x) < (1 + x)/2. With x = 1/2 + k/2, Γ(1 + k/2)/Γ(1/2 + k/2) < (k + 3)/4.
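The geometry behind Proposition 2.1 can be spot-checked numerically. The sketch below (my own code, not the paper's) fixes geometrically standardized x and y, draws random unit confounders w in the centered subspace, and verifies the forward direction of the proposition: every w landing strictly inside the double cap K(x + y, θ(x, y)/2) yields a reversed adjusted estimate. Only this one direction of the equivalence is exercised here.

```python
import math
import random

random.seed(1)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def unit(v):
    nv = math.sqrt(dot(v, v))
    return [vi / nv for vi in v]

def angle(a, b):
    c = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    return math.acos(max(-1.0, min(1.0, c)))

def beta_adj(y, x, w):
    # coefficient of x when regressing y on (x, w)
    gx = dot(x, w) / dot(w, w)
    gy = dot(y, w) / dot(w, w)
    rx = [xi - gx * wi for xi, wi in zip(x, w)]
    ry = [yi - gy * wi for yi, wi in zip(y, w)]
    return dot(ry, rx) / dot(rx, rx)

# geometrically standardized x, y (centered, unit length) with 0 < r(x, y) < 1
x = unit(center([-3, -2, -1, 1, 2, 3]))
y = unit(center([-2, -3, 1, -1, 3, 2]))
half_theta = angle(x, y) / 2
bisector = [xi + yi for xi, yi in zip(x, y)]   # direction of x + y

in_cap = 0
for _ in range(4000):
    w = unit(center([random.gauss(0, 1) for _ in range(6)]))
    neg_w = [-wi for wi in w]
    # strictly inside the double cap (small tolerance for floating point)
    if min(angle(w, bisector), angle(neg_w, bisector)) < half_theta - 1e-9:
        in_cap += 1
        assert beta_adj(y, x, w) < 0   # cap membership forces a reversal

print(in_cap, "of 4000 random confounders landed in the double cap")
```

Since the caps here are narrow, only a small fraction of the random confounders fall inside K, consistent with the exponentially small proportions bounded by the lemmas.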
Since both lemmas bound the ratio µ(K(x + y, θ(x, y)))/µ(S^{k−1}), which is precisely the proportion P(k), the truth of Theorem 1.1 follows directly after rewriting in terms of n = k + 1.

References

[1] Ball, Keith. An elementary introduction to modern convex geometry. Flavors of Geometry, 1–58, Math. Sci. Res. Inst. Publ., 31, Cambridge Univ. Press, Cambridge, 1997. MR1491097

[2] Cervellati et al. Bone mass density selectively correlates with serum markers of oxidative damage in post-menopausal women. Clinical Chemistry and Laboratory Medicine (2012), Volume 51, Issue 2, Pages 333–338

[3] Chatfield, Chris. Model Uncertainty, Data Mining and Statistical Inference. Journal of the Royal Statistical Society: Series A (1995), 158, Part 3, pp. 419–466

[4] Davis et al. Rice Consumption and Urinary Arsenic Concentrations in U.S. Children. Environmental Health Perspectives (October 2012), Vol. 120, Issue 10, pp. 1418–1424

[5] Gordon, Robert D. Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statistics 12 (1941), 364–366. MR0005558

[6] Hosman, Carrie A.; Hansen, Ben B.; Holland, Paul W. The sensitivity of linear regression coefficients' confidence limits to the omission of a confounder. Ann. Appl. Stat. 4 (2010), no. 2, 849–870. MR2758424

[7] Jungert et al. Serum 25-hydroxyvitamin D3 and body composition in an elderly cohort from Germany: a cross-sectional study. Nutrition & Metabolism (2012), 9:42. http://www.nutritionandmetabolism.com/content/9/1/42

[8] Knaeble, Brian. Reversals of Least-Squares Estimates. Manuscript submitted for publication (2013)

[9] Khoshnevisan, Davar. (2010) Probability. American Mathematical Society

[10] Lignell et al. Prenatal exposure to polychlorinated biphenyls and polybrominated diphenyl ethers may influence birth weight among infants in a Swedish cohort with background exposure: a cross-sectional study. Environmental Health (2013), 12:44. http://www.ehjournal.net/content/12/1/44

[11] Nelson et al. Daily physical activity predicts degree of insulin resistance: a cross-sectional observational study using the 2003–2004 National Health and Nutrition Examination Survey. International Journal of Behavioral Nutrition and Physical Activity (2013), 10:10. http://www.ijbnpa.org/content/10/1/10

[12] Seber, George A.F. and Alan J. Lee. (2003) Linear Regression Analysis. Hoboken, New Jersey: John Wiley & Sons, Inc.