Regression and Random Confounding Brian Knaeble University of Utah August 2013

advertisement
Regression and Random Confounding
Brian Knaeble
University of Utah
August 2013
Abstract
This paper shows how the probability, for a random confounding factor
to reverse the estimate of ordinary regression, decreases exponentially with
the sample size.
Keywords: Model uncertainty, Complementary error function, Thales’
theorem, Confounding
1
Introduction
Writing in the Journal of the Royal Statistical Society, Chris Chatfield broadly
states, “model uncertainty is a fact of life and likely to be more serious than other
sources of uncertainty which have received far more attention from statisticians
[3].” More pointedly, Hosman et al. in the Annals of Applied Statistics claim
that “when regression results are questioned, it is often the nonconfounding
assumption that is the focus of doubt [6].” Despite these concerns, and even
in the presence of clear potential for omitted-variable bias, results of scientific
papers are often presented as if the model were certain. See [4], [7], [11], [10]
and [2] for examples. This paper studies the simple case of confounding of a
relationship between two variables, through the presence of a third variable.
Assume that n, 3-dimensional observations have been made resulting in the
vectors of data, x, y and w, which we assume have been centered. The observed
correlation, r(x, y), provides a crude estimate for an association between X and
Y , but it may be susceptible to confounding by the lurking variable W . Utilizing
the information contained within w, multiple regression and the principle of
least squares leads to an adjusted estimate, β̂x|w , for the unique effect of X
on Y , while controlling for W . Both β̂x = r(x, y)sy /sx and β̂x|w estimate a
supposedly true, unique effect, βx .
While r(y, x) is scale invariant, β̂x and β̂x|w are not, however, it is possible to concern ourselves simply with the direction of the estimates, specifically
sign(β̂x|w ) and sign(βˆx ) = sign(r(y, x)). For r(y, x) 6= 0 we assume without loss
of generality that r(y, x) > 0, and then ask for the probability of β̂x|w < 0. In
other words, we ask the question: what is the probability that controlling for a
1
random confounding factor leads to an adjusted estimate opposite in direction
of the original estimate.
Working with centered y √6= 0 and standardized x and w, meaning ȳ =
y
, xi > 0, and
x̄ = w̄ = 0 and |x| = |w| = n − 1, we fix y and x so that h |y|
n−2
we consider a random w on the sphere S√n−1 , distributed uniformly. By this
we mean that it is distributed according to the unique, rotationally invariant,
probability measure on that sphere (see [9], Theorem 7.23). Note also that
the dimension, n − 2, for the sphere is appropriate, since centered vectors lie
orthogonal to a vector of ones, and thus within an n − 1 dimensional space. The
probability, P , of w being positioned so that β̂x|w < 0, given 0 < r(y, x) < 1,
is a function of the sample size n. With r = r(x, y) and n ≥ 6 the following
statement is possible.
n−2
(n−2)/2
1
1−r
1−r
1
< P (n) < √
Theorem 1.1. √
2
π
π 1+r
As the sample size increases, the probability that a random vector causes
an estimate to reverse its direction, decreases to zero exponentially. To suggest
a rough comparison, suppose the model, Y = β0 + βx X + , is known with
certainty to be true, with βx > 0. If we assume that the errors are unbiased,
uncorrelated, and have the same variance σ 2 , then because we are working with
standardized, explanatory vectors of data, the least-squares estimate β̂x has
variance σ 2 /(n − 1) (see [12], Section 3.2). Assuming normality as well allows
us to conclude that β̂x =d N (βx , σ 2 /(n − 1)) (see [12], Section 3.4). Since for
x > 0 the Gaussian, complementary error function satisfies
Z ∞
2
2
x
1 −x2 /2
1
1 1
√
√ e−t /2 dt < √ e−x /2 ,
e
<
(1.1)
x2 + 1 2π
x
2π
2π
x
(see [5]) we can conclude that P β̂x < 0 | βx > 0 is decreasing exponentially
in the sample size as well.
The upper bound of (1.1) implies
P β̂x < 0 | βx > 0 <
(βx
√
√
2
1
1
√ e−(βx n−1/σ) /2 ,
n − 1/σ) 2π
while the lower bound from Theorem 1.1 ensures
n−2
1
1 − r(x, y)
√
< P β̂x|w < 0 | β̂x > 0 ,
2
π
where we have temporarily written P β̂x|w < 0 | β̂x > 0 in place of P (n).
Thus, asymptotically,
P β̂x < 0 | βx > 0 < P β̂x|w < 0 | β̂x > 0 ,
2
2
2
as long as e−βx /(2σ ) < (1 − r(x, y))/2. This observation alludes to the dangers
of over-fitting: for any r(x, y), situations may exist, where up to sign, upon accounting for a random w, the probability of the adjusted estimate being correct
is less than the probability of the original estimate being correct.
Conversely, the upper bound for P (n) has interesting implications as well.
Suppose for example, as part of a model selection procedure, that investigators notice sign(β̂x|w ) 6= sign(β̂x ), in the presence of a modest |r(x, y)| = 1/3.
According to the theorem, the probability of such an occurrence must be less
than 1/2(n−2)/2 , where n is the sample size. For large n this probability may be
small enough to warrant a rejection of the assumption of a uniformly random
w. Thus, investigators are lead to serious consideration of W as a confounding
factor.
2
Caps on a Sphere
For the purpose of proving Theorem 1.1 we can assume that all three vectors, y,
x and w, have each been geometrically standardized, meaning they are centered
and position on a unit sphere.
Definition 2.1. The (d − 1)-dimensional sphere of radius a is
Sad−1 = (u1 , u2 , ..., ud ) ∈ Rd : u21 + u22 + ... + u2d = a2 .
Definition 2.2. The d-dimensional ball of radius a to be
Bad = (u1 , u2 , ..., ud ) ∈ Rd : u21 + u22 + ... + u2d < a2 .
Remark 2.1. We use the same notation for embedded such objects.
Remark 2.2. When a = 1 we drop the subscript.
Definition 2.3. For nonzero vectors u1 and u2 , define the angle between them,
θ, to be
hu1 , u2 i
.
θ(u1 , u2 ) = cos−1
|u1 ||u2 |
Definition 2.4. Given a point, p, on the sphere S k−1 , and an angle φ, where
0 < φ < π/2, define the cap, C(p, φ), as
C(p, φ) = s ∈ S k−1 : θ(s, p) < φ .
Definition 2.5. Given a cap, C(p, φ), of the sphere S k−1 , define its negative,
−C(p, φ), as
−C(p, φ) = s ∈ S k−1 : θ(−s, p) < φ .
Definition 2.6. Given a cap, C(p, φ), of the sphere S k−1 , and also the cap’s
negative, −C(p, φ), define their union, K(p, φ), as
K(p, φ) = C(p, φ) ∪ −C(p, φ).
3
Proposition 2.1. Let x, y ∈ S k−1 : 0 < r(x, y) < 1. Let w ∈ S k−1 . Then,
β̂x|w < 0 ⇐⇒ w ∈ K(x + y, θ(x, y)/2).
Proof. The truth of Proposition 2.1 follows from Thales’ theorem in geometry. For the details consult Section 3 of the paper “Reversals of Least-Squares
Estimates” [8].
3
Exponentials
In this section we use Proposition 2.1 to prove Theorem 1.1, by setting k = n−1
and estimating the proportion, P (k), of the (k−1)-volume of S k−1 , that is taken
up by K(x + y, θ(x, y)/2). This can be done using techniques of modern convex
geometry [1]. In what follows “∂” denotes the boundary of a set,“◦” denotes the
interior of a set, and µ denotes the measure of a set. Note that ∂Bad = Sad−1 .
Lemma 3.1. Let x, y ∈ S k−1 : 0 < r(x, y) < 1. For all k ≥ 2,
µ(K(x + y, θ(x, y)))
1
>√
µ(S k−1 )
2π
1 − r(x, y)
2
k−1
.
Proof. Without loss of generality assume that (x +
py)/2 = (t, 0, 0, ..., 0), for
some t > 0. Elementary
trigonometry
ensures
t
=
(r(x, y) + 1)/2, and also
q
q
that |(x − y)/2| = 1−r(x,y)
. Let s stand for
2
Due to the curvature of S k−1 ,
1−r(x,y)
.
2
µ(K(x + y, θ(x, y))) > 2µ Bsk−1 .
d/2
2π
Also, µ(B d ) = dΓ(1+d/2)
, µ(∂B d ) =
Lecture 1). Therefore,
2π d/2
Γ(1+d/2) ,
2µ Bsk−1
µ(K(x + y, θ(x, y)))
>
=
µ(S k−1 )
µ(∂B k )
2sk−1 2π (k−1)/2
(k−1)Γ(1/2+k/2)
2π 1+k/2
Γ(1+k/2)
2sk−1
Γ(1 + k/2)
π(k − 1) Γ(1/2 + k/2)
2 sk−1 3k − 5
√
π (k − 1) 2k − 2
(k−1)/2
1 3k − 5
1 − r(x, y)
√
2
π (k − 1)2
(k−1)/2 (k−1)/2
1
1 − r(x, y)
1 − r(x, y)
√
2
2
π
k−1
1
1 − r(x, y)
√
.
2
π
=√
>
=
≥
=
d/2
2π
and µ(Bad ) = ad dΓ(1+d/2)
(see [1],
4
The third inequality is due to the hypothesis k ≥ 2. The middle inequality
is due to the following argument. In general, Γx = (x − 1)Γ(x − 1), resulting in
Γ(x)−Γ(x−1) = (x−1)Γ(x−1)−Γ(x−1) = Γ(x−1)((x−1)−1) = Γ(x−1)(x−2).
By convexity, Γ(x + 1/2) is thus greater than Γ(x) + Γ(x − 1)(x − 2)/2, and
with the substitution of Γ(x)/(x − 1) for Γ(x − 1) in the latter expression, the
inequality simplifies to Γ(x + 1/2) > Γ(x) + Γ(x)(x − 2)/(2(x − 1)). Dividing by
Γ(x) then results in Γ(x+1/2)/Γ(x) > 1+(x−2)/(2(x−1)) = (3x−4)/(2x−2).
With x = 1/2 + k/2, the result is Γ(k/2)/Γ((k − 1)/2) > (3k − 5)/(2k − 2).
Lemma 3.2. Let x, y ∈ S k−1 . Let 0 < r(x, y) < 1. For all k ≥ 5,
(k−1)/2
1 − r(x, y)
µ(K(x + y, θ(x, y)))
1
<√
.
µ(S k−1 )
π 1 + r(x, y)
Proof. Without loss of generality assume that (x +
py)/2 = (t, 0, 0, ..., 0), for
(r(x, y) + 1)/2, and also
some t > 0. Elementary
trigonometry
ensures
t
=
q
q
1−r(x,y)
1−r(x,y)
. Let s stand for 1+r(x,y) .
that |(x − y)/2| =
2
Radial projection (from the origin) of C(x+y, θ(x, y)) onto {(u1 , u2 , ..., uk ) ∈
Rk : u1 = 1} ∼
= Rk−1 results in the ball Bsk−1 , so that
µ(K(x + y, θ(x, y))) < 2µ Bsk−1 .
Also, µ(B d ) =
Therefore,
2π d/2
dΓ(1+d/2) ,
µ(∂B d ) =
2π d/2
Γ(1+d/2) ,
2µ Bsk−1
µ(K(x + y, θ(x, y)))
<
=
µ(S k−1 )
µ(∂B k )
d/2
2π
and µ(Bad ) = ad dΓ(1+d/2)
.
2sk−1 2π (k−1)/2
(k−1)Γ(1/2+k/2)
2π k/2
Γ(1+k/2)
Γ(1 + k/2)
2sk−1
π(k − 1) Γ(1/2 + k/2)
2 sk−1 k + 3
√
π (k − 1) 4
2 sk−1 k + 3
√
π 4 k−1
2 sk−1 2
√
π 4 1
k−1
s
√ .
π
=√
<
=
≤
=
The third inequality is due to the hypothesis k ≥ 5. The middle inequality
is due to the following argument. In general, Γ(x + 1) = xΓ(x). By convexity,
Γ(x + 1/2) < (Γ(x) + Γ(x + 1))/2. Thus, Γ(x + 1/2) < (Γ(x) + xΓ(x))/2.
Dividing by Γ(x) results in Γ(x + 1/2)/Γ(x) < (1 + x)/2. With x = 1/2 + k/2,
Γ(k/2)/Γ((k − 1)/2) < (k + 3)/4.
5
Since both µ(K(x+y,θ(x,y)))
and µ(K(x+y,θ(x,y)))
are expressions for the proµ(S k−1 )
µ(S k−1 )
portion, P (k), after rewriting in terms of n = k + 1, the truth of Theorem 1.1
follows directly from the lemmas.
References
[1] Ball, Keith. An elementary introduction to modern convex geometry. Flavors of geometry, 1–58, Math. Sci. Res. Inst. Publ., 31, Cambridge Univ.
Press, Cambridge, 1997. MR1491097 (99f:52002)
[2] Cervellati et al. Bone mass density selectively correlates with serum markers
of oxidative damage in post-menopausal women. Clinical Chemistry and
Laboratory Medicine (2012) Volume 51, Issue 2, Pages 333-338
Hosman, Carrie A.; Hansen, Ben B.; Holland, Paul W. The sensitivity of
linear regression coefficients’ confidence limits to the omission of a confounder. Ann. Appl. Stat. 4 (2010), no. 2, 849–870. MR2758424
[3] Chatfield, Chris. Model Uncertainty, Data Mining and Statistical Inference.
Journal of the Royal Statistical Society: Series A (1995), A 158, Part 3,
pp. 419466
[4] Davis et al. Rice Consumption and Urinary Arsenic Concentrations in U.S.
Children. Environmental Health Perspectives (October 2012), Vol.120, Issue 10, p1418–1424
[5] Gordon, Robert D. Values of Mills’ ratio of area to bounding ordinate and
of the normal probability integral for large values of the argument. Ann.
Math. Statistics 12, (1941). 364–366. MR0005558 (3,171e)
[6] Hosman, Carrie A.; Hansen, Ben B.; Holland, Paul W. The sensitivity of
linear regression coefficients’ confidence limits to the omission of a confounder. Ann. Appl. Stat. 4 (2010), no. 2, 849–870. MR2758424
[7] Jungert et al. Serum 25-hydroxyvitamin D3 and body composition in an elderly cohort from Germany:
a crosssectional
study.
Nutrition
&
Metabolism.
(2012),
9:42
http://www.nutritionandmetabolism.com/content/9/1/42
[8] Knaeble, Brian. Reversals of Least-Squares Estimates. Manuscript submitted for publication (2013)
[9] Khoshnevisan, Davar. (2010) Probability American Mathematical Society
[10] Lignell et al. Prenatal exposure to polychlorinated biphenyls
and polybriminated diphenyl ethers may influence birth weight
among infants in a Swedish cohort with background exposure: a cross-sectional study. Environmental Health (2013), 12:44
http://www.ehjournal.net/content/12/1/44
6
[11] Nelson et al. Daily physical activity predicts degree of insulin resistance: a cross-sectional observational study using the 2003–2004 National Health and Nutrition Examination Survey. International Journal of Behavioral Nutrition and Physical Activity. (2013), 10:10
http://www.ijbnpa.org/content/10/1/10
[12] Seber, George A.F. and Alan J. Lee. (2003) Linear Regression Analysis.
Hoboken, New Jersey: John Wiley & Sons, Inc.
7
Download