Reversals of Least-Squares Estimates
Brian Knaeble
University of Utah
June 2013
Abstract
An adjusted estimate may be opposite the original estimate. This
paper presents necessary and sufficient conditions for such a reversal, in
the context of linear modeling, where adjustment is obtained through
considering additional explanatory data associated with lurking variables.
Keywords: Yule–Simpson effect, Thales’ theorem, Multiple regression,
Confounding
1 Introduction
The nature of an association between real-valued random variables X1 and Y can be crudely estimated through observing vectors of data, x1 and y, and then computing a correlation coefficient r(x1, y). However, in general, such an estimate is susceptible to confounding: additional data, x2, may lead to an adjusted estimate that is opposite the original estimate.
In this paper we study such reversals in the context of least-squares fitting of linear models. Given x1, y, and the fitted coefficient β̂1 ≠ 0, we seek necessary and sufficient conditions on x2, so that the adjusted estimate, denoted with 2,1β̂1, is opposite the original estimate β̂1, which we henceforth denote with 1β̂1. Throughout this paper, left subscripts indicate those explanatory variables used to fit a model. Note also that we always assume that all vectors of data are of equal length.
Visual intuition suggests that a reversal occurs if and only if the vector x2
lies “between” the vectors x1 and y. Geometric analysis in higher dimensions
then leads to the conclusion that this region is bounded by a right circular,
conical surface of two nappes (this geometric object will be described in detail
in Section 2). Such an understanding leads quickly to useful results.
The results are stated using "sign", where the sign of a positive real number is one, the sign of a negative real number is negative one, and the sign of zero remains undefined; the truth of sign(2,1β̂1) ≠ sign(1β̂1) entails that both quantities are defined. Note also that r(y, x1 + y) = r(x1, x1 + y) whenever x1 and y have equal standard deviations.
Theorem 1.1. When r(y, x1) ≠ 0, a reversal, sign(2,1β̂1) ≠ sign(1β̂1), can occur only if |r(x2, y)| and |r(x2, x1)| are both greater than |r(x1, y)|.
Theorem 1.2. When r(y, x1) ≠ 0, a reversal, sign(2,1β̂1) ≠ sign(1β̂1), occurs if and only if |r(x2, x1 + y)| > |r(y, x1 + y)|.
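For concreteness, the quantities in Theorems 1.1 and 1.2 can be checked numerically. The sketch below is not part of the original development and its variable names are ours; it constructs x2 as the sum of the z-scores of x1 and y, so that x1 and y enter x1 + y on a common scale, as the identity r(y, x1 + y) = r(x1, x1 + y) presumes.

```python
import numpy as np

def slopes(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns
    (intercept dropped from the returned vector)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

x1 = np.array([1.0, 2.0, 3.0, 4.0])
y  = np.array([1.0, 3.0, 2.0, 4.0])
z  = lambda v: (v - v.mean()) / v.std()
x2 = z(x1) + z(y)        # a lurking variable placed "between" x1 and y

b1  = slopes(y, x1)[0]         # the original estimate, 1β̂1
b21 = slopes(y, x1, x2)[0]     # the adjusted estimate, 2,1β̂1
print(b1, b21)                 # approximately 0.8 and -1.0: a reversal

# Theorem 1.1 (necessary): both correlations exceed |r(x1, y)| = 0.8
print(abs(r(x2, y)), abs(r(x2, x1)))
# Theorem 1.2: the reversal is accompanied by |r(x2, x1+y)| > |r(y, x1+y)|
print(abs(r(x2, x1 + y)), abs(r(y, x1 + y)))
```

Here adjustment moves the fitted slope from 0.8 to exactly −1, and both inequalities named in the theorems hold for this data.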
These results can be generalized to address the robustness of an (already) adjusted estimate against confounding due to multiple lurking variables; see Theorems 5.1 and 5.2 in Section 5. In particular, a special case of Theorem 5.1 gives necessary and sufficient conditions for the Yule–Simpson effect. See [14], [5], and Section 2 of [7] for an overview of the Yule–Simpson effect.
Mathematics relating to reversals appears in literature from the 1950s regarding smoking and lung cancer (see [2]), and there remains potential for application of such mathematics within epidemiology today: see [10] and [11] for
educational articles, and any of [3], [6], [12], [8] or [1] for scientific papers. For
related mathematical papers see [13], [9] or [4].
2 Geometry
This section presents the geometric foundation for the proofs of Theorems 1.1, 1.2, 5.1, and 5.2. Notation has been chosen so as to best convey the ideas behind the proofs. Readily-verified lemmas are stated without proof.
2.1 Angles and Objects
Definition 2.1. Let u1 and u2 be two nonzero vectors within an inner product space. Define the angle between them, θ, to be

θ(u1, u2) = cos−1(⟨u1, u2⟩ / (|u1||u2|)).
Definition 2.2. Let u ∈ Rp be a generic vector with coordinates (u1, u2, ..., up). For a fixed θ, with 0 < θ < π/2, define the following geometric objects:

Vθ := {u ∈ Rp : √(u1² + u2² + ... + u²p−1) / |up| ≤ tan(θ/2)},
Bθ := {u ∈ Rp : up = ±cos(θ/2), u1² + u2² + ... + u²p−1 ≤ sin²(θ/2)}.
The boundary of Vθ is a right circular, conical surface of two nappes, and Bθ is the intersection of Vθ with the union of two affine planes: with A+ = {(u1, u2, ..., up) ∈ Rp : up = cos(θ/2)} ≅ Rp−1 and A− = {(u1, u2, ..., up) ∈ Rp : up = −cos(θ/2)} ≅ Rp−1, define Bθ+ = Vθ ∩ A+ and Bθ− = Vθ ∩ A− so that Bθ = Bθ+ ∪ Bθ−. Similarly, define Vθ+ and Vθ− to represent the upper (up > 0) and lower (up < 0) portions of Vθ.
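In coordinates, the defining inequality of Vθ is immediate to test. The following sketch (the function name is ours) states the inequality multiplicatively, so that vectors with up = 0 are handled without division:

```python
import numpy as np

def in_V(u, theta):
    """Membership test for V_theta: the closed double cone of half-angle
    theta/2 about the up-axis, per Definition 2.2."""
    u = np.asarray(u, dtype=float)
    radial = np.linalg.norm(u[:-1])           # sqrt(u1^2 + ... + u_{p-1}^2)
    # written multiplicatively so that up = 0 needs no special case
    return radial <= np.tan(theta / 2) * abs(u[-1])

theta = np.pi / 3
print(in_V([0.0, 0.0, 1.0], theta))   # axis point: True
print(in_V([1.0, 0.0, 0.0], theta))   # orthogonal to the axis: False
print(in_V([0.3, 0.0, 0.9], theta))   # within 30 degrees of the axis: True
```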
We use “∂” to signal the boundary of a set and “◦” to signal the interior of a set. Expressions such as ∂Vθ and Vθ◦ have meanings that are readily understood. For Bθ+, however, since it is embedded within A+, which itself is embedded within Rp, we should specify that the topological operations are carried out within A+, and likewise for Bθ− ⊂ A− and Bθ ⊂ A := A+ ∪ A−. Thus, ∂Bθ = A ∩ ∂Vθ and Bθ◦ = A ∩ Vθ◦, etc.
The boundary ∂Vθ separates Rp into the complement of Vθ, namely Vθc, and the interior of Vθ, Vθ◦. Furthermore, the interior Vθ◦ is separated by the origin into what we will call Vθ◦+ and Vθ◦−. In summary, the complement of the boundary of Vθ consists of three separate regions: (∂Vθ)c = Vθc ∪ Vθ◦+ ∪ Vθ◦−.
2.2 Projections
Definition 2.3. Given an affine subspace L ⊆ Rd, consider u ∈ Rd, u ∉ L. The projection of u onto L is

pL(u) = argmin l∈L |u − l|.
Lemma 2.1. Given an affine subspace L ⊆ Rd, consider u ∈ Rd, u ∉ L. A point p ∈ L is pL(u) if and only if for all l ∈ L we have (l − p) ⊥ (u − p).
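Definition 2.3 and Lemma 2.1 can be illustrated with an affine subspace given in parametric form L = {a + Mt : t ∈ Rk}; this parametrization and the function name are ours. A least-squares solve returns the minimizing parameter, and the orthogonality condition of Lemma 2.1 can then be checked directly:

```python
import numpy as np

def proj_affine(u, a, M):
    """Projection of u onto the affine subspace L = {a + M t : t in R^k},
    where the columns of M span the direction space of L (Definition 2.3)."""
    t, *_ = np.linalg.lstsq(M, u - a, rcond=None)
    return a + M @ t

# L: the line through a = (1, 0, 0) with direction (0, 1, 0)
a = np.array([1.0, 0.0, 0.0])
M = np.array([[0.0], [1.0], [0.0]])
u = np.array([4.0, 2.0, 5.0])
p = proj_affine(u, a, M)
print(p)                         # the nearest point of L, here (1, 2, 0)
print(np.dot(a - p, u - p))      # Lemma 2.1 with l = a: zero up to rounding
```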
Definition 2.4. Given a set of vectors, {xk}k∈K, define the vector subspace KL = span({xk}k∈K).
Lemma 2.2. Consider a vector subspace KL ⊆ Rd and a vector y ∈ Rd, y ∉ KL. If q ∈ Rd and q ∉ KL then |y − p q,KL(y)| < |y − p KL(y)|.
Proposition 2.1. Set ed = (0, 0, ..., 1) ∈ Rd. Consider y ∈ Rd and x1 ∈ Rd with ⟨y, ed⟩ > 0 and ⟨x1, ed⟩ > 0. If a nonzero x2 ∈ Rd satisfies x2 ⊥ ed then ⟨p 2,1L(y), ed⟩ ≠ 0.
Proof. Define Ed = {u ∈ Rd : u ⊥ ed} so that y ∉ Ed, x1 ∉ Ed and x2 ∈ Ed. Observe that span({x1, x2}) ∩ Ed is one-dimensional, and thus by Lemma 2.1, ⟨p 2,1L(y), ed⟩ = 0 =⇒ p 2,1L(y) = p 2L(y). This however contradicts the strict inequality in Lemma 2.2, so we reject ⟨p 2,1L(y), ed⟩ = 0 and conclude ⟨p 2,1L(y), ed⟩ ≠ 0.
Theorem 2.1 (Thales’ theorem). Consider the geometric sphere

Srm = {(u1, u2, ..., um) ∈ Rm : u1² + u2² + ... + um² = r² > 0}.

For three points on Srm, namely t, its opposite −t, and another point s, the following holds: (s − t) ⊥ (s − (−t)).
Thales’ theorem is typically proved for m = 2 using elementary geometry of angles and triangles; the higher-dimensional version that we have stated follows since span({t, −t, s}) ≅ R2 and (span({t, −t, s}) ∩ Srm) ≅ Sr2. Note that the theorem also holds for lower dimensional (q < m), embedded spheres (Srq ↪ Rm) that are not necessarily centered at the origin. For example, the theorem holds when applied on ∂Bθ.
3 Zeros
This section builds on the development of the previous section. Fixing y and
x1 within Rn , we characterize the level set {x2 ∈ Rn : 2,1 β̂1 = 0}, as a step
toward identification of those x2 that are associated with reversals. The analysis
proceeds cleanly if we assume centered (mean zero) vectors of data. Tildes
indicate centered data.
3.1 Space of Centered Vectors
Assume ỹ, x̃1 and x̃2, each within Rn, with 0 < θ(ỹ, x̃1) < π/2. Being centered, each vector is orthogonal to e = (1, 1, ..., 1), and thus span({ỹ, x̃1, x̃2}) is contained within an (n − 1)-dimensional vector subspace E.
Place orthonormal coordinates on E so that ỹ/|ỹ| + x̃1/|x̃1| can be expressed as (0, 0, ..., 2 cos(θ(ỹ, x̃1)/2)). Set p = n − 1 so as to write E ≅ Rp, allowing application of the remaining terminology from Section 2. Note that ỹ/|ỹ| and x̃1/|x̃1| are opposites on ∂B+, and x̃2 is somewhere in E \ {0}.
Because 0 < θ(ỹ, x̃1) < π/2, we have 1β̂̃1 > 0, but depending on the positioning of x̃2, 2,1β̂̃1 could be positive, negative, or zero.† We specify the “sign” function to be 1 for positives, −1 for negatives, and undefined for zero, so that sign(2,1β̂̃1) ≠ sign(1β̂̃1) signals a change from a positive to a negative or vice versa: i.e. sign(2,1β̂̃1) ≠ sign(1β̂̃1) ⇐⇒ sign(2,1β̂̃1) + sign(1β̂̃1) = 0.
In case x̃2/|x̃2| = ±x̃1/|x̃1|, think of x̃2 as dominating, and define 2,1β̂̃2 to be 2β̂̃2 while 2,1β̂̃1 = 0. For the unrealistic case x̃2 = 0, think of arrival along span({ỹ}), so that by continuity it makes sense to formally define 2,1β̂̃1 = 0.
3.2 Geometric Results
Proposition 3.1. Let ỹ and x̃1 be such that 0 < θ(ỹ, x̃1) < π/2. For any x̃2,

2,1β̂̃1 = 0 ⇐⇒ x̃2 ∈ ∂Vθ(ỹ,x̃1).
Proof. First observe that x̃2 ∈ span({x̃1}) ∪ span({ỹ}) =⇒ 2,1β̂̃1 = 0. Also, by Proposition 2.1, 2,1β̂̃1 ≠ 0 for any x̃2 ≠ 0 such that ⟨x̃2, (0, 0, ..., 1)⟩ = 0. Furthermore, for c1 ≠ 0, the scaling x̃2 ↦ c1x̃2 leaves 2,1β̂̃1 unchanged. Thus, we assume x̃2 ∈ A+ \ {ỹ, x̃1}.
Additionally, note that for c2 > 0, the scaling p̃ ↦ p̃* = c2p̃ leaves sign(2,1β̂̃1) unchanged, so we may consider 2,1β̂̃1* (in place of 2,1β̂̃1), where 2,1β̂̃1* is the x̃1-coefficient when p̃* is expressed in the basis {x̃1, x̃2}.
† In general, Kβ̂k denotes the kth coefficient, when the projection of y onto the span of explanatory vectors indexed by K, is expressed in terms of such basis vectors.
Also note that p̃*, by Lemma 2.1, is the projection of ỹ onto the space {x̃2 + l(x̃1 − x̃2)}l∈R. Denote with l̂ the value of the parameter l associated with p̃*, and note, since 0 < θ(ỹ, x̃1) < π/2, that sign(2,1β̂̃1*) = sign(l̂).
Finally, thinking of l̂ as a function of x̃2 ∈ A+ \ {ỹ, x̃1}, we can conclude that l̂ = 0 ⇐⇒ x̃2 ∈ ∂B+ by appealing to Thales’ theorem.
Proposition 3.2. Let ỹ and x̃1 be such that 0 < θ(ỹ, x̃1) < π/2. For any x̃2,

2,1β̂̃1 < 0 ⇐⇒ x̃2 ∈ V◦θ(ỹ,x̃1).
Proof. Given our setup, 2,1β̂̃1 is a function, f, of x̃2, and, by Proposition 3.1, the zero set of f is ∂Vθ(ỹ,x̃1). Also, as mentioned in Section 2.1, (∂Vθ)c = Vθc ∪ Vθ◦+ ∪ Vθ◦−. Since f is continuous we need only sample one representative point from each of Vθc, Vθ◦+ and Vθ◦−, in order to determine the sign of 2,1β̂̃1 throughout each region. The following points are conveniently selected to avoid projection, and direct computation reveals that x̃2 = ±(ỹ + x̃1) =⇒ 2,1β̂̃1 < 0 and x̃2 = x̃1 − ỹ =⇒ 2,1β̂̃1 > 0.
In case π/2 < θ(ỹ, x̃1) < π, replace x̃1 with −x̃1, so that Proposition 3.2 still applies, effectively extending the range of application and resulting in the following corollary.
Corollary 3.1. Let ỹ and x̃1 be such that θ(ỹ, x̃1) ∈ (0, π/2) ∪ (π/2, π). For any x̃2,

sign(2,1β̂̃1) ≠ sign(1β̂̃1) ⇐⇒ x̃2 ∈ V◦θ(ỹ,x̃1).
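The representative points used in the proof of Proposition 3.2 can be checked directly. In this sketch (ours, not part of the paper) the centered vectors are treated as abstract coordinates, the cone’s axis is taken along the bisector ỹ/|ỹ| + x̃1/|x̃1|, and its half-angle is θ(ỹ, x̃1)/2:

```python
import numpy as np

def coef_x1(y, x1, x2):
    """x1-coefficient when (centered) y is regressed, without intercept,
    on centered x1 and x2."""
    beta, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
    return beta[0]

def in_open_cone(x2, y, x1):
    """Is x2 strictly inside the open double cone whose axis is the bisector
    y/|y| + x1/|x1| and whose half-angle is theta(y, x1)/2?"""
    axis = y / np.linalg.norm(y) + x1 / np.linalg.norm(x1)
    cos_half = np.linalg.norm(axis) / 2      # |axis| = 2 cos(theta/2)
    cos_to_axis = abs(np.dot(x2, axis)) / (np.linalg.norm(x2) * np.linalg.norm(axis))
    return cos_to_axis > cos_half

theta = np.pi / 3                            # theta(y, x1), here 60 degrees
x1 = np.array([1.0, 0.0, 0.0])
y  = np.array([np.cos(theta), np.sin(theta), 0.0])

for x2 in (y + x1, x1 - y):
    print(in_open_cone(x2, y, x1), coef_x1(y, x1, x2))
    # expect True with a negative coefficient, then False with a positive one
```

Taking x̃2 = ỹ + x̃1 lands on the cone’s axis and yields a coefficient of exactly −1, while x̃2 = x̃1 − ỹ is orthogonal to the axis, lands outside, and yields +1.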
4 Correlations
In this section we translate the results of Section 3 into the language of correlations, in order to prove Theorems 1.1 and 1.2. We posit the observation of y, x1, and x2, each within Rn. In case r(y, x1) = ±1, the truth of the theorems can be verified with the comments written at the end of Section 3.1, and r(y, x1) ≠ 0 is assumed in both theorems. Thus, henceforth, we tacitly assume r(y, x1) ∈ (−1, 0) ∪ (0, 1), ensuring applicability of Corollary 3.1.
Lemma 4.1. Let u1 and u2 be two vectors in Rn, and let their centered versions be denoted with ũ1 and ũ2. Then ⟨ũ1/|ũ1|, ũ2/|ũ2|⟩ = r(u1, u2).
Given y and x1, projection determines their centered versions, ỹ and x̃1. Define h̃ = ỹ + x̃1. Since projection and addition commute, we know that h̃ is the centered version of h = y + x1. With this notation, Corollary 3.1 asserts that sign(2,1β̂̃1) ≠ sign(1β̂̃1) ⇐⇒ |⟨x̃2/|x̃2|, h̃/|h̃|⟩| > |⟨ỹ/|ỹ|, h̃/|h̃|⟩|, and by Lemma 4.1 this becomes sign(2,1β̂̃1) ≠ sign(1β̂̃1) ⇐⇒ |r(x2, h)| > |r(y, h)|, thus demonstrating the truth of Theorem 1.2.
To prove Theorem 1.1, simply observe that within A the intersection F = {ũ ∈ A : |ũ − ỹ| < sin(θ(ỹ, x̃1)/2)} ∩ {ũ ∈ A : |ũ − x̃1| < sin(θ(ỹ, x̃1)/2)} strictly contains B. Thus, by Corollary 3.1, the existence of a reversal implies that there exists a c ≠ 0 such that cx̃2 ∈ B◦ ⊂ F. An interpretation of cx̃2 ∈ F in terms of correlation coefficients produces the conclusion of Theorem 1.1.
5 Generalizations
If we denote with J R the positive square root of the coefficient of determination,
where J indexes a finite subset of the (perhaps very large) set of all conceivable,
explanatory vectors of data, then the following generalization of Theorem 1.2
can be stated.
Theorem 5.1. When r(y, x1) ≠ 0, a reversal, sign(J,1β̂1) ≠ sign(1β̂1), occurs if and only if JR > |r(y, x1 + y)|.
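The statement leaves the regression defining JR implicit; in the sketch below (ours, with our variable names and our reading of the definition) JR is taken to be the multiple correlation obtained when x1 + y is regressed on the vectors indexed by J, since with J = {2} this reduces to |r(x2, x1 + y)| and recovers Theorem 1.2:

```python
import numpy as np

def fit(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def multiple_R(target, *cols):
    """Positive square root of R^2 when target is regressed on the columns."""
    X = np.column_stack([np.ones(len(target))] + list(cols))
    fitted = X @ np.linalg.lstsq(X, target, rcond=None)[0]
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return np.sqrt(1 - ss_res / ss_tot)

x1 = np.array([1.0, 2, 3, 4, 5])
y  = np.array([1.0, 3, 2, 5, 4])
z  = lambda v: (v - v.mean()) / v.std()
x2 = z(x1) + z(y)                    # a lurking variable "between" x1 and y
x3 = np.array([1.0, 0, 0, 0, -1])    # an additional, innocuous lurking variable

b1  = fit(y, x1)[1]                  # 1beta_1 > 0
bJ1 = fit(y, x1, x2, x3)[1]          # J,1beta_1 < 0: a reversal
R_J = multiple_R(x1 + y, x2, x3)     # JR, under the reading described above
rhs = abs(np.corrcoef(y, x1 + y)[0, 1])
print(b1, bJ1, R_J, rhs)
```

For this data the adjusted coefficient is exactly −1 while JR exceeds |r(y, x1 + y)|, consistent with the theorem’s reversal condition.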
Theorem 5.1 is related to Theorem 1.2 and Proposition 3.2 through the observation that a reversal occurs, in this more general setting, if and only if p J,1L(ỹ) ∈ V◦θ(ỹ,x̃1). Alternatively, one can think of reversals as occurring precisely when span({x̃1} ∪ {x̃j}j∈J) intersects V◦θ(ỹ,x̃1).
We can generalize further. Let I index a finite, orthogonal† set of explanatory
vectors, and let J index a disjoint, finite set of additional explanatory vectors.
Theorem 5.2. For any i ∈ I, when r(y, xi) ≠ 0, a reversal, sign(J,I β̂i) ≠ sign(I β̂i), occurs if and only if JR > |r(y, xi + y)|.
References
[1] Cervellati et al. “Bone mass density selectively correlates with serum markers of oxidative damage in post-menopausal women.” Clinical Chemistry
and Laboratory Medicine (2012) Volume 51, Issue 2, Pages 333-338
[2] Cornfield et al. “Smoking and lung cancer: Recent evidence and a discussion of some questions”. Journal of the National Cancer Institute (1959)
22, 173-203
[3] Davis et al. “Rice Consumption and Urinary Arsenic Concentrations in U.S.
Children”. Environmental Health Perspectives. (October 2012), Vol.120, Issue 10, p1418-1424
[4] Hosman, Hansen and Holland. “The sensitivity of linear regression coefficients’ confidence limits to the omission of a confounder”. The Annals of
Applied Statistics (2010) Vol. 4, No. 2, 849-870
† The assumption of orthogonality allows for disregard of the supporting vectors, since they
lie orthogonal to the space of interest. For a specific example that shows the importance of
the assumption of orthogonality, and for a discussion of the practical applications of Theorem
5.2, see [7] (Table 1 and Sections 1 and 3).
[5] Julious and Mullee. “Confounding and Simpson’s paradox.” British Medical Journal (12/03/1994) 309 (6967): 1480–1481
[6] Jungert et al. “Serum 25-hydroxyvitamin D3 and body composition in an elderly cohort from Germany: a cross-sectional study”. Nutrition & Metabolism (2012) 9:42 http://www.nutritionandmetabolism.com/content/9/1/42
[7] Knaeble, Brian. “Simple Conditions to Theoretically Limit the Potential
for Confounding of Least-Squares Estimates.” Manuscript submitted for
publication (2013)
[8] Lignell et al. “Prenatal exposure to polychlorinated biphenyls and polybrominated diphenyl ethers may influence birth weight among infants in a Swedish cohort with background exposure: a cross-sectional study.” Environmental Health (2013) 12:44 http://www.ehjournal.net/content/12/1/44
[9] Lin, Psaty and Kronmal. “Assessing the Sensitivity of Regression Results to
Unmeasured Confounders in Observational Studies”. Biometrics (September 1998) 54, 948-963
[10] Lu, C.Y. “Observational studies: a review of study designs, challenges and strategies to reduce confounding”. The International Journal of Clinical Practice (May 2009) Blackwell Publishing Ltd. 63(5): 691–697
[11] McNamee, R. “Regression modeling and other methods to control confounding”. Occupational & Environmental Medicine (2004) 62:500-506
doi:10.1136/oem.2002.001115
[12] Nelson et al. “Daily physical activity predicts degree of insulin resistance: a cross-sectional observational study using the 2003–2004
National Health and Nutrition Examination Survey”. International
Journal of Behavioral Nutrition and Physical Activity. (2013) 10:10
http://www.ijbnpa.org/content/10/1/10
[13] Rosenbaum and Rubin. “Assessing effects of confounding variables”. Journal of the Royal Statistical Society, Series B (1983) 11, 212-218.
[14] Wagner, Clifford H. “Simpson’s Paradox in Real Life.” The American Statistician (February 1982) 36 (1): 46–48.