8b. Multiple regression – brief intro to linear, logistic, Poisson & Cox

Multiple Regression (Section VIII)
Multiple Regression - Overview
Multiple regression in statistics is the science and art of creating an equation that relates an outcome Y to one or more predictors X1, X2, X3, ..., Xk.
Linear Regression
Y = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk + e = Ŷ + e
where e is the residual error between the observed Y and the prediction Ŷ.
In linear regression, bi is the average change in Y for a one-unit change in Xi.
“Performance” stats: R², SDe
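To make this concrete, here is a minimal Python sketch (not from the lecture; the data and names are invented) that fits a multiple linear regression and reports the “performance” stats above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.8, size=n)  # true a=1, b1=2, b2=-0.5

X = sm.add_constant(np.column_stack([X1, X2]))  # adds the intercept column 'a'
fit = sm.OLS(Y, X).fit()
print(fit.params)          # estimates of a, b1, b2
print(fit.rsquared)        # R²
print(np.sqrt(fit.scale))  # SDe, the residual standard deviation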
Logistic Regression
In multiple logistic regression, Y is 0 or 1 with mean P, the “risk”. The logit of P, not P itself, is assumed to be a linear function of the Xs:
logit(P) = ln(P/(1-P)) = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk
(“logit” = log of the odds, since P/(1-P) is the odds)
odds = exp(a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk)
risk = P = odds/(odds + 1)
“Performance” stats: sensitivity, specificity, accuracy, concordance (C), mean deviance
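A small sketch of the logit-to-odds-to-risk arithmetic above; the coefficients and patient values are hypothetical:

import numpy as np

def risk_from_logit(a, b, x):
    # P = odds/(odds + 1), where odds = exp(a + b1*x1 + ... + bk*xk)
    odds = np.exp(a + np.dot(b, x))
    return odds / (odds + 1.0)

# hypothetical fitted coefficients and one patient's predictor values
print(risk_from_logit(a=-2.0, b=np.array([0.8, 0.05]), x=np.array([1.0, 60.0])))  # ~0.86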
Poisson Regression
When Y is a count (0, 1, 2, 3, ...), we model the log of the mean of Y so the mean can never be negative. This is the multiple Poisson regression model:
ln(mean Y) = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk
mean Y = exp(a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) > 0
“Performance” stats: R², mean deviance
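A minimal sketch, assuming simulated count data, of fitting Poisson regression as a GLM with a log link:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.3 + 0.6 * x))  # mean Y = exp(a + b*x), always positive

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(fit.params)             # estimates of a and b on the log scale
print(np.exp(fit.params[1]))  # exp(b) = mean ratio per one-unit change in x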
Logit function: P vs ln(P/(1-P))
[Figure: P (“risk”, 0.0 to 1.0) plotted against logit = log odds (-4 to 4); the curve is S-shaped.]
Logistic regression example
Predictors of in-hospital infection

Characteristic         Odds Ratio (95% CI)   p value
Incr APACHE* score     1.15 (1.11-1.18)      <0.001
Transfusion (y/n)      4.15 (2.46-6.99)      <0.001
Increasing age (yr)    1.03 (1.02-1.05)      <0.001
Malignancy             2.60 (1.62-4.17)      <0.001
Max temperature        0.70 (0.58-0.85)      <0.001
Adm to treat > 7 d     1.66 (1.05-2.61)      0.03
Female (y/n)           1.32 (0.90-1.94)      0.16

*APACHE = Acute Physiology & Chronic Health Evaluation score
Multiple Proportional Hazards Regression (Cox model)
For time-to-event outcomes (e.g., time to death), we model the hazard rate h, the event rate per unit time (for death, it is the mortality rate). Since h > 0, we model the log of the hazard as a linear function of the Xs so h can never be negative (similar to Poisson regression):
ln(h) = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk
so h = exp(a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) > 0
If h0 = exp(a) is the ‘baseline’ hazard (that is, a = ln(h0)), the hazard ratio is
HR = h/h0 = exp(b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) - note there is no intercept ‘a’.
If S0(t) is the ‘baseline’ survival curve corresponding to the baseline hazard, then the survival curve for a given combination of X1, X2, ..., Xk is given by
S(t) = S0(t)^HR, where HR is computed with the equation above.
exp(bi) is the hazard rate ratio for a one-unit change in Xi.
“Performance” stats: Harrell’s concordance (C), where 0.5 < C < 1.0.
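A small sketch of the HR and S(t) = S0(t)^HR arithmetic; the coefficients, covariate values and baseline survival values below are hypothetical:

import numpy as np

b = np.array([0.3, -0.1])   # hypothetical Cox coefficients b1, b2
x = np.array([2.0, 1.0])    # one subject's covariate values X1, X2

HR = np.exp(np.dot(b, x))   # HR = exp(b1*X1 + b2*X2); note no intercept
S0 = np.array([1.00, 0.95, 0.88, 0.80])  # hypothetical baseline survival S0(t)
S = S0 ** HR                # subject-specific survival curve S(t) = S0(t)^HR
print(HR, S)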
Cox regression example - HRs for patient mortality (Busuttil et al. 2005)
Cox HRs for donor age

Donor age   HR           95% CI      p value
1-18        1.00 (ref)   --          --
18-32       1.23         0.88-1.72   0.20
32-48       1.40         1.02-1.92   0.03
48-55       1.51         1.02-2.24   0.04
55-60       2.29         1.48-3.55   <0.001
60+         1.61         1.10-2.37   0.01

Harrell C = 0.70
Regression coefficient interpretation

Outcome (Y)                           Regression   Interpretation
Continuous                            Linear       b is the average change in Y per one-unit increase in X (the rate of change)
Binary (P = proportion)               Logistic     exp(b) = odds ratio (OR) for a one-unit increase in X
Low positive integers (0, 1, 2, 3..)  Poisson      exp(b) = mean ratio (MR) for a one-unit increase in X
Hazard rate (h = events/time)         Cox          exp(b) = hazard rate ratio (HR) for a one-unit increase in X; S(t) = S0(t)^HR
Multiple Linear Regression Example
Consider predictors of Y = bilirubin (mg/dl) in liver transplant candidates. Two predictors are
X1 = prothrombin time (PT) in seconds
X2 = ALT (alanine aminotransferase) in U/L
A multiple regression equation (on the log scale) is
Ŷ = (predicted) log bilirubin = -3.96 + 3.47 log PT + 0.21 log ALT
Regression output - equation
(Equation) Parameter estimates

term        estimate   SE      t ratio   p value
Intercept   -3.96      0.257   -15.4     <0.001
log PT      3.47       0.214   16.2      <0.001
log ALT     0.21       0.055   3.8       0.0002

Equation: log Bili = -3.96 + 3.47 log PT + 0.21 log ALT
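For illustration, plug hypothetical patient values into the fitted equation. Base-10 logs are assumed here (the log PT axis on the scatter plot slide below runs about 1.0-1.5, i.e. PT of roughly 10-32 seconds):

import numpy as np

PT, ALT = 14.0, 85.0   # hypothetical patient: PT in seconds, ALT in U/L
log_bili = -3.96 + 3.47 * np.log10(PT) + 0.21 * np.log10(ALT)
print(log_bili)        # predicted log bilirubin, about 0.42
print(10 ** log_bili)  # predicted bilirubin, about 2.6 mg/dl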
Regression - analysis of variance table

Source   df    sum of squares   mean square     F
Model    2     37.76            18.88           147.2
Error    363   46.56            0.1283 = SDe²
Total    365   84.32

F = 18.88/0.1283 = 147.2 is the screening F test that none of the predictors are related to Y.
R² = model sum of squares / total sum of squares = 37.76/84.32 = 0.448 = 44.8%
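The arithmetic in this table can be checked directly:

ss_model, ss_error, ss_total = 37.76, 46.56, 84.32
df_model, df_error = 2, 363
ms_model = ss_model / df_model  # 18.88
ms_error = ss_error / df_error  # 0.1283 = SDe²
print(ms_model / ms_error)      # F ≈ 147.2
print(ss_model / ss_total)      # R² ≈ 0.448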
Residual error plot
[Figure: residual log bilirubin vs predicted log bilirubin.]
When the model is valid, this plot should look like a circular cloud if the errors have constant variance. The example above is a “good” result.
Example of a “good” residual error histogram
The errors have a Gaussian distribution about zero.
[Figure: histogram of the residual errors, roughly -1 to 1.5.]

Quantiles - errors
100.0%   maximum    1.711
97.5%               0.834
90.0%               0.454
75.0%    quartile   0.177
50.0%    median     -0.025
25.0%    quartile   -0.213
10.0%               -0.355
2.5%                -0.630
0.0%     minimum    -1.191

Moments - errors
Mean            0.000
Std Dev (SDe)   0.357
Std Err Mean    0.0187
n               366
Normal Quantile Plot
[Figure: normal quantile plot of the residual errors (e).]
(Should be approximately a straight line if the residual error data are Gaussian.)
Interpreting multiple regression coefficients (cont.)
The multiple regression coefficients will not in general be the same as the individual regression coefficient for each variable one at a time, even though the same Y is being modeled.

variable   Simple regression (one Y, one X)   Multiple (simultaneous) (b1X1 + b2X2)
Log PT     3.560                              3.470
Log ALT    0.310                              0.211

Log bilirubin = -3.70 + 3.56 log PT,                  R² = 0.425
Log bilirubin = -0.105 + 0.310 log ALT,               R² = 0.049
Log bilirubin = -3.96 + 3.47 log PT + 0.211 log ALT,  R² = 0.448
Note the log PT coefficients don’t match.
Orthogonality (vs collinearity)
In general, regression coefficients from simple and multiple regression are not the same ↔ controlling for covariates does not give the same answer as ignoring covariates.
Only when all the X variables have correlation zero with each other will the simple and multiple regression coefficients be the same → orthogonality.
(Collinearity is when Xs are strongly correlated. It is the “opposite” of orthogonality.)
Log PT (X1) vs log ALT (X2)
Since the correlation is low, unadjusted and adjusted regression results are similar.
[Figure: scatter plot of log prothrombin time vs log ALT; r12 = 0.111, R² = 0.0123.]
Interaction Effects & subgroups
The model
Y = β0 + β1 X1 + β2 X2 + ε
implies that the change in Y due to X1 (= β1) is the same (constant) for all values of X2: an ADDITIVE model.
In the model
Y = β0 + β1 X1 + β2 X2 + β3 X1X2 + ε
the β3 term is an interaction term. The change in Y for a unit change in X1 is (β1 + β3 X2) and is therefore not constant.
A positive β3 is often termed a “synergism”; a negative β3 is often termed an “antagonism”. The model is additive only if β3 = 0.
How to implement? Make a new variable W = X1X2 (see the sketch below).
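A minimal sketch, with simulated data, of implementing the interaction by creating the product variable W:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1 + 2 * X1 + 0.5 * X2 + 1.5 * X1 * X2 + rng.normal(size=n)  # true β3 = 1.5

W = X1 * X2  # the new interaction variable
fit = sm.OLS(Y, sm.add_constant(np.column_stack([X1, X2, W]))).fit()
print(fit.params)  # the last coefficient estimates β3; if near 0, the model is additive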
Interaction example
Response: Y = log HOMA IR (MESA study)
R² = 0.280, Root Mean Square Error = SDe = 0.623
Mean response = 0.395, n = 6782

Parameter Estimates
Term         Estimate   Std Error   t Ratio   p value
Intercept    -1.388     0.049       -28.16    <.0001
Gender       -0.669     0.085       -7.83     <.0001
BMI          0.061      0.00168     36.45     <.0001
gender*BMI   0.028      0.00299     9.38      <.0001

Predicted log HOMA IR = -1.39 - 0.669 gender + 0.061 BMI + 0.028 gender * BMI
(gender is coded 0 for female and 1 for male)
Gender x BMI interaction - non-additivity
In females (gender = 0): log HOMA IR = -1.39 + 0.061 BMI
In males (gender = 1): log HOMA IR = -2.06 + 0.089 BMI
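The two equations follow directly from the fitted coefficients; a quick check:

b0, b_gender, b_bmi, b_int = -1.39, -0.669, 0.061, 0.028
print(b0, b_bmi)                     # females (gender=0): -1.39, 0.061
print(b0 + b_gender, b_bmi + b_int)  # males (gender=1): about -2.06, 0.089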
[Figure: log HOMA IR vs BMI (20-44), separate lines for females (F) and males (M); the slopes differ.]
Gender x BMI interaction - the relation is different in males vs females
[Figure: HOMA IR (original, unlogged scale, 0-6) vs BMI (20-44), plotted separately for females (F) and males (M).]
Hierarchically well formulated (HWF) regression models
HWF Rule - To correctly evaluate the X1*X2 interaction, one must also have X1 and X2 in the model.
In general, one must include the lower order terms in order to correctly evaluate the higher order terms.
HWF example - a NON-HWF model
Model: chol = a0 + a1 (smoke x age), where SMOKEAGE = smoke x age

0, 1 (dummy) coding: smoke = 0 or 1
Variable    DF   Estimate   std error   t        p value
INTERCEPT   1    156.863    3.993       39.286   0.0001
SMOKEAGE    1    0.361      0.182       1.988    0.0567

-1, 1 (effect) coding: smoke = -1 or 1
Variable    DF   Estimate   std error   t        p value
INTERCEPT   1    162.278    3.103       52.320   0.0001
SMOKEAGE    1    0.055      0.100       0.548    0.5881

The p value changes just because the coding changes!
HWF example (cont.) - an HWF model
HWF model: chol = b0 + b1 smoke + b2 age + b3 (smoke x age)
For HWF models, significance is the same regardless of coding (see the simulation sketch below).

0, 1 (dummy) coding: smoke = 0 or 1, smokeage = smoke x age
Variable    DF   Estimate   std error   t        p value
INTERCEPT   1    100.221    1.110       90.304   0.0001
SMOKE       1    3.812      1.570       2.429    0.0224
AGE         1    2.010      0.036       56.297   0.0001
SMOKEAGE    1    -0.009     0.050       -0.178   0.8599

-1, 1 (effect) coding: smoke = -1 or 1, smokeage = smoke x age
Variable    DF   Estimate   std error   t         p value
INTERCEPT   1    102.127    0.785       130.138   0.0001
SMOKE       1    1.906      0.785       2.429     0.0224
AGE         1    2.005      0.025       79.437    0.0001
SMOKEAGE    1    -0.004     0.025       -0.178    0.8599
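A simulation sketch (invented data, not the lecture’s cholesterol dataset) showing that the interaction p value depends on the coding in the non-HWF model but not in the HWF model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
smoke01 = rng.integers(0, 2, size=n)  # 0/1 dummy coding
smoke_eff = 2 * smoke01 - 1           # -1/+1 effect coding
age = rng.uniform(30, 70, size=n)
chol = 100 + 4 * smoke01 + 2 * age + rng.normal(scale=10, size=n)  # no true interaction

for smoke in (smoke01, smoke_eff):
    m1 = sm.OLS(chol, sm.add_constant(smoke * age)).fit()  # non-HWF: interaction only
    X = sm.add_constant(np.column_stack([smoke, age, smoke * age]))
    m2 = sm.OLS(chol, X).fit()                             # HWF: mains + interaction
    print(m1.pvalues[-1], m2.pvalues[-1])  # non-HWF p changes with coding; HWF p does not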
Regression assumptions
Regression can simultaneously evaluate all factors and thus reduce confounding, but one must check two critical assumptions:
1. If X is continuous/interval, check whether the relation of X and Y is linear on some scale (otherwise polychotomize X).
2. Check whether the effects of X1, X2, X3, ... are additive by adding interaction terms (e.g., X4 = X1 x X2). The model is not additive if the interactions are significant.
3. Also, in linear regression, we prefer the residual errors (e) to have a Gaussian distribution with a constant variance that is independent of Y. But additivity and linearity are more important, since lack of additivity or linearity leads to bias and is more misleading.
Nonlinear Regression
log(bilirubin) = -3.96 + 3.47 log(PT) + 0.211 log(ALT)
is a nonlinear model in terms of PT and ALT, but is a linear model in terms of log PT, log ALT and the regression coefficients b0 = -3.96, b1 = 3.47 and b2 = 0.211.
Consider a model of the form: Ŷ = drug concentration = b1 10^(b2 X)
This is nonlinear in b2 but can be made linear with a transformation: log10(concentration) = log10(b1) + b2 X.
What about: drug concentration = b0 + b1 10^(b2 X)?
This model is nonlinear in b2 and cannot be transformed. Nonlinear regression software is needed to estimate b0, b1 and b2 (see the sketch below).
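A minimal sketch, with simulated data, of estimating b0, b1 and b2 with scipy’s curve_fit (one possible choice of nonlinear regression software):

import numpy as np
from scipy.optimize import curve_fit

def conc(x, b0, b1, b2):
    return b0 + b1 * 10.0 ** (b2 * x)

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
y = conc(x, 2.0, 8.0, -0.15) + rng.normal(scale=0.2, size=x.size)  # true b0, b1, b2

(b0, b1, b2), cov = curve_fit(conc, x, y, p0=[1.0, 5.0, -0.1])  # starting guesses needed
print(b0, b1, b2)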
Nonlinear Example - Compartmental drug models
Model of how a drug (or any chemical) is metabolized by an organism.
Y1 = concentration in serum, Y2 = concentration in organ, x = time
(the drug flows from serum, Y1, to organ, Y2)
dY1/dx = -b1 Y1
dY2/dx = b1 Y1 - b2 Y2,   with b1 > b2 > 0
Solutions:
Y1 = constant e^(-b1 x)
Y2 = (b1/(b1-b2)) [e^(-b2 x) - e^(-b1 x)]
Y2 takes on its maximum value when x = ln(b1/b2)/(b1-b2).
Y2 is zero when x = 0 or x is very large.
The constants b1 and b2 are rates; they are in units of 1/x (i.e., 1/time).
Nonlinear equation - fitted curve
Y = (b1/(b1-b2)) [e^(-b2 x) - e^(-b1 x)]
Fitted: Y = [0.0967/(0.0967-0.0506)] [exp(-0.0506 t) - exp(-0.0967 t)]
At the peak, t = 14, Ŷ = 0.49.
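A quick check of the peak formula with the fitted rates:

import numpy as np

b1, b2 = 0.0967, 0.0506
t_peak = np.log(b1 / b2) / (b1 - b2)  # ln(b1/b2)/(b1-b2)
y_peak = (b1 / (b1 - b2)) * (np.exp(-b2 * t_peak) - np.exp(-b1 * t_peak))
print(t_peak, y_peak)  # about t = 14, Ŷ = 0.49, matching the slide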
Residual diagnostics & “model criticism”
Assumptions of linear regression:
1. Linear relation between Y and each X except for random “noise” (but one can transform X).
2. The effect of each X is additive (but one can make interaction terms).
3. Errors (e) have constant variance and come from a Gaussian distribution.
4. All observations are from the same population.
5. All observations are independent (usually ok).
A plot of Ŷ versus e, called a residual error (diagnostic) plot, can help verify whether these assumptions are met.
[Figure: a “good” residual plot - a structureless cloud around zero.]
[Figure: “bad” residual plots - diagnostics.]
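A minimal sketch (simulated data) of drawing such a diagnostic plot:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200)
fit = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(fit.fittedvalues, fit.resid, s=10)  # Ŷ versus residual e
plt.axhline(0, color="gray")
plt.xlabel("predicted (Ŷ)")
plt.ylabel("residual error (e)")
plt.show()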
Regression diagnostics
Problem: outliers. Solution: find on the residual plot & eliminate.
Problem: curvilinear (nonlinear) relation. Solution: try a nonlinear transformation (x², 1/x, log(x), e^x).
Problem: errors not Gaussian. Solution: robust regression (future class).
Problem: non-constant SDe or Var(e). Solution: weights (future class).
Adjusted means
[Figure: change in body fat (%) vs dietary fat (g) for the meditation and sham groups, showing the unadjusted means and unadjusted difference versus the adjusted means and adjusted difference.]
Example: Meditation & change in percent body fat
Overweight persons chose a meditation program or a “sham” (lectures) as part of a weight loss effort. They were NOT randomized.
Outcome: change in percent body fat by treatment group (meditation or sham) over three months.
Unadjusted Means
Level        n     Mean pct body fat change   SEM     Mean dietary fat (g) (before study start)
1-meditate   439   -7.51%                     0.47%   32.7 g
2-sham       704   1.34%                      0.35%   67.1 g

Unadjusted mean difference (sham - meditation) = 8.85%
SE of the difference = SEd = √(0.4722² + 0.3532²) = 0.59%
t = mean diff/SEd = 8.85% / 0.59% = 15.1, p < 0.0001
Overall unweighted mean dietary fat = 49.9 g
Result via “regression”
Y = change in body fat
X = 1 if sham, 0 if meditation
Y = a + b X + error = a + b sham + error

term   estimate   SE     t       p value
a      -7.51      0.46   -16.4   <0.001
b      8.85       0.59   15.2    <0.001

Ŷ = -7.51 + 8.85 sham
Regression - control for dietary fat
pct body fat change = Y = a + b1 X1 + b2 X2 + error
X1 = 1 if sham, 0 if meditation
X2 = dietary fat in g

term   estimate   SE      t       p value
a      -14.5      0.54    -26.6   <0.001
b1     1.51       0.56    2.7     0.007
b2     0.21       0.011   18.9    <0.001

body fat change = -14.5 + 1.51 sham + 0.21 diet fat
Adjusted means
Plug into the equation with X1 = 1 (sham) or X1 = 0 (meditation), holding X2 = diet fat = 49.9 g, the overall mean of X2:
Meditation: -14.5 + 1.51(0) + 0.213(49.9) = -3.84%
Sham: -14.5 + 1.51(1) + 0.213(49.9) = -2.33%
Adjusted mean difference = -2.33 - (-3.84) = 1.51%
(The unadjusted difference was 8.85%.)
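The same plug-in arithmetic as a small sketch:

def body_fat_change(sham, diet_fat):
    # coefficients from the fitted equation above
    return -14.5 + 1.51 * sham + 0.213 * diet_fat

mean_diet_fat = 49.9                      # overall (unweighted) mean dietary fat, g
med = body_fat_change(0, mean_diet_fat)   # about -3.87 (slide rounds to -3.84)
shm = body_fat_change(1, mean_diet_fat)   # about -2.36 (slide: -2.33)
print(med, shm, shm - med)                # adjusted difference = 1.51 = b1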
General procedure
Adjusted means for X1, controlling for (confounders) X2, X3, X4, ...
1. Estimate the model regression coefficients.
2. Plug in different values of X1, with the values of all other Xs held constant at their overall means.
This gives the adjusted means and their SEs.
Assumes the model is correct - in particular, that the effects are linear and additive (no interaction between X1 and the covariates).