Multilevel Models

Chapter 5-20. Correlated Data: Multilevel Models
In this chapter, we will discuss multilevel models. These models are also known as mixed effects
models, mixed models, hierarchical models, and random effects models (Rabe-Hesketh and
Everitt, 2003, p. 155).
Isoproterenol Dataset
We will again use the 11.2.Isoproterenol.dta dataset provided with the Dupont (2002, p.338)
textbook, described as,
“Lang et al. (1995) studied the effect of isoproterenol, a β-adrenergic agonist, on forearm
blood flow in a group of 22 normotensive men. Nine of the study subjects were black and
13 were white. Each subject’s blood flow was measured at baseline and then at
escalating doses of isoproterenol.”
Reading the data in,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on 11.2.Isoproterenol.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
11.2.Isoproterenol.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use 11.2.Isoproterenol.dta, clear
_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 5-20 (revision 16 May 2010)
Listing the data,
list , nolabel
     +---------------------------------------------------------------------+
     | id   race   fbf0   fbf10   fbf20   fbf60   fbf150   fbf300   fbf400 |
     |---------------------------------------------------------------------|
  1. |  1      1      1     1.4     6.4    19.1       25     24.6       28 |
  2. |  2      1    2.1     2.8     8.3    15.7     21.9     21.7     30.1 |
  3. |  3      1    1.1     2.2     5.7     8.2      9.3     12.5     21.6 |
  4. |  4      1   2.44     2.9     4.6    13.2     17.3     17.6     19.4 |
  5. |  5      1    2.9     3.5     5.7    11.5     14.9     19.7     19.3 |
     |---------------------------------------------------------------------|
  6. |  6      1    4.1     3.7     5.8    19.8     17.7     20.8     30.3 |
  7. |  7      1   1.24     1.2     3.3     5.3      5.4     10.1     10.6 |
  8. |  8      1    3.1       .       .   15.45        .        .     31.3 |
  9. |  9      1    5.8     8.8    13.2    33.3     38.5     39.8     43.3 |
 10. | 10      1    3.9     6.6     9.5    20.2     21.5     30.1     29.6 |
     |---------------------------------------------------------------------|
 11. | 11      1   1.91     1.7     6.3     9.9     12.6     12.7     15.4 |
 12. | 12      1      2     2.3       4     8.4      8.3     12.8     16.7 |
 13. | 13      1    3.7     3.9     4.7    10.5     14.6       20     21.7 |
 14. | 14      2   2.46     2.7    2.54    3.95     4.16      5.1     4.16 |
 15. | 15      2      2     1.8    4.22    5.76     7.08    10.92     7.08 |
     |---------------------------------------------------------------------|
 16. | 16      2   2.26       3    2.99    4.07     3.74     4.58     3.74 |
 17. | 17      2    1.8     2.9    3.41    4.84     7.05     7.48     7.05 |
 18. | 18      2   3.13       4    5.33    7.31     8.81    11.09     8.81 |
 19. | 19      2   1.36     2.7    3.05       4      4.1     6.95      4.1 |
 20. | 20      2   2.82     2.6    2.63   10.03      9.6    12.65      9.6 |
     |---------------------------------------------------------------------|
 21. | 21      2    1.7     1.6    1.73    2.96     4.17     6.04     4.17 |
 22. | 22      2    2.1     1.9       3     4.8      7.4     16.7     21.2 |
     +---------------------------------------------------------------------+
We see that the data are in wide format, with variables
id       patient ID (1 to 22)
race     race (1=white, 2=black)
fbf0     forearm blood flow (ml/min/dl) at isoproterenol dose 0 mg/min
fbf10    forearm blood flow (ml/min/dl) at isoproterenol dose 10 mg/min
…
fbf400   forearm blood flow (ml/min/dl) at isoproterenol dose 400 mg/min
In this dataset, each successive measurement occasion corresponds to a higher dose, so the repeated
measurements can be thought of as an effect across dose, rather than as an effect across time.
To convert the race variable into a 0-1 scored variable, we use
capture drop black
capture label drop blacklab
recode race 1=0 2=1, gen(black)
label variable black "Black race"
label define blacklab 0 "White" 1 "Black"
label values black blacklab
tab black race
We can get a feel for these data using a parallel coordinate plot (Cox, 2004). First, the parplot
command (ado file) must be added to Stata, if it hasn’t already been.
findit parplot
parplot from http://fmwww.bc.edu/RePEc/bocode/p
'PARPLOT': module for parallel coordinates plots / parplot draws parallel
coordinates plots. Stata 8 is required. d / KW: graphics / KW:
multivariate / KW: parallel coordinates plot / Requires: Stata version
8.0 / Author: Nicholas J. Cox, Durham University / Support: email
If not installed already, click on the blue link to install.
Then, creating the graph for the first two doses,
#delimit ;
parplot fbf0 fbf10
, transform(raw)
xlabel(1 "0" 2 "10")
ylabel(0(1)6, angle(horizontal))
ytitle("forearm blood flow (ml/min/dl) ")
xtitle("isoproterenol dose (mg/min)")
by(black)
;
#delimit cr
[Figure: parallel coordinate plots of forearm blood flow (ml/min/dl, 0 to 6) at isoproterenol doses 0
and 10 mg/min, one panel each for White and Black subjects (graphs by Black race).]
From these first two dose levels, we see that each subject not only has a unique intercept, but
also has a unique slope. It seems a better fit to the data could be made using a model that permits
a random slope, in addition to a random intercept that was modeled in the previous chapter.
Multilevel models can be fitted with just a random intercept, or with both a random intercept and
a random slope.
To see all the time points, we use
#delimit ;
parplot fbf0-fbf400
, transform(raw)
xlabel(1 "0" 2 "10" 3 "20" 4 "60" 5 "150" 6 "300" 7 "400")
ylabel(0(5)45, angle(horizontal))
ytitle("forearm blood flow (ml/min/dl) ")
xtitle("isoproterenol dose (mg/min)")
by(black)
;
#delimit cr
[Figure: parallel coordinate plots of forearm blood flow (ml/min/dl, 0 to 45) at isoproterenol doses
0, 10, 20, 60, 150, 300, and 400 mg/min, one panel each for White and Black subjects (graphs by
Black race).]
A model that attempts to fit an equation to repeated measures like this is called a growth curve
model. The GEE model fitted in the last chapter was a growth curve model, fitting the average
growth curve through the data points.
A growth curve model can also be fitted with a multilevel model, fitting a growth curve
separately for each subject.
Intraclass Correlation Coefficient (ICC)
The multilevel approach models the correlation structure of the data expressed as the intraclass
correlation coefficient (ICC), also called the intracluster correlation coefficient. The ICC
is also known as the reliability coefficient (r or rho) for measuring interrater agreement of a
continuous measurement between two or more raters, or two or more measurement occasions.
To compute the ICC, we use the formula (Streiner and Norman, 1995, p.106; Shrout and
Fleiss, 1979),
reliability = subject variability / (subject variability + measurement error)
expressed symbolically as,
\[ \rho = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_e^2} \]
Note, however, that these sigmas are population parameters. Population parameters are
estimated using the expected value of sample statistics, where the expected value is the long-run
average.
The ICC cannot be computed, then, simply from the MS(between) and MS(within) from an
analysis of variance table. For any ANOVA, depending on whether it is for a fixed effect,
random effect, multiple raters for each subject, separate raters for each subject, etc., the expected
mean squares (EMS) are different equations containing the MS(between) and MS(within). That
is, all versions of the ICC use the same MS(between) and MS(within) from the ANOVA table,
but the EMS(between) and EMS(within) have a slightly different equation for each situation.
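For concreteness, here is the simplest case written out. It assumes a balanced one-way random-effects layout with n measurements on each subject; this is a standard textbook result stated for illustration, not a formula taken from the sources cited above, and other designs lead to different expressions.

```latex
\begin{align*}
E[\mathrm{MS}_{between}] &= \sigma_e^2 + n\,\sigma_s^2, \qquad
E[\mathrm{MS}_{within}] = \sigma_e^2 \\[4pt]
\hat{\sigma}_e^2 &= \mathrm{MS}_{within}, \qquad
\hat{\sigma}_s^2 = \frac{\mathrm{MS}_{between} - \mathrm{MS}_{within}}{n} \\[4pt]
\hat{\rho} &= \frac{\hat{\sigma}_s^2}{\hat{\sigma}_s^2 + \hat{\sigma}_e^2}
            = \frac{\mathrm{MS}_{between} - \mathrm{MS}_{within}}
                   {\mathrm{MS}_{between} + (n-1)\,\mathrm{MS}_{within}}
\end{align*}
```

The mean squares enter only through their expected values, which is why the same ANOVA table can give a different ICC under a different design.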
To see graphically what the ICC is estimating, first we need to reshape the data to long format,
drop race
reshape long fbf , i(id) j(dose)
list if id<=1 , sepby(id) nolabel
     +--------------------------+
     | id   dose    fbf   black |
     |--------------------------|
  1. |  1      0      1       0 |
  2. |  1     10    1.4       0 |
  3. |  1     20    6.4       0 |
  4. |  1     60   19.1       0 |
  5. |  1    150     25       0 |
  6. |  1    300   24.6       0 |
  7. |  1    400     28       0 |
     +--------------------------+
Drawing a scatter diagram, showing each subject’s data lined up vertically,
sort id
twoway (scatter fbf id) , xlabel(1(1)22)
[Figure: scatterplot of fbf (vertical axis, 0 to 40) against Patient ID (horizontal axis, 1 to 22).]
Using the formula for ICC,
reliability (or ICC) = subject variability / (subject variability + measurement error)
if the background noise in the data is small relative to how similar each subject’s measurements
are, the ICC will be close to 1 (high reliability). If the “tightness” of each subject’s
measurements is not discernible from the background noise in the scatterplot, the ICC will be
close to 0 (low reliability).
In this dataset, each subject has a “cluster” of fbf values, one measured at each of the dose
levels.
In Stata, the ICC can be computed treating subject as a fixed effect, using
loneway fbf id
                    One-way Analysis of Variance for fbf:

                                              Number of obs =       150
                                                  R-squared =    0.3933

    Source                SS         df      MS            F     Prob > F
-------------------------------------------------------------------------
Between id            4668.2739      21    222.29876      3.95     0.0000
Within id             7200.5365     128    56.254191
-------------------------------------------------------------------------
Total                  11868.81     149    79.656445

         Intraclass       Asy.
         correlation      S.E.       [95% Conf. Interval]
         ------------------------------------------------
            0.30227     0.09435       0.11735     0.48720

         Estimated SD of id effect             4.936652
         Estimated SD within id                7.500279
         Est. reliability of a id mean          0.74694
              (evaluated at n=6.81)
However, this is not a correct ICC, because subjects are a random sample from a larger
population of subjects that we wish to make an inference to, and so they represent a random
effect.
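As a quick check on how the loneway figure is built, the following sketch recomputes it from the two standard deviations printed in the loneway output, using the same variance-ratio formula. The scalar names (sd_id, sd_within, icc_anova) are illustrative and are not part of the loneway output.

```stata
* Values copied from the loneway output
* (Estimated SD of id effect, Estimated SD within id)
scalar sd_id     = 4.936652
scalar sd_within = 7.500279

* ICC = subject variance / (subject variance + within-subject variance)
scalar icc_anova = sd_id^2 / (sd_id^2 + sd_within^2)
display "ICC (ANOVA estimator) = " icc_anova
```

The result agrees with the intraclass correlation of 0.30227 reported by loneway to the displayed precision.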
Alternatively, the ICC can be calculated using (Rabe-Hesketh and Skrondal, 2005, p.10),
xtreg fbf , i(id) mle
Random-effects ML regression                    Number of obs      =       150
Group variable (i): id                          Number of groups   =        22

Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(0)       =      0.00
Log likelihood  = -529.66028                    Prob > chi2        =         .

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   9.688395   1.193338     8.12   0.000     7.349496    12.02729
-------------+----------------------------------------------------------------
    /sigma_u |   4.787002   .9924495                      3.188534     7.18681
    /sigma_e |   7.501424   .4688391                      6.636569    8.478984
         rho |   .2893841    .091739                      .1398512    .4882802
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=   22.02 Prob>=chibar2 = 0.000
From this output, the ICC is reported as “rho”. It can be calculated as,
display "ICC = " 4.787002^2/(4.787002^2 + 7.501424^2)
ICC = .28938412
The likelihood-ratio test at the bottom of the output is significant, informing us that the ICC is
not equal to zero. In other words, the observations are not independent, so an ordinary linear
regression should not be used to model these data—a multilevel model (or some alternative such
as GEE) is needed.
In Stata 9, the xtmixed command was added, allowing for both random intercepts and random
coefficients to be modeled.
Calculating this same ICC using xtmixed, we can get the terms for the ICC, but we have to
calculate the ICC ourselves if we want to report that statistic.
xtmixed fbf || id: , mle
display "ICC = " 4.787002^2/(4.787002^2 + 7.501424^2)
Mixed-effects ML regression                     Number of obs      =       150
Group variable: id                              Number of groups   =        22

                                                Obs per group: min =         3
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(0)       =         .
Log likelihood = -529.66028                     Prob > chi2        =         .

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   9.688395   1.193168     8.12   0.000     7.349827    12.02696
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   4.787002   .9924497      3.188534    7.186811
-----------------------------+------------------------------------------------
                sd(Residual) |   7.501424   .4688392      6.636569    8.478984
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    22.02 Prob >= chibar2 = 0.0000

. display "ICC = " 4.787002^2/(4.787002^2 + 7.501424^2)
ICC = .28938412
The “|| id:” part of the above xtmixed command informs Stata that observations are nested in
patient (each patient’s repeated dose measurements are nested within a specific patient, identified
by the variable id).
To fit a growth model, we add dose,
xtreg fbf dose , i(id) re mle
<or>
xtmixed fbf dose || id: , mle
Random-effects ML regression                    Number of obs      =       150
Group variable (i): id                          Number of groups   =        22

Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       6.8
                                                               max =         7

                                                LR chi2(1)         =    105.42
Log likelihood  = -476.95216                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0352574   .0027581    12.78   0.000     .0298517    .0406631
       _cons |   4.966772   1.246024     3.99   0.000      2.52461    7.408935
-------------+----------------------------------------------------------------
    /sigma_u |   5.232768    .897282                      3.739149    7.323018
    /sigma_e |    4.97302   .3108209                      4.399658    5.621103
         rho |   .5254345   .0924609                      .3477587    .6981109
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=   64.59 Prob>=chibar2 = 0.000
So far we have fitted a random intercept model, where a separate intercept is estimated for each
subject, but the slope across dose is assumed the same for each subject.
Next, we will permit each subject to have a unique growth curve slope. We cannot do this with
the xtreg command, but we can with the xtmixed command.
The model fitted by the xtreg command, which allows only a random intercept, is more precisely
called a random intercept model. A model that has both random intercepts and random slopes is
called a random coefficients model. Researchers usually do not report the model name with this much
precision, however.
When more than one random effect is specified following the “||” in the xtmixed
command, we must also specify the “unstructured” covariance structure as an option. Otherwise,
Stata will set the covariance (and correlation) between the random effects to zero by default
(Rabe-Hesketh and Skrondal, 2005, p.70).
xtmixed fbf dose || id: dose , mle cov(unstructured)
Mixed-effects ML regression                     Number of obs      =       150
Group variable: id                              Number of groups   =        22

                                                Obs per group: min =         3
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(1)       =     50.82
Log likelihood = -437.38499                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0356267   .0049976     7.13   0.000     .0258315    .0454219
       _cons |   4.929394   .6680961     7.38   0.000      3.61995    6.238838
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                    sd(dose) |   .0215309   .0038137      .0152156    .0304673
                   sd(_cons) |   2.527519   .5522903      1.647015    3.878745
            corr(dose,_cons) |   .9999998   .0001242            -1           1
-----------------------------+------------------------------------------------
                sd(Residual) |   3.560716   .2225561      3.150174    4.024762
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =   143.72   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference
By specifying “|| id: dose” in the xtmixed command, we are telling Stata to model both a
random intercept and a random slope, so that each patient has their own intercept and their own
slope.
Just to verify that an independent covariance structure is fitted by default, we try
xtmixed fbf dose || id: dose , mle
Mixed-effects ML regression                     Number of obs      =       150
Group variable: id                              Number of groups   =        22

                                                Obs per group: min =         3
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(1)       =     42.31
Log likelihood = -447.54794                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0356907    .005487     6.50   0.000     .0249364     .046445
       _cons |    4.89908   .6878314     7.12   0.000     3.550955    6.247205
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Independent              |
                    sd(dose) |   .0238898   .0042098      .0169128    .0337449
                   sd(_cons) |   2.594299   .6065499       1.64062    4.102342
-----------------------------+------------------------------------------------
                sd(Residual) |   3.669354   .2424493      3.223645    4.176687
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =   123.40   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference
Note: a correlation coefficient is a standardized covariance (much as a z-score is a standardized
version of a variable in its original units). Thus, when we specify the covariance structure, we are
also specifying the correlation structure.
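To make the link between the two scales concrete, the sketch below converts the correlation from the unstructured fit back into a covariance, using the standard deviations and correlation printed in that output. The scalar names are illustrative only.

```stata
* Estimates copied from the unstructured xtmixed output
scalar sd_dose = .0215309
scalar sd_cons = 2.527519
scalar r_dc    = .9999998

* covariance = correlation x (product of the two standard deviations)
scalar cov_dc = r_dc * sd_dose * sd_cons
display "cov(dose, _cons)  = " cov_dc

* dividing by the same product standardizes it back to the correlation
display "corr(dose, _cons) = " cov_dc / (sd_dose * sd_cons)
```

Setting the covariance to zero, as the independent structure does, therefore also sets the correlation to zero.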
The available covariance structures for xtmixed are (Stata 9 Longitudinal/Panel Data reference
manual, p.178):
Covariance( ) option   Description
independent            one unique variance parameter per random effect, all covariances zero;
                       the default unless a factor variable is specified
exchangeable           equal variances for random effects, and one common pairwise covariance
identity               equal variances for random effects, all covariances zero
unstructured           all variances/covariances distinctly estimated
Stata 11 still offers only these four structures.
To verify that the unstructured covariance structure provides the best fit, we can try all the
available structures and see which one gives the largest model log likelihood (the value closest to
zero, since these log likelihoods are negative).
xtmixed fbf dose || id: dose , mle cov(independent)
xtmixed fbf dose || id: dose , mle cov(exchangeable)
xtmixed fbf dose || id: dose , mle cov(identity)
xtmixed fbf dose || id: dose , mle cov(unstructured)

. xtmixed fbf dose || id: dose , mle cov(independent)
Log likelihood = -447.54794                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0356907    .005487     6.50   0.000     .0249364     .046445
       _cons |    4.89908   .6878314     7.12   0.000     3.550955    6.247205
------------------------------------------------------------------------------

. xtmixed fbf dose || id: dose , mle cov(exchangeable)
Log likelihood = -454.21473                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0357791   .0067294     5.32   0.000     .0225896    .0489686
       _cons |    4.86307   .4529122    10.74   0.000     3.975378    5.750761
------------------------------------------------------------------------------

. xtmixed fbf dose || id: dose , mle cov(identity)
Log likelihood = -454.56164                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0357802   .0067429     5.31   0.000     .0225643    .0489961
       _cons |   4.862272   .4542216    10.70   0.000     3.972015     5.75253
------------------------------------------------------------------------------

. xtmixed fbf dose || id: dose , mle cov(unstructured)
Log likelihood = -437.38499                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0356267   .0049976     7.13   0.000     .0258315    .0454219
       _cons |   4.929394   .6680961     7.38   0.000      3.61995    6.238838
------------------------------------------------------------------------------
We see that the unstructured covariance structure provided the best model fit, since it has the
largest log likelihood (-437.38, the value closest to zero). It also has the largest z statistic for
dose, and therefore the smallest p value, although that difference lies beyond the decimal places
displayed here.
The “unstructured” option is always the safest approach, but you should consider the other
structures if significance is lost while model fit is not improved.
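The gap between two of these fits can also be put on a formal footing. The independent structure is the unstructured one with the covariance fixed at zero, so the two are nested. The sketch below forms a likelihood ratio statistic from the two log likelihoods printed above. It assumes one degree of freedom (the single freed covariance), and because the estimated correlation sits essentially at its boundary of 1, the p value should be read as approximate.

```stata
* Log likelihoods copied from the independent and unstructured fits
scalar ll_indep = -447.54794
scalar ll_unstr = -437.38499

* LR statistic: twice the gain in log likelihood from freeing the covariance
scalar lr = 2*(ll_unstr - ll_indep)
display "LR chi-square = " lr

* One extra parameter (the covariance), so 1 degree of freedom is assumed
display "p value = " chi2tail(1, lr)
```

The same comparison could be run with estimates store and lrtest, the approach this chapter uses to test the random slope.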
To determine if a random slope was needed (in other words, the random slope improved the
goodness of fit), we compare the log likelihoods between the models, using
xtmixed fbf dose || id: , mle cov(unstructured)
estimates store modelA // store model estimates
xtmixed fbf dose || id: dose , mle cov(unstructured)
estimates store modelB // store model with added term estimates
lrtest modelB modelA
display "LR Chi-square = " -2*(-476.95216 -(-437.38499))
. xtmixed fbf dose || id: , mle cov(unstructured)
Note: single-variable random-effects specification; covariance structure set to
      identity
Log likelihood = -476.95216                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0352574   .0027581    12.78   0.000     .0298517    .0406631
       _cons |   4.966772   1.245969     3.99   0.000     2.524718    7.408827
------------------------------------------------------------------------------

. xtmixed fbf dose || id: dose , mle cov(unstructured)
Log likelihood = -437.38499                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |   .0356267   .0049976     7.13   0.000     .0258315    .0454219
       _cons |   4.929394   .6680961     7.38   0.000      3.61995    6.238838
------------------------------------------------------------------------------

. lrtest modelB modelA
Likelihood-ratio test                                  LR chi2(2)  =     79.13
(Assumption: modelA nested in modelB)                  Prob > chi2 =    0.0000

. display "LR Chi-square = " -2*(-476.95216 -(-437.38499))
LR Chi-square = 79.13434
The likelihood ratio test statistic = 79.13, p<0.001, informs us that adding the random slopes
improved the fit. The display command was not needed, but was used just to show that this statistic
is -2 × the difference of the two model log likelihoods.
Now is a good time to see what the model equations look like.
Ordinary linear regression fits the equation (also called the “naïve model” since it does not
account for correlation structure),
regress fbf dose
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
where i represents the observation, i = 1 to n (n = sample size).
A random-intercept model fits the equation (Rabe-Hesketh and Skrondal, 2005, p.68),
xtreg fbf dose , i(id) re mle
<or>
xtmixed fbf dose || id: , mle
\[ y_{ij} = (\beta_0 + \zeta_{0j}) + \beta_1 x_{ij} + \varepsilon_{ij} \]
where $\zeta$ is the Greek letter zeta,
i represents the observation, i = 1 to n, and
j represents the cluster (subject in this dataset), j = 1 to k,
so the intercept has a common part and a residual for each cluster.
A random-coefficients (or multilevel) model fits the equation (Rabe-Hesketh and Skrondal,
2005, p.69),
xtmixed fbf dose || id: dose , mle cov(unstructured)
\[ y_{ij} = (\beta_0 + \zeta_{0j}) + (\beta_1 + \zeta_{1j}) x_{ij} + \varepsilon_{ij} \]
where i represents the observation, i = 1 to n, and
j represents the cluster (subject in this dataset), j = 1 to k,
so the intercept has a common part and a residual for each cluster, as does the slope.
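As an illustration, the estimates from the random-coefficients fit with the unstructured covariance shown earlier in this chapter can be placed into that equation. The values are rounded, and the hats simply mark quantities estimated from the data.

```latex
\begin{align*}
fbf_{ij} &= (4.929 + \zeta_{0j}) + (0.0356 + \zeta_{1j})\,dose_{ij} + \varepsilon_{ij} \\[4pt]
\widehat{sd}(\zeta_{0j}) &= 2.528, \qquad
\widehat{sd}(\zeta_{1j}) = 0.0215, \qquad
\widehat{sd}(\varepsilon_{ij}) = 3.561
\end{align*}
```

Read this way, the average subject starts at about 4.9 ml/min/dl and gains about 0.036 per unit of dose, while a subject whose slope residual is one standard deviation above zero gains about 0.0356 + 0.0215 = 0.057 per unit of dose.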
Adding the black term, the black × dose interaction, and the quadratic term, similar to what we
did with the GEE model in the previous chapter, and then plotting the model fit,
use 11.2.Isoproterenol.dta, clear
capture drop black
capture label drop blacklab
recode race 1=0 2=1, gen(black)
label variable black "Black race"
label define blacklab 0 "White" 1 "Black"
label values black blacklab
tab black race
drop race
reshape long fbf , i(id) j(dose)
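capture drop dose_sq
gen dose_sq = dose*dose
* blkxdose is used in the model below but is not created in this listing;
* it is assumed here to be the black x dose interaction
capture drop blkxdose
gen blkxdose = black*dose
capture drop blkxdose_sq
gen blkxdose_sq=black*dose_sq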
xtmixed fbf black dose dose_sq blkxdose blkxdose_sq || id: dose ///
, mle cov(unstructured)
*
capture drop predfbf
predict predfbf
sort dose
#delimit ;
twoway (scatter fbf dose if black==1
, msymbol(triangle) mfcolor(green) mlcolor(green)
msize(medium))
(scatter fbf dose if black==0
, msymbol(circle) mfcolor(blue) mlcolor(blue) msize(medium))
(scatter predfbf dose if black==1
, msymbol(none) connect(direct) clpattern(solid)
clwidth(thick) clcolor(green))
(scatter predfbf dose if black==0
, msymbol(none) connect(direct) clpattern(solid)
clwidth(thick) clcolor(blue))
, legend(off)
ytitle(forearm blood flow)
xtitle(dose)
;
#delimit cr
[Figure: observed forearm blood flow (0 to 40) versus dose (0 to 400), with the predicted curves from
the multilevel model.]
Comparing this to the final model graph from the GEE model in the previous chapter,
[Figure: forearm blood flow (0 to 40) versus dose (0 to 400), final GEE model from the previous
chapter.]
We see that the multilevel model has less separation between Whites and Blacks in the predicted
growth curves.
Multilevel models provide what are called shrunken estimates, which are more conservative
(show a less dramatic effect) and are considered to be more reliable (more likely to be replicated in
future patient samples).
The multilevel model is:
. xtmixed fbf black dose dose_sq blkxdose blkxdose_sq || id: dose ///
>      , mle cov(unstructured)
note: blkxdose_sq dropped due to collinearity

Mixed-effects ML regression                     Number of obs      =       150
Group variable: id                              Number of groups   =        22

                                                Obs per group: min =         3
                                                               avg =       6.8
                                                               max =         7

                                                Wald chi2(4)       =    140.81
Log likelihood = -414.57493                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |   -2.17419   1.260199    -1.73   0.084    -4.644135    .2957552
        dose |   .0841474   .0080375    10.47   0.000     .0683942    .0999005
     dose_sq |  -.0001075   .0000199    -5.41   0.000    -.0001464   -.0000685
 blkxdose_sq |  -.0000466   .0000178    -2.62   0.009    -.0000816   -.0000117
       _cons |   4.341907   .8111189     5.35   0.000     2.752143    5.931671
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                    sd(dose) |   .0160508   .0037041      .0102108    .0252309
                   sd(_cons) |   2.209726   .5360612      1.373548    3.554944
            corr(dose,_cons) |   .9999996    .000227            -1           1
-----------------------------+------------------------------------------------
                sd(Residual) |   3.099771   .2014891      2.728981    3.520942
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =   107.27   Prob > chi2 = 0.0000
The GEE model from the previous chapter was,
GEE population-averaged model                   Number of obs      =       150
Group variable:                         id      Number of groups   =        22
Link:                             identity      Obs per group: min =         3
Family:                           Gaussian                     avg =       6.8
Correlation:                  exchangeable                     max =         7
                                                Wald chi2(5)       =    428.47
Scale parameter:                  28.63631      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         fbf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -1.922958   1.952956    -0.98   0.325    -5.750681    1.904766
        dose |   .1136194   .0112688    10.08   0.000     .0915329    .1357059
     dose_sq |  -.0001669   .0000285    -5.85   0.000    -.0002228    -.000111
  blackxdose |  -.0722987   .0172973    -4.18   0.000    -.1062008   -.0383966
 blkxdose_sq |   .0000986   .0000439     2.25   0.025     .0000126    .0001846
       _cons |   4.263511   1.253953     3.40   0.001     1.805808    6.721214
------------------------------------------------------------------------------
Just by looking at these models, it’s hard to say which provided the better fit.
More Complicated Multilevel Structure
A big advantage of a multilevel model over a GEE model is that it can handle a more
complicated multilevel structure.
The GEE model in Stata can only have one panel ID. So, it can handle the case where there are
repeated measurements for each patient, but you cannot specify that patients are nested within
physicians. (Perhaps other GEE software exists that handles this, but I’m not aware of it.)
We will next model some hypothetical data where patients are nested within physicians, and
patients are more alike within a physician than between physicians. The data represent an initial
visit and one follow-up visit.
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\example.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use example.dta, clear
list , sepby(physician_id) abbrev(15)
     +----------------------------------------------+
     | patient_id   physician_id   visit2    y    x |
     |----------------------------------------------|
  1. |          1              1        0   12    1 |
  2. |          1              1        1   10    1 |
  3. |          2              1        0   13    2 |
  4. |          2              1        1   11    3 |
  5. |          3              1        0   14    2 |
  6. |          3              1        1    9    5 |
     |----------------------------------------------|
  7. |          4              2        0   20    2 |
  8. |          4              2        1   18    3 |
  9. |          5              2        0   22    4 |
 10. |          5              2        1   17    5 |
     |----------------------------------------------|
 11. |          6              3        0   25    4 |
 12. |          6              3        1   22    7 |
 13. |          7              3        0   23    7 |
 14. |          7              3        1   21   10 |
     |----------------------------------------------|
 15. |          8              4        0   30    8 |
 16. |          8              4        1   27    9 |
     |----------------------------------------------|
 17. |          9              5        0   30    1 |
 18. |          9              5        1   27    2 |
 19. |         10              5        0   32   11 |
 20. |         10              5        1   29   15 |
     +----------------------------------------------+
Graphing these data,
#delimit ;
twoway (scatter y visit2 if physician_id==1, mlabel(patient_id))
       (scatter y visit2 if physician_id==2, mlabel(patient_id))
       (scatter y visit2 if physician_id==3, mlabel(patient_id))
       (scatter y visit2 if physician_id==4, mlabel(patient_id))
       (scatter y visit2 if physician_id==5, mlabel(patient_id))
       , legend(off) xlabel(0(1)1)
       ;
#delimit cr
[Figure: scatterplot of y (roughly 10 to 30) against visit2 (0 and 1), with each point labeled by
patient_id.]
From this graph, we see that patients within physicians are more alike than patients between
physicians.
Fitting a naïve model, that does not account for the correlation structure of the data,
regress y visit2
      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =    0.83
       Model |          45     1          45           Prob > F      =  0.3748
    Residual |       977.8    18  54.3222222           R-squared     =  0.0440
-------------+------------------------------           Adj R-squared = -0.0091
       Total |      1022.8    19  53.8315789           Root MSE      =  7.3704

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |         -3   3.296126    -0.91   0.375    -9.924903    3.924903
       _cons |       22.1   2.330713     9.48   0.000     17.20335    26.99665
------------------------------------------------------------------------------
This incorrect model treats the data as N=20 independent observations, when there are really
only N=10 patients, each measured twice. That is, only 10 independent patients contribute to the
sample size. It estimates a decrease of 3 from previsit to postvisit, which is not significant (p=0.375).
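A quick way to see the gap between rows and independent units is to count both. The sketch below assumes the long-format data listed above are in memory; first_row is a hypothetical variable name introduced only for this illustration.

```stata
* Compare the number of rows with the number of distinct patients
egen byte first_row = tag(patient_id)   // 1 on exactly one row per patient, 0 elsewhere
count                                   // rows (visit-level records)
count if first_row                      // distinct patients
drop first_row
```

If the two counts differ, an ordinary regression on the rows overstates the amount of independent information.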
We can use the xtreg command, but it only allows for one level of hierarchical structure.
xtreg y visit2 , i(physician_id)              <- one possibility
xtreg y visit2 , i(patient_id)                <- another possibility
xtreg y visit2 , i(physician_id patient_id)   <- can't do this
Choosing to model a random intercept for patient,
xtreg y visit2 , i(patient_id)
Random-effects GLS regression                   Number of obs      =        20
Group variable (i): patient_id                  Number of groups   =        10

R-sq:  within  = 0.0000                         Obs per group: min =         2
       between = 0.0000                                        avg =       2.0
       overall = 0.0440                                        max =         2

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     67.50
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |         -3   .3651484    -8.22   0.000    -3.715678   -2.284322
       _cons |       22.1   2.330713     9.48   0.000     17.53189    26.66811
-------------+----------------------------------------------------------------
     sigma_u |  7.3249953
     sigma_e |  .81649658
         rho |  .98772755   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Notice the high ICC = 0.99 (rho in the output). Although the estimated effect of -3 is the same
for this model and the naïve model, accounting for this correlation reduced the standard error
dramatically (from 3.30 to 0.37), and we now get a significant result (p<0.001).
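The reported rho can be checked by hand: for a random-intercept model it is the between-patient variance divided by the total variance, sigma_u^2/(sigma_u^2 + sigma_e^2). A minimal sketch, plugging in the values printed above; the second half assumes the same data are still in memory and relies on the results that xtreg leaves in e().

```stata
* Reproduce rho from the sigma_u and sigma_e printed in the xtreg output
display 7.3249953^2 / (7.3249953^2 + .81649658^2)

* The same quantities taken from xtreg's stored results
quietly xtreg y visit2 , i(patient_id)
display e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2)
display e(rho)
```

All three displays should agree with the rho of about 0.99 shown in the output.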
Next, we will try the GEE approach.
xtgee y visit2 , t(visit2) i(patient_id) corr(unstr)
GEE population-averaged model                   Number of obs      =        20
Group and time vars:     patient_id visit2      Number of groups   =        10
Link:                          identity         Obs per group: min =         2
Family:                        Gaussian                        avg =       2.0
Correlation:               unstructured                        max =         2
                                                Wald chi2(1)       =     75.00
Scale parameter:                  48.89         Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |         -3   .3464102    -8.66   0.000    -3.678951   -2.321049
       _cons |       22.1   2.211108     9.99   0.000     17.76631    26.43369
------------------------------------------------------------------------------
The result is very similar.
We know, however, that the correlation structure introduced by sampling clusters of patients for
each physician has not yet been accounted for.
We next fit a multilevel model with two levels of nesting: visits nested within patients, and
patients nested within physicians. At this point we model only random intercepts at both levels.
xtmixed y visit2 || physician_id: || patient_id: , mle cov(unstructured)

Mixed-effects ML regression                     Number of obs      =        20

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
   physician_id |        5          2        4.0          6
     patient_id |       10          2        2.0          2
-----------------------------------------------------------

                                                Wald chi2(1)       =     75.00
Log likelihood = -39.582651                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |         -3     .34641    -8.66   0.000    -3.678951   -2.321049
       _cons |   23.78311   2.950812     8.06   0.000     17.99963     29.5666
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
physician_id: Identity       |
                   sd(_cons) |   6.554588   2.090782      3.507754     12.2479
-----------------------------+------------------------------------------------
patient_id: Identity         |
                   sd(_cons) |    .670165   .3669838      .2291194    1.960205
-----------------------------+------------------------------------------------
                sd(Residual) |   .7745963   .1732049      .4997365    1.200631
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =    55.38   Prob > chi2 = 0.0000
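One way to read the random-effects table is to square the reported standard deviations into variance components and express each level as a share of the total. This is an illustrative calculation that the output above does not print, and the scalar names are hypothetical.

```stata
* Variance components are the squares of the reported standard deviations
scalar v_phys = 6.554588^2     // between physicians
scalar v_pat  = .670165^2      // between patients within physician
scalar v_res  = .7745963^2     // residual (visit level)
scalar v_tot  = v_phys + v_pat + v_res

display "physician share:           " v_phys / v_tot
display "physician + patient share: " (v_phys + v_pat) / v_tot
```

With these estimates the physician level accounts for roughly 98% of the total variance, which is consistent with the clustering seen in the graph.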
To determine if both levels of hierarchy need to be included in the model, we can use a
likelihood ratio test.
xtmixed y visit2 || patient_id: , mle cov(unstructured)
estimates store modelA // store model estimates
xtmixed y visit2 || physician_id: || patient_id: , mle cov(unstructured)
estimates store modelB // store model with added term estimates
lrtest modelB modelA
. xtmixed y visit2 || patient_id: , mle cov(unstructured)
Note: single-variable random-effects specification; covariance structure set to
      identity

Log likelihood = -48.707467                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |         -3   .3464102    -8.66   0.000    -3.678951   -2.321049
       _cons |       22.1   2.211108     9.99   0.000     17.76631    26.43369
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
patient_id: Identity         |
                   sd(_cons) |   6.949101   1.563549      4.471035    10.80063
-----------------------------+------------------------------------------------
                sd(Residual) |   .7745967   .1732051      .4997366    1.200632
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    37.13 Prob >= chibar2 = 0.0000

. xtmixed y visit2 || physician_id: || patient_id: , mle cov(unstructured)
Note: single-variable random-effects specification; covariance structure set to
      identity

Log likelihood = -39.582651                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |         -3     .34641    -8.66   0.000    -3.678951   -2.321049
       _cons |   23.78311   2.950812     8.06   0.000     17.99963     29.5666
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
physician_id: Identity       |
                   sd(_cons) |   6.554588   2.090782      3.507754     12.2479
-----------------------------+------------------------------------------------
patient_id: Identity         |
                   sd(_cons) |    .670165   .3669838      .2291194    1.960205
-----------------------------+------------------------------------------------
                sd(Residual) |   .7745963   .1732049      .4997365    1.200631
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =    55.38   Prob > chi2 = 0.0000

. lrtest modelB modelA

Likelihood-ratio test                                 LR chibar2(01) =     18.25
(Assumption: modelA nested in modelB)                 Prob > chibar2 =    0.0000
From the likelihood-ratio test (p<.001), we see that using both the physician level and patient
level provided a better fit than the patient level alone.
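The statistic reported by lrtest can be reproduced from the two log likelihoods printed above. A minimal sketch: the halving of the chi-squared tail reflects the chibar2(01) label, which denotes a 50:50 mixture of chi-squared distributions with 0 and 1 degrees of freedom, used because a variance of zero lies on the boundary of the parameter space.

```stata
* LR statistic = 2 * (log likelihood of larger model - log likelihood of smaller model)
display 2 * (-39.582651 - (-48.707467))

* Boundary-corrected p-value: half the upper tail of a chi-squared with 1 df
display 0.5 * chi2tail(1, 2 * (-39.582651 - (-48.707467)))
```

The first display should match the 18.25 reported by lrtest, and the second should be far below 0.001.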
Next, let’s add the covariate x, and specify that we want to model a random slope for it at the
patient level.
xtmixed y visit2 x || physician_id: || patient_id: x , mle cov(unstructured)
Mixed-effects ML regression                     Number of obs      =        20

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
   physician_id |        5          2        4.0          6
     patient_id |       10          2        2.0          2
-----------------------------------------------------------

                                                Wald chi2(2)       =     74.87
Log likelihood = -38.369514                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      visit2 |  -3.214062     .38211    -8.41   0.000    -3.962984    -2.46514
           x |   .1184616   .0915592     1.29   0.196     -.060991    .2979143
       _cons |   23.20502   2.843082     8.16   0.000     17.63268    28.77736
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
physician_id: Identity       |
                   sd(_cons) |   6.264573    1.99897      3.351819    11.70853
-----------------------------+------------------------------------------------
patient_id: Unstructured     |
                       sd(x) |   .0664193    .123227      .0017501    2.520696
                   sd(_cons) |   .0545017   .6798772      1.31e-12    2.26e+09
               corr(x,_cons) |   .9997077   .2113569            -1           1
-----------------------------+------------------------------------------------
                sd(Residual) |   .7859827   .1639858      .5221798    1.183058
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(4) =    47.60   Prob > chi2 = 0.0000
This covariate is not significant (p=0.196), and so can be dropped from the final model.
Notice that specifying cov(unstructured) had no effect until we added a random slope. In the
previous model, Stata kept cov(identity), even though we specified
cov(unstructured).
Example Use of a Multilevel Model
Pronovost et al. (N Engl J Med 2006) report using multilevel Poisson regression models, with two
and three levels:
“To explore the exposure-outcome relationship, we used a generalized linear latent and
mixed model18,19 with a Poisson distribution for the quarterly number of catheter-related
bloodstream infections. In the model, we used robust variance estimation and included
two-level random effects to account for nested clustering within the data, catheter-related
bloodstream infections within hospital, and hospitals within the geographic regions
included in the study.18,20 The addition of a third level of clustering for a potential ICU
effect (catheter-related bloodstream infections within ICUs, ICUs within hospitals, and
hospitals within the geographic regions) did not change the results.”
Example Use of a Random Intercept (Mixed) Model
Hillmen et al. (N Engl J Med 2006) report using a mixed effect (random intercept) model, with
baseline as a covariate:
“Changes in scores on the FACIT-Fatigue instrument and the EORTC QLQ-C30
instrument from baseline through week 26 were analyzed with the use of a mixed model,
with baseline scores as the covariate, treatment and time as fixed effects, and the patient
identifier as a random effect.”
Are Models That Account for the Correlation Structure (Chapters 5-17 through 5-20) Being Used Enough?
Bryant (2006) conducted a systematic review of high-impact orthopaedic journals to determine
how frequently correct statistical methods were being used for repeated measurements within the
same patients. In articles published in 2003, she found that out of 76 studies that used statistical
analyses involving two limbs or multiple joints from single patients, only 16 (21%) used
methods to adjust for within-patient relationships of the repeated measurements.
Case Study: Schroerlucke Dataset
Returning to the case study, where ordinary logistic regression could not be fitted,
we can fit the univariable exact logistic model.
After reading in the dataset, schroerlucke.dta, we fit the model without covariates, using
exlogistic failed blount
note: CMLE estimate for blount is +inf; computing MUE

Exact logistic regression                        Number of obs =         31
                                                 Model score   =   7.536232
                                                 Pr >= score   =     0.0096
---------------------------------------------------------------------------
      failed | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
      blount |   12.48967*          8      0.0111      1.655921        +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)
Next, adjusting for weight,
exlogistic failed blount weight
note: CMLE estimate for blount is +inf; computing MUE

Exact logistic regression                        Number of obs =         31
                                                 Model score   =    7.65015
                                                 Pr >= score   =     0.0178
---------------------------------------------------------------------------
      failed | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
      blount |   8.495888*          8      0.0600      .9205344        +Inf
      weight |    1.01012       762.3      0.7005      .9613036    1.064392
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)
Sorting the data and listing,
sort id sequence
list , sepby(id) abbrev(15)
     +-------------------------------------------------------------------+
     |  id  sequence   blount   age   deformdeg   weight    bmi   failed |
     |-------------------------------------------------------------------|
  1. |   1         1        1    10          19     90.7   35.4        1 |
     |-------------------------------------------------------------------|
  2. |   2         1        1     9          16     92.6   38.1        1 |
     |-------------------------------------------------------------------|
  3. |   3         1        1    12          20     77.1   33.2        1 |
  4. |   3         2        1    12          20     77.1   33.2        1 |
     |-------------------------------------------------------------------|
  5. |   4         1        1    10          10     89.5   37.3        1 |
     |-------------------------------------------------------------------|
  6. |   5         1        1     9          17    108.1   44.4        1 |
     |-------------------------------------------------------------------|
  7. |   6         1        1    12          22     86.2   32.4        1 |
  8. |   6         2        1    12          22     86.2   32.4        0 |
     |-------------------------------------------------------------------|
  9. |   7         1        1    12          12      141   48.2        1 |
     |-------------------------------------------------------------------|
 10. |   8         1        1    11          10     80.3   32.2        0 |
     |-------------------------------------------------------------------|
 11. |   9         1        1    12          17      121     45        0 |
     |-------------------------------------------------------------------|
 12. |  10         1        1    10          13       87   34.9        0 |
     |-------------------------------------------------------------------|
 13. |  11         1        1    13          10     99.8   40.2        0 |
     |-------------------------------------------------------------------|
 14. |  12         1        1     7          19     55.4   27.5        0 |
     |-------------------------------------------------------------------|
 15. |  13         1        1    13          12     99.4   39.3        0 |
     |-------------------------------------------------------------------|
 16. |  14         1        1    11          14      109   39.1        0 |
     |-------------------------------------------------------------------|
 17. |  15         1        1    10          11    102.8   38.5        0 |
     |-------------------------------------------------------------------|
 18. |  16         1        1    11          19     74.8   33.3        0 |
     |-------------------------------------------------------------------|
 19. |  17         1        0    14           8       70   26.3        0 |
 20. |  17         2        0    14           8       70   26.3        0 |
     |-------------------------------------------------------------------|
 21. |  18         1        0    12          11     54.1   19.6        0 |
 22. |  18         2        0    12          11     54.1   19.6        0 |
     |-------------------------------------------------------------------|
 23. |  19         1        0    14          13     47.3     17        0 |
     |-------------------------------------------------------------------|
 24. |  20         1        0    11          20       72     32        0 |
 25. |  20         2        0    11          20       72     32        0 |
     |-------------------------------------------------------------------|
 26. |  21         1        0    12          12       32   16.3        0 |
 27. |  21         2        0    12          12       32   16.3        0 |
     |-------------------------------------------------------------------|
 28. |  22         1        0    11          20    102.8   38.5        0 |
 29. |  22         2        0    11          20    102.8   38.5        0 |
     |-------------------------------------------------------------------|
 30. |  23         1        0     7          20     70.8   54.2        0 |
 31. |  23         2        0     7          18     70.8   54.2        0 |
     +-------------------------------------------------------------------+
We see that in 8 patients, both legs were included in the dataset. Our dataset, then, is an example
of what Bryant (2006), discussed above, described as “multiple joints from single
patients”.
A multilevel version of exact logistic regression is not available in Stata. We might try treating
implant as a repeated measurement in the same patient, and use the maximum of the failed
variable as a summary measure. The collapse command is an easy way to accomplish this.
Since the covariates are essentially constant for the same patient with two joints included in the
dataset (only deformdeg for id 23 differs, 20 versus 18), we can use the max of all variables with
almost no loss of information on the covariates,
collapse (max) blount-failed , by(id)
list , abbrev(15)
     +--------------------------------------------------------+
     |  id  blount   age   deformdeg   weight    bmi   failed |
     |--------------------------------------------------------|
  1. |   1       1    10          19     90.7   35.4        1 |
  2. |   2       1     9          16     92.6   38.1        1 |
  3. |   3       1    12          20     77.1   33.2        1 |
  4. |   4       1    10          10     89.5   37.3        1 |
  5. |   5       1     9          17    108.1   44.4        1 |
     |--------------------------------------------------------|
  6. |   6       1    12          22     86.2   32.4        1 |
  7. |   7       1    12          12      141   48.2        1 |
  8. |   8       1    11          10     80.3   32.2        0 |
  9. |   9       1    12          17      121     45        0 |
 10. |  10       1    10          13       87   34.9        0 |
     |--------------------------------------------------------|
 11. |  11       1    13          10     99.8   40.2        0 |
 12. |  12       1     7          19     55.4   27.5        0 |
 13. |  13       1    13          12     99.4   39.3        0 |
 14. |  14       1    11          14      109   39.1        0 |
 15. |  15       1    10          11    102.8   38.5        0 |
     |--------------------------------------------------------|
 16. |  16       1    11          19     74.8   33.3        0 |
 17. |  17       0    14           8       70   26.3        0 |
 18. |  18       0    12          11     54.1   19.6        0 |
 19. |  19       0    14          13     47.3     17        0 |
 20. |  20       0    11          20       72     32        0 |
     |--------------------------------------------------------|
 21. |  21       0    12          12       32   16.3        0 |
 22. |  22       0    11          20    102.8   38.5        0 |
 23. |  23       0     7          20     70.8   54.2        0 |
     +--------------------------------------------------------+
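Taking the maximum of failed only changes the outcome for patients whose two joints disagree, so it can help to list those patients first. The sketch below is meant to be run on the uncollapsed data, that is, before the collapse command; failed_max and failed_min are hypothetical variable names used only for this check.

```stata
* Run before collapse: flag patients whose joints have different outcomes
bysort id: egen failed_max = max(failed)
bysort id: egen failed_min = min(failed)
list id sequence failed if failed_max != failed_min , sepby(id)
drop failed_max failed_min
```

In the uncollapsed listing shown earlier, id 6 appears to be the only patient with discordant outcomes (one joint failed, the other did not), so the max rule counts that patient as a failure.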
Fitting the exact logistic regression again, controlling for weight,
exlogistic failed blount weight
Exact logistic regression                        Number of obs =         23
                                                 Model score   =   4.464865
                                                 Pr >= score   =     0.1013
---------------------------------------------------------------------------
      failed | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
      blount |   3.483329*          7      0.2963      .3615471        +Inf
      weight |   1.014742       685.2      0.5928      .9643557    1.072998
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)
We went from OR=12.5 (p=0.011) when weight was not included, to OR=8.5 (p=0.060) when
weight was included, to OR=3.5 (p=0.30) when weight was included and the maximum of the
outcome was used as a patient-level summary measure.
Which model is correct, the OR=8.5 or this last one OR=3.5?
Recall this article was referring to the screws that attach the implanted plate, where a failure
outcome was noted if the screw broke. One could make the argument that the screws were the
“unit of analysis”, rather than patient. Then, the screws really are independent, since each screw
is a different unit. The fact that the screws are in the same patient simply means they had a
similar exposure, but that does not make them a “repeated measurement”. Making this
argument, a multilevel model, or repeated measurements model, is not needed so the OR=8.5
model is correct.
An example of what Bryant was referring to as “two limbs or multiple joints from single
patients” is bone mineral density (BMD), measured around the implant in the implanted limb and
in the same anatomical location of the non-implanted limb of the same patient. These are
patient-level outcomes, where the biology is similar between the two limbs, since both limbs are
in the same patient. In contrast, the patient biology does not affect the physical properties of the
screws discussed in the preceding paragraph.
References
Bryant D, Havey TC, Roberts R, Guyatt G. (2006). How many patients? How many limbs?
Analysis of patients or limbs in the orthopaedic literature: a systematic review. J Bone
Joint Surg 88-A(1):41-45.
Burton P, Gurrin L, Sly P. (1998). Extending the simple linear regression model to account for
correlated responses: an introduction to generalized estimating equations and multi-level
mixed modelling. Statistics in Medicine 17:1261-91.
Diggle PJ, Liang K-Y, Zeger SL. (2000). Analysis of Longitudinal Data. Oxford, Oxford
University Press.
Dupont WD. (2002). Statistical Modeling for Biomedical Researchers: a Simple
Introduction to the Analysis of Complex Data. Cambridge, Cambridge University
Press.
Gregoire AJP, Kumar R, Everitt BS, Henderson AF, Studd JWW. (1996). Transdermal oestrogen
for the treatment of severe post-natal depression. The Lancet 347:930-934.
Hillmen P, Young NS, Schubert J, et al. (2006). The complement inhibitor eculizumab in
paroxysmal nocturnal hemoglobinuria. N Engl J Med 355(12):1233-43.
Lang CC, Stein CM, Brown RM, et al. (1995). Attenuation of isoproterenol-mediated
vasodilation in blacks. N Engl J Med 333:155-60.
Pronovost P, Needham D, Berenholtz S, et al. (2006). An intervention to decrease catheter-related
bloodstream infections in the ICU. N Engl J Med 355(26):2725-2732.
Rabe-Hesketh S, Everitt B. (2003). A Handbook of Statistical Analyses Using Stata. 3rd
Ed. New York, Chapman & Hall/CRC.
Rabe-Hesketh S, Skrondal A. (2005). Multilevel and Longitudinal Modeling Using Stata,
College Station, Tx, Stata Press.
Shrout PE, Fleiss JL. (1979). Intraclass correlations: uses in assessing rater reliability.
Psychological Bulletin 86(2):420-428.
Streiner DL, Norman GR. (1995). Health Measurement Scales: A Practical Guide to
Their Development and Use. New York, Oxford University Press.
Thara R, Henrietta M, Joseph A, Rajkumar S, Eaton W. (1994). Ten year course of
schizophrenia—the Madras Longitudinal study. Acta Psychiatrica Scandinavica 90:329-336.
Twisk JWR. (2003). Applied Longitudinal Data Analysis for Epidemiology: A Practical
Guide. Cambridge, Cambridge University Press.