Understanding Approaches to Account for Clustering of Observations
in Health Services Research
A. Russell Localio
Division of Biostatistics
Center for Clinical Epidemiology and Biostatistics
University of Pennsylvania, School of Medicine
Academy Health
2004 Annual Research Meeting
San Diego, June 6, 2004 (corrected 06/08/2004)
This project was supported in part by an Agency for Healthcare Research and Quality
(AHRQ) Centers for Education and Research on Therapeutics cooperative agreement (grant #
U18 HS10399) and by Agency for Healthcare Research and Quality, Grant No. R03 HS 1148101.
R. Localio, Clustered Observations in HSR, 06/05/2004: Page 1 of 80
Abstract
This outline provides an overview of many of the problems and some of the alternative solutions to
account for clustering of observations within “centers” in health services research. The “center”
refers to any natural or purposeful grouping of individuals. Studies are categorized across several
dimensions: randomized vs observational, randomization within or across centers, fixed vs random
effects, continuous or binary outcomes. Issues such as profiling of centers, confounding by center,
and volume/outcome studies appear as special examples. The outline reviews and discusses the
analytic alternatives and their strengths and weaknesses, as well as the software options. A
bibliography offers references for materials covered as well as general guidance for further details.
Outline
§1 Applications and methods
§2 Multicenter Designs – randomization within or among centers
§3 Observational Studies – Patient or center-level factors
§4 Complex Designs – Longitudinal Analyses
§5 Analysis options
§6 Report cards and profiling
§7 Confounding by Cluster
§8 Volume-outcome studies – Correlations of fixed and random effect
§9 Crossed vs nested effects
§10 Model specification and “Interactions”
§11 Comments and Conclusions
References:
§1 Applications and methods – Studying grouped data
A. Applications to which these notes are relevant
Multicenter clinical trials of the effect of drugs and therapeutics
Multicenter studies on the use or misuse of drugs or on modification of behavior or
exposure of clinicians and/or patients
Surveys of patients
Profiling of physicians, clinics, hospitals, and health plans
Observational studies of associations of outcomes and exposures at the patient and/or
physician and/or hospital level
B. Reasons for studying grouped data:
(1) Patients are naturally grouped into clusters
Hospitals, clinics, physicians, neighborhoods, communities
(2) Analysis of groups realizes efficiencies in design
(3) Simple random sampling designs are too expensive
C. Analytic methods
(1) Frequentist methods
Survey methods – design based
Population averaged
Center (cluster) specific
(2) Bayesian methods (Gibbs Sampling for complex analyses)
(3) Permutation-test-based methods (assumption free)
D. Some notation and taxonomy
Y = outcome
X = covariates including factor of interest
Center = Any natural grouping of individuals (a cluster)
Centers indexed j=1,…,J
Patients within centers indexed i=1,…,n_j
Treatment/exposure – Usually the factor (X) of interest
E. Outcomes (Y)
Focus on continuous and binary outcomes
Issues for binary data usually apply for ordered categorical data and counts
F. Paradigm of generalized linear model
µ = α + X β -- linear combination of factors
µ = h( E ( y | X )) -- link between outcome and linear combination
G. Time and Center – key factors in study design and analysis
                 Single time                    Multiple times
Within center    Parallel group RCT             Longitudinal, repeated measures of patients
Between center   Cluster randomization design   (1) Repeated cross sectional design
                                                (2) Clustered longitudinal design
§2 Multicenter Studies – randomization within or among centers
§2.1 Randomization within centers
-Both treated and nontreated patients within each center
-Advantage of accrual of large samples
-Common for FDA-submissions for drug trials
-Paradigm for
-Studies conducted by single sponsor at one time period (multi-center RCT)
-Multiple studies conducted by multiple investigators over different times (meta analysis) (Normand 1999)
-Design and analysis issues (Fig 2.1)
-Intercept --Variation in risk across centers among the control (standard care) patients
-Slope -- Variation in the effect size across centers
-Balance -- proportion assigned to treatment (and followed) across centers (can be lost
from incomplete followup or missing data)
-Sample size -- Number of centers
--Number of patients per center
--Variation in the sample sizes across centers
Fig 2.1 Example: (variation in baseline risk and treatment effect across centers)
[Figure: p (0 to 1) plotted against tx (0, 1), one line per center]
-30 centers
-Patients randomized within center to treatment (tx=1) or to control (tx=0)
-Variation across centers in average risk among the control patients (variation in points at left)
-Variation across centers in degree of improvement among treated compared to controls (slopes)
§2.2 Randomization among centers – Cluster randomization designs
Characteristics:
-Centers are randomized to treatments (or interventions)
-Patients receive treatment by reason of being members of centers
-Numbers of centers usually small
-Numbers of patients within centers usually large
(Green 1995; Murray 1998; Campbell 1998; Cornfield 1978; Donner 2000, 1994, 1980)
Challenges
-Bias --Randomization at center level does not ensure balance of patient characteristics
-Variance – Naïve analytic methods overlook the “design effect” and so overstate statistical significance
-Sample size estimation must consider correlation of subjects within centers
-Reporting now subject to standards – CONSORT (Campbell 2004)
(Donner and Klar 2000)
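The variance and sample-size challenges above can be made concrete with the usual design-effect approximation for equal cluster sizes, 1 + (m - 1)ρ, where m is the number of patients per center and ρ is the intraclass correlation. The sketch below is illustrative only: the function names and the example values (20 centers, 50 patients each, ρ = 0.05) are assumptions, not figures from these notes.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


# Hypothetical example: 20 centers, 50 patients each, ICC = 0.05
deff = design_effect(50, 0.05)
n_eff = effective_sample_size(20, 50, 0.05)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f} of 1000")
```

Even a small intraclass correlation matters when centers are large: here 1000 patients carry roughly the information of 290 independent ones, which is why naive analyses overstate significance.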
§3.1 Observational Studies – Interest in effect of treatment/exposure
Characteristics
-Responses based on complex sampling designs with stratification, clustering, unequal
probabilities of selection (NHANES, NHIS, NMES and other surveys)
-Effect of treatment or exposure within centers (e.g., patient age)
-Effect of exposure at the center level (e.g., hospital teaching status)
Challenges
-Estimate the effect of exposure
Controlling for confounders and
Adjusting for “design effect” (ratio of variance to that under simple random
sampling)
-Design effects >1.0 common
- clustering of respondents within sampled units
- unequal sampling probabilities and thus need to adjust for “sampling weights”
-Absent randomization, adjustment for confounders is usually essential to control for
-Patient-level factors that are associated with outcome and exposure of interest
-Center level factors
-Context effects – Influence of a factor at the patient level as well as at the group level
Individual income might influence individual’s access to care in the community
Community income might influence access to care of anyone in community
-Bias from confounding by center
§3.2 Observational studies – Profiling -- Variation of outcomes across centers
Characteristics
-Patients differ across centers (“case mix” problem)
-Centers, or “outlier” centers, are the object of comparison - “profiling”
Challenges
-Multiple comparisons without any prior estimates of which centers are “outliers”
-Added complexity of correct and appropriate analyses
-Intense demand for “some results” to identify “bad apples” and quality providers
§4 Complex designs -- longitudinal designs
§4.1 Repeated measures within patients over time – patients as the “center”
Characteristics
-Patients are measured repeatedly over time
-Patients then become the “clusters” of correlated observations
Challenges
-Correlation structure over time becomes part of the model
-Must fit and test models with alternative correlation structures
-Not a focus of this talk. New texts on applied statistics are helpful
(Fitzmaurice, Laird, Ware 2004; Diggle 2002, 1994)
§4.2 Repeated cross sectional studies – groups of patients as the center
Fig 4.2 Cluster Randomization -- Example -- 50 persons per time per site
[Figure: mean proportion (0 to 1) plotted against Pre-Post Intervention (0, 1), in two panels, tx==0 and tx==1]
Characteristics (Fig 4.2) (Feldman 1984)
-Same treatment/exposure assignment to all persons within a center
-Patients clustered within centers, but individuals are not followed over time
-Subsequent time periods involve different sets of patients in the same centers
-2 sets of clusters randomized to treatment (tx=1) and control (tx=0) (simulated data)
-Variation of center-specific rates at baseline (time = 0)
-Variation over time among centers randomized to control but no overall improvement
-Variation over time among centers randomized to treatment and overall improvement
-Increased validity of estimates of treatment effect because of “control” centers
-Decrease in power from clustering of patients in centers offset by increased power of
repeated measures over time
Challenges
-Design effect from cluster randomization
-Model estimates of interest are the time*treatment interaction terms
§4.3 Clustered cohort designs
Characteristics
-Patients are clustered within centers
-Treatment (exposure) applied to entire center
-Individual patients followed over time and measured repeatedly
-Added power of having each patient serve as his/her own control
-Validity of having simultaneous control group of centers followed over time
Challenges
-Two levels of clustering with different degrees of correlation
Within patient over time
Among patients within centers
§5 Analysis options – strengths and limitations
§5.0 Preliminary distinctions
§5.0.1 Fixed vs Random effects
Define carefully what is “fixed” and what is “random”
Intercept
Fixed effect -- Centers in sample assumed to represent only themselves. Model variation across centers by a separate intercept for each center:
$\mu_{ij} = \alpha_1 + \dots + \alpha_k + \beta\, tx$
Random effect -- Centers in sample represent a population of centers. Variation across centers assumed to follow a distribution, often normal, $U_j$. Constant treatment effect $\beta$:
$\mu_{ij} = \alpha + U_j + \beta\, tx$
Slope
Fixed effect -- Same treatment effect $\beta$ at each center
Random effect -- Variation across centers and variation of treatment effect across centers, $\Delta_j$. Assume a distribution, often normal:
$\mu_{ij} = \alpha + U_j + \beta\, tx + \Delta_j\, tx$
Longford (1993)
§5.0.2 Population- averaged (PA) vs Center-Specific (CS) models
(Liang 1993; Neuhaus 1991; Graubard 1994; Diggle 2002; Burton 1998; Hu 1998; Carlin
2001)
Population-averaged data model -- center effect enters the residual variances:
$\mu_{ij} = \alpha + \beta_{PA}\, x_i + \varepsilon_{ij}$, where $\mathrm{cov}(\varepsilon) = \sigma_e^2 I + \sigma_u^2 I$
There are two sources of error variance: random error across patients ($\sigma_e^2$) and variation across centers ($\sigma_u^2$)
$\mu_{ij} = h[E(Y_{ij} \mid x_i)]$, i.e., the marginal mean = average response given the covariate(s)
$\hat\beta_{PA}$ measures the effect of the intervention/exposure averaged over centers
Center-specific data model – center effect is explicit in the model
$Y_{ij} = \alpha + \beta_{CS}\, x_i + U_j + \varepsilon_{ij}$
$U_j$ represents variation of outcome across centers not accounted for by covariates
$U_j \sim N(0, \sigma_u^2)$; $\varepsilon_{ij} \sim N(0, \sigma_e^2)$; $\mathrm{cov}(U_j, \varepsilon_{ij}) = 0$
$\mu_{ij} = h[E(Y_{ij} \mid x_i, U_j)]$, i.e., the conditional mean is the response for a covariate pattern and center
Interpretation: (Hu 1998)
$\hat\beta_{PA}$ = change in E(y) for a change in x from baseline (x=0) to the comparison level (x=1), adjusted for covariates, for the population of individuals at x=0 compared to the population at x=1
$\hat\beta_{CS}$ = change in E(y) for an individual if he/she were to change from x=0 to x=1 (adjusted for covariates). The same interpretation applies to center-level factors, such as hospital size or teaching status, but changes in these center-level factors are more difficult to explain with the CS model. For example, think of the interpretation of a factor x=0/1, urban/rural. How can one describe the change of a hospital from urban to rural?
Estimation:
PA – GEE, robust regression: (Diggle 2002, chapters 7,8)
-regression uses robust variance estimates to allow for correlation of observations
within center
-Preferred when the interest lies not in changes over time or within center but across centers
CS -- Random or mixed effects models; conditional methods
-When covariates vary within center, CS methods might be more satisfactory
-Linear models: PA and CS estimates should be similar
-Poisson models: PA and CS intercepts will differ; β̂(ln(RR)) should be similar
-Logistic models: $\hat\beta_{PA} < \hat\beta_{CS}$
This result (PA estimates are attenuated relative to CS) in logistic models has two
analogies:
(1) Logistic regression with an omitted covariate produces estimates attenuated
toward null (Gail 1984)
(2) In absence of conventional confounding, unconditional (compared to
conditional) estimates will be attenuated towards null = noncollapsibility of the
odds ratio (Gail 1984)
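The attenuation of PA relative to CS estimates in logistic models can be checked numerically. The sketch below is an illustration under stated assumptions, not a method from these notes: it takes a random-intercept logistic model with hypothetical values (α = -1, β_CS = 1, σ_u = 1.5), averages the center-specific risks over the normal distribution of U_j by Gauss-Hermite quadrature, and reports the implied marginal (PA) log odds ratio. The function name `marginal_log_odds_ratio` is made up for this example.

```python
import numpy as np


def marginal_log_odds_ratio(alpha, beta_cs, sigma_u, n_nodes=40):
    """Marginal (population-averaged) log odds ratio implied by a
    random-intercept logistic model with U_j ~ N(0, sigma_u^2)."""
    # Gauss-Hermite rule: E[f(U)] ~= sum_k w_k f(sqrt(2) * sigma_u * x_k) / sqrt(pi)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma_u * nodes
    w = weights / np.sqrt(np.pi)

    def expit(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logit(p):
        return np.log(p / (1.0 - p))

    p_control = np.sum(w * expit(alpha + u))            # average risk at x = 0
    p_treated = np.sum(w * expit(alpha + beta_cs + u))  # average risk at x = 1
    return logit(p_treated) - logit(p_control)


# Hypothetical parameter values
beta_cs = 1.0
for sigma_u in (0.0, 0.5, 1.5):
    beta_pa = marginal_log_odds_ratio(-1.0, beta_cs, sigma_u)
    print(f"sigma_u = {sigma_u}: beta_CS = {beta_cs}, implied beta_PA = {beta_pa:.3f}")
```

With σ_u = 0 the two log odds ratios coincide; as the variation across centers grows, the marginal value shrinks toward zero even though the center-specific effect is unchanged.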
Model misspecification – when the data and analysis models differ
If CS model is true and use PA methods
Attenuated estimates of $\hat\beta$
But $\hat\beta / se(\hat\beta)$ should be appropriate
As one adds covariates, the degree of attenuation of OR should lessen
If PA model is true and use CS methods (remains for further work)
Choice of PA vs CS – Still controversial with each having its adherents
CS -- Lindsay & Lambert 1998; Goldstein 2002
PA -- Fitzmaurice, Laird, Ware 2004
Carlin (2001) has a good discussion of practical issues
§5.0.3 Permutation-test based methods (Good 2000; Gail 1996)
What are they?
(1) Arrive at a test statistic based on observed data, e.g., difference in treatment means
(2) Compute the test statistic for the observed data
(3) Permute the observations across the treatments in all possible ways to arrive at a
distribution of the test statistic under the null hypothesis – that there is no treatment
difference
(4) Compare the observed test to the distribution under the null and determine how far
out on the tail of this distribution the observed statistic lies
(5) Estimate a confidence bound for the observed test by adding (and subtracting) very
small and increasing values until the upper (and lower) bounds are just statistically
significant when compared to the null distribution (algorithms can do this computer-intensive process efficiently)
(6) Covariates are handled by defining test statistics based on residuals after adjusting for
covariates
(7) In theory these methods should provide statistical tests, estimates, and confidence
intervals
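Steps (1) through (4) can be sketched in a few lines for a comparison in which each center contributes one summary value. This is a minimal illustration, assuming a difference in means as the test statistic and exhaustive enumeration of assignments (feasible only for small numbers of centers); the function name and data are hypothetical.

```python
from itertools import combinations


def permutation_p_value(treated, control):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    total = sum(pooled)

    def mean_difference(treated_idx):
        t_sum = sum(pooled[i] for i in treated_idx)
        c_sum = total - t_sum
        return t_sum / n_treated - c_sum / (len(pooled) - n_treated)

    # Observed statistic: the first n_treated units are the treated ones
    observed = mean_difference(range(n_treated))
    # Null distribution: every way of assigning n_treated units to treatment
    diffs = [mean_difference(idx)
             for idx in combinations(range(len(pooled)), n_treated)]
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed) - 1e-12)
    return observed, extreme / len(diffs)


# Hypothetical center-level means: 3 treated centers, 3 control centers
obs, p = permutation_p_value([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(obs, p)
```

With three centers per arm there are only 20 possible assignments, so the smallest attainable two-sided p-value is 0.10. This is one way to see why very few centers limit what these methods can show.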
§5.0.4 Bayesian methods (Gelman 2004)
As contrasted with “classical” (frequentist) approaches
Starting with mixed effects models, Bayesian methods add prior distributions to the
model parameters: For example, for the simple random intercept model:
$\hat\beta \sim N(0, 1/0.001)$; $1/\sigma^2 \sim \mathrm{gamma}(a = 0.001, b = 0.001)$
Using intensively iterative procedures (e.g. Gibbs Sampling) characterized by:
- a preliminary “burn in” sequence and
- a subsequent sequence that is followed to some equilibrium values of the
parameter estimates
- prior distributions selected to be “flat” or “noninformative” so that data
overwhelms the prior distribution (of the estimates) to arrive at a posterior
- Requires use of diagnostics to determine convergence of sequences
§5.0.5 Bootstrap methods (Good 2001; Carpenter 2000; Diaconis 1983; Efron 1991)
Resampling the sample repeatedly (with replacement) to ascertain robust estimates that
ask: What would be the range of estimates if we had many samples from the same
underlying population?
These can be:
Nonparametric -- Resampling residuals adjusted to have a “correct” covariance
structure consistent with the study design
Parametric – Resampling is based on a set of random effects $U_j$ and errors $\varepsilon_{ij}$ sampled from their respective distributions (assumed to be normal)
But these methods, as applied to multi-center data, need more exposition
§5.0.6 Randomized vs observational studies – Treatment/Exposure
Exposure in an observational study can be handled as “treatment” in a randomized
design, with due attention to issues:
(1) Causation is more difficult to demonstrate
(2) Confounding by observed and unobserved factors
§5.1 Analysis options: Stratified 2 by 2 tables in presence of clustering
-Goal is stratified analyses (outcome*exposure by age category)
-Observations are not independent owing to clustering
-Solution:
Reduce the effective sample by the “design effect”
Reduce the Mantel-Haenszel $\chi^2$ statistic by the design effect
Apply standard statistical tables to adjusted statistic
(Rao 1992)
-Simple solution using standard software and adjusted chisq statistic
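The adjustment above amounts to dividing the test statistic by the design effect before consulting the usual reference distribution. The sketch below assumes a 1-degree-of-freedom statistic and a known design effect; the function name and the example values are illustrative assumptions, and a real analysis would take the design effect from the survey design or from an estimated intraclass correlation.

```python
import math


def adjusted_chisq_p_value(chisq, design_effect):
    """Deflate a 1-df chi-square statistic by the design effect and return
    the adjusted statistic with its upper-tail p-value."""
    adjusted = chisq / design_effect
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(adjusted / 2.0))
    return adjusted, p_value


# Hypothetical values: naive statistic 7.68, design effect 2.0
adjusted, p = adjusted_chisq_p_value(7.68, 2.0)
print(f"adjusted chisq = {adjusted:.2f}, p = {p:.3f}")
```

In this made-up case a naive statistic that would look clearly significant (p of roughly 0.006) becomes borderline (p of about 0.05) once the clustering is taken into account.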
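[Table: centers 1, 2, …, K (rows) by exposure category 1 to 5 (columns). Cells a and b lie in center 1, c and d in center 2, and e through h in the remaining centers.]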
In the absence of clustering, the observations a,b,c,d, …,h should be independent. But in the
presence of clustering, a and b are more closely related to each other than are a and c through h.
The absence of independence means that standard chisq tests are not appropriate.
§5.2 Multicenter studies with intervention/exposure varying within centers
While the following comments focus on designed interventions, they also apply to observational
studies in which the exposure of interest is contrasted within center
§5.2.1 Pooled vs stratified methods -- with emphasis on binary outcomes
Underlying issues are:
(a) Does risk in baseline/reference (unexposed) persons vary across centers?
(b) Does the effect of intervention vary across centers? = interaction
(c) Are the centers “fixed” or “random”?
Common practice: treat all centers as a single center if intervention or exposure is “balanced”
within center. “Naïve pooling”
But what if:
-Followup is incomplete across centers?
-Inferences of interest are within subgroups (interaction of intervention and gender)?
-Populations differ across centers so that baseline risks differ?
-Centers differ in size?
-Treatment effects seem to differ across centers?
(Agresti 2000; Senn 1998)
§5.2.2.1 Methods in which the center effect is estimated
Fixed effects models – Indicator variable for each center
$\mu = \alpha_1, \dots, \alpha_J$ (assuming no treatment by center interactions)
-Can estimate baseline risk/outcome in unexposed or control group in each center
-Too few patients per center results in badly biased estimates
-Low power because of large number of terms for estimation
-Requires very large sample of patients per center
-Adding interaction terms (centers*treatment) to allow for variation in effect across
centers – adds J-1 more degrees of freedom to model and requires even larger
samples
-Regular regression software applicable
Random intercept models
-Assume that each center’s baseline value (risk) varies about average following a
distribution ($\mu = \alpha + X\beta + U_j$; $U_j \sim N(0, \tau)$)
-This assumption:
Reduces degrees of freedom (fewer parameters to be estimated)
Allows for greater than sampling variation of baseline risk across centers
-Many software options: SAS (Mixed); Stata (xtreg, xtlogit …); MLWin; HLM
-Issues of concern in implementation
Will frequentist methods estimate the components of variance?
Are Bayesian methods necessary to allow for uncertainty in estimates?
Software options are BUGS 0.6, WinBUGS 1.4, or MLWin
Random slope models:
When the treatment/exposure effects vary across centers (as contrasted with varying
baseline risks)
Fewer degrees of freedom than comparable fixed effects model
Commonly seen in meta analyses (but issues apply to all studies)
Studies from different protocols and across widely different populations can expect
different effects of treatment or exposure
Sometimes seen in multi-center studies and observational studies
Reason for variation in effect across centers needs explanation
Random slope models (cont.)
Methods:
(a) DerSimonian & Laird - simple, readily available (Stata “metan”; Rev Man)
Used primarily in meta analyses, but it can apply to multicenter RCTs and to
observational studies
Can produce biased estimates
(b) Mixed effects models for continuous outcomes
Generalized linear mixed effects models for binary, Poisson, and ordered categorical outcomes
Treatment is a “fixed effect”
Center is a random effect (random intercept representing baseline variation)
Treatment at center level has random slope
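Method (a) can be sketched directly. The code below is a minimal version of the DerSimonian and Laird moment estimator, assuming each center (or study) supplies an effect estimate and its variance; the function name and the toy inputs are hypothetical, and packaged routines such as those named above should be preferred in practice.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird
    moment estimator of the between-center variance (tau^2)."""
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                   # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Toy example: two centers with effects 0 and 4, each with variance 1
pooled, se, tau2 = dersimonian_laird([0.0, 4.0], [1.0, 1.0])
print(pooled, se, tau2)
```

In this toy case the fixed-effect standard error would be about 0.71, while the random-effects standard error is 2.0, reflecting the extra uncertainty when the treatment effect itself varies across centers.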
$\mu_{ij} = \alpha + \beta_1\, tx_{ij} + U_j + \Delta_j\, tx_{ij}$, where j indexes the center and i indexes the patient within center,
$\alpha$ represents a “fixed intercept”, the average baseline across centers, and
$\beta_1$ represents the “fixed slope”, the average treatment effect across centers
$U_j \sim N(0, \tau_1)$ is a random intercept representing variation of the baseline effect across centers. Assumption saves degrees of freedom
$\Delta_j \sim N(0, \tau_2)$ is a random “slope” representing variation of the treatment effect across centers. Assumption saves degrees of freedom
Confidence intervals for $\beta_1$ are generally wider than for the same estimate from a fixed effects analysis
Software options
-Continuous outcomes
– SAS Mixed (popular and well documented)
- MLWin; HLM
- Splus and R (lme) (Venables 2002; Everitt 2001)
-Binary outcomes
-Quadrature is recommended
SAS NLMIXED
Splus 6.2 (correlated data library)
-Approximations (recommended only for many large centers)
SAS glimmix
R (glmmPQL)
MLWin (PQL)
HLM (Works well with many centers; awaiting simulations for smaller
numbers of smaller centers)
Additional concerns:
Attention to reporting and explaining differences -- just as with any instance of
“effect modification” across strata of a covariate
§5.2.2.2 Marginal methods in which the baseline center effect is not estimated:
Here the analysis model is: $\mu_{ij} = \alpha + X\beta + \varepsilon_{ij}$
Variances are adjusted to account for the inflated error from having multiple centers
Assumptions:
Effect of intervention does not vary across centers
Random intercept ($U_j$) is a nuisance parameter not in need of estimation
Must adjust confidence intervals to allow for excess variance
Common methods:
-generalized estimating equations (GEE)
SAS Genmod
Stata xtgee
Splus 6.2 (correlated data library – download)
-survey methods (the equivalent of GEE with independence corr)
SUDAAN
Stata xtgee
-resampling methods – if done at the center level
Stata “bs” “bootstrap” functions
Splus 6.2 –resample library
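Resampling “at the center level” means drawing whole centers with replacement, so that the within-center correlation is carried into every bootstrap sample. A minimal sketch follows, assuming the statistic of interest is the overall mean and using made-up data; the function name `cluster_bootstrap_se` and the choice of 2000 replicates are assumptions for illustration only.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, n_boot=2000, seed=0):
    """Bootstrap standard error of the overall mean, resampling whole
    clusters (not individual observations) with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw as many clusters as in the original sample, with replacement
        sample = [rng.choice(clusters) for _ in clusters]
        values = [y for cluster in sample for y in cluster]
        estimates.append(statistics.mean(values))
    return statistics.stdev(estimates)


# Hypothetical data: 4 centers, outcomes strongly clustered within center
centers = [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [5.0, 5.2, 4.9], [7.0, 6.8, 7.1]]
all_values = [y for c in centers for y in c]
naive_se = statistics.stdev(all_values) / len(all_values) ** 0.5
boot_se = cluster_bootstrap_se(centers)
print(f"naive SE = {naive_se:.2f}, cluster bootstrap SE = {boot_se:.2f}")
```

Because nearly all the variation in these toy data lies between centers, the center-level bootstrap gives a noticeably larger standard error than the naive formula that treats the 12 observations as independent.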
§5.2.2.3 Conditional methods in which the baseline center effect is not estimated:
Conditional regression -- For estimating within-center effects
Variance – typically less of a problem for RCTs but all estimates assume
“fixed effects”, i.e., that these centers are fixed and not a sample from a
larger population of centers
Bias – the principal concern in analysis
Binary outcomes
-Central issue involves noncollapsibility of the odds ratio
-Must stratify an analysis even in the absence of imbalance of treatment across centers
-The pooled OR will differ from the stratified OR and be attenuated towards 1.0
-Mantel-Haenszel for binary outcomes stratified by center
Estimates for odds ratios (OR), relative risk (RR) and risk difference (RD)
-Conditional logistic regression to control for patient-level covariates
-Easy to compute using many standard software packages (SAS, Stata, StatXact)
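Noncollapsibility is easy to see with a made-up example. The sketch below assumes two centers with treatment perfectly balanced (100 treated, 100 control in each) and the same odds ratio of 4 within each center, differing only in baseline risk; all counts and function names are hypothetical. It compares the naively pooled odds ratio with the Mantel-Haenszel estimate stratified by center.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = treated events/non-events;
    c, d = control events/non-events."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical centers: control risk 0.2 vs 0.5, within-center OR = 4 in both
tables = [(50, 50, 20, 80), (80, 20, 50, 50)]
# Naive pooling: add the cell counts across centers into one 2x2 table
pooled_table = tuple(sum(cells) for cells in zip(*tables))

pooled_or = odds_ratio(*pooled_table)
mh_or = mantel_haenszel_or(tables)
print(f"pooled OR = {pooled_or:.2f}, Mantel-Haenszel OR = {mh_or:.2f}")
```

The pooled estimate (about 3.45) is pulled toward 1.0 even though treatment is balanced within each center, while the stratified estimate recovers the common within-center odds ratio of 4.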
Ordered or continuous outcomes
-Stratified linear rank tests – Permutation tests and their special cases
Wilcoxon Rank Sum; Normal Scores (van der Waerden) (StatXact; SAS)
-Permutation tests for any outcome
Stata “permute” function with stratification
-Fixed effects regression – Within-center effects of treatment (Stata “xtreg, fe”)
Survival data
-Stratified logrank test (StatXact, Stata)
-Stratified Cox regression (Stata, Splus, R)
§5.3 Cluster randomized designs (Atienza 2002; Donner 2000; Murray 1998, 2001)
§5.3.1 Single-time study (Donner 2000; Green 1995, 1997)
Simple methods: For both continuous and binary outcomes
-t-tests (2-sample) – use mean of each center as an observation
The method is simple but can be conservative (p-values too large)
Can adjust for covariates (see permutation-test-based methods)
-Adjusted chisq test (Donald 1987; Donner & Klar 1994)
For the association of Y and X across centers
Generalized linear models to adjust for covariates at the patient level
-Generalized estimating equations (GEE) (Bellamy 2000)
These will work well when the number of centers is large, but are anticonservative
(p-values too small) with few centers
Population averaged models seem more appropriate because contrast is between
rather than within centers
-Center-specific methods
– penalized quasi likelihood (PQL – SAS Glimmix; MLWin)
Works better than GEE with few centers, but remains problematic in that setting
because of bias in estimating the intraclass correlation coefficient (ICC) (Bellamy
2000)
-Quadrature methods (SAS NLMIXED; Stata gllamm; Splus 6.2 correlated data
library) – Performance remains for further analysis
-Other approximations (HLM Laplace) – Performance remains for further analysis
-Bayesian methods (BUGS 0.6, WinBUGS, etc.) – Performance remains for further
analysis
-Permutation tests
-These can be configured to allow for covariates by permuting the residuals
(Braun 2001; Gail 1996)
-Confidence intervals – obtained by “inverting the test” (find the bounds of
the value of the effect size [e.g. risk difference or difference in means] such
that the results of the permutation test are exactly p=0.05) (Good 2000; §3.2)
-Strong theoretical justification -- strength
-Absence of standard software -- weakness
§5.3.2 Repeated cross sectional designs – Analysis options (Bellamy 2000)
-Patients are not followed as individuals, but centers are followed over time
-Single level of clustering
-Interest lies in the time*treatment interaction (does treatment group of centers improve
more than the control group)?
-Assume
(1) a random intercept (variation across centers)
(2) a random slope (variation in the treatment effect over time)
-Continuous outcomes –
REML -- Widely available in SAS (mixed), Splus (lme), R (lme)
RIGLS – Used in MLWin
HLM (Widely used but requires multistep data setup)
Quadrature – Stata (gllamm); SAS (nlmixed)
These are far slower than REML algorithms and do not offer improved
performance in results
-Binary outcomes –
-PQL(Penalized quasi likelihood) – MLWin, SAS (glimmix), R (glmmPQL),
Splus(correlatedData)
PQL has been criticized for most (McCulloch 2002) or all (Neuhaus 2001)
applications, but performs adequately with large centers and modest random
effects. Performance becomes unsatisfactory (coverage <0.9 for 95%
confidence intervals) with 10 or fewer centers and even modest std deviation
of random effects
-Laplace approximation (HLM) – Performance needs further evaluation
-Adaptive quadrature – SAS NLMIXED; Stata gllamm; Splus 6.2 (correlatedData)
Offer improved performance at the expense of much slower execution times
Also result in undercoverage (not as severe) with smaller numbers of centers
and increased variability of baseline risks (random intercept) and treatment
effects (random slope)
In some applications, results can be sensitive to number of quadrature points
Do not rely on default quadrature points. Try 8, 12, 16.
-Bayesian methods
BUGS 0.6
WinBUGS 1.4 – Very poor data entry
R(bugs.R) (Gelman 2004) -- A front end for WinBUGS with vastly improved input
and output
MLWin 2.0
These offer best performance with acceptable execution times (based on work in
progress)
-Permutation-based methods (Rosenbaum 2002)
-These methods make no assumptions about parametric distributions; do not have to
assume that $\hat\beta / se(\hat\beta) \sim N(0,1)$.
-Covariates? – Fit a linear model of the outcome (Y) as a function of (X),
except for treatment, and apply a permutation-test based method on the
residuals. Covariates might be (a) patient level factors that differ across
center, (b) baseline risk at each center
-Confidence intervals can be obtained by “inverting the test”, i.e., searching
for values $(\theta_L, \theta_U)$ of the test statistic that exactly coincide with the rejection
region of the null distribution. This search can be very inefficient (requiring
1000s of permutations if not done efficiently). One option is reported by
Garthwaite (1996) – “Robbins-Monro search”
-For centers of unequal size, weighting methods are possible (Braun 1999)
-Limitations – Unbalanced data (unequal numbers of centers in intervention/control
groups)
-Software – No easy-to-use software. Stata (permute) might offer a solution
Overall comments on repeated cross-sectional designs
-Many analyses are tractable using standard software
-Major challenges – Too few centers for analysis
-Much additional work needed on performance of alternative methods
§5.3.3 Clustered cohort design – Analysis options
Model the multiple layers of clustering – time within patient, patient within center
Nested modeling cannot ignore the multiple layers (Ten Have 1999)
What is the more appropriate perspective: population averaged or center-specific?
-The question is: What is the improvement to an individual within a center over time
(subject to treatment), and how does this improvement compare to what the same
individual would experience in another center (not subject to treatment)?
-CS models are likely more appropriate in this context
Analysis paradigm: Multilevel modeling – nested random effects (Sullivan 1999)
-Continuous outcomes – REML (MLWin; SAS proc mixed; Splus (lme); R (lme))
-Fast and efficient algorithms
-Syntax for SAS is especially flexible and documentation is ample
-Binary outcomes
-PQL (penalized quasi-likelihood) – SAS glimmix, MLWin, R (glmmPQL)
These methods fail when (as is usual) the number of repeated measures
within patients over time is small. Not recommended
-Laplace approximation (HLM) – Performance uncertain with this application
-Adaptive quadrature – Stata (gllamm); Splus 6.2 (Correlated Data Library)
(SAS nlmixed not an option – handles only single level of random effects)
Far better than PQL in terms of bias and coverage
-Non-linear mixed effects models (Splus, R “nlme”) (Venables 2002)
(Performance in these applications needs assessment)
-Alternating logistic regression (Carey 1993) SAS GENMOD
(Performance in these applications needs assessment)
-Any outcome
-Permutation-based methods (Gail 1996)
Works well with additive models – For differences (between treatment group)
of differences (within patient over time). Covariates not a problem
Performance for logit models with few repeated measures within patient
uncertain. Donner & Klar (2000) dismiss this option.
-Bayesian methods:
BUGS (Spiegelhalter 1997 ); WinBUGS; Bugs.R; MLWin;
(S-plus 6.2 (Bayes) – problematic as of this writing)
Perhaps the easiest and most promising methods (Work in progress)
-Bootstrap resampling
Remains for further investigation as to implementation and performance
§5.4 Surveys – complex survey – stratified and clustered
Characteristics of survey data:
-Stratification – strata of different sizes with variable numbers of clusters
-Clustering --“primary sampling units” within strata (of different sizes)
-Unequal sampling fractions of PSUs and of individuals within PSUs → weighted data
- National surveys come with survey designs and weights in the datasets
[Figure: a large stratum and a small stratum, each containing multiple PSUs]
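Why unequal sampling fractions lead to weighted data can be seen in a small sketch. It assumes each respondent's overall selection probability is known and uses the inverse of that probability as the weight. The data and the function name are hypothetical, and only the point estimate is shown; a design-based variance would also need the strata and PSU identifiers.

```python
def weighted_mean(values, sampling_probs):
    """Weighted (ratio-type) mean with weight = 1 / probability of selection."""
    weights = [1.0 / p for p in sampling_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical sample: outcome (1 = event) and each respondent's overall
# selection probability (PSU sampling fraction times within-PSU fraction).
outcome = [1, 0, 0, 1, 1, 0]
prob = [0.10, 0.10, 0.10, 0.50, 0.50, 0.50]

unweighted = sum(outcome) / len(outcome)   # treats every respondent alike
weighted = weighted_mean(outcome, prob)    # up-weights the under-sampled group
```

In this toy sample the events are concentrated among the over-sampled respondents, so the unweighted proportion overstates the population rate.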
Software options for analysis of survey data:
-Sudaan: means, ratios, totals, regression, logistic regression, log linear models,
survival
-Stata (svy): means, ratios, totals, regression, logistic regression, multinomial logit,
ordered logit, Poisson, negative binomial
-SAS (some limited options)
Interpretation is population averaged
(Hansen 1953; Kish 1965; Korn & Graubard 1999)
§5.4.1 Survey methods for non-survey applications (LaVange 2001)
Application of survey methods to
-Multicenter clinical trials
-Repeated measures within patients over time
-Multiple outcomes within patients
-Nested clusters (repeated measures within patients, who are clustered within centers)
Potential for arriving at “population averaged” estimate accounting for multilevel data
Remains for extensive study via simulations of performance with varying:
-Number and size of strata
-Number and size of centers
-Number of subjects—overall and per center
-Balance across treatments and exposures
-Prevalences of factors and covariates
§5.5 Observational “hierarchical” models with center- and patient- level factors of interest
“Hierarchical models” – term refers to range of analyses including mixed effects models. This
section focuses on observational studies.
A large literature on “hierarchical models” from social and health sciences, mostly on
continuous outcomes (Goldstein 1995; Leyland 2001; Raudenbush 2002)
Analysis options are similar to those previously outlined with attention to whether factors are
within or across centers
-Population averaged methods (GEE) – Especially where interest lies with center level
factors
-Center-specific methods – Especially where interest lies in patient-level factor
(conditional on being in a particular center)
Hierarchies can be more than two levels:
-Patients clustered within physicians, and physicians within health plans
Population averaged methods might apply
-GEE
-Survey software also a potential analysis method (SUDAAN, Stata)
Mixed effects methods available in software packages
-SAS MIXED (ideal for continuous outcomes)
-SAS glimmix (PQL) – if centers are large
-SAS nlmixed (only for 2 levels of data or single level of clustering)
-Stata gllamm (adaptive and non adaptive quadrature)
-MLWin (has Bayesian analysis options)
-HLM (v5) (Uses Laplace approximation)
- R (lme and nlme)
-S-plus 6.2 (Correlated Data library)
Performance depends, again, on:
-Outcome – continuous, binary, ordered, counts
-Number of centers
-Number of observations within center
-Dispersion of outcomes across centers
Bayesian hierarchical models
-BUGS v 0.6 – Data input is simple, examples abound (Spiegelhalter 1997)
-WinBUGS v1.4 – Data input is difficult, unless dataset is small (Spiegelhalter 2003).
Many examples (Congdon 2001; 2003)
-bugs.R (Gelman 2004 – Appendix C) – Resolves the data input problems in WinBUGS.
Requires writing program in BUGS. Good documentation
-Splus 6.2 – S+Bayes module provides hierarchical mixed effects models for
Few examples, crashes unexpectedly
§6 Report cards, league tables, and profiling (Marshall 1998)
-Misguided methods used by Pennsylvania Healthcare Cost Containment Council (and others)
(a) Estimate expected risk (mortality) by a logistic regression for each patient
(b) Sum expected risks across hospitals (surgeons)
(c) For each center j estimate $(O_j - E_j) / \mathrm{var}$
-Does not control at all for Type I error (multiple comparisons)
-High risk of finding outliers when cause is random variation
-More soundly based method – Mixed effects model
$\mu_{ij} = \alpha + X_{ij}\mathrm{B} + U_j$, where $U_j \sim N(0, \tau)$ represent the j centers
and $X\mathrm{B}$ represents a matrix of fixed covariates (patient + center level)
Then the goal is to look for “outlier” centers: $\Gamma_j = U_j / \mathrm{se}(U_j) > 1.96$
These $\Gamma_j$ represent how far the jth center departs from the overall average.
These estimators are denominated “best linear unbiased predictors” (BLUPs) or
sometimes called “empirical Bayes” or “shrunken” estimates
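The idea of a "shrunken" center estimate can be illustrated for a continuous outcome. This sketch assumes the between-center variance (tau2) and the within-center variance (sigma2) are known, which they never are in practice, and it uses a size-weighted grand mean for simplicity. The function name and the numbers are hypothetical.

```python
def shrunken_center_effects(center_means, center_sizes, tau2, sigma2):
    """Return (predicted U_j, its standard error, their ratio) for each center."""
    total_n = sum(center_sizes)
    grand = sum(m * n for m, n in zip(center_means, center_sizes)) / total_n
    results = []
    for m, n in zip(center_means, center_sizes):
        shrink = tau2 / (tau2 + sigma2 / n)   # between 0 and 1; near 1 for large centers
        u_hat = shrink * (m - grand)          # raw deviation pulled toward zero
        se = (shrink * sigma2 / n) ** 0.5     # sqrt of conditional variance, variances known
        results.append((u_hat, se, u_hat / se))
    return results

# Hypothetical centers: two large centers near average, one small extreme center.
results = shrunken_center_effects(
    center_means=[10.0, 12.0, 20.0],
    center_sizes=[50, 50, 5],
    tau2=4.0,
    sigma2=100.0,
)
```

The small center has the most extreme raw mean, but it is shrunk the most, so its standardized departure is far less striking than the raw figure suggests.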
Report card methods for estimating the $U_j$'s and $\Gamma_j$'s.
(a) Continuous measures – REML-based methods likely will perform adequately
provided large number of large centers
-SAS mixed; S-plus – lme; R-lme
(b) Binary outcomes – Perhaps more common (surgical deaths)
-PQL-based methods suffer from bias in estimating $\sigma_u^2$ (Evans 2001)
SAS glimmix; R – glmmPQL; MLWin
-Quadrature-based methods likely perform better
SAS nlmixed; Stata – gllamm
Splus 6.2 (correlated data library)
(c) Bayesian models for ordering centers based on outcomes
-Mixed-model-based estimators have standard errors that can be too small because
variance components from mixed model are estimated rather than known. But
extent remains to be determined in different settings.
-Bayesian methods attempt to account for additional uncertainty (Goldstein
1996)
(d) Use of more complex designs (repeated cross sectional models)
-The same mixed effects models could easily be extended to estimate “outliers” and
yet avoid Type I and Type II error.
-Follow centers over time and model
-This method remains for further development and testing.
§7 Confounding by Cluster
Issues of variance estimation for clustered data are well-known.
Problems of bias from confounding are less appreciated (Localio 2002; Berlin 1999; Neuhaus
1998; Ten Have 1996)
Ingredients for confounding by center:
(1) Focus on factors at patient level (race, gender, age) rather than center level (hospital size)
(2) Association between center and outcome – e.g., outcome varies across centers
(3) Variation in prevalence of patient-level factors across centers
-Therefore, if the patient-level factor is balanced (identically distributed) within center,
confounding will not occur (except in the special case of using the OR as the outcome)
(4) If the odds ratio (OR) is the outcome:
Variation in the odds of outcome in the reference patient group across centers
(noncollapsibility)
Example: Reperfusion therapy among Medicare Beneficiaries (Canto NEJM 2000)
Question: Do minority patients have lower access to reperfusion therapy for acute myocardial
infarction?
Key Facts Known:
N= 26,575 patients
Outcome = 57% of patients received reperfusion therapy
Prevalence of risk factor (race) = 6%
Key factors Not Known:
Number of hospitals – perhaps 2000 or more
Variation in rate of reperfusion across hospitals
Variation of risk factor across hospitals
Findings:
Black women (RR=0.9) and black men (RR=0.85) significantly less
likely to receive reperfusion therapy than white men, adjusting for patient
and hospital factors (cath lab, urban, 3 sizes, 4 regions)
Proposed explanation:
Physicians’ clinical ambiguity, lack of training, insufficient knowledge
Alternative explanation:
Caucasian Medicare patients receive care at hospitals in which all
patients are more likely to receive reperfusion therapy.
Table 7.1. Interquartile Ranges of Center-Specific Baseline Outcome (overall risk = 0.2)
Std dev of random effect   25th %ile   75th %ile   Range
0.0                        0.15        0.24        0.09
0.5                        0.11        0.31        0.20
1.0                        0.10        0.34        0.24
Some variation across centers is consistent with random variation (sd=0.0) assuming that
observations from each center come from the same true underlying risk and only sampling
variation applies
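The point that sampling variation alone produces a visible spread in center rates can be checked by simulation. The sketch below is an illustration, not a reproduction of the simulation behind Table 7.1: the center size (35) and the number of centers are assumptions, and the intercept is simply held at logit(0.2) for every value of the standard deviation.

```python
import math
import random
from statistics import quantiles

def observed_rate_iqr(sd, n_centers=2000, center_size=35, overall_risk=0.2, seed=0):
    """25th and 75th percentiles of observed center event rates."""
    rng = random.Random(seed)
    base_logit = math.log(overall_risk / (1 - overall_risk))
    rates = []
    for _ in range(n_centers):
        # True center risk: logit-normal around the overall risk (sd = 0 means no true variation)
        true_risk = 1 / (1 + math.exp(-(base_logit + rng.gauss(0, sd))))
        events = sum(rng.random() < true_risk for _ in range(center_size))
        rates.append(events / center_size)
    q1, _, q3 = quantiles(rates, n=4)
    return q1, q3

q1_none, q3_none = observed_rate_iqr(sd=0.0)   # sampling variation only
q1_wide, q3_wide = observed_rate_iqr(sd=1.0)   # sampling plus true center variation
```

Even with sd = 0 the observed quartiles straddle 0.2, so a spread in observed rates does not by itself show that centers truly differ.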
Table 7.2. Bias under typical methods for analysis of clustered data. Mean odds ratios from
500 simulations.
                            True RR →   1.0    1.5    2.0
                            True OR →   1.0    1.71   2.67
Cluster specific (CS)                   1.09   1.84   2.84
Population averaged (GEE)               1.07   1.59   2.21*
Survey (sandwich)                       1.57   2.32   3.24
Dispersion of exposure and outcome: 2.0; Correlation of dispersion = 0.4
Baseline risk=0.2; Exposure prevalence=0.2
(*Note, in this example the PA (GEE) estimate is lower than 2.67 because there are two effects at
work. This estimate uses an exchangeable correlation structure and there is ample attenuation of the
results towards the null. This attenuation offsets the bias of confounding by center. See Localio
2002 for details).
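The "True OR" row follows arithmetically from the "True RR" row and the stated baseline risk of 0.2. A short check (the function name is hypothetical):

```python
def odds_ratio_from_risk_ratio(rr, baseline_risk):
    """Odds ratio implied by a risk ratio at a given risk in the reference group."""
    exposed_risk = rr * baseline_risk
    if not 0 < exposed_risk < 1:
        raise ValueError("rr * baseline_risk must lie strictly between 0 and 1")

    def odds(p):
        return p / (1 - p)

    return odds(exposed_risk) / odds(baseline_risk)

implied = {rr: round(odds_ratio_from_risk_ratio(rr, 0.2), 2) for rr in (1.0, 1.5, 2.0)}
# implied == {1.0: 1.0, 1.5: 1.71, 2.0: 2.67}
```

Because the conversion depends on the baseline risk, the same RR implies a different OR in centers whose reference-group risk differs. This is the noncollapsibility issue listed among the ingredients for confounding.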
Standard methods confound two different attributes of the exposure
(1) The among center effect – differences across hospitals according to race of patients treated
(2) The strictly within-center (hospital) effect – differences in treatment according to race of
patients given that patients of multiple races are treated at the same hospital
Both PA (GEE) and CS (MLWin, NLMIXED, gllamm) methods will result in bias
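Confounding by center can be reproduced in a small simulation in which exposure has no effect within any center. The sketch assumes correlated, normally distributed center effects on the logit scales of exposure prevalence and outcome risk. The parameter values are illustrative and are not those behind Table 7.2. The pooled (crude) odds ratio is compared with a Mantel-Haenszel odds ratio stratified by center, which stands in here for a within-center analysis.

```python
import math
import random

def expit(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    return math.log(p / (1 - p))

def simulate(n_centers=500, center_size=100, sd=1.0, rho=0.5,
             prevalence=0.2, baseline_risk=0.2, seed=0):
    """Return (crude OR, Mantel-Haenszel OR) when the true within-center OR is 1."""
    rng = random.Random(seed)
    pooled = [0, 0, 0, 0]        # exposed events, exposed non-events, unexposed events, unexposed non-events
    mh_num = mh_den = 0.0
    for _ in range(n_centers):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u_exposure = sd * z1
        u_outcome = sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # correlated with u_exposure
        p_exposed = expit(logit(prevalence) + u_exposure)
        p_event = expit(logit(baseline_risk) + u_outcome)           # same risk whether exposed or not
        a = b = c = d = 0
        for _ in range(center_size):
            exposed = rng.random() < p_exposed
            event = rng.random() < p_event
            if exposed and event:
                a += 1
            elif exposed:
                b += 1
            elif event:
                c += 1
            else:
                d += 1
        for k, v in enumerate((a, b, c, d)):
            pooled[k] += v
        mh_num += a * d / center_size
        mh_den += b * c / center_size
    crude_or = (pooled[0] * pooled[3]) / (pooled[1] * pooled[2])
    return crude_or, mh_num / mh_den

crude_or, mh_or = simulate()
```

With a positive correlation between the two center effects, exposed patients cluster in higher-risk centers, so the crude odds ratio drifts above 1 while the stratified estimate stays near the true value of 1.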
Solutions?
(1) Condition on center to find the within-center effect
-eliminates the random center effect by conditioning
-random effects become nuisance parameters not estimated
-eliminates all center-level factors
-uses only centers with discordant outcomes (must have events and nonevents within
each center)
-fixed effects regression (e.g., Stata “xtreg, fe”)
-conditional regression (conditional logistic or Poisson)
(2) Decompose the within- and among-center effects
Add mean rate of exposure for each center as a regression covariate.
Alternative parameterization, mean exposure is subtracted from the individual level
(binary exposure) to yield the form:
$\operatorname{logit}\{E[Y_{ij} \mid U_j, X_{ij}]\} = \alpha + U_j + \beta_A \bar{X}_j + \beta_W (X_{ij} - \bar{X}_j)$
Subscripts A and W represent among and within components of exposure
$U_j \sim N(0, \tau^2)$ represents a random intercept for the J centers
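Constructing the two covariates for the decomposition is a data-preparation step that can be sketched directly. The function and variable names below are hypothetical; the output columns would then enter whichever mixed-model or GEE routine is being used.

```python
from collections import defaultdict

def decompose_exposure(center_ids, exposure):
    """Return (center mean of exposure, deviation from that mean) per observation."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, x in zip(center_ids, exposure):
        sums[c] += x
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    among = [means[c] for c in center_ids]                         # multiplies beta_A
    within = [x - means[c] for c, x in zip(center_ids, exposure)]  # multiplies beta_W
    return among, within

# Hypothetical data: two centers with different exposure prevalence.
centers = ["A", "A", "A", "A", "B", "B", "B", "B"]
exposed = [1, 0, 0, 0, 1, 1, 1, 0]
among, within = decompose_exposure(centers, exposed)
```

The within-center deviations sum to zero in every center, which is what separates the within component from the among-center component.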
Random intercept is not explicitly estimated in population-averaged models (GEE or
survey methods).
The within component – Does medical care differ within an institution because of the
patient’s characteristic?
The among component – Does medical care differ across institutions based on the
institutional average ($\bar{X}_j$) of the patient’s characteristic?
Each component measures a different effect
Table 7.3. Conditional analyses. Mean odds ratios from 500 simulations.
                        True RR →   1.0    1.5    2.0
                        True OR →   1.0    1.71   2.67
# Centers   Size
30          (20-50)                 1.02   1.72   2.66
60          (20-50)                 1.01   1.71   2.69
10          (80-150)                1.00   1.71   2.66
Dispersion of exposure and outcome: 2.0; Correlation of dispersion = 0.4
Baseline risk=0.2; Exposure prevalence=0.2
The conditional analysis (in this case conditional logistic regression) gives unbiased estimates
regardless of institutional (center) size and the number of centers
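A conditional analysis draws information only from centers that contain both events and nonevents. A small helper (hypothetical name and data) shows how to identify those centers before fitting a conditional model:

```python
def informative_centers(center_ids, outcomes):
    """Centers that have at least one event and at least one nonevent."""
    seen = {}
    for c, y in zip(center_ids, outcomes):
        seen.setdefault(c, set()).add(int(bool(y)))
    return {c for c, values in seen.items() if values == {0, 1}}

# Hypothetical data: center B has no events and center C has no nonevents.
centers = ["A", "A", "B", "B", "C", "C"]
events = [1, 0, 0, 0, 1, 1]
kept = informative_centers(centers, events)
```

Counting how many centers and patients survive this filter is a useful check on how much of the dataset actually informs the conditional estimate.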
Table 7.4. Decomposition of Cluster Specific and Population Averaged Models. Mean odds
ratios
                                        True RR →   1.0    1.5    2.0
                                        True OR →   1.0    1.71   2.67
Method of Analysis
Center Specific                                     1.02   1.72   2.67
Population Averaged (GEE, exch)                     1.01   1.53   2.13
Population averaged (Survey, sandwich)              1.01   1.53   2.15
Dispersion of exposure and outcome: 2.0; Correlation of dispersion = 0.4
Baseline risk=0.2; Exposure prevalence=0.2
Decomposition of the center specific (CS) estimate gives an unbiased estimate of the true
within-center effect.
Decomposed GEE and survey estimates (population averaged (PA) estimates) are attenuated
towards the null, as expected.
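The attenuation of a population averaged odds ratio relative to a center-specific one can be shown numerically by averaging the logistic curve over a normal random intercept. This is a sketch under assumptions: the random-intercept standard deviation (tau = 1.0) is chosen for illustration, and the integral uses a simple trapezoid rule rather than any quadrature routine from the packages named in this outline.

```python
import math

def expit(x):
    return 1 / (1 + math.exp(-x))

def marginal_risk(linear_predictor, tau, n_grid=2001, width=8.0):
    """Average of expit(linear_predictor + u) over u ~ N(0, tau^2), trapezoid rule."""
    if tau == 0:
        return expit(linear_predictor)
    step = 2 * width * tau / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = -width * tau + k * step
        density = math.exp(-0.5 * (u / tau) ** 2) / (tau * math.sqrt(2 * math.pi))
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += end_weight * expit(linear_predictor + u) * density * step
    return total

def population_averaged_or(alpha, beta, tau):
    """Marginal OR implied by a center-specific log odds ratio beta."""
    def odds(p):
        return p / (1 - p)
    return odds(marginal_risk(alpha + beta, tau)) / odds(marginal_risk(alpha, tau))

cs_or = 2.67                                  # center-specific OR from the tables
alpha = math.log(0.2 / 0.8)                   # baseline risk 0.2 at U_j = 0
pa_or = population_averaged_or(alpha, math.log(cs_or), tau=1.0)
pa_or_no_clustering = population_averaged_or(alpha, math.log(cs_or), tau=0.0)
```

With tau = 0 the two odds ratios coincide; as tau grows, the marginal odds ratio moves toward 1 even though the within-center effect is unchanged.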
Table 7.5. Fixed Effects Analyses – Relation to Number of Centers
                             RR →   1.0    1.5    2.0
                             OR →   1.0    1.71   2.67
# Centers   Size
30          (20-50)                 1.02   1.75   2.74
60          (20-50)                 1.01   1.74   2.77
10          (80-150)                1.00   1.71   2.69
Dispersion of exposure and outcome: 2.0; Correlation of dispersion = 0.4
Baseline risk=0.2; Exposure prevalence=0.2
A fixed effects analysis (using an indicator variable for each center) will lead to some bias if the
number of centers is large.
The fixed effects method shows least bias with few centers relative to the number of patients.
Example: Reperfusion and Race – Confounding by hospital? (Canto’s example)
Assumptions of a simulation:
-2000 hospitals with 6 to 24 patients per hospital (n=30,000)
-Dispersion random effects outcome and exposure = 0.5
-Interquartile range hospital-level prevalence race = 0% - 12%
-Interquartile range of rate of reperfusion = 34% to 65%
-Correlation of random effects = 0.5
-Spearman correlation reperfusion rate and race = 0.077
-True association race and reperfusion: RR = OR = 1.0
Results:
Population averaged (sandwich) method: OR = 0.92
(Canto found RR=0.90 and 0.85)
§8 Volume-outcome studies – Correlations of fixed and random effect
A common application of clustered data – do outcomes improve with larger centers?
A simple model:
$\ln\left(\frac{\hat{p}_{ij}}{1 - \hat{p}_{ij}}\right) = \alpha + X\mathrm{B} + \delta \cdot \mathrm{vol}_j + U_j$,
where $X\mathrm{B}$ represents a linear combination of patient-level factors and their
coefficients, $\mathrm{vol}_j$ represents a fixed effect for the volume in center j, and
$U_j$ represents a random effect for the center, i.e., the variation of each center about the mean $\hat{p}$.
Issue #1: Should the analysis use a PA or CS model? (Panageas 2003)
PA model makes more sense because
Volume does not change within center (Graubard 1994)
There is (usually) no effort to demonstrate effect over time.
CS and PA methods can give qualitatively different estimates (Panageas 2003)
Issue #2: Whether random effects, $\Gamma_j$, are independent of fixed effects $\mathrm{vol}_j$
For example, will high volume centers, after controlling for patient characteristics and for
volume, tend to have smaller positive departures from average risk than small volume
centers?
-If yes, then simulations show that the negative association of volume and outcome
(larger centers have lower risks) will be biased toward the null, with both GEE and
center-specific models.
-A positive association will bias the estimate of association, both PA and CS, away from
the null
-Both methods assume independence of random and fixed effects
-There is no good way to address the problem, except perhaps by sensitivity analyses
§9 Crossed (non-nested) vs nested effects
Examples:
Patients are seen by more than one physician
Physicians practice in more than one hospital
“Volume” of surgeon who practices at more than one hospital and each hospital has its
own “volume”
Solution 1:
Treat multiple occurrences of repeated measures as coming from an independent
observation. If Dr. Smith works at two hospitals, treat Dr. Smith as being two different
physicians – her practice might differ across hospitals. (Clayton 1999; Rasbash 2001;
Goldstein 2002)
Solution 2:
Assign weights to data so that a physician’s time is allocated to Hospital A random effect
and Hospital B random effect (Rasbash 2001)
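Both solutions amount to a data-preparation step before any model is fitted. The sketch below uses hypothetical records, and for Solution 2 it assumes that a physician's allocation across hospitals is measured by the share of that physician's patients seen at each hospital. That is only one possible choice of weight.

```python
from collections import Counter

# Hypothetical records: one (physician, hospital) pair per patient.
records = [
    ("Smith", "A"), ("Smith", "A"), ("Smith", "A"), ("Smith", "B"),
    ("Jones", "B"), ("Jones", "B"),
]

# Solution 1: each physician-hospital pair becomes its own cluster identifier,
# so "Smith at A" and "Smith at B" are treated as two different physicians.
pair_cluster = [f"{doc}@{hosp}" for doc, hosp in records]

# Solution 2: weights that split each physician across hospital random effects
# (assumed here to be the share of the physician's patients at each hospital).
pair_counts = Counter(records)
doc_counts = Counter(doc for doc, _ in records)
weights = {pair: n / doc_counts[pair[0]] for pair, n in pair_counts.items()}
```

For each physician the weights sum to 1, as membership weights conventionally do.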
Software options
MLWin
Splus
R
Bayesian methods
(Note: Stata’s “gllamm” might not be an option for non-nested models)
§10 Model specification and “interaction”
Desire to express effect of intervention in terms of:
“risk differences” (for a main effect), or
“differences of risk differences” (for an interaction of time and treatment/exposure)
“Interaction” is scale dependent
Example: t=time, tx=treatment. Effect of t and tx on outcome (risk)
No interaction for a multiplicative effect (risk doubles regardless of tx)
Interaction on additive scale: 5% points (tx=0) vs 10 % points (tx=1); a difference
of risk differences of 5 % points.
        t=1     t=0
tx=1    0.20    0.10
tx=0    0.10    0.05
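The scale dependence can be verified from the four risks in the table. The variable names are hypothetical.

```python
# Risks from the table, keyed by (tx, t).
risk = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.05}

# Multiplicative scale: effect of time within each treatment group.
ratio_tx1 = risk[(1, 1)] / risk[(1, 0)]
ratio_tx0 = risk[(0, 1)] / risk[(0, 0)]
ratio_of_ratios = ratio_tx1 / ratio_tx0      # 1 means no multiplicative interaction

# Additive scale: risk differences over time and their difference.
diff_tx1 = risk[(1, 1)] - risk[(1, 0)]
diff_tx0 = risk[(0, 1)] - risk[(0, 0)]
diff_of_diffs = diff_tx1 - diff_tx0          # nonzero means additive interaction
```

The same four numbers therefore show "no interaction" on the ratio scale and an interaction of 5 percentage points on the difference scale.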
Should the investigator use a linear hierarchical model (as contrasted with a logistic
hierarchical model) simply to obtain a risk difference (or a difference in risk differences)?
“The linear approach was adopted because there was more interest in public health
effects (that is, percent increase in the outcome per unit change in a predictor) than in
epidemiologic association” (Unnamed author of a manuscript reviewed by ARL)
Solution = Fit a statistical model appropriate to the data and then express the results in a
manner appropriate to the audience. (Lindsay & Jones 1998)
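One way to follow this advice is to fit on the logit scale and then report predicted risks and their differences. The sketch below uses hypothetical coefficients, chosen so that the predictions reproduce the risks 0.05, 0.10, 0.10, and 0.20 from the §10 example. It works with fixed effects only; with a random center effect the predictions would also need to be averaged over, or conditioned on, that effect.

```python
import math

def expit(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    return math.log(p / (1 - p))

# Hypothetical logistic coefficients (not estimates from any real dataset).
b0 = logit(0.05)                           # tx = 0, t = 0
b_t = logit(0.10) - b0                     # effect of time when tx = 0
b_tx = logit(0.10) - b0                    # effect of treatment when t = 0
b_int = logit(0.20) - (b0 + b_t + b_tx)    # time-by-treatment term on the logit scale

def predicted_risk(tx, t):
    return expit(b0 + b_t * t + b_tx * tx + b_int * tx * t)

rd_tx1 = predicted_risk(1, 1) - predicted_risk(1, 0)
rd_tx0 = predicted_risk(0, 1) - predicted_risk(0, 0)
diff_of_risk_differences = rd_tx1 - rd_tx0
```

The logistic fit and the additive summary are not in conflict: the model respects the binary outcome, and the risk differences are derived from it for presentation. Interval estimates for such derived quantities need a further step, such as the delta method or resampling.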
§11 Comments and Conclusions
-Clustering within centers is a common statistical issue in health services research. It poses
challenges both in adjusting for bias and in estimating variance.
-Choice of software for analysis should be governed by the scientific question, rather than by
availability
-Many statistical problems remain unsolved (volume-outcome analysis, for example)
-Solutions for some statistical problems remain controversial (Bayesian methods or
permutation-test-based methods)
-Specialized software presents challenges (some does not work) and expense (in some
instances) and false assurances (when not used properly)
-Performance of statistical software might be good in some applications and poor in others
-Statistical expertise is essential for complex analyses (but statisticians are too often absent in
health services research studies based on multi-center data)
References:
Agresti A, Hartzel J. Strategies for comparing treatments on a binary response with multi-center
data. Statist Med. 2000;19:1115-39.
Albert PS. Longitudinal data analysis (repeated measures) in clinical trials. Statist. Med.
1999;18:1707-32.
Andersen PK, Klein JP, Zhang MJ. Testing for centre effects in multi-centre survival studies: a
Monte Carlo comparison of fixed and random effects tests. Statist Med. 1999;18:1489-1500.
Ashby M, Neuhaus JM, Hauck WW, Bacchetti P, Heilbron DC, Jewell NP, et al. An annotated
bibliography of methods for analyzing correlated categorical data. Statist Med. 1992;11:67-99.
Atienza AA, King AC. Community-based health intervention trials: An overview of
methodological issues. Epidemiologic Reviews. 2002;24:72-79.
Bellamy SL, Gibberd R, Hancock L, Howley P, Kennedy B, Klar N, Lipsitz S, Ryan L.
Analysis of dichotomous outcome data for community intervention studies. Statistical Method in
Medical Research. 2000; 9:135-159.
Berlin JA, Kimmel SE, Ten Have TR, Sammel MD. An empirical comparison of several
clustered data approaches under confounding due to cluster effects in the analysis of complications
of coronary angiography. Biometrics. 1999;55:470-6.
Braun TM, Feng Z. Optimal permutation tests for the analysis of group randomized trials. J Am
Statist Assn. 2001;96:1424-32.
Burton P, Gurrin L, Sly P. Extending the simple linear regression model to account for correlated
responses: an introduction to generalized estimating equations and multi-level mixed modeling.
Statist Med. 1998; 17:1261-91
Campbell MK, Elbourne DR, Altman DG. CONSORT statement: extension to cluster
randomized trials. BMJ. 2004;702-8.
Campbell MK, Grimshaw JM. Cluster randomization trials: time for improvement. BMJ.
1998;317:171-2.
Canto, J.G., Allison, J.J., Kiefe, C.I., Fincher, C., Farmer, R., Sekar, P., Sharina, P., and
Weissman, N.W. Relation of race and sex to the use of reperfusion therapy in Medicare
beneficiaries with acute myocardial infarction. N Engl J Med. 2000; 342: 1094-1100.
Carlin JB, Wolfe R, Brown CH, Gelman A. A case study on the choice, interpretation and
checking of multilevel models for longitudinal binary outcomes. Biostatistics. 2001. 2:397-416.
Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for
medical statisticians. Statist Med. 2000;19:1141-64
Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. Second Edition.
Boca Raton: Chapman & Hall/CRC; 2000:35.
Carey V, Zeger SL. Modelling multivariate binary data with alternating logistic regression.
Biometrika. 1993; 80:517-26.
Clayton D, Rasbash J. Estimation in large crossed random effects models by data augmentation. J
R Statist Soc A. 1999;162:425-36.
Cnaan A, Laird. NM, Slasor P. Using the general linear mixed model to analyze unbalanced
repeated measures and longitudinal data. Statist Med. 1997;16:2349-80.
Congdon P. Bayesian Statistical Modeling. New York: John Wiley & Sons; 2001.
Congdon P. Applied Bayesian Modeling. New York: John Wiley & Sons; 2003.
Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol. 1978;108:100-2.
Diaconis P, Efron B. Computer intensive methods in statistics. Scientific American. 1983
May;248(5):116-130.
Diez-Roux AV. Multi-level analysis in public health research. Annu Rev Pub Health.
2000;21:171-92.
Diggle P, Liang KY, Zeger SL. The Analysis of Longitudinal Data. New York: Oxford University
Press; 1994.
Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. Second Edition.
New York: Oxford; 2002.
Donald A, Donner A. Adjustments to the Mantel-Haenszel chi-square statistic and odds ratio
variance estimator when the data are clustered. Statist Med. 1987;6:491-9.
Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research.
London: Arnold;2000.
Donner A, Klar N. Methods for comparing event rates in intervention studies when the unit of
allocation is the cluster. Am J Epidemiol. 1994; 140:279-89.
Donner A, Brown KS, Brasher P. A methodological review of non-therapeutic intervention trials
employing cluster randomization. 1979-1989. International Journal of Epidemiology.
1990;19:795-800.
Efron B, Tibshirani R. Statistical data analysis in the computer age. Science. 1991;253:390-5.
Evans BA, Feng Z, Peterson AV. A comparison of generalized linear mixed model procedures
with estimating equations for variance and covariance parameter estimation in longitudinal studies
and group randomized trials. Statist Med. 2001; 20:3353-73.
Everitt B, Rabe-Hesketh S. Analyzing Medical Data Using S-plus. New York: Springer; 2001.
Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: precision,
sample size, and a unifying model. Statist. Med. 1994;13:61-78.
Feng Z, Diehr P, Peterson A, McLerran D. Selected statistical issues in group randomization
trials. Annu Rev Public Health. 2001;22:167-87.
Fitzmaurice G, Laird N, Ware J. Applied Longitudinal Analysis. New York: Wiley; 2004
(June)
Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized
experiments with nonlinear regressions and omitted covariates. Biometrika. 1984;71:3:431-44.
Gail MH, Mark SD, Carroll RJ, Green SB. On design considerations and randomization-based
inference for community intervention trials. Statist Med. 1996;15:1069-92.
Garthwaite PH. Confidence intervals from randomization tests. Biometrics. 1996; 52:1387-93.
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Second Edition. Boca
Raton, FL: Chapman & Hall/CRC:2004.
Goldstein H. Multilevel Statistical Models. London: Edward Arnold; 1995:1-13.
Goldstein H, Spiegelhalter DJ. League tables and their limitations: statistical issues in
comparisons of institutional performance. JR Statist Soc. 1996;159:385-409
Goldstein H, Browne W, Rasbash J. Tutorial in biostatistics. Multilevel modeling of medical
data. Statist. Med. 2002;21:3291-3315
Good P. Permutation tests. A Practical Guide to Resampling Methods for Testing Hypotheses. 2nd
Ed. New York: Springer-Verlag;2000;
Good P. Resampling methods: A Practical Guide to Data Analysis. 2nd Edition. Boston:
Birkhäuser; 2001.
Gould AL. Multicenter trial analysis revisited. Statist Med. 1998;17:1779-97.
Graubard BI, Korn EL. Regression analysis with clustered data. Statist Med. 1994;13:509-22.
Green SB. The advantages of community-randomized trials for evaluating lifestyle modification.
Controlled Clinical Trials. 1997;18:506-13.
Green SB, Corle DK, Gail MH, Mark SD, Pee D, Freedman LS, Graubard BI, Lynn WR.
Interplay between design and analysis
for behavioral intervention trials with community as the unit of randomization. Am J Epidemiol.
1995;142:587-93.
Guo G, Zhao H. Multilevel modeling for binary data. Annu Rev Sociol. 2000; 26:441-62
Hansen MH, Hurwitz WN, Madow WG. Sample Survey Methods and Theory, Vol 1: Methods
and Applications. New York: John Wiley & Sons; 1953.
Hedeker D, Siddiqui O, Hu FB. Random-effects regression analysis of correlated grouped-time
survival data. Statistical Methods in Medical Research. 2000;9:161-79.
Horton NJ, Lipsitz SR. Review of software to fit generalized estimating equation regression
models. American Statistician. 1999; 53:160-9.
International Conference on Harmonization of Technical Requirements for Registration of
Pharmaceuticals for Human Use. ICH Harmonized Tripartite Guideline. Statistical Principles for
Clinical Trials. Statist. Med. 1999;18:1905-42. http://www.fda.gov/cder/guidance/index.html.
Kerry SM, Bland JM. The intracluster correlation coefficient in cluster randomization. BMJ.
1998;316:1455-60.
Kerry SM, Bland JM. Statistical notes: sample size in cluster randomization. BMJ.
1998;316:549.
Kish L. Survey Sampling. New York: John Wiley & Sons;1965:88.
Koepsell TD, Martin DC, Diehr PH, Psaty BM, Wagner EF, Perrin EB, Cheadle A. Data
analysis and sample size issues in evaluations of community-based health promotion and disease
prevention programs: a mixed model analysis of variance approach. J Clin Epidemiol.
1991;44:701-13.
Korn EL, Graubard BI. Analysis of Health Surveys. New York: John Wiley & Sons; 1999.
LaVange LM, Koch GG, Schwartz TA. Applying sample survey methods to clinical trials data.
Statist Med. 2001;20:2609-23.
Leyland AH, Goldstein H. Eds. Multilevel Modeling of Health Statistics. Chichester: John
Wiley & Sons; 2001.
Liang K-Y, and Zeger S L. Regression analysis for correlated data. Annu Rev Pub Health.
1993;14:43-68.
Lindsay JK, Jones B. Choosing among generalized linear models applied to medical data. Statist
Med. 1998;17:59-68.
Lindsay JK, Lambert P. On the appropriateness of marginal models for repeated measurements in
clinical trials. Statist Med. 1998;17:447-69.
Localio AR, Berlin JA, Ten Have TR. Confounding due to cluster in multicenter studies – causes
and cures. Health Services & Outcomes Research Methodology. 2002;3:1-16.
Longford NT. Random coefficient models. Oxford: Clarendon Press; 1993.
Marshall EC, Spiegelhalter DJ. Reliability of league tables of in vitro fertilization clinics:
retrospective analysis of live birth rates. BMJ. 1998;316:1701-5.
McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: John Wiley &
Sons; 2001:232-4.
Murray DM. Design and Analysis of Group Randomization Trials. New York: Oxford University
Press; 1998.
Murray DM. Statistical models appropriate for designs often used in group-randomization trials.
Statist. Med. 2001;20:1373-85.
Murray DM, Hannan PJ, Wolfinger RD, Baker WL, Dwyer JH. Analysis of data from
group-randomized trials with repeat observations on the same groups. Statist Med. 1998;17:1581-1600.
Hannan PJ, Murray DM. Gauss or Bernoulli? A Monte Carlo comparison of the performance of
the linear mixed-model analysis of simulated community trials with a dichotomous outcome variable
at the individual level. Evaluation Review. 1996;20:338-52.
Neuhaus JM. Assessing change with longitudinal and clustered binary data. Annu Rev Public
Health. 2001;22:115-28.
Neuhaus JM. Statistical methods for longitudinal and clustered designs with binary responses.
Statistical Methods in Medical Research. 1992;1:249-73.
Neuhaus JM, Segal MR. Design effects for binary regression models fitted to dependent data.
Statist Med. 1993; 12:1259-68.
Neuhaus JM, Kalbfleisch JD, Hauck WW. A comparison of cluster-specific and
population-averaged approaches for analyzing correlated binary data. International Statistical Review.
1991;59:25-35.
Neuhaus, J. and Kalbfleisch JD. Between- and within-cluster covariate effects in the analysis of
clustered data. Biometrics. 1998; 54:638-64.
Nixon RM, Thompson SG. Baseline adjustments for binary data in repeated cross-sectional cluster
randomized trials. Statist. Med 2003; 22:2673-92.
Normand ST. Tutorial in biostatistics Meta-analysis: formulating, evaluating, combining, and
reporting. Statist Med. 1999;18:321-359.
Ukoumunne OC, Thompson SG. Analysis of cluster randomization trials with repeated
cross-sectional binary measurements. Statist Med. 2001; 20:417-33.
Ukoumunne OC, Gulliford MC, Chinn S, Sterne JAC, Burney PGJ, Donner A. Evaluation of
health interventions at area and organization level. BMJ. 1999;319:376-9.
Omar RZ, Thompson SG. Analysis of a cluster randomized trial with binary outcome data using a
multi-level model. Statist Med. 2000;19:2675-88.
Panageas KS, Schrag D, Riedel E, Bach PB, Begg CB. The Effect of Clustering of Outcomes on
the Association of Procedure Volume and Surgical Outcomes. Ann Intern Med. 2003;139: 658 - 665.
Rabe-Hesketh S, Pickles A, Skrondal A. Reliable estimation of generalized linear mixed models
using adaptive quadrature. The Stata Journal. 2002;2:1-21.
Rabe-Hesketh S, Pickles A, Skrondal A. GLLAMM Manual. London: Kings College; 2001.
http://www.gllamm.org/
Rao JNK, Scott AJ. A simple method for the analysis of clustered binary data. Biometrics.
1992;48:577-85.
Rao JNK, Scott AJ. A simple method for analyzing overdispersion in clustered Poisson data.
Statist Med. 1999;18:1373-85.
Rasbash J, Browne W. Modelling non-hierarchical structures. In Leyland AH, Goldstein H, eds.
Multilevel Modelling of Health Statistics. New York: John Wiley & Sons; 2001:93-105.
Raudenbush SW, Bryk AS. Hierarchical Linear Models: Applications and Data Analysis
Methods. Newbury Park, CA: Sage; 2002.
Raudenbush SW, Yang ML, Yosef M. Maximum likelihood for generalized linear models with
nested random effects via high-order, multivariate Laplace approximation. J Computational and
Graphical Statistics. 2000;9:141-57.
Rosenbaum PR. Covariance adjustment in randomized experiments and observational studies.
Statistical Science. 2002;17:286-327.
Senn S. Some controversies in planning and analyzing multi-centre trials. Statist Med.
1998;17:1753-65.
Simpson JM, Klar N, Donner A. Accounting for cluster randomization: a review of primary
prevention trials, 1990-1993. Am J Pub Hlth. 1995;85:1378-83.
Spiegelhalter D, Thomas A, Best N, Gilks W. BUGS 0.6. Bayesian Inference Using Gibbs
Sampling. Cambridge: MRC Biostatistics Unit, Institute of Public Health; 1997.
Spiegelhalter D, Thomas A, Best N, Lunn D. WinBUGS User Manual. Version 1.4. Cambridge:
MRC Biostatistics Unit, Institute of Public Health; 2003.
Sullivan LM, Dukes KA, Losina E. Tutorial in biostatistics. An introduction to hierarchical linear
modeling. Statist Med. 1999;18:855-88.
Ten Have TR, Kunselman AR, Tran L. A comparison of mixed effects logistic regression models
for binary response data with two nested levels of clustering. Statist Med. 1999;18:947-60.
Ten Have TR, Landis JR, Weaver S. Association models for periodontal disease progression: a
comparison of methods for clustered binary data. (Letter). Statist Med. 1996;15:1227-9.
Ten Have TR, Landis JR, Weaver SL. Association models for periodontal disease progression: a
comparison of methods for clustered binary data. Statist Med. 1995;14:413-29.
Ten Have TR, Kunselman A, Zharichenko E. Accommodating negative intracluster correlation
with a mixed effects logistic model for bivariate binary data. J Biopharmaceutical Statistics.
1998;8:131-49.
Thompson SG, Warn DE, Turner RM. Bayesian methods for analysis of binary outcome data in
cluster randomized trials on the absolute risk scale. Statist Med. 2004; 23:389-410.
Venables WN, Ripley BD. Modern Applied Statistics with S. Fourth Edition. New York:
Springer; 2002:297-8.
Wei LJ, Glidden DV. An overview of statistical methods for multiple failure time data in clinical
trials. Statist Med. 1997;16:833-39.
Williams RL. A note on robust variance estimation for cluster-correlated data. Biometrics.
2000;56:645-6.
Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation
approach. Biometrics. 1988;44:1049-60.