Appendix A: Formulae for spatial models included in systematic review
3.1.1 Multilevel logistic modelling (48):
A generalised linear mixed model (GLMM) was used to model the outcome of diagnosis with
diabetes, $D_{ik}$, for each individual $i$ nested within their area of residence $k$, as a Bernoulli
distribution with probability parameter $p_{ik}$. The $\text{logit}(p_{ik})$ was modelled as a linear
function of individual and neighbourhood sociodemographic explanatory variables, as follows:
$$D_{ik} \sim \text{Bern}(p_{ik})$$
$$\text{logit}(p_{ik}) = X_{ik}^T \beta + U_k$$
$$U_k \sim N\left(\mu, \frac{1}{\tau}\right)$$
where $X_{ik}$ is a vector of individual and area-level risk factors for individual $i$ within area $k$,
$\beta$ is a vector of regression parameters, and $U_k$ is the uncorrelated random effect for each
neighbourhood $k$, with mean $\mu$ and variance $1/\tau$ (the inverse of the precision $\tau$).
Prior distributions were specified for the precision and regression coefficients as follows: $\mu = 0$,
$\tau \sim \text{Ga}(0.5, 0.0005)$, and $N(0, 10000)$ for each coefficient within vector $\beta$, with the
exception of the intercept and the coefficient for age, for which flat prior distributions were
specified. A flat prior is a uniform prior distribution in which every value within the distribution
has the same constant probability density. The mean of the uncorrelated random effect was assumed
known and set to zero.
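To make the structure of this model concrete, the following is a minimal simulation sketch, not the authors' code. The function names `inv_logit` and `simulate_diagnoses`, the covariate values and the coefficient values are all hypothetical. It draws one random effect per area with mean zero and variance $1/\tau$, adds it to the linear predictor, and draws a Bernoulli outcome for each individual.

```python
import math
import random


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_diagnoses(x_rows, beta, area_of, tau, rng):
    """Simulate D_ik ~ Bern(p_ik) with logit(p_ik) = x_ik' beta + U_k.

    x_rows  : list of covariate vectors, one per individual
    beta    : regression coefficients (illustrative values, not estimates)
    area_of : area label k for each individual
    tau     : precision of the area random effect, so Var(U_k) = 1 / tau
    """
    sd = math.sqrt(1.0 / tau)  # standard deviation implied by the precision
    u = {k: rng.gauss(0.0, sd) for k in sorted(set(area_of))}  # mean fixed at 0
    outcomes = []
    for x, k in zip(x_rows, area_of):
        eta = sum(xj * bj for xj, bj in zip(x, beta)) + u[k]
        outcomes.append(1 if rng.random() < inv_logit(eta) else 0)
    return outcomes, u


if __name__ == "__main__":
    rng = random.Random(0)
    x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.5]]  # intercept + one covariate
    d, u = simulate_diagnoses(x_rows, [-1.0, 0.8], ["a", "a", "b"], 4.0, rng)
    print(d, u)
```

Individuals in the same area share a single draw of $U_k$, which is what induces within-area correlation in the simulated outcomes.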
3.1.2 Sparse Poisson Convolution conditional autoregression (49):
For a lattice of neighbouring regions, with $i \sim j$ denoting that regions $i$ and $j$ are neighbours,
the standard Poisson model using conditional autoregressive (CAR) priors for the correlated random
effect, described by Besag et al. (1991), assumes a Poisson distribution for the observed number of
cases $Y_i$ in each area $i$ and takes the following form:
$$Y_i \sim \text{Po}(E_i p_i)$$
with offset $E_i$ denoting the expected number of cases in area $i$, and $p_i$ representing the relative
risk of diabetes in area $i$. $\log(p_i)$ is modelled as a linear equation as follows:
$$\log(p_i) = X_i^T \beta + U_i + V_i$$
where for area $i$, $X_i$ is a vector of area-level risk factors, $\beta$ is a vector of regression
parameters, $U_i$ represents an uncorrelated random effect with no spatial structure, and $V_i$
represents a correlated random effect with CAR spatial structure.
The CAR prior assumes that the correlated random effect in area $i$, given the correlated random
effects in all other areas $j$, is distributed as:
$$V_i \mid V_j = v_j,\ j \neq i \sim N\left(\mu(v_j), \frac{\sigma_V^2}{m_i}\right)$$
where $\mu(v_j)$ is the average correlated random effect for the neighbours of area $i$, $m_i$ is the
number of such neighbours, and $\sigma_V^2$ is the conditional variance of $V$. An advantage of the CAR
prior is that these conditional independencies can be exploited in MCMC estimation approaches, and
that the prior allows spatial smoothing.
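The conditional distribution above can be computed directly for any one area. The sketch below is illustrative only: the function name `car_conditional`, the four-area adjacency structure and the numeric values are assumptions, not taken from the reviewed study. It returns the conditional mean (the average of the neighbouring effects) and the conditional variance $\sigma_V^2 / m_i$.

```python
def car_conditional(i, v, neighbours, sigma2_v):
    """Return (mean, variance) of V_i given the effects of its neighbours.

    i          : index of the area of interest
    v          : current values of the correlated random effects, one per area
    neighbours : dict mapping each area index to a list of adjacent area indices
    sigma2_v   : conditional variance parameter sigma_V^2
    """
    nbrs = neighbours[i]
    m_i = len(nbrs)
    if m_i == 0:
        raise ValueError("area %r has no neighbours" % (i,))
    mean = sum(v[j] for j in nbrs) / m_i  # average effect over the neighbours
    return mean, sigma2_v / m_i           # variance shrinks as m_i grows


if __name__ == "__main__":
    # Hypothetical lattice: four areas in a row, 0 - 1 - 2 - 3.
    neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    v = [0.2, 0.0, -0.4, 0.1]
    print(car_conditional(1, v, neighbours, sigma2_v=0.5))
```

Areas with more neighbours have a smaller conditional variance, so their effects are pulled more strongly towards the local average; this is the smoothing behaviour described above.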
Sparse Poisson Convolution model:
The sparse Poisson convolution (SPC) models used in this study to model DM I and DM II prevalence
are described by the authors as follows:
$$Y_i \sim \text{Po}(y_i \mu_i)$$
$$\log(\mu_i) = \log(E_i) + \alpha(j) + u_i + v_i, \quad j = 1, 2$$
where for census tract $i$, $y_i$ is the observed count, $E_i$ is the expected number of cases, $j$ is a
binary factor indicating zero and non-zero observed counts (i.e. $j = 1$ if $y_i = 0$ and $j = 2$ if
$y_i > 0$), $\alpha(j)$ is a factored intercept for modelling zero and non-zero counts, $u_i$ is the
uncorrelated random effect and $v_i$ is the correlated random effect. The priors described for this
model include a CAR prior for $v_i$ controlled by adjacent areas with common boundaries (first-order
neighbours), intercept $\alpha \sim N(0, 1000)$, standard deviation of uncorrelated random error
$\sigma_u \sim \text{Unif}(0, 10)$ and standard deviation of correlated random error
$\sigma_v \sim \text{Unif}(0, 10)$.
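The role of the factored intercept can be illustrated with a small sketch of the linear predictor alone. The function name `spc_mean` and all numeric values are hypothetical; the code only evaluates $\log(\mu_i) = \log(E_i) + \alpha(j) + u_i + v_i$ and does not attempt to reproduce the authors' fitting procedure.

```python
import math


def spc_mean(y_i, e_i, alpha, u_i, v_i):
    """Evaluate mu_i from log(mu_i) = log(E_i) + alpha(j) + u_i + v_i.

    y_i   : observed count, used only to choose the intercept class j
    e_i   : expected number of cases (must be positive)
    alpha : dict with keys 1 and 2 holding the two factored intercepts
    u_i   : uncorrelated random effect for the tract
    v_i   : correlated (CAR) random effect for the tract
    """
    j = 1 if y_i == 0 else 2  # j = 1 for zero counts, j = 2 for positive counts
    return math.exp(math.log(e_i) + alpha[j] + u_i + v_i)


if __name__ == "__main__":
    alpha = {1: -2.0, 2: 0.1}  # illustrative intercepts, not estimates
    print(spc_mean(0, 1.5, alpha, 0.05, -0.02))  # tract with a zero count
    print(spc_mean(4, 1.5, alpha, 0.05, -0.02))  # tract with a positive count
```

Giving zero-count tracts their own intercept lets the model fit the many sparse tracts separately from the tracts with positive counts.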
Sparse Poisson MCAR model:
In the model adapted by Liese et al., DM I and DM II were considered components of a vector of
outcomes and a multivariate model applied as follows:
$$Y_i \sim \text{Po}(Y_i \boldsymbol{\mu}_i)$$
$$\log(\boldsymbol{\mu}_i) = \log(\mathbf{E}_i) + \boldsymbol{\alpha}(j) + \mathbf{U}_i + \mathbf{V}_i, \quad j = 1, 2$$
where for census tract $i$, $Y_i$ is a vector of multivariate observed counts for DM I and DM II
following a Poisson distribution with mean $\boldsymbol{\mu}_i$, $\boldsymbol{\mu}_i$ is a vector of the
means of the Poisson distribution of the multivariate health outcomes, $\mathbf{E}_i$ is a vector of the
expected number of cases for multivariate outcomes, $\boldsymbol{\alpha}(j)$ is a vector of factored
intercepts for the multivariate outcomes where $j$ denotes the class of the observed count (zero or
positive), $\mathbf{U}_i$ is a vector of uncorrelated random effects and $\mathbf{V}_i$ is a vector of
correlated random effects.
Joint spatial correlation between DM I and DM II was examined by calculating an empirical
correlation between the RR estimates obtained for the sparse Poisson convolution models using the
Pearson correlation coefficient. The priors described for this model include a CAR model for
$\mathbf{V}_i$ controlled by adjacent areas with common boundaries, intercept
$\alpha \sim N(0, 1000)$, standard deviation of uncorrelated random error
$\sigma_U \sim \text{Unif}(0, 10)$ and standard deviation of correlated random error
$\sigma_V \sim \text{Unif}(0, 10)$.
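The empirical correlation step can be sketched as a plain Pearson correlation between two vectors of tract-level relative risk estimates. The function name `pearson` and the example values are hypothetical and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up relative risk estimates for five tracts.
    rr_dm1 = [0.9, 1.1, 1.0, 1.3, 0.8]
    rr_dm2 = [0.8, 1.2, 1.1, 1.4, 0.9]
    print(pearson(rr_dm1, rr_dm2))
```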
3.1.3 Stratified generalised linear modelling (52):
The authors considered three models, briefly described here.
In the first model, the observed counts, $Y_{ijk}$, of diagnosed diabetes cases, stratified by
gender $i$, eighteen 5-year age bands $j$, and seven ethnic groups $k$, were modelled assuming a
Poisson distribution, as follows:
$$Y_{ijk} \sim \text{Po}(E_{ijk} \mu_{ijk})$$
where $\mu_{ijk}$ models the impact of gender, age and ethnicity, and $E_{ijk}$ is the expected count in
stratum $(ijk)$. In this model, $E_{ijk} = 2.3 \times N_{ijk}$, where 2.3 is the external standardisation
rate of diabetic prevalence and $N_{ijk}$ is the population number in stratum $(ijk)$.
$$\log(\mu_{ijk}) = X_{ijk}^T \beta$$
where $X_{ijk}$ is a vector of area-level risk factors, and $\beta$ is a vector of regression parameters.
Two GLMs were modelled, one with and one without age-ethnic group interactions. The coefficient
for age, $\beta_j$, was modelled using a random walk prior that assumes diabetes rates for successive
age groups will tend to be similar, as follows:
$$\beta_j \sim N\left(\beta_{j-1}, \frac{1}{\tau_j}\right) \text{ for } j = 2, \ldots, 18$$
$$\beta_{j=1} \sim N(0, 1000) \text{ and precision } \tau_j \sim \text{Ga}(1, 1)$$
The coefficients for gender, $\beta_i$, and ethnicity, $\beta_k$, were assigned fixed effect priors with
corner constraints, $\beta_{i=1} = \beta_{k=1} = 0$,
$$\beta_{i=2} \sim N(0, 1000), \quad \beta_k \sim N(0, 1000) \text{ for } k = 2, \ldots, 7$$
Additional priors used for the GLM with an age-ethnic group interaction term include:
$$\beta_{jk} \sim N\left(0, \frac{1}{\tau_{jk}}\right), \text{ where } \tau_{jk} \sim \text{Ga}(1, 1)$$
A sensitivity analysis was performed on the prior distribution for each precision, $\tau_j$ and
$\tau_{jk}$, with $\text{Ga}(1, 0.1)$ and $\text{Ga}(1, 0.001)$ priors also trialled.
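The smoothing implied by the random walk prior on the age coefficients can be seen by drawing from it. This is a sketch under stated assumptions: the function name `random_walk_prior_draw` is hypothetical, the precision is treated as a single fixed value rather than being drawn from its gamma prior, and the second argument of $N(0, 1000)$ is interpreted as a variance.

```python
import math
import random


def random_walk_prior_draw(n_groups, tau, rng, first_var=1000.0):
    """Draw one set of age coefficients from a first-order random walk prior.

    n_groups  : number of age bands (18 in the model described above)
    tau       : precision of each step, so each step has variance 1 / tau
    first_var : variance of the prior on the first coefficient (assumed)
    """
    beta = [rng.gauss(0.0, math.sqrt(first_var))]
    step_sd = math.sqrt(1.0 / tau)
    for _ in range(1, n_groups):
        # Each coefficient is centred on the coefficient of the previous band.
        beta.append(rng.gauss(beta[-1], step_sd))
    return beta


if __name__ == "__main__":
    rng = random.Random(42)
    smooth = random_walk_prior_draw(18, tau=100.0, rng=rng)  # small steps
    rough = random_walk_prior_draw(18, tau=0.1, rng=rng)     # large steps
    print(max(abs(b - a) for a, b in zip(smooth, smooth[1:])))
    print(max(abs(b - a) for a, b in zip(rough, rough[1:])))
```

A larger precision forces neighbouring age bands closer together, which is why the choice of gamma prior on the precision is a natural target for sensitivity analysis.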
In the second model, the prevalence gradient of diabetes (DM I and DM II combined) over
neighbourhood deprivation quintiles $m$ was modelled using logistic regression. For each gender
separately, the impact of age ($j = 1, \ldots, 18$), ethnicity in four categories ($k = 1, \ldots, 4$) and
neighbourhood deprivation quintile ($m = 1, \ldots, 5$) were assessed using a Bernoulli trial model for
the presence of diabetes $D_i$ in each individual $i$, with probability parameter $p_i$, as follows:
$$D_i \sim \text{Bern}(p_i)$$
$$\text{logit}(p_i) = X_i^T \beta$$
where $X_i$ is a vector comprising ethnic group, age group and deprivation quintile for each individual
$i$, and $\beta$ is a vector of regression parameters, including coefficients for $j$, $k$, $m$. Priors
used were similar to Model 1, including a random walk prior on age categories $j$:
$$\beta_j \sim N\left(\beta_{j-1}, \frac{1}{\tau_j}\right) \text{ for } j = 2, \ldots, 18$$
$$\beta_{j=1} \sim N(0, 1000) \text{ and precision } \tau_j \sim \text{Ga}(1, 1)$$
The deprivation effects were assumed to follow truncated normal distributions, constraining
sampling to produce a monotonic gradient as follows:
$$\beta_m \sim N(0, 1000)\, I(\beta_{m-1}, \beta_{m+1}), \quad m = 2, 3, 4$$
$$\beta_{m=1} \sim N(0, 1000)\, I(-\infty, \beta_{m=2})$$
$$\beta_{m=5} \sim N(0, 1000)\, I(\beta_{m=4}, \infty)$$
where $I(a, b)$ is the interval from $a$ to $b$.
For ethnic group $k$, a normal prior with corner constraints was used:
$$\beta_{k=1} = 0,$$
$$\beta_k \sim N(0, 1000) \text{ for } k = 2, 3, 4$$
In the third model, diabetes mortality was modelled separately for each gender using Poisson
regression. For males, let $Y_{i1}$ = observed deaths in area $i$, and $E_{i1}$ = expected deaths in area $i$.
For females, let $Y_{i2}$ = observed deaths in area $i$, and $E_{i2}$ = expected deaths in area $i$.
$$Y_{i1} \sim \text{Po}(E_{i1} m_{i1}); \quad \log(m_{i1}) = B(Q_{Mi})$$
$$Y_{i2} \sim \text{Po}(E_{i2} m_{i2}); \quad \log(m_{i2}) = B(Q_{Fi})$$
where $Q_{Mi}$ = male-prevalence quintile to which area $i$ belongs and $Q_{Fi}$ = female-prevalence
quintile to which area $i$ belongs. A Poisson regression was assessed to be satisfactory for this
model due to the absence of overdispersion.
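The source does not state how overdispersion was assessed. One common diagnostic, shown here purely as an illustration, is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom, where values near 1 are consistent with a Poisson model. The function name `pearson_dispersion` and the example numbers are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    observed : observed counts Y_i
    fitted   : fitted Poisson means (E_i * m_i in the notation above)
    n_params : number of estimated regression parameters
    """
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df


if __name__ == "__main__":
    # Made-up observed deaths and fitted means for six areas.
    y = [3, 5, 4, 8, 2, 6]
    mu = [4.0, 4.0, 5.0, 7.0, 2.5, 5.5]
    print(pearson_dispersion(y, mu, n_params=1))
```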
3.2 Classic generalised linear models and generalised linear mixed models:
3.2.2 Generalised linear mixed modelling (GLMM) (54):
The first regression model assessed HbA1c level as a linear mixed model with random intercept and
slope based on individual sociodemographic and lab characteristics, with practice characteristics or
their primary care physician and clinic specialty as fixed effects and neighbourhood SES quintile as a
random effect:
$$H_i = X_i^T \beta + Z_i W_i + U_i$$
where $H_i$ is the HbA1c (glycated haemoglobin) level for individual $i$, $X_i$ is a vector of risk
factors with fixed effects for individual $i$, $\beta$ is a vector of fixed regression parameters, $Z_i$
is a vector of risk factors with random effects for individual $i$, $W_i$ is a vector of random
regression parameters for individual $i$, and $U_i$ represents an uncorrelated random effect with no
spatial structure.
The second model dichotomised LDL cholesterol using a cutpoint of 100 mg/dL, using mixed logistic
regression. The same fixed and random effects were included as in the HbA1c model, with the
addition of statin prescription:
$$L_i \sim \text{Bern}(p_i)$$
$$\text{logit}(p_i) = X_i^T \beta + Z_i W_i + U_i$$
where for individual $i$, $L_i$ represents the outcome LDL > 100 mg/dL, $p_i$ is the probability of
having LDL > 100 mg/dL, $X_i$ is a vector of risk factors with fixed effects, $\beta$ is a vector of
fixed regression parameters, $Z_i$ is a vector of risk factors with random effects, $W_i$ is a vector of
random regression parameters, and $U_i$ represents an uncorrelated random effect with no spatial
structure.
3.2.4 Linear regression including temporal component (58):
For each region, the time trends for prevalence of DM I and DM II were separately interpolated by
mixed linear regression:
$$Y_i = a_i + \beta_i x_i + U_i$$
where for area $i$, $Y_i$ is the estimated prevalence, $x_i$ is the year, $a_i$ is the intercept,
$\beta_i$ is the coefficient for year, and $U_i$ is an uncorrelated random effect with no spatial
structure.
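As a simplified illustration of the per-region trend, the sketch below fits the intercept and the coefficient for year by ordinary least squares for a single region. It omits the random effect, so it is not the mixed model itself, and the function name `fit_linear_trend` and the data are hypothetical.

```python
def fit_linear_trend(years, prevalence):
    """Least-squares intercept and slope for prevalence regressed on year."""
    n = len(years)
    if n != len(prevalence) or n < 2:
        raise ValueError("need at least two (year, prevalence) pairs")
    mx = sum(years) / n
    my = sum(prevalence) / n
    sxx = sum((x - mx) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(years, prevalence)) / sxx
    intercept = my - slope * mx
    return intercept, slope


if __name__ == "__main__":
    # Made-up prevalence (%) for one region over four years.
    a_i, beta_i = fit_linear_trend([2000, 2001, 2002, 2003], [1.0, 1.5, 2.0, 2.5])
    print(a_i, beta_i)
    print(a_i + beta_i * 2001.5)  # interpolated prevalence between two years
```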