health applications of bayes techniques

Population Health Perspectives

Peter Congdon, Research Professor of Quantitative Geography & Health Statistics, QMUL
e-mail: p.congdon@qmul.ac.uk
http://www.geog.qmul.ac.uk/staff/congdonp.html
http://webspace.qmul.ac.uk/pcongdon/

Major tasks in definition and analysis

Upstream & downstream determinants of health (Kaplan, 2004)

To describe/analyze health variation:
- over areas or area categories (poverty status, area socioeconomic classifications, “deprivation quintiles”)
- by area SES scales (deprivation gradients), or other area characteristics (social “fragmentation”, social capital)
- according to area environmental exposures (e.g. pollution levels or categories)

To describe health variation over:
- demographic categories (age, race, gender, family type)
- individual socioeconomic variables (income, education)
- health behaviours (smoker or not, obese or not)

- Assess how individual and contextual factors (aka upstream & downstream factors) interact in their impacts on health

- Assess which potential sources of variation in health are significant (or not)
- Summarise health variations parametrically
- Provide stable estimates

Assess how health need (need for healthcare) is distributed over areas or social groups:
- to guide distribution of scarce health resources and effective targeting of healthcare interventions
- may involve “health need indices” based on characteristics of areas or area populations

- A major focus of my talk will be on models for spatial variations in health, and predictors of those variations (“ecological studies”)
- These models typically use area counts (deaths, incidence or prevalence totals) from official registration systems
- Statistical models often seek to assess the implications of area health variation (e.g. locating areas with excess risk, ranking areas according to health risk, measuring inequality, smoothing ragged observed area rates).
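
As a rough illustration of what "smoothing ragged observed area rates" can mean, the sketch below applies a simple Poisson-gamma shrinkage to area counts. It is a deliberately simplified stand-in, not the spatial models discussed in the talk: it pools towards a single global mean rather than towards neighbouring areas. The counts `y`, expected counts `E`, and the prior values `a` and `b` are all hypothetical, chosen so that the prior mean relative risk a/b is 1.

```python
import numpy as np

def smooth_relative_risks(y, E, a, b):
    """Posterior mean relative risk when y_i ~ Poisson(E_i * r_i)
    and r_i ~ Gamma(shape=a, rate=b); the posterior is Gamma(a + y_i, b + E_i)."""
    y = np.asarray(y, dtype=float)
    E = np.asarray(E, dtype=float)
    return (y + a) / (E + b)

# Hypothetical observed and expected event totals for four areas.
y = np.array([0, 3, 12, 40])
E = np.array([1.5, 2.0, 10.0, 40.0])

crude = y / E                                        # unsmoothed ratio y/E
smoothed = smooth_relative_risks(y, E, a=5.0, b=5.0)  # prior mean a/b = 1

for c, s, e in zip(crude, smoothed, E):
    print(f"E={e:5.1f}  crude={c:.3f}  smoothed={s:.3f}")
```

Areas with small expected counts are pulled most strongly towards the prior mean, while areas with large expected counts stay close to their crude ratio.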

- Relevance of “ecological studies” (despite the “ecological fallacy”) to the broader upstream/downstream debate: what are contextual effects, what causes them, how should they be modelled, etc.
- Crude rates for rare events are unreliable, so stabilized/robust/smoothed area health outcome rates are essential to accurate description of population health.
- Smoothing may draw on the spatial structure of known or unknown risk factors

- I will also consider multilevel perspectives
- Assess how individual and contextual factors interact in their impacts on health, e.g. area variables may act as effect modifiers for individual risk factors.
- Health surveys (e.g. Health Survey for England, Behavioral Risk Factor Surveillance System) are the most suitable for analysing the effects on health of age, ethnicity, and individual SES, and their possible modification by geographic variables.
- But administrative or census data can also be analyzed profitably by multilevel (ML) methods

RELEVANT METHODS
Some distinctive aspects of methodologies for modelling population health

APPLICATION THEMES
1. Assessing varying health risks in areas
2. Spatially varying predictor effects
3. Age and area: life table methods
4. Spatial aspects of health care use
5. Multilevel modelling
6. Prevalence modelling
7. Common spatial factor models

- Generalized linear models (e.g. with count or binary response) are more frequently used than linear models
- For health survey data, often need binary regression, with logit or log link, and maybe accounting for differential survey weights
- For area counts of health events (e.g. deaths, prevalence), typically need Poisson or binomial densities, or over-dispersed versions of these

- Often use random effects to pool strength (or borrow strength) over areas or other relevant dimensions (e.g.
age)
- This essentially refers to adding stability/precision to estimates by referring to the overall population density
- Maybe pool strength over variables too, via multivariate random effect and common factor models

- In area health applications, both outcomes (e.g. mortality or prevalence) and ecological risk factors (e.g. area deprivation, area smoking rates) are typically spatially structured.
- This also applies to “unknown” risk factors
- So in statistical models, spatially correlated random effects are often involved (in Bayesian terminology, “spatial priors”)
- Modelling aims to account for spatial structure. Inter alia, a good model will ensure regression residuals are free of spatial correlation

- In multilevel modelling, effects of individual risk factors may vary according to area contexts
- For example, ethnic relativities in diabetes prevalence may not be constant over areas
- So use random effects (maybe spatially structured) to model spatial variation in impacts of individual level risk factors

- Bayesian approach using MCMC sampling assists in monitoring “derived parameters” or outputs, providing full densities, and in testing hypotheses about derived parameters.
- Classical estimation typically provides confidence intervals under assumed asymptotic normality for model parameters only, with the delta method for derived parameters
- Bayesian approach arguably more flexible for models with multiple or nested random effects, or where there is partially missing data

- An example of a derived model output that is not the model response. The model response (Poisson) is deaths by area & age. The derived model output is life expectancy.
[Figure: Eastern Region of England, male life expectancy.]
Ref: Congdon, 2009, International Statistical Review

- Many relevant risk or outcome variables for analysing population health can be regarded as latent constructs, not directly observed but proxied by several observed variables
- Examples in area studies: area unemployment or rates of social housing are proxies for the area construct “deprivation”
- Examples in survey studies: a battery of survey items on neighbourhood perceptions and trust are proxies for the individual level construct “social capital”

ASSESSING VARYING HEALTH RISKS IN AREAS

- Observed data (e.g. death totals by area i) are y[i], and E[i] are expected event totals. There are limitations to conventional (fixed effects) maximum likelihood estimates of relative risks (or “standardised mortality ratios”) y[i]/E[i] as a description of spatial contrasts.
- OR data might be y[i] and populations P[i]; the MLE (e.g. crude death rate) is y[i]/P[i] (or such rates feed into an “age standardised” rate)
- OR data: y[i] (infant deaths) and births B[i]. The MLE is y[i]/B[i]

Maximum likelihood approach (underlies conventional demographic techniques) treats each area (or risk category) as a separate isolated entity, taking no account of:
- the overall average for the event,
- the location of the area or risk category in relation to other areas (or risk categories)
- By neglecting the broader context, ML estimates are also potentially unstable

- Under Bayesian random effects, information on the pattern of disease risk across the collectivity of areas (or risk categories) is used to provide an estimate of the underlying relative risk for each area (or risk category)
- Treat each area’s outcome with reference to the ensemble of areas
- The “prior” specifies the chosen overarching density of relative risk (e.g. normal or gamma) and whether the density specifies local or global pooling of strength.

- However, it may be unwise to uncritically assume complete spatial dependence, or homogeneous spatial correlation.
- So allow for some unstructured variation or for spatial outliers
- Spatial outliers: areas unlike their neighbours, e.g. socially dissimilar (example: suburban “social housing” estates surrounded by owner occupied housing areas)
- Allow extent of spatial dependence to vary across the map
- Congdon, 2008, Statistical Methodology

- Use of spatial risk modelling for policy inferences
- One may assess, for example, the posterior probability that a particular area has an elevated relative risk (compared to the average)
- Assume RR=1 on average. Then simply count the proportion of MCMC iterations where the condition RR[i]>1 holds
- More complicated to do this under frequentist approaches
- e.g. Congdon, Health and Place, 1997, article on area contrasts in suicide and attempted suicide in NE London

EXTENDING THE SMOOTHING PRINCIPLE: Spatial Heterogeneity In Regression Effects

- Spatial pooling of strength may be applied not only to disease risks but to effects of area risk factors.
- Example: how are lung cancer incidence relativities in area i affected by area smoking rates $x_i$?
- Conventionally assume constant slope $\beta$ on $x_i$ over all areas
- However, the risk relationship may vary (smoothly) over space, giving varying slopes $\beta_i$, e.g.
Congdon, Health and Place, 1997, article on area contrasts in suicide and attempted suicide in part of NE London (x=deprivation)

EXTENDING THE SMOOTHING PRINCIPLE: Smoothing over areas and ages to derive small area life tables

- Modelling mortality data $y_{ix}$ (and maybe illness data $h_{ix}$ too) by both area i and age group x
- As before, neighbouring areas have similar rates under a prior incorporating spatial dependency
- But also assume neighbouring ages have similar rates under a pooling (random effects) prior
- Technically, often use “state space” or “random walk” priors for age effects

- Congdon 2006, Demographic Research, A model for geographical variation in health and total life expectancy
- Spatial framework: 33 London Boroughs, ca 230k population on average
- Use illness data (long term ill status from 2001 UK Census) as well as deaths data (bivariate outcome)
- With mortality and illness data, can model both total life expectancy and healthy life expectancy; the difference between the expectancies is expected years lived in disability (“disease burden”)
- Correlation between disease burden & area deprivation

- Calculate life expectancies $E_{ix}$ for areas i and ages x using usual life table calculations and “smoothed” age and area specific mortality rates $M_{ix}$
- Life expectancy at birth is $E_{i0}$. Monitor “derived outcomes” $E_{i0}$ in MCMC, whereas the likelihood for deaths uses $M_{ix}$ (“actual model parameters”)
- Problems with conventional calculations for life expectancies when populations are small and rates $M_{ix}$ unstable, so apply Bayesian random effects smoothing

- Congdon (2007) A model for spatial variations in life expectancy; mortality in Chinese regions.
Int J Health Geographics
- Negative binomial model because of large death counts/overdispersion, but allowing for correlated area and age effects

[Figure: Female life expectancy, China 2000]

- Similar ideas apply if the second dimension is time rather than age
- Correlation between adjacent times is expected and should be included in the model
- For example, could have “random walk” in time parameters

- Congdon, 2004, J Appl Stat, “Modelling Trends and Inequality in Small Area Mortality”, has three dimensions: area, age, time (years) in an analysis of area mortality through time
- “Derived outputs” monitored by MCMC are Theil and Gini indices of inequality in life expectancies $E_{it}$ between areas i=(1,..,n) at year t.
- If $R_{it} = E_{it}/E_t$, where $E_t$ is the average, then the Theil entropy index in year t is $H_t = \sum_i [R_{it} \log(R_{it})]/n$

- Mortality, prevalence and incidence variations between areas reflect primarily population health need (e.g. age/ethnicity/SES composition), maybe together with area contextual effects
- But health care use in different areas (e.g. hospitalisation rates) is affected not only by population need but by health supply factors & efficacy of different health sectors
- Same applies to flows f[i,j] from areas i to care providers j, e.g. of acute (hospital) care

For example, emergency hospital use in different areas i is affected (inter alia) by:
- area deprivation, age structure, etc.
- efficacy of primary care in handling chronic disease (and preventing “ambulatory sensitive” emergency hospital admissions)
- access to primary care (e.g. primary care physicians per head, adequate “out of hours” cover)
- referral thresholds
- hospital capacity
- proximity of area i to hospitals j, allowing for competing/intervening populations in other areas

Gravity models for flows f[i,j] from area i to providers j take account of:
- Population sizes in areas i (maybe weighted for need) and capacity of hospital provider j (e.g.
bed mass, staff)
- Efficacy of, and access to, primary care
- Distance decay (home to hospital distances)
- Relative accessibility R[i,j]: capacity B[j] of providers j adjusted for distances d[i,j] relative to other providers
- Can model situation of new sites, or hospital closures (via Bayesian “predictions”) using implied changes in R[i,j]

Congdon 2001, Health Care Management Science

Multilevel models

Usual paradigm: want to assess effects on health of:
- Compositional variables (individual level risk factors)
- Contextual influences (area variations)
- Interplay between composition and context
- Example is additive effect: poor people are more likely to smoke, but poor people living in deprived areas are more likely to smoke than poor people living in less deprived areas.

- For example, suppose $y_{ij}$ is binary health status (e.g. whether long term ill) for individuals i in areas j, and $x_{ij}$ is an individual level measure of socioeconomic status (e.g. years of education).
- Then $\pi_{ij} = \Pr(y_{ij}=1)$
- Logit regression to predict $\pi_{ij}$: $\pi_{ij} = 1/(1+\exp(-\eta_{ij}))$
- Intercept variation only: $\eta_{ij} = \alpha_j + \beta x_{ij}$
- Intercept and slope variation: $\eta_{ij} = \alpha_j + \beta_j x_{ij}$

- Are there “place effects”: does inter-area health variation (intercept variation) remain when individual risk variables are added into the model?
- Conversely, are effects of individual variables diminished when area effects are added?
- Explaining “contextual variation” or “place effects” in substantive terms. Which aspects of areas cause contextual effects (e.g. “healthy food access”, deprivation amplification, residential segregation)?
- How should (known and unknown sources of) contextual variation be modelled (e.g. spatial prior or not)?
- Do impacts of individual risk factors vary by area (interaction between levels)? This includes slope variation, as well as more complex forms for categorical predictors (e.g.
multivariate conditional autoregressive prior)

- Neighbourhood access to healthy environments, positive health choices
- Access to healthy food outlets (e.g. work by Neil Wrigley on “Food Deserts” in British cities). Many GIS studies on access to healthy food
- Access to physical activity facilities is also an influence on child & adolescent obesity

- Usual paradigm for multilevel model is individual level observations nested according to a higher level index (e.g. area, school); scheme A
- However, to make analysis feasible when there are many individuals, might turn all individual risk factors into category form and group observations according to risk category; scheme B.
- Also, sometimes lower level units may be small areas (“neighbourhoods”), and higher level units might be larger policy areas; scheme C

- Binary birth events y at level 1 (e.g. stillbirth) as well as maternal characteristics x (mother’s age, whether lone mother, etc).
- Individual events nested within J=25 districts, in turn nested within K=7 health authorities (HAs)
- Form risk groups by cross-classifying (a) maternal age (<20, 20-34, >35), (b) parity (null, 1-2, 3+), (c) previous still-birth (y/n), (d) lone mother (y/n).

- Maximum number of combinations based on these factors is 900 = (3 x 3 x 2 x 2 x 25), of which 549 are non-empty.
- Can look at contextual effects for both districts and HAs in n=549 “collapsed” data points.
- In particular (policy implications): do HA rankings (monitored by MCMC) change before and after controlling for risk factors at both individual and district level?

- Reduction of intra-district health gradients (over small areas) often forms the focus of public health targets. But this raises methodological issues…
- Study of long term illness (LTI) rates in i=1,..,1332 wards (small areas) within j=1,..,53 districts (London & Eastern England).
- Slope variations: within district slopes $\beta_j$ relating LTI to small area deprivation (binomial data)
- Negative intercept-slope covariation, cov($\alpha_j$, $\beta_j$): stronger deprivation effects occur in districts with lower LTI rates. Slopes in low LTI districts are enhanced by very low illness rates in some wards.

- Relates to broader themes of potential relative deprivation impacts on health: districts with lower average illness may be more internally heterogeneous in terms of small area SES
- Not only absolute deprivation or income matters for health, but income or deprivation relative to the average of a reference group.

GEOGRAPHIC PREVALENCE ESTIMATION (based usually on multilevel modelling)

Rising US diabetes levels (Mokdad et al, 2001)
- Information regarding small area prevalence is important for effective targeting of diabetes prevention and resources
- Prevalence estimates should incorporate:
  - spatial clustering
  - ethnic relativities
  - interactions across levels, e.g. between demography & area

Spatial variation & Clustering
[Figure: Crude diabetes prevalence rate among adults (source: Behavioural Risk Factor Surveillance System)]

- In 2007 the age standardised rate of diagnosed diabetes was highest among Native Americans and Alaska Natives (16.5%), followed by blacks (11.8%) and Hispanics (10.4%), with whites at 6.6% (CDC, 2008).
- Age gradient for diabetes prevalence varies by race
- Barnett et al (2001) & Casper et al (2000) also report that ethnic disparities vary by area of residence (their work is on CHD mortality)

- Survey regression model based on BRFSS survey data. Binary multilevel model for doctor diagnosed diabetes. State of residence for survey subjects is at level 2
- Seek diabetes prevalence estimates for 30,000 Zip Code Tabulation Areas (ZCTAs) in US (http://www.census.gov/geo/ZCTA/zcta.html)
- ZCTA is not provided as a survey variable.
- Instead use survey regression parameters in conjunction with ZCTA census data (and the ZCTA’s geographic location) to derive ZCTA prevalence estimates

- BRFSS survey model includes age, race, sex, and education effects together with contextual modifiers
- Inclusion of individual risk factors (e.g. age, race) in the survey regression model presumes that such factors are also available in census tabulations for ZCTA populations.
- Any interaction between risk factors in the regression model (e.g. age gradients differing by race) similarly requires a matching census cross-tabulation
- However, the survey model can include geographic context variables (e.g. state or county level random and fixed effects). These are applied according to ZCTA geographic location

- Differential risks for race (whites, black, hispanic, other races) and for education (four categories) (fixed effects)
- Fixed regression effects of state level predictors (poverty rate & percent rural)
- Random differential risks specific to age-race group (multivariate CAR, dimension 4 by 12)
- Race specific spatially correlated effects for continental states (use multivariate CAR), {s[j,r], j=1,..,49, r=1,..,4}
- Race specific spatially unstructured effects for all states (multivariate normal), {u[j,r], j=1,..,53, r=1,..,4} (includes Hawaii, PR, Alaska, VI)

- Census provides education and poverty breakdown for ZCTAs
- However, poverty status is not in BRFSS
- So the education gradient in prevalence from the BRFSS survey model is used to adjust ZCTA prevalence estimates for small area education mix (“compositional” adjustment)
- BRFSS model does show a clear education gradient in prevalence (steeper for females)
- Adjustment ensures ZCTAs with more college graduates have lower prevalence

- Estimates for a particular ZCTA include relevant random state effects, s[j,r] and u[j,r] (applied in prevalence estimates according to the state j the particular ZCTA is located in).
- ZCTA estimates also adjust for the relevant state poverty/urbanity effect
- Can apply the same principle to counties (i.e. 3141 US counties at level 2) but then can’t easily model random area by race interactions. Also survey data are sparse for many counties.

- Use European standard population weights or US 2000 Census weights to combine over age/race categories
- Provides age standardised prevalence estimate for all persons
- Apply same survey model to different years

- Congdon (Int J Env Res Pub Hea 2010) uses random county effect in prevalence model for joint conditions
- Joint response over six diabetes-weight categories: diabetes (y/n) & obesity/overweight/normal weight
- Using BRFSS again
- In fact the county random effect is a form of common spatial factor over different categories of the joint response

Application Area 7

- Most commonly applied when there are many indicators (e.g. different types of cancer incidence) and just one common factor
- But will give an example of multiple underlying dimensions
- In spatial health applications, the common factor is usually spatially structured (value for a particular area depends on those of its neighbours)
- Can have “multiple causes” of spatial common factors as well as “multiple indicators”

- Pool over diverse outcomes (these are indicators of the underlying common factor) as a way of pooling strength to assess underlying morbidity levels
- Some indicators may be infrequent, some may be subject to partially missing values (e.g. lung cancer county incidence data not comprehensive across all US states)
- So the common factor pools information on health relativities over different observed indicators (and imputes/predicts values where they are missing)

- Can use common factor models to develop univariate indices of “health need” (need for healthcare based on demographic/social composition of population & also maybe taking account of relativities in healthcare usage) e.g.
Congdon, J Geo Syst, 2008
- Conventional methods for needs indices (e.g. Mental Illness Needs Index) are aspatial and usually based on regressing health activity on a bundle of socioeconomic indices. The need index is then derived from significant coefficients.

- Develop multivariate factor models of different aspects of area social structure with relevance to health outcomes
- Example: area deprivation, area social fragmentation, and rurality are all relevant to suicide contrasts
- Explanatory ecological model for US suicide contrasts (male & female suicide 2002-2006)

- Education application. Multiple tests are indicators of underlying ability
- But investigator might also want to know what causes variations in ability.
- Potential causes might include gender, parent SES, type of school, etc

- Variations in avoidable (“ambulatory sensitive”) hospital admissions. Usually unplanned emergencies.
- Such admissions are typically for chronic conditions, and in many cases could be avoided with suitable primary/community care

Possible “multiple indicators” of underlying “quality of care” factor for GP practices:
- Emergency admission rate by practice
- Ambulatory sensitive admission rate
- Attendance rate (unplanned attendances at “emergency room” or “Accident and Emergency Unit” that usually don’t result in hospital admission)
Possible “multiple causes”:
- GP practice deprivation
- Average proximity of patients (affiliated to each GP practice) to hospital

- Similarly Congdon 2010 (J Stat Comp Sim) uses mortality and hospitalisation data (as “indicators”) for CHD need index
- Spatially structured need index for 625 London wards (small areas)
- But ward income and unemployment act as “causes”
- Adjust spatial prior (Besag et al CAR or Leroux et al 1999 model) to allow for predictors

- US prevalence model application based on BRFSS with bivariate outcome
- “Multiple indicators” are six categories defined by diabetic status and weight band:
- diabetic and obese (y=1)
- diabetic and overweight (y=2)
- diabetic and normal weight (y=3)
- non-diabetic and obese (y=4)
- non-diabetic and overweight (y=5)
- non-diabetic and normal weight (y=6)
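
To make the six-category joint response concrete, the sketch below codes the categories as y=1..6 in the order listed above, then shows one way a shared area factor could shift the category probabilities. The multinomial logit form, the `intercepts`, the `loadings`, and the function names are all illustrative assumptions, not the specification used in the cited paper; the slides say only that the county random effect acts as a common factor across categories.

```python
import numpy as np

WEIGHT_BANDS = ("obese", "overweight", "normal")

def joint_category(diabetic, weight_band):
    """Code the joint response as y = 1..6, in the order listed in the slide."""
    offset = 0 if diabetic else 3
    return offset + WEIGHT_BANDS.index(weight_band) + 1

def category_probs(intercepts, loadings, factor):
    """Multinomial logit probabilities with a shared area factor (assumed form):
    eta_k = intercept_k + loading_k * factor."""
    eta = np.asarray(intercepts, dtype=float) + np.asarray(loadings, dtype=float) * factor
    eta = eta - eta.max()          # subtract the maximum for numerical stability
    p = np.exp(eta)
    return p / p.sum()

# Hypothetical values; category 6 (non-diabetic, normal weight) is the reference.
intercepts = np.array([-3.0, -3.0, -3.5, -0.5, 0.0, 0.0])
loadings = np.array([1.0, 0.6, 0.2, 0.8, 0.3, 0.0])

p_low = category_probs(intercepts, loadings, factor=-1.0)   # low-morbidity area
p_high = category_probs(intercepts, loadings, factor=1.0)   # high-morbidity area
print(joint_category(True, "obese"), joint_category(False, "normal"))
print(np.round(p_low, 3), np.round(p_high, 3))
```

With positive loadings on the diabetic and obese categories, a higher factor score moves probability mass towards those categories, which is the sense in which a single area effect can summarise relativities across all six outcomes.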
