+ Strengthening Causal Inference in HIV Studies: Introduction and Practical Examples
CAPS Methods Core Presentation, April 18, 2012
Starley Shade, Sheri Lippman, Mi-Suk Kang Dufour & Carol Camlin

+ Outline
Answering causal questions: common roadblocks in HIV research
Causal Inference Framework and Overview of methods
Concrete example: Using treatment and censoring weighting in Prevention with Positives
Concrete example: G-comp for population level attributable risk in the SHAZ study
Q&A

+ Roadblocks in HIV research: selection bias / who gets exposed
Population surveillance and surveys in probability-based samples: study participants (in testing, in survey research, etc.) almost always systematically differ from nonparticipants
Observational studies using ‘comparison’ clinics, communities: systematic differences in study arms exist and/or may accrue over time

+ Common roadblocks in HIV research: Loss To Follow-up
Cohort studies of HIV+ individuals: highly susceptible to loss to follow-up
  >20% after 2 years, in resource-poor settings: medical records don’t capture patient mobility
  Death registries rarely available & those who die mistakenly assumed to be lost to follow-up
Those who drop out are systematically different from those who stay engaged in care

+ Roadblocks in HIV research: time dependent confounding
[Diagram: confounders C(&U)0 to C(&U)3, exposures Expos0 to Expos3 and outcomes STI0 to STI3 over time]
C = group of confounders
U = unmeasured confounders
Time dependent confounding: if C is related to prior exposure & affects subsequent exposure

+ Common roadblocks in HIV research: Complex, multicomponent intervention studies
Increasing calls for comprehensive HIV prevention interventions addressing multiple levels and domains of influence on individual behavior
Evaluation of such studies hampered by:
  Diverse levels of exposure to individual intervention components
  Difficult to distinguish relative contributions of individual intervention components to observed outcomes

+ Mending
our comparison: the causal/counterfactual framework
“We may define a cause to be an object followed by another… where, if the first object had not been, the second never had existed” (Hume 1748)
An association can be considered causal when, if the exposure had been altered, the outcome would have been different
Key part is the counterfactual element: reference to what would have happened if, contrary to fact, the exposure had been something other than what it actually was

+ Counterfactual framework
“Ideal experiment” illustrates the framework
Ideal experiment: a hypothetical study which, if we could actually conduct it, would allow us to infer causality
  Person or population experiences one exposure and is observed for outcome over a given time period
  Roll back the clock
  Change the exposure but leave everything else the same, observe for outcome over the same time period

+ Counterfactual framework
[Figure: timeline (axis: Time) labeled OBSERVED, ending in AIDS]
Counterfactual question: how long would Person A have survived if he/she had not received treatment?

+ Counterfactual framework
[Figure: two timelines, OBSERVED and UNOBSERVED, each ending in AIDS]

+ Counterfactuals: specifying what we really want to know
Thinking about the counterfactual outcome(s) as something we are missing and something we are trying to estimate when we analyze HIV studies or any epidemiologic data is instructive
Akin to a missing data problem
When we compare groups of people observed as exposed or unexposed we want to compare groups that best estimate the counterfactual outcomes that are unobserved or missing

+ Notation for presentation
[Diagram: causal graph with nodes W, L, A and Y]
A = treatment
Y = outcome
W = confounders (point treatment)
L = confounders (longitudinal)
The Likelihood of Data simplifies to:
$L(O) = P(Y \mid A, W, L)\,P(A \mid W, L)$

+ Rationale for causal inference approach
Basic regression models produce stratum specific, or conditional, estimates (i.e., “while holding constant a set of covariates”)
$E[Y \mid A, \bar{L}] = b_0 + b_1 A + b_3 \bar{L}(j) + \ldots$
Where Y is outcome, A is observed exposure and L is matrix of time-dependent covariates
Therefore, our estimates of effect are also conditional:
$E[Y \mid A = 1, \bar{L}] - E[Y \mid A = 0, \bar{L}] = b_1$

+ Rationale for causal inference approach
Causal inference approaches help us model our way back to the ideal (counterfactual) experiment
$E[Y(a = 1) - Y(a = 0)]$
Where Y is outcome and a is counterfactual where all individuals are exposed (a=1) or unexposed (a=0)

+ Inverse Probability Weighting

+ Inverse Probability of Treatment Weighting (IPTW)
Re-create the counterfactual data set by weighting
IPTW assigns a weight for each subject equivalent to the inverse probability of being in their exposure group at each interval.
$w_t = 1 / P[A(j) = 1 \mid \bar{A}(j-1), \bar{L}(j)]$
The treatment model is based on values of past and current covariates (L(j)) and past exposures (A(j-1)).
$E[A(j) \mid \bar{A}(j-1), \bar{L}(j)] = a_0 + a_2 L(j) + a_3 A(j-1) + a_4 L(j-1) + \ldots$

+ Inverse Probability of Treatment Weighting (IPTW)
The treatment weights are applied to the observed population (e.g. weighted logistic regression)
$w_t \times [E(Y \mid A)] = b_0 + b_1 A$
Creates a new pseudo-population in which the distribution of confounders is balanced between the two exposure groups, essentially mimicking a randomized trial.
$E[Y(a = 1) - Y(a = 0)] = b_1$

+ Inverse Probability of Censoring Weighting (IPCW)
Like IPTW, IPCW assigns a weight equivalent to the inverse probability of remaining in the study at each interval, based on values of observed covariates and past outcomes and exposures.
$w_c = 1 / P[C = 1 \mid \bar{A}(j), \bar{L}(j)]$
The censoring weights are applied to the observed population, creating a new pseudo-population in which censored subjects are “replaced” by upweighting uncensored subjects with the same values of past exposures and covariates.
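
+ Sketch: treatment and censoring weights in code
The weighting idea on the preceding slides can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not from the presentation: the simulated data, the function names (`simulate`, `fit_weights`, `risk_difference`), a single time point, one binary confounder, and the use of simple stratum proportions in place of fitted treatment and censoring models. C = 1 is taken to mean the subject remained in the study.

```python
import random


def simulate(n=40000, seed=1):
    """Hypothetical point-treatment data: confounder L, exposure A, retention C, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)                          # binary confounder
        a = int(rng.random() < (0.8 if l else 0.2))          # L makes exposure more likely
        c = int(rng.random() < 0.9 - 0.3 * l - 0.2 * a)      # retention depends on A and L
        y = int(rng.random() < 0.20 + 0.30 * l - 0.10 * a)   # simulated true risk difference: -0.10
        rows.append((l, a, c, y))
    return rows


def proportion(values):
    return sum(values) / len(values)


def fit_weights(rows):
    """Return {(l, a): wt * wc}, using stratum proportions as the two models."""
    weights = {}
    for l in (0, 1):
        p_a1 = proportion([a for (li, a, _, _) in rows if li == l])  # P(A=1 | L=l)
        for a in (0, 1):
            p_own_arm = p_a1 if a == 1 else 1.0 - p_a1               # P(A=a | L=l)
            p_retained = proportion(
                [c for (li, ai, c, _) in rows if li == l and ai == a]
            )                                                        # P(C=1 | A=a, L=l)
            weights[(l, a)] = (1.0 / p_own_arm) * (1.0 / p_retained)
    return weights


def risk_difference(rows, weights=None):
    """Risk difference (A=1 minus A=0) among retained subjects, optionally weighted."""
    num = [0.0, 0.0]
    den = [0.0, 0.0]
    for l, a, c, y in rows:
        if c == 0:
            continue  # outcome is unobserved for censored subjects
        w = 1.0 if weights is None else weights[(l, a)]
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]


rows = simulate()
print(f"crude complete-case RD:  {risk_difference(rows):+.3f}")
print(f"IPTW x IPCW weighted RD: {risk_difference(rows, fit_weights(rows)):+.3f}")
```

In this simulated setting the crude comparison among retained subjects has the wrong sign, while the weighted comparison should land near the simulated value of -0.10. In a real analysis the stratum proportions would typically be replaced by regression models for treatment and censoring.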
+ Example: Prevention with Positives Demonstration Projects
Fifteen HRSA-funded demonstration projects implemented prevention with positives in clinical settings
Each site decided whether to randomize patients to:
  Provider-delivered intervention vs. Assessment
  Specialist-delivered intervention vs. Assessment
  Mixed intervention vs. Provider intervention
How do we assess the effectiveness of each intervention type?

+ Example: Prevention with Positives Patient characteristics
Patient characteristics     Standard of care   Provider    Specialist   Mixed       p<
Male                        781 (74)           490 (64)    705 (72)     530 (72)    .001
White                       410 (39)           282 (37)    332 (25)     298 (22)    .001
Heterosexual                453 (43)           371 (48)    478 (49)     297 (39)    .001
Age 40 or more              720 (68)           423 (55)    704 (72)     431 (57)    .001
Education (Less than HS)    540 (51)           377 (49)    524 (54)     371 (49)    ns
Employed                    411 (39)           355 (46)    324 (33)     279 (37)    .001
CD4 < 200                   152 (14)           109 (14)    154 (16)     120 (16)    ns
VL < 75                     381 (36)           216 (28)    418 (43)     219 (29)    .001

+ Example: Prevention with Positives Retention
At the 12-month follow-up assessment, 58% of patients were retained in the standard of care group; 76% were retained in the provider intervention sites; 62% were retained in the specialist sites; and 44% in the mixed intervention sites.
There were differences in retention by patient characteristics. Older, white, gay males with more than a high school education but who did not use cocaine or injection drugs were more likely to be retained in the study at 12 months.

+ Example: Prevention with Positives Risk Behavior
[Figure: risk behavior (0% to 30%) at Baseline, 6 months and 12 months for the Provider-led, Specialist-led, Mixed and Assessment groups]

+ Example: Prevention with Positives Analysis
Inverse probability of treatment weights
$E[A \mid \bar{L}] = a_0 + a_1(\text{male}) + a_2(\text{white}) + a_3(\text{gay}) + \ldots$
$w_t = 1 / P(A \mid \bar{L})$

+ Example: Prevention with Positives Analysis
Inverse probability of censoring weights
$E[C(j) = 1 \mid \bar{A}, \bar{L}] = c_0 + c_1(\text{provider}) + c_2(\text{specialist}) + \ldots + c_k(\text{male}) + c_{k+1}(\text{white}) + c_{k+2}(\text{gay}) + \ldots$
$w_c = 1 / P[C(j) \mid \bar{A}, \bar{L}] \times 1 / P[C(j-1) \mid \bar{A}, \bar{L}] \times \ldots$
Weighted logistic regression
$w_t \times w_c \times \operatorname{logit}[E(Y \mid A)] = b_0 + b_1(\text{provider}) + b_2(\text{specialist}) + b_3(\text{mixed})$

+ Example: Prevention with Positives Results
Intervention type       6 months OR (95% CI)   12 months OR (95% CI)
Provider-delivered      0.93 (0.60, 1.20)      0.55 (0.32, 0.94)
Specialist-delivered    0.58 (0.35, 0.96)      0.67 (0.39, 1.14)
Mixed                   0.89 (0.53, 1.51)      0.89 (0.53, 1.51)
Assessment only         Reference              Reference

+ G-computation and Population intervention Models

+ G-computation
Sometimes called substitution estimation approach
G-computation approach is to model the exposure and outcome relationship and then “control” exposure in the population by substituting counterfactual exposures in your model
Population intervention models use this approach to answer practical questions

+ Population Intervention Models
Standard regression models give conditional estimate:
$E(Y \mid A = 1, W = w) - E(Y \mid A = 0, W = w)$
Marginal structural models allow total effect estimate:
$E_W(Y_1) - E_W(Y_0)$
For interventions what we care about is the population difference when intervention is present or absent:
$E_W(Y_a) - E_W(Y)$

+ Analogous to Attributable Risk
Traditional population Attributable Risk or Attributable Fraction: the proportion of the disease risk in the total population associated with the exposure
$\frac{\text{Incidence}_{\text{exposed}} - \text{Incidence}_{\text{unexposed}}}{\text{Incidence}_{\text{exposed}}} \times \text{proportion exposed} \times 100$
This assumes the exposure causes the outcome and that there are no other causes, i.e. in absence of that exposure there would be no outcome

+ Why PIMS?
Rarely looking at outcomes with only one important predictor/confounder
  PIMS allow assessment of effect averaged across covariates
Rarely able to completely eliminate a risk factor from population
  PIMS allow estimation for realistic interventions

+ Population Intervention Models: estimation
1) Estimate outcome model
2) Create new dataset setting covariate(s) of interest to intervention levels
3) Predict outcome of interest using model estimated in step 1
4) Calculate the difference between predicted mean outcome and observed mean outcome

+ Example: SHAZ! study
SHAZ! (Shaping the Health of Adolescents in Zimbabwe)
Enrolled adolescent orphan girls ages 16 to 19
Overall project was designed as an HIV prevention intervention based on provision of reproductive health services, economic livelihoods training and life-skills education

+ Example: SHAZ! study
Using baseline data to look at a secondary outcome
Interested in the potential of interventions to improve mental health for adolescent orphan girls
Several structural factors considered as potentially modifiable with intervention
[Diagram: factors grouped by domain, leading to Baseline Mental Health status (SSQ)]
  Orphaning: Age at orphaning
  Socioeconomic status: Food security; Ability to pay for medication; Ever homeless; Changes in household; Completed education
  Social environment: Female caregiver relationship; Social support; Exposure to violence; Feeling safe at home; Caring for ill person
  Psychological distress (Unmeasured)
  Poor physical health: General health status; Viral infection
  Baseline Self efficacy

+ PIMS Question:
What is the potential impact of intervening on these factors on this population’s mental health status?
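
+ Sketch: the four estimation steps in code
The four steps on the "Population Intervention Models: estimation" slide can be sketched as follows. This is an illustration under stated assumptions, not the analysis used in SHAZ!: the data are simulated, the names (`simulate`, `fit_outcome_model`, `population_intervention_effect`, `a_star`) are hypothetical, there is one binary risk factor A and one binary covariate W, and the outcome model is a saturated table of stratum means rather than a fitted regression.

```python
import random
from collections import defaultdict


def simulate(n=20000, seed=7):
    """Hypothetical data: covariate W, risk factor A, binary outcome Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.4)
        a = int(rng.random() < (0.6 if w else 0.3))          # W makes the risk factor more common
        y = int(rng.random() < 0.10 + 0.20 * a + 0.15 * w)   # A raises risk by 0.20
        data.append({"w": w, "a": a, "y": y})
    return data


def fit_outcome_model(data):
    """Step 1: estimate E[Y | A, W]; here one mean per (A, W) stratum."""
    totals = defaultdict(lambda: [0, 0])  # (a, w) -> [sum of y, count]
    for row in data:
        cell = totals[(row["a"], row["w"])]
        cell[0] += row["y"]
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def population_intervention_effect(data, a_star=0):
    """Predicted mean outcome under A = a_star minus observed mean outcome."""
    model = fit_outcome_model(data)                                # step 1
    intervened = [dict(row, a=a_star) for row in data]             # step 2
    predicted = [model[(r["a"], r["w"])] for r in intervened]      # step 3
    observed_mean = sum(r["y"] for r in data) / len(data)
    return sum(predicted) / len(predicted) - observed_mean         # step 4


effect = population_intervention_effect(simulate())
print(f"population intervention parameter: {effect:+.3f}")
```

In this simulation the risk factor has prevalence 0.42 and raises risk by 0.20, so removing it should change the population mean outcome by roughly -0.084. The estimate depends on the prevalence of the factor as well as its effect.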
+ Prevalence in Population and Hypothesized intervention level
Domain/variable                          N     %       Hypothesized intervention level
Social environment
  Physical violence                      18    4.7%    no experience of physical violence
  Sexual violence                        29    7.6%    no experience of sexual violence
  Forced sex                             28    7.3%    no experience of forced sex
  Unsafe home environment                241   62.9%   home environment considered very safe
  Household experience of violence       34    8.9%    no one in the house experiencing violence
  Caring for ill                         115   30.0%   not caring for someone ill in the household
  Low social support                     231   60.3%   "enough" people you can count on
  Absence of supportive female caregiver 116   30.3%   presence of a female caregiver who is "often" or "always" supportive
Socioeconomic status
  Food security                          132   34.5%   never going to bed hungry or not eating because there is no food
  Unable to buy medicine                 235   61.4%   able to buy needed medicine within 2 days
  Changes in household location          197   51.4%   no changes in household location within the past 5 years
  Ever homeless                          86    22.5%   never homeless
  Less than form 4 education             99    25.8%   at least form 4 (secondary) education
  Low baseline self efficacy             335   87.5%   average response of "agree/strongly agree" with positive statements, "disagree/strongly disagree" with negative statements
Poor physical health
  Less than excellent health             278   72.6%   excellent self reported health
  Viral infection HIV/HSV-2              42    11.0%   no viral infection with HIV or HSV-2

+ Traditional regression results
Conditional Effects parameter (standard regression), dichotomized
Domain/variable                          OR
Social environment
  Physical violence                      3.67
  Sexual violence                        0.61
  Forced sex                             2.99
  Unsafe home environment                1.50
  Household experience of violence       1.85
  Caring for ill                         5.19
  Low social support                     1.64
  Absence of supportive female caregiver 2.57
Socioeconomic status
  Food security                          0.88
  Unable to buy medicine                 1.30
  Changes in household location          1.11
  Ever homeless                          2.40
  Less than form 4 education             1.38
  Low baseline self efficacy             4.84
Poor physical health
  Less than excellent health             2.67
  Viral infection HIV/HSV-2              2.57

+ Potential
Impact of Interventions
Domain/variable                          N     %       Population Intervention parameter
Social environment
  Physical violence                      18    4.7%    -1.1%
  Sexual violence                        29    7.6%    0.0%
  Forced sex                             28    7.3%    -0.7%
  Unsafe home environment                241   62.9%   -3.5%
  Household experience of violence       34    8.9%    -1.1%
  Caring for ill                         115   30.0%   -5.8%
  Low social support                     231   60.3%   -4.4%
  Absence of supportive female caregiver 116   30.3%   -3.9%
Socioeconomic status
  Food security                          132   34.5%   0.4%
  Unable to buy medicine                 235   61.4%   -2.7%
  Changes in household location          197   51.4%   -0.9%
  Ever homeless                          86    22.5%   -2.8%
  Less than form 4 education             99    25.8%   -0.5%
  Low baseline self efficacy             335   87.5%   -9.2%
Poor physical health
  Less than excellent health             278   72.6%   -7.4%
  Viral infection HIV/HSV-2              42    11.0%   -1.3%
(N and % give the prevalence in the population.)

+ Extension of this approach to longitudinal context:
[Diagram along a Time axis]
  Baseline: Baseline covariates; Baseline Mental Health; Intervention Participation: Life-skills, Red Cross
  6 months: 6 month covariates; Mental Health at 6 months; Intervention Participation: Start vocational training
  12 months: 12 month covariates; Mental Health at 12 months; Intervention Participation: finish vocational training
  18 months: 18 month covariates; Mental Health at 18 months; Intervention Participation: Receive grant
  24 months: Mental Health at 24 months

+ Question:
Does poor mental health status affect participation in the intervention over time?

+ Analytic approach
Interested in effect of exposure (A) on outcome (Y) given covariates and past exposure and outcome
$E_W[E_0(Y \mid A = 1, W) - E_0(Y \mid A = 0, W)]$
Where W includes past exposure and outcome and other covariates

+ Analytic approach cont.
Fit a series of point treatment models for outcomes at timepoints following exposure(s) of interest

+ Example 1:
[Diagram nodes with their roles]
  Baseline covariates (W)
  Baseline Mental Health (A)
  Intervention Participation: Life-skills (Y), Red Cross (Y)
  No role label: 6 month covariates; Intervention Participation: Start vocational training; Mental Health at 6 months

+ Example 2:
[Diagram nodes with their roles]
  Baseline covariates (W)
  Baseline Mental Health (W)
  Intervention Participation: Life-skills, Red Cross (W)
  6 month covariates (W)
  Mental Health at 6 months (A)
  Intervention Participation: Start vocational training (Y)

+ Odds of Completion of Intervention Components by Symptomatic Status for Mental Health Distress at Baseline, Conditional on Completing Previous Intervention Components: Estimates from Logistic Regression
Intervention component           Sample Size   OR (95% CI)
Lifeskills                       300           1.1 (0.35, 3.42)
Red Cross                        282           0.57 (0.30, 1.11)
Start vocational training        114           1.30 (0.14, 12.14)
Completed vocational training    114           0.63 (0.26, 1.54)
Received Grant                   78            0.54 (0.05, 6.37)

+ Difference in Intervention Component Completion by Mental Health Distress Symptoms, Conditional on Completing Previous Intervention Components: Average Treatment Effects (ATE) using tmle(D/S/A) estimation
Symptomatic at baseline:
Intervention component           Sample Size   ATE (95% CI)
Lifeskills                       300           0.03 (-0.02, 0.08)
Red Cross                        282           -0.23 (-0.41, -0.05)
Start vocational training        119           -0.01 (-0.16, 0.14)
Completed vocational training    114           -0.18 (-0.43, 0.07)
Rows for Symptomatic at 6 months, Symptomatic at 12 months and Symptomatic at 18 months (sample size and ATE (95% CI), in extracted order):
  118   0.05 (0.02, 0.10)
  113   0.04 (-0.19, 0.26)
  110   -0.01 (-0.28, 0.26)
bold numbers indicate parameters statistically significant at p<0.05

+ Assumptions and Limitations

+ Assumptions
No Unmeasured Confounding
  There is no way to empirically test for no unmeasured confounding; collection of data on a complete set of covariates should be incorporated in
the design phase
Time-ordering (temporality)
  Need to be certain that covariates used in the treatment (Tx) weights were measured prior to treatment, and that treatment is prior to outcome.
Experimental Treatment Assignment (ETA) or positivity
  Groups defined by all possible combinations of covariates must have the potential to be in any (either) treatment group.
  If there are covariate groups that will only be observed in one treatment state, then we cannot estimate the effect of the exposure within that group

+ Acknowledgements
Thanks to:
Alan Hubbard, UCB
Mark van der Laan, UCB
Jennifer Ahern, UCB