Correctly modeling CD4 cell count in Cox regression analysis of HIV-positive patients

Allison Dunning, M.S.
Research Biostatistician
Weill Cornell Medical College

Outline
• Background
• Motivation
• Methods
• Data Management
• Results
• Conclusion

Background
• Results from the primary open-label clinical trial have previously been published in the New England Journal of Medicine.
• The clinical trial showed that starting antiretroviral therapy (ART) earlier (‘Early’) rather than waiting for the onset of symptoms (‘Standard’) significantly decreases mortality in HIV patients.
• Between 2005 and 2008 a total of 816 participants, 408 per group, were enrolled and followed.
• After the clinical trial was stopped, all participants were immediately put on antiretroviral therapy.
• Researchers have continued to follow and collect data on the 816 participants.

Motivation
• As a follow-up, researchers are interested in determining whether ‘Early’ therapy significantly changes time to first Tuberculosis (TFTB) diagnosis, that is, whether it lowers the risk of a first TB diagnosis.
• CD4 cell count has long been considered a measure of overall health in HIV patients.
• Therefore investigators felt it was important to adjust for CD4 cell count in the analysis of TFTB diagnosis.
• The problem arose of how best to adjust for CD4 cell count.
• Typically the CD4 cell count recorded at the beginning of the study, known as the baseline CD4 cell count, is used for analysis.
• Per protocol, CD4 cell counts were collected every 6 months for all participants.
• Investigators felt it was important to account for changing CD4 cell counts, especially after therapy initiation, in the analysis.
• Our analysis was not aimed at predicting survival, only at determining whether drug start time was a predictor of TB diagnosis.
• To allow the survival analysis to account for changing CD4 cell counts, we decided to conduct a Cox Proportional Hazards Regression analysis using a mixture of fixed and time-dependent covariates.

What is a Time-Dependent Covariate?
• Time-dependent covariates are those that may change in value over the study period.
• Most variables in survival analysis are collected at one time point, typically at the start of the study; these include demographic and risk factor variables.
• Sometimes we may collect a lab variable or risk factor that can vary over the study period.

Examples of Time-Dependent Variables
• Lab Values:
  – Blood Pressure
    • Most studies use only the blood pressure collected at the start of the study, sometimes called baseline blood pressure.
    • However, in theory, blood pressure could be collected at multiple times during the study period.
• Risk Factors:
  – Smoking Status
    • Again, this can be collected only at the start of the study (baseline), or it could be tracked over time.
    • Some patients may quit smoking, start smoking, or quit and relapse during the study period.

Fixed Covariates
• ‘Fixed covariates’ is a term for variables that stay constant, that is, do not change, during the study period.
• These are typically things like patient gender, race/ethnicity, and risk factors such as diabetes or hypertension.
• We as researchers must develop a method to analyze time-to-event data while including both fixed and time-dependent covariates.

Methods
• STATA 12.0 was used to fit two Cox regression models analyzing the effect of ART start time on TFTB.
• The first model included only baseline CD4 cell count as a predictor.
• The second model treated CD4 cell count as a time-varying predictor.
• Both models were adjusted for history of TB diagnosis prior to the clinical trial and for baseline BMI.
• Regular Cox Proportional Hazards Model:
  – $\log h_i(t) = \alpha(t) + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$
  – where $\alpha(t) = \log \lambda_0(t)$
• Proportional Hazards Model with a time-varying covariate:
  – $\log h_i(t) = \alpha(t) + \beta_1 x_{i1} + \beta_2 x_{i2}(t)$
  – where $\alpha(t) = \log \lambda_0(t)$

Data Management
• Problems we encountered:
  – Missing CD4 cell count: some patients missed a scheduled lab visit during the study, so the CD4 cell count was missing for one of the six-month intervals.
  – Multiple CD4 cell counts within a six-month interval: for various reasons, several patients visited the lab more than once within a six-month interval, so multiple CD4 cell counts were collected in that time frame.
• What we did – Missing Data:
  – If only one interval was missing, the previous CD4 cell count was used (last observation carried forward).
  – If at least two consecutive intervals were missing, the patient was excluded from the study; 13 patients in total were excluded for this reason.
• What we did – Multiple Observations:
  – The minimum CD4 cell count collected in the six-month interval was the value used in the analysis for that time frame.

. use "C:\Documents and Settings\ald2018\Desktop\STATA Conference 2013\JSM Abstra
. quietly tabulate rxcode, generate(grp)
. rename grp1 Early
. stset weeks_to_tb, failure(incident_tb==1)

     failure event:  incident_tb == 1
obs. time interval:  (0, weeks_to_tb]
 exit on or before:  failure

------------------------------------------------------------------------------
      760  total obs.
        0  exclusions
------------------------------------------------------------------------------
      760  obs. remaining, representing
       94  failures in single record/single failure data
 150407.6  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  330.5714

Results – Regular Cox Regression
. stcox Early baseline_bmi history_tb baseline_cd4

         failure _d:  incident_tb == 1
   analysis time _t:  weeks_to_tb

Iteration 0:   log likelihood = -619.08645
Iteration 1:   log likelihood = -601.71285
Iteration 2:   log likelihood = -600.93058
Iteration 3:   log likelihood = -600.92881
Refining estimates:
Iteration 0:   log likelihood = -600.92881

Cox regression -- Breslow method for ties

No. of subjects =          773                     Number of obs   =       773
No. of failures =           96
Time at risk    =  153457.4286
                                                   LR chi2(4)      =     36.32
Log likelihood  =   -600.92881                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Early |   .4388986    .096901    -3.73   0.000     .2847305    .6765415
baseline_bmi |   .9067024   .0322029    -2.76   0.006     .8457326    .9720676
  history_tb |   2.163769   .5184032     3.22   0.001     1.352935    3.460546
baseline_cd4 |   .9972554   .0025424    -1.08   0.281     .9922849    1.002251
------------------------------------------------------------------------------

Results
• Regular Cox regression analysis showed that ‘Early’ therapy significantly reduces the hazard of a first TB diagnosis (hazard ratio 0.44, 95% CI 0.28 to 0.68), after adjustment for previous TB diagnosis, baseline BMI, and baseline CD4 cell count.

Data Management
• Data were collected with one row per participant.
• In STATA, we used the reshape command to reformat the dataset for analysis:

. quietly tabulate rxcode, generate(grp)
. rename grp1 Early
. stset t2, id(patid) failure(status==1) time0(t1)

                id:  patid
     failure event:  status == 1
obs. time interval:  (t1, t2]
 exit on or before:  failure

------------------------------------------------------------------------------
     5659  total obs.
      212  entry on or after exit (t1>t2)                       PROBABLE ERROR
       36  overlapping records (t2[_n-1]>t1)                    PROBABLE ERROR
------------------------------------------------------------------------------
     5411  obs. remaining, representing
      760  subjects
       92  failures in single failure-per-subject data
   148364  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       329

Results – Cox Regression with time-dependent covariates
. stcox Early history_tb baseline_bmi cd4_count

         failure _d:  status == 1
   analysis time _t:  t2
                 id:  patid

Iteration 0:   log likelihood = -593.47212
Iteration 1:   log likelihood = -541.68367
Iteration 2:   log likelihood = -535.94519
Iteration 3:   log likelihood = -535.89227
Iteration 4:   log likelihood = -535.89226
Refining estimates:
Iteration 0:   log likelihood = -535.89226

Cox regression -- Breslow method for ties

No. of subjects =          760                     Number of obs   =      5411
No. of failures =           92
Time at risk    =       148364
                                                   LR chi2(4)      =    115.16
Log likelihood  =   -535.89226                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Early |   .8041764   .1881104    -0.93   0.351     .5084414    1.271926
  history_tb |   1.991573   .4914993     2.79   0.005     1.227803    3.230456
baseline_bmi |   .9376221   .0327094    -1.85   0.065     .8756554    1.003974
   cd4_count |   .9911868   .0010495    -8.36   0.000     .9891321    .9932458
------------------------------------------------------------------------------

Results
• When CD4 cell count is treated as a time-varying predictor in the Cox regression, ART start time is not a significant predictor of TFTB (hazard ratio 0.80, 95% CI 0.51 to 1.27, p = 0.351).

Conclusion
• Failing to adjust for the change in CD4 cell counts over time led to reporting that ‘Early’ therapy significantly reduces the risk of TB diagnosis. Modeled correctly, the effect becomes non-significant. This result has substantial consequences for treatment decision making.
• Our results suggest that time to first TB diagnosis in HIV-positive patients is not associated with the start time of ART once overall patient health, as measured by CD4 cell count, is considered.
• Further analysis is needed before we are comfortable making this conclusion.

Looking Forward
• We are currently in the process of further examining the relationship between CD4 cell count and ART start.
• We are currently collecting data to examine time from ART start to first TB diagnosis. For the Early group these data do not change; for the Standard group, however, this may have a significant effect on the analysis.

Acknowledgements
• Daniel W. Fitzgerald, M.D.
• Sean Collins, M.D.
• Sandra H. Rua, Ph.D.

Thank You
ald2018@med.cornell.edu
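
Backup – Building the long-format data set
• The reshape and missing-data rules described under Data Management could be sketched roughly as below. This is an illustration under stated assumptions, not the study's actual do-file: the wide variable names cd4_count0, cd4_count1, ..., the 26-week interval length, and the helper variables interval, last, gap2 and anygap2 are all hypothetical. It also assumes the minimum-per-interval rule has already been applied, so each wide variable holds one value per six-month interval.

```stata
* Sketch only: cd4_count0, cd4_count1, ... and the 26-week interval
* length are assumptions, not taken from the study files.
* Start: one row per participant (wide), one CD4 value per interval.

* 1. Wide to long: one row per participant per interval
reshape long cd4_count, i(patid) j(interval)

* 2. Interval bounds in weeks, (t1, t2]
generate double t1 = 26 * interval
generate double t2 = 26 * (interval + 1)

* 3. Keep only intervals that begin before the participant's exit time,
*    and end the final interval at the exit time
drop if t1 >= weeks_to_tb
bysort patid (interval): generate byte last = (_n == _N)
replace t2 = weeks_to_tb if last

* 4. Exclude participants with two or more consecutive missing intervals
bysort patid (interval): generate byte gap2 = ///
    _n > 1 & missing(cd4_count) & missing(cd4_count[_n-1])
egen byte anygap2 = max(gap2), by(patid)
drop if anygap2

* 5. Carry the previous value forward into a single missing interval
bysort patid (interval): replace cd4_count = cd4_count[_n-1] ///
    if missing(cd4_count) & _n > 1

* 6. The event can occur only in the participant's final interval
generate byte status = last & incident_tb == 1

* 7. Declare counting-process data and fit the time-varying model
stset t2, id(patid) failure(status==1) time0(t1)
stcox Early history_tb baseline_bmi cd4_count
```

• Fixed covariates (Early, history_tb, baseline_bmi) are copied onto every interval row by reshape, while cd4_count takes a new value in each row; this is what lets stcox treat it as time-varying.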