Group 4 (Robert Szulkin, Alex Granqvist)

Baby immune response and HIV infection

Maternal transmission of HIV is well known (Nduati et al., JAMA 2000); nearly 40% of breastfed children turned HIV positive within 2 years after birth. Considering breastfeeding as the exposure, the question we now address is whether the child’s immune response (as measured by the peptide response) predicts the time to infection.

‘Exposed children’ are those being breastfed. Their infection status is ascertained at monthly visits during followup; the peptide response is also measured at those visits. The peptides tested belong to 5 major groups; first answer the question for all groups combined, and then check whether there is any individual group effect.

The data are in 3 STATA files, in each of which the infant is identified by IDNUM:

BF_lastBF: whether the child was breastfed and, if so, the age in months at last breastfeeding. Example of reading the STATA data into R:

> setwd('z:/classes-new/likelihood/MEB/HIV')
> library(foreign)
> BF = read.dta('BF_lastBF.dta')
> names(BF)
[1] "idnum"    "bfeeding" "lastbf"
> table(BF$bfeeding)

     no     yes stopped
    102     349       0

(NOTE: Analyse only the 349 breastfed children.)

LAST_NEG_1ST_POS_DATES: dates of the last negative and first positive HIV test (the variable PROBLEM indicates where infection status was uncertain), and also the delivery date of the child, allowing us to compute age. Use library(date) to deal with date variables. E.g.

> library(date)
> a = as.date(c("1jan1960", "2jan1960", "31mar1960", "30jul1960"))
> as.numeric(a)
[1]   0   1  90 211
> as.date(c('2001-12-31','2009-5-30'), order='ymd')

(i.e. dates are represented as the number of days since 1Jan1960. This allows you to compute survival times from calendar dates.)

ELISPOT_TESTS: the Elispot test result (in ‘spot forming units’) for each peptide at each visit.
The file includes the following variables:
o SPECDAY, SPECMON, SPECYR: day, month and year of the specimen
o VISIT: code for the age of the infant at the visit: DEL=delivery, M1=month 1, M2=month 2, etc.
o PEPTIDE: the exact sequence of the peptide tested
o PEPPROT: peptide group (5 levels): 1=env (gp120), 2=nef, 3=gag (p24/p17), 4=pol, 5=rev
o OBG: overall background
o OSFU: overall spot forming units
o OHIVSFU: overall HIV-specific SFU
  (‘Overall’ is a measurement done by eye and by machine, where we take the eye value when available, and the machine value otherwise.)
o CTL50, CTL100, CTL500: indicators for whether the peptide gives a positive response under 3 different cut-off criteria

Questions

The main predictor of interest is OHIVSFU or the ratio OHIVSFU/OBG, which is best viewed on the log scale. Use a normal quantile plot to choose a reasonable scale.

Fit a simple Cox model, using only the first peptide test result as predictor. For the time of HIV infection, use the mid-point between the last negative and first positive test dates. (You can use the coxph function in the survival package in R.)

Your main problem now is to use all the repeated peptide values during followup. The idea is to use some sort of GEE-type extension of the Cox model, where every measurement from one child is paired with that child's survival information, so that together they form repeated survival data. E.g. if a child has 3 visits, then that child will generate 3 correlated pseudo-subjects (x1,y1,delta), (x2,y2,delta) and (x3,y3,delta), where yi is the survival information computed from the visits (consisting of the start and end of followup dates) and delta is the final infection status; it will have the same value for all 3 pseudo-subjects. (See the function Surv(.) in R. You specify the start and end date in this function; see below.)

For independent subjects, starting with the Cox likelihood, derive the score equation and the information matrix for the Cox regression model.
Identify the score equation as an estimating equation, and explain how it can be extended to a GEE assuming an ‘independent working variance’. Sketch what the sandwich formula for the robust variance looks like for the GEE in this case.

Run the coxph model in R:

> coxph(Surv(start,end,hiv) ~ peptide + cluster(idnum))

The ‘cluster’ term indicates the repeated measures. This will produce the estimates and the robust standard errors from the GEE model. Compare the GEE results with the simple run using only the first peptide measurement.
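
As a rough illustration of the data layout the coxph call expects, the sketch below builds one pseudo-subject per visit and fits both the clustered model and the first-visit-only model. It is only a sketch under stated assumptions: the data are simulated, not the study data; the visit ages (30, 60 and 90 days), the sample size and the object names long, first, fit_gee and fit_first are hypothetical; the variable names start, end, hiv, peptide and idnum are taken from the call above. Because the peptide values are pure noise, the estimates themselves carry no meaning.

```r
library(survival)

set.seed(42)
n <- 100                               # hypothetical number of children
end_age <- round(runif(n, 100, 700))   # simulated age (days) at infection or censoring
hiv     <- rbinom(n, 1, 0.4)           # simulated final infection status (delta)

# One pseudo-subject per visit: (peptide, (start, end), hiv).
# 'start' is the age at the visit, 'end' the age at infection or censoring;
# 'hiv' is the same for all pseudo-subjects of one child.
long <- do.call(rbind, lapply(seq_len(n), function(i) {
  data.frame(idnum   = i,
             start   = c(30, 60, 90),  # assumed visit ages in days
             end     = end_age[i],
             hiv     = hiv[i],
             peptide = rnorm(3))       # stand-in for the log peptide response
}))

# All visits, with robust (sandwich) standard errors via cluster(idnum)
fit_gee <- coxph(Surv(start, end, hiv) ~ peptide + cluster(idnum), data = long)

# First visit only: one row per child, ordinary Cox model
first     <- long[!duplicated(long$idnum), ]
fit_first <- coxph(Surv(start, end, hiv) ~ peptide, data = first)

# Coefficient tables; the clustered fit adds a "robust se" column
summary(fit_gee)$coefficients
summary(fit_first)$coefficients
```

With the real data, start and end would instead be computed from the specimen, delivery and test dates (as days since 1Jan1960), and peptide from OHIVSFU or OHIVSFU/OBG on the chosen scale.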