Can statisticians and medical doctors talk together, work together, do research together?
Saskia le Cessie
Dept of Clinical Epidemiology / Dept of Medical Statistics and Bioinformatics, Leiden University Medical Center
UCL, April 30, 2010

Outline
1. Can statisticians and medical doctors talk together? Working in an academic medical centre: what to expect?
2. Can statisticians and medical doctors work together? Practical problems and challenges.
3. Can statisticians and medical doctors do research together? Examples of statistical research projects resulting from collaborations with medical researchers.

Can statisticians and medical doctors talk together?
Medical doctor: "End-stage renal failure and C-reactive protein"
???????
Statistician: $\hat{\beta} = (X'X)^{-1} X'Y$, $\hat{\sigma}^2 = \sum_i (y_i - \hat{y}_i)^2 / (n - p - 1)$

What does a medical statistician (like me) do?
• Medical researchers within our institute can contact us for
  1. ad-hoc questions
  2. help with the design and analysis of their study
  3. collaboration in project teams of large studies
• Teaching medical students and researchers (statistics / conducting research)
• Own methodological research
• In our institute: no fees for internal consultation

Medical research
Research questions, aims of study → Study design → Data collection → Statistical analysis → Interpretation, conclusions

Tasks of statistician (research questions, aims of study)
• Find out what the research questions are.

Tasks of statistician (study design, data collection)
• Ideally: the statistician is consulted before the study is carried out.
• In practice: often the data are already collected. Find out how.
Tasks of statistician (statistical analysis)
• Advise which statistical methods are needed to answer the research questions, or carry out the analyses yourself.

Tasks of statistician (interpretation, conclusions)
• Explain the statistical methods.
• Interpret the results and implications in such a way that the researcher understands them.
• This is important; otherwise the methods will not be used.

Requirements for statistical consultants
• Interest in collaboration with non-statisticians
• Interest in medical research
• Be an all-round, up-to-date statistician
• Good communication skills
• Able to explain statistics to non-statisticians
• Pragmatic
• Able to work under time pressure
Joiner gives a list of 22 points (in Encyclopedia of Statistical Sciences; see Kenett and Thyregod, Stat Neerl 2006).

Can statisticians and medical doctors work together?
Yes we can!
[Picture: statistician and medical doctor]

Medical doctors:
• Usually they think differently (less mathematical, more intuitive)
• Some of them are smart, devoted to their research and patients, and really want to understand things
• They usually value statisticians highly

Problems and challenges of consulting
1. Determine the amount of time spent on a project
• Depends on the quality of the researcher and the research project, and on the interest from a statistical point of view
• Often a simple (reasonably correct) approach suffices
• Not-so-good researchers with not-so-well-designed studies tend to ask for help more frequently
• The researcher does the simple analyses him/herself
• The statistician advises and performs the complex analyses
Problems and challenges (2)
2. Asking advice after (part of) the research is done
Ideally, the statistician is involved in the whole project. But often advice is asked
• after the data are collected. Design errors cannot be repaired.
• after the statistical analyses are done. This is difficult to handle if the analyses are not correct.
• after the paper is written. Some medical journals ask for written approval of a statistician. (We decided not to sign such forms.)
• after the paper is rejected.

Problems and challenges (3)
3. Conducting own statistical research
• Topics naturally arise from practical problems
• Advantage: work on relevant problems, with data available
• Difficult to find time to do research (other things are usually more urgent)
• Difficult to focus on one topic

Can statisticians and medical doctors do research together?
Collaboration in medical projects
[Picture: statistician and medical doctor]

Can statisticians and medical doctors do research together? (2)
Two examples of statistical research resulting from collaboration with medical researchers:
1. The problem of two control groups
2.
The problem of modeling repeated kidney function measurements

Example 1: The problem of two control groups
• MEGA study: large case-control study to examine risk factors for thrombosis
• Cases: patients with a first thrombosis (n = 3986)
• Two control groups:
  • Control group 1: partners of cases (n = 2286)
  • Control group 2: population-based controls, randomly selected (n = 2612)

Analysis of data: two case-control studies
• Compare cases to matched controls (partners)
  • Matched analysis
  • Conditional logistic regression
• Compare cases to population-based controls
  • Unconditional logistic regression
• Yields two sets of odds ratios, both using only part of the data

Example: effect of smoking (ever/never) on thrombosis
                                               log-OR β (se)    OR (95% CI)
Matched analysis   2286 pairs                  0.194 (0.069)    1.21 (1.06, 1.39)
Unmatched          3986 cases, 2612 controls   0.320 (0.052)    1.38 (1.24, 1.52)
• Can we combine the results?

Estimate overall effect, using all data
• Pool the separate estimates (perform a kind of meta-analysis):
  $\hat{\beta}_{pooled} = w \hat{\beta}_1 + (1 - w) \hat{\beta}_2$
• Choose the weight $w$ such that $se(\hat{\beta}_{pooled})$ is minimal:
  $w = (se_2^2 - \rho\, se_1 se_2) / (se_1^2 + se_2^2 - 2 \rho\, se_1 se_2)$
• Can be extended to the situation where several estimates are pooled simultaneously:
  $\hat{\beta}_{pooled} = \left[ \begin{pmatrix} I_k \\ I_k \end{pmatrix}^T \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}^{-1} \begin{pmatrix} I_k \\ I_k \end{pmatrix} \right]^{-1} \begin{pmatrix} I_k \\ I_k \end{pmatrix}^T \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}^{-1} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$
  (here $I_k$ is the $k \times k$ identity matrix and $C_{ij}$ is presumably the covariance block of $\hat{\beta}_i$ and $\hat{\beta}_j$)

Problem
• The same cases are used twice.
• Therefore the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are correlated.
• We need to know the correlation
  • to calculate the optimal weights
  • to obtain correct standard errors and confidence intervals
• The correlation can be estimated in two ways:
  • Sandwich estimator
  • Bootstrap

Sandwich estimator
• Robust estimator, often used in longitudinal data analysis (GEE)
• Can be extended to estimate the covariance:
  $\widehat{cov}(\hat{\beta}_1, \hat{\beta}_2) = I_1(\hat{\beta}_1)^{-1} \left[ \sum_{i \in M} U_{1i}(\hat{\beta}_1)\, U_{2i}(\hat{\beta}_2)^T \right] I_2(\hat{\beta}_2)^{-1}$
• The inner part is summed over M: all matched cases
• $U_{1i}$ and $U_{2i}$ are the components of the score functions of the two likelihood functions; $I_1$ and $I_2$ are the Hessians

Bootstrapped covariance estimator
• Bootstrap estimator
  • Repeatedly sample with replacement from the original dataset
  • Each time estimate $\beta_1$ and $\beta_2$
  • Use the correlation between the estimates from the bootstrap resamples
• Note: resampling should follow the sampling scheme of the original dataset
  • Sample from the patients and obtain the partner control if available
  • Sample from the random control group

Simulation
• 100 matched pairs, 100 unmatched controls, 1000 simulation runs
• One covariate, true β = 1
• Observed correlation between the two estimates was 0.501
Method      Estimate   Median   se observed   se estimated
Sandwich    β          0.977    0.281         0.276
            ρ          0.502    0.08
Bootstrap   β          0.976    0.290         0.276
            ρ          0.493    0.07
• Both methods perform equally well

Back to the MEGA study: effect of smoking on thrombosis
                          n      OR (95% CI)         correlation
Matched analysis          4572   1.21 (1.06, 1.39)
Unmatched                 6598   1.38 (1.24, 1.52)
Pooled, using sandwich    8889   1.33 (1.21, 1.45)   0.31
Pooled, using bootstrap   8889   1.32 (1.21, 1.45)   0.28
• Test for equality of the two odds ratios: p = 0.08

Effect of smoking adjusted for confounders (sex, age, BMI, pregnancy)
                          n      OR (95% CI)         correlation
Matched analysis          4572   1.27 (1.10, 1.46)
Unmatched                 6598   1.35 (1.22, 1.50)
Pooled, using sandwich    8889   1.33 (1.21, 1.46)   0.30
Pooled, using bootstrap   8889   1.33 (1.21, 1.46)   0.28
• Test for equality of the two odds ratios: p = 0.39

Result of collaboration
• Developed an easy method to combine estimates from several case-control analyses
• Le Cessie et al. Combining Matched and Unmatched Control Groups in Case-Control Studies. Am J Epidemiol 2008;168:1204–1210.
• Pomp et al. Experience with multiple control groups in a large population-based case-control study. Submitted.
• Thanks to Nico Nagelkerke, Frits R. Rosendaal, Karlijn J. van Stralen, Elisabeth R. Pomp, Hans C. van Houwelingen

Example 2: Analyzing data of a Dutch follow-up study of renal patients
• 1526 end-stage renal failure patients who start dialysis
• Two different forms of dialysis: hemodialysis (HD) and peritoneal dialysis (PD)
• Outcome: renal function (glomerular filtration rate, GFR)
• Measurements at start of dialysis, at 3 months, at 6 months, and thereafter every 6 months (here: follow-up of 3 years)
• Goal: model the pattern of GFR over time

Some problems
• The kidneys can stop working completely. Then GFR is by definition equal to 0. This is called anuria.
• Sometimes there are incidental GFR = 0 measurements.
• Patients on PD have on average a larger GFR value at the start of dialysis.

[Figure: GFR patterns for 4 different patients]

[Figure: The distribution of the GFR measurements at different time points]

[Figure: Observed means over time]

Question: how to model this type of data?
• The standard approach is to use a linear mixed model for repeated measurements
• Remove all measurements where patients are anuric (two subsequent GFR = 0 measurements)
• Underlying idea: model the GFR trajectory before anuria

What are you doing here?
• You do not model the mean GFR over time.
This model implicitly imputes values for those patients with GFR = 0.
• Observations are left out if GFR = 0. These are not missing at random.

A different approach: two-part mixed models (Tooze 2002)
• A joint model for the probability that GFR > 0, and for GFR | GFR > 0
• The likelihood is rather complex
• Estimation takes long
• It does not use the fact that a patient is anuric after two GFR = 0 measurements

Problems
• The correlation between repeated measures is rather simple (random intercept model)
• If the random effects are correlated, estimation takes very long. However, ignoring the correlation could yield biased estimates (Su, Tom, Farewell (2009))
• It does not use the fact that a patient is anuric after two GFR = 0 measurements

Approach 3. Transition models (Markov approach)
• Use transition models (Diggle, Liang, Zeger (1994))
• Rewrite the multivariate density ($y_{ij}$ is the response of subject $i$ at time $j$):
  $f(y_{i1}, y_{i2}, \ldots, y_{iJ}) = f(y_{i1})\, f(y_{i2} \mid y_{i1}) \cdots f(y_{iJ} \mid y_{i,J-1}, \ldots, y_{i1})$
• Markov assumption:
  $f(y_{ij} \mid y_{i,j-1}, \ldots, y_{i1}) = f(y_{ij} \mid y_{i,j-1})$
• Likelihood:
  $\prod_i f(y_{i1}) \prod_{i,\, j>1} f(y_{ij} \mid y_{i,j-1})$
• Maximize both parts separately

Model for $f(y_{ij} \mid y_{i,j-1})$
[Diagram, built up over several slides; labels: $y_{j-1} > 0$, $y_j > 0$, $\Pr(y_j = 0 \mid y_{j-1} > 0)$]

Applying this model
• Restructure the data
• Logistic model for the probability that $y_{ij} > 0$ given $y_{i,j-1}$:
  $\mathrm{logit}(\Pr(y_{ij} > 0 \mid y_{i,j-1})) = \theta_1 X_{ij} + \theta_2 y_{i,j-1}$
• Linear regression model for the positive values, $g(y_{ij} \mid y_{i,j-1})$ (or for $\log(y_{ij})$):
  $g(y_{ij} \mid y_{i,j-1}) = N(\beta_1 X_{ij} + \beta_2 y_{i,j-1},\ \sigma_e^2)$

Application to our dataset
• We distinguished a first zero and a second (subsequent) zero.
• Two logistic models:
  • Probability of a first zero ($y_{ij} = 0$, $y_{i,j-1} > 0$)
  • Probability of a second zero ($y_{ij} = 0$, $y_{i,j-1} = 0$)
• After a second zero, a patient is anuric (all subsequent measurements are 0)
• Two intensities:
  • $Y_{ij}$ given $y_{ij} > 0$ and $R_{i,j-1} > 0$
  • $Y_{ij}$ given $y_{ij} > 0$ and $y_{i,j-1} > 0$

Results, occurrence of first zero
• $\mathrm{logit}(\Pr(y_{ij} > 0)) = 0.12\ (\mathrm{se}\ 0.16) + 0.30\ (\mathrm{se}\ 0.10) \times \mathrm{therapy} + 1.33 \times \log(\mathrm{GFR}_{i,j-1})$
• PD patients have exp(0.30) = 1.35 times higher odds of GFR > 0, given the previous GFR value
• No significant effects of: $\mathrm{GFR}_{i,j-1} \times \mathrm{therapy}$, $\mathrm{GFR}_{i,j-2}$, visit number

Occurrence of anuria (second zero)
• No significant difference between the two treatment groups

Intensity when $\mathrm{GFR}_{ij} > 0$ and $\mathrm{GFR}_{i,j-1} > 0$
• The Markov assumption did not hold; the current GFR depends on the previous two measurements
• Significant interaction between visit and the effect of the previous GFR
• The effect of $\mathrm{GFR}_{i,j-1}$ at visit 1 was clearly different
• Visit 1: $\log(\mathrm{GFR}_{ij}) = 0.21 + 0.05\ (\mathrm{se}\ 0.04) \times \mathrm{therapy} + 0.61 \times \log(\mathrm{GFR}_{i,j-1})$
• Visits 2 to 7: $\log(\mathrm{GFR}_{ij}) = -0.19 + 0.04\ (\mathrm{se}\ 0.02) \times \mathrm{therapy} + 0.73 \times \log(\mathrm{GFR}_{i,j-1}) + 0.18 \times \log(\mathrm{GFR}_{i,j-2})$
• PD patients have on average exp(0.04) = 1.04 times higher GFR, given the previous GFR responses (p = 0.04)

Conclusion from transition model
• PD patients have a higher occurrence probability and have a slightly higher GFR, after correction for previous GFR measurements
• PD patients have a higher mean intensity, given the previous GFR measurement

Marginal means
• $E[Y_j]$ can be estimated
  • analytically, using recursively that $E[Y_j] = E[\,E[Y_j \mid Y_{j-1}]\,]$
  • by simulation from the joint distribution $f(y_{i1})\, f(y_{i2} \mid y_{i1}) \cdots f(y_{iJ} \mid y_{i,J-1}, \ldots, y_{i1})$

[Figure: Marginal means]

Results of collaboration
• Transition models using a two-part mixture distribution can be used to model data with zeros
• Advantages of this approach:
  • Can be performed with standard software
  • Yields interpretable parameters
  • Corrects for baseline differences
• No paper written yet
• Many thanks to Friedo Dekker, Diana Grootendorst

Conclusions
• Statisticians and medical doctors can talk together, work together, do research together!!
• Congratulations and many good wishes for the SMCS!!
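
Worked check for Example 1 (pooling two correlated estimates)
The pooling formula from Example 1 can be checked numerically against the unadjusted smoking estimates reported in the slides (matched: 0.194, se 0.069; unmatched: 0.320, se 0.052; sandwich correlation 0.31). The sketch below is only an illustration: the function name `pool_two_estimates` is hypothetical, the variance expression is the standard one for a weighted average of two correlated estimates (it is not shown in the slides), and the confidence interval assumes a normal approximation with the factor 1.96.

```python
import math


def pool_two_estimates(b1, se1, b2, se2, rho):
    """Minimum-variance weighted average of two correlated estimates.

    Hypothetical helper: the weight follows the formula on the slide,
    the variance is the usual one for w*b1 + (1 - w)*b2 (an assumption,
    not taken from the slides).
    """
    cov = rho * se1 * se2  # covariance implied by the correlation
    w = (se2**2 - cov) / (se1**2 + se2**2 - 2 * cov)
    b = w * b1 + (1 - w) * b2
    var = w**2 * se1**2 + (1 - w) ** 2 * se2**2 + 2 * w * (1 - w) * cov
    return w, b, math.sqrt(var)


# Unadjusted log odds ratios for smoking, as reported in the slides
w, b, se = pool_two_estimates(0.194, 0.069, 0.320, 0.052, rho=0.31)

# 95% confidence interval on the log scale (normal approximation)
lo, hi = b - 1.96 * se, b + 1.96 * se

print(f"w = {w:.3f}")
print(f"pooled OR = {math.exp(b):.2f} ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
# prints:
# w = 0.304
# pooled OR = 1.33 (1.21, 1.45)
```

Under these assumptions the result agrees with the "Pooled, using sandwich" row of the MEGA table. Because the unmatched estimate has the smaller standard error, it receives the larger weight (about 0.70).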