Can statisticians and medical doctors talk
together, work together, do research together?
Saskia le Cessie
Dept of Clinical Epidemiology / Dept of Medical Statistics and Bioinformatics,
Leiden University Medical Center
Outline
1. Can statisticians and medical doctors talk together?
Working in an academic medical centre, what to expect?
2. Can statisticians and medical doctors work together?
Practical problems and challenges.
3. Can statisticians and medical doctors do research together?
Examples of statistical research projects, resulting from
collaborations with medical researchers
UCL, April 30, 2010
Can statisticians and medical doctors talk together?
Medical doctor: "End-stage renal failure and C-reactive protein"
???????
Statistician: $\hat\beta = (X'X)^{-1}X'Y$, $\hat\sigma^2 = \sum_i (y_i - \hat y_i)^2 / (n - p - 1)$
What does a medical statistician (like me) do?
• Medical researchers within our institute can contact us for
  1. ad-hoc questions
  2. help with the design and analysis of their study
  3. collaboration in project teams of large studies
• Teaching medical students and researchers (statistics / conducting research)
• Own methodological research
• In our institute: no fees for internal consultation
Medical research
Research questions,
aims of study
Study Design
Data collection
Statistical analysis
Interpretation,
conclusions
Tasks of statistician
• Find out what the research questions are
Medical research
Research questions,
aims of study
Study Design
Data collection
Statistical analysis
Interpretation,
conclusions
Tasks of statistician
• Ideally: the statistician is consulted before the study is carried out
• In practice: often the data have already been collected. Find out how.
Medical research
Research questions,
aims of study
Study Design
Data collection
Statistical analysis
Interpretation,
conclusions
Tasks of statistician
• Advise which statistical methods are needed to answer the research questions, or carry out the analyses yourself
Medical research
Research questions,
aims of study
Study Design
Data collection
Statistical analysis
Interpretation,
conclusions
Tasks of statistician
• Explain the statistical methods
• Interpret the results and implications in such a way that the researcher understands them
• Important, otherwise methods will not be used
Medical research
Research questions,
aims of study
Study Design
Data collection
Statistical analysis
Interpretation,
conclusions
Requirements for statistical consultants
• Interest in collaboration with non-statisticians
• Interest in medical research
• Be an all-round, up-to-date statistician
• Good communication skills
• Able to explain statistics to non-statisticians
• Pragmatic
• Able to work under time pressure
Joiner gives a list of 22 points (in Encyclopedia of Statistical
Sciences, see Kenett and Thyregod, Stat Neerl 2006)
Can statisticians and medical doctors work together?
Yes we can!
Statistician
Medical doctor
Medical doctors:
• Usually they think differently (less mathematical, more
intuitive)
• Some of them are smart, devoted to their research and
patients, and really want to understand things
• They usually value statisticians highly
Problems and challenges of consulting
1. Determine the amount of time spent on a project
• Depends on quality of researcher, research project, and interest from a statistical point of view
• Often a simple (reasonably correct) approach suffices
• Not-so-good researchers with not-so-well-designed studies tend to ask for help more frequently
• Researcher does the simple analyses him/herself
• Statistician advises and performs the complex analyses
Problems and challenges (2)
2. Asking advice after (part of) the research is done
Ideally: the statistician is involved in the whole project.
But often advice is asked
• after data are collected. Design errors cannot be repaired.
• after statistical analyses are done. Difficult to handle if the analyses are not correct.
• after the paper is written. Some medical journals ask for written approval of a statistician. (We decided not to sign such forms.)
• after the paper is rejected.
Problems and challenges (3)
3. Conducting own statistical research
• Topics naturally arise from practical problems
• Advantage: work on relevant problems, with data available
• Difficult to find time to do research (other things are usually more urgent)
• Difficult to focus on one topic
Can statisticians and medical doctors do research together?
Collaboration in
Medical projects
Statistician
Medical doctor
Can statisticians and medical doctors do research
together? (2)
Two examples of statistical research resulting from
collaboration with medical researchers
1. The problem of two control groups
2. The problem of modeling repeated kidney function
measurements
Example 1: The problem of two control groups
• MEGA study: Large case-control study to examine risk factors
for thrombosis
• Cases: patients with first thrombosis (n= 3986)
• Two control groups:
• Control group 1: partners of cases (n= 2286)
• Control group 2: population based controls (randomly selected)
(n= 2612)
Analysis of data: two case-control studies
• Compare cases to matched controls (partners)
• Matched analysis
• Conditional logistic regression
• Compare cases to population based controls
• Unconditional logistic regression
• Yields two sets of odds ratios, both using only part of the data
Example: effect of smoking (ever/never) on thrombosis
Analysis    Sample                      log-OR β (se)    OR (95% CI)
Matched     2286 pairs                  0.194 (0.069)    1.21 (1.06, 1.39)
Unmatched   3986 cases, 2612 controls   0.320 (0.052)    1.38 (1.24, 1.52)
• Can we combine the results?
Estimate overall effect, using all data
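• Pool separate estimates (perform a kind of meta-analysis)
  $\hat\beta_{pooled} = w\hat\beta_1 + (1 - w)\hat\beta_2$
• Choose the weight w such that se($\hat\beta_{pooled}$) is minimal
  $w = (se_2^2 - \rho\, se_1 se_2) / (se_1^2 + se_2^2 - 2\rho\, se_1 se_2)$
• Can be extended to the situation where several estimates are pooled simultaneously
  $\hat\beta_{pooled} = \left[ \begin{pmatrix} I_k \\ I_k \end{pmatrix}^T \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}^{-1} \begin{pmatrix} I_k \\ I_k \end{pmatrix} \right]^{-1} \begin{pmatrix} I_k \\ I_k \end{pmatrix}^T \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}^{-1} \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}$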
Problem
• Same cases are used twice.
• Therefore estimates β̂1 and β̂2 are correlated
• We need to know correlation
• to calculate optimal weights
• to obtain correct standard errors and confidence intervals
• Correlation can be estimated in two ways
• Sandwich estimator
• Bootstrap
Sandwich estimator
• Robust estimator, often used in longitudinal data analysis
(GEE)
• Can be extended to estimate covariance:
$\widehat{\mathrm{cov}}(\hat\beta_1, \hat\beta_2) = I_1^{-1}(\hat\beta_1) \left[ \sum_{i \in M} U_{1i}(\hat\beta_1)\, U_{2i}(\hat\beta_2)^T \right] I_2^{-1}(\hat\beta_2)$
• Inner part is summed over M: all matched cases
• U1i and U2i are the components of the score functions of the
two likelihood functions, and I1 and I2 are the corresponding Hessians
Bootstrapped covariance estimator
• Bootstrap estimator
• Repeatedly sampling with replacement from the original dataset
• Each time estimate β1 and β2.
• Use correlation between estimates of bootstrap resamples.
• Note: resampling should be done using sampling scheme of
original dataset
• Sample from patients and obtain partner control if available
• Sample from random control group
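The resampling scheme can be sketched for the simplest possible case. The code below assumes a single binary exposure and no other covariates, so that both log odds ratios have closed forms (discordant-pair ratio for the matched analysis, 2x2 table for the unmatched one); the real analyses use conditional and unconditional logistic regression. The data are synthetic, all function names are made up for this illustration, and the 0.5 added to each cell count is only a guard against empty cells.

```python
import math
import random

def log_or_matched(pairs):
    """Matched-pair log odds ratio for a binary exposure (ratio of discordant pairs)."""
    n10 = sum(1 for case, ctrl in pairs if case == 1 and ctrl == 0)
    n01 = sum(1 for case, ctrl in pairs if case == 0 and ctrl == 1)
    return math.log((n10 + 0.5) / (n01 + 0.5))  # 0.5 guards against empty cells

def log_or_unmatched(cases, controls):
    """Log odds ratio from the 2x2 table of exposure by case status."""
    a, c = sum(cases), sum(controls)
    b, d = len(cases) - a, len(controls) - c
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))

def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlation(cases, pop_controls, n_boot=500, seed=1):
    """cases: list of (exposure, partner_exposure or None); pop_controls: list of exposures."""
    rng = random.Random(seed)
    b1s, b2s = [], []
    for _ in range(n_boot):
        # Resample cases; a sampled case brings its partner control along if it has one
        case_sample = rng.choices(cases, k=len(cases))
        # Resample the random population controls separately
        ctrl_sample = rng.choices(pop_controls, k=len(pop_controls))
        pairs = [(x, p) for x, p in case_sample if p is not None]
        b1s.append(log_or_matched(pairs))
        b2s.append(log_or_unmatched([x for x, _ in case_sample], ctrl_sample))
    return correlation(b1s, b2s)

# Synthetic data, for illustration only
gen = random.Random(0)
cases = []
for _ in range(400):
    exposed = int(gen.random() < 0.6)
    partner = int(gen.random() < 0.5) if gen.random() < 0.6 else None
    cases.append((exposed, partner))
pop_controls = [int(gen.random() < 0.5) for _ in range(300)]

rho = bootstrap_correlation(cases, pop_controls)
print(f"bootstrap correlation between the two log odds ratios: {rho:.2f}")
```

Because the same cases enter both estimates in every resample, the two bootstrap series move together, which is what the estimated correlation captures.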
Simulation
• 100 matched pairs, 100 unmatched controls, 1000 simulation runs
• One covariate, true β = 1
• Observed correlation between two estimates was 0.501
Method      Parameter   Median estimate   se observed   se estimated
Sandwich    β           0.977             0.281         0.276
Sandwich    ρ           0.502             0.08
Bootstrap   β           0.976             0.290         0.276
Bootstrap   ρ           0.493             0.07
• Both methods perform equally well
Back to the MEGA study, effect of smoking on thrombosis
Analysis                  n      OR (95% CI)         correlation
Matched analysis          4572   1.21 (1.06, 1.39)
Unmatched                 6598   1.38 (1.24, 1.52)
Pooled, using sandwich    8889   1.33 (1.21, 1.45)   0.31
Pooled, using bootstrap   8889   1.32 (1.21, 1.45)   0.28
• Test for equivalence of two odds ratios: p= 0.08
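The slides do not say how the two odds ratios were compared. One plausible way, shown here purely as an assumption, is a Wald test on the difference of the two log odds ratios, with a standard error that accounts for their correlation. The function name `wald_test_equal` is hypothetical; the inputs are the unadjusted estimates and the sandwich correlation from these slides.

```python
import math

def wald_test_equal(b1, se1, b2, se2, rho):
    """Two-sided Wald test of b1 == b2 for two correlated estimates (assumed method)."""
    se_diff = math.sqrt(se1**2 + se2**2 - 2 * rho * se1 * se2)
    z = (b1 - b2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return z, p

z, p = wald_test_equal(0.194, 0.069, 0.320, 0.052, rho=0.31)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under this assumption the p-value comes out at about 0.08, in line with the value on the slide.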
Effect of smoking adjusted for confounders (sex, age, BMI,
pregnancy)
Analysis                  n      OR (95% CI)         correlation
Matched analysis          4572   1.27 (1.10, 1.46)
Unmatched                 6598   1.35 (1.22, 1.50)
Pooled, using sandwich    8889   1.33 (1.21, 1.46)   0.30
Pooled, using bootstrap   8889   1.33 (1.21, 1.46)   0.28
• Test for equivalence of two odds ratios: p= 0.39
Result of collaboration
• Developed an easy method to combine estimates from several
case-control analyses
• Le Cessie et al. Combining Matched and Unmatched Control
Groups in Case-Control Studies. Am J Epidemiol
2008;168:1204–1210.
• Pomp et al. Experience with multiple control groups in a large
population-based case-control study. Submitted.
• Thanks to Nico Nagelkerke, Frits R. Rosendaal, Karlijn J. van
Stralen, Elisabeth R. Pomp, Hans C. van Houwelingen
Example 2: Analyzing data of a Dutch follow-up study of renal
patients
• 1526 end-stage renal failure patients who start dialysis
• Two different forms of dialysis: hemodialysis (HD) and
peritoneal dialysis (PD)
• Outcome: renal function (glomerular filtration rate, GFR)
• Measurements at start of dialysis, at 3 months, at 6 months, and
every 6 months thereafter (here: follow-up of 3 years)
• Goal: model the pattern of GFR over time
Some problems
• The kidneys can stop working completely. Then GFR is per
definition equal to 0. This is called anuria.
• Sometimes there are incidental GFR=0 measurements
• Patients with PD have on average a larger GFR value at start
of dialysis.
GFR patterns for 4 different patients
The distribution of the GFR measurements at different time points
Observed means over time
Question: how to model this type of data?
• The standard approach is to use a linear mixed model for
repeated measurements
• Remove all measurements where patients are anuric (two
subsequent GFR=0 measurements)
• Underlying idea: model the GFR trajectory before anuria
What are you doing here?
• You do not model the mean GFR over time. This model implicitly
imputes values for those patients with GFR=0
• Observations are left out if GFR=0. These are missing not at random
A different approach: Two-part mixed models (Tooze 2002)
• A joint model for the probability that GFR >0, and for
GFR|GFR>0
• Likelihood is rather complex
• Estimation takes long
• It does not use the fact that a patient is anuric after two
GFR=0 measurements
Problems
• Correlation between repeated measures is rather simple
(random intercept model)
• If random effects are correlated: estimation takes very long.
However, ignoring the correlation could yield biased
estimates (Su, Tom, Farewell (2009))
• It does not use the fact that a patient is anuric after two
GFR=0 measurements
Approach 3. Transition models (Markov approach)
• Use transition models (Diggle, Liang, Zeger (1994))
• Rewrite the multivariate density
  $f(y_{i1}, y_{i2}, \ldots, y_{iJ}) = f(y_{i1})\, f(y_{i2} \mid y_{i1}) \cdots f(y_{iJ} \mid y_{iJ-1}, \ldots, y_{i1})$
  ($y_{ij}$ is the response of subject i at time j)
• Markov assumption:
  $f(y_{ij} \mid y_{ij-1}, \ldots, y_{i1}) = f(y_{ij} \mid y_{ij-1})$
• Likelihood:
  $\prod_i f(y_{i1}) \prod_{i,\, j>1} f(y_{ij} \mid y_{ij-1})$
• Maximize both parts separately
Model for f(yij|yij-1)
[Figure, built up over five slides; labels: yj-1>0, yj>0, Pr(yj=0|yj-1>0)]
Applying this model
• Restructure data
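• Logistic model for the probability that yij>0, given yij-1
• $\mathrm{logit}(\Pr(y_{ij} > 0 \mid y_{ij-1})) = \theta_1 X_{ij} + \theta_2 y_{ij-1}$
• Linear regression model for $g(y_{ij} \mid y_{ij-1})$, the density of the positive values of yij (or of log(yij))
• $g(y_{ij} \mid y_{ij-1}) = N(\beta_1 X_{ij} + \beta_2 y_{ij-1}, \sigma_e^2)$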
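The "restructure data" step can be sketched as follows: each patient's ordered series is turned into one record per transition, after which the occurrence part is an ordinary logistic regression on all records and the intensity part an ordinary linear regression on the records with a positive current value. The function name, the field names and the GFR values are hypothetical and only illustrate the layout.

```python
def to_transitions(series_by_patient):
    """Turn each patient's ordered GFR series into (previous, current) records."""
    records = []
    for patient_id, values in series_by_patient.items():
        for j in range(1, len(values)):
            records.append({
                "patient": patient_id,
                "visit": j,
                "prev": values[j - 1],
                "curr": values[j],
                "positive": int(values[j] > 0),  # outcome of the logistic (occurrence) part
            })
    return records

# Hypothetical GFR series, for illustration only
data = {"A": [5.0, 4.2, 3.1, 0.0, 0.0], "B": [2.0, 0.0, 1.5]}
records = to_transitions(data)

occurrence_rows = records                                 # logistic model: positive ~ prev + covariates
intensity_rows = [r for r in records if r["curr"] > 0]    # linear model: curr (or log curr) ~ prev + covariates

for r in records:
    print(r)
```

Once the data are in this shape, both parts can be fitted with standard regression software, since every row is treated as a separate observation conditional on the previous value.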
Application to our dataset
• We distinguished the first zero and the second (subsequent) zero.
• Two logistic models:
• Probability of a first zero (yij=0, yij-1>0)
• Probability of a second zero (yij=0, yij-1=0)
• After the second zero, a patient is anuric (all subsequent
measures are 0)
• Two intensities:
• Yij given yij>0 and yij-1=0
• Yij given yij>0 and yij-1>0
Results, occurrence of first zero
• logit(Pr(yij>0)) = 0.12 (se 0.16) + 0.30 (se 0.10) * therapy +
1.33 * log(GFRij-1)
• PD patients have exp(0.30) = 1.35 times higher odds of GFR>0,
given the previous GFR value
• No significant effects of: GFRij-1*therapy, GFRij-2, visit number
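To see what the fitted occurrence model implies, the coefficients can be plugged into the inverse logit. The sketch below assumes that therapy is coded 1 for PD and 0 for HD (consistent with the odds ratio interpretation on the slide), that log is the natural logarithm, and it uses an arbitrary previous GFR value of 2.0; the function name is made up for this example.

```python
import math

def prob_gfr_positive(prev_gfr, pd_therapy):
    """Fitted probability that GFR > 0, given a positive previous GFR.

    pd_therapy: 1 for PD, 0 for HD (assumed coding).
    """
    lp = 0.12 + 0.30 * pd_therapy + 1.33 * math.log(prev_gfr)
    return 1 / (1 + math.exp(-lp))

odds_ratio = math.exp(0.30)         # PD versus HD, given the previous GFR
p_hd = prob_gfr_positive(2.0, 0)    # illustrative previous GFR of 2.0
p_pd = prob_gfr_positive(2.0, 1)
print(f"OR = {odds_ratio:.2f}, Pr(GFR>0 | HD) = {p_hd:.2f}, Pr(GFR>0 | PD) = {p_pd:.2f}")
```

The odds ratio is the same at every previous GFR value, but the difference in probabilities is not: it is largest where the probability is near one half and shrinks as the previous GFR moves away from that range.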
Occurrence of anuria (second zero)
• No significant difference between two treatment groups
Intensity when GFRij >0 and GFRij-1>0
• Markov assumption did not hold; current GFR depends on the
previous two measurements
• Significant interaction between visit and effect of previous GFR
• The effect of GFRij-1 at visit 1 was clearly different
• Visit 1: log(GFRij) = 0.21 + 0.05 (se 0.04) * therapy + 0.61 * log(GFRij-1)
• Visits 2 to 7: log(GFRij) = -0.19 + 0.04 (se 0.02) * therapy +
0.73 * log(GFRij-1) + 0.18 * log(GFRij-2)
• PD patients have on average exp(0.04) = 1.04 times higher
GFR, given previous GFR responses (p=0.04)
Conclusion from transition model
• PD patients have a higher occurrence probability and have a
slightly higher GFR, after correction for previous GFR
measurements
• PD patients have a higher mean intensity, given the previous
GFR measurement
Marginal means
• E[Yj] can be estimated
• Analytically, using recursively that E[Yj] = E[ E[Yj|Yj-1] ]
• By simulation from the joint distribution
f(yi1) f(yi2|yi1) ... f(yiJ|yiJ-1,…,yi1)
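The simulation route can be sketched with a simplified first-order two-part transition model. Everything numeric below is hypothetical (baseline distribution, coefficients, residual standard deviation), and a single zero is treated as absorbing, which is simpler than the two-zero rule used in the application; the point is only to show how marginal means follow from the conditional models.

```python
import math
import random

def simulate_marginal_means(n_patients=5000, n_visits=6, seed=0):
    """Estimate E[Y_j] for j = 0..n_visits by simulating patient trajectories.

    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    totals = [0.0] * (n_visits + 1)
    for _ in range(n_patients):
        y = math.exp(rng.gauss(1.0, 0.5))          # draw from a hypothetical baseline f(y1)
        totals[0] += y
        for j in range(1, n_visits + 1):
            if y > 0:
                # Occurrence part: Pr(y_j > 0 | y_{j-1}) via a logistic model
                lp = 0.1 + 1.3 * math.log(y)
                if rng.random() < 1 / (1 + math.exp(-lp)):
                    # Intensity part: log(y_j) given y_{j-1} is normal
                    y = math.exp(-0.2 + 0.9 * math.log(y) + rng.gauss(0.0, 0.3))
                else:
                    y = 0.0
            # y == 0 stays 0: zero is absorbing in this simplified sketch
            totals[j] += y
    return [t / n_patients for t in totals]

means = simulate_marginal_means()
print([round(m, 2) for m in means])
```

The averages include the zeros, so they estimate the marginal mean E[Yj] rather than the mean among patients who still have residual renal function.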
Marginal means
Results of collaboration
• Transition models using a two-part mixture distribution can be
used to model data with zeros
• Advantages of this approach:
• Can be performed with standard software
• Yields interpretable parameters
• Corrects for baseline differences
• No paper written yet
• Many thanks to Friedo Dekker, Diana Grootendorst
Conclusions
• Statisticians and medical doctors can talk together, work
together, do research together!!
• Congratulations and many good wishes for the SMCS!!