Sample size and power estimation when
covariates are measured with error
Michael Wallace
London School of Hygiene and Tropical Medicine
Outline
1. Measurement error – what is it and what
problems can it cause?
2. What can we do about it?
3. The problem of power – introducing
autopower
Measurement error – a crash course

Often impossible to measure covariates accurately,
e.g. dietary intake, blood pressure, weight



Instead, we have error-prone observations
How these relate to the underlying true values is our
'measurement error model'
Common model: "classical" error:


Observed = True + Measurement Error
...but other models are available.
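Why does it matter?

Simple linear regression: $Y = \beta_0 + \beta_1 X + \epsilon$

Classical measurement error: $W = X + U$, with $U$ independent of $X$ and $\epsilon$

Regress Y on W to obtain an
estimate of $\lambda \beta_1$ rather than $\beta_1$,
where $\lambda = \sigma_X^2 / (\sigma_X^2 + \sigma_U^2)$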
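
The attenuation of a naive slope under classical error can be checked by simulation. The following is a minimal sketch in Python using only the standard library; the function names, parameter values and seed are illustrative assumptions, not part of the original slides. It fits ordinary least squares once on the true covariate and once on the error-prone one, and compares the naive slope with the true slope multiplied by var(X) / (var(X) + var(U)).

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta0=1.0, beta1=2.0,
                         sd_x=1.0, sd_u=1.0, sd_eps=1.0, seed=1):
    """Return (slope using X, slope using W, theoretical naive slope)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, sd_x) for _ in range(n)]               # true covariate
    w = [xi + rng.gauss(0, sd_u) for xi in x]                # observed = true + error
    y = [beta0 + beta1 * xi + rng.gauss(0, sd_eps) for xi in x]
    attenuation = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
    return ols_slope(x, y), ols_slope(w, y), attenuation * beta1


true_fit, naive_fit, expected_naive = simulate_attenuation()
print(true_fit, naive_fit, expected_naive)
```

With equal variances for the true covariate and the error, as assumed here, the naive slope should land near half the true slope.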
What can we do about it?




Need additional data to tell us about the
measurement error

Validation (accurate measurements on some subjects)

Replication (multiple measurements per subject)
Validation 'best', but replication more practical
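
To illustrate what replication buys us, the sketch below estimates the measurement error variance from two replicates per subject, and from it the share of the observed variance that is due to the true covariate. It assumes classical error with independent, equal-variance errors on the two measurements; the function names and simulated values are hypothetical and chosen only for illustration.

```python
import random
from statistics import fmean, variance


def error_variance_from_replicates(w1, w2):
    """Under classical error, E[(W1 - W2)^2] = 2 * var(U)."""
    return fmean([(a - b) ** 2 for a, b in zip(w1, w2)]) / 2


def reliability_ratio(w1, w2):
    """var(X) / (var(X) + var(U)) for a single error-prone measurement."""
    var_u = error_variance_from_replicates(w1, w2)
    w_bar = [(a + b) / 2 for a, b in zip(w1, w2)]
    # The variance of the subject mean is var(X) + var(U) / 2.
    var_x = variance(w_bar) - var_u / 2
    return var_x / (var_x + var_u)


# Simulated example: var(X) = 1, var(U) = 0.25, so the ratio is 0.8.
rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(5000)]
w1 = [xi + rng.gauss(0, 0.5) for xi in x]
w2 = [xi + rng.gauss(0, 0.5) for xi in x]

var_u_hat = error_variance_from_replicates(w1, w2)
lambda_hat = reliability_ratio(w1, w2)
print(var_u_hat, lambda_hat)
```

In the simple linear case, dividing a naive slope by this ratio gives a moment-based correction; the correction methods discussed next are more general.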
Huge variety of 'correction methods' available to
try and remove bias induced by measurement
error.
Two that are already available in Stata:

Regression calibration (Stata command: rcal)
Simulation extrapolation (Stata command: simex)
Conditional Score




If there is measurement error, then solving estimating
equations as normal will give inconsistent effect
estimates.
Conditional score solves modified estimating
equations to avoid this.
Unlike regression calibration and simulation
extrapolation, it produces consistent effect estimates
for a range of models, including logistic regression.
We have produced cscore for Stata to implement this
method in the case of logistic regression.
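
For orientation, the idea can be written out for the simplest case. The following is a sketch under assumed notation (a single covariate, classical normal error with known variance, logistic outcome model), following the general form given in Carroll et al. (2006); it is not necessarily the exact formulation that cscore implements.

```latex
% Assumed setup: W_i = X_i + U_i, U_i ~ N(0, \sigma_U^2) with \sigma_U^2 known,
% and logit P(Y_i = 1 | X_i) = \beta_0 + \beta_1 X_i.
\begin{align*}
  \Delta_i &= W_i + Y_i \, \sigma_U^2 \, \beta_1
    && \text{(sufficient statistic for } X_i \text{)} \\
  P(Y_i = 1 \mid \Delta_i) &= H\!\left(\beta_0 + \beta_1 \Delta_i
    - \tfrac{1}{2} \beta_1^2 \sigma_U^2\right),
    && H(t) = \frac{1}{1 + e^{-t}} \\
  0 &= \sum_{i=1}^{n} \left\{ Y_i - H\!\left(\beta_0 + \beta_1 \Delta_i
    - \tfrac{1}{2} \beta_1^2 \sigma_U^2\right) \right\}
    \begin{pmatrix} 1 \\ \Delta_i \end{pmatrix}
\end{align*}
```

Because the conditional distribution of Y given Δ does not involve the unobserved X, the last line is an unbiased estimating equation. Δ itself depends on β1, so the equations have to be solved iteratively.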
The problem of power


Measurement error hits us with a 'double whammy':

Bias

Wider confidence intervals
Bias will often remain a problem even if a correction
method is used.

Analytic sample size calculations generally impossible.

Simulation studies the only recourse.

autopower aims to remove the legwork.
autopower in brief

autopower simulates datasets that suffer from
measurement error.

Then sees how methods perform on these datasets.
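
The simulate-then-analyse loop can be sketched as follows. This is an illustrative Python sketch of the general idea only, covering the naive analysis; it is not autopower's code or syntax, and every function name and parameter value is an assumption. Each simulated dataset has a standard normal true covariate, classical error added to it, and a binary outcome from a univariate logistic model. Power is estimated as the proportion of datasets in which a Wald test rejects a zero slope at the 5% level.

```python
import math
import random


def expit(t):
    """Logistic function, with the argument clipped to avoid overflow."""
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(w, y, iters=10):
    """Newton-Raphson fit of logit P(Y=1) = b0 + b1*w; returns (b1, se(b1))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, yi in zip(w, y):
            p = expit(b0 + b1 * wi)
            r, v = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * wi
            h00 += v
            h01 += v * wi
            h11 += v * wi * wi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1, math.sqrt(h00 / det)


def simulated_power(n, beta1, sd_u, n_sims=200, beta0=0.0, seed=1):
    """Share of simulated datasets where the naive Wald test rejects b1 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        w = [xi + rng.gauss(0, sd_u) for xi in x]   # error-prone covariate
        y = [1 if rng.random() < expit(beta0 + beta1 * xi) else 0 for xi in x]
        b1, se1 = fit_logistic(w, y)
        if abs(b1 / se1) > 1.96:
            rejections += 1
    return rejections / n_sims


power_no_error = simulated_power(n=200, beta1=0.5, sd_u=0.0)
power_with_error = simulated_power(n=200, beta1=0.5, sd_u=1.0)
print(power_no_error, power_with_error)
```

Swapping the naive fit for a correction method, and repeating over a grid of sample sizes, gives the kind of comparison described here.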

Variety of methods available:


'naïve', rcal, simex, cscore
Assumes:

Univariate logistic regression

Subjects are measured either once or twice
Example: specific design

“How well should regression calibration perform
on this dataset?”
Example: estimating sample size

“What sample size do I need to achieve 80% power?”
Example: cost minimization




“Obtaining second observations is expensive, can I save
money by considering a design where not everyone is
measured twice?”
User specifies how much more it costs to measure a
subject twice rather than once.
autopower then searches the 'r1-r2' space:

r1 = subjects measured once

r2 = subjects measured twice
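
A search of this kind can be sketched as a grid search over (r1, r2) that keeps the cheapest design reaching the target power. The Python sketch below is illustrative only and is not autopower's algorithm. In particular, `toy_power` is a hypothetical stand-in for a simulation-based power estimate: it simply discounts each subject by an assumed reliability of their (averaged) measurement. Cost is counted in units of one single-measurement subject.

```python
from statistics import NormalDist


def design_cost(r1, r2, cost_ratio):
    """Total cost when a twice-measured subject costs cost_ratio units."""
    return r1 + cost_ratio * r2


def toy_power(r1, r2, effect=0.2, var_u=1.0):
    """Hypothetical placeholder for simulated power (var(X) fixed at 1)."""
    n_eff = r1 / (1.0 + var_u) + r2 / (1.0 + var_u / 2.0)
    return NormalDist().cdf(effect * n_eff ** 0.5 - 1.96)


def cheapest_design(power_fn, cost_ratio, target=0.8, max_n=1000, step=50):
    """Grid search; returns (cost, r1, r2) or None if no design qualifies."""
    best = None
    for r2 in range(0, max_n + 1, step):
        for r1 in range(0, max_n + 1, step):
            if r1 + r2 == 0:
                continue
            if power_fn(r1, r2) >= target:
                cost = design_cost(r1, r2, cost_ratio)
                if best is None or cost < best[0]:
                    best = (cost, r1, r2)
                break  # for this r2, any larger r1 only costs more
    return best


cheap_repeat = cheapest_design(toy_power, cost_ratio=1.2)
dear_repeat = cheapest_design(toy_power, cost_ratio=1.8)
print(cheap_repeat, dear_repeat)
```

Under these toy assumptions, a cheap second measurement favours measuring everyone twice, while an expensive one favours single measurements on more subjects.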
Various tricks for practical speed.
References




General overview: Carroll, R. J., D. Ruppert, L. A. Stefanski, and C.
M. Crainiceanu. 2006. Measurement Error in Nonlinear Models: A
Modern Perspective, Second Edition. Chapman & Hall/CRC.
Regression calibration: Carroll, R. J., and L. A. Stefanski. 1990.
Approximate quasilikelihood estimation in models with surrogate
predictors. Journal of the American Statistical Association 85: 652–663.
Simulation extrapolation: Cook, J. R., and L. A. Stefanski. 1994.
Simulation-extrapolation estimation in parametric measurement error
models. Journal of the American Statistical Association 89: 1314–1328.
Conditional score: Stefanski, L. A., and R. J. Carroll. 1987.
Conditional scores and optimal scores in generalized linear
measurement error models. Biometrika 74: 703–716.