Variances are Not Always Nuisance Parameters
Raymond J. Carroll
Department of Statistics, Texas A&M University
http://stat.tamu.edu/~carroll

Dedicated to the Memory of Shanti S. Gupta
• Head of the Purdue Statistics Department for 20 years
• I was student #11 (1974)

[Map of Texas: West Texas and East Texas; Palo Duro Canyon, the Grand Canyon of Texas; Wichita Falls, my hometown; Guadalupe Mountains National Park; Big Bend National Park; College Station, home of Texas A&M University; I-35 and I-45]

Overview
Main point: there are problems and methods where the variance structure essentially determines the answer.
• Assay validation
• Measurement error
Other examples mentioned briefly:
• Logistic mixed models
• Quality technology
• DNA microarrays for gene expression (Fisher!)

Variance Structure
My definition: variance structure encompasses
• Systematic dependence of variability on known factors
• Random effects: their inclusion, exclusion, or dependence on covariates
My point:
• Variance structure can be important in itself
• Variance structure can have a major impact on downstream analyses

Collaborators on This Talk
• Statistics: David Ruppert (who also works with me outside the office)
• Assays: Marie Davidian, Devan Devanarayan, Wendell Smith
• Measurement error: Larry Freedman, Victor Kipnis, Len Stefanski

Acknowledgments
Matt Wand, Naisyin Wang, Peter Hall, Mitchell Gail, Alan Welsh, Xihong Lin (who nominated me!)

Assay Validation
• Immunoassays: used to estimate concentrations in plasma samples from outcomes
  • Intensities
  • Counts
• Calibration problem: predict X from Y
• My goal: to show you that cavalier handling of variances leads to wrong answers in real life
• David Finney anticipates just this point

Assay Validation
• "Here the weighted analysis has also disclosed evidence of invalidity"
• "This needs to be known and ought not to be concealed by imperfect analysis"
(David Finney is the author of a classic text)

Assay Validation
• Assay validation is an important facet of the drug development process
• One goal: find a working range of concentrations for which the assay has
  • small bias (< 30%, say)
  • small coefficient of variation (< 20%, say)
(Wendell Smith motivated this work)

Assay Validation: The Data
These data are from a paper by M. O'Connell, B. Belanger and P. Haaland, Chemometrics and Intelligent Laboratory Systems (1993).

Assay Validation
• Main trends (unweighted and weighted fits): any method will do
• Typical to fit a 4-parameter logistic model:
  E(Y|X) = f(X, β) = β2 + (β1 − β2) / (1 + (X/β3)^β4)

Assay Validation: Unweighted Prediction Intervals
[Figure: unweighted fit with prediction intervals]

Assay Validation
• The data exhibit heteroscedasticity
• Typical to model the variance as a power of the mean:
  var(Y|X) ∝ [E(Y|X)]^θ, most often 1 ≤ θ ≤ 2
(David Rodbard (L) and Peter Munson (R) in 1978 proposed the 4-parameter logistic for assays)

Assay Validation: Weighted Prediction Intervals
[Figure: weighted fit with prediction intervals]
(Marie Davidian and David Giltinan have written extensively on this topic)
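To make the weighting concrete, here is a minimal sketch (not the analysis from the talk) of fitting the 4-parameter logistic with the weights implied by a power-of-the-mean variance model. The concentration/response values, the starting values, and the fixed power theta = 1 are all hypothetical stand-ins; in practice the power is usually estimated along with the curve.

```python
# A minimal sketch (not the speaker's code): weighted 4PL calibration fit.
# Assumes var(Y|X) ∝ E(Y|X)**theta; the data and theta = 1 are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def fpl(x, b1, b2, b3, b4):
    """Four-parameter logistic: E(Y|X) = b2 + (b1 - b2)/(1 + (x/b3)**b4)."""
    return b2 + (b1 - b2) / (1.0 + (x / b3) ** b4)

# Hypothetical concentration/response pairs standing in for real assay data.
conc = np.array([5., 10., 50., 100., 500., 1000., 5000., 10000.])
resp = np.array([0.95, 0.93, 0.80, 0.65, 0.35, 0.22, 0.10, 0.08])

theta = 1.0                      # assumed power in var(Y|X) ∝ mean**theta
beta = [1.0, 0.05, 500.0, 1.0]   # rough starting values (b1, b2, b3, b4)
bounds = (1e-8, np.inf)          # keep all four parameters positive

# Stage 1: unweighted fit gives preliminary fitted means.
beta, _ = curve_fit(fpl, conc, resp, p0=beta, bounds=bounds)

# Stage 2+: iteratively reweight with sigma ∝ mean**(theta/2) and refit.
for _ in range(3):
    sigma = fpl(conc, *beta) ** (theta / 2.0)
    beta, _ = curve_fit(fpl, conc, resp, p0=beta, sigma=sigma, bounds=bounds)

print("weighted 4PL estimates (b1, b2, b3, b4):", np.round(beta, 3))
```

With a weighted fit in hand, the calibration step inverts the curve to predict X from an observed Y, and the CV of those predictions drives the working-range discussion on the next slides.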
Assay Validation: Working Range
• Goal: predict X from observed Y
• Working range (WR): the range of concentrations where the CV < 20%
• Validation experiments (accuracy and precision): done on the working range
• If the WR is shifted away from small concentrations: the assay is never validated for those small concentrations
• No success, even if you try (see the %-recovery plots)

Assay Validation: Variances Matter
• No weighting: LQL = 1,057, UQL = 9,505
• Weighting: LQL = 84, UQL = 3,866
[Figure: CV versus concentration, with the LQL and UQL marked for the unweighted and weighted analyses]

Working Ranges for Different Variance Functions
[Figure: fitted curves (optical density versus concentration), unweighted and weighted, with the corresponding working ranges marked]

Assay Validation: % Recovery
• Goal: predict X from observed Y
• Measure: X̂ / X = % recovered
• Want the confidence interval to be within 30% of the actual concentration
(Devan Devanarayan, my statistical grandson, organized this example)

Assay Validation: % Recovery
• Note: acceptable ranges (IL-10 validation experiment) depend on accounting for variability
[Figure: % recovery with 90% C.I. versus true concentration, unweighted and weighted panels]

Assay Validation: Summary
• Accounting for changing variability is pointless if the interest is merely in fitting the curve
• In other contexts, standard errors actually matter (power is important after all!)
• The gains in precision from a weighted analysis can change conclusions about statistical significance
• Accounting for changing variability is crucial if you want to solve the problem
• Concentrations for which the assay can be used depend strongly on a model for variability

The Structure of Measurement Error
• Measurement error has an enormous literature; see Wayne Fuller's 1987 text
• Hundreds of papers on the structure for covariates: W = X + e
• Here X = "truth", W = "observed"; X is a latent variable

The Structure of Measurement Error
• For most regressions, if X is the only predictor and W = X + e, then
  • parameter estimates are biased when the error is ignored
  • power is lost (my focus today)

The Structure of Measurement Error
• My point: the simple measurement error model W = X + e is too simple
• A different variance structure suggests different conclusions

The Structure of Measurement Error
• Nutritional epidemiology: dietary intake measured via food frequency questionnaires (FFQ)
• Prospective studies: none have found a statistically significant fat intake effect on breast cancer
• Controversy in post-hoc power calculations: what is the power to detect such an effect?
(Ross Prentice has written extensively on this topic)

Dietary Intake Data
• The essential quantity controlling power is the attenuation
• Let Q = FFQ and X = "long-term dietary intake"
• Attenuation λ = slope of the regression of X on Q = % of the variation in Q that is due to true intake (100% is good, 0% is bad)
• Sample size needed for fixed power can be thought of as proportional to λ^(−2)
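As a small illustration of the bias and power loss noted on the W = X + e slides above, and of the attenuation λ just defined, here is a hedged simulation sketch; the sample size, effect size, and variances are invented for illustration only.

```python
# A hedged simulation sketch (made-up effect size and variances): classical
# error W = X + e attenuates the estimated slope by about
# lambda = var(X)/(var(X) + var(e)) and costs power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, true_slope = 300, 1000, 0.2
var_x, var_e = 1.0, 1.0                   # assumed: error variance equals signal variance
lam = var_x / (var_x + var_e)             # theoretical attenuation = 0.5

b_x, b_w, p_x, p_w = [], [], [], []
for _ in range(reps):
    x = rng.normal(0.0, np.sqrt(var_x), n)        # true exposure
    w = x + rng.normal(0.0, np.sqrt(var_e), n)    # error-prone measurement
    y = true_slope * x + rng.normal(0.0, 1.0, n)  # outcome depends on true x
    fit_x = stats.linregress(x, y)                # regression using the truth
    fit_w = stats.linregress(w, y)                # regression ignoring the error
    b_x.append(fit_x.slope); p_x.append(fit_x.pvalue)
    b_w.append(fit_w.slope); p_w.append(fit_w.pvalue)

print(f"mean slope using X: {np.mean(b_x):.3f}   (truth = {true_slope})")
print(f"mean slope using W: {np.mean(b_w):.3f}   (~ lambda * truth = {lam * true_slope:.3f})")
print(f"power using X: {np.mean(np.array(p_x) < 0.05):.2f}")
print(f"power using W: {np.mean(np.array(p_w) < 0.05):.2f}")
```

Holding power fixed, the required sample size grows roughly like λ^(−2), which is why the attenuation is the essential quantity on the preceding slide.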
Post hoc Power Calculation
• FFQ (Q): known to be biased
• F: a "reference instrument" thought to be unbiased (but much more expensive than Q), F = X + e
• F = a 24-hour recall or some type of diary
• Then λ = slope of the regression of F on Q
(Larry Freedman has done fundamental work on dietary instrument validation)

Post hoc Power Calculation
• If the "reference instrument" is unbiased, then one can estimate
  • the attenuation
  • the mean of X
  • the variance of X
  • the power in the study at hand
• Many, many papers assume that the reference instrument is unbiased in this way
• Plenty of power
(Walt Willett: a leader in nutritional epidemiology)

Dietary Intake Data
• The attenuation λ ≈ 0.30 for absolute amounts, ≈ 0.50 for food composition
• Remember, the attenuation is the % of variability that is not noise
• All based on the validity of the reference instrument F = X + e
• Pearson and Cochran now weigh in

The Structure of Measurement Error
• 1902: "On the mathematical theory of errors of judgment," Karl Pearson
• Interested in the nature of errors of measurement when the quantity is fixed and definite, while the measuring instrument is a human being
• Individuals bisected lines of unequal length freehand; the errors were recorded

The Structure of Measurement Error
• FFQs are also self-report, so Pearson's findings have relevance today
  • Individuals were biased
  • Biases varied from individual to individual

Measurement Error Structure
• Classic 1968 Technometrics paper by William G. Cochran
• Used Pearson's paper
• Suggested an error model that had systematic and random biases
• This structure seems to fit dietary self-report instruments

Measurement Error Structure: Cochran
• Fij = aF + bF·Xij + rFi + eFij, with rFi ~ Normal(0, s²Fr)
• We call rFi the "person-specific bias"
• We call bF the "group-level bias"
• Similarly, for the FFQ: Qij = aQ + bQ·Xij + rQi + eQij, with rQi ~ Normal(0, s²Qr)

Measurement Error Structure
• The horror: the model is unidentified
• Sensitivity analyses suggest the potential that measurement error causes a much greater loss of power than previously suggested
• Needed: unbiased measures of intake — biomarkers
  • Protein via urinary nitrogen
  • Calories via doubly-labeled water

Biomarker Data
• Protein: available from a number of European studies
• Calories and protein: available from NCI's OPEN study
• Results are stunning
(Victor Kipnis was the driving force behind OPEN)

Biomarker Data: Attenuations
[Figure: attenuations for protein (and for calories and protein density in OPEN), biomarker versus standard analysis, across OPEN-%P, OPEN-C, OPEN-P, UK Diary, UK WFR, and EPIC #1–#5]

Biomarker Data: Sample Size Inflation
[Figure: sample size inflation factors for protein (and for calories and protein density in OPEN), across the same studies]

Measurement Error Structure
• The variance structure of the FFQ and other self-report instruments appears to have individual-level biases — the Pearson and Cochran model
• Ignoring this leads to overestimation of power and underestimation of sample size
• It may not be possible to understand the effect of total intakes; food composition is more hopeful
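A hedged sketch of the kind of sensitivity analysis just described (the variance components, biases, and the correlation between the two person-specific biases are invented for illustration, not estimates from OPEN, EPIC, or any other study): when the self-report "reference" F has its own person-specific bias correlated with the FFQ's, the usual attenuation estimate (slope of F on Q) looks more favorable than the true attenuation (slope of X on Q), which is how power gets overstated.

```python
# A hedged sketch (invented variance components, not estimates from any study):
# under the Pearson/Cochran structure, a self-report "reference" F whose
# person-specific bias is correlated with the FFQ's makes the naive
# attenuation estimate (slope of F on Q) too optimistic.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                                       # large n: sample slopes ~ population slopes

var_x = 1.0                                       # true long-term intake X
a_q, b_q, var_rq, var_eq = 0.0, 0.6, 0.5, 1.0     # FFQ:       Q = aQ + bQ*X + rQ + eQ
a_f, b_f, var_rf, var_ef = 0.0, 1.0, 0.3, 0.5     # reference: F = aF + bF*X + rF + eF
rho = 0.5                                         # corr(person-specific biases rQ, rF)

x = rng.normal(0.0, np.sqrt(var_x), n)
z1, z2 = rng.normal(0.0, 1.0, (2, n))
r_q = np.sqrt(var_rq) * z1                        # FFQ person-specific bias
r_f = np.sqrt(var_rf) * (rho * z1 + np.sqrt(1 - rho**2) * z2)   # correlated with r_q
q = a_q + b_q * x + r_q + rng.normal(0.0, np.sqrt(var_eq), n)
f = a_f + b_f * x + r_f + rng.normal(0.0, np.sqrt(var_ef), n)

def slope(num, den):
    """Least-squares slope from regressing num on den."""
    return np.cov(num, den)[0, 1] / np.var(den, ddof=1)

print(f"true attenuation  (slope of X on Q): {slope(x, q):.2f}")   # ~0.32 here
print(f"naive attenuation (slope of F on Q): {slope(f, q):.2f}")   # ~0.43: too optimistic
```

A larger apparent attenuation means a smaller apparent sample-size inflation, matching the "overestimation of power, underestimation of sample size" summary above.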
Other Examples of Variance Structure
• Nonlinear and generalized linear mixed models (NLMIX and GLIMMIX)
• Quality technology: robust parameter design
• Microarrays

Nonlinear Mixed Models
• Mixed models have random effects; typical to assume normality
• Robustness to normality has been a major concern
• Many now conclude that this is not that major an issue
• There are exceptions!!

Logistic Mixed Models
• Heagerty & Kurland (2001): "Estimated regression coefficients for cluster-level covariates can be highly sensitive to assumptions about whether the variance of a random intercept depends on a cluster-level covariate," i.e., heteroscedastic random effects or variance structure
(Patrick Heagerty)

Logistic Mixed Models
• Heagerty (Biometrics 1999, Statistical Science 2000, Biometrika 2001); see also Zeger, Liang & Albert (1988), Neuhaus & Kalbfleisch (1991) and Breslow & Clayton (1993)
• Gender is a cluster-level variable
• Allowing cluster-level variability to depend on gender results in a large change in the estimated gender regression coefficient and p-value
• Marginal contrasts can be derived and are less sensitive
• In the presence of variance structure, regression coefficients alone cannot be interpreted marginally

Robust Parameter Design
• "The Taguchi Method"
• From Wu and Hamada: it "aims to reduce the variation of a system by choosing the setting of control factors to make it less sensitive to noise variation"
• Set the target, optimize the variance
(Jeff Wu and Mike Hamada's text is an excellent introduction)

Robust Parameter Design
• Modeling variability is an intrinsic part of the method
  • Maximizing the signal-to-noise ratio (Taguchi)
  • Modeling location and dispersion separately
  • Modeling location and then minimizing the transmitted variance
• These ideas are used in optimizing assays, among many other problems

Robust Parameter Design: Microarrays for Gene Expression
• cDNA and oligo microarrays have attracted immense interest
• Multiple steps (sample preparation, imaging, etc.) affect the quality of the results
• The processes could clearly benefit from robust parameter design (Kerr & Churchill)
(R. A. Fisher)

Robust Parameter Design: Microarrays
• Experiment (oligo arrays): 28 rats given different diets (corn oil, fish oil and olive oil enhanced)
• 15 rats have duplicated arrays
• How much of the variability in gene expression is due to the array?
• We have consistently found that 2/3 of the variability is noise within animal rather than between animal

Intraclass Correlations r in the Nutrition Data Set
• Simulated ICC for 8,000 independent genes with common r = 0.35
• Estimated ICC for 8,000 genes from mixed models
• Clearly, more control of noise via robust parameter design has the potential to impact power for analyses

Conclusion
My definition: variance structure encompasses
• Systematic dependence of variability on known factors
• Random effects: their inclusion or exclusion
My point:
• Variance structure can be important in itself
• Variance structure can have a major impact on downstream analyses

And Finally
At the Falls on the Wichita River, West Texas
I'm really happy to be on the faculty at A&M (and to be the Fisher Lecturer!)
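As a closing technical footnote, here is a minimal sketch of the intraclass correlation computation behind the microarray slides, assuming a one-way random-effects model for the duplicate arrays and simulating 8,000 independent genes with a common r = 0.35; the numbers mirror the slide, but the code itself is not from the talk.

```python
# A minimal sketch (not the talk's analysis): the one-way ANOVA estimator of
# the intraclass correlation r = between-animal variance / total variance,
# applied to simulated duplicate arrays, mirroring the "common r = 0.35,
# 8,000 genes, duplicate arrays" setup on the slides.
import numpy as np

def icc_oneway(y):
    """ICC(1) moment estimator for an (n_animals x k) array of replicate arrays."""
    n, k = y.shape
    grand = y.mean()
    msb = k * np.sum((y.mean(axis=1) - grand) ** 2) / (n - 1)               # between-animal MS
    msw = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))  # within-animal MS
    return (msb - msw) / (msb + (k - 1) * msw)

rng = np.random.default_rng(0)
n_genes, n_animals, true_icc = 8000, 15, 0.35

iccs = []
for _ in range(n_genes):
    animal = rng.normal(0.0, np.sqrt(true_icc), n_animals)              # between-animal signal
    noise = rng.normal(0.0, np.sqrt(1.0 - true_icc), (n_animals, 2))    # within-animal noise
    iccs.append(icc_oneway(animal[:, None] + noise))

iccs = np.array(iccs)
print(f"median estimated ICC: {np.median(iccs):.2f}")            # roughly 0.35
print(f"middle 80% of estimates: {np.percentile(iccs, 10):.2f} "
      f"to {np.percentile(iccs, 90):.2f}")                       # gene-to-gene spread
```

The spread across genes reflects estimation with only 15 duplicated animals per gene.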