Bayesian Methods for Density and Regression Deconvolution

Raymond J. Carroll
Texas A&M University and University of Technology Sydney
http://stat.tamu.edu/~carroll

Co-Authors: Bani Mallick, Abhra Sarkar, John Staudenmayer, Debdeep Pati

Longtime Collaborators in Deconvolution: Peter Hall, Aurore Delaigle, Len Stefanski

Overview
• My main application interest is in nutrition
• Nutritional intake is necessarily multivariate
• Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat
• To affect public health practice, nutritionists have developed scores that characterize how well one eats
• Healthy Eating Index, DASH score, Mediterranean score, etc.

Overview
• One day of French fries/chips will not kill you
• It is your long-term average pattern that is important
• In population public health science, long-term averages cannot be measured
• The best you can get is some version of self-report, e.g., multiple 24-hour recalls
• This fact has been the driver behind much of measurement error modeling, especially including density deconvolution

Overview
• Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc.
• My long-term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros

Why it Matters
• What % of U.S. kids have alarmingly bad diets?
  • Ignore measurement error: 28%
  • Account for it: 8%
• What are the relative rates of colon cancer for those with an HEI score of 70 versus those with a score of 40?
  • Ignore measurement error: a 10% decrease
  • Account for it: a 35% decrease

Overview
• We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors
• The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology
• For the multivariate case, computation is “Bayesian”
• Eventually, though, anything random is assumed to be Gaussian
• Can we not do better?

Background
• In the classical measurement error / deconvolution problem, there is a variable, X, that is not observable
• Instead, a proxy for it, W, is observed
• In the density problem, the goal is to estimate the density of X using only observations on W
• Also, in population science contexts, the distribution of X given covariates Z is important (there is a very small literature on this)

Background
• In the regression problem, there is a response Y
• One goal is to estimate E(Y | X)
• Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters

Background
• In the classical problem, W = X + U, with U independent of X
• Deconvoluting kernel methods that give consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan and Carroll)
• They are kernel density estimates with a deconvoluting kernel function $K_{\text{decon}}(x)$

Background
• In the classical problem, W = X + U, with U independent of X
• The deconvoluting kernel is a corrected score for an ordinary kernel density estimate, with the property that for a bandwidth h,
  $E[K_{\text{decon}}\{(W - x_0)/h\} \mid X] = K\{(X - x_0)/h\}$
• There are lots of results on rates of convergence, etc.
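To make the corrected-score property concrete, here is a minimal R sketch of the classical deconvoluting kernel density estimator. It is an illustration, not the authors' code: it assumes Laplace(0, b) measurement error, for which the deconvoluting kernel corresponding to a Gaussian K has the closed form $K_{\text{decon}}(t) = \phi(t)\{1 + (b^2/h^2)(1 - t^2)\}$, and the bandwidth h is supplied by hand rather than chosen by a selector.

  # Minimal sketch (not the authors' code): deconvoluting kernel density
  # estimate for W = X + U, assuming U ~ Laplace(0, b) and a Gaussian kernel
  dkde <- function(x0, W, h, b) {
    # closed-form deconvoluting kernel for Laplace error
    Kdec <- function(t) dnorm(t) * (1 + (b^2 / h^2) * (1 - t^2))
    # ordinary kernel density estimate form, with K replaced by K_decon
    sapply(x0, function(x) mean(Kdec((x - W) / h)) / h)
  }

  # toy usage: X is a two-component normal mixture, U is Laplace(0, 0.3)
  set.seed(1)
  n <- 500
  X <- ifelse(runif(n) < 0.5, rnorm(n, -1, 0.5), rnorm(n, 1, 0.5))
  U <- sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = 1 / 0.3)
  fhat <- dkde(seq(-3, 3, by = 0.1), W = X + U, h = 0.4, b = 0.3)

Note that fhat can dip slightly below zero: $K_{\text{decon}}$ is not itself a density, which is one of the practical quirks of deconvoluting kernel methods.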
Background
• There is an R package called decon
• However, a paper to appear by A. Delaigle discusses problems with the package’s bandwidth selectors
• Her web site has Matlab code for the case where the measurement error is independent of X, including bandwidth selection

Problem Considered Here
• Here is a general class of models. First, W and X:
  $W_{ij} = X_i + U_{ij}(X_i)$
  $E\{U_{ij}(X_i) \mid X_i\} = 0$
  $\mathrm{var}\{U_{ij}(X_i) \mid X_i\} = \sigma_u^2(X_i)$
• The W’s are independent given X

Background
• There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu)
• The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)

Problem Considered Here
• Here is a general class of models. Next, Y:
  $Y_i = g(X_i) + \varepsilon_i(X_i)$
  $E\{\varepsilon_i(X_i) \mid X_i\} = 0$
  $\mathrm{var}\{\varepsilon_i(X_i) \mid X_i\} = \sigma_\varepsilon^2(X_i)$
• This is the classical heteroscedastic model, where the variance is important
• Identified if there are 2 replicate W’s

Background
• The econometric literature invariably uses sieves with orthogonal basis functions
• The theory follows X. Shen’s 1997 paper

Background
• In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions
• Constraints (such as being positive and integrating to 1 for densities) are often ignored
• In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around)
• Maybe use multivariate Hermite series?

Problem Considered Here
• There is no deconvoluting kernel method that does density or regression deconvolution in the context where the distribution of the measurement error depends on X

Problem Considered Here
• It seems to me that there are two ways to handle this problem in general:
  • Sieves (be an econometrician)
  • Bayesian with flexible models
• Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach

Model Formulation
• We borrow from Hu and Schennach’s example, and also Staudenmayer, Ruppert and Buonaccorsi:
  $W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$
  $Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$
• Here, U is assumed independent of X
• Also, ε is independent of X

Model Formulation
• Our model is
  $W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$
  $Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$
• Like previous authors, we model $s_\varepsilon(X_i)$ and $s_u(X_i)$ as B-splines with positive coefficients; a generative sketch follows these slides
• We model $g(X_i)$ as a B-spline
• As frequentists, we could model the densities of X, U, and ε by sieves, and appeal to Hu and Schennach for theory
• We have not investigated this

Model Formulation
• Our model is
  $W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$
  $Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$
• As Bayesians, we have modeled the densities of X, U, and ε by Dirichlet process mixture models (DPMMs)
• We have found that mixtures of normals, with an unknown number of components, are much faster, just as effective, and very stable numerically

Model Formulation
• We found that fixing the number of components at a largish number works best
• The method then concentrates on a lower number of components (Rousseau and Mengersen found this in a non-measurement error context)
• There are lots of issues involved: (a) starting values; (b) hyperparameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
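As a concrete illustration of the model formulation above, here is a minimal generative R sketch. It is my own illustration rather than the authors' implementation: the variance function $s_u(\cdot)$ is a cubic B-spline with positivity enforced by exponentiating unconstrained coefficients, and the densities of X, U and ε are taken as plain normals purely to simulate data (the actual method puts flexible mixture-of-normals models on all three).

  # Minimal generative sketch (illustration only) of
  #   W_ij = X_i + s_u^{1/2}(X_i) U_ij,  Y_i = g(X_i) + s_eps^{1/2}(X_i) eps_i
  # with s_u(.) a B-spline restricted to positive coefficients
  library(splines)

  s_u <- function(x, beta, knots) {
    B <- bs(x, knots = knots, degree = 3, intercept = TRUE)  # B-spline basis
    drop(B %*% exp(beta))   # exp() keeps every spline coefficient positive
  }

  set.seed(2)
  n <- 200; m <- 3                      # n subjects, m replicate W's each
  X <- rnorm(n)
  knots <- quantile(X, 1:5 / 6)         # 5 interior knots
  beta <- rnorm(5 + 3 + 1, sd = 0.3)    # basis dimension = knots + degree + 1
  g <- function(x) sin(pi * x / 2)      # illustrative regression function
  W <- X + sqrt(s_u(X, beta, knots)) * matrix(rnorm(n * m), n, m)
  Y <- g(X) + sqrt(0.2 + 0.1 * X^2) * rnorm(n)  # a simple s_eps, for show

In the actual method, g would also be a B-spline, and the spline coefficients, mixture parameters and latent $X_i$ would all be updated within one MCMC sampler.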
Model Formulation
• Here is a simulation example of density deconvolution with homoscedasticity, using a mixture of normals for X and a Laplace distribution for U
• The settings come from a paper not by us
• There are 3 replicates, so the density of U is also estimated by our method (we let DKDE know the truth)
• I ran our R code as is, with no fine tuning

Model Formulation
[Figure: estimated densities for the simulation example]

Model Formulation
• Here is another example
• Y = sodium intake as measured by a food frequency questionnaire (known to be biased)
• W = the same thing, but measured by a 24-hour recall (known to be almost unbiased)
• We have R code for this

Model Formulation
[Figure: estimated regression function; the dashed line is the Y = X line, indicating the bias of the FFQ]

Multivariate Deconvolution
• There are also multivariate problems of density deconvolution
• We have found 4 papers about this:
  • 3 deconvoluting kernel papers, all of which assume the density of the measurement errors is known; 1 of those papers has a bandwidth selector
  • Bovy et al. (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix; they use an EM algorithm

Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
  $W_{ijk} = X_{ij} + s_{u,j}^{1/2}(X_{ij})\,U_{ijk}$
• Again, X is a mixture of multivariate normals, as is U
• However, standard multivariate inverse Wishart computations fail miserably

Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
  $W_{ijk} = X_{ij} + s_{u,j}^{1/2}(X_{ij})\,U_{ijk}$
• We use a factor analytic representation of the component-specific covariance matrices, with sparsity-inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson); see the sketch after the Conclusion
• This is crucial in flexibly lowering the dimension of the covariance matrices

Multivariate Deconvolution
[Figure: multivariate inverse Wisharts (MIW, blue) on top, latent factor model (MLFA, green) on bottom; variables are (a) carbs, (b) fiber, (c) protein, (d) potassium]

Conclusion
• I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error
• Dimension reduction and flexible models seem a practical way to go
• Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence
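As promised in the multivariate deconvolution slides, here is a minimal R sketch of the factor-analytic covariance representation. The notation and hyperparameter values are mine and purely illustrative, not taken from the paper: each component covariance is written as $\Omega = \Lambda\Lambda^\top + \mathrm{diag}(\sigma^2)$ with a p × q loading matrix Λ, q ≪ p, and a multiplicative gamma shrinkage prior in the style of Bhattacharya and Dunson pulls the later columns of Λ toward zero (the local precision terms of the full prior are omitted for brevity).

  # Minimal sketch (illustrative hyperparameters) of one component-specific
  # covariance matrix under a factor-analytic representation:
  #   Omega = Lambda %*% t(Lambda) + diag(sigma2),  Lambda is p x q, q << p
  set.seed(3)
  p <- 4; q <- 2                      # e.g., 4 nutrients, 2 latent factors

  # multiplicative gamma shrinkage: column precisions tau_h = delta_1 ... delta_h
  # increase with h, so later columns of Lambda are shrunk toward zero
  delta <- c(rgamma(1, shape = 2, rate = 1), rgamma(q - 1, shape = 3, rate = 1))
  tau <- cumprod(delta)
  Lambda <- matrix(rnorm(p * q, sd = rep(1 / sqrt(tau), each = p)), p, q)
  sigma2 <- 1 / rgamma(p, shape = 1, rate = 0.3)  # idiosyncratic variances

  Omega <- Lambda %*% t(Lambda) + diag(sigma2)    # positive definite by design
  # Omega has p * (p + 1) / 2 free entries, but the factor form uses only
  # p * q + p parameters, which is the saving that makes dimension 19 feasible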