Bayesian Methods for Density and Regression Deconvolution

Raymond J. Carroll
Texas A&M University and University of Technology Sydney
http://stat.tamu.edu/~carroll

Co-Authors: Bani Mallick, Abhra Sarkar, John Staudenmayer, Debdeep Pati

Longtime Collaborators in Deconvolution: Peter Hall, Aurore Delaigle, Len Stefanski

Overview
• My main application interest is in nutrition
• Nutritional intake is necessarily multivariate
• Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat
• To affect public health practice, nutritionists have developed scores that characterize how well one eats
• Healthy Eating Index, DASH score, Mediterranean score, etc.

Overview
• One day of French fries/chips will not kill you
• It is your long-term average pattern that is important
• In population public health science, long-term averages cannot be measured
• The best you can get is some version of self-report, e.g., multiple 24-hour recalls
• This fact has been the driver behind much of measurement error modeling, especially including density deconvolution

Overview
• Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc.
• My long-term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros

Why it Matters
• What % of U.S. kids have alarmingly bad diets?
  • Ignore measurement error: 28%
  • Account for it: 8%
• What are the relative rates of colon cancer for those with an HEI score of 70 versus those with a score of 40?
  • Ignore measurement error: a 10% decrease
  • Account for it: a 35% decrease

Overview
• We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors
• The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology
• For the multivariate case, computation is “Bayesian”
• Eventually, though, anything random is assumed to be Gaussian
• Can we not do better?

Background
• In the classical measurement error / deconvolution problem, there is a variable, X, that is not observable
• Instead, a proxy for it, W, is observed
• In the density problem, the goal is to estimate the density of X using only observations on W
• Also, in population science contexts, the distribution of X given covariates Z is important (there is a very small literature on this)

Background
• In the regression problem, there is a response Y
• One goal is to estimate E(Y | X)
• Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters

Background
• In the classical problem, W = X + U, with U independent of X
• Deconvoluting kernel methods that give consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan and Carroll)
• They are kernel density estimates with a deconvoluting kernel function $K_{\text{decon}}(x)$

Background
• In the classical problem, W = X + U, with U independent of X
• The deconvoluting kernel is a corrected score for an ordinary kernel density estimate, with the property that for a bandwidth h,
  $E[K_{\text{decon}}\{(W - x_0)/h\} \mid X] = K\{(X - x_0)/h\}$
• There are lots of results on rates of convergence, etc.
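To make the corrected-score property concrete, here is a minimal R sketch of the classical deconvoluting kernel density estimator. It is an illustration, not the authors' code: it assumes Laplace(0, b) measurement error, for which the deconvoluting kernel corresponding to a Gaussian K has the closed form $K_{\text{decon}}(t) = \phi(t)\{1 + (b^2/h^2)(1 - t^2)\}$, and the bandwidth h is supplied by hand rather than chosen by a selector.

  # Minimal sketch (not the authors' code): deconvoluting kernel density
  # estimate for W = X + U, assuming U ~ Laplace(0, b) and a Gaussian kernel
  dkde <- function(x0, W, h, b) {
    # closed-form deconvoluting kernel for Laplace error
    Kdec <- function(t) dnorm(t) * (1 + (b^2 / h^2) * (1 - t^2))
    # ordinary kernel density estimate form, with K replaced by K_decon
    sapply(x0, function(x) mean(Kdec((x - W) / h)) / h)
  }

  # toy usage: X is a two-component normal mixture, U is Laplace(0, 0.3)
  set.seed(1)
  n <- 500
  X <- ifelse(runif(n) < 0.5, rnorm(n, -1, 0.5), rnorm(n, 1, 0.5))
  U <- sample(c(-1, 1), n, replace = TRUE) * rexp(n, rate = 1 / 0.3)
  fhat <- dkde(seq(-3, 3, by = 0.1), W = X + U, h = 0.4, b = 0.3)

Note that fhat can dip slightly below zero: $K_{\text{decon}}$ is not itself a density, which is one of the practical quirks of deconvoluting kernel methods.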
Background
• There is an R package called decon
• However, a paper to appear by A. Delaigle discusses problems with the package’s bandwidth selectors
• Her web site has Matlab code for the case where the measurement error is independent of X, including bandwidth selection

Problem Considered Here
• Here is a general class of models. First, W and X:
  $W_{ij} = X_i + U_{ij}(X_i)$
  $E\{U_{ij}(X_i) \mid X_i\} = 0$
  $\mathrm{var}\{U_{ij}(X_i) \mid X_i\} = \sigma_u^2(X_i)$
• The W’s are independent given X

Background
• There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu)
• The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)

Problem Considered Here
• Here is a general class of models. Next, Y:
  $Y_i = g(X_i) + \varepsilon_i(X_i)$
  $E\{\varepsilon_i(X_i) \mid X_i\} = 0$
  $\mathrm{var}\{\varepsilon_i(X_i) \mid X_i\} = \sigma_\varepsilon^2(X_i)$
• This is the classical heteroscedastic model, where the variance is important
• Identified if there are 2 replicate W’s

Background
• The econometric literature invariably uses sieves with orthogonal basis functions
• The theory follows X. Shen’s 1997 paper

Background
• In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions
• Constraints (such as being positive and integrating to 1 for densities) are often ignored
• In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around)
• Maybe use multivariate Hermite series?

Problem Considered Here
• There is no deconvoluting kernel method that does density or regression deconvolution in the context where the distribution of the measurement error depends on X

Problem Considered Here
• It seems to me that there are two ways to handle this problem in general:
  • Sieves (be an econometrician)
  • Bayesian with flexible models
• Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach

Model Formulation
• We borrow from Hu and Schennach’s example, and also Staudenmayer, Ruppert and Buonaccorsi:
  $W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$
  $Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$
• Here, U is assumed independent of X
• Also, ε is independent of X

Model Formulation
• Our model is
  $W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$
  $Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$
• Like previous authors, we model $s_\varepsilon(X_i)$ and $s_u(X_i)$ as B-splines with positive coefficients; a generative sketch follows these slides
• We model $g(X_i)$ as a B-spline
• As frequentists, we could model the densities of X, U, and ε by sieves, and appeal to Hu and Schennach for theory
• We have not investigated this

Model Formulation
• Our model is
  $W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$
  $Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$
• As Bayesians, we have modeled the densities of X, U, and ε by Dirichlet process mixture models (DPMMs)
• We have found that mixtures of normals, with an unknown number of components, are much faster, just as effective, and very stable numerically

Model Formulation
• We found that fixing the number of components at a largish number works best
• The method then concentrates on a lower number of components (Rousseau and Mengersen found this in a non-measurement error context)
• There are lots of issues involved: (a) starting values; (b) hyperparameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
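As a concrete illustration of the model formulation above, here is a minimal generative R sketch. It is my own illustration rather than the authors' implementation: the variance function $s_u(\cdot)$ is a cubic B-spline with positivity enforced by exponentiating unconstrained coefficients, and the densities of X, U and ε are taken as plain normals purely to simulate data (the actual method puts flexible mixture-of-normals models on all three).

  # Minimal generative sketch (illustration only) of
  #   W_ij = X_i + s_u^{1/2}(X_i) U_ij,  Y_i = g(X_i) + s_eps^{1/2}(X_i) eps_i
  # with s_u(.) a B-spline restricted to positive coefficients
  library(splines)

  s_u <- function(x, beta, knots) {
    B <- bs(x, knots = knots, degree = 3, intercept = TRUE)  # B-spline basis
    drop(B %*% exp(beta))   # exp() keeps every spline coefficient positive
  }

  set.seed(2)
  n <- 200; m <- 3                      # n subjects, m replicate W's each
  X <- rnorm(n)
  knots <- quantile(X, 1:5 / 6)         # 5 interior knots
  beta <- rnorm(5 + 3 + 1, sd = 0.3)    # basis dimension = knots + degree + 1
  g <- function(x) sin(pi * x / 2)      # illustrative regression function
  W <- X + sqrt(s_u(X, beta, knots)) * matrix(rnorm(n * m), n, m)
  Y <- g(X) + sqrt(0.2 + 0.1 * X^2) * rnorm(n)  # a simple s_eps, for show

In the actual method, g would also be a B-spline, and the spline coefficients, mixture parameters and latent $X_i$ would all be updated within one MCMC sampler.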
Model Formulation
• Here is a simulation example of density deconvolution with homoscedasticity, using a mixture of normals for X and a Laplace distribution for U
• The settings come from a paper not by us
• There are 3 replicates, so the density of U is also estimated by our method (we let DKDE know the truth)
• I ran our R code as is, with no fine tuning

Model Formulation
[Figure: estimated densities for the simulation example]

Model Formulation
• Here is another example
• Y = sodium intake as measured by a food frequency questionnaire (known to be biased)
• W = the same thing, but measured by a 24-hour recall (known to be almost unbiased)
• We have R code for this

Model Formulation
[Figure: estimated regression function; the dashed line is the Y = X line, indicating the bias of the FFQ]

Multivariate Deconvolution
• There are also multivariate problems of density deconvolution
• We have found 4 papers about this:
  • 3 deconvoluting kernel papers, all of which assume the density of the measurement errors is known; 1 of those papers has a bandwidth selector
  • Bovy et al. (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix; they use an EM algorithm

Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
  $W_{ijk} = X_{ij} + s_{u,j}^{1/2}(X_{ij})\,U_{ijk}$
• Again, X is a mixture of multivariate normals, as is U
• However, standard multivariate inverse Wishart computations fail miserably

Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
  $W_{ijk} = X_{ij} + s_{u,j}^{1/2}(X_{ij})\,U_{ijk}$
• We use a factor analytic representation of the component-specific covariance matrices, with sparsity-inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson); see the sketch after the Conclusion
• This is crucial in flexibly lowering the dimension of the covariance matrices

Multivariate Deconvolution
[Figure: multivariate inverse Wisharts (MIW, blue) on top, latent factor model (MLFA, green) on bottom; variables are (a) carbs, (b) fiber, (c) protein, (d) potassium]

Conclusion
• I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error
• Dimension reduction and flexible models seem a practical way to go
• Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence
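As promised in the multivariate deconvolution slides, here is a minimal R sketch of the factor-analytic covariance representation. The notation and hyperparameter values are mine and purely illustrative, not taken from the paper: each component covariance is written as $\Omega = \Lambda\Lambda^\top + \mathrm{diag}(\sigma^2)$ with a p × q loading matrix Λ, q ≪ p, and a multiplicative gamma shrinkage prior in the style of Bhattacharya and Dunson pulls the later columns of Λ toward zero (the local precision terms of the full prior are omitted for brevity).

  # Minimal sketch (illustrative hyperparameters) of one component-specific
  # covariance matrix under a factor-analytic representation:
  #   Omega = Lambda %*% t(Lambda) + diag(sigma2),  Lambda is p x q, q << p
  set.seed(3)
  p <- 4; q <- 2                      # e.g., 4 nutrients, 2 latent factors

  # multiplicative gamma shrinkage: column precisions tau_h = delta_1 ... delta_h
  # increase with h, so later columns of Lambda are shrunk toward zero
  delta <- c(rgamma(1, shape = 2, rate = 1), rgamma(q - 1, shape = 3, rate = 1))
  tau <- cumprod(delta)
  Lambda <- matrix(rnorm(p * q, sd = rep(1 / sqrt(tau), each = p)), p, q)
  sigma2 <- 1 / rgamma(p, shape = 1, rate = 0.3)  # idiosyncratic variances

  Omega <- Lambda %*% t(Lambda) + diag(sigma2)    # positive definite by design
  # Omega has p * (p + 1) / 2 free entries, but the factor form uses only
  # p * q + p parameters, which is the saving that makes dimension 19 feasible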