Imaging Genetics Multivariate Approaches: Joint Modelling
Giovanni Montana
Statistics Section, Department of Mathematics, Imperial College London, UK
Email: gmontana@imperial.ac.uk
6 June 2010

Imaging genetics

Basic data structure: p genotypes and q phenotypes observed on n samples. For each subject i, i = 1, ..., n, we observe the genotype vector $(x_{i1}, x_{i2}, \ldots, x_{ip})$ and the phenotype vector $(y_{i1}, y_{i2}, \ldots, y_{iq})$.

A classification of selected imaging genetics studies

- Small q, small p: Joyner et al. (2009): q = 4 brain size measures, p = 11 SNPs
- Small q, large p: Potkin et al. (2009): q = 1 mean BOLD signal, p = 317,503 SNPs
- Large q, small p: Filippini et al. (2009): q = 29,812 voxels, p = 1 SNP
- Large q, large p: Stein et al. (2010): q = 31,622 voxels, p = 448,293 SNPs

Mass Univariate Linear Models (MULM)

- A commonly used approach is to model one genotype and one phenotype at a time:
  1. Fit all univariate linear regression models
     $$y_j = \beta_{jk} x_k + \epsilon, \qquad j = 1, \ldots, q, \quad k = 1, \ldots, p$$
  2. Search for a subset of $p'$ significant genotypes with indices $\{k_1, k_2, \ldots, k_{p'}\} \subset \{1, 2, \ldots, p\}$, with $p' \ll p$, by testing all $p \times q$ null hypotheses of no association, $H_0: \beta_{jk} = 0$
  3. Correct for multiple testing, controlling the experiment-wise FWER or the FDR
- Possible dependence patterns among genotypes and phenotypes are ignored at the modelling stage

Multivariate predictive modelling

- Why model multiple genotypes and phenotypes jointly?
  - A weak effect may be more apparent when other causal effects are already accounted for
  - A false signal may be weakened by inclusion in the model of a stronger signal from a true causal association
  - A weak effect may be more apparent if multiple phenotypes are affected
- The basic strategy is to build a linear regression model that includes all genotypes (predictors) and all phenotypes (responses) and then perform variable selection
- The models covered here are:
  - Penalised multiple linear regression (any p, q = 1)
  - Penalised sparse canonical correlation analysis (any p and q)
  - Penalised reduced-rank regression (any p and q)

Multiple genotypes and one phenotype
The multiple linear regression model with univariate response

- Fit the multiple linear regression model
  $$y = \sum_{k=1}^{p} \beta_k x_k + \epsilon$$
  by solving
  $$\hat{\beta}^{ols} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{p} x_{ik} \beta_k \Big)^2$$
- Or, more compactly, minimise the error function
  $$RSS(\beta) = (y - X\beta)^T (y - X\beta)$$
- When n > p, the OLS solution is given by
  $$\hat{\beta}^{ols} = (X^T X)^{-1} X^T y$$
- Which genotypes best predict y?
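To make the closed-form solution concrete, here is a minimal NumPy sketch on synthetic SNP-dosage data; the sample size, allele frequency and effect sizes are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                                          # n samples, p genotypes, n > p
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # SNP dosages coded 0/1/2
beta_true = np.array([1.0, -0.5, 0.0, 0.0, 0.8])       # illustrative effects
y = X @ beta_true + rng.normal(size=n)                 # phenotype = signal + noise

# Closed-form OLS solution: solve (X'X) beta = X'y rather than inverting X'X
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```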
Penalised multivariate regression for genotype selection

- One-step approach: fit the multiple linear regression model while finding a subset of $p'$ important predictors with indices $\{j_1, j_2, \ldots, j_{p'}\} \subset \{1, 2, \ldots, p\}$, with $p' \ll p$, all having non-zero regression coefficients
- This can be achieved by fitting a penalised regression model
  $$\hat{\beta}^{pen} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{p} x_{ik} \beta_k \Big)^2 \quad \text{subject to } g(\beta) < t$$
- The function $g(\beta)$ imposes a constraint on the size of $\beta$
- The complexity parameter t controls the trade-off between the OLS (unpenalised) solution and the penalised solution

[Illustration: genotypes $x_1, x_2, \ldots, x_p$ linked to the phenotype y through the coefficients $\beta_1, \ldots, \beta_p$.]

Ridge regression

- Ridge regression finds $\hat{\beta}$ subject to $\sum_{k=1}^{p} \beta_k^2 < t$
- The problem can be rewritten as
  $$\hat{\beta}^{ridge} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{p} x_{ik} \beta_k \Big)^2 + \lambda \sum_{k=1}^{p} \beta_k^2$$
  or, more compactly, minimise
  $$RSS(\beta, \lambda) = (y - X\beta)^T (y - X\beta) + \lambda \beta^T \beta$$
- $\lambda$ controls the amount of shrinkage
- Some properties are:
  - Closed-form solution $\hat{\beta}^{ridge} = (X^T X + \lambda I)^{-1} X^T y$
  - Useful when the data matrix X is rank-deficient and $X^T X$ is not invertible
  - Bias-variance trade-off: better predictions
  - Grouping effect: correlated variables get similar coefficients
  - No variable selection

Lasso regression

- Lasso regression finds $\hat{\beta}$ subject to $\sum_{k=1}^{p} |\beta_k| < t$, or
  $$\hat{\beta}^{lasso} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{p} x_{ik} \beta_k \Big)^2 + \lambda \sum_{k=1}^{p} |\beta_k|$$
- Performs both continuous shrinkage and variable selection
- $\lambda$ controls the amount of sparsity
- For instance, with p = 2, the lasso constraint region is a diamond while the ridge constraint region is a disc; the corners of the diamond produce exact zeros

Example: Lasso regression, regularisation path
p = 100 genotypes and q = 1 phenotype; only two strong predictors, 98 noise variables.

[Figure: regularisation path, coefficient values plotted against the number of selected predictors; the two true predictors enter the model first with large coefficients.]

Elastic net regression
Convex combination of L1 and L2 penalties

- The elastic net regression solves
  $$\hat{\beta}^{elnet} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{p} x_{ik} \beta_k \Big)^2 + \lambda_1 \sum_{k=1}^{p} |\beta_k| + \lambda_2 \sum_{k=1}^{p} \beta_k^2$$
- It retains the benefits of both individual penalties
- Setting
  $$\alpha = \frac{\lambda_2}{\lambda_1 + \lambda_2}$$
  the penalty becomes proportional to $(1 - \alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2$

Solving the penalised regression problem
Uncorrelated predictors

- For many penalty functions, the solution can be found efficiently by using component-wise soft-thresholding updates of the OLS estimate
- For a given $\lambda$:
  1. Find the OLS estimates $\hat{\beta}^{ols} = (\hat{\beta}_1^{ols}, \hat{\beta}_2^{ols}, \ldots, \hat{\beta}_p^{ols})$
  2. Cycle over the single coefficients and apply the thresholding update
- Component-wise lasso update:
  $$\hat{\beta}_k^{lasso} = \operatorname{sign}(\hat{\beta}_k^{ols}) \Big( |\hat{\beta}_k^{ols}| - \frac{\lambda}{2} \Big)_+, \qquad (a)_+ = \begin{cases} a & \text{if } a > 0 \\ 0 & \text{otherwise} \end{cases}$$
- Component-wise elastic net update:
  $$\hat{\beta}_k^{elnet} = \frac{1}{1 + \lambda_2} \operatorname{sign}(\hat{\beta}_k^{ols}) \Big( |\hat{\beta}_k^{ols}| - \frac{\lambda_1}{2} \Big)_+$$

Solving the penalised regression problem
Correlated predictors: coordinate descent

- Update each component in turn while holding all the others fixed
- Given $\lambda$, cycle over $k = 1, 2, \ldots, p, 1, 2, \ldots$ until convergence (see the sketch below):
  1. Compute the partial residuals
     $$r_{ik} = y_i - \sum_{j \neq k} x_{ij} \beta_j$$
  2. Compute the OLS coefficient of these residuals on the kth predictor
     $$\hat{\beta}_k^{ols} = \frac{1}{n} \sum_{i=1}^{n} x_{ik} r_{ik}$$
  3. Apply soft-thresholding, depending on the penalty, e.g.
     $$\hat{\beta}_k^{lasso} = \operatorname{sign}(\hat{\beta}_k^{ols}) \Big( |\hat{\beta}_k^{ols}| - \frac{\lambda}{2} \Big)_+$$
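A minimal sketch of this coordinate descent scheme, assuming the columns of X are standardised to mean zero and unit variance (so that the soft-thresholding update above is exact for the 1/n-scaled residual sum of squares); the function name and iteration count are illustrative.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the lasso.

    Assumes the columns of X are standardised (mean 0, variance 1),
    so each coordinate update is a single soft-thresholding step.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for k in range(p):
            # Partial residual: remove every predictor's contribution except the kth
            r = y - X @ beta + X[:, k] * beta[k]
            b_ols = (X[:, k] @ r) / n   # OLS coefficient on the partial residual
            # Soft-thresholding update for the lasso penalty
            beta[k] = np.sign(b_ols) * max(abs(b_ols) - lam / 2.0, 0.0)
    return beta
```

For the elastic net, the same loop applies with the thresholded value divided by $1 + \lambda_2$, as in the component-wise update above.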
Variable selection in practice
Selection of the sparsity parameter by cross-validation

- How do we choose the optimal sparsity parameter $\lambda$?
- A common procedure is leave-one-out cross-validation (LOOCV). For each value of $\lambda$:
  1. Leave one sample out for testing
  2. Use the remaining n - 1 samples for training (fit the model)
  3. Compute the prediction error on the test sample
  4. Repeat for all n samples and take the average prediction error
- The optimal $\lambda$ minimises the cross-validated prediction error
- Various search strategies can be used to explore the space $\Lambda$
- In practice, it does not always work well in detecting the true sparse solution

Variable selection in practice
The stability selection approach (Meinshausen and Buhlmann, 2009)

- Stability selection is an alternative approach which avoids searching for an optimal regularisation parameter
- The procedure works as follows (a code sketch is given below, after the stability path example):
  1. Extract B subsamples (e.g. of size n/2) from the training data set
  2. For each subsample, fit the sparse regression model
  3. Estimate the probability of each predictor being selected
  4. Select all those predictors whose selection probability is above a pre-determined threshold
- Under some assumptions, this procedure controls the expected number of false positives
- Unlike LOOCV, it does not depend heavily on the regularisation parameter $\lambda$

Example: Lasso regression, stability path
p = 100 genotypes and q = 1 phenotype. The true model is $y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$.

[Figure: stability path, selection probabilities plotted against the number of selected predictors; the two true predictors reach high selection probabilities.]
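A minimal sketch of the subsampling loop, here using scikit-learn's Lasso as the sparse fitter; the values of alpha, B and the 0.8 threshold are illustrative choices rather than recommendations from the slides.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, B=100, threshold=0.8, seed=0):
    """Estimate selection probabilities by refitting a lasso on B random
    subsamples of size n/2, then keep the predictors whose selection
    probability exceeds the threshold."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample without replacement
        fit = Lasso(alpha=alpha).fit(X[idx], y[idx])
        counts += (fit.coef_ != 0)                       # record the selected support
    probs = counts / B
    return np.flatnonzero(probs >= threshold), probs
```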
Latent variable models for one phenotype
Simultaneous dimensionality reduction and variable selection

[Diagram: each latent variable $t_j$, j = 1, ..., p, is a weighted combination of the genotypes $x_1, \ldots, x_p$ with weights $u_{j1}, \ldots, u_{jp}$, and is linked to the phenotype y. Each $t_r$ is a latent variable with some optimal properties, e.g. maximal variance.]

Modelling multiple genotypes and phenotypes

- Multivariate multiple linear regression: $Y = XC + E$
- If n were greater than p, C could be estimated by least squares as
  $$\hat{C} = (X'X)^{-1} X'Y$$
  and $\hat{C}$ would be of full rank $R = \min(p, q)$
- No real gain: the same solutions as with q separate regression models

Reduced rank regression (RRR)

- An alternative approach is to impose a rank condition on the regression coefficient matrix, so that $\operatorname{rank}(C) \leq \min(p, q)$
- If C has rank r, it can be written as the product of a $(p \times r)$ matrix B and an $(r \times q)$ matrix A, both of full rank
- The RRR model is written
  $$Y = XBA + E$$
- For a fixed rank r, the matrices A and B are obtained by minimising the weighted least squares criterion
  $$M = \operatorname{Tr}\{ (Y - XBA)\, \Gamma\, (Y - XBA)' \}$$
  for a given $(q \times q)$ positive definite matrix $\Gamma$

[Illustration of the reduced rank structure.]

Reduced rank regression (RRR)
Solutions

- The optimal $\hat{A}$ and $\hat{B}$ are obtained as (a code sketch for $\Gamma = I$ follows)
  $$\hat{A} = H' \Gamma^{-\frac{1}{2}}, \qquad \hat{B} = (X'X)^{-1} X'Y \Gamma^{\frac{1}{2}} H$$
- H is the $(q \times r)$ matrix whose columns are the first r normalised eigenvectors associated with the r largest eigenvalues of the $(q \times q)$ matrix
  $$R = \Gamma^{\frac{1}{2}} Y'X (X'X)^{-1} X'Y \Gamma^{\frac{1}{2}}$$
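A minimal sketch of these solutions for the special case $\Gamma = I$, in which the expressions above reduce to an eigendecomposition of $Y'X(X'X)^{-1}X'Y$; the function name is illustrative, and n > p is assumed so that $X'X$ is invertible.

```python
import numpy as np

def reduced_rank_regression(X, Y, r):
    """Reduced-rank regression with Gamma = I.

    Returns B (p x r) and A (r x q) such that Y is approximated by X B A.
    """
    # Full-rank OLS coefficient matrix (X'X)^{-1} X'Y
    C_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    # q x q matrix R = Y'X (X'X)^{-1} X'Y, symmetric positive semi-definite
    R = Y.T @ X @ C_ols
    # eigh returns eigenvalues in ascending order, so reverse to take the largest
    eigvals, eigvecs = np.linalg.eigh(R)
    H = eigvecs[:, ::-1][:, :r]      # first r normalised eigenvectors
    A_hat = H.T                      # A = H' when Gamma = I
    B_hat = C_ols @ H                # B = (X'X)^{-1} X'Y H
    return B_hat, A_hat
```

The rank-r coefficient matrix is then $\hat{C} = \hat{B}\hat{A}$, the projection of the OLS fit onto the leading r eigenvectors.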
Sparse RRR (Vounou et al., 2010)
Sparse RRR adds sparsity penalties to the factors A and B, so that each rank selects a small set of genotypes and phenotypes.

Latent variable models for multiple phenotypes
Find latent variable pairs $(t_r, s_r)$ satisfying some optimal properties

[Diagram: for j = 1, ..., min(p, q), the latent variable pair $(t_j, s_j)$ links the genotypes $x_1, \ldots, x_p$ through the weights $u_{j1}, \ldots, u_{jp}$ and the phenotypes $y_1, \ldots, y_q$ through the weights $v_{j1}, \ldots, v_{jq}$.]

Canonical Correlation Analysis (CCA)

- Extract canonical variates $(t_j, s_j)$, $j = 1, \ldots, r$, with $r \leq \min(p, q)$
- The first pair of vectors $u_1$ and $v_1$ maximises
  $$\rho_1 = \operatorname{cor}(Xu_1, Yv_1) = \operatorname{cor}(t_1, s_1)$$
  and is found by solving
  $$(u_1, v_1) = \operatorname*{argmax}_{\|u\| = 1, \|v\| = 1} \frac{u' S_{xy} v}{\sqrt{u' S_{xx} u \cdot v' S_{yy} v}}$$
  where $S_{xx}$, $S_{yy}$ and $S_{xy}$ are sample covariance matrices
- By construction:
  - canonical correlations are ordered, with $\rho_1 \geq \rho_2 \geq \ldots \geq \rho_r$
  - $\operatorname{cov}(t_i, t_j) = 0$ and $\operatorname{cov}(s_i, s_j) = 0$ for all $i \neq j$
  - $\operatorname{cov}(t_i, s_j) = 0$ for all $i \neq j$
- When $n < \min(p, q)$ some regularisation is needed
- CCA is a special case of RRR

Example: sparse RRR (1/3)
p = 100, q = 100; $(y_1, y_2)$ depend on $(x_1, x_2)$ (red) and $(y_3, y_4)$ depend on $(x_3, x_4)$ (green).

[Figure: stability path of the genotypes selected from the 1st rank; 100 phenotypes, 5 selected.]

Example: sparse RRR (2/3)

[Figure: stability path of the genotypes selected from the 2nd rank; 100 phenotypes, 5 selected.]

Example: sparse RRR (3/3)

[Figure: stability path of the genotypes selected from the 3rd rank; 100 phenotypes, 5 selected.]

Association studies using penalised regression

- Hoggart et al. (2008)
  - Propose a penalised likelihood approach, equivalent to the lasso
  - Use a stochastic search maximisation algorithm, not as efficient as coordinate descent
  - Propose an approximation for the type-I error
- Wu et al. (2009)
  - Propose a sparse logistic regression approach for case-control studies
  - Use coordinate descent to compute the sparse solution
  - Include two- and higher-order interactions after marginal effects have been detected
- Vounou et al. (2010)
  - Use sparse regression with multiple phenotypes (sparse RRR)
  - Tailored for imaging genetics studies
  - Perform simulation studies to assess statistical power

Statistical power comparison
A Monte Carlo simulation framework (Vounou et al., 2010)

- Generate an entire population P of 10,000 individuals
  - Use a forwards-in-time simulation approach (FREGENE)
  - Reproduce features observed in real human populations
  - Genotypes coded as minor allele SNP dosage
- Generate B Monte Carlo data sets of sample size n each:
  1. Randomly sample n genotypes x from the population P
  2. Simulate the n phenotypes y from a multivariate normal distribution calibrated on real data (the ADNI database)
  3. Induce an association according to an additive genetic model
- Settings: p ranging from 1,000 to 40,000; 10 predictive SNPs with small marginal effects; q = 111 phenotypes, of which 6 are true responses

Genotype simulation
[Figure: linkage disequilibrium patterns, SNP LD coefficients.]

Phenotype simulation
q = 111 ROIs obtained from the GSK CIC Brain Atlas using ADNI images.
[Figure: ROI correlation coefficients.]

[Figures: SNP sensitivity with n = 500 and with n = 1000; ratio of SNP sensitivities (sRRR/MULM) as a function of the total number of SNPs (large p).]

References

Filippini, N., Rao, A., Wetten, S., et al. (2009). Anatomically-distinct genetic associations of APOE epsilon4 allele load with regional cortical atrophy in Alzheimer's disease. NeuroImage, 44(3):724–8.

Hoggart, C., Whittaker, J., De Iorio, M., and Balding, D. (2008). Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genetics, 4(7).

Joyner, A. H., Roddey, J. C., Bloss, C. S., et al. (2009). A common MECP2 haplotype associates with reduced cortical surface area in humans in two independent populations. PNAS, 106(36):15475–15480.

Meinshausen, N. and Buhlmann, P. (2009). Stability selection. Annals of Statistics.

Potkin, S. G., Turner, J. A., Guffanti, G., et al. (2009). A genome-wide association study of schizophrenia using brain activation as a quantitative phenotype. Schizophrenia Bulletin, 35(1):96–108.

Shen, L., Kim, S., Risacher, S. L., Nho, K., et al. (2010). Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: A study of the ADNI cohort. NeuroImage, pages 1–13.

Stein, J. L., Hua, X., Lee, S., et al. (2010). Voxelwise genome-wide association study (vGWAS). NeuroImage.

Vounou, M., Nichols, T., and Montana, G. (2010). Discovering genetic associations with high-dimensional neuroimaging phenotypes: a sparse reduced-rank regression approach. NeuroImage (under revision).

Witten, D., Tibshirani, R., and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515.

Wu, T., Chen, Y., Hastie, T., Sobel, E., and Lange, K. (2009). Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics, 25(6):714.