International Biometric Society

Selection of variables for gene-set analysis using kernel methods

Vicente Gallego1, M. Luz Calle2, Ramon Oller3

1 Department of Systems Biology, University of Vic (Catalonia)
2 Department of Systems Biology, University of Vic (Catalonia)
3 Departament d'Economia i Empresa, University of Vic (Catalonia)

The identification of genetic variants associated with disease risk is an important goal of genetic association studies. Standard approaches perform univariate analyses in which each genetic variant, usually a single nucleotide polymorphism (SNP), is tested for association with disease status. However, this marginal approach has several limitations. The most important is its reduced statistical power, caused by small marginal and/or nonlinear effects and by the multiple testing corrections that are needed. An alternative is gene-set analysis (GSA), in which, instead of testing the marginal effect of each variant, we test the joint effect of a set of genetic variants within a gene or a genetic pathway.

In this work we consider the kernel logistic model for gene-set analysis, which provides a flexible framework for exploring nonlinear effects. The genetic contribution enters the model through the kernel matrix, which measures the similarity between individuals on the basis of their SNP genotypes. The test of the joint effect of the SNP set is based on the connection between the kernel machine framework and generalized linear mixed models. This connection provides the basis for statistical inference through a statistic Q that, under the null hypothesis, follows a scaled chi-square distribution (Wu et al. 2010).

In practice, the power of this approach may be limited if only a small fraction of the genetic variants considered are relevant. In such cases, selecting the disease-informative SNPs plays a crucial role in improving the power of the set-based association test. With this motivation, we propose a SNP selection process within the kernel framework.
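As a rough illustration of the kernel matrix and the Q statistic mentioned above, the following Python sketch computes both for a small genotype matrix. The identity-by-state (IBS) kernel, the 0/1/2 genotype coding, the intercept-only null model, and the function names `ibs_kernel` and `q_statistic` are assumptions made for this example. The abstract does not state which kernel or covariates are used. The quadratic form follows the score statistic of the logistic kernel machine test in Wu et al. (2010).

```python
import numpy as np


def ibs_kernel(G):
    """Identity-by-state kernel (assumed choice of kernel for this sketch).

    G is an (n, p) array of genotypes coded as 0, 1 or 2 minor alleles.
    Entry (i, j) is the average proportion of alleles shared by
    individuals i and j over the p SNPs.
    """
    G = np.asarray(G, dtype=float)
    p = G.shape[1]
    # Sum over SNPs of |g_is - g_js| for every pair of individuals.
    diff = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=2)
    return (2.0 * p - diff) / (2.0 * p)


def q_statistic(y, K):
    """Score-type statistic Q = (y - p0)' K (y - p0) / 2.

    Assumes an intercept-only null logistic model, so the fitted
    null probability p0 is simply the proportion of cases.
    """
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    return float(resid @ K @ resid) / 2.0


if __name__ == "__main__":
    # Hypothetical toy data: 3 individuals, 3 SNPs, 2 cases and 1 control.
    G = np.array([[0, 1, 2],
                  [0, 1, 2],
                  [2, 1, 0]])
    y = np.array([1, 1, 0])
    K = ibs_kernel(G)
    print(K)
    print(q_statistic(y, K))
```

In a real analysis the null model would usually include covariates, and the p-value would come from the scaled chi-square approximation to the null distribution of Q, which this sketch does not compute.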
In this context, we can measure the separation of cases and controls provided by a set of genetic variants by considering the concentration of the embedded data around their respective centres of mass. For each genetic variant we measure the decrease in separation when that variant is eliminated, and we use this measure as the selection criterion. Once the SNP selection is completed, we repeat the set-based analysis using the Q statistic and obtain a set-based p-value. This two-stage approach requires a post-selection adjustment to control the type I error, based on an adjustment of the distribution of the p-values under the null hypothesis of no association.

Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., Lin, X. (2010) Powerful SNP-Set Analysis for Case-Control Genome-wide Association Studies. The American Journal of Human Genetics 86, 929-942.

Acknowledgments: This research was partially supported by grant MTM2012-38067-C02-02 from the Ministerio de Economía e Innovación (Spain) and grant 2009SGR-581 from the Generalitat de Catalunya (Spain).

International Biometric Conference, Florence, ITALY, 6 – 11 July 2014