International Biometric Society
Selection of variables for gene-set analysis using Kernel methods
Vicente Gallego1, M. Luz Calle2, Ramon Oller3
1 Department of Systems Biology, University of Vic (Catalonia).
2 Department of Systems Biology, University of Vic (Catalonia).
3 Departament d'Economia i Empresa, University of Vic (Catalonia).
The identification of genetic variants that are associated with disease risk is an important goal
of genetic association studies. Standard approaches perform univariate analysis where each
genetic variant, usually a Single Nucleotide Polymorphism (SNP), is tested for association
with disease status. However, the marginal approach suffers from several limitations, the most
important being its reduced statistical power, due both to small marginal and/or nonlinear effects and
to the multiple testing corrections that are needed. An alternative is gene-set analysis (GSA),
where instead of testing the marginal effect of each variant, we test the joint effect of a set of
genetic variants within a gene or a genetic pathway. In this work we consider the Kernel
logistic model for gene-set analysis which provides a flexible framework for exploring
nonlinear effects. The genetic contribution is included in the model through the kernel matrix
that measures the similarities between the individuals on the basis of their SNP genotypes.
The test of the joint effect of the SNP set is based on the connection between the kernel-machine framework and generalized linear mixed models. This provides the basis for
statistical inference through a statistic Q that, under the null hypothesis, follows a scaled chi-square distribution (Wu et
al. 2010).
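As a rough illustration of the quantities named above, the following minimal sketch builds a kernel matrix from SNP genotypes and evaluates a score-type statistic Q. Several choices here are assumptions rather than details taken from the abstract: the identity-by-state (IBS) kernel, genotypes coded as 0/1/2 minor-allele counts, a null model with an intercept only (no covariates), and the factor 1/2 in Q, whose scaling convention varies between papers. The function names ibs_kernel and q_statistic are illustrative. The scaled chi-square approximation needed to turn Q into a p-value is not shown.

```python
import numpy as np


def ibs_kernel(G):
    """Identity-by-state kernel for genotypes coded 0/1/2 (assumed coding).

    G has shape (n_individuals, n_snps). Entry (i, j) of the result is the
    proportion of alleles shared by individuals i and j across all SNPs.
    """
    G = np.asarray(G, dtype=float)
    n_snps = G.shape[1]
    # Sum over SNPs of |g_i - g_j| for every pair of individuals.
    diff = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=2)
    return (2.0 * n_snps - diff) / (2.0 * n_snps)


def q_statistic(y, K):
    """Score-type statistic Q = r' K r / 2 with r = y - mean(y).

    Assumes an intercept-only null model, so the fitted null probability
    is simply the proportion of cases.
    """
    y = np.asarray(y, dtype=float)
    r = y - y.mean()
    return float(r @ K @ r) / 2.0


if __name__ == "__main__":
    G = np.array([[0, 1, 2],
                  [0, 1, 2],
                  [2, 1, 0]])
    y = np.array([1, 1, 0])  # 1 = case, 0 = control
    K = ibs_kernel(G)
    print(K)
    print(q_statistic(y, K))
```

With covariates in the null model, the residual r would instead come from a fitted logistic regression, which is omitted here to keep the sketch short.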
In practice, the power of this approach may be limited if only a small fraction of the
considered genetic variants are relevant. In such cases, the selection of disease-informative
SNPs plays a crucial role in improving the power of the set-based association test. With this
motivation, we propose a SNP selection process within the Kernel framework. In this context,
we can measure the separation of cases and controls provided by a set of genetic variants by
considering the concentration of the embedded data around their respective centres of mass. For
each genetic variant we measure the decrease in separation when the variable is eliminated
and use this measure as the selection criterion.
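The leave-one-variable-out criterion described above can be sketched as follows. The abstract does not give the exact separation formula, so the score used here is an assumption: the squared distance between the case and control centres of mass in the kernel feature space, divided by the within-group scatter around those centres. The IBS kernel, the 0/1/2 genotype coding, and the names separation and snp_scores are likewise illustrative, not the authors' implementation.

```python
import numpy as np


def ibs_kernel(G):
    """Identity-by-state kernel for genotypes coded 0/1/2 (assumed coding)."""
    G = np.asarray(G, dtype=float)
    n_snps = G.shape[1]
    diff = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=2)
    return (2.0 * n_snps - diff) / (2.0 * n_snps)


def separation(K, y, eps=1e-12):
    """Assumed separation score computed from the kernel matrix only.

    between: squared feature-space distance between the two centres of mass.
    within:  mean squared distance of cases and of controls to their own centre.
    """
    cases = np.asarray(y).astype(bool)
    K11 = K[np.ix_(cases, cases)]
    K00 = K[np.ix_(~cases, ~cases)]
    K10 = K[np.ix_(cases, ~cases)]
    between = K11.mean() - 2.0 * K10.mean() + K00.mean()
    within = (np.diag(K11).mean() - K11.mean()) + (np.diag(K00).mean() - K00.mean())
    # eps guards against division by zero when a group has no scatter.
    return between / (within + eps)


def snp_scores(G, y):
    """Decrease in separation when each SNP is removed in turn.

    A large positive score means the SNP contributes to separating cases
    from controls; a negative score means removing it improves separation.
    """
    G = np.asarray(G, dtype=float)
    full = separation(ibs_kernel(G), y)
    scores = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        reduced = np.delete(G, j, axis=1)
        scores[j] = full - separation(ibs_kernel(reduced), y)
    return scores


if __name__ == "__main__":
    # SNP 0 separates cases from controls; SNPs 1 and 2 carry no signal.
    G = np.array([[2, 0, 1], [2, 1, 0], [2, 2, 1],
                  [0, 0, 1], [0, 1, 0], [0, 2, 1]])
    y = np.array([1, 1, 1, 0, 0, 0])
    print(snp_scores(G, y))
```

SNPs would then be ranked by score and the top-ranked ones retained; how many to keep is a tuning choice that the abstract does not specify.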
Once the SNP selection is completed, we repeat the set-based analysis using the Q statistic
and obtain a set-based p-value. This two-stage approach requires a post-selection adjustment
to control the type I error, based on an adjustment of the p-value distribution under the
null hypothesis of no association.
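The abstract does not say how the null distribution is obtained. One generic way to account for the selection step, shown here purely as an assumption, is a permutation scheme in which case-control labels are shuffled and the whole two-stage procedure (selection followed by the set-based statistic) is rerun on each shuffle. The function post_selection_pvalue and its argument two_stage_stat are hypothetical names; two_stage_stat stands for any user-supplied function that performs both stages and returns a statistic for which larger values indicate stronger association.

```python
import numpy as np


def post_selection_pvalue(two_stage_stat, G, y, n_perm=999, seed=0):
    """Permutation p-value for a statistic computed after variable selection.

    two_stage_stat(G, y) must rerun BOTH the selection and the test, so the
    permutation null reflects the extra optimism introduced by selecting SNPs
    on the same data. This is an assumed scheme, not the authors' method.
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(G)
    y = np.asarray(y)
    observed = two_stage_stat(G, y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # break any genotype-phenotype link
        if two_stage_stat(G, y_perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    # Toy stand-in for a two-stage statistic: pick the single SNP with the
    # largest difference in mean genotype between cases and controls.
    def toy_stat(G, y):
        gap = np.abs(G[y == 1].mean(axis=0) - G[y == 0].mean(axis=0))
        return float(gap.max())

    G = np.array([[2, 0], [2, 1], [2, 2], [0, 0], [0, 1], [0, 2]])
    y = np.array([1, 1, 1, 0, 0, 0])
    print(post_selection_pvalue(toy_stat, G, y, n_perm=199))
```

Because each permutation repeats the selection, the cost grows with the number of permutations times the cost of one full two-stage analysis.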
Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J., Lin, X. (2010)
Powerful SNP-Set Analysis for Case-Control Genome-wide Association Studies. The
American Journal of Human Genetics 86, 929-942.
Acknowledgments:
This research was partially supported by grant MTM2012-38067-C02-02
from the Ministerio de Economía e Innovación (Spain) and grant 2009SGR-581 from
Generalitat de Catalunya (Spain).
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014