International Biometric Society

PENALIZED REGRESSION MODELS FOR MULTIPLEX SNPS PYROSEQUENCING DATA ANALYSIS

Ambroise Jérôme1, Butoescu Valentina2, Tombal Bertrand2, Robert Annie3, Gala Jean-Luc1

1 Centre for Applied Molecular Technologies (CTMA), Institut de Recherche Expérimentale et Clinique (IREC), Université catholique de Louvain, Brussels, Belgium.
2 Service d'Urologie, Institut de Recherche Expérimentale et Clinique (IREC), Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium.
3 Epidemiology and Biostatistics Department (EPID), Institut de Recherche Expérimentale et Clinique (IREC), Université catholique de Louvain, Clos Chapelle-aux-Champs 30, 1200 Bruxelles, Belgium.

Pyrosequencing is a cost-effective DNA sequencing technology with many applications, including rapid genotyping of single nucleotide polymorphisms (SNPs) in bacterial or human samples [1]. The chemiluminescent signal produced during the reaction is detected by the pyrosequencer and displayed as a pyrosequencing signal (also known as a Pyrogram™), which is then translated into the corresponding nucleotide sequence.

An increasing number of clinical applications rely on computing a multilocus genetic score and therefore require genotyping multiple DNA stretches. In such applications, several pyrosequencing primers can be used simultaneously in a multiplex experiment, the main issue being the overlap of the primer-specific pyrosequencing signals.

The novelty of this study lies in selecting the nucleotide dispensation order according to the multiplex pyrosequencing application, while analysing the signal with a new signal processing method based on a sparse representation of the pyrosequencing signal [2]. This is achieved by constructing an over-complete dictionary of standardized simplex pyrosequencing signals.
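The dictionary construction, and the penalized regression that decomposes a multiplex signal over that dictionary, can be illustrated with a minimal sketch. It assumes an idealized signal model (each dispensation yields a peak equal to the length of the matching homopolymer run, with no noise, signal decay, or incomplete incorporation), assumes that "standardized" means scaling each simplex signal to unit Euclidean norm, and uses a non-negative lasso solved by cyclic coordinate descent as one possible penalized regression. The function names, the dispensation order, and the candidate sequences are hypothetical; this is not the AdvISER-PYRO implementation, whose penalty, standardization, and solver may differ.

```python
import math


def simplex_signal(sequence, dispensation):
    """Idealized peak heights for one primer: at each dispensation, the whole
    homopolymer run matching the dispensed nucleotide is incorporated."""
    signal, pos = [], 0
    for nuc in dispensation:
        run = 0
        while pos < len(sequence) and sequence[pos] == nuc:
            run += 1
            pos += 1
        signal.append(float(run))
    return signal


def standardize(signal):
    """Assumed standardization: scale the signal to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in signal))
    return [v / norm for v in signal]


def build_dictionary(candidates, dispensation):
    """One standardized column per candidate allele sequence of every primer."""
    labels, columns = [], []
    for label, seq in candidates.items():
        labels.append(label)
        columns.append(standardize(simplex_signal(seq, dispensation)))
    return labels, columns


def nonneg_lasso(y, columns, lam, sweeps=500):
    """Cyclic coordinate descent for
    min_b 0.5 * ||y - X b||^2 + lam * sum(b)  subject to  b >= 0,
    assuming every column of X has unit norm."""
    coef = [0.0] * len(columns)
    resid = list(y)  # residual y - X b, with b = 0 at the start
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual computed without b_j
            # (||x_j||^2 = 1, so adding coef[j] puts b_j's contribution back).
            rho = sum(c * r for c, r in zip(col, resid)) + coef[j]
            new = max(0.0, rho - lam)  # soft-threshold, clipped at zero
            if new != coef[j]:
                delta = coef[j] - new
                resid = [r + c * delta for c, r in zip(col, resid)]
                coef[j] = new
    return coef


# Hypothetical duplex example: two primers, two candidate alleles each.
dispensation = "ACGTACGT"
candidates = {
    "SNP1:C": "ACT", "SNP1:G": "AGT",
    "SNP2:T": "TTG", "SNP2:C": "TCG",
}
labels, columns = build_dictionary(candidates, dispensation)

# Noise-free multiplex signal: SNP1 homozygous C plus SNP2 homozygous T.
y = [a + b for a, b in zip(simplex_signal("ACT", dispensation),
                           simplex_signal("TTG", dispensation))]
coef = nonneg_lasso(y, columns, lam=0.1)
called = [lab for lab, b in zip(labels, coef) if b > 1e-6]
print(called)  # ['SNP1:C', 'SNP2:T']
```

The L1 penalty drives the weights of dictionary signals that do not contribute to the multiplex signal to exactly zero, so the non-zero columns identify which allele of each primer is present. The non-negativity constraint reflects the fact that primer-specific signals can only add up in a single well.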
A penalized linear regression model is then built, with the tested multiplex pyrosequencing signal y as the response and all signals in the dictionary as predictor variables. As a proof of concept, this new signal processing method was applied to a series of human DNA samples (n = 8) to genotype nine well-identified prostate cancer risk-associated SNPs in two pyrosequencing experiments (a quintuplex experiment followed by a quadruplex experiment). The rationale for this application is the recent demonstration that genotyping this set of nine SNPs can improve patient selection for prostate biopsy when combined with a prostate cancer risk calculator [3]. High-quality results were obtained in both multiplex pyrosequencing experiments, and perfect concordance was observed between the multiplex and simplex (gold-standard) results. To the best of our knowledge, this is the first time that quadruplex and quintuplex pyrosequencing signals have been generated from single wells with each SNP correctly identified and assigned. Multiplex pyrosequencing therefore shortens the overall turnaround time of SNP genotyping and substantially decreases analytical reagent costs and technician workload while providing reliable results.

References
1. Ronaghi, M., Pyrosequencing sheds light on DNA sequencing. Genome Res, 2001. 11(1): p. 3-11.
2. Ambroise, J., et al., AdvISER-PYRO: Amplicon Identification using SparsE Representation of PYROsequencing signal. Bioinformatics, 2013. 29(16): p. 1963-1969.
3. Butoescu, V., et al., Does genotyping of risk-associated single nucleotide polymorphisms improve patient selection for prostate biopsy when combined with a prostate cancer risk calculator? Prostate, 2013.

International Biometric Conference, Florence, ITALY, 6–11 July 2014