Rare variant analysis in large-scale association and sequencing studies
Eleftheria Zeggini
eleftheria@sanger.ac.uk

Missing heritability in complex traits
• Interactions
• Structural variation
• Epigenetics and environment
• Thousands of very small effects
• Large phenotype-genotype heterogeneity
• Locus heterogeneity and rare variants

Low frequency and rare variants
• Low frequency (0.01<MAF<0.05) and rare variation (MAF<0.01) can contribute to complex common phenotypes
• Rare variants can have higher penetrance, contribute to more extreme phenotypes and may be more useful as predictive markers
• Accessing low frequency and rare variants through:
  – GWAS
  – imputation
  – re-sequencing

Rare variant analysis
• Single-point analysis of rare variants is under-powered
• Approximate sample sizes (cases+controls, equally sized) required to attain 80% power to detect an allelic OR=2.0 at α=5×10⁻⁸ increase dramatically as MAF decreases:

  MAF      Sample size
  0.05     2,500
  0.01     12,000
  0.001    117,000

• An alternative is to use multivariate methods to combine information across multiple variant sites
• Several locus-specific approaches have been proposed
  – collapsing methods
  – allele-matching methods

Rare variant analysis methods: challenges
• Imputation
• Genotype-associated probabilities
• Resequencing
• Genotype call uncertainty
• False positive rate
• Probability that a variant is functional
• Family-based designs
• Extreme distribution ends designs
• Incorporating multiple covariates
• Correlation structure
• Direction of effect
• Meta-analysis

Collapsing methods
[Figure: collapsing-method schematic. Recoverable labels: p_i, r_i, m_i, and a regression of y_i on x_i with coefficient β]
ARIEL: Accumulation of Rare variants Integrated and Extended Locus-specific test

Allele-matching methods
[Figure: allele-matching similarity scores for cases and controls]
• Compare similarity scores between cases and controls at each SNP, then sum over SNPs: KBAT (Mukhopadhyay et al, Gen Epi 2009)
• Extended to account for uncertainty: AMELIA (Allele-Matching Empirical Locus Integrated Association test)

Power comparison
1000 replications, d=0.02, Q=0.05, non-consensus SNP
quality scores, 1000 cases/1000 controls, causal variants are of high quality (phred score 10; probability of correct base-call 0.90)
• in the presence of different directions of effect, allele-matching methods are much more powerful than collapsing methods
• accounting for uncertainty increases power

Power comparisons using 500 cases/500 controls and 1000 cases/1000 controls, when causal variants are of high quality (phred score 10; probability of correct base-call 0.90)
• the power advantage of the allele-matching methods over the collapsing methods grows with increasing sample size
• accounting for uncertainty increases power

Population isolates
• The study of rare variants can be empowered by focusing on isolated populations, in which rare variants may have increased in frequency and linkage disequilibrium tends to be extended
• Need deeply-phenotyped isolated population samples
• Whole-genome sequencing in a subset of samples and imputation into the full set of GWAS-genotyped samples
• Association with traits of interest

Analysis of rare variants in 1000 genomes-imputed data

Osteoarthritis
• Osteoarthritis (OA) is characterised by cartilage degeneration in synovial joints, leading to pain and loss of function, particularly in the hip and the knee
• OA is a common complex disease with environmental and genetic components; it affects 40% of people over the age of 70 years
• Current treatments: analgesics, total joint replacement (TJR)
• To date only two loci have been robustly associated with OA
• Common variants (MAF>0.20), small effect sizes (OR~1.15)

arcOGEN GWAS
• 3,177 cases, 4,894 controls
• Directly typed SNPs (Illumina 610k)
• Imputed SNPs: HapMap
• Imputed SNPs: 1000 genomes
[Figure panels: Directly-typed; HapMap-based imputation; 1KGP-based imputation]

Study                       Cases   Controls  Effect allele  MAF     OR (95% CI)       P value
arcOGEN GWAS                3177    4894      A              0.0718  1.32 (1.16-1.50)  1.67×10⁻⁵
arcOGEN replication set 1   5165    6155      A              0.0694  1.17 (1.06-1.30)  2.60×10⁻³
GOAL                        1686    743       A              0.0720  1.23 (0.99-1.56)  7.20×10⁻²
arcOGEN replication set 2   2409    2319      A              0.0636  1.16 (0.98-1.37)  7.86×10⁻²
deCODE                      1552    3071      A              0.0917  1.03 (0.88-1.20)  7.31×10⁻¹
EGCUT                       2617    2619      A              0.0769  1.16 (1.01-1.34)  4.01×10⁻²
RSI                         1950    3243      G              0.0608  1.01 (0.86-1.20)  8.61×10⁻¹
RSII                        485     1460      A              0.0715  1.46 (1.07-2.00)  1.68×10⁻²
Meta-analysis               19041   24504     A                      1.17 (1.11-1.23)  2.07×10⁻⁸

• The signal lies in intron 4 of MCF2L, a gene encoding a guanine nucleotide exchange factor
• Mcf2l studies in rat models of OA have shown expression in articular chondrocytes
• In human cells, MCF2L regulates neurotrophin-3-induced cell migration in Schwann cells
• Neurotrophin-3 is a member of the nerve growth factor (NGF) family, and inhibition of NGF has an effect on the pain experienced by OA patients

Analysis of rare variants in sequence data
[Workflow diagram: Long-range PCR; PE library preparation; PE sequencing; Pulldown; Targeted resequencing; Whole-genome and whole-exome resequencing; Data processing and statistical analysis]

500 Exomes Project
– Collaborative exome resequencing experiment between the Sanger Institute, GSK and Lausanne University
– Study design:
  – 500 individuals from the CoLaus cohort with BMI>25
  – 250 with type 2 diabetes and 250 normoglycaemic matched controls
– Affymetrix 500k GWAS data
– Exome sequencing
– Mean depth ~65x

500 Exomes Project – preliminary data
Number of cases: 195
Number of controls: 166
Number of transcripts analyzed: 14,924
[Figure panels: Single-point; ARIEL; AMELIA]

UK10K project
Rare genetic variants in health and disease
• 4,000 whole genomes: population-based cohorts with rich phenotype data
• 6,000 whole exomes: obesity, neurodevelopmental disorders and further rare diseases
Aims
• Elucidate singleton variants by maximising variation detected
• Directly associate genetic variations to phenotypic traits
• Uncover rare variants contributing to disease
• Assign uncovered variations into genotyped cohort and case/control collections
• Provide a sequence variation resource for future studies
www.uk10k.org
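
Worked sketch: pooling the MCF2L per-study results
The per-study odds ratios and 95% confidence intervals reported for the MCF2L signal can be combined to check the meta-analysis row. The sketch below assumes a fixed-effect, inverse-variance model on the log-OR scale, which the slides do not state, and it recovers standard errors from the rounded confidence limits. The study labels follow the order in which the values appear on the slide and are an assumed alignment. Under these assumptions the pooled OR comes out close to the reported 1.17 (1.11-1.23), with a P value of the same order as the reported 2.07×10⁻⁸; small differences are expected because the inputs are rounded.

```python
import math

# (study, OR, lower 95% CI, upper 95% CI), read in order from the slides.
# The study labels are an assumed alignment; the pooled result does not
# depend on them.
STUDIES = [
    ("arcOGEN GWAS", 1.32, 1.16, 1.50),
    ("arcOGEN replication set 1", 1.17, 1.06, 1.30),
    ("GOAL", 1.23, 0.99, 1.56),
    ("arcOGEN replication set 2", 1.16, 0.98, 1.37),
    ("deCODE", 1.03, 0.88, 1.20),
    ("EGCUT", 1.16, 1.01, 1.34),
    ("RSI", 1.01, 0.86, 1.20),
    ("RSII", 1.46, 1.07, 2.00),
]

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def fixed_effect_meta(rows):
    """Pool odds ratios by inverse-variance weighting on the log scale."""
    sum_w = 0.0
    sum_w_beta = 0.0
    for _study, odds_ratio, ci_low, ci_high in rows:
        beta = math.log(odds_ratio)
        # Recover the standard error of log(OR) from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        sum_w += weight
        sum_w_beta += weight * beta
    beta_pooled = sum_w_beta / sum_w
    se_pooled = math.sqrt(1.0 / sum_w)
    z = beta_pooled / se_pooled
    return {
        "odds_ratio": math.exp(beta_pooled),
        "ci_low": math.exp(beta_pooled - Z_95 * se_pooled),
        "ci_high": math.exp(beta_pooled + Z_95 * se_pooled),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P value
    }


result = fixed_effect_meta(STUDIES)
print(
    "pooled OR = {odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f}), "
    "P = {p:.2e}".format(**result)
)
```

Working on the log scale keeps the weights symmetric around each estimate; a random-effects model would be the natural alternative if the study effects were believed to differ.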

Acknowledgements
Jenn Asimit, Andrew Morris, Reedik Magi

Acknowledgements
A.G. Day-Williams, L. Southam, K. Panoutsopoulou, N.W. Rayner, T. Esko, K. Estrada, H.T. Helgadottir, A. Hofman, T. Ingvarsson, H. Jonsson, A. Keis, H.J.M. Kerkhof, G. Thorleifsson, N.K. Arden, A. Carr, K. Chapman, P. Deloukas, J. Loughlin, A. McCaskie, W.E.R. Ollier, S.H. Ralston, T.D. Spector, G.A. Wallis, J.M. Wilkinson, N. Aslam, F. Birell, I. Carluke, J. Joseph, A. Rai, M. Reed, K. Walker, S.A. Doherty, I. Jonsdottir, R.A. Maciewicz, K.R. Muir, A. Metspalu, F. Rivadeneira, K. Stefansson, U. Styrkarsodottir, A.G. Uitterlinden, J.B.J. van Meurs, W. Zhang, A.M. Valdes, M. Doherty, arcOGEN Consortium

500 Exomes Project
A partnership between the Wellcome Trust Sanger Institute, the CoLaus principal investigators and the Quantitative Sciences dept. of GlaxoSmithKline

GSK: Vincent Mooser, John Whittaker, Linda McCarthy, Matt Nelson, Claudio Verzilli, Judong Shen, Stephanie Chissoe, Charles Cox, Meg Ehm, Keith Nangle, Dana Fraser, Kijoung Song, Peter Woollard, Dawn Waterworth

Lausanne: Peter Vollenweider, Gerard Waeber, Jacques Beckmann, Sven Bergmann, Pedro Marques Vidal, Murielle Bochud, Zoltan Kutalik

Wellcome Trust Sanger Institute: Jennifer Asimit, Ines Barroso, Caren Brockington, Yuan Chen, Aaron Day-Williams, Richard Durbin, Martin Hunt, Sarah Hunt, Matt Hurles, Jimmy Liu, Margarida Lopes, Daniel MacArthur, Aarno Palotie, Theo Papamarkou, Fliss Payne, Manj Sandhu, Carol Scott, Lorraine Southam, Ioanna Tachmazidou, Chris Tyler-Smith, Ellie Wheeler, Bendik Winsvold, Yali Xue, Eleftheria Zeggini

Principal Applicants
Leena Peltonen, Wellcome Trust Sanger Institute
Richard Durbin, Wellcome Trust Sanger Institute

Co-applicants
Jeffrey Barrett, Wellcome Trust Sanger Institute
Ines Barroso, Wellcome Trust Sanger Institute
George Davey-Smith, University of Bristol
Ismaa Sadaf Farooqi, University of Cambridge
Matthew Hurles, Wellcome Trust Sanger Institute
Stephen O'Rahilly, University of Cambridge
Aarno Palotie, Wellcome Trust Sanger Institute
Nicole Soranzo, Wellcome Trust Sanger Institute
Tim Spector, King's College London
Eleftheria Zeggini, Wellcome Trust Sanger Institute

Named collaborators
Phil Beales, University College London
Jamie Bentham, University of Oxford
Shoumo Bhattacharya, University of Oxford
Patrick Bolton, King's College London
Gerome Breen, King's College London
Krishnan Chatterjee, University of Cambridge
Laura K Curran, King's College London
Anne Farmer, King's College London
David Fitzpatrick, Edinburgh University
Daniel Geschwind, UCLA, USA
Steve Humphries, University College London
Jouko Lonnqvist, National Public Health Institute, Finland
Peter McGuffin, King's College London
Lucy Raymond, University of Cambridge
David Savage, University of Cambridge
Peter Scambler, University College London
Robert Semple, University of Cambridge
David St Clair, University of Aberdeen
Lennart von Wendt, University of Helsinki, Finland

Supported by the Wellcome Trust, Arthritis Research UK, Pfizer