EleftheriaZeggini_CSI2011

Rare variant analysis in large-scale
association and sequencing studies
Eleftheria Zeggini
eleftheria@sanger.ac.uk
Missing heritability in complex traits
Interactions
Structural variation
Epigenetics and environment
Thousands of very small effects
Large phenotype-genotype heterogeneity
Locus heterogeneity and rare variants
Low frequency and rare variants
Low frequency (0.01<MAF<0.05) and rare variation
(MAF<0.01) can contribute to complex common
phenotypes
Rare variants can have higher penetrance, contribute to
more extreme phenotypes and may be more useful
as predictive markers
Accessing low frequency and rare variants through:
– GWAS
– imputation
– re-sequencing
Rare variant analysis
Single-point analysis of rare variants is under-powered
Approximate sample sizes (cases+controls, equally sized)
required to attain 80% power to detect an allelic OR=2.0
at α=5×10−8 increase dramatically as MAF decreases:

MAF      Sample size
0.05     2,500
0.01     12,000
0.001    117,000
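The sample-size figures above can be approximated with a standard two-proportion calculation on allele counts. The sketch below is a minimal illustration, not the calculation behind the slide: it assumes that MAF is the control allele frequency, that the test is a two-sided z-test on allele counts with two alleles per person, and the function name `total_sample_size` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def total_sample_size(maf, odds_ratio=2.0, alpha=5e-8, power=0.80):
    """Approximate cases + controls (equal sizes) for an allelic test.

    Assumptions (not from the slide): `maf` is the control allele
    frequency and the test is a two-sided two-proportion z-test on
    allele counts, with two alleles per person.
    """
    p0 = maf
    # Case allele frequency implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Two alleles per person; round up to whole people per group.
    people_per_group = ceil(alleles_per_group / 2.0)
    return 2 * people_per_group


for maf in (0.05, 0.01, 0.001):
    print(maf, total_sample_size(maf))
```

Under these assumptions the totals should be of the same order as the tabulated values, and they grow roughly in inverse proportion to MAF.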
An alternative is to use multivariate methods to combine
information across multiple variant sites
Several locus-specific approaches have been proposed
– collapsing methods
– allele-matching methods
Rare variant analysis methods: challenges
Imputation
Genotype-associated probabilities
Resequencing
Genotype call uncertainty
False positive rate
Probability that a variant is functional
Family-based designs
Extreme distribution ends designs
Incorporating multiple covariates
Correlation structure
Direction of effect
Meta-analysis
Collapsing methods
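[Figure: collapsing-method schematic. Labels recovered from the slide: p_i (example values 0.2, 0.1, 0.0, 0.2), r_i and m_i for individual i, and a regression of y_i on x_i with coefficient β]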
ARIEL: Accumulation of Rare variants Integrated and
Extended Locus-specific test
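The general idea of a collapsing test can be sketched in a few lines. This is a simplified illustration and not the ARIEL method itself: it assumes that each person is summarised by the proportion of successfully called rare-variant sites at which they carry a minor allele, and that this score is related to a quantitative phenotype by ordinary least squares. The function names and toy data are hypothetical.

```python
def collapsed_scores(genotypes):
    """Proportion of called rare-variant sites at which each person
    carries at least one minor allele. Missing calls are None."""
    scores = []
    for person in genotypes:
        called = [g for g in person if g is not None]
        carried = sum(1 for g in called if g > 0)
        scores.append(carried / len(called) if called else 0.0)
    return scores


def regression_slope(x, y):
    """Least-squares slope of y on x (beta in y = alpha + beta * x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data (made up): rows are people, columns are rare-variant sites,
# entries are minor-allele counts.
genotypes = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, None],
    [0, 0, 0, 0],
]
phenotype = [1.8, 0.9, 2.4, 1.1]

x = collapsed_scores(genotypes)
beta = regression_slope(x, phenotype)
print(x, beta)
```

Because all sites are pooled into one score, variants with opposite directions of effect can cancel out under this kind of summary.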
Allele-matching methods
[Figure: matrix of pairwise allele-match scores (values 0, 2 and 4) among cases and controls]
Compare similarity scores between cases and controls at each SNP, then sum over SNPs: KBAT
Mukhopadhyay et al, Gen Epi 2009
Extended to account for uncertainty: AMELIA
(Allele-Matching Empirical Locus Integrated Association test)
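A minimal sketch of the allele-matching idea follows. It is not the KBAT or AMELIA statistic: it assumes that genotypes are coded as minor-allele counts, scores each pair of individuals by the number of matching allele pairs (0, 2 or 4), and contrasts within-group with between-group similarity at each SNP before summing over SNPs. The function names and the contrast itself are hypothetical simplifications.

```python
def allele_match(g1, g2):
    """Number of matching allele pairs between two genotypes.

    Genotypes are minor-allele counts (0, 1 or 2). Each of the two
    alleles of one person is compared with each of the two alleles of
    the other, so the score is 0, 2 or 4.
    """
    return g1 * g2 + (2 - g1) * (2 - g2)


def mean_similarity(group_a, group_b, same_group):
    """Mean allele-match score over pairs of people at a single SNP."""
    scores = []
    for i, ga in enumerate(group_a):
        for j, gb in enumerate(group_b):
            if same_group and j <= i:
                continue  # skip self-pairs and duplicate pairs
            scores.append(allele_match(ga, gb))
    return sum(scores) / len(scores)


def similarity_contrast(cases, controls):
    """Sum over SNPs of (within-group minus between-group) similarity.

    `cases` and `controls` are lists with one genotype list per SNP;
    each group needs at least two people.
    """
    total = 0.0
    for snp_cases, snp_controls in zip(cases, controls):
        within = (mean_similarity(snp_cases, snp_cases, True)
                  + mean_similarity(snp_controls, snp_controls, True)) / 2.0
        between = mean_similarity(snp_cases, snp_controls, False)
        total += within - between
    return total


# Toy data (made up): two SNPs, three cases and three controls.
cases = [[2, 2, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0]]
print(similarity_contrast(cases, controls))
```

The score depends on how alike two genotypes are rather than on how many minor alleles they carry, so risk and protective variants both add to the contrast instead of cancelling.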
Power comparison
1000 replications,
d=0.02, Q=0.05,
non-consensus SNP quality scores,
1000 cases/1000 controls,
causal variants are of high quality
(phred score 10; probability of
correct base-call 0.90)
• in the presence of different directions of effect allele-matching
methods are much more powerful than collapsing methods
• accounting for uncertainty increases power
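The quality setting quoted above follows from the standard phred definition, Q = −10·log10(P_error). A short sketch of the conversion (the function name is hypothetical):

```python
def phred_to_prob_correct(q):
    """Probability that a base call with phred score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


for q in (10, 20, 30):
    print(q, phred_to_prob_correct(q))
```

A phred score of 10 corresponds to an error probability of 0.10, hence the 0.90 probability of a correct base call used in the simulations.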
Power comparisons using 500 cases/500 controls and 1000 cases/1000
controls, when causal variants are of high quality (phred score 10; probability
of correct base-call 0.90)
• the power advantage of the allele-matching methods over the
collapsing methods increases further with increasing sample size
• accounting for uncertainty increases power
Population isolates
• The study of rare variants can be empowered by
focusing on isolated populations, in which rare variants
may have increased in frequency and linkage
disequilibrium tends to be extended
• Need deeply-phenotyped isolated population samples
• Whole-genome sequencing in a subset of samples, followed by
imputation into the full set of GWAS-genotyped samples
• Association with traits of interest
Analysis of rare variants in
1000 genomes-imputed data
Osteoarthritis
• Osteoarthritis (OA) is characterised by cartilage degeneration
in synovial joints, leading to pain and loss of function,
particularly in the hip and the knee
• OA is a common complex disease with environmental and
genetic components, affecting 40% of people over the age of
70 years
• Current treatments: analgesics, total joint replacement (TJR)
• To date only two loci have been robustly associated with OA
• Common variants (MAF>0.20) with small effect sizes
(OR~1.15)
3,177 cases
4,894 controls
Directly typed SNPs (Illumina 610k)
Imputed SNPs: 1000 genomes
Imputed SNPs: HapMap
[Figure: association signal in three panels: directly typed SNPs; directly typed plus HapMap-based imputation; directly typed plus HapMap-based and 1KGP-based imputation]
Study                      Cases   Controls  Effect allele  MAF     OR (95% CI)       P value
arcOGEN GWAS               3177    4894      A              0.0718  1.32 (1.16-1.50)  1.67×10−5
arcOGEN replication set 1  5165    6155      A              0.0694  1.17 (1.06-1.30)  2.60×10−3
GOAL                       1686    743       A              0.0720  1.23 (0.99-1.56)  7.20×10−2
arcOGEN replication set 2  2409    2319      A              0.0636  1.16 (0.98-1.37)  7.86×10−2
deCODE                     1552    3071      A              0.0917  1.03 (0.88-1.20)  7.31×10−1
EGCUT                      2617    2619      A              0.0769  1.16 (1.01-1.34)  4.01×10−2
RSI                        1950    3243      G              0.0608  1.01 (0.86-1.20)  8.61×10−1
RSII                       485     1460      A              0.0715  1.46 (1.07-2.00)  1.68×10−2
Meta-analysis              19041   24504     A              -       1.17 (1.11-1.23)  2.07×10−8
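The combined estimate can be illustrated with a fixed-effect, inverse-variance meta-analysis of the per-study odds ratios. This is a sketch under stated assumptions, not necessarily the method used for the slide: standard errors are recovered from the reported 95% confidence intervals, all odds ratios are taken to refer to the same allele, and the function name `fixed_effect_meta` is hypothetical. The input values are the eight per-study ORs and intervals in the order they appear on the slide.

```python
from math import exp, log, sqrt

# (OR, lower 95% CI, upper 95% CI) for the eight individual studies,
# in slide order. Assumed to be aligned to the same effect allele.
reported = [
    (1.32, 1.16, 1.50),
    (1.17, 1.06, 1.30),
    (1.23, 0.99, 1.56),
    (1.16, 0.98, 1.37),
    (1.03, 0.88, 1.20),
    (1.16, 1.01, 1.34),
    (1.01, 0.86, 1.20),
    (1.46, 1.07, 2.00),
]


def fixed_effect_meta(rows, z=1.96):
    """Inverse-variance pooled OR with a 95% confidence interval."""
    weight_sum = 0.0
    weighted_log_or = 0.0
    for odds_ratio, lower, upper in rows:
        # Standard error of log(OR) recovered from the CI width.
        se = (log(upper) - log(lower)) / (2.0 * z)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_log_or += weight * log(odds_ratio)
    pooled = weighted_log_or / weight_sum
    pooled_se = sqrt(1.0 / weight_sum)
    return (exp(pooled),
            exp(pooled - z * pooled_se),
            exp(pooled + z * pooled_se))


pooled_or, ci_low, ci_high = fixed_effect_meta(reported)
print(round(pooled_or, 3), round(ci_low, 3), round(ci_high, 3))
```

Because the inputs are rounded to two decimals, the pooled value should land close to, but not necessarily exactly on, the reported meta-analysis estimate.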
Intron 4 of the guanine nucleotide exchange factor-encoding gene MCF2L
Studies of Mcf2l in rat models of OA have shown
expression in articular chondrocytes
In human cells, MCF2L regulates neurotrophin-3-induced
cell migration in Schwann cells. Neurotrophin-3
is a member of the nerve growth factor (NGF) family,
and inhibition of NGF has an effect on the pain
experienced by OA patients
Analysis of rare variants
in sequence data
Long-range PCR
PE library preparation
PE sequencing
Pulldown
Targeted resequencing
Whole-genome and whole-exome resequencing
Data processing and statistical
analysis
500 Exomes Project
– Collaborative exome resequencing experiment
between the Sanger Institute, GSK and Lausanne
University
– Study design:
   – 500 individuals from the CoLaus cohort with BMI>25
   – 250 with type 2 diabetes and 250 normoglycaemic matched controls
– Affymetrix 500k GWAS data
– Exome sequencing
– Mean depth ~65x
500 Exomes Project – preliminary data
Number of cases: 195
Number of controls: 166
Number of transcripts analyzed: 14,924
[Figure: results panels for single-point, ARIEL and AMELIA analyses]
UK10K project
Rare genetic variants in health and disease
4,000 whole genomes: population-based cohorts with rich phenotype data
6,000 whole exomes: obesity, neurodevelopmental disorders and further rare diseases
Aims
• Elucidate singleton variants by maximising variation detected
• Directly associate genetic variations to phenotypic traits
• Uncover rare variants contributing to disease
• Assign uncovered variations into genotyped cohort and case/control collections
• Provide a sequence variation resource for future studies
www.uk10k.org
Acknowledgements
Jenn Asimit
Andrew Morris
Reedik Magi
Acknowledgements
A.G. Day-Williams, L. Southam, K. Panoutsopoulou, N.W. Rayner, T.
Esko, K. Estrada, H.T. Helgadottir, A. Hofman, T. Ingvarsson, H.
Jonsson, A. Keis, H.J.M. Kerkhof, G. Thorleifsson, N.K. Arden, A. Carr,
K. Chapman, P. Deloukas, J. Loughlin, A. McCaskie, W.E.R. Ollier, S.H.
Ralston, T.D. Spector, G.A. Wallis, J.M. Wilkinson, N. Aslam, F. Birrell, I.
Carluke, J. Joseph, A. Rai, M. Reed, K. Walker, S.A. Doherty, I.
Jonsdottir, R.A. Maciewicz, K.R. Muir, A. Metspalu, F. Rivadeneira, K.
Stefansson, U. Styrkarsdottir, A.G. Uitterlinden, J.B.J. van Meurs, W.
Zhang, A.M. Valdes, M. Doherty, arcOGEN Consortium
500 Exomes Project
A partnership between the Wellcome Trust Sanger Institute, the CoLaus principal investigators
and the Quantitative Sciences dept. of GlaxoSmithKline
GSK:
Vincent Mooser, John Whittaker, Linda McCarthy, Matt Nelson, Claudio
Verzilli, Judong Shen, Stephanie Chissoe, Charles Cox, Meg Ehm, Keith
Nangle, Dana Fraser, Kijoung Song, Peter Woollard, Dawn Waterworth
Lausanne:
Peter Vollenweider, Gerard Waeber, Jacques Beckmann, Sven Bergmann,
Pedro Marques Vidal, Murielle Bochud, Zoltan Kutalik
Wellcome Trust Sanger Institute:
Jennifer Asimit, Ines Barroso, Caren Brockington, Yuan Chen, Aaron
Day-Williams, Richard Durbin, Martin Hunt, Sarah Hunt, Matt Hurles,
Jimmy Liu, Margarida Lopes, Daniel MacArthur, Aarno Palotie, Theo
Papamarkou, Fliss Payne, Manj Sandhu, Carol Scott, Lorraine Southam,
Ioanna Tachmazidou, Chris Tyler-Smith, Ellie Wheeler, Bendik Winsvold,
Yali Xue, Eleftheria Zeggini
Principal Applicants
Leena Peltonen, Wellcome Trust Sanger Institute
Richard Durbin, Wellcome Trust Sanger Institute
Co-applicants
Jeffrey Barrett, Wellcome Trust Sanger Institute
Ines Barroso, Wellcome Trust Sanger Institute
George Davey-Smith, University of Bristol
Ismaa Sadaf Farooqi, University of Cambridge
Matthew Hurles, Wellcome Trust Sanger Institute
Stephen O'Rahilly, University of Cambridge
Aarno Palotie, Wellcome Trust Sanger Institute
Nicole Soranzo, Wellcome Trust Sanger Institute
Tim Spector, King's College London
Eleftheria Zeggini, Wellcome Trust Sanger Institute
Named collaborators
Phil Beales, University College London
Jamie Bentham, University of Oxford
Shoumo Bhattacharya, University of Oxford
Patrick Bolton, King's College London
Gerome Breen, King's College London
Krishnan Chatterjee, University of Cambridge
Laura K Curran, King's College London
Anne Farmer, King's College London
David Fitzpatrick, Edinburgh University
Daniel Geschwind, UCLA, USA
Steve Humphries, University College London
Jouko Lonnqvist, National Public Health Institute, Finland
Peter McGuffin, King's College London
Lucy Raymond, University of Cambridge
David Savage, University of Cambridge
Peter Scambler, University College London
Robert Semple, University of Cambridge
David St Clair, University of Aberdeen
Lennart von Wendt, University of Helsinki, Finland
Supported by the Wellcome Trust, Arthritis Research UK, Pfizer