IBD sharing - Columbia University

Sequencing 128 Ashkenazi Genomes:
Implications for Medical Genetics and History
Shai Carmi
Department of Computer Science
Columbia University
Itsik Pe’er’s lab
UCLA
October 2014
Outline
• Ashkenazi Jewish Genetics: Background
• The Ashkenazi Genome Sequencing Project
• Segment Sharing and Population History
• Opportunities and Future Directions
Ashkenazi Jewish (AJ) Genetics: Significance
Medical genetics
• Large founder population
• Mendelian disorders
• Complex diseases
o Breast cancer, Parkinson’s, Crohn’s
Population genetics
• Debated origins
• Genetics of a founder event
mtDNA: Behar et al., 2004; Behar et al., 2006
Y chr: Behar et al., 2003; Behar et al., 2004
Disease genes: Risch et al., 2003; Slatkin, 2004
SNP arrays: Gusev et al., 2012; Palamara et al., 2012
Review: Ostrer and Skorecki, 2013
Founder Populations: Opportunities
[Figure: a non-founder population and a founder population compared over time. Labels: Time, Bottleneck, Present, Population size, Disease alleles]
Recent successes
• Greece
o Tachmazidou et al., 2013; HDL
• Finland
o Kurki et al., 2014; aneurysm
• Iceland
o Many papers; most recently Steinthorsdottir et al., 2014; T2D
• Ashkenazi Jews
o Hui et al., in preparation; Crohn’s
See also:
• Hatzikotoulas et al., 2014
• Zuk et al., 2014
Problem: Common genotyping platforms do not include alleles rare outside the founder population
Opportunities: Reduced Haplotype Diversity
[Figure: imputation of chromosomes in the sample. Labels: Observed data, Inferred sequence, Full sequence, Partial sequence (SNP array, low-coverage sequence), Nearly-complete inferred sequence]
Problem: The Ashkenazi population is missing a reference panel of complete sequences
Opportunities: Personal Genomics in AJ
Personal clinical genomics is here
But genomes are hard to interpret
Problem: The Ashkenazi population is missing a
reference panel of complete sequences
The Documented Ashkenazi History
• Ca. 1000: Small communities in Northern France, Rhineland
• Migration east
• Expansion
• Migration to US and Israel
• Origin?
• Founder event?
• European gene flow:
o Where?
o When?
o How much?
• Relation to other Jews?
Whole-genomes?
Outline
• Ashkenazi Jewish Genetics: Background
• The Ashkenazi Genome Sequencing Project
• Segment Sharing and Population History
• Opportunities and Future Directions
The Ashkenazi Genome Consortium
NY area labs interested in specific diseases
Phase I: 128 whole genomes (Completed*)
Phase II: ≈500 whole genomes (NYGC; under way)
• Impute large cohorts of AJ cases
• Quantify utility in medical genetics
• Learn about population history
* Carmi et al., Nat Commun, 2014
Technical Details
• Ashkenazi ancestry verified
• Some phenotypes exist
• Sequencing by Complete Genomics in three batches
o Uniform QC measures
Property | Genome (exome)
Coverage | ≈56x
Fraction called | 96.7±0.3% (98.1%)
Concordance with arrays | 99.67±0.25%
Ti/Tv ratio | 2.14±0.004 (3.05)
[Figure labels: hets, roh]
• Error rate estimates
o Using runs-of-homozygosity and a duplicate
o SNVs: ≈10-40k errors per genome (FDR: 0.3-1.3%)
o Indels: ≈10-30k errors per genome (FDR: 2-6%)
• QC: Remove indels, poly-allelic variants, Hardy-Weinberg violations, low call rate
• Errors after QC: ≈5k per genome
Comparison to Europeans
Comparison panels:
• 26 Flemish from Belgium (platform-matched)
• 87 North-West Europeans [CEU (1000 Genomes)]
[Figure panels: Fraction novel (%) (dbSNP135); Population-specific variants (25x25 genomes)]
AJ Clinical Genomics
An Ashkenazi reference panel filters more benign variants
than a European panel.
AJ Medical Genetics: Imputation
An Ashkenazi reference panel improves imputation accuracy of AJ SNP arrays compared to the standard European panel.
[Figure: correlation between imputed and real data, using Impute2]
Rare variants (≤1%) accuracy: 87% vs 65%
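The slide measures accuracy as the correlation between imputed and real data but does not say exactly how it was computed. A minimal sketch of one common choice, the Pearson correlation between imputed allele dosages and true genotypes at a variant, is given here; the function name and the toy data are illustrative assumptions, not the consortium's pipeline.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a monomorphic variant
    return sxy / math.sqrt(sxx * syy)

# Toy data (hypothetical): true genotypes coded 0/1/2 and imputed dosages
# in [0, 2] for eight individuals at a single variant.
true_genotypes = [0, 0, 1, 0, 2, 0, 1, 0]
imputed_dosages = [0.1, 0.0, 0.8, 0.2, 1.9, 0.0, 1.1, 0.1]

accuracy = pearson_r(imputed_dosages, true_genotypes)
print(f"correlation between imputed and real data: {accuracy:.3f}")
```

In practice such a statistic would be averaged over variants within a frequency bin (for example, the ≤1% bin quoted on the slide).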
AJ Medical Genetics: Applications
• Our consortium:
o An expanded carrier screening panel
o Pharmacogenetically-important alleles
o Low-frequency deletions in tumors
o Association studies: schizophrenia, Parkinson’s, Crohn’s, longevity, cancer
• Others:
o Frequency lookups (clinical/pedigrees)
o Association studies: Epilepsy, Autism, …
Principal Component Analysis (PCA)
[Figure: PCA plot. Labels: Middle-East, Europe, Ashkenazi Jews, Sephardi Jews (Italy, Turkey), Druze, Palestinians, Bedouins, French, Tuscans, Flemish, Italians, Sardinians, Basque]
Price et al., 2008; Olshen et al., 2008; Need et al., 2009; Kopelman et al., 2009; Atzmon et al., 2010; Behar et al., 2010; Bray et al., 2010; Guha et al., 2012; Behar et al., 2014
The Documented Ashkenazi History
• Origin?
• Founder event?
• European gene flow:
o Where?
o When?
o How much?
• Relation to other Jews?
Variant Discovery Rate
Heterozygosity paradox?
[Figure panels: Number of variants; Predicted number of new variants]
A Model for Ancient History
[Figure: demographic model. Labels: Out-of-Africa, Middle East, European gene flow into AJ]
25x25 genomes
The Documented Ashkenazi History
• Origin?
• Founder event?
• European gene flow:
o Where?
o When?
o How much?
• Relation to other Jews?
Outline
• Ashkenazi Jewish Genetics: Background
• The Ashkenazi Genome Sequencing Project
• Segment Sharing and Population History
• Opportunities and Future Directions
Identical-by-Descent (IBD) Shared Segment
Formal definition: A contiguous segment inherited from a single, recent common ancestor.
What’s “recent”?
Practical definition: A contiguous segment nearly identical over a sequence length longer than a cutoff.
[Figure: an IBD segment inherited from a common ancestor g generations ago. After Browning & Browning, 2012]
• Requires strong genetic drift
• Segments are rare but long
o Probability of a site to be shared ~$2^{-2g}$
o Segment length ~$g^{-1}$
• Current methods can detect segments ≳1cM
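The two scalings on this slide can be made concrete with a small calculation. This is a sketch under stated assumptions: two chromosomes are separated by 2g meioses, each meiosis transmits a given site with probability 1/2, and the segment around a site has a mean length of 1/(2g) Morgans. The constant in the length formula is an assumption added here; the slide gives only the $g^{-1}$ scaling.

```python
def site_sharing_probability(g):
    """Probability that a site is shared through one ancestor g generations back (~2^(-2g))."""
    return 2.0 ** (-2 * g)

def expected_segment_length_cm(g):
    """Assumed mean segment length in centimorgans: 1/(2g) Morgans."""
    return 100.0 / (2 * g)

for g in (5, 10, 30):
    print(g, site_sharing_probability(g), expected_segment_length_cm(g))
```

Under these assumptions, segments from ancestors tens of generations back have mean lengths on the order of the ≈1cM detection limit, and any single such segment is very unlikely, which is why detectable sharing requires strong drift.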
Applications
• A segment indicates recent co-ancestry:
o Disease mapping
o Pedigree reconstruction
o Detecting natural selection
o Demographic (historical) inference
o Estimating mutation rates
• Identical sequence across individuals:
o Resolving haplotypes (phasing)
o Imputation
o Estimating heritability
o Estimating genotyping error rate
Eskin’s lab
IBD Sharing Theory
• Model:
o A population with a constant effective size N
o Two chromosomes of length L (Morgans)
o A minimal segment length m (Morgans)
• The number of shared segments $n_m$?
• The fraction of the chromosome in shared segments $f_m$?
[Figure: a chromosome of length L with three shared segments of lengths $\ell_1$, $\ell_2$, $\ell_3$, each longer than m]
Example: $n_m = 3$; $f_m = (\ell_1 + \ell_2 + \ell_3)/L$
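The two summary statistics defined on this slide are easy to compute from a list of detected segments. A minimal sketch follows; the function name and the example lengths are hypothetical.

```python
def sharing_summary(segment_lengths, chrom_length, min_length):
    """Return (n_m, f_m) for one pair of chromosomes.

    segment_lengths: lengths of shared segments, in Morgans
    chrom_length:    chromosome length L, in Morgans
    min_length:      minimal segment length m, in Morgans
    """
    kept = [length for length in segment_lengths if length >= min_length]
    n_m = len(kept)
    f_m = sum(kept) / chrom_length
    return n_m, f_m

# Hypothetical example: four detected segments, one below the 1 cM cutoff.
n_m, f_m = sharing_summary([0.02, 0.005, 0.03, 0.04], chrom_length=1.5, min_length=0.01)
print(n_m, f_m)
```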
Results overview
• Under the Sequentially Markov Coalescent (SMC):
• The number of shared segments:
$n_m = \frac{2NL}{(1+2mN)^2}$; $\mathrm{Var}[n_m] \approx \frac{L}{2m^2N}$
• The fraction of the chromosome in shared segments:
$f_m = \frac{1+4mN}{(1+2mN)^2}$; $\mathrm{Var}[f_m] \approx \frac{\log[L/m]}{NL}$
• Results for a more realistic coalescent model (SMC’)
• Implicit expressions for the distributions
• All results generalizable to variable population size
Palamara et al., 2012; Carmi et al., Genetics, 2013; Carmi et al., Theor Popul Biol, 2014
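To get a feel for the magnitudes, the mean expressions can be evaluated directly. This sketch reads the slide's formulas as $n_m = 2NL/(1+2mN)^2$ and $f_m = (1+4mN)/(1+2mN)^2$; the parameter values below are arbitrary illustrations, not estimates from the talk.

```python
def mean_num_segments(N, L, m):
    """Mean number of shared segments longer than m (Morgans) on a chromosome of length L."""
    return 2.0 * N * L / (1.0 + 2.0 * m * N) ** 2

def mean_fraction_shared(N, m):
    """Mean fraction of the chromosome found in shared segments longer than m."""
    return (1.0 + 4.0 * m * N) / (1.0 + 2.0 * m * N) ** 2

# Illustrative values only: a 1.5 Morgan chromosome and a 1 cM cutoff.
for N in (500, 5000, 50000):
    print(N, mean_num_segments(N, L=1.5, m=0.01), mean_fraction_shared(N, m=0.01))
```

Both quantities fall quickly as N grows, which is the sense in which abundant long segments signal a small effective population size.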
Demographic Inference: Maximum Likelihood
Use the distribution of the number of shared segments
Carmi et al., Theor Popul Biol, 2014
Demographic Inference: A Practical Approach
• Historical size $N(t) = N_0\,\nu(t)$.
• Mean fraction of the genome in segments of length $\ell_1 < \ell < \ell_2$:
(1) $P(\ell_1,\ell_2) = \int_0^\infty \frac{e^{-\int_0^t dt'/\nu(t')}}{\nu(t)} \left[ e^{-2\ell_1 N_0 t}\,(1+2\ell_1 N_0 t) - e^{-2\ell_2 N_0 t}\,(1+2\ell_2 N_0 t) \right] dt$
Method:
• Record IBD segments in each length bin
• Using Eq. (1), find the history N(t) that fits best
[Figure: hypothetical example]
Palamara et al., 2012
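Eq. (1) can be evaluated numerically for any candidate history. The sketch below reads the equation as an integral over time t (in units of $N_0$ generations) of the coalescence-time density $e^{-\int_0^t dt'/\nu(t')}/\nu(t)$ multiplied by the difference of two tail terms $e^{-2\ell N_0 t}(1+2\ell N_0 t)$. The function name, the trapezoid rule, the grid settings and the convention `l2=None` for "no upper bound" are all assumptions made for illustration.

```python
import math

def fraction_in_length_bin(l1, l2, n0, nu, t_max=20.0, steps=20000):
    """Trapezoid-rule estimate of P(l1, l2) in Eq. (1).

    l1, l2: bin edges in Morgans (l2=None means no upper bound)
    n0:     present-day size N0
    nu:     function nu(t) giving relative size at scaled time t
    t_max:  upper integration limit, in units of N0 generations
    """
    def tail(length, t):
        if length is None:
            return 0.0
        x = 2.0 * length * n0 * t
        return math.exp(-x) * (1.0 + x)

    dt = t_max / steps
    cumulative = 0.0                  # running integral of dt'/nu(t') from 0 to t
    total = 0.0
    prev_inv_nu = 1.0 / nu(0.0)
    prev_y = prev_inv_nu * (tail(l1, 0.0) - tail(l2, 0.0))
    for i in range(1, steps + 1):
        t = i * dt
        inv_nu = 1.0 / nu(t)
        cumulative += 0.5 * dt * (prev_inv_nu + inv_nu)
        y = math.exp(-cumulative) * inv_nu * (tail(l1, t) - tail(l2, t))
        total += 0.5 * dt * (prev_y + y)
        prev_inv_nu, prev_y = inv_nu, y
    return total

# Sanity check with a constant-size history (nu = 1) and a 1 cM lower cutoff.
constant = lambda t: 1.0
p_all = fraction_in_length_bin(0.01, None, 1000, constant)
print(p_all)
```

With a constant history and no upper bound, the result should agree with the closed form $(1+4mN)/(1+2mN)^2$ for the fraction of the chromosome in shared segments. A fitting procedure would compare such predictions, bin by bin, with the observed sharing and adjust $\nu(t)$.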
IBD Sharing in Ashkenazi Jews
[Figure labels: AJ, EU]
A pair of AJ individuals shares ≈50cM in ≈15 long segments (>3cM)
Atzmon et al., 2010; Bray et al., 2010; Gusev et al., 2012
Inferring the Bottleneck Size and Time
[Figure; x-axis: Time (years)]
Carmi et al., Nat. Commun., 2014
Palamara et al., 2012
Caveats
• Phasing and sequencing errors; IBD detection errors
• Reasonable power only for 10-50 generations ago
• Model specification (e.g. prolonged bottleneck, admixture)
Parameter | 95% confidence interval
Ancestral size | 3654-5856
Bottleneck size | 249-419
Growth rate (per generation) | 16-53%
Bottleneck time (years) | 625-800
• A bottleneck 700ya confirmed by an independent method: lengths of haplotypes around rare variants
o Mathieson and McVean, 2014
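The four parameters in the table suggest a simple piecewise history: a constant ancestral size, an instantaneous drop to the bottleneck size, then exponential growth to the present. The sketch below assumes that model shape, which the slides do not spell out. The default values are illustrative picks inside the reported intervals, not published point estimates, and the bottleneck time of 28 generations assumes 25 years per generation (700 years / 25).

```python
def population_size(t, ancestral=4500, bottleneck=330, growth=0.30, t_bottleneck=28):
    """Effective size t generations ago under an assumed bottleneck-and-growth model.

    All defaults are illustrative values chosen inside the reported 95% intervals.
    """
    if t > t_bottleneck:
        return ancestral                       # before the founder event
    generations_of_growth = t_bottleneck - t   # growth since the bottleneck
    return bottleneck * (1.0 + growth) ** generations_of_growth

for t in (0, 10, 28, 29):
    print(t, round(population_size(t)))
```

A prolonged bottleneck or admixture, both listed as caveats, would require a richer model than this one.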
The Documented Ashkenazi History
• Origin?
• Founder event?
• European gene flow:
o Where?
o When?
o How much?
• Relation to other Jews?
Outline
• Ashkenazi Jewish Genetics: Background
• The Ashkenazi Genome Sequencing Project
• Segment Sharing and Population History
• Opportunities and Future Directions
Coverage by Shared Segments
What fraction of the genome can we cover with shared segments?
[Figure: imputing a partly sequenced genome from a sequenced reference panel. Labels: Full sequence, Partial sequence, Nearly-complete inferred sequence]
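The coverage question can be phrased as an interval-union problem: given the segments that a target genome shares with any member of the reference panel, what fraction of the genome lies in at least one of them? A minimal sketch follows, with a hypothetical function name and toy coordinates; it is not the analysis used in the talk.

```python
def covered_fraction(segments, genome_length):
    """Fraction of [0, genome_length] covered by the union of (start, end) segments."""
    covered = 0.0
    current_start = current_end = None
    for start, end in sorted(segments):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Toy example: two overlapping segments and one separate segment on a 100-unit genome.
fraction = covered_fraction([(0, 10), (5, 20), (30, 40)], 100)
print(fraction)
```

Overlapping segments from different panel members are counted once, so the result reflects how much of the genome could be filled in from the panel rather than the total amount of sharing.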
The Era of Near-Complete Coverage
[Figure labels: Now, Phase II, Other studies?, Mine public data?]
Opportunities:
• Interpret personal genomes
o Time-stamp rare mutations
• Cost-effective large-scale association studies
o Resolve haplotypes
o Impute SNP arrays or low-coverage sequences
o Mapping rare variants/haplotypes
See Carmi et al., Genetics, 2013 for a theoretical analysis
The Era of Near-Complete Coverage
Time-stamp rare mutations
New algorithms needed!
[Figure labels: IBD segment, g, Now, Phase II, Other studies?, Mine public data?]
Ashkenazi History
• Origin?
• Founder event?
• European gene flow:
o Where?
o When?
o How much?
• Relation to other Jews?
The Place of European Gene Flow
“Most of these theories … are myths or speculation … based on some vague or misunderstood references. … It will probably be impossible to say definitely where the hundreds or thousands of Jews in Poland in the 13th to 14th centuries came from.”
B. Weinryb, The Jews of Poland, 1972
Approach
[Figure: schematic PCA plots (PC1 vs. PC2). Labels: ME, EU, AJ, An Ashkenazi genome]
Johnson et al., 2011; Moreno-Estrada et al., 2013
Preliminary Results
• Origin in the Levant
• Gene flow mostly from West-Europe, about 30 generations ago
• Sex-imbalanced history?
Summary
• It is important to study Ashkenazi genetics
• We sequenced 128 whole-genomes
• Useful for personal clinical genomics and imputation
• Segment sharing reveals a founder event and suggests opportunities
My research statement
Acknowledgements
Itsik Pe’er’s lab:
James Xue, Ethan Kochav, Shuo Yang, Pier Palamara, Vladimir Vacic
Harvard University:
Peter Wilton, John Wakeley
Sheba Medical Center:
Eitan Friedman
TAGC consortium members:
Todd Lencz, Semanti Mukherjee (LIJMC)
Lorraine Clark, Xinmin Liu (CUMC)
Gil Atzmon, Harry Ostrer, Danny Ben-Avraham (AECOM)
Inga Peter, Judy Cho (ISMMS)
Ariel Darvasi (HUJI)
Joseph Vijai (MSKCC)
Ken Hui (Yale)
VIB Ghent, Belgium
Funding:
Human Frontier Science program
Thank you for your attention!