0 - dimacs

Recombination-based population genomics
Jaume Bertranpetit
Marta Melé
Francesc Calafell
Asif Javed
Laxmi Parida
Recall: IRiS
Identification of Recombinations in Sequences
IRiS
• is a computational method developed with biological insight
• detects evidence of historical recombinations
• minimizes the number of recombinations in the Ancestral Recombination Graph (ARG)
Recotypes
Two chromosomes share a recombination if the junction is co-inherited.
[Figure: ARG with mutation edges, recombination edges, and extant sequences a, b, c; recombinations r1 and r2 are marked]

Chromosome   r1   r2   …
a            1    0
b            1    0
c            0    1
…
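The recotype table above can be sketched in code. This is a minimal illustration, not the IRiS implementation: it assumes the set of chromosomes carrying each recombination is already known, and the function name `recotype_matrix` and the input layout are hypothetical.

```python
def recotype_matrix(carriers):
    """Build a binary recotype matrix.

    carriers: dict mapping a recombination label to the set of chromosomes
    that co-inherit its junction (assumed input, not produced by IRiS here).
    Returns the ordered recombination labels and a dict chromosome -> 0/1 row.
    """
    recombinations = sorted(carriers)
    chromosomes = sorted(set().union(*carriers.values()))
    matrix = {
        chrom: [1 if chrom in carriers[rec] else 0 for rec in recombinations]
        for chrom in chromosomes
    }
    return recombinations, matrix


# Toy example mirroring the slide: a and b share r1, c carries r2.
carriers = {"r1": {"a", "b"}, "r2": {"c"}}
recombinations, matrix = recotype_matrix(carriers)
for chrom, row in matrix.items():
    print(chrom, row)
```

Each row is the recotype of one chromosome: a vector with a 1 for every recombination whose junction that chromosome inherited.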
Validity of inferred recombinations

Comparison with sperm typing

Computer simulated recombinations
in vitro
Chr 1 near MS32 minisatellite
Jeffreys et al. 2005
80 UK semen donors of North European origin
- Sperm typing
- LDhat and Phase (200 SNPs)
HapMap 2
CEU population
similar SNP density
[Figure: comparison of sperm typing, LDhat, Phase, and IRiS]
in silico
HapMap 3 X chromosome data
• Select 2 chromosomes at random.
• Pick a random breakpoint.
• Create a new chromosome.
• Check if it is unique; if so, add it to the dataset.
• Run IRiS on the dataset to see if the breakpoint is detected.
[Figure: chromosomes]
Results:
• 69% of recombinations detected
• All detected recombinations identify the correct sequence
• No false positives
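The in silico procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: chromosomes are modelled as equal-length strings, each trial starts from a fresh copy of the dataset, and `detector` is a hypothetical callback standing in for IRiS, which is not implemented here.

```python
import random


def simulate_recombinant(chromosomes, rng):
    """Splice two randomly chosen chromosomes at a random breakpoint."""
    left, right = rng.sample(chromosomes, 2)
    # Cut strictly inside the sequence so both parents contribute.
    cut = rng.randrange(1, len(left))
    return left[:cut] + right[cut:], cut


def run_trials(chromosomes, detector, n_trials, seed=0):
    """Plant one recombinant per trial and count how many are detected.

    detector(dataset, child, cut) is a hypothetical stand-in for IRiS.
    """
    rng = random.Random(seed)
    planted = detected = 0
    for _ in range(n_trials):
        child, cut = simulate_recombinant(chromosomes, rng)
        if child in chromosomes:
            continue  # not unique, so it is not added to the dataset
        dataset = list(chromosomes) + [child]
        planted += 1
        if detector(dataset, child, cut):
            detected += 1
    return planted, detected


# Toy data and a placeholder detector that reports every planted breakpoint.
toy = ["0000", "1111", "0101", "0011"]
planted, detected = run_trials(toy, lambda dataset, child, cut: True, 50)
print(planted, detected)
```

With a real detector, the ratio `detected / planted` would correspond to the detection rate reported on the slide.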
Recombinomics

Strong population structure

Agreement with traditional methods


FST vs. recombinational distance
More informative than SNPs

STRUCTURE

PCA
Regions
18 regions selected from HapMap 3
• X chromosome in males (to avoid phasing errors)
• 50 kb away from known CNV and SD (to avoid genotyping errors)
• 50 kb away from genes (to avoid selection)
• at least 80 SNPs
Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)
Analysis
For each region, IRiS inferred recotypes for each chromosome.
5166 recombinations were inferred.
3459 co-occurred in at least two chromosomes.

Chromosome   r1   r2   r3   r4   r5   r6   …   r3459
LK1          0    1    1    0    0    0        0
LK2          1    0    1    1    0    0        0
LK43         1    0    1    0    0    0
MK1          0    1    0    0    1    1        1
:
TI40         0    0    0    0    0    1        0
Agreement with LDhat
[Figure: scatter plot of the recombination rate inferred by LDhat against the number of recombinations inferred by IRiS. Each point represents a short haplotype segment in the HapMap CEU population.]
Spearman correlation = 0.711, p-value $< 10^{-30}$
Correlation in hotspots: $\chi^2 = 38.39$, p-value $< 6 \times 10^{-10}$
Recombinational distance between populations
Two populations that are genetically closer will share a higher number of recombinations.
Recombinational distance:
$D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$
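A small worked example of the distance above. The slide does not define the symbols, so this sketch assumes that R_A and R_B are the numbers of recombinations observed in populations A and B and that R_AB is the number observed in both; the function name is hypothetical.

```python
def recombinational_distance(recs_a, recs_b):
    """Compute D_AB = 1 - R_AB / (R_A + R_B - R_AB).

    recs_a, recs_b: sets of recombination labels observed in each population
    (assumed interpretation of R_A, R_B and R_AB).
    """
    r_a, r_b = len(recs_a), len(recs_b)
    r_ab = len(recs_a & recs_b)
    denominator = r_a + r_b - r_ab
    if denominator == 0:
        raise ValueError("both populations have no recombinations")
    return 1 - r_ab / denominator


pop_a = {"r1", "r2", "r3"}
pop_b = {"r2", "r3", "r4", "r5"}
# R_A = 3, R_B = 4, R_AB = 2, so D_AB = 1 - 2/5
print(recombinational_distance(pop_a, pop_b))
```

Under this reading the distance is 0 when two populations carry exactly the same recombinations and 1 when they share none.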
Correlation between FST distance and recombinational distance for the 18 regions: [0.35, 0.75], with p-values < 0.025
MDS, all regions combined: stress = 6.1%
PCA of population data
Recall recotypes:

Chromosome   r1   r2   r3   r4   r5   r6   …   r3459
LK1          0    1    1    0    0    0        0
LK2          1    0    1    1    0    0        0
LK43         1    0    1    0    0    0
MK1          0    1    0    0    1    1        1
:
TI40         0    0    0    0    0    1        0

Recotypes summed per population:

Population   r1   r2   r3   r4   r5   r6   …   r3459
LK           14   7    4    9    0    1        0
MK           1    4    7    0    5    7        24
:
TI           0    1    7    1    0    0        1

The first two PCs capture 66.4% of the variance.
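The step from per-chromosome recotypes to per-population counts can be sketched as below. The data are made up for illustration, the function name is hypothetical, and the PCA itself is left out; the resulting population-by-recombination matrix is what a PCA would take as input.

```python
def population_counts(recotypes, population_of):
    """Sum binary recotype rows within each population.

    recotypes: dict chromosome -> list of 0/1 values, one per recombination.
    population_of: dict chromosome -> population label (assumed to be known).
    """
    totals = {}
    for chrom, row in recotypes.items():
        pop = population_of[chrom]
        if pop not in totals:
            totals[pop] = [0] * len(row)
        totals[pop] = [t + v for t, v in zip(totals[pop], row)]
    return totals


# Hypothetical toy input, not taken from the slides.
recotypes = {
    "LK1": [0, 1, 1],
    "LK2": [1, 0, 1],
    "MK1": [0, 1, 0],
}
population_of = {"LK1": "LK", "LK2": "LK", "MK1": "MK"}
counts = population_counts(recotypes, population_of)
print(counts)
```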
PCA of recotypes
 more on this later
Recotypes vs. SNPs
Due to ascertainment bias, gene diversity does not reflect population structure.
Results similar to Conrad 07.

Percentage of variance   SNPs   Recotypes
Across groups            9%     6%
Within groups            4%     1%
Within populations       87%    93%

In agreement with Lewontin 72.
Normalized comparison, linearly scaled to [0,1], using 21 samples per population.
From SNPs to haplotypes to recotypes (a STRUCTURE comparison)
[Figure: STRUCTURE plots for K=2, K=3, K=4, and K=5, each comparing SNPs, haplotypes, and recotypes]
Africa within global genetic variation
Structure k=4
Minority African-specific component
Avg. number of recombinations in 21 random chromosomes
Out of Africa hypothesis
Founder effect
Genetic variation within Africa
Structure k=5
Maasai-specific minor component
• Sub-Saharan Maasai are distinct among Africans.
• African Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).
Genetic variation outside Africa
Structure k=5
Avg. number of recombinations in 21 random chromosomes
• Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively.
• Gujarati Indians occupy an intermediate position between Europeans and East Asians.
Venturing outside the X-chromosome

Benefits
• The bigger picture
• More regions and hence more information
Challenges
• Higher number of recombinations makes the picture murkier
• Phasing errors
Regions
81 regions selected from HapMap 3
• 50 kb away from known CNV and SD (to avoid genotyping errors)
• 50 kb away from genes (to avoid selection)
• at least 200 SNPs
• 25 samples per population (each sample has two chromosomes)
Analysis

For each region, IRiS inferred recotypes for each chromosome.
34140 recombinations were inferred.
For each sample, the two recotypes were merged.
[Figure: PCA plots for SNPs and recotypes]
Quantifying population structure

PCA followed by k-nearest neighbors is used to predict the population of every sample.
[Figure: classification diagram. Legend: perfectly classified; classified with errors. Misclassification counts are given as (recotypes, SNPs). Groups shown: Africans (ASW, YRI, MKK, LWK) and non-Africans (GIH; East Asian: CHB+CHD, JPT; MEX; European: CEU, TSI). Values shown, in order: (0,7), (4,3), (3,13), (8,13).]
Recotypes are more informative of underlying
population structure.
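The classification step can be illustrated with a small k-nearest-neighbors sketch. The slides do not give k or the validation scheme, so this assumes leave-one-out prediction on points that stand in for PCA coordinates; all names and data here are hypothetical.

```python
import math
from collections import Counter


def knn_predict(points, labels, query_index, k=3):
    """Predict the label of one point from its k nearest other points."""
    query = points[query_index]
    neighbors = [
        (math.dist(query, point), labels[i])
        for i, point in enumerate(points)
        if i != query_index  # leave the query sample out (assumption)
    ]
    neighbors.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def count_misclassified(points, labels, k=3):
    """Count samples whose predicted population differs from the true one."""
    return sum(
        knn_predict(points, labels, i, k) != labels[i]
        for i in range(len(points))
    )


# Toy 2-D coordinates standing in for the leading principal components.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(count_misclassified(points, labels))
```

Running the same count on coordinates derived from recotypes and from SNPs would give a pair comparable to the (recotypes, SNPs) misclassification counts.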
in conclusion …
Recotypes
• show strong agreement with in silico and in vitro recombination rate estimates
• are highly informative of the underlying population structure
• provide a novel approach to study recombinational dynamics