Recombination: what we have learnt

What has variation data taught us about the biology of recombination?
Simon Myers
Rory Bowden, Afidalina Tumian, Ronald Bontrop, Colin Freeman, Tammie MacFie, Gil McVean, Peter Donnelly
Recap: Composite likelihood results
[Figure: haplotype data matrix, individuals (rows) × loci (columns); alleles coded 0/1, ? = missing]
• Statistical algorithms to estimate historical rates and identify hotspots
– Applied genome-wide
– Kilobase-scale resolution
(Myers et al. 2005)
• Model-based inference from linkage disequilibrium (LD) data
– Coalescent model
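LD data of the kind shown on the recap slide are haplotypes: one 0/1 allele per individual and locus, with ? for missing calls. As a minimal sketch of the pairwise LD summaries such data yield (not the composite-likelihood method itself), the code below computes r² between two loci, skipping individuals missing at either locus. The function name `r_squared` and the string-per-locus layout are assumptions for illustration.

```python
from typing import Optional, Sequence


def r_squared(locus_a: Sequence[str], locus_b: Sequence[str]) -> Optional[float]:
    """Pairwise LD r^2 between two biallelic loci coded '0'/'1' ('?' = missing)."""
    # Keep only individuals typed at both loci.
    pairs = [(a, b) for a, b in zip(locus_a, locus_b) if a != "?" and b != "?"]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a == "1" for a, _ in pairs) / n  # allele-1 frequency at locus A
    p_b = sum(b == "1" for _, b in pairs) / n  # allele-1 frequency at locus B
    p_ab = sum(a == "1" and b == "1" for a, b in pairs) / n  # joint "1-1" haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # a monomorphic locus leaves r^2 undefined
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom


# Each string is one locus read down the individuals.
print(r_squared("0011", "0011"))
print(r_squared("01?1", "0011"))
```

r² is only a descriptive summary; model-based approaches instead ask which recombination rates make the observed patterns most likely under the coalescent.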
Recombination questions
• Human recombination is poorly understood despite intense work
• Recombination clusters into 1-2kb “hotspots” – why?
• Why are hotspots where they are in the genome?
– Primary DNA sequence?
– Epigenetics?
• What biological machinery produces hotspots?
• How are hotspots evolving?
32,996 “HapMap” hotspots
• These hotspots account for 50-70% of all human recombination
• Why are they where they are?
• We can look at the fraction of a region's DNA that is G or C (its GC content)
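GC content is simple to compute. Here is a minimal sketch, assuming a plain DNA string in which non-ACGT symbols (e.g. N) are ignored. The helper names and the window and step parameters are hypothetical illustration choices, not values from the study.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A/C/G/T) that are G or C; N and other symbols are ignored."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        raise ValueError("no called bases in sequence")
    return sum(base in "GC" for base in called) / len(called)


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(start, GC fraction) for each full window; windows with no called bases are skipped."""
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        if any(base in "ACGT" for base in chunk.upper()):
            results.append((start, gc_fraction(chunk)))
    return results


print(gc_fraction("atgcNN"))
print(windowed_gc("GGAACCTT", window=4, step=2))
```

For hotspot comparisons, the same function would be applied to a fixed-width window around each hotspot and around each matched control.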
We also see a weak correlation with features such as gene positions
Are there any stronger predictive features?
Broad-scale sequence features and recombination
THE1B (LTR of a retrotransposon)
Use >20,000 hotspots localized to within 5kb
For each, create a matched “coldspot”
Compare sequence features
THE1B: found in 1196 hotspots versus 606 coldspots (p << 10^-20)
AluY: found in 3635 hotspots versus 3262 coldspots (p = 7×10^-5)
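One way to judge counts like these is a one-sided binomial test. The sketch rests on a simplifying assumption: under the null, each occurrence falls in a hotspot or in its matched coldspot with probability 1/2. This is not necessarily the test behind the quoted p-values, and it ignores details of the matched design, so it need not reproduce them. The function names are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)  # log-sum-exp trick for numerical stability
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def enrichment_pvalue(in_hotspots: int, in_coldspots: int) -> float:
    """One-sided p-value for over-representation in hotspots.

    Assumption: each occurrence independently lands in a hotspot or its
    matched coldspot with probability 1/2 under the null hypothesis.
    """
    return binomial_upper_tail(in_hotspots, in_hotspots + in_coldspots)


print(enrichment_pvalue(1196, 606))   # THE1B counts from the slide
print(enrichment_pvalue(3635, 3262))  # AluY counts from the slide
```

Working in log space matters here. The THE1B tail probability is far below what a naive product of probabilities could represent before intermediate terms underflow.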
A motif for human hotspots
• Compared primary DNA sequence at 30,000 human hotspots and matched coldspots
THE1 repeats in hotspots
THE1 repeats in coldspots
...CTTCCGCTATGATTGTGAGGCCTCCCTAGCCATGTGGAACTGTGAGCCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCTAGCCATGTGGAACTGTGAGTCCATT...
...CTTCCGTTATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAATCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAAC-GTGAGTCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGGACTGTGAGTCCATT...
...CATCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAACTGAGAGTCCATT...
...CTTCCGCC-TGATTCTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGGGGAACTGTGAGTCCATT...
...CTTTCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGC-TGTCCATT...
• Looked at all “words” of length 5-9 (e.g. 131,072 possible 9-mers, counting each word together with its reverse complement), then refined the results
• Identified a 13-bp degenerate motif, CCNCCNTNNCCNC (Myers et al. 2008)
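The figure of 131,072 follows from merging each word with its reverse complement. There are 4^9 = 262,144 nine-mers, and an odd-length word can never equal its own reverse complement, so exactly half remain. The sketch below checks that count and adds a simple scanner for the degenerate motif on both strands. It assumes N matches any of A/C/G/T and uses exact matching only. It is an illustration, not the study's pipeline, and the helper names are hypothetical. The test strings are fragments of two of the aligned THE1 sequences.

```python
import itertools
import re
from typing import List, Tuple

# Complement table; 'N' (any base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def canonical(word: str) -> str:
    """Represent a word and its reverse complement by the lexicographically smaller one."""
    return min(word, reverse_complement(word))


def count_distinct_words(k: int) -> int:
    """Distinct k-mers once each word is merged with its reverse complement."""
    return len({canonical("".join(w)) for w in itertools.product("ACGT", repeat=k)})


MOTIF = "CCNCCNTNNCCNC"  # degenerate 13-bp motif; N = any base


def _to_regex(pattern: str):
    # Zero-width lookahead so that overlapping occurrences are all reported.
    return re.compile("(?=" + pattern.replace("N", "[ACGT]") + ")")


FORWARD = _to_regex(MOTIF)
REVERSE = _to_regex(reverse_complement(MOTIF))  # GNGGNNANGGNGG


def find_motif(seq: str) -> List[Tuple[int, str]]:
    """0-based start of each motif match on either strand, in forward-strand coordinates."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


print(count_distinct_words(9))
print(find_motif("GCCTCCCTAGCCACGTGG"))
print(find_motif("GCCTCCCCAGCCATGTGG"))
```

In the second fragment, a single C at the seventh motif position (where the motif requires T) is enough to abolish the match.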
Average rates around the motif
Penetrance >60%
3-5% of hotspots
Penetrance 7.5%
5% of hotspots
• Confirmed by sperm studies, which show that disrupting the first 7-bp part of the motif disrupts hotspot activity (Neumann and Jeffreys 2002)
• Active on multiple backgrounds (e.g. THE1, L2, Alu repeats and unique DNA…)
• Plays a role at c. 43% of hotspots, whether identified through LD or directly through sperm typing
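A plot of "average rates around the motif" pools rate estimates by signed distance from each motif occurrence. Here is a minimal sketch of that aggregation. It assumes rates are available as point estimates at known positions (for example, midpoints of SNP intervals) and that motif positions are already known. The flank and bin sizes are illustrative parameters, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def average_rate_profile(
    rate_positions: Sequence[int],
    rates: Sequence[float],
    motif_positions: Sequence[int],
    flank: int,
    bin_size: int,
) -> Dict[int, float]:
    """Mean rate per distance bin, keyed by bin start (bp relative to the motif).

    Every (position, rate) observation within `flank` bp of a motif occurrence
    contributes once per such occurrence, binned by its signed offset.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for motif in motif_positions:
        for pos, rate in zip(rate_positions, rates):
            offset = pos - motif
            if -flank <= offset <= flank:
                bin_start = (offset // bin_size) * bin_size  # floor to the bin start
                sums[bin_start] += rate
                counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


profile = average_rate_profile(
    rate_positions=[95, 105, 195, 205],
    rates=[2.0, 4.0, 6.0, 8.0],
    motif_positions=[100, 200],
    flank=10,
    bin_size=10,
)
print(profile)
```

The double loop is quadratic and kept only for clarity. For genome-wide data, one would sort positions and use a sliding window or binary search.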
The motif is actually longer
• Based on examining only non-repeat DNA in hotspots
• Independent of the results on the previous slide
• The region that matters spans >30 bp
Recombination and human disease: X-linked ichthyosis
(Myers et al. 2008)
Deletion breakpoint hotspot (Van Esch et al. 2005)
The breakpoint hotspot contains the greatest concentration of the 13-bp motif found within any segmental duplication in the entire genome
The motif is associated with NAHR syndromes
Multiple genomic disorders are caused by the same phenomenon: “nonallelic homologous recombination” (NAHR)
Rearrangement endpoints are consistently clustered into narrow hotspots:
• X-linked ichthyosis
• Charcot-Marie-Tooth disease (CMT1A)
• NF1
• Sotos syndrome
• Smith-Magenis syndrome
• Williams-Beuren syndrome
The motif is present close to the breakpoint hotspot in each case (p = 0.00055)
A ‘common deletion’ in mitochondria occurs at the motif
Myers et al. (2008)
What binds the motif?
• 3-bp periodicity suggests binding by a “zinc finger” (ZF) protein with at least 12 zinc fingers (Myers et al. 2008)
• For genes coding for ZF proteins, we can predict their binding targets bioinformatically (Persikov et al. 2009)
• Searched systematically
– Zinc finger protein database of 691 C2H2 ZF proteins
– Performed in silico binding predictions
• Looked for matches to the 13-bp motif, allowing for its degeneracy (Myers et al. 2009)
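The matching step can be sketched under stated assumptions. Each zinc finger is taken to contribute one predicted 3-bp triplet, which is where the 3-bp periodicity comes from. The per-finger triplets would come from a predictor such as the one cited, but here they are hypothetical inputs. C2H2 arrays bind antiparallel, with the N-terminal finger contacting the 3' end of the site, so the triplets are reversed to give a 5'→3' target. The target is then aligned to the degenerate motif by counting compatible positions. The scoring scheme and all names are illustrative only.

```python
from typing import Optional, Sequence, Tuple


def assemble_target(finger_triplets: Sequence[str]) -> str:
    """Join per-finger 3-bp predictions (listed N- to C-terminal) into a 5'->3' site.

    The N-terminal finger contacts the 3' end of the site, hence the reversal.
    """
    return "".join(reversed(finger_triplets))


def compatible(a: str, b: str) -> bool:
    """'N' (any base) is compatible with everything; otherwise bases must agree."""
    return a == "N" or b == "N" or a == b


def best_match(target: str, motif: str) -> Tuple[int, Optional[int]]:
    """Best (score, offset) of `motif` along `target`; score = number of compatible positions."""
    best_score, best_offset = -1, None
    for offset in range(len(target) - len(motif) + 1):
        window = target[offset:offset + len(motif)]
        score = sum(compatible(a, b) for a, b in zip(window, motif))
        if score > best_score:  # keep the first offset reaching the top score
            best_score, best_offset = score, offset
    return best_score, best_offset


print(assemble_target(["AAA", "CCC", "GGG"]))
print(best_match("TTCCACCATAGCCACTT", "CCNCCNTNNCCNC"))
```

A real screen would also score the reverse strand and weight positions by prediction confidence rather than counting simple agreements.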
PRDM9 is the unique candidate for the motif-binding protein
PRDM9 binding of the motif
Motif identified by hotspot-coldspot comparison
Bioinformatic prediction of PRDM9 binding “target”
ZF part of PRDM9: 13 zinc fingers, one separated from the rest (showing the four codons in each zinc finger that determine the binding target)
(Myers et al. 2009)
Details of PRDM9
• Independent work by two additional groups confirms that PRDM9 is a gene that directly determines hotspot locations in both humans and mice
– Mapped to PRDM9 a mouse gene that causes different inbred strains to have different hotspot positions
– Baudat et al. (2009), Parvanov et al. (2009), Myers et al. (2009)
– Gel shift assays demonstrate that PRDM9 really does bind the predicted motif: Baudat et al. (2009)
• PRDM9 puts an epigenetic mark on histones, the proteins that package DNA
– H3K4 trimethylation
– The identical mark is used by yeast to mark hotspots (Borde et al. 2009)
– Conservation over >1 billion years of evolution
• In mice
– Different PRDM9 types mean different hotspot positions (Buard et al. 2009; Baudat et al. 2009)
– Prdm9 is expressed only in meiotic prophase (Hayashi et al. 2005)
– Prdm9 -/- mutants are infertile and fail to repair DSBs (Hayashi et al. 2005)
Baudat et al. (2009)
Percent usage of LD hotspots
• Considerable variation in PRDM9 in humans, which influences the usage of hotspots as defined from LD data
• Different humans have different hotspots
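One possible measure of "usage" is the percentage of an individual's observed crossovers (for example, from sperm typing or pedigrees) that overlap hotspots defined from LD data. Here is a minimal sketch under that assumed definition. Intervals are half-open bp coordinates on a single chromosome, and LD hotspots are assumed not to overlap one another. The definition behind the "Percent usage of LD hotspots" plot may differ, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp


def percent_usage(crossovers: Sequence[Interval], hotspots: Sequence[Interval]) -> float:
    """Percentage of crossover intervals overlapping at least one hotspot.

    Assumes hotspots do not overlap each other, so after sorting by start
    their end coordinates are sorted too.
    """
    if not crossovers:
        raise ValueError("no crossovers given")
    hs = sorted(hotspots)
    starts = [s for s, _ in hs]
    ends = [e for _, e in hs]
    used = 0
    for s, e in crossovers:
        i = bisect_right(ends, s)  # first hotspot whose end lies beyond s
        if i < len(hs) and starts[i] < e:
            used += 1
    return 100.0 * used / len(crossovers)


hotspots = [(100, 200), (500, 600)]
crossovers = [(150, 160), (300, 400), (590, 700), (200, 210)]
print(percent_usage(crossovers, hotspots))
```

Binary search keeps each lookup logarithmic in the number of hotspots, which matters when comparing many crossovers against tens of thousands of LD hotspots.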
How are hotspots evolving?
Hotspots are radically different between humans and chimps
[Figure: LDhat rate estimates and LDhot hotspots, human vs chimp (Winckler et al. 2005)]
PRDM9 is radically different in chimpanzees
• Sharing between humans and chimps: only 1 of 13 zinc fingers
• Least shared of all 544 orthologous ZF protein pairs with at least two distinct zinc fingers in each species
• Patterns in multiple species indicate positive selection (Oliver et al. 2009)
• One of the fastest-evolving genes in the human genome
Crossover activity at motif is human-specific
[Figure: recombination rate by position relative to the motif at human and chimp motif sites, in THE1 and L2 repeats]
Chimp: 694 SNPs, 36 western chimpanzees; 16 THE1 regions, 6 L2 regions
Human: HapMap data, 210 humans
p = 0.0007
Conclusions and current directions
• Why are hotspots where they are in the genome?
– PRDM9 has sequence-specific binding
– Specifies narrow hotspot sites
– Targets primary DNA sequence but makes an epigenetic “mark”
– Only ~40% of hotspots explained?
– Looking at PRDM9 binding in vivo using ChIP-seq
• PRDM9 is evolving like crazy!
– Between species
– Within humans
– Within mice, chimps, …
– Resequencing data for 10 chimpanzees to define their hotspots
• PRDM9 is the only mapped speciation gene in any mammal
– Hybrid sterility in mouse (Mihola et al. 2009)
– What is the link between recombination and speciation?
– Does PRDM9 evolution, in general, lead to breeding barriers between species?
• Recombination and the motif implicated in multiple diseases
– PRDM9 variation suggests different people susceptible to different genomic disorders