Recombination based population genomics
Jaume Bertranpetit, Marta Melé, Francesc Calafell, Asif Javed, Laxmi Parida

Recall: IRiS (Identification of Recombinations in Sequences)
IRiS is a computational method that
• was developed with biological insight
• detects evidence of historical recombinations
• minimizes the number of recombinations in the Ancestral Recombinational Graph (ARG)

Recotypes
Two chromosomes share a recombination if the junction is co-inherited.
[Figure: ARG with mutation edges, recombination edges and extant sequences a, b, c; recombinations r1 and r2]

      r1  r2  …
a      1   0
b      1   0
c      0   1
…

Validity of inferred recombinations
• Comparison with sperm typing
• Computer simulated recombinations

in vitro
• Chr 1 near the MS32 minisatellite (Jeffreys et al. 2005)
• 80 UK semen donors of North European origin
  - Sperm typing
  - LDhat and Phase (200 SNPs)
• HapMap 2 CEU population, similar SNP density
[Figure panels: sperm typing, LDhat, Phase, IRiS]

in silico
HapMap 3 X chromosome data
• Select 2 chromosomes at random.
• Pick a random breakpoint.
• Create a new chromosome.
• Check if it is unique; if so, add it to the dataset.
• Run IRiS on the dataset to see if the breakpoint is detected.
[Figure: Chromosomes -> IRiS -> recombination detected?]
Results
• 69% of the recombinations were detected
• Every detected recombination identifies the correct sequence
• No false positives

Recombinomics
• Strong population structure
• Agreement with traditional methods: FST vs. recombinational distance
• More informative than SNPs: STRUCTURE, PCA

Regions
18 regions selected from HapMap 3
• X-chromosome in males (to avoid phasing errors)
• 50 KB away from known CNV and SD (to avoid genotyping errors)
• 50 KB away from genes (to avoid selection)
• at least 80 SNPs
Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)

Analysis
• For each region, IRiS inferred recotypes for each chromosome
• 5166 recombinations were inferred
• 3459 co-occurred in at least two chromosomes

Recotypes (rows: chromosomes; columns: recombinations; each row is a recotype)

Chromosome   r1  r2  r3  r4  r5  r6  …  r3459
LK1           0   1   1   0   0   0       0
LK2           1   0   1   1   0   0       0
LK43          1   0   1   0   0   0
MK1           0   1   0   0   1   1       1
              0   0   0   0   0   1       0
:
TI40

Agreement with LDhat
[Figure: scatter plot; x-axis: number of recombinations inferred by IRiS; y-axis: recombination rate inferred by LDhat]
• Each point represents a short haplotype segment in the HapMap CEU population
• Spearman correlation = 0.711, p-value < 10^-30
• Correlation in hotspots: χ² = 38.39, p-value < 6 × 10^-10

Recombinational distance between populations
Two populations that are genetically closer will share a higher number of recombinations.
Recombinational distance:
$D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$
Correlation between FST distance and recombinational distance across the 18 regions: 0.35-0.75, with p-values < 0.025
[Figure: MDS, all regions combined, stress = 6.1%]

PCA of population data
Recall the recotypes (one row per chromosome, as in the Analysis table above). For the PCA they are aggregated into recombination counts per population:

Population   r1  r2  r3  r4  r5  r6  …  r3459
LK           14   7   4   9   0   1       0
MK            1   4   7   0   5   7      24
              0   1   7   1   0   0       1
:
TI

The first two PCs capture 66.4% of the variance.
[Figure: PCA of population data]
PCA of recotypes: more on this later

Recotypes vs. SNPs
Due to ascertainment bias, gene diversity does not reflect population structure (results similar to Conrad 07).
[Figure: normalized comparison, linearly scaled to [0,1], using 21 samples per population]

Percentage of variance    SNPs   Recotypes
Across groups               9%       6%
Within groups               4%       1%
Within populations         87%      93%

In agreement with Lewontin 72.

from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
[Figures: STRUCTURE plots for K=2, K=3, K=4 and K=5; panels: SNPs, haplotypes, recotypes]

Africa within global genetic variation
• Structure k=4: minority African specific component
• Out of Africa hypothesis
• Founder's effect
[Figure: avg. number of recombinations in 21 random chromosomes]

Genetic variation within Africa
• Structure k=5: Maasai specific minor component
• Subsaharan Maasai are distinct among Africans.
• African Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).

Genetic variation outside Africa
• Structure k=5
• Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively.
• Gujarati Indians occupy an intermediate position between Europeans and East Asians.
[Figure: avg. number of recombinations in 21 random chromosomes]
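
A minimal sketch of the recombinational distance used in the FST comparison, D_AB = 1 - R_AB / (R_A + R_B - R_AB). It assumes that R_A and R_B are the numbers of distinct recombinations seen in at least one chromosome of populations A and B, and that R_AB is the number seen in both; the slides do not spell this out. The function names and the toy recotype matrix are hypothetical, made up for illustration only.

```python
from itertools import combinations


def population_recombinations(recotypes, populations):
    """Map each population to the set of recombination indices seen in it.

    Assumption: a recombination counts for a population if at least one
    of its chromosomes carries it (value 1 in the recotype).
    """
    seen = {}
    for row, pop in zip(recotypes, populations):
        seen.setdefault(pop, set()).update(
            j for j, present in enumerate(row) if present
        )
    return seen


def recombinational_distance(rec_a, rec_b):
    """D_AB = 1 - R_AB / (R_A + R_B - R_AB), on sets of recombinations."""
    shared = len(rec_a & rec_b)
    union = len(rec_a) + len(rec_b) - shared
    if union == 0:
        # Assumption: two populations without any recombination are
        # treated as being at distance 0.
        return 0.0
    return 1.0 - shared / union


# Toy recotypes (invented, not HapMap data): rows are chromosomes,
# columns are recombinations, labels give the population of each row.
recotypes = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
populations = ["A", "A", "B", "B"]

by_pop = population_recombinations(recotypes, populations)
distances = {
    (a, b): recombinational_distance(by_pop[a], by_pop[b])
    for a, b in combinations(sorted(by_pop), 2)
}
print(distances)
```

Under this reading D_AB is a Jaccard distance on sets of recombinations, so it is 0 when two populations carry exactly the same recombinations and 1 when they share none.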
Venturing outside the X-chromosome
Benefits
• The bigger picture
• More regions and hence more information
Challenges
• A higher number of recombinations makes the picture murkier
• Phasing errors

Regions
81 regions selected from HapMap 3
• 50 KB away from known CNV and SD (to avoid genotyping errors)
• 50 KB away from genes (to avoid selection)
• at least 200 SNPs
• 25 samples per population (each sample has two chromosomes)

Analysis
• For each region, IRiS inferred recotypes for each chromosome
• 34140 recombinations were inferred
• For each sample, the two recotypes were merged

[Figure: PCA plots; panels: SNPs, recotypes]

Quantifying population structure
PCA followed by k nearest neighbors is used to predict the population of every sample.
[Figure: population tree annotated with misclassification counts as (recotypes, SNPs); legend: perfectly classified, classified with errors; groups: Africans (ASW, YRI, MKK, LWK) and Non-Africans (GIH, MEX, E. Asian: CHB+CHD, JPT; European: CEU, TSI); counts shown on the tree: (0,7), (4,3), (3,13), (8,13)]
Recotypes are more informative of the underlying population structure.

in conclusion …
Recotypes
• show strong agreement with in silico and in vitro recombination rate estimates
• are highly informative of the underlying population structure
• provide a novel approach to study recombinational dynamics
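
A rough sketch of the autosomal classification step from "Quantifying population structure": merge the two recotypes of each sample, then predict each sample's population from its nearest neighbours and count misclassifications. Two things here are assumptions rather than the authors' procedure: the merge is taken to be an element-wise sum, and the PCA projection is skipped, so the neighbours are found directly in recotype space. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def merge_recotypes(rec1, rec2):
    """Merge the two recotypes of one sample.

    Assumption: element-wise sum, so each entry is 0, 1 or 2 copies
    of the recombination.
    """
    return [a + b for a, b in zip(rec1, rec2)]


def knn_predict(train, labels, query, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_errors(samples, labels, k=1):
    """Count samples mispredicted when each one is held out in turn."""
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest, rest_labels, x, k) != y:
            errors += 1
    return errors


# Toy data (invented): each sample has two chromosomes, hence two recotypes.
chromosome_pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [0, 1, 0, 0]),
    ([0, 0, 1, 1], [0, 0, 1, 0]),
    ([0, 0, 0, 1], [0, 0, 1, 1]),
]
labels = ["P", "P", "Q", "Q"]

merged = [merge_recotypes(r1, r2) for r1, r2 in chromosome_pairs]
errors = leave_one_out_errors(merged, labels, k=1)
print(merged, errors)
```

Running the same leave-one-out count once on recotype vectors and once on SNP vectors would give a pair of error counts comparable in form to the (recotypes, SNPs) misclassification pairs in the figure.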