experience5 - Broad Institute

Advanced Population and Medical Genetics
EPI511, Spring 1, 2016
Experience 5
Please submit Python code (IPython notebook format preferred) or Perl code, and its output
for each of (1)-(4), on the course website by 8:00am on Tue Mar 1.
Please indicate in your submission the number of hours you spent working on this experience.
This information will not affect your grade—only the average value across all students will be
shared with the instructor—but will help inform the design of future experiences.
Policy on group work: OK to discuss experiences with your colleagues, but each piece of code
that you write should be your own.
(1) Conduct a genome-wide scan for selection based on unusual population differentiation for
(a) CEU vs. TSI (assume genome-wide FST = 0.004), (b) CHB vs. JPT (assume FST = 0.007),
and (c) CEU vs. CHB (assume FST = 0.11). In each case, print output only for suggestive SNPs
attaining a χ² (1 dof) statistic > 20, reporting at most one SNP (the most significant) per chromosome. Print
the allele frequencies in each population as well as the χ² (1 dof) statistic, and indicate which
signals are genome-wide significant (P-value < 5 × 10⁻⁸). Discuss results of (a) vs. (b) vs. (c).
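As a starting point for (1), here is a minimal sketch of one common form of the differentiation statistic. It is an assumption, not necessarily the form given in lecture. It treats the allele frequency difference between the two populations as approximately normal with mean 0 and variance p(1-p)(2·FST + 1/n1 + 1/n2), where p is the average allele frequency and n1, n2 are the numbers of sampled chromosomes. The function names and the input layout are hypothetical.

```python
import math

def differentiation_chi2(p1, p2, n1, n2, fst):
    """1-dof chi-square statistic for an allele frequency difference.

    p1, p2 : allele frequencies in the two populations
    n1, n2 : numbers of sampled chromosomes (2 x individuals); may be math.inf
    fst    : assumed genome-wide FST between the two populations
    """
    p = (p1 + p2) / 2.0  # assumption: unweighted average frequency
    if p <= 0.0 or p >= 1.0:
        return 0.0       # monomorphic in both populations, no signal
    # Assumed variance: drift term (2*FST) plus sampling terms (1/n1 + 1/n2)
    var = p * (1.0 - p) * (2.0 * fst + 1.0 / n1 + 1.0 / n2)
    return (p1 - p2) ** 2 / var

def chi2_1dof_pvalue(stat):
    """Upper-tail P-value of a chi-square (1 dof) statistic."""
    # For 1 dof, P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))

def top_hit_per_chromosome(hits, threshold=20.0):
    """Keep the single largest statistic per chromosome above threshold.

    hits : iterable of (chromosome, snp_id, statistic) tuples
    """
    best = {}
    for chrom, snp, stat in hits:
        if stat > threshold and (chrom not in best or stat > best[chrom][1]):
            best[chrom] = (snp, stat)
    return best
```

Passing `math.inf` for both `n1` and `n2` drops the sampling terms, which corresponds to the large-sample limit under this assumed variance.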
(2) (a) For each SNP printed as output of (1) (a), repeat the computation assuming that the
same CEU and TSI allele frequencies were observed in large sample size (N ≫ 1/FST). Discuss.
(b) For each SNP printed as output of (1) (a), repeat the computation using CEU vs. YRI
(assume FST = 0.16). Discuss. (c) Repeat (b) assuming that the same CEU and YRI allele
frequencies were observed in large sample size (N ≫ 1/FST). Discuss.
(3) How far does LD (r² > 0.5) with LCT SNP rs13404551 on chr 2 span in the CEU population
(in either chromosomal direction)? Repeat the computation for the populations TSI, CHB, YRI.
Why does LD vary across these populations?
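For (3), one possible way to compute the LD span is sketched below. It assumes genotypes are coded as 0/1/2 allele counts with None for missing data, uses the squared Pearson correlation of unphased genotypes as r², and defines the span as the furthest SNP in each direction exceeding the threshold. All three choices are assumptions, and the function names are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors.

    x, y : sequences of 0/1/2 allele counts; None marks a missing genotype.
    Individuals with a missing value at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)

def ld_span(focal_pos, focal_geno, snps, threshold=0.5):
    """Leftmost and rightmost positions with r2 > threshold to the focal SNP.

    snps : iterable of (position, genotypes) on the same chromosome
    """
    left = right = focal_pos
    for pos, geno in snps:
        if genotype_r2(focal_geno, geno) > threshold:
            left = min(left, pos)
            right = max(right, pos)
    return left, right
```

The span in each direction is then `focal_pos - left` and `right - focal_pos`. A stricter definition (for example, requiring a contiguous run of SNPs above the threshold) would give shorter spans.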
(4) (a) Negative selection due to cystic fibrosis: what will the average local ancestry (all 3 ancestries)
of Puerto Ricans be at the CFTR locus after many generations of admixture, based on Puerto
Rican continental ancestry proportions from Week 2 slides and CFTR allele frequency of 2% in
European populations? (b) Negative selection due to sickle-cell anemia: what will the average
local ancestry (all 3 ancestries) of Mexican Americans be at the HBB locus after many
generations of admixture, based on Mexican American continental ancestry proportions from
Week 2 slides and HBB allele frequency of 5% in African populations?
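One simple way to set up (4) is sketched below, under a strong assumption that should be stated in your answer: after many generations, selection has removed every haplotype carrying the deleterious allele, and local ancestry at the locus is the renormalized ancestry of the surviving haplotypes. The ancestry proportions in the usage example are illustrative placeholders, not the Week 2 slide values, and the function name is hypothetical.

```python
def local_ancestry_after_purging(ancestry, allele_freq):
    """Expected local ancestry once all deleterious haplotypes are removed.

    ancestry    : dict mapping ancestry label -> genome-wide proportion
    allele_freq : dict mapping ancestry label -> deleterious allele frequency
                  on haplotypes of that ancestry (missing labels count as 0)
    Assumption: complete purging, no recombination between allele and locus.
    """
    surviving = {k: a * (1.0 - allele_freq.get(k, 0.0))
                 for k, a in ancestry.items()}
    total = sum(surviving.values())
    return {k: v / total for k, v in surviving.items()}

# Placeholder proportions for illustration only (NOT the course values)
example = local_ancestry_after_purging(
    {"EUR": 0.5, "AFR": 0.3, "NAT": 0.2},
    {"EUR": 0.02},
)
```

Under this model the ancestry that carries the allele is depleted at the locus and the other ancestries are slightly enriched, so the three local ancestry values still sum to 1.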
Possible topics for short Research Paper (an aggregate list of suggested topics will be provided
on Feb 23. At that time, each student should choose one topic from the aggregate list.):
• Using CEU and TSI HapMap3 genotypes, simulate a phenotype in which the effect size is
systematically correlated with the allele frequency difference between CEU and TSI, as would be
expected under a scenario of selection for different phenotypes in different environments (see
Turchin et al. 2012 Nat Genet). Use theory and simulations to evaluate the power to detect
such an effect by analyzing association results (after correction for population stratification) at
top associated SNPs (as in Turchin et al.), at a range of parameter settings. Then, extend the
method to use association results at a larger set of SNPs (possibly even all SNPs) instead of
just the top associated SNPs, and evaluate how much this improves power to detect selection.
Note: it is ok to optimistically assume in this problem that correction for population stratification
(e.g. using explicit CEU and TSI ancestry labels) is fully effective in removing spurious signals.
• Suppose that you are analyzing data from 2 populations that admixed g generations ago.
Consider a SNP that had allele frequency p1 in POP1 and p2 in POP2 at the time of admixture.
Suppose that the reference allele is selectively advantageous in the admixed population. Define
selection coefficient s as the relative fitness of the reference vs. variant allele per generation in
the admixed population. Let N be the sample size analyzed from the admixed population, and
let θ and 1−θ denote the ancestry proportions from POP1 and POP2 in the admixed population.
Use theory and simulations to investigate the power of an approach for detecting the action of
natural selection via searching for unusual deviations in local ancestry in the set of N samples.
Provide quantifications of (1) power to detect selection against the sickle-cell allele in African
Americans with European admixture 6 generations ago, and (2) power to detect selection on a
beneficial pigmentation allele in Europeans with Neanderthal admixture 1,500 generations ago.
• Or, feel free to design your own research topic.
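For the admixture topic above, a possible first step for the theory is the expected local ancestry at the selected SNP, ignoring drift. The sketch below assumes a haploid (multiplicative) model in which reference-allele haplotypes gain a factor s in relative frequency each generation, and no recombination between the SNP and the local ancestry segment. Both are simplifying assumptions, and the function name is hypothetical.

```python
def expected_local_ancestry(theta, p1, p2, s, g):
    """Expected POP1 local ancestry at the selected SNP after g generations.

    theta  : POP1 ancestry proportion at the time of admixture
    p1, p2 : reference allele frequencies in POP1 and POP2 at admixture
    s      : relative fitness of the reference vs. variant allele per generation
    g      : number of generations since admixture
    Assumptions: haploid selection, no drift, no recombination.
    """
    w = s ** g  # cumulative relative weight of reference-allele haplotypes
    pop1 = theta * (p1 * w + (1.0 - p1))
    pop2 = (1.0 - theta) * (p2 * w + (1.0 - p2))
    return pop1 / (pop1 + pop2)
```

The deviation of this value from theta, compared against the sampling noise of the average local ancestry in N samples, gives a rough handle on power. Simulations would be needed to account for drift and recombination.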