Practical_recombination

DTC module in Bioinformatics
Practical on recombination I
Gil McVean and Simon Myers, Tuesday June 14th 2005

In this practical you are going to learn about the structure of gene histories that include recombination and devise a simple, non-parametric method for estimating how much historical recombination has influenced genetic variation in a real data set.

Simulating recombination histories

The Hudson animator at http://www.coalescent.dk allows you to visualise coalescent simulations with recombination and an arbitrary number of chromosomes. We will use this to get an intuitive idea of how much recombination influences gene histories.

Find the web site and follow the links to the Hudson Animator with recombination. You can set both the number of sequences in the sample, n, and the population-scaled recombination rate, ρ (rho) = 4Ner, where Ne is the effective population size and r is the probability of a recombination event occurring across the whole region in one meiosis. When you press recalc you get an animation that shows you the ancestral recombination graph (ARG) with samples (red), coalescent events (green) and recombination events (blue). If you hover the pointer over the coalescent and recombination events, the little panel at the bottom right shows you details of the ancestral material involved in each lineage and the time at which the event occurred (scaled in units of 2Ne generations). Simulate a few histories with n=rho=2 so you get familiar with the setup.

You can also click the Trees tab, which gives you a different view of the ARG. The pointer will start in the middle of the ‘left-most’ tree, with blue triangles indicating the location of each recombination event along the sequence. If you click anywhere along the segment, you will be shown the tree at that point along the sequence. Simulate a few histories with n=rho=4 to familiarise yourself with this alternative view of the ARG.
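If you want to generate many histories without clicking through the animator, the event counts can be sketched in a few lines of Python. This is a minimal sketch, not the animator's own code: it assumes that, with time in units of 2Ne generations, k lineages coalesce at total rate k(k-1)/2 and recombine at total rate k*rho/2, and it counts every recombination on every lineage, including any that fall in non-ancestral material, so its counts need not match what the animator displays. The function names are made up for illustration.

```python
import random

def simulate_arg_event_counts(n, rho, rng):
    """Follow the number of lineages back in time until one remains.

    Returns (number of coalescent events, number of recombination events).
    Assumed rates (time in units of 2Ne generations):
      coalescence   k*(k-1)/2
      recombination k*rho/2
    """
    k = n
    n_coal = 0
    n_rec = 0
    while k > 1:
        coal_rate = k * (k - 1) / 2.0
        rec_rate = k * rho / 2.0
        # The next event is a coalescence with probability proportional to its rate.
        if rng.random() < coal_rate / (coal_rate + rec_rate):
            k -= 1          # two lineages merge into one
            n_coal += 1
        else:
            k += 1          # one lineage splits into two
            n_rec += 1
    return n_coal, n_rec

def mean_recombinations(n, rho, replicates=10000, seed=1):
    """Average number of recombination events over many simulated histories."""
    rng = random.Random(seed)
    total = sum(simulate_arg_event_counts(n, rho, rng)[1] for _ in range(replicates))
    return total / replicates

if __name__ == "__main__":
    for n in (2, 4, 10, 20):
        print(n, mean_recombinations(n, rho=1.0))
```

Only the relative sizes of the two rates matter for counting events, which is why no waiting times are drawn.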
Now use the simulator to get approximate answers to the following questions. See if you can get exact answers using coalescent theory (not all questions have simple analytical expressions for their solution, but you can probably come up with a good approximation!).

a) How many coalescent events are there in an ARG if there are no recombination events?

b) How many coalescent events are there in an ARG if there are r recombination events?

c) For a recombination rate parameter (rho) of 1 and two sequences (n=2), what is the probability that there are no recombination events?

d) For n=4 and rho=1, what is the probability of there being a single recombination event in the ARG? If you look at just those simulations where you get a single recombination event, in what proportion of the ARGs could you place mutations (as many as you want) on the graph so that the recombination event is detectable by the four-gamete test (see below if you can’t remember what this is)?

e) Fix rho at 1 and estimate the average number of recombination events in ARGs for n=2, n=4, n=10 and n=20. How do you think the expected number of recombination events scales with the number of sequences?

Detecting recombination events in empirical data

If the mutation rate is very low, such that repeat or back mutation is very unlikely, then every time all four possible combinations of two alleles at two loci are observed (the four-gamete test) we can be sure that a recombination event must have occurred between them. In the example below, the data on the left (A) show a detectable recombination and the data on the right (B) do not.

[Figure: example data sets A) and B)]

You are going to develop (and implement) an algorithm for calculating a lower bound for the minimum number of recombination events that must have occurred in the history of our sample of sequences; we will call this Rmin (Hudson and Kaplan 1985). In the above examples, it is obvious that (A) requires at least one recombination and (B) zero.
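The four-gamete test itself is short to write down. The sketch below assumes haplotypes are given as equal-length strings of 0s and 1s, one per chromosome; the function names and the two toy data sets are made up for illustration and are not the panels from the figure.

```python
def site_column(haplotypes, i):
    """Alleles at SNP i (0-based), one per chromosome."""
    return [h[i] for h in haplotypes]

def four_gamete_incompatible(site_a, site_b):
    """True if all four allele combinations (00, 01, 10, 11) are observed.

    Assumes both sites are biallelic and coded with the same two symbols.
    """
    return len(set(zip(site_a, site_b))) == 4

# Toy data (hypothetical): all four gametes present, so recombination is detectable.
toy_detectable = ["00", "01", "10", "11"]
# Toy data (hypothetical): the 10 gamete is missing, so nothing is detectable.
toy_undetectable = ["00", "01", "11", "11"]

for data in (toy_detectable, toy_undetectable):
    print(four_gamete_incompatible(site_column(data, 0), site_column(data, 1)))
```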
In the examples below, how many do you need?

[Figure: example data sets A), B) and C)]

For larger data sets we need to develop a systematic way of calculating Rmin. A good starting point is to identify all pairs of SNPs for which a recombination event is detectable between them using the four-gamete test. For example, below, the lines indicate the pairs of ‘incompatible sites’ (those which have a detectable recombination event). To find Rmin, we need to put the fewest possible recombination events down such that all pairwise constraints are satisfied (a linear programming problem). What is the answer for the above example? Can you see how to generalise the algorithm to an arbitrary number of SNPs? If not, consider the following two examples.

[Figure: example data sets A) and B)]

Note that (B) is identical to (A), except that it has an extra SNP at the right. In (A), we only need a single recombination event, but when we add the extra SNP to make (B), we generate an incompatibility between the 2nd and the 4th SNPs, which could not be the result of the same recombination event that generated the incompatibility between the 1st and 2nd SNPs (note that the incompatibility between the 1st and the 4th SNPs could already be explained). More generally, by thinking about adding in one SNP at a time, and looking to see if it generates incompatibilities that are not currently explained, we can work progressively along the sequence.

Explain why the formula

$R_{\min}(i) = \max_{j<i}\left[ R_{\min}(j) + I_{ij} \right]$

provides a solution to Rmin (i and j are indices for the SNPs and $I_{ij}$ is an indicator function that takes the value 0 if there is no detectable recombination between SNPs i and j and 1 if there is).

Implement an algorithm for calculating Rmin from empirical data (alleles coded as 0s and 1s) and apply it to the human data set ceph7q31.txt you can find on the website (from the HapMap project).
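One possible way to turn the recursion into code is sketched below; treat it as a starting point to check your own implementation against rather than a reference solution. It assumes haplotypes are equal-length strings of 0s and 1s with at least one SNP, takes the value at the first SNP to be 0, and uses 0-based indices. The function names and toy data are hypothetical.

```python
def incompatibility_matrix(haplotypes):
    """I[j][i] = 1 if SNPs j and i fail the four-gamete test, else 0."""
    n_sites = len(haplotypes[0])
    cols = [[h[i] for h in haplotypes] for i in range(n_sites)]
    I = [[0] * n_sites for _ in range(n_sites)]
    for j in range(n_sites):
        for i in range(j + 1, n_sites):
            if len(set(zip(cols[j], cols[i]))) == 4:
                I[j][i] = I[i][j] = 1
    return I

def rmin_running_bound(haplotypes):
    """Running bound along the sequence: r[i] = max over j < i of r[j] + I[j][i].

    r[0] is taken to be 0; the last entry is the bound for the whole sample.
    """
    I = incompatibility_matrix(haplotypes)
    n_sites = len(I)
    r = [0] * n_sites
    for i in range(1, n_sites):
        r[i] = max(r[j] + I[j][i] for j in range(i))
    return r

# Toy data (hypothetical): every pair of SNPs is incompatible.
print(rmin_running_bound(["000", "011", "101", "110"]))
# Toy data (hypothetical): only the 1st and 3rd SNPs are incompatible.
print(rmin_running_bound(["000", "001", "100", "101"]))
```

The double loop over SNP pairs is quadratic in the number of SNPs, which is simple but may be slow for very long alignments.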
www.stats.ox.ac.uk/~mcvean/DTC/BIOINF/Practicals/ceph7q31.txt

The first line tells you how many sequences and SNPs there are, the second line gives the positions of the SNPs and the subsequent lines are the data for each chromosome.

Do recombination events occur uniformly along the sequence or do they cluster into particular regions?
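A reader for a file laid out as described might look like the sketch below. The exact layout of ceph7q31.txt is not shown here, so this is an assumption: the first two numbers on the first line are the sequence and SNP counts, the second line holds whitespace-separated positions, and every later line holds only the 0/1 alleles for one chromosome, with or without spaces between them. If the real file carries sequence names or other fields, the parser will need adjusting. The second function, also hypothetical, lists the SNP positions at which a running bound along the sequence steps up, which is one way to see whether detectable events cluster.

```python
def parse_haplotype_text(text):
    """Parse the assumed layout: counts, positions, then one 0/1 line per chromosome."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seq, n_snp = (int(x) for x in lines[0].split()[:2])
    positions = [float(x) for x in lines[1].split()]
    # Drop any whitespace inside a data line so "0 1 1" and "011" are treated alike.
    haplotypes = ["".join(ln.split()) for ln in lines[2:]]
    if len(haplotypes) != n_seq:
        raise ValueError("expected %d sequences, found %d" % (n_seq, len(haplotypes)))
    if len(positions) != n_snp or any(len(h) != n_snp for h in haplotypes):
        raise ValueError("SNP count does not match the header")
    return positions, haplotypes

def bound_increase_positions(positions, running_bound):
    """Positions of the SNPs at which the running bound increases.

    Each reported SNP is where a further event first becomes necessary;
    the event itself lies somewhere to its left.
    """
    return [positions[i] for i in range(1, len(running_bound))
            if running_bound[i] > running_bound[i - 1]]

# Made-up example in the assumed layout (not taken from ceph7q31.txt).
example = "4 3\n100 250 900\n000\n0 1 1\n101\n110\n"
positions, haplotypes = parse_haplotype_text(example)
print(positions, haplotypes)
```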
