Algorithms in Computational Biology 236522, Winter 2004-5
Home Assignment No. 4
Publication date: 20.12.04
Due date: 4.1.05, noon

1. Recall question 1 from assignment 3. A new species of bacteria was found. The bacteria can appear in three different colors: red, green and blue. Researchers found that there are 3 strains in the species, but were surprised to find that the color is not determined by the strain. In strain A, 80% of the bacteria are red, 15% are green and 5% are blue. In strains B and C the distributions are (0.1, 0.7, 0.2) and (0.1, 0.25, 0.65) respectively (preserving the order red, green, blue).

   Moreover, it was found that the bacteria can "switch" between strains, and that the transition probabilities from one strain to another depend on environmental conditions. Under a given set of environmental conditions, a cell from strain A remains in that strain after 24 hours of incubation in the dark with probability $\theta_{AA}$, and changes to strain B or C with probability $\theta_{AB}$ or $\theta_{AC}$ respectively. In general, $\theta_{XY}$ is the probability that a cell changes from strain X to strain Y within 24 hours under the given conditions.

   A green cell was taken from a colony which is known to consist of 30% bacteria of strain A, 60% bacteria of strain B and 10% bacteria of strain C. The cell was incubated in constant conditions for 7 days, and its color was checked every 24 hours. The results were red, green, blue, green, red, red, blue.

   What are the most likely transition probabilities between strains? Use the EM algorithm to estimate the transition probabilities, starting with $\theta_{XY} = 1/3$ for every $X, Y \in \{A, B, C\}$, and perform 3 iterations.

2. $D(p \| q)$ is the relative entropy between $p(x)$ and $q(x)$, as given in lecture.

   a. Claim: $D$ is symmetric, that is, for every $p(x)$ and $q(x)$, $D(p \| q) = D(q \| p)$. Prove the claim or provide a counterexample.

   b. Given two probability functions $q_1(x)$ and $q_2(x)$, for every probability function $p(x)$ we define the average relative entropy between $p(x)$ and $q_1(x)$, $q_2(x)$ by:

      $$\frac{1}{2}\left( D(p \| q_1) + D(p \| q_2) \right)$$

      Find a function $q(x)$ (not necessarily a probability function) such that for every $p(x)$, $q(x)$ is the "relative entropy average" of $q_1$ and $q_2$, defined by the following equation:

      $$D(p \| q) = \frac{1}{2}\left( D(p \| q_1) + D(p \| q_2) \right)$$

   c. For each of the following claims, prove it or provide a counterexample. The "relative entropy average" $q(x)$ is:
      Claim 1: always a probability function.
      Claim 2: never a probability function.

   d. The definition of average relative entropy can be extended to $n$ probability functions $q_1(x), \ldots, q_n(x)$ by:

      $$\frac{1}{n} \sum_{i=1}^{n} D(p \| q_i)$$

      Find a function $q(x)$ which is the "relative entropy average" of $q_1, \ldots, q_n$.

3. A certain plant species has purple flowers. During experiments on these plants, after a few generations plants with 3 other flower colors appeared: red, blue and white. After intensive research, it was shown that 2 genes, R/r and B/b, determine the color of all flowers of each individual plant. Each of these genes has a dominant allele (indicated by a capital letter) and a recessive allele.

   - Plants in which both genes have at least one dominant allele (R and B) have purple flowers.
   - Plants in which gene R has a dominant allele and gene B has two recessive alleles (bb) have red flowers.
   - Plants in which gene B has a dominant allele and gene R has two recessive alleles (rr) have blue flowers.
   - Plants in which neither gene has a dominant allele (genotype rrbb) have white flowers.

   a. For each color, write all possible genotypes.

   b. Given a population of plants with known flower colors, describe a gene counting algorithm that finds the maximum likelihood estimates (MLE) of the four allele frequencies (similar to the example in the tutorial).

   c. Run the algorithm for a population of plants with the following flower colors: 300 purple, 60 red, 30 blue and 10 white. Find the allele frequencies to a precision of two digits after the decimal point.

      Advice: You can perform the calculations in an Excel worksheet, and submit the calculation steps in a well-explained table.

Good luck!
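
As a possible starting point for question 3b, the sketch below shows gene counting at a single locus with one dominant and one recessive allele. It rests on assumptions that the assignment does not state: the population is in Hardy-Weinberg equilibrium, and the two genes are independent, so each gene can be handled separately from its own phenotype counts. The function name `gene_count_recessive_freq` and its parameters are hypothetical, and the algorithm shown in the tutorial may be organized differently.

```python
def gene_count_recessive_freq(n_dominant, n_recessive, q0=0.5,
                              tol=1e-10, max_iter=1000):
    """Gene-counting (EM) estimate of the recessive-allele frequency q.

    n_dominant  -- plants showing the dominant phenotype (genotype DD or Dd)
    n_recessive -- plants showing the recessive phenotype (genotype dd)
    Assumes Hardy-Weinberg proportions: p^2, 2pq, q^2 with p = 1 - q.
    """
    n_total = n_dominant + n_recessive
    q = q0
    for _ in range(max_iter):
        # E-step: share of heterozygotes among dominant-phenotype plants,
        # P(Dd | dominant) = 2pq / (p^2 + 2pq) = 2q / (1 + q).
        het_share = 2.0 * q / (1.0 + q)
        # Expected number of recessive alleles: two per dd plant,
        # one per (expected) Dd plant.
        expected_recessive = 2.0 * n_recessive + het_share * n_dominant
        # M-step: re-estimate q as a fraction of all 2N alleles.
        q_new = expected_recessive / (2.0 * n_total)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q


# Phenotype counts from question 3c.
purple, red, blue, white = 300, 60, 30, 10

# Gene R: purple and red plants carry at least one R; blue and white are rr.
r_freq = gene_count_recessive_freq(purple + red, blue + white)
# Gene B: purple and blue plants carry at least one B; red and white are bb.
b_freq = gene_count_recessive_freq(purple + blue, red + white)

print(f"r = {r_freq:.2f}, R = {1 - r_freq:.2f}, "
      f"b = {b_freq:.2f}, B = {1 - b_freq:.2f}")
```

Under these assumptions the iteration can be checked against a closed form: its fixed point is the square root of the fraction of plants with the recessive phenotype. If the independence assumption is dropped, the E-step would instead have to distribute each color class over its joint two-gene genotypes.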