Algorithms in Computational Biology 236522, Winter 2004-5
Home Assignment No. 4
Publication date:
Due date: 4.1.05, noon
1. Recall question 1 from assignment 3.
A new species of bacteria was found. The new bacteria can appear in three different colors
– red, green and blue. Researchers found that there are 3 strains in the species, but were
surprised to find that the color is not decided by the strain. In strain A, 80% of the bacteria
are red, 15% are green and 5% are blue. In strains B and C the distribution is (0.1, 0.7, 0.2)
and (0.1, 0.25, 0.65) respectively (preserving the order red, green, blue).
Moreover, it was found that the bacteria can “switch” between strains, and that the
transition probabilities from one strain to another depend on environmental
conditions. Given a certain set of environmental conditions, there is a θAA chance that a
cell from strain A will remain in that strain after 24 hours of incubation in the dark, and θAB
and θAC chance that it will change to strain B and C respectively. In general, θXY is the
probability that a cell will change from strain X to Y under the given conditions.
A green cell was taken from a colony which is known to consist of 30% bacteria from
strain A, 60% bacteria from strain B and 10% bacteria from strain C. The cell was incubated
in constant conditions for 7 days, and its color was checked every 24 hours. The results
were red, green, blue, green, red, red, blue. What are the most likely transition probabilities
between strains?
Use the EM algorithm to estimate the transition probabilities, starting with θXY = 1/3 for
every X, Y ∈ {A, B, C}, and perform 3 iterations.
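The setup in question 1 can be explored numerically with a short forward-backward sketch. This is only one possible reading of the question, not a reference solution: it assumes the initial green cell is treated as an observation at time 0 with the colony composition (0.3, 0.6, 0.1) as the prior over strains, that the color distributions stay fixed, and that only the transition probabilities θXY are re-estimated. The names `em_step` and `run_em` are hypothetical.

```python
# Color distributions per strain (order: red, green, blue), as given in the question.
EMISSION = {
    "A": {"red": 0.80, "green": 0.15, "blue": 0.05},
    "B": {"red": 0.10, "green": 0.70, "blue": 0.20},
    "C": {"red": 0.10, "green": 0.25, "blue": 0.65},
}
PRIOR = {"A": 0.3, "B": 0.6, "C": 0.1}  # colony composition
STRAINS = ["A", "B", "C"]
# Assumption: the initial green reading is observation 0, followed by the 7 daily readings.
OBS = ["green", "red", "green", "blue", "green", "red", "red", "blue"]


def em_step(theta):
    """One EM iteration. Returns (new_theta, likelihood of OBS under the input theta)."""
    n = len(OBS)
    # Forward pass: alpha[t][s] = P(obs[0..t], strain at time t = s)
    alpha = [{s: PRIOR[s] * EMISSION[s][OBS[0]] for s in STRAINS}]
    for t in range(1, n):
        alpha.append({
            y: sum(alpha[t - 1][x] * theta[x][y] for x in STRAINS) * EMISSION[y][OBS[t]]
            for y in STRAINS
        })
    # Backward pass: beta[t][s] = P(obs[t+1..] | strain at time t = s)
    beta = [None] * n
    beta[n - 1] = {s: 1.0 for s in STRAINS}
    for t in range(n - 2, -1, -1):
        beta[t] = {
            x: sum(theta[x][y] * EMISSION[y][OBS[t + 1]] * beta[t + 1][y] for y in STRAINS)
            for x in STRAINS
        }
    likelihood = sum(alpha[n - 1].values())
    # E-step: expected number of X -> Y transitions given the observations
    counts = {x: {y: 0.0 for y in STRAINS} for x in STRAINS}
    for t in range(n - 1):
        for x in STRAINS:
            for y in STRAINS:
                counts[x][y] += (alpha[t][x] * theta[x][y]
                                 * EMISSION[y][OBS[t + 1]] * beta[t + 1][y]) / likelihood
    # M-step: normalize each row of expected counts
    new_theta = {}
    for x in STRAINS:
        row_total = sum(counts[x].values())
        new_theta[x] = {y: counts[x][y] / row_total for y in STRAINS}
    return new_theta, likelihood


def run_em(iterations=3):
    """Start from theta_XY = 1/3 and apply em_step the requested number of times."""
    theta = {x: {y: 1 / 3 for y in STRAINS} for x in STRAINS}
    likelihoods = []
    for _ in range(iterations):
        theta, lik = em_step(theta)
        likelihoods.append(lik)
    return theta, likelihoods


theta, likelihoods = run_em(3)
for x in STRAINS:
    print(x, {y: round(theta[x][y], 3) for y in STRAINS})
```

The recorded likelihoods are those of the parameters entering each iteration, so they should be non-decreasing; this is a convenient sanity check on a hand calculation.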
2. D(p||q) is the relative entropy between p(x) and q(x) given in lecture.
a. Claim: D is symmetric, i.e., for every p(x) and q(x), D(p||q) = D(q||p).
Prove the claim or provide a counterexample.
b. Given two probability functions q1(x) and q2(x), for every probability function p(x) we
define the average relative entropy between p(x) and q1(x), q2(x) by:
$$\frac{1}{2}\left[ D(p \| q_1) + D(p \| q_2) \right]$$
Find a function q(x) (not necessarily a probability function) such that for every p(x),
q(x) is the “relative entropy average” of q1 and q2, defined by the following equation:
$$D(p \| q) = \frac{1}{2}\left[ D(p \| q_1) + D(p \| q_2) \right]$$
c. For each of the following claims, prove it or provide a counterexample. The “relative
entropy average” q(x) is:
Claim 1: always a probability function.
Claim 2: never a probability function.
d. The definition of average relative entropy can be extended to n probability functions
q1(x),…,qn(x) by:
$$\frac{1}{n}\sum_{i=1}^{n} D(p \| q_i)$$
Find a function q(x) which is the “relative entropy average” of q1,…, qn.
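For checking candidate answers to question 2 numerically, a small helper can be useful. The sketch below assumes the standard definition D(p||q) = Σ_x p(x) log(p(x)/q(x)) with the natural logarithm (the lecture may use a different base, which only rescales the values) and assumes q(x) > 0 wherever p(x) > 0. The function names are hypothetical, and the example distributions are simply the color distributions from question 1.

```python
import math


def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)); terms with p(x) == 0 contribute 0.

    q does not have to sum to 1, so a candidate "relative entropy average"
    that is not a probability function can be plugged in as well.
    """
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


def average_relative_entropy(p, qs):
    """Arithmetic mean of D(p || q_i) over the functions q_i in qs."""
    return sum(relative_entropy(p, q) for q in qs) / len(qs)


# Example inputs (order: red, green, blue), borrowed from question 1.
p = [0.80, 0.15, 0.05]
q1 = [0.10, 0.70, 0.20]
q2 = [0.10, 0.25, 0.65]

print(relative_entropy(p, q1), relative_entropy(q1, p))
print(average_relative_entropy(p, [q1, q2]))
```

Comparing `relative_entropy(p, q)` for a candidate q against `average_relative_entropy(p, [q1, q2])` over several choices of p gives a quick numerical check before attempting a proof.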
3. A certain plant species has purple flowers. During experiments done on these plants, after a
few generations, plants with 3 other flower colors appeared – red, blue and white. After
intensive research, it was shown that there are 2 genes R/r and B/b that determine the color
of all flowers of each individual plant. Each one of these genes has a dominant allele
(indicated by capital letter), and a recessive allele. Plants in which both genes have at least
one dominant allele (R and B) have purple flowers. Plants in which gene R has a dominant
allele, and gene B has two recessive alleles (bb) have red flowers. Plants in which gene B
has a dominant allele, and gene R has two recessive alleles (rr) have blue flowers. Plants in
which both genes have no dominant allele (genotype rrbb) have white flowers.
a. For each color write all possible genotypes.
b. Given a population of plants with known flower colors, describe a gene counting
algorithm that finds the maximum likelihood estimates (MLE) of the four allele
frequencies (similar to the example in the tutorial).
c. Run the algorithm for a population of plants with the given flower colors: 300 purple,
60 red, 30 blue and 10 white. Find the allele frequencies with precision of two digits
after the decimal point.
Advice: You can perform the calculations in an Excel worksheet and submit the
calculation steps in a well-explained table.
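As an alternative to a worksheet, the gene counting iteration in question 3 can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not necessarily the tutorial's formulation: it assumes Hardy-Weinberg proportions at each gene and independence between the two genes, so that each gene can be handled separately using the counts of plants that show the dominant or the recessive phenotype for that gene. The function names, the starting value 0.5 and the stopping tolerance are hypothetical choices.

```python
def gene_count_step(p_dom, n_dominant, n_recessive):
    """One gene counting (EM) update for the dominant-allele frequency of one gene.

    n_dominant: plants showing the dominant phenotype (genotype DD or Dd)
    n_recessive: plants showing the recessive phenotype (genotype dd)
    """
    q = 1.0 - p_dom
    # E-step: split dominant-phenotype plants into expected DD and Dd counts,
    # assuming Hardy-Weinberg genotype proportions p^2 : 2pq.
    frac_hom = p_dom ** 2 / (p_dom ** 2 + 2 * p_dom * q)
    expected_hom = n_dominant * frac_hom
    expected_het = n_dominant - expected_hom
    # M-step: count dominant alleles (2 per DD, 1 per Dd) over all alleles.
    total_alleles = 2 * (n_dominant + n_recessive)
    return (2 * expected_hom + expected_het) / total_alleles


def gene_counting(n_dominant, n_recessive, p0=0.5, tol=1e-6, max_iter=1000):
    """Iterate gene_count_step until the update changes by less than tol."""
    p = p0
    for _ in range(max_iter):
        p_new = gene_count_step(p, n_dominant, n_recessive)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


# Flower color counts from part c.
purple, red, blue, white = 300, 60, 30, 10

# Gene R: purple and red plants carry at least one R; blue and white are rr.
freq_R = gene_counting(purple + red, blue + white)
# Gene B: purple and blue plants carry at least one B; red and white are bb.
freq_B = gene_counting(purple + blue, red + white)

print("R:", round(freq_R, 2), "r:", round(1 - freq_R, 2))
print("B:", round(freq_B, 2), "b:", round(1 - freq_B, 2))
```

Each call to `gene_count_step` corresponds to one row of the table suggested in the advice above, so printing the intermediate values of `p` reproduces the calculation steps.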
Good luck!