Bio/statistics Handout 6: Kernel and image in biology

Imagine a complicated cellular process that involves some $n$ genes that are presumably ‘turned on’ by some $N$ other genes that act at some earlier time. We want to test whether the effect of the early genes on the late genes involves a complicated synergy, or whether their effects simply add. To do this, one would derive the consequences of one or the other of these possibilities and then devise an experiment to see if the predicted consequences arise.

One way to test this is to vary the expression levels of the early genes from their normal levels (either up or down) and see how the expression levels of the late genes deviate from their normal levels in response. In this regard, when a gene is expressed, its genetic code (a stretch of the DNA molecule) is used to code for a molecule much like DNA called mRNA. Here, the ‘m’ stands for ‘messenger’ and the RNA part is a sequence of small molecules strung end to end. Each of these can be one of four types, and the resulting sequence along the RNA string is determined by the original sequence on the coding stretch of DNA. This messenger RNA subsequently attaches to the protein-making part of a cell (a ‘ribosome’), where its sequence is used to construct a particular protein molecule. In any event, the level of any given mRNA can be measured at any given time, and this level serves as a proxy for the level of expression of the gene that coded for it in the first place.

One more thing to note: The level of expression of a gene can often be varied with some accuracy in an experiment by inserting into the cell nucleus tailored molecules that either promote or repress the gene's expression. Such is the magic of modern biotechnology.

To make a prediction that is testable by measuring early and late gene expression, let us suppose that the effects of the early genes are simply additive and see where this assumption leads.
For this purpose, label the early genes by integers from 1 to $N$, and let $u_k$ denote the deviation, either positive or negative, of the level of expression of the $k$’th early gene from its normal level. For example, we can take $u_k$ to denote the deviation from normal of the concentration of the mRNA that comes from the $k$’th early gene. Meanwhile, label the late genes by integers from 1 to $n$, and use $p_j$ to denote the deviation of the mRNA level of the $j$’th late gene from its normal level. If the effects of the early genes on the late genes are simply additive, we might expect that any given $p_j$ has the form

$$p_j = A_{j1} u_1 + A_{j2} u_2 + \cdots + A_{jN} u_N , \tag{6.1}$$

where each $A_{jk}$ is a constant. This is to say that the level $p_j$ is a sum of terms, the first proportional to the deviation $u_1$ of the first early gene, the second proportional to the deviation $u_2$ of the second early gene, and so on. Note that when $A_{jk}$ is positive, the $k$’th early gene tends to promote the expression of the $j$’th late gene. Conversely, when $A_{jk}$ is negative, the $k$’th early gene acts to repress the expression of the $j$’th late gene.

If we use $u$ to denote the $N$-component column vector whose $k$’th entry is $u_k$, and if we use $p$ to denote the $n$-component column vector whose $j$’th entry is $p_j$, then the equations in (6.1) for $j = 1, \dots, n$ together form the matrix equation $p = A u$, where $A$ is the $n \times N$ matrix with entries $A_{jk}$. Thus, we see a linear transformation from an $N$-dimensional space to an $n$-dimensional one.

By the way, note that the relation predicted by (6.1) can, in principle, be tested by experiments that vary the levels of the early genes and see if the levels of the late genes change in a manner that is consistent with (6.1). Such experiments will also determine the values of the matrix entries $\{A_{jk}\}$. For example, to find $A_{11}$, vary $u_1$ while keeping $u_k = 0$ for every $k > 1$. Measure $p_1$ as these variations are made and see if the ratio $p_1/u_1$ stays constant as $u_1$ changes. If so, the constant is the value to take for $A_{11}$.
If this ratio is not constant as these variations are made, then the linear model is wrong. One can do similar things with the other $p_j$ and $u_k$ to determine all of the $A_{jk}$. One can then change more than one $u_k$ from zero at a time and see if the result conforms to (6.1).

The question now arises as to the meaning of the kernel and the image of the linear transformation $A$ from $\mathbb{R}^N$ to $\mathbb{R}^n$. To make things explicit here, suppose that $n$ and $N$ are both equal to 3 and that $A$ is the matrix

$$A = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}. \tag{6.2}$$

As you can check, this matrix has kernel equal to the scalar multiples of the vector

$$\begin{pmatrix} 1 \\ -3 \\ 1 \end{pmatrix}. \tag{6.3}$$

Meanwhile, its image is the span of the vectors

$$\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}. \tag{6.4}$$

Thus, it consists of all vectors in $\mathbb{R}^3$ that can be written as a constant times the first vector in (6.4) plus another constant times the second.

Here is the meaning of the kernel: Vary the early genes 1, 2 and 3 from their normal levels in the ratios $u_1/u_3 = 1$ and $u_2/u_3 = -3$ and there is no effect on the late genes. This is to say that if the expression levels of early genes 1 and 3 are each increased by any given amount $r$ while that of early gene 2 is decreased by $3r$, then there is no change to the levels of expression of the three late genes. In a sense, the decrease by $3r$ in the level of the second early gene exactly offsets the effect of the equal increases in the levels of the first and third early genes.

As to the meaning of the image, what we find is that only certain deviations of the levels of expression of the three late genes from their background values can be obtained by modifying the expression levels of the three early genes. For example, both vectors in (6.4) are orthogonal to the vector

$$\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}. \tag{6.5}$$

Thus, values of $p_1$, $p_2$ and $p_3$ with the property that $p_1 + p_3 \neq p_2$ cannot be obtained by any variation in the expression levels of the three early genes.
Indeed, the dot product of the vector $p$ with the vector in (6.5) is $p_1 - p_2 + p_3$, and this must be zero whenever $p$ is a linear combination of the vectors in (6.4).

Granted that the matrix $A$ is that in (6.2), the preceding observation has the following consequence for the biologist: If values of $p_1$, $p_2$ and $p_3$ are observed in a cell with $p_1 + p_3 \neq p_2$, then the three early genes cannot be the sole causative agent for the expression levels of the three late genes.
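The kernel and image claims above can be checked by direct arithmetic. The following is a minimal sketch in plain Python, assuming that $A$ is the matrix with rows (1, 0, -1), (2, 1, 1) and (1, 1, 2), which has the kernel and image described above. The names `late_response`, `dot` and `consistent_with_early_genes` are hypothetical helpers introduced only for this illustration, and integer inputs are used so that the arithmetic is exact.

```python
# Assumed matrix A: rows index the late genes, columns index the early genes.
A = [
    [1, 0, -1],
    [2, 1, 1],
    [1, 1, 2],
]

def late_response(A, u):
    """Return p = A u, the predicted deviations of the late genes."""
    return [sum(a_jk * u_k for a_jk, u_k in zip(row, u)) for row in A]

def dot(v, w):
    """Ordinary dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(v, w))

kernel_vector = [1, -3, 1]  # the vector in (6.3)
normal = [1, -1, 1]         # the vector in (6.5)

# Kernel: raise early genes 1 and 3 by r and lower early gene 2 by 3r.
r = 2
p_kernel = late_response(A, [r * x for x in kernel_vector])
print("response to a kernel variation:", p_kernel)

# Image: every response p = A u satisfies p1 - p2 + p3 = 0.
p_example = late_response(A, [2, -10, 7])
print("example response:", p_example, "p1 - p2 + p3 =", dot(p_example, normal))

def consistent_with_early_genes(p, tol=1e-9):
    """True when p1 - p2 + p3 is zero up to tol, so p lies in the image of A."""
    return abs(dot(p, normal)) <= tol

print(consistent_with_early_genes([1, 1, 0]))  # first vector in (6.4)
print(consistent_with_early_genes([1, 0, 0]))  # has p1 + p3 != p2
```

With measured data, the tolerance `tol` would have to reflect the experimental noise; the default used here is only a placeholder for exact arithmetic.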