Bio/statistics Handout 6: Kernel and image in biology
Imagine a complicated cellular process that involves some n genes that are
presumably ‘turned on’ by some N other genes that act at some earlier time. We want to
test whether the effect of the early genes on the late genes involves a complicated
synergy, or whether their effects simply add. To do this, one would try to derive the
consequences of one or the other of these possibilities and then devise an experiment to
see if the predicted consequences arise. One way to test this is to vary the expression
levels of the early genes from their normal levels (either up or down) and see how the
levels of expression of the late genes deviate from their normal levels in response.
In this regard, when a gene is expressed, its genetic code (a stretch of the DNA molecule)
is used to code for a molecule much like DNA called mRNA. Here, the ‘m’ stands for
‘messenger’ and the RNA part is a sequence of small molecules strung end to end. Each
of these can be one of four kinds, and the resulting sequence along the RNA string is determined
by the original sequence on the coding stretch of DNA. This messenger RNA
subsequently attaches to the protein making part of a cell (a ‘ribosome’) where its
sequence is used to construct a particular protein molecule. In any event, the level of any
given mRNA can be measured at any given time, and this level serves as a proxy for the
level of expression of the gene that coded it in the first place.
One more thing to note: The level of expression of a gene can often be varied
with some accuracy in an experiment by inserting into the cell nucleus tailored molecules
to either promote or repress the gene expression. Such is the magic of modern
biotechnology.
To make a prediction that is testable by measuring early and late gene expression,
let us suppose that the effects of the early genes are simply additive and see where this
assumption leads. For this purpose, label these early genes by integers from 1 to N, and
let uk denote the deviation, either positive or negative, of the level of expression of the
k’th early gene from its normal level. For example, we can take uk to denote the
deviation from normal of the concentration of the mRNA that comes from the k’th early
gene.
Meanwhile, label the late genes by integers from 1 to n, and use pj to denote the
deviation of the j’th late gene’s mRNA level from its normal level. If the effects of the
early genes on the late genes are simply additive, we might expect that any given pj has the form
\[
p_j = A_{j1} u_1 + A_{j2} u_2 + \cdots + A_{jN} u_N , \tag{6.1}
\]
where each Ajk is a constant. This is to say that the level pj is a sum of factors, the first
proportional to the amount of the first early gene, the second proportional to the amount
of the second early gene, and so on. Note that when Ajk is positive, then the k’th early
gene tends to promote the expression of the j’th late gene. Conversely, when Ajk is
negative, the k’th early gene acts to repress the expression of the j’th late gene.
If we use u to denote the N-component column vector whose k’th entry is uk, and
if we use p to denote the n-component column vector whose j’th component is pj, then
the equation in (6.1) is the matrix equation p = Au, where A is the matrix with n rows
and N columns whose entries are the Ajk. Thus, we see a linear transformation
from an N-dimensional space to an n-dimensional one.
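As a minimal sketch of the additive model, the following Python computes p = Au for a small case. The function name late_gene_response and all of the numbers are made up for illustration; they are not measured values.

```python
def late_gene_response(A, u):
    """Return p = A u, where A is a list of n rows, each with N entries."""
    if any(len(row) != len(u) for row in A):
        raise ValueError("each row of A needs one entry per early gene")
    # Entry j of p is A[j][0]*u[0] + A[j][1]*u[1] + ... as in (6.1).
    return [sum(a_jk * u_k for a_jk, u_k in zip(row, u)) for row in A]


# Hypothetical coefficients for n = 2 late genes and N = 3 early genes.
A = [
    [0.5, -1.0, 0.0],  # late gene 1: promoted by early gene 1, repressed by 2
    [2.0, 0.0, 1.5],   # late gene 2: promoted by early genes 1 and 3
]
u = [2.0, 1.0, -2.0]   # hypothetical deviations of the early genes

p = late_gene_response(A, u)
print(p)  # [0.0, 1.0]
```

In this made-up case the changes to early genes 1 and 2 cancel in their effect on late gene 1, while late gene 2 rises by one unit.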
By the way, note that the relation predicted by (6.1) can, in principle, be tested by
experiments that vary the levels of the early genes and see if the levels of the late genes
change in a manner that is consistent with (6.1). Such experiments will also determine
the values for the matrix entries {Ajk}. For example, to find A11, vary u1 while keeping
uk equal to zero for every k > 1. Measure p1 as these variations are made and see if
the ratio p1/u1 stays constant as u1 changes. If so,
the constant is the value to take for A11. If this ratio is not constant as these variations are
made, then the linear model is wrong. One can do similar things with the other pj and uk
to determine all Ajk. One can then see about changing more than one uk from zero and
see if the result conforms to (6.1).
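The one-gene-at-a-time procedure just described might be sketched as follows. Everything here is an assumption made for illustration: measure is a hypothetical stand-in for the laboratory measurement, the two simulated cells use made-up responses, and the tolerance rel_tol is an arbitrary choice (real data are noisy and would need a proper statistical test).

```python
def estimate_entry(measure, j, k, N, levels, rel_tol=1e-6):
    """Estimate A_jk by varying u_k alone.

    Returns the common value of p_j / u_k, or None if that ratio is not
    constant across the tested levels. Indices j and k are 0-based here,
    unlike the 1-based labels in the text.
    """
    ratios = []
    for level in levels:
        if level == 0:
            continue  # the ratio p_j / u_k is undefined at u_k = 0
        u = [0.0] * N
        u[k] = level  # every other early gene stays at its normal level
        p = measure(u)
        ratios.append(p[j] / level)
    if not ratios:
        raise ValueError("need at least one nonzero level")
    first = ratios[0]
    scale = max(abs(first), 1.0)
    if all(abs(r - first) <= rel_tol * scale for r in ratios):
        return first
    return None


def linear_cell(u):
    # Simulated additive response with made-up coefficients.
    return [3.0 * u[0] - 1.0 * u[1], 0.5 * u[1]]


def nonlinear_cell(u):
    # Simulated response that is not proportional to u[0].
    return [u[0] ** 2, u[1]]


levels = [-2.0, -1.0, 1.0, 2.0]
print(estimate_entry(linear_cell, 0, 0, 2, levels))     # 3.0
print(estimate_entry(nonlinear_cell, 0, 0, 2, levels))  # None
```

A return value of None corresponds to the case in the text where the ratio is not constant, so the linear model is wrong.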
The question now arises as to the meaning of the kernel and the image of the
linear transformation A from R^N to R^n. To make things explicit here, suppose that n and
N are both equal to 3 and that A is the matrix
\[
A = \begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 0 & -1 \end{pmatrix} \tag{6.2}
\]
As you can check, this matrix has kernel equal to the scalar multiples of the vector
\[
\begin{pmatrix} 1 \\ -3 \\ 1 \end{pmatrix} \tag{6.3}
\]
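The check can be written out row by row. Here A is taken to have rows (1, 1, 2), (2, 1, 1) and (1, 0, -1), and the vector is (1, -3, 1):

```latex
\[
A \begin{pmatrix} 1 \\ -3 \\ 1 \end{pmatrix}
= \begin{pmatrix}
1\cdot 1 + 1\cdot(-3) + 2\cdot 1 \\
2\cdot 1 + 1\cdot(-3) + 1\cdot 1 \\
1\cdot 1 + 0\cdot(-3) + (-1)\cdot 1
\end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
```

The first two columns of A are not proportional, so the image has dimension at least 2. The kernel therefore has dimension at most 3 - 2 = 1, and the multiples of this one vector make up the whole kernel.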
Meanwhile, its image is the span of the vectors
\[
\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. \tag{6.4}
\]
Thus, it consists of all vectors in R^3 that can be written as a constant times the first vector
in (6.4) plus another constant times the second.
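To see why these two vectors span the image, recall that the image of A is the span of its columns. Writing w_1 = (0, 1, 1) and w_2 = (1, 1, 0) for the two vectors (names introduced here only for this check), and taking the columns of A to be (1, 2, 1), (1, 1, 0) and (2, 1, -1), each column is a combination of w_1 and w_2:

```latex
\begin{align*}
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} &= w_1 + w_2, &
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} &= w_2, &
\begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix} &= -w_1 + 2 w_2 .
\end{align*}
```

Conversely, w_2 is the second column and w_1 is the first column minus the second, so both lie in the image.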
Here is the meaning of the kernel: Vary the early genes 1, 2 and 3 from their
normal levels in the ratios u1/u3 = 1 and u2/u3 = -3 and there is no effect on the late genes.
This is to say that if the expression levels of early genes 1 and 3 are increased by any
given amount r while that of early gene 2 is decreased by 3r, then there is no change to
the levels of expression of the three late genes. In a sense, a decrease in the level of the
second early gene that is three times as large exactly offsets the effect of the equal
increases in the levels of the first and third early genes.
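A short numerical illustration, assuming A has rows (1, 1, 2), (2, 1, 1) and (1, 0, -1): shifting any starting deviation u by r times (1, -3, 1) leaves p = Au unchanged. The starting values and the amounts r below are made up, and integers are used only so that the comparison is exact.

```python
A = [
    [1, 1, 2],
    [2, 1, 1],
    [1, 0, -1],
]
KERNEL_VECTOR = [1, -3, 1]


def apply_matrix(A, u):
    """Return the matrix-vector product A u as a list."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def shift_along_kernel(u, r):
    """Raise early genes 1 and 3 by r and lower early gene 2 by 3r."""
    return [u_k + r * v_k for u_k, v_k in zip(u, KERNEL_VECTOR)]


u = [2, 5, -1]  # hypothetical starting deviations of the early genes
baseline = apply_matrix(A, u)

for r in [1, -2, 7]:
    shifted = shift_along_kernel(u, r)
    # The late-gene deviations are identical before and after the shift.
    assert apply_matrix(A, shifted) == baseline

print(baseline)  # [5, 8, 3]
```

One consequence is that measuring the late genes alone cannot distinguish u from u shifted by a kernel vector.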
As to the meaning of the image, what we find is that only certain deviations of the
levels of expression of the three late genes from their background values can be obtained
by modifying the expression levels of the three early genes. For example, both vectors in
(6.4) are orthogonal to the vector
\[
\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \tag{6.5}
\]
Thus, values of p1, p2 and p3 with the property that p1 + p3 ≠ p2 cannot be obtained by any
variation in the expression levels of the three early genes. Indeed, the dot product of the
vector p with the vector in (6.5) is p1 – p2 + p3 and this must be zero in the case that p is
a linear combination of the vectors in (6.4).
Granted that the matrix A is that in (6.2), then the preceding observation has the
following consequence for the biologist: If values of p1, p2 and p3 are observed in a cell
with p1 + p3 ≠ p2, then the three early genes cannot be the sole causative agents for the
expression levels of the three late genes.
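This test could be sketched in Python as below. The function names and the tolerance tol are assumptions made for illustration; with real, noisy measurements the tolerance would have to come from the measurement error, which this handout does not discuss.

```python
def image_residual(p):
    """Dot product of p with (1, -1, 1), that is, p1 - p2 + p3."""
    p1, p2, p3 = p
    return p1 - p2 + p3


def consistent_with_early_genes(p, tol=1e-9):
    """True if p could be produced by varying the three early genes alone."""
    return abs(image_residual(p)) <= tol


print(consistent_with_early_genes([5.0, 8.0, 3.0]))  # True:  5 + 3 = 8
print(consistent_with_early_genes([1.0, 0.0, 1.0]))  # False: 1 + 1 != 0
```

A result of False rules out the three early genes as the sole cause. A result of True only says the observation is compatible with the model; it does not prove that no other agent is involved.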