Degree Dependence in Rates of Transcription Factor Evolution Explains the Unusual Structure of Transcription Networks – Supporting Material

Derivation of Equilibrium Degree Distributions

Alexander J Stewart1,2,3, Robert M Seymour1,4 and Andrew Pomiankowski1,2

1 CoMPLEX, UCL, Physics Building, Gower Street, London, WC1E 6BT, U.K.
2 The Galton Laboratory, Research Department of Genetics, Evolution and Environment, UCL, 4 Stephenson Way, London NW1 2HE, U.K.
3 Department of Mathematics, UCL, Gower Street, London, WC1E 6BT, U.K.
4 Correspondence should be addressed to alex.stewart@ucl.ac.uk

Model

Networks in the model consist of transcription factors (TFs) and target genes (TGs). TFs regulate other genes and may be regulated by other TFs. TGs are only regulated, and therefore have only incoming edges. The model includes four distinct types of mutation: trans mutation, cis mutation, gene duplication and gene deletion. Each type of mutation is associated with a rate. The mutation rate is the rate at which each type of mutation becomes fixed in the network (not the rate at which new mutations actually occur, which may differ considerably from the rate at which they are fixed). The rates at which different mutations become fixed are referred to as the rates of evolution. Rates of evolution may be the same for all genes (either TFs or TGs), or else they may vary with the connectivity of a gene. Initially we consider constant rates of evolution.

The network is updated by time increments $\Delta t$. These increments are taken to be sufficiently small that at most one mutation is fixed in the network within each increment. We choose a time scale such that $\Delta t = 1$. The rates of evolution in the model are therefore the probabilities that a mutation will occur (and be fixed) in a time interval. Mutations at trans and cis elements affect edges but do not affect nodes in the network. Gene duplication and deletion affect both nodes and edges.
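The update rule described above (at most one fixed mutation per increment, with rates read as per-increment probabilities) can be sketched as follows. This is a minimal illustration and not the simulation code used for the paper: the function name `sample_event`, the dictionary keys and the numerical rates are hypothetical, and the rates are assumed to be network-level probabilities whose sum does not exceed one.

```python
import random


def sample_event(rates, rng):
    """Pick the mutation type fixed in one time increment, or None.

    `rates` maps a mutation type to its per-increment fixation
    probability (hypothetical values; their sum must not exceed 1).
    """
    total = sum(rates.values())
    if total > 1.0 + 1e-12:
        raise ValueError("rates must sum to at most 1 per increment")
    u = rng.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for name, probability in rates.items():
        cumulative += probability
        if u < cumulative:
            return name  # exactly one mutation is fixed
    return None  # no mutation is fixed in this increment


# Illustrative rates only; duplication and deletion are given equal rates.
example_rates = {
    "trans": 0.02,
    "cis": 0.01,
    "duplication": 0.005,
    "deletion": 0.005,
}

rng = random.Random(0)
events = [sample_event(example_rates, rng) for _ in range(10_000)]
```

Because the draw falls in at most one of the cumulative intervals, each call returns either a single mutation type or `None`, which is the "at most one mutation per increment" condition.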
When a gene is duplicated the network is increased in size by one. When a gene is deleted, the network is decreased in size by one. Rates of gene duplication and deletion are taken to be equal. Therefore the expected size of the network remains constant. The actual size of the network undergoes random variation about some mean size. In our simulations we place upper and lower limits on the absolute size of the network. At the upper boundary, gene duplication events are forbidden. At the lower boundary, gene deletions are forbidden. The rates of duplication and deletion may differ between TFs and TGs.

We calculate the expected equilibrium degree distributions for the in- and out-degree of networks. The evolution of the in-degree distribution, $n_{in}(k,t)$, in the mean field approximation is given by Eq. 4 of the main text:

$$
\begin{aligned}
\frac{\partial n_{in}(k,t)}{\partial t} = {} & \left[\lambda^{in}_{TG}(k-1) + \lambda^{in}_{TF}(k-1)\right] n_{in}(k-1,t) + \left[\delta^{in}_{TG}(k+1) + \delta^{in}_{TF}(k+1)\right] n_{in}(k+1,t) \\
& - \left[\lambda^{in}_{TG}(k) + \lambda^{in}_{TF}(k) + \delta^{in}_{TG}(k) + \delta^{in}_{TF}(k)\right] n_{in}(k,t),
\end{aligned} \tag{1A}
$$

where $\lambda^{in}_{TG}(k)$ and $\delta^{in}_{TG}(k)$ are the probabilities of a gene with in-degree $k$ gaining or losing an edge through a mutation at the gene, and $\lambda^{in}_{TF}(k)$ and $\delta^{in}_{TF}(k)$ are the probabilities of a gene with in-degree $k$ gaining or losing an edge through trans mutation, duplication or deletion at a TF in the time interval $\Delta t$.

The time evolution of the out-degree distribution $n_{out}(k,t)$ is given by Eq. 5 of the main text:

$$
\begin{aligned}
\frac{\partial n_{out}(k,t)}{\partial t} = {} & \left[\lambda^{out}_{TG}(k-1) + \lambda^{out}_{TF}(k-1)\right] n_{out}(k-1,t) + \delta^{out}_{TG}(k+1)\, n_{out}(k+1,t) \\
& - \left[\lambda^{out}_{TG}(k) + \delta^{out}_{TG}(k) + \lambda^{out}_{TF}(k)\right] n_{out}(k,t) \\
& + \sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j,t) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k,t),
\end{aligned} \tag{2A}
$$

where $N$ is the total number of genes (TFs and TGs) in the network, $\lambda^{out}_{TG}(k)$ and $\delta^{out}_{TG}(k)$ are the probabilities of a gene with out-degree $k$ gaining or losing an edge through mutation at one of its targets, $\lambda^{out}_{TF}(k)$ is the probability that a TF with out-degree $k$ gains a target through mutation at the TF, and $\nu^{out}_{TF}(j,k)$ is the probability that a TF with out-degree $j \geq k$ loses interactions to become a TF with out-degree $k$ due to mutation at the TF.

Setting the left hand side of Eq. 1A to zero, this has an equilibrium solution satisfying

$$
\left[\delta^{in}_{TF}(k+1) + \delta^{in}_{TG}(k+1)\right] n_{in}(k+1) = \left[\lambda^{in}_{TF}(k) + \lambda^{in}_{TG}(k)\right] n_{in}(k). \tag{3A}
$$

To solve Eq. 2A we make an approximation for the term

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j,t) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k,t),
$$

which describes loss of interactions through trans evolution. For the model without degree dependence in the rate of trans evolution (Model 1 and Model 3 in the main text), $\nu^{out}_{TF}(j,k)$ is given by

$$
\nu^{out}_{TF}(j,k) = \mu_{trans}\, \frac{j!}{k!\,(j-k)!}\, m^{j-k} (1-m)^{k}, \tag{4A}
$$

where $\mu_{trans}$ is the rate of trans evolution. This is a binomial distribution in which each of the $j$ interactions is retained with probability $1-m$.

By assuming a solution of the form $n_{out}(k) = A_{out} k^{-\gamma}$, where $A_{out}$ is a normalization constant for the out-degree distribution, we can use the approximation

$$
\begin{aligned}
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k)
&= \mu_{trans}\, n_{out}(k)(1-m)^{\gamma-1} - \mu_{trans}\, n_{out}(k) + O\!\left(k^{-(1+\gamma)}\right) \\
&= \frac{\mu_{trans}}{2(\gamma-1)} \left[1 - (1-m)^{\gamma-1}\right] \left[(k+1)\, n_{out}(k+1) - (k-1)\, n_{out}(k-1)\right] + O\!\left(k^{-(1+\gamma)}\right).
\end{aligned} \tag{5A}
$$

Observe that, when $\gamma = 1$, the right hand side of Eq. 5A is zero. To derive this, first note from Eq. 4A that

$$
\sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k) = \mu_{trans}\, n_{out}(k).
$$

We use Lemma 2 of Chung et al. (2003) to show that

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) = \mu_{trans}\, n_{out}(k)(1-m)^{\gamma-1} + O\!\left(k^{-(1+\gamma)}\right).
$$
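To see this, we have

$$
\begin{aligned}
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j)
&= A_{out}\,\mu_{trans} \sum_{j=k}^{N} \binom{j}{k} m^{j-k}(1-m)^{k} j^{-\gamma} \\
&= A_{out}\,\mu_{trans}\, k^{-\gamma}(1-m)^{k} \sum_{j=k}^{N} \binom{j}{k} m^{j-k} \left(\frac{k}{j}\right)^{\gamma} \\
&= A_{out}\,\mu_{trans}\, k^{-\gamma}(1-m)^{k} \left(1+O(k^{-1})\right) \sum_{j=k}^{N} \binom{j-\gamma}{k-\gamma} m^{j-k} \\
&= A_{out}\,\mu_{trans}\, k^{-\gamma}(1-m)^{k} \left(1+O(k^{-1})\right) \sum_{i=0}^{N-k} \binom{i+k-\gamma}{i} m^{i} \\
&= A_{out}\,\mu_{trans}\, k^{-\gamma}(1-m)^{k} \left(1+O(k^{-1})\right) (1-m)^{-(k-\gamma+1)} \\
&= \mu_{trans}\, n_{out}(k)(1-m)^{\gamma-1} \left(1+O(k^{-1})\right),
\end{aligned} \tag{6A}
$$

valid for $N \gg k$ (in fact, exactly valid in the limit $N \to \infty$). We now have

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k) = \mu_{trans}\, n_{out}(k)(1-m)^{\gamma-1} - \mu_{trans}\, n_{out}(k) + O\!\left(k^{-(1+\gamma)}\right). \tag{7A}
$$

Using our assumed solution form, we can also write

$$
\left[(k+1)\, n_{out}(k+1) - k\, n_{out}(k)\right] + \left[k\, n_{out}(k) - (k-1)\, n_{out}(k-1)\right] = 2(1-\gamma)\, n_{out}(k) + O\!\left(k^{-(\gamma+1)}\right). \tag{8A}
$$

Eq. 7A and Eq. 8A combine to give Eq. 5A. For large $k$ we can neglect terms $O\!\left(k^{-(1+\gamma)}\right)$ and define

$$
K(\gamma, m) = \frac{1}{2(\gamma-1)} \left[1 - (1-m)^{\gamma-1}\right].
$$

This allows us to write

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k) = \mu_{trans}\, K(\gamma, m) \left[(k+1)\, n_{out}(k+1) - (k-1)\, n_{out}(k-1)\right], \tag{9A}
$$

which is the form used in Eqs. 7 of the main text.

For the model with degree dependence in the rate of trans evolution (Model 2 and Model 4 in the main text), $\nu^{out}_{TF}(j,k)$ is given by

$$
\nu^{out}_{TF}(j,k) = \frac{\mu_{trans}}{j}\, \frac{j!}{k!\,(j-k)!}\, m^{j-k} (1-m)^{k}. \tag{10A}
$$

Once again assuming a solution of the form $n_{out}(k) = A_{out} k^{-\gamma}$, we can use the approximation

$$
\begin{aligned}
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k)
&= \mu_{trans}\, \frac{n_{out}(k)}{k}(1-m)^{\gamma} - \mu_{trans}\, \frac{n_{out}(k)}{k} + O\!\left(k^{-(2+\gamma)}\right) \\
&= \frac{\mu_{trans}}{2\gamma} \left[1 - (1-m)^{\gamma}\right] \left[n_{out}(k+1) - n_{out}(k-1)\right] + O\!\left(k^{-(2+\gamma)}\right).
\end{aligned} \tag{11A}
$$

To derive this, first note that in this case

$$
\sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k) = \mu_{trans}\, \frac{n_{out}(k)}{k}.
$$

Again we use Lemma 2 of Chung et al. (2003) to show that

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) = \mu_{trans}\, \frac{n_{out}(k)}{k}(1-m)^{\gamma} \left(1+O(k^{-1})\right).
$$

To see this, we have

$$
\begin{aligned}
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j)
&= A_{out}\,\mu_{trans} \sum_{j=k}^{N} \binom{j}{k} m^{j-k}(1-m)^{k} j^{-(1+\gamma)} \\
&= A_{out}\,\mu_{trans}\, k^{-(1+\gamma)}(1-m)^{k} \sum_{j=k}^{N} \binom{j}{k} m^{j-k} \left(\frac{k}{j}\right)^{\gamma+1} \\
&= A_{out}\,\mu_{trans}\, k^{-(1+\gamma)}(1-m)^{k} \left(1+O(k^{-1})\right) \sum_{j=k}^{N} \binom{j-\gamma-1}{k-\gamma-1} m^{j-k} \\
&= A_{out}\,\mu_{trans}\, k^{-(1+\gamma)}(1-m)^{k} \left(1+O(k^{-1})\right) \sum_{i=0}^{N-k} \binom{i+k-\gamma-1}{i} m^{i} \\
&= A_{out}\,\mu_{trans}\, k^{-(1+\gamma)}(1-m)^{k} \left(1+O(k^{-1})\right) (1-m)^{-(k-\gamma)} \\
&= \mu_{trans}\, \frac{n_{out}(k)}{k}(1-m)^{\gamma} \left(1+O(k^{-1})\right),
\end{aligned} \tag{12A}
$$

valid for $N \gg k$ (in fact, exactly valid in the limit $N \to \infty$).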
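The leading-order results of Eq. 6A and Eq. 12A can be checked numerically. The sketch below is an illustration added for this purpose, not part of the original derivation: it evaluates the binomially weighted sum directly and divides it by the asymptotic form. The function names, the truncation point `j_max` (standing in for $N$) and the values chosen for `m` and `gamma` are assumptions made for the example; the common prefactor of the normalization constant and the trans rate cancels in the ratio and is omitted.

```python
import math


def binomial_weighted_sum(k, m, exponent, j_max):
    """Sum over j = k..j_max of C(j, k) m^(j-k) (1-m)^k j^(-exponent)."""
    total = 0.0
    for j in range(k, j_max + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_binomial = (math.lgamma(j + 1) - math.lgamma(k + 1)
                        - math.lgamma(j - k + 1))
        log_term = (log_binomial + (j - k) * math.log(m)
                    + k * math.log(1.0 - m) - exponent * math.log(j))
        total += math.exp(log_term)
    return total


def asymptotic_form(k, m, exponent):
    """Leading-order value k^(-exponent) (1-m)^(exponent-1)."""
    return k ** (-exponent) * (1.0 - m) ** (exponent - 1.0)


def ratio(k, m, exponent):
    """Exact sum divided by its asymptotic form; tends to 1 for large k."""
    j_max = 20 * k + 200  # truncation far into the decaying tail
    return (binomial_weighted_sum(k, m, exponent, j_max)
            / asymptotic_form(k, m, exponent))


m, gamma = 0.3, 2.0  # illustrative values only
for k in (20, 200):
    # exponent = gamma corresponds to Eq. 6A, gamma + 1 to Eq. 12A
    print(k, ratio(k, m, gamma), ratio(k, m, gamma + 1.0))
```

The printed ratios should move towards one as $k$ increases, in line with the $1+O(k^{-1})$ correction factor in both equations.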
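We now have

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k) = \mu_{trans}\, \frac{n_{out}(k)}{k}(1-m)^{\gamma} - \mu_{trans}\, \frac{n_{out}(k)}{k} + O\!\left(k^{-(2+\gamma)}\right). \tag{13A}
$$

Using our assumed solution form, we can also write

$$
\left[n_{out}(k+1) - n_{out}(k)\right] + \left[n_{out}(k) - n_{out}(k-1)\right] = -2\gamma\, \frac{n_{out}(k)}{k} + O\!\left(k^{-(2+\gamma)}\right). \tag{14A}
$$

Eq. 13A and Eq. 14A combine to give Eq. 11A. For large $k$ we can neglect terms $O\!\left(k^{-(2+\gamma)}\right)$ and higher, and define

$$
K(\gamma, m) = \frac{1}{2\gamma} \left[1 - (1-m)^{\gamma}\right].
$$

This allows us to write

$$
\sum_{j=k}^{N} \nu^{out}_{TF}(j,k)\, n_{out}(j) - \sum_{j=0}^{k} \nu^{out}_{TF}(k,j)\, n_{out}(k) = \mu_{trans}\, K(\gamma, m) \left[n_{out}(k+1) - n_{out}(k-1)\right], \tag{15A}
$$

which is the form used in Eqs. 7 of the main text.

Using Eq. 9A, the solution to Eq. 2A for the out-degree distribution is

$$
\left[\delta^{out}_{TG}(k+1) + (k+1)\,\mu_{trans} K(\gamma, m)\right] n_{out}(k+1) = \left[\lambda^{out}_{TG}(k) + \lambda^{out}_{TF}(k) - k\,\mu_{trans} K(\gamma, m)\right] n_{out}(k), \tag{16A}
$$

for the model excluding degree dependence in the rate of trans evolution, with $\lambda$ denoting the gain and $\delta$ the loss probabilities of Eq. 2A. Using Eq. 15A gives

$$
\left[\delta^{out}_{TG}(k+1) + \mu_{trans} K(\gamma, m)\right] n_{out}(k+1) = \left[\lambda^{out}_{TG}(k) + \lambda^{out}_{TF}(k) - \mu_{trans} K(\gamma, m)\right] n_{out}(k), \tag{17A}
$$

for the model including degree dependence in the rate of trans evolution.

Solution to Model 1 – No Connectivity Dependence

The solution to Eq. 3A for the in-degree distribution, using the incoming edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{in}(k) = A_{in}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu_{trans}}{\mu_{D}}\right)}{\Gamma(k+1)} \left(\frac{\mu_{D}}{\mu_{D} + \mu_{cis} + m\,\mu_{trans}}\right)^{k}, \tag{18A}
$$

where $A_{in}$ is a normalization constant and $\mu_{cis}$, $\mu_{trans}$ and $\mu_{D}$ are the rates of cis evolution, trans evolution and gene duplication or deletion. Following Chung et al. (2003) we can write

$$
\frac{\Gamma(x+c)}{\Gamma(x)} = x^{c} \left(1 + O\!\left(\tfrac{1}{x}\right)\right). \tag{19A}
$$

For large $x$, the terms $O\!\left(\tfrac{1}{x}\right)$ can be neglected, and hence Eq. 18A gives $n_{in}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(1 + \frac{\mu_{cis} + m\,\mu_{trans}}{\mu_{D}}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu_{cis} + \mu_{trans}}{\mu_{D}}. \tag{20A}
$$

This is Eq. 8 of the main text.

The solution to Eq. 16A for the out-degree distribution, using the outgoing edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{out}(k) = A_{out}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu_{trans}}{\mu_{D} - \mu_{trans} K(\gamma, m)}\right)}{\Gamma(k+1)} \left(\frac{\mu_{D} - \mu_{trans} K(\gamma, m)}{\mu_{cis} + \mu_{D} + \mu_{trans} K(\gamma, m)}\right)^{k}. \tag{21A}
$$

Using Eq. 19A, this can be approximated by $n_{out}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(\frac{\mu_{cis} + \mu_{D} + \mu_{trans} K(\gamma, m)}{\mu_{D} - \mu_{trans} K(\gamma, m)}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu_{cis} + \mu_{trans}}{\mu_{D} - \mu_{trans} K(\gamma, m)}. \tag{22A}
$$

This is Eq. 9 of the main text.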
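The step from a Gamma-function ratio to a power law with an exponential cut-off relies only on Eq. 19A, and can be illustrated numerically. The sketch below is a hedged illustration rather than the authors' calculation: it assumes a generic solution of the form $\Gamma(k+c)/\Gamma(k+1) \cdot r^{k}$, where `c` and `r` are hypothetical placeholders for the rate combinations appearing in the Model 1 solutions, and compares it with $k^{-\gamma} e^{-\alpha k}$ for $\gamma = 1 - c$ and $\alpha = -\ln r$. All function names and numerical values are assumptions made for the example.

```python
import math


def gamma_ratio_solution(k, c, r):
    """Gamma(k + c) / Gamma(k + 1) * r**k, evaluated in log space."""
    return math.exp(math.lgamma(k + c) - math.lgamma(k + 1)
                    + k * math.log(r))


def power_law_with_cutoff(k, c, r):
    """k^(-gamma) * exp(-alpha * k) with gamma = 1 - c and alpha = -ln(r)."""
    gamma = 1.0 - c
    alpha = -math.log(r)
    return k ** (-gamma) * math.exp(-alpha * k)


def relative_error(k, c, r):
    """Relative difference between the exact form and its approximation."""
    return abs(gamma_ratio_solution(k, c, r)
               / power_law_with_cutoff(k, c, r) - 1.0)


def step_ratio(k, c, r):
    """n(k+1) / n(k); equals r (k + c) / (k + 1) for this solution form."""
    return gamma_ratio_solution(k + 1, c, r) / gamma_ratio_solution(k, c, r)


c, r = 0.4, 0.9  # illustrative placeholders, not fitted rates
for k in (10, 100, 1000):
    print(k, relative_error(k, c, r))
```

The relative error should shrink roughly in proportion to $1/k$, which is the neglected $O(1/x)$ term of Eq. 19A. The helper `step_ratio` shows why this solution form arises: successive values differ by a factor linear in $k$ in both numerator and denominator, as in a balance condition between gain at degree $k$ and loss at degree $k+1$.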
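This result (Eq. 22A) is only consistent with the assumption that $n_{out}(k) = A_{out} k^{-\gamma}$ when $\alpha = 0$.

Solution to Model 2 – Degree Dependence in the Rate of trans Evolution

The solution to Eq. 3A for the in-degree distribution, using the incoming edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{in}(k) = A_{in}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu_{trans}}{\mu_{D}}\right)}{\Gamma(k+1)} \left(\frac{\mu_{D}}{\mu_{D} + \mu_{cis} + m \left\langle \tfrac{1}{k} \right\rangle \mu_{trans}}\right)^{k}, \tag{23A}
$$

where $\left\langle \tfrac{1}{k} \right\rangle = \sum_{j=1}^{N} \frac{n_{out}(j)}{j}$ determines the mean rate of trans evolution across the network. Using Eq. 19A this is approximately $n_{in}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(1 + \frac{\mu_{cis} + m \left\langle \tfrac{1}{k} \right\rangle \mu_{trans}}{\mu_{D}}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu_{cis} + \mu_{trans}}{\mu_{D}}. \tag{24A}
$$

This is Eq. 10 of the main text.

The solution to Eq. 17A for the out-degree distribution, using the outgoing edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{out}(k) = A_{out}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu_{trans} - \mu_{trans} K(\gamma, m)}{\mu_{D}}\right)}{\Gamma\!\left(1 + \frac{\mu_{trans} K(\gamma, m)}{\mu_{D} + \mu_{cis}} + k\right)} \left(\frac{\mu_{D}}{\mu_{D} + \mu_{cis}}\right)^{k}. \tag{25A}
$$

Using Eq. 19A, this can be approximated by $n_{out}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(1 + \frac{\mu_{cis}}{\mu_{D}}\right) \quad \text{and} \quad \mu_{D}(\gamma - 1) = -\left(\mu_{cis} + \mu_{trans}\right) + \mu_{trans} K(\gamma, m) \left(1 + \frac{\mu_{D}}{\mu_{D} + \mu_{cis}}\right). \tag{22A}
$$

This is Eq. 11 of the main text. This result is only consistent with the assumption that $n_{out}(k) = A_{out} k^{-\gamma}$ when $\alpha = 0$.

Solution to Model 3 – Preferential Attachment

The solution to Eq. 3A for the in-degree distribution, using the incoming edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{in}(k) = A_{in}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu^{R}_{trans}}{\mu_{D} + \mu^{P}_{trans}}\right)}{\Gamma(k+1)} \left(\frac{\mu_{D} + \mu^{P}_{trans}}{\mu_{D} + \mu_{cis} + m\,\mu_{trans}}\right)^{k}. \tag{23A}
$$

Using Eq. 19A this is approximately $n_{in}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(\frac{\mu_{D} + \mu_{cis} + m\,\mu_{trans}}{\mu_{D} + \mu^{P}_{trans}}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu_{cis} + \mu^{R}_{trans}}{\mu_{D} + \mu^{P}_{trans}}. \tag{24A}
$$

This is Eq. 12 of the main text.

The solution to Eq. 16A for the out-degree distribution, using the outgoing edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{out}(k) = A_{out}\, \frac{\Gamma\!\left(k + \frac{\mu^{R}_{cis} + \mu_{trans}}{\mu_{D} + \mu^{P}_{cis} - \mu_{trans} K(\gamma, m)}\right)}{\Gamma(k+1)} \left(\frac{\mu_{D} + \mu^{P}_{cis} - \mu_{trans} K(\gamma, m)}{\mu_{cis} + \mu_{D} + \mu_{trans} K(\gamma, m)}\right)^{k}. \tag{25A}
$$

Using Eq. 19A, this can be approximated by $n_{out}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(\frac{\mu_{cis} + \mu_{D} + \mu_{trans} K(\gamma, m)}{\mu_{D} + \mu^{P}_{cis} - \mu_{trans} K(\gamma, m)}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu^{R}_{cis} + \mu_{trans}}{\mu_{D} + \mu^{P}_{cis} - \mu_{trans} K(\gamma, m)}. \tag{26A}
$$

This is Eq. 13 of the main text. This result is only consistent with the assumption that $n_{out}(k) = A_{out} k^{-\gamma}$ when $\alpha = 0$.

Solution to Model 4 – Degree Dependence and Preferential Attachment

The solution to Eq. 3A for the in-degree distribution, using the incoming edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{in}(k) = A_{in}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu^{R}_{trans}}{\mu_{D} + \mu^{P}_{trans}}\right)}{\Gamma(k+1)} \left(\frac{\mu_{D} + \mu^{P}_{trans}}{\mu_{D} + \mu_{cis} + m \left\langle \tfrac{1}{k} \right\rangle \mu_{trans}}\right)^{k}, \tag{27A}
$$

where $\left\langle \tfrac{1}{k} \right\rangle = \sum_{j=1}^{N} \frac{n_{out}(j)}{j}$ determines the mean rate of trans evolution across the network. Using Eq. 19A this is approximately $n_{in}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(\frac{\mu_{D} + \mu_{cis} + m \left\langle \tfrac{1}{k} \right\rangle \mu_{trans}}{\mu_{D} + \mu^{P}_{trans}}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu_{cis} + \mu^{R}_{trans}}{\mu_{D} + \mu^{P}_{trans}}. \tag{28A}
$$

This is Eq. 14 of the main text.

The solution to Eq. 17A for the out-degree distribution, using the outgoing edge event probabilities from Table 2 (main text), gives an equilibrium degree distribution

$$
n_{out}(k) = A_{out}\, \frac{\Gamma\!\left(k + \frac{\mu^{R}_{cis} + \mu_{trans} - \mu_{trans} K(\gamma, m)}{\mu_{D} + \mu^{P}_{cis}}\right)}{\Gamma\!\left(1 + \frac{\mu_{trans} K(\gamma, m)}{\mu_{D} + \mu_{cis}} + k\right)} \left(\frac{\mu_{D} + \mu^{P}_{cis}}{\mu_{D} + \mu_{cis}}\right)^{k}. \tag{29A}
$$

Using Eq. 19A, this can be approximated by $n_{out}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(\frac{\mu_{D} + \mu_{cis}}{\mu_{D} + \mu^{P}_{cis}}\right) \quad \text{and} \quad \left(\mu_{D} + \mu^{P}_{cis}\right)(\gamma - 1) = -\left(\mu^{R}_{cis} + \mu_{trans}\right) + \mu_{trans} K(\gamma, m) \left(1 + \frac{\mu_{D} + \mu^{P}_{cis}}{\mu_{D} + \mu_{cis}}\right). \tag{30A}
$$

This is Eq. 15 of the main text. This result is only consistent with the assumption that $n_{out}(k) = A_{out} k^{-\gamma}$ when $\alpha = 0$.

Shrinking Networks Model

We now consider a model of a shrinking network, in which the rate of gene deletion is greater than the rate of gene duplication. This model is appropriate as a model of transcription network evolution immediately following a whole genome duplication, such as that which occurred in yeast around 100 million years ago (Kellis et al. 2004). We use a rate of gene duplication $\mu^{+}_{D}$ and a rate of gene deletion $\mu^{-}_{D}$, such that

$$
\mu^{-}_{D} = \mu^{+}_{D} + \Delta\mu_{D}, \tag{31A}
$$

where $\Delta\mu_{D} > 0$. First note that the rate at which genes gain new edges through duplication of other genes is $k\mu^{+}_{D}$, and the rate at which they lose edges through deletion of other genes is $k\mu^{-}_{D} = k\mu^{+}_{D} + k\,\Delta\mu_{D}$. The rate at which new TFs of out-degree $k$ are produced by this model is $\mu^{+}_{D}\, n_{out}(k)$, and the rate at which they are lost is $\mu^{-}_{D}\, n_{out}(k)$.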
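Therefore TFs with out-degree $k$ are lost at a net rate $\Delta\mu_{D}\, n_{out}(k)$. Similarly, TGs with in-degree $k$ are lost at a net rate $\Delta\mu_{D}\, n_{in}(k)$. To see that this term is not sufficient to produce a power-law out-degree distribution, we make the following approximation. Assuming an out-degree distribution of the form $n_{out}(k) = A_{out} k^{-\gamma}$ we can write, using Eq. 8A,

$$
\Delta\mu_{D}\, n_{out}(k) = \frac{\Delta\mu_{D}}{2(\gamma-1)} \left[(k-1)\, n_{out}(k-1) - (k+1)\, n_{out}(k+1)\right] + O\!\left(k^{-(1+\gamma)}\right). \tag{32A}
$$

Using this with Model 1, we now define

$$
K(\gamma, m) = \frac{1}{2(\gamma-1)} \left[1 + \frac{\Delta\mu_{D}}{\mu_{trans}} - (1-m)^{\gamma-1}\right]. \tag{33A}
$$

Then the solution for the out-degree distribution of this model can be written as

$$
n_{out}(k) = A_{out}\, \frac{\Gamma\!\left(k + \frac{\mu_{cis} + \mu_{trans}}{\mu^{+}_{D} - \mu_{trans} K(\gamma, m)}\right)}{\Gamma(k+1)} \left(\frac{\mu^{+}_{D} - \mu_{trans} K(\gamma, m)}{\mu_{cis} + \mu^{-}_{D} + \mu_{trans} K(\gamma, m)}\right)^{k}. \tag{34A}
$$

Using Eq. 19A, this can be approximated by $n_{out}(k) \sim k^{-\gamma} e^{-\alpha k}$, where

$$
\alpha = \ln\!\left(\frac{\mu_{cis} + \mu^{-}_{D} + \mu_{trans} K(\gamma, m)}{\mu^{+}_{D} - \mu_{trans} K(\gamma, m)}\right) \quad \text{and} \quad \gamma = 1 - \frac{\mu_{cis} + \mu_{trans}}{\mu^{+}_{D} - \mu_{trans} K(\gamma, m)}. \tag{35A}
$$

Since $K(\gamma, m) > 0$ from Eq. 33A and $\mu^{-}_{D} > \mu^{+}_{D}$, the argument of the logarithm in Eq. 35A always exceeds one, so there is no solution with $\alpha = 0$. Therefore this model cannot produce a power-law out-degree distribution.

References

Chung, F., Lu, L. & Dewey, G. 2003 Duplication Models for Biological Networks. Journal of Computational Biology 10, 677-687.

Kellis, M., Birren, B. & Lander, E. 2004 Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617-624.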