Chapter 5
GENOME SCALE ANALYSIS OF REGULATORY NETWORK DYNAMICS

5.1 INTRODUCTION
5.2 MATERIALS AND METHODS
5.2.1 DATASETS
5.2.2 BACK-TRACKING ALGORITHM
5.2.3 INTERCHANGE INDEX
5.2.4 TOPOLOGICAL MEASURES
5.2.5 NORMALIZATION FOR REGULATORY HUBS
5.2.6 REGULATORY MOTIFS
5.2.7 RANDOM NETWORKS
5.2.8 SENSITIVITY ANALYSIS
5.3 RESULTS AND DISCUSSION
5.3.1 REGULATORY NETWORK IN YEAST
5.3.2 DIFFERENTIAL USE OF THE REGULATORY NETWORK
5.3.3 DYNAMICS OF REGULATORY INTERACTIONS
5.3.4 REGULATORY SPECIFICITY THROUGH TF COMBINATIONS
5.3.5 LARGE-SCALE TOPOLOGICAL CHANGES
5.3.6 TF HUBS IN THE REGULATORY NETWORK
5.3.7 PREFERENTIAL USE OF NETWORK MOTIFS
5.3.8 INTER-REGULATION OF TFS IN THE CELL CYCLE AND SPORULATION
5.4 CONCLUSIONS
5.5 REFERENCES

Parts of this chapter appeared in:
1. Luscombe, N.1*, Madan Babu, M.1*, Yu, H., Snyder, M., Teichmann, S. A.* and Gerstein, M.* (2004). Genomic analysis of regulatory network dynamics reveals large topological changes, Nature, in press

5.1 Introduction

Living cells respond to a variety of endogenous and exogenous stimuli by altering the expression of specific sets of genes. Transcriptional regulation plays a central role in directing the appropriate expression of these genes at the right time, and much experimental data describing individual transcription factor (TF) to target gene interactions have been published (Horak et al., 2002; Iyer et al., 2001; Lee et al., 2002; Lieb et al., 2001; Matys et al., 2003; Ren et al., 2000; Svetlov and Cooper, 1995). From a theoretical point of view, the collection of all these interactions on a genomic scale can be conceptualised as a directed network. Several studies have examined the properties of these biological networks as static entities in which all the observed and putative molecular interactions co-exist (Bader and Hogue, 2002; Guelzim et al., 2002; Jeong et al., 2001; Jeong et al., 2000; Milo et al., 2002; Ravasz et al., 2002; Rzhetsky and Gomez, 2001; Schlitt et al., 2003; Segal et al., 2003; Shen-Orr et al., 2002; von Mering et al., 2002; Wuchty et al., 2003). Useful as these studies are, a static picture does not reflect what actually happens in the cell, which is a highly dynamic system in which interactions occur selectively at different times.
For this reason, the exploration of network dynamics under different cellular conditions has repeatedly been highlighted as key to furthering our understanding of how the regulatory system is used in living cells (Barabasi and Oltvai, 2004; de Menezes and Barabasi, 2004; Fell, 1997; Lee et al., 2002; Savageau, 1976; Strogatz, 2001). To date, no study has explored the dynamics of the complete transcriptional regulatory network of an organism. In this chapter, we investigate for the first time the yeast network under different cellular conditions. We do so by decomposing the static network along five dimensions that correspond to different cellular conditions (cell cycle, sporulation, diauxic shift, DNA damage, and stress response), and by integrating the network with the results of 240 microarray experiments that characterize gene expression changes during these conditions. We demonstrate that, contrary to prior expectations, the differentially active sub-networks are distinct in terms of their global and local network topologies, occurrence of regulatory hubs, and system of inter-regulation between transcription factors. These differences can be interpreted in biological terms once the five cellular processes are further classified into two broad categories: endogenous and exogenous. The former includes the cell cycle and sporulation, which are progressive processes with an internal program of transcriptional regulation. The latter comprises diauxic shift, DNA damage, and stress response, which respond to environmental changes with a rapid, large-scale turnover in the repertoire of expressed genes. Our results show that the transcriptional regulatory network has evolved to accommodate diverse biological demands, and that this has entailed major structural changes in the network's architecture at both the global and local levels to suit different purposes.
5.2 Materials and Methods

5.2.1 Datasets

The yeast transcriptional regulatory dataset was compiled from the results of genetic, biochemical and ChIP-chip experiments (Horak et al., 2002; Iyer et al., 2001; Lee et al., 2002; Lieb et al., 2001; Matys et al., 2003; Ren et al., 2000; Svetlov and Cooper, 1995). The initial dataset contained 7,419 TF-target interactions between 180 TFs and 3,474 target genes (Yu et al., 2003). From this, we removed non-DNA-binding factors, such as histone deacetylases, that are not involved in direct transcriptional regulation through promoter binding. This was done by searching 156 known DNA-binding motifs from Pfam (Bateman et al., 2004) against the amino acid sequences of the 180 TFs; TFs without a significant match were removed from the dataset, together with their regulatory interactions. The final dataset contained 7,074 regulatory interactions involving 142 TFs and 3,420 target genes (supplementary materials). The major results of this study remain unchanged when a smaller dataset excluding ChIP-chip data is used.

The gene expression data were compiled from a total of 240 published microarray experiments for five cellular conditions: cell cycle (Cho et al., 1998), sporulation (Chu et al., 1998), diauxic shift (DeRisi et al., 1997), DNA damage (Gasch et al., 2001), and stress response (Gasch et al., 2000). The conditions were classified as endogenous or exogenous according to whether they have an internal or external program of transcriptional regulation. We obtained the lists of genes that undergo significant expression changes during each condition from the original publications; a total of 455, 477, 1,823, 1,718, and 866 genes are defined as differentially expressed in the five conditions, respectively. For the endogenous conditions, we used information regarding expression levels through a time-course.
We obtained lists of genes that are differentially expressed during a particular phase from the respective publications (Cho et al., 1998; Chu et al., 1998), and we used these data to observe the changes in regulatory system usage during successive phases of the cellular conditions. The cell cycle was divided into the early G1 (75 genes), late G1 (143), S (97), G2 (69), and M phases (105). Sporulation was divided into the metabolic (52), early I (61), early II (45), early-mid (95), middle (158), mid-late (61), and late phases (5).

5.2.2 Back-tracking algorithm

We used a back-tracking algorithm (Cormen et al., 2001) to define the sections of the transcriptional regulatory network that are actively employed in each cellular condition. The method assumes that genes undergoing differential expression are regulated by TFs that are linked to them in the regulatory network. There are three steps: (i) defining the sets of differentially expressed genes in each condition, (ii) identifying the TFs that are present in the cell in each condition, and (iii) identifying the sections of the regulatory network that are actively used to control each condition. (i) Section 5.2.1 describes how differentially expressed genes were identified. (ii) To identify the present TFs, we used a reference dataset of yeast protein and mRNA abundance (Jansen et al., 2002), calibrated by scaling many mRNA and protein measurements over the cell cycle, to define the starting expression levels of TFs. TFs were grouped into high (17 TFs), medium (62 TFs) or low abundance (63 TFs) relative to all yeast proteins. For each condition, we then assessed the expression level changes of all TFs relative to the start point. This is possible because the experiments for all cellular conditions were conducted with the cell cycle as the reference state.
TFs were considered present in a condition if they: (a) have high abundance at the start point, (b) have medium abundance and an expression level that rises or stays level during the condition, or (c) have low abundance and an expression level that rises. The numbers of present TFs in each condition are: cell cycle (88 TFs), sporulation (85 TFs), diauxic shift (76 TFs), DNA damage (75 TFs), and stress response (85 TFs). (iii) To identify the active sections of the regulatory network, we started with the list of differentially expressed genes in a condition. We flagged the present TFs that are connected to these genes by a regulatory interaction. We then identified further present TFs that are connected to the already flagged TFs, and flagged these also. The process was repeated iteratively until no further TFs were detected. For the analysis of serial and parallel inter-regulation, we defined the active sub-networks during the cell cycle and sporulation time-courses by repeating the back-tracking procedure on the genes that are differentially expressed in each phase. We also tested alternative back-tracking models that apply different stringencies for defining present TFs, and made similar observations for all models.

5.2.3 Interchange index

We calculated an interchange index ($I_i$) for each TF $i$ that measures the fraction of its regulatory interactions that are unique to a single condition. It is calculated as

$$I_i = \frac{\sum_j n_{ij}}{N_i}$$

where $n_{ij}$ is the number of regulatory interactions of TF $i$ that are unique to condition $j$, and $N_i$ is the total number of regulatory interactions of TF $i$ that are active across all conditions (i.e., in at least one condition). Values range from 0 to 1. Low values indicate that most interactions are maintained across multiple conditions; high values indicate that most interactions are replaced and so are unique to particular conditions.
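The three back-tracking steps and the interchange index can be illustrated with a minimal Python sketch. It assumes that the static network is a list of directed (TF, target) edges, that abundance classes and expression trends are supplied as simple dictionaries, and that an active sub-network consists of the edges traversed during back-tracking. The function names and the toy data are hypothetical; this is not the original implementation.

```python
from collections import defaultdict


def present_tfs(abundance, trend):
    """Apply presence rules (a)-(c) from Section 5.2.2.

    abundance: TF -> "high" | "medium" | "low" at the start point
    trend:     TF -> "up" | "level" | "down" during the condition
    """
    present = set()
    for tf, level in abundance.items():
        change = trend.get(tf, "level")  # assumption: missing data = no change
        if (level == "high"
                or (level == "medium" and change in ("up", "level"))
                or (level == "low" and change == "up")):
            present.add(tf)
    return present


def backtrack(edges, diff_genes, present):
    """Return the active (tf, target) edges for one condition.

    Starts from the differentially expressed genes, flags present TFs that
    regulate them, then repeatedly flags present TFs that regulate already
    flagged TFs until no new TF is found.
    """
    regulators = defaultdict(set)  # target -> TFs with an edge into it
    for tf, target in edges:
        regulators[target].add(tf)

    active, flagged = set(), set()
    frontier = set(diff_genes)
    while frontier:
        next_frontier = set()
        for node in frontier:
            for tf in regulators.get(node, ()):
                if tf not in present:
                    continue
                active.add((tf, node))
                if tf not in flagged:  # each TF is expanded only once
                    flagged.add(tf)
                    next_frontier.add(tf)
        frontier = next_frontier
    return active


def interchange_index(active_by_condition):
    """I_i = sum_j n_ij / N_i for every TF i with at least one active edge."""
    n_conditions = defaultdict(int)  # edge -> number of conditions using it
    for edges in active_by_condition.values():
        for edge in edges:
            n_conditions[edge] += 1
    total = defaultdict(int)   # N_i: distinct edges of TF i active anywhere
    unique = defaultdict(int)  # sum_j n_ij: edges of TF i used in one condition
    for (tf, _target), count in n_conditions.items():
        total[tf] += 1
        if count == 1:
            unique[tf] += 1
    return {tf: unique[tf] / total[tf] for tf in total}


# Hypothetical toy data: T2 regulates T1; T1 regulates g1 and g2; T2 regulates g2.
static = [("T1", "g1"), ("T1", "g2"), ("T2", "g2"), ("T2", "T1")]
present_set = present_tfs({"T1": "high", "T2": "medium", "T3": "low"},
                          {"T1": "down", "T2": "level", "T3": "down"})
active = {
    "A": backtrack(static, {"g1"}, present_set),
    "B": backtrack(static, {"g2"}, present_set),
}
index = interchange_index(active)
print(sorted(index.items()))  # [('T1', 1.0), ('T2', 0.5)]
```

In the toy example, T1 uses different targets in the two conditions and so scores 1.0, whereas T2 keeps its edge to T1 in both conditions and scores 0.5.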
5.2.4 Topological measures

Connectivity statistics were calculated individually for the static network and for the active sub-networks (Albert and Barabasi, 2002). The average degree (<k>) measures the mean number of regulatory interactions to and from nodes in a network. The average incoming degree (<kin>) is the mean number of interactions entering TFs and target genes, and the average outgoing degree (<kout>) is the mean number of interactions leaving TFs. We determined the most suitable distributions for the incoming and outgoing degrees by fitting exponential ($P(k) = C_i e^{-\alpha k}$) and power-law ($P(k) = C_o k^{-\gamma}$) distributions to the data for each network, where $P(k)$ is the probability that a randomly picked node has $k$ interactions. Exponents were calculated by minimising the sum of the squared errors between the actual and fitted data. The average path length (<l>) is the mean distance, in number of nodes, between pairs of nodes in each network; here we considered only the distance between TFs and all terminating target genes connected to them. The diameter (d) is the maximum path length in the network. The average clustering coefficient (<c>) is the mean ratio of the number of regulatory interactions between a node's neighbours to the maximum number of possible interactions. The clustering coefficient for a node $i$ is calculated as

$$c_i = \frac{2E_i}{k_i(k_i - 1)}$$

where $E_i$ is the existing number of regulatory interactions between the $k_i$ nodes connected to node $i$. In calculating the coefficients, we ignored the directionality of the regulatory interactions.

5.2.5 Normalization for regulatory hubs

Hubs were defined as the TFs in the top 30%, by number of targets, in at least one of the five cellular conditions. We normalized the number of target genes for each TF using

$$p_{ij} = \frac{n_{ij} / \sum_{i} n_{ij}}{\sum_{j} \left( n_{ij} / \sum_{i} n_{ij} \right)}$$

where $p_{ij}$ is the propensity that TF $i$ regulates $n_{ij}$ genes in condition $j$.
This provides a measure of the relative influence of a TF as a regulatory hub in a particular cellular condition. Hubs were clustered according to their propensity values using a k-means clustering algorithm (Tavazoie et al., 1999) with k = 6, where k is the predefined number of clusters. The value of k was chosen to group the five cellular conditions separately and to provide an additional cluster for condition-independent hubs. Tests using k = 4-8 resulted in similar clusters with a few outlying TFs. The clusters were then separated into those containing condition-dependent TFs and those containing condition-independent TFs. We used an equivalent procedure to cluster the TFs used during the cell cycle time-course: we calculated propensity values for all TFs active in the cell cycle and clustered them using k = 6 (one cluster for each of the five phases, plus one for the ubiquitous TFs). Clusters were separated into those containing phase-specific TFs and those containing ubiquitous TFs.

5.2.6 Regulatory motifs

We identified three of the most commonly used regulatory motifs using methods described by Shen-Orr et al. (2002) and Lee et al. (2002). To identify the motifs, we constructed a pair of adjacency matrices, A and B. Matrix A contained binary entries $A_{ij}$, where a 1 indicated a regulatory interaction from TF $j$ to target gene $i$. Matrix B was a submatrix of A containing only the rows corresponding to target genes that are themselves TFs. For single input motifs, we identified the subset of rows in B such that the sum of each row was 1; for each TF column, we then found the non-zero entries. For multiple input motifs, we identified the subset of rows in A such that the sum of each row was greater than 1; then, for each row, we identified the other rows regulated by the same set of TFs, and the collection of rows represented a motif. Finally, for feed-forward motifs, we identified for each primary TF the non-zero entries in B, which correspond to regulated secondary TFs.
For each primary and secondary TF pair, we then identified all rows in A regulated by both TFs.

5.2.7 Random networks

We generated 1,000 random networks for each cellular condition as a control. For each random network, we: (i) sampled, from all yeast genes, the same number of genes as are differentially expressed in the given condition, (ii) sampled the same number of present TFs from the list of all TFs, and (iii) applied the back-tracking algorithm through the static network. We then applied the calculations described above to determine the architecture of each random network.

5.2.8 Sensitivity analysis

We tested the sensitivity of our results to random errors in the data. We generated 1,000 static networks with error rates of 30% by introducing random additions, deletions, and rearrangements of regulatory interactions. For each condition, we then back-tracked through these erroneous networks using the original differentially expressed genes and present TFs.

5.3 Results and Discussion

5.3.1 Regulatory network in yeast

The known transcriptional regulatory interactions in yeast come from a variety of sources. To obtain as complete a regulatory network as possible, we compiled a dataset of direct regulatory interactions between TFs and their target genes from the results of genetic, biochemical, and ChIP-chip experiments (see Methods) (Horak et al., 2002; Iyer et al., 2001; Lee et al., 2002; Lieb et al., 2001; Matys et al., 2003; Ren et al., 2000; Svetlov and Cooper, 1995). This dataset contains 7,074 interactions involving 142 TFs and 3,420 non-TF target genes. We display this static network in Figure 5.1, with the TFs in an arc around the upper half of the circle and the target genes in a larger arc around the lower half. There are regulatory interactions between TFs themselves and also from TFs to the target genes.
The table in Figure 5.1 (inset) lists topological measures that describe the gross features of the network (Albert and Barabasi, 2002; Barabasi and Oltvai, 2004). With this network, we integrate the results of 240 microarray experiments that characterize gene expression changes during five cellular conditions: cell cycle (Cho et al., 1998), sporulation (Chu et al., 1998), diauxic shift (DeRisi et al., 1997), DNA damage (Gasch et al., 2001), and stress response (Gasch et al., 2000). Genes that undergo significant changes in expression levels are considered to experience transcriptional regulation in the respective conditions. We classify the cellular processes into two broad categories: endogenous and exogenous. The former includes the cell cycle and sporulation, which are progressive processes with an internal program of transcriptional regulation that drives the sequential turnover of genes through successive stages (e.g., G1 and S in the cell cycle). The latter comprises diauxic shift, DNA damage and stress response, which respond to environmental changes with a rapid, global turnover in the repertoire of expressed genes. Although sporulation is initiated by an environmental change, such as nitrogen depletion (Chu et al., 1998), we have placed it in the endogenous category because it progresses through multiple cellular phases via internal regulation. Employing a back-tracking algorithm (see Methods), we use the expression data to trace the paths in the regulatory network that are actively used in particular cellular conditions, and at different time-points during the multi-stage endogenous processes. In the following sections, we demonstrate that the different active sub-networks are distinct in terms of their global and local network topologies, occurrence of regulatory hubs, and system of inter-regulation between transcription factors.
Figure 5.1: The transcriptional regulatory network in yeast. The nodes in the arc are the 113 transcription factors and the 1,362 target genes that are active in at least one of the five cellular conditions. The connecting edges depict the 2,479 active regulatory interactions. Nodes are ordered by the number of cellular conditions in which they are active; nodes and edges are colour-coded by the number of cellular conditions under which they are active, from light blue (one condition) to green (five conditions). The number of regulators and target genes active in a given number of conditions is provided. *The table below the main figure summarizes the topological measures calculated for the whole regulatory network (including inactive sections); please refer to the main text for explanations of the measures.

5.3.2 Differential use of the regulatory network

Different subsets of TFs, target genes and regulatory interactions are active in each of the five cellular conditions. Starting with the target genes (defined as "active genes"), a total of 1,906 are differentially expressed in at least one of the cellular conditions (Figure 5.1, lower arc). The smallest numbers of targets are found in the cell cycle (280 genes with regulatory data) and sporulation (257 genes), and the largest numbers in diauxic shift (748 genes) and DNA damage (678 genes). Over half of the active targets with regulatory data (803 genes) are uniquely expressed in one of the five conditions, and the remainder (559 genes) are common to two or more. The functional classes of these genes differ considerably between conditions (Mewes et al., 2002): for example, the cell cycle and sporulation are enriched for genes involved in cell growth, division and DNA synthesis, whereas diauxic shift is enriched for metabolism and energy-related functions.
In contrast to the target genes, most TFs (defined as "active TFs" when used in a condition) are used in more than one condition, and 31 are used in all five (Figure 5.1, upper arc). This is because, although only a minor proportion (4-12%) of yeast genes is differentially expressed in a given condition, about half of the TFs are used to regulate them. For example, during the cell cycle, 70 TFs are used to regulate just 280 target genes. It is surprising that regulatory specificity can be achieved with such great overlap in TF usage between the five processes, and we discuss this issue below.

5.3.3 Dynamics of regulatory interactions

A topical but little-studied issue is the dynamic nature of the regulatory interactions between TFs and their targets. Two recent studies addressed this for a small number of TFs and reported that distinct regulatory interactions are indeed made according to the cellular state (Odom et al., 2004; Zeitlinger et al., 2003). However, the full extent and character of this phenomenon across all TFs is currently unknown. Here we examine it on a genomic scale. In Figure 5.2, we present the sub-networks (including the TFs, target genes and regulatory interactions) that are active under different cellular states. The dynamics of the regulatory network are strikingly clear: distinct sections of the entire system are used to control each condition. The sizes of the active sub-networks vary according to the number of target genes requiring regulation; thus much larger sub-networks are used during diauxic shift and DNA damage than in the other conditions. The figure also highlights how regulatory interactions are maintained or rewired across the sub-networks by displaying partially overlapping subsets of edges. Out of a total of 2,479 active regulatory interactions, 1,476 are unique to a particular condition, and the remainder are common to two or more.
Thus, over half of the interactions are completely replaced with new ones between conditions, resulting in specific regulation of the respective cellular states. Just 66 interactions are retained across four or more conditions. We can consider these to be "hot links" (de Menezes and Barabasi, 2004; Goh et al., 2002), regulatory interactions that are most actively used relative to the rest of the network. Many of these interactions originate from two types of TFs: metabolic regulators such as Mig1 and Mig2, and general transcriptional regulators such as Abf1 and Reb1. We therefore associate these hot links with the continual regulation of house-keeping functions in the cell. As will be shown later, many of the TFs making these links are also condition-independent hubs.

Figure 5.2: Dynamic representation of the transcriptional regulatory network and standard statistics. (a) Schematics and summary of properties for the endogenous and exogenous sub-networks. (b) Graphs of the static and condition-specific networks. TFs and target genes are shown as nodes in the upper and lower sections of each graph respectively, and regulatory interactions are drawn as edges; they are coloured by the number of conditions in which they are active. Different conditions use distinct sections of the network. (c) Standard statistics (global topological measures and local network motifs) describing network structures. These vary between endogenous and exogenous conditions; values that are high compared with other conditions are shaded. (Note: the graph for the static state displays only sections that are active in at least one condition, but the table provides statistics for the entire network, including inactive regions.)

As most TFs are actively used across multiple conditions, we now explore how they maintain or replace regulatory interactions (Figure 5.3).
We quantify this for each TF with an interchange index ($I_i$) that measures the fraction of its interactions that are unique to a single condition (see Methods). The index ranges from 0 to 1, with higher values signifying a larger proportion of replaced interactions. A histogram of the indices reveals a tri-modal distribution: 42 TFs interchange most of their interactions between conditions ($I_i \geq 0.7$), 16 preserve most of their interactions ($I_i \leq 0.3$), and 55 exchange only part of their interactions ($0.3 < I_i < 0.7$). This divide demonstrates how some TFs exert their regulatory influence via radically altered sets of interactions depending on the condition, whereas others retain similar sets of interactions. An important observation highlighted by the TFs with high indices is the shift in their regulatory functions as they target alternative sets of genes. We now focus on three example TFs (Figure 5.3): Yox1, whose interactions are highly variable; Rcs1, whose interactions are invariant; and Abf1, which lies between these extremes. First, Yox1 makes a total of 120 regulatory interactions. It has a high interchange index ($I_i = 0.84$), and as shown in Figure 5.3, there is little overlap of its interactions across cellular conditions. The TF has so far been implicated in control of the cell cycle and in DNA synthesis and repair, but information about its full range of regulatory functions is currently limited (Kaufmann, 1993; Pramila et al., 2002). Consistent with its role in DNA synthesis, it makes 21 regulatory interactions during the cell cycle and 30 during sporulation. Surprisingly, it makes the most interactions (91) during diauxic shift, which suggests a previously unreported role in this process. The exchange of regulatory interactions brings about a dramatic shift in the function of Yox1: it is focused on controlling cell growth during the cell cycle and sporulation, whereas it redirects its attention to protein synthesis during diauxic shift. Second, Rcs1 makes a total of 25 interactions.
It has a low interchange index ($I_i = 0.28$), and as shown in Figure 5.3, most of its interactions are used across multiple conditions. Therefore, in contrast to Yox1, it retains similar regulatory functions in the different cellular states. Consistent with its activity during sporulation and stress response, it is involved in several processes, including regulation of metal ion metabolism (Blaiseau et al., 2001; Yamaguchi-Iwai et al., 1995) and cell size control (Gil et al., 1991; Jorgensen et al., 2002). Finally, Abf1 makes a total of 160 regulatory interactions. It has an intermediate interchange index ($I_i = 0.51$), and its many cited functions as a general transcriptional activator include the regulation of meiosis (Gailus-Durner et al., 1996), metabolic activities (Nebohacova et al., 1996), translation (Della Seta et al., 1990; Mager and Planta, 1990) and gene silencing (Diffley and Stillman, 1989). It preserves 79 interactions across multiple conditions (Figure 5.3), corresponding to the consistent regulation of a core set of cellular functions such as glycolytic pathways, translation and cell biogenesis. The other half of its interactions (81, consistent with $I_i = 81/160 \approx 0.51$) are exchanged, and this results in a shift of regulatory focus on top of its core functions: in the cell cycle and sporulation Abf1 regulates cell growth, whereas in stress response it controls intracellular transport.

Figure 5.3: Interchange of regulatory interactions. The histogram plots the distribution of interchange index ($I_i$) values for all active transcription factors. Index values range from 0 for TFs that maintain the same regulatory interactions across multiple conditions to 1 for TFs that replace all regulatory interactions. Above the plot are network diagrams showing the regulatory interactions made by three representative transcription factors: Rcs1, Abf1 and Yox1.
Regulatory interactions are black if retained across two or more conditions, or coloured if they are unique to a single condition. The conditions are labelled in the figure, along with a brief description of the major regulatory function of the transcription factor in that state.

5.3.4 Regulatory specificity through TF combinations

Of the active TFs, 95 are used in multiple conditions. In Figure 5.4a, we show that for the endogenous conditions, 53 out of a total of 92 TFs overlap between the cell cycle and sporulation. A similar pattern emerges for the exogenous conditions, in which 44 out of 100 TFs are used in all three states. Therefore, the key to regulatory specificity cannot lie in the activity of individual TFs.

Figure 5.4: Overlap in TF and TF combination usage. (a) Overlap in transcription factor usage. There is large overlap in the usage of transcription factors between conditions, as similar sets of regulators are used across all five cellular conditions. (b) Overlap in pair-wise transcription factor combination usage. Unlike individual transcription factors, there is very little overlap in the use of pair-wise transcription factor combinations between conditions. Thus, regulatory specificity is achieved through the combinatorial use of regulatory partners.

Instead, the combinatorial use of TFs provides a clue to the expression of particular target genes (Pilpel et al., 2001). In the static network, a gene has an average of 2.1 incoming interactions, which means that, in general, about two TFs regulate each gene. We consider a pair of TFs to combine if they both target the same gene in a given cellular condition. In total, 360 different pair-wise combinations are used across the five cellular conditions, and in contrast to individual TFs, there is very little overlap in the use of TF pairs (Figure 5.4b).
For example, in the endogenous conditions, just 3 out of 149 pairs overlap between the cell cycle and sporulation, and we make similar observations for the exogenous conditions. Overall, 309 TF pairs are unique to a single condition, indicating that much of the regulatory specificity is achieved through combinatorial TF use. An example of combinatorial TF specificity is provided by Abf1, the general transcriptional activator discussed earlier. During sporulation, it combines with Ime1 to regulate Hop1, which is involved in chromosome segregation. During diauxic shift, it acts in conjunction with the Hap2-Hap4 heteromeric complex to regulate Aac2, the major mitochondrial ADP/ATP carrier.

5.3.5 Large-scale topological changes

We have shown that the regulatory sub-networks for the five cellular conditions differ in terms of the target genes that are expressed, the TF combinations that are used, and the regulatory interactions that are made. The complex topology of these sub-networks can be captured by graph-theoretic measures that describe their structures on a global scale. Using these measures, recent studies have shown that molecular biological networks, including the regulatory system, are remarkably consistent in their architectural features (Agrawal, 2002; Albert and Barabasi, 2002; Barabasi and Oltvai, 2004; Featherstone and Broadie, 2002; Jeong et al., 2001; Oltvai and Barabasi, 2002; Tong et al., 2004; Wagner, 2001; Watts and Strogatz, 1998). For instance, cross-species comparisons of metabolic pathways demonstrated that the reduced networks of parasitic bacteria and the highly developed networks of large multicellular organisms preserve the same scale-free topology with similar degree exponents (Jeong et al., 2000; Wagner and Fell, 2001).
In addition, the average path lengths and clustering coefficients remain constant irrespective of network size. This indicates that all these networks retain their small-world character and high level of clustering (Jeong et al., 2000; Ravasz et al., 2002). We use five graph-theoretic measures to compare the topology of the networks in the five cellular conditions. We list the values of these measures for the complete network in Figure 5.1, and we describe how we calculate them in the Methods section.

The first measure is the mean number of incoming interactions per target gene (<kin>). In the static network this is 2.1, indicating that on average each gene is regulated by two TFs. The distribution of incoming connections per target gene is exponential with exponent $\gamma = 0.8$. The probability that a given gene is regulated by k TFs decreases as $P(k) = C_i e^{-\gamma k}$. This behaviour indicates a sharp decay in the distribution and presumably reflects molecular constraints on the number of TFs that can co-regulate at the same promoter.

The second measure is the average number of target genes per TF (<kout>). In the static network this is 49.8. The distribution of outgoing connectivities follows a power law, in which the probability that a given TF regulates k genes decreases as $P(k) = C_o k^{-\gamma}$ with exponent $\gamma = 0.6$. This behaviour signifies a broader decay profile. It is indicative of a hub-containing network structure, in which selected TF nodes have a disproportionately large number of targets. The exponent is smaller than that observed for other molecular biological networks, signifying that the distribution is not as polarized between hub and non-hub nodes.

We also assess the interactions between TFs. The third measure is the path length (l), which quantifies the shortest distance between two nodes in a network. Here we define it as the distance, in number of intermediate TFs, between any given TF and a terminating target gene.
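Both degree measures follow directly from the list of interactions. The sketch below assumes that the network is a list of (TF, target) tuples, and the toy values are hypothetical. It also shows why <kout> is the measure most sensitive to network size. Both averages share the same numerator, the number of interactions, and differ only in whether it is divided by the number of targets or by the number of TFs.

```python
from collections import Counter


def degree_measures(edges):
    """Return (<k_in>, <k_out>, in-degree counter, out-degree counter)."""
    edges = set(edges)  # count each TF-target interaction once
    out_deg = Counter(tf for tf, _ in edges)          # targets per TF
    in_deg = Counter(target for _, target in edges)   # regulators per target
    k_in = len(edges) / len(in_deg)
    k_out = len(edges) / len(out_deg)
    return k_in, k_out, in_deg, out_deg


def degree_distribution(deg):
    """Empirical P(k): fraction of nodes with exactly k connections."""
    counts = Counter(deg.values())
    total = len(deg)
    return {k: c / total for k, c in sorted(counts.items())}


# Hypothetical toy network: TF A regulates three genes and is itself regulated by B.
edges = [("A", "g1"), ("A", "g2"), ("A", "g3"), ("B", "g1"), ("B", "A")]
k_in, k_out, in_deg, out_deg = degree_measures(edges)
print(k_in, k_out, degree_distribution(in_deg))
```

Estimating the exponents would be a further step that is not shown here. One option is a regression of log P(k) on k for the exponential case, or on log k for the power-law case.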
The average path length (<l>) over the static network is 4.7, indicating that there are nearly five intermediate TFs in an average regulatory path. The fourth measure is the diameter (d), the maximal path length between any two nodes in the network. This is 12 in the static network, showing that the longest path has 12 intermediate TFs. Finally, the fifth measure is the average clustering coefficient (<c>), which measures the level of interconnection between TFs. The coefficient of a node is defined as the ratio of existing links between the node's neighbours to the maximum possible number of links between them. It varies from 0 for a totally dispersed node to 1 for a fully clustered one. The average coefficient (<c>) is 0.11 for the static network.

Given the results from previous studies of diverse networks, we expect the topological measures to remain constant between the condition-specific sub-networks. We simulate random sub-networks corresponding to each cellular condition by sampling a given number of genes from the set of all targets and back-tracking from these randomly picked genes (see Methods). Despite large variations in the sizes (232 < n < 746) and sampled regions of the simulated networks, most of the measures are invariant, as expected. Average incoming degrees (<kin>) are 1.6-1.7, average path lengths (<l>) are 2.5-2.8, diameters (d) are 10-12, and average clustering coefficients (<c>) are 0.8-0.9. The differences in values are not statistically significant (p > 0.96). Only the average outgoing degree (<kout>) changes, ranging from 4.8 in the smallest sub-network to 18.8 in the largest. This is unsurprising, as outgoing edges must be shared from a limited pool of TF nodes. Both the prior results and the observations in the simulated networks therefore lead us to expect the topological measures to remain constant across conditions.
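The remaining three measures can be sketched in the same style. This is an illustration under stated assumptions, not the procedure described in the Methods:

- A "terminating target gene" is taken to be any node without outgoing interactions.
- Path length is the shortest-path edge count minus one, so a direct TF-to-gene link has zero intermediate TFs.
- The diameter is taken as the longest of these TF-to-terminal distances.
- The clustering coefficient ignores edge direction and averages only over nodes with at least two neighbours.

The toy network is hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Map every node to the set of nodes it regulates (empty for non-TFs)."""
    succ = {}
    for tf, target in edges:
        succ.setdefault(tf, set()).add(target)
        succ.setdefault(target, set())
    return succ


def path_length_and_diameter(succ):
    """Mean and maximum number of intermediate TFs from a TF to a terminal gene."""
    terminals = {node for node, out in succ.items() if not out}
    lengths = []
    for source in (node for node, out in succ.items() if out):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest edge counts
            node = queue.popleft()
            for nxt in succ[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        # Intermediate TFs = edges on the shortest path minus one (assumption).
        lengths.extend(d - 1 for node, d in dist.items() if node in terminals)
    return sum(lengths) / len(lengths), max(lengths)


def mean_clustering(succ):
    """Average undirected clustering coefficient over nodes with >= 2 neighbours."""
    nbrs = {node: set() for node in succ}
    for u, out in succ.items():
        for v in out:
            if u != v:  # ignore self-regulation
                nbrs[u].add(v)
                nbrs[v].add(u)
    coeffs = []
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue
        # Count each neighbour-neighbour link once (a < b avoids double counting).
        links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
        coeffs.append(links / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs)


# Hypothetical toy network: A regulates B, and both regulate g1.
succ = adjacency([("A", "B"), ("A", "g1"), ("B", "g1"), ("B", "g2")])
mean_l, diameter = path_length_and_diameter(succ)
print(mean_l, diameter, round(mean_clustering(succ), 3))
```

A library such as NetworkX provides equivalent shortest-path and clustering routines. The hand-written version here keeps every counting convention explicit.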
In fact, the situation is very different for the real sub-networks, and their architectural features vary considerably between conditions (Figure 5.2). The average outgoing degree (<kout>) roughly doubles from the endogenous (<kout> = 6.5-7.9) to the exogenous conditions (<kout> = 9.0-17.1). The difference is statistically significant ($p < 2 \times 10^{-3}$), and values are larger than in the random networks for the exogenous conditions (p < 0.07 for diauxic shift). Power-law behaviour is maintained for all sub-networks; however, the exponents ($\gamma$) double from 0.8 in the endogenous to 1.5 in the exogenous conditions.

The changes in outgoing degrees have biological implications. Exogenous conditions can be thought of as binary states in which large numbers of genes need to be up- or downregulated quickly in response to often drastic environmental changes. The most efficient way to achieve this is for TFs to regulate large numbers of genes simultaneously. Endogenous conditions, on the other hand, correspond to states in which expression regulation is coordinated through multiple stages, and each TF targets fewer genes so as to fine-tune this process. The larger degree exponents in the exogenous conditions signify greater polarisation in the outgoing distribution. They indicate a network structure in which a few hubs are more dominant.

The average incoming degree (<kin>) increases by about a fifth from the exogenous (<kin> = 1.6) to the endogenous conditions (<kin> = 1.9-2.0). Again, the change is statistically significant ($p < 3 \times 10^{-4}$). Compared with the random controls, values are larger for the endogenous conditions (p < 0.01 for cell cycle) and smaller for the exogenous conditions (p < 0.05 for stress). Exponential behaviour is maintained throughout, but the change in exponents signifies a faster drop-off for the exogenous conditions ($\gamma_{exo} = 1.1$-$1.3$ compared with $\gamma_{endo} = 0.7$-$0.8$).
These observations suggest that TF combination usage is simpler in the exogenous conditions, reflecting the more direct-acting nature of these cellular states.

Average path lengths (<l>) double between the exogenous (<l> = 2.0-2.2) and endogenous conditions (<l> = 3.4-4.5). Again, this difference is statistically significant ($p < 10^{-10}$). Values are larger than expected at random for the endogenous conditions (p < 0.02 for cell cycle) and smaller for the exogenous conditions ($p < 5 \times 10^{-3}$ for DNA damage). Further, the diameter (d) doubles from diauxic shift and DNA damage (d = 6) to the cell cycle (d = 12). As the path length and diameter measure the distance between a TF and its final target, they gauge the immediacy of a regulatory signal. The short distances in the exogenous sub-networks suggest that perturbations in the environment would reach the necessary targets very quickly.

We speculate that the longer path lengths in the endogenous sub-networks are due to the use of regulatory chains. The cell cycle and sporulation are multi-stage processes, so they require the sequential regulation of genes in a time-dependent manner (Lee et al., 2002; Simon et al., 2001). This effectively forms a chain of regulatory events that corresponds to the intermediate TFs in the long paths.

Average clustering coefficients (<c>) are small, indicating that the sub-networks do not contain many cliques of highly interconnected TFs. However, values nearly double from the exogenous (<c> = 0.08-0.09) to the endogenous conditions (<c> = 0.14-0.15). Again, the difference is significant (p < 0.01), and coefficients are larger than expected in the endogenous conditions ($p < 5 \times 10^{-3}$ for cell cycle). This means that there is more inter-regulation between TFs in the cell cycle and sporulation, which reflects the multi-stage nature of these processes.

We performed a sensitivity analysis of our results by adding, deleting or rearranging 30% of the regulatory interactions at random.
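The three kinds of perturbation can be sketched as follows, assuming that interactions are (TF, target) tuples. Several details are illustrative assumptions rather than a description of the exact procedure used:

- the function name `perturb`;
- the seeded random generator;
- rewiring an interaction by redirecting it to a randomly chosen gene.

Each perturbed network would then be passed through the same topological and motif analyses.

```python
import random


def perturb(edges, fraction=0.3, mode="delete", seed=0):
    """Return a copy of `edges` with `fraction` of interactions added, deleted or rewired."""
    rng = random.Random(seed)
    edges = list(dict.fromkeys(edges))  # de-duplicate while keeping order
    n = round(fraction * len(edges))
    tfs = sorted({tf for tf, _ in edges})
    genes = sorted({target for _, target in edges})
    current = set(edges)

    if mode == "delete":
        for edge in rng.sample(edges, n):
            current.discard(edge)
    elif mode == "add":
        # Only TF-gene pairs not already present can be added.
        free = [(tf, g) for tf in tfs for g in genes if (tf, g) not in current]
        current.update(rng.sample(free, min(n, len(free))))
    elif mode == "rewire":
        for tf, old in rng.sample(edges, n):
            current.discard((tf, old))
            # Redirect to a gene this TF does not currently regulate; the old
            # target is eligible again, so a rewiring can leave the edge unchanged.
            choices = [g for g in genes if (tf, g) not in current]
            current.add((tf, rng.choice(choices)))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(current)


# Hypothetical toy network: 3 TFs, 5 genes, 10 distinct interactions.
edges = [(f"T{i % 3}", f"g{i % 5}") for i in range(10)]
for mode in ("delete", "add", "rewire"):
    print(mode, len(perturb(edges, mode=mode)))
```

Deleting removes 30% of the interactions and adding inserts the same number of new ones. Rewiring keeps the total constant, so the network size, and with it <kout>, is not confounded with the perturbation itself.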
All of the observations that we make in this study are unaffected, suggesting that our findings are robust against the addition of noise to the data.

5.3.6 TF hubs in the regulatory network

Above we showed that the outgoing degree distribution approximates a power law, and that this is indicative of a network containing regulatory hubs. These hubs have been of great general interest because they are the most influential components of networks and dictate the overall structures of graphs (Barabasi and Albert, 1999; Barabasi and Oltvai, 2004). However, there has so far been some ambiguity in the precise definition of hubs (Guelzim et al., 2002; Madan Babu and Teichmann, 2003; Martinez-Antonio and Collado-Vides, 2003; Shen-Orr et al., 2002). More importantly, their regulatory role in the network has been the subject of much discussion.

On the one hand, hubs are portrayed as general regulators that target genes across a wide spectrum of functions and conditions (Barabasi and Oltvai, 2004; Lee et al., 2002; Madan Babu and Teichmann, 2003; Martinez-Antonio and Collado-Vides, 2003). Topologically, they are located upstream in the network, so they can amplify their range of control over multiple functions by regulating other TFs (Madan Babu and Teichmann, 2003). An example is the Abf1 general transcriptional activator, which has 291 targets in the static network. On the other hand, this view must be reconciled with the modular nature of the regulatory network (Alon, 2003; Barabasi and Oltvai, 2004; Guelzim et al., 2002; Hartwell et al., 1999; Oltvai and Barabasi, 2002; Wall et al., 2004). Numerous studies have identified functionally distinct modules within the regulatory network (Ihmels et al., 2002; Tavazoie et al., 1999), and have shown that many of them centre on their own hubs (Bar-Joseph et al., 2003; Segal et al., 2003).
Furthermore, it has been argued that hubs are topologically isolated from each other in order to decrease the likelihood of cross talk between modules (Maslov and Sneppen, 2002). These observations imply that hubs are generally associated with a specific cellular function. An example is the Swi4 cell cycle regulator, which has 138 targets.

We address this debate by examining the dynamic usage of hubs and assessing whether key regulators in the static network remain important under different cellular conditions. As with the topological measures, we can establish a prior expectation by considering the occurrence of hubs in the random sub-networks. We define hubs in a given network as the top 30% of TFs by number of target genes. The randomised sub-graphs have 77-96% overlap in the TFs that are classified as hubs. They clearly converge on very similar sets of TFs, despite their very different sizes and despite being back-tracked from distinct sets of genes.

Figure 5.5 shows the data for the real sub-networks. It is a two-dimensional matrix in which the columns correspond to the cellular states and the rows represent the composite set of all TF hubs for the five conditions.

Figure 5.5: Condition-specific transient hubs and permanent hubs. A cluster diagram depicts the use of regulatory hubs in different cellular conditions.
The five cellular conditions are shown as columns, and the top 30% of transcription factors (by number of target genes) are displayed as rows. The intersecting cell is coloured according to the normalized number of target genes a transcription factor regulates in each condition, and transcription factors are clustered using the k-means clustering algorithm. Distinct sets of transcription factors act as regulatory hubs during different conditions, as highlighted by the coloured boxes. Most factors act as condition-specific hubs, and a minor proportion act as condition-independent hubs throughout all cellular states. Gene names are coloured blue if the transcription factor has an obvious regulatory role in the specific condition. They are coloured red if the factor's regulatory role was previously unclear according to the Saccharomyces Genome Database.

In order to identify patterns of hub usage, we cluster the TFs according to the normalized number of their target genes. Two main groups of TFs are strikingly evident. Condition-independent hubs are important regardless of the cellular condition, whereas condition-specific hubs are influential in one condition but much less significant in others. The condition-independent cluster contains the general, multi-functional regulators discussed above, such as the Abf1 general transcriptional activator. It also contains house-keeping TFs, such as Mig1 and Mig2, that are required regardless of the cellular condition (Klein et al., 1999; Lutfiyya et al., 1998). Most of the TFs are condition-dependent. This reflects the dynamic nature of the regulatory network, as different sets of TFs assume varying importance during the lifetime of the cell. It is also emphasized by the smaller overlap in hub usage between the real sub-networks than between the random sub-networks (36-74% depending on cellular conditions, compared with 77-96%).
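The hub definition (the top 30% of TFs by number of targets) and the comparison of hub sets between networks can be sketched as follows. Several choices are assumptions made for the illustration:

- rounding the 30% cut-off up;
- breaking ties alphabetically;
- using raw rather than normalized target counts;
- measuring overlap as the Jaccard index, that is, shared hubs divided by all hubs in either set.

The toy networks are hypothetical.

```python
from collections import Counter


def hubs(edges, top_percent=30):
    """Return the top `top_percent` % of TFs ranked by number of distinct targets."""
    out_deg = Counter(tf for tf, _ in set(edges))
    n_tfs = len(out_deg)
    # Integer ceiling division avoids floating-point surprises (e.g. 0.3 * 10).
    n_hubs = max(1, (top_percent * n_tfs + 99) // 100)
    ranked = sorted(out_deg, key=lambda tf: (-out_deg[tf], tf))
    return set(ranked[:n_hubs])


def jaccard(a, b):
    """Shared members divided by all members of either set."""
    return len(a & b) / len(a | b)


# Hypothetical toy sub-networks for two conditions.
cond_x = ([("A", f"g{i}") for i in range(5)] + [("B", f"g{i}") for i in range(3)]
          + [("C", "g0"), ("D", "g1")])
cond_y = ([("A", "g0")] + [("B", f"g{i}") for i in range(4)]
          + [("C", f"g{i}") for i in range(6)] + [("D", "g2")])

hubs_x, hubs_y = hubs(cond_x), hubs(cond_y)
print(sorted(hubs_x), sorted(hubs_y), jaccard(hubs_x, hubs_y))
```

In this toy case, B is a hub in both conditions, A only in the first and C only in the second. This is the kind of condition-specific pattern that the clustering in Figure 5.5 brings out on the real data.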
The condition-specific hubs group into distinct clusters for each of the five cellular states. Clusters are smaller for the exogenous conditions, highlighting the more polarised nature of their sub-networks. About half of the TFs are known to be important for the particular condition (Figure 5.5: blue labels) (Christie et al., 2004; Mewes et al., 2002). For example, in the first cluster, 5 out of 11 TFs are known cell cycle regulators, including the Swi4 and Mbp1 G1/S factors (Andrews and Herskowitz, 1989; Breeden and Mikesell, 1991; Koch et al., 1993). In sporulation, we find Ime1, a key inducer of early meiotic genes (Kassir et al., 2003; Kassir et al., 1988; Vershon and Pierce, 2000), and Ume6, a co-activator (Bowdish et al., 1995; Kassir et al., 2003; Strich et al., 1994; Vershon and Pierce, 2000). Ndt80 is an important regulator of the middle stages of sporulation (Kassir et al., 2003; Vershon and Pierce, 2000; Xu et al., 1995). It is absent because it currently has only one assigned target gene in the dataset. During diauxic shift, we find Hap2 and Hap4, global regulators of respiratory gene expression and activators of cytochrome c (Forsburg and Guarente, 1989; Hahn and Guarente, 1988; Olesen et al., 1987).

For the remaining TFs, it is harder to make direct functional associations with their respective cellular conditions. These TFs are of great interest. As hubs, they are clearly important in the cellular process under consideration, and we can tentatively add to their functional annotations. Such functional predictions are not trivial; they are possible only because we integrate gene expression data with the regulatory network. Many TFs have functions that appear unrelated to the condition (Figure 5.5: black labels). For example, in sporulation, there are three regulators of nitrogen utilization (YIR023W, YPL038W, YNL103W).
These may appear to be surprising inclusions, but as sporulation is often initiated through nitrogen depletion, their appearance is biologically meaningful. Another unexpected example is YBL021C, a regulator of respiratory functions, for which we anticipate an equally important regulatory role during sporulation.

Of particular interest are four TFs that were previously unannotated (Figure 5.5: red labels), for which we predict key regulatory functions in their respective conditions. Thus, we associate YDR451C with a role in the cell cycle, YIL122W with sporulation, and YIR018W with stress response. YLR013W acts as a condition-independent hub, and we envisage either a general regulatory function, as seen for Abf1, or a house-keeping task, as for Reb1. These regulatory predictions should provide a useful starting point for further experimental characterization.

Additionally, we find that temporary and permanent hubs regulate one another (Figure 5.6). There is much more inter-regulation of temporary hubs within the cell cycle and sporulation than within stress, DNA damage and diauxic shift. This is discussed in detail in a separate section.

Figure 5.6: Inter-regulation within and among regulatory hubs (panels: sporulation, cell cycle, permanent hubs, stress, diauxic shift, DNA damage). The inter-regulation of temporary and permanent hubs portrays a network that shifts its weight between different hubs to bring about distinct cellular states.

5.3.7 Preferential use of network motifs

So far, we have studied the networks from a global perspective; however, we can also examine them at a local level. Analyses of regulatory network structures have revealed the occurrence of motifs, which represent compact units that build up the whole network (Lee et al., 2002; Milo et al., 2002; Shen-Orr et al., 2002).
These motifs display specific patterns of inter-regulation between the TFs and their targets. We calculate the occurrence of the three that are most prevalent in the regulatory network (Table 5.1): single input, multiple input, and feed forward motifs. In single input motifs, a lone TF regulates its targets; in multiple input motifs, two or more TFs are involved. Feed forward motifs are composed of two TFs: the primary TF targets the secondary TF, and both regulate a final target. Thus, the single and multiple input motifs can be considered direct-acting units. The feed forward motif, by contrast, can be thought of as an indirect-acting motif requiring an inter-regulatory link between TFs.

Table 5.1: Network motif usage in five cellular conditions

| Network motif | Cell cycle (endogenous) | Sporulation (endogenous) | Diauxic shift (exogenous) | DNA damage (exogenous) | Stress response (exogenous) |
|---|---|---|---|---|---|
| Single input motif | 130 (32.0%) | 117 (38.9%) | 438 (57.4%) | 462 (55.7%) | 228 (59.1%) |
| Multiple input motif | 96 (23.7%) | 50 (16.6%) | 180 (23.6%) | 226 (27.3%) | 78 (20.2%) |
| Feed-forward loop | 180 (44.3%) | 134 (44.5%) | 145 (19.0%) | 141 (17.0%) | 80 (20.7%) |
| Total | 406 (100%) | 301 (100%) | 763 (100%) | 829 (100%) | 386 (100%) |

Table 5.1: The table summarizes the number of regulatory interactions used in single input, multiple input, and feed forward motifs during the five cellular conditions. The figures in parentheses give the percentage of regulatory interactions used in the given motif, relative to the total number of regulatory interactions used in all motifs for that condition. Single input motifs are favoured in exogenous conditions, whereas feed forward motifs are favoured in endogenous conditions. Motif occurrences that are high with respect to other conditions are shaded green, and those that are low are shaded dark pink.
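The three motif definitions can be turned into a rough interaction classifier. The assignment rule in this sketch is an assumption, not the study's documented procedure. Every interaction that takes part in a feed forward loop is labelled "feed forward". Each remaining interaction is labelled "single input" if its target has exactly one regulator in the condition, and "multiple input" otherwise. Each interaction is therefore counted once, consistent with the column totals in Table 5.1. The toy network is hypothetical.

```python
from collections import Counter


def classify_interactions(edges):
    """Label each (tf, target) interaction with the motif type it is assigned to."""
    edges = set(edges)
    targets, regulators = {}, {}
    for tf, target in edges:
        targets.setdefault(tf, set()).add(target)
        regulators.setdefault(target, set()).add(tf)

    # Feed forward loop: primary x -> secondary y, and both x and y -> z.
    ffl = set()
    for x, y in edges:
        if x == y or y not in targets:  # y must itself be a TF
            continue
        for z in targets[x] & targets[y]:
            if z not in (x, y):
                ffl.update({(x, y), (x, z), (y, z)})

    labels = {}
    for tf, target in edges:
        if (tf, target) in ffl:
            labels[(tf, target)] = "feed forward"
        elif len(regulators[target]) == 1:
            labels[(tf, target)] = "single input"
        else:
            labels[(tf, target)] = "multiple input"
    return labels


# Hypothetical toy network containing one example of each motif.
toy = [("P", "S"), ("P", "z"), ("S", "z"),   # feed forward loop
       ("Q", "a"), ("Q", "b"),               # single input
       ("R", "c"), ("T", "c")]               # multiple input
print(Counter(classify_interactions(toy).values()))
```

The precedence given to feed forward loops matters. Without it, the two edges converging on z would also be counted as a multiple input unit, and the columns of a table like Table 5.1 would no longer sum to the number of interactions.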
Recent work has shown that the usage pattern of motifs is highly conserved across related networks, including the regulatory systems of diverse organisms such as B. subtilis, E. coli, and yeast (Milo et al., 2004). Furthermore, the relative occurrence of motifs does not change when smaller sub-networks of various sizes are considered (Milo et al., 2002). Therefore, as for the topological measures, we expect motif usage to remain invariant between conditions. Indeed, in the random controls, motif occurrence stays comparable to that of the static network regardless of the number of target genes sampled for back-tracking (p > 0.12).

Contrary to this expectation, there is a conspicuous difference in motif usage between conditions (Table 5.1; $p < 10^{-9}$). Exogenous conditions prefer the direct-acting units, particularly the single input motif. Together, single and multiple input motifs account for roughly 4/5 of the regulatory interactions in these conditions, whereas the proportion drops to just over 1/2 in the endogenous conditions. The single input motif is over-represented in DNA damage and stress response when compared with the random sub-graphs ($p < 10^{-3}$). The indirect-acting feed forward motif, on the other hand, is extensively used in the cell cycle and sporulation, where it comprises over 2/5 of the regulatory interactions. It is used much more than expected at random in these conditions ($p < 10^{-4}$).

We showed earlier that the clustering coefficients are larger in these conditions, indicating a higher degree of inter-regulation between the TFs. The use of feed forward motifs accentuates this observation. In fact, the regulatory neighbourhood with the highest clustering coefficient in the cell cycle sub-network (c = 0.3) contains five inter-connected motifs, with Mbp1 as the primary TF and Swi4 as the final target gene.
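The fractions quoted above can be checked directly against the counts in Table 5.1. The short calculation below adds the single and multiple input counts to obtain the share of direct-acting interactions in each condition. The numbers are taken from the table, and the grouping into direct-acting and indirect-acting units follows the text.

```python
# Interaction counts from Table 5.1: (single input, multiple input, feed forward).
TABLE_5_1 = {
    "cell cycle": (130, 96, 180),
    "sporulation": (117, 50, 134),
    "diauxic shift": (438, 180, 145),
    "DNA damage": (462, 226, 141),
    "stress response": (228, 78, 80),
}


def motif_shares(counts):
    """Percentage of interactions in direct-acting (SIM + MIM) and feed forward motifs."""
    sim, mim, ffl = counts
    total = sim + mim + ffl
    return round(100 * (sim + mim) / total, 1), round(100 * ffl / total, 1)


for condition, counts in TABLE_5_1.items():
    direct, feed_forward = motif_shares(counts)
    print(f"{condition}: direct-acting {direct}%, feed forward {feed_forward}%")
```

The direct-acting share is about 79-83% in the exogenous conditions and about 55-56% in the endogenous ones. The feed forward share is 44-45% in the cell cycle and sporulation, against 17-21% in the exogenous conditions.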
Although the feed forward motif is used sparingly in the exogenous conditions (its relative share drops to about 1/5 during stress response), it may still play an important role in filtering spurious signals from the environment (Mangan and Alon, 2003; Mangan et al., 2003). Previous studies have ascribed specific information processing tasks to motifs. It has been suggested that feed forward motifs can act as regulatory buffers. Each such motif acts as a circuit that responds only to persistent regulatory signals from the primary TF and allows a rapid shutdown of the signal (Mangan and Alon, 2003; Mangan et al., 2003). The motif would therefore appear suited to endogenous processes that require a controlled replacement of genes through multiple phases. With the use of feed forward motifs, the cell will enter the next phase of the condition only once the regulatory signal from the previous phase has stabilized. Furthermore, this signal can be quickly terminated once the cell has entered a new phase.

The regulation of Ime2 during sporulation is an illustrative example. The gene is regulated by two early-phase TFs, Rim1 as the primary and Ime1 as the secondary TF. It encodes an important kinase that stimulates 20 further TFs during the middle and late phases. Subsequent phosphorylation of Ime1 by Ime2 ensures a quick shutdown of the transcriptional cascade during the later stages of sporulation.

Single input and multiple input motifs are suited to the simultaneous regulation of many genes, and thus to controlling the large-scale turnover of genes seen in exogenous conditions. Single input motifs have previously been implicated in regulating systems of genes that function as a unit to form a complex or a pathway (Shen-Orr et al., 2002). Multiple input motifs can be thought of as an extension of this, but with stricter control because of the use of several TFs.
An example is the regulation of three proteasomal subunits (Rpt2, Rpt4, Rpt6) by a lone TF, Rpn4, during DNA damage.

5.3.8 Inter-regulation of TFs in the cell cycle and sporulation

Earlier we showed that the sub-networks of the endogenous conditions have larger clustering coefficients and longer path lengths than those of the exogenous conditions. We suggested that this results from a greater degree of inter-regulation between TFs and from the formation of regulatory chains. To study these observations in detail, we examined the temporal nature of the endogenous sub-networks as the cell progresses through successive phases. We observed how TFs regulate each other to bring about the sequential regulation of genes in a time-dependent manner. Young and colleagues previously coined the term "serial inter-regulation" to describe the connectivity between the main cell cycle regulators (Simon et al., 2001). Here, for the first time, we report the full scope of inter-regulation between all TFs present during both endogenous conditions.

The expression data for the cell cycle and sporulation provide measurements of gene expression levels at numerous time-points through the course of the cellular conditions. Genes were classified as being expressed at a particular cellular phase according to the nature of their expression changes during these time-courses (Cho et al., 1998). Thus, for the cell cycle, which we consider in detail now, genes were assigned to one of the five phases pre-defined by Cho et al. (1998): early G1, late G1, S, G2, and M. To identify the active sub-networks during each phase, we then repeated the back-tracking procedure using these classified genes. The results of our analysis are presented in Figure 5.7. In Figure 5.7a, we show a cluster diagram of the TFs that are used during the cell cycle. The columns represent the 70 active TFs, and the rows correspond to the five cellular phases.
We shaded the intersecting cell by the normalized number of genes targeted during a given phase. It is immediately obvious that most TFs operate during a particular phase, as highlighted by their phase-specific targeting of genes. The activity of the major cell cycle regulators is in line with previous observations (Futcher, 2002), supporting the validity of the methods used in this study. Swi4 and Mbp1 are clustered in the late G1 phase (Koch et al., 1993), and Fkh1 is found in G2 (Koranda et al., 2000; Kumar et al., 2000; Pic et al., 2000; Zhu et al., 2000). Mcm1 is in M (Koranda et al., 2000; Kumar et al., 2000; Pic et al., 2000; Zhu et al., 2000), and Ace2 and Swi5 are in the early G1 phase (McBride et al., 1999; McInerny et al., 1997).

Of note is the residual activity of many TFs in additional phases; for example, Swi4 and Mbp1 target genes during the S phase as well as late G1. This is because many cell cycle regulators are active during the transition between phases. It also underlines the somewhat arbitrary nature of the definition of phases in the expression data and of the clustering method we used to group the TFs. Despite these limitations, it is clear that TFs are predominantly active during their assigned phases. In addition to the phase-specific TFs, a sizeable minority of TFs are ubiquitously active throughout the cell cycle. These TFs regulate genes irrespective of the cellular phase. Interestingly, about a third of these TFs are among the condition-independent hubs defined in Figure 5.5, including the Abf1 general transcriptional activator.

Figure 5.7: Inter-regulation between TFs through the phases of the cell cycle. (a) Cluster diagram depicting the activity of 70 transcription factors during the cell cycle. The five phases of the cell cycle are shown as rows, and the transcription factors are shown as columns.
Names of transcription factors discussed in the main text are highlighted in red. The intersecting cell is coloured according to the normalized number of target genes a transcription factor regulates in each phase, and factors are clustered using the k-means algorithm. Distinct sets of transcription factors regulate genes during the different phases, as highlighted by the different colours. Phase-specific transcription factors are mainly active during a particular phase (early G1 – light blue, late G1 – green, S – orange, G2 – red, and M – magenta), and ubiquitous factors are active throughout the whole cell cycle (ubiquitous – dark blue). (b) Serial inter-regulation of transcription factors. A series of network diagrams depicts the regulatory interactions between phase-specific transcription factors. Factors active in one phase regulate further factors in subsequent phases of the cell cycle; thus factors in early and late G1 regulate those in G2 and M, which in turn regulate factors in the G1 phase of the next cycle. (c) Parallel inter-regulation of transcription factors. A network diagram depicts a two-tier system of regulatory interactions from the ubiquitous factors that are active throughout the cell cycle to the phase-specific factors. The serial and parallel inter-regulatory processes act in tandem to drive the cell cycle forward.

Of importance is the pattern of inter-regulation between TFs. Figures 5.7b and 5.7c depict the two main modes of inter-regulation: serial and parallel. In serial inter-regulation (Figure 5.7b), TFs in one phase regulate further TFs in subsequent phases to drive the cell cycle forward (Simon et al., 2001). Within the complex circuitry, we can identify complete loops of regulatory interactions. Swi4 and Mbp1 in late G1 target TFs in the G2 and M phases. Fkh1 and Mcm1 in these phases in turn regulate Swi5 and Ace2 in early G1 of the next cell cycle. Finally, Mcm1 loops back to regulate Swi4, the original TF.
These types of regulatory chains contribute to the long path lengths observed in the endogenous conditions. In addition to serial inter-regulation, we introduce here the concept of parallel inter-regulation (Figure 5.7c), in which the ubiquitously active TFs regulate the temporal activity of the phase-specific TFs. Parallel inter-regulation effectively provides a two-tier system, in which the ubiquitous TFs provide a stable and prolonged signal for the phase-specific TFs. An example is the general regulator Abf1, which is active throughout the cell cycle and regulates four TFs in the early G1 to M phases. Specificity is achieved by exchanging regulatory partners. Thus, in early G1, Abf1 combines with Sin3 to regulate Ume6, which acts as a mitotic repressor (Kadosh and Struhl, 1997). In the M phase, the same factor acts alone to regulate Pho2, a co-regulator of Swi5 for homothallic switching (Bhoite et al., 2002). In this way, a single TF can be involved in regulating several cellular functions, thereby ensuring a smooth transition between phases.

Furthermore, many of the ubiquitous TFs are also condition-independent hubs involved in housekeeping functions. They therefore provide a channel of communication through which the basic cellular processes and the progression of the cell cycle can be coordinated. There is also some reciprocal regulation from the phase-specific to the ubiquitous TFs (shown in pale colours in Figure 5.7c), though such interactions are much less frequent.

We made similar observations for sporulation (Figure 5.8). The phase-specific TFs include the known meiosis regulators Ime1, Ume6 and Ndt80, and many of the ubiquitous TFs are condition-independent hubs. Again, the serial and parallel modes of inter-regulation operate in tandem to guide the cellular process through its time-course.
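The distinction between serial and parallel inter-regulation can be expressed as a rule on the phase labels of the two TFs in an interaction. The sketch below assumes that each TF has already been assigned to a single phase or to a "ubiquitous" class, for example from the clustering in Figure 5.7a. The function name, category names and cyclic phase ordering are illustrative assumptions. The example interactions are taken from the text; the phase labels for Ume6 and Pho2 are inferred loosely from the phases in which Abf1 regulates them.

```python
CELL_CYCLE_PHASES = ["early G1", "late G1", "S", "G2", "M"]


def classify_tf_interaction(source_phase, target_phase, phases=CELL_CYCLE_PHASES):
    """Classify a TF -> TF interaction from the phase labels of the two TFs.

    Returns (kind, steps), where steps counts how many phases ahead the target
    lies in the cyclic phase order (None when a ubiquitous TF is involved).
    """
    if source_phase == "ubiquitous" and target_phase == "ubiquitous":
        return "ubiquitous-to-ubiquitous", None
    if source_phase == "ubiquitous":
        return "parallel", None      # ubiquitous TF regulates a phase-specific TF
    if target_phase == "ubiquitous":
        return "reciprocal", None    # phase-specific TF regulates a ubiquitous TF
    steps = (phases.index(target_phase) - phases.index(source_phase)) % len(phases)
    if steps == 0:
        return "same phase", 0
    return "serial", steps           # the cycle wraps from M back to early G1


phase = {"Swi4": "late G1", "Mcm1": "M", "Swi5": "early G1",
         "Abf1": "ubiquitous", "Ume6": "early G1", "Pho2": "M"}
for src, dst in [("Mcm1", "Swi5"), ("Mcm1", "Swi4"), ("Abf1", "Ume6"), ("Abf1", "Pho2")]:
    print(src, "->", dst, classify_tf_interaction(phase[src], phase[dst]))
```

Sporulation is a linear rather than a cyclic process. For it, the modulo step would be dropped, so that only forward steps between phases count as serial.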
Figure 5.8: Inter-regulation between TFs through the phases of sporulation. (a) Cluster diagram depicting the activity of transcription factors during the different stages of sporulation. The seven phases of sporulation are shown as rows, and the transcription factors are shown as columns. The intersecting cell is coloured according to the normalized number of target genes a transcription factor regulates in each phase, and factors are clustered using the k-means algorithm. Distinct sets of transcription factors regulate genes during the different phases, as highlighted by the different colours. Phase-specific transcription factors are mainly active during a particular phase (metabolic – light blue, early I and II – green, early-middle – orange, middle – red, mid-late – magenta, and late – brown), and ubiquitous factors are active throughout the whole process of sporulation (ubiquitous – dark blue). (b) Serial inter-regulation of transcription factors. A series of network diagrams depicts the regulatory interactions between phase-specific transcription factors. Factors active in one phase regulate further factors in subsequent phases of sporulation; thus factors in the metabolic phase regulate those in the early phases, which in turn regulate factors in the early-middle phase. (c) Parallel inter-regulation of transcription factors. A network diagram depicts a two-tier system of regulatory interactions from the ubiquitous factors that are active throughout sporulation to the phase-specific factors. The serial and parallel inter-regulatory processes act in tandem to drive the whole process of sporulation forward.

5.4 Conclusions

We examined the dynamics of the regulatory system in yeast under different cellular conditions and obtained unexpected results.
By studying the dynamic interchange of regulatory interactions, we found that some transcription factors maintain the same interactions regardless of the cellular state, whereas others replace most interactions to provide alternative regulatory roles depending on the condition. Contrary to prior expectations, the dynamics of the regulatory system usage is accompanied by large structural changes in the network architecture. These changes include shortening of path length in environmental response conditions, and increased clustering of transcription factors with consequent refinement of control in internally driven multi-stage conditions. There are also major changes at the local level, in which specific network motifs appear to be favoured. We also investigated interactions between transcription factors in the context of the cell cycle, and we introduced the concept of serial and parallel inter-regulation acting in tandem to drive the cell through multi-stage conditions. We found that the role of the regulatory hubs in the network changes too. Although there is a small number of transcription factors that act as hubs throughout all cellular conditions, most of them behave like hubs in specific conditions only. The resulting picture is of a network that shifts its weight between different foci to coordinate distinct cellular processes. As highly connected transcription factors have a tendency to be lethal when removed from the system, this unveiled transient nature of the hubs has implications for their possibly condition-dependent lethality. Finally, our defined sets of regulatory hubs that are collectively active can be used in a bottomup approach for the classification of cellular conditions. One could even think of engineering entirely new cellular conditions by activating alternative combinations of regulatory hubs. 5.5 References Agrawal, H. (2002). Extreme self-organization in networks constructed from gene expression data. Phys Rev Lett 89, 268702. 
Albert, R. and Barabasi, A. L. (2002). Statistical mechanics of complex networks. Rev Mod Phys 74, 47-111.
Alon, U. (2003). Biological networks: the tinkerer as an engineer. Science 301, 1866-7.
Andrews, B. J. and Herskowitz, I. (1989). Identification of a DNA binding factor involved in cell-cycle control of the yeast HO gene. Cell 57, 21-9.
Bader, G. D. and Hogue, C. W. (2002). Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol 20, 991-7.
Barabasi, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286, 509-12.
Barabasi, A. L. and Oltvai, Z. N. (2004). Network biology: understanding the cell's functional organization. Nat Rev Genet 5, 101-13.
Bar-Joseph, Z., Gerber, G. K., Lee, T. I., Rinaldi, N. J., Yoo, J. Y., Robert, F., Gordon, D. B., Fraenkel, E., Jaakkola, T. S., Young, R. A. et al. (2003). Computational discovery of gene modules and regulatory networks. Nat Biotechnol 21, 1337-42.
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. et al. (2004). The Pfam protein families database. Nucleic Acids Res 32, D138-41.
Bhoite, L. T., Allen, J. M., Garcia, E., Thomas, L. R., Gregory, I. D., Voth, W. P., Whelihan, K., Rolfes, R. J. and Stillman, D. J. (2002). Mutations in the pho2 (bas2) transcription factor that differentially affect activation with its partner proteins bas1, pho4, and swi5. J Biol Chem 277, 37612-8.
Blaiseau, P. L., Lesuisse, E. and Camadro, J. M. (2001). Aft2p, a novel iron-regulated transcription activator that modulates, with Aft1p, intracellular iron use and resistance to oxidative stress in yeast. J Biol Chem 276, 34221-6.
Bowdish, K. S., Yuan, H. E. and Mitchell, A. P. (1995). Positive control of yeast meiotic genes by the negative regulator UME6. Mol Cell Biol 15, 2955-61.
Breeden, L. and Mikesell, G. E. (1991). Cell cycle-specific expression of the SWI4 transcription factor is required for the cell cycle regulation of HO transcription. Genes Dev 5, 1183-90.
Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J. et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2, 65-73.
Christie, K. R., Weng, S., Balakrishnan, R., Costanzo, M. C., Dolinski, K., Dwight, S. S., Engel, S. R., Feierbach, B., Fisk, D. G., Hirschman, J. E. et al. (2004). Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res 32, D311-4.
Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P. O. and Herskowitz, I. (1998). The transcriptional program of sporulation in budding yeast. Science 282, 699-705.
Cormen, T. H., Leiserson, C. E., Rivest, R. L. and Stein, C. (2001). Introduction to algorithms. Cambridge, MA: MIT Press.
de Menezes, M. A. and Barabasi, A. L. (2004). Fluctuations in network dynamics. Phys Rev Lett 92, 028701.
Della Seta, F., Ciafre, S. A., Marck, C., Santoro, B., Presutti, C., Sentenac, A. and Bozzoni, I. (1990). The ABF1 factor is the transcriptional activator of the L2 ribosomal protein genes in Saccharomyces cerevisiae. Mol Cell Biol 10, 2437-41.
DeRisi, J. L., Iyer, V. R. and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-6.
Diffley, J. F. and Stillman, B. (1989). Similarity between the transcriptional silencer binding proteins ABF1 and RAP1. Science 246, 1034-8.
Featherstone, D. E. and Broadie, K. (2002). Wrestling with pleiotropy: genomic and topological analysis of the yeast gene expression network. Bioessays 24, 267-74.
Forsburg, S. L. and Guarente, L. (1989). Identification and characterization of HAP4: a third component of the CCAAT-bound HAP2/HAP3 heteromer. Genes Dev 3, 1166-78.
Futcher, B. (2002). Transcriptional regulatory networks and the yeast cell cycle. Curr Opin Cell Biol 14, 676-83.
Gailus-Durner, V., Xie, J., Chintamaneni, C. and Vershon, A. K. (1996). Participation of the yeast activator Abf1 in meiosis-specific expression of the HOP1 gene. Mol Cell Biol 16, 2777-86.
Gasch, A. P., Huang, M., Metzner, S., Botstein, D., Elledge, S. J. and Brown, P. O. (2001). Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Mol Biol Cell 12, 2987-3003.
Gasch, A. P., Spellman, P. T., Kao, C. M., Carmel-Harel, O., Eisen, M. B., Storz, G., Botstein, D. and Brown, P. O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 11, 4241-57.
Gil, R., Zueco, J., Sentandreu, R. and Herrero, E. (1991). RCS1, a gene involved in controlling cell size in Saccharomyces cerevisiae. Yeast 7, 1-14.
Goh, K. I., Kahng, B. and Kim, D. (2002). Fluctuation-driven dynamics of the internet topology. Phys Rev Lett 88, 108701.
Guelzim, N., Bottani, S., Bourgine, P. and Kepes, F. (2002). Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 31, 60-3.
Hahn, S. and Guarente, L. (1988). Yeast HAP2 and HAP3: transcriptional activators in a heteromeric complex. Science 240, 317-21.
Hartwell, L. H., Hopfield, J. J., Leibler, S. and Murray, A. W. (1999). From molecular to modular cell biology. Nature 402, C47-52.
Horak, C. E., Luscombe, N. M., Qian, J., Bertone, P., Piccirrillo, S., Gerstein, M. and Snyder, M. (2002). Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae. Genes Dev 16, 3017-33.
Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y. and Barkai, N. (2002). Revealing modular organization in the yeast transcriptional network. Nat Genet 31, 370-7.
Iyer, V. R., Horak, C. E., Scafe, C. S., Botstein, D., Snyder, M. and Brown, P. O. (2001). Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533-8.
Jansen, R., Greenbaum, D. and Gerstein, M. (2002). Relating whole-genome expression data with protein-protein interactions. Genome Res 12, 37-46.
Jeong, H., Mason, S. P., Barabasi, A. L. and Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature 411, 41-2.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barabasi, A. L. (2000). The large-scale organization of metabolic networks. Nature 407, 651-4.
Jorgensen, P., Nishikawa, J. L., Breitkreutz, B. J. and Tyers, M. (2002). Systematic identification of pathways that couple cell growth and division in yeast. Science 297, 395-400.
Kadosh, D. and Struhl, K. (1997). Repression by Ume6 involves recruitment of a complex containing Sin3 corepressor and Rpd3 histone deacetylase to target promoters. Cell 89, 365-71.
Kassir, Y., Adir, N., Boger-Nadjar, E., Raviv, N. G., Rubin-Bejerano, I., Sagee, S. and Shenhar, G. (2003). Transcriptional regulation of meiosis in budding yeast. Int Rev Cytol 224, 111-71.
Kassir, Y., Granot, D. and Simchen, G. (1988). IME1, a positive regulator gene of meiosis in S. cerevisiae. Cell 52, 853-62.
Kaufmann, E. (1993). In vitro binding to the leucine tRNA gene identifies a novel yeast homeobox gene. Chromosoma 102, 174-9.
Klein, C. J., Rasmussen, J. J., Ronnow, B., Olsson, L. and Nielsen, J. (1999). Investigation of the impact of MIG1 and MIG2 on the physiology of Saccharomyces cerevisiae. J Biotechnol 68, 197-212.
Koch, C., Moll, T., Neuberg, M., Ahorn, H. and Nasmyth, K. (1993). A role for the transcription factors Mbp1 and Swi4 in progression from G1 to S phase. Science 261, 1551-7.
Koranda, M., Schleiffer, A., Endler, L. and Ammerer, G. (2000). Forkhead-like transcription factors recruit Ndd1 to the chromatin of G2/M-specific promoters. Nature 406, 94-8.
Kumar, R., Reynolds, D. M., Shevchenko, A., Goldstone, S. D. and Dalton, S. (2000). Forkhead transcription factors, Fkh1p and Fkh2p, collaborate with Mcm1p to control transcription required for M-phase. Curr Biol 10, 896-906.
Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., Hannett, N. M., Harbison, C. T., Thompson, C. M., Simon, I. et al. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799-804.
Lieb, J. D., Liu, X., Botstein, D. and Brown, P. O. (2001). Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nat Genet 28, 327-34.
Lutfiyya, L. L., Iyer, V. R., DeRisi, J., DeVit, M. J., Brown, P. O. and Johnston, M. (1998). Characterization of three related glucose repressors and genes they regulate in Saccharomyces cerevisiae. Genetics 150, 1377-91.
Madan Babu, M. and Teichmann, S. A. (2003). Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Res 31, 1234-44.
Mager, W. H. and Planta, R. J. (1990). Multifunctional DNA-binding proteins mediate concerted transcription activation of yeast ribosomal protein genes. Biochim Biophys Acta 1050, 351-5.
Mangan, S. and Alon, U. (2003). Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci U S A 100, 11980-5.
Mangan, S., Zaslaver, A. and Alon, U. (2003). The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol 334, 197-204.
Martinez-Antonio, A. and Collado-Vides, J. (2003). Identifying global regulators in transcriptional regulatory networks in bacteria. Curr Opin Microbiol 6, 482-9.
Maslov, S. and Sneppen, K. (2002). Specificity and stability in topology of protein networks. Science 296, 910-3.
Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A. E., Kel-Margoulis, O. V. et al. (2003). TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31, 374-8.
McBride, H. J., Yu, Y. and Stillman, D. J. (1999). Distinct regions of the Swi5 and Ace2 transcription factors are required for specific gene activation. J Biol Chem 274, 21029-36.
McInerny, C. J., Partridge, J. F., Mikesell, G. E., Creemer, D. P. and Breeden, L. L. (1997). A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, and CDC47 promoters activates M/G1-specific transcription. Genes Dev 11, 1277-88.
Mewes, H. W., Frishman, D., Guldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Munsterkotter, M., Rudd, S. and Weil, B. (2002). MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30, 31-4.
Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M. and Alon, U. (2004). Superfamilies of evolved and designed networks. Science 303, 1538-42.
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002). Network motifs: simple building blocks of complex networks. Science 298, 824-7.
Nebohacova, M., Novakova, Z., Haviernik, P., Betina, S. and Kolarov, J. (1996). Oxygen- and carbon source-dependent transactivation effect of ABF1 on the expression of the AAC2 gene encoding mitochondrial ADP/ATP carrier. Folia Microbiol (Praha) 41, 115-6.
Odom, D. T., Zizlsperger, N., Gordon, D. B., Bell, G. W., Rinaldi, N. J., Murray, H. L., Volkert, T. L., Schreiber, J., Rolfe, P. A., Gifford, D. K. et al. (2004). Control of pancreas and liver gene expression by HNF transcription factors. Science 303, 1378-81.
Olesen, J., Hahn, S. and Guarente, L. (1987). Yeast HAP2 and HAP3 activators both bind to the CYC1 upstream activation site, UAS2, in an interdependent manner. Cell 51, 953-61.
Oltvai, Z. N. and Barabasi, A. L. (2002). Systems biology. Life's complexity pyramid. Science 298, 763-4.
Pic, A., Lim, F. L., Ross, S. J., Veal, E. A., Johnson, A. L., Sultan, M. R., West, A. G., Johnston, L. H., Sharrocks, A. D. and Morgan, B. A. (2000). The forkhead protein Fkh2 is a component of the yeast cell cycle transcription factor SFF. EMBO J 19, 3750-61.
Pilpel, Y., Sudarsanam, P. and Church, G. M. (2001). Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 29, 153-9.
Pramila, T., Miles, S., GuhaThakurta, D., Jemiolo, D. and Breeden, L. L. (2002). Conserved homeodomain proteins interact with MADS box protein Mcm1 to restrict ECB-dependent transcription to the M/G1 phase of the cell cycle. Genes Dev 16, 3034-45.
Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. and Barabasi, A. L. (2002). Hierarchical organization of modularity in metabolic networks. Science 297, 1551-5.
Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E. et al. (2000). Genome-wide location and function of DNA binding proteins. Science 290, 2306-9.
Rzhetsky, A. and Gomez, S. M. (2001). Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17, 988-96.
Schlitt, T., Palin, K., Rung, J., Dietmann, S., Lappe, M., Ukkonen, E. and Brazma, A. (2003). From gene networks to gene function. Genome Res 13, 2568-76.
Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34, 166-76.
Shen-Orr, S. S., Milo, R., Mangan, S. and Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31, 64-8.
Simon, I., Barnett, J., Hannett, N., Harbison, C. T., Rinaldi, N. J., Volkert, T. L., Wyrick, J. J., Zeitlinger, J., Gifford, D. K., Jaakkola, T. S. et al. (2001). Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697-708.
Strich, R., Surosky, R. T., Steber, C., Dubois, E., Messenguy, F. and Esposito, R. E. (1994). UME6 is a key regulator of nitrogen repression and meiotic development. Genes Dev 8, 796-810.
Svetlov, V. V. and Cooper, T. G. (1995). Review: compilation and characteristics of dedicated transcription factors in Saccharomyces cerevisiae. Yeast 11, 1439-84.
Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. and Church, G. M. (1999). Systematic determination of genetic network architecture. Nat Genet 22, 281-5.
Tong, A. H., Lesage, G., Bader, G. D., Ding, H., Xu, H., Xin, X., Young, J., Berriz, G. F., Brost, R. L., Chang, M. et al. (2004). Global mapping of the yeast genetic interaction network. Science 303, 808-13.
Vershon, A. K. and Pierce, M. (2000). Transcriptional regulation of meiosis in yeast. Curr Opin Cell Biol 12, 334-9.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S. G., Fields, S. and Bork, P. (2002). Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417, 399-403.
Wagner, A. (2001). The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol 18, 1283-92.
Wagner, A. and Fell, D. A. (2001). The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci 268, 1803-10.
Wall, M. E., Hlavacek, W. S. and Savageau, M. A. (2004). Design of gene circuits: lessons from bacteria. Nat Rev Genet 5, 34-42.
Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature 393, 440-2.
Wuchty, S., Oltvai, Z. N. and Barabasi, A. L. (2003). Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet 35, 176-9.
Xu, L., Ajimura, M., Padmore, R., Klein, C. and Kleckner, N. (1995). NDT80, a meiosis-specific gene required for exit from pachytene in Saccharomyces cerevisiae. Mol Cell Biol 15, 6572-81.
Yamaguchi-Iwai, Y., Dancis, A. and Klausner, R. D. (1995). AFT1: a mediator of iron regulated transcriptional control in Saccharomyces cerevisiae. EMBO J 14, 1231-9.
Yu, H., Luscombe, N. M., Qian, J. and Gerstein, M. (2003). Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Genet 19, 422-7.
Zeitlinger, J., Simon, I., Harbison, C. T., Hannett, N. M., Volkert, T. L., Fink, G. R. and Young, R. A. (2003). Program-specific distribution of a transcription factor dependent on partner transcription factor and MAPK signaling. Cell 113, 395-404.
Zhu, G., Spellman, P. T., Volpe, T., Brown, P. O., Botstein, D., Davis, T. N. and Futcher, B. (2000). Two yeast forkhead genes regulate the cell cycle and pseudohyphal growth. Nature 406, 90-4.