Evolutionary core of a protein interaction network in Plasmodium falciparum

S. Wuchty1, A.-L. Barabasi2, M.T. Ferdig3 and J. H. Adams3,*

1 Northwestern Institute of Complexity, Donald P. Jacobs Center, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208
2 Department of Physics, 225 Nieuwland Science Hall, University of Notre Dame, Notre Dame, IN 46556
3 Department of Biological Sciences, Galvin Life Sciences Building, University of Notre Dame, Notre Dame, IN 46556

*To whom correspondence should be addressed. E-mail: jadams3@nd.edu

Keywords: malaria; Plasmodium falciparum; interactome; gene expression.

Running head: Interactome of malaria parasites

The topology of biological networks reflects functional and evolutionary relationships among the constituents of a cell. In particular, the protein interaction networks of different organisms, though distinct, share underlying evolutionary relationships that allow protein interactions to be inferred theoretically. Modern bioinformatic methods can exploit these relationships to reconstruct the network architecture of nonmodel organisms, such as the devastating pathogen Plasmodium falciparum. Knowledge about the architecture of the principal networks of malaria parasites will help us understand the unique ways in which this important pathogen evolved to evade the hostile responses of the host’s immune system and to cope with the intracellular environments of host tissues. Using the protein interactions of S. cerevisiae and matching orthologous proteins in P. falciparum, we assess the probability that a link between yeast proteins has been conserved in malaria parasites by evaluating the degree of local clustering around each interaction among orthologous proteins. Characterizing this evolutionary core of interactions, we elucidate the modular structure of the P. falciparum interactome shared with yeast and find that its primary clusters contain core activities governing gene expression.
In particular, we observe that exosome-proteasome functions are enriched in the inferred core of the P. falciparum interactome, identifying these pathways as key mechanisms controlling gene expression. These findings emphasize the role of cis elements and the corresponding trans-acting RNA-binding proteins as fundamental regulatory components of P. falciparum gene expression and indicate a critical role for post-transcriptional control of mRNA stability. In addition, we find that the activities of the proteins of the core network remain cohesive within developmental stages. Nearly all functions of the core network occur in the ring or the schizont phase while, remarkably, almost no expression activity of evolutionarily conserved interactions is observed in the trophozoite phase, suggesting that genes expressed during this period of parasite development result from evolutionary adaptations distinct from the yeast lineage. The identification of this core of interactions could lead to a better understanding of the parasite’s pathways and processes, illuminate basic biological functions and, consequently, help prioritize targets for therapeutic intervention.

Introduction

An important challenge confronting modern biology is whether the wealth of information accruing from the study of model organisms can be applied to the pervasive, intractable microbial diseases that plague humankind. In particular, the global burden of malaria continues to worsen in many developing countries, with a devastating impact on human health and a corresponding impediment to economic improvement (Snow et al. 2005). Recent sequencing efforts yielded extensive annotations of the genomes of Plasmodium falciparum and several other malaria parasites (Gardner et al. 2002a; Hall et al. 2002; Hyman et al. 2002; Bozdech et al. 2003a; Le Roch et al. 2003).
Despite this abundance of primary genomic and proteomic information, little is known about the web of protein interactions that governs the unique biology of malaria parasites, illustrating the need for novel comparative tools to tap the information hidden in the genomic sequence. Tools that search massive genome databases for evolutionary relatedness are now widely used to explore gene function in a few model organisms. It is becoming increasingly evident that evolutionary information is also conserved at higher orders of genome organization, as the recent proliferation of studies of protein network topologies and their biological implications has shown (Wuchty et al. 2003; Han et al. 2004; Li et al. 2004). Since network architecture contains fundamental information about the complex web of cellular and sub-cellular components, knowledge of the web of protein interactions facilitates our understanding of an organism’s basic biology. Knowledge of the P. falciparum interactome will provide valuable insight into the global protein interaction network that underlies the parasite-specific organization of protein-protein interactions, and will ultimately help to determine the factors that contribute to the lethally efficient biology of this parasite. The challenges of working with this non-model organism led us to develop an informatic approach to determine basic Plasmodium-specific protein interactions. Comparisons of the network topologies of comprehensive interaction data sets from a few model organisms suggest that a small number of organizing principles give rise to complex protein webs (Barabasi and Oltvai 2004), principles that presumably govern the protein networks of the malaria parasite as well. Therefore, we asked whether the abundant information about protein interaction webs from well-studied model organisms could be used to infer an evolutionarily conserved core set of protein interactions in P. falciparum.
Known protein interaction networks feature a small number of highly connected proteins that secure the integrity of, and the connectivity among, modules despite the pronounced changes an organism undergoes during its developmental cycle (Jeong et al. 2001; Han et al. 2004). The critical role of highly connected nodes is demonstrated by their elevated propensity to be essential for survival and by their evolutionary conservation (Jeong et al. 2001; Wuchty 2002, 2004). This focus on linkages among proteins leads to a higher-order network view in which conserved modules, or bundles of cohesively bound nodes, indicate archetypal patterns of evolutionary building blocks (Wuchty et al. 2003), a blueprint reinforced by the tendency of modules to be co-expressed (Ge et al. 2001). Because evolutionary signal is strongly retained in the network topology, we tapped this information to better understand the interactome of P. falciparum. A possible obstacle to this approach is the high error rate of experimental methods for determining protein interactions, which may limit the integrity and usefulness of existing protein interaction sets. A recent estimate of the accuracy of protein interactions in S. cerevisiae uncovered a startling false negative rate of 90% and a false positive rate of 50% (von Mering et al. 2002). Fortunately, this experimental noise can be overcome, since the topology of a protein interaction network contains information to assess the quality of an individual link. Using a link-based clustering coefficient that reflects the degree of clustering in an interaction’s immediate neighborhood, Goldberg and Roth identified pronounced correlations between local clustering and the actual presence of a protein interaction in S. cerevisiae (Goldberg and Roth 2003).
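The link-based clustering coefficient mentioned above can be sketched in Python. This minimal sketch assumes the hypergeometric form commonly attributed to Goldberg and Roth (2003): $C_{vw}$ is the negative logarithm of the probability of observing at least the actual number of shared neighbors of $v$ and $w$ if their neighbor sets were drawn at random from all $N$ proteins, i.e. $C_{vw} = -\ln \sum_{i \ge n_{vw}} \binom{k_v}{i}\binom{N-k_v}{k_w-i} / \binom{N}{k_w}$. Excluding $v$ and $w$ from each other's neighbor sets and using the natural logarithm are assumptions of the sketch, and the function name is hypothetical.

```python
import math


def hypergeometric_clustering(adj, v, w):
    """Hypergeometric clustering coefficient C_vw of the link v-w.

    adj maps each protein to the set of its interaction partners (symmetric).
    Returns -ln P(X >= shared), where X counts shared neighbours under a
    hypergeometric null model over all N proteins in the network.
    """
    n_total = len(adj)
    nv = adj[v] - {w}  # assumption: the partner itself is not counted
    nw = adj[w] - {v}
    shared = len(nv & nw)
    kv, kw = len(nv), len(nw)
    # Upper tail of the hypergeometric distribution, kept as exact integers.
    tail = sum(
        math.comb(kv, i) * math.comb(n_total - kv, kw - i)
        for i in range(shared, min(kv, kw) + 1)
    )
    # math.log accepts arbitrarily large integers, so no float overflow here.
    return -(math.log(tail) - math.log(math.comb(n_total, kw)))
```

Larger values indicate that two interaction partners share more neighbors than expected by chance, which is the property used to grade individual links.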
Independent observations that well-embedded interactions are highly reliable and that their protein constituents are preferentially conserved lead us to propose a significant and direct correlation between the local embedding of an interaction and its evolutionary conservation, providing a rational basis for using orthologous information to infer protein interactions in another species. The method we introduce here aims to access the information in the network structure that has been retained between yeast and malaria parasites. Identifying this cohesion will highlight fundamental, conserved modular units operating in P. falciparum which, when overlaid with high-resolution Plasmodium-specific transcriptional profiles (Bozdech et al. 2003a), will point to basic properties and their prominence at various stages of the Plasmodium life cycle. Transcriptional data provide a third key dimension (in addition to primary sequence identity and network topology) that superimposes the dynamic nature of the living cell onto the static inferred protein network, allowing us to parse out interactions as they relate to the biological activity of modules in the malaria parasite. Given a comprehensive set of orthologs, a framework that incorporates network architecture represents a novel means to elucidate evolutionarily relevant protein interaction networks in a target organism. Therefore, we have developed a method that can identify a core set of protein interactions in organisms for which extensive genome annotation is available but no comprehensive protein interaction data yet exist. This approach could serve as the starting point for future systematic experimental investigations of the interactomes of other species that lack direct protein interaction data.

Results and Discussion

Stability of evolutionary signals

Among the continuously growing pool of organisms for which experimentally derived protein interactions are currently available, the interactome of S.
cerevisiae remains the most finely characterized and validated. Using a set of high-quality interactions compiled by Han et al. (2004), we obtain 1,330 proteins involved in a total of 2,448 manually curated interactions. Despite their phylogenetic distance, S. cerevisiae and P. falciparum share a considerable number of identifiable orthologous proteins. Using the InParanoid scripts (see Methods) to determine orthologous groups of proteins, we find 856 yeast proteins with an ortholog in P. falciparum. To investigate whether highly clustered links also carry an evolutionary signal, we measure the local clustering in a link’s immediate network neighborhood by computing the simple hypergeometric clustering coefficient $C_{vw}$ (Goldberg and Roth 2003) for every interaction in the yeast protein network (see Methods). Logarithmically binning the interactions according to their $C_{vw}$, we determine the excess retention (Wuchty 2004) of the orthologous proteins in P. falciparum that constitute each group of interactions. If topology did not carry an evolutionary signal, the propensity of interacting proteins to be evolutionarily conserved would not scale with the degree of local clustering around an interaction. Instead, we observe a significant positive correlation between the hypergeometric clustering coefficients and the excess retention of the proteins that constitute the corresponding interactions (Fig. 1a).

Co-expression of Conserved Protein Interactions

To search for preferential co-expression of these protein pairs, we compute Pearson's correlation coefficients $r_P$ from a comprehensive set of Plasmodium-specific expression data (Bozdech et al. 2003a).
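As a minimal sketch of this co-expression measure, Pearson's correlation between the expression time courses of two interacting proteins can be computed as follows. The function names are hypothetical, and skipping time points at which either profile has a missing value (None) is an assumption about how gaps in the expression data are handled.

```python
import math


def pearson(x, y):
    """Pearson correlation r_P between two expression profiles.

    Time points where either value is None are skipped (pairwise deletion,
    an assumption here). Returns NaN when fewer than 3 paired points remain
    or when a profile is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def link_coexpression(interactions, profiles):
    """r_P for every interaction whose two proteins both have a profile."""
    return {
        (v, w): pearson(profiles[v], profiles[w])
        for v, w in interactions
        if v in profiles and w in profiles
    }
```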
Logarithmically binning all yeast protein interactions according to their hypergeometric clustering coefficient $C_{vw}$ and determining the mean co-expression correlation in each bin, we observe that protein pairs that are fully conserved show a clear and significant positive trend. Conversely, interacting proteins without orthologs in Plasmodium show no clear trend (Fig. 1b). Given that conserved interacting proteins with high local clustering also display a high propensity to be co-expressed in yeast, we argue that a combination of these observations can reliably infer protein interactions (Fig. 1c). Since the set of high-quality interactions used for these preliminary analyses is rather small, we inferred interactions in Plasmodium from a larger yeast protein interaction set. Using a large but noisy yeast data set from the DIP database (Xenarios et al. 2002) that comprises 3,833 proteins involved in 11,942 interactions, we apply logistic regression as a classification scheme to assess the quality of the newly added interactions. In particular, we train the model on a positive and a negative training set, using the hypergeometric clustering coefficient $C_{vw}$ and the co-expression correlation $r_P$ of each interaction as features. As the class label, we record whether both interacting proteins have orthologs in Plasmodium. A crucial point in setting up such a model is the choice of appropriate training sets. For the positive training set, we rely on our observations in Fig. 1b, which suggest a usable correlation between $C_{vw}$ and $r_P$ when both interacting proteins are conserved. In contrast, interactions whose proteins are not conserved fail to show a clear correlation.
Many choices of positive and negative training sets could enable a reliable classification of interactions. Here, we obtain the best results by randomly choosing as the positive training set 500 interacting protein pairs in which both proteins have orthologs in Plasmodium. As the negative counterpart, we randomly choose 500 interactions from the pool of non-conserved interactions. Using a leave-one-out strategy to assess prediction accuracy, we obtain up to 99% correct predictions. We then apply the trained model to the DIP yeast interactions that are conserved in Plasmodium, which yields 377 proteins and 630 interactions. Interactions classified as correct accumulate significantly at high values of the co-expression coefficient, while links classified as incorrect are distributed much more evenly (Fig. 1d). Considering the interactions between conserved proteins in Plasmodium, we find a strong shift toward co-expression of the correctly classified interactions in the target organism (Fig. 1d, inset).

Modular structure

To determine functional modules in the resulting network, we use the Markov Clustering Algorithm (MCL) (Enright et al. 2003), which exploits the topological fact that nodes embedded in well-interconnected parts of a network share most of their links with nodes of the same cluster, while only a small fraction of links connect remote clusters. Therefore, we expect a random walker to travel predominantly within a cluster and to jump to other clusters only sporadically. Mathematically, we construct a $k \times k$ matrix $M$ for the $k$ proteins of the network, where $M_{ij} = w_{ij}$ and $w_{ij}$ is the weight of the interaction between $i$ and $j$. In our case, we use the weight function $w_{ij} = 1 + r_{ij}$, where $r_{ij}$ is the co-expression correlation coefficient of interaction $ij$, to avoid negative weights.
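The classification step can be sketched as a two-feature logistic regression evaluated by leave-one-out cross-validation. This is a minimal, dependency-free illustration assuming one feature pair ($C_{vw}$, $r_P$) per interaction and a label of 1 for conserved and 0 for non-conserved pairs. The optimizer (batch gradient descent on z-scored features), learning rate and epoch count are assumptions rather than the settings used in the study, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit an intercept plus one weight per feature by batch gradient descent
    on the log-loss. Features are z-scored with the training-set mean and std."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(d)] for row in X]
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for zi, yi in zip(Z, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], zi))) - yi
            grad[0] += err
            for j, xj in enumerate(zi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w, mu, sd


def predict_proba(model, x):
    """Probability that an interaction with features x belongs to class 1."""
    w, mu, sd = model
    z = w[0] + sum(wj * (xj - m) / s for wj, xj, m, s in zip(w[1:], x, mu, sd))
    return sigmoid(z)


def leave_one_out_accuracy(X, y, **fit_kwargs):
    """Refit without each sample in turn and score the held-out prediction."""
    hits = 0
    for i in range(len(X)):
        model = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **fit_kwargs)
        hits += (predict_proba(model, X[i]) >= 0.5) == (y[i] == 1)
    return hits / len(X)
```

With 1,000 training pairs the pure-Python leave-one-out loop refits the model 1,000 times, so a vectorized library would be the practical choice at that scale; the logic stays the same.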
After column normalization, this weighted adjacency matrix becomes a stochastic matrix: each entry gives the probability that a random walker starting at node $i$ takes edge $ij$ to neighbor $j$. We then alternately (i) expand the matrix by matrix multiplication (i.e., matrix squaring) and (ii) inflate it, raising each entry to a power and renormalizing, which again yields a stochastic matrix (see Methods). This cycle of expansion and inflation is repeated until the stochastic matrix becomes doubly idempotent, i.e., it no longer changes with further expansion/inflation cycles. The final matrix is composed of several connected components, which are the sought clusters. The MCL algorithm has one tunable parameter, the inflation exponent $r$. To assess the quality of the clustering, we applied a recently introduced measure of functional modularity (Marcotte 2004). Essentially, in each module obtained from the Markov clustering procedure, we count the protein pairs that share the same Gene Ontology (GO) annotation and normalize this count by the size of the cluster (see Methods). We find 134 clusters, many of them disconnected components of the underlying network. As shown in Fig. 2, most clusters are rather small, while a minority are larger, suggesting a power-law-shaped tail in the frequency distribution of cluster sizes (data not shown). Notably, we find a variety of larger clusters that represent functionally consistent groups of nodes, as indicated by shading in Fig. 2. Predominantly, these comprise proteasome components; small nuclear ribonucleoproteins (snRNPs); a large group of translation initiation factors and related proteins; exosome and spliceosome components; replication factor C; DNA-related functions such as DNA polymerase and helicase subunits; and ribosomal proteins and translation factor subunits.
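The Markov clustering procedure described above can be sketched in plain Python for small networks. Edge weights follow the text ($w_{ij} = 1 + r_{ij}$); the self-loops of weight 1, the inflation exponent of 2, the convergence tolerance and the threshold used to read clusters from the limit matrix are assumptions of this sketch, and the function names are hypothetical. Real analyses would rely on an optimized implementation such as van Dongen's mcl program.

```python
def normalize_columns(M):
    """Scale every column of a square matrix to sum to 1 (empty columns stay 0)."""
    k = len(M)
    sums = [sum(M[i][j] for i in range(k)) for j in range(k)]
    return [[M[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(k)]
            for i in range(k)]


def matmul(A, B):
    """Plain square matrix product A @ B."""
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


def mcl(nodes, weighted_edges, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Markov clustering of a small weighted network.

    weighted_edges: (u, v, r_uv) triples, where r_uv is the co-expression
    correlation of the interaction. Returns a set of frozensets (clusters).
    """
    idx = {v: i for i, v in enumerate(nodes)}
    k = len(nodes)
    M = [[0.0] * k for _ in range(k)]
    for i in range(k):
        M[i][i] = 1.0  # self-loops: a common MCL default, assumed here
    for u, v, r in weighted_edges:
        w = 1.0 + r  # w_ij = 1 + r_ij is non-negative for r in [-1, 1]
        M[idx[u]][idx[v]] = M[idx[v]][idx[u]] = w
    M = normalize_columns(M)  # column j: transition probabilities out of node j
    for _ in range(max_iter):
        expanded = matmul(M, M)  # expansion: two-step random walk
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        delta = max(abs(a - b) for ra, rb in zip(inflated, M) for a, b in zip(ra, rb))
        M = inflated
        if delta < tol:  # limit reached: no change under further cycles
            break
    # In the limit, each non-zero row belongs to an attractor; its non-zero
    # columns are the members of that attractor's cluster.
    clusters = set()
    for i in range(k):
        members = frozenset(nodes[j] for j in range(k) if M[i][j] > eps)
        if members:
            clusters.add(members)
    return clusters
```

For example, two strongly co-expressed triangles joined by one weakly co-expressed link separate into two clusters.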
Nearly all of the identified clusters display a relatively high degree of co-expression (Fig. 2), and the co-expression patterns of intramodular links closely reflect the nature of the underlying clusters. For example, the cluster of proteins that play a dominant role in translation shows a clear tendency for its interactions to be active at distinct points in the cell cycle. This observation supports earlier findings that varying environmental or cellular conditions imply changes in the activity of protein-protein interactions (Han et al. 2004), regulatory networks (Luscombe et al. 2004), and metabolic pathways (Almaas et al. 2004). We find similar patterns for replication factors and for proteins involved in the exosome and in chromosome condensation. From a systemic point of view, our network representation allows us to recover the stage-specific expression of protein complex constituents. For example, the subunits of ATP synthase and RNA polymerase are necessarily co-expressed to ensure functional protein complexes. In contrast, translation comprises a series of distinct protein activities that occur sequentially. Thus, the clusters we found in this evolutionary core network correspond either to large protein complexes that consist of simultaneously transcribed, physically interacting proteins (e.g., the exosome) or to proteins that belong to complex multi-component processes (e.g., transcription and translation).

Special Clusters of the Interactome Implicated in Control of Gene Expression

In metazoan organisms, transcription factors play a dominant role in the control of gene expression. The paucity of identifiable transcription factors in the Plasmodium genome is a conundrum in light of the apparently tight regulation of P. falciparum gene expression, which is coordinated through a continuous cycle of development (Blair et al. 2002a; Blair et al. 2002b; Le Roch et al. 2002; Bozdech et al. 2003a; Le Roch et al. 2003).
As in higher eukaryotes, the initial steps of gene activation occur through chromatin remodeling by SWI/SNF complexes and histone acetylation (Aravind et al. 2003; Duraisingh et al. 2005; Freitas-Junior et al. 2005; Ralph et al. 2005), followed by promoter recognition mediated by a TATA-binding protein ortholog. Although Plasmodium species could possess elaborate sets of transcription factors that have escaped identification because they lack known orthologs, this seems unlikely. Given the absence of identifiable regulators of transcription initiation downstream of chromatin remodeling, previous studies have argued that post-transcriptional regulation may play the dominant role in controlling gene expression in malaria parasites (Carlton et al. 2002; Gardner et al. 2002b; Gardner et al. 2002a). The exosome is one of the most important protein complexes for post-transcriptional RNA processing in eukaryotic cells (Butler 2002; Haile et al. 2003; Raijmakers et al. 2004). In our network, the exosome scores high as a cohesive cluster and is identified as a prime component of the P. falciparum interactome (Fig. 2). The components of the exosome include 3’-5’ exonucleases in both the nucleus and the cytoplasm, along with associated proteins that coordinate exonuclease activity for specific functions in each cellular compartment (Murase et al. 1993; Koonin et al. 2001; Butler 2002; Mukherjee et al. 2002; Raijmakers et al. 2002; Haile et al. 2003; Lehner and Sanderson 2004). Functions in the nucleus include trimming the 3’ ends of the rRNA subunits and surveillance of mRNA for incorrect splicing or polyadenylation. Whether the exosome-associated factors RRP6 (PF14_0473) and Mtr4 (PFF0100w; this identifier is no longer current) are necessary to regulate this nuclear-specific processing and degradation of rRNA and pre-mRNA, respectively, remains an open question.
Chaperone proteins that facilitate movement through nuclear pores control access to the nucleus for the components of the exosome and most other nuclear proteins, a control that may be especially important in organisms like P. falciparum that maintain a nuclear envelope throughout their developmental cycle. Reflecting this critical role in regulating parasite growth, importin alpha (PF08_0087) is one of the most highly connected hubs in this network. This protein is the primary chaperone controlling protein import into the nucleus, in association with importin beta (PF08_0069), RAN (PF11_0183) and NTF2 (PF14_0122), all of which are present in this network (Fig. 2). The unexpectedly high number of P. falciparum proteins with RNA-recognition motifs (Aravind et al. 2003) further supports the view that post-transcriptional processing is critical for controlling gene expression. Two surveillance mechanisms, nonsense-mediated decay (NMD) and decay mediated by AU-rich elements (AREs), are generally important for controlling mRNA stability in the cytoplasm of eukaryotes (Wilusz et al. 2001; Lehner and Sanderson 2004). NMD is a general decay mechanism that targets aberrant transcripts that are incorrectly spliced or contain premature stop codons, whereas AREs are gene-specific cis-regulatory elements that are typically important in the stage-specific degradation or stabilization of mRNA. Only recently has evidence emerged that AREs are potentially important in regulating gene expression in Plasmodium (Hall et al. 2005), although there is strong evidence that AREs have an integral role in regulating gene expression in other parasitic protozoa (Haile et al. 2003). In most eukaryotic cells, mRNA degradation begins with 3’ deadenylation followed by 5’ decapping, and degradation is then completed by either the 5’-3’ (yeast) or the 3’-5’ (metazoans) exonucleolytic pathway (Wilusz et al. 2001). It is not clear how mRNA degradation proceeds in Plasmodium, since there is no clear homologue of yeast DCP (pfam06058).
Based on our network interactions, the exosome 3’-5’ pathway emerges as a major exonucleolytic decay mechanism, similar to higher eukaryotes. Additional RNA decay pathways, such as those involving the LSM complex, may carry out 5’-3’ degradation, although it would be unusual for them to act independently of DCP. The relative lack of the typical cellular components that attack the 5’ end of mRNA, together with the retention of 3’ exonuclease activity and of nonsense-mediated decay mechanisms, suggests that regulation of mRNA stability through poly(A) nuclease complexes has the dominant role in controlling mRNA degradation and gene expression. Exonuclease activities are likely to be coordinated through stage-specific recognition of AU-rich elements in the 3’UTR, as suggested for gametocyte-specific gene expression (Golightly et al. 2000; Cann et al. 2004; Shue et al. 2004; Hall et al. 2005). This mechanism is recognized in a number of organisms as an important fast-response mechanism that regulates mRNA degradation or stability in a stage-specific manner through recognition by RNA-binding proteins (Mukherjee et al. 2002; Vasudevan et al. 2002; Duttagupta et al. 2003; Haile et al. 2003). Such coordinated functions of RNA degradation and protein degradation, coupled with translation and transcription, respectively, reflect the evolutionarily conserved nature of the proteasome-exosome complexes (Koonin et al. 2001) and are consistent with the highly linked hubs in the elucidated P. falciparum network. When ARE-containing transcripts are stabilized by ARE-binding proteins such as HuR, up-regulation of a ubiquitin-dependent proteasome mechanism leads to rapid destabilization of the ARE-mRNA (Laroia et al. 2002). Although this mechanism is still poorly characterized, other ARE-binding proteins can exert the opposite effect (e.g., AUF1), and their abundance is regulated by ubiquitin-dependent proteasome degradation (D'Orso and Frasch 2002; Moraes et al. 2003; Donnini et al. 2004; Mawji et al. 2004).
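To illustrate how a consensus AU-rich element such as the WWAUUUAUUUAWW motif cited for EBA175 in this section (W denotes A or U) can be located in a transcript, the following sketch scans a sequence with a regular expression. It is a plain pattern search under the assumption that an exact consensus match is the criterion of interest; it says nothing about whether an ARE-binding protein actually recognizes the site, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the consensus; W = A or U (weak).
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U",
             "W": "[AU]", "S": "[CG]", "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus (e.g. 'WWAUUUAUUUAWW') into a regex."""
    return "".join(IUPAC_RNA[c] for c in consensus.upper())


def find_motif(seq, consensus="WWAUUUAUUUAWW"):
    """0-based start positions of all (possibly overlapping) consensus matches.

    DNA input is accepted: T is read as U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(rna)]
```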
Based on the prominent role of the proteasome and exosome in the P. falciparum interactome, we propose that this mechanism is a ubiquitous regulator of gene expression rather than being important only in sexual-stage gene regulation. For example, a consensus AU-rich element (WWAUUUAUUUAWW) is evident in the transcript for EBA175, a merozoite protein whose expression is tightly controlled (Blair et al. 2002b; Bozdech et al. 2003b; Le Roch et al. 2003). Expressed at high levels only at the end of intraerythrocytic development, eba175 transcripts rapidly disappear once this stage completes development. Consistent with this proposed role of the exosome-proteasome in regulating gene expression, AREs are also considered to be important elements governing gene expression in the Kinetoplastida, an evolutionarily distant group of parasitic protozoa (Estevez et al. 2001; Milone et al. 2002; D'Orso and Frasch 2002; Haile et al. 2003).

Characterisation of Nodes

As an outcome of the algorithm, the final partition into clusters also allows us to characterize nodes by their roles in connecting modules and in maintaining the integrity of the network’s modules. The intra-modular degree Z of a node i reflects how well it is connected to other nodes in the same module. In Fig. 3a, we observe that proteins of the inferred protein interaction network in Plasmodium tend to increase their Z with ascending degree k. We use these measures for a heuristic classification of nodes according to their placement in k-Z space. Following Han et al. (2004), we define nodes with at least 5 neighbors as hubs. We refine this definition by labeling hubs with Z > 0 as party hubs, because most of such a hub’s links fall within its own cluster. The remaining hubs are classified as date hubs, reflecting their propensity to connect different clusters. Determination of the mean co-expression level of all interactions in which a given node participates (Fig.
3b) suggests that hubs tend to be largely co-expressed with their interaction partners, as indicated by a strong signal around $r_P = 0.9$. However, for date hubs we also find a considerable signal around 0.5. Surprisingly, we find additional peaks around -0.5 and -0.7 for party hubs, which indicate an inhomogeneity in the expression of their neighborhood. Han et al. recently suggested distinguishing hubs according to their level of co-expression with their interaction partners (Han et al. 2004). A hub that was predominantly co-expressed with its neighbors was called a party hub. In contrast, date hubs have an inhomogeneous expression profile, indicating that they are predominantly active with their neighbors at different times (Han et al. 2004). In quantitative terms, nodes were distinguished according to the averaged co-expression correlations with their neighboring nodes, yielding peaks around 0.5 (party hubs) and 0.1 (date hubs) in frequency distributions. We do not observe such properties of date and party hubs in our network. In fact, we find that the transition from a party hub to a date hub, in terms of a node’s propensity to participate mainly in one cluster, is surprisingly smooth. Similarly, we find slight differences in the distributions of the mean fraction of shared GO annotations. While date hubs tend to have a more homogeneous distribution of functional neighbors, we observe a slight deviation toward functionally inhomogeneous interaction partners of hubs (inset, Fig. 3b), an observation that meets our expectations. Specifically, we argue that party hubs share similar functionality with their neighborhood, since their simultaneous activity indicates the presence of a functional cluster. In turn, date hubs whose neighbors have a scattered activity profile might connect and integrate different functions over time.
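The heuristic hub classification above can be sketched as follows. The text does not spell out how Z is computed, so this sketch assumes the within-module degree z-score of Guimera and Amaral: the number of a node's links inside its own module, standardized by the mean and standard deviation over that module. The degree threshold of 5 and the Z > 0 rule follow the text; the function names are hypothetical.

```python
import statistics


def within_module_z(adj, module_of):
    """Z-score of each node's within-module degree (Guimera-Amaral style).

    adj: node -> set of neighbours; module_of: node -> cluster id.
    Treating the text's intra-modular degree Z as this z-score is an assumption.
    """
    # kappa: number of neighbours that lie in the node's own module
    kappa = {v: sum(module_of[u] == module_of[v] for u in nbrs) for v, nbrs in adj.items()}
    per_module = {}
    for v in adj:
        per_module.setdefault(module_of[v], []).append(kappa[v])
    moments = {m: (statistics.mean(ks), statistics.pstdev(ks)) for m, ks in per_module.items()}
    z = {}
    for v in adj:
        mean, sd = moments[module_of[v]]
        z[v] = (kappa[v] - mean) / sd if sd > 0 else 0.0
    return z


def classify_hubs(adj, module_of, min_degree=5):
    """Label nodes with >= min_degree neighbours as 'party' (Z > 0) or 'date' hubs."""
    z = within_module_z(adj, module_of)
    return {
        v: ("party" if z[v] > 0 else "date")
        for v, nbrs in adj.items()
        if len(nbrs) >= min_degree
    }
```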
To underline this point, the list of the 50 most highly connected nodes (Table 1) includes numerous regulatory subunits of the 26S proteasome, which have been classified as party hubs. These subunits are generally co-expressed, suggesting that the functionally homogeneous apparatus that carries out protein degradation acts as a temporally and spatially integrated unit. In contrast, translation initiation factor EIF-2b (PF08_0009) is largely anti-expressed with its neighbors ($r_P = -0.4$), although it is classified as a party hub, an observation consistent with reports that translation-related proteins are active in many aspects of the cell cycle. These and other hubs are listed in Table 1.

Time-Dependent Protein Interactions

The transcriptional activity of malaria parasites has the superficial appearance of a continuous cascade that masks the coordinated co-expression of functionally linked protein interactions. Analyzing the pattern of conserved co-expression in our network of core interactions, we find that its modularity is time-dependent. The highly interactive proteins representing major hubs of the parasite’s metabolic activity vary with parasite development. Sorted by the time of their maximum expression in the parasite’s cell cycle (Fig. 4), Plasmodium proteins group according to their appearance in the three major stages of asexual development in the erythrocyte: ring, trophozoite, and schizont. The activities of the proteins of the core network remain cohesive within developmental stages but, somewhat surprisingly, the separate clusters are highly segregated in time during development. As the parasite progresses through its intraerythrocytic growth cycle, the flow of metabolic activity changes course and alters protein interactions within the network.
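The stage grouping described above, in which proteins are sorted by the time of their peak expression, can be sketched as follows. The stage boundaries in EXAMPLE_WINDOWS are hypothetical placeholders chosen only for illustration; they are not the boundaries used for Fig. 4, and the function names are likewise hypothetical.

```python
from collections import Counter

# Illustrative stage windows in hours post-invasion for a ~48 h cycle.
# These boundaries are hypothetical placeholders, not values from the study.
EXAMPLE_WINDOWS = [("ring", 0, 18), ("trophozoite", 18, 32), ("schizont", 32, 48)]


def peak_time(profile, times):
    """Time point of maximal expression, skipping missing values (None)."""
    observed = [(x, t) for x, t in zip(profile, times) if x is not None]
    return max(observed, key=lambda pair: pair[0])[1]


def assign_stage(profile, times, windows=EXAMPLE_WINDOWS):
    """Map a protein to the stage window containing its expression peak."""
    t = peak_time(profile, times)
    for label, start, end in windows:
        if start <= t < end:
            return label
    return None


def stage_composition(cluster, profiles, times, windows=EXAMPLE_WINDOWS):
    """Count how many members of a cluster peak in each stage."""
    return Counter(assign_stage(profiles[p], times, windows)
                   for p in cluster if p in profiles)
```

A cluster whose members all fall into one stage is "cohesive" in the sense used above; tallying stages per cluster makes any temporal segregation between clusters directly visible.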
Nearly all activity of the core network occurs in the ring or the schizont phase while, remarkably, almost no expression activity is observed in the trophozoite phase, suggesting that genes expressed during this phase of the Plasmodium life cycle arose from evolutionarily recent adaptations. In ring-stage parasites, the major clusters are involved in gene expression, metabolism and cellular transport. In the late stages, the dominant hubs of the cellular network are components of the proteasome and of chromosome condensation, reflecting the parasite’s need to discard the trash and pack its genes for travel. The absence of any major evolutionarily conserved hubs during the trophozoite stage indicates that the Plasmodium genes expressed during this time (i.e., those lacking an orthologue in yeast) encode metabolic activities that evolved as the organism adopted its parasitic lifestyle. This provides an important evolutionary insight, suggesting that protein clusters involved in inter-cluster interactions tend to co-evolve.

Outlook

Here we provide the first biological characterization of a protein interaction network of Plasmodium that is inferred entirely from the model organism yeast, without any experimental observation of the interactions in question. Although our process of obtaining interactions is a theoretical one, our qualitative analysis indicates that interactions among orthologous proteins are potentially conserved as well. Our approach recovers evolutionarily and transcriptionally relevant entities, underlining the prevalence of organizational structures that have persisted over evolutionary time. It is important to note that failing to identify an interaction in our inferred network does not necessarily imply its absence. Expected-but-missing parts of modules or corresponding module interactions may represent unique divergences that distinguish the network structure of malaria parasites from that of yeast.
Such divergences may be especially interesting, since it has been shown that the rate of evolution may be focused at the connections between modules (Guimera and Nunes Amaral 2005). Therefore, even though the modules themselves are highly conserved units, unique Plasmodium-specific links between modules may well highlight critical features of the parasite that can be exploited as therapeutic targets. We expect that further analysis of the inferred network in Plasmodium will uncover biologically significant information about novel interactions between the conserved clusters. Another type of novelty in the parasite interactome will involve the absence of conserved clusters and their interactions. This is seen where known yeast partners lack orthologous partners in Plasmodium, which represents either a loss of function or the acquisition of novel clusters that evolved for the metabolic needs of the malaria parasite. Several extensions of our method can be envisioned that benefit from the proliferation of whole-genome databases, which continuously enrich the power of network inference. Our initial network for malaria parasites can be strengthened by deeper searches for orthologs that use all Plasmodium species to query the non-redundant database. Similarly, a concatenated network comprising all known (and validated) interactions across the tree of life will be an important tool for recognizing distant phylogenetic relationships with yeast and other organisms for which refined network data exist. More functional relationships can be elucidated within the dimensions of malaria genome expression by profiling transcription at much higher resolution under a variety of cellular conditions (e.g., perturbations and strain-specific variants). Eventually, we expect to use the methods developed in this project to construct a comprehensive phylogenetic scaffold of networks onto which new protein-protein interaction data can be placed.
Since there is still considerable noise in the protein interaction data available from current technologies, such as yeast two-hybrid screens, a universal scaffold will be a powerful tool for interpreting proposed interaction data.
Materials and Methods
Protein Interactions: As a reliable source of protein interactions, we chose the manually curated set of yeast interactions presented in Han et al. (2004), comprising 1,330 proteins and 2,448 interactions. Since this network is rather small, we also utilized yeast protein interaction data from the DIP database (Xenarios et al. 2002). The current version contains 3,833 proteins involved in 11,942 interactions. These are derived from combined, non-overlapping data, mostly obtained from high-throughput applications of the two-hybrid method.
Orthologous Data: We ran all-versus-all BLASTP searches between the protein sets of two species, as processed by the InParanoid script (Remm et al. 2001). Sequence pairs with mutually best scores were selected as central orthologous pairs. Proteins of both species showing an elevated degree of homology were clustered around these central pairs, forming orthologous groups. The quality of the clustering was then assessed by a standard bootstrap procedure. A central orthologue pair with a confidence level of 100% was considered a true orthologous relationship, while proteins with a lower confidence level were considered in-paralogues. In our study, we selected only the central orthologue pair of each group, resulting in 856 proteins with orthologues in P. falciparum.
Orthologous Excess Retention: We grouped all proteins into bins of logarithmically increasing hypergeometric clustering coefficient C of the interactions they are involved in. For each group of $N_C$ proteins, the fraction of proteins that also have an ortholog is defined as $e_{C,o} = n_{C,o}/N_C$, where $n_{C,o}$ is the number of proteins in the group that have an ortholog.
In the absence of a correlation between the evolutionary conservation of interacting proteins and their position in the network, $e_{C,o}$ takes the C-independent value $e_o = n_o/N$. Here $n_o$ is the total number of yeast proteins having an ortholog and $N$ is the total number of yeast proteins in the underlying network. Thus, we define the clustering-dependent excess retention of such proteins as $ER_{C,o} = e_{C,o}/e_o$, which takes the C-independent value $ER_{C,o} = 1$ for a random distribution of orthologous proteins (Wuchty 2004).
Hypergeometric Clustering Coefficient: Recently, a network-topology-based approach uncovered a remarkable correlation between the quality of protein interactions and the degree of clustering of their immediate network neighborhood (Goldberg and Roth 2003). For a protein-protein interaction network with N nodes, we define the hypergeometric clustering coefficient of an interaction between proteins v and w as
$$C_{vw} = -\log \sum_{i=|N(v)\cap N(w)|}^{\min(|N(v)|,\,|N(w)|)} \frac{\binom{|N(v)|}{i}\binom{N-|N(v)|}{|N(w)|-i}}{\binom{N}{|N(w)|}},$$
where N(x) represents the neighborhood of a vertex x and N is the total number of proteins in the network. Given fixed neighborhood sizes |N(v)| and |N(w)|, the hypergeometric clustering coefficient increases with the overlap between the two proteins' neighborhoods. Provided that the neighborhoods are independent, the summation can be interpreted as a P-value (Goldberg and Roth 2003). It gives the probability of obtaining, by chance, a number of mutual neighbors of v and w at or above the observed number.
Intramodular degree: While the degree k alone is a very local measure of a node's connectivity, the role of a node within its module can be assessed by its intramodular degree. Let $\kappa_i$ be the number of links of node i to other nodes in its module $s_i$, and let $\bar{\kappa}_{s_i}$ and $\sigma_{\kappa_{s_i}}$ be the mean and standard deviation of $\kappa$ over all nodes in $s_i$. We define the intramodular degree as the Z-score of $\kappa_i$:
$$Z_i = \frac{\kappa_i - \bar{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}},$$
reflecting how well connected a node is to the other nodes in its own module (Guimera and Nunes Amaral 2005).
Expression coefficients: Genes with similar expression profiles are likely to encode interacting proteins.
For P. falciparum, we utilized gene expression data comprising 4,318 genes over 48 time points (Bozdech et al. 2003a). As a similarity metric, we calculated Pearson's correlation coefficient between the expression profiles of the two genes of every protein interaction. For yeast, we downloaded data sets of 1,051 different experiments from the Stanford Microarray Database (ftp://) and calculated Pearson's correlation coefficient for every protein interaction in the same way.
Cell-cycle specific expression data: By determining the time of their maximum expression in the parasite's cell cycle, we grouped the proteins of P. falciparum according to their appearance in the three major stages of asexual development in the erythrocyte: ring, trophozoite, and schizont (Bozdech et al. 2003a).
Logistic regression: To estimate the reliability of an interaction, we employed a logistic regression model. The two input variables $X = (x_1, x_2)$ are the hypergeometric clustering coefficient $x_1 = C_{vw}$ and the co-expression correlation $x_2 = r_P$. The probability of a true interaction $T_{vw}$ is then
$$\Pr(T_{vw} \mid X) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)},$$
where the $\beta_n$ are the parameters of the distribution. Given training data, we optimized these parameters by maximizing the likelihood of the data, using the corresponding routines of the Biopython package (www.biopython.org). As positive examples, we randomly chose 500 high-quality protein interactions from Han et al. (2004) that are fully conserved in Plasmodium. Because of the vast abundance of false positives in the DIP data set, we randomly selected 500 of its interactions as negative examples. In a leave-one-out analysis, the model was recalculated from the training data after removing the interaction to be predicted. This analysis yielded a prediction accuracy of 70%.
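The stdlib Python sketch below covers the two interaction features and the reliability model described above. It is an illustration under stated assumptions, not the pipeline used for this study. Neighborhoods are assumed to be Python sets, and the logarithm in C_vw is taken to base 10, which only rescales C. The parameters β are fitted by plain batch gradient ascent on the log-likelihood, a swapped-in optimizer rather than the Biopython routines named in the text. All function names and the toy training data are hypothetical.

```python
import math
import random


def hypergeometric_clustering(neighbors, v, w, n_total):
    """C_vw = -log10 of the upper hypergeometric tail P(X >= |N(v) & N(w)|)."""
    a, b = len(neighbors[v]), len(neighbors[w])
    shared = len(neighbors[v] & neighbors[w])
    # Sum the probabilities of seeing `shared` or more common neighbours.
    tail = sum(
        math.comb(a, i) * math.comb(n_total - a, b - i)
        for i in range(shared, min(a, b) + 1)
    ) / math.comb(n_total, b)
    return -math.log10(tail)


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def predict_proba(beta, x):
    """Pr(T | x) = exp(z) / (1 + exp(z)), z = beta0 + beta1*x1 + beta2*x2."""
    z = beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Maximise the log-likelihood by batch gradient ascent (assumed optimizer)."""
    beta = [0.0] * (len(X[0]) + 1)  # [beta0, beta1, beta2]
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # d logL / d z for this example
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        beta = [bj + lr * g / len(X) for bj, g in zip(beta, grad)]
    return beta


def leave_one_out_accuracy(X, y, **kwargs):
    """Retrain without each example in turn and predict the held-out one."""
    correct = 0
    for k in range(len(X)):
        beta = train_logistic(X[:k] + X[k + 1:], y[:k] + y[k + 1:], **kwargs)
        predicted = 1 if predict_proba(beta, X[k]) >= 0.5 else 0
        correct += predicted == y[k]
    return correct / len(X)


if __name__ == "__main__":
    # Hypothetical toy data: (C_vw, r_P) for 20 "conserved" and 20 "random" links.
    rng = random.Random(0)
    X = [(rng.uniform(1.5, 3.0), rng.uniform(0.4, 0.9)) for _ in range(20)]
    X += [(rng.uniform(0.0, 1.0), rng.uniform(-0.3, 0.3)) for _ in range(20)]
    y = [1] * 20 + [0] * 20
    print("fitted beta:", train_logistic(X, y))
    print("leave-one-out accuracy:", leave_one_out_accuracy(X, y))
```

With real data, X would hold the (C_vw, r_P) pairs of the 500 positive and 500 negative training interactions. Leave-one-out retrains the model once per example, so at that size a vectorized or library implementation would be preferable to this pure-Python loop.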
Markov-Cluster-Algorithm (MCL): To uncover the community structure of the inferred core protein interaction network of Plasmodium, we utilized the Markov-Cluster-Algorithm (MCL) (Enright et al.), which was designed specifically for graph clustering. Topologically, nodes embedded in well-interconnected parts of a network share most of their links with nodes of the same cluster, while a small fraction of links connect remote clusters. Hence, we expect a random walker to travel predominantly within a cluster and to jump to other clusters only sporadically. Mathematically, an undirected network of k nodes can be represented as a $k \times k$ matrix M with $M_{ij} = w_{ij}$, where $w_{ij}$ is the weight of the interaction between i and j. In our case, we applied the function $w_{ij} = 1 + r_{ij}$, where $r_{ij}$ is the co-expression correlation coefficient of interaction ij, to avoid negative weights. To aid convergence of the algorithm, we introduce a self-loop on each node, i.e., $M_{ii} = 2$. M is turned into a column-stochastic matrix T by normalizing each column sum to unity through the diagonal matrix d with entries $d_{kk} = \sum_i M_{ik}$, giving $T = M d^{-1}$. Thus, the entry $T_{ij}$ represents the probability that a random walker at node j jumps directly to node i. The stochastic matrix T is alternately (i) expanded by matrix multiplication (i.e., matrix squaring) and (ii) renormalized by an inflation procedure that again yields a stochastic matrix. Formally, the inflation operator $\Gamma_r$ is defined as
$$(\Gamma_r T)_{pq} = \frac{(T_{pq})^r}{\sum_{i=1}^{k} (T_{iq})^r}.$$
This process of alternating expansion and inflation is repeated until the resulting stochastic matrix T becomes doubly idempotent, i.e., it no longer changes with further expansion/inflation cycles. The final matrix is composed of several connected components with star-like shapes, which are interpreted as the sought-after clusters.
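The sketch below implements the MCL procedure described above in plain dense-matrix Python. It also computes the intramodular degree Z and applies the party/date hub rule. The sketch assumes nodes are numbered 0..n-1 and edges are given as (i, j, r_ij) triples. Clusters are read off as the nodes attracted by each attractor, meaning a row whose diagonal keeps probability mass. The stopping tolerance, the eps threshold and the population standard deviation used for Z are assumptions. The hub rule (k > 5; Z > 0 party, Z ≤ 0 date) follows the assignments listed in Table 1. All function names are hypothetical.

```python
import math


def normalize_columns(M):
    """Scale every column to sum to one (column-stochastic matrix)."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Dense matrix product of two n x n lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(n, edges, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Markov clustering; edges are (i, j, r_ij) with r_ij a co-expression r."""
    M = [[0.0] * n for _ in range(n)]
    for i, j, r in edges:
        M[i][j] = M[j][i] = 1.0 + r        # w_ij = 1 + r_ij is never negative
    for i in range(n):
        M[i][i] = 2.0                      # self-loops aid convergence
    T = normalize_columns(M)               # T = M d^-1
    for _ in range(max_iter):
        expanded = matmul(T, T)            # expansion: matrix squaring
        inflated = normalize_columns(      # inflation: entrywise power, renormalise
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(a - b) for ra, rb in zip(inflated, T)
                     for a, b in zip(ra, rb))
        T = inflated
        if change < tol:                   # T no longer changes: stop
            break
    # Each attractor row lists the nodes it attracts; merge overlapping systems.
    clusters = []
    for i in range(n):
        if T[i][i] > eps:
            members = {j for j in range(n) if T[i][j] > eps}
            for c in [c for c in clusters if c & members]:
                members |= c
                clusters.remove(c)
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)


def intramodular_z(adjacency, modules):
    """Z-score of each node's number of links inside its own module."""
    z = {}
    for module in modules:
        kappa = {v: len(adjacency[v] & module) for v in module}
        mean = sum(kappa.values()) / len(kappa)
        sd = math.sqrt(sum((x - mean) ** 2 for x in kappa.values()) / len(kappa))
        for v, x in kappa.items():
            z[v] = (x - mean) / sd if sd > 0 else 0.0
    return z


def hub_class(k, z, min_degree=5):
    """'party' if Z > 0, 'date' if Z <= 0, None for non-hubs (k <= 5)."""
    if k <= min_degree:
        return None
    return "party" if z > 0 else "date"


if __name__ == "__main__":
    # Hypothetical toy graph: two co-expressed triangles joined by one weak link.
    toy = [(0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),
           (3, 4, 0.9), (3, 5, 0.9), (4, 5, 0.9), (2, 3, -0.5)]
    print(mcl(6, toy))
```

Running `mcl` over a range of inflation values and scoring each resulting partition would mirror the partition selection described in the next paragraph. A sparse-matrix implementation would be needed for networks much larger than a few hundred nodes.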
Modularity of the network: To elucidate meaningful partitions of the network, we applied the MCL algorithm, choosing inflation parameters r = 1, …, 4 in steps of 0.25. Evaluating the partitions thus obtained, we defined protein modules by optimizing the functional coherence and size of the clusters (Marcotte). In particular, the functional coherence $fc_i$ of cluster i is the fraction of annotated gene pairs in the cluster that share at least one functional annotation. If $fp_i$ such pairs occur among a total of $p_i$ annotated pairs in the ith cluster, then
$$fc_i = \frac{fp_i}{p_i},$$
a measure that tends to be high for small clusters and diminishes as more proteins are included. As a source of reliable annotation information, we utilized the Gene Ontology (cite GO). We balance that trend by favoring larger clusters, defining the modulation efficiency $E_M$ as
$$E_M = \frac{1}{N}\sum_{i=1}^{n} fc_i\, N_i,$$
where n is the number of clusters, N is the total number of proteins and $N_i$ is the number of proteins in the ith cluster. Thus, the partition with the highest modulation efficiency reflects the best compromise between cluster size and the degree of functional association between proteins in a cluster.
References
Almaas E, Kovacs B, Vicsek T, Oltvai ZN, Barabasi AL (2004) Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature 427(6977): 839-843. Aravind L, Iyer LM, Wellems TE, Miller LH (2003) Plasmodium biology: genomic gleanings. Cell 115(7): 771-785. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5(2): 101-113. Blair PL, Kappe SH, Maciel JE, Balu B, Adams JH (2002a) Plasmodium falciparum MAEBL is a unique member of the ebl family. Mol Biochem Parasitol 122(1): 35-44. Blair PL, Witney A, Haynes JD, Moch JK, Carucci DJ et al. (2002b) Transcripts of developmentally regulated Plasmodium falciparum genes quantified by real-time RT-PCR. Nucleic Acids Res 30(10): 2224-2231.
Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J et al. (2003a) The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum. PLoS Biol 1(1): 5. Bozdech Z, Zhu J, Joachimiak MP, Cohen FE, Pulliam B et al. (2003b) Expression profiling of the schizont and trophozoite stages of Plasmodium falciparum with a long-oligonucleotide microarray. Genome Biol 4(2): R9. Butler JS (2002) The yin and yang of the exosome. Trends Cell Biol 12(2): 90-96. Cann H, Brown SV, Oguariri RM, Golightly LM (2004) 3' UTR signals necessary for expression of the Plasmodium gallinaceum ookinete protein, Pgs28, share similarities with those of yeast and plants. Mol Biochem Parasitol 137(2): 239-245. Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M et al. (2002) Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419(6906): 512-519. D'Orso I, Frasch AC (2002) TcUBP-1, an mRNA destabilizing factor from trypanosomes, homodimerizes and interacts with novel AU-rich element- and Poly(A)-binding proteins forming a ribonucleoprotein complex. J Biol Chem 277(52): 50520-50528. Donnini M, Lapucci A, Papucci L, Witort E, Jacquier A et al. (2004) Identification of TINO: a new evolutionarily conserved BCL-2 AU-rich element RNA-binding protein. J Biol Chem 279(19): 20154-20166. Duraisingh MT, Voss TS, Marty AJ, Duffy MF, Good RT et al. (2005) Heterochromatin silencing and locus repositioning linked to regulation of virulence genes in Plasmodium falciparum. Cell 121(1): 13-24. Duttagupta R, Vasudevan S, Wilusz CJ, Peltz SW (2003) A yeast homologue of Hsp70, Ssa1p, regulates turnover of the MFA2 transcript through its AU-rich 3' untranslated region. Mol Cell Biol 23(8): 2623-2632. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95(25): 14863-14868. Estevez AM, Kempf T, Clayton C (2001) The exosome of Trypanosoma brucei.
EMBO J 20(14): 3831-3839. Freitas-Junior LH, Hernandez-Rivas R, Ralph SA, Montiel-Condado D, Ruvalcaba-Salazar OK et al. (2005) Telomeric heterochromatin propagation and histone acetylation control mutually exclusive expression of antigenic variation genes in malaria parasites. Cell 121(1): 25-36. Gardner MJ, Shallom SJ, Carlton JM, Salzberg SL, Nene V et al. (2002a) Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14. Nature 419(6906): 531-534. Gardner MJ, Hall N, Fung E, White O, Berriman M et al. (2002b) Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419(6906): 498-511. Ge H, Liu Z, Church GM, Vidal M (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 29(4): 482-486. Goldberg DS, Roth FP (2003) Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci U S A 100(8): 4372-4376. Golightly LM, Mbacham W, Daily J, Wirth DF (2000) 3' UTR elements enhance expression of Pgs28, an ookinete protein of Plasmodium gallinaceum. Mol Biochem Parasitol 105(1): 61-70. Guimera R, Nunes Amaral LA (2005) Functional cartography of complex metabolic networks. Nature 433(7028): 895-900. Haile S, Estevez AM, Clayton C (2003) A role for the exosome in the in vivo degradation of unstable mRNAs. RNA 9(12): 1491-1501. Hall N, Karras M, Raine JD, Carlton JM, Kooij TW et al. (2005) A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science 307(5706): 82-86. Hall N, Pain A, Berriman M, Churcher C, Harris B et al. (2002) Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13. Nature 419(6906): 527-531. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF et al. (2004) Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430(6995): 88-93. Hyman RW, Fung E, Conway A, Kurdi O, Mao J et al. (2002) Sequence of Plasmodium falciparum chromosome 12. Nature 419(6906): 534-537.
Jeong H, Mason SP, Barabasi AL, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411(6833): 41-42. Koonin EV, Wolf YI, Aravind L (2001) Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach. Genome Res 11(2): 240-252. Laroia G, Sarkar B, Schneider RJ (2002) Ubiquitin-dependent mechanism regulates rapid turnover of AU-rich cytokine mRNAs. Proc Natl Acad Sci U S A 99(4): 1842-1846. Le Roch KG, Zhou Y, Batalov S, Winzeler EA (2002) Monitoring the chromosome 2 intraerythrocytic transcriptome of Plasmodium falciparum using oligonucleotide arrays. Am J Trop Med Hyg 67(3): 233-243. Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch JK et al. (2003) Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301(5639): 1503-1508. Lehner B, Sanderson CM (2004) A protein interaction framework for human mRNA degradation. Genome Res 14(7): 1315-1323. Li S, Armstrong CM, Bertin N, Ge H, Milstein S et al. (2004) A map of the interactome network of the metazoan C. elegans. Science 303(5657): 540-543. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA et al. (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431(7006): 308-312. Mawji IA, Robb GB, Tai SC, Marsden PA (2004) Role of the 3'-untranslated region of human endothelin-1 in vascular endothelial cells. Contribution to transcript lability and the cellular heat shock response. J Biol Chem 279(10): 8655-8667. Milone J, Wilusz J, Bellofatto V (2002) Identification of mRNA decapping activities and an ARE-regulated 3' to 5' exonuclease activity in trypanosome extracts. Nucleic Acids Res 30(18): 4040-4050. Moraes KC, Quaresma AJ, Maehnss K, Kobarg J (2003) Identification and characterization of proteins that selectively interact with isoforms of the mRNA binding protein AUF1 (hnRNP D). Biol Chem 384(1): 25-37. 
Mukherjee D, Gao M, O'Connor JP, Raijmakers R, Pruijn G et al. (2002) The mammalian exosome mediates the efficient degradation of mRNAs that contain AU-rich elements. EMBO J 21(1-2): 165-174. Murase T, Iwai M, Maede Y (1993) Direct evidence for preferential multiplication of Babesia gibsoni in young erythrocytes. Parasitol Res 79(4): 269-271. Raijmakers R, Schilders G, Pruijn GJ (2004) The exosome, a molecular machine for controlled RNA degradation in both nucleus and cytoplasm. Eur J Cell Biol 83(5): 175-183. Raijmakers R, Egberts WV, van Venrooij WJ, Pruijn GJ (2002) Protein-protein interactions between human exosome components support the assembly of RNase PH-type subunits into a six-membered PNPase-like ring. J Mol Biol 323(4): 653-663. Ralph SA, Scheidig-Benatar C, Scherf A (2005) Antigenic variation in Plasmodium falciparum is associated with movement of var loci between subnuclear locations. Proc Natl Acad Sci U S A 102(15): 5414-5419. Remm M, Storm CE, Sonnhammer EL (2001) Automatic clustering of orthologs and inparalogs from pairwise species comparisons. J Mol Biol 314(5): 1041-1052. Shue P, Brown SV, Cann H, Singer EF, Appleby S et al. (2004) The 3' UTR elements of P. gallinaceum protein Pgs28 are functionally distinct from those of human cells. Mol Biochem Parasitol 137(2): 355-359. Snow RW, Guerra CA, Noor AM, Myint HY, Hay SI (2005) The global distribution of clinical episodes of Plasmodium falciparum malaria. Nature 434(7030): 214-217. Vasudevan S, Peltz SW, Wilusz CJ (2002) Non-stop decay--a new mRNA surveillance pathway. Bioessays 24(9): 785-788. von Mering C, Krause R, Snel B, Cornell M, Oliver SG et al. (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417(6887): 399-403. Wilusz CJ, Wormington M, Peltz SW (2001) The cap-to-tail guide to mRNA turnover. Nat Rev Mol Cell Biol 2(4): 237-246. Wuchty S (2002) Interaction and domain networks of yeast. Proteomics 2(12): 1715-1723.
Wuchty S (2004) Evolution and topology in the yeast protein interaction network. Genome Res 14(7): 1310-1314. Wuchty S, Oltvai ZN, Barabasi AL (2003) Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet 35(2): 176-179. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM et al. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30(1): 303-305.
Acknowledgments
Roger Guimera's help with the algorithm is gratefully acknowledged. This work was funded by National Institutes of Health grants AI33656 (JHA), ###### (ALB) and AI055035 (MTF).
Figure 1: (a) We binned interactions logarithmically according to their hypergeometric clustering coefficient Cvw and determined the excess retention ER of proteins with orthologues in P. falciparum as a function of Cvw. The plot shows a significant logarithmic correlation between high hypergeometric clustering coefficients and the conservation of the interacting proteins (ER ~ 0.46 × log Cvw; Pearson's r = 0.71, P = 1.2 × 10^-9; Spearman's rank correlation = 0.69, P = 2.5 × 10^-6). (b) We binned protein interactions logarithmically according to their hypergeometric clustering coefficient Cvw,o and determined the mean expression correlation coefficient rP in S. cerevisiae. We separated the pool of interactions into disjoint sets according to whether or not the interacting proteins are conserved in Plasmodium. The mean co-expression of conserved interactions increases significantly with Cvw,o (Pearson's r = 0.35, P = 3.7 × 10^-49; Spearman's rank correlation = 0.42, P = 9.1 × 10^-64). The corresponding trend for non-conserved links is weaker (Pearson's r = 0.04, P = 3.9 × 10^-2; Spearman's rank correlation = 0.13, P = 5.1 × 10^-14).
(c) Taken together, our results indicate a coincidence of (i) co-expression of interacting proteins, (ii) enhanced clustering of their immediate neighborhood and (iii) an elevated tendency of these S. cerevisiae proteins to be evolutionarily conserved. High clustering around a protein interaction coincides well with an elevated degree of reliability. Integrating knowledge about the yeast protein interaction network, its local clustering and its proteins' tendency to cluster can therefore elucidate evolutionary cores of protein interaction networks in organisms for which orthologues can be identified. (d) We trained a logistic regression model on a random set of conserved and non-conserved protein interactions, characterized by their hypergeometric clustering coefficient and co-expression correlation. We then used the model to classify whether each yeast interaction is actually present. Interactions classified as correct accumulate significantly around high values of the co-expression coefficient, while links classified as incorrect are distributed much more evenly. Among interactions between conserved proteins in Plasmodium, we find a strong shift toward co-expression of the correctly classified interactions in the target organism (inset).
Figure 2: Cartographic representation of the core interaction network of P. falciparum. Maximizing modularity M, we obtain numerous clusters, indicated by different colors. Broadly, these clusters represent functionally consistent accumulations of nodes, indicated by shades. The network is dominated by larger clusters, in which we predominantly find the following: proteasome components; ribonucleoproteins such as snRNPs; a large group of translation initiation factors and related proteins; exosome and spliceosome components; replication factor C; DNA-related functions such as DNA polymerase and helicase subunits; and ribosomal proteins and translation factor subunits.
Superimposing the respective co-expression correlation coefficients rP on each interaction (red: 1.0 to 0.5; yellow: 0.5 to 0.0; green: 0.0 to -0.5; blue: -0.5 to -1.0), we generally observe that the high cohesiveness of each module coincides with a significant degree of co-expression of its components.
Figure 3: (a) The space spanned by the nodes' intramodular degree Z and degree k allows a heuristic classification of nodes. We define nodes as party hubs if their intramodular degree Z is above 0 and their degree k is larger than 5. Date hubs are nodes with Z ≤ 0 that still have more than 5 neighbors. Shaded areas correspond to the color code in (b). (b) Co-expression patterns of the hubs' neighborhoods indicate that, generally, the majority of nodes in each class prefer to be co-expressed with their immediate neighbors. However, date hubs show significant peaks around 0.5, indicating that these hubs are also involved in interactions that do not occur simultaneously. In the inset, we repeat our analysis by determining the mean fraction of hub neighbors that share the same GO annotation with the node in question. Date hubs show a slightly more homogeneous distribution.
Figure 4: Plasmodium proteins were grouped according to their maximum expression in the three major stages of asexual development in the erythrocyte (ring, trophozoite and schizont), revealing that subnetwork modularity is time-dependent. Nearly all activity of the core network occurs in the ring or the schizont phase, while, remarkably, almost no expression activity is observed in the trophozoite phase. In the ring stage, active proteins facilitate cellular transport and gene expression. In the schizont phase, we predominantly find proteasomal functions and components of chromosome condensation.
The absence of any major evolutionarily conserved proteins during the trophozoite stage indicates that Plasmodium genes expressed during this time (i.e., those lacking an orthologue in yeast) encode metabolic activities that evolved as the organism adopted its parasitic lifestyle.
Table 1: The 50 most highly connected hubs in the inferred protein interaction network of P. falciparum. All nodes are characterized by their degree k, their intramodular degree Z, the mean co-expression correlation rP and the mean fraction of shared GO annotations fA with their immediate neighbors, and their hub classification (p: party hub; d: date hub).
protein | description | k | Z | rP | fA | hub
PF08_0109 | hypothetical protein | 22 | 1.02 | 0.55 | 0.00 | p
PF08_0130 | wd repeat protein, putative | 20 | 3.08 | 0.86 | 0.40 | p
PF10_0174 | 26s proteasome subunit p55, putative | 20 | 1.23 | 0.75 | 0.90 | p
PF11_0305 | hypothetical protein | 19 | 1.34 | 0.84 | 0.00 | p
PF14_0676 | 20S proteasome beta 4 subunit, putative | 19 | 0.80 | 0.65 | 0.79 | p
PF10_0278 | hypothetical protein, conserved | 18 | 2.52 | 0.90 | 0.39 | p
PFD0665c | 26s proteasome aaa-ATPase subunit Rpt3 | 18 | 1.23 | 0.61 | 0.89 | p
PF13_0178 | translation initiation factor 6, putative | 17 | 1.34 | 0.87 | 0.00 | p
PF13_0063 | 26S proteasome regulatory subunit 7, putative | 17 | 1.23 | 0.65 | 0.88 | p
PFB0370c | RNA-binding protein, putative | 16 | 2.35 | 0.77 | 0.00 | p
PF14_0025 | proteosome subunit, putative | 16 | 1.02 | 0.67 | 0.69 | p
PFC0520w | 26S proteasome regulatory subunit S14, putative | 16 | 0.80 | 0.69 | 0.81 | p
PF14_0068 | fibrillarin, putative | 15 | 1.83 | 0.85 | 0.47 | p
PF10_0114 | DNA repair protein RAD23, putative | 15 | -0.04 | 0.09 | 0.47 | d
PF08_0009 | translation initiation factor EIF-2b alpha subunit, putative | 14 | 1.41 | -0.39 | 0.36 | p
PF11_0105 | hypothetical protein | 14 | 1.14 | 0.86 | 0.00 | p
PFI0630w | 26S proteasome regulatory subunit, putative | 14 | -0.04 | 0.67 | 0.00 | d
MAL13P1.343 | proteasome regulatory subunit, putative | 13 | 0.38 | 0.75 | 0.85 | p
PFL2345c | tat-binding protein homolog | 13 | 0.17 | 0.55 | 0.85 | p
PF11_0090 | hypothetical protein | 12 | 1.41 | 0.85 | 0.08 | p
PFI0475w | small nuclear ribonucleoprotein (snRNP), putative | 11 | 1.45 | 0.38 | 0.73 | p
PFE1355c | ubiquitin carboxyl-terminal hydrolase, putative | 11 | -0.04 | 0.59 | 0.64 | d
PF10_0298 | 26S proteasome subunit, putative | 11 | -0.04 | 0.49 | 0.73 | d
MAL8P1.128 | proteasome subunit alpha type 6 | 10 | 1.41 | 0.83 | 0.80 | p
PF14_0174 | hypothetical protein, conserved | 10 | 0.00 | 0.90 | 0.30 | d
MAL7P1.24 | hypothetical protein, conserved | 10 | 0.96 | 0.89 | 0.00 | p
PF10_0081 | 26S proteasome regulatory subunit 4, putative | 10 | -0.25 | 0.63 | 0.80 | d
PFD0180c | CGI-201 protein, short form | 9 | 0.27 | 0.68 | 0.33 | p
MAL8P1.48 | small nuclear ribonucleoprotein polypeptide g, 65009-64161, putative | 9 | 0.81 | 0.67 | 0.67 | p
MAL6P1.119 | DEAD/DEAH box ATP-dependent RNA helicase, putative | 9 | -0.27 | 0.91 | 0.00 | d
PF14_0183 | RNA helicase, putative | 9 | -0.55 | 0.79 | 0.44 | d
PFB0875c | hypothetical protein | 9 | 1.22 | 0.59 | 0.33 | p
MAL13P1.190 | proteasome regulatory component, putative | 9 | -0.47 | 0.77 | 0.89 | d
MAL6P1.88 | proteasome subunit alpha type 2, putative | 8 | 1.41 | 0.74 | 0.00 | p
PF14_0716 | Proteosome subunit alpha type 1, putative | 8 | -0.71 | 0.77 | 0.63 | d
PFL0335c | eukaryotic translation initiation factor 5, putative | 8 | 1.01 | 0.88 | 0.50 | p
PF11_0445 | DNA-directed RNA polymerase I, putative | 8 | 2.37 | 0.93 | 1.00 | p
PFE0305w | transcription initiation factor TFiid, TATA-binding protein | 8 | 0.00 | 0.06 | 0.75 | d
PF14_0587 | hypothetical protein | 8 | 1.57 | 0.79 | 0.50 | p
PFL1680w | splicing factor 3b, subunit 3, 130kD, putative | 8 | 0.59 | 0.67 | 0.63 | p
PF07_0067 | hypothetical protein | 8 | 0.96 | 0.89 | 0.00 | p
PF14_0055 | hypothetical protein, conserved | 8 | -0.21 | 0.82 | 0.13 | d
PF14_0502 | hypothetical protein | 8 | 0.59 | 0.10 | 0.00 | p
PF13_0033 | 26S proteasome regulatory subunit, putative | 8 | -0.68 | 0.78 | 0.88 | d
PF11_0314 | 26S protease subunit regulatory subunit 6a, putative | 8 | -0.68 | 0.62 | 1.00 | d
PFD0450c | hypothetical protein, conserved | 7 | 0.81 | 0.67 | 0.71 | p
PF07_0117 | eukaryotic translation initiation factor 2 alpha subunit, putative | 7 | 1.88 | 0.48 | 0.86 | p
PF11_0191 | hypothetical protein | 7 | -0.07 | 0.85 | 0.00 | d
PF10_0103 | eukaryotic translation initiation factor 2, beta, putative | 7 | 0.00 | 0.58 | 0.71 | d