KOCSEA Technical Symposium 2010 Systematic Analysis of Interactome: A New Trend in Bioinformatics Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University History of Bioinformatics Stage 1. Sequence Analysis • Gene sequencing • Sequence alignment • Homolog search • Motif finding History of Bioinformatics Computational Biology Stage 1. Sequence Analysis Stage 2. Structure Analysis • Gene sequencing • Protein folding • Sequence alignment • Homolog search • Homolog search • Binding site prediction • Motif finding • Function prediction History of Bioinformatics Computational Biology Stage 1. Sequence Analysis Stage 2. Structure Analysis Functional Genomics Stage 3. Expression Analysis • Gene sequencing • Protein folding • Function prediction • Sequence alignment • Homolog search • Gene clustering • Homolog search • Binding site prediction • Sample classification • Motif finding • Function prediction History of Bioinformatics Computational Biology Stage 1. Sequence Analysis Stage 2. Structure Analysis Functional Genomics Systems Biology Stage 3. Expression Analysis Stage 4. Network Analysis • Gene sequencing • Protein folding • Function prediction • Network modeling • Sequence alignment • Homolog search • Gene clustering • Interaction prediction • Homolog search • Binding site prediction • Sample classification • Function prediction • Motif finding • Function prediction • Pathway identification • Module detection Biological Networks Definition Maps of biochemical reactions, interactions, regulations between genes or proteins Importance Provide insights into the mechanisms of molecular function within a cell Significant resource for functional characterization of genes or proteins Require computational and systematic approaches Examples Metabolic networks Protein-protein interaction networks Genetic interaction networks Gene regulatory networks (Signal transduction networks) Protein Interaction Networks Determination Experimental methods: Y2H, MS, Protein Microarray Computational methods: Homolog search, Gene fusion analysis, Phylogenetic profiles Genome-scale protein-protein interactions Interactome Representation Un-weighted, undirected graph Challenges Unreliability Large scale Complex connectivity Network Re-structuring Strategy unweighted network To resolve complex connectivity Converts the complex graph to a hierarchical tree structure Uses the concepts of path strength, functional linkage, and centrality Process Input: a protein interaction network edge weighting weighted network functional linkage measurement score matrix network restructuring structured network Output: a list of functional modules hub confidence measurement hubs network clustering clusters Path Strength Path Strength Model Assumption: each node in a path chooses a succeeding edge based on the weighted probability Path Strength Factors Edge weights Path length Node weighted degree Functional Linkage Measurements Path strength of the strongest path between two nodes Computational problem Needs a heuristic approach Uses a user-specified threshold of the max path length Formula k-length path strength: Functional linkage: shortest path length threshold Network Restructuring Centrality Weighted closeness: Algorithm Computes centrality for each node a Selects a set of ancestor nodes, T(a), of a by Selects a parent node, p(a), of a by Example Hub Confidence Measurement Selects a set of child nodes, D(a), of a by Selects a set of descendent nodes, La, of a by Computes the hub confidence, H(a), of a by Example Clustering Algorithm Iteratively select a hub a with the highest hub confidence Output the sub-tree La including a as a cluster (functional module) Cluster Depth The max path length from the root of the sub-tree to a leaf Example Topological Assessment of Hubs Network Vulnerability Random attack: repeatedly disrupt a randomly selected node Degree-based hub attack: repeatedly disrupt the highest degree node Structural hub attack: repeatedly disrupt the node with the highest hub confidence For each iteration, observes the largest component Results fraction of largest component 1.00 0.95 0.90 0.85 0.80 0.75 random attack degree-based hub attack structural hub attack 0.70 0.65 0.60 0 20 40 60 80 100 number of nodes 120 140 160 Biological Assessment of Hubs Protein Lethality Determines lethal / viable proteins by knock-out experiment Lethality represents functional essentiality Orders proteins by degree and hub confidence Observes the cumulative proportion of lethal proteins for every 10 proteins Results 1.0 structural hubs average lethality 0.9 degree-based hubs 0.8 0.7 0.6 0.5 0.4 0 20 40 60 80 number of hubs 100 120 140 Topological Assessment of Clusters Modularity A combined measure of density within each cluster and separability among clusters Estimated by the ratio of the number of edges within a cluster (sub-graph) to the number of all edges starting from the nodes in the cluster (sub-graph) Observes the average modularity of clusters with respect to the cluster depth Results More specific function module has Justify the general-to-specific concepts of hierarchical functional modules 160 average modularity higher modularity 180 140 120 100 80 60 40 20 0 1 2 3 4 5 6 7 8 cluster depth 9 10 11 12 Biological Assessment of Clusters f-Measure Compares each output cluster X with the real functional annotation Y (from MIPS) Recall = (# of common proteins of X and Y) / (# of proteins in Y) Precision = (# of common proteins of X and Y) / (# of proteins in X) f-measure = 2 × Recall × Precision / (Recall + Precision) Results Compared with the results from previous hierarchical clustering methods, e.g., edge-betweenness (top-down approach) and ProDistIn (bottom-up-approach) Conclusion Motivation Significant functional knowledge in protein interaction networks (interactome) Complex connectivity Contributions Convert an unstructured network to a structured network Conserve functional information through pathways High network vulnerability, low functional lethality at hubs as a drug target Applicable to various fields, e.g., social networks, WWW Foundation of structural dynamics during network evolution Questions ? Reference Y.-R. Cho and A. Zhang, “Identification of functional modules by converting interactome networks into hierarchical ordering of proteins”. BMC Bioinformatics, 11(Suppl 3):S3, 2010