Evolutionary Clues Embedded in Network Structure
EPJB 85, 106 (2012)
Zhu Guimei
NGS Graduate School for Integrative Science & Engineering, Centre for Computational Sciences & Engineering, National University of Singapore

Outline
• Introduction
• Localization on complex networks
• Evolutionary ages
• Conclusions

Objective and Scope
• Detect structural patterns at different scales (multi-scale structure).
• Find an interesting network evolution mechanism based on the multi-scale structure of networks.

Complex Networks: Real-World Networks
• Communication networks: telephone, Internet, WWW...
• Transportation networks: airports, highways, rail, electric power...
• Biological networks: genetic, protein-protein interaction, metabolic...
• Social networks: friendship networks, collaboration networks...
[Figure: scientific collaboration networks, Internet networks, protein networks]

Structure, Functions, Dynamics
• Structure: degree, motif, modularity...
• Function: ?
• Dynamics: dynamic processes at different structural scales (mass, energy, signal, information...)
Structural measures are therefore the cornerstone for understanding the relations between structure, dynamics and function.

How to Measure Multi-Scale Structure?
Dynamics on different structural scales:
• Microscopic: degree
• Mesoscopic (?): what is a meso (intermediate-scale) pattern? Motif, clustering coefficient...
• Macroscopic: modules

Defining the Mesoscopic Pattern
In physics, "mesoscopic" is well defined: in condensed matter physics it refers to materials with an intermediate length scale, with sizes between molecules and microns.
In complex networks it is not yet well defined.
We detect different structural patterns through a localization method.

Detecting Structure Through Localization: How?
We map networks to large clusters (nodes as atoms, edges as bonds).
Consider an undirected complex network with N identical nodes. Its topological structure can be described by an adjacency matrix $A_{ij}$ (or Laplacian matrix $L_{ij}$).
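As a minimal sketch of the matrix description above, the following Python snippet builds the adjacency matrix and the Laplacian of a small undirected network. The helper names, the NumPy dependency and the toy edge list are assumptions made for illustration only, and the Laplacian is taken in the standard convention (degree matrix minus adjacency matrix).

```python
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected network.

    Self-loops are skipped, so all diagonal elements stay 0.
    """
    A = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        if i == j:
            continue  # no self-edges
        A[i, j] = 1
        A[j, i] = 1
    return A

def laplacian_matrix(A):
    """Laplacian L = D - A, where D is the diagonal matrix of node degrees."""
    degrees = A.sum(axis=1)
    return np.diag(degrees) - A

# Hypothetical toy network: a triangle (0, 1, 2) with a pendant node 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = adjacency_matrix(4, edges)
L = laplacian_matrix(A)
print(A)
print(L)
```

Every row of the Laplacian sums to zero, which is why its smallest eigenvalue is 0.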
For an electron moving in such a molecule, the tight-binding Hamiltonian is
$\hat{H} = \sum_{n=1}^{N} \varepsilon_n |n\rangle\langle n| + \sum_{m \neq n}^{N} A_{mn} t_{mn} |m\rangle\langle n|$
Hückel model: $\varepsilon_i = \varepsilon_0 = 0$ $(i = 1, 2, 3, \dots, N)$ and $t_{mn} = 1$, so $H = A$.
Adjacency matrix: $A_{ij} = 1$ if nodes i and j are connected and 0 otherwise; all diagonal elements are 0.
Laplacian matrix: $L_{ij} = k_i \delta_{ij} - A_{ij}$ with degree $k_i = \sum_{j=1}^{N} A_{ij}$, that is, $L_{ij} = k_i$ for $i = j$ and $L_{ij} = -A_{ij}$ for $i \neq j$.

Emergence of Different Scale Structures on Complex Networks
Dynamics on networks: diffusive process. Transport processes on networks run from micro- to macro-scales.
Tight-binding Hamiltonian with the Laplacian matrix (diffusive process):
$\hat{H} = \sum_{n=1}^{N} \varepsilon_n |n\rangle\langle n| + \sum_{m \neq n}^{N} L_{mn} t_{mn} |m\rangle\langle n|$
Hückel model: $\varepsilon_i = k_i$ $(i = 1, 2, 3, \dots, N)$ and $t_{mn} = 1$, so $H = L$.
Structures of networks: motif (micro-scale) ... module (macro-scale).
Different eigenvalues represent different structural patterns.

How to Describe Localization on Complex Networks?
The localization properties of electrons in the clusters can be used as measures of the structural properties of the networks, so different structural patterns can be detected from the spectra of complex networks.
The eigenvalues of L can be ranked as $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$. They correspond to the eigenfunctions from low to high energies.

Different Eigenvalues Represent Different Structural Scales: How?
In eigenspace (for complex networks), each eigenstate represents a specific wave function and is sensitive to the structural patterns that match its wavelength in size.
Eigenvectors associated with small eigenvalues usually have large wavelengths, so they are sensitive to perturbations applied to a large set of nodes in the network.
Eigenvectors associated with large eigenvalues have small wavelengths and are most sensitive to localized perturbations applied to a small set of nodes in the network.
Hence, the eigenvalues from $\lambda_2$ to $\lambda_N$ can detect structural patterns from macro- to micro-scales.

Eigenvalues Are Sensitive to Structural Patterns Matching Their Wavelength in Size
(a) The eigenstates on a perfect regular network are periodic waves with wavelengths ranging down to 2.
(b) We construct a local deformation in the segment from the 40th to the 60th node by adding edges. The eigenstates with large eigenvalues localize mainly in this region (local peak).

Method to Detect Multi-Scale Structures
Components of every eigenvector of L: $X_i$, $i = 1, 2, \dots, N$.
Standardized components: $X_i^s = X_i / \max(X_i)$, $i = 1, 2, \dots, N$, where $\max(X_i)$ is the value of the largest component.
A threshold on $X_i^s$ can then be used to identify the nodes involved in a scale structure: nodes with large standardized components are regarded as the nodes involved in the corresponding scale structure.
For each scale structure, the components of the nodes involved are distinguishably large compared with the others, so the threshold-based results are robust.

The Santa Fe Collaboration Network (Part)
We consider a part of the largest component of the Santa Fe Institute collaboration network, N = 76.
The largest eigenvalues $\lambda_{76}$, $\lambda_{75}$, $\lambda_{74}$ detect the three hubs 40, 7 and 67 (red).
$\lambda_{73}$: a group of nodes numbered 17 to 25 (green)
$\lambda_{72}$: nodes 26 to 29 and 34 (cyan)
$\lambda_{70}$: nodes 41 to 47 (blue)
$\lambda_{69}$: nodes 1 to 6 (magenta)
$\lambda_{68}$: nodes 48 to 53 (violet)
...
As the eigenvalue decreases, clusters on much larger scales can be identified (not shown).

Eigenmodes and Average Evolutionary Age: BA Scale-Free Network
Three scale-free networks with edge density w = 2, 4, 8: (a-c) average evolutionary age, (d-f) average degree (on a logarithmic scale), (g-i) size of eigenvector, each versus the eigenvalue index i.
Eigenvectors associated with large eigenvalues generally have small sizes, but the nodes involved are "older" in the network.
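The thresholding procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `involved_nodes`, the threshold value 0.5 and the toy star network are hypothetical, and absolute values of the components are used because the overall sign of a numerically computed eigenvector is arbitrary.

```python
import numpy as np

def involved_nodes(L, mode, threshold=0.5):
    """Nodes whose standardized eigenvector component reaches the threshold.

    np.linalg.eigh returns eigenvalues in ascending order, so mode=-1
    selects the eigenvector of the largest eigenvalue.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    x = np.abs(eigenvectors[:, mode])   # assumption: compare magnitudes
    x_standardized = x / x.max()        # largest component becomes 1
    nodes = np.flatnonzero(x_standardized >= threshold)
    return nodes, eigenvalues[mode]

# Hypothetical toy network: a star with hub 0 and five leaves.
n = 6
A = np.zeros((n, n))
A[0, 1:] = 1
A[1:, 0] = 1
L = np.diag(A.sum(axis=1)) - A

nodes, lam = involved_nodes(L, mode=-1)
print("largest eigenvalue:", lam)
print("nodes involved:", nodes.tolist())
print("size of eigenvector:", len(nodes))
```

In this toy case the mode with the largest eigenvalue localizes on the hub alone. Counting the selected nodes is one plausible reading of "size of eigenvector"; that reading is an assumption here.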
Eigenmodes and Average Evolutionary Age: BA Scale-Free Network
Eigenvalue compared with degree as a descriptor of the average evolutionary age.

Eigenmodes and Average Evolutionary Age: Scale-Free Networks Generated by Another Mechanism
Scale-free networks generated by a duplication/divergence-based mechanism from the PPI network of baker's yeast. (d) Average age versus degree.
Because of large fluctuations, the degree cannot give age-related information, but the eigenvalues can.

Y11k: PPI Network: Evolutionary Age
Yeast 11k network: the original data set has 5400 yeast proteins and 80 000 interactions. We focus on 11 855 interactions with high and medium confidence among 2617 proteins, and finally consider only the largest component, which contains 2235 of the 2617 proteins.
Protein-protein interaction networks: isotemporal classification of proteins
First, all yeast proteins are classified into four isotemporal categories: prokaryotes, eukarya, fungi, and yeast only (yeast proteins without annotation).
Based on the universal tree of life, we assign evolutionary ages 4, 3, 2, 1 from ancient to modern: prokaryotes 4, eukarya 3, fungi 2, and yeast only 1.
(1) C. von Mering et al., Nature 417, 399 (2002).

Eigenmodes and Average Evolutionary Age: PPI Network
For the largest connected component of the PPI network of baker's yeast, with 2235 nodes. (d) Average age versus degree.
We see that degree does not reveal age-related information.

Summary
The localization properties of the eigenvectors from high to low energies can detect patterns from micro- to macro-scales.
Interestingly, the patterns contain significant clues to evolutionary ages.

References
(1) G.M. Zhu, H.J. Yang, R. Yang, J. Ren, B. Li, and Y.-C. Lai, Eur. Phys. J. B 85, 106 (2012).
(2) G.M. Zhu, H.J. Yang, C. Yin, B. Li, Localizations on Complex Networks, Phys. Rev. E 77, 066113 (2008).
(3) H.J. Yang, C. Yin, G.M. Zhu, B. Li, Phys. Rev.
E 77, 045101(R) (2008).

Complex Networks: Nontrivial Properties
A: random, small-world, scale-free (power-law degree distribution)
B: motif, modularity, hierarchy
C: fractal properties, and so on
[Figure panels: ER random network, N = 100, link connection probability p = 0.02; SW network, link rewiring probability r = 0.1; BA scale-free network, N = 100, average degree w = 2; hierarchical network; cauliflower, which is fractal in nature (self-similarity)]
Santo Fortunato, Physics Reports 486 (2010) 75-174

Complex Networks: Basic Concepts
Structure description
Hierarchical description: module, function

Structure: Multi-Scale Measures
Fields: graph theory, bioinformatics, social networks
• Micro (node/edge-based averages): degree, clustering coefficient, shortest path, small-world, scale-free
• Motif. What is more?
• Macro (global; Newman): community, hierarchical clustering
Dynamics: micro to macro. Dynamic processes are the bridge between structure and function.
R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002); M. Newman, SIAM Review 45, 167-256 (2003); C. Song et al., Nature 433, 392 (2005); Nature Physics 2, 275 (2006).

What Is a Mesoscopic Pattern?
In physics, "mesoscopic" is well defined: in condensed matter physics it refers to materials with an intermediate length scale, with sizes between a quantity of atoms (such as molecules) and materials measuring microns.
In complex networks it is not yet well defined.
It could be regarded as a community in complex networks (but there are also other formations, such as tree or star structures).
We define it as intermediate-length-scale structures based on structure-induced localization.

What Is a Community on Complex Networks?
Communities (clusters, modules) are groups of vertices characterized by having more internal than external connections. Their vertices share common properties and/or play similar roles within the graph.
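The "more internal than external connections" criterion can be checked directly for a candidate group of vertices. The sketch below is only an illustration: the function name and the toy edge list are hypothetical, and counting edges is just one simple way to read the criterion, not a definition taken from the cited reviews.

```python
def internal_external_edges(edges, group):
    """Count edges inside a node group and edges leaving it."""
    group = set(group)
    internal = 0
    external = 0
    for u, v in edges:
        u_in, v_in = u in group, v in group
        if u_in and v_in:
            internal += 1      # both endpoints inside the group
        elif u_in or v_in:
            external += 1      # exactly one endpoint inside the group
    return internal, external

# Hypothetical toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
internal, external = internal_external_edges(edges, {0, 1, 2})
print(internal, external)
print("community-like:", internal > external)
```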
Community detection methods: graph partitioning, hierarchical clustering, partitional clustering, spectral clustering.
Community detection is a hot topic, but even the definition of a community is controversial, and methods for detecting the true communities in real-world networks are still being improved.
Santo Fortunato, Physics Reports 486 (2010) 75-174
S. Fortunato and C. Castellano, 2009 (Springer, Berlin, Germany), volume 1, eprint arXiv:0712.2716.

Y11k: PPI Network: Multi-Scale Analysis

How to Detect Communities? Several Methods
Traditional methods
• Graph partitioning: dividing the vertices into g groups of predefined size
• Hierarchical clustering: based on the definition of a similarity measure between vertices
• Partitional clustering: separating the points into k clusters so as to maximize or minimize a given cost function based on distances between points
• Spectral clustering: using eigenvectors of the adjacency or Laplacian matrix
Divisive algorithms
• The algorithm of Girvan and Newman: edges are selected according to the values of measures of edge centrality, which estimate the importance of edges according to some property or process running on the graph
Modularity-based methods
• Modularity optimization
• Modifications of modularity
• Limits of modularity
Spectral algorithms: use the eigenvalues and eigenvectors
Ahn, Y. Y., J. P. Bagrow, et al. (2010). "Link communities reveal multiscale complexity in networks." Nature 466(7307): 761-U711.

Small-World, Scale-Free and Whole-Cell Networks
WSSW model: we first construct a regular circular lattice with each node connected to its d right-handed nearest neighbors. Each edge is rewired with probability $p_r$ to another randomly selected node. Self- and double edges are forbidden.
BASF model: preferential growth mechanism. Starting from several connected nodes as a seed, at each growth step a new node is added and w edges are established between this node and the existing network. The probability for an existing node to be connected with the new node is proportional to its degree.
Self- and double edges are forbidden. For the resulting networks, the number of edges per node obeys a power law.
Whole-cell networks: these consider cellular functions such as intermediate metabolism and bioenergetics, information pathways, electron transport, and transmembrane transport. The directed edges are simply replaced with undirected edges. We consider only cellular networks with sizes larger than 500.

Statistical Properties of the Spectra
The PDF of the nearest-neighbor level spacing (NNLS) obeys the Brody distribution:
$U(s) = \frac{\beta}{\eta} \left(\frac{s}{\eta}\right)^{\beta - 1} \exp\left[-\left(\frac{s}{\eta}\right)^{\beta}\right]$, with $\beta = 1$ (localized) and $\beta = 2$ (extended),
where s is the NNLS and $\eta$ is the characteristic distribution width.
To obtain the parameter values we use the accumulated function $C(s) = \int_0^s U(x)\,dx$. Some trivial calculations lead to
$\ln R(s) \equiv \ln \ln \frac{1}{1 - C(s)} = \beta \ln s - \beta \ln \eta$
From this formula we can determine the values of $\beta$ and $\eta$.
Fig. Value of the Brody parameter $\beta$ versus network parameters $p_r$ and w. (a) WSSW and (b) BASF networks.

Wavelet Transform
Wavelets are mathematical functions that cut up data into different frequency components, and then study each component with a resolution matched to its scale. They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes.
Wavelets were developed independently in the fields of mathematics, quantum physics, electrical engineering, and seismic geology. Interchanges between these fields during the last ten years have led to many new wavelet applications such as image compression, turbulence, human vision, radar, and earthquake prediction.
http://www.amara.com/IEEEwave/IEEEwavelet.html

Fractal Properties on Networks: Wavelet Transform (WT)
We assume the probability values have been sorted in ascending order. WT can detect the fractal properties based on this ascending-order-ranked series.
As a standard procedure, we first find the WT maximal values $\{T_g[a, k(a)]\}$, $k = k_1, k_2, \dots, k_J$, where a is the given scale.
The partition function should scale in the limit of small scales as
$Z(a, q) = \sum_{k = k_1}^{k_J} \left| T_g[a, k(a)] \right|^q \sim a^{\tau(q)}$
The fractal dimension $D(h)$ (statistical properties of subsets) can be obtained through the Legendre transform:
$D(h) = q h - \tau(q)$, $h = \frac{d\tau(q)}{dq}$
The local Hurst exponent h denotes local subsets.
Positive q reflects the scaling of large fluctuations; negative q reflects the scaling of small fluctuations.
Fig. 5. The branched multifractal behavior of the whole-cell network of M. jannaschii is presented as a typical example.

Structure, Functions, Dynamics
• Structural measures (cornerstone for understanding the relations): degree, clustering coefficient, shortest path
• Functions: ?
• Dynamics (can be regarded as transport processes): dynamic diffusive processes at different structural scales, carrying mass, energy, signal, information and so on
L. K. Gallos, C. Song, S. Havlin, Proc. Natl. Acad. Sci. U.S.A. 104, 7746 (2007).
H. Yang, C. Yin, G. Zhu, and B. Li, Phys. Rev. E 77, 045101(R) (2008).
G. Zhu, H. Yang, C. Yin, and B. Li, Phys. Rev. E 77, 066113 (2008).

Protein-Protein Interaction Networks
[Diagram: DNA sequence, proteins (Y1, Y2, Y3), protein-protein interaction networks, functions]
Functions of proteins are realized by protein-protein interactions:
1. Signal transduction: interactions between signaling molecules
2. Protein complexes
* One protein carries another, e.g. from cytoplasm to nucleus
* One protein modifies another
* Complex formation often serves to activate or inhibit one or more of the associated proteins

Metabolic Networks (Life Processes)
The metabolism of an organism is the basic chemical system that generates essential components such as amino acids, sugars and lipids, together with the energy required to synthesize them and to use them in creating proteins and cellular structures. This system of connected chemical reactions is a metabolic network.