Extracting information from complex networks
From the metabolism to collaboration networks

Roger Guimerà
Department of Chemical and Biological Engineering
Northwestern University
Bloomington, April 11th, 2005

High-throughput techniques in biology
  Metabolic network
  Protein interactions in fruit fly. Giot et al., Science (2003)

Large databases for critical infrastructures
  World-wide airport network

Large databases for social networks
  Collaborations in Econometrica
  Collaborations in the Astronomical Journal

What do “statistical properties” tell us about the network?

What are the important cities in the world-wide airport network?
  Most connected cities
  Most central cities

Cartography of complex (metabolic) networks
with L. A. N. Amaral
  Modules: one divides the system into “regions”
  Roles: one highlights important players

Real metabolic networks are extremely complex…
…and “regions” are not so well defined
  Metabolic network of E. coli

One can define a quantitative measure of modularity
  [Figure: a network with high modularity next to one with low modularity]
  d_s: fraction of links within module s
  D_s: expected fraction of links within module s, for a random partition of the nodes
  Modularity of a partition: $M = \sum_s (d_s - D_s)$
  Newman & Girvan, PRE (2003); Guimerà, Sales-Pardo, Amaral, PRE (2004)

We use simulated annealing to obtain the partition with largest modularity

The new algorithm for module detection outperforms previous algorithms

Now we need to identify the role of each node

We define the within-module degree and the participation coefficient
  Within-module relative degree
    k: number of links of a node to other nodes in the same module
    Within-module degree: $z = (k - \langle k \rangle) / \sigma_k$, where $\langle k \rangle$ and $\sigma_k$ are the mean and standard deviation of k over the nodes of the same module
  Participation coefficient
    f_is: fraction of links of node i in module s
    Participation coefficient: $P_i = 1 - \sum_s f_{is}^2$
    $P_i \to 0$: all links in one module
    $P_i \to 1$: links evenly distributed

The within-module degree and the participation coefficient define the role of each node
  We define seven different roles
  Non-hubs: ultra-peripheral, peripheral, non-hub connectors, kinless non-hubs
  Hubs: provincial hubs, connector hubs, kinless hubs

The cartographic representation of the metabolic network of E. coli
  Guimerà & Amaral, Nature (2005)

The loss rate quantifies the importance of a role

  Metabolite   Role in Species A   Role in Species B
  A            Ultra-peripheral    Peripheral
  B            Connector hub       Connector hub
  C            Ultra-peripheral    LOST
  D            LOST                Peripheral
  ...

  Loss rate of role R: $p_{\mathrm{loss}}(R) = p(\mathrm{lost} \mid R)$

Non-hub connectors are more conserved across species than provincial hubs
  Comparison between 12 organisms: 4 archaea, 4 bacteria, 4 eukaryotes

Different networks have different role structures
  1 – Ultra-peripheral
  2 – Peripheral
  3 – Non-hub connectors
  5 – Provincial hubs
  6 – Connector hubs

Collaboration networks: Team assembly, network structure, and performance
with B. Uzzi, J. Spiro, and L. A. N. Amaral

Different collaboration networks have different properties
  Collaborations in Econometrica
  Collaborations in the Astronomical Journal

How do collaboration networks grow? How are teams assembled?
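
Looking back at the cartography part of the talk, the three quantities defined there (the modularity M of a partition, the within-module degree z, and the participation coefficient P) can be computed for a given partition with a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `cartography` and the toy network are hypothetical, the partition is supplied by hand rather than found by simulated annealing, and the null term D_s is taken to be (sum of degrees in s / 2L)^2, a common choice that the slides describe only in words.

```python
from collections import defaultdict
from math import sqrt


def cartography(edges, module_of):
    """Return (M, z, P) for an undirected edge list and a node -> module map.

    Hypothetical helper, not the authors' code.
    """
    n_links = len(edges)
    degree = defaultdict(int)
    links_to = defaultdict(lambda: defaultdict(int))  # node -> module -> count
    within = defaultdict(int)  # module -> links with both ends inside it
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        links_to[u][module_of[v]] += 1
        links_to[v][module_of[u]] += 1
        if module_of[u] == module_of[v]:
            within[module_of[u]] += 1

    # Modularity M = sum_s (d_s - D_s).
    # Assumption: D_s = (sum of degrees in module s / 2L)^2.
    degree_sum = defaultdict(int)
    members = defaultdict(list)
    for node, k in degree.items():
        degree_sum[module_of[node]] += k
        members[module_of[node]].append(node)
    M = sum(
        within[s] / n_links - (degree_sum[s] / (2 * n_links)) ** 2
        for s in degree_sum
    )

    # Within-module degree z = (k - <k>) / sigma_k, computed per module.
    z = {}
    for s, nodes in members.items():
        ks = [links_to[n][s] for n in nodes]
        mean = sum(ks) / len(ks)
        std = sqrt(sum((k - mean) ** 2 for k in ks) / len(ks))
        for n, k in zip(nodes, ks):
            z[n] = (k - mean) / std if std > 0 else 0.0

    # Participation coefficient P_i = 1 - sum_s f_is^2.
    P = {
        n: 1 - sum((c / degree[n]) ** 2 for c in links_to[n].values())
        for n in degree
    }
    return M, z, P


# Toy network: two triangles joined by the single link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
module_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
M, z, P = cartography(edges, module_of)
print(f"M = {M:.3f}")
print("P =", {n: round(p, 3) for n, p in sorted(P.items())})
```

In the toy network only nodes 2 and 3 have links outside their own module, so they are the only nodes with a nonzero participation coefficient. Assigning one of the seven roles would additionally need thresholds on z and P, which the slides do not give.
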
A model for collaboration network formation must specify what rules determine the participation of an individual in a team

Balancing expertise and diversity
  Expertise. But: need to incorporate new people
  Diversity. But: it is easier to work with similar people and with former collaborators
  Performance

Assembling a new team
  [Figure sequence: a team is assembled member by member from a pool of incumbents and newcomers]
  Each member is an incumbent with probability p, or a newcomer with probability 1-p
  An incumbent is a repeat collaboration with probability q, or any incumbent with probability 1-q

The structure of the network depends on the fraction of incumbents...
  Guimerà, Uzzi, Spiro & Amaral, Science (forthcoming 2005)
...and on the tendency to repeat past collaborations

The size of the “invisible college” increases with the fraction of incumbents, p, and decreases with the tendency to repeat collaborations, q.

Most fields have very similar values of p and q

The fraction of incumbents is positively correlated with the impact factor of journals

The tendency to repeat collaborations is negatively correlated with the impact factor of journals

Conclusions
  We need to go one step further in the analysis of complex networks, so that we can provide specific answers to specific problems.
  Modules and roles give important information about the structure of a network and about the importance of each node.
  Networks with different functions have different role structures.
  In creative collaboration networks, the emergence of the invisible college and team performance are correlated with expertise and diversity (in a “network sense”), and there may be a universal optimum.
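
The team-assembly rules from the collaboration part of the talk (incumbent with probability p, newcomer with probability 1-p; repeat collaboration with probability q, any incumbent with probability 1-q) can be turned into a small simulation. What follows is a sketch, not the published model: the names `assemble_team`, `simulate`, and `largest_component_fraction` are hypothetical, team size is fixed, candidates are drawn uniformly at random, agents never leave the pool, and the "invisible college" is approximated by the largest connected component of the collaboration network.

```python
import random
from collections import defaultdict


def assemble_team(size, p, q, incumbents, collaborators, next_id, rng):
    """Pick `size` distinct agents; return (team, next unused agent id)."""
    team = []
    for _ in range(size):
        available = [a for a in incumbents if a not in team]
        if available and rng.random() < p:
            # Incumbent slot. Past collaborators of current team members:
            past = sorted(
                {c for m in team for c in collaborators.get(m, ())} - set(team)
            )
            if past and rng.random() < q:
                team.append(rng.choice(past))       # repeat collaboration
            else:
                team.append(rng.choice(available))  # any incumbent
        else:
            team.append(next_id)                    # newcomer
            next_id += 1
    return team, next_id


def simulate(n_teams, size, p, q, seed=0):
    """Assemble n_teams teams in sequence; return (teams, collaborators)."""
    rng = random.Random(seed)
    incumbents = []                 # agents who have been on a team before
    collaborators = defaultdict(set)
    next_id = 0
    teams = []
    for _ in range(n_teams):
        team, next_id = assemble_team(
            size, p, q, incumbents, collaborators, next_id, rng
        )
        for a in team:
            if a not in incumbents:
                incumbents.append(a)
            collaborators[a].update(b for b in team if b != a)
        teams.append(team)
    return teams, collaborators


def largest_component_fraction(collaborators):
    """Fraction of agents in the largest connected component."""
    seen, largest = set(), 0
    for start in collaborators:
        if start in seen:
            continue
        seen.add(start)
        stack, count = [start], 0
        while stack:
            node = stack.pop()
            count += 1
            for nb in collaborators[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        largest = max(largest, count)
    return largest / len(collaborators) if collaborators else 0.0


# Illustrative parameter values only; they are not taken from the talk.
for p in (0.2, 0.5, 0.8):
    _, collab = simulate(n_teams=500, size=3, p=p, q=0.3, seed=1)
    print(p, round(largest_component_fraction(collab), 3))
```

Two limiting cases are easy to check by hand: with p = 0 every member is a newcomer, so the network stays a set of disconnected teams, and with p = 1 the first team is reused forever. Anything in between depends on the random draws, so no specific numbers are claimed here.
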
Acknowledgements
  Marta Sales-Pardo, André A. Moreira, and Daniel B. Stouffer.
  Fulbright Commission and Spanish Ministry of Education, Culture, and Sports.

More information:
  http://amaral.northwestern.edu/roger/
  http://amaral.northwestern.edu/