Extracting information from complex networks Roger Guimerà From the metabolism to collaboration networks

Extracting information from complex networks
From the metabolism to collaboration networks
Roger Guimerà
Department of Chemical and Biological Engineering
Northwestern University
Bloomington, April 11th, 2005
High-throughput techniques in biology
Metabolic network
Protein interactions in the fruit fly
Giot et al., Science (2003)
Large databases for critical infrastructures
World-wide airport network
Large databases for social networks
Collaborations in Econometrica
Collaborations in the Astronomical Journal
What do “statistical properties” tell us about the network?
What are the important cities in the world-wide airport network?
Most connected cities
Most central cities
Cartography of complex (metabolic) networks
with L. A. N. Amaral
Cartography of complex (metabolic) networks
• Modules: one divides the system into “regions”
• Roles: one highlights important players
Real metabolic networks are extremely complex…
…and “regions” are not so well defined
Metabolic network of E. coli
One can define a quantitative measure of modularity
High modularity
Low modularity
Newman & Girvan, PRE (2003)
One can define a quantitative measure of modularity
Modularity of a partition: $M = \sum_{s} (d_s - D_s)$
• $d_s$: fraction of links within module s
• $D_s$: expected fraction of links within module s, for a random partition of the nodes
Newman & Girvan, PRE (2003); Guimera, Sales-Pardo, Amaral, PRE (2004)
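The slide names $D_s$ but does not spell out how to compute it. A minimal sketch, assuming $D_s$ is the usual Newman & Girvan expectation $(K_s/2L)^2$, where $K_s$ is the summed degree of the nodes in module s and L is the total number of links; the function name `modularity` is ours, not the authors'.

```python
from collections import defaultdict


def modularity(edges, partition):
    """M = sum_s (d_s - D_s) for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each link listed once.
    partition: dict mapping node -> module label.
    ASSUMPTION: D_s = (K_s / 2L)**2, with K_s the summed degree of module s.
    """
    L = len(edges)
    links_within = defaultdict(int)   # links with both ends in module s
    degree_sum = defaultdict(int)     # K_s for each module
    for u, v in edges:
        su, sv = partition[u], partition[v]
        degree_sum[su] += 1
        degree_sum[sv] += 1
        if su == sv:
            links_within[su] += 1
    M = 0.0
    for s in degree_sum:
        d_s = links_within[s] / L               # observed fraction
        D_s = (degree_sum[s] / (2 * L)) ** 2    # expected fraction (assumed)
        M += d_s - D_s
    return M
```

For two triangles joined by a single link, splitting along that link gives M = 5/14 ≈ 0.357, while putting every node in one module gives M = 0.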
We use simulated annealing to find the partition with the largest modularity
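The slide does not show the annealing schedule or the move set. A minimal sketch, assuming a fixed maximum number of modules `n_modules`, single-node moves only, geometric cooling, and the Metropolis acceptance rule; `anneal` and all its parameters are hypothetical, and this is not the authors' implementation. The embedded `modularity` helper again assumes $D_s = (K_s/2L)^2$.

```python
import math
import random
from collections import defaultdict


def modularity(edges, partition):
    """M = sum_s (d_s - D_s), assuming D_s = (K_s / 2L)**2."""
    L = len(edges)
    within = defaultdict(int)   # links with both ends in module s
    degsum = defaultdict(int)   # K_s: summed degree of the nodes in s
    for u, v in edges:
        su, sv = partition[u], partition[v]
        degsum[su] += 1
        degsum[sv] += 1
        if su == sv:
            within[su] += 1
    return sum(within[s] / L - (degsum[s] / (2 * L)) ** 2 for s in degsum)


def anneal(edges, n_modules, t0=1.0, cooling=0.99, steps_per_t=50,
           t_min=1e-3, seed=0):
    """Maximize modularity by simulated annealing with single-node moves."""
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    part = {n: rng.randrange(n_modules) for n in nodes}   # random start
    m = modularity(edges, part)
    best, best_m = dict(part), m
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            node = rng.choice(nodes)
            old, new = part[node], rng.randrange(n_modules)
            if new == old:
                continue
            part[node] = new
            m_new = modularity(edges, part)
            delta = m_new - m
            # Metropolis rule: accept gains always, losses with prob exp(delta/t)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                m = m_new
                if m > best_m:
                    best, best_m = dict(part), m
            else:
                part[node] = old   # reject: undo the move
        t *= cooling               # geometric cooling schedule
    return best, best_m
```

Recomputing M from scratch at every move keeps the sketch short but costs O(L) per move; a practical implementation would update M incrementally.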
The new module-detection algorithm outperforms previous algorithms
Now we need to identify the role of each node
We define the within-module degree and the participation coefficient


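Within-module relative degree
$\kappa_i$: number of links of node i to other nodes in the same module $s_i$
Within-module degree: $z_i = \dfrac{\kappa_i - \bar{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}}$
where $\bar{\kappa}_{s_i}$ and $\sigma_{\kappa_{s_i}}$ are the mean and standard deviation of $\kappa$ over all nodes in module $s_i$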
Participation coefficient
$f_{is}$: fraction of the links of node i that go to nodes in module s
Participation coefficient: $P_i = 1 - \sum_{s} f_{is}^2$
P = 0: all links in one module
P close to 1: links spread evenly over many modules
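A minimal sketch of the two definitions above for an undirected, unweighted graph given as an edge list plus a node-to-module mapping. Assumptions not stated on the slides: σ is the population standard deviation, z is set to 0 when every node in a module has the same κ (σ = 0), and isolated nodes are ignored; `role_metrics` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def role_metrics(edges, partition):
    """Return (z, P): within-module degree and participation coefficient."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # kappa_i: links from node i to nodes in its own module
    kappa = {i: sum(partition[j] == partition[i] for j in nbrs[i]) for i in nbrs}
    by_module = defaultdict(list)
    for i, k in kappa.items():
        by_module[partition[i]].append(k)
    z = {}
    for i, k in kappa.items():
        vals = by_module[partition[i]]
        mean, sd = statistics.fmean(vals), statistics.pstdev(vals)
        z[i] = (k - mean) / sd if sd > 0 else 0.0   # ASSUMPTION when sd == 0
    P = {}
    for i, ns in nbrs.items():
        counts = defaultdict(int)                   # links of i into each module
        for j in ns:
            counts[partition[j]] += 1
        P[i] = 1.0 - sum((c / len(ns)) ** 2 for c in counts.values())
    return z, P
```

In two triangles joined by one link, the node carrying the link has P = 1 - (2/3)² - (1/3)² = 4/9, while every other node has P = 0.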
The within-module degree and the participation coefficient define the role of each node
Regions of the (z, P) plane: ultra-peripheral, peripheral, non-hub connectors, and kinless non-hubs; provincial hubs, connector hubs, and kinless hubs
We define seven different roles: four for non-hubs and three for hubs
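The slides do not list the numerical boundaries between the seven roles. A minimal sketch, assuming the boundaries we recall from Guimerà & Amaral, Nature (2005): hubs have z ≥ 2.5, and the P cut-offs are those in the comments below. Treat every threshold as an assumption to be checked against the paper; `node_role` is a hypothetical helper.

```python
def node_role(z, P, hub_threshold=2.5):
    """Map a node's (z, P) to one of seven roles.

    ASSUMED thresholds (check against Guimera & Amaral, Nature 2005).
    """
    if z < hub_threshold:                 # non-hubs
        if P <= 0.05:
            return "ultra-peripheral"
        if P <= 0.62:
            return "peripheral"
        if P <= 0.80:
            return "non-hub connector"
        return "kinless non-hub"
    if P <= 0.30:                         # hubs
        return "provincial hub"
    if P <= 0.75:
        return "connector hub"
    return "kinless hub"
```

For example, `node_role(3.0, 0.1)` returns `"provincial hub"`: many links, nearly all inside its own module.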
The cartographic representation of the metabolic network of E. coli
Guimera & Amaral, Nature (2005)
The loss rate quantifies the importance of a role
Metabolite   Role in Species A   Role in Species B
A            Ultra-peripheral    Peripheral
B            Connector hub       Connector hub
C            Ultra-peripheral    LOST
D            LOST                Peripheral
...
Loss rate of role R: $p_{\text{loss}}(R) = p(\text{lost} \mid R)$
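The definition $p(\text{lost} \mid R)$ can be estimated by counting. A minimal sketch, assuming both directions of a species pair are pooled: for every metabolite with role R in one species, we check whether it is missing from the other. `loss_rates` is a hypothetical helper, and the four rows are only the illustrative rows of the table above, not data.

```python
from collections import defaultdict


def loss_rates(table):
    """table: list of (role_in_A, role_in_B); None marks a LOST metabolite.

    Returns {role: fraction of metabolites with that role that are lost
    in the other species}, pooling A->B and B->A (an assumption).
    """
    present = defaultdict(int)   # metabolites observed with role R
    lost = defaultdict(int)      # ...of which are missing in the other species
    for role_a, role_b in table:
        for here, there in ((role_a, role_b), (role_b, role_a)):
            if here is None:
                continue
            present[here] += 1
            if there is None:
                lost[here] += 1
    return {r: lost[r] / present[r] for r in present}


rows = [("Ultra-peripheral", "Peripheral"),
        ("Connector hub", "Connector hub"),
        ("Ultra-peripheral", None),
        (None, "Peripheral")]
rates = loss_rates(rows)
```

For these four toy rows the sketch gives 0.5 for ultra-peripheral and peripheral nodes and 0.0 for connector hubs; the numbers only illustrate the bookkeeping.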
Non-hub connectors are more conserved across species than provincial hubs
Comparison between 12 organisms:
• 4 archaea
• 4 bacteria
• 4 eukaryotes
Different networks have different role structures
R1: Ultra-peripheral
R2: Peripheral
R3: Non-hub connectors
R5: Provincial hubs
R6: Connector hubs
Collaboration networks: Team assembly, network structure, and performance
with B. Uzzi, J. Spiro, and L. A. N. Amaral
Different collaboration networks have different properties
Collaborations in Econometrica
Collaborations in the Astronomical Journal
How do collaboration networks grow? How are teams assembled?
A model for collaboration network formation must specify what rules determine the participation of an individual in a team.
Balancing expertise and diversity
• Expertise. But: one needs to incorporate new people.
• Diversity. But: it is easier to work with similar people and with former collaborators.
Expertise and diversity both affect performance.
Assembling a new team
A new team is assembled one member at a time:
• With probability p the member is an incumbent (someone who has already been on a team); with probability 1-p, a newcomer.
• An incumbent is, with probability q, a past collaborator of an incumbent already on the team (repeat collaboration); with probability 1-q, any incumbent.
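A minimal simulation sketch of the assembly rule as we read it from the slides. Assumptions not on the slides: every team has the same size m, nobody ever retires, a repeat collaboration that finds no free past collaborator falls back to any incumbent, and a slot with no free incumbent goes to a newcomer; `simulate` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def simulate(n_teams, m, p, q, seed=0):
    """Grow a collaboration network one team of m members at a time.

    Returns an adjacency dict: agent id -> set of past collaborators.
    """
    rng = random.Random(seed)
    collab = defaultdict(set)
    next_id = 0                       # agents are numbered 0, 1, 2, ...
    for _ in range(n_teams):
        n_inc = next_id               # agents 0..n_inc-1 are incumbents
        team = []
        for _ in range(m):
            pool = []
            if n_inc > 0 and rng.random() < p:          # incumbent slot
                team_inc = [a for a in team if a < n_inc]
                if team_inc and rng.random() < q:       # repeat collaboration
                    anchor = rng.choice(team_inc)
                    pool = sorted(c for c in collab[anchor] if c not in team)
                if not pool:   # any incumbent (also the ASSUMED fallback)
                    pool = [a for a in range(n_inc) if a not in team]
            if pool:
                team.append(rng.choice(pool))
            else:
                # Newcomer; ASSUMPTION: also used when no incumbent is free
                team.append(next_id)
                next_id += 1
        for a in team:
            collab[a].update(b for b in team if b != a)
    return dict(collab)
```

With p = 0 every team consists of newcomers, so the network is a set of disconnected teams; with p = 1 every team after the first reuses the first team's members.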
The structure of the network depends on the fraction of incumbents...
Guimera, Uzzi, Spiro & Amaral, Science (forthcoming 2005)
...and on the tendency to repeat past collaborations
The size of the “invisible college” increases with the fraction of incumbents, p, and decreases with the tendency to repeat collaborations, q.
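Assuming, as we understand Guimerà et al., that the invisible college is the largest connected component of the collaboration network, its size can be measured as the fraction of agents in that component. A minimal sketch; `giant_fraction` is a hypothetical helper that takes an adjacency dict (agent -> set of collaborators), such as one produced by a simulation of the team-assembly model.

```python
def giant_fraction(adj):
    """Fraction of nodes in the largest connected component of adj."""
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                      # iterative depth-first search
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        largest = max(largest, size)
    return largest / len(adj)
```

For example, `giant_fraction({0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}})` returns 0.6, since the largest component holds 3 of the 5 agents.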
Most fields have very similar values of p and q
The fraction of incumbents is positively correlated with the impact factor of journals
The tendency to repeat collaborations is negatively correlated with the impact factor of journals
Conclusions
• We need to go one step further in the analysis of complex networks, so that we can provide specific answers to specific problems.
• Modules and roles give important information about the structure of a network and about the importance of each node.
• Networks with different functions have different role structures.
• In creative collaboration networks, the emergence of the invisible college and team performance are correlated with expertise and diversity (in a “network sense”), and there may be a universal optimum.
Acknowledgements
• Marta Sales-Pardo, André A. Moreira, and Daniel B. Stouffer.
• Fulbright Commission and Spanish Ministry of Education, Culture, and Sports.
More information:
http://amaral.northwestern.edu/roger/
http://amaral.northwestern.edu/