
The architecture of complexity
From networks to human dynamics
Albert-László Barabási1,2
1 Center for Complex Network Research and Department of Physics, University of Notre Dame, Notre Dame, IN 46556, USA. 2 Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, 44 Binney Street, Boston, MA 02115, USA.
Introduction
We are surrounded by truly complex systems, from the cell, made of thousands of different molecules that seamlessly work together, to society, a collection of six billion mostly cooperating individuals that display endless signatures of order and self-organization. Understanding and quantifying this complexity is a grand challenge for science. Gas theory set the stage at the end of the nineteenth century, demonstrating that the measurable properties of gases, from pressure to temperature, can all be reduced to the random motion of billions of atoms and molecules. In the 1960s and 70s the study of critical phenomena raised the standard, developing a set of systematic approaches to quantify the transition from disorder to order in material systems, such as magnets or liquids. Chaos theory, with its seductive message that complex and unpredictable behavior emerges from the nonlinear interactions of a few components, dominated our quest to understand complex behavior in the 1980s. The 1990s were the decade of fractals, quantifying the geometry of patterns emerging in self-organized systems, from surfaces to snowflakes. Despite these truly important conceptual advances, nobody seriously believes that we have anything that could be called a theory of complexity. When we try to characterize many of the complex systems around us, the available suite of tools fails for increasingly obvious reasons. First, most complex systems are not made of identical, indistinguishable components, as gases or magnets are: each gene in a cell or individual in a country has its own characteristic behavior. Second, while the interactions between the components are manifestly nonlinear, truly chaotic behavior is the exception rather than the rule. Third, and most important, molecules or people obey neither the extreme disorder of gases, where a molecule can collide with any other molecule, nor the extreme order of magnets, where spins interact only with their immediate neighbors on a periodic lattice. Rather, in complex systems the interactions form exquisite networks, each component being in contact with only a small number of selected interaction partners.
Networks are everywhere. The brain is a network of nerve cells connected by
axons, and cells themselves are networks of molecules connected by biochemical
reactions. Societies, too, are networks of people linked by friendship, family
relationships, and professional ties. On a larger scale, food webs and ecosystems can be
represented as networks of species. And networks pervade technology: the Internet,
power grids, and transportation systems are but a few examples. Even the language we
are using to convey our thoughts is itself a network of words connected by syntactic
relationships.
Yet despite the importance and pervasiveness of networks, scientists have had
little understanding of their structure and properties. How do the interactions of several
malfunctioning genes in a complex genetic network result in cancer? How does diffusion
occur so rapidly through certain social and communications networks, leading to
epidemics of diseases and computer viruses like the Love Bug? How do some networks
continue to function even after the vast majority of their nodes have failed?
Recent research has begun to answer such questions1,2,3,4,5,6,7,8. Over the past few
years, scientists from a variety of fields have discovered that complex networks seem to
have an underlying architecture that is guided by universal principles. We have found, for
instance, that many networks -- from the World Wide Web to the cell’s metabolic system
to the actors in Hollywood -- are dominated by a relatively small number of nodes that
are highly connected to other nodes. These important nodes, called “hubs,” can greatly
affect a network’s overall behavior, for instance, making it remarkably robust against
accidental failures but extremely vulnerable to coordinated attacks. In parallel, there is a growing recognition that we must also develop tools to understand network dynamics, a need that sets the theme of this review: from network structure to (human)
dynamics.
The Random Network Paradigm
For over 40 years science treated all complex networks as being completely
random. This paradigm has its roots in the work of two Hungarian mathematicians, Paul
Erdős and Alfréd Rényi, who in 1959, aiming to describe networks seen in
communications and life sciences, suggested that we should build networks randomly9,10.
Their recipe was simple: take N nodes and connect them by L randomly placed links. The
simplicity of the model and the elegance of some of the related theorems proposed by
Erdős and Rényi have revitalized graph theory, leading to the emergence of a new field in
mathematics, focusing on random networks11.
An important prediction of random network theory is that, despite the fact that the links are placed randomly, the resulting network is deeply democratic: most nodes have approximately the same number of links. Indeed, in a random network the node degrees follow a Poisson distribution with a bell shape, meaning that it is extremely rare to find nodes that have significantly more or fewer links than the average node. Random networks are also called exponential networks because the probability that a node is connected to k other nodes decreases exponentially for large k (Fig. 1). But the Erdős-Rényi model raises an important question: do we believe that networks observed in nature are truly random? Could the Internet offer relatively fast and seamless service if the computers were randomly connected to each other? Or, to take a more extreme example, would you be able to read this article if at some moment the chemicals in your body decided to react randomly with each other, bypassing the rigid chemical web they normally obey? Intuitively the answer is no: we all feel that behind each complex system there is an underlying network with non-random topology. Thus our challenge is to unearth the signatures of order from this apparent chaos of millions of nodes and links.
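To make the "democratic" nature of random networks concrete, here is a minimal sketch (my own illustration in plain Python; the values of N and L are arbitrary choices, not figures from the text) that wires N nodes with L randomly placed links and checks how tightly the degrees cluster around the average:

```python
import random

def erdos_renyi(N, L, seed=0):
    """Place L distinct links uniformly at random among N nodes (no self-loops)."""
    rng = random.Random(seed)
    degree = [0] * N
    placed = set()
    while len(placed) < L:
        i, j = rng.randrange(N), rng.randrange(N)
        if i != j and (min(i, j), max(i, j)) not in placed:
            placed.add((min(i, j), max(i, j)))
            degree[i] += 1
            degree[j] += 1
    return degree

degree = erdos_renyi(N=10_000, L=30_000)
avg = sum(degree) / len(degree)                  # <k> = 2L/N = 6
hubs = sum(1 for k in degree if k > 3 * avg)     # nodes with far more links than <k>
print(f"<k> = {avg:.2f}, nodes with k > 3<k>: {hubs} out of {len(degree)}")
```

Running the sketch, essentially no node has more than a few times the average number of links, which is precisely the absence of hubs discussed above.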
The World Wide Web and the Internet as Complex Networks
The WWW contains over a billion documents, which represent the nodes of this
complex web. They are connected by URLs that allow us to navigate from one document
to another (Fig. 2a). To analyze its properties, we need to obtain a map, telling us how the
pages link to each other. This information is routinely collected by search engines, such
as Google. Given, however, that most search engines were reluctant to share their maps
for research purposes, we needed to obtain a map of our own. This is exactly what we did
in 1998—we wrote a robot or web crawler that, starting from a given webpage, collected
all the outgoing links, and followed those links to visit more pages and collect even more
links12. Through this iterative process we mapped out a small fraction of the web.
Since the WWW is a directed network, each document can be characterized by
the number kout of outgoing links and the number kin, of incoming links. The first
quantity that we investigated was the outgoing (incoming) degree distribution, which
represents the probability P(k) that a randomly selected webpage has exactly kout (kin)
links. Guided by random graph theory, we expected that P(k) would follow a Poisson
distribution. Thus it was rather surprising when the data indicated that P(k) decayed
following a power law (Fig. 2c),
P(k) ~ k^{-γ},                                                         (1)
where γ_out ≈ 2.45 (γ_in ≈ 2.1).
There are major topological differences between networks with Poisson and power-law connectivity distributions (as illustrated in Figure 1). Indeed, for random networks most nodes have approximately the same number of links, k ≈ <k>, where <k> represents the average degree. The exponential decay of P(k) guarantees the absence of nodes with significantly more links than <k>. In contrast, the power-law distribution implies that nodes with only a few links are abundant, while a small minority have a very large number of links. For example, the highway map, with cities as nodes and highways as links, is an exponential network, as most cities are at the intersection of two to five highways. On the other hand, a scale-free network looks more like an airline routing map, displayed routinely in glossy flight magazines: most airports are served by only a few carriers, but there are a few hubs, such as Chicago or Frankfurt, from which links emerge to almost all other U.S. or European airports, respectively. Thus, just like the smaller airports, the majority of documents on the WWW have only a few links. These few links are not sufficient to ensure that the network is fully connected, a function guaranteed by a few highly connected hubs that hold the network together.
While for a Poisson distribution a typical node has k ≈ <k> links, the average, <k>, is not particularly significant for a power-law distribution. This absence of an intrinsic scale in k prompted us to name networks with power-law degree distribution
scale-free13. Thus, contrary to sporadic misconceptions14, a scale-free network has a simple definition: it is a network whose degree distribution has a power-law tail. Empirical measurements indicate that in many real networks we observe deviations from a pure power-law behavior. The most common is a flattening of the degree distribution at small degrees; less common, but occasionally present, is an exponential cutoff at high degrees. Thus in general a proper fit to the degree distribution has the functional form P(k) ~ (k + k0)^{-γ} exp(-k/kx), where k0 is the small-degree cutoff and kx is the length scale of the high-degree exponential cutoff. Naturally, scale-free behavior is evident only between k0 and kx. These deviations do not mean that the underlying network is not scale-free, and proper models should predict the emergence of both the small-degree saturation and the exponential cutoff.
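As an illustration of how such a truncated power law can be confronted with data, the sketch below fits the functional form P(k) ~ (k + k0)^{-γ} exp(-k/kx) to a synthetic degree histogram; the data, the parameter values, and the use of scipy's curve_fit are my own assumptions for the example, not part of the original analysis:

```python
import numpy as np
from scipy.optimize import curve_fit

def log_pk(k, c, gamma, k0, kx):
    # log of P(k) ~ (k + k0)^(-gamma) * exp(-k / kx); c absorbs the normalization
    return c - gamma * np.log(k + k0) - k / kx

# synthetic degree histogram standing in for a measured P(k)
k = np.arange(1, 300, dtype=float)
pk = (k + 2.0) ** -2.3 * np.exp(-k / 150.0)
pk /= pk.sum()

(c, gamma, k0, kx), _ = curve_fit(log_pk, k, np.log(pk), p0=(0.0, 2.0, 1.0, 100.0))
print(f"fitted gamma = {gamma:.2f}, k0 = {k0:.2f}, kx = {kx:.0f}")
```

Fitting on the logarithm of P(k) gives equal weight to the scaling regime and the cutoffs, which span several orders of magnitude in probability.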
The finding that the WWW is a scale-free network raised an important question: could such an inhomogeneous topology emerge in other complex systems as well? An answer to this question came from an unexpected direction: the Internet. The Internet forms a physical network, whose nodes are routers and domains, while links represent the various physical lines, such as phone lines and optical cables, that connect them (Fig. 2b). Due to the physical nature of the connections, this network was expected to be different from the WWW, where adding a link to an arbitrary remote page is as easy as linking to a computer in the next room. To the surprise of many, the network behind the Internet also appears to follow a power-law degree distribution. This was first noticed by three computer scientists, Michalis, Petros, and Christos Faloutsos, who analyzed the Internet at the router and domain levels (see Fig. 2b). In each of these cases they found that the degree distribution follows a power law with an exponent γ = 2.5 for the router network and γ = 2.2 for the domain map, indicating that the wiring of the Internet is also dominated by several highly connected hubs15.
19 degrees of separation
Stanley Milgram, a Harvard social psychologist, surprised the world in 1967 with a bold claim: in society any two people are typically five to six handshakes away from each other16.
That is, despite the six billion inhabitants of our planet, we live in a “small-world”. This
feature of social networks came to be known as “six degrees of separation” after John
Guare’s brilliant Broadway play and movie17. On top of this, sociologists have repeatedly
argued that nodes in social networks are grouped in small clusters, representing circles of
friends and acquaintances, in which each node is connected to all other nodes, with only
sparse links to the outside world18. While the existence of such local clustering and small
world behavior agrees with our intuition, these features were not expected to be relevant
beyond social systems. The question is: do the Internet and the WWW follow this paradigm? For a proper answer we need a full map of the web. But, as Steve Lawrence and Lee Giles showed in 1998, even the largest search engines cover only 16% of the web19,20. This is where the tools of statistical mechanics come in handy: we need to infer the properties of the full web from a finite sample. To achieve this we constructed small models of the WWW in the computer, making sure that the link distribution matches the measured functional form12. Next we identified the shortest distance between two nodes, and averaged this over all pairs of nodes, obtaining the average node separation, d. Repeating this process for networks of different sizes, a procedure called finite-size scaling that is standard in statistical mechanics, we inferred that d depends on the number of nodes N as d = 0.35 + 2.06 log(N). As in 1999 the WWW had about 800 million nodes, this expression predicted that the typical shortest path between two randomly selected pages is around 19, assuming that such a path exists, which is not guaranteed given the web's directed nature. An extensive study by an IBM-Compaq-AltaVista collaboration subsequently found that for a 200-million-node sample this distance is 16, not far from the 17 predicted by our formula for a sample of this size21. These results clearly indicated that the WWW represents a small world: the typical number of clicks between two webpages is around 19, despite the more than a billion pages out there. And as Lada Adamic from Stanford University has shown, the WWW displays a high degree of clustering as well, the probability that two neighbors of a given node are linked together being much larger than the value expected for a random network22. Results from our group indicated that the Internet follows suit: the typical separation between two routers is 9, so a packet can reach any router within about ten hops23, and the network is highly clustered, demonstrating that the small-world paradigm has
rapidly infiltrated the Earth's newly developing electronic skin as well.
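For a quick check of the numbers quoted above, the fitted scaling law can be evaluated directly; the sketch below assumes, as the quoted figures suggest, that the logarithm is base 10:

```python
import math

def avg_separation(N):
    # d = 0.35 + 2.06 log10(N), the finite-size-scaling fit quoted in the text
    return 0.35 + 2.06 * math.log10(N)

print(round(avg_separation(8e8)))   # ~19 for the ~800-million-page WWW of 1999
print(round(avg_separation(2e8)))   # ~17 for the 200-million-page sample of ref. 21
```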
Evolving networks
Why do systems as different as the physical network of the Internet and the virtual web of the WWW develop similar scale-free topologies? We have recently traced back the emergence of the power-law degree distribution to two common mechanisms, absent from the classical graph models but present in many complex networks13. First, traditional graph-theoretic models assumed that the number of nodes in a network is fixed. In contrast, the WWW continuously expands by the addition of new webpages, and the Internet grows by the installation of new routers and communication links.
Second, while random graph models assume that the links are distributed randomly,
most real networks exhibit preferential attachment: there is a higher probability to
connect to a node with a large number of links. Indeed, we link our webpage with
higher probability to the more connected documents on the WWW, as these are the
ones we know about; network engineers tend to connect their institution to the Internet
through points where there is high bandwidth, which inevitably implies a high number
of other consumers, or links. Based on these two ingredients, we constructed a simple
model in which a new node is added at every timestep to the network, linking to some
of the nodes present in the system (Fig. 3)13. The probability Π(k) that a new node connects to a node with k links follows the preferential attachment rule

Π(k) = k / Σ_i k_i,                                                    (3)

where the sum goes over all nodes. Numerical simulations indicate that the resulting network is indeed scale-free, the probability that a node has k links following (1) with exponent γ = 3 (ref. 24). The power-law nature of the distribution is predicted by rate-equation-based calculations2,3,6 as well as by an exact solution25 of the scale-free model introduced above. This simple model illustrates how growth and preferential attachment jointly lead to the appearance of the hub hierarchy: a node rich in links increases its connectivity faster than the rest of the nodes, since incoming nodes link to it with higher probability, a rich-gets-richer phenomenon present in many competitive systems.
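A minimal sketch of the growth-plus-preferential-attachment model described above may help; the parameter values are illustrative, and the "stub list" trick (keeping every node in a list once per link, so that a uniform draw from the list selects a target with probability proportional to its degree) is just one convenient way to implement Eq. (3):

```python
import random
from collections import Counter

def grow_scale_free(N, m=2, seed=0):
    """Add one node per time step; each new node places m preferential links."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]   # small connected core
    # every node appears in 'stubs' once per link, so rng.choice(stubs)
    # picks a target with probability proportional to its degree, i.e. Eq. (3)
    stubs = [n for e in edges for n in e]
    for new in range(m + 1, N):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

deg = Counter(n for e in grow_scale_free(20_000) for n in e)
print("largest hub degree :", max(deg.values()))
print("median degree      :", sorted(deg.values())[len(deg) // 2])
```

The huge gap between the median degree and the largest hub is the signature of the power-law tail produced by the rich-gets-richer dynamics.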
Traditionally, networks were viewed as static objects, the role of the modeler being to find a way to place the links between a constant number of nodes such that the resulting
network looks similar to the network we wish to model. In contrast, the scale-free model
views networks as dynamical systems, incorporating the fact that they self-assemble and
evolve in time through the addition and removal of nodes and links. Such a dynamical
approach follows the long tradition of physics based modeling, aiming to capture what
nature did when it assembled these networks. The expectation behind these modeling
efforts is that if we capture the microscopic processes that drive the placement of links
and nodes, the structural elements and the topology will follow from these. In addition,
viewing evolving networks as dynamical systems allows us to predict many of their
properties analytically.
Bose-Einstein condensation
In most complex systems nodes vary in their ability to compete for links. For
example, some webpages, through a mix of good content and marketing, acquire a large number of links in a short time, easily passing less popular sites that have been around much longer. A good example is the Google search engine: a relative latecomer with an excellent product that, in less than two years, became one of the most connected nodes of the WWW. This competition for links can be incorporated into the scale-free model by adding to each node a fitness, describing its ability to compete for links at the expense of other nodes26. Assigning a randomly chosen fitness η_i to each node i modifies the growth rate in (3) to

Π(k_i) = η_i k_i / Σ_j η_j k_j.                                        (4)

The competition generated by the unequal fitnesses leads to multiscaling: the connectivity of a given node follows k_i(t) ~ t^{β(η)}, where β(η) increases with η, allowing fitter nodes with large η to join the network at a later time and overtake the older but less fit nodes.
The competitive fitness models appear to have close ties to Bose-Einstein condensation, currently one of the most investigated problems in condensed matter physics. Indeed, we have found27 that the fitness model can be mapped exactly onto a Bose gas by replacing each node i with an energy level whose energy ε_i is related to the node's fitness through η_i = e^{-βε_i}. In this mapping, links connected to node i are replaced by particles on level ε_i, and the behavior of the Bose gas is uniquely determined by the distribution g(η) from which the fitnesses are selected. One expects that the functional form of g(η) is system dependent: the attractiveness of a router for a network engineer comes from a rather different distribution than the fitness of a .com company competing for customers. For a wide class of fitness distributions a fit-gets-richer phenomenon emerges, in which the fittest node acquires more links than its less fit counterparts, yet there is no clear winner. On the other hand, certain fitness distributions can result in Bose-Einstein condensation, which in the network language corresponds to a winner-takes-all phenomenon: the fittest node emerges as a clear winner, developing a condensate by acquiring a finite fraction of the links, independent of the size of the system.
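The fitness model lends itself to the same kind of illustration. In the sketch below (the parameters and the uniform fitness distribution are my own choices), each new node attaches with probability proportional to ηk, as in Eq. (4); with a uniform g(η) the fittest nodes grow fastest but no single node captures a finite fraction of the links, i.e. the fit-gets-richer regime rather than the condensate:

```python
import random

def grow_fitness_network(N, m=2, seed=0):
    """Growth with attachment probability proportional to fitness * degree, Eq. (4)."""
    rng = random.Random(seed)
    eta = [rng.random() for _ in range(N)]        # fitnesses drawn from a uniform g(eta)
    degree = [m] * (m + 1)                        # fully connected core of m+1 nodes
    for new in range(m + 1, N):
        weights = [eta[i] * degree[i] for i in range(new)]
        total = sum(weights)
        targets = set()
        while len(targets) < m:
            r, acc = rng.random() * total, 0.0
            for i, w in enumerate(weights):       # roulette-wheel selection
                acc += w
                if acc >= r:
                    targets.add(i)
                    break
        for t in targets:
            degree[t] += 1
        degree.append(m)
    return eta, degree

eta, degree = grow_fitness_network(3_000)
winner = max(range(len(degree)), key=degree.__getitem__)
print("fitness of the biggest hub      :", round(eta[winner], 2))
print("its share of all link endpoints :", round(degree[winner], 3) / sum(degree))
```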
The Achilles’ heel of the Internet
As the world economy becomes increasingly dependent on the Internet, a much
voiced concern arises: can we maintain its functionality under inevitable failures or
frequent hacker attacks? The good news is that so far the Internet has proven rather
resilient against failures: while about 3% of the routers are down at any moment, we
rarely observe major disruptions. The question is, where does this robustness come from?
While there is significant error tolerance built into the protocols that govern packet switching, lately we are learning that the scale-free topology also plays a crucial role. In
trying to understand the network component of error tolerance, we get help from
percolation, a much studied field of physics. Percolation theory tells us that the random
removal of nodes from a network will result in an inverse percolation transition: as a
critical fraction, fc, of the nodes is removed, the network should fragment into tiny, non-communicating islands of nodes. To our considerable surprise, simulations on scale-free networks did not support this prediction28: we could remove as many as 80% of the nodes, and the remaining nodes still formed a compact cluster. The mystery was resolved by Shlomo Havlin of Bar-Ilan University and his collaborators, who have shown that as long as the degree exponent γ is smaller than 3 (which is the case for most real networks, including the Internet) the critical threshold for fragmentation is fc = 1 (refs 29,30). This
is a wonderful demonstration that scale-free networks cannot be broken into pieces by the
random removal of nodes. This extreme robustness to failures is rooted in the
inhomogeneous network topology: as there are far more small nodes than hubs, random
removal will most likely hit these. But the removal of a small node does not create a
significant disruption in the network topology, just like the closure of a small local airport
has little impact on international air traffic, explaining the network’s robustness against
random failures. The bad news is that the inhomogeneous topology has its drawbacks as
well: scale-free networks are rather vulnerable to attacks28. Indeed, the absence of a tiny
fraction of the most connected nodes will break the network into pieces. These findings
uncovered the underlying topological vulnerability of scale-free networks: while the
Internet is not expected to break under the random failure of the routers and lines, well
informed hackers can easily design a scenario to handicap the network.
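A small simulation makes this asymmetry tangible: grow a synthetic scale-free network, then compare the largest connected cluster after removing the same number of nodes either at random or in decreasing order of degree. The sketch below uses networkx and illustrative parameters; it is not the simulation protocol of ref. 28:

```python
import random
import networkx as nx

G = nx.barabasi_albert_graph(10_000, 2, seed=0)    # synthetic scale-free network
n_remove = G.number_of_nodes() // 2                # delete half of all nodes

def giant_component_after(removed):
    H = G.copy()
    H.remove_nodes_from(removed)
    return max((len(c) for c in nx.connected_components(H)), default=0)

random_removed = random.Random(1).sample(list(G.nodes), n_remove)
hubs_removed = sorted(G.nodes, key=G.degree, reverse=True)[:n_remove]

print("giant component after random failures:", giant_component_after(random_removed))
print("giant component after attacking hubs :", giant_component_after(hubs_removed))
```

Even after random removal of half the nodes a large connected cluster survives, while removing the same number of nodes hub-first shatters the network.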
The error tolerance and the fragility to attacks are both consequences of the scale-free property. The reverse, however, is not true: networks that display these properties need not be scale-free. For example, the hub-and-spoke network, in which all nodes connect to a central node, is the most resilient to random failures, as it fails only when the central hub is removed, the probability of which, in a network of N nodes, is only 1/N. Thus, attempts to define scale-free networks based on robustness, in contrast to the customary definition based on the degree distribution, have so far either failed or are fundamentally misguided14.
Scale-Free Epidemics
Knowledge about scale-free networks also has implications for understanding the
spread of computer viruses, diseases, and fads. Diffusion theories, intensively studied for
decades by both epidemiologists and marketing experts, predict a critical threshold for the propagation of a contagion throughout a population. Any virus that is less virulent (or a
fad that is less contagious) than that well-defined threshold will inevitably die out, while
those above the threshold will multiply exponentially, eventually penetrating the entire
system.
Recently, though, Romualdo Pastor-Satorras of the Universitat Politècnica de Catalunya in Barcelona and Alessandro Vespignani of Indiana University in Bloomington reached a startling conclusion31. They found that on a scale-free network
the threshold is zero. That is, all viruses, even those that are only weakly contagious, will
spread and persist in the system. This explains why the Love Bug, the most damaging virus
thus far, having shut down the British Parliament in 2000, was still the seventh most
frequent virus even a year after its introduction and supposed eradication. Hubs are the
key to that surprising behavior. Because hubs are highly connected, at least one of them
will tend to be infected by any corrupted node. And once a hub has been infected, it will
broadcast the virus to numerous other sites, eventually compromising other hubs that will
then help spread the virus throughout the entire system.
Because biological viruses spread on social networks, which in many cases appear
to be scale-free, this result suggests that scientists should take a second look at the
volumes of research written on the interplay of network topology and epidemics.
Specifically, in a scale-free contact network, the traditional public-health approach of
random immunization could easily fail because it will likely neglect some of the hubs.
Research in scale-free networks suggests an alternative approach: by targeting the hubs,
or the most connected individuals, the immunizations would have to reach only a small
fraction of the population32,33,34. But identifying the hubs in a social network is much
more difficult than in other types of systems like the Internet. Nevertheless, Reuven Cohen and Shlomo Havlin of Bar-Ilan University in Israel, together with Daniel ben-Avraham of Clarkson University, New York, recently proposed a clever solution33: immunize a small fraction of the random acquaintances of randomly selected individuals, a procedure that selects hubs with high probability because they are acquainted with many people. That approach, though, raises important ethical dilemmas. For instance, even if
the hubs could be identified, should they have priority for immunizations and cures?
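The acquaintance-immunization idea sketched above is easy to illustrate: pick random individuals, then immunize a random neighbor of each; because a link is more likely to lead to a hub than a uniform draw is, the selected set is strongly biased toward high-degree nodes. The toy simulation below (a synthetic contact network and an arbitrary sample size) is my own illustration of this bias, not the authors' procedure:

```python
import random
import networkx as nx

G = nx.barabasi_albert_graph(10_000, 2, seed=0)   # synthetic scale-free contact network
rng = random.Random(1)

picked = rng.sample(list(G.nodes), 500)                              # random individuals
acquaintances = {rng.choice(list(G.neighbors(n))) for n in picked}   # one neighbor each

avg_degree = lambda nodes: sum(G.degree(n) for n in nodes) / len(nodes)
print("average degree, whole population :", round(avg_degree(G.nodes), 2))
print("average degree, random picks     :", round(avg_degree(picked), 2))
print("average degree, their neighbors  :", round(avg_degree(acquaintances), 2))
```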
In many business contexts, people want to start epidemics, not stop them. Many
viral marketing campaigns, for instance, specifically try to target hubs to spread the
adoption of a product as quickly as possible. Obviously, such a strategy is not new. As far
back as the 1950s, a study funded by the pharmaceutical giant Pfizer discovered the
important role that hubs play in how quickly a community of doctors will adopt a new
drug. Indeed, marketers have intuitively known for some time that certain customers are
much more effective in spreading promotional buzz about new products and fads. But
recent work in scale-free networks provides the scientific framework and mathematical
tools to probe that phenomenon more rigorously.
The temporal behavior of single nodes: Human Dynamics
In the previous sections we focused on one aspect of complex systems: their
topology. While the advances the community has achieved in the past five years in this
direction are clearly impressive, our limited understanding of network dynamics
continues to haunt us. Indeed, most complex systems of practical interest, from the cell to
the Internet and social networks, are fascinating because of their overall temporal
behavior, and the changes they undergo in time. All have nontrivial network topology as
well, but the role of the topology is to serve as a skeleton on which the myriad of
dynamical processes, from information to material transfer, take place. Network theory,
while indispensable towards formalizing and describing these dynamical processes,
cannot yet account for the complex behavior displayed by these systems. This means that
a very important task looms ahead of us: how do we characterize the dynamical processes
taking place on complex networks? What is the interplay between topology and
dynamics? And are there generic organizing principles characterizing network dynamics?
If the history of uncovering the network topology is any guide, advances in this direction
will be empirical, data driven. Indeed, the scale-free topology could not be discovered
until we had extensive datasets to explore large real networks13. If we are to make significant advances towards understanding network dynamics, we will probably have to develop tools to carefully monitor the dynamics of many real systems, hoping that some
universal properties will emerge. In the following I will describe some recent advances in
this direction in the context of human dynamics. The reason for this approach is simple:
the dynamics of many social, technological and economic networks are driven by
individual human actions, turning the quantitative understanding of human behavior into
a central question of modern science, and one that network research can hardly ignore.
Current models of human dynamics, used from risk assessment to communications,
assume that human actions are randomly distributed in time and thus well approximated
by Poisson processes35,36,37. In contrast, here I review increasing evidence that the timing of many human activities, ranging from communication to entertainment and work patterns, follows non-Poisson statistics, characterized by bursts of rapidly occurring events separated by long periods of inactivity. The bursty nature of human behavior is a consequence of a decision-based queuing process38,39: when individuals execute tasks based on some perceived priority, the timing of the tasks will be heavy tailed, most tasks being rapidly executed, while a few experience very long waiting times54,40. In contrast, priority-blind execution is well approximated by uniform inter-event statistics. These findings have important implications from resource management to service allocation in both communications and retail.
Humans participate on a daily basis in a large number of distinct activities,
ranging from electronic communication, such as sending emails or making phone calls, to
browsing the web, initiating financial transactions, or engaging in entertainment and
sports. Given the number of factors that determine the timing of each action, ranging from work and sleep patterns to resource availability, as well as interactions with other individuals, it appears impossible to seek regularities in human dynamics, apart from the
obvious daily and seasonal periodicities. Therefore, in contrast with the accurate
predictive tools common in physical sciences, forecasting human and social patterns
remains a difficult and often elusive goal. Given this, the quantitative understanding of
network dynamics driven by human activities appears largely hopeless. Here I show that
this is not entirely the case: human dynamics is driven by rather interesting reproducible
mechanisms that can serve as a starting point towards understanding network dynamics.
Current models of human activity are based on Poisson processes, and assume
that in a dt time interval an individual engages in a specific action with probability qdt,
where q is the overall frequency of the monitored activity. This model predicts that the
time interval between two consecutive actions by the same individual, called the waiting
or inter-event time, follows an exponential distribution (Fig. 4)35,54. Poisson processes are
widely used to quantify the consequences of human actions, such as modeling traffic flow
patterns or accident frequencies35, and are commercially used in call center staffing36,
inventory control37, or to estimate the number of congestion-caused blocked calls in mobile communications41. Yet an increasing number of recent measurements indicate that the timing of many human actions systematically deviates from the Poisson prediction, the waiting or inter-event times being better approximated by a heavy-tailed or Pareto distribution (Fig. 4). The difference between Poisson and heavy-tailed behavior is striking: for a Poisson process the inter-event time distribution decays exponentially, forcing consecutive events to follow each other at relatively regular time intervals and forbidding very long waiting times. In contrast, the slowly decaying heavy-tailed processes allow for very long periods of inactivity that separate bursts of intensive activity (Fig. 4).
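The contrast is easy to reproduce numerically. The sketch below draws the same number of inter-event times from an exponential distribution (the Poisson case) and from a Pareto distribution rescaled to the same mean (a stand-in for the heavy-tailed case; the exponent is an arbitrary choice), and compares the longest gaps:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mean = 10_000, 1.0

exp_gaps = rng.exponential(mean, n)       # Poisson process: exponential inter-event times
pareto_gaps = rng.pareto(1.5, n) + 1.0    # heavy-tailed gaps, Pareto with exponent 1.5
pareto_gaps *= mean / pareto_gaps.mean()  # rescale to the same mean activity level

for name, gaps in [("exponential ", exp_gaps), ("heavy-tailed", pareto_gaps)]:
    print(f"{name}: mean gap {gaps.mean():.2f}, longest gap {gaps.max():.1f}")
```

Although both sequences have the same mean, the heavy-tailed one contains gaps hundreds of times longer than anything the exponential process produces, which is exactly the bursty pattern shown in Fig. 4.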
To provide direct evidence for non-Poisson activity patterns in individual human
behavior, we study two human activity patterns: the e-mail based communication
between several thousand email users based on a dataset capturing the sender, recipient,
time and size of each email42,43 and the letter based communication pattern of Einstein53.
As Figure 5 shows, the response time distribution of both emails and letters is best approximated with

P(τ) ~ τ^{-α},                                                         (5)

where α ≈ 1 for the email communication and α ≈ 3/2 for letters. These results indicate
that an individual's communication pattern has a bursty non-Poisson character: during a
short time period a user sends out several emails or letters in quick succession, followed
by long periods of no communication. This behavior is not limited to email and letter
based communications. Measurements capturing the distribution of the time differences
between consecutive instant messages sent by individuals during online chats44 show a
similar pattern. Professional tasks, such as the timing of job submissions on a
supercomputer45, directory listings and file transfers (FTP requests) initiated by
individual users46, the timing of printing jobs submitted by users47, or the return visits of
individual users on a website48, were also reported to display non-Poisson features.
Similar patterns emerge in economic transactions, describing the number of hourly trades
in a given security49 or the time interval distribution between individual trades in
currency futures50. Finally, heavy tailed distributions characterize entertainment related
events, such as the time intervals between consecutive online games played by the same
user51.
The fact that a wide range of human activity patterns follow non-Poisson statistics
suggests that the observed bursty character reflects some fundamental and potentially
generic feature of human dynamics. Next I show that the bursty nature of human
dynamics is a consequence of a queuing process driven by human decision making:
whenever an individual is presented with multiple tasks and chooses among them based
on some perceived priority parameter, the waiting time of the various tasks will be Pareto
distributed54,40. In contrast, first-come-first-served and random task execution, common in most service-oriented or computer-driven environments, lead to uniform, Poisson-like dynamics.
Most human initiated events require an individual to weigh and prioritize different
activities. For example, at the end of each activity an individual needs to decide what to
do next: send an email, do some shopping, or place a phone call, allocating time and
resources for the chosen activity. Consider an agent operating with a priority list of L
tasks. After a task is executed, it is removed from the list, and as soon as new tasks
emerge, they are added to the list. The agent assigns to each task a priority parameter x,
which allows it to compare the urgency of the different tasks on the list. The question is how long a given task will have to wait before it is executed. The answer depends on the method the agent uses to choose the task to be executed next. In this respect three selection protocols39 are particularly relevant for human dynamics:
(i) The simplest is the first-in-first-out protocol, executing the tasks in the order they were added to the list. This is common in service-oriented processes, like the first-come-first-served execution of orders in a restaurant or getting help from directory assistance and consumer support. The time period an item stays on the list before execution is determined by the cumulative time required to perform all tasks added to the list before it. If the time necessary to perform the individual tasks is chosen from a bounded distribution (the second moment of the distribution is finite), then the waiting time distribution will develop an exponential tail, indicating that most tasks experience approximately the same waiting time.
(ii) The second possibility is to execute the tasks in a random order, irrespective of their
priority or time spent on the list. This is common, for example, in educational settings,
when students are called on randomly, and in some packet routing protocols. The waiting
time distribution of the individual tasks (the time between two calls on the same student)
in this case is also exponential.
(iii) In most human-initiated activities task selection is not random: the individual executes the highest-priority item on his or her list. The resulting execution dynamics is quite different from (i) and (ii): high-priority tasks will be executed soon after their addition to the list, while low-priority items will have to wait until all higher-priority tasks are cleared, forcing them to stay on the list for considerable time intervals. In the following we show that this selection mechanism, practiced by humans on a daily basis, is the likely source of the fat tails observed in human-initiated processes. To achieve this we consider two models:
Model A54: We assume that an individual has a priority list with L tasks, each task being assigned a priority parameter x_i, i = 1, ..., L, chosen from a ρ(x) distribution. At each time step the agent selects the highest-priority task from the list and executes it, removing it from the list. At that moment a new task is added to the list, its priority x_i being again chosen from ρ(x). This simple model ignores the possibility that the agent occasionally selects a low-priority item for execution before all higher-priority items are done, common, for example, for tasks with deadlines. This can be incorporated by assuming that the agent executes the highest-priority item with probability p, and with probability 1 - p executes a randomly selected task, independent of its priority. Thus the p → 1 limit of the model describes the deterministic protocol (iii), in which the highest-priority task is always chosen for execution, while p → 0 corresponds to the random-choice protocol discussed in (ii).
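A minimal sketch of Model A as described above (an illustrative re-implementation, not the original code) keeps a fixed list of L tasks, executes the highest-priority one with probability p and a random one otherwise, and records how long each executed task sat on the list; with p close to 1 the recorded waiting times develop the slowly decaying tail discussed in the text:

```python
import random
from collections import Counter

def model_a(steps=200_000, L=2, p=0.99999, seed=0):
    rng = random.Random(seed)
    priority = [rng.random() for _ in range(L)]   # priority x of each task on the list
    age = [0] * L                                 # steps each task has already waited
    waits = Counter()
    for _ in range(steps):
        if rng.random() < p:
            slot = max(range(L), key=priority.__getitem__)   # highest-priority task
        else:
            slot = rng.randrange(L)                          # occasional random choice
        waits[age[slot] + 1] += 1                 # record how long it sat on the list
        priority[slot] = rng.random()             # a fresh task fills the emptied slot
        age = [a + 1 for a in age]
        age[slot] = 0
    return waits

waits = model_a()
total = sum(waits.values())
for w in (1, 10, 100, 1000):
    print(f"P(wait = {w}) ~ {waits.get(w, 0) / total:.1e}")
```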
Model B53,38: We assume that tasks arrive on the priority list at rate λ, following a Poisson process with an exponential arrival time distribution. Each time a new task arrives the length of the list changes from L to L + 1. The tasks are executed at rate μ, reflecting the overall time a person devotes to his or her priority list. Each time a task is executed the length of the priority list decreases from L to L - 1. Each task is assigned a discrete priority parameter x_i upon arrival, such that the highest-priority unanswered task is always chosen for a reply. The lowest-priority task will have to wait the longest before execution, and therefore it dominates the waiting time probability density for large waiting times.
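Model B can be sketched in the same spirit (again an illustrative implementation with arbitrary parameters): tasks arrive as a Poisson process with rate λ, execution attempts occur at rate μ, and the highest-priority task present is always served; running it with λ slightly below μ produces the long waiting times that dominate the tail:

```python
import heapq
import random

def model_b(lam=0.95, mu=1.0, t_max=200_000.0, seed=0):
    """Priority list with Poisson arrivals (rate lam) and execution attempts (rate mu)."""
    rng = random.Random(seed)
    queue, waits = [], []                         # queue holds (-priority, arrival time)
    t = 0.0
    next_arrival = rng.expovariate(lam)
    next_execution = rng.expovariate(mu)
    while t < t_max:
        if next_arrival <= next_execution:        # a new task joins the list
            t = next_arrival
            heapq.heappush(queue, (-rng.random(), t))
            next_arrival = t + rng.expovariate(lam)
        else:                                     # the highest-priority task is executed
            t = next_execution
            if queue:
                _, arrived = heapq.heappop(queue)
                waits.append(t - arrived)
            next_execution = t + rng.expovariate(mu)
    return sorted(waits)

waits = model_b()
print("median waiting time :", round(waits[len(waits) // 2], 3))
print("longest waiting time:", round(waits[-1], 1))
```

The huge gap between the median and the longest recorded waiting time reflects the heavy tail that, for λ ≈ μ, follows the 3/2 exponent quoted below.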
As one can notice, the only difference between Model A and Model B comes in the length of the queue L: in Model A the queue length is fixed and remains unchanged during the model's evolution, while in Model B L can fluctuate as tasks arrive or are executed, taking up any non-negative value. Yet this small difference has an important impact on the distribution of the waiting times the tasks experience on the priority list. Indeed, numerical and analytical results indicate that both models can account for a power-law waiting time distribution. Model A predicts that in the p → 1 limit the waiting times follow Eq. (5) with α ≈ 1 (refs 54,52), in agreement with the observed scaling for email communications, web browsing, and several other human activity patterns. In contrast, for Model B we find that for λ ≈ μ the waiting time distribution again follows (5), with exponent α ≈ 3/2, in agreement with the results obtained for the mail correspondence patterns of Einstein53,40.
These results indicate that human dynamics is described by at least two
universality classes, characterized by empirically distinguishable exponents. In searching
for the explanation for the observed heavy tailed human activity patterns we limited our
study to the properties of single queues. In reality none of our actions are performed
independently--most of our daily activity is embedded in a web of actions of other
individuals. Indeed, the timing of an email sent to user A may depend on the time we receive an email from user B. An important future goal is to understand how the various human activities and their timing are affected by the fact that the individuals are embedded in a network environment, potentially bringing network topology and dynamics under a single roof.
Outlook: From Networks to a Theory of Complexity
As it stands today, network theory is not a proxy for a theory of complexity—it
addresses only the emergence and structural evolution of the skeleton of a complex
system. The overall behavior of a complex system, which is what we ultimately need to
understand and quantify, is as much rooted in its architecture as it is in the nature of the
dynamical processes that take place on these networks, from information to material
transfer. We are, however, on the threshold of unraveling the characteristics of these dynamical processes. Indeed, the data collection capabilities that have catalyzed the
explosion of research on network topology today are far more penetrating, allowing us to
capture not only a system’s topology, but also the simultaneous dynamics of its
components, like the communication and travel patterns of millions of individuals, or the
expression level of all genes in a cell. These offer hope for a systematic program, starting
from a measurement based discovery process and potentially ending in a theory of
complexity with predictive power. Thus network theory is by no means the end of a
journey, but rather an unavoidable step towards the ultimate goal of understanding
complex systems54. Should we ever develop a theory of complexity, I believe that it will
have to incorporate and build on the newly discovered fundamental laws that govern the
architecture of complex systems58.
At the level of the nodes and links each network is driven by apparently random
and unpredictable events. How would I know whom you will meet tomorrow, which page you will link your webpage to, and which molecule will react with ATP in your nerve cells? The true intellectual thrill for a scientist studying complex networks comes from the
recognition that despite this microscopic randomness a few fundamental laws and
organizing principles can explain the topological features of such diverse systems as the cell, the Internet or society. This new type of universality drives the multidisciplinary
explosion in network science. Will there be a similar degree of universality when we start
looking at the dynamics of large complex systems? Are we going to find generic
organizing principles that are just as intriguing and powerful as those uncovered in the
past few years in our quest to understand the network topology? While hard to predict, I
expect that such laws and principles do exist. As I have attempted to show above, results
indicating some degree of universality when it comes to the behavior of individual nodes
in complex systems have started to emerge. So far these results refer to human
dynamics only, but the questions they pose will probably initiate research in other areas
as well. Similarly exciting results have emerged lately on the nature of fluctuations in
complex networks55,56,57. These findings provide hope that the study of network dynamics will yield developments just as interesting as those that network theory has provided in the past few years on structure and topology.
The advances discussed here represent only the tip of the iceberg. Networks represent the architecture of complexity. But to fully understand complex systems, we
need to move beyond this architecture, and uncover the laws governing the underlying
dynamical processes, such as Internet traffic or reaction kinetics in the cell. Most
important, we need to understand how these two layers of complexity, architecture and
dynamics, evolve together. These are all formidable challenges for physicists, biologists,
computer scientists, engineers and mathematicians alike, inaugurating a new era that
Stephen Hawking recently called the ‘century of complexity’.
Acknowledgements This research was supported by NSF ITR 0426737, NSF ACT/SGER
0441089, and the NSF NetWorkBench awards and by an award from the James S.
McDonnell Foundation. I wish to also thank Alexei Vazquez and Suzanne Aleva for their
assistance with the preparation of this review.
Figure 1:
Random and Scale-Free Networks. The degree distribution of a random
network follows a Poisson curve very close in shape to the Bell Curve, telling us that
most nodes have the same number of links, and nodes with a very large number of links
don’t exist (top left). Thus a random network is similar to a national highway network, in
which the nodes are the cities, and the links are the major highways connecting them.
Indeed, most cities are served by roughly the same number of highways (bottom left). In
contrast, the power law degree distribution of a scale-free network predicts that most
nodes have only a few links, held together by a few highly connected hubs (top right).
Visually this is very similar to the air traffic system, in which a large number of small
airports are connected to each other via a few major hubs (bottom right). After Ref. 1.
Figure 2. (a) The nodes of the World Wide Web are Web documents, each of which is identified by a unique uniform resource locator, or URL. Most documents contain URLs that link to other pages. These URLs represent outgoing links, three of which are shown (blue arrows). Currently there are about 80 documents worldwide that point to our Web site www.nd.edu/~networks, represented by the incoming green arrows. While we have complete control over the number of outgoing links, kout, from our Web page, the number of incoming links, kin, is decided by other people, and thus characterizes the popularity of the page. (b) The Internet itself, on the other hand, is a network of routers that navigate packets of data from one computer to another. The routers are connected to each other by various physical or wireless links and are grouped into several domains. (c) The probability that a Web page has kin (blue) or kout (red) links follows a power law. The results are based on a sample of over 325,000 Web pages collected by Hawoong Jeong. (d) The degree distribution of the Internet at the router level, where k denotes the number of links a router has to other routers. This research by Ramesh Govindan of the University of Southern California is based on over 260,000 routers and demonstrates that the Internet exhibits power-law behavior. After Ref. 58.
Figure 3. The Birth of a Scale-Free Network. The scale-free topology is a natural
consequence of the ever-expanding nature of real networks. Starting from two connected
nodes (top left), in each panel a new node (shown as an empty circle) is added to the
network. When deciding where to link, new nodes prefer to attach to the more connected
nodes. Thanks to growth and preferential attachment, a few highly connected hubs
emerge. After Ref. 1.
Figure 4: The difference between the activity patterns predicted by a Poisson process (top) and the heavy-tailed distributions observed in human dynamics (bottom). a, Succession of events predicted by a Poisson process, which assumes that in any moment an event takes place with probability q. The horizontal axis denotes time, each vertical line corresponding to an individual event. Note that the inter-event times are comparable to each other, long delays being virtually absent. b, The absence of long delays is visible on the plot showing the delay times τ for 1,000 consecutive events, the size of each vertical line corresponding to the gaps seen in a. c, The probability of finding exactly n events within a fixed time interval is P(n; q) = e^{-qt}(qt)^n/n!, which predicts that for a Poisson process the inter-event time distribution follows P(τ) = q e^{-qτ}, shown on a log-linear plot for the events displayed in a, b. d, The succession of events for a heavy-tailed distribution. e, The waiting times τ of 1,000 consecutive events, where the mean event time was chosen to coincide with the mean event time of the Poisson process shown in a-c. Note the large spikes in the plot, corresponding to very long delay times. b and e have the same vertical scale, allowing us to compare the regularity of a Poisson process with the bursty nature of the heavy-tailed process. f, Delay time distribution P(τ) ~ τ^{-2} for the heavy-tailed process shown in d, e, appearing as a straight line with slope -2 on a log-log plot.
Figure 5: Left: The response time distribution of an email user, where the response time is defined as the time interval between the time the user first sees an email and the time she sends a reply to it. The first symbol in the upper left corner corresponds to messages that were replied to right after the user noticed them. The continuous line corresponds to the waiting time distribution of the tasks, as predicted by Model A discussed in the paper, obtained for p = 0.999999 ± 0.000005. Right: Distribution of the response times for the letters replied to by Einstein. The distribution is well approximated with a power-law tail with exponent α = 3/2, as predicted by Model B. Note that while in most cases the identified reply is indeed a response to a received letter, there are exceptions as well: some of the much-delayed replies represent the renewal of a long-lost relationship. After Ref. 40.
References

1. A.-L. Barabási, Linked: The New Science of Networks (Plume Books, MA, 2003).
2. R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74, 47-97 (2002).
3. R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge University Press, 2004).
4. A.-L. Barabási and Z. N. Oltvai, Network biology: understanding the cell's functional organization, Nature Rev. Genet. 5, 101-113 (2004).
5. M. E. J. Newman, A.-L. Barabási and D. J. Watts, The Structure and Dynamics of Networks (Princeton University Press, Princeton, 2006).
6. S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford University Press, New York, 2003).
7. S. H. Strogatz, Exploring complex networks, Nature 410, 268 (2001).
8. E. Ben-Naim, H. Frauenfelder and Z. Toroczkai (eds), Complex Networks: Lecture Notes in Physics (Springer, Germany, 2004).
9. P. Erdős and A. Rényi, On random graphs I, Publ. Math. (Debrecen) 6, 290 (1959).
10. P. Erdős and A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1960).
11. B. Bollobás, Random Graphs (Academic, London, 1985).
12. R. Albert, H. Jeong and A.-L. Barabási, Diameter of the World Wide Web, Nature 401, 130-131 (1999).
13. A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286, 509-512 (1999).
14. L. Li, D. Alderson, W. Willinger and J. Doyle, A first-principles approach to understanding the Internet's router-level topology, Comput. Commun. Rev. 34(4), 3-14 (2004).
15. M. Faloutsos, P. Faloutsos and C. Faloutsos, On power-law relationships of the internet topology, ACM SIGCOMM 99, Comput. Commun. Rev. 29, 251 (1999).
16. S. Milgram, The small world problem, Psychology Today 1, 60 (1967).
17. J. Guare, Six Degrees of Separation (Vintage Books, NY, 1990).
18. M. S. Granovetter, The strength of weak ties, Am. J. Sociol. 78, 1360 (1973).
19. S. Lawrence and C. L. Giles, Searching the World Wide Web, Science 280, 98 (1998).
20. S. Lawrence and C. L. Giles, Accessibility of information on the web, Nature 400, 107 (1999).
21. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins and J. Wiener, Graph structure in the web, Comput. Netw. 33, 309 (2000).
22. L. A. Adamic and B. A. Huberman, Nature 401, 131 (1999).
23. S.-H. Yook, H. Jeong and A.-L. Barabási, Modeling the Internet's large-scale topology, Proc. Natl. Acad. Sci. USA 99, 13382 (2002).
24. A.-L. Barabási, R. Albert and H. Jeong, Mean-field theory for scale-free random networks, Physica A 272, 173 (1999).
25. B. Bollobás, O. Riordan, J. Spencer and G. Tusnády, The degree sequence of a scale-free random graph process, Random Structures & Algorithms 18(3), 279-290 (2001).
26. G. Bianconi and A.-L. Barabási, Competition and multiscaling in evolving networks, Europhys. Lett. 54, 436 (2001).
27. G. Bianconi and A.-L. Barabási, Bose-Einstein condensation in complex networks, Phys. Rev. Lett. 86, 5632 (2001).
28. R. Albert, H. Jeong and A.-L. Barabási, The Internet's Achilles' heel: error and attack tolerance in complex networks, Nature 406, 378 (2000).
29. R. Cohen, K. Erez, D. ben-Avraham and S. Havlin, Resilience of the Internet to random breakdowns, Phys. Rev. Lett. 85, 4626 (2000).
30. R. Cohen, K. Erez, D. ben-Avraham and S. Havlin, Breakdown of the Internet under intentional attack, Phys. Rev. Lett. 86, 3682 (2001).
31. R. Pastor-Satorras and A. Vespignani, Epidemic spreading in scale-free networks, Phys. Rev. Lett. 86, 3200 (2001).
32. Z. Dezső and A.-L. Barabási, Can we stop the AIDS epidemic?, Phys. Rev. E 65, 055103(R) (2002).
33. R. Cohen, S. Havlin and D. ben-Avraham, Efficient immunization strategies for computer networks and populations, Phys. Rev. Lett. 91, 247901 (2003).
34. R. Pastor-Satorras and A. Vespignani, Immunization of complex networks, Phys. Rev. E 65, 036104 (2002).
35. F. A. Haight, Handbook on the Poisson Distribution (Wiley, New York, 1967).
36. P. Reynolds, Call Center Staffing (The Call Center School Press, Lebanon, Tennessee, 2003).
37. J. H. Greene, Production and Inventory Control Handbook, 3rd edn (McGraw-Hill, New York, 1997).
38. A. Cobham, Priority assignment in waiting line problems, J. Oper. Res. Soc. Am. 2, 70-76 (1954).
39. J. W. Cohen, The Single Server Queue (North Holland, Amsterdam, 1969).
40. A. Vázquez, J. G. Oliveira, Z. Dezső, K.-I. Goh, I. Kondor and A.-L. Barabási, Modeling bursts and heavy tails in human dynamics, Phys. Rev. E (in press).
41. H. R. Anderson, Fixed Broadband Wireless System Design (Wiley, New York, 2003).
42. J. P. Eckmann, E. Moses and D. Sergi, Entropy of dialogues creates coherent structures in e-mail traffic, Proc. Natl. Acad. Sci. USA 101, 14333-14337 (2004).
43. H. Ebel, L. I. Mielsch and S. Bornholdt, Scale-free topology of e-mail networks, Phys. Rev. E 66, 035103(R) (2002).
44. C. Dewes, A. Wichmann and A. Feldmann, An analysis of Internet chat systems, Proc. 2003 ACM SIGCOMM Conf. on Internet Measurement (IMC-03) (ACM Press, New York, 2003).
45. S. D. Kleban and S. H. Clearwater, Hierarchical dynamics, interarrival times and performance, Proc. SC'03, Phoenix, AZ, USA, November 15-21, 2003.
46. V. Paxson and S. Floyd, Wide-area traffic: the failure of Poisson modeling, IEEE/ACM Trans. Networking 3, 226 (1995).
47. U. Harder and M. Paczuski, Correlated dynamics in human printing behavior, http://xxx.lanl.gov/abs/cs.PF/0412027.
48. Z. Dezső, E. Almaas, A. Lukacs, B. Racz, I. Szakadat and A.-L. Barabási, Fifteen minutes of fame: the dynamics of information access on the web, arXiv:physics/0505087 (2005).
49. V. Plerou, P. Gopikrishnan, L. A. N. Amaral, X. Gabaix and H. E. Stanley, Economic fluctuations and anomalous diffusion, Phys. Rev. E 62, R3023 (2000).
50. J. Masoliver, M. Montero and G. H. Weiss, Continuous-time random-walk model for financial distributions, Phys. Rev. E 67, 021112 (2003).
51. T. Henderson and S. Bhatti, Modeling user behavior in networked games, Proc. ACM Multimedia 2001, Ottawa, Canada, 212-220, 30 September-5 October (2001).
52. A. Vázquez, Exact results for the Barabási model of human dynamics, Phys. Rev. Lett. 95, 248701 (2005).
53. J. G. Oliveira and A.-L. Barabási, Human dynamics: the correspondence patterns of Darwin and Einstein, Nature 437, 1251 (2005).
54. A.-L. Barabási, The origin of bursts and heavy tails in human dynamics, Nature 435, 207-211 (2005).
55. M. A. de Menezes and A.-L. Barabási, Fluctuations in network dynamics, Phys. Rev. Lett. 92, 028701 (2004).
56. M. A. de Menezes and A.-L. Barabási, Separating internal and external dynamics of complex systems, Phys. Rev. Lett. 93, 068701 (2004).
57. Z. Eisler, J. Kertész, S.-H. Yook and A.-L. Barabási, Multiscaling and non-universality in fluctuations of driven complex systems, Europhys. Lett. 69, 664-670 (2005).
58. A.-L. Barabási, Taming complexity, Nature Physics 1, 68-70 (2005).