Coordination Failure and Congestion in Information ... A. M. Bell*, W. A. ...

advertisement
From: AAAI-00 Proceedings. Copyright © 2000, AAAI (www.aaai.org). All rights reserved.
Coordination
Failure and Congestion
A. M. Bell*, W.
A. Setharest,
in Information
and J. A. Bucklewl
than sixty people
Abstract
The El Far-01or Santa Fe bar problem
the action of other agents, may be an important
source of congestion in large decentralized systems.
sizes the difficulty
The El Far01 or Santa Fe bar problem provides
independent
a simple paradigm for congestion and coordina-
nism.
tion problems that may arise with over utilization
decentralized
of the Internet.
This paper recasts the problem
framework and derives a simple
go, but no one has a good time
when the bar is overcrowded.
Coordination failure, or agents’ uncertainty about
in a stochastic
Networks
of coordinating
agents without
The analogy
between
a centralized
resource allocation
adaptive strategy that has intriguing optimization
work
(Sethares
sion makers, each acting in their own best interests
the dynamics
and with limited knowledge, converge to a solution
Internet
that (optimally)
with public goods prevail.
solves a complex congestion and
consider
as well as in our
1998).
and Huberman
(1994)
(1997)
when externalities
and
is noted by Green-
and Bell
and Huberman
properties; a large collection of decentralized de&
of
mecha-
the bar problem
wald, Mishra and Parikh (1998),
previous
empha-
the actions
Glance
and Lukose
of congestion
on the
similar to those found
Unlike the standard pub-
social coordination problem. A variation in which
lic good framework,
agents are allowed access to full information is not
optimizing
agents will not increase consumption
of
nearly as successful.
a publicly
available resource until it experiences
an
inefficient
level of congestion.
Introduction
This paper
focuses
coordination
diet the behavior
on imperfect
failure
across
information
agents
as a source
and
of
in this scenario fully informed
of other agents perfectly
a good
time.
The only
We uti-
(1994)
as a sim-
of agents to coordinate
plified model of a large class of congestion
and coor-
congestion
in large decentralized
dination
posed by Arthur
problems
and economic
that arise in modern engineering
systems.
El Far01 is a bar in Santa
Fe. The bar is popular,
but becomes
when more than sixty people
evening.
Everyone
systems.
enjoys
overcrowded
attend on any given
themselves
when fewer
*NASA Ames Research Center, Mail Stop 269-3,
Moffett Field, CA 94035-1000,
abell@mail.arc nasa.gov.
‘Department of Electrical and Computer Engineering, University of Wisconsin-Madison.
tDepartment of Electrical and Computer Engineering, University of Wisconsin-Madison.
Copyright @ 2000, American Association for Artificial
Intelligence (www.aaai.org).
All rights reserved.
the bar
would never be crowded and all patrons would have
least in a deterministic
lize the scenario
if agents could %pre-
In
a previous
1998)
we proposed
gorithm
based
in aggregate
of congestion,
their actions.
treatment
(Sethares
formation
attendance
and
adaptive
which
in a decentralized
the seemingly
at
is the inability
a deterministic
on habit
agents to coordinate
while avoiding
source
framework,
Bell
al-
enabled
environment
random
fluctuations
that Arthur’s
simulations
demonstrated.
Here we consider
tic setting
where agents’
ized by a probability
time.
the bar problem
strategies
of attending
version of the adaptive
a clearer problem
statement,
are character-
that evolves over
There are several advantages
the stochastic
in a stochas-
to considering
learning r+le:
a simpler
algorithm
that
is amenable
general results
librium
to detailed
We analyze
characteristics
the mixed
analysis,
of the system
and pure strategy
responding
imize pleasure
to
of the cor-
game.
observed
limiting
agents
leads them
Pareto
efficient
information
of the equilibria
crucially
available
we show that
to agents.
to successfully
emphasize
leviating
congestion
failure.
For
coordinate
example,
role that
to
the agent goes more often
the bar is uncrowded,
supplying
about
system-wide
and al-
routers
may not
information
parameter
p. This learning
Let agents
an agent receives
g is the payoff
uncrowded
for attending
bar.
Without
M be the total
maximum
capacity
of two strategies,
stay home
all s-f,
ui(l,
the ith
(iteration)
of agents
a characteristic
agent
of agents
counter
attending
pure
>
N - 1.
where
are (‘Qb”) such equilibria.
parameter
that
ric pure strategy
no agent
another
worse
strategy
Because
better
equilibrium
of the variance
Let
by 1) and
value
Bernoulli
evolution
of the p;(k)
pi(k) - p(iV(L)
pi(k) - p(N(k)
- p(N(lc)
The operation
-
of the algorithm
otherwise
to N(k) -N
and decreasing
it proportionally
occurs
bar is crowded.
making
each period.
that
the equilibrium
When the agent attends,
proportionally
is one symmet-
results
is not
attends p percent
to N(k) -N
Since the pi(k) represent
pi(k + 1) = pi(L).
makes it feasible
z;(k)
is zero and
simplicity
in section
does not
of the scheme
to analyze the resulting
and as demonstrated
In Arthur’s
The
if the
probabili-
Note that the stepsize
over time.
it
to lie within 0 and 1.
the agent does not attend
decrease
increasing
if the bar is uncrowded
ties, they must be constrained
When
behavior,
.
formulation
of the problem,
agents
have access to information
about attendance
at the
bar even on evenings
attend.
rithm
efficient.
(2)
At each time k the agent flips a biased coin, attend-
utilize
strategy
de-
is uncomplicated.
N - 1, and
Pure
is then
N)‘zi(k) > 1
q(k)
-N)
pi(lc) is adjusted,
There
and zero
z;(k) < 0
-N)
pi(k).
agents
random
pi(k)
then the parameter
where every agent
Suppose that the agent initially
(1)
ing with probability
in equilibrium
of attending
Zi(lc)
= 0 for
are no symmet-
in attendance
from agents’ mixed strategies
Pareto
how much
to new informa-
the instantaneous
are independent
The
bar. The game
off without
There
otherwise.
if
efficient:
off.
has the same probability
be the
fined by pi (k + 1) =
1
equilibria.
are Pareto
can be made
agent
ric mixed
Nash
defines
that are 1 with probability
p;(k)
equilibrium
There
variables
and N be the
sixty agents choose to attend.
Nash equilibria
and N(lc)
Let
of pi at the time Ic. Let
if
5
setting
a Nash
an
where Si consists
= g when I,_,
strategies
when exactly
is pi.
at time /c. Let p be
tion and let pi(k) designate
let h,
be zero.
by 0) and ui(O,s_i)
= b when C,_i
attends
each agent changes pi in response
0
home,
go to the bar (indicated
In a deterministic
only
loss of generality
u;(s~, s-i)]
(indicated
ui(l,s_;)
s-i)
for attending
of an uncrowded
is then G = [M, {Si},
there are M
notation
for the N spaces at the bar. The
where the pi
a crowded bar and
for staying
number
of
i=l
b is the payoff
payoffs:
an agent receives
the payoff received
the state
performance.
Statement
have identical
Over time,
about
or stimulus-response.
N(lc) = 5
Algorithm
if
rule can be interpreted
the previous
that
L be a time
number
p slightly)
to go less often
this in the form of the
as a kind of habit formation
probability
to max-
experiences,
(increases
but prefers
and ‘remembers’
Summarizing
environment
congestion
the bar,
painful
p) if the bar is crowded
gathers
agents competing
increased
individual
the agent
on a
Our
with the desire
and minimize
more
arises from coordination
in a complex
with more information
result in better
outcome.
may play in creating
that
such as the Internet
available
while providing
the critical
exchange
of
In particular,
the information
equilibrium
ac-
on the nature
leads to an inefficient
information
Consistent
(to decrease
depend
the information
results
of the time.
and equi-
in relation
equilibria
The type and characteristics
tually
and more
the dynamic
This
when they do not themselves
can be incorporated
into
(a), giving the update pi(lc + 1) =
0 if pi(k) - ,u(N(~) -N)
< 0
the algo-
if pi(k) - p(N(k)
1
p;(k)
which
Arthur’s
otherwise
(3)
the information
structure
used by
agents.
mation
As will become
structure
of the algorithm.
clear, this infor-
as in (2) they utilize
In contrast,
(3) utilizes ‘%_ill
to whether
algorithm
the bar is crowded
if pi(Ic) - p sgn(N(k)
1
if p;(k)
- N)
- p sgn(l\r(k) -N)
- ,U sgn(N(lc)
-N)
Behavior
explores
z;(k)
xi(k)
for 60
that
ing 40 agents attend less and less frequently,
with
their probability
This
parameter
of the population
very near zero.
appears
nowhere in the
algorithm statement;
rather, it is an emergent prop-
or
erty of the adaptive
solution
to the El Far01 prob-
lem. When agents follow this adaptive
0
section
themselves
parameter
up-
not: pi(k + 1) =
system
the agents have divided
The probability
pi(k)
run. By the
they go to El Far01 nearly every time. The remain-
division
A related version of the stochastic
This
final iteration,
simulation
of the agents has risen very near 1, indicating
agents base their updates
because agents base their decisions on
Generic
Figure 2 shows values of the probabilities
over the course of a typical
When
the full record of attendance.
pi(k)
value and
about half the time.
into two groups.
“partial information”.
dates according
about the optimal
the bar is overcrowded
is a key element in the behavior
on only their own experiences
information”
far greater excursions
> 1
- N)
- p(N(L)
mimics
-N)
Farol looks more like Cheers.
< 0
Despite
tic nature of agents of the adaptive
converges to a pure strategy
zi(le) > 1
strategy, El
the stochas-
learning rule it
Nash equilibrium.
(4)
otherwise
of the Algorithms
the generic
behavior
of the
when each of the M = 100 agents follows
the strategy
defined by (2) above.
of the various
simulations
Though
differ, a typical
illustrated
in Figure 1. The probabilities
initialized
randomly.
details
case is
pi (0) were
80
$
70
15’00
Ej 60
z
0 50
t?i
< 40
time (iteration number)
1500
1000
500
Figure 2: Probabilities
time (iteration number)
with Partial Information
Finally Figures 3 and 4 use the “full information”
Figure I: Attendance
with Partial Information
algorithm
(3) to investigate
the effect of allowing
the agents to update their probabilities
Perhaps
the most striking
aspect of these simu-
lations is the rapid convergence
to near the optimal
value of N = 60 and the associated
variance
of attendance.
decline in the
The outcome
approaches
that which would be chosen with centralized
con-
eration, whether they have personally
in Arthur’s
proximately
over time,
simulations.
Note that the transient
ences.
In comparison,
attendance
is ap-
does not decline
and often the bar is overcrowded.
behavior
in the initial pe-
that is, on its own experi-
riods is masked by the long time scale.
in Arthur’s
should
setup, there are
the
structure
that seats in the bar ,often
remain unfilled,
information,
Mean
60, but the variance
indicating
and makes the decision
on local
attended
bar or not. This reflects the information
trol, despite the fact that each agent is autonomous,
to go (or not to go) based
at every it-
be compared
to Figure
Figufe
2; the probability
4
parameters
for these agents continue to bounce ran-
tration,”
or similar responses
domly about some fixed value as their probabilities
tion, and that consequently,
all increase
lead to increased congestion.
or decrease
simultaneously
in response
to common
informa-
more information
can
to the same signals.
Analysis
of the Adaptive
Solutions
The first step in the analysis of the dynamic
ior of the algorithms
is to determine
under which the means of the p(k)
that is, to determine
aged system.
’
0
m
500
1500
1000
We consider
the partial
Taking the expectation
E{Pi(k+
with Full Information
1)) = E{&(k))
assuming
exactly
1
-@{(N(k)
that the pi(k)
information
-N)
G(k)),
are not at the boundary
when the update
remains unchanged
portion
is zero, that is,
when
_$
09
E{(N(k)
’c
08
Using (1) this can be rewritten
3
2
$2
a&
Em
gr
07
2
%
fixed;
of both sides of (2) gives
points 0 or 1. This expectation
x
.z
.z
remain
the steady states of the aver-
case first.
time (iteration number)
Figure 3: Attendance
behav-
the conditions
El&
0.6
Xj(k)-N)
- N)
G(k))
Q(k)}
= S{(F
j=l
0.5
since the xi(k)
03
otherwise.
0.2
pendent of pi(k)
1000
500
1500
is 1 with probability
R(k))
pi(lc) and zero
Because the term in parenthesis
dent Bernoulli
OO
z:j(k)+l-N)
j=1
j#a
04
0.1
= 0.
( recall that the zj(k)
random
variables)
is inde-
are indepen-
this becomes
M
2000
= (1 -N
time (iteration number)
+ ~E{q(k)})
pi(k)
~
j=l
j#i
M
Figure 4: Probabilities
Somewhat
their
a Pareto
efficient
access
with Full Information
paradoxically,
ordinate
behavior
noted a similar phenomena
ulations
sue a strict
Several
authors
in transportation
strategy,
as a whole
tion.
Arnott,
pur-
changing
their
over
De Palma
show that congestion
of the system
if more than 25% of drivers
have access to real time information
and Lindsey
PI
(1-N+Cjf2~jW)
pa(k)
tk>
.
(5)
i
Consider
1991)
can arise because of “concen-
any candidate
ones and N - M zeroes.
steady
state p* with N
Let II be the indices of
the ones and IO be the indices of the zeroes.
there are two kinds of terms in (5).
When
Then
i E 11,
C$-pj=N-landso
about conges(1996,
(1- N + C;, c(k))
rout-
how small the improvement
degrades
‘&&+)) pi(k).
have
(1991) use sim-
their current choice, the performance
(I- N +
In full vector form this is
achieves
that when individuals
best response
co-
only when agents have
and Jayakrishnan
to demonstrate
route no matter
agents successfully
and the system
outcome
to less information.
ing. Mahmassani
=
M
(l-N+xp;)
p;’ = (I-N+N-1)
pi’ = O.l (6)
When i
x%-ipj’ = N,
IO,
E
pt = 0, and hence
can be derived as an approximation
neous gradient
to an instanta-
descent for minimization
of the cost
function
j#a
Hence p* is a steady
0 < pi
the relevant
(7)
< 1 for at least one n.
M
i=l
i=l
In this case,
term in (5) is
M
M
is not of the form of N ones and N - M zeroes.
Thus
-N)”
where
state.
any p* for which CyZ1 pj = N that
Now consider
= (E{N(k)}
J(lc)
j=l
i=l
(8)
is the expected
typical
number
gradient
of attendees
strategy
at time k. The
is to update
the state us-
ing
j=l
j=l
j#n
pi(k + I) = Pi(k) - P(#.
=(l-N+N-p;)p;=(l-p;)p:.
This
state.
With
Hence the only steady states of algorithm
are at p* consisting
In particular,
at pj
=
is not a steady
In contrast,
out for the
Nash
a similar
“full information”
analysis
algorithm.
Taking
states
= E(pi(k)}
-PE{Fsj(k)
occur when E{Cg,
dpi(k)
-N}.
zj(k)
which
-N}
= 0,
N.
j=l
j=l
any
p’
of this
with
j=l
xy=ip;
algorithm.
mixed strategy
equilibria
state
ual probabilities
a higher
these
are not
of the El Far01 game un-
less pi = .6 for every agent.
the steady
is an instantaneous
of J(k).
gradient
In the
limited
The expected
return in
state.
is lower for agents whose individare lower than average as they face
actually
observed
and vice versa for those whose individual
ties are higher than average.
probabili-
Consequently,
a sensi-
information
Similarly,
the
(N(k) - N),
assuming
analogy,
To further
understand
system
we relate
viduals
to a global
the global behavior
the algorithms
cost function.
utilized
The
of the
by indialgorithm
(LMS)
in the context
adaptive
J(k)
these algorithms
Mean Square
oc-
regardless
Adding the a priori lim(2) and (3).
= N in a steady
the limited information
equilibria
based
on the
althe
with full
costs will be $-Var[N
algorithm
equilibrium
their parame-
update
(k)].
sign of
4, can be derived from the absolute
value cost function
adjust
.(IO)
costs will be 0, whereas
the expected
rule might converge to a pure strategy
agents
this
to a pure strategy
ble learning
ters slowly overtime.
case
E{N(k)}
However, because
converges
that the bar will be crowded,
-N).
every iteration
attendance.
algorithms
to the
= 1, in the full information
occurs
gorithm
probability
P (N(k)
its on pi(k) then gives the algorithms
For both
value gives
this into (9) gives
information
curs only when xi(k)
of the agent’s
-N.
approximation
Substituting
case this update
N is a steady
=
Note that
= I, and hence
N(k) -N
M
pi(k + I) = pi(k) -
E{erj(k)]
=gE{zj(k)]
=cpj(k)
=
state
is w
by its instantaneous
-dJ(k)
i e., whenever
Hence
E{N(k)}
Replacing
of both sides of (3) gives
+ I)}
d;$L;)}.
*
3 % =E(N(k)}
carried
j=l
Steady
From (8), the derivative
-N)
1, b x
(for g =
state of this algorithm.
consider
the expectation
E(G(~
mixed strategy
.6 for all j
as in (7),
dJO
=(E{N(k)}
dpi(k)
(2)
of N ones and M - N zeroes.
the symmetric
equilibrium
-0.98)
J(lc)
be zero and hence p’ is not a steady
cannot
(9)
2
LMS algorithm.
are variants
algorithms
of linear
filtering;
= IE{N(k)}
system
-
nil.
By
of the Least
which are common
identification
(4) is an analog
and
of the signed
The
adaptive
mechanism
ized decision
interests
thus
provides
a large collection
makers,
a simple
of decentral-
each acting in their own best
and with only limited knowledge,
a complex
lem.
solution
whereby
congestion
Moreover,
atively
rapid
can solve
and social coordination
convergence
(depending
prob-
to the solution
on the initial
is rel-
conditions)
and robust.
References
Arnott,
R.;
De Palma
Information
Facilities
and Usage of Free-access
with Stochastic
International
Arnott,
Does
Capacity
Review
Economic
R.;
De Palma
Providing
Traffic
A.; and Lindsey,
R. 1996.
Congestible
and Demand.
37(1):181-203.
A.; and Lindsey,
Information
Transportation
Congestion?
R. 1991.
to Drivers
Reduce
Research
A
25A(5):309-318.
Arthur,
W.
B.
1994.
Bounded
Rationality:
American
Economic
Inductive
Reasoning
EZ Far01
The
Review:
Papers
and
Problem.
and Proceed-
ings 1994 84(May):406-411.
Glance,
N
S.;
Dynamics
and Huberman,
of Social
B. A. 1994.
Scientific
Dilemmas.
The
Ameri-
can (March):76-83.
Greenwald,
The
A.; Mishra,
Santa
Fe Bar
cal and Practical
B.;
and Parikh,
Problem
Revisited:
Implications.
R. 1998.
Theoreti-
Technical
Report,
New York University.
Huberman,
B. A.;
cial Dilemmas
277(July
and Lukose,
and Internet
R. M. 1997.
Congestion.
So-
Science
25):535-537.
Mahmassani,
System
H. S.;
and Jayakrishnan,
Performance
Real-time
ridor ,”
and User
Information
Transportation
in a Congested
Research
R.
Response
A
1991.
Under
Traffic Cor25A(5):293-
307.
Sethares,
W. A.; and Bell, A. M. 1998. An Adap-
tive Solution
of the
to the El Far01 Problem.
36th Annual
munication,
Sept. 1998.
Control,
Allerton
Proceedings
Conference
and Computing,
on Com-
Allerton
IL,
Download