Subgame-consistent Solutions for Cooperative Stochastic Games
with Randomly Furcating Payoffs
LEON A. PETROSYAN
Faculty of Applied Mathematics-Control Processes, St Petersburg State
University
St Petersburg, 198904, Russia
spbuoasis7@petrlink.ru
DAVID W.K. YEUNG
Center of Game Theory, St Petersburg State University, St Petersburg,
198904, Russia
SRS Consortium for Advanced Study in Cooperative Dynamic Games
Hong Kong Shue Yan University, Hong Kong
dwkyeung@hksyu.edu
Abstract – In cooperative stochastic dynamic games a stringent condition – subgame consistency – is required for a dynamically stable solution. In
particular, a cooperative solution is subgame consistent if an extension of the
solution policy to a situation with a later starting time and any feasible state
brought about by prior optimal behavior would remain optimal. The paradigm
of randomly-furcating stochastic games incorporates additional stochastic
elements via randomly branching payoff structures in stochastic games. This
paper considers subgame consistent solutions for cooperative stochastic games
with randomly furcating payoff structures. Analytically tractable payoff
distribution procedures contingent upon specific random realizations of the
state and payoff structure are derived. This is the first time that subgame-consistent solutions for randomly-furcating stochastic games are obtained. It
widens the application of cooperative dynamic game theory to discrete-time
problems where the evolution of the state and future environments are not
known with certainty.
Keywords – Cooperative stochastic games, subgame consistency, randomly
furcating payoff structures.
AMS Subject Classifications -- Primary 91A12, Secondary 91A25.
1. Introduction
The discipline of game theory studies decision making in an interactive environment.
One particularly complex and fruitful branch of game theory is dynamic games,
which investigate interactive decision making over time. A property of decision
making over time is that the future is inherently unknown and therefore (in the
mathematical sense) uncertain. Yeung (2001 and 2003) introduces the class of
randomly furcating stochastic differential games which allows stochastic dynamics
and random changes in future payoff structures. This extension allows random
elements in future payoffs which are prevalent in many practical situations like
regional economic cooperation, corporate joint ventures and environmental
management.
Cooperative games suggest the possibility of socially optimal and group
efficient solutions to decision problems involving strategic action. In dynamic
cooperative games with stochastic elements, a stringent condition – that of subgame
consistency -- is required for a dynamically stable cooperative solution. In particular,
a cooperative solution is subgame-consistent if an extension of the solution policy to a
situation with a later starting time and any feasible state brought about by prior
optimal behavior would remain optimal.
In particular the property of subgame
consistency ensures that as the game proceeds
players are guided by the same
optimality principle at each instant of time, and hence do not possess incentives to
deviate from the previously adopted optimal behavior throughout the game. A
rigorous framework for the study of subgame-consistent solutions in cooperative
stochastic differential games was established in the work of Yeung and Petrosyan
(2004, 2005 and 2006). A generalized theorem was developed for the derivation of an
analytically tractable “payoff distribution procedure” which would lead to subgame-consistent solutions. Petrosyan and Yeung (2007) derived subgame consistent
solutions for randomly furcating continuous-time stochastic differential games.
In computer modeling and operations research studies, discrete-time analysis often proves to be more applicable and compatible with actual data. In stochastic dynamic games, Basar and Mintz (1972 and 1973) and Basar (1978) developed equilibrium solutions of linear-quadratic stochastic dynamic games with noisy observation. Basar and Ho (1974) and Basar (1979) examined informational properties of the Nash solutions of stochastic nonzero-sum games. Some applications of stochastic dynamic games can be found in Smith and Zenou (2003) and Esteban-Bravo and Nogales (2008). Yeung and Petrosyan (2010) derived subgame consistent solutions for cooperative stochastic dynamic games.
This paper develops the class of randomly furcating discrete-time stochastic
games. The Nash equilibrium is characterized for non-cooperative outcomes and subgame-consistent cooperative solutions are derived for the cooperative paradigm. Analytically tractable discrete-time payoff distribution procedures contingent upon specific random realizations of the state and payoff structure are derived. This is the first time that subgame consistent solutions of randomly-furcating cooperative stochastic dynamic games are examined. This new approach widens the application of cooperative differential game theory to discrete-time problems where the evolution of the state and future environments are not known with certainty. The paper is organized as follows. The game formulation is given in Section 2. Group optimality and individual rationality under dynamic cooperation are discussed in Section 3. Subgame consistent solutions and the payment mechanism leading to the realization of these solutions are obtained in Section 4. Concluding remarks are provided in Section 5.
2. Game Formulation and Outcome
Consider the $T$-stage $n$-person nonzero-sum discrete-time dynamic game with initial state $x^0$. The state space of the game is $X \subset R^m$ and the state dynamics of the game is characterized by the stochastic difference equation:
$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n) + \vartheta_k, \qquad (2.1)$$
for $k \in \{1,2,\ldots,T\}$ and $x_1 = x^0$,
where $u_k^i \in U^i \subset R^{m_i}$ is the control vector of player $i$ at stage $k$, $x_k \in X$ is the state, and $\vartheta_k$ is a sequence of statistically independent random variables.

The payoff of player $i$ at stage $k$ is $g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k]$, which is affected by a random variable $\theta_k$. In particular, the $\theta_k$ for $k \in \{1,2,\ldots,T\}$ are independent random variables with range $\{\theta_k^1, \theta_k^2, \ldots, \theta_k^{\eta_k}\}$ and corresponding probabilities $\{\lambda_k^1, \lambda_k^2, \ldots, \lambda_k^{\eta_k}\}$. In stage 1, it is known that $\theta_1$ equals $\theta_1^1$ with probability $\lambda_1^1 = 1$.
The objective that player $i$ seeks to maximize is
$$E_{\vartheta_1,\vartheta_2,\ldots,\vartheta_T;\,\theta_1,\theta_2,\ldots,\theta_T}\Big\{ \sum_{k=1}^{T} g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in \{1,2,\ldots,n\} \equiv N, \qquad (2.2)$$
where $E_{\vartheta_1,\vartheta_2,\ldots,\vartheta_T;\,\theta_1,\theta_2,\ldots,\theta_T}$ is the expectation operation with respect to the random variables $\vartheta_1,\vartheta_2,\ldots,\vartheta_T$ and $\theta_1,\theta_2,\ldots,\theta_T$, and $q^i(x_{T+1})$ is a terminal payment given at stage $T+1$. The payoffs of the players are transferable.
The game (2.1)-(2.2) is a randomly furcating stochastic dynamic game. In a stochastic dynamic game framework, a strategy space with the state-dependent property has to be considered. In particular, a pre-specified class $\Gamma^i$ of mappings $\phi_t^{(\sigma_t)i}(\cdot): X \to U^i$ with the property $u_t^{(\sigma_t)i} = \phi_t^{(\sigma_t)i}(x)$, for $i \in N$, $\sigma_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T\}$, is the strategy space of player $i$, and each of its elements is a permissible strategy.
To solve the game, we invoke the principle of optimality in Bellman's (1957) technique of dynamic programming and begin with the subgame starting at the last operating stage, that is, stage $T$. If $\theta_T^{\sigma_T} \in \{\theta_T^1, \theta_T^2, \ldots, \theta_T^{\eta_T}\}$ has occurred at stage $T$ and the state $x_T = x$, the subgame becomes:
$$\max_{u_T^i} \; E_{\vartheta_T}\Big\{ g_T^i[x, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\sigma_T}] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (2.3)$$
subject to
$$x_{T+1} = f_T(x, u_T^1, u_T^2, \ldots, u_T^n) + \vartheta_T.$$
A set of state-dependent strategies $\{\phi_T^{(\sigma_T)i*}(x) \in \Gamma^i$, for $i \in N\}$ constitutes a Nash equilibrium solution to the subgame (2.3) if the following conditions are satisfied:
$$V^{(\sigma_T)i}(T, x) = E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\sigma_T)1*}(x), \phi_T^{(\sigma_T)2*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x); \theta_T^{\sigma_T}] + q^i(x_{T+1}) \Big\}$$
$$\geq E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\sigma_T)1*}(x), \ldots, \phi_T^{(\sigma_T)(i-1)*}(x), \phi_T^{(\sigma_T)i}(x), \phi_T^{(\sigma_T)(i+1)*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x); \theta_T^{\sigma_T}] + q^i(\tilde{x}_{T+1}) \Big\}, \quad \text{for } i \in N,$$
where
$$x_{T+1} = f_T[x, \phi_T^{(\sigma_T)1*}(x), \phi_T^{(\sigma_T)2*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x)] + \vartheta_T \quad \text{and}$$
$$\tilde{x}_{T+1} = f_T[x, \phi_T^{(\sigma_T)1*}(x), \ldots, \phi_T^{(\sigma_T)(i-1)*}(x), \phi_T^{(\sigma_T)i}(x), \phi_T^{(\sigma_T)(i+1)*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x)] + \vartheta_T.$$
A Nash equilibrium of the subgame (2.3) can be characterized by the following lemma.

Lemma 2.1. A set of strategies $\{u_T^{(\sigma_T)i*} = \phi_T^{(\sigma_T)i*}(x) \in \Gamma^i$, for $i \in N\}$ provides a Nash equilibrium solution to the subgame (2.3) if there exist functions $V^{(\sigma_T)i}(T, x)$, for $i \in N$, such that the following conditions are satisfied:
$$V^{(\sigma_T)i}(T, x) = \max_{u_T^{(\sigma_T)i}} E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\sigma_T)1*}(x), \phi_T^{(\sigma_T)2*}(x), \ldots, \phi_T^{(\sigma_T)(i-1)*}(x), u_T^{(\sigma_T)i}, \phi_T^{(\sigma_T)(i+1)*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x); \theta_T^{\sigma_T}] + V^{(\sigma_T)i}[T+1, \tilde{f}_T^i(x, u_T^{(\sigma_T)i}) + \vartheta_T] \Big\},$$
$$V^{(\sigma_T)i}(T+1, x) = q^i(x); \quad \text{for } i \in N, \qquad (2.4)$$
where $\tilde{f}_T^i(x, u_T^{(\sigma_T)i}) = f_T[x, \phi_T^{(\sigma_T)1*}(x), \phi_T^{(\sigma_T)2*}(x), \ldots, \phi_T^{(\sigma_T)(i-1)*}(x), u_T^{(\sigma_T)i}, \phi_T^{(\sigma_T)(i+1)*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x)]$.

Proof. The system of equations in (2.4) satisfies the standard discrete-time stochastic dynamic programming property and the Nash property for each player $i \in N$. Hence a Nash (1951) equilibrium of the subgame (2.3) is characterized. One can see the detailed proof of the results in Theorem 6.10 in Basar and Olsder (1999). ∎
For the sake of exposition, we sidestep the issue of multiple equilibria and focus on playable games in which a particular noncooperative Nash equilibrium is chosen by the players in the subgame. Using Lemma 2.1, one can characterize the value functions $V^{(\sigma_T)i}(T, x)$ for all $\sigma_T \in \{1,2,\ldots,\eta_T\}$ if they exist. In particular, $V^{(\sigma_T)i}(T, x)$ yields player $i$'s expected game equilibrium payoff in the subgame starting at stage $T$ given that $\theta_T^{\sigma_T}$ occurs and $x_T = x$.
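To illustrate the fixed-point character of the stage-$T$ conditions, the following sketch computes a stage-$T$ Nash equilibrium by iterated numerical best responses for a hypothetical two-player specification with scalar controls. The quadratic payoffs, the grid of candidate controls, and the convergence tolerance are illustrative assumptions only; with a linear terminal payment and zero-mean noise, the expectation over $\vartheta_T$ reduces to evaluating at the mean of $x_{T+1}$.

```python
import numpy as np

# Hypothetical stage-T data: g_T^i = theta*x*u_i - 0.5*u_i^2 - 0.2*u_i*u_j,
# q^i(x_{T+1}) = 0.5*x_{T+1}, and x_{T+1} = 0.9*x + u_1 + u_2 + vartheta_T with E[vartheta_T] = 0.
theta, x = 1.2, 1.0
grid = np.linspace(-2.0, 2.0, 401)            # candidate controls for the grid search

def expected_payoff(i, u):
    # E_{vartheta_T}{ g_T^i + q^i(x_{T+1}) }: the zero-mean noise drops out of the mean
    ui, uj = u[i], u[1 - i]
    x_next_mean = 0.9 * x + ui + uj
    return theta * x * ui - 0.5 * ui ** 2 - 0.2 * ui * uj + 0.5 * x_next_mean

u = np.array([0.0, 0.0])
for _ in range(100):                          # iterate best responses until they stop moving
    u_old = u.copy()
    for i in range(2):
        vals = [expected_payoff(i, np.where(np.arange(2) == i, c, u)) for c in grid]
        u[i] = grid[int(np.argmax(vals))]
    if np.max(np.abs(u - u_old)) < 1e-6:
        break
print("approximate stage-T equilibrium controls:", u)
```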
Then we proceed to the subgame starting at stage $T-1$ when $\theta_{T-1}^{\sigma_{T-1}} \in \{\theta_{T-1}^1, \theta_{T-1}^2, \ldots, \theta_{T-1}^{\eta_{T-1}}\}$ occurs and $x_{T-1} = x$. In this subgame player $i \in N$ seeks to maximize his expected payoff
$$E_{\vartheta_{T-1}, \vartheta_T;\, \theta_T}\Big\{ g_{T-1}^i[x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n; \theta_{T-1}^{\sigma_{T-1}}] + g_T^i[x_T, u_T^1, u_T^2, \ldots, u_T^n; \theta_T] + q^i(x_{T+1}) \Big\}$$
$$= E_{\vartheta_{T-1}, \vartheta_T}\Big\{ g_{T-1}^i[x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n; \theta_{T-1}^{\sigma_{T-1}}] + \sum_{\sigma_T=1}^{\eta_T} \lambda_T^{\sigma_T} g_T^i[x_T, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\sigma_T}] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (2.5)$$
subject to
$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n) + \vartheta_k, \quad \text{for } k \in \{T-1, T\} \text{ and } x_{T-1} = x. \qquad (2.6)$$
If the functions $V^{(\sigma_T)i}(T, x)$ for all $\sigma_T \in \{1,2,\ldots,\eta_T\}$ characterized in Lemma 2.1 exist, the subgame (2.5)-(2.6) can be expressed as a game in which player $i$ seeks to maximize the expected payoff
$$E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n; \theta_{T-1}^{\sigma_{T-1}}] + \sum_{\sigma_T=1}^{\eta_T} \lambda_T^{\sigma_T} V^{(\sigma_T)i}[T, f_{T-1}(x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n) + \vartheta_{T-1}] \Big\}, \quad \text{for } i \in N, \qquad (2.7)$$
using his control $u_{T-1}^i$.
A set of strategies $\{\phi_{T-1}^{(\sigma_{T-1})i*}(x) \in \Gamma^i$, for $i \in N\}$ constitutes a Nash equilibrium solution to the subgame (2.7) if the following conditions are satisfied:
$$V^{(\sigma_{T-1})i}(T-1, x) = E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, \phi_{T-1}^{(\sigma_{T-1})1*}(x), \phi_{T-1}^{(\sigma_{T-1})2*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})n*}(x); \theta_{T-1}^{\sigma_{T-1}}] + \sum_{\sigma_T=1}^{\eta_T} \lambda_T^{\sigma_T} V^{(\sigma_T)i}\big[T, f_{T-1}[x, \phi_{T-1}^{(\sigma_{T-1})1*}(x), \phi_{T-1}^{(\sigma_{T-1})2*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})n*}(x)] + \vartheta_{T-1}\big] \Big\}$$
$$\geq E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, \phi_{T-1}^{(\sigma_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})(i-1)*}(x), \phi_{T-1}^{(\sigma_{T-1})i}(x), \phi_{T-1}^{(\sigma_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})n*}(x); \theta_{T-1}^{\sigma_{T-1}}] + \sum_{\sigma_T=1}^{\eta_T} \lambda_T^{\sigma_T} V^{(\sigma_T)i}[T, \tilde{f}_{T-1}^i(x, \phi_{T-1}^{(\sigma_{T-1})i}(x)) + \vartheta_{T-1}] \Big\}, \quad \text{for } i \in N, \qquad (2.8)$$
where $\tilde{f}_{T-1}^i(x, \phi_{T-1}^{(\sigma_{T-1})i}(x)) = f_{T-1}[x, \phi_{T-1}^{(\sigma_{T-1})1*}(x), \phi_{T-1}^{(\sigma_{T-1})2*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})(i-1)*}(x), \phi_{T-1}^{(\sigma_{T-1})i}(x), \phi_{T-1}^{(\sigma_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})n*}(x)]$.
A Nash equilibrium of the subgame (2.7) can be characterized by the following lemma.

Lemma 2.2. A set of strategies $\{u_{T-1}^{(\sigma_{T-1})i*} = \phi_{T-1}^{(\sigma_{T-1})i*}(x) \in \Gamma^i$, for $i \in N\}$ provides a Nash equilibrium solution to the subgame (2.7) if there exist functions $V^{(\sigma_T)i}(T, x)$, for $i \in N$ and $\sigma_T \in \{1,2,\ldots,\eta_T\}$, characterized in Lemma 2.1, and functions $V^{(\sigma_{T-1})i}(T-1, x)$, for $i \in N$, such that the following conditions are satisfied:
$$V^{(\sigma_{T-1})i}(T-1, x) = \max_{u_{T-1}^{(\sigma_{T-1})i}} E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, \phi_{T-1}^{(\sigma_{T-1})1*}(x), \phi_{T-1}^{(\sigma_{T-1})2*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})(i-1)*}(x), u_{T-1}^{(\sigma_{T-1})i}, \phi_{T-1}^{(\sigma_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})n*}(x); \theta_{T-1}^{\sigma_{T-1}}] + \sum_{\sigma_T=1}^{\eta_T} \lambda_T^{\sigma_T} V^{(\sigma_T)i}[T, \tilde{f}_{T-1}^i(x, u_{T-1}^{(\sigma_{T-1})i}) + \vartheta_{T-1}] \Big\}, \quad \text{for } i \in N, \qquad (2.9)$$
where $\tilde{f}_{T-1}^i(x, u_{T-1}^{(\sigma_{T-1})i}) = f_{T-1}[x, \phi_{T-1}^{(\sigma_{T-1})1*}(x), \phi_{T-1}^{(\sigma_{T-1})2*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})(i-1)*}(x), u_{T-1}^{(\sigma_{T-1})i}, \phi_{T-1}^{(\sigma_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\sigma_{T-1})n*}(x)]$.

Proof. The conditions in Lemma 2.1 and the system of equations in (2.9) satisfy the standard discrete-time stochastic dynamic programming property and the Nash property for each player $i \in N$. Hence a Nash equilibrium of the subgame (2.7) is characterized. ∎
In particular, $V^{(\sigma_{T-1})i}(T-1, x)$, if it exists, yields player $i$'s expected game equilibrium payoff in the subgame starting at stage $T-1$ given that $\theta_{T-1}^{\sigma_{T-1}}$ occurs and $x_{T-1} = x$.
Consider the subgame starting at stage $t \in \{T-2, T-3, \ldots, 1\}$ when $\theta_t^{\sigma_t} \in \{\theta_t^1, \theta_t^2, \ldots, \theta_t^{\eta_t}\}$ occurs and $x_t = x$, in which player $i \in N$ maximizes his expected payoff
$$E_{\vartheta_t, \vartheta_{t+1}, \ldots, \vartheta_T;\, \theta_{t+1}, \ldots, \theta_T}\Big\{ g_t^i[x, u_t^1, u_t^2, \ldots, u_t^n; \theta_t^{\sigma_t}] + \sum_{\zeta=t+1}^{T} g_\zeta^i[x_\zeta, u_\zeta^1, u_\zeta^2, \ldots, u_\zeta^n; \theta_\zeta] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (2.10)$$
subject to
$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n) + \vartheta_k, \quad \text{for } k \in \{t, t+1, \ldots, T\} \text{ and } x_t = x. \qquad (2.11)$$
Following the above analysis, the subgame (2.10)-(2.11) can be expressed as a game in which player $i \in N$ maximizes his expected payoff
$$E_{\vartheta_t}\Big\{ g_t^i[x, u_t^1, u_t^2, \ldots, u_t^n; \theta_t^{\sigma_t}] + \sum_{\sigma_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\sigma_{t+1}} V^{(\sigma_{t+1})i}[t+1, f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t] \Big\}, \quad \text{for } i \in N, \qquad (2.12)$$
with his control $u_t^i$, where $V^{(\sigma_{t+1})i}[t+1, f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t]$ is player $i$'s expected game equilibrium payoff in the subgame starting at stage $t+1$ given that $\theta_{t+1}^{\sigma_{t+1}}$ occurs and $x_{t+1} = f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t$.
A set of strategies $\{\phi_t^{(\sigma_t)i*}(x) \in \Gamma^i$, for $i \in N\}$ constitutes a Nash equilibrium solution to the subgame (2.12) if the following conditions are satisfied:
$$V^{(\sigma_t)i}(t, x) = E_{\vartheta_t}\Big\{ g_t^i[x, \phi_t^{(\sigma_t)1*}(x), \phi_t^{(\sigma_t)2*}(x), \ldots, \phi_t^{(\sigma_t)n*}(x); \theta_t^{\sigma_t}] + \sum_{\sigma_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\sigma_{t+1}} V^{(\sigma_{t+1})i}\big[t+1, f_t[x, \phi_t^{(\sigma_t)1*}(x), \phi_t^{(\sigma_t)2*}(x), \ldots, \phi_t^{(\sigma_t)n*}(x)] + \vartheta_t\big] \Big\}$$
$$\geq E_{\vartheta_t}\Big\{ g_t^i[x, \phi_t^{(\sigma_t)1*}(x), \ldots, \phi_t^{(\sigma_t)(i-1)*}(x), \phi_t^{(\sigma_t)i}(x), \phi_t^{(\sigma_t)(i+1)*}(x), \ldots, \phi_t^{(\sigma_t)n*}(x); \theta_t^{\sigma_t}] + \sum_{\sigma_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\sigma_{t+1}} V^{(\sigma_{t+1})i}[t+1, \tilde{f}_t^i(x, \phi_t^{(\sigma_t)i}(x)) + \vartheta_t] \Big\}, \quad \text{for } i \in N,$$
where $\tilde{f}_t^i(x, \phi_t^{(\sigma_t)i}(x)) = f_t[x, \phi_t^{(\sigma_t)1*}(x), \phi_t^{(\sigma_t)2*}(x), \ldots, \phi_t^{(\sigma_t)(i-1)*}(x), \phi_t^{(\sigma_t)i}(x), \phi_t^{(\sigma_t)(i+1)*}(x), \ldots, \phi_t^{(\sigma_t)n*}(x)]$.
A Nash equilibrium solution for the game (2.1)-(2.2) can be characterized as follows.

Theorem 2.1. A set of strategies $\{u_t^{(\sigma_t)i*} = \phi_t^{(\sigma_t)i*}(x) \in \Gamma^i$, for $\sigma_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T\}$ and $i \in N\}$ constitutes a Nash equilibrium solution to the game (2.1)-(2.2) if there exist functions $V^{(\sigma_t)i}(t, x)$, for $\sigma_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T\}$ and $i \in N$, such that the following recursive relations are satisfied:
$$V^{(\sigma_T)i}(T+1, x) = q^i(x),$$
$$V^{(\sigma_T)i}(T, x) = \max_{u_T^{(\sigma_T)i}} E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\sigma_T)1*}(x), \phi_T^{(\sigma_T)2*}(x), \ldots, \phi_T^{(\sigma_T)(i-1)*}(x), u_T^{(\sigma_T)i}, \phi_T^{(\sigma_T)(i+1)*}(x), \ldots, \phi_T^{(\sigma_T)n*}(x); \theta_T^{\sigma_T}] + V^{(\sigma_T)i}[T+1, \tilde{f}_T^i(x, u_T^{(\sigma_T)i}) + \vartheta_T] \Big\},$$
$$V^{(\sigma_t)i}(t, x) = \max_{u_t^{(\sigma_t)i}} E_{\vartheta_t}\Big\{ g_t^i[x, \phi_t^{(\sigma_t)1*}(x), \phi_t^{(\sigma_t)2*}(x), \ldots, \phi_t^{(\sigma_t)(i-1)*}(x), u_t^{(\sigma_t)i}, \phi_t^{(\sigma_t)(i+1)*}(x), \ldots, \phi_t^{(\sigma_t)n*}(x); \theta_t^{\sigma_t}] + \sum_{\sigma_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\sigma_{t+1}} V^{(\sigma_{t+1})i}[t+1, \tilde{f}_t^i(x, u_t^{(\sigma_t)i}) + \vartheta_t] \Big\};$$
for $\sigma_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T-1\}$ and $i \in N$, $\qquad (2.13)$
where $\tilde{f}_t^i(x, u_t^{(\sigma_t)i}) = f_t[x, \phi_t^{(\sigma_t)1*}(x), \phi_t^{(\sigma_t)2*}(x), \ldots, \phi_t^{(\sigma_t)(i-1)*}(x), u_t^{(\sigma_t)i}, \phi_t^{(\sigma_t)(i+1)*}(x), \ldots, \phi_t^{(\sigma_t)n*}(x)]$, for $t \in \{1,2,\ldots,T\}$.
Proof. The results in (2.13) characterizing the game equilibrium in stage $T$ and stage $T-1$ are proved in Lemma 2.1 and Lemma 2.2. Invoking the subgame in stage $t \in \{1,2,\ldots,T-2\}$ as expressed in (2.12), the results in (2.13) satisfy the optimality conditions in stochastic dynamic programming and the Nash equilibrium property for each player in each of these subgames. Therefore, a feedback Nash equilibrium of the game (2.1)-(2.2) is characterized. ∎

Theorem 2.1 is the discrete-time analog of the Nash equilibrium in the continuous-time randomly furcating stochastic differential games in Yeung (2001 and 2003) and Petrosyan and Yeung (2007).
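When the state is confined to a one-dimensional grid, the recursive relations (2.13) can be evaluated directly by backward induction, with the stage-wise Nash conditions approximated by iterated best responses over a discretized control set. The sketch below is a minimal numerical illustration under entirely hypothetical functional forms; the expectation over $\vartheta_t$ is replaced by an average over a small sample, and linear interpolation stands in for evaluating $V^{(\sigma_{t+1})i}$ off the grid.

```python
import numpy as np

T, n = 3, 2
x_grid = np.linspace(-2.0, 2.0, 21)            # discretized state space X (hypothetical)
u_grid = np.linspace(-1.0, 1.0, 11)            # discretized control set U^i (hypothetical)
theta_vals, theta_prob = [1.0, 1.5], [0.6, 0.4]
noise = [-0.1, 0.0, 0.1]                       # small sample standing in for vartheta_t

def f(x, u):            # transition f_t in (2.1)
    return 0.9 * x + sum(u)

def g(i, x, u, theta):  # stage payoff g_t^i
    return theta * x * u[i] - 0.5 * u[i] ** 2

def q(i, x):            # terminal payment q^i
    return 0.5 * x

def cont(i, t, x, u, V):
    """Expected continuation payoff: average over vartheta_t, sum over theta_{t+1}."""
    total = 0.0
    for eps in noise:
        xn = f(x, u) + eps
        if t == T:
            total += q(i, xn)
        else:
            total += sum(p * np.interp(xn, x_grid, V[t + 1][s][i])
                         for s, p in enumerate(theta_prob))
    return total / len(noise)

V = {}                                          # V[t][sigma][i] approximates V^{(sigma_t)i}(t, .) on x_grid
for t in range(T, 0, -1):
    V[t] = {}
    for s, theta in enumerate(theta_vals):
        u_star = np.zeros((n, len(x_grid)))     # current guess of phi_t^{(sigma_t)i*}(x)
        for _ in range(10):                     # iterate best responses toward a fixed point
            for i in range(n):
                for jx, x in enumerate(x_grid):
                    cands = [[c if j == i else u_star[j, jx] for j in range(n)] for c in u_grid]
                    vals = [g(i, x, u, theta) + cont(i, t, x, u, V) for u in cands]
                    u_star[i, jx] = u_grid[int(np.argmax(vals))]
        V[t][s] = [np.array([g(i, x, u_star[:, jx], theta) + cont(i, t, x, u_star[:, jx], V)
                             for jx, x in enumerate(x_grid)]) for i in range(n)]
print("V^{(1)1}(1, x=0) is approximately", np.interp(0.0, x_grid, V[1][0][0]))
```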
3. Dynamic Cooperation
Now consider the case when the players agree to cooperate and distribute the joint
payoff among themselves according to an optimality principle. Two essential
properties that a cooperative scheme has to satisfy are group optimality and individual
rationality. Group optimality ensures that all potential gains from cooperation are
captured. Failure to fulfill group optimality leads to the condition where the
participants prefer to deviate from the agreed upon solution plan in order to extract the
unexploited gains. Individual rationality is required to hold so that the payoff
allocated to an economic agent under cooperation will be no less than its
noncooperative payoff.
Failure to guarantee individual rationality leads to the
condition where the concerned participants would reject the agreed upon solution plan
and play noncooperatively.
3.1. Group Optimality
Maximizing the players' expected joint payoff guarantees group optimality in a game where payoffs are transferable. To maximize their expected joint payoff the players have to solve the discrete-time stochastic dynamic programming problem of maximizing
$$E_{\vartheta_1,\vartheta_2,\ldots,\vartheta_T;\,\theta_1,\theta_2,\ldots,\theta_T}\Big\{ \sum_{j=1}^{n} \sum_{k=1}^{T} g_k^j[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k] + \sum_{j=1}^{n} q^j(x_{T+1}) \Big\} \qquad (3.1)$$
subject to (2.1).
The stochastic control problem (2.1)-(3.1) can be regarded as a single-player case of the game problem (2.1)-(2.2) with $n = 1$ and the payoff being the sum of all the players' payoffs. In a stochastic dynamic framework, again a strategy space with the state-dependent property has to be considered. In particular, a pre-specified class $\hat{\Gamma}^i$ of mappings $\psi_t^{(\sigma_t)i}(\cdot): X \to U^i$ with the property $u_t^{(\sigma_t)i} = \psi_t^{(\sigma_t)i}(x)$, for $i \in N$, $\sigma_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T\}$, is the strategy space of player $i$, and each of its elements is a permissible strategy.
To solve the dynamic programming problem (2.1) and (3.1), we first consider the problem starting at stage $T$. If $\theta_T^{\sigma_T} \in \{\theta_T^1, \theta_T^2, \ldots, \theta_T^{\eta_T}\}$ has occurred at stage $T$ and the state $x_T = x$, the control problem becomes:
$$\max_{u_T^1, u_T^2, \ldots, u_T^n} E_{\vartheta_T}\Big\{ \sum_{j=1}^{n} g_T^j[x, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\sigma_T}] + \sum_{j=1}^{n} q^j(x_{T+1}) \Big\} \qquad (3.2)$$
subject to
$$x_{T+1} = f_T(x, u_T^1, u_T^2, \ldots, u_T^n) + \vartheta_T. \qquad (3.3)$$
An optimal solution to the stochastic control problem (3.2)-(3.3) can be characterized by the following lemma.

Lemma 3.1. A set of controls $\{u_T^{(\sigma_T)i*} = \psi_T^{(\sigma_T)i*}(x) \in \hat{\Gamma}^i$, for $i \in N\}$ provides an optimal solution to the control problem (3.2)-(3.3) if there exist functions $W^{(\sigma_T)}(T, x)$, for $\sigma_T \in \{1,2,\ldots,\eta_T\}$, such that the following conditions are satisfied:
$$W^{(\sigma_T)}(T, x) = \max_{u_T^{(\sigma_T)1}, u_T^{(\sigma_T)2}, \ldots, u_T^{(\sigma_T)n}} E_{\vartheta_T}\Big\{ \sum_{j=1}^{n} g_T^j[x, u_T^{(\sigma_T)1}, u_T^{(\sigma_T)2}, \ldots, u_T^{(\sigma_T)n}; \theta_T^{\sigma_T}] + W^{(\sigma_T)}[T+1, f_T(x, u_T^{(\sigma_T)1}, u_T^{(\sigma_T)2}, \ldots, u_T^{(\sigma_T)n}) + \vartheta_T] \Big\},$$
$$W^{(\sigma_T)}(T+1, x) = \sum_{j=1}^{n} q^j(x). \qquad (3.4)$$

Proof. The system of equations in (3.4) satisfies the standard discrete-time stochastic dynamic programming property. One can see the detailed proof of the results in Basar and Olsder (1999). ∎
Using Lemma 3.1, one can characterize the functions $W^{(\sigma_T)}(T, x)$ for all $\theta_T^{\sigma_T} \in \{\theta_T^1, \theta_T^2, \ldots, \theta_T^{\eta_T}\}$, if they exist. In particular, $W^{(\sigma_T)}(T, x)$ yields the expected cooperative payoff in the cooperation duration starting at stage $T$ given that $\theta_T^{\sigma_T}$ occurs and $x_T = x$.

Following the analysis in Section 2, the control problem starting at stage $t$ when $\theta_t^{\sigma_t} \in \{\theta_t^1, \theta_t^2, \ldots, \theta_t^{\eta_t}\}$ occurs and $x_t = x$ can be expressed as:
$$\max_{u_t^1, u_t^2, \ldots, u_t^n} E_{\vartheta_t}\Big\{ \sum_{j=1}^{n} g_t^j[x, u_t^1, u_t^2, \ldots, u_t^n; \theta_t^{\sigma_t}] + \sum_{\sigma_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\sigma_{t+1}} W^{(\sigma_{t+1})}[t+1, f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t] \Big\}, \qquad (3.5)$$
where $W^{(\sigma_{t+1})}[t+1, f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t]$ is the expected optimal cooperative payoff in the control problem starting at stage $t+1$ when $\theta_{t+1}^{\sigma_{t+1}} \in \{\theta_{t+1}^1, \theta_{t+1}^2, \ldots, \theta_{t+1}^{\eta_{t+1}}\}$ occurs and $x_{t+1} = f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t$.
An optimal solution for the stochastic control problem (2.1) and (3.1) can be characterized as follows.

Theorem 3.1. A set of controls $\{u_t^{(\sigma_t)i*} = \psi_t^{(\sigma_t)i*}(x) \in \hat{\Gamma}^i$, for $\sigma_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T\}$ and $i \in N\}$ provides an optimal solution to the stochastic control problem (2.1) and (3.1) if there exist functions $W^{(\sigma_t)}(t, x)$, for $\sigma_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T\}$, such that the following recursive relations are satisfied:
$$W^{(\sigma_T)}(T+1, x) = \sum_{j=1}^{n} q^j(x),$$
$$W^{(\sigma_T)}(T, x) = \max_{u_T^{(\sigma_T)1}, u_T^{(\sigma_T)2}, \ldots, u_T^{(\sigma_T)n}} E_{\vartheta_T}\Big\{ \sum_{j=1}^{n} g_T^j[x, u_T^{(\sigma_T)1}, u_T^{(\sigma_T)2}, \ldots, u_T^{(\sigma_T)n}; \theta_T^{\sigma_T}] + W^{(\sigma_T)}[T+1, f_T(x, u_T^{(\sigma_T)1}, u_T^{(\sigma_T)2}, \ldots, u_T^{(\sigma_T)n}) + \vartheta_T] \Big\},$$
$$W^{(\sigma_t)}(t, x) = \max_{u_t^{(\sigma_t)1}, u_t^{(\sigma_t)2}, \ldots, u_t^{(\sigma_t)n}} E_{\vartheta_t}\Big\{ \sum_{j=1}^{n} g_t^j[x, u_t^{(\sigma_t)1}, u_t^{(\sigma_t)2}, \ldots, u_t^{(\sigma_t)n}; \theta_t^{\sigma_t}] + \sum_{\sigma_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\sigma_{t+1}} W^{(\sigma_{t+1})}[t+1, f_t(x, u_t^{(\sigma_t)1}, u_t^{(\sigma_t)2}, \ldots, u_t^{(\sigma_t)n}) + \vartheta_t] \Big\},$$
for $\sigma_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T-1\}$. $\qquad (3.6)$

Proof. The results in (3.6) characterizing the optimal solution in stage $T$ are proved in Lemma 3.1. Invoking the specification of the control problem starting in stage $t \in \{1,2,\ldots,T-1\}$ as expressed in (3.5), the results in (3.6) satisfy the optimality conditions in stochastic dynamic programming. Therefore, an optimal solution of the stochastic control problem (2.1) and (3.1) is characterized. ∎

Theorem 3.1 is the discrete-time analog of the optimal cooperative scheme in randomly furcating stochastic differential games in Petrosyan and Yeung (2007).
Substituting the optimal controls $\{\psi_k^{(\sigma_k)i*}(x)$, for $k \in \{1,2,\ldots,T\}$ and $i \in N\}$ into the state dynamics (2.1), one can obtain the dynamics of the cooperative trajectory as:
$$x_{k+1} = f_k(x_k, \psi_k^{(\sigma_k)1*}(x_k), \psi_k^{(\sigma_k)2*}(x_k), \ldots, \psi_k^{(\sigma_k)n*}(x_k)) + \vartheta_k, \quad \text{if } \theta_k^{\sigma_k} \text{ occurs}, \qquad (3.7)$$
for $k \in \{1,2,\ldots,T\}$, $\sigma_k \in \{1,2,\ldots,\eta_k\}$ and $x_1 = x^0$.

We use $X_k^*$ to denote the set of realizable values of $x_k^*$ at stage $k$ generated by (3.7). The term $x_k^* \in X_k^*$ is used to denote an element in $X_k^*$.

The term $W^{(\sigma_k)}(k, x_k^*)$ gives the expected total cooperative payoff over the stages from $k$ to $T$ if $\theta_k^{\sigma_k}$ occurs and $x_k^* \in X_k^*$ is realized at stage $k$.
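A realization of the cooperative trajectory (3.7) can be generated by simple forward simulation once the optimal cooperative controls are available. The fragment below is a sketch of such a simulation; the transition function, the shock distribution, and the placeholder rule `psi_star` standing in for the controls delivered by Theorem 3.1 are hypothetical.

```python
import numpy as np

T = 4
rng = np.random.default_rng(1)
theta_vals, theta_prob = [1.0, 1.5], [0.6, 0.4]     # hypothetical range and probabilities of theta_k

def f(k, x, u):                  # transition f_k in (2.1), hypothetical linear form
    return 0.9 * x + sum(u)

def psi_star(i, k, sigma, x):    # placeholder for the optimal cooperative control psi_k^{(sigma_k)i*}(x)
    return 0.2 * theta_vals[sigma] * x

x, trajectory = 1.0, [1.0]
for k in range(1, T + 1):
    sigma = 0 if k == 1 else rng.choice(len(theta_vals), p=theta_prob)  # realized theta_k^{sigma_k}
    u = [psi_star(i, k, sigma, x) for i in range(2)]
    x = f(k, x, u) + rng.normal(scale=0.1)           # additive shock vartheta_k, as in (3.7)
    trajectory.append(x)
print("one realized cooperative trajectory x_1*, ..., x_{T+1}*:", np.round(trajectory, 3))
```

Repeating the simulation many times traces out the realizable sets $X_k^*$ used below.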
3.2. Individual Rationality
The players then have to agree to an optimality principle in distributing the total
cooperative payoff among themselves. For individual rationality to be upheld the
expected payoffs a player receives under cooperation have to be no less than his expected noncooperative payoff along the cooperative state trajectory $\{x_k^*\}_{k=1}^{T+1}$. For instance, (i) the players may share the excess of the total expected cooperative payoff over the expected sum of individual noncooperative payoffs equally, or (ii) they may share the total expected cooperative payoff proportionally to their expected noncooperative payoffs.

Let $\xi^{(\sigma_k)}(k, x_k^*) = [\xi^{(\sigma_k)1}(k, x_k^*), \xi^{(\sigma_k)2}(k, x_k^*), \ldots, \xi^{(\sigma_k)n}(k, x_k^*)]$ denote the imputation vector guiding the distribution of the total expected cooperative payoff under the agreed-upon optimality principle along the cooperative trajectory given that $\theta_k^{\sigma_k}$ has occurred in stage $k$, for $\sigma_k \in \{1,2,\ldots,\eta_k\}$ and $k \in \{1,2,\ldots,T\}$. In particular, the imputation $\xi^{(\sigma_k)i}(k, x_k^*)$ gives the expected cumulative payments that player $i$ will receive from stage $k$ to stage $T+1$ under cooperation.

If, for example, the optimality principle specifies that the players share the excess of the total cooperative payoff over the sum of individual noncooperative payoffs equally, then the imputation to player $i$ becomes:
$$\xi^{(\sigma_k)i}(k, x_k^*) = V^{(\sigma_k)i}(k, x_k^*) + \frac{1}{n}\Big[ W^{(\sigma_k)}(k, x_k^*) - \sum_{j=1}^{n} V^{(\sigma_k)j}(k, x_k^*) \Big], \qquad (3.8)$$
for $i \in N$ and $k \in \{1,2,\ldots,T\}$.
For individual rationality to be maintained throughout all the stages $k \in \{1,2,\ldots,T\}$, it is required that:
$$\xi^{(\sigma_k)i}(k, x_k^*) \geq V^{(\sigma_k)i}(k, x_k^*), \quad \text{for } i \in N, \; \sigma_k \in \{1,2,\ldots,\eta_k\} \text{ and } k \in \{1,2,\ldots,T\}. \qquad (3.9)$$
To satisfy group optimality, the imputation vector has to satisfy
$$W^{(\sigma_k)}(k, x_k^*) = \sum_{j=1}^{n} \xi^{(\sigma_k)j}(k, x_k^*), \quad \text{for } \sigma_k \in \{1,2,\ldots,\eta_k\} \text{ and } k \in \{1,2,\ldots,T\}. \qquad (3.10)$$
Hence, a valid imputation $\xi^{(\sigma_k)i}(k, x_k^*)$, for $i \in N$, $\sigma_k \in \{1,2,\ldots,\eta_k\}$ and $k \in \{1,2,\ldots,T\}$, has to satisfy conditions (3.9) and (3.10).
4. Subgame Consistent Solutions and Payment Mechanism
To guarantee dynamical stability in a stochastic dynamic cooperation scheme, the solution has to satisfy the property of subgame consistency in addition to group optimality and individual rationality. Under a subgame-consistent cooperative solution, an extension of the solution policy to any subgame starting at a later time with any feasible state brought about by prior optimal behavior would remain optimal. In particular, subgame consistency ensures that as the game proceeds players are guided by the same optimality principle at each stage of the game, and hence do not possess incentives to deviate from the previously adopted optimal behavior. Therefore, for subgame consistency to be satisfied, the imputation according to the original optimality principle has to be maintained at all the $T$ stages along the cooperative trajectory $\{x_k^*\}_{k=1}^{T}$. In other words, the imputation
$$\xi^{(\sigma_k)}(k, x_k^*) = [\xi^{(\sigma_k)1}(k, x_k^*), \xi^{(\sigma_k)2}(k, x_k^*), \ldots, \xi^{(\sigma_k)n}(k, x_k^*)], \qquad (4.1)$$
for $\sigma_k \in \{1,2,\ldots,\eta_k\}$, $x_k^* \in X_k^*$ and $k \in \{1,2,\ldots,T\}$, has to be upheld.
4.1. Payoff Distribution Procedure
Following the analysis of Yeung and Petrosyan (2010), we formulate a Payoff Distribution Procedure (PDP) so that the agreed imputations (4.1) can be realized. Let $B_k^{(\sigma_k)i}(x_k^*)$ denote the payment that player $i$ will receive at stage $k$ under the cooperative agreement if $\theta_k^{\sigma_k} \in \{\theta_k^1, \theta_k^2, \ldots, \theta_k^{\eta_k}\}$ occurs and $x_k^* \in X_k^*$ is realized at stage $k \in \{1,2,\ldots,T\}$. The payment scheme $\{B_k^{(\sigma_k)i}(x_k^*)$ contingent upon the event $\theta_k^{\sigma_k}$ and state $x_k^*$, for $k \in \{1,2,\ldots,T\}\}$ constitutes a PDP in the sense that the imputation to player $i$ over the stages 1 to $T+1$ can be expressed as:
$$\xi^{(\sigma_1)i}(1, x_1^0) = B_1^{(\sigma_1)i}(x_1^0) + E_{\vartheta_1,\vartheta_2,\ldots,\vartheta_T;\,\theta_2,\theta_3,\ldots,\theta_T}\Big\{ \sum_{\zeta=2}^{T} B_\zeta^{(\sigma_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}, \qquad (4.2)$$
for $i \in N$.
However, according to the original optimality principle in (4.1), if $\theta_k^{\sigma_k}$ occurs and $x_k^* \in X_k^*$ is realized at stage $k$, the imputation to player $i$ is $\xi^{(\sigma_k)i}(k, x_k^*)$. For subgame consistency to be satisfied, the imputation according to the original optimality principle has to be maintained at all the $T$ stages along the cooperative trajectory $\{x_k^*\}_{k=1}^{T}$. Therefore, to guarantee subgame consistency, the payment scheme $\{B_k^{(\sigma_k)i}(x_k^*)\}$ in (4.2) has to satisfy the conditions
$$\xi^{(\sigma_k)i}(k, x_k^*) = B_k^{(\sigma_k)i}(x_k^*) + E_{\vartheta_k,\vartheta_{k+1},\ldots,\vartheta_T;\,\theta_{k+1},\theta_{k+2},\ldots,\theta_T}\Big\{ \sum_{\zeta=k+1}^{T} B_\zeta^{(\sigma_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}, \qquad (4.3)$$
for $i \in N$ and $k \in \{1,2,\ldots,T\}$.

Using (4.3) one can obtain that $\xi^{(\sigma_{T+1})i}(T+1, x_{T+1}^*)$ equals $q^i(x_{T+1}^*)$ with probability 1. Crucial to the formulation of a subgame consistent solution is the derivation of a payment scheme $\{B_k^{(\sigma_k)i}(x_k^*)$, for $i \in N$, $\sigma_k \in \{1,2,\ldots,\eta_k\}$, $x_k^* \in X_k^*$ and $k \in \{1,2,\ldots,T\}\}$ so that the imputation in (4.3) can be realized. This will be done in the sequel.

For notational convenience in deriving such a payment scheme, we denote the vector $[\psi_k^{(\sigma_k)1*}(x), \psi_k^{(\sigma_k)2*}(x), \ldots, \psi_k^{(\sigma_k)n*}(x)]$ by $\psi_k^{(\sigma_k)*}(x)$. A theorem deriving a subgame consistent PDP can be established as follows.
Theorem 4.1. A payment equaling
$$B_k^{(\sigma_k)i}(x_k^*) = \xi^{(\sigma_k)i}(k, x_k^*) - E_{\vartheta_k}\Big\{ \sum_{\sigma_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\sigma_{k+1}} \xi^{(\sigma_{k+1})i}[k+1, f_k(x_k^*, \psi_k^{(\sigma_k)*}(x_k^*)) + \vartheta_k] \Big\}, \qquad (4.4)$$
for $i \in N$,
given to player $i$ at stage $k \in \{1,2,\ldots,T\}$, if $\theta_k^{\sigma_k}$ occurs and $x_k^* \in X_k^*$, would lead to the realization of the imputation in (4.3).
Proof. Following (4.3) one can express the term $\xi^{(\sigma_{k+1})i}(k+1, x_{k+1}^*)$ as
$$\xi^{(\sigma_{k+1})i}(k+1, x_{k+1}^*) = B_{k+1}^{(\sigma_{k+1})i}(x_{k+1}^*) + E_{\vartheta_{k+1},\vartheta_{k+2},\ldots,\vartheta_T;\,\theta_{k+2},\theta_{k+3},\ldots,\theta_T}\Big\{ \sum_{\zeta=k+2}^{T} B_\zeta^{(\sigma_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}. \qquad (4.5)$$
Using (4.5), one can obtain
$$E_{\vartheta_k,\vartheta_{k+1},\ldots,\vartheta_T;\,\theta_{k+1},\theta_{k+2},\ldots,\theta_T}\Big\{ \sum_{\zeta=k+1}^{T} B_\zeta^{(\sigma_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}$$
$$= E_{\vartheta_k}\Big\{ \sum_{\sigma_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\sigma_{k+1}} \Big[ B_{k+1}^{(\sigma_{k+1})i}(x_{k+1}^*) + E_{\vartheta_{k+1},\vartheta_{k+2},\ldots,\vartheta_T;\,\theta_{k+2},\theta_{k+3},\ldots,\theta_T}\Big\{ \sum_{\zeta=k+2}^{T} B_\zeta^{(\sigma_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\} \Big] \Big\}$$
$$= E_{\vartheta_k}\Big\{ \sum_{\sigma_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\sigma_{k+1}} \xi^{(\sigma_{k+1})i}[k+1, f_k(x_k^*, \psi_k^{(\sigma_k)*}(x_k^*)) + \vartheta_k] \Big\}. \qquad (4.6)$$
Substituting the term $E_{\vartheta_k,\vartheta_{k+1},\ldots,\vartheta_T;\,\theta_{k+1},\theta_{k+2},\ldots,\theta_T}\big\{ \sum_{\zeta=k+1}^{T} B_\zeta^{(\sigma_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \big\}$ in (4.3) by the last expression in (4.6) yields:
$$\xi^{(\sigma_k)i}(k, x_k^*) = B_k^{(\sigma_k)i}(x_k^*) + E_{\vartheta_k}\Big\{ \sum_{\sigma_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\sigma_{k+1}} \xi^{(\sigma_{k+1})i}[k+1, f_k(x_k^*, \psi_k^{(\sigma_k)*}(x_k^*)) + \vartheta_k] \Big\}. \qquad (4.7)$$
By re-arranging terms in (4.7) one can obtain:
$$B_k^{(\sigma_k)i}(x_k^*) = \xi^{(\sigma_k)i}(k, x_k^*) - E_{\vartheta_k}\Big\{ \sum_{\sigma_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\sigma_{k+1}} \xi^{(\sigma_{k+1})i}[k+1, f_k(x_k^*, \psi_k^{(\sigma_k)*}(x_k^*)) + \vartheta_k] \Big\}, \qquad (4.8)$$
for $i \in N$ and $k \in \{1,2,\ldots,T\}$, which is the payment prescribed in (4.4). Hence Theorem 4.1 follows. ∎
For a given imputation vector
$$\xi^{(\sigma_k)}(k, x_k^*) = [\xi^{(\sigma_k)1}(k, x_k^*), \xi^{(\sigma_k)2}(k, x_k^*), \ldots, \xi^{(\sigma_k)n}(k, x_k^*)],$$
for $\sigma_k \in \{1,2,\ldots,\eta_k\}$ and $k \in \{1,2,\ldots,T\}$, Theorem 4.1 can be used to derive the PDP that leads to the realization of this vector.
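Computationally, once the imputation functions $\xi^{(\sigma_{k+1})i}$, the cooperative strategies $\psi_k^{(\sigma_k)*}$, the transition $f_k$, and the distributions of $\theta_{k+1}$ and $\vartheta_k$ are in hand, the payment (4.4) can be approximated by Monte Carlo averaging over $\vartheta_k$. Every numerical ingredient in the sketch below is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_next = [0.6, 0.4]                 # lambda_{k+1}^{sigma_{k+1}} (hypothetical)

def xi(sigma, i, k, x):               # placeholder imputation xi^{(sigma_k)i}(k, x)
    return (1.0 + 0.5 * sigma) * x / (i + 1)

def f(x, u):                          # transition f_k (hypothetical)
    return 0.9 * x + sum(u)

def psi_star(x):                      # cooperative controls psi_k^{(sigma_k)*}(x) of the two players
    return [0.2 * x, 0.1 * x]

def B(i, k, sigma_k, x_star, n_draws=10000):
    """Approximate the stage-k payment to player i prescribed by (4.4)."""
    draws = rng.normal(scale=0.1, size=n_draws)              # samples of vartheta_k
    x_next = f(x_star, psi_star(x_star)) + draws             # simulated x_{k+1}
    expected_cont = sum(p * xi(s1, i, k + 1, x_next).mean()  # E_{vartheta_k} of the weighted sum
                        for s1, p in enumerate(lam_next))
    return xi(sigma_k, i, k, x_star) - expected_cont

print("B_k^{(sigma_k)i}(x_k^*) is approximately", B(i=0, k=1, sigma_k=0, x_star=1.0))
```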
4.2. Transfer Payments
When all players are using the cooperative strategies, the payoff that player $i$ will directly receive at stage $k$, given that $x_k^* \in X_k^*$ and $\theta_k^{\sigma_k}$ occurs, becomes
$$g_k^i[x_k^*, \psi_k^{(\sigma_k)1*}(x_k^*), \psi_k^{(\sigma_k)2*}(x_k^*), \ldots, \psi_k^{(\sigma_k)n*}(x_k^*); \theta_k^{\sigma_k}].$$
However, according to the agreed-upon imputation, player $i$ is to receive $B_k^{(\sigma_k)i}(x_k^*)$ at stage $k$ as given in Theorem 4.1. Therefore a side-payment
$$\pi_k^{(\sigma_k)i}(x_k^*) = B_k^{(\sigma_k)i}(x_k^*) - g_k^i[x_k^*, \psi_k^{(\sigma_k)1*}(x_k^*), \psi_k^{(\sigma_k)2*}(x_k^*), \ldots, \psi_k^{(\sigma_k)n*}(x_k^*); \theta_k^{\sigma_k}], \qquad (4.9)$$
for $k \in \{1,2,\ldots,T\}$ and $i \in N$,
will be given to player $i$ to yield the cooperative imputation $\xi^{(\sigma_k)i}(k, x_k^*)$.
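Given the payment prescribed by Theorem 4.1 and the payoff player $i$ collects directly when the cooperative strategies are played, the transfer (4.9) is a simple difference; the numbers below are purely illustrative.

```python
B_k_i = 2.4        # payment to player i at stage k prescribed by (4.4) (hypothetical value)
g_k_i = 1.9        # payoff g_k^i received directly under the cooperative strategies (hypothetical)

pi_k_i = B_k_i - g_k_i     # side-payment (4.9) transferred to player i at stage k
print("side-payment to player i at stage k:", pi_k_i)
```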
5. Concluding Remarks
This paper considers subgame-consistent cooperative solutions in randomly furcating stochastic dynamic games. This new approach widens the application of cooperative stochastic differential game theory to problems where future environments are not known with certainty. The extension of the analysis of continuous-time randomly furcating stochastic differential games to an analysis in discrete time is not only of theoretical interest but also of practical relevance in applications. In computer modeling and operations research, discrete-time analysis often proves to be more applicable and compatible with actual data. In the process of obtaining the main results for subgame consistent solutions, the Nash equilibrium for randomly furcating stochastic dynamic games and the optimal control for randomly furcating stochastic dynamic problems are also derived.

The analysis presented can be expanded in a couple of directions. First, the random event $\theta_k$ may follow more complex processes, such as a branching process. Second, the analysis can be readily applied to derive dynamically consistent solutions in randomly-furcating dynamic games in which the random variables $\vartheta_k$ are not present. In particular, the objective that player $i$ seeks to maximize becomes
$$E_{\theta_1,\theta_2,\ldots,\theta_T}\Big\{ \sum_{k=1}^{T} g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (5.1)$$
subject to the deterministic dynamics:
$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n). \qquad (5.2)$$
Following the analysis in Sections 3 and 4 and the proof of Theorem 4.1, a theorem deriving a subgame consistent PDP for the randomly-furcating dynamic game (5.1)-(5.2) can be established as follows.

Theorem 5.1. A payment equaling
$$B_k^{(\sigma_k)i}(x_k^*) = \xi^{(\sigma_k)i}(k, x_k^*) - \sum_{\sigma_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\sigma_{k+1}} \xi^{(\sigma_{k+1})i}[k+1, f_k(x_k^*, \psi_k^{(\sigma_k)*}(x_k^*))], \qquad (5.3)$$
for $i \in N$,
given to player $i$ at stage $k \in \{1,2,\ldots,T\}$, if $\theta_k^{\sigma_k}$ occurs and $x_k^* \in X_k^*$, would yield the PDP leading to a subgame consistent solution of the game (5.1)-(5.2).
Finally, since this is the first time that subgame consistent solutions are derived for randomly furcating dynamic games, further research along this line is expected.
Acknowledgements
This research was supported by the Research Grant of HKSYU and the EU TOCSIN
Project.
References
Basar, T.: Decentralized Multicriteria Optimization of Linear Stochastic Systems.
IEEE Transactions on Automatic Control, AC-23, 233-243 (1978)
Basar, T.: Information Structures and Equilibria in Dynamic Games. In: M. Aoki and
A. Marzollo (eds) New Trends in Dynamic System Theory and Economics. New
York and London, Academic Press, 3-55 (1979)
Basar, T., Ho, Y.C.: Informational Properties of The Nash Solutions of Two
Stochastic Nonzero-sum Games. Journal of Economic Theory, 7, 370-387 (1974)
Basar, T., Mintz, M.: On the Existence of Linear Saddle-Point Strategies for a Two-person Zero-Sum Stochastic Game. Proceedings of the IEEE 11th Conference on Decision and Control, New Orleans, LA, 1972, IEEE Computer Society Press, Los Alamitos, CA, pp. 188-192 (1972)
Basar, T., Mintz, M.: A Multistage pursuit-evasion Game that Admits a Gaussian
Random Process as a Maximum Control Policy. Stochastics, 1: 25-69 (1973)
Basar, T. and Olsder, G.J.: Dynamic Noncooperative Game Theory, Philadelphia,
SIAM (1999)
Bellman, R.: Dynamic programming, Princeton University Press, Princeton, NJ,
(1957)
Esteban-Bravo, M., Nogales, F.J.: Solving Dynamic Stochastic Economic Models by
Mathematical Programming Decomposition Methods. Computer and Operations
Research, 35: 226-240 (2008)
Nash, J.: Non-cooperative Games. Annals of Mathematics, 54(2), 286-295 (1951)
Petrosyan, L.A. and Yeung, D.W.K.: Subgame-consistent Cooperative Solutions in
Randomly-furcating Stochastic Differential Games, International Journal of
Mathematical and Computer Modelling (Special Issue on Lyapunov’s Methods
in Stability and Control), 45:1294-1307. (2007)
Smith, A.E., Zenou, Y.: A Discrete-Time Stochastic Model of Job Matching. Review
of Economic Dynamics, 6(1): 54-79 (2003)
Yeung, D.W.K.: Infinite Horizon Stochastic Differential Games with Branching
Payoffs, Journal of Optimization Theory and Applications, 111(2): 445-460
(2001)
Yeung, D.W.K.:
Randomly Furcating Stochastic Differential Games, in L.A.
Petrosyan and D.W.K. Yeung (eds.): ICM Millennium Lectures on Games,
Springer-Verlag, Berlin, pp.107-126. ISBN: 3-540-00615-X (2003)
Yeung, D.W.K. and Petrosyan, L.A.: Subgame Consistent Cooperative Solutions in
Stochastic Differential Games, Journal of Optimization Theory and Applications,
120: 651-666 (2004)
Yeung, D.W.K. and Petrosyan, L.A.: Subgame Consistent Solution of A Cooperative
Stochastic Differential Game With Nontransferable Payoffs, Journal of
Optimization Theory and Applications, 124(3): 701-724, (2005)
Yeung, D.W.K. and Petrosyan, L.A.: Cooperative Stochastic Differential Games,
Springer-Verlag, New York (2006)
Yeung, D.W.K. and Petrosyan, L.A.: Subgame Consistent Solutions for Cooperative
Stochastic Dynamic Games. Journal of Optimization Theory and Applications,
145(3): 579-596 (2010)