Subgame-consistent Solutions for Cooperative Stochastic Games with Randomly Furcating Payoffs

LEON A. PETROSYAN
Faculty of Applied Mathematics-Control Processes, St Petersburg State University, St Petersburg, 198904, Russia
spbuoasis7@petrlink.ru

DAVID W.K. YEUNG
Center of Game Theory, St Petersburg State University, St Petersburg, 198904, Russia
SRS Consortium for Advanced Study in Cooperative Dynamic Games, Hong Kong Shue Yan University, Hong Kong
dwkyeung@hksyu.edu

Abstract – In cooperative stochastic dynamic games a stringent condition – subgame consistency – is required for a dynamically stable solution. In particular, a cooperative solution is subgame consistent if an extension of the solution policy to a situation with a later starting time and any feasible state brought about by prior optimal behavior would remain optimal. The paradigm of randomly furcating stochastic games incorporates additional stochastic elements via randomly branching payoff structures in stochastic games. This paper considers subgame-consistent solutions for cooperative stochastic games with randomly furcating payoff structures. Analytically tractable payoff distribution procedures contingent upon specific random realizations of the state and payoff structure are derived. This is the first time that subgame-consistent solutions for randomly furcating stochastic games are obtained. The analysis widens the application of cooperative dynamic game theory to discrete-time problems in which the evolution of the state and future environments are not known with certainty.

Keywords – Cooperative stochastic games, subgame consistency, randomly furcating payoff structures.
AMS Subject Classifications – Primary 91A12, Secondary 91A25.

1. Introduction

The discipline of game theory studies decision making in an interactive environment. One particularly complex and fruitful branch of game theory is dynamic games, which investigates interactive decision making over time.
A property of decision making over time is that the future is inherently unknown and therefore (in the mathematical sense) uncertain. Yeung (2001 and 2003) introduced the class of randomly furcating stochastic differential games, which allows stochastic dynamics and random changes in future payoff structures. This extension allows random elements in future payoffs, which are prevalent in many practical situations like regional economic cooperation, corporate joint ventures and environmental management. Cooperative games suggest the possibility of socially optimal and group-efficient solutions to decision problems involving strategic action. In dynamic cooperative games with stochastic elements, a stringent condition – that of subgame consistency – is required for a dynamically stable cooperative solution. In particular, a cooperative solution is subgame-consistent if an extension of the solution policy to a situation with a later starting time and any feasible state brought about by prior optimal behavior would remain optimal. In particular, the property of subgame consistency ensures that, as the game proceeds, players are guided by the same optimality principle at each instant of time, and hence do not possess incentives to deviate from the previously adopted optimal behavior throughout the game. A rigorous framework for the study of subgame-consistent solutions in cooperative stochastic differential games was established in the work of Yeung and Petrosyan (2004, 2005 and 2006). A generalized theorem was developed for the derivation of an analytically tractable "payoff distribution procedure" which would lead to subgame-consistent solutions. Petrosyan and Yeung (2007) derived subgame consistent solutions for randomly furcating continuous-time stochastic differential games. In computer modeling and operations research studies, discrete-time analysis often proves to be more applicable and more compatible with actual data.
In stochastic dynamic games, Basar and Mintz (1972 and 1973) and Basar (1978) developed equilibrium solutions of linear-quadratic stochastic dynamic games with noisy observation. Basar and Ho (1974) and Basar (1979) examined informational properties of the Nash solutions of stochastic nonzero-sum games. Some applications of stochastic dynamic games can be found in Smith and Zenou (2003) and Esteban-Bravo and Nogales (2008). Yeung and Petrosyan (2010) derived subgame consistent solutions for cooperative stochastic dynamic games.

This paper develops the class of randomly furcating discrete-time stochastic games. The Nash equilibrium is characterized for noncooperative outcomes, and subgame-consistent cooperative solutions are derived for the cooperative paradigm. Discrete-time analytically tractable payoff distribution procedures contingent upon specific random realizations of the state and payoff structure are derived. This is the first time that subgame consistent solutions of randomly furcating cooperative stochastic dynamic games are examined. This new approach widens the application of cooperative dynamic game theory to discrete-time problems where the evolution of the state and future environments are not known with certainty.

The paper is organized as follows. The game formulation is given in Section 2. Group optimality and individual rationality under dynamic cooperation are discussed in Section 3. Subgame consistent solutions and the payment mechanism leading to the realization of these solutions are obtained in Section 4. Concluding remarks are provided in Section 5.

2. Game Formulation and Outcome

Consider the $T$-stage $n$-person nonzero-sum discrete-time dynamic game with initial state $x^0$.
The state space of the game is $X \subset R^m$ and the state dynamics of the game are characterized by the stochastic difference equation:

$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n) + \vartheta_k, \qquad (2.1)$$

for $k \in \{1,2,\ldots,T\}$ and $x_1 = x^0$, where $u_k^i \in U^i \subset R^{m_i}$ is the control vector of player $i$ at stage $k$, $x_k \in X$ is the state, and $\vartheta_k$ is a sequence of statistically independent random variables.

The payoff of player $i$ at stage $k$ is $g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k]$, which is affected by a random variable $\theta_k$. In particular, $\theta_k$, for $k \in \{1,2,\ldots,T\}$, are independent random variables with range $\{\theta_k^1, \theta_k^2, \ldots, \theta_k^{\eta_k}\}$ and corresponding probabilities $\{\lambda_k^1, \lambda_k^2, \ldots, \lambda_k^{\eta_k}\}$. In stage 1, it is known that $\theta_1$ equals $\theta_1^1$ with probability $\lambda_1^1 = 1$. The objective that player $i$ seeks to maximize is

$$E_{\vartheta_1, \vartheta_2, \ldots, \vartheta_T;\, \theta_1, \theta_2, \ldots, \theta_T}\Big\{ \sum_{k=1}^{T} g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in \{1,2,\ldots,n\} \equiv N, \qquad (2.2)$$

where $E_{\vartheta_1, \ldots, \vartheta_T;\, \theta_1, \ldots, \theta_T}$ is the expectation operation with respect to the random variables $\vartheta_1, \vartheta_2, \ldots, \vartheta_T$ and $\theta_1, \theta_2, \ldots, \theta_T$, and $q^i(x_{T+1})$ is a terminal payment given at stage $T+1$. The payoffs of the players are transferable.

The game (2.1)-(2.2) is a randomly furcating stochastic dynamic game. In a stochastic dynamic game framework, a strategy space with the state-dependent property has to be considered. In particular, a pre-specified class $\Gamma^i$ of mappings $\phi_t^{(\tau_t)i}(\cdot): X \to U^i$ with the property $u_t^{(\tau_t)i} = \phi_t^{(\tau_t)i}(x)$, for $i \in N$, $\tau_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T\}$, is the strategy space of player $i$, and each of its elements is a permissible strategy.

To solve the game, we invoke the principle of optimality in Bellman's (1957) technique of dynamic programming and begin with the subgame starting at the last operating stage, that is, stage $T$. If $\theta_T^{\tau_T} \in \{\theta_T^1, \theta_T^2, \ldots, \theta_T^{\eta_T}\}$ has occurred at stage $T$ and the state $x_T = x$, the subgame becomes:

$$\max_{u_T^i}\; E_{\vartheta_T}\Big\{ g_T^i[x, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\tau_T}] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (2.3)$$

subject to $x_{T+1} = f_T(x, u_T^1, u_T^2, \ldots, u_T^n) + \vartheta_T$.
A set of state-dependent strategies $\{\phi_T^{(\tau_T)i*}(x) \in \Gamma^i$, for $i \in N\}$ constitutes a Nash equilibrium solution to the subgame (2.3) if the following conditions are satisfied:

$$V^{(\tau_T)i}(T,x) = E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\tau_T)1*}(x), \phi_T^{(\tau_T)2*}(x), \ldots, \phi_T^{(\tau_T)n*}(x); \theta_T^{\tau_T}] + q^i(x_{T+1}) \Big\}$$
$$\geq E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\tau_T)1*}(x), \ldots, \phi_T^{(\tau_T)(i-1)*}(x), \phi_T^{(\tau_T)i}(x), \phi_T^{(\tau_T)(i+1)*}(x), \ldots, \phi_T^{(\tau_T)n*}(x); \theta_T^{\tau_T}] + q^i(\tilde{x}_{T+1}) \Big\}, \quad \text{for } i \in N,$$

where

$$x_{T+1} = f_T[x, \phi_T^{(\tau_T)1*}(x), \phi_T^{(\tau_T)2*}(x), \ldots, \phi_T^{(\tau_T)n*}(x)] + \vartheta_T \quad \text{and}$$
$$\tilde{x}_{T+1} = f_T[x, \phi_T^{(\tau_T)1*}(x), \ldots, \phi_T^{(\tau_T)(i-1)*}(x), \phi_T^{(\tau_T)i}(x), \phi_T^{(\tau_T)(i+1)*}(x), \ldots, \phi_T^{(\tau_T)n*}(x)] + \vartheta_T.$$

A Nash equilibrium of the subgame (2.3) can be characterized by the following lemma.

Lemma 2.1. A set of strategies $\{u_T^{(\tau_T)i*} = \phi_T^{(\tau_T)i*}(x) \in \Gamma^i$, for $i \in N\}$ provides a Nash equilibrium solution to the subgame (2.3) if there exist functions $V^{(\tau_T)i}(T,x)$, for $i \in N$, such that the following conditions are satisfied:

$$V^{(\tau_T)i}(T,x) = \max_{u_T^i} E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\tau_T)1*}(x), \ldots, \phi_T^{(\tau_T)(i-1)*}(x), u_T^i, \phi_T^{(\tau_T)(i+1)*}(x), \ldots, \phi_T^{(\tau_T)n*}(x); \theta_T^{\tau_T}] + V^{(\tau_T)i}[T+1, \tilde{f}_T^i(x, u_T^i) + \vartheta_T] \Big\},$$
$$V^{(\tau_T)i}(T+1,x) = q^i(x); \quad \text{for } i \in N, \qquad (2.4)$$

where $\tilde{f}_T^i(x, u_T^i) = f_T[x, \phi_T^{(\tau_T)1*}(x), \ldots, \phi_T^{(\tau_T)(i-1)*}(x), u_T^i, \phi_T^{(\tau_T)(i+1)*}(x), \ldots, \phi_T^{(\tau_T)n*}(x)]$.

Proof. The system of equations in (2.4) satisfies the standard discrete-time stochastic dynamic programming property and the Nash property for each player $i \in N$. Hence a Nash (1951) equilibrium of the subgame (2.3) is characterized. One can see the detailed proof of the results in Theorem 6.10 in Basar and Olsder (1999). ∎

For the sake of exposition, we sidestep the issue of multiple equilibria and focus on playable games in which a particular noncooperative Nash equilibrium is chosen by the players in the subgame. Using Lemma 2.1, one can characterize the value functions $V^{(\tau_T)i}(T,x)$ for all $\tau_T \in \{1,2,\ldots,\eta_T\}$, if they exist.
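Before turning to earlier stages, the ingredients of the game (2.1)-(2.2) – the transition $f_k$, the independent noises $\vartheta_k$, and the randomly furcating payoff variables $\theta_k$ – can be made concrete with a minimal simulation sketch. The two-player transition, payoff form, supports and probabilities below are illustrative assumptions, not taken from the paper:

```python
import random

# Illustrative two-player specification (all numbers are assumptions)
T = 4
theta_support = [0.5, 1.0, 1.5]   # realizations theta_k^1, ..., theta_k^{eta_k}
theta_probs = [0.2, 0.5, 0.3]     # probabilities lambda_k^1, ..., lambda_k^{eta_k}

def f(k, x, u1, u2):
    """Deterministic part of the transition (2.1)."""
    return 0.9 * x + u1 - u2

def stage_payoff(i, x, u, theta):
    """Assumed stage payoff g_k^i[...; theta_k]: own control scaled by theta."""
    return theta * u[i] - 0.5 * u[i] ** 2 + 0.1 * x

def simulate(x0, policies, rng):
    """Roll (2.1) forward, drawing theta_k and vartheta_k at each stage,
    and accumulate each player's realized payoff (terminal term omitted)."""
    x, payoffs = x0, [0.0, 0.0]
    for k in range(1, T + 1):
        theta = rng.choices(theta_support, theta_probs)[0]
        u = [policies[i](k, x, theta) for i in range(2)]
        for i in range(2):
            payoffs[i] += stage_payoff(i, x, u, theta)
        vartheta = rng.gauss(0.0, 0.1)  # independent noise vartheta_k
        x = f(k, x, u[0], u[1]) + vartheta
    return x, payoffs

rng = random.Random(7)
x_final, payoffs = simulate(1.0, [lambda k, x, th: 0.5 * th] * 2, rng)
```

The state-dependent strategies of $\Gamma^i$ appear here as functions of the stage, the state and the realized $\theta_k$, matching the permissible strategies defined above.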
In particular, $V^{(\tau_T)i}(T,x)$ yields player $i$'s expected game equilibrium payoff in the subgame starting at stage $T$ given that $\theta_T^{\tau_T}$ occurs and $x_T = x$.

We then proceed to the subgame starting at stage $T-1$ when $\theta_{T-1}^{\tau_{T-1}} \in \{\theta_{T-1}^1, \theta_{T-1}^2, \ldots, \theta_{T-1}^{\eta_{T-1}}\}$ occurs and $x_{T-1} = x$. In this subgame player $i \in N$ seeks to maximize his expected payoff

$$E_{\vartheta_{T-1}, \vartheta_T;\, \theta_T}\Big\{ g_{T-1}^i[x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n; \theta_{T-1}^{\tau_{T-1}}] + g_T^i[x_T, u_T^1, u_T^2, \ldots, u_T^n; \theta_T] + q^i(x_{T+1}) \Big\}$$
$$= E_{\vartheta_{T-1}, \vartheta_T}\Big\{ g_{T-1}^i[x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n; \theta_{T-1}^{\tau_{T-1}}] + \sum_{\tau_T=1}^{\eta_T} \lambda_T^{\tau_T} \Big( g_T^i[x_T, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\tau_T}] + q^i(x_{T+1}) \Big) \Big\}, \quad \text{for } i \in N, \qquad (2.5)$$

subject to

$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n) + \vartheta_k, \quad \text{for } k \in \{T-1, T\} \text{ and } x_{T-1} = x. \qquad (2.6)$$

If the functions $V^{(\tau_T)i}(T,x)$ for all $\tau_T \in \{1,2,\ldots,\eta_T\}$ characterized in Lemma 2.1 exist, the subgame (2.5)-(2.6) can be expressed as a game in which player $i$ seeks to maximize the expected payoff

$$E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n; \theta_{T-1}^{\tau_{T-1}}] + \sum_{\tau_T=1}^{\eta_T} \lambda_T^{\tau_T} V^{(\tau_T)i}[T, f_{T-1}(x, u_{T-1}^1, u_{T-1}^2, \ldots, u_{T-1}^n) + \vartheta_{T-1}] \Big\}, \quad \text{for } i \in N, \qquad (2.7)$$

using his control $u_{T-1}^i$.

A set of strategies $\{\phi_{T-1}^{(\tau_{T-1})i*}(x) \in \Gamma^i$, for $i \in N\}$ constitutes a Nash equilibrium solution to the subgame (2.7) if the following conditions are satisfied:

$$V^{(\tau_{T-1})i}(T-1,x) = E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, \phi_{T-1}^{(\tau_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})n*}(x); \theta_{T-1}^{\tau_{T-1}}] + \sum_{\tau_T=1}^{\eta_T} \lambda_T^{\tau_T} V^{(\tau_T)i}\big[T, f_{T-1}[x, \phi_{T-1}^{(\tau_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})n*}(x)] + \vartheta_{T-1}\big] \Big\}$$
$$\geq E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, \phi_{T-1}^{(\tau_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})(i-1)*}(x), \phi_{T-1}^{(\tau_{T-1})i}(x), \phi_{T-1}^{(\tau_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})n*}(x); \theta_{T-1}^{\tau_{T-1}}] + \sum_{\tau_T=1}^{\eta_T} \lambda_T^{\tau_T} V^{(\tau_T)i}\big[T, \tilde{f}_{T-1}^i(x, \phi_{T-1}^{(\tau_{T-1})i}(x)) + \vartheta_{T-1}\big] \Big\}, \quad \text{for } i \in N, \qquad (2.8)$$

where $\tilde{f}_{T-1}^i(x, \phi_{T-1}^{(\tau_{T-1})i}(x)) = f_{T-1}[x, \phi_{T-1}^{(\tau_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})(i-1)*}(x), \phi_{T-1}^{(\tau_{T-1})i}(x), \phi_{T-1}^{(\tau_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})n*}(x)]$.

A Nash equilibrium of the subgame (2.7) can be characterized by the following lemma.

Lemma 2.2.
A set of strategies $\{u_{T-1}^{(\tau_{T-1})i*} = \phi_{T-1}^{(\tau_{T-1})i*}(x) \in \Gamma^i$, for $i \in N\}$ provides a Nash equilibrium solution to the subgame (2.7) if there exist functions $V^{(\tau_T)i}(T, x_T)$, for $i \in N$ and $\tau_T \in \{1,2,\ldots,\eta_T\}$, characterized in Lemma 2.1, and functions $V^{(\tau_{T-1})i}(T-1,x)$, for $i \in N$, such that the following conditions are satisfied:

$$V^{(\tau_{T-1})i}(T-1,x) = \max_{u_{T-1}^i} E_{\vartheta_{T-1}}\Big\{ g_{T-1}^i[x, \phi_{T-1}^{(\tau_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})(i-1)*}(x), u_{T-1}^i, \phi_{T-1}^{(\tau_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})n*}(x); \theta_{T-1}^{\tau_{T-1}}] + \sum_{\tau_T=1}^{\eta_T} \lambda_T^{\tau_T} V^{(\tau_T)i}\big[T, \tilde{f}_{T-1}^i(x, u_{T-1}^i) + \vartheta_{T-1}\big] \Big\}, \quad \text{for } i \in N, \qquad (2.9)$$

where $\tilde{f}_{T-1}^i(x, u_{T-1}^i) = f_{T-1}[x, \phi_{T-1}^{(\tau_{T-1})1*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})(i-1)*}(x), u_{T-1}^i, \phi_{T-1}^{(\tau_{T-1})(i+1)*}(x), \ldots, \phi_{T-1}^{(\tau_{T-1})n*}(x)]$.

Proof. The conditions in Lemma 2.1 and the system of equations in (2.9) satisfy the standard discrete-time stochastic dynamic programming property and the Nash property for each player $i \in N$. Hence a Nash equilibrium of the subgame (2.7) is characterized. ∎

In particular, $V^{(\tau_{T-1})i}(T-1,x)$, if it exists, yields player $i$'s expected game equilibrium payoff in the subgame starting at stage $T-1$ given that $\theta_{T-1}^{\tau_{T-1}}$ occurs and $x_{T-1} = x$.

Consider the subgame starting at stage $t \in \{T-2, T-3, \ldots, 1\}$ when $\theta_t^{\tau_t} \in \{\theta_t^1, \theta_t^2, \ldots, \theta_t^{\eta_t}\}$ occurs and $x_t = x$, in which player $i \in N$ maximizes his expected payoff

$$E_{\vartheta_t, \vartheta_{t+1}, \ldots, \vartheta_T;\, \theta_{t+1}, \theta_{t+2}, \ldots, \theta_T}\Big\{ g_t^i[x, u_t^1, u_t^2, \ldots, u_t^n; \theta_t^{\tau_t}] + \sum_{\zeta=t+1}^{T} g_\zeta^i[x_\zeta, u_\zeta^1, u_\zeta^2, \ldots, u_\zeta^n; \theta_\zeta] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (2.10)$$

subject to

$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n) + \vartheta_k, \quad \text{for } k \in \{t, t+1, \ldots, T\} \text{ and } x_t = x. \qquad (2.11)$$

Following the above analysis, the subgame (2.10)-(2.11) can be expressed as a game in which player $i \in N$ maximizes the expected payoff

$$E_{\vartheta_t}\Big\{ g_t^i[x, u_t^1, u_t^2, \ldots, u_t^n; \theta_t^{\tau_t}] + \sum_{\tau_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\tau_{t+1}} V^{(\tau_{t+1})i}[t+1, f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t] \Big\}, \quad \text{for } i \in N, \qquad (2.12)$$

with his control $u_t^i$, where $V^{(\tau_{t+1})i}[t+1, f_t(x, u_t^1, \ldots, u_t^n) + \vartheta_t]$ is player $i$'s expected game equilibrium payoff in the subgame starting at stage $t+1$ given that $\theta_{t+1}^{\tau_{t+1}}$ occurs and $x_{t+1} = f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t$.
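Once the primitives $f_t$, $g_t^i$, the distribution of $\vartheta_t$ and the probabilities $\lambda_{t+1}^{\tau_{t+1}}$ are specified, the recursion implied by (2.12) can be solved backward numerically. The sketch below uses a toy two-player game on a coarse state grid with finite control sets and a two-point $\vartheta_t$; the payoffs are deliberately chosen to be identical across the two players (a team game), which guarantees that a pure-strategy profile satisfying the Nash inequalities exists at every grid point. All numerical choices are illustrative assumptions:

```python
import itertools

# Toy two-player, identical-interest instantiation (all numbers assumed)
T = 3
X = [0.5 * i for i in range(-8, 9)]   # coarse state grid
U = [0.0, 0.5, 1.0]                   # finite control set for each player
theta = [0.5, 1.5]                    # realizations theta_t^tau
lam = [0.4, 0.6]                      # probabilities lambda_t^tau
noise = [(-0.25, 0.5), (0.25, 0.5)]   # two-point distribution of vartheta_t

def g(x, u, th):
    """Common stage payoff of both players under realization th."""
    return th * (u[0] + u[1]) - u[0] ** 2 - u[1] ** 2 + 0.1 * x

def f(x, u):
    return 0.8 * x + u[0] + u[1]

def snap(y):
    """Project a successor state back onto the grid."""
    return min(X, key=lambda z: abs(z - y))

# V[(t, tau)][x]: equilibrium payoff at stage t when theta_t^tau occurs;
# the terminal payment q^i is taken to be zero.
V = {(T + 1, tau): {x: 0.0 for x in X} for tau in range(len(theta))}
for t in range(T, 0, -1):
    for tau, th in enumerate(theta):
        V[(t, tau)] = {}
        for x in X:
            def value(u):
                cont = 0.0
                for eps, p in noise:              # expectation over vartheta_t
                    xn = snap(f(x, u) + eps)
                    for m, lm in enumerate(lam):  # sum over theta_{t+1}
                        cont += p * lm * V[(t + 1, m)][xn]
                return g(x, u, th) + cont
            # exhaustive search for a profile satisfying the Nash inequalities
            for u in itertools.product(U, U):
                if all(value(u) >= value((w, u[1]) if i == 0 else (u[0], w))
                       for i in range(2) for w in U):
                    V[(t, tau)][x] = value(u)
                    break
```

The continuation term mirrors (2.12): an expectation over $\vartheta_t$ of the $\lambda_{t+1}^{\tau_{t+1}}$-weighted next-stage value functions, one per possible payoff-structure realization.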
A set of strategies $\{\phi_t^{(\tau_t)i*}(x) \in \Gamma^i$, for $i \in N\}$ constitutes a Nash equilibrium solution to the subgame (2.12) if the following conditions are satisfied:

$$V^{(\tau_t)i}(t,x) = E_{\vartheta_t}\Big\{ g_t^i[x, \phi_t^{(\tau_t)1*}(x), \phi_t^{(\tau_t)2*}(x), \ldots, \phi_t^{(\tau_t)n*}(x); \theta_t^{\tau_t}] + \sum_{\tau_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\tau_{t+1}} V^{(\tau_{t+1})i}\big[t+1, f_t[x, \phi_t^{(\tau_t)1*}(x), \ldots, \phi_t^{(\tau_t)n*}(x)] + \vartheta_t\big] \Big\}$$
$$\geq E_{\vartheta_t}\Big\{ g_t^i[x, \phi_t^{(\tau_t)1*}(x), \ldots, \phi_t^{(\tau_t)(i-1)*}(x), \phi_t^{(\tau_t)i}(x), \phi_t^{(\tau_t)(i+1)*}(x), \ldots, \phi_t^{(\tau_t)n*}(x); \theta_t^{\tau_t}] + \sum_{\tau_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\tau_{t+1}} V^{(\tau_{t+1})i}\big[t+1, \tilde{f}_t^i(x, \phi_t^{(\tau_t)i}(x)) + \vartheta_t\big] \Big\},$$

where $\tilde{f}_t^i(x, \phi_t^{(\tau_t)i}(x)) = f_t[x, \phi_t^{(\tau_t)1*}(x), \ldots, \phi_t^{(\tau_t)(i-1)*}(x), \phi_t^{(\tau_t)i}(x), \phi_t^{(\tau_t)(i+1)*}(x), \ldots, \phi_t^{(\tau_t)n*}(x)]$.

A Nash equilibrium solution for the game (2.1)-(2.2) can be characterized as follows.

Theorem 2.1. A set of strategies $\{u_t^{(\tau_t)i*} = \phi_t^{(\tau_t)i*}(x) \in \Gamma^i$, for $\tau_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T\}$ and $i \in N\}$ constitutes a Nash equilibrium solution to the game (2.1)-(2.2) if there exist functions $V^{(\tau_t)i}(t,x)$, for $\tau_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T\}$ and $i \in N$, such that the following recursive relations are satisfied:

$$V^{(\tau_T)i}(T+1,x) = q^i(x),$$
$$V^{(\tau_T)i}(T,x) = \max_{u_T^i} E_{\vartheta_T}\Big\{ g_T^i[x, \phi_T^{(\tau_T)1*}(x), \ldots, \phi_T^{(\tau_T)(i-1)*}(x), u_T^i, \phi_T^{(\tau_T)(i+1)*}(x), \ldots, \phi_T^{(\tau_T)n*}(x); \theta_T^{\tau_T}] + V^{(\tau_T)i}[T+1, \tilde{f}_T^i(x, u_T^i) + \vartheta_T] \Big\},$$
$$V^{(\tau_t)i}(t,x) = \max_{u_t^i} E_{\vartheta_t}\Big\{ g_t^i[x, \phi_t^{(\tau_t)1*}(x), \ldots, \phi_t^{(\tau_t)(i-1)*}(x), u_t^i, \phi_t^{(\tau_t)(i+1)*}(x), \ldots, \phi_t^{(\tau_t)n*}(x); \theta_t^{\tau_t}] + \sum_{\tau_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\tau_{t+1}} V^{(\tau_{t+1})i}[t+1, \tilde{f}_t^i(x, u_t^i) + \vartheta_t] \Big\};$$
$$\text{for } \tau_t \in \{1,2,\ldots,\eta_t\},\ t \in \{1,2,\ldots,T-1\} \text{ and } i \in N, \qquad (2.13)$$

where $\tilde{f}_t^i(x, u_t^i) = f_t[x, \phi_t^{(\tau_t)1*}(x), \ldots, \phi_t^{(\tau_t)(i-1)*}(x), u_t^i, \phi_t^{(\tau_t)(i+1)*}(x), \ldots, \phi_t^{(\tau_t)n*}(x)]$, for $t \in \{1,2,\ldots,T\}$.

Proof. The results in (2.13) characterizing the game equilibrium in stage $T$ and stage $T-1$ are proved in Lemma 2.1 and Lemma 2.2.
Invoking the subgames starting in stages $t \in \{1,2,\ldots,T-2\}$ as expressed in (2.12), the results in (2.13) satisfy the optimality conditions of stochastic dynamic programming and the Nash equilibrium property for each player in each of these subgames. Therefore, a feedback Nash equilibrium of the game (2.1)-(2.2) is characterized. ∎

Theorem 2.1 is the discrete-time analog of the Nash equilibrium of the continuous-time randomly furcating stochastic differential games in Yeung (2001 and 2003) and Petrosyan and Yeung (2007).

3. Dynamic Cooperation

Now consider the case when the players agree to cooperate and distribute the joint payoff among themselves according to an optimality principle. Two essential properties that a cooperative scheme has to satisfy are group optimality and individual rationality. Group optimality ensures that all potential gains from cooperation are captured. Failure to fulfill group optimality leads to the condition where the participants prefer to deviate from the agreed-upon solution plan in order to extract the unexploited gains. Individual rationality is required to hold so that the payoff allocated to an economic agent under cooperation will be no less than its noncooperative payoff. Failure to guarantee individual rationality leads to the condition where the concerned participants would reject the agreed-upon solution plan and play noncooperatively.

3.1. Group Optimality

Maximizing the players' expected joint payoff guarantees group optimality in a game where payoffs are transferable. To maximize their expected joint payoff the players have to solve the discrete-time stochastic dynamic programming problem of maximizing

$$E_{\vartheta_1, \vartheta_2, \ldots, \vartheta_T;\, \theta_1, \theta_2, \ldots, \theta_T}\Big\{ \sum_{k=1}^{T} \sum_{j=1}^{n} g_k^j[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k] + \sum_{j=1}^{n} q^j(x_{T+1}) \Big\} \qquad (3.1)$$

subject to (2.1).

The stochastic control problem (2.1)-(3.1) can be regarded as a single-player version of the game problem (2.1)-(2.2), with $n = 1$ and the payoff being the sum of all the players' payoffs.
In a stochastic dynamic framework, again a strategy space with the state-dependent property has to be considered. In particular, a pre-specified class $\hat{\Gamma}^i$ of mappings $\psi_t^{(\tau_t)i}(\cdot): X \to U^i$ with the property $u_t^{(\tau_t)i} = \psi_t^{(\tau_t)i}(x)$, for $i \in N$, $\tau_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T\}$, is the strategy space of player $i$, and each of its elements is a permissible strategy.

To solve the dynamic programming problem (2.1) and (3.1), we first consider the problem starting at stage $T$. If $\theta_T^{\tau_T} \in \{\theta_T^1, \theta_T^2, \ldots, \theta_T^{\eta_T}\}$ has occurred at stage $T$ and the state $x_T = x$, the control problem becomes:

$$\max_{u_T^1, u_T^2, \ldots, u_T^n} E_{\vartheta_T}\Big\{ \sum_{j=1}^{n} g_T^j[x, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\tau_T}] + \sum_{j=1}^{n} q^j(x_{T+1}) \Big\} \qquad (3.2)$$

subject to

$$x_{T+1} = f_T(x, u_T^1, u_T^2, \ldots, u_T^n) + \vartheta_T. \qquad (3.3)$$

An optimal solution to the stochastic control problem (3.2)-(3.3) can be characterized by the following lemma.

Lemma 3.1. A set of controls $\{u_T^{(\tau_T)i*} = \psi_T^{(\tau_T)i*}(x) \in \hat{\Gamma}^i$, for $i \in N\}$ provides an optimal solution to the control problem (3.2)-(3.3) if there exists a function $W^{(\tau_T)}(T,x)$ such that the following conditions are satisfied:

$$W^{(\tau_T)}(T,x) = \max_{u_T^1, u_T^2, \ldots, u_T^n} E_{\vartheta_T}\Big\{ \sum_{j=1}^{n} g_T^j[x, u_T^1, u_T^2, \ldots, u_T^n; \theta_T^{\tau_T}] + W^{(\tau_T)}[T+1, f_T(x, u_T^1, u_T^2, \ldots, u_T^n) + \vartheta_T] \Big\},$$
$$W^{(\tau_T)}(T+1,x) = \sum_{j=1}^{n} q^j(x). \qquad (3.4)$$

Proof. The system of equations in (3.4) satisfies the standard discrete-time stochastic dynamic programming property. One can see the detailed proof of the results in Basar and Olsder (1999). ∎

Using Lemma 3.1, one can characterize the functions $W^{(\tau_T)}(T,x)$ for all $\tau_T \in \{1,2,\ldots,\eta_T\}$, if they exist. In particular, $W^{(\tau_T)}(T,x)$ yields the expected cooperative payoff over the cooperation duration starting at stage $T$, given that $\theta_T^{\tau_T}$ occurs and $x_T = x$.
Following the analysis in Section 2, the control problem starting at stage $t$ when $\theta_t^{\tau_t} \in \{\theta_t^1, \theta_t^2, \ldots, \theta_t^{\eta_t}\}$ occurs and $x_t = x$ can be expressed as:

$$\max_{u_t^1, u_t^2, \ldots, u_t^n} E_{\vartheta_t}\Big\{ \sum_{j=1}^{n} g_t^j[x, u_t^1, u_t^2, \ldots, u_t^n; \theta_t^{\tau_t}] + \sum_{\tau_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\tau_{t+1}} W^{(\tau_{t+1})}[t+1, f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t] \Big\}, \qquad (3.5)$$

where $W^{(\tau_{t+1})}[t+1, f_t(x, u_t^1, \ldots, u_t^n) + \vartheta_t]$ is the expected optimal cooperative payoff of the control problem starting at stage $t+1$ when $\theta_{t+1}^{\tau_{t+1}} \in \{\theta_{t+1}^1, \theta_{t+1}^2, \ldots, \theta_{t+1}^{\eta_{t+1}}\}$ occurs and $x_{t+1} = f_t(x, u_t^1, u_t^2, \ldots, u_t^n) + \vartheta_t$.

An optimal solution for the stochastic control problem (2.1) and (3.1) can be characterized as follows.

Theorem 3.1. A set of controls $\{u_t^{(\tau_t)i*} = \psi_t^{(\tau_t)i*}(x) \in \hat{\Gamma}^i$, for $\tau_t \in \{1,2,\ldots,\eta_t\}$, $t \in \{1,2,\ldots,T\}$ and $i \in N\}$ provides an optimal solution to the stochastic control problem (2.1) and (3.1) if there exist functions $W^{(\tau_t)}(t,x)$, for $\tau_t \in \{1,2,\ldots,\eta_t\}$ and $t \in \{1,2,\ldots,T\}$, such that the following recursive relations are satisfied:

$$W^{(\tau_T)}(T+1,x) = \sum_{j=1}^{n} q^j(x),$$
$$W^{(\tau_T)}(T,x) = \max_{u_T^1, \ldots, u_T^n} E_{\vartheta_T}\Big\{ \sum_{j=1}^{n} g_T^j[x, u_T^1, \ldots, u_T^n; \theta_T^{\tau_T}] + W^{(\tau_T)}[T+1, f_T(x, u_T^1, \ldots, u_T^n) + \vartheta_T] \Big\},$$
$$W^{(\tau_t)}(t,x) = \max_{u_t^1, \ldots, u_t^n} E_{\vartheta_t}\Big\{ \sum_{j=1}^{n} g_t^j[x, u_t^1, \ldots, u_t^n; \theta_t^{\tau_t}] + \sum_{\tau_{t+1}=1}^{\eta_{t+1}} \lambda_{t+1}^{\tau_{t+1}} W^{(\tau_{t+1})}[t+1, f_t(x, u_t^1, \ldots, u_t^n) + \vartheta_t] \Big\},$$
$$\text{for } \tau_t \in \{1,2,\ldots,\eta_t\} \text{ and } t \in \{1,2,\ldots,T-1\}. \qquad (3.6)$$

Proof. The results in (3.6) characterizing the optimal solution in stage $T$ are proved in Lemma 3.1. Invoking the specification of the control problem starting at stage $t \in \{1,2,\ldots,T-1\}$ as expressed in (3.5), the results in (3.6) satisfy the optimality conditions of stochastic dynamic programming. Therefore, an optimal solution of the stochastic control problem (2.1) and (3.1) is characterized. ∎

Theorem 3.1 is the discrete-time analog of the optimal cooperative scheme in randomly furcating stochastic differential games in Petrosyan and Yeung (2007).
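Because the joint problem in (3.6) maximizes over the full control profile rather than each player's control separately, it can be solved by a plain backward sweep. A minimal sketch on a toy two-player grid (all grids, payoffs, distributions and the transition are illustrative assumptions):

```python
import itertools

# Toy two-player joint-maximization sketch of the recursion (3.6)
T = 3
X = [0.5 * i for i in range(-8, 9)]
U = [0.0, 0.5, 1.0]
theta = [0.5, 1.5]                    # realizations theta_t^tau
lam = [0.4, 0.6]                      # probabilities lambda_t^tau
noise = [(-0.25, 0.5), (0.25, 0.5)]   # two-point vartheta_t

def g_sum(x, u, th):
    """Sum of the players' stage payoffs under realization th (assumed)."""
    return th * (u[0] + u[1]) - u[0] ** 2 - u[1] ** 2 + 0.2 * x

def f(x, u):
    return 0.8 * x + u[0] + u[1]

def snap(y):
    return min(X, key=lambda z: abs(z - y))

# W[(t, tau)][x]: expected joint cooperative payoff from stage t onward,
# with terminal payments taken to be zero.
W = {(T + 1, tau): {x: 0.0 for x in X} for tau in range(len(theta))}
psi = {}                              # optimal cooperative control profiles
for t in range(T, 0, -1):
    for tau, th in enumerate(theta):
        W[(t, tau)] = {}
        for x in X:
            best, best_u = None, None
            for u in itertools.product(U, U):     # joint maximization
                val = g_sum(x, u, th)
                for eps, p in noise:              # expectation over vartheta_t
                    xn = snap(f(x, u) + eps)
                    for m, lm in enumerate(lam):  # sum over theta_{t+1}
                        val += p * lm * W[(t + 1, m)][xn]
                if best is None or val > best:
                    best, best_u = val, u
            W[(t, tau)][x], psi[(t, tau, x)] = best, best_u
```

The maximizing profiles stored in `psi` play the role of the optimal cooperative controls $\psi_t^{(\tau_t)i*}$, and $W$ weakly dominates the sum of the players' equilibrium payoffs, which is the source of the cooperative surplus shared in Section 3.2.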
Substituting the optimal controls $\{\psi_k^{(\tau_k)i*}(x)$, for $k \in \{1,2,\ldots,T\}$, $\tau_k \in \{1,2,\ldots,\eta_k\}$ and $i \in N\}$ into the state dynamics (2.1), one can obtain the dynamics of the cooperative trajectory as:

$$x_{k+1} = f_k\big(x_k, \psi_k^{(\tau_k)1*}(x_k), \psi_k^{(\tau_k)2*}(x_k), \ldots, \psi_k^{(\tau_k)n*}(x_k)\big) + \vartheta_k, \quad \text{if } \theta_k^{\tau_k} \text{ occurs}, \qquad (3.7)$$

for $k \in \{1,2,\ldots,T\}$, $\tau_k \in \{1,2,\ldots,\eta_k\}$ and $x_1 = x^0$.

We use $X_k^*$ to denote the set of realizable values of $x_k^*$ at stage $k$ generated by (3.7), and $x_k^* \in X_k^*$ to denote an element of $X_k^*$. The term $W^{(\tau_k)}(k, x_k^*)$ gives the expected total cooperative payoff over the stages from $k$ to $T$ if $\theta_k^{\tau_k}$ occurs and $x_k^* \in X_k^*$ is realized at stage $k$.

3.2. Individual Rationality

The players then have to agree on an optimality principle for distributing the total cooperative payoff among themselves. For individual rationality to be upheld, the expected payoffs a player receives under cooperation have to be no less than his expected noncooperative payoff along the cooperative state trajectory $\{x_k^*\}_{k=1}^{T+1}$. For instance, (i) the players may share the excess of the total expected cooperative payoff over the expected sum of individual noncooperative payoffs equally, or (ii) they may share the total expected cooperative payoff in proportion to their expected noncooperative payoffs.

Let $\xi^{(\tau_k)}(k, x_k^*) = [\xi^{(\tau_k)1}(k, x_k^*), \xi^{(\tau_k)2}(k, x_k^*), \ldots, \xi^{(\tau_k)n}(k, x_k^*)]$ denote the imputation vector guiding the distribution of the total expected cooperative payoff under the agreed-upon optimality principle along the cooperative trajectory, given that $\theta_k^{\tau_k}$ has occurred at stage $k$, for $\tau_k \in \{1,2,\ldots,\eta_k\}$ and $k \in \{1,2,\ldots,T\}$. In particular, the imputation $\xi^{(\tau_k)i}(k, x_k^*)$ gives the expected cumulative payments that player $i$ will receive from stage $k$ to stage $T+1$ under cooperation.
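Sharing principle (i) above, an equal split of the cooperative surplus, can be sketched numerically together with the two checks such an imputation must pass; the payoff figures below are illustrative assumptions:

```python
def equal_surplus_imputation(W, V):
    """Give each player his noncooperative payoff V[i] plus an equal
    share of the cooperative surplus W - sum(V)."""
    surplus = W - sum(V)
    return [v + surplus / len(V) for v in V]

W, V = 10.0, [3.0, 2.5]          # assumed joint payoff W and values V^i at (k, x_k*)
xi = equal_surplus_imputation(W, V)
print(xi)                        # [5.25, 4.75]

# group optimality: the shares exhaust the cooperative payoff
assert abs(sum(xi) - W) < 1e-12
# individual rationality: each share weakly dominates the noncooperative payoff
assert all(a >= b for a, b in zip(xi, V))
```

The two assertions are exactly the efficiency and rationality requirements that any valid imputation has to satisfy stage by stage along the cooperative trajectory.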
If, for example, the optimality principle specifies that the players share the excess of the total cooperative payoff over the sum of individual noncooperative payoffs equally, then the imputation to player $i$ becomes:

$$\xi^{(\tau_k)i}(k, x_k^*) = V^{(\tau_k)i}(k, x_k^*) + \frac{1}{n}\Big[ W^{(\tau_k)}(k, x_k^*) - \sum_{j=1}^{n} V^{(\tau_k)j}(k, x_k^*) \Big], \qquad (3.8)$$

for $i \in N$ and $k \in \{1,2,\ldots,T\}$.

For individual rationality to be maintained throughout all the stages $k \in \{1,2,\ldots,T\}$, it is required that:

$$\xi^{(\tau_k)i}(k, x_k^*) \geq V^{(\tau_k)i}(k, x_k^*), \quad \text{for } i \in N,\ \tau_k \in \{1,2,\ldots,\eta_k\} \text{ and } k \in \{1,2,\ldots,T\}. \qquad (3.9)$$

To satisfy group optimality, the imputation vector has to satisfy

$$W^{(\tau_k)}(k, x_k^*) = \sum_{j=1}^{n} \xi^{(\tau_k)j}(k, x_k^*), \quad \text{for } \tau_k \in \{1,2,\ldots,\eta_k\} \text{ and } k \in \{1,2,\ldots,T\}. \qquad (3.10)$$

Hence, a valid imputation $\xi^{(\tau_k)i}(k, x_k^*)$, for $i \in N$, $\tau_k \in \{1,2,\ldots,\eta_k\}$ and $k \in \{1,2,\ldots,T\}$, has to satisfy conditions (3.9) and (3.10).

4. Subgame Consistent Solutions and Payment Mechanism

To guarantee dynamical stability in a stochastic dynamic cooperation scheme, the solution has to satisfy the property of subgame consistency in addition to group optimality and individual rationality. Under a subgame-consistent cooperative solution, an extension of the solution policy to any subgame starting at a later time with any feasible state brought about by prior optimal behavior would remain optimal. In particular, subgame consistency ensures that, as the game proceeds, players are guided by the same optimality principle at each stage of the game, and hence do not possess incentives to deviate from the previously adopted optimal behavior. Therefore, for subgame consistency to be satisfied, the imputation according to the original optimality principle has to be maintained at all the $T$ stages along the cooperative trajectory $\{x_k^*\}_{k=1}^{T}$. In other words, the imputation

$$\xi^{(\tau_k)}(k, x_k^*) = [\xi^{(\tau_k)1}(k, x_k^*), \xi^{(\tau_k)2}(k, x_k^*), \ldots, \xi^{(\tau_k)n}(k, x_k^*)], \qquad (4.1)$$

for $\tau_k \in \{1,2,\ldots,\eta_k\}$, $x_k^* \in X_k^*$ and $k \in \{1,2,\ldots,T\}$, has to be upheld.

4.1.
Payoff Distribution Procedure

Following the analysis of Yeung and Petrosyan (2010), we formulate a Payoff Distribution Procedure (PDP) so that the agreed-upon imputations (4.1) can be realized. Let $B_k^{(\tau_k)i}(x_k^*)$ denote the payment that player $i$ will receive at stage $k$ under the cooperative agreement if $\theta_k^{\tau_k} \in \{\theta_k^1, \theta_k^2, \ldots, \theta_k^{\eta_k}\}$ occurs and $x_k^* \in X_k^*$ is realized at stage $k \in \{1,2,\ldots,T\}$. The payment scheme $\{B_k^{(\tau_k)i}(x_k^*)$ contingent upon the event $\theta_k^{\tau_k}$ and state $x_k^*$, for $k \in \{1,2,\ldots,T\}\}$ constitutes a PDP in the sense that the imputation to player $i$ over the stages from 1 to $T+1$ can be expressed as:

$$\xi^{(1)i}(1, x^0) = B_1^{(1)i}(x^0) + E_{\vartheta_1, \vartheta_2, \ldots, \vartheta_T;\, \theta_2, \theta_3, \ldots, \theta_T}\Big\{ \sum_{\zeta=2}^{T} B_\zeta^{(\tau_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}, \quad \text{for } i \in N. \qquad (4.2)$$

However, according to the original optimality principle in (4.1), if $\theta_k^{\tau_k}$ occurs and $x_k^* \in X_k^*$ is realized at stage $k$, the imputation to player $i$ is $\xi^{(\tau_k)i}(k, x_k^*)$. For subgame consistency to be satisfied, the imputation according to the original optimality principle has to be maintained at all the $T$ stages along the cooperative trajectory $\{x_k^*\}_{k=1}^{T}$. Therefore, to guarantee subgame consistency, the payment scheme $\{B_k^{(\tau_k)i}(x_k^*)\}$ in (4.2) has to satisfy the conditions

$$\xi^{(\tau_k)i}(k, x_k^*) = B_k^{(\tau_k)i}(x_k^*) + E_{\vartheta_k, \vartheta_{k+1}, \ldots, \vartheta_T;\, \theta_{k+1}, \theta_{k+2}, \ldots, \theta_T}\Big\{ \sum_{\zeta=k+1}^{T} B_\zeta^{(\tau_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}, \qquad (4.3)$$

for $i \in N$ and $k \in \{1,2,\ldots,T\}$.

Using (4.3), one can obtain that $\xi^{(\tau_{T+1})i}(T+1, x_{T+1}^*)$ equals $q^i(x_{T+1}^*)$ with probability 1.

Crucial to the formulation of a subgame consistent solution is the derivation of a payment scheme $\{B_k^{(\tau_k)i}(x_k^*)$, for $i \in N$, $\tau_k \in \{1,2,\ldots,\eta_k\}$, $x_k^* \in X_k^*$ and $k \in \{1,2,\ldots,T\}\}$ so that the imputation in (4.3) can be realized. This will be done in the sequel. For notational convenience in deriving such a payment scheme, we denote the vector $[\psi_k^{(\tau_k)1*}(x), \psi_k^{(\tau_k)2*}(x), \ldots, \psi_k^{(\tau_k)n*}(x)]$ by $\psi_k^{(\tau_k)*}(x)$. A theorem deriving a subgame consistent PDP can be established as follows.

Theorem 4.1.
A payment equaling

$$B_k^{(\tau_k)i}(x_k^*) = \xi^{(\tau_k)i}(k, x_k^*) - E_{\vartheta_k}\Big\{ \sum_{\tau_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\tau_{k+1}} \xi^{(\tau_{k+1})i}\big[k+1, f_k(x_k^*, \psi_k^{(\tau_k)*}(x_k^*)) + \vartheta_k\big] \Big\}, \qquad (4.4)$$

for $i \in N$, given to player $i$ at stage $k \in \{1,2,\ldots,T\}$, if $\theta_k^{\tau_k}$ occurs and $x_k^* \in X_k^*$, would lead to the realization of the imputations in (4.3).

Proof. Following (4.3), one can express the term $\xi^{(\tau_{k+1})i}(k+1, x_{k+1}^*)$ as

$$\xi^{(\tau_{k+1})i}(k+1, x_{k+1}^*) = B_{k+1}^{(\tau_{k+1})i}(x_{k+1}^*) + E_{\vartheta_{k+1}, \vartheta_{k+2}, \ldots, \vartheta_T;\, \theta_{k+2}, \theta_{k+3}, \ldots, \theta_T}\Big\{ \sum_{\zeta=k+2}^{T} B_\zeta^{(\tau_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}. \qquad (4.5)$$

Using (4.5), one can obtain

$$E_{\vartheta_k, \vartheta_{k+1}, \ldots, \vartheta_T;\, \theta_{k+1}, \theta_{k+2}, \ldots, \theta_T}\Big\{ \sum_{\zeta=k+1}^{T} B_\zeta^{(\tau_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\}$$
$$= E_{\vartheta_k}\Big\{ \sum_{\tau_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\tau_{k+1}} \Big[ B_{k+1}^{(\tau_{k+1})i}(x_{k+1}^*) + E_{\vartheta_{k+1}, \ldots, \vartheta_T;\, \theta_{k+2}, \ldots, \theta_T}\Big\{ \sum_{\zeta=k+2}^{T} B_\zeta^{(\tau_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \Big\} \Big] \Big\}$$
$$= E_{\vartheta_k}\Big\{ \sum_{\tau_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\tau_{k+1}} \xi^{(\tau_{k+1})i}\big[k+1, f_k(x_k^*, \psi_k^{(\tau_k)*}(x_k^*)) + \vartheta_k\big] \Big\}. \qquad (4.6)$$

Substituting the term $E_{\vartheta_k, \ldots, \vartheta_T;\, \theta_{k+1}, \ldots, \theta_T}\big\{ \sum_{\zeta=k+1}^{T} B_\zeta^{(\tau_\zeta)i}(x_\zeta^*) + q^i(x_{T+1}^*) \big\}$ in (4.3) by the last expression in (4.6) yields:

$$\xi^{(\tau_k)i}(k, x_k^*) = B_k^{(\tau_k)i}(x_k^*) + E_{\vartheta_k}\Big\{ \sum_{\tau_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\tau_{k+1}} \xi^{(\tau_{k+1})i}\big[k+1, f_k(x_k^*, \psi_k^{(\tau_k)*}(x_k^*)) + \vartheta_k\big] \Big\}. \qquad (4.7)$$

By re-arranging terms in (4.7), one can obtain:

$$B_k^{(\tau_k)i}(x_k^*) = \xi^{(\tau_k)i}(k, x_k^*) - E_{\vartheta_k}\Big\{ \sum_{\tau_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\tau_{k+1}} \xi^{(\tau_{k+1})i}\big[k+1, f_k(x_k^*, \psi_k^{(\tau_k)*}(x_k^*)) + \vartheta_k\big] \Big\}, \qquad (4.8)$$

for $i \in N$ and $k \in \{1,2,\ldots,T\}$, which is the payment in (4.4). Hence a payment scheme satisfying (4.4) realizes the imputations in (4.3), and Theorem 4.1 follows. ∎

For a given imputation vector $\xi^{(\tau_k)}(k, x_k^*) = [\xi^{(\tau_k)1}(k, x_k^*), \xi^{(\tau_k)2}(k, x_k^*), \ldots, \xi^{(\tau_k)n}(k, x_k^*)]$, for $\tau_k \in \{1,2,\ldots,\eta_k\}$ and $k \in \{1,2,\ldots,T\}$, Theorem 4.1 can be used to derive the PDP that leads to the realization of this vector.

4.2. Transfer Payments

When all players are using the cooperative strategies, the payoff that player $i$ will directly receive at stage $k$, given that $x_k^* \in X_k^*$ and $\theta_k^{\tau_k}$ occurs, becomes $g_k^i[x_k^*, \psi_k^{(\tau_k)1*}(x_k^*), \psi_k^{(\tau_k)2*}(x_k^*), \ldots, \psi_k^{(\tau_k)n*}(x_k^*); \theta_k^{\tau_k}]$. However, according to the agreed-upon imputation, player $i$ is to receive $B_k^{(\tau_k)i}(x_k^*)$ at stage $k$, as given in Theorem 4.1.
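The payment (4.4) is simply the current imputation minus the expected next-stage imputation, with the expectation taken jointly over $\vartheta_k$ and $\theta_{k+1}$. A numerical sketch for one player, in which the imputation values, the cooperative transition and the distributions are all assumed for illustration:

```python
# Assumed primitives for the sketch of (4.4)
theta_next_probs = [0.4, 0.6]       # lambda_{k+1}^tau (assumed)
noise = [(-1.0, 0.5), (1.0, 0.5)]   # two-point distribution of vartheta_k

def f_coop(x):
    """Cooperative transition x_k* -> x_{k+1} with the optimal controls
    psi_k^* already substituted in (an assumption for this sketch)."""
    return 0.5 * x

def xi_next(tau, x):
    """Assumed next-stage imputation xi^{(tau_{k+1})i}(k+1, x)."""
    return (1.0 + tau) * x

def pdp_payment(xi_now, x):
    """B_k^{(tau_k)i}(x_k*) as in (4.4): current imputation minus the
    lambda-weighted, noise-averaged next-stage imputation."""
    expected = sum(p * lm * xi_next(tau, f_coop(x) + eps)
                   for eps, p in noise
                   for tau, lm in enumerate(theta_next_probs))
    return xi_now - expected

B = pdp_payment(5.0, 2.0)
print(round(B, 6))   # 3.4
```

By construction, `B` plus the expected next-stage imputation recovers the current imputation, which is exactly the subgame-consistency condition (4.3) collapsed to a single stage.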
Therefore a side-payment

$$\varpi_k^{(\tau_k)i}(x_k^*) = B_k^{(\tau_k)i}(x_k^*) - g_k^i[x_k^*, \psi_k^{(\tau_k)1*}(x_k^*), \psi_k^{(\tau_k)2*}(x_k^*), \ldots, \psi_k^{(\tau_k)n*}(x_k^*); \theta_k^{\tau_k}], \qquad (4.9)$$

for $k \in \{1,2,\ldots,T\}$ and $i \in N$, will be given to player $i$ to yield the cooperative imputation $\xi^{(\tau_k)i}(k, x_k^*)$.

5. Concluding Remarks

This paper considers subgame-consistent cooperative solutions in randomly furcating stochastic dynamic games. This new approach widens the application of cooperative stochastic game theory to problems where future environments are not known with certainty. The extension of the analysis of continuous-time randomly furcating stochastic differential games to an analysis in discrete time is not just of theoretical interest but also of practical value in applications. In computer modeling and operations research, discrete-time analysis often proves to be more applicable and more compatible with actual data. In the process of obtaining the main results on subgame consistent solutions, a Nash equilibrium for randomly furcating stochastic dynamic games and an optimal control scheme for randomly furcating stochastic dynamic problems are also derived.

The analysis presented can be expanded in a couple of directions. First, the random events $\theta_k$ may follow more complex processes, like a branching process. Second, the analysis can be readily applied to derive dynamically consistent solutions in randomly furcating dynamic games in which the random variables $\vartheta_k$ are not present. In particular, the objective that player $i$ seeks to maximize becomes

$$E_{\theta_1, \theta_2, \ldots, \theta_T}\Big\{ \sum_{k=1}^{T} g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n; \theta_k] + q^i(x_{T+1}) \Big\}, \quad \text{for } i \in N, \qquad (5.1)$$

subject to the deterministic dynamics:

$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n). \qquad (5.2)$$

Following the analysis in Sections 3 and 4 and the proof of Theorem 4.1, a theorem deriving a subgame consistent PDP for the randomly furcating dynamic game (5.1)-(5.2) can be established as follows.

Theorem 5.1.
A payment equaling

$$B_k^{(\tau_k)i}(x_k^*) = \xi^{(\tau_k)i}(k, x_k^*) - \sum_{\tau_{k+1}=1}^{\eta_{k+1}} \lambda_{k+1}^{\tau_{k+1}} \xi^{(\tau_{k+1})i}\big[k+1, f_k(x_k^*, \psi_k^{(\tau_k)*}(x_k^*))\big], \qquad (5.3)$$

for $i \in N$, given to player $i$ at stage $k \in \{1,2,\ldots,T\}$, if $\theta_k^{\tau_k}$ occurs and $x_k^* \in X_k^*$, would yield the PDP leading to a subgame consistent solution of the game (5.1)-(5.2).

Finally, since this is the first time that subgame consistent solutions are derived for randomly furcating dynamic games, further research along this line is expected.

Acknowledgements

This research was supported by the Research Grant of HKSYU and the EU TOCSIN Project.

References

Basar, T.: Decentralized Multicriteria Optimization of Linear Stochastic Systems. IEEE Transactions on Automatic Control, AC-23, 233-243 (1978)
Basar, T.: Information Structures and Equilibria in Dynamic Games. In: M. Aoki and A. Marzollo (eds) New Trends in Dynamic System Theory and Economics. Academic Press, New York and London, 3-55 (1979)
Basar, T., Ho, Y.C.: Informational Properties of the Nash Solutions of Two Stochastic Nonzero-sum Games. Journal of Economic Theory, 7, 370-387 (1974)
Basar, T., Mintz, M.: On the Existence of Linear Saddle-point Strategies for a Two-person Zero-sum Stochastic Game. Proceedings of the IEEE 11th Conference on Decision and Control, New Orleans, LA, 1972, IEEE Computer Society Press, Los Alamitos, CA, pp. 188-192 (1972)
Basar, T., Mintz, M.: A Multistage Pursuit-evasion Game that Admits a Gaussian Random Process as a Maximum Control Policy. Stochastics, 1: 25-69 (1973)
Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory. SIAM, Philadelphia (1999)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton, NJ (1957)
Esteban-Bravo, M., Nogales, F.J.: Solving Dynamic Stochastic Economic Models by Mathematical Programming Decomposition Methods. Computers and Operations Research, 35: 226-240 (2008)
Nash, J.: Non-cooperative Games. Annals of Mathematics, 54(2), 286-295 (1951)
Petrosyan, L.A.
and Yeung, D.W.K.: Subgame-consistent Cooperative Solutions in Randomly-furcating Stochastic Differential Games. International Journal of Mathematical and Computer Modelling (Special Issue on Lyapunov's Methods in Stability and Control), 45: 1294-1307 (2007)
Smith, A.E., Zenou, Y.: A Discrete-Time Stochastic Model of Job Matching. Review of Economic Dynamics, 6(1): 54-79 (2003)
Yeung, D.W.K.: Infinite Horizon Stochastic Differential Games with Branching Payoffs. Journal of Optimization Theory and Applications, 111(2): 445-460 (2001)
Yeung, D.W.K.: Randomly Furcating Stochastic Differential Games. In: L.A. Petrosyan and D.W.K. Yeung (eds.) ICM Millennium Lectures on Games. Springer-Verlag, Berlin, pp. 107-126 (2003)
Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Cooperative Solutions in Stochastic Differential Games. Journal of Optimization Theory and Applications, 120: 651-666 (2004)
Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Solution of a Cooperative Stochastic Differential Game with Nontransferable Payoffs. Journal of Optimization Theory and Applications, 124(3): 701-724 (2005)
Yeung, D.W.K., Petrosyan, L.A.: Cooperative Stochastic Differential Games. Springer-Verlag, New York (2006)
Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Solutions for Cooperative Stochastic Dynamic Games. Journal of Optimization Theory and Applications, 145(3): 579-596 (2010)