Cover Letter

S2. Matching strategy in state-dependent choice behaviors Stationary condition of the matching strategy The stationary condition given as Eq. 7 in the main text was derived for the purpose of avoiding the difficulty in calculating P ( s) w j in Markov decision processes[29]. We provide another derivation suitable for the present framework. To transform  r w j , we use the recursive relation P ( s ')   a , s P( s ' | a, s) pas P( s) , where for the long-term state distribution P( s ' | a, s)  P( st 1  s ' | at  a, st  s) . By setting  r | a, s wj  0 and P ( s ' | a, s) w j  0 according to the matching strategy, we obtain  r w j   w j  r | a, s pas P( s )   r | a, s s ,a s ,a pas P ( s ) P( s )   r | a, s pas w j w j s ,a   r | a, s   s ',a ' P ( s | a ', s ') pa ' s ' P( s ') pas P ( s )   r | a , s pa s w j w j s ,a   r | a, s pas p P ( s ') P ( s )   r | a, s pas P( s | a ', s ') a ' s ' w j w j s ', a ' s , a   r | a, s pas pa ' s ' P ( s ') P ( s )   rt 1 | a ', s ' w j w j s ', a '   r | a, s pas pa ' s ' P ( s ') P ( s )   rt 1 | a ', s ' P( s ')   rt 1 | a ', s ' pa ' s ' w j w j w j s ', a ' s ', a ' s ,a s ,a s ,a s ,a    r | a, s  rt 1 | a, s s ,a pas  w P( s )   rt 1 | a ', s ' pas s ,a j P ( s ) , w j where we used the simplified expression rt  | a ', s '  rt  | at  a ', st  s ' and the relation rt  1 | a ', s '   a , s rt  | a, s pas P( s | a ', s ') . Transforming  r w j repetitively in the same way, we obtain  r P( s )  '  p     rt  | a, s  as P( s )   rt  ' | a, s pas w j w j s ,a   0 s ,a  w j P ( s )  '  p      rt  | a, s  r   as P( s)   rt  ' | a, s pas ,  w w j s ,a   0 s ,a  j 1 where we used the relation r pas  w a   r  | a, s  the infinite sum 0 t  0 derived from the probability conservation. If j  r  converges, then lim rt  | a, s  r   and we can simplify  r w j by taking the limit:  '   . However, the infinite sum may oscillate. In order to avoid oscillations in the individual terms, we take the average over to obtain  '  0,1,  r 1 T 1    ' P( s)   p  lim       rt  | a, s  r   as P( s)   rt  ' | a, s pas  w j T  T  '0  s ,a   0 w j  s ,a  w j p 1 T 1  ' 1 T 1 P( s)   lim   rt  | a, s  r  as P( s)   lim  rt  ' | a, s pas . T  T T  w j T  ' 0 w j s ,a  ' 0   0 s ,a T 1 T 1 Noting that lim 1  rt  | a, s  lim 1  rt  a, s  r a, s  r , we obtain T  T T   0 T  0  r 1 T 1  '   lim   rt  | a, s  r T  T w j s ,a  ' 0   0  w 1 T 1  '   rt  | a, s  r T  T s ,a  ' 0   0  w   lim T 1  ' 1   rt  | a, s  r T  T s ,a  ' 0   0   lim pas P( s)  r  s j pas p as s ,a j pas  w P( s)  r P( s ) w j P( s ) w j P( s), j which coincides with the right hand side of Eq. 7 in the main text. Extension of the matching law Using vectors dps   dp1s , dp2 s , 1 T 1  '   rt  a, s  r T  T  ' 0   0 Qas  lim , dpns  and Qs   Q1s , Q2 s , , Qns  , where  , the stationary condition for the matching strategy (Eq. 7 in the text) is written as a vector form:  s P( s)Qs  dps  0 . If the stationary choice probability pas  0 for a and s , changes in the probabilities dps  are allowed to be in an arbitrary direction that satisfies 1  dps  0 for s because of the assumption for the functions  pas (w) (Eq. 9 in the main text). Therefore, the vector Qs should be parallel to the vector 1 for s . This implies Q1s  Q2 s  2  Qns for s . To obtain a stationary point along a boundary on which pas  0 for some option in some state, we forbid the changes in this direction ( dpas  0 ), and obtain that the components of Qs should be identical for all options exhibiting non-zero choice probabilities. Using the summation  a Qas pas  Vs , the above mentioned condition is 1 T 1  '   rt  s  r  , T  T  ' 0   0 written as “ pas  0 or Qas  Vs for a, s ”. Using Vs  lim 1 T 1  ' 1 T 1  ' r a , s  r  lim    t    rt  s  r T  T T  T  ' 0   0  ' 0   0 1 T 1  '  lim   rt  a, s  rt  | s . T  T  ' 0   0 Qas  Vs  lim  Thus, we obtain Eq. 8 in the text. The extended matching law is derived from the matching strategy that attempts to maximize the average reward  r  . In contrast, if the brain’s decision system simply attempts to maximize the average reward in each state, r | s   a r | a, s pas ( w ) , without taking the total average r   s r | s P( s) into account, then the matching strategy leads to the normal matching law in the individual states: pas  0 or r | a, s  r | s  0 for a and s . It is interesting to investigate which matching law, the normal one or the extended version, animals’ state-dependent choice behavior exhibits. 3

Cover Letter

Related documents

Products

Support

Cover Letter

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib