S2. Matching strategy in state-dependent choice behaviors Stationary condition of the matching strategy The stationary condition given as Eq. 7 in the main text was derived for the purpose of avoiding the difficulty in calculating P ( s) w j in Markov decision processes[29]. We provide another derivation suitable for the present framework. To transform r w j , we use the recursive relation P ( s ') a , s P( s ' | a, s) pas P( s) , where for the long-term state distribution P( s ' | a, s) P( st 1 s ' | at a, st s) . By setting r | a, s wj 0 and P ( s ' | a, s) w j 0 according to the matching strategy, we obtain r w j w j r | a, s pas P( s ) r | a, s s ,a s ,a pas P ( s ) P( s ) r | a, s pas w j w j s ,a r | a, s s ',a ' P ( s | a ', s ') pa ' s ' P( s ') pas P ( s ) r | a , s pa s w j w j s ,a r | a, s pas p P ( s ') P ( s ) r | a, s pas P( s | a ', s ') a ' s ' w j w j s ', a ' s , a r | a, s pas pa ' s ' P ( s ') P ( s ) rt 1 | a ', s ' w j w j s ', a ' r | a, s pas pa ' s ' P ( s ') P ( s ) rt 1 | a ', s ' P( s ') rt 1 | a ', s ' pa ' s ' w j w j w j s ', a ' s ', a ' s ,a s ,a s ,a s ,a r | a, s rt 1 | a, s s ,a pas w P( s ) rt 1 | a ', s ' pas s ,a j P ( s ) , w j where we used the simplified expression rt | a ', s ' rt | at a ', st s ' and the relation rt 1 | a ', s ' a , s rt | a, s pas P( s | a ', s ') . Transforming r w j repetitively in the same way, we obtain r P( s ) ' p rt | a, s as P( s ) rt ' | a, s pas w j w j s ,a 0 s ,a w j P ( s ) ' p rt | a, s r as P( s) rt ' | a, s pas , w w j s ,a 0 s ,a j 1 where we used the relation r pas w a r | a, s the infinite sum 0 t 0 derived from the probability conservation. If j r converges, then lim rt | a, s r and we can simplify r w j by taking the limit: ' . However, the infinite sum may oscillate. In order to avoid oscillations in the individual terms, we take the average over to obtain ' 0,1, r 1 T 1 ' P( s) p lim rt | a, s r as P( s) rt ' | a, s pas w j T T '0 s ,a 0 w j s ,a w j p 1 T 1 ' 1 T 1 P( s) lim rt | a, s r as P( s) lim rt ' | a, s pas . T T T w j T ' 0 w j s ,a ' 0 0 s ,a T 1 T 1 Noting that lim 1 rt | a, s lim 1 rt a, s r a, s r , we obtain T T T 0 T 0 r 1 T 1 ' lim rt | a, s r T T w j s ,a ' 0 0 w 1 T 1 ' rt | a, s r T T s ,a ' 0 0 w lim T 1 ' 1 rt | a, s r T T s ,a ' 0 0 lim pas P( s) r s j pas p as s ,a j pas w P( s) r P( s ) w j P( s ) w j P( s), j which coincides with the right hand side of Eq. 7 in the main text. Extension of the matching law Using vectors dps dp1s , dp2 s , 1 T 1 ' rt a, s r T T ' 0 0 Qas lim , dpns and Qs Q1s , Q2 s , , Qns , where , the stationary condition for the matching strategy (Eq. 7 in the text) is written as a vector form: s P( s)Qs dps 0 . If the stationary choice probability pas 0 for a and s , changes in the probabilities dps are allowed to be in an arbitrary direction that satisfies 1 dps 0 for s because of the assumption for the functions pas (w) (Eq. 9 in the main text). Therefore, the vector Qs should be parallel to the vector 1 for s . This implies Q1s Q2 s 2 Qns for s . To obtain a stationary point along a boundary on which pas 0 for some option in some state, we forbid the changes in this direction ( dpas 0 ), and obtain that the components of Qs should be identical for all options exhibiting non-zero choice probabilities. Using the summation a Qas pas Vs , the above mentioned condition is 1 T 1 ' rt s r , T T ' 0 0 written as “ pas 0 or Qas Vs for a, s ”. Using Vs lim 1 T 1 ' 1 T 1 ' r a , s r lim t rt s r T T T T ' 0 0 ' 0 0 1 T 1 ' lim rt a, s rt | s . T T ' 0 0 Qas Vs lim Thus, we obtain Eq. 8 in the text. The extended matching law is derived from the matching strategy that attempts to maximize the average reward r . In contrast, if the brain’s decision system simply attempts to maximize the average reward in each state, r | s a r | a, s pas ( w ) , without taking the total average r s r | s P( s) into account, then the matching strategy leads to the normal matching law in the individual states: pas 0 or r | a, s r | s 0 for a and s . It is interesting to investigate which matching law, the normal one or the extended version, animals’ state-dependent choice behavior exhibits. 3