Lecture 6 - TeachLine

Decision making in basketball
• 2-point shot: easier, fewer points
• 3-point shot: more difficult, more points
                 Kobe Bryant       Chris Bosh
Team             LA Lakers         Toronto Raptors
PPG (2006-7)     31.6              26.3
3P attempts      398 (23%)         35 (3%)
2P attempts      1,359 (77%)       1,059 (97%)
3P success       34%               34%
2P success       50%               50%
The matching law
NBA best 100 players (2006-2007)
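[Figure: scatter plot of $N_3/(N_2+N_3)$ (vertical axis) against $I_3/(I_2+I_3)$ (horizontal axis), both from 0 to 1; the points for Bryant and Bosh are labelled]
$N_{2,3}$ = # of 2-, 3-point shots
$I_{2,3}$ = # of points earned from 2-, 3-point shots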
The reward schedule
$R(t) = r(A_t, A_{t-1}, A_{t-2}, \ldots)$
The matching law
[Figure: $N_1/(N_1+N_2)$ plotted against $I_1/(I_1+I_2)$. Herrnstein, JEAB, 1961]
The matching law
Sugrue, Corrado & Newsome, Science, 2004
The matching law
Gallistel et al., unpublished
The matching law
$N_j$ = # of attempts at alternative $j$ $\equiv$ investment in $j$
$I_j$ = # of points earned from alternative $j$ $\equiv$ income from $j$
$$\frac{N_1}{N_1 + N_2} = \frac{I_1}{I_1 + I_2} \quad\Longleftrightarrow\quad \frac{I_1}{N_1} = \frac{I_2}{N_2}$$
equal returns:
$$E[R \mid A = 1] = E[R \mid A = 2]$$
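As a rough check of the matching relation on the basketball numbers quoted earlier, the sketch below compares the fraction of shots that are 3-pointers with the fraction of points earned from them. It assumes income can be estimated as attempts × success rate × point value, and the success rates are the rounded percentages from the slide, so the result is only approximate.

```python
# Shot statistics quoted in the lecture (2006-07 season); success rates are rounded.
players = {
    "Bryant": {"n3": 398, "n2": 1359, "p3": 0.34, "p2": 0.50},
    "Bosh": {"n3": 35, "n2": 1059, "p3": 0.34, "p2": 0.50},
}


def matching_summary(stats):
    """Return (fraction of shots that are 3P, fraction of points from 3P)."""
    # Assumption: income = attempts x success rate x point value.
    i3 = 3 * stats["p3"] * stats["n3"]
    i2 = 2 * stats["p2"] * stats["n2"]
    investment = stats["n3"] / (stats["n2"] + stats["n3"])
    income = i3 / (i2 + i3)
    return investment, income


for name, stats in players.items():
    investment, income = matching_summary(stats)
    print(f"{name}: N3/(N2+N3) = {investment:.3f}, I3/(I2+I3) = {income:.3f}")
```

Under these assumptions the two fractions nearly coincide for both players, which is the equal-returns condition: about 1.02 expected points per 3-point attempt against 1.00 per 2-point attempt.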
The matching law is very general. It is found in
many animal species as well as in humans, under very
different experimental conditions.
MATCHING ≠ MAXIMIZING
Example: addiction model
[Figure: $E[R \mid A = \text{drugs}]$ plotted against freq[drugs] = 1 - freq[work], from 0 to 1. After Herrnstein and Prelec, J Econ Perspect, 1991]
Example: addiction model
[Figure: $E[R \mid A = \text{drugs}]$ and $E[R \mid A = \text{work}]$ plotted against freq[drugs] = 1 - freq[work], from 0 to 1, with the matching point marked. After Herrnstein and Prelec, J Econ Perspect, 1991]
Example: addiction model
[Figure: $E[R \mid A = \text{drugs}]$, $E[R \mid A = \text{work}]$ and $E[R]$ plotted against freq[drugs] = 1 - freq[work], from 0 to 1, with the maximizing and matching points marked. After Herrnstein and Prelec, J Econ Perspect, 1991]
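To make the difference between matching and maximizing concrete, here is a minimal sketch with made-up return curves. The linear functions `r_drugs` and `r_work` are hypothetical and are not taken from Herrnstein and Prelec; they only reproduce the qualitative picture in which both returns fall as drug use becomes more frequent.

```python
def r_drugs(f):
    # Hypothetical: the return from drugs falls as they are used more often.
    return 1.0 - 0.8 * f


def r_work(f):
    # Hypothetical: the return from work also falls with drug use, but more slowly.
    return 0.7 - 0.3 * f


def mean_reward(f):
    # E[R] when drugs are chosen with frequency f and work with frequency 1 - f.
    return f * r_drugs(f) + (1 - f) * r_work(f)


grid = [i / 1000 for i in range(1001)]  # candidate values of freq[drugs]

# Matching: the frequency at which the two conditional returns are equal.
f_matching = min(grid, key=lambda f: abs(r_drugs(f) - r_work(f)))
# Maximizing: the frequency with the highest average reward.
f_maximizing = max(grid, key=mean_reward)

print(f"matching:   freq[drugs] = {f_matching:.2f}, E[R] = {mean_reward(f_matching):.2f}")
print(f"maximizing: freq[drugs] = {f_maximizing:.2f}, E[R] = {mean_reward(f_maximizing):.2f}")
```

With these particular curves the equal-returns point lies at a high drug frequency, while the average reward is highest at a much lower one, so a subject that matches earns less than a subject that maximizes.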
Question:
What is the neural basis of the matching law?
It is generally believed that learning is due to changes in the efficacy of synapses
[Figure: image with a 0.4 μm scale bar. Kennedy, Science, 2000]
Question:
What is the neural basis of the matching law?
Question:
What microscopic plasticity rules underlie
adaptation to matching behavior?
Question:
What is the neural basis of the matching law?
Hypothesis:
the matching law results from synaptic
plasticity that is driven by the covariance of
reward and neural activity
Covariance is a measure of dependence
• two random variables $X$, $Y$: $\delta X \equiv X - E[X]$, $\delta Y \equiv Y - E[Y]$
• covariance: $\mathrm{Cov}[X, Y] = E[\delta X \cdot \delta Y] = E[\delta X \cdot Y] = E[X \cdot \delta Y]$
• correlation coefficient: $r = \dfrac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}$
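The equivalence of the three expressions for the covariance can be checked numerically. The sketch below uses a small made-up sample (the values of `xs` and `ys` are hypothetical) and replaces expectations with sample means, for which the identities hold exactly up to rounding.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


# Hypothetical paired samples of two random variables X and Y.
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.0, 1.0, 5.0, 4.0, 9.0]

mean_x, mean_y = mean(xs), mean(ys)
dx = [x - mean_x for x in xs]  # delta X = X - E[X]
dy = [y - mean_y for y in ys]  # delta Y = Y - E[Y]

cov_dd = mean([a * b for a, b in zip(dx, dy)])  # E[dX * dY]
cov_dy = mean([a * y for a, y in zip(dx, ys)])  # E[dX * Y]
cov_xd = mean([x * b for x, b in zip(xs, dy)])  # E[X * dY]

var_x = mean([a * a for a in dx])
var_y = mean([b * b for b in dy])
r = cov_dd / sqrt(var_x * var_y)  # correlation coefficient

print(cov_dd, cov_dy, cov_xd, r)
```

Subtracting the mean from only one of the two variables is enough, because the mean of the other variable multiplies a quantity whose average is zero.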
Covariance
[Figure: three scatter plots illustrating $\mathrm{Cov}[X,Y] > 0$, $\mathrm{Cov}[X,Y] = 0$ and $\mathrm{Cov}[X,Y] < 0$]
Synaptic plasticity
• Local signals affect synaptic efficacies. Popular theory: Hebbian plasticity
$\Delta W \propto S_{\mathrm{pre}} S_{\mathrm{post}}$
• Global signals affect synaptic efficacies. Popular theory: dopamine gates Hebbian plasticity (Wickens)
$\Delta W \propto D \cdot S_{\mathrm{pre}} S_{\mathrm{post}}$
[Figure: Schultz, Dayan & Montague, Science, 1997]
• Popular theory: dopamine codes the mismatch between reward and expected reward (Schultz)
$D = R - E[R] \equiv \delta R$
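Synaptic plasticity
$\Delta W \propto D \cdot S_{\mathrm{pre}} S_{\mathrm{post}}$, with $D = \delta R \equiv R - E[R]$
$\Rightarrow \Delta W \propto \delta R \cdot S_{\mathrm{pre}} S_{\mathrm{post}}$
Average trajectory approximation:
$E[\Delta W] \propto E[\delta R \cdot S_{\mathrm{pre}} S_{\mathrm{post}}] = \mathrm{Cov}[R, S_{\mathrm{pre}} S_{\mathrm{post}}]$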
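Covariance-based plasticity rules
$\Delta W \propto (R - E[R]) \cdot N$
$\Delta W \propto R \cdot (N - E[N])$
$N = S_{\mathrm{pre}}$, $N = S_{\mathrm{post}}$, $N = S_{\mathrm{pre}} S_{\mathrm{post}}$
Average trajectory approximation:
$E[\Delta W] \propto \mathrm{Cov}[R, N]$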
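The claim that both rules have the same average update, proportional to Cov[R, N], can be illustrated with a short simulation. The trial statistics below are hypothetical: N is a binary activity that is on in half of the trials, and reward is delivered with probability 0.8 when N = 1 and 0.2 when N = 0, so that Cov[R, N] = 0.15. Expectations are replaced by sample means.

```python
import random


def average_updates(n_trials=100_000, seed=0):
    """Trial-averaged update of the two covariance rules (proportionality constant set to 1)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        # Hypothetical statistics: binary activity, reward more likely when N = 1.
        n = 1.0 if rng.random() < 0.5 else 0.0
        r = 1.0 if rng.random() < (0.8 if n else 0.2) else 0.0
        trials.append((r, n))
    mean_r = sum(r for r, _ in trials) / n_trials
    mean_n = sum(n for _, n in trials) / n_trials
    rule_a = sum((r - mean_r) * n for r, n in trials) / n_trials  # (R - E[R]) * N
    rule_b = sum(r * (n - mean_n) for r, n in trials) / n_trials  # R * (N - E[N])
    return rule_a, rule_b


rule_a, rule_b = average_updates()
print(f"(R - E[R]) N: {rule_a:.4f}   R (N - E[N]): {rule_b:.4f}")
```

The two averages agree with each other and lie close to 0.15, the covariance implied by the assumed probabilities.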
Hypothesis:
covariance-based synaptic plasticity $\Rightarrow$ the matching law
Outline:
stationary state of covariance-based plasticity $\Rightarrow$ $\mathrm{Cov}[R, N] = 0$ $\Rightarrow$ the matching law
Assumptions
[Diagram: neurons $N_1$ to $N_5$, action $A$, reward $R$, and hidden variables]
1. E[N|A=i] ≠ E[N|A≠i]
2. The dependence of the reward R on neural
activity N is through the action A.
Theorem
Suppose that Assumptions 1 and 2 are satisfied. Then
$\mathrm{Cov}[R, N] = 0 \Rightarrow E[R \mid A = 1] = E[R \mid A = 2]$ (the matching law)
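The theorem can be illustrated with a toy model that follows the average trajectory of a covariance rule. Everything specific in this sketch is an assumption made for illustration: a single weight `w` sets the choice probability through a logistic function, the neural activity is N = 1 when A = 1 and N = 0 otherwise (so Assumption 1 holds), reward depends on N only through the action (Assumption 2), and the expected returns are hypothetical linear functions of the choice probability. For such a binary N, Cov[R, N] = p(1 - p)(E[R|A=1] - E[R|A=2]).

```python
import math


def returns(p):
    """Hypothetical expected rewards of the two actions, given p = P(A = 1)."""
    r1 = 1.0 - 0.8 * p
    r2 = 0.7 - 0.3 * p
    return r1, r2


def cov_reward_activity(p):
    # N = 1 when A = 1 and N = 0 otherwise, so Cov[R, N] = p (1 - p) (r1 - r2).
    r1, r2 = returns(p)
    return p * (1 - p) * (r1 - r2)


def run(w=0.0, eta=0.5, steps=5000):
    """Iterate the average update w <- w + eta * Cov[R, N]; return the final P(A = 1)."""
    for _ in range(steps):
        p = 1 / (1 + math.exp(-w))  # choice probability set by the weight
        w += eta * cov_reward_activity(p)
    return 1 / (1 + math.exp(-w))


p_final = run()
r1_final, r2_final = returns(p_final)
print(f"P(A=1) = {p_final:.3f}, E[R|A=1] = {r1_final:.3f}, E[R|A=2] = {r2_final:.3f}")
```

In this toy setting the weight stops changing only when the covariance vanishes, and at that point the two conditional returns are equal, which is the matching condition.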
Intuition
$\mathrm{Cov}[R, N] = 0 \Leftrightarrow E[R \mid A = i] = E[R]$
[Diagram: neuron $N$, action $A$, reward $R$]
• In general $R$ depends on $A$
• If, as a result of the policy used by the subject, $R$ becomes independent of $A$, then $R$ also becomes independent of $N$
$\Delta W \propto R \cdot (S - E[S])$
$\Delta W \propto R \cdot S$
$\Delta W \propto (R - E[R]) \cdot S$
[Diagram labels: presynaptic neuron $i$, postsynaptic neuron $j$, synapse $ij$, $M$]
Summary
Hypothesis:
Covariance-based synaptic plasticity underlies
the matching law
Theorem:
$\mathrm{Cov}[R, N] = 0 \Rightarrow$ the matching law
Loewenstein & Seung, PNAS, 2006
Loewenstein, PLoS Comp Biol, 2008
Disclaimer:
There are learning rules that converge to $\mathrm{Cov}[R, N] = 0$ that are not driven by covariance