MIT Department of Brain and Cognitive Sciences
9.641J, Spring 2005 - Introduction to Neural Networks
Instructor: Professor Sebastian Seung
Backprop for recurrent networks
Steady state
• Reward is an explicit function of x.
• x is the steady state of a recurrent network.


xi = f ∑ Wij x j + bi 
 j

max R( x)
W ,b
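As a concrete illustration, here is a minimal sketch of computing the steady state by repeatedly applying the update, assuming f = tanh and that the iteration converges (e.g. for sufficiently small weights); the function names are illustrative, not from the lecture:

import numpy as np

def f(u):
    # assumed nonlinearity; the lecture keeps f generic
    return np.tanh(u)

def steady_state(W, b, x0=None, tol=1e-10, max_iter=10000):
    # Iterate x <- f(Wx + b) until the state stops changing.
    x = np.zeros(len(b)) if x0 is None else np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_new = f(W @ x + b)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x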
Recurrent backpropagation
• Find steady state
• Calculate slopes
• Solve for s
• Weight update
$$x = f(Wx + b)$$

$$D = \mathrm{diag}\bigl\{f'(Wx + b)\bigr\}$$

$$\bigl(D^{-1} - W^T\bigr)\, s = \frac{\partial R}{\partial x}$$

$$\Delta W = \eta\, s\, x^T$$
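A sketch of one recurrent-backprop update, reusing the steady_state and f helpers above (so f′(u) = 1 − tanh²(u) under the tanh assumption). Here grad_R is a hypothetical callable returning ∂R/∂x at the steady state, and the bias update Δb = ηs is not written on the slide; it follows from the sensitivity lemma on the next slide:

def recurrent_backprop_step(W, b, grad_R, eta=0.01):
    # 1. find the steady state of x = f(Wx + b)
    x = steady_state(W, b)
    # 2. slopes: D = diag{f'(Wx + b)}, stored as a vector
    d = 1.0 - np.tanh(W @ x + b) ** 2
    # 3. solve (D^{-1} - W^T) s = dR/dx for the sensitivity vector s
    s = np.linalg.solve(np.diag(1.0 / d) - W.T, grad_R(x))
    # 4. gradient-ascent updates: dW = eta * s x^T, db = eta * s (assumed)
    return W + eta * np.outer(s, x), b + eta * s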
Sensitivity lemma
$$\frac{\partial R}{\partial W_{ij}} = \frac{\partial R}{\partial b_i}\, x_j$$
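The slide states the lemma without proof. One way to see it (a first-order argument, not spelled out in the lecture) is that W_{ij} and b_i enter the steady-state equation only through the net input u_i = Σ_k W_{ik} x_k + b_i, so to first order a perturbation dW_{ij} shifts the steady state exactly as a bias perturbation db_i = x_j dW_{ij} would:

$$dR = \frac{\partial R}{\partial b_i}\, db_i = \frac{\partial R}{\partial b_i}\, x_j\, dW_{ij} \qquad\Longrightarrow\qquad \frac{\partial R}{\partial W_{ij}} = \frac{\partial R}{\partial b_i}\, x_j$$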
Input as a function of output
• What input b is required to make x a steady state?
$$b_i = f^{-1}(x_i) - \sum_j W_{ij}\, x_j$$
• This is unique, even when the output is not a unique function of the input!
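A minimal sketch of this inverse map under the tanh assumption (so f⁻¹ = arctanh, valid for entries of x strictly between −1 and 1); unlike finding x from b, no iteration is needed:

def bias_for_steady_state(W, x):
    # b_i = f^{-1}(x_i) - sum_j W_ij x_j, assuming f = tanh
    return np.arctanh(x) - W @ x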
Jacobian matrix
$$b_i = f^{-1}(x_i) - \sum_j W_{ij}\, x_j$$

$$\frac{\partial b_i}{\partial x_j} = \bigl(f^{-1}\bigr)'(x_i)\,\delta_{ij} - W_{ij} = \bigl(D^{-1} - W\bigr)_{ij}$$
Chain rule
$$\frac{\partial R}{\partial x_j} = \sum_i \frac{\partial R}{\partial b_i}\,\frac{\partial b_i}{\partial x_j} = \sum_i \frac{\partial R}{\partial b_i}\,\bigl(D^{-1} - W\bigr)_{ij}$$

$$\frac{\partial R}{\partial x} = \bigl(D^{-1} - W\bigr)^T \frac{\partial R}{\partial b}$$
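Identifying s with ∂R/∂b (as the recurrent-backprop slide does implicitly) and using that D is diagonal, this chain-rule identity is exactly the linear system solved in step 3 of the algorithm:

$$\frac{\partial R}{\partial x} = \bigl(D^{-1} - W\bigr)^T s \quad\Longleftrightarrow\quad \bigl(D^{-1} - W^T\bigr)\, s = \frac{\partial R}{\partial x}, \qquad s \equiv \frac{\partial R}{\partial b}$$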
Trajectories
• Initialize at x(0)
• Iterate for T time steps


xi (t ) = f ∑ Wij x j (t −1) + bi 
 j

max R(x(1),K ,x(T ))
W ,b
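A sketch of generating such a trajectory, with the same assumed tanh nonlinearity; b is held constant over time here, matching the equation on this slide:

def run_trajectory(W, b, x0, T):
    # Return [x(0), x(1), ..., x(T)] for x(t) = f(W x(t-1) + b)
    xs = [np.array(x0, dtype=float)]
    for t in range(1, T + 1):
        xs.append(np.tanh(W @ xs[-1] + b))
    return xs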
Unfold time into space
• Multilayer perceptron
– Same number of neurons in each layer
– Same weights and biases in each layer (weight-sharing)
$$x(0) \xrightarrow{\;W,b\;} x(1) \xrightarrow{\;W,b\;} \cdots \xrightarrow{\;W,b\;} x(T)$$
Backpropagation through time
• Initial condition x(0)
$$x(t) = f\bigl(Wx(t-1) + b(t)\bigr)$$
• Compute ∂R/∂x(t)
• Final condition s(T+1)=0
$$s(t) = D(t)\, W^T s(t+1) + D(t)\, \frac{\partial R}{\partial x(t)}$$

$$\Delta W = \eta \sum_t s(t)\, x(t-1)^T \qquad \Delta b = \eta \sum_t s(t)$$
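A sketch of backpropagation through time under the same assumptions (f = tanh, and a single W and b shared across time steps, as the weight-sharing slide suggests); grad_R is a hypothetical callable returning ∂R/∂x(t) for a given state and time:

def bptt_step(W, b, x0, T, grad_R, eta=0.01):
    # forward pass: store states x(t) and slope vectors d(t) = f'(W x(t-1) + b)
    xs, ds = [np.array(x0, dtype=float)], [None]
    for t in range(1, T + 1):
        u = W @ xs[t - 1] + b
        xs.append(np.tanh(u))
        ds.append(1.0 - np.tanh(u) ** 2)
    # backward pass: s(T+1) = 0, then s(t) = D(t) W^T s(t+1) + D(t) dR/dx(t)
    s = np.zeros_like(xs[0])
    dW, db = np.zeros_like(W), np.zeros_like(b)
    for t in range(T, 0, -1):
        s = ds[t] * (W.T @ s + grad_R(xs[t], t))
        dW += np.outer(s, xs[t - 1])     # accumulates sum_t s(t) x(t-1)^T
        db += s                          # accumulates sum_t s(t)
    return W + eta * dW, b + eta * db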
Input as a function of output
$$x(t) = f\bigl(Wx(t-1) + b(t)\bigr)$$

$$b(t) = f^{-1}\bigl(x(t)\bigr) - Wx(t-1)$$

$$x(1),\, x(2),\, \ldots,\, x(T-1),\, x(T)$$
$$b(1),\, b(2),\, \ldots,\, b(T-1),\, b(T)$$
Jacobian matrix
$$b(t) = f^{-1}\bigl(x(t)\bigr) - Wx(t-1)$$

$$\frac{\partial b_i(t)}{\partial x_j(t')} = \delta_{t t'}\,\bigl(D^{-1}(t)\bigr)_{ij} - W_{ij}\,\delta_{t-1,\,t'}$$

$$D(t) = \mathrm{diag}\bigl\{f'\bigl(Wx(t-1) + b(t)\bigr)\bigr\}$$
Chain rule
$$\frac{\partial R}{\partial x_j(t')} = \sum_{i,t} \frac{\partial R}{\partial b_i(t)}\,\frac{\partial b_i(t)}{\partial x_j(t')} = \sum_i s_i(t')\,\bigl(D^{-1}(t')\bigr)_{ij} - \sum_i s_i(t'+1)\, W_{ij}$$

$$\frac{\partial R}{\partial x(t)} = D^{-1}(t)\, s(t) - W^T s(t+1)$$
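Here s_i(t) stands for ∂R/∂b_i(t), as on the preceding slides; rearranging the last line recovers the backward recursion used on the "Backpropagation through time" slide:

$$\frac{\partial R}{\partial x(t)} = D^{-1}(t)\,s(t) - W^T s(t+1) \quad\Longrightarrow\quad s(t) = D(t)\,W^T s(t+1) + D(t)\,\frac{\partial R}{\partial x(t)}$$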