ECE 5984
Double Deep Q Network
(Double DQN)
Jason J. Xuan, Ph.D.
Department of Electrical & Computer Engineering
Virginia Tech
ECE/VT
Outline
• Problem of DQN: Overoptimism
– Theory
– Experiments
• Solution: Double DQN
– Algorithm
– Experimental Results
Motivation: Overoptimism
Problem: Overoptimism
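As a rough illustration of the overoptimism named in this slide title (not a reproduction of the lecture's figures), the sketch below assumes every true action value is zero and adds zero-mean Gaussian noise to each estimate. Taking the max over noisy estimates is then biased upward, while selecting the action with one set of estimates and evaluating it with an independent set is not. The names `n_actions` and `n_trials` and the noise scale are illustrative assumptions. In Double DQN the two networks are not fully independent, so this only approximates the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 20000   # illustrative values, not from the lecture

# True Q-values are all 0; each estimate is the truth plus zero-mean noise.
q_a = rng.normal(0.0, 1.0, size=(n_trials, n_actions))  # estimator A
q_b = rng.normal(0.0, 1.0, size=(n_trials, n_actions))  # independent estimator B

# Single estimator: select and evaluate with the same noisy values.
single = q_a.max(axis=1).mean()

# Double estimator: select with A, evaluate the chosen action with B.
best = q_a.argmax(axis=1)
double = q_b[np.arange(n_trials), best].mean()

print(f"single-estimator mean of max: {single:.3f}")
print(f"double-estimator mean:        {double:.3f}")
```

The first number should come out clearly positive even though every true value is zero, and the second should stay close to zero.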
Problem: Overoptimism (cont’d)
Problem: Overoptimism (cont’d)
DQN: Background
DQN with target network
[Figure: DQN training loop with a target network and a replay buffer]
Double DQN
DQN vs Double DQN
• DQN: only the target network is used to compute the target
• Double DQN: two networks, the target network and the Q-network, are used to compute the target
– One for action selection and the other for action evaluation
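The difference between the two targets can be sketched in a few lines of NumPy. This is a minimal sketch, not the lecture's code: the function names, the discount factor `gamma`, the `done` flag for terminal transitions, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: the Q-network selects, the target network evaluates."""
    best = q_next_online.argmax(axis=1)                     # action selection
    evaluated = q_next_target[np.arange(len(best)), best]   # action evaluation
    return reward + gamma * (1.0 - done) * evaluated

# Toy batch of two transitions with three actions (made-up numbers).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])   # second transition is terminal
q_next_online = np.array([[0.2, 0.9, 0.1], [0.5, 0.4, 0.3]])
q_next_target = np.array([[1.5, 0.7, 0.3], [0.6, 0.2, 0.1]])

y_dqn = dqn_target(reward, done, q_next_target, gamma=0.9)
y_ddqn = double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.9)
print(y_dqn, y_ddqn)
```

In the first transition the target network alone would pick its own largest value (1.5), while Double DQN evaluates the action preferred by the Q-network (0.7), which gives a smaller target.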
Double DQN
Target computation:
• Deep Q Network: selects the best action a, the one with the maximum Q-value in the next state.
Double DQN (cont’d)
• Target Network: calculates the estimated Q-value of the next state for the action a selected above.
• Update the parameters of the Deep Q Network with a gradient descent method (e.g., the Adam optimizer).
• Update the parameters of the Target Network by copying the parameters of the Deep Q Network every several iterations.
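The steps above can be sketched end to end as follows. This is a hedged toy version under several assumptions: a linear Q-function stands in for the deep network, plain SGD is swapped in for Adam to keep the example short, random transitions stand in for a replay-buffer sample, and every name and hyperparameter (`W_online`, `W_target`, `sync_every`, `lr`, `gamma`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 3
gamma, lr, sync_every = 0.9, 0.05, 10   # illustrative hyperparameters

W_online = rng.normal(0.0, 0.1, size=(n_actions, n_features))  # Deep Q Network stand-in
W_target = W_online.copy()                                     # Target Network

def q_values(W, states):
    """Linear Q-function: one row of action values per state."""
    return states @ W.T

def train_step(W_online, W_target, s, a, r, s_next, done):
    idx = np.arange(len(a))
    # Step 1: the Q-network selects the best next action.
    best = q_values(W_online, s_next).argmax(axis=1)
    # Step 2: the target network evaluates that action.
    y = r + gamma * (1.0 - done) * q_values(W_target, s_next)[idx, best]
    # Step 3: gradient step on half the mean squared TD error (plain SGD here).
    td_error = q_values(W_online, s)[idx, a] - y
    grad = np.zeros_like(W_online)
    np.add.at(grad, a, td_error[:, None] * s)   # accumulate per-action gradients
    return W_online - lr * grad / len(a)

for step in range(1, 31):
    # Random transitions stand in for a batch sampled from a replay buffer.
    s = rng.normal(size=(8, n_features))
    s_next = rng.normal(size=(8, n_features))
    a = rng.integers(0, n_actions, size=8)
    r = rng.normal(size=8)
    done = np.zeros(8)
    W_online = train_step(W_online, W_target, s, a, r, s_next, done)
    # Step 4: periodically copy the Q-network parameters into the target network.
    if step % sync_every == 0:
        W_target = W_online.copy()
```

The target is treated as a constant in step 3, so the gradient flows only through the Q-network's estimate for the action actually taken.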
Results: Double DQN
Results: Double DQN (cont’d)
Summary
• Problem of DQN: Overoptimism
– Theory
– Experiments
• Solution: Double DQN
– Algorithm
– Experimental Results
Question
• Comments are more than welcome!