Double Deep Q-Network (Double DQN) Presentation

ECE 5984: Double Deep Q-Network (Double DQN)
Jason J. Xuan, Ph.D.
Department of Electrical & Computer Engineering
Virginia Tech
ECE/VT

Outline
• Problem of DQN: Overoptimism
  – Theory
  – Experiments
• Solution: Double DQN
  – Algorithm
  – Experimental Results

Motivation: Overoptimism

Problem: Overoptimism

Problem: Overoptimism (cont'd)

Problem: Overoptimism (cont'd)

DQN: Background

DQN with Target Network and Replay Buffer

Double DQN

DQN vs. Double DQN
• DQN: only the target network is used to compute the target, so the same network both selects the next action (through the max) and evaluates it.
• Double DQN: two networks, the target network and the Q-network, are used to compute the target
  – One for action selection and the other for action evaluation

Double DQN Target
• Deep Q-Network: selects the best action a, i.e., the action with the maximum Q-value in the next state.

Double DQN (cont'd)
• Target Network: calculates the estimated Q-value of the action a selected above.
• Update the parameters of the Deep Q-Network with a gradient-descent method (e.g., the Adam optimizer).
• Every few iterations, update the parameters of the Target Network from those of the Deep Q-Network.

Results: Double DQN

Results: Double DQN (cont'd)

Summary
• Problem of DQN: Overoptimism
  – Theory
  – Experiments
• Solution: Double DQN
  – Algorithm
  – Experimental Results

Question
• Comments are more than welcome!
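The overoptimism slides' figures are not recoverable from this transcript, but the core effect they name can be illustrated with a toy simulation: taking a max over noisy estimates is biased upward even when every individual estimate is unbiased. The sketch below is an illustration under stated assumptions, not the experiment from the slides: it assumes ten actions whose true values are all 0 and i.i.d. Gaussian estimation noise, and `overestimation_demo` and its parameters are hypothetical names. It compares the single max estimator (select and evaluate with the same estimates, as in DQN) with a double estimator (select with one set of estimates, evaluate with an independent set, as in Double DQN).

```python
import random
import statistics


def overestimation_demo(num_actions=10, noise_std=1.0, trials=10_000, seed=0):
    """Toy comparison of the single max estimator and a double estimator.

    Assumption: every action has true value 0, so max_a Q*(s, a) = 0 and any
    nonzero average below is pure estimation bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent, unbiased noisy estimates of the same action values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Single estimator (DQN-style): the same estimates select and evaluate.
        single.append(max(q_a))
        # Double estimator: q_a selects the action, q_b evaluates it.
        best = max(range(num_actions), key=lambda i: q_a[i])
        double.append(q_b[best])
    return statistics.mean(single), statistics.mean(double)


single_bias, double_bias = overestimation_demo()
print(f"single max estimator bias: {single_bias:+.3f}")
print(f"double estimator bias:     {double_bias:+.3f}")
```

With these settings the single estimator's average comes out at roughly +1.5 (the expected maximum of ten standard normals), while the double estimator's average stays close to 0. Because all true values are equal here, the double estimator is unbiased in this toy case; in general it can underestimate rather than overestimate.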
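The difference between the two targets described on the "DQN vs. Double DQN" and "Double DQN Target" slides can be written compactly. The notation is assumed here rather than taken from the slides: θ_t denotes the Q-network (online) parameters, θ⁻ the target-network parameters, R_{t+1} the reward, S_{t+1} the next state, and γ the discount factor.

```latex
% DQN: the target network both selects (max) and evaluates the next action
Y_t^{\text{DQN}} = R_{t+1} + \gamma \max_{a} Q\left(S_{t+1}, a; \theta^{-}\right)

% Double DQN: the Q-network selects the action, the target network evaluates it
Y_t^{\text{DoubleDQN}} = R_{t+1} + \gamma \, Q\left(S_{t+1}, \arg\max_{a} Q\left(S_{t+1}, a; \theta_t\right); \theta^{-}\right)
```

The only change is inside the evaluation: the max over θ⁻ is replaced by θ⁻ evaluated at the action chosen by θ_t, which decouples selection from evaluation.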
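The algorithm on the Double DQN slides (the Q-network selects the action, the target network evaluates it, and the target network is periodically refreshed from the Q-network) can be sketched without a deep-learning library. This is a minimal sketch under loud assumptions: `TabularQ` is a hypothetical stand-in for a neural network that simply stores a list of Q-values per state, the refresh is a hard copy (one common choice; the slides only say the target is updated from the Q-network's parameters), `dqn_target`, `double_dqn_target`, `maybe_sync`, and `sync_every` are hypothetical names, and the gradient step (e.g., Adam) on the Q-network is omitted. Terminal transitions use the reward alone as the target, a standard convention.

```python
from typing import Dict, List, Sequence


class TabularQ:
    """Hypothetical stand-in for a Q-network: a table of Q-values per state."""

    def __init__(self, table: Dict[str, Sequence[float]]):
        self.table = {s: list(q) for s, q in table.items()}

    def __call__(self, state: str) -> List[float]:
        return self.table[state]  # one Q-value per action

    def load_from(self, other: "TabularQ") -> None:
        # Hard update: copy the online parameters into this (target) network.
        self.table = {s: list(q) for s, q in other.table.items()}


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(target_net(next_state))


def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    a_star = argmax(online_net(next_state))  # action selection (Q-network)
    return reward + gamma * target_net(next_state)[a_star]  # action evaluation


def maybe_sync(step, sync_every, online_net, target_net):
    """Refresh the target network every `sync_every` steps (hypothetical schedule)."""
    if step % sync_every == 0:
        target_net.load_from(online_net)
        return True
    return False


online = TabularQ({"s1": [1.0, 3.0]})
target = TabularQ({"s1": [2.5, 0.5]})
print(dqn_target(1.0, "s1", False, target, gamma=0.5))                 # 2.25
print(double_dqn_target(1.0, "s1", False, online, target, gamma=0.5))  # 1.25
```

In the usage lines, DQN takes the target network's own maximum (2.5), whereas Double DQN lets the online network pick action 1 and then uses the target network's lower estimate for it (0.5), which is how decoupling selection from evaluation damps the upward bias.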
