1. Reinforcement Learning (RL)

Reinforcement Learning (RL)
├── Model-Based RL
│   ├── Dyna-Q (1991)
│   ├── PILCO (2011)
│   ├── World Models (2018)
│   ├── Dreamer (2020)
│   ├── DreamerV2 (2021)
│   ├── DreamerV3 (2023)
│   └── MuZero (2020)
└── Model-Free RL
    ├── Value-Based Methods
    │   ├── DQN (2015)
    │   ├── Double DQN (2016)
    │   ├── Dueling DQN (2016)
    │   ├── Rainbow DQN (2017)
    │   ├── C51 (2017)
    │   └── QR-DQN (2017)
    ├── Policy-Based Methods
    │   ├── REINFORCE (1992)
    │   ├── TRPO (2015)
    │   ├── PPO (2017)
    │   ├── SAC (2018)
    │   ├── DDPG (2016)
    │   └── TD3 (2018)
    └── Actor-Critic Methods
        ├── A2C (2016)
        ├── A3C (2016)
        ├── SAC (2018)
        ├── DDPG (2016)
        └── TD3 (2018)

2. Multi-Agent RL

Multi-Agent RL
├── MADDPG (2017)
├── COMA (2018)
└── QMIX (2018)

3. Hierarchical RL

Hierarchical RL
├── FeUdal Networks (FuN) (2017)
├── Hierarchical Actor-Critic (HAC) (2018)
└── Option-Critic (2017)

4. Meta-RL

Meta-RL
├── MAML (2017)
├── RL² (2016)
└── PEARL (2019)

5. Safe/Constrained RL

Safe/Constrained RL
├── CPO (2017)
└── Safe Policy Iteration (2013)

6. Exploration-Focused RL

Exploration-Focused RL
├── RND (2018)
├── ICM (2017)
└── HER (2017)

7. Hybrid RL

Hybrid RL
├── Dyna-Q (1991)
├── MVE (2018)
└── SLBO (2018)

8. Cutting-Edge RL

Cutting-Edge RL
├── DreamerV2 (2021)
├── DreamerV3 (2023)
├── MuZero (2020)
├── AlphaZero (2017)
├── Decision Transformer (2021)
└── CQL (2020)

The sections below give the key reference for each algorithm in the taxonomy.

1. Model-Free RL Algorithms

1.1 Value-Based Methods

Deep Q-Networks (DQN) (2015): Introduced by Mnih et al. in Human-level control through deep reinforcement learning. A minimal sketch of its TD target follows this section. Key improvements:
- Double DQN (2016): van Hasselt et al., Deep Reinforcement Learning with Double Q-learning.
- Dueling DQN (2016): Wang et al., Dueling Network Architectures for Deep Reinforcement Learning.
- Rainbow DQN (2017): Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning.

Categorical DQN (C51) (2017): Introduced by Bellemare et al. in A Distributional Perspective on Reinforcement Learning.

Quantile Regression DQN (QR-DQN) (2017): Introduced by Dabney et al. in Distributional Reinforcement Learning with Quantile Regression.

1.2 Policy-Based Methods

REINFORCE (1992): Introduced by Williams in Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.

Proximal Policy Optimization (PPO) (2017): Introduced by Schulman et al. in Proximal Policy Optimization Algorithms. A sketch of its clipped objective follows this section.

Trust Region Policy Optimization (TRPO) (2015): Introduced by Schulman et al. in Trust Region Policy Optimization.

Soft Actor-Critic (SAC) (2018): Introduced by Haarnoja et al. in Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.

Deep Deterministic Policy Gradient (DDPG) (2016): Introduced by Lillicrap et al. in Continuous control with deep reinforcement learning.

Twin Delayed DDPG (TD3) (2018): Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods.

1.3 Actor-Critic Methods

Advantage Actor-Critic (A2C) (2016): A synchronous variant of A3C, introduced in Mnih et al.'s Asynchronous Methods for Deep Reinforcement Learning.

Asynchronous Advantage Actor-Critic (A3C) (2016): Introduced by Mnih et al. in Asynchronous Methods for Deep Reinforcement Learning.
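To ground the value-based entries above, here is a minimal sketch of the one-step TD targets behind DQN and Double DQN. It assumes PyTorch, with online_net and target_net standing in for any torch.nn.Module that maps a batch of states to per-action Q-values; the function names are illustrative, not taken from the papers' code.

```python
import torch

def dqn_targets(target_net, rewards, next_states, dones, gamma=0.99):
    # DQN target: y = r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    # Double DQN decouples action selection (online net) from action evaluation
    # (target net), which reduces the overestimation bias of the max operator.
    with torch.no_grad():
        best = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```

In training, these targets are regressed against Q(s, a) from the online network, and target_net is a periodically refreshed copy of online_net.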
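In the same spirit, the heart of PPO from 1.2 fits in a few lines: a clipped surrogate objective over probability ratios. This is a sketch under the same PyTorch assumption; log_probs and old_log_probs are per-sample log-probabilities of the taken actions under the current and data-collecting policies.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t).
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipping removes the incentive to push the ratio outside
    # [1 - eps, 1 + eps], keeping each update close to the old policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated because optimizers minimize; PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()
```

The pessimistic min over the clipped and unclipped terms is what lets PPO take several gradient steps per batch without the trust-region machinery of TRPO.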
2. Model-Based RL Algorithms

Model-Based Policy Optimization (MBPO) (2019): Introduced by Janner et al. in When to Trust Your Model: Model-Based Policy Optimization.

PILCO (Probabilistic Inference for Learning Control) (2011): Introduced by Deisenroth and Rasmussen in PILCO: A Model-Based and Data-Efficient Approach to Policy Search.

World Models (2018): Introduced by Ha and Schmidhuber in World Models.

Dreamer (2020): Introduced by Hafner et al. in Dream to Control: Learning Behaviors by Latent Imagination.

3. Multi-Agent RL Algorithms

Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (2017): Introduced by Lowe et al. in Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.

Counterfactual Multi-Agent Policy Gradients (COMA) (2018): Introduced by Foerster et al. in Counterfactual Multi-Agent Policy Gradients.

QMIX (2018): Introduced by Rashid et al. in QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning.

4. Hierarchical RL Algorithms

FeUdal Networks (FuN) (2017): Introduced by Vezhnevets et al. in FeUdal Networks for Hierarchical Reinforcement Learning.

Hierarchical Actor-Critic (HAC) (2018): Introduced by Levy et al. in Hierarchical Reinforcement Learning with Hindsight.

Option-Critic (2017): Introduced by Bacon et al. in The Option-Critic Architecture.

5. Meta-RL Algorithms

Model-Agnostic Meta-Learning (MAML) (2017): Introduced by Finn et al. in Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.

RL² (Fast Reinforcement Learning via Slow Reinforcement Learning) (2016): Introduced by Duan et al. in RL²: Fast Reinforcement Learning via Slow Reinforcement Learning.

PEARL (Probabilistic Embeddings for Actor-Critic RL) (2019): Introduced by Rakelly et al. in Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables.

6. Safe and Constrained RL Algorithms

Constrained Policy Optimization (CPO) (2017): Introduced by Achiam et al. in Constrained Policy Optimization.

Safe Policy Iteration (2013): Introduced by Pirotta et al. in Safe Policy Iteration.

7. Exploration-Focused RL Algorithms

Random Network Distillation (RND) (2018): Introduced by Burda et al. in Exploration by Random Network Distillation.

Intrinsic Curiosity Module (ICM) (2017): Introduced by Pathak et al. in Curiosity-Driven Exploration by Self-Supervised Prediction.

Hindsight Experience Replay (HER) (2017): Introduced by Andrychowicz et al. in Hindsight Experience Replay.

8. Hybrid RL Algorithms

Dyna-Q (1991): Introduced by Sutton in Dyna, an Integrated Architecture for Learning, Planning, and Reacting.

Model-Based Value Expansion (MVE) (2018): Introduced by Feinberg et al. in Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.

Stochastic Lower Bound Optimization (SLBO) (2018): Introduced by Luo et al. in Algorithmic Framework for Model-Based Deep Reinforcement Learning with Theoretical Guarantees.

9. Modern and Cutting-Edge RL Algorithms

DreamerV2/V3 (2021/2023): DreamerV2 introduced by Hafner et al. in Mastering Atari with Discrete World Models (2021); DreamerV3 introduced by Hafner et al. in Mastering Diverse Domains through World Models (2023).

MuZero (2020): Introduced by Schrittwieser et al. in Mastering Atari, Go, Chess, and Shogi by Planning with a Learned Model.

AlphaZero (2017): Introduced by Silver et al. in Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

Decision Transformer (2021): Introduced by Chen et al. in Decision Transformer: Reinforcement Learning via Sequence Modeling.

Conservative Q-Learning (CQL) (2020): Introduced by Kumar et al. in Conservative Q-Learning for Offline Reinforcement Learning. A sketch of its conservative regularizer follows.
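To close, the regularizer that gives CQL its name is compact enough to sketch. For discrete actions, the CQL(H) variant penalizes Q-values across all actions via a log-sum-exp while pushing up Q-values on actions actually present in the offline dataset. The sketch again assumes PyTorch; q_net is any module mapping states to per-action Q-values, and the names are illustrative rather than the authors' code.

```python
import torch

def cql_penalty(q_net, states, actions):
    # CQL(H) regularizer (discrete actions):
    #   E_s[ logsumexp_a Q(s, a) - Q(s, a_data) ]
    # Driving this difference down keeps learned Q-values conservative
    # on actions the offline dataset never took.
    q_all = q_net(states)                               # shape [B, n_actions]
    logsumexp_q = torch.logsumexp(q_all, dim=1)         # soft maximum over all actions
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    return (logsumexp_q - q_data).mean()
```

In the paper this term is added to a standard TD loss with a tradeoff weight, so the agent trades Bellman accuracy against conservatism on out-of-distribution actions.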