
Reinforcement Learning Algorithms: A Comprehensive List

1. Model-Free RL Algorithms
1.1 Value-Based Methods
● Deep Q-Networks (DQN) (2015):
○ Introduced by Mnih et al. in Human-level control through deep
reinforcement learning.
○ Key improvements:
■ Double DQN (2016): Van Hasselt et al.
■ Dueling DQN (2016): Wang et al.
■ Rainbow DQN (2017): Hessel et al.
● Categorical DQN (C51) (2017):
○ Introduced by Bellemare et al. in A Distributional Perspective on
Reinforcement Learning.
● Quantile Regression DQN (QR-DQN) (2017):
○ Introduced by Dabney et al. in Distributional Reinforcement Learning with
Quantile Regression.
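As a rough illustration of the value-based family above, the sketch below computes the DQN bootstrapped target with a separate target network. The network sizes, batch contents, and hyperparameters are illustrative placeholders, not the paper's exact setup:

```python
import torch
import torch.nn as nn

# Minimal sketch of the DQN temporal-difference target (Mnih et al., 2015).
obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # periodically synced copy

# A fake minibatch of transitions (s, a, r, s', done).
s = torch.randn(32, obs_dim)
a = torch.randint(0, n_actions, (32,))
r = torch.randn(32)
s_next = torch.randn(32, obs_dim)
done = torch.zeros(32)

with torch.no_grad():
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a')
    target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)  # the paper uses a Huber loss
loss.backward()

# Double DQN (van Hasselt et al., 2016) instead selects the argmax with q_net:
# a_star = q_net(s_next).argmax(dim=1, keepdim=True)
# target = r + gamma * (1 - done) * target_net(s_next).gather(1, a_star).squeeze(1)
```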
1.2 Policy-Based Methods
● Proximal Policy Optimization (PPO) (2017):
○ Introduced by Schulman et al. in Proximal Policy Optimization Algorithms.
● Trust Region Policy Optimization (TRPO) (2015):
○ Introduced by Schulman et al. in Trust Region Policy Optimization.
● Soft Actor-Critic (SAC) (2018):
○ Introduced by Haarnoja et al. in Soft Actor-Critic: Off-Policy Maximum
Entropy Deep Reinforcement Learning with a Stochastic Actor.
● Deep Deterministic Policy Gradient (DDPG) (2016):
○ Introduced by Lillicrap et al. in Continuous control with deep reinforcement
learning.
● Twin Delayed DDPG (TD3) (2018):
○ Introduced by Fujimoto et al. in Addressing Function Approximation Error in
Actor-Critic Methods.
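The defining trick of PPO above is its clipped surrogate objective. A minimal sketch follows; the log-probabilities, advantages, and clip range are illustrative placeholders standing in for real rollout data:

```python
import torch

# Sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
eps = 0.2
log_prob_new = torch.randn(32, requires_grad=True)   # log pi_theta(a|s)
log_prob_old = torch.randn(32)                       # log pi_theta_old(a|s), fixed
advantages = torch.randn(32)

ratio = torch.exp(log_prob_new - log_prob_old)       # importance ratio r_t(theta)
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
# PPO maximizes the minimum of the two terms, so the loss is its negation.
loss = -torch.min(unclipped, clipped).mean()
loss.backward()
```

The clip keeps the new policy's importance ratio near 1, which is what lets PPO take multiple gradient steps per batch without the trust-region machinery of TRPO.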
1.3 Actor-Critic Methods
● Advantage Actor-Critic (A2C) (2016):
○ A synchronous, batched variant of A3C; the underlying method was
introduced in Mnih et al.'s Asynchronous Methods for Deep Reinforcement Learning.
● Asynchronous Advantage Actor-Critic (A3C) (2016):
○ Introduced by Mnih et al. in Asynchronous Methods for Deep Reinforcement
Learning.
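Both methods above combine a policy-gradient actor with a value-function critic. A hedged sketch of the A2C loss terms is below; the tensors stand in for network outputs on a batch, and the loss coefficients are common but illustrative defaults:

```python
import torch

# Sketch of the A2C loss (synchronous variant of Mnih et al., 2016).
log_prob = torch.randn(32, requires_grad=True)   # log pi(a|s) from the actor
value = torch.randn(32, requires_grad=True)      # V(s) from the critic
returns = torch.randn(32)                        # n-step bootstrapped returns
entropy = torch.rand(32)                         # policy entropy (placeholder)

advantage = returns - value
policy_loss = -(log_prob * advantage.detach()).mean()   # actor: policy gradient
value_loss = advantage.pow(2).mean()                    # critic: squared error
loss = policy_loss + 0.5 * value_loss - 0.01 * entropy.mean()
loss.backward()
```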
2. Model-Based RL Algorithms
● Model-Based Policy Optimization (MBPO) (2019):
○ Introduced by Janner et al. in When to Trust Your Model: Model-Based
Policy Optimization.
● PILCO (Probabilistic Inference for Learning Control) (2011):
○ Introduced by Deisenroth and Rasmussen in PILCO: A Model-Based and
Data-Efficient Approach to Policy Search.
● World Models (2018):
○ Introduced by Ha and Schmidhuber in World Models.
● Dreamer (2020):
○ Introduced by Hafner et al. in Dream to Control: Learning Behaviors by
Latent Imagination.
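The shared idea in the methods above is to learn a dynamics model and train against imagined experience. The sketch below shows the core pattern in MBPO-like form: a one-step model generating short rollouts from real states. The model architecture, placeholder policy, and rollout length are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Sketch of short model-based rollouts (in the spirit of MBPO, Janner et al., 2019).
obs_dim, act_dim = 3, 1
model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                      nn.Linear(64, obs_dim))   # predicts the state delta s' - s

def policy(s):
    return torch.tanh(torch.randn(s.shape[0], act_dim))  # stand-in policy

real_states = torch.randn(16, obs_dim)  # sampled from the real replay buffer
imagined, s = [], real_states
for _ in range(5):                       # short rollouts limit compounding model error
    a = policy(s)
    s_next = s + model(torch.cat([s, a], dim=1))
    imagined.append((s, a, s_next))
    s = s_next.detach()
# `imagined` transitions would then feed a model-free learner such as SAC.
```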
3. Multi-Agent RL Algorithms
● Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (2017):
○ Introduced by Lowe et al. in Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
● Counterfactual Multi-Agent Policy Gradients (COMA) (2018):
○ Introduced by Foerster et al. in Counterfactual Multi-Agent Policy Gradients.
● QMIX (2018):
○ Introduced by Rashid et al. in QMIX: Monotonic Value Function
Factorisation for Deep Multi-Agent Reinforcement Learning.
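QMIX's monotonic value factorisation can be sketched compactly: per-agent Q-values are mixed with weights forced non-negative, so the joint argmax decomposes into per-agent argmaxes. The version below simplifies the paper's two-layer hypernetwork mixer to a single layer; all shapes are illustrative:

```python
import torch
import torch.nn as nn

# Sketch of QMIX's monotonic mixing (Rashid et al., 2018).
n_agents, state_dim = 3, 8
hyper_w = nn.Linear(state_dim, n_agents)   # hypernetwork producing mixer weights
hyper_b = nn.Linear(state_dim, 1)

agent_qs = torch.randn(32, n_agents)       # chosen-action Q-value per agent
state = torch.randn(32, state_dim)         # global state (centralized training)

w = torch.abs(hyper_w(state))              # abs() enforces dQ_tot/dQ_i >= 0
q_tot = (w * agent_qs).sum(dim=1, keepdim=True) + hyper_b(state)
```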
4. Hierarchical RL Algorithms
● FeUdal Networks (FuN) (2017):
○ Introduced by Vezhnevets et al. in FeUdal Networks for Hierarchical
Reinforcement Learning.
● Hierarchical Actor-Critic (HAC) (2018):
○ Introduced by Levy et al. in Hierarchical Reinforcement Learning with
Hindsight.
● Option-Critic (2017):
○ Introduced by Bacon et al. in The Option-Critic Architecture.
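The control flow shared by these hierarchical methods is commitment to a temporally extended behavior until a termination condition fires. A toy sketch of the options machinery behind Option-Critic is below; the hard-coded policies and termination probabilities are placeholders, and the learning itself is omitted:

```python
import random

# Toy sketch of options with terminations (cf. Bacon et al., 2017).
options = {
    0: {"policy": lambda s: "left",  "beta": 0.1},   # beta: P(terminate | s)
    1: {"policy": lambda s: "right", "beta": 0.3},
}

state, current = 0, None
for step in range(10):
    if current is None or random.random() < options[current]["beta"]:
        current = random.choice(list(options))       # policy over options (here: random)
    action = options[current]["policy"](state)
    state += 1 if action == "right" else -1
# Option-Critic learns the intra-option policies, the terminations beta, and
# the policy over options jointly by gradient; this loop only shows control flow.
```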
5. Meta-RL Algorithms
● Model-Agnostic Meta-Learning (MAML) (2017):
○ Introduced by Finn et al. in Model-Agnostic Meta-Learning for Fast
Adaptation of Deep Networks.
● RL² (Fast Reinforcement Learning via Slow Reinforcement Learning) (2016):
○ Introduced by Duan et al. in RL²: Fast Reinforcement Learning via Slow
Reinforcement Learning.
● PEARL (Probabilistic Embeddings for Actor-Critic RL) (2019):
○ Introduced by Rakelly et al. in Efficient Off-Policy Meta-Reinforcement
Learning via Probabilistic Context Variables.
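MAML's inner/outer loop structure can be shown on a toy problem. The sketch below uses the first-order approximation (real MAML differentiates through the inner update) on a quadratic loss; the tasks, learning rates, and loss are illustrative assumptions:

```python
import torch

# First-order sketch of the MAML idea (Finn et al., 2017) on a toy loss.
theta = torch.zeros(2)
inner_lr, outer_lr = 0.1, 0.01

def task_loss(params, target):
    return ((params - target) ** 2).sum()

task_targets = [torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])]
for _ in range(100):
    meta_grad = torch.zeros_like(theta)
    for target in task_targets:
        p = theta.clone().requires_grad_(True)
        (g,) = torch.autograd.grad(task_loss(p, target), p)
        adapted = (p - inner_lr * g).detach().requires_grad_(True)  # inner step
        (g2,) = torch.autograd.grad(task_loss(adapted, target), adapted)
        meta_grad += g2          # first-order: treat the adapted params as a leaf
    theta = theta - outer_lr * meta_grad / len(task_targets)
# theta converges toward an initialization that adapts quickly to both tasks.
```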
6. Safe and Constrained RL Algorithms
● Constrained Policy Optimization (CPO) (2017):
○ Introduced by Achiam et al. in Constrained Policy Optimization.
● Safe Policy Iteration (2013):
○ Introduced by Pirotta et al. in Safe Policy Iteration.
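A recurring pattern in constrained RL is Lagrangian relaxation: maximize reward minus a multiplier times the constraint violation, while adapting the multiplier by dual ascent. The sketch below shows that pattern only; it is simpler than CPO's trust-region update, and all scalar values are illustrative stand-ins for policy evaluations:

```python
# Sketch of the Lagrangian-relaxation pattern common in constrained RL.
reward_return = 10.0
cost_return = 2.5      # expected constraint cost of the current policy
cost_limit = 1.0       # the constraint threshold d
lam, lam_lr = 0.0, 0.05

for _ in range(50):
    # Objective the actor would ascend: J_r - lambda * J_c.
    lagrangian = reward_return - lam * cost_return
    # Dual ascent on lambda: it grows while the constraint is violated.
    lam = max(0.0, lam + lam_lr * (cost_return - cost_limit))
    # Stand-in for policy improvement: assume a higher lambda pushes cost down.
    cost_return = max(0.5, cost_return - 0.02 * lam)
```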
7. Exploration-Focused RL Algorithms
● Random Network Distillation (RND) (2018):
○ Introduced by Burda et al. in Exploration by Random Network Distillation.
● Intrinsic Curiosity Module (ICM) (2017):
○ Introduced by Pathak et al. in Curiosity-Driven Exploration by Self-Supervised Prediction.
● Hindsight Experience Replay (HER) (2017):
○ Introduced by Andrychowicz et al. in Hindsight Experience Replay.
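RND, the first method above, is simple enough to sketch in full: a fixed random target network and a trained predictor, with the prediction error serving as an intrinsic reward, so novel states score high. Network sizes here are illustrative:

```python
import torch
import torch.nn as nn

# Sketch of Random Network Distillation (Burda et al., 2018).
obs_dim, feat_dim = 4, 16
target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)             # the target stays random forever

obs = torch.randn(32, obs_dim)
error = (predictor(obs) - target(obs)).pow(2).mean(dim=1)
intrinsic_reward = error.detach()       # bonus added to the task reward
loss = error.mean()                     # train the predictor to shrink the bonus
loss.backward()
```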
8. Hybrid RL Algorithms
● Dyna-Q (1991):
○ Introduced by Sutton in Dyna, an Integrated Architecture for Learning,
Planning, and Reacting.
● Model-Based Value Expansion (MVE) (2018):
○ Introduced by Feinberg et al. in Model-Based Value Expansion for Efficient
Model-Free Reinforcement Learning.
● Stochastic Lower Bound Optimization (SLBO) (2018):
○ Introduced by Luo et al. in Algorithmic Framework for Model-Based Deep
Reinforcement Learning with Theoretical Guarantees.
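Dyna-Q, the oldest entry above, is small enough to show end to end: each real step updates Q directly, then the learned model replays n simulated updates. The toy environment and hyperparameters below are illustrative placeholders:

```python
import random
from collections import defaultdict

# Tabular Dyna-Q sketch (Sutton, 1991) on a toy chain environment.
alpha, gamma, n_planning = 0.1, 0.95, 10
Q = defaultdict(float)
model = {}                                   # learned model: (s, a) -> (r, s')
actions = [0, 1]

def step(s, a):                              # hypothetical environment
    s_next = min(s + a, 5)
    return (1.0 if s_next == 5 else 0.0), s_next

def q_update(s, a, r, s_next):
    best = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

s = 0
for _ in range(200):
    a = random.choice(actions)               # epsilon-greedy in practice
    r, s_next = step(s, a)
    q_update(s, a, r, s_next)                # direct RL from the real step
    model[(s, a)] = (r, s_next)              # learn the model
    for _ in range(n_planning):              # planning: replay simulated steps
        ps, pa = random.choice(list(model))
        pr, ps_next = model[(ps, pa)]
        q_update(ps, pa, pr, ps_next)
    s = 0 if s_next == 5 else s_next
```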
9. Modern and Cutting-Edge RL Algorithms
● DreamerV2/V3 (2021/2023):
○ DreamerV2 introduced by Hafner et al. in Mastering Atari with Discrete
World Models (2021).
○ DreamerV3 introduced by Hafner et al. in Mastering Diverse Domains
through World Models (2023).
● MuZero (2020):
○ Introduced by Schrittwieser et al. in Mastering Atari, Go, Chess, and Shogi
by Planning with a Learned Model.
● AlphaZero (2017):
○ Introduced by Silver et al. in Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm.
● Decision Transformer (2021):
○ Introduced by Chen et al. in Decision Transformer: Reinforcement Learning
via Sequence Modeling.
● Conservative Q-Learning (CQL) (2020):
○ Introduced by Kumar et al. in Conservative Q-Learning for Offline
Reinforcement Learning.
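CQL's key term, for the discrete-action case, can be sketched compactly: push Q-values down overall (via logsumexp) while pushing up the Q-values of actions actually taken in the offline dataset. The penalty weight and shapes below are illustrative, and the ordinary Bellman loss is left as a placeholder:

```python
import torch
import torch.nn as nn

# Sketch of the conservative penalty in CQL (Kumar et al., 2020).
obs_dim, n_actions, alpha = 4, 3, 1.0
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

s = torch.randn(32, obs_dim)                 # states from the offline dataset
a = torch.randint(0, n_actions, (32,))       # dataset actions
q_all = q_net(s)
dataset_q = q_all.gather(1, a.unsqueeze(1)).squeeze(1)

cql_penalty = (torch.logsumexp(q_all, dim=1) - dataset_q).mean()
td_loss = torch.tensor(0.0)                  # placeholder for the usual Bellman loss
loss = td_loss + alpha * cql_penalty
loss.backward()
```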
The following trees summarize the algorithms above as a compact taxonomy.
1. Reinforcement Learning (RL)
Reinforcement Learning (RL)
├── Model-Based RL
│   ├── Dyna-Q (1991)
│   ├── PILCO (2011)
│   ├── World Models (2018)
│   ├── Dreamer (2020)
│   ├── DreamerV2 (2021)
│   ├── DreamerV3 (2023)
│   └── MuZero (2020)
└── Model-Free RL
    ├── Value-Based Methods
    │   ├── DQN (2015)
    │   ├── Double DQN (2016)
    │   ├── Dueling DQN (2016)
    │   ├── Rainbow DQN (2017)
    │   ├── C51 (2017)
    │   └── QR-DQN (2017)
    ├── Policy-Based Methods
    │   ├── REINFORCE (1992)
    │   ├── TRPO (2015)
    │   ├── PPO (2017)
    │   ├── SAC (2018)
    │   ├── DDPG (2016)
    │   └── TD3 (2018)
    └── Actor-Critic Methods
        ├── A2C (2016)
        ├── A3C (2016)
        ├── SAC (2018)
        ├── DDPG (2016)
        └── TD3 (2018)
2. Multi-Agent RL
Multi-Agent RL
├── MADDPG (2017)
├── COMA (2018)
└── QMIX (2018)
3. Hierarchical RL
Hierarchical RL
├── FeUdal Networks (FuN) (2017)
├── Hierarchical Actor-Critic (HAC) (2018)
└── Option-Critic (2017)
4. Meta-RL
Meta-RL
├── MAML (2017)
├── RL² (2016)
└── PEARL (2019)
5. Safe/Constrained RL
Safe/Constrained RL
├── CPO (2017)
└── Safe Policy Iteration (2013)
6. Exploration-Focused RL
Exploration-Focused RL
├── RND (2018)
├── ICM (2017)
└── HER (2017)
7. Hybrid RL
Hybrid RL
├── Dyna-Q (1991)
├── MVE (2018)
└── SLBO (2018)
8. Cutting-Edge RL
Cutting-Edge RL
├── DreamerV2 (2021)
├── DreamerV3 (2023)
├── MuZero (2020)
├── AlphaZero (2017)
├── Decision Transformer (2021)
└── CQL (2020)