
Reinforcement Learning Algorithms: A Comprehensive List

1. Model-Free RL Algorithms
1.1 Value-Based Methods
● Deep Q-Networks (DQN) (2015):
○ Introduced by Mnih et al. in Human-level control through deep
reinforcement learning.
○ Key improvements:
■ Double DQN (2016): Van Hasselt et al.
■ Dueling DQN (2016): Wang et al.
■ Rainbow DQN (2017): Hessel et al.
● Categorical DQN (C51) (2017):
○ Introduced by Bellemare et al. in A Distributional Perspective on
Reinforcement Learning.
● Quantile Regression DQN (QR-DQN) (2017):
○ Introduced by Dabney et al. in Distributional Reinforcement Learning with
Quantile Regression.
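As a rough illustration of the value-based family above, the sketch below computes the DQN bootstrapped target with a separate target network. The network sizes, batch contents, and hyperparameters are illustrative placeholders, not the paper's exact setup:

```python
import torch
import torch.nn as nn

# Minimal sketch of the DQN temporal-difference target (Mnih et al., 2015).
obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # periodically synced copy

# A fake minibatch of transitions (s, a, r, s', done).
s = torch.randn(32, obs_dim)
a = torch.randint(0, n_actions, (32,))
r = torch.randn(32)
s_next = torch.randn(32, obs_dim)
done = torch.zeros(32)

with torch.no_grad():
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a')
    target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)  # the paper uses a Huber loss
loss.backward()

# Double DQN (van Hasselt et al., 2016) instead selects the argmax with q_net:
# a_star = q_net(s_next).argmax(dim=1, keepdim=True)
# target = r + gamma * (1 - done) * target_net(s_next).gather(1, a_star).squeeze(1)
```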
1.2 Policy-Based Methods
● Proximal Policy Optimization (PPO) (2017):
○ Introduced by Schulman et al. in Proximal Policy Optimization Algorithms.
● Trust Region Policy Optimization (TRPO) (2015):
○ Introduced by Schulman et al. in Trust Region Policy Optimization.
● Soft Actor-Critic (SAC) (2018):
○ Introduced by Haarnoja et al. in Soft Actor-Critic: Off-Policy Maximum
Entropy Deep Reinforcement Learning with a Stochastic Actor.
● Deep Deterministic Policy Gradient (DDPG) (2016):
○ Introduced by Lillicrap et al. in Continuous control with deep reinforcement
learning.
● Twin Delayed DDPG (TD3) (2018):
○ Introduced by Fujimoto et al. in Addressing Function Approximation Error in
Actor-Critic Methods.
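The defining trick of PPO above is its clipped surrogate objective. A minimal sketch follows; the log-probabilities, advantages, and clip range are illustrative placeholders standing in for real rollout data:

```python
import torch

# Sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
eps = 0.2
log_prob_new = torch.randn(32, requires_grad=True)   # log pi_theta(a|s)
log_prob_old = torch.randn(32)                       # log pi_theta_old(a|s), fixed
advantages = torch.randn(32)

ratio = torch.exp(log_prob_new - log_prob_old)       # importance ratio r_t(theta)
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
# PPO maximizes the minimum of the two terms, so the loss is its negation.
loss = -torch.min(unclipped, clipped).mean()
loss.backward()
```

The clip keeps the new policy's importance ratio near 1, which is what lets PPO take multiple gradient steps per batch without the trust-region machinery of TRPO.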
1.3 Actor-Critic Methods
● Advantage Actor-Critic (A2C) (2016):
○ A synchronous, batched variant of A3C; the underlying method was
introduced in Mnih et al.'s Asynchronous Methods for Deep Reinforcement Learning.
● Asynchronous Advantage Actor-Critic (A3C) (2016):
○ Introduced by Mnih et al. in Asynchronous Methods for Deep Reinforcement
Learning.
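Both methods above combine a policy-gradient actor with a value-function critic. A hedged sketch of the A2C loss terms is below; the tensors stand in for network outputs on a batch, and the loss coefficients are common but illustrative defaults:

```python
import torch

# Sketch of the A2C loss (synchronous variant of Mnih et al., 2016).
log_prob = torch.randn(32, requires_grad=True)   # log pi(a|s) from the actor
value = torch.randn(32, requires_grad=True)      # V(s) from the critic
returns = torch.randn(32)                        # n-step bootstrapped returns
entropy = torch.rand(32)                         # policy entropy (placeholder)

advantage = returns - value
policy_loss = -(log_prob * advantage.detach()).mean()   # actor: policy gradient
value_loss = advantage.pow(2).mean()                    # critic: squared error
loss = policy_loss + 0.5 * value_loss - 0.01 * entropy.mean()
loss.backward()
```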
2. Model-Based RL Algorithms
● Model-Based Policy Optimization (MBPO) (2019):
○ Introduced by Janner et al. in When to Trust Your Model: Model-Based
Policy Optimization.
● PILCO (Probabilistic Inference for Learning Control) (2011):
○ Introduced by Deisenroth and Rasmussen in PILCO: A Model-Based and
Data-Efficient Approach to Policy Search.
● World Models (2018):
○ Introduced by Ha and Schmidhuber in World Models.
● Dreamer (2020):
○ Introduced by Hafner et al. in Dream to Control: Learning Behaviors by
Latent Imagination.
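The shared idea in the methods above is to learn a dynamics model and train against imagined experience. The sketch below shows the core pattern in MBPO-like form: a one-step model generating short rollouts from real states. The model architecture, placeholder policy, and rollout length are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Sketch of short model-based rollouts (in the spirit of MBPO, Janner et al., 2019).
obs_dim, act_dim = 3, 1
model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                      nn.Linear(64, obs_dim))   # predicts the state delta s' - s

def policy(s):
    return torch.tanh(torch.randn(s.shape[0], act_dim))  # stand-in policy

real_states = torch.randn(16, obs_dim)  # sampled from the real replay buffer
imagined, s = [], real_states
for _ in range(5):                       # short rollouts limit compounding model error
    a = policy(s)
    s_next = s + model(torch.cat([s, a], dim=1))
    imagined.append((s, a, s_next))
    s = s_next.detach()
# `imagined` transitions would then feed a model-free learner such as SAC.
```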
3. Multi-Agent RL Algorithms
● Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (2017):
○ Introduced by Lowe et al. in Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
● Counterfactual Multi-Agent Policy Gradients (COMA) (2018):
○ Introduced by Foerster et al. in Counterfactual Multi-Agent Policy Gradients.
● QMIX (2018):
○ Introduced by Rashid et al. in QMIX: Monotonic Value Function
Factorisation for Deep Multi-Agent Reinforcement Learning.
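QMIX's monotonic value factorisation can be sketched compactly: per-agent Q-values are mixed with weights forced non-negative, so the joint argmax decomposes into per-agent argmaxes. The version below simplifies the paper's two-layer hypernetwork mixer to a single layer; all shapes are illustrative:

```python
import torch
import torch.nn as nn

# Sketch of QMIX's monotonic mixing (Rashid et al., 2018).
n_agents, state_dim = 3, 8
hyper_w = nn.Linear(state_dim, n_agents)   # hypernetwork producing mixer weights
hyper_b = nn.Linear(state_dim, 1)

agent_qs = torch.randn(32, n_agents)       # chosen-action Q-value per agent
state = torch.randn(32, state_dim)         # global state (centralized training)

w = torch.abs(hyper_w(state))              # abs() enforces dQ_tot/dQ_i >= 0
q_tot = (w * agent_qs).sum(dim=1, keepdim=True) + hyper_b(state)
```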
4. Hierarchical RL Algorithms
● FeUdal Networks (FuN) (2017):
○ Introduced by Vezhnevets et al. in FeUdal Networks for Hierarchical
Reinforcement Learning.
● Hierarchical Actor-Critic (HAC) (2018):
○ Introduced by Levy et al. in Hierarchical Reinforcement Learning with
Hindsight.
● Option-Critic (2017):
○ Introduced by Bacon et al. in The Option-Critic Architecture.
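The control flow shared by these hierarchical methods is commitment to a temporally extended behavior until a termination condition fires. A toy sketch of the options machinery behind Option-Critic is below; the hard-coded policies and termination probabilities are placeholders, and the learning itself is omitted:

```python
import random

# Toy sketch of options with terminations (cf. Bacon et al., 2017).
options = {
    0: {"policy": lambda s: "left",  "beta": 0.1},   # beta: P(terminate | s)
    1: {"policy": lambda s: "right", "beta": 0.3},
}

state, current = 0, None
for step in range(10):
    if current is None or random.random() < options[current]["beta"]:
        current = random.choice(list(options))       # policy over options (here: random)
    action = options[current]["policy"](state)
    state += 1 if action == "right" else -1
# Option-Critic learns the intra-option policies, the terminations beta, and
# the policy over options jointly by gradient; this loop only shows control flow.
```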
5. Meta-RL Algorithms
● Model-Agnostic Meta-Learning (MAML) (2017):
○ Introduced by Finn et al. in Model-Agnostic Meta-Learning for Fast
Adaptation of Deep Networks.
● RL² (Fast Reinforcement Learning via Slow Reinforcement Learning) (2016):
○ Introduced by Duan et al. in RL²: Fast Reinforcement Learning via Slow
Reinforcement Learning.
● PEARL (Probabilistic Embeddings for Actor-Critic RL) (2019):
○ Introduced by Rakelly et al. in Efficient Off-Policy Meta-Reinforcement
Learning via Probabilistic Context Variables.
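MAML's inner/outer loop structure can be shown on a toy problem. The sketch below uses the first-order approximation (real MAML differentiates through the inner update) on a quadratic loss; the tasks, learning rates, and loss are illustrative assumptions:

```python
import torch

# First-order sketch of the MAML idea (Finn et al., 2017) on a toy loss.
theta = torch.zeros(2)
inner_lr, outer_lr = 0.1, 0.01

def task_loss(params, target):
    return ((params - target) ** 2).sum()

task_targets = [torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])]
for _ in range(100):
    meta_grad = torch.zeros_like(theta)
    for target in task_targets:
        p = theta.clone().requires_grad_(True)
        (g,) = torch.autograd.grad(task_loss(p, target), p)
        adapted = (p - inner_lr * g).detach().requires_grad_(True)  # inner step
        (g2,) = torch.autograd.grad(task_loss(adapted, target), adapted)
        meta_grad += g2          # first-order: treat the adapted params as a leaf
    theta = theta - outer_lr * meta_grad / len(task_targets)
# theta converges toward an initialization that adapts quickly to both tasks.
```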
6. Safe and Constrained RL Algorithms
● Constrained Policy Optimization (CPO) (2017):
○ Introduced by Achiam et al. in Constrained Policy Optimization.
● Safe Policy Iteration (2013):
○ Introduced by Pirotta et al. in Safe Policy Iteration.
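A recurring pattern in constrained RL is Lagrangian relaxation: maximize reward minus a multiplier times the constraint violation, while adapting the multiplier by dual ascent. The sketch below shows that pattern only; it is simpler than CPO's trust-region update, and all scalar values are illustrative stand-ins for policy evaluations:

```python
# Sketch of the Lagrangian-relaxation pattern common in constrained RL.
reward_return = 10.0
cost_return = 2.5      # expected constraint cost of the current policy
cost_limit = 1.0       # the constraint threshold d
lam, lam_lr = 0.0, 0.05

for _ in range(50):
    # Objective the actor would ascend: J_r - lambda * J_c.
    lagrangian = reward_return - lam * cost_return
    # Dual ascent on lambda: it grows while the constraint is violated.
    lam = max(0.0, lam + lam_lr * (cost_return - cost_limit))
    # Stand-in for policy improvement: assume a higher lambda pushes cost down.
    cost_return = max(0.5, cost_return - 0.02 * lam)
```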
7. Exploration-Focused RL Algorithms
● Random Network Distillation (RND) (2018):
○ Introduced by Burda et al. in Exploration by Random Network Distillation.
● Intrinsic Curiosity Module (ICM) (2017):
○ Introduced by Pathak et al. in Curiosity-Driven Exploration by Self-Supervised Prediction.
● Hindsight Experience Replay (HER) (2017):
○ Introduced by Andrychowicz et al. in Hindsight Experience Replay.
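RND, the first method above, is simple enough to sketch in full: a fixed random target network and a trained predictor, with the prediction error serving as an intrinsic reward, so novel states score high. Network sizes here are illustrative:

```python
import torch
import torch.nn as nn

# Sketch of Random Network Distillation (Burda et al., 2018).
obs_dim, feat_dim = 4, 16
target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)             # the target stays random forever

obs = torch.randn(32, obs_dim)
error = (predictor(obs) - target(obs)).pow(2).mean(dim=1)
intrinsic_reward = error.detach()       # bonus added to the task reward
loss = error.mean()                     # train the predictor to shrink the bonus
loss.backward()
```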
8. Hybrid RL Algorithms
● Dyna-Q (1991):
○ Introduced by Sutton in Dyna, an Integrated Architecture for Learning,
Planning, and Reacting.
● Model-Based Value Expansion (MVE) (2018):
○ Introduced by Feinberg et al. in Model-Based Value Expansion for Efficient
Model-Free Reinforcement Learning.
● Stochastic Lower Bound Optimization (SLBO) (2018):
○ Introduced by Luo et al. in Algorithmic Framework for Model-Based Deep
Reinforcement Learning with Theoretical Guarantees.
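Dyna-Q, the oldest entry above, is small enough to show end to end: each real step updates Q directly, then the learned model replays n simulated updates. The toy environment and hyperparameters below are illustrative placeholders:

```python
import random
from collections import defaultdict

# Tabular Dyna-Q sketch (Sutton, 1991) on a toy chain environment.
alpha, gamma, n_planning = 0.1, 0.95, 10
Q = defaultdict(float)
model = {}                                   # learned model: (s, a) -> (r, s')
actions = [0, 1]

def step(s, a):                              # hypothetical environment
    s_next = min(s + a, 5)
    return (1.0 if s_next == 5 else 0.0), s_next

def q_update(s, a, r, s_next):
    best = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

s = 0
for _ in range(200):
    a = random.choice(actions)               # epsilon-greedy in practice
    r, s_next = step(s, a)
    q_update(s, a, r, s_next)                # direct RL from the real step
    model[(s, a)] = (r, s_next)              # learn the model
    for _ in range(n_planning):              # planning: replay simulated steps
        ps, pa = random.choice(list(model))
        pr, ps_next = model[(ps, pa)]
        q_update(ps, pa, pr, ps_next)
    s = 0 if s_next == 5 else s_next
```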
9. Modern and Cutting-Edge RL Algorithms
● DreamerV2/V3 (2021/2023):
○ DreamerV2 introduced by Hafner et al. in Mastering Atari with Discrete
World Models (2021).
○ DreamerV3 introduced by Hafner et al. in Mastering Diverse Domains
through World Models (2023).
● MuZero (2020):
○ Introduced by Schrittwieser et al. in Mastering Atari, Go, Chess, and Shogi
by Planning with a Learned Model.
● AlphaZero (2017):
○ Introduced by Silver et al. in Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm.
● Decision Transformer (2021):
○ Introduced by Chen et al. in Decision Transformer: Reinforcement Learning
via Sequence Modeling.
● Conservative Q-Learning (CQL) (2020):
○ Introduced by Kumar et al. in Conservative Q-Learning for Offline
Reinforcement Learning.
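CQL's key term, for the discrete-action case, can be sketched compactly: push Q-values down overall (via logsumexp) while pushing up the Q-values of actions actually taken in the offline dataset. The penalty weight and shapes below are illustrative, and the ordinary Bellman loss is left as a placeholder:

```python
import torch
import torch.nn as nn

# Sketch of the conservative penalty in CQL (Kumar et al., 2020).
obs_dim, n_actions, alpha = 4, 3, 1.0
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

s = torch.randn(32, obs_dim)                 # states from the offline dataset
a = torch.randint(0, n_actions, (32,))       # dataset actions
q_all = q_net(s)
dataset_q = q_all.gather(1, a.unsqueeze(1)).squeeze(1)

cql_penalty = (torch.logsumexp(q_all, dim=1) - dataset_q).mean()
td_loss = torch.tensor(0.0)                  # placeholder for the usual Bellman loss
loss = td_loss + alpha * cql_penalty
loss.backward()
```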
The following trees summarize the algorithms above as a compact taxonomy.
1. Reinforcement Learning (RL)
Reinforcement Learning (RL)
├── Model-Based RL
│   ├── Dyna-Q (1991)
│   ├── PILCO (2011)
│   ├── World Models (2018)
│   ├── Dreamer (2020)
│   ├── DreamerV2 (2021)
│   ├── DreamerV3 (2023)
│   └── MuZero (2020)
└── Model-Free RL
    ├── Value-Based Methods
    │   ├── DQN (2015)
    │   ├── Double DQN (2016)
    │   ├── Dueling DQN (2016)
    │   ├── Rainbow DQN (2017)
    │   ├── C51 (2017)
    │   └── QR-DQN (2017)
    ├── Policy-Based Methods
    │   ├── REINFORCE (1992)
    │   ├── TRPO (2015)
    │   ├── PPO (2017)
    │   ├── SAC (2018)
    │   ├── DDPG (2016)
    │   └── TD3 (2018)
    └── Actor-Critic Methods
        ├── A2C (2016)
        ├── A3C (2016)
        ├── SAC (2018)
        ├── DDPG (2016)
        └── TD3 (2018)
2. Multi-Agent RL
Multi-Agent RL
├── MADDPG (2017)
├── COMA (2018)
└── QMIX (2018)
3. Hierarchical RL
Hierarchical RL
├── FeUdal Networks (FuN) (2017)
├── Hierarchical Actor-Critic (HAC) (2018)
└── Option-Critic (2017)
4. Meta-RL
Meta-RL
├── MAML (2017)
├── RL² (2016)
└── PEARL (2019)
5. Safe/Constrained RL
Safe/Constrained RL
├── CPO (2017)
└── Safe Policy Iteration (2013)
6. Exploration-Focused RL
Exploration-Focused RL
├── RND (2018)
├── ICM (2017)
└── HER (2017)
7. Hybrid RL
Hybrid RL
├── Dyna-Q (1991)
├── MVE (2018)
└── SLBO (2018)
8. Cutting-Edge RL
Cutting-Edge RL
├── DreamerV2 (2021)
├── DreamerV3 (2023)
├── MuZero (2020)
├── AlphaZero (2017)
├── Decision Transformer (2021)
└── CQL (2020)