Reinforcement Learning Algorithms Overview

1. Reinforcement Learning (RL)
Reinforcement Learning (RL)
├── Model-Based RL
│   ├── Dyna-Q (1991)
│   ├── PILCO (2011)
│   ├── World Models (2018)
│   ├── Dreamer (2020)
│   ├── DreamerV2 (2021)
│   ├── DreamerV3 (2023)
│   └── MuZero (2020)
└── Model-Free RL
    ├── Value-Based Methods
    │   ├── DQN (2015)
    │   ├── Double DQN (2016)
    │   ├── Dueling DQN (2016)
    │   ├── Rainbow DQN (2017)
    │   ├── C51 (2017)
    │   └── QR-DQN (2017)
    ├── Policy-Based Methods
    │   ├── REINFORCE (1992)
    │   ├── TRPO (2015)
    │   ├── PPO (2017)
    │   ├── SAC (2018)
    │   ├── DDPG (2016)
    │   └── TD3 (2018)
    └── Actor-Critic Methods
        ├── A2C (2016)
        ├── A3C (2016)
        ├── SAC (2018)
        ├── DDPG (2016)
        └── TD3 (2018)
2. Multi-Agent RL
Multi-Agent RL
├── MADDPG (2017)
├── COMA (2018)
└── QMIX (2018)
3. Hierarchical RL
Hierarchical RL
├── FeUdal Networks (FuN) (2017)
├── Hierarchical Actor-Critic (HAC) (2018)
└── Option-Critic (2017)
4. Meta-RL
Meta-RL
├── MAML (2017)
├── RL² (2016)
└── PEARL (2019)
5. Safe/Constrained RL
Safe/Constrained RL
├── CPO (2017)
└── Safe Policy Iteration (2013)
6. Exploration-Focused RL
Exploration-Focused RL
├── RND (2018)
├── ICM (2017)
└── HER (2017)
7. Hybrid RL
Hybrid RL
├── Dyna-Q (1991)
├── MVE (2018)
└── SLBO (2018)
8. Cutting-Edge RL
Cutting-Edge RL
├── DreamerV2 (2021)
├── DreamerV3 (2023)
├── MuZero (2020)
├── AlphaZero (2017)
├── Decision Transformer (2021)
└── CQL (2020)
1. Model-Free RL Algorithms
1.1 Value-Based Methods
Deep Q-Networks (DQN) (2015):
Introduced by Mnih et al. in Human-level control through deep reinforcement learning.
Key improvements:
Double DQN (2016): van Hasselt et al., Deep Reinforcement Learning with Double Q-learning.
Dueling DQN (2016): Wang et al., Dueling Network Architectures for Deep Reinforcement Learning.
Rainbow DQN (2017): Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning.
Categorical DQN (C51) (2017):
Introduced by Bellemare et al. in A Distributional Perspective on Reinforcement Learning.
Quantile Regression DQN (QR-DQN) (2017):
Introduced by Dabney et al. in Distributional Reinforcement Learning with Quantile Regression.
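To make the value-based updates above concrete, here is a minimal sketch of the DQN and Double DQN temporal-difference targets; the array names (q_online_next, q_target_next) are illustrative, not from the papers.

import numpy as np

# Hypothetical Q-value rows for the next state s', one entry per action,
# from the online network and the periodically copied target network.
q_online_next = np.array([1.0, 2.5, 0.3])
q_target_next = np.array([0.9, 2.0, 0.5])
reward, gamma, done = 1.0, 0.99, False

# DQN (Mnih et al., 2015): the target network both selects and evaluates
# the greedy next action, which tends to overestimate values.
dqn_target = reward + gamma * (1 - done) * q_target_next.max()

# Double DQN (van Hasselt et al., 2016): the online network selects the
# action, the target network evaluates it, reducing overestimation bias.
a_star = q_online_next.argmax()
ddqn_target = reward + gamma * (1 - done) * q_target_next[a_star]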
1.2 Policy-Based Methods
Proximal Policy Optimization (PPO) (2017):
Introduced by Schulman et al. in Proximal Policy Optimization Algorithms.
Trust Region Policy Optimization (TRPO) (2015):
Introduced by Schulman et al. in Trust Region Policy Optimization.
Soft Actor-Critic (SAC) (2018):
Introduced by Haarnoja et al. in Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Deep Deterministic Policy Gradient (DDPG) (2016):
Introduced by Lillicrap et al. in Continuous control with deep reinforcement learning.
Twin Delayed DDPG (TD3) (2018):
Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods.
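The heart of PPO is compact enough to sketch directly. A minimal version of the clipped surrogate loss follows; the tensor names and the assumption of precomputed advantages (e.g. GAE) are illustrative.

import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s) for the taken actions.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push the ratio outside
    # [1 - eps, 1 + eps], keeping updates near the data-collecting policy.
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic minimum of the two, negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()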
1.3 Actor-Critic Methods
Advantage Actor-Critic (A2C) (2016):
A synchronous variant of A3C; the underlying advantage actor-critic scheme was introduced in Mnih et al.'s Asynchronous Methods for Deep Reinforcement Learning.
Asynchronous Advantage Actor-Critic (A3C) (2016):
Introduced by Mnih et al. in Asynchronous Methods for Deep Reinforcement Learning.
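As a sketch of how the actor and critic objectives combine in A2C (variable names are illustrative; advantages here are simple return-minus-value estimates):

import torch

def a2c_loss(logp_actions, values, returns, entropy,
             value_coef=0.5, ent_coef=0.01):
    # Advantage: how much better the sampled actions did than the critic
    # expected; detached so only the actor term receives this gradient.
    advantages = (returns - values).detach()
    policy_loss = -(logp_actions * advantages).mean()  # actor
    value_loss = (returns - values).pow(2).mean()      # critic regression
    # Entropy bonus discourages premature convergence, as in Mnih et al. (2016).
    return policy_loss + value_coef * value_loss - ent_coef * entropy.mean()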
2. Model-Based RL Algorithms
Model-Based Policy Optimization (MBPO) (2019):
Introduced by Janner et al. in When to Trust Your Model: Model-Based Policy Optimization.
PILCO (Probabilistic Inference for Learning Control) (2011):
Introduced by Deisenroth and Rasmussen in PILCO: A Model-Based and Data-Efficient Approach to Policy Search.
World Models (2018):
Introduced by Ha and Schmidhuber in World Models.
Dreamer (2020):
Introduced by Hafner et al. in Dream to Control: Learning Behaviors by Latent Imagination.
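What these methods share is the recipe: learn a dynamics model from real transitions, then plan or optimize a policy inside it. A deliberately tiny, runnable illustration on a 1-D toy task follows; everything here is an assumption for illustration, not any paper's setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy task: true dynamics s' = s + a + noise, reward = -s'^2
# (drive the state to zero).

# 1. Collect random real transitions and fit a linear model s' ~ w.[s, a].
X = rng.uniform(-1, 1, (200, 2))                      # columns: state, action
y = X[:, 0] + X[:, 1] + 0.01 * rng.normal(size=200)   # observed next states
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2. Plan by "imagining" outcomes with the learned model, never touching
#    the real environment again.
def plan(s, candidates=np.linspace(-1, 1, 21)):
    imagined_next = w[0] * s + w[1] * candidates      # depth-1 model rollout
    return candidates[np.argmax(-imagined_next ** 2)] # best imagined reward

print(plan(0.7))  # about -0.7: the planner cancels the state via the model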
3. Multi-Agent RL Algorithms
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (2017):
Introduced by Lowe et al. in Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
Counterfactual Multi-Agent Policy Gradients (COMA) (2018):
Introduced by Foerster et al. in Counterfactual Multi-Agent Policy Gradients.
QMIX (2018):
Introduced by Rashid et al. in QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning.
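QMIX's central trick can be shown in a few lines: per-agent utilities are combined by a mixing network whose weights are constrained non-negative, so the joint value is monotone in each agent's value and greedy decentralized actions remain consistent with the joint greedy action. A minimal sketch follows; the real method generates these mixing weights from the global state with hypernetworks, omitted here.

import torch

def qmix_mix(agent_qs, w1, b1, w2, b2):
    # agent_qs: (batch, n_agents); abs() enforces monotonicity.
    hidden = torch.relu(agent_qs @ w1.abs() + b1)
    return hidden @ w2.abs() + b2      # scalar joint Q per batch row

q_joint = qmix_mix(torch.randn(4, 3), torch.randn(3, 8),
                   torch.zeros(8), torch.randn(8, 1), torch.zeros(1))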
4. Hierarchical RL Algorithms
FeUdal Networks (FuN) (2017):
Introduced by Vezhnevets et al. in FeUdal Networks for Hierarchical Reinforcement Learning.
Hierarchical Actor-Critic (HAC) (2018):
Introduced by Levy et al. in Hierarchical Reinforcement Learning with Hindsight.
Option-Critic (2017):
Introduced by Bacon et al. in The Option-Critic Architecture.
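The common thread is temporal abstraction: a high-level policy sets subgoals or options on a slow timescale, and a low-level policy acts to satisfy them. A toy, runnable illustration of that decomposition on a 1-D corridor (no learning happens here; the structure, not the numbers, is the point):

def run(start=0, final_goal=17, k=4, horizon=40):
    state = start
    for t in range(horizon):
        if t % k == 0:
            # High level: propose a subgoal at most k steps ahead.
            subgoal = state + max(-k, min(k, final_goal - state))
        # Low level: greedy step toward the current subgoal.
        state += (subgoal > state) - (subgoal < state)
        if state == final_goal:
            return t + 1
    return None

print(run())  # 17 steps to traverse the corridor via subgoals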
5. Meta-RL Algorithms
Model-Agnostic Meta-Learning (MAML) (2017):
Introduced by Finn et al. in Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
RL² (Fast Reinforcement Learning via Slow Reinforcement Learning) (2016):
Introduced by Duan et al. in RL²: Fast Reinforcement Learning via Slow Reinforcement Learning.
PEARL (Probabilistic Embeddings for Actor-Critic RL) (2019):
Introduced by Rakelly et al. in Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables.
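MAML's two nested loops fit in a short sketch: the inner loop adapts parameters to a task with a gradient step, and the outer loop differentiates through that step to update the initialization. The toy scalar-regression tasks below are illustrative only.

import torch

theta = torch.tensor(0.0, requires_grad=True)   # meta-learned initialization
meta_opt = torch.optim.SGD([theta], lr=0.1)
inner_lr = 0.05
tasks = [lambda p, t=t: (p - t) ** 2 for t in (1.0, -1.0, 2.0)]

for loss_fn in tasks:
    # Inner loop: one adaptation step; create_graph keeps it differentiable.
    g, = torch.autograd.grad(loss_fn(theta), theta, create_graph=True)
    theta_adapted = theta - inner_lr * g
    # Outer loop: the post-adaptation loss drives the meta-update.
    meta_opt.zero_grad()
    loss_fn(theta_adapted).backward()
    meta_opt.step()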
6. Safe and Constrained RL Algorithms
Constrained Policy Optimization (CPO) (2017):
Introduced by Achiam et al. in Constrained Policy Optimization.
Safe Policy Iteration (2013):
Introduced by Pirotta et al. in Safe Policy Iteration.
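Both methods address the constrained problem: maximize return subject to a bound on expected cost. The simplest way to see the structure is a Lagrangian-relaxation sketch; this dual-ascent update is illustrative, not CPO's actual trust-region algorithm, and all names are assumptions.

import torch

lam = torch.tensor(0.0)            # Lagrange multiplier, kept >= 0 below
lam_lr, cost_limit = 0.01, 25.0

def constrained_loss(policy_loss, mean_episode_cost):
    # Penalize the policy objective in proportion to constraint violation.
    return policy_loss + lam * (mean_episode_cost - cost_limit)

def dual_step(mean_episode_cost):
    # Dual ascent: raise lambda when cost exceeds the budget, else decay it.
    global lam
    lam = torch.clamp(lam + lam_lr * (mean_episode_cost - cost_limit), min=0.0)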
7. Exploration-Focused RL Algorithms
Random Network Distillation (RND) (2018):
Introduced by Burda et al. in Exploration by Random Network Distillation.
Intrinsic Curiosity Module (ICM) (2017):
Introduced by Pathak et al. in Curiosity-Driven Exploration by Self-Supervised Prediction.
Hindsight Experience Replay (HER) (2017):
Introduced by Andrychowicz et al. in Hindsight Experience Replay.
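HER's relabeling trick is simple enough to show directly: replay a failed episode as if the goal it actually reached had been the intended one, turning sparse failures into useful reward signal. A minimal "final-goal" variant follows; the tuple layout and reward function are illustrative, not the paper's interface.

def her_relabel(episode, reward_fn):
    # episode: list of (state, action, achieved_goal, desired_goal) tuples.
    final_achieved = episode[-1][2]          # goal actually reached at the end
    relabeled = []
    for state, action, achieved, desired in episode:
        # Pretend final_achieved was the goal all along and recompute reward.
        relabeled.append((state, action, final_achieved,
                          reward_fn(achieved, final_achieved)))
    return relabeled

# Sparse goal-reaching reward: 0 on success, -1 otherwise.
episode = [((0,), 1, (1,), (5,)), ((1,), 1, (2,), (5,))]
print(her_relabel(episode, lambda g, t: 0.0 if g == t else -1.0))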
8. Hybrid RL Algorithms
Dyna-Q (1991):
Introduced by Sutton in Dyna, an Integrated Architecture for Learning, Planning, and Reacting.
Model-Based Value Expansion (MVE) (2018):
Introduced by Feinberg et al. in Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.
Stochastic Lower Bound Optimization (SLBO) (2018):
Introduced by Luo et al. in Algorithmic Framework for Model-Based Deep Reinforcement Learning with Theoretical Guarantees.
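Dyna-Q's learn-then-plan loop, the pattern that MVE and SLBO refine, fits in a few lines of tabular Python; the stand-in environment step below is illustrative.

import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
model = {}                                   # learned model: (s, a) -> (r, s')
alpha, gamma, n_planning = 0.1, 0.95, 10

def q_update(s, a, r, s_next):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

s, a = 0, 1
s_next, r = (s + 1) % n_states, 1.0          # stand-in for env.step(s, a)
q_update(s, a, r, s_next)                    # (a) direct RL from real experience
model[(s, a)] = (r, s_next)                  # (b) model learning
for _ in range(n_planning):                  # (c) planning on simulated experience
    (ps, pa), (pr, ps2) = random.choice(list(model.items()))
    q_update(ps, pa, pr, ps2)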
9. Modern and Cutting-Edge RL Algorithms
DreamerV2/V3 (2021/2023):
DreamerV2 introduced by Hafner et al. in Mastering Atari with Discrete World Models (2021).
DreamerV3 introduced by Hafner et al. in Mastering Diverse Domains through World Models (2023).
MuZero (2020):
Introduced by Schrittwieser et al. in Mastering Atari, Go, Chess, and Shogi by Planning with a Learned Model.
AlphaZero (2017):
Introduced by Silver et al. in Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.
Decision Transformer (2021):
Introduced by Chen et al. in Decision Transformer: Reinforcement Learning via Sequence Modeling.
Conservative Q-Learning (CQL) (2020):
Introduced by Kumar et al. in Conservative Q-Learning for Offline Reinforcement Learning.
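For the offline setting CQL targets, the key addition to ordinary Q-learning is a conservatism penalty: push Q down on all actions while pushing it up on actions actually present in the dataset. A minimal discrete-action sketch (tensor names are illustrative; this term is added to the usual Bellman error):

import torch

def cql_penalty(q_all_actions, q_data_actions, alpha=1.0):
    # q_all_actions: (batch, n_actions); q_data_actions: (batch,).
    pushed_down = torch.logsumexp(q_all_actions, dim=1)  # soft-max over actions
    pushed_up = q_data_actions                           # Q at dataset actions
    return alpha * (pushed_down - pushed_up).mean()

penalty = cql_penalty(torch.randn(8, 4), torch.randn(8))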