1. Reinforcement Learning (RL)

Reinforcement Learning (RL)
├── Model-Based RL
│   ├── Dyna-Q (1991)
│   ├── PILCO (2011)
│   ├── World Models (2018)
│   ├── Dreamer (2020)
│   ├── DreamerV2 (2021)
│   ├── DreamerV3 (2023)
│   └── MuZero (2020)
└── Model-Free RL
    ├── Value-Based Methods
    │   ├── DQN (2015)
    │   ├── Double DQN (2016)
    │   ├── Dueling DQN (2016)
    │   ├── Rainbow DQN (2017)
    │   ├── C51 (2017)
    │   └── QR-DQN (2017)
    ├── Policy-Based Methods
    │   ├── REINFORCE (1992)
    │   ├── TRPO (2015)
    │   ├── PPO (2017)
    │   ├── SAC (2018)
    │   ├── DDPG (2016)
    │   └── TD3 (2018)
    └── Actor-Critic Methods
        ├── A2C (2016)
        ├── A3C (2016)
        ├── SAC (2018)
        ├── DDPG (2016)
        └── TD3 (2018)

2. Multi-Agent RL

Multi-Agent RL
├── MADDPG (2017)
├── COMA (2018)
└── QMIX (2018)

3. Hierarchical RL

Hierarchical RL
├── FeUdal Networks (FuN) (2017)
├── Hierarchical Actor-Critic (HAC) (2018)
└── Option-Critic (2017)

4. Meta-RL

Meta-RL
├── MAML (2017)
├── RL² (2016)
└── PEARL (2019)

5. Safe/Constrained RL

Safe/Constrained RL
├── CPO (2017)
└── Safe Policy Iteration (2013)

6. Exploration-Focused RL

Exploration-Focused RL
├── RND (2018)
├── ICM (2017)
└── HER (2017)

7. Hybrid RL

Hybrid RL
├── Dyna-Q (1991)
├── MVE (2018)
└── SLBO (2018)

8. Cutting-Edge RL

Cutting-Edge RL
├── DreamerV2 (2021)
├── DreamerV3 (2023)
├── MuZero (2020)
├── AlphaZero (2017)
├── Decision Transformer (2021)
└── CQL (2020)

The sections below give the key reference for each algorithm in the taxonomy.

1. Model-Free RL Algorithms

1.1 Value-Based Methods

Deep Q-Networks (DQN) (2015): Introduced by Mnih et al. in Human-level control through deep reinforcement learning. A minimal sketch of its TD target follows this section. Key improvements:
- Double DQN (2016): van Hasselt et al., Deep Reinforcement Learning with Double Q-learning.
- Dueling DQN (2016): Wang et al., Dueling Network Architectures for Deep Reinforcement Learning.
- Rainbow DQN (2017): Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning.

Categorical DQN (C51) (2017): Introduced by Bellemare et al. in A Distributional Perspective on Reinforcement Learning.

Quantile Regression DQN (QR-DQN) (2017): Introduced by Dabney et al. in Distributional Reinforcement Learning with Quantile Regression.

1.2 Policy-Based Methods

REINFORCE (1992): Introduced by Williams in Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.

Proximal Policy Optimization (PPO) (2017): Introduced by Schulman et al. in Proximal Policy Optimization Algorithms. A sketch of its clipped objective follows this section.

Trust Region Policy Optimization (TRPO) (2015): Introduced by Schulman et al. in Trust Region Policy Optimization.

Soft Actor-Critic (SAC) (2018): Introduced by Haarnoja et al. in Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.

Deep Deterministic Policy Gradient (DDPG) (2016): Introduced by Lillicrap et al. in Continuous control with deep reinforcement learning.

Twin Delayed DDPG (TD3) (2018): Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods.

1.3 Actor-Critic Methods

Advantage Actor-Critic (A2C) (2016): A synchronous variant of A3C, introduced in Mnih et al.'s Asynchronous Methods for Deep Reinforcement Learning.

Asynchronous Advantage Actor-Critic (A3C) (2016): Introduced by Mnih et al. in Asynchronous Methods for Deep Reinforcement Learning.
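To ground the value-based entries above, here is a minimal sketch of the one-step TD targets behind DQN and Double DQN. It assumes PyTorch, with online_net and target_net standing in for any torch.nn.Module that maps a batch of states to per-action Q-values; the function names are illustrative, not taken from the papers' code.

```python
import torch

def dqn_targets(target_net, rewards, next_states, dones, gamma=0.99):
    # DQN target: y = r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    # Double DQN decouples action selection (online net) from action evaluation
    # (target net), which reduces the overestimation bias of the max operator.
    with torch.no_grad():
        best = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```

In training, these targets are regressed against Q(s, a) from the online network, and target_net is a periodically refreshed copy of online_net.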
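In the same spirit, the heart of PPO from 1.2 fits in a few lines: a clipped surrogate objective over probability ratios. This is a sketch under the same PyTorch assumption; log_probs and old_log_probs are per-sample log-probabilities of the taken actions under the current and data-collecting policies.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t).
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipping removes the incentive to push the ratio outside
    # [1 - eps, 1 + eps], keeping each update close to the old policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated because optimizers minimize; PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()
```

The pessimistic min over the clipped and unclipped terms is what lets PPO take several gradient steps per batch without the trust-region machinery of TRPO.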
2. Model-Based RL Algorithms

Model-Based Policy Optimization (MBPO) (2019): Introduced by Janner et al. in When to Trust Your Model: Model-Based Policy Optimization.

PILCO (Probabilistic Inference for Learning Control) (2011): Introduced by Deisenroth and Rasmussen in PILCO: A Model-Based and Data-Efficient Approach to Policy Search.

World Models (2018): Introduced by Ha and Schmidhuber in World Models.

Dreamer (2020): Introduced by Hafner et al. in Dream to Control: Learning Behaviors by Latent Imagination.

3. Multi-Agent RL Algorithms

Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (2017): Introduced by Lowe et al. in Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.

Counterfactual Multi-Agent Policy Gradients (COMA) (2018): Introduced by Foerster et al. in Counterfactual Multi-Agent Policy Gradients.

QMIX (2018): Introduced by Rashid et al. in QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning.

4. Hierarchical RL Algorithms

FeUdal Networks (FuN) (2017): Introduced by Vezhnevets et al. in FeUdal Networks for Hierarchical Reinforcement Learning.

Hierarchical Actor-Critic (HAC) (2018): Introduced by Levy et al. in Hierarchical Reinforcement Learning with Hindsight.

Option-Critic (2017): Introduced by Bacon et al. in The Option-Critic Architecture.

5. Meta-RL Algorithms

Model-Agnostic Meta-Learning (MAML) (2017): Introduced by Finn et al. in Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.

RL² (Fast Reinforcement Learning via Slow Reinforcement Learning) (2016): Introduced by Duan et al. in RL²: Fast Reinforcement Learning via Slow Reinforcement Learning.

PEARL (Probabilistic Embeddings for Actor-Critic RL) (2019): Introduced by Rakelly et al. in Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables.

6. Safe and Constrained RL Algorithms

Constrained Policy Optimization (CPO) (2017): Introduced by Achiam et al. in Constrained Policy Optimization.

Safe Policy Iteration (2013): Introduced by Pirotta et al. in Safe Policy Iteration.

7. Exploration-Focused RL Algorithms

Random Network Distillation (RND) (2018): Introduced by Burda et al. in Exploration by Random Network Distillation.

Intrinsic Curiosity Module (ICM) (2017): Introduced by Pathak et al. in Curiosity-Driven Exploration by Self-Supervised Prediction.

Hindsight Experience Replay (HER) (2017): Introduced by Andrychowicz et al. in Hindsight Experience Replay.

8. Hybrid RL Algorithms

Dyna-Q (1991): Introduced by Sutton in Dyna, an Integrated Architecture for Learning, Planning, and Reacting.

Model-Based Value Expansion (MVE) (2018): Introduced by Feinberg et al. in Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.

Stochastic Lower Bound Optimization (SLBO) (2018): Introduced by Luo et al. in Algorithmic Framework for Model-Based Deep Reinforcement Learning with Theoretical Guarantees.

9. Modern and Cutting-Edge RL Algorithms

DreamerV2/V3 (2021/2023): DreamerV2 introduced by Hafner et al. in Mastering Atari with Discrete World Models (2021); DreamerV3 introduced by Hafner et al. in Mastering Diverse Domains through World Models (2023).

MuZero (2020): Introduced by Schrittwieser et al. in Mastering Atari, Go, Chess, and Shogi by Planning with a Learned Model.

AlphaZero (2017): Introduced by Silver et al. in Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

Decision Transformer (2021): Introduced by Chen et al. in Decision Transformer: Reinforcement Learning via Sequence Modeling.

Conservative Q-Learning (CQL) (2020): Introduced by Kumar et al. in Conservative Q-Learning for Offline Reinforcement Learning. A sketch of its conservative regularizer follows.
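To close, the regularizer that gives CQL its name is compact enough to sketch. For discrete actions, the CQL(H) variant penalizes Q-values across all actions via a log-sum-exp while pushing up Q-values on actions actually present in the offline dataset. The sketch again assumes PyTorch; q_net is any module mapping states to per-action Q-values, and the names are illustrative rather than the authors' code.

```python
import torch

def cql_penalty(q_net, states, actions):
    # CQL(H) regularizer (discrete actions):
    #   E_s[ logsumexp_a Q(s, a) - Q(s, a_data) ]
    # Driving this difference down keeps learned Q-values conservative
    # on actions the offline dataset never took.
    q_all = q_net(states)                               # shape [B, n_actions]
    logsumexp_q = torch.logsumexp(q_all, dim=1)         # soft maximum over all actions
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    return (logsumexp_q - q_data).mean()
```

In the paper this term is added to a standard TD loss with a tradeoff weight, so the agent trades Bellman accuracy against conservatism on out-of-distribution actions.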