Reinforcement learning is a learning problem in which an agent must learn to behave optimally in its environment. Deep learning methods, on the other hand, are a subclass of representation learning, which focuses on extracting the features necessary for a given task (e.g. classification or detection). As such, they serve as powerful function approximators. The combination of these two paradigms results in deep reinforcement learning.
This thesis gives an overview of recent advancements in the field. The results are divided into two broad research directions: value-based and policy-based approaches. The thesis presents several algorithms from these directions and examines how they perform. Finally, multiple open research questions are addressed and new research directions are proposed.
Table of Contents
1 Introduction
2 Research Method
2.1 Related Work
2.2 Research Conduction
3 Background
3.1 Reinforcement Learning
3.1.1 Markov Decision Process
3.1.2 Value Functions
3.1.3 Tabular Solution Methods
3.2 Deep Learning
4 Results
4.1 Value-Based Deep Reinforcement Learning
4.1.1 Deep Q-Learning and Deep Q-Networks
4.1.2 Double Q-Learning and Double Q-Network
4.1.3 Prioritized Replay
4.1.4 Dueling Network
4.1.5 Distributional Reinforcement Learning
4.1.6 Rainbow
4.2 Policy-Based Deep Reinforcement Learning
4.2.1 Asynchronous Advantage Actor-Critic
4.2.2 Trust Region Policy Optimization
4.2.3 Deep Deterministic Policy Gradients
4.2.4 Policy Iteration Using Monte Carlo Tree Search
4.2.5 Evolutionary Algorithms
4.3 Performance of the Algorithms
4.3.1 Atari 2600
4.3.2 MuJoCo
4.3.3 Various Measures
5 Discussion
5.1 Exploration vs. Exploitation
5.2 Need for Rewards
5.3 Knowledge Reusability
5.4 Inefficiency
5.5 Multi-Agent Reinforcement Learning
5.6 Model-Based Reinforcement Learning
5.7 Proposed Research Directions
6 Conclusion
Objectives and Research Themes
This thesis aims to provide a comprehensive review of recent advancements in Deep Reinforcement Learning (DRL) by categorizing the field into distinct research directions and evaluating the performance of key algorithms.
- Overview of value-based Deep Reinforcement Learning approaches.
- Analysis of policy-based DRL methods and their architectural innovations.
- Evaluation of algorithmic performance across standard benchmarks like Atari 2600 and MuJoCo.
- Discussion of fundamental challenges in RL, including exploration versus exploitation and reward engineering.
- Identification of future research directions, such as knowledge reusability and multi-agent systems.
Excerpt from the Book
Deep Learning
This section covers the main concepts of DL. For the purposes of this thesis, DL methods can be seen as a form of non-linear function approximator. DL methods are a subclass of representation learning, which focuses on extracting the features necessary for the task at hand (e.g. classification or detection) (LeCun et al., 2015, p. 436). This section focuses on supervised learning, the setting typically used. Here, labeled training data is fed to a non-linear function approximator, such as a neural network. In this context, "labeled" means that each data point (e.g. the pixel values of an image) comes with a target (e.g. the image shows a dog). The network learns a function that maps the inputs (e.g. the pixels) to the output (e.g. the label). The goal is that, given enough training data, the network generalizes to new, unseen data (LeCun et al., 2015, pp. 436-438).
The most widely used model architectures are feedforward (neural) networks (FNN), which are also called multilayer perceptrons (MLP) (Goodfellow et al., 2016, p. 167).
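The supervised-learning setting described in the excerpt can be sketched with a small feedforward network trained by gradient descent. The following is an illustrative sketch, not code from the thesis; the toy task (XOR), the layer sizes, and all hyperparameters are chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled training data: inputs X and targets y (here, the XOR function).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with a non-linear activation makes the MLP a
# non-linear function approximator.
W1 = rng.normal(size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for step in range(20000):
    # Forward pass: map inputs to predicted labels.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(float(((p - y) ** 2).mean()))

    # Backward pass: gradients of the mean squared error.
    grad_p = (p - y) * p * (1 - p)
    grad_W2 = h.T @ grad_p
    grad_b2 = grad_p.sum(axis=0)
    grad_h = grad_p @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

# Predicted labels for each input after training.
preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()
print(preds)
```

The same mapping-from-inputs-to-labels idea scales up to the convolutional networks used by DQN and its successors; only the architecture and the loss change.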
Summary of Chapters
1 Introduction: Provides the definition of Reinforcement Learning and describes the motivation for combining it with Deep Learning.
2 Research Method: Details the literature review process and the sources used to identify relevant research in the DRL field.
3 Background: Establishes the theoretical foundations of Reinforcement Learning (including Markov Decision Processes) and Deep Learning concepts.
4 Results: Presents an analysis of value-based and policy-based DRL algorithms and summarizes their performance on common benchmarks.
5 Discussion: Examines open research challenges such as the exploration-exploitation dilemma and multi-agent settings.
6 Conclusion: Summarizes the thesis findings and reflects on the trajectory of Artificial General Intelligence.
Keywords
Deep Reinforcement Learning, DRL, Artificial Intelligence, Neural Networks, Q-Learning, Policy Gradient, Atari 2600, MuJoCo, Exploration vs. Exploitation, Multi-Agent Reinforcement Learning, Markov Decision Process, Function Approximation, Experience Replay, Reward Shaping, Deep Learning.
Frequently Asked Questions
What is the core focus of this bachelor thesis?
The work provides a detailed review of recent advancements in Deep Reinforcement Learning (DRL), summarizing the most important algorithms and their performance in various environments.
What are the primary research areas discussed?
The thesis categorizes research into value-based approaches (like DQN) and policy-based approaches (like A3C, TRPO, and DDPG), while also addressing model-based learning and evolutionary strategies.
What is the main objective of the research?
The objective is to offer a structured overview of the current DRL landscape and to synthesize how different techniques contribute to reaching an agent's goals.
Which scientific methods are primarily used for performance evaluation?
The thesis relies on benchmark testing using the Atari 2600 game suite and the MuJoCo physics simulation environment to compare different algorithmic implementations.
What topics are covered in the main section of the thesis?
The main part covers the theoretical background of RL and DL, the transition to Deep Q-Networks, improvements like Dueling Networks and Distributional RL, and a variety of policy-based methods.
Which keywords best characterize this work?
Key terms include Deep Reinforcement Learning, Neural Networks, Policy Gradients, Atari 2600 benchmarks, and Multi-Agent Reinforcement Learning.
What distinguishes the Rainbow agent from previous models?
Rainbow is a state-of-the-art agent that combines multiple enhancements—such as Prioritized Replay, Multi-Step Learning, and Noisy Nets—into a single architecture, significantly improving efficiency.
How does the thesis evaluate the efficiency of RL agents?
The work discusses the significant time and sample requirements of current models compared to human learning speed and highlights the need for higher sample efficiency in future applications.
- Cite this work
- Artur Sahakjan (Author), 2018, A Review of Recent Advancements in Deep Reinforcement Learning, München, GRIN Verlag, https://www.grin.com/document/432230