
Curriculum Learning in Reinforcement Learning (Doctoral Consortium)
Sanmit Narvekar
Department of Computer Science
University of Texas at Austin
Austin, TX 78712 USA
sanmit@cs.utexas.edu
ABSTRACT
Transfer learning in reinforcement learning is an area of research that seeks to speed up or improve learning of a complex target task, by leveraging knowledge from one or more
source tasks. This thesis will extend the concept of transfer
learning to curriculum learning, where the goal is to design
a sequence of source tasks for an agent to train on, such
that final performance or learning speed is improved. We
discuss completed work on this topic, including an approach
for modeling transferability between tasks, and methods for
semi-automatically generating source tasks tailored to an
agent and the characteristics of a target domain. Finally,
we also present ideas for future work.
Figure 1: Different subgames in Quick Chess
General Terms
Algorithms; Performance; Experimentation
Keywords
Reinforcement Learning; Transfer Learning; Curriculum Learning
1. INTRODUCTION
As autonomous agents are called upon to perform increasingly difficult tasks, new techniques will be needed to make
learning such tasks tractable. Transfer learning [3, 9] is a
recent area of research that has been shown to speed up
learning on a complex task by transferring knowledge from
one or more easier source tasks. Most existing transfer learning methods treat this transfer of knowledge as a one-step
process, where knowledge from all the sources is directly
transferred to the target. However, for complex tasks, it
may be more beneficial (and even necessary) to gradually
acquire skills over multiple tasks in sequence, where each
subsequent task requires and builds upon knowledge gained
in a previous task. This is especially important, for example,
if the knowledge or experience needed from a source task itself requires prerequisite skills to obtain.
The goal of this thesis work is to extend transfer learning to the problem of curriculum learning. As a motivating example, consider the game of Quick Chess¹ (Figure 1). Quick Chess is a game designed to introduce players to
the full game of chess, by using a sequence of progressively
more difficult “subgames.” For example, the first subgame is
a 5x5 board with only pawns, where the player learns how
pawns move and about promotions. The second subgame
is a small board with pawns and a king, which introduces
a new objective: keeping the king alive. In each successive
subgame, new elements are introduced (such as new pieces, a
larger board, or different configurations) that require learning new skills and building upon knowledge learned in previous games. The final game is the full game of chess.
The question that motivates this line of work is: can we
find an optimal sequence of tasks (i.e. a curriculum) for an
agent to play that will make it possible to learn some target
task (such as chess) fastest, or at a performance level better
than learning from scratch?
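To make this comparison concrete, the sketch below shows one way the question could be operationalized: train an agent through a candidate curriculum, carrying its knowledge forward between tasks, and compare its target-task learning against an identical agent trained from scratch. The train callable, the task objects, and the episode budget are hypothetical placeholders introduced for illustration; they are not part of this work.

def run_curriculum(agent, curriculum, target_task, train, episodes=500):
    """Train the agent on each source task in order (keeping its knowledge),
    then return its learning curve on the target task."""
    for task in curriculum:
        train(agent, task, episodes)          # knowledge carries over between tasks
    return train(agent, target_task, episodes)

def compare_to_scratch(make_agent, curriculum, target_task, train, episodes=500):
    """Contrast a curriculum against learning the target task directly."""
    with_curriculum = run_curriculum(make_agent(), curriculum, target_task, train, episodes)
    from_scratch = train(make_agent(), target_task, episodes)
    # A curriculum helps if it reaches a performance threshold in fewer episodes
    # (learning speed) or ends at a higher asymptote (final performance).
    return with_curriculum, from_scratch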
Designing an effective curriculum is a complex problem that ties together task creation, sequencing, and transfer learning. First, a good set of source tasks is crucial for achieving positive transfer: a curriculum designer must be able to suggest source tasks that use knowledge of the target domain and that are tailored to the current abilities of the agent. Second, the tasks must be sequenced in a way that allows the agent to accumulate new knowledge at each step. During this process, the agent also needs to know how long it should spend in a source task in order to acquire the necessary skills, what kind of knowledge it should obtain for transfer, and how to transfer it to the next stage of the curriculum. In the following sections, we discuss progress made towards some of these objectives and our plans for future work in this direction.
¹ http://www.intplay.com/uploadedFiles/Game Rules/P20051QuickChess-Rules.pdf
2. COMPLETED WORK
There are two pieces of completed work that relate to this
thesis. Last year [7], we examined the problem of learning
to select which tasks could serve as suitable source tasks for
a given target task. Unlike previous approaches, which typically relied on a model of the two task MDPs, or experience
(such as sample trajectories) from the two tasks, our approach used a set of meta-level feature descriptors describing
each task. We trained a regression model over these feature
descriptors to predict the benefit of transfer from one task
to another, as measured by the jumpstart metric [9], and
showed, using a large-scale experiment, that these descriptors
could be used to predict transferability.
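As a rough illustration of this idea, the sketch below fits a regressor over descriptor pairs and uses it to rank candidate source tasks by predicted jumpstart. The descriptor contents, the pairing scheme, and the choice of random-forest regression are assumptions made for the example; they are not the exact model used in [7].

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pair_features(source_desc, target_desc):
    """Describe a (source, target) pair by both descriptors and their difference."""
    s, t = np.asarray(source_desc, float), np.asarray(target_desc, float)
    return np.concatenate([s, t, t - s])

def fit_transferability_model(descriptor_pairs, jumpstarts):
    """descriptor_pairs: list of (source_desc, target_desc) tuples.
    jumpstarts: measured jumpstart [9] for each pair, i.e. the gain in initial
    target-task performance obtained by transferring from that source."""
    X = np.array([pair_features(s, t) for s, t in descriptor_pairs])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, np.asarray(jumpstarts, float))
    return model

def rank_sources(model, source_descs, target_desc):
    """Rank candidate source tasks for a new target by predicted jumpstart,
    without running any transfer experiments."""
    X = np.array([pair_features(s, target_desc) for s in source_descs])
    preds = model.predict(X)
    return [(int(i), float(preds[i])) for i in np.argsort(-preds)]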
This method serves as one way of identifying which task
to select as part of a curriculum. However, it assumed that
a fixed and suitable set of tasks was provided, which was
created using knowledge of the domain. It also only showed
successful transfer for a 1-stage curriculum.
At AAMAS 2016, we will present the overall problem of
curriculum learning in the context of reinforcement learning
[6], and show concrete multi-stage transfer. Specifically, as
a first step towards creating a curriculum, we consider how
a space of useful source tasks can be generated, using a parameterized model of the domain and observed trajectories
of an agent on the target task. We claim that a good source
task is one that leverages knowledge about the domain to
simplify the problem, and that promotes learning new skills
tailored to the abilities of the agent in question. Thus, we
propose a series of functions to semi-automatically create
useful, agent-specific source tasks, and show how they can
be used to form components of a curriculum in two challenging multiagent domains. In particular, we show that a curriculum is useful and can provide strong transfer [9].
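The sketch below illustrates the flavor of such task-creation functions on a Quick-Chess-like domain: one function exploits knowledge of the domain to simplify the problem, and one uses the agent's observed behavior to focus practice on a skill it currently lacks. The parameters and operators shown here are invented for illustration and are not the functions proposed in [6].

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TaskParams:
    board_size: int          # e.g., 5 for a 5x5 Quick Chess subgame
    piece_types: frozenset   # which pieces appear in this subgame

def shrink_domain(task: TaskParams) -> TaskParams:
    """Domain knowledge: a smaller board keeps the rules but shortens episodes."""
    return replace(task, board_size=max(3, task.board_size - 2))

def isolate_weak_skill(task: TaskParams, skill_score) -> TaskParams:
    """Agent-specific tailoring: skill_score(piece) summarizes how well the agent's
    observed trajectories handle each piece; keep only the weakest one."""
    worst = min(task.piece_types, key=skill_score)
    return replace(task, piece_types=frozenset({worst}))

def candidate_sources(target: TaskParams, skill_score):
    """Apply each task-creation function to the target to propose source tasks."""
    return [shrink_domain(target), isolate_weak_skill(target, skill_score)]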
The methods we propose are also useful for creating tasks
in the classic transfer learning paradigm (a one-step curriculum). Past work in transfer learning has typically assumed that a fixed set of source tasks is provided, created using a static analysis of the
domain. This work allows new source tasks to be suggested
from a dynamic analysis of the agent’s performance.
3. DIRECTIONS FOR FUTURE WORK
In our most recent work [6], we showed how one form of transfer, value function transfer, could be used to transfer knowledge via a curriculum. However, the transfer learning literature has shown that alternate forms of transfer are also possible, using options [8], policies [2], models [1], or even samples [5, 4]. Thus, an interesting extension is characterizing what type of knowledge is useful to extract from a task, and whether we can combine multiple forms of transfer for use in a curriculum. For example, it may be that some tasks are more suitable for transferring options, while others are more suitable for transferring samples. When designing a curriculum, the designer would then ask both which task to transfer from and what type of knowledge to transfer.

Another key question to address is how long to spend on a source task. In our previous work [6, 7], agents trained on source tasks until convergence. However, this is suboptimal for two reasons. First, spending too much time training in a source task can result in overfitting to the source. In curriculum learning, we only care about performance in the target task, and not whether we also achieve optimality in a source; this excess training can thus be detrimental. Second, from a practical standpoint, if the necessary skills can be acquired from a source task before reaching convergence, there is no benefit to continuing to train on it, and doing so would make it harder to show benefits such as strong transfer.

Finally, there is the overall goal of automatically sequencing tasks in the curriculum. Our current work [6] proposed a way of creating a space of tasks, but assumed a separate process (e.g., a human expert) chose which tasks to use. Automating this process completes the loop. Once we have a curriculum for an agent, we can also examine related questions, such as how to individualize that curriculum for a new set of agents that have different representations or abilities.
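As a concrete illustration of the value function transfer discussed above, the following sketch warm-starts a tabular Q-function for the next task in the curriculum from the values learned in the previous one. The dictionary representation and the inter-task mappings map_state and map_action are simplifying assumptions made for this example, not the mechanism described in [6].

def transferred_q(source_q, map_state, map_action, default=0.0):
    """Build a Q-table for the next curriculum stage whose entries are
    initialised lazily from the corresponding source-task values."""
    class WarmStartQ(dict):
        def __missing__(self, key):
            state, action = key
            value = source_q.get((map_state(state), map_action(action)), default)
            self[key] = value
            return value
    return WarmStartQ()

# Usage inside a standard Q-learning update at the next stage:
#   q = transferred_q(previous_stage_q, map_state, map_action)
#   q[(s, a)] += alpha * (r + gamma * max(q[(s2, a2)] for a2 in actions) - q[(s, a)])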
Acknowledgements
This work was done in collaboration with Jivko Sinapov,
Matteo Leonetti, and Peter Stone as part of the Learning
Agents Research Group (LARG) at the University of Texas
at Austin. LARG research is supported in part by NSF
(CNS-1330072, CNS-1305287), ONR (21C184-01), AFRL
(FA8750-14-1-0070), & AFOSR (FA9550-14-1-0087).
REFERENCES
[1] A. Fachantidis, I. Partalas, G. Tsoumakas, and
I. Vlahavas. Transferring task models in reinforcement
learning agents. Neurocomputing, 107:23–32, 2013.
[2] F. Fernández, J. García, and M. Veloso. Probabilistic
policy reuse for inter-task transfer learning. Robotics
and Autonomous Systems, 58(7):866–871, 2010.
Advances in Autonomous Robots for Service and
Entertainment.
[3] A. Lazaric. Transfer in reinforcement learning: a
framework and a survey. In M. Wiering and M. van
Otterlo, editors, Reinforcement Learning: State of the
Art. Springer, 2011.
[4] A. Lazaric and M. Restelli. Transfer from multiple
MDPs. In Proceedings of the Twenty-Fifth Annual
Conference on Neural Information Processing Systems
(NIPS’11), 2011.
[5] A. Lazaric, M. Restelli, and A. Bonarini. Transfer of
samples in batch reinforcement learning. In
A. McCallum and S. Roweis, editors, Proceedings of the
Twenty-Fifth Annual International Conference on
Machine Learning (ICML-2008), pages 544–551,
Helsinki, Finland, July 2008.
[6] S. Narvekar, J. Sinapov, M. Leonetti, and P. Stone.
Source task creation for curriculum learning. In
Proceedings of the 15th International Conference on
Autonomous Agents and Multiagent Systems (AAMAS
2016), Singapore, May 2016.
[7] J. Sinapov, S. Narvekar, M. Leonetti, and P. Stone.
Learning inter-task transferability in the absence of
target task samples. In Proceedings of the 2015 ACM
Conference on Autonomous Agents and Multi-Agent
Systems (AAMAS). ACM, 2015.
[8] V. Soni and S. Singh. Using homomorphisms to
transfer options across continuous reinforcement
learning domains. In Proceedings of the American
Association for Artificial Intelligence (AAAI), 2006.
[9] M. E. Taylor and P. Stone. Transfer learning for
reinforcement learning domains: A survey. Journal of
Machine Learning Research, 10(1):1633–1685, 2009.