Cognitive Science and Machine Learning Summer School 2010

Multi-armed Bandit Problem and Bayesian
Optimization in Reinforcement Learning
From Cognitive Science and Machine Learning
Summer School 2010
Loris Bazzani
1
Outline Summer School
Outline Presentation
• What are Machine Learning and Cognitive
Science?
• How are they related to each other?
• Reinforcement Learning
– Background
– Discrete case
– Continuous case
4
Outline Presentation
• What are Machine Learning and Cognitive
Science?
• How are they related to each other?
• Reinforcement Learning
– Background
– Discrete case
– Continuous case
5
What is Machine Learning (ML)?
• Endow computers with the ability to “learn” from
“data”
• Present the computer with data from sensors, the internet, experiments
• Expect the computer to make decisions
• Traditionally categorized as:
– Supervised Learning: classification, regression
– Unsupervised Learning: dimensionality reduction,
clustering
– Reinforcement Learning: learning from feedback,
planning
6
From N. Lawrence's slides
What is Cognitive Science (CogSci)?
• How does the mind get so much out of so
little?
– Rich models of the world
– Make strong generalizations
• The process of reverse-engineering the brain
– Create computational models of the brain
• Much of cognition involves induction: finding
patterns in data
7
From N. Chater's slides
Outline Presentation
• What are Machine Learning and Cognitive
Science?
• How are they related to each other?
• Reinforcement Learning
– Background
– Discrete case
– Continuous case
8
Link between CogSci and ML
• ML takes inspiration from psychology, CogSci and
computer science
– Rosenblatt’s Perceptron
– Neural Networks
–…
• CogSci uses ML as an engineering toolkit
– Bayesian inference in generative models
– Hierarchical probabilistic models
– Approximate methods for learning and inference
– …
9
Outline Presentation
• What are Machine Learning and Cognitive
Science?
• How are they related to each other?
• Reinforcement Learning
– Background
– Discrete case
– Continuous case
11
Outline Presentation
• What are Machine Learning and Cognitive
Science?
• How are they related to each other?
• Reinforcement Learning
– Background
– Discrete case
– Continuous case
14
Multi-armed Bandit Problem
[Auer et al. ‘95]
I wanna win a lot of cash!
15
Multi-armed Bandit Problem
[Auer et al. ‘95]
• Trade-off between Exploration and Exploitation
• An adversary controls the payoffs
• No statistical assumptions on the reward distribution
• Performance measure: Regret = Best Reward - Player Reward (a small worked sketch follows this slide)
• Upper bound on the expected regret
16
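To make the regret measure concrete, here is a minimal sketch (plain NumPy, with illustrative data and names) that computes the regret of a player against the best single action in hindsight:

```python
import numpy as np

# rewards[t, i] = reward of action i at trial t (hypothetical values)
rewards = np.array([[0.2, 0.9],
                    [0.4, 0.8],
                    [0.7, 0.1]])
chosen = [0, 1, 1]  # actions the player picked at each trial

player_reward = sum(rewards[t, a] for t, a in enumerate(chosen))
best_reward = rewards.sum(axis=0).max()   # best single action in hindsight
regret = best_reward - player_reward
print(regret)
```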
Multi-armed Bandit Problem
[Auer et al. ‘95]
Setting: over a sequence of trials, the player selects one action per trial and receives the corresponding reward.
Goal: define a probability distribution over the actions.
17
The Full Information Game
[Freund & Schapire ‘95]
Regret Bound:
Problem: the reward of each action must be computed at every trial! (see the sketch below)
18
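The full-information setting admits the standard exponential-weights (Hedge) strategy: the rewards of all actions are observed, and each action's weight is scaled exponentially in its reward. A minimal sketch, assuming rewards in [0, 1] and an illustrative learning rate eta (the parameter choice and bound from the slide are not reproduced):

```python
import numpy as np

def hedge(reward_matrix, eta=0.1):
    """Full-information exponential weights (Hedge-style update).

    reward_matrix[t, i] = reward of action i at trial t, assumed in [0, 1].
    Returns the sequence of probability distributions used at each trial.
    """
    T, K = reward_matrix.shape
    w = np.ones(K)                      # one weight per action
    distributions = []
    for t in range(T):
        p = w / w.sum()                 # play according to normalized weights
        distributions.append(p)
        # full information: the rewards of *all* actions are revealed
        w *= np.exp(eta * reward_matrix[t])
    return np.array(distributions)
```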
The Partial Information Game
Exp3 = Exponential-weight algorithm for Exploration and Exploitation
• The bound holds for certain values of the parameters, depending on the best reward (a sketch of the algorithm follows this slide)
• Exploration: tries out all the possible actions
• Exploitation: updates only the selected action
19
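Exp3's mechanics can be sketched in a few lines. This is a minimal illustration rather than the slide's exact notation: a fraction gamma of the probability mass is spread uniformly for exploration, the observed reward is importance-weighted by its selection probability, and only the selected action's weight is updated.

```python
import numpy as np

def exp3(pull, K, T, gamma=0.1, rng=None):
    """Exp3 for the adversarial multi-armed bandit.

    pull(t, i) must return the reward of arm i at trial t, in [0, 1];
    only the pulled arm's reward is observed.
    """
    rng = rng or np.random.default_rng()
    w = np.ones(K)
    total_reward = 0.0
    for t in range(T):
        # mix the exponential weights with uniform exploration
        p = (1.0 - gamma) * w / w.sum() + gamma / K
        i = rng.choice(K, p=p)
        x = pull(t, i)
        total_reward += x
        # importance-weighted reward estimate; implicitly zero for unplayed arms
        x_hat = x / p[i]
        # update only the selected action
        w[i] *= np.exp(gamma * x_hat / K)
    return total_reward
```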
The Partial Information Game
Exp3.1 = Exp3 with rounds, where a round consists of a sequence of trials
Each round guesses a bound on the total reward of the best action
Bound:
20
Applications [Hedge]
[Bazzani et al. ‘10]
25
Outline Presentation
• What are Machine Learning and Cognitive
Science?
• How are they related to each other?
• Reinforcement Learning
– Background
– Discrete case
– Continuous case
26
Bayesian Optimization
[Brochu et al. ‘10]
• Optimize a nonlinear function over a set: max_{x in A} f(x), where f is the function that gives rewards and A is the set of actions

Classic Optimization Tools:
• Known mathematical representation
• Convex
• Evaluation of the function at all points

Bayesian Optimization Tools:
• No closed-form expression
• Not convex
• Evaluating the function at a single point gives a noisy response
27
Bayesian Optimization
[Brochu et al. ‘10]
• Uses Bayes' theorem: P(f | D_{1:t}) ∝ P(D_{1:t} | f) P(f), where D_{1:t} = {x_{1:t}, f(x_{1:t})} are the observations collected so far
– Prior P(f): our beliefs about the space of possible objective functions
– Likelihood P(D_{1:t} | f): given what we think we know about the prior, how likely is the data we have seen?
– Posterior P(f | D_{1:t}): our updated beliefs about the unknown objective function

Goal: maximize the posterior at each step, so that each new evaluation decreases the distance between the true global maximum and the expected maximum given the model.
28
Bayesian Optimization
[Brochu et al. ‘10]
29
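The following slides describe the two ingredients of Bayesian optimization, a GP prior over functions and an acquisition function. As a rough illustration of how they fit together, here is a minimal sketch of the generic loop, with the acquisition maximized by brute force over a candidate grid; all function names and arguments are illustrative, not the slides' notation:

```python
import numpy as np

def bayes_opt(objective, candidates, acquisition, gp_fit, n_iter=20, rng=None):
    """Generic Bayesian optimization loop (sketch).

    objective(x)                 -> noisy scalar reward (expensive to evaluate)
    candidates                   -> array of points at which the acquisition is scored
    acquisition(mu, sigma, best) -> acquisition value for each candidate
    gp_fit(X, y)                 -> predictive (mu, sigma) arrays over `candidates`
    """
    rng = rng or np.random.default_rng()
    X = [candidates[rng.integers(len(candidates))]]   # random first evaluation
    y = [objective(X[0])]
    for _ in range(n_iter):
        mu, sigma = gp_fit(np.array(X), np.array(y))  # update the surrogate model
        a = acquisition(mu, sigma, max(y))            # score every candidate point
        x_next = candidates[int(np.argmax(a))]        # pick the most promising one
        X.append(x_next)
        y.append(objective(x_next))                   # expensive evaluation
    return X[int(np.argmax(y))], max(y)
```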
Priors over Functions
• Convergence conditions of BO:
– The acquisition function is continuous and
approximately minimizes the risk
– Conditional variance converges to zero
– The objective is continuous
– The prior is homogeneous
– The optimization is independent of the m-th differences
Guaranteed by Gaussian Processes (GP)
30
Priors over Functions
• GP = extension of the multivariate Gaussian distribution to an infinite-dimensional stochastic process
• Any finite linear combination of
samples will be normally
distributed
• Defined by its mean function and
covariance function
• Focus on defining the covariance function
31
Why use GPs?
• Assume a zero-mean GP: the function values are drawn according to f(x_{1:t}) ~ N(0, K), where K_{ij} = k(x_i, x_j)
• When a new observation x_{t+1} arrives, it is jointly Gaussian with the previous values
• Using the Sherman-Morrison-Woodbury formula, the predictive distribution is P(f_{t+1} | D_{1:t}, x_{t+1}) = N(μ_t(x_{t+1}), σ_t^2(x_{t+1})), with μ_t(x) = k^T K^{-1} f(x_{1:t}) and σ_t^2(x) = k(x, x) - k^T K^{-1} k (a NumPy sketch of these equations follows this slide)
32
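As a concrete rendering of these predictive equations, here is a minimal NumPy sketch using a squared exponential kernel (introduced on the next slide); the small diagonal noise term and all names are illustrative additions, not the slides' notation:

```python
import numpy as np

def sq_exp_kernel(a, b, theta=1.0):
    """Squared exponential kernel k(a, b) = exp(-||a - b||^2 / (2 theta^2)).

    a: (n, d) array, b: (m, d) array; returns an (n, m) kernel matrix.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / theta ** 2)

def gp_predict(X, y, X_star, theta=1.0, noise=1e-6):
    """Zero-mean GP predictive mean and variance at the points X_star."""
    K = sq_exp_kernel(X, X, theta) + noise * np.eye(len(X))   # jitter for stability
    k_star = sq_exp_kernel(X, X_star, theta)                  # cross-covariances
    K_inv = np.linalg.inv(K)
    mu = k_star.T @ K_inv @ y                                  # mu = k^T K^-1 y
    var = 1.0 - np.sum(k_star.T @ K_inv * k_star.T, axis=1)   # k(x, x) = 1 for SE
    return mu, np.maximum(var, 0.0)
```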
Choice of Covariance Functions
• Isotropic model with hyperparameter θ (the length scale)
• Squared Exponential kernel: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 θ^2))
• Matérn kernel: k(x_i, x_j) = (2^{1-ν} / Γ(ν)) (sqrt(2ν) ||x_i - x_j|| / θ)^ν K_ν(sqrt(2ν) ||x_i - x_j|| / θ), where Γ is the Gamma function and K_ν is a modified Bessel function (a closed-form special case is sketched after this slide)
33
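The general Matérn form needs the Bessel function, but for the commonly used smoothness ν = 5/2 it reduces to a closed form. A minimal sketch with the same length-scale convention as above (an illustrative addition, not taken from the slides):

```python
import numpy as np

def matern52_kernel(a, b, theta=1.0):
    """Matérn kernel with smoothness nu = 5/2.

    k(r) = (1 + sqrt(5) r / theta + 5 r^2 / (3 theta^2)) * exp(-sqrt(5) r / theta)
    a: (n, d) array, b: (m, d) array; returns an (n, m) kernel matrix.
    """
    r = np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))
    s = np.sqrt(5.0) * r / theta
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)
```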
Acquisition Functions
• The role of the acquisition function is to guide the search for the optimum toward points where the predicted value is high or the uncertainty is great
• Assumption: optimizing the acquisition function is simple and cheap
• Goal: high acquisition corresponds to potentially high values of the objective function
• Maximizing the probability of improvement: PI(x) = Φ((μ(x) - f(x+)) / σ(x)), where x+ is the best point observed so far
34
Acquisition Functions
• Expected improvement: EI(x) = (μ(x) - f(x+)) Φ(Z) + σ(x) φ(Z), with Z = (μ(x) - f(x+)) / σ(x), where Φ and φ are the CDF and PDF of the standard normal distribution
• Confidence bound criterion: e.g. the upper confidence bound UCB(x) = μ(x) + κ σ(x) (NumPy sketches of these criteria follow this slide)
35
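A compact sketch of the probability of improvement, expected improvement, and confidence bound criteria from the last two slides, written against the GP predictive mean and standard deviation; the names and the jitter guarding against zero variance are illustrative:

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, best):
    # PI(x) = Phi((mu(x) - best) / sigma(x))
    z = (mu - best) / np.maximum(sigma, 1e-12)
    return norm.cdf(z)

def expected_improvement(mu, sigma, best):
    # EI(x) = (mu(x) - best) * Phi(Z) + sigma(x) * phi(Z)
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    # UCB(x) = mu(x) + kappa * sigma(x)
    return mu + kappa * sigma
```

Any of these can serve as the acquisition argument of the generic loop sketched earlier (wrapping UCB so that its unused best-value argument is ignored).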
Applications [BO]
• Learn a set of robot gait parameters that maximize the velocity of a Sony AIBO ERS-7 robot
• Find a policy for robot path planning that would minimize uncertainty about its location and heading
• Select the locations of a set of sensors (e.g., cameras) in a dynamic system
36
Take-home Message
• ML and CogSci are connected
• Reinforcement Learning is useful for optimization
when dealing with temporal information
– Discrete case: Multi-armed bandit problem
– Continuous case: Bayesian optimization
• We can employ these techniques for Computer
Vision and System Control problems
37
[Abbeel et al. 2007]
http://heli.stanford.edu/
38
Some References
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. 1995. Gambling in a rigged
casino: The adversarial multi-armed bandit problem. FOCS '95.
Yoav Freund and Robert E. Schapire. 1995. A decision-theoretic generalization of online learning and an application to boosting. EuroCOLT '95.
Eric Brochu, Vlad Cora and Nando de Freitas. 2009. A Tutorial on Bayesian
Optimization of Expensive Cost Functions, with Application to Active User Modeling
and Hierarchical Reinforcement Learning. Technical Report TR-2009-023. UBC.
Loris Bazzani, Nando de Freitas and Jo-Anne Ting. 2010. Learning attentional
mechanisms for simultaneous object tracking and recognition with deep networks.
NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop.
Carl Edward Rasmussen and Christopher K. I. Williams. 2005. Gaussian Processes for
Machine Learning. The MIT Press.
Pieter Abbeel, Adam Coates, Morgan Quigley, and Andrew Y. Ng. 2007. An Application
of Reinforcement Learning to Aerobatic Helicopter Flight. NIPS 2007.
39