Online Learning
Rong Jin
Batch Learning
• Given a collection of training examples D
• Learning a classification model from D
• What if training examples are received one at a time?
Online Learning
For t = 1, 2, …, T
• Receive an instance $x_t$
• Predict its class label $\hat{y}_t = f_t(x_t)$
• Receive the true class label $y_t$
• Incur loss $\ell(y_t, \hat{y}_t)$
• Update the classification model $f_t \rightarrow f_{t+1}$
A sequence of classifiers $f_1, f_2, \ldots, f_{T+1}$ is generated
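The protocol above is easy to express in code. Below is a minimal Python sketch; the `model` object with its `predict` and `update` methods, and the `loss` callback, are illustrative names, not part of the slides.

```python
def online_learning(model, stream, loss):
    """Run the generic online protocol over a stream of (x, y) pairs."""
    total_loss = 0.0
    for x_t, y_t in stream:             # receive an instance
        y_hat = model.predict(x_t)      # predict its class label
        total_loss += loss(y_t, y_hat)  # receive true label, incur loss
        model.update(x_t, y_t)          # update the classification model
    return total_loss
```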
Objective
• Minimize the total loss $\sum_{t=1}^{T} \ell(y_t, \hat{y}_t)$
• Loss functions:
• Zero-one loss: $\ell(y, \hat{y}) = \mathbb{1}[y \neq \hat{y}]$
• Hinge loss: $\ell(y, f(x)) = \max(0,\, 1 - y\, f(x))$
Loss Functions
[Figure: hinge loss and zero-one loss plotted as functions of the margin $y f(x)$.]
Linear Classifiers
• Restrict our discussion to linear classifiers $f(x) = w^\top x$
• Prediction: $\hat{y} = \mathrm{sign}(w^\top x)$
• Confidence: $|w^\top x|$
Separable Set
[Figure: a data set that can be perfectly separated by a linear classifier.]
Inseparable Sets
[Figure: data sets for which no linear classifier separates the two classes.]
Why Online Learning?
• Fast
• Memory efficient: processes one example at a time
• Simple to implement
Why Online Learning?
• Formal guarantees: regret/mistake bounds
• No statistical assumptions
• Adaptive
Concept Drifting
[Figure: the target classifier changes over time t.]
• An online learning algorithm is able to track the changing classifiers as long as the number of changes is small
Why Online Learning?
• Online-to-batch conversions: how to compute one classifier from the sequence of classifiers generated by online learning
Online to Batch Conversion
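The formula for this slide is not shown; one standard conversion, offered here as a sketch rather than as the slides' exact choice, is to output the average of the iterates and predict with it:

```latex
\bar{w} \;=\; \frac{1}{T}\sum_{t=1}^{T} w_t ,
\qquad
\hat{y} \;=\; \mathrm{sign}(\bar{w}^\top x) .
```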
Why Online Learning?
• Not as good as well-designed batch algorithms
Online Learning
Update Rules
• Online algorithms are based on an update rule, which defines $w_{t+1}$ from $w_t$ (and possibly other information)
• Linear classifiers: find $w_{t+1}$ from $w_t$ based on the input $(x_t, y_t)$
Some update rules:
– Perceptron (Rosenblatt)
– ALMA (Gentile)
– ROMMA (Li & Long)
– NORMA (Kivinen et al.)
– MIRA (Crammer & Singer)
– EG (Littlestone & Warmuth)
– Bregman-based (Warmuth)
Perceptron
Initialize $w_1 = 0$
For t = 1, 2, …, T
• Receive an instance $x_t$
• Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
• Receive the true class label $y_t$
• If $\hat{y}_t \neq y_t$ then $w_{t+1} = w_t + y_t x_t$; otherwise $w_{t+1} = w_t$
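As a concrete illustration, here is a minimal NumPy sketch of the update above; the function name and the stream interface are ours, not from the slides.

```python
import numpy as np

def perceptron(stream, dim):
    """Run the Perceptron update over a stream of (x, y) pairs, y in {-1, +1}."""
    w = np.zeros(dim)                     # initialize w_1 = 0
    mistakes = 0
    for x, y in stream:
        y_hat = 1 if w @ x >= 0 else -1   # predict sign(w^T x)
        if y_hat != y:                    # on a mistake ...
            w = w + y * x                 # ... update w_{t+1} = w_t + y_t x_t
            mistakes += 1
    return w, mistakes
```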
Geometrical Interpretation
[Figure: the update $w_{t+1} = w_t + y_t x_t$ rotates the separating hyperplane toward the misclassified example.]
Mistake Bound: Separable Case
• Assume the data set D is linearly separable with margin $\gamma$, i.e., there exists $u$ with $\|u\| = 1$ such that $y_t (u^\top x_t) \ge \gamma$ for all t
• Assume $\|x_t\| \le R$ for all t
• Then the maximum number of mistakes made by the Perceptron algorithm is bounded by $(R/\gamma)^2$
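The equations on the proof slides are not legible in this copy; the standard two-inequality argument behind this bound is sketched below, with $M$ the number of mistakes.

```latex
% (1) Progress: each update adds at least \gamma to u^\top w
u^\top w_{t+1} \;=\; u^\top w_t + y_t\,(u^\top x_t) \;\ge\; u^\top w_t + \gamma
\quad\Longrightarrow\quad u^\top w_{T+1} \;\ge\; M\gamma .
% (2) Growth: updates occur only on mistakes, where y_t (w_t^\top x_t) \le 0
\|w_{t+1}\|^2 \;=\; \|w_t\|^2 + 2\,y_t\,(w_t^\top x_t) + \|x_t\|^2
\;\le\; \|w_t\|^2 + R^2
\quad\Longrightarrow\quad \|w_{T+1}\|^2 \;\le\; M R^2 .
% Combining (1) and (2) via Cauchy-Schwarz:
M\gamma \;\le\; u^\top w_{T+1} \;\le\; \|w_{T+1}\| \;\le\; R\sqrt{M}
\quad\Longrightarrow\quad M \;\le\; (R/\gamma)^2 .
```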
Mistake Bound: Inseparable Case
• Let $u$ be the best linear classifier
• We measure our progress by a potential comparing $w_t$ with $u$
• Consider the rounds on which we make a mistake on $(x_t, y_t)$
• Result 1:
• Result 2:
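The two results themselves did not survive extraction; for reference, one classical bound of this type (Freund & Schapire, 1999), stated as a plausible stand-in rather than as the slides' exact statement: with $\|x_t\| \le R$, a margin parameter $\gamma > 0$, a unit-norm $u$, and hinge-type deviations $d_t = \max(0,\, \gamma - y_t (u^\top x_t))$,

```latex
M \;\le\; \left( \frac{R + \sqrt{\sum_{t=1}^{T} d_t^2}}{\gamma} \right)^{2} .
```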
Perceptron with Projection
Initialize $w_1 = 0$
For t = 1, 2, …, T
• Receive an instance $x_t$
• Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
• Receive the true class label $y_t$
• If $\hat{y}_t \neq y_t$ then $w_{t+1} = w_t + y_t x_t$
• If $\|w_{t+1}\| > B$ (for a fixed radius $B$) then project: $w_{t+1} \leftarrow B\, w_{t+1} / \|w_{t+1}\|$
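A NumPy sketch of the projection step, assuming an L2 ball of radius `B` (the symbol $B$ is ours; the slides' exact constraint is not shown):

```python
import numpy as np

def project_to_ball(w, B):
    """Scale w back onto the L2 ball of radius B if it has escaped."""
    norm = np.linalg.norm(w)
    return w if norm <= B else (B / norm) * w
```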
Remarks
• The mistake bound is measured for a sequence of classifiers
• The bound does not depend on the dimension of the feature vector
• The bound holds for all sequences (no i.i.d. assumption)
• It is not tight for most real-world data, but it cannot be improved in general
Perceptron
The Perceptron is conservative: it updates the classifier only when it misclassifies.
Initialize $w_1 = 0$
For t = 1, 2, …, T
• Receive an instance $x_t$
• Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
• Receive the true class label $y_t$
• If $\hat{y}_t \neq y_t$ then $w_{t+1} = w_t + y_t x_t$
Aggressive Perceptron
Initialize $w_1 = 0$
For t = 1, 2, …, T
• Receive an instance $x_t$
• Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
• Receive the true class label $y_t$
• If the margin $y_t (w_t^\top x_t)$ falls below a threshold (the update can fire even on correctly classified examples) then $w_{t+1} = w_t + y_t x_t$
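Relative to the conservative version, only the update condition changes. In the NumPy sketch earlier, the mistake test becomes a margin test; `gamma` here is a hypothetical threshold parameter, not fixed by the slides.

```python
import numpy as np

def aggressive_update(w, x, y, gamma):
    """Aggressive variant: update on any margin violation, not only on mistakes.

    w, x are NumPy arrays, y in {-1, +1}; gamma > 0 is a hypothetical threshold."""
    if y * float(np.dot(w, x)) <= gamma:   # margin too small, even if label correct
        return w + y * x
    return w
```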
Regret Bound
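The body of this slide is not shown; the standard definition of regret, which such bounds control, is

```latex
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell\big(w_t; (x_t, y_t)\big)
\;-\; \min_{w}\, \sum_{t=1}^{T} \ell\big(w; (x_t, y_t)\big) .
```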
Learning a Classifier
• The evaluation (mistake bound or regret bound) concerns a sequence of classifiers
• But, at the end of the day, which classifier should be used? The last one? One chosen by cross-validation?
Learning with Expert Advice
• Learning to combine the predictions from multiple experts
• An ensemble of d experts: $e_1(x), \ldots, e_d(x) \in \{-1, +1\}$
• Combination weights: $w = (w_1, \ldots, w_d)$ with $w_i \ge 0$
• Combined classifier: $h(x) = \mathrm{sign}\big(\sum_{i=1}^{d} w_i\, e_i(x)\big)$
Hedge
Simple Case
• There exists one expert, denoted by $i^*$, who can perfectly classify all the training examples
• What is your learning strategy?
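One standard answer, presumably the intended one (the halving algorithm): predict with the majority vote of the experts that have made no mistakes so far, and permanently discard any expert that errs. Every mistake of the combined predictor removes at least half of the surviving experts, so the number of mistakes $M$ satisfies

```latex
M \;\le\; \log_2 d .
```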
Difficult Case
• What if we don't have such a perfect expert?
Hedge Algorithm
[Figure: an example round, with individual expert predictions (+1, −1, +1, +1) combined into a weighted prediction.]
Hedge Algorithm
Initialize $w_{1,i} = 1/d$ for $i = 1, \ldots, d$; choose $\beta \in (0, 1)$
For t = 1, 2, …, T
• Receive a training example $(x_t, y_t)$
• Prediction: $\hat{y}_t = \mathrm{sign}\big(\sum_{i=1}^{d} w_{t,i}\, e_i(x_t)\big)$
• If $\hat{y}_t \neq y_t$ then count a mistake
For i = 1, 2, …, d
• If $e_i(x_t) \neq y_t$ then $w_{t+1,i} = \beta\, w_{t,i}$, else $w_{t+1,i} = w_{t,i}$
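A compact NumPy sketch of this weighted-majority update; `beta` plays the role of $\beta$ above, and the function and variable names are ours.

```python
import numpy as np

def hedge(experts, stream, beta=0.5):
    """Weighted-majority prediction with multiplicative downweighting.

    experts: list of functions x -> {-1, +1}; beta in (0, 1)."""
    d = len(experts)
    w = np.full(d, 1.0 / d)                   # initialize uniform weights
    mistakes = 0
    for x, y in stream:
        preds = np.array([e(x) for e in experts])
        y_hat = 1 if w @ preds >= 0 else -1   # weighted-majority prediction
        if y_hat != y:
            mistakes += 1
        w = np.where(preds != y, beta * w, w) # penalize experts that erred
    return w, mistakes
```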
Mistake Bound
• Measure the progress via the total weight $W_t = \sum_{i=1}^{d} w_{t,i}$
• Lower bound: the weight of the best expert survives in $W_t$
• Upper bound: every mistake of the combined predictor shrinks $W_t$ by a constant factor
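Sketched in LaTeX, the standard instantiation of these two bounds and the mistake bound they combine into; this is the classical weighted-majority analysis (Littlestone & Warmuth), offered as the presumable content of the missing equations, with $m^*$ the mistakes of the best expert and $M$ the algorithm's mistakes.

```latex
% Lower bound: the best expert starts at weight 1/d and is penalized m* times
W_{T+1} \;\ge\; \tfrac{1}{d}\,\beta^{\,m^*} .
% Upper bound: when the algorithm errs, erring experts hold at least half the weight
W_{t+1} \;\le\; \tfrac{1+\beta}{2}\, W_t
\quad\Longrightarrow\quad
W_{T+1} \;\le\; \Big(\tfrac{1+\beta}{2}\Big)^{M} W_1 ,
\qquad W_1 = 1 .
% Combining the two:
M \;\le\; \frac{m^* \ln(1/\beta) + \ln d}{\ln\!\big(\tfrac{2}{1+\beta}\big)} .
```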