On-Line Algorithms in Machine Learning

By:
WALEED ABDULWAHAB YAHYA AL-GOBI
MUHAMMAD BURHAN HAFEZ
KIM HYEONGCHEOL
HE RUIDAN
SHANG XINDI
Overview
1. Introduction: online learning vs. offline learning
2. Predicting from Expert Advice
   • Weighted Majority Algorithm: Simple Version
   • Weighted Majority Algorithm: Randomized Version
3. Mistake Bound Model
   • Learning a Concept Class C
   • Learning Monotone Disjunctions
     - Simple Algorithm
     - Winnow Algorithm
   • Learning Decision List
4. Conclusion
5. Q & A
Section 1: Intro to Machine Learning
• Offline Learning
• Online Learning
Presenter: WALEED ABDULWAHAB YAHYA AL-GOBI
Machine Learning | Definition

Definition
"A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E"
--- [Mitchell, 1997]

A more concrete example:
• Task T                : predicting traffic patterns at a busy intersection.
• Experience E          : historical (past) traffic patterns.
• Performance Measure P : accuracy of predicting future traffic patterns.

Learned model (i.e. target function): y = h(x)
Machine Learning | Offline Learning vs Online Learning

Offline Learning:
• Learning phase: the learning algorithm is trained on a pre-defined set of
  training examples to create a hypothesis.
• Testing phase: the hypothesis is then used to make predictions on new data.

Training Examples → Learning Algorithm → h(x)

Example: MRI brain image classification
[Diagram: training images → image features → training (with training labels) → learned model h(x)]
Machine Learning | Offline Learning vs Online Learning

Online Learning
• In contrast to offline learning, which fits the predictor h(x) on the entire
  training set at once:
• Online learning is a common technique in areas of ML where it is
  computationally infeasible to train on the entire dataset at once.
• Online learning is a method of ML in which data becomes available in
  sequential order and is used to update our predictor h(x) at each step.
Machine Learning | Offline Learning vs Online Learning

Examples of Online Learning
• Stock price prediction:
  - Here the data is generated as a function of time,
  - so online learning can dynamically adapt to new patterns in the new data.
• Spam filtering:
  - Here the data is generated based on the output of the learning algorithm
    (the spam detector),
  - so online learning can dynamically adapt to new patterns to minimize our losses.
Machine Learning | Offline Learning vs Online Learning

Online Learning:
Training Examples → Learning Algorithm → h(x)
Training Examples → Learning Algorithm → h(x)
…
Training Examples → Learning Algorithm → h(x)

Example: Stock Price Prediction
[Diagram: at each time step, receive data features (stock prices over time),
make a prediction, receive the truth, and update the hypothesis h(x)]
Machine Learning | Offline Learning vs Online Learning

Offline Learning (two-phase learning):
• Entire dataset given at once.
• Learn the dataset to construct the target function h(x), then predict on
  incoming new data.
• Learning phase is separated from the testing phase.

Online Learning (multi-phase learning):
• One example given at a time.
• Predict, receive the correct answer, and update the target function h(x) at
  each step of learning.
• Learning phase is combined with the testing phase.
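A minimal sketch of the two settings (illustrative only; `fit` and `update` are
hypothetical placeholders, not part of the slides):

```python
# Minimal sketch contrasting offline (batch) and online learning.

def offline_learning(training_examples, labels, fit):
    # All data is available up front; learn h(x) once, then only predict afterwards.
    return fit(training_examples, labels)

def online_learning(stream, initial_h, update):
    # Data arrives one example at a time; predict, see the truth, update h(x).
    h = initial_h
    mistakes = 0
    for x, true_label in stream:        # e.g. today's features, today's outcome
        prediction = h(x)               # 1. make a prediction
        if prediction != true_label:    # 2. receive the correct answer
            mistakes += 1
        h = update(h, x, true_label)    # 3. update the hypothesis at each step
    return h, mistakes
```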
Section 2: Predicting from Expert Advice
• Basic Flow
• An Example
Presenter: WALEED ABDULWAHAB YAHYA AL-GOBI
Predicting from Expert Advice | Basic Flow

The algorithm repeats the following cycle (assumption: predictions ∈ {0, 1}):
1. Receive predictions from the experts.
2. Combine the expert advice and make its own prediction.
3. Be told the correct answer (the truth).
Predicting from Expert Advice | An Example

Task   : predicting whether it will rain today.
Input  : advice of n experts, each ∈ {1 (yes), 0 (no)}.
Output : 1 or 0.
Goal   : make the least number of mistakes.

Date          Expert 1   Expert 2   Expert 3   Truth
21 Jan 2013       1          0          1        1
22 Jan 2013       0          1          0        1
23 Jan 2013       1          0          1        1
24 Jan 2013       0          1          1        1
25 Jan 2013       1          0          1        1
Section 3: The Weighted Majority Algorithm
• Simple Version
• Randomized Version
Presenter: WALEED ABDULWAHAB YAHYA AL-GOBI
The Weighted Majority Algorithm

The algorithm (simple, deterministic version):
1. Initialize the weights w1, …, wn of all experts to 1.
2. Given a set of predictions {x1, …, xn} by the experts, predict the value
   with the larger total weight (the weighted majority vote).
3. Receive the correct answer and halve the weight of every expert that was
   wrong. Go to 2.
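A minimal Python sketch of this simple version, checked against the worked
example on the next slide (function and variable names are my own):

```python
# Simple (deterministic) Weighted Majority Algorithm for {0, 1} predictions.

def weighted_majority(expert_advice, true_answers):
    """expert_advice: one list of n expert predictions in {0, 1} per round."""
    n = len(expert_advice[0])
    w = [1.0] * n                          # step 1: every expert starts with weight 1
    mistakes = 0
    for advice, truth in zip(expert_advice, true_answers):
        weight_for_1 = sum(wi for wi, x in zip(w, advice) if x == 1)
        weight_for_0 = sum(wi for wi, x in zip(w, advice) if x == 0)
        prediction = 1 if weight_for_1 >= weight_for_0 else 0   # step 2: weighted vote
        if prediction != truth:
            mistakes += 1
        for i, x in enumerate(advice):     # step 3: halve the weight of mistaken experts
            if x != truth:
                w[i] /= 2
    return mistakes, w

# The 5-day rain example: 3 experts, the truth is always 1.
advice = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 1]]
truth = [1, 1, 1, 1, 1]
print(weighted_majority(advice, truth))    # 1 mistake (on 22 Jan)
```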
The Weighted Majority Algorithm

Worked example (wi is expert i's weight before that day's prediction):

Date          Advice (x1,x2,x3)   Weights (w1,w2,w3)    ∑wi (xi=0)   ∑wi (xi=1)   Prediction   Correct answer
21 Jan 2013       1, 0, 1           1,    1,    1           1            2             1              1
22 Jan 2013       0, 1, 0           1,    0.50, 1           2            0.50          0              1
23 Jan 2013       1, 0, 1           0.50, 0.50, 0.50        0.50         1             1              1
24 Jan 2013       0, 1, 1           0.50, 0.25, 0.50        0.50         0.75          1              1
25 Jan 2013       1, 0, 1           0.25, 0.25, 0.50        0.25         0.75          1              1
The Weighted Majority Algorithm

Theorem: M ≤ 2.41 (m + lg n), where m is the number of mistakes made by the best expert.

Proof:
• Let
  - M := # of mistakes made by the Weighted Majority algorithm.
  - W := total weight of all experts (initially W = n).
• On a mistaken prediction:
  - At least ½ W was placed on the incorrect answer.
  - In step 3 that weight is halved, so the total weight drops by at least a
    factor of ¼ (= ½ W × ½), i.e. W falls to at most ¾ of its value.
• Therefore, after M mistakes: W ≤ n (¾)^M.
• The best expert made m mistakes, so its weight is (½)^m, and hence W ≥ (½)^m.
• So (½)^m ≤ n (¾)^M, which gives M ≤ 2.41 (m + lg n).
Section 4: Randomized Weighted Majority Algorithm (RWMA)
• Simple Version
• Randomized Version
Presenter: MUHAMMAD BURHAN HAFEZ
The Randomized Weighted Majority Algorithm (RWMA)

M_WMA ≤ 2.41 (m + lg n)
Suppose n = 10, m = 20, and we run 100 prediction trials.
Then the bound is M_WMA ≤ 2.41 (20 + lg 10) ≈ 56, i.e. more than half of the trials!
Can we do better?
The Randomized Weighted Majority Algorithm (RWMA)

Two modifications:
1. View the weights as probabilities: follow expert i with probability
   proportional to its weight.
   [Figure: four experts whose weights, e.g. 0.25, 0.5, 0.25, …, are interpreted
   as a probability distribution over experts]
2. Replace "multiply by ½" with "multiply by β".
The Randomized Weighted Majority Algorithm (RWMA)

The algorithm:
1. Initialize the weights w1, …, wn of all experts to 1.
2. Given a set of predictions {x1, …, xn} by the experts, output xi with
   probability wi / W.
3. Receive the correct answer l and penalize each mistaken expert by
   multiplying its weight by β. Go to 2.
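A minimal Python sketch of these steps, sampling an expert in proportion to its
weight (function and variable names are my own):

```python
import random

# Randomized Weighted Majority: follow expert i with probability wi / W,
# then multiply the weight of every mistaken expert by beta.

def rwma(expert_advice, true_answers, beta=0.5, seed=0):
    rng = random.Random(seed)
    n = len(expert_advice[0])
    w = [1.0] * n                                          # 1. all weights start at 1
    mistakes = 0
    for advice, truth in zip(expert_advice, true_answers):
        chosen = rng.choices(range(n), weights=w, k=1)[0]  # 2. pick expert i w.p. wi / W
        if advice[chosen] != truth:
            mistakes += 1
        for i, x in enumerate(advice):                     # 3. penalize mistaken experts by beta
            if x != truth:
                w[i] *= beta
    return mistakes, w

# The 6-expert example from the next slide (two rounds).
advice = [[1, 1, 0, 0, 0, 0], [0, 1, 1, 1, 1, 0]]
truth = [1, 0]
print(rwma(advice, truth))
```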
The Randomized Weighted Majority Algorithm (RWMA)

RWMA in action (β = ½):

                    E1    E2    E3    E4    E5    E6    Prediction   Correct answer
Initial weights      1     1     1     1     1     1
Advice (round 1)     1     1     0     0     0     0        0              1
Updated weights      1     1     ½     ½     ½     ½
Advice (round 2)     0     1     1     1     1     0        1              0
Updated weights      1     ½     ¼     ¼     ¼     ½
The Randomized Weighted Majority Algorithm (RWMA)

Mistake bound:
  M ≤ (m ln(1/β) + ln n) / (1 − β)

• Define Fi to be the fraction of the total weight on the wrong answers at the
  i-th trial. Say we have seen t examples.
• Let M be our expected # of mistakes so far, so M = ∑_{i=1..t} Fi.
• On the i-th trial, W ← W (1 − (1 − β) Fi), so after t trials
    W = n ∏_{i=1..t} (1 − (1 − β) Fi).
• The best expert made m mistakes, so its weight is β^m, and W is at least that:
    n ∏_{i=1..t} (1 − (1 − β) Fi) ≥ β^m
• Taking logarithms:
    ln n + ∑_{i=1..t} ln(1 − (1 − β) Fi) ≥ m ln β
    − ln n − ∑_{i=1..t} ln(1 − (1 − β) Fi) ≤ m ln(1/β)
• Using −ln(1 − x) ≥ x:
    (1 − β) ∑_{i=1..t} Fi ≤ − ∑_{i=1..t} ln(1 − (1 − β) Fi) ≤ m ln(1/β) + ln n
• Since M = ∑ Fi, this gives (1 − β) M ≤ m ln(1/β) + ln n, i.e.
    M ≤ (m ln(1/β) + ln n) / (1 − β).
The Randomized Weighted Majority Algorithm (RWMA)

The relation between β and M:

  β     M
  ¼     1.85 m + 1.33 ln n
  ½     1.39 m + 2 ln n
  ¾     1.15 m + 4 ln n

When β = ½:
  The simple algorithm:  M ≤ 2.41 (m + lg n)
  RWMA:                  M ≤ 1.39 m + 2 ln n
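The coefficients in this table follow directly from the bound
M ≤ (m ln(1/β) + ln n) / (1 − β); a quick check in Python:

```python
import math

# Coefficients of m and ln(n) in M <= (m*ln(1/beta) + ln(n)) / (1 - beta).
for beta in (0.25, 0.5, 0.75):
    m_coef = math.log(1 / beta) / (1 - beta)
    logn_coef = 1 / (1 - beta)
    print(f"beta = {beta}: M <= {m_coef:.2f} m + {logn_coef:.2f} ln(n)")
# beta = 0.25: M <= 1.85 m + 1.33 ln(n)
# beta = 0.5:  M <= 1.39 m + 2.00 ln(n)
# beta = 0.75: M <= 1.15 m + 4.00 ln(n)
```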
The Randomized Weighted Majority Algorithm (RWMA)

Other advantages of RWMA:
1. Consider the case where only 51% of the experts were mistaken.
   • In WMA, we directly follow this majority and predict accordingly, resulting
     in a wrong prediction.
   • In RWMA, there is still roughly a 50/50 chance that we'll predict correctly.
2. Consider the case where predictions are strategies (which cannot easily be
   combined together).
   • In WMA, since all strategies are generally different, we cannot combine the
     experts who predicted the same strategy.
   • RWMA can be applied directly, because it does not depend on summing the
     weights of experts who gave the same strategy to make a decision, but
     rather on the individual weights of the experts.
Section 5: Mistake Bound Model
• A Concept Class
• Definition of learning a class in the Mistake Bound Model
• Learning a Concept Class in the Mistake Bound Model
Presenter: KIM HYEONGCHEOL
Quick Review

What we covered so far …
• Input: Yes/No advice from the "experts"
  - Weather experts
  - Question to the experts: Will it rain tomorrow?
  - The experts' predictions: Yes/No
• Output: the algorithm makes a prediction as well
  - Question to the algorithm: Will it rain tomorrow?
  - Its prediction: Yes/No
  - Penalization of the experts according to correctness
• Simple algorithm & better randomized algorithm
Learn a Concept Class

On-line learning of a concept class C in the Mistake Bound Model

Questions:
• What is a concept class C?
• What is the Mistake Bound Model?
• What do we mean by learning a concept class in the Mistake Bound Model?
A Concept Class C

Definition
• A concept class C is a set of Boolean functions over a domain X.
  - Each Boolean function in the set can be called a concept.
  - E.g. the concept class of disjunctions over the domain X = {0,1}^n:
    every function in the class can be described as a *disjunction over the
    variables {X1, …, Xn}.

A concept class C of disjunctions:
  X1 ∨ X2                   : a concept
  X3 ∨ X2 ∨ X6              : a concept
  X5 ∨ X1 ∨ X7 ∨ X8 ∨ Xn    : a concept
  ⋮

* Disjunction: a ∨ b    * Conjunction: a ∧ b
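A concept in this class can be represented simply by the set of variable indices
it contains; a tiny illustrative sketch (names are my own):

```python
# A disjunction concept over {0,1}^n, represented by the set S of relevant indices.

def make_disjunction(S):
    """Return the concept c(x) = OR of x[i] for i in S (0-based indices)."""
    return lambda x: int(any(x[i] for i in S))

c = make_disjunction({1, 2})        # X2 v X3 in the slides' 1-based notation
print(c([0, 1, 0, 0, 0, 0]))        # 1: X2 is on
print(c([1, 0, 0, 1, 0, 0]))        # 0: neither X2 nor X3 is on
```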
Mistake Bound Model

On-line learning
• Iteration:
  - The algorithm receives an unlabeled example.
  - The algorithm predicts the label of the example.
  - The algorithm is then given the true label.
  - The algorithm is penalized or not, depending on correctness.

Mistake bound
• The number of mistakes made by the algorithm is bounded by M
  (ideally, we hope M is as small as possible).
Learning a Concept Class in the Mistake Bound Model

Assumptions & conditions
• The target concept can be any (unknown) concept from the concept class.
• The target concept is fixed during the process.
• The true labels attached to the examples are generated by a target concept
  c ∈ C: for each example x, the true label is c(x).
• The true label is given to the algorithm so it can update its hypothesis.

The goal is to make as few mistakes as possible.
Learning a Concept Class in the Mistake Bound Model

Assumptions & conditions (cont'd)
• For any concept c ∈ C, the algorithm makes at most poly(n, size(c)) mistakes, where
  - poly    : some polynomial
  - n       : the description length of the examples
              e.g. for X = {X1, X2, …, X10}, n = 10
  - size(c) : the description length of the concept c ∈ C
              e.g. size(X1 ∨ X2 ∨ X6) = 3
Learning a Concept Class in the Mistake Bound Model

• If the algorithm satisfies this condition under these assumptions, we say that
  it learns the class C in the Mistake Bound learning model.
• In particular, if the number of mistakes made is only poly(size(c)) ∙ polylog(n),
  the algorithm is robust to the presence of many additional irrelevant
  variables: it is called attribute efficient.
Examples of Learning

Some examples of learning classes in the Mistake Bound Model:
• Monotone disjunctions
  - Simple algorithm
  - The Winnow algorithm
• Decision lists
Section 6: Learning Monotone Disjunctions
• Simple Algorithm
• Winnow Algorithm
Presenter: KIM HYEONGCHEOL
Learning Monotone Disjunctions | Problem Definition

• Monotone disjunctions: Boolean functions of the form ⋁_{i ∈ S} Xi for some
  subset S ⊆ {1, …, n}, e.g. X2 ∨ X3.
• Input:
  - a sequence of examples X ∈ {0, 1}^n
  - a sequence of true labels C ∈ {0, 1}, generated by an unknown monotone disjunction
• Output:
  - a sequence of predicted labels B ∈ {0, 1}
• Objective: make as few false predictions as possible.
Simple Algorithm

Algorithm workflow
• The initial hypothesis for prediction is h = X1 ∨ X2 ∨ … ∨ Xn.
• The hypothesis is given examples X ∈ {0, 1}^n and predicts with h.
• If a mistake is made on a negative example X, remove from h all variables
  that equal 1 in X.

Mistake bound
• At most n mistakes!
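A minimal Python sketch of this elimination algorithm (the example stream below
is my own, chosen to match the n = 6, c(x) = X2 ∨ X3 example that follows):

```python
# Simple elimination algorithm for learning a monotone disjunction.

def simple_disjunction_learner(examples, labels, n):
    hypothesis = set(range(n))              # h = X1 v X2 v ... v Xn (0-based indices)
    mistakes = 0
    for x, label in zip(examples, labels):
        prediction = int(any(x[i] for i in hypothesis))
        if prediction != label:
            mistakes += 1
            if label == 0:                  # mistake on a negative example:
                hypothesis -= {i for i in range(n) if x[i] == 1}   # drop the variables set to 1
    return hypothesis, mistakes

# Target c(x) = X2 v X3 (1-based), n = 6.
examples = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
labels   = [0, 1, 0]
print(simple_disjunction_learner(examples, labels, 6))   # ({1, 2}, 2): h = X2 v X3, 2 mistakes
```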
Simple Algorithm | An Example

[Figure: worked example with n = 6 and target concept c(x) = X2 ∨ X3; red marks
a mistake on a negative example, green marks a correct prediction, and the
hypothesis h shrinks after each mistake on a negative example.]
Section 6 (cont'd): Learning Monotone Disjunctions
• Simple Algorithm
• Winnow Algorithm
Presenter: HE RUIDAN
Learning the Class of Disjunctions | Winnow Algorithm

• The simple algorithm learns the class of disjunctions with the number of
  mistakes bounded by n.
• The Winnow algorithm: an algorithm that makes fewer mistakes.
Winnow Algorithm | Basic Concept

• Each input vector x = {x1, x2, …, xn}, xi ∈ {0, 1}.
• Assume the target function is the disjunction of r relevant variables,
  i.e. c(x) = xt1 ∨ xt2 ∨ … ∨ xtr.
• The Winnow algorithm maintains a linear separator: it predicts positive
  exactly when the weighted sum of the input bits crosses a threshold.
Winnow Algorithm | Work Flow

• Initialize: weights w1 = w2 = … = wn = 1.
• Iterate:
  - Receive an example vector x = {x1, x2, …, xn}.
  - Predict: output 1 if w1 x1 + w2 x2 + … + wn xn ≥ n, and output 0 otherwise
    (this threshold n is the one used in the mistake-bound proof below).
  - Get the true label.
  - Update if a mistake was made:
    predicted negative on a positive example: for each xi = 1, wi ← 2 wi;
    predicted positive on a negative example: for each xi = 1, wi ← wi / 2.
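A minimal Python sketch of Winnow with threshold n (the example stream and
target below are my own, for illustration):

```python
# Winnow algorithm for monotone disjunctions, threshold n.

def winnow(examples, labels, n):
    w = [1.0] * n                                   # initialize w1 = ... = wn = 1
    mistakes = 0
    for x, label in zip(examples, labels):
        prediction = int(sum(wi * xi for wi, xi in zip(w, x)) >= n)
        if prediction != label:
            mistakes += 1
            for i in range(n):
                if x[i] == 1:
                    # double on a missed positive, halve on a false positive
                    w[i] = w[i] * 2 if label == 1 else w[i] / 2
    return w, mistakes

# Illustrative target c(x) = X1 v X3 over n = 4 variables.
examples = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
labels   = [1, 0, 1, 1]
print(winnow(examples, labels, 4))
```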
Winnow Algorithm | Mistake Bound

• Theorem: the Winnow algorithm learns the class of disjunctions in the Mistake
  Bound model, making at most 2 + 3r(1 + lg n) mistakes when the target concept
  is a disjunction of r variables.
• Attribute efficient: the # of mistakes is only poly(r) ∙ polylog(n).
• Particularly good for learning when the number of relevant variables r is much
  smaller than the total number of variables n.
Winnow Algorithm | Proof of Mistake Bound

• u: # of mistakes made on positive examples (output 0 while the true label is 1)
• v: # of mistakes made on negative examples (output 1 while the true label is 0)
• Proof 1: u ≤ r(1 + lg n)
• Proof 2: v < 2(u + 1)
• Therefore, the total # of mistakes is u + v < 3u + 2, which is bounded by
  2 + 3r(1 + lg n).
Winnow Algorithm | Proof of Mistake Bound

Proof 1: u ≤ r(1 + lg n)

• Claim 1: any mistake made on a positive example must double at least one of
  the weights of the target (relevant) variables.
  - For such an example X, h(X) = negative while c(X) = positive.
  - c(X) = positive ⇒ at least one target variable equals 1 in X.
  - When the hypothesis predicts a positive example as negative, the algorithm
    doubles the weights of all variables that equal 1 in the example, so at
    least one target variable's weight is doubled.
Winnow Algorithm | Proof of Mistake Bound

• Claim 2: the weights of the target variables are never halved.
  - According to the algorithm, weights are halved only when h(X) = positive
    while c(X) = negative, and only for variables that equal 1 in X.
  - c(X) = negative ⇒ no target variable equals 1 in X ⇒ no target variable's
    weight is halved.
Winnow Algorithm | Proof of Mistake Bound

• Claim 3: each target variable's weight can be doubled at most 1 + lg n times.
  - A target variable's weight can only be doubled, never halved (Claim 2).
  - Once the weight of a target variable reaches n or more, the hypothesis
    always predicts positive whenever that variable equals 1.
  - Weights are doubled only when the hypothesis predicts negative on a positive
    example, so if the hypothesis always predicts positive on such examples, no
    further doubling of that weight occurs.
  - Therefore a target variable's weight cannot be doubled once it is ≥ n, and
    since it starts at 1, it can be doubled at most 1 + lg n times.
Winnow Algorithm | Proof of Mistake Bound

Putting Proof 1 together:
• Any mistake on a positive example doubles at least one target variable's
  weight (Claim 1).
• Target variables' weights are never halved, since whenever a target variable
  is 1 the example cannot be negative (Claim 2).
• Each target variable's weight can be doubled at most 1 + lg n times, since
  only weights below n can be doubled (Claim 3).
• Therefore u ≤ r(1 + lg n), since there are r variables in the target function.
Winnow Algorithm | Proof of Mistake Bound

Proof 2: v < 2(u + 1)
• Initially, the total weight is W = n.
• A mistake on a positive example increases W by less than n (only variables
  with xi = 1 are doubled, and their total weight was below the threshold n).
• A mistake on a negative example decreases W by at least n/2 (the halved
  variables had total weight at least the threshold n).
• Therefore, 0 ≤ W < n + u·n − v·(n/2).
• Solving gives v < 2(u + 1).

• Total # of mistakes = u + v < 3u + 2 ≤ 2 + 3r(1 + lg n).

7
Learning Decision List in Mistake Bound Model
Learning Decision List in Mistake Bound Model
SHANG XINDI
Decision List

if X1 then B1, …, else if Xr then Br, else Br+1

A general form of a decision list is
  {X1 → B1, ⋯, Xn1 → Bn1}                           -- level 1
  ⋯
  else {Xn_{r−1}+1 → Bn_{r−1}+1, ⋯, Xnr → Bnr}      -- level r
  else {True → Bnr+1, ⋯}                            -- level r+1
where each Xi is a Boolean variable and Bi ∈ {0, 1}.
Decision List | Example

Decision List vs Disjunction

Learning Decision List
Learning Decision List | Algorithm

• Hypothesis h: a decision list.
• Initialize: a 1-level decision list containing all 4n + 2 possible "if/then" rules.
• Iterate:
  - Receive an example X = {X1, X2, ⋯, Xn}.
  - Predict: find the first level that contains a rule satisfied by X and use
    that rule for the prediction (if there are several choices, choose one
    arbitrarily).
  - Receive the true label.
  - Update if a mistake was made: move all rules in that level that predicted
    wrongly on X down to the next level.
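A minimal Python sketch of this learner; rules are (literal, bit) pairs, with
`None` standing for the always-true condition, and the data at the bottom
matches the n = 2 example on the following slides (all names are my own):

```python
# Decision-list learner: demote wrongly-predicting rules one level per mistake.

def rule_fires(rule, x):
    literal, _ = rule
    return literal is None or x[literal[0]] == literal[1]

def make_initial_level(n):
    # The 4n + 2 possible "if/then" rules: Xi -> b, not-Xi -> b, and True -> b.
    rules = [((i, v), b) for i in range(n) for v in (0, 1) for b in (0, 1)]
    rules += [(None, 0), (None, 1)]
    return rules

def learn_decision_list(examples, labels, n):
    levels = [make_initial_level(n)]                # start with a 1-level list
    mistakes = 0
    for x, label in zip(examples, labels):
        # Predict with the first level containing a rule satisfied by x.
        for depth, level in enumerate(levels):
            firing = [r for r in level if rule_fires(r, x)]
            if firing:
                prediction = firing[0][1]           # pick one arbitrarily
                break
        if prediction != label:
            mistakes += 1
            # Move the satisfied rules in that level that predicted wrongly down a level.
            wrong = [r for r in firing if r[1] != label]
            if depth + 1 == len(levels):
                levels.append([])
            levels[depth] = [r for r in levels[depth] if r not in wrong]
            levels[depth + 1].extend(wrong)
    return levels, mistakes

# The n = 2 example from the following slides: c(x) = 1 only when X1 = X2 = 0.
examples = [[0, 0], [0, 1], [1, 0], [1, 1]] * 3
labels   = [1, 0, 0, 0] * 3
print(learn_decision_list(examples, labels, 2)[1])  # number of mistakes made
```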
Learning Decision List | Example

Initial 1-level decision list for n = 2 (4n + 2 = 10 rules; X̄i denotes "Xi = 0"):
{X1 → 0, X1 → 1, X̄1 → 0, X̄1 → 1,
 X2 → 0, X2 → 1, X̄2 → 0, X̄2 → 1,
 True → 0, True → 1}
Learning Decision List | Example

After a mistake (here, on the example X1 = 1, X2 = 1 whose true label is 0, see
the table on the next slide), the rules that predicted wrongly move down one level:
{X1 → 0, X̄1 → 1, X̄1 → 0, X2 → 0, X̄2 → 0, X̄2 → 1, True → 0}
else {X1 → 1, X2 → 1, True → 1}
Learning Decision List | Example

  X1   X2   c   h
   0    0   1   1
   0    1   0   0
   1    0   0   0
   1    1   0   0
Learning Decision List | Mistake Bound
Summary
1. Introduction: online learning vs. offline learning
2. Predicting from Expert Advice
   • Weighted Majority Algorithm: Simple Version
   • Weighted Majority Algorithm: Randomized Version
3. Mistake Bound Model
   • Learning a Concept Class C
   • Learning Monotone Disjunctions
     - Simple Algorithm
     - Winnow Algorithm
   • Learning Decision List
4. Demo of online learning
Demo: Learning to Swing-Up and Balance

Q & A