Lecture 06: Machine Learning Models
Decision Tree
MSCS: Machine Learning
Department of Computer Science & Information Technology
The University of Lahore
Classification by Decision Tree Induction
• Decision tree
  – A flow-chart-like tree structure
  – Internal node denotes a test on an attribute
  – Branch represents an outcome of the test
  – Leaf nodes represent class labels or class distributions
• Decision tree generation consists of two phases
  – Tree construction
    • At the start, all the training examples are at the root
    • Partition the examples recursively based on selected attributes
  – Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of a decision tree: classifying an unknown sample
  – Test the attribute values of the sample against the decision tree (a minimal code sketch follows)
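To make the two phases and the classification step concrete, here is a minimal sketch with scikit-learn (my choice of library; the lecture does not prescribe one): fit performs tree construction, a positive ccp_alpha would request cost-complexity pruning, and predict classifies an unknown sample. The tiny numeric encoding of the data is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoding: age (0: <=30, 1: 31..40, 2: >40), student (0: no, 1: yes)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [0, 1, 1, 1, 0, 1]  # class labels: 0 = "no", 1 = "yes"

# Phase 1, tree construction; setting ccp_alpha > 0 would add phase 2, pruning
clf = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.0)
clf.fit(X, y)

# Classifying an unknown sample: age <= 30, student = yes
print(clf.predict([[0, 1]]))  # [1], i.e. "yes"
```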
Training Dataset

  age    income  student  credit_rating  buys_computer
  <=30   high    no       fair           no
  <=30   high    no       excellent      no
  31…40  high    no       fair           yes
  >40    medium  no       fair           yes
  >40    low     yes      fair           yes
  >40    low     yes      excellent      no
  31…40  low     yes      excellent      yes
  <=30   medium  no       fair           no
  <=30   low     yes      fair           yes
  >40    medium  yes      fair           yes
  <=30   medium  yes      excellent      yes
  31…40  medium  no       excellent      yes
  31…40  high    yes      fair           yes
  >40    medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

  age?
  ├─ <=30  → student?
  │          ├─ no  → no
  │          └─ yes → yes
  ├─ 31…40 → yes
  └─ >40   → credit_rating?
             ├─ excellent → no
             └─ fair      → yes
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  – The tree is constructed in a top-down, recursive, divide-and-conquer manner
  – At the start, all the training examples are at the root
  – Attributes are categorical (continuous-valued attributes are discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning
  – There are no samples left
Example of a Decision Tree

Training Data:

  Tid  Refund  Marital Status  Taxable Income  Cheat
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

Model: Decision Tree

  Refund?
  ├─ Yes → NO
  └─ No  → MarSt?
           ├─ Single, Divorced → TaxInc?
           │                     ├─ < 80K → NO
           │                     └─ > 80K → YES
           └─ Married → NO
Apply Model to Test Data

Test Data:

  Refund  Marital Status  Taxable Income  Cheat
  No      Married         80K             ?

Start at the root of the tree and follow the branch matching each attribute of the test record:
• Refund = No → take the No branch to MarSt
• MarSt = Married → take the Married branch, reaching the leaf NO
• Assign Cheat to “No” (see the code sketch below)

  Refund?
  ├─ Yes → NO
  └─ No  → MarSt?
           ├─ Single, Divorced → TaxInc?
           │                     ├─ < 80K → NO
           │                     └─ > 80K → YES
           └─ Married → NO   ← the test record ends here
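The same traversal in a few lines of Python; the nested-dict encoding of the tree is a hypothetical representation chosen for illustration:

```python
# Nested dicts: an internal node maps an attribute to {branch value: subtree};
# a leaf is just the class label. Taxable income is pre-discretized here.
tree = {"Refund": {
    "Yes": "No",
    "No": {"MarSt": {
        "Married": "No",
        "Single": {"TaxInc": {"<80K": "No", ">80K": "Yes"}},
        "Divorced": {"TaxInc": {"<80K": "No", ">80K": "Yes"}},
    }},
}}

def classify(node, record):
    """Start at the root; follow the branch matching each tested attribute."""
    while isinstance(node, dict):
        attribute = next(iter(node))          # the attribute tested at this node
        node = node[attribute][record[attribute]]
    return node

# Test record: Refund = No, MarSt = Married, Taxable Income = 80K.
# The Married branch is a leaf, so TaxInc is never actually tested.
record = {"Refund": "No", "MarSt": "Married", "TaxInc": ">80K"}
print(classify(tree, record))                 # -> No: assign Cheat = "No"
```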
Attribute Selection Measure
• Information gain (ID3/C4.5)
– All attributes are assumed to be categorical
– Can be modified for continuous-valued attributes
Decision Tree Learning: ID3

Function ID3(Training-set, Attributes)
– If all elements in Training-set are in the same class, then return a leaf node labeled with that class
– Else if Attributes is empty, then return a leaf node labeled with the majority class in Training-set
– Else if Training-set is empty, then return a leaf node labeled with the default majority class
– Else
  Select and remove A from Attributes
  Make A the root of the current tree
  For each value V of A
    – Create a branch of the current tree labeled by V
    – Partition_V ← elements of Training-set with value V for A
    – Attach the result of ID3(Partition_V, Attributes) to branch V
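A compact Python rendering of this pseudocode, as a sketch: examples are assumed to be dicts of categorical values plus a 'label' key, and attribute selection is left as simple list order, where full ID3 would pick the attribute with the highest information gain (introduced below).

```python
from collections import Counter

def id3(examples, attributes, default="No"):
    """ID3 per the pseudocode; examples are dicts of attribute values plus 'label'."""
    if not examples:                          # Training-set is empty
        return default
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:                 # all elements in the same class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                        # Attributes is empty
        return majority
    A, rest = attributes[0], attributes[1:]   # "select and remove A"; real ID3
                                              # picks A by maximum information gain
    tree = {A: {}}
    for v in {e[A] for e in examples}:        # one branch per observed value V
        partition = [e for e in examples if e[A] == v]
        tree[A][v] = id3(partition, rest, default=majority)
    return tree
```

Called as id3(examples, ["age", "student"]), it returns nested dicts of the same shape as the classification sketch earlier.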
Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain
• Assume there are two classes, P and N
  – Let the set of examples S contain p elements of class P and n elements of class N
  – The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

  $I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$
Information Gain in Decision Tree Induction
• Assume that using attribute A, a set S will be partitioned into sets {S1, S2, …, Sv}
  – If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

  $E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$

• The encoding information that would be gained by branching on A is

  $Gain(A) = I(p, n) - E(A)$
Attribute Selection by Information Gain Computation
• Class P: buys_computer = “yes”
• Class N: buys_computer = “no”
• I(p, n) = I(9, 5) = 0.940
• Compute the entropy for age:

  age     pi  ni  I(pi, ni)
  <=30    2   3   0.971
  31…40   4   0   0
  >40     3   2   0.971

  $E(age) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$

• Hence

  $Gain(age) = I(p, n) - E(age) = 0.940 - 0.694 = 0.246$

• Similarly,

  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048
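A check of these four gains against the 14-row table, reusing info from above; the tuple encoding of the table is mine:

```python
ROWS = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def gain(col):
    """Gain(A) = I(p, n) - E(A) for the attribute in column col."""
    p = sum(r[-1] == "yes" for r in ROWS)
    e = 0.0
    for v in {r[col] for r in ROWS}:              # one subset per value of A
        sub = [r for r in ROWS if r[col] == v]
        pi = sum(r[-1] == "yes" for r in sub)
        e += len(sub) / len(ROWS) * info(pi, len(sub) - pi)
    return info(p, len(ROWS) - p) - e

for name, col in [("age", 0), ("income", 1), ("student", 2), ("credit_rating", 3)]:
    print(name, round(gain(col), 3))
# age 0.247, income 0.029, student 0.152, credit_rating 0.048
# (the slide's 0.246 / 0.151 come from rounding the intermediate entropies)
```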
Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example (one rule per path of the buys_computer tree; a sketch of the extraction follows)
  IF age = “<=30” AND student = “no” THEN buys_computer = “no”
  IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
  IF age = “31…40” THEN buys_computer = “yes”
  IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
  IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
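Rule extraction is just an enumeration of root-to-leaf paths; a minimal sketch over the nested-dict tree encoding used earlier (the encoding and the name extract_rules are mine):

```python
def extract_rules(node, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if not isinstance(node, dict):                    # leaf: emit the rule
        body = " AND ".join(f'{a} = "{v}"' for a, v in conditions) or "TRUE"
        yield f'IF {body} THEN buys_computer = "{node}"'
        return
    attribute = next(iter(node))
    for value, subtree in node[attribute].items():
        yield from extract_rules(subtree, conditions + ((attribute, value),))

tree = {"age": {"<=30": {"student": {"no": "no", "yes": "yes"}},
                "31…40": "yes",
                ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}}}}
for rule in extract_rules(tree):
    print(rule)   # prints the five rules above
```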
Choosing Best Attribute?
• Consider 64 examples: 29 positive and 35 negative, written [29+, 35-]
• Which split is better?

  A1:                      A2:
    t → [25+, 5-]            t → [14+, 16-]
    f → [4+, 30-]            f → [15+, 19-]

• Which is better here?

  A1:                      A2:
    t → [21+, 5-]            t → [18+, 33-]
    f → [8+, 30-]            f → [11+, 2-]
Entropy
• A measure of
  – uncertainty
  – purity
  – information content
• Information theory: an optimal-length code assigns (-log2 p) bits to a message having probability p
• S is a sample of training examples
  – p+ is the proportion of positive examples in S
  – p- is the proportion of negative examples in S
• Entropy of S: the average optimal number of bits to encode information about the certainty/uncertainty of S

  $Entropy(S) = p_+(-\log_2 p_+) + p_-(-\log_2 p_-) = -p_+\log_2 p_+ - p_-\log_2 p_-$

• Can be generalized to more than two values
Entropy
• Entropy can also be viewed as measuring
  – the purity of S,
  – the uncertainty in S,
  – the information in S, …
• E.g., entropy is 0 for p+ = 1, 0 for p+ = 0, and 1 for p+ = 0.5
• Easy generalization to more than binary values (a sketch follows):

  $Entropy = \sum_{i=1}^{n} p_i(-\log_2 p_i)$

  – i is + or - for the binary case
  – i varies from 1 to n in the general case
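The generalized form in Python, as a small sketch (the helper name entropy is mine):

```python
from math import log2

def entropy(probabilities):
    """H = sum_i p_i * (-log2 p_i); terms with p_i in {0, 1} contribute 0 bits."""
    return sum(-p * log2(p) for p in probabilities if 0 < p < 1)

print(entropy([1.0]), entropy([0.0, 1.0]), entropy([0.5, 0.5]))  # 0 0 1.0
print(round(entropy([9/14, 5/14]), 3))  # 0.94, matching I(9, 5) = 0.940 above
```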
Choosing Best Attribute?
• Consider the 64 examples [29+, 35-] and compute the entropies: E(S) = 0.993
• Which split is better?

  A1 (E(S) = 0.993):             A2 (E(S) = 0.993):
    t → [25+, 5-]   E = 0.650      t → [14+, 16-]  E = 0.997
    f → [4+, 30-]   E = 0.522      f → [15+, 19-]  E = 0.989

• Which is better here?

  A1 (E(S) = 0.993):             A2 (E(S) = 0.993):
    t → [21+, 5-]   E = 0.708      t → [18+, 33-]  E = 0.937
    f → [8+, 30-]   E = 0.742      f → [11+, 2-]   E = 0.619
Information Gain
• Gain(S, A): the reduction in entropy after splitting on attribute A

  $Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v)$

  First question (E(S) = 0.993):
    A1: t → [25+, 5-]  (E = 0.650),  f → [4+, 30-]  (E = 0.522)   Gain: 0.411
    A2: t → [14+, 16-] (E = 0.997),  f → [15+, 19-] (E = 0.989)   Gain: 0.000
  Second question (E(S) = 0.993):
    A1: t → [21+, 5-]  (E = 0.708),  f → [8+, 30-]  (E = 0.742)   Gain: 0.265
    A2: t → [18+, 33-] (E = 0.937),  f → [11+, 2-]  (E = 0.619)   Gain: 0.121

• So A1 is the better attribute in both cases
Gain function
• Gain measures how much an attribute can reduce uncertainty
  – Its value lies between 0 and 1
• What is the significance of
  – a gain of 0? E.g., a 50/50 split of +/- both before and after discriminating on the attribute's values
  – a gain of 1? E.g., going from "perfect uncertainty" to perfect certainty after splitting on a fully predictive attribute
• Splitting finds "patterns" in the training examples relating to attribute values
  – and moves toward a locally minimal representation of the training examples
Both extremes are checked in the sketch below.
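A few lines demonstrating both extremes, reusing entropy (the branch sizes are my own toy numbers):

```python
# Gain of 0: an [8+, 8-] node split into two [4+, 4-] branches;
# the 50/50 class mix is unchanged, so no uncertainty is removed.
g0 = entropy([0.5, 0.5]) - (0.5 * entropy([0.5, 0.5]) + 0.5 * entropy([0.5, 0.5]))
print(g0)  # 0.0

# Gain of 1: an [8+, 8-] node split into pure [8+, 0-] and [0+, 8-] branches;
# perfect uncertainty becomes perfect certainty.
g1 = entropy([0.5, 0.5]) - (0.5 * entropy([1.0, 0.0]) + 0.5 * entropy([0.0, 1.0]))
print(g1)  # 1.0
```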
Training Examples

  Day  Outlook   Temp  Humidity  Wind    Tennis?
  D1   Sunny     Hot   High      Weak    No
  D2   Sunny     Hot   High      Strong  No
  D3   Overcast  Hot   High      Weak    Yes
  D4   Rain      Mild  High      Weak    Yes
  D5   Rain      Cool  Normal    Weak    Yes
  D6   Rain      Cool  Normal    Strong  No
  D7   Overcast  Cool  Normal    Strong  Yes
  D8   Sunny     Mild  High      Weak    No
  D9   Sunny     Cool  Normal    Weak    Yes
  D10  Rain      Mild  Normal    Weak    Yes
  D11  Sunny     Mild  Normal    Strong  Yes
  D12  Overcast  Mild  High      Strong  Yes
  D13  Overcast  Hot   Normal    Weak    Yes
  D14  Rain      Mild  High      Strong  No
Determine the Root Attribute

  S: [9+, 5-], E = 0.940

  Humidity:                        Wind:
    High   → [3+, 4-]  E = 0.985     Weak   → [6+, 2-]  E = 0.811
    Normal → [6+, 1-]  E = 0.592     Strong → [3+, 3-]  E = 1.000

  Gain(S, Humidity) = 0.151
  Gain(S, Wind) = 0.048
  Gain(S, Outlook) = 0.246
  Gain(S, Temp) = 0.029

Outlook has the highest gain and becomes the root; a numeric check follows.
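The four gains can be recomputed directly from the table; the list-of-tuples encoding of the 14 days and the helper name table_gain are mine, and entropy is the helper defined earlier:

```python
DAYS = [  # (Outlook, Temp, Humidity, Wind, Tennis?)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def table_gain(rows, col):
    """Information gain of the attribute in column col w.r.t. the class r[-1]."""
    def H(subset):
        pos = sum(r[-1] == "Yes" for r in subset)
        return entropy([pos / len(subset), 1 - pos / len(subset)])
    remainder = 0.0
    for v in {r[col] for r in rows}:               # one branch per value
        subset = [r for r in rows if r[col] == v]
        remainder += len(subset) / len(rows) * H(subset)
    return H(rows) - remainder

for name, col in [("Outlook", 0), ("Temp", 1), ("Humidity", 2), ("Wind", 3)]:
    print(name, round(table_gain(DAYS, col), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048
# (the slide's 0.246 / 0.151 round the intermediate entropies first)
```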
Sort the Training Examples

  [9+, 5-]  {D1, …, D14}
  Outlook:
    Sunny    → {D1, D2, D8, D9, D11}   [2+, 3-]  → ?
    Overcast → {D3, D7, D12, D13}      [4+, 0-]  → Yes
    Rain     → {D4, D5, D6, D10, D14}  [3+, 2-]  → ?

  S_sunny = {D1, D2, D8, D9, D11}
  Gain(S_sunny, Humidity) = 0.970
  Gain(S_sunny, Temp) = 0.570
  Gain(S_sunny, Wind) = 0.019
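Recursing into the Sunny branch repeats the same computation on the five matching rows, using DAYS and table_gain from the previous sketch:

```python
sunny = [r for r in DAYS if r[0] == "Sunny"]       # {D1, D2, D8, D9, D11}
for name, col in [("Humidity", 2), ("Temp", 1), ("Wind", 3)]:
    print(name, round(table_gain(sunny, col), 3))
# Humidity 0.971, Temp 0.571, Wind 0.02 (slide: .970, .570, .019)
```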
Final Decision Tree for Example

  Outlook
  ├─ Sunny → Humidity
  │          ├─ High   → No
  │          └─ Normal → Yes
  ├─ Overcast → Yes
  └─ Rain → Wind
            ├─ Strong → No
            └─ Weak   → Yes
Discussion
• Hypothesis Space
• Overfitting and Underfitting
• Bias/Variance