King Saud University
College of Computer & Information Sciences
IS465: Decision Support Systems
Tutorial 5: Decision Tree Induction for Classification
Exercise 1:
The following table consists of training data from a mobile store. Construct a Decision
Tree based on this data, using the basic algorithm for decision tree induction.
Classify the records by the “Customer satisfaction” attribute.
Memory | Battery Life | Price | Customer Satisfaction
-------+--------------+-------+----------------------
<=4    | High         | <=150 | Yes
>4     | High         | >150  | Yes
>4     | High         | <=150 | Yes
<=4    | High         | >150  | Yes
>4     | High         | >150  | Yes
>4     | Low          | >150  | Yes
<=4    | Low          | >150  | No
<=4    | Low          | >150  | No
>4     | Low          | <=150 | Yes
<=4    | Low          | <=150 | No
<=4    | Medium       | <=150 | No
>4     | Medium       | <=150 | No
<=4    | Medium       | >150  | Yes
>4     | Medium       | >150  | Yes
>4     | Medium       | <=150 | No
Exercise 2:
The following table consists of training data from a computer store. Construct a
Decision Tree based on this data, using the basic algorithm for decision tree
induction.
Classify the records by the “Buys_computer” attribute.
age   | income | student | Leasing_rating | Buys_computer
------+--------+---------+----------------+--------------
<=30  | High   | No      | Fair           | No
<=30  | High   | No      | Excellent      | No
31…40 | High   | No      | Fair           | Yes
>40   | Medium | No      | Fair           | Yes
>40   | Low    | Yes     | Fair           | Yes
>40   | Low    | Yes     | Excellent      | No
31…40 | Low    | Yes     | Excellent      | Yes
<=30  | Medium | No      | Fair           | No
<=30  | Low    | Yes     | Fair           | Yes
>40   | Medium | Yes     | Fair           | Yes
<=30  | Medium | Yes     | Excellent      | Yes
31…40 | Medium | No      | Excellent      | Yes
31…40 | High   | Yes     | Fair           | Yes
>40   | Medium | No      | Excellent      | No
Solutions:
Exercise 1:
Class P: customer satisfaction = “yes”
Class N: customer satisfaction = “no”
I(p, n) = I(9, 6) = -(9/15)·log2(9/15) - (6/15)·log2(6/15) = 0.971
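This expected-information calculation can be checked with a short Python sketch (the helper name info is ours, not part of the tutorial):

```python
import math

def info(p: int, n: int) -> float:
    """Expected information I(p, n) for a node with p positive
    and n negative training samples (entropy, base-2 logarithm)."""
    total = p + n
    return -sum((c / total) * math.log2(c / total)
                for c in (p, n) if c > 0)

print(round(info(9, 6), 3))  # 0.971 -- matches I(9, 6) above
```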
Attribute A1= “Battery life”:
S1 =“high”:
p1=5 n1=0
I(p1, n1)=0
S2 =“medium”:
p2=2 n2=3
I(p2, n2)= 0.9710
S3 =“low”:
p3=2 n3=3
I(p3, n3)= 0.9710
E(A1)=5/15 *I(p1, n1)+ 5/15*I(p2, n2)+ 5/15 *I(p3, n3)= 0.6473
Gain(A1)= I(p, n)-E(A1)= 0.3237
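The weighted entropy E(A1) and the resulting gain can be reproduced in Python. This is a sketch under our own naming; the (p_i, n_i) class counts are read off the Exercise 1 table:

```python
import math

def info(p, n):
    """Expected information I(p, n), base-2 entropy."""
    total = p + n
    return -sum((c / total) * math.log2(c / total)
                for c in (p, n) if c > 0)

# (p_i, n_i) per value of "Battery life": High, Medium, Low
splits = [(5, 0), (2, 3), (2, 3)]
total = sum(p + n for p, n in splits)              # 15 samples in all

# E(A1): entropy of each subset weighted by its share of the samples
e_a1 = sum((p + n) / total * info(p, n) for p, n in splits)
gain_a1 = info(9, 6) - e_a1
print(round(e_a1, 4), round(gain_a1, 4))  # 0.6473 0.3237
```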
Attribute A2= “Memory”:
S1 =“<=4”:
p1=3 n1=4
I(p1, n1)= 0.9852
S2 =“>4”:
p2=6 n2=2
I(p2, n2)= 0.8113
E(A2)=7/15 *I(p1, n1)+ 8/15*I(p2, n2) =0.8925
Gain(A2)= I(p, n)-E(A2)= 0.0785
Attribute A3= “Price”:
S1 =”<=150”:
p1=3 n1=4
I(p1, n1)= 0.9852
S2 =“>150”:
p2=6 n2=2
I(p2, n2)= 0.8113
E(A3)=7/15 *I(p1, n1)+ 8/15*I(p2, n2) =0.8925
Gain(A3)= I(p, n)-E(A3)= 0.0785
Since A1 = ”Battery life” has the highest information gain, it is
selected as the first test attribute. For the value “High” all samples belong
to the same class (“yes”). We get:

Battery Life
  High   -> Yes
  Medium -> ?
  Low    -> ?
For value “Medium”:
I(p, n)=I(2,3)= 0.9710
Attribute A2= “Memory”:
S1 =“<=4”:
p1=1 n1=1
I(p1, n1)= 1
S2 =“>4”:
p2=1 n2=2
I(p2, n2)= 0.9183
E(A2)=2/5 *I(p1, n1)+ 3/5*I(p2, n2) =0.9510
Gain(A2)= I(p, n)-E(A2)= 0.0200
Attribute A3= “Price”:
S1 =”<=150”:
p1=0 n1=3
I(p1, n1)= 0
S2 =“>150”:
p2=2 n2=0
I(p2, n2)= 0
E(A3)=3/5 *I(p1, n1)+ 2/5*I(p2, n2) =0
Gain(A3)= I(p, n)-E(A3)= 0.9710
Since Gain(A3) > Gain(A2), A3 = “Price” is selected as the test attribute for the edge
“Medium”. For ”<=150” all samples belong to the same class “no”, and for
”>150” all samples belong to the same class “yes”. We get:

Battery Life
  High   -> Yes
  Medium -> Price
              <=150 -> No
              >150  -> Yes
  Low    -> ?
For value “Low”:
I(p, n)=I(2,3)= 0.9710
Attribute A2= “Memory”:
S1 =“<=4”:
p1=0 n1=3
I(p1, n1)= 0
S2 =“>4”:
p2=2 n2=0
I(p2, n2)= 0
E(A2)=3/5 *I(p1, n1)+ 2/5*I(p2, n2) =0
Gain(A2)= I(p, n)-E(A2)= 0.9710
Attribute A3= “Price”:
S1 =”<=150”:
p1=1 n1=1
I(p1, n1)= 1
S2 =“>150”:
p2=1 n2=2
I(p2, n2)= 0.9183
E(A3)=2/5 *I(p1, n1)+ 3/5*I(p2, n2) =0.9510
Gain(A3)= I(p, n)-E(A3)= 0.0200
Since Gain(A2) > Gain(A3), A2 = “Memory” is selected as the test attribute for the edge
“Low”. For ”>4” all samples belong to the same class “yes”, and for ”<=4”
all samples belong to the same class “no”. We get the final decision tree:

Battery Life
  High   -> Yes
  Medium -> Price
              <=150 -> No
              >150  -> Yes
  Low    -> Memory
              <=4 -> No
              >4  -> Yes
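The whole induction can also be sketched end to end. Below is a minimal ID3-style implementation under our own naming (id3, entropy); it handles neither ties nor empty branches, which this data set does not require, and it rebuilds the tree derived above from the Exercise 1 table:

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def id3(rows, attrs, target):
    """Basic decision tree induction: recursively split on the
    attribute with the highest information gain."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]

    def gain(a):
        e = sum(len(sub) / len(rows) * entropy([r[target] for r in sub])
                for v in set(r[a] for r in rows)
                for sub in [[r for r in rows if r[a] == v]])
        return entropy(labels) - e

    best = max(attrs, key=gain)
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best], target)
                   for v in set(r[best] for r in rows)}}

# Training data from Exercise 1
data = [
    ("<=4", "High",   "<=150", "Yes"), (">4",  "High",   ">150",  "Yes"),
    (">4",  "High",   "<=150", "Yes"), ("<=4", "High",   ">150",  "Yes"),
    (">4",  "High",   ">150",  "Yes"), (">4",  "Low",    ">150",  "Yes"),
    ("<=4", "Low",    ">150",  "No"),  ("<=4", "Low",    ">150",  "No"),
    (">4",  "Low",    "<=150", "Yes"), ("<=4", "Low",    "<=150", "No"),
    ("<=4", "Medium", "<=150", "No"),  (">4",  "Medium", "<=150", "No"),
    ("<=4", "Medium", ">150",  "Yes"), (">4",  "Medium", ">150",  "Yes"),
    (">4",  "Medium", "<=150", "No"),
]
cols = ("Memory", "Battery life", "Price", "Satisfaction")
rows = [dict(zip(cols, r)) for r in data]
tree = id3(rows, ["Memory", "Battery life", "Price"], "Satisfaction")
print(tree)
```

Running this yields a nested dict with "Battery life" at the root, a "Price" test under "Medium", and a "Memory" test under "Low", matching the tree drawn above.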