King Saud University
College of Computer & Information Sciences
IS465: Decision Support Systems
Tutorial 5: Decision Tree Induction for Classification

Exercise 1: The following table consists of training data from a mobile store. Construct a decision tree based on this data, using the basic algorithm for decision tree induction. Classify the records by the "Customer Satisfaction" attribute.

Memory  Battery Life  Price   Customer Satisfaction
<=4     High          <=150   Yes
>4      High          >150    Yes
>4      High          <=150   Yes
<=4     High          >150    Yes
>4      High          >150    Yes
>4      Low           >150    Yes
<=4     Low           >150    No
<=4     Low           >150    No
>4      Low           <=150   Yes
<=4     Low           <=150   No
<=4     Medium        <=150   No
>4      Medium        <=150   No
<=4     Medium        >150    Yes
>4      Medium        >150    Yes
>4      Medium        <=150   No

Exercise 2: The following table consists of training data from a computer store. Construct a decision tree based on this data, using the basic algorithm for decision tree induction. Classify the records by the "Buys_computer" attribute.

age     income  student  Leasing_rating  Buys_computer
<=30    High    No       Fair            No
<=30    High    No       Excellent       No
31…40   High    No       Fair            Yes
>40     Medium  No       Fair            Yes
>40     Low     Yes      Fair            Yes
>40     Low     Yes      Excellent       No
31…40   Low     Yes      Excellent       Yes
<=30    Medium  No       Fair            No
<=30    Low     Yes      Fair            Yes
>40     Medium  Yes      Fair            Yes
<=30    Medium  Yes      Excellent       Yes
31…40   Medium  No       Excellent       Yes
31…40   High    Yes      Fair            Yes
>40     Medium  No       Excellent       No

Solutions:

Exercise 1:

Class P: Customer Satisfaction = "yes"
Class N: Customer Satisfaction = "no"
I(p, n) = I(9, 6) = -9/15 log2(9/15) - 6/15 log2(6/15) = 0.971

Attribute A1 = "Battery Life":
  S1 = "high":   p1 = 5, n1 = 0, I(p1, n1) = 0
  S2 = "medium": p2 = 2, n2 = 3, I(p2, n2) = 0.9710
  S3 = "low":    p3 = 2, n3 = 3, I(p3, n3) = 0.9710
  E(A1) = 5/15 * I(p1, n1) + 5/15 * I(p2, n2) + 5/15 * I(p3, n3) = 0.6473
  Gain(A1) = I(p, n) - E(A1) = 0.3237

Attribute A2 = "Memory":
  S1 = "<=4": p1 = 3, n1 = 4, I(p1, n1) = 0.9852
  S2 = ">4":  p2 = 6, n2 = 2, I(p2, n2) = 0.8113
  E(A2) = 7/15 * I(p1, n1) + 8/15 * I(p2, n2) = 0.8925
  Gain(A2) = I(p, n) - E(A2) = 0.0785

Attribute A3 = "Price":
  S1 = "<=150": p1 = 3, n1 = 4, I(p1, n1) = 0.9852
  S2 = ">150":  p2 = 6, n2 = 2
  I(p2, n2) = 0.8113
  E(A3) = 7/15 * I(p1, n1) + 8/15 * I(p2, n2) = 0.8925
  Gain(A3) = I(p, n) - E(A3) = 0.0785

Since A1 = "Battery Life" has the highest information gain, it is selected as the first test attribute. For the value "high" all samples belong to the same class ("yes"). We get:

Battery Life
  high   -> Yes
  medium -> ?
  low    -> ?

For the value "medium": I(p, n) = I(2, 3) = 0.9710

Attribute A2 = "Memory":
  S1 = "<=4": p1 = 1, n1 = 1, I(p1, n1) = 1
  S2 = ">4":  p2 = 1, n2 = 2, I(p2, n2) = 0.9183
  E(A2) = 2/5 * I(p1, n1) + 3/5 * I(p2, n2) = 0.9510
  Gain(A2) = I(p, n) - E(A2) = 0.0200

Attribute A3 = "Price":
  S1 = "<=150": p1 = 0, n1 = 3, I(p1, n1) = 0
  S2 = ">150":  p2 = 2, n2 = 0, I(p2, n2) = 0
  E(A3) = 3/5 * I(p1, n1) + 2/5 * I(p2, n2) = 0
  Gain(A3) = I(p, n) - E(A3) = 0.9710

Gain(A3) > Gain(A2), i.e. A3 = "Price" is selected as the test attribute for the edge "medium". For "<=150" all samples belong to the same class "no", and for ">150" all samples belong to the same class "yes". We get:

Battery Life
  high   -> Yes
  medium -> Price
              <=150 -> No
              >150  -> Yes
  low    -> ?

For the value "low": I(p, n) = I(2, 3) = 0.9710

Attribute A2 = "Memory":
  S1 = "<=4": p1 = 0, n1 = 3, I(p1, n1) = 0
  S2 = ">4":  p2 = 2, n2 = 0, I(p2, n2) = 0
  E(A2) = 3/5 * I(p1, n1) + 2/5 * I(p2, n2) = 0
  Gain(A2) = I(p, n) - E(A2) = 0.9710

Attribute A3 = "Price":
  S1 = "<=150": p1 = 1, n1 = 1, I(p1, n1) = 1
  S2 = ">150":  p2 = 1, n2 = 2, I(p2, n2) = 0.9183
  E(A3) = 2/5 * I(p1, n1) + 3/5 * I(p2, n2) = 0.9510
  Gain(A3) = I(p, n) - E(A3) = 0.0200

Gain(A2) > Gain(A3), i.e. A2 = "Memory" is selected as the test attribute for the edge "low". For ">4" all samples belong to the same class "yes", and for "<=4" all samples belong to the same class "no". We get the final decision tree:

Battery Life
  high   -> Yes
  medium -> Price
              <=150 -> No
              >150  -> Yes
  low    -> Memory
              >4  -> Yes
              <=4 -> No
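As a check on the hand computation for Exercise 1, here is a minimal Python sketch of the basic (ID3-style) induction algorithm. The function names (entropy, info_gain, build_tree) and the dictionary representation of rows and tree nodes are illustrative choices, not part of the tutorial.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # I(p, n): expected information needed to classify a sample.
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    # Gain(A) = I(p, n) - E(A), where E(A) is the weighted entropy
    # remaining after partitioning the rows on attribute A.
    total = len(rows)
    e_a = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        e_a += len(subset) / total * entropy(subset)
    return entropy([r[target] for r in rows]) - e_a

def build_tree(rows, attrs, target):
    # Basic algorithm: stop when the partition is pure,
    # otherwise split on the attribute with the highest gain.
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]          # leaf: all samples in one class
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}

# Exercise 1 training data (15 samples), as in the table above.
COLS = ("Memory", "Battery Life", "Price", "Satisfaction")
data = [dict(zip(COLS, row)) for row in [
    ("<=4", "High", "<=150", "Yes"), (">4", "High", ">150", "Yes"),
    (">4", "High", "<=150", "Yes"), ("<=4", "High", ">150", "Yes"),
    (">4", "High", ">150", "Yes"), (">4", "Low", ">150", "Yes"),
    ("<=4", "Low", ">150", "No"), ("<=4", "Low", ">150", "No"),
    (">4", "Low", "<=150", "Yes"), ("<=4", "Low", "<=150", "No"),
    ("<=4", "Medium", "<=150", "No"), (">4", "Medium", "<=150", "No"),
    ("<=4", "Medium", ">150", "Yes"), (">4", "Medium", ">150", "Yes"),
    (">4", "Medium", "<=150", "No"),
]]

for a in ("Battery Life", "Memory", "Price"):
    print(f"Gain({a}) = {info_gain(data, a, 'Satisfaction'):.4f}")
# Battery Life has the largest gain (0.3237 vs 0.0785), so it becomes the root.

tree = build_tree(data, ["Battery Life", "Memory", "Price"], "Satisfaction")
print(tree)
```

Running this reproduces the gains derived above (0.3237, 0.0785, 0.0785) and the same final tree: Battery Life at the root, Price under "Medium", and Memory under "Low".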
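The same helpers can be used to check the first split of Exercise 2 (the full tree is left to the reader). With this data the root-level gains come out to roughly Gain(age) ≈ 0.246, Gain(income) ≈ 0.029, Gain(student) ≈ 0.151, and Gain(Leasing_rating) ≈ 0.048, so "age" would be the first test attribute. Again, the function and variable names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # I(p, n): expected information needed to classify a sample.
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    # Gain(A) = I(p, n) - E(A): entropy reduction from splitting on attr.
    total = len(rows)
    e_a = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        e_a += len(subset) / total * entropy(subset)
    return entropy([r[target] for r in rows]) - e_a

# Exercise 2 training data (14 samples), as in the table above.
COLS = ("age", "income", "student", "Leasing_rating", "Buys_computer")
data = [dict(zip(COLS, row)) for row in [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31…40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31…40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31…40", "Medium", "No", "Excellent", "Yes"),
    ("31…40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]]

gains = {a: info_gain(data, a, "Buys_computer")
         for a in ("age", "income", "student", "Leasing_rating")}
for a, g in gains.items():
    print(f"Gain({a}) = {g:.3f}")
print("root attribute:", max(gains, key=gains.get))
```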