Lecture 06: Machine Learning Models
MSCS: Machine Learning
Department of Computer Science & Information Technology, The University of Lahore

Decision Tree

Classification by Decision Tree Induction
• Decision tree
  – A flow-chart-like tree structure
  – Internal nodes denote a test on an attribute
  – Branches represent the outcomes of the test
  – Leaf nodes represent class labels or class distributions
• Decision tree generation consists of two phases
  – Tree construction
    • At the start, all the training examples are at the root
    • Partition the examples recursively based on selected attributes
  – Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of a decision tree: classifying an unknown sample
  – Test the attribute values of the sample against the decision tree

Training Dataset

  age    income  student  credit_rating  buys_computer
  <=30   high    no       fair           no
  <=30   high    no       excellent      no
  31…40  high    no       fair           yes
  >40    medium  no       fair           yes
  >40    low     yes      fair           yes
  >40    low     yes      excellent      no
  31…40  low     yes      excellent      yes
  <=30   medium  no       fair           no
  <=30   low     yes      fair           yes
  >40    medium  yes      fair           yes
  <=30   medium  yes      excellent      yes
  31…40  medium  no       excellent      yes
  31…40  high    yes      fair           yes
  >40    medium  no       excellent      no

Output: A Decision Tree for "buys_computer"

  age?
  ├─ <=30 → student?
  │   ├─ no  → no
  │   └─ yes → yes
  ├─ 31…40 → yes
  └─ >40 → credit_rating?
      ├─ excellent → no
      └─ fair      → yes

Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  – The tree is constructed in a top-down, recursive, divide-and-conquer manner
  – At the start, all the training examples are at the root
  – Attributes are categorical (if continuous-valued, they are discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning
  – There are no samples left

Example of a Decision Tree

Training data:

  Tid  Refund  Marital Status  Taxable Income  Cheat
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

Model (decision tree):

  Refund?
  ├─ Yes → NO
  └─ No → MarSt?
      ├─ Single, Divorced → TaxInc?
      │   ├─ < 80K → NO
      │   └─ > 80K → YES
      └─ Married → NO

Apply Model to Test Data
• Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
• Start at the root of the tree: Refund = No, so follow the No branch to MarSt
• MarSt = Married, so we reach the leaf NO
• Assign Cheat = "No"
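The same traversal is easy to express in code. Below is a minimal Python sketch (not from the slides) that hard-codes the cheat-detection tree above as nested conditionals; the record keys refund, marital_status, and taxable_income are names chosen here for illustration.

```python
def classify_cheat(record):
    """Walk the decision tree from the slide: Refund -> MarSt -> TaxInc."""
    if record["refund"] == "Yes":            # left branch: leaf NO
        return "No"
    if record["marital_status"] == "Married":
        return "No"                          # Married branch: leaf NO
    # Single or Divorced: test taxable income against the 80K split
    return "No" if record["taxable_income"] < 80_000 else "Yes"

# The test record from the slide: Refund = No, Married, 80K
print(classify_cheat({"refund": "No",
                      "marital_status": "Married",
                      "taxable_income": 80_000}))  # -> No
```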
Attribute Selection Measure
• Information gain (ID3/C4.5)
  – All attributes are assumed to be categorical
  – Can be modified for continuous-valued attributes

Decision Tree Learning: ID3

Function ID3(Training-set, Attributes)
  – If all elements in Training-set are in the same class, then return a leaf node labeled with that class
  – Else if Attributes is empty, then return a leaf node labeled with the majority class in Training-set
  – Else if Training-set is empty, then return a leaf node labeled with the default majority class
  – Else
    • Select and remove the best attribute A from Attributes
    • Make A the root of the current tree
    • For each value V of A
      – Create a branch of the current tree labeled by V
      – Partition_V ← elements of Training-set with value V for A
      – Attach ID3(Partition_V, Attributes) to branch V

Entropy (1), Entropy (2), Information Gain
[Formula slides; the definitions appear on the slides below.]

Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain
• Assume there are two classes, P and N
  – Let the set of examples S contain p elements of class P and n elements of class N
  – The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

    I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}
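As a sanity check of this definition, here is a tiny Python sketch (not part of the original slides) that evaluates I(p, n); for the buys_computer data with 9 "yes" and 5 "no" examples it reproduces the value 0.940 used on the following slides.

```python
from math import log2

def info(p, n):
    """Two-class information measure I(p, n) from the definition above."""
    total = p + n
    # By convention 0 * log2(0) = 0, so empty classes are skipped.
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

print(f"I(9, 5) = {info(9, 5):.3f}")   # I(9, 5) = 0.940
```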
Information Gain in Decision Tree Induction
• Assume that using attribute A, a set S will be partitioned into sets {S1, S2, …, Sv}
  – If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

    E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

• The encoding information that would be gained by branching on A:

    Gain(A) = I(p, n) - E(A)

Attribute Selection by Information Gain Computation
• Class P: buys_computer = "yes"; Class N: buys_computer = "no"
• I(p, n) = I(9, 5) = 0.940
• Compute the entropy for age:

    age     pi  ni  I(pi, ni)
    <=30    2   3   0.971
    31…40   4   0   0
    >40     3   2   0.971

    E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.694

• Hence Gain(age) = I(p, n) - E(age) = 0.940 - 0.694 = 0.246
• Similarly:
    Gain(income) = 0.029
    Gain(student) = 0.151
    Gain(credit_rating) = 0.048

Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
    IF age = "<=30" AND student = "no" THEN buys_computer = "no"
    IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
    IF age = "31…40" THEN buys_computer = "yes"
    IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
    IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

Choosing Best Attribute?
• Consider 64 examples: 29 positive and 35 negative
• Which split is better?

    [29+, 35-]  A1:  t → [25+, 5-],   f → [4+, 30-]
    [29+, 35-]  A2:  t → [14+, 16-],  f → [15+, 19-]

• And which of these is better?

    [29+, 35-]  A1:  t → [21+, 5-],   f → [8+, 30-]
    [29+, 35-]  A2:  t → [18+, 33-],  f → [11+, 2-]

• The entropy and gain slides that follow work out the answer; a quick computational check appears below
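Before the entropies are worked out on the next slides, the four candidate splits can be scored directly. A short Python sketch (mine, not from the slides) that computes each split's information gain from the class counts, using the I(p, n) measure defined earlier:

```python
from math import log2

def entropy(pos, neg):
    """Two-class entropy I(pos, neg), as defined on the earlier slide."""
    total = pos + neg
    return sum(-c / total * log2(c / total) for c in (pos, neg) if c)

def gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(p + q for p, q in children)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - remainder

parent = (29, 35)                     # 64 examples: 29 positive, 35 negative
splits = {
    "A1, first question":  [(25, 5), (4, 30)],
    "A2, first question":  [(14, 16), (15, 19)],
    "A1, second question": [(21, 5), (8, 30)],
    "A2, second question": [(18, 33), (11, 2)],
}
for name, children in splits.items():
    print(f"Gain({name}) = {gain(parent, children):.3f}")
# Gain(A1, first question)  = 0.411   <- A1 wins the first comparison
# Gain(A2, first question)  = 0.000
# Gain(A1, second question) = 0.266   <- and the second one as well
# Gain(A2, second question) = 0.121
```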
Entropy
• A measure of
  – uncertainty
  – purity
  – information content
• Information theory: an optimal-length code assigns (-log2 p) bits to a message having probability p
• S is a sample of training examples
  – p+ is the proportion of positive examples in S
  – p- is the proportion of negative examples in S
• Entropy of S: the average optimal number of bits to encode the certainty/uncertainty about the class of an example drawn from S:

    Entropy(S) = p_+(-\log_2 p_+) + p_-(-\log_2 p_-) = -p_+\log_2 p_+ - p_-\log_2 p_-

• Can be generalized to more than two values

Entropy (cont.)
• Entropy can also be viewed as measuring
  – purity of S
  – uncertainty in S
  – information in S, …
• E.g., the values of entropy for p+ = 1, p+ = 0, and p+ = 0.5 are 0, 0, and 1, respectively
• Easy generalization to more than binary values:

    Entropy(S) = \sum_{i=1}^{n} -p_i \log_2 p_i

  where i is + or - in the binary case and ranges from 1 to n in the general case

Choosing Best Attribute?
• Consider the same 64 examples [29+, 35-] and compute the entropies; E(S) = 0.993
• First comparison:
    A1:  t → [25+, 5-],  E = 0.650;   f → [4+, 30-],   E = 0.522
    A2:  t → [14+, 16-], E = 0.997;   f → [15+, 19-],  E = 0.989
• Second comparison:
    A1:  t → [21+, 5-],  E = 0.708;   f → [8+, 30-],   E = 0.742
    A2:  t → [18+, 33-], E = 0.937;   f → [11+, 2-],   E = 0.619

Information Gain
• Gain(S, A): the reduction in entropy after choosing attribute A

    Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

• For the splits above (E(S) = 0.993):
    A1 [25+, 5-] / [4+, 30-]:   Gain = 0.411
    A2 [14+, 16-] / [15+, 19-]: Gain = 0.000
    A1 [21+, 5-] / [8+, 30-]:   Gain = 0.265
    A2 [18+, 33-] / [11+, 2-]:  Gain = 0.121
• In both comparisons, A1 is the better attribute

Gain function
• Gain measures how much an attribute can reduce uncertainty
  – Its value lies between 0 and 1
• What is the significance of a gain of 0?
  – Example: a 50/50 split of +/- both before and after discriminating on the attribute's values
• Of a gain of 1?
  – Example: going from "perfect uncertainty" to perfect certainty after splitting on a perfectly predictive attribute
• Splitting on high-gain attributes finds "patterns" in the training examples relating to attribute values and moves toward a locally minimal representation of the training examples

Training Examples

  Day  Outlook   Temp  Humidity  Wind    Tennis?
  D1   Sunny     Hot   High      Weak    No
  D2   Sunny     Hot   High      Strong  No
  D3   Overcast  Hot   High      Weak    Yes
  D4   Rain      Mild  High      Weak    Yes
  D5   Rain      Cool  Normal    Weak    Yes
  D6   Rain      Cool  Normal    Strong  No
  D7   Overcast  Cool  Normal    Strong  Yes
  D8   Sunny     Mild  High      Weak    No
  D9   Sunny     Cool  Normal    Weak    Yes
  D10  Rain      Mild  Normal    Weak    Yes
  D11  Sunny     Mild  Normal    Strong  Yes
  D12  Overcast  Mild  High      Strong  Yes
  D13  Overcast  Hot   Normal    Weak    Yes
  D14  Rain      Mild  High      Strong  No
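With the training examples in hand, the root test can be chosen by computing the gain of each attribute, exactly as the next slide does. A minimal Python sketch (my own, not from the lecture); the printed values differ from the slide's in the third decimal because the slide rounds intermediate entropies.

```python
from math import log2
from collections import Counter

# The 14 PlayTennis examples from the table above:
# (Outlook, Temp, Humidity, Wind, Tennis?)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ("Outlook", "Temp", "Humidity", "Wind")   # column i tests ATTRS[i]

def entropy(rows):
    """Entropy of the class label (last column) over a set of rows."""
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return sum(-c / n * log2(c / n) for c in counts.values())

def gain(rows, i):
    """Information gain of splitting rows on attribute column i."""
    n = len(rows)
    remainder = 0.0
    for value in {row[i] for row in rows}:
        subset = [row for row in rows if row[i] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

for i, name in enumerate(ATTRS):
    print(f"Gain(S, {name}) = {gain(DATA, i):.3f}")
# Gain(S, Outlook)  = 0.247   (slide: 0.246)
# Gain(S, Temp)     = 0.029
# Gain(S, Humidity) = 0.152   (slide: 0.151)
# Gain(S, Wind)     = 0.048   -> Outlook wins and becomes the root
```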
Determine the Root Attribute
• S: [9+, 5-], E = 0.940
• Humidity: High → [3+, 4-], E = 0.985; Normal → [6+, 1-], E = 0.592
• Wind: Weak → [6+, 2-], E = 0.811; Strong → [3+, 3-], E = 1.000
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Outlook) = 0.246
• Gain(S, Temp) = 0.029
• Outlook has the highest gain, so it becomes the root

Sort the Training Examples
• Split S = {D1, …, D14} [9+, 5-] on Outlook:
  – Sunny → {D1, D2, D8, D9, D11} [2+, 3-] → ?
  – Overcast → {D3, D7, D12, D13} [4+, 0-] → Yes
  – Rain → {D4, D5, D6, D10, D14} [3+, 2-] → ?
• For Ssunny = {D1, D2, D8, D9, D11}:
  – Gain(Ssunny, Humidity) = 0.970
  – Gain(Ssunny, Temp) = 0.570
  – Gain(Ssunny, Wind) = 0.019

Final Decision Tree for Example

  Outlook?
  ├─ Sunny → Humidity?
  │   ├─ High   → No
  │   └─ Normal → Yes
  ├─ Overcast → Yes
  └─ Rain → Wind?
      ├─ Strong → No
      └─ Weak   → Yes

Discussion
• Hypothesis space
• Overfitting and underfitting
• Bias/variance

Department of Computer Science & Information Technology, The University of Lahore
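To connect the final tree back to the "Extracting Classification Rules from Trees" slide, here is a short Python sketch (my own, not from the lecture) that represents the tree above as nested dicts and emits one IF-THEN rule per root-to-leaf path:

```python
# The final tree from the slide: attribute -> {value: subtree or leaf label}
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def extract_rules(node, conditions=()):
    """One rule per root-to-leaf path; the tests along a path form a conjunction."""
    if isinstance(node, str):                    # leaf: emit the finished rule
        clauses = " AND ".join(f'{attr} = "{val}"' for attr, val in conditions)
        yield f'IF {clauses} THEN Tennis = "{node}"'
        return
    (attr, branches), = node.items()             # internal node: one attribute test
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attr, value),))

for rule in extract_rules(tree):
    print(rule)
# IF Outlook = "Sunny" AND Humidity = "High" THEN Tennis = "No"
# IF Outlook = "Sunny" AND Humidity = "Normal" THEN Tennis = "Yes"
# IF Outlook = "Overcast" THEN Tennis = "Yes"
# IF Outlook = "Rain" AND Wind = "Strong" THEN Tennis = "No"
# IF Outlook = "Rain" AND Wind = "Weak" THEN Tennis = "Yes"
```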