Worksheet 2
Machine Learning

Please attempt some of the exercises below. To aid your learning you need plenty of examples too!

Exercise 1
Read and try to master Chapter 3 of [Mitchell].

Exercise 2
Using the decision tree learning algorithm, but choosing the decision variables at random instead of by disorder, construct a decision tree that correctly classifies the data below.

Day     Outlook    Temperature  Humidity  Wind    Play Tennis
Day1    Sunny      Hot          High      Weak    No
Day2    Sunny      Hot          High      Strong  No
Day3    Overcast   Hot          High      Weak    Yes
Day4    Rain       Mild         High      Weak    Yes
Day5    Rain       Cool         Normal    Weak    Yes
Day6    Rain       Cool         Normal    Strong  No
Day7    Overcast   Cool         Normal    Strong  Yes
Day8    Sunny      Mild         High      Weak    No
Day9    Sunny      Cool         Normal    Weak    Yes
Day10   Rain       Mild         Normal    Weak    Yes
Day11   Sunny      Mild         Normal    Strong  Yes
Day12   Overcast   Mild         High      Strong  Yes
Day13   Overcast   Hot          Normal    Weak    Yes
Day14   Rain       Mild         High      Strong  No

Exercise 3
For the sunbathers example given in the lectures, calculate the Disorder function for all the possible attributes at the root node (i.e. height, weight, lotion).

Exercise 4
For the sunbathers example given in the lectures, calculate the Disorder function associated with the possible branches of the decision tree once the root node (hair colour) has been chosen.

Exercise 5
Using the decision tree learning algorithm, calculate the decision tree for the following data set.

Name    Hair    Height   Weight   Lotion  Result
Sarah   Blonde  Average  Light    No      Sunburned
Dana    Blonde  Tall     Average  Yes     None
Alex    Brown   Short    Average  Yes     None
Annie   Blonde  Short    Average  No      Sunburned
Julie   Blonde  Average  Light    No      None
Pete    Brown   Tall     Heavy    No      None
John    Brown   Average  Heavy    No      None
Ruth    Blonde  Average  Light    No      None

Exercise 6
Calculate the disorder corresponding to the "attribute" person_id for the data set below. How does it compare with the disorder for hair colour?

Person ID  Hair    Height   Weight   Lotion  Result
1          Blonde  Average  Light    No      Sunburned
2          Blonde  Tall     Average  Yes     None
3          Brown   Short    Average  Yes     None
4          Blonde  Short    Average  No      Sunburned
5          Red     Average  Heavy    No      Sunburned
6          Brown   Tall     Heavy    No      None
7          Brown   Average  Heavy    No      None
8          Blonde  Short    Light    Yes     None

Exercise 7 (Partial exam question, May 2001)
(a) Decision Tree Learning uses a particular method for choosing the decision variables that are used at each decision node. Explain in words how one decision variable is chosen over another. [5%]
(b) The medical data below has been collected about eight patients. Decision Tree Learning is going to be used to try to learn a hypothesis that will predict which patients have the disease diabetes. Apply the algorithm to this data to learn such a hypothesis. [15%]

Patient  Pulse Rate  Family History  Blood Pressure  Obesity in childhood  Diabetes
1        Normal      High            High            Yes                   Yes
2        High        None            High            Yes                   No
3        Normal      None            High            Yes                   No
4        Low         None            Norm            No                    No
5        Normal      Some            Low             Yes                   Yes
6        High        Some            Norm            No                    No
7        Low         Some            Norm            Yes                   Yes
8        Low         Some            Low             No                    No
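
For checking hand calculations in the disorder exercises, the following is a minimal Python sketch. It assumes that the Disorder function from the lectures is the usual average disorder: for each branch of an attribute, the entropy (base 2) of the results in that branch, weighted by the fraction of examples that fall into the branch. The names COLUMNS, ROWS, entropy and disorder are illustrative choices made here, not taken from the lectures; the rows are the Exercise 6 data set.

```python
from collections import Counter, defaultdict
from math import log2

# Exercise 6 data set; column names are illustrative.
COLUMNS = ("person_id", "hair", "height", "weight", "lotion", "result")
ROWS = [
    (1, "Blonde", "Average", "Light", "No", "Sunburned"),
    (2, "Blonde", "Tall", "Average", "Yes", "None"),
    (3, "Brown", "Short", "Average", "Yes", "None"),
    (4, "Blonde", "Short", "Average", "No", "Sunburned"),
    (5, "Red", "Average", "Heavy", "No", "Sunburned"),
    (6, "Brown", "Tall", "Heavy", "No", "None"),
    (7, "Brown", "Average", "Heavy", "No", "None"),
    (8, "Blonde", "Short", "Light", "Yes", "None"),
]


def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def disorder(rows, attribute, target="result"):
    """Average disorder of `target` after splitting `rows` on `attribute`.

    Assumed definition: sum over branches of
    (branch size / total size) * entropy of the branch.
    """
    a = COLUMNS.index(attribute)
    t = COLUMNS.index(target)
    branches = defaultdict(list)
    for row in rows:
        branches[row[a]].append(row[t])
    return sum(len(labels) / len(rows) * entropy(labels)
               for labels in branches.values())


# Print the disorder of every candidate attribute at the root node.
for name in COLUMNS[:-1]:
    print(f"{name:10s} {disorder(ROWS, name):.3f}")
```

To examine a branch rather than the root, as in Exercise 4, pass only the rows that belong to that branch, for example disorder([r for r in ROWS if r[1] == "Blonde"], "lotion").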