WorkSheet 2

Machine Learning
Please attempt some of the exercises below. To aid your learning you need plenty of
examples too!
Exercise 1
Read and try to master Chapter 3 of [Mitchell].
Exercise 2
Using the decision tree learning algorithm, but choosing decision variables at random
instead of by disorder, construct a decision tree that correctly classifies the data below.
Day     Outlook    Temperature  Humidity  Wind    Play Tennis
Day1    Sunny      Hot          High      Weak    No
Day2    Sunny      Hot          High      Strong  No
Day3    Overcast   Hot          High      Weak    Yes
Day4    Rain       Mild         High      Weak    Yes
Day5    Rain       Cool         Normal    Weak    Yes
Day6    Rain       Cool         Normal    Strong  No
Day7    Overcast   Cool         Normal    Strong  Yes
Day8    Sunny      Mild         High      Weak    No
Day9    Sunny      Cool         Normal    Weak    Yes
Day10   Rain       Mild         Normal    Weak    Yes
Day11   Sunny      Mild         Normal    Strong  Yes
Day12   Overcast   Mild         High      Strong  Yes
Day13   Overcast   Hot          Normal    Weak    Yes
Day14   Rain       Mild         High      Strong  No
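To check a hand-built answer for Exercise 2, a minimal sketch of the random-attribute variant might look like the following. It is one possible reading of the exercise, not the lecture's own code: the names `build_random_tree`, `classify` and `count_leaves` are hypothetical, the tree is stored as nested `(attribute, branches)` tuples, and the majority-label fallback is an assumption that this particular data set never triggers because no two days share all four attribute values.

```python
import random
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
# Rows copied from the Exercise 2 table; the last field is Play Tennis.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def build_random_tree(rows, attrs, rng):
    """Grow a tree, splitting each impure node on a randomly chosen unused attribute."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # pure node: return the class label
    if not attrs:
        # Assumed fallback (never reached for this data): majority label.
        return Counter(labels).most_common(1)[0][0]
    attr = rng.choice(attrs)                  # random choice replaces the disorder test
    col = ATTRS.index(attr)
    remaining = [a for a in attrs if a != attr]
    branches = {}
    for value in sorted({r[col] for r in rows}):
        subset = [r for r in rows if r[col] == value]
        branches[value] = build_random_tree(subset, remaining, rng)
    return (attr, branches)


def classify(tree, row):
    """Follow the branches matching `row` until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[ATTRS.index(attr)]]
    return tree


def count_leaves(tree):
    """Number of leaves, a rough measure of tree size."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_leaves(sub) for sub in tree[1].values())


if __name__ == "__main__":
    for seed in range(3):
        tree = build_random_tree(ROWS, ATTRS, random.Random(seed))
        ok = all(classify(tree, r) == r[-1] for r in ROWS)
        print("seed", seed, "leaves:", count_leaves(tree), "all correct:", ok)
```

Running it with several seeds and comparing the leaf counts gives a feel for how much larger a randomly grown tree can be than one grown with a disorder-based choice.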
Exercise 3
For the sunbathers example given in the lectures, calculate the Disorder function for
all the possible attributes at the root node (i.e. height, weight, lotion).
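The worksheet does not restate the Disorder function, so the form below is an assumption: it is the average-disorder measure commonly used in decision tree learning (a weighted average of the entropy in each branch). The symbols are introduced here for illustration: $n_t$ is the total number of samples at the node, $n_b$ the number of samples sent down branch $b$ of attribute $A$, and $n_{bc}$ the number of samples in branch $b$ that belong to class $c$. Terms with $n_{bc} = 0$ are taken to contribute zero.

```latex
\mathrm{Disorder}(A)
  = \sum_{b \,\in\, \mathrm{branches}(A)} \frac{n_b}{n_t}
    \left( \sum_{c \,\in\, \mathrm{classes}} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)
```

If the lecture notes define Disorder differently, use that definition instead.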
Exercise 4
For the sunbathers example given in the lectures, calculate the Disorder function
associated with the possible branches of the decision tree once the root node (hair
colour) has been chosen.
Exercise 5
Using the decision tree learning algorithm, calculate the decision tree for the
following data set
Name    Hair     Height    Weight    Lotion  Result
Sarah   Blonde   Average   Light     No      Sunburned
Dana    Blonde   Tall      Average   Yes     None
Alex    Brown    Short     Average   Yes     None
Annie   Blonde   Short     Average   No      Sunburned
Julie   Blonde   Average   Light     No      None
Pete    Brown    Tall      Heavy     No      None
John    Brown    Average   Heavy     No      None
Ruth    Blonde   Average   Light     No      None
Exercise 6
Calculate the disorder corresponding to the "attribute" person_id for the data set
below. How does it compare with hair colour?
Person ID  Hair     Height    Weight    Lotion  Result
1          Blonde   Average   Light     No      Sunburned
2          Blonde   Tall      Average   Yes     None
3          Brown    Short     Average   Yes     None
4          Blonde   Short     Average   No      Sunburned
5          Red      Average   Heavy     No      Sunburned
6          Brown    Tall      Heavy     No      None
7          Brown    Average   Heavy     No      None
8          Blonde   Short     Light     Yes     None
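A hand calculation for Exercise 6 can be cross-checked with a short script such as the sketch below. It assumes Disorder means the branch-weighted average entropy (check this against the lecture definition); `average_disorder`, `disorder_by_attribute` and the lower-case column names are hypothetical names chosen for this example.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ["person_id", "hair", "height", "weight", "lotion"]
# Rows copied from the Exercise 6 table; the last field is Result.
DATA = [
    (1, "Blonde", "Average", "Light", "No", "Sunburned"),
    (2, "Blonde", "Tall", "Average", "Yes", "None"),
    (3, "Brown", "Short", "Average", "Yes", "None"),
    (4, "Blonde", "Short", "Average", "No", "Sunburned"),
    (5, "Red", "Average", "Heavy", "No", "Sunburned"),
    (6, "Brown", "Tall", "Heavy", "No", "None"),
    (7, "Brown", "Average", "Heavy", "No", "None"),
    (8, "Blonde", "Short", "Light", "Yes", "None"),
]


def average_disorder(rows, col):
    """Branch-weighted average entropy of the Result after splitting on column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])      # collect results per attribute value
    total = len(rows)
    disorder = 0.0
    for results in groups.values():
        n_b = len(results)
        entropy = -sum((c / n_b) * log2(c / n_b) for c in Counter(results).values())
        disorder += (n_b / total) * entropy
    return disorder


def disorder_by_attribute(rows):
    """Map each column name to its average disorder at the root node."""
    return {name: average_disorder(rows, i) for i, name in enumerate(COLUMNS)}


if __name__ == "__main__":
    for name, value in disorder_by_attribute(DATA).items():
        print(f"{name:10s} {value:.4f}")
```

Comparing the printed values for `person_id` and `hair` is only the starting point; the exercise still asks what that comparison says about using an identifier as a decision variable.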
Exercise 7 (Partial Exam question, May 2001)
(a)
Decision Tree Learning uses a particular method for choosing the decision
variables that are used at each decision node. Explain in words how one
decision variable is chosen over another.
[5%]
(b)
The medical data below has been collected about eight patients. Decision Tree
Learning is going to be used to try to learn a hypothesis that will predict the
patients with the disease diabetes. Apply the algorithm to this data to learn
such a hypothesis.
[15%]
Patient  Pulse Rate  Family History  Blood Pressure  Obesity in childhood  Diabetes
1        Normal      High            High            Yes                   Yes
2        High        None            High            Yes                   No
3        Normal      None            High            Yes                   No
4        Low         None            Norm            No                    No
5        Normal      Some            Low             Yes                   Yes
6        High        Some            Norm            No                    No
7        Low         Some            Norm            Yes                   Yes
8        Low         Some            Low             No                    No
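For checking a hand-worked answer to part (b), the sketch below grows a tree by picking, at each node, the attribute with the lowest average disorder. Several things here are assumptions rather than anything stated in the worksheet: Disorder is taken to be the branch-weighted average entropy, ties are broken by column order, a majority-label fallback is used if attributes run out, and the names `entropy`, `average_disorder`, `build_tree` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Pulse Rate", "Family History", "Blood Pressure", "Obesity in childhood"]
# Rows copied from the Exercise 7 table; the last field is Diabetes.
PATIENTS = [
    ("Normal", "High", "High", "Yes", "Yes"),
    ("High", "None", "High", "Yes", "No"),
    ("Normal", "None", "High", "Yes", "No"),
    ("Low", "None", "Norm", "No", "No"),
    ("Normal", "Some", "Low", "Yes", "Yes"),
    ("High", "Some", "Norm", "No", "No"),
    ("Low", "Some", "Norm", "Yes", "Yes"),
    ("Low", "Some", "Low", "No", "No"),
]


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def average_disorder(rows, col):
    """Branch-weighted average entropy after splitting `rows` on column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def build_tree(rows, attrs):
    """Recursively split on the attribute with the lowest average disorder."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # pure node becomes a leaf
    if not attrs:
        # Assumed fallback: majority label when no attributes remain.
        return Counter(labels).most_common(1)[0][0]
    # min() keeps the first attribute on ties (assumed tie-break rule).
    best = min(attrs, key=lambda a: average_disorder(rows, ATTRS.index(a)))
    col = ATTRS.index(best)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[col] for r in rows}):
        subset = [r for r in rows if r[col] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches)


def classify(tree, row):
    """Follow the branches matching `row` until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[ATTRS.index(attr)]]
    return tree


if __name__ == "__main__":
    print(build_tree(PATIENTS, ATTRS))
```

The printed nested tuple can be compared with a tree drawn by hand; the per-attribute disorder values from `average_disorder` are the numbers an exam answer would be expected to show.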