Programming Assignment 4:
Decision Trees
Due date: Friday, April 21, 2006 – beginning of class
Description:
For this assignment, you will implement a simple version of the ID3 decision tree
algorithm. This is the basic algorithm discussed in your book, using information gain as
the splitting criterion. You may write this program in either Lisp or Prolog: your choice.
Your code will need to do several things:
1) Produce a decision tree from a list of examples
2) Given a decision tree and an instance (an unclassified example), classify the
instance
3) Print out a decision tree
What to do when:
Step 1:
Decide on a representation for your tree. Just like we needed to know what a plan
looked like for Program 3, you need to decide what a decision tree looks like.
Step 2:
Write a function or predicate to print out your decision tree.
Step 3:
Write a function or predicate that, given a decision tree and an instance, returns the
classification of the instance. In other words, write the performance element.
Step 4:
Once all the rest of that works, actually write the decision tree learner, which will
build a tree from a list of examples. Note that this is the bulk of your program.
Completing steps 1-3 does not mean that you are even halfway through the job.
However, completing steps 1-3 first and in order will help significantly as you try to
complete and test your decision tree learner.
Format:
To handle a set of examples, you’ll need two pieces of information: an idea of what the
examples look like and the examples themselves. To provide the first piece of
information, you'll use two lists: a list of the attributes and a list of the domains of each
attribute. The examples themselves will be given as a list. Each example will be a two-item list:
the first item will be either + or - and the second a list of the values for each attribute, in
the same order as the attribute list. In
Lisp, you’ll use variables to hold this information. In Prolog, you can use a predicate to
define the attribute and domain lists (similar to the way action, initial, and goal were used
in Program 3).
Lisp:
In your program file, declare your domain and attribute variables:
(defvar *attributes*)
; A list defining the features of the instances
(defvar *domains*)
; A list defining the domain of each feature
To use the program you’ll do something like the following:
(setf *attributes* '(outlook temperature humidity windy))
(setf *domains* '((sunny overcast rain)
                  (cool mild hot)
                  (high normal)
                  (true false)))
(setf weather-examples
      '((- (sunny hot high false))      (- (sunny hot high true))
        (+ (overcast hot high false))   (+ (rain mild high false))
        (+ (rain cool normal false))    (- (rain cool normal true))
        (+ (overcast cool normal true)) (- (sunny mild high false))
        (+ (sunny cool normal false))   (+ (rain mild normal false))
        (+ (sunny mild normal true))    (+ (overcast mild high true))
        (+ (overcast hot normal false)) (- (rain mild high true))))
Prolog:
To use the program you’ll do something like the following:
attributes([outlook, temperature, humidity, windy]).
domains([[sunny, overcast, rain], [cool, mild, hot], [high, normal], [true, false]]).
A list of examples would look like:
[['-', [sunny, hot, high, false]], ['-', [sunny, hot, high, true]],
['+', [overcast, hot, high, false]], ['+', [rain, mild, high, false]],
['+', [rain, cool, normal, false]], ['-', [rain, cool, normal, true]],
['+', [overcast, cool, normal, true]], ['-', [sunny, mild, high, false]],
['+', [sunny, cool, normal, false]], ['+', [rain, mild, normal, false]],
['+', [sunny, mild, normal, true]], ['+', [overcast, mild, high, true]],
['+', [overcast, hot, normal, false]], ['-', [rain, mild, high, true]]]
What the tree should look like printed out:
If you really want to, you can print the tree in a standard tree format, but a simpler
suggestion would be something like the following:
OUTLOOK
 = SUNNY
   HUMIDITY
    = NORMAL => YES
    = HIGH => NO
 = OVERCAST => YES
 = RAIN
   WIND
    = STRONG => NO
    = WEAK => YES
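As an illustration of one way to produce output in this style, here is a minimal Lisp sketch. It assumes a tree representation that the assignment does not prescribe: a leaf is the symbol + or -, and an internal node is a list of the form (attribute (value subtree) ...). The names print-tree and *sample-tree* are hypothetical, and the sample tree is written by hand using the attribute names from the weather data rather than produced by a learner.

```lisp
;; Assumed (hypothetical) representation, not prescribed by the assignment:
;;   leaf          -> the symbol + or -
;;   internal node -> (attribute (value subtree) (value subtree) ...)
(defun print-tree (tree &optional (depth 0) (stream t))
  "Print TREE to STREAM, indenting two spaces per nesting level."
  (let ((indent (make-string (* 2 depth) :initial-element #\Space)))
    (if (atom tree)
        ;; A tree that consists of a single leaf.
        (format stream "~a=> ~a~%" indent tree)
        (progn
          ;; First the attribute tested at this node ...
          (format stream "~a~a~%" indent (first tree))
          ;; ... then one line (or one subtree) per attribute value.
          (dolist (branch (rest tree))
            (destructuring-bind (value subtree) branch
              (if (atom subtree)
                  (format stream "~a = ~a => ~a~%" indent value subtree)
                  (progn
                    (format stream "~a = ~a~%" indent value)
                    (print-tree subtree (+ depth 1) stream)))))))))

;; Hand-written sample tree for the weather data.
(defvar *sample-tree*
  '(outlook (sunny (humidity (normal +) (high -)))
            (overcast +)
            (rain (windy (true -) (false +)))))

(print-tree *sample-tree*)
```

With the default printer settings, format prints symbols in upper case, which is why the suggested output above is capitalized.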
Notes:
The basic function or predicate to construct a decision tree is recursive. The base case
will produce a leaf node.
You should write a function or predicate to compute the entropy of a set of examples given the
number of positive examples and number of negative examples. Both SICStus and CLISP
provide a logarithm function.
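The entropy computation might look like the following Lisp sketch. The function name entropy and its two-argument interface are assumptions made for illustration. It uses the standard two-class formula, -p log2(p) - n log2(n), where p and n are the fractions of positive and negative examples, and treats an empty class as contributing zero.

```lisp
;; Sketch only: ENTROPY is a hypothetical name, not one the assignment requires.
(defun entropy (pos neg)
  "Entropy in bits of a set with POS positive and NEG negative examples."
  (let ((total (+ pos neg)))
    (flet ((term (n)
             ;; A class with no members contributes 0 (0 * log 0 is taken as 0).
             (if (zerop n)
                 0.0
                 (let ((p (/ n total)))
                   (- (* p (log p 2)))))))
      ;; An empty set is given entropy 0 to avoid dividing by zero.
      (if (zerop total)
          0.0
          (+ (term pos) (term neg))))))

;; The weather examples contain 9 positive and 5 negative instances.
(format t "~,3f~%" (entropy 9 5))   ; roughly 0.940
```

Because the result is a floating-point number, compare it against expected values with a small tolerance rather than with =.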
In both languages, you probably want to use format to print.
Both languages provide ways to get the length of a list and to get a particular member of
a list. In SICStus, that will require the lists library (which was used in Program 3).
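In Lisp, for example, length, nth, position, and count are all standard functions. The sketch below shows how they could be combined to look up an attribute's value in an example and to count examples by label. The helper names attribute-value and count-label are hypothetical.

```lisp
(defvar *attributes*)
(setf *attributes* '(outlook temperature humidity windy))

;; ATTRIBUTE-VALUE and COUNT-LABEL are hypothetical helper names.
(defun attribute-value (attribute example)
  "Value of ATTRIBUTE in EXAMPLE, a two-item list of the form (label values)."
  ;; POSITION finds the attribute's index; NTH fetches the matching value.
  (nth (position attribute *attributes*) (second example)))

(defun count-label (label examples)
  "Number of EXAMPLES whose first item is LABEL."
  (count label examples :key #'first))

;; A small slice of the weather data, for illustration only.
(defvar *few-examples*
  '((- (sunny hot high false))
    (+ (overcast hot high false))
    (+ (rain mild high false))))

(format t "~a ~a ~a~%"
        (length *few-examples*)                        ; 3
        (count-label '+ *few-examples*)                ; 2
        (attribute-value 'outlook (first *few-examples*)))  ; SUNNY
```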
You need to start by thinking about what a tree is going to look like. In Lisp, you would
probably use a list-based representation. In Prolog, you could use lists or structures and
lists.
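As one possible list-based representation in Lisp (an assumption for illustration, not a required design), a leaf can be the symbol + or - and an internal node a list of the form (attribute (value subtree) ...). Under that assumption, classifying an instance is a short recursive walk. The names classify and *sample-tree* are hypothetical, and the tree is written by hand rather than learned.

```lisp
(defvar *attributes*)
(setf *attributes* '(outlook temperature humidity windy))

;; Hypothetical representation (one of many possible choices):
;;   leaf          -> the symbol + or -
;;   internal node -> (attribute (value subtree) (value subtree) ...)
(defvar *sample-tree*
  '(outlook (sunny (humidity (high -) (normal +)))
            (overcast +)
            (rain (windy (true -) (false +)))))

(defun classify (tree instance)
  "Return + or - for INSTANCE, a list of values ordered as in *ATTRIBUTES*.
Returns NIL if the tree has no branch for one of the instance's values."
  (if (atom tree)
      tree                              ; leaf: the classification itself
      (let* ((attribute (first tree))
             (value (nth (position attribute *attributes*) instance))
             ;; Each branch is (value subtree), so ASSOC finds the right one.
             (branch (assoc value (rest tree))))
        (classify (second branch) instance))))

(print (classify *sample-tree* '(sunny hot high false)))   ; prints -
```

Storing each node's branches as an association list keeps lookup simple, at the cost of a linear search per node, which is negligible for domains this small.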
Start work on this problem promptly so that you can ask questions about language
features as well as the program.
Extra credit options:
• Change your tree and algorithm to handle multi-class problems.
• Add pruning to your tree.
• Make your tree handle instances with continuous attributes.
Submit:
On paper:
1) Printouts of your Lisp or Prolog code
2) Detailed instructions on how to execute your code.
Using submit: A tar file containing:
1) Your source code
2) A README file with the instructions for executing your code