CS 416 – Machine Learning
LECTURE 7
DECISION TREE – CLASSIFICATION
Introduction:
A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
Why decision tree?
• Decision trees are powerful and popular tools for classification and prediction.
• Decision trees represent rules, which can be understood by humans and used in knowledge systems such as databases.
Key Requirements
• Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target function has discrete output values (boolean or multiclass).
• Sufficient data: enough training cases should be provided to learn the model.
An Example Data Set and Decision Tree
[Figure: an example training data set and the decision tree induced from it for classification]
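The original figure is not reproduced here. As a stand-in, the sketch below uses the well-known weather/play data commonly used with ID3 (the attribute names match those mentioned in the introduction) and shows one way a small induced tree could be written down as a nested Python dictionary. The specific rows and the tree shape are illustrative assumptions, not the figure from the notes.

# Illustrative stand-in for the missing figure: a small weather/play data set
# and a decision tree written as a nested dict. Rows and tree are assumptions.
examples = [
    # (Outlook,    Temperature, Humidity, Windy, Play)
    ("Sunny",    "Hot",  "High",   False, "No"),
    ("Sunny",    "Hot",  "High",   True,  "No"),
    ("Overcast", "Hot",  "High",   False, "Yes"),
    ("Rainy",    "Mild", "High",   False, "Yes"),
    ("Rainy",    "Cool", "Normal", False, "Yes"),
    ("Rainy",    "Cool", "Normal", True,  "No"),
    ("Overcast", "Cool", "Normal", True,  "Yes"),
    ("Sunny",    "Mild", "High",   False, "No"),
]

# A decision node maps an attribute to its branches; a leaf is just a class label.
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy":    {"Windy": {True: "No", False: "Yes"}},
    }
}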
Induction of Decision Trees
• Data Set (Learning Set)
o Each example = Attributes + Class
• Induced description = Decision tree
Algorithm
The core algorithm for building decision trees, called ID3, was developed by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree.
Entropy
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.
To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:
a) Entropy using the frequency table of one attribute:
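The formula itself appears as an image in the original handout; written out, the entropy of a sample S over classes c1, ..., cn is E(S) = -sum over i of p(ci) * log2 p(ci). Below is a minimal Python sketch of this calculation (the function name entropy_of_counts and the list-of-counts interface are choices made here for illustration, not part of the notes).

import math

def entropy_of_counts(counts):
    """Entropy E(S) = sum over classes of -p * log2(p), computed from the
    frequency table of a single attribute (e.g., counts of Yes/No for the target)."""
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:                      # 0 * log2(0) is taken as 0
            p = c / total
            entropy -= p * math.log2(p)
    return entropy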
b) Entropy using the frequency table of two attributes:
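Here the entropy of the split is E(T, X) = sum over the values c of attribute X of P(c) * E(Tc): each branch's class entropy is weighted by the fraction of instances falling into that branch. A sketch reusing entropy_of_counts from above (rows is a list of tuples like the examples list shown earlier; attr_index and class_index are column positions, an interface assumed here for illustration):

from collections import Counter, defaultdict

def entropy_two_attributes(rows, attr_index, class_index):
    """E(T, X): partition the rows by the value of attribute X, compute the
    class entropy of each partition, and weight it by the partition's size."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row[class_index])

    total = len(rows)
    weighted = 0.0
    for values in partitions.values():
        class_counts = Counter(values).values()
        weighted += (len(values) / total) * entropy_of_counts(class_counts)
    return weighted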
Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
Step 1: Calculate entropy of the target.
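The worked calculation in the handout is an image. As an illustrative stand-in, assuming the classic 14-row weather data with 9 "Yes" and 5 "No" examples for the target Play: E(Play) = -(9/14) * log2(9/14) - (5/14) * log2(5/14), which is approximately 0.940. Using the sketch defined above:

# Entropy of the target for the assumed 14-day weather data: 9 Yes, 5 No.
print(entropy_of_counts([9, 5]))   # ~0.940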
Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated and added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the Information Gain, or decrease in entropy.
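In symbols, Gain(T, X) = E(T) - E(T, X). A minimal sketch building on the two functions above (again with function and argument names assumed here, not taken from the notes):

from collections import Counter

def information_gain(rows, attr_index, class_index):
    """Gain(T, X) = E(T) - E(T, X): entropy before the split minus the
    proportionally weighted entropy after splitting on attribute X."""
    class_counts = Counter(row[class_index] for row in rows).values()
    return entropy_of_counts(class_counts) - entropy_two_attributes(rows, attr_index, class_index)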
Step 3: Choose the attribute with the largest information gain as the decision node.
Step 4a: A branch with an entropy of 0 is a leaf node.
Step 4b: A branch with an entropy greater than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
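Putting Steps 1-5 together, here is a compact recursive sketch of ID3 using the helper functions defined above. It is a simplified illustration under the row/column interface assumed earlier, not Quinlan's full implementation (no pruning, no handling of numeric attributes or missing values).

from collections import Counter

def id3(rows, attribute_indices, class_index):
    """Recursively build a decision tree as a nested dict (Steps 1-5).
    Leaves are class labels; a decision node maps an attribute index to
    one subtree per attribute value."""
    labels = [row[class_index] for row in rows]

    # Step 4a: a pure (entropy 0) branch becomes a leaf node.
    if len(set(labels)) == 1:
        return labels[0]

    # No attributes left to split on: fall back to the majority class.
    if not attribute_indices:
        return Counter(labels).most_common(1)[0][0]

    # Steps 2-3: pick the attribute with the largest information gain.
    best = max(attribute_indices,
               key=lambda a: information_gain(rows, a, class_index))

    # Steps 4b and 5: split on the best attribute and recurse on each branch.
    node = {best: {}}
    remaining = [a for a in attribute_indices if a != best]
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        node[best][value] = id3(subset, remaining, class_index)
    return node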
Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping each path from the root node to a leaf node, one by one.
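As an illustration of that mapping (using the nested-dict tree format assumed earlier; the particular tree and rules below are examples, not the figure from the notes), each root-to-leaf path becomes one IF ... THEN ... rule:

def tree_to_rules(tree, conditions=()):
    """Walk every root-to-leaf path of a nested-dict tree and yield one
    'IF ... THEN ...' rule per leaf."""
    if not isinstance(tree, dict):                       # leaf: emit a rule
        yield "IF " + " AND ".join(conditions) + f" THEN class = {tree}"
        return
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + (f"{attribute} = {value}",))

# For example, the small Outlook tree sketched earlier yields rules such as:
#   IF Outlook = Overcast THEN class = Yes
#   IF Outlook = Sunny AND Humidity = High THEN class = No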
Evaluation
• Training accuracy
o How many of the training instances can be correctly classified based on the available data?
o It is high when the tree is deep/large, or when there is little conflict among the training instances.
o However, higher training accuracy does not mean good generalization.
• Testing accuracy
o Given a number of new instances, how many of them can we correctly classify?
o Cross-validation (see the sketch below).
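As a concrete illustration of estimating testing accuracy with cross-validation, the snippet below uses scikit-learn, which is not part of these notes and is assumed here only as a convenient tool; any train/test protocol follows the same idea.

# Estimate testing accuracy with 5-fold cross-validation (scikit-learn, for illustration).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
scores = cross_val_score(clf, X, y, cv=5)        # one accuracy score per fold
print(scores.mean())                             # average testing accuracy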
Strengths
• Can generate understandable rules.
• Perform classification without much computation.
• Can handle continuous and categorical variables.
• Provide a clear indication of which fields are most important for prediction or classification.
Weaknesses
• Not suitable for predicting continuous attributes.
• Perform poorly with many classes and small data sets.
• Computationally expensive to train:
o At each node, each candidate splitting field must be sorted before its best split can be found.
o In some algorithms, combinations of fields are used and a search must be made for optimal combining weights.
o Pruning algorithms can also be expensive, since many candidate sub-trees must be formed and compared.
• Do not handle non-rectangular regions well.
Question 1:
Our goal is to predict the sail class using the decision tree shown below. Based on the following table, classify each tuple by labeling its sail class (yes, no).
Question 2:
Convert the following decision tree to decision rules.
Answers