Bhargavi TL, Bhaskar Raj R

What is a Decision Tree?
• A Decision Tree is a tree-like model that resembles an upside-down tree.
• It builds the tree by asking a series of questions of the data until it reaches a decision.
• Hence Decision Trees are said to mimic the human decision process: during tree building, the data is divided into smaller and smaller subsets until a decision is reached.

DECISION TREE TERMINOLOGIES
A few terminologies used in Decision Trees:
Root Node: The topmost node of the tree. All the data is present at the Root Node, and the arrows in the decision tree generally point away from it.
Leaf Node (Terminal Node): A node that cannot be split any further. The decisions or predictions are held by the Leaf Nodes, and the arrows in the decision tree generally point towards them.
Internal Node (Decision Node): The nodes between the root node and the leaf nodes. These nodes can be split further into sub-nodes.

3 ELEMENTS OF A DECISION TREE
• Decisions
• Uncertainties
• Payoffs (Get/Pays)

How to Create a Decision Tree?
A decision tree is created in a simple, top-down manner. The nodes form a rooted, directed tree: the root node has no incoming edges, every other node has exactly one incoming edge, and the nodes that can be split further are the decision nodes. The main goal is to minimize the generalization error by finding the best splits of the data. The steps involved in the tree-building process are as follows (a code sketch of these steps appears at the end of this section):
1. Recursively partition the data into multiple subsets.
2. At each node, identify the variable, and the rule associated with that variable, that give the best split.
3. Apply the split at that node using the chosen variable and rule.
4. Repeat steps 2 and 3 on the sub-nodes.
5. Repeat this process until a stopping condition is reached.
6. Assign the decisions at the leaf nodes: the majority class label present at the node for a classification task, or the average of the target variable values present at the node for a regression task.

CREATING A DECISION TREE
• Consider a scenario where a group of astronomers discovers a new planet. The question is: could it be ‘the next Earth’?
• Let us create a decision tree to find out whether we have discovered a new habitat, asking questions such as:
• Does the temperature fall into the habitable range of 0 to 100 degrees Celsius?
• Is water present?
• Do flora and fauna flourish?
• Does the planet have a stormy surface?

Advantages of Decision Trees:
• Versatile
• Fast
• Minimal data preprocessing
• Easy to interpret
• Able to handle non-linear relationships
• Handles multicollinearity

Disadvantages of Decision Trees:
• Loss of inference
• Loss of the numerical nature of a variable (once it is split into ranges)
• Unstable
• Biased response
• Overfitting
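
A minimal sketch of the recursive partitioning steps above, assuming binary yes/no features and Gini impurity as the split criterion; the function names (gini, best_split, build_tree) and the stopping conditions (pure node or no features left) are illustrative choices, not the only way to build a tree.

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, features):
    # Step 2: pick the yes/no feature whose split gives the largest drop in impurity.
    parent, best = gini(labels), None
    for f in features:
        left  = [y for x, y in zip(rows, labels) if x[f]]
        right = [y for x, y in zip(rows, labels) if not x[f]]
        if not left or not right:
            continue  # this feature does not actually split the node
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = parent - child
        if best is None or gain > best[1]:
            best = (f, gain)
    return best[0] if best else None

def majority(labels):
    # Step 6 (classification): the decision at a leaf is the majority class label.
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    # Step 5: stop when the node is pure or no useful split remains.
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    f = best_split(rows, labels, features)
    if f is None:
        return majority(labels)
    # Steps 1, 3, 4: apply the split, then recurse on each sub-node.
    remaining = [g for g in features if g != f]
    yes = [(x, y) for x, y in zip(rows, labels) if x[f]]
    no  = [(x, y) for x, y in zip(rows, labels) if not x[f]]
    return {f: {
        True:  build_tree([x for x, _ in yes], [y for _, y in yes], remaining),
        False: build_tree([x for x, _ in no],  [y for _, y in no],  remaining),
    }}
```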
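
As a usage sketch for the habitable-planet scenario, the four questions above can be encoded as boolean features and passed to the build_tree function from the previous sketch; the tiny dataset and its labels are made up purely for illustration.

```python
# Hypothetical data: each row answers the four questions for one observed planet.
planets = [
    {"temp_ok": True,  "water": True,  "flora_fauna": True,  "stormy": False},
    {"temp_ok": True,  "water": True,  "flora_fauna": False, "stormy": True},
    {"temp_ok": False, "water": False, "flora_fauna": False, "stormy": True},
    {"temp_ok": True,  "water": False, "flora_fauna": False, "stormy": False},
]
habitable = ["yes", "yes", "no", "no"]  # made-up labels for illustration only

tree = build_tree(planets, habitable,
                  ["temp_ok", "water", "flora_fauna", "stormy"])
print(tree)  # nested dict: each level asks one of the four questions
```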
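
In practice a library implementation is usually preferred over a hand-rolled one. A minimal scikit-learn example (assuming scikit-learn is installed, and using its bundled iris dataset) illustrates the fast training, minimal preprocessing, and easy interpretability listed among the advantages.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small bundled dataset; no scaling or encoding is needed for a tree.
data = load_iris()

# max_depth acts as the stopping condition from step 5 above.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(data.data, data.target)

# Print the learned rules: each line is one question the tree asks of the data.
print(export_text(clf, feature_names=list(data.feature_names)))
```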