INTENDED LEARNING OUTCOME
• Understand the concept of a Decision Tree.
• Learn the different terminologies used in Decision Trees.
• Learn the process involved in each step of the Decision Tree algorithm.

What is a Decision Tree?
• A decision tree is a diagram or chart used to determine a course of action or to show a statistical probability.
• Each node represents a feature (attribute).
• Each branch represents a possible decision (rule) or reaction.
• The leaves, the farthest branches on the tree, represent the outcomes or end results.
Image source: https://octaviansima.wordpress.com/2011/03/25/decision-trees-c4-5/

Types of Decision Trees
1. Classification trees (yes/no types): the tree above is an example of a classification tree, where the outcome is a variable like 'fit' or 'unfit'. Here the decision variable is categorical.
2. Regression trees (continuous data types): here the decision or outcome variable is continuous, e.g. a number like 123.

Decision Tree - Sample Problem
An example of a decision tree can be explained using the binary tree above. Say you want to predict whether a person is fit given information such as age, eating habits, and physical activity. The decision nodes are questions like 'What is the age?', 'Does he exercise?', 'Does he eat a lot of pizza?', and the leaves are outcomes such as 'fit' or 'unfit'. This is a binary classification problem (a yes/no type of problem); a small code sketch of this kind of tree follows the next example.
Image source: https://www.xoriant.com/blog/wpcontent/uploads/2017/08/Decision-Trees-modified-1.png

Decision Tree - Sample Problem
Suppose we want to predict whether a customer will pay his renewal premium with an insurance company (yes/no). We know that the income of a customer is a significant variable, but the insurance company does not have income details for all customers. Since income is an important variable, we can build a decision tree to predict customer income based on occupation, product, and various other variables. In this case we are predicting values of a continuous variable.
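The fit/unfit tree described above can be written as a chain of decision nodes and leaves. The sketch below is a minimal Python illustration; the questions, the age threshold, and the answers are hypothetical choices made only to mirror the slide's example, not values taken from a real dataset.

    def classify_fitness(age, eats_a_lot_of_pizza, exercises):
        """Toy decision tree for the 'fit' / 'unfit' example.

        Each 'if' is a decision node (a question about a feature),
        each return value is a leaf (the predicted class).
        The age threshold of 30 is an arbitrary, illustrative choice.
        """
        if age < 30:
            if eats_a_lot_of_pizza:
                return "unfit"
            return "fit"
        else:
            if exercises:
                return "fit"
            return "unfit"

    # Example usage: a 25-year-old who avoids pizza and exercises is classified as fit.
    print(classify_fitness(age=25, eats_a_lot_of_pizza=False, exercises=True))  # fit

Following one root-to-leaf path of if/else questions is exactly how a trained decision tree classifies a new example.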
Important Decision Tree Terminologies
Root Node: represents the entire population or sample; it gets divided further into two or more homogeneous sets.
Splitting: the process of dividing a node into two or more sub-nodes.
Decision Node: when a sub-node splits into further sub-nodes, it is called a decision node.
Leaf / Terminal Node: nodes that do not split are called leaf or terminal nodes.
Pruning: removing sub-nodes of a decision node; it can be seen as the opposite of splitting.
Branch / Sub-Tree: a subsection of the entire tree is called a branch or sub-tree.
Parent and Child Node: a node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children.
Image source: https://miro.medium.com/max/1376/1*bcLAJfWN2GpVQNTVOCrrvw.png

Decision Tree Algorithms
ID3 → (Iterative Dichotomiser 3)
C4.5 → (successor of ID3)
CART → (Classification And Regression Tree)
CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)
MARS → (Multivariate Adaptive Regression Splines)

Decision Tree using the ID3 Algorithm
Now that we know what a decision tree is, let's see how one is constructed. There are many algorithms that build decision trees; one of the best known is the ID3 algorithm.

What is the ID3 Algorithm?
• Stands for Iterative Dichotomiser 3.
• An algorithm developed by J. Ross Quinlan.
• A core algorithm for building decision trees.
• A supervised learning algorithm used for classification problems.
• It follows a greedy approach, at each step selecting the attribute that yields the maximum Information Gain (IG), i.e. the minimum Entropy (H).

What are Entropy and Information Gain?
Entropy, also called Shannon entropy and denoted H(S) for a finite set S, measures the amount of uncertainty or randomness in the data.
Information Gain IG(A) tells us how much the uncertainty in S was reduced after splitting S on attribute A.

Example Data Set
Create a decision tree that predicts whether tennis will be played on a given day.
Attributes: Outlook, Temperature, Humidity, Wind, Play Tennis

STEPS in Creating the Decision Tree
Important concepts:
• Calculate Entropy(S), the entropy of the entire data set: the amount of uncertainty, based on the number of positive and negative examples.
• Calculate the entropy of each attribute value.
• Calculate the average (weighted) information of each attribute.
• Calculate the Information Gain: the difference in entropy before and after splitting the data set on attribute A.

STEP 1: Create a Root Node
How? Choose the attribute that best classifies the training data and use this attribute at the root of the tree.
How do we choose the best attribute? This is where the ID3 algorithm begins.

1.) Compute the entropy of the data set, Entropy(S).
Count the positive and negative examples: P = 9, N = 5, Total = 14.
The complete entropy of the data set is
H(S) = -p(yes) * log2(p(yes)) - p(no) * log2(p(no))
     = -(9/14) * log2(9/14) - (5/14) * log2(5/14)
     = 0.41 + 0.53
     = 0.94

2.) For every attribute/feature, calculate the entropy of each of its values, Entropy(A).
For Outlook the categorical values are sunny, overcast and rain:
H(Outlook = sunny)    = -(2/5) * log2(2/5) - (3/5) * log2(3/5) = 0.971
H(Outlook = rain)     = -(3/5) * log2(3/5) - (2/5) * log2(2/5) = 0.971
H(Outlook = overcast) = -(4/4) * log2(4/4) - 0 = 0
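The per-value entropies are then combined into the average information of the attribute and its information gain. The sketch below redoes the numbers above for the Outlook attribute in Python; the class counts (9 yes / 5 no overall; sunny 2/3, overcast 4/0, rain 3/2) are those of the standard play-tennis data set the slides assume.

    import math

    def entropy(pos, neg):
        """Shannon entropy H of a set with `pos` positive and `neg` negative examples."""
        total = pos + neg
        h = 0.0
        for count in (pos, neg):
            if count:                      # 0 * log2(0) is taken as 0
                p = count / total
                h -= p * math.log2(p)
        return h

    # Entropy of the whole data set: 9 yes, 5 no.
    H_S = entropy(9, 5)                                        # ~0.94

    # (yes, no) counts for each value of the Outlook attribute.
    outlook = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}

    # Average (weighted) information of the Outlook attribute.
    total = sum(p + n for p, n in outlook.values())            # 14 examples
    avg_info = sum((p + n) / total * entropy(p, n) for p, n in outlook.values())

    # Information gain = entropy before the split minus average information after it.
    IG_outlook = H_S - avg_info
    print(round(H_S, 3), round(avg_info, 3), round(IG_outlook, 3))   # 0.94 0.694 0.247

ID3 repeats this calculation for every attribute, places the one with the highest information gain at the root (for the standard play-tennis data that is Outlook), and then recurses on each branch with the remaining attributes.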
References
• Segal, T., 1-Sep-2019. Decision Tree. Investopedia. Retrieved from https://www.investopedia.com/terms/d/decision-tree.asp
• Chauhan, N.S., 24-Dec-2019. Decision Tree Algorithm — Explained. Retrieved from https://towardsdatascience.com/decision-tree-algorithm-explained83beb6e78ef4
• Kulkarni, M., 7-Sep-2017. Decision Trees for Classification: A Machine Learning Algorithm. The Xoriant Blog. Retrieved from https://www.xoriant.com/blog/product-engineering/decision-trees-machinelearning-algorithm.html
• Serengil, S., 13-May-2018. A Step By Step C4.5 Decision Tree Example. Retrieved from https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-treeexample/
• Decision Tree Solved | ID3 Algorithm (concept and numerical) | Machine Learning, 2019. Retrieved from https://www.youtube.com/watch?v=UdTKxGQvYdc
• Mantri, N. Using ID3 Algorithm to build a Decision Tree to predict the weather. OpenGenus IQ: Learn Computer Science. Retrieved from https://iq.opengenus.org/id3-algorithm/

INTENDED LEARNING OUTCOME
• Understand the concept and the process involved in each step of the Naïve Bayes algorithm.
• Understand the concept and the process involved in each step of the Neural Network algorithm.

Naive Bayes Data Mining Algorithm
• Naive Bayes is a probabilistic machine learning algorithm that can be used in a wide variety of classification tasks.
• It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

Why is it called naive?
The assumption that all features of a dataset are independent is precisely why it is called naive; in practice it is rarely the case that all features are independent.
For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of them contribute independently to the probability that the fruit is an apple, and that is why the classifier is known as 'naive'.

What's Bayes?
Thomas Bayes was an English statistician after whom Bayes' Theorem is named. The theorem lets us predict the class given a set of features using probability. The simplified equation for classification looks something like this:
P(Class A | Feature 1, Feature 2) = P(Feature 1 | Class A) * P(Feature 2 | Class A) * P(Class A) / ( P(Feature 1) * P(Feature 2) )

What does the equation mean?
The equation finds the probability of Class A given Features 1 and 2. In other words, if you see Features 1 and 2, this is the probability that the data is Class A. The equation reads: the probability of Class A given Features 1 and 2 is a fraction. The fraction's numerator is the probability of Feature 1 given Class A, multiplied by the probability of Feature 2 given Class A, multiplied by the probability of Class A. The fraction's denominator is the probability of Feature 1 multiplied by the probability of Feature 2.

Naive Bayes - Example
Here's the deal: we have a training dataset of 1,000 fruits. A fruit can be a Banana, an Orange or Other (these are the classes), and a fruit can be Long, Sweet or Yellow (these are the features). For the sake of computing the probabilities, let's aggregate the training data into a counts table:

               Long   Sweet   Yellow   Total
    Banana      400     350      450     500
    Orange        0     150      300     300
    Other       100     150       50     200
    Total       500     650      800    1000

What do you see in this training dataset?
• Out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow.
• Out of 300 oranges, none are long, 150 are sweet and 300 are yellow.
• Out of the remaining 200 fruits, 100 are long, 150 are sweet and 50 are yellow.
If we are given the length, sweetness and color of a fruit (without knowing its class), we can now calculate the probability of it being a banana, an orange or another fruit. Suppose we are told the unknown fruit is long, sweet and yellow. Here is how we calculate all the probabilities in four steps.

Naive Bayes Algorithm (based on the example)
Step 1: To calculate the probability that the fruit is a banana, notice that this looks familiar: it is the probability of the class Banana given the features Long, Sweet and Yellow, or more succinctly P(Banana | Long, Sweet, Yellow). This is exactly the equation discussed earlier.
Step 2: Starting with the numerator, let's plug everything in:
P(Long | Banana) = 400/500 = 0.8
P(Sweet | Banana) = 350/500 = 0.7
P(Yellow | Banana) = 450/500 = 0.9
P(Banana) = 500/1000 = 0.5
Multiplying everything together (as in the equation): 0.8 × 0.7 × 0.9 × 0.5 = 0.252
Step 3: Ignore the denominator, since it is the same for all the other calculations.
Step 4: Do a similar calculation for the other classes:
P(Orange | Long, Sweet, Yellow) = 0
P(Other | Long, Sweet, Yellow) = 0.01875
Since 0.252 is greater than 0.01875, Naive Bayes classifies this long, sweet and yellow fruit as a banana.

Why use Naive Bayes?
Naive Bayes involves simple arithmetic: it is just tallying up counts, multiplying and dividing. Once the frequency tables are calculated, classifying an unknown fruit just involves calculating the probabilities for all the classes and then choosing the highest probability. Despite its simplicity, Naive Bayes can be surprisingly accurate; for example, it has been found to be effective for spam filtering.
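The four steps above map directly onto a few lines of code. The sketch below is a minimal Python version of the fruit example, using the counts table from the slides; as Step 3 suggests, it compares only the unnormalised numerators.

    # Counts from the training data: per class, the total and per-feature counts.
    counts = {
        "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
        "Orange": {"total": 300, "Long": 0,   "Sweet": 150, "Yellow": 300},
        "Other":  {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
    }
    n_total = sum(c["total"] for c in counts.values())    # 1000 fruits in all

    def score(cls, features):
        """Numerator of Bayes' rule: P(f1|cls) * P(f2|cls) * ... * P(cls)."""
        s = counts[cls]["total"] / n_total                 # prior P(cls)
        for f in features:
            s *= counts[cls][f] / counts[cls]["total"]     # likelihood P(f|cls)
        return s

    observed = ["Long", "Sweet", "Yellow"]
    scores = {cls: score(cls, observed) for cls in counts}
    for cls, s in scores.items():
        print(cls, round(s, 5))            # Banana 0.252, Orange 0.0, Other 0.01875
    print("Prediction:", max(scores, key=scores.get))      # Prediction: Banana

Because the shared denominator is ignored, the scores are not probabilities that sum to one, but their ranking is the same, which is all that is needed to pick the most likely class.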
What is a Neural Network?
The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided by Dr. Robert Hecht-Nielsen, the inventor of one of the first neurocomputers. He defines a neural network as: "...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."
(In "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989)

The Basics of Neural Networks
Neural networks are typically organized in layers. Layers are made up of a number of interconnected 'nodes', each of which contains an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates with one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output, as shown in the picture.
Image source: https://tse2.mm.bing.net/th?id=OIP.hHdLlbyPyIwAT0rPOVn0XgHaFE&pid=Api&P=0&w=289&h=199

The Architecture of Neural Networks
There are mainly three layers in a neural network:
• Input Layer
• Hidden Layers
• Output Layer

Input Layer: contains the neurons for the input features. One bias neuron is also added to the input layer in addition to the features, so if there are n features the input layer contains n+1 neurons.
Hidden Layer: the hidden layers are the intermediate layers between the input and output layers. There can be any number of hidden layers; a network with more than one hidden layer is called a deep neural network. The neurons in the hidden layers receive input from the input layer and send their output towards the output layer.
Output Layer: contains a number of neurons that depends on the number of output classes. For a multi-class classification problem it contains one neuron per class; for binary classification it contains one neuron.
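As a concrete illustration of the layer structure just described, the sketch below runs one forward pass through a tiny fully connected network in Python with NumPy: an input layer with a bias term, one hidden layer, and a single output neuron for a binary decision. The layer sizes, random weights, and sigmoid activation are arbitrary illustrative choices, not values given in the slides.

    import numpy as np

    def sigmoid(z):
        """A common activation function; squashes any value into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)

    n_features = 3                    # input layer: n features plus 1 bias neuron
    n_hidden = 4                      # one hidden layer with 4 neurons
    W_hidden = rng.normal(size=(n_hidden, n_features + 1))   # weighted connections, input -> hidden
    W_output = rng.normal(size=(1, n_hidden + 1))             # weighted connections, hidden -> output

    x = np.array([0.2, 0.7, 0.1])     # one input pattern (3 feature values)

    # Forward pass: append the bias input, weight and sum, then apply the activation.
    h = sigmoid(W_hidden @ np.append(x, 1.0))      # hidden layer activations
    y = sigmoid(W_output @ np.append(h, 1.0))      # output layer (binary classification)

    print(y)    # a single probability-like value between 0 and 1

Training would adjust W_hidden and W_output to reduce the error of this output; here the weights are random, so the output is meaningless except as a demonstration of how data flows from layer to layer.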
Types of Neural Networks
1. Feed-Forward
2. Radial Basis Function (RBF)
3. Multilayer Perceptron
4. Convolutional
5. Recurrent
6. Modular

Feed-Forward
This is the most basic kind of neural network. As the name suggests, information in this network moves only forward, until it reaches the output node. There is no feedback to improve the nodes in earlier layers, and not much of a self-learning mechanism. The simplest representation is a single-layer network.

Radial Basis Function (RBF)
The main intuition in this type of network is the distance of data points with respect to a center. These networks typically have two layers (one hidden layer and one output layer). The hidden layer uses a radial basis function, which helps with reasonable interpolation when fitting the data. The intuition goes like this: "The predicted target output of an item will behave similarly to that of other items whose predictor variables closely resemble it."

Multilayer Perceptron
Now we move to neural networks with more than two layers, i.e. more than one hidden layer. The main reason for using a Multilayer Perceptron is data that is not linearly separable. This network is fully connected and can learn by itself, changing the connection weights after each data point is processed, according to the amount of error it generates.

Convolutional
A Convolutional Neural Network is an advanced version of the Multilayer Perceptron that contains one or more convolutional layers. A convolutional layer is essentially a simple filtering mechanism that enables an activation; when this filtering is repeated, it yields the location and strength of a detected feature. Because of this ability, these networks are widely used in image processing, natural language processing and recommender systems, where detecting the important features effectively matters. A minimal example of such a filter appears after this list of types.

Recurrent
As the name suggests, something recurs in this network: the output of a particular layer is saved and fed back into the input. The first layer is a simple feed-forward layer, and the nodes in later layers retain information from earlier steps. If a prediction is wrong, the network tries to re-learn until it reaches the right prediction. The key building block is this stored memory, which helps the network predict what is coming next; recurrent networks are widely used in text-to-speech conversion.

Modular
As the name suggests, modularity is the basic foundation of this network. Modularity means that several independently functioning networks each carry out a sub-task; since they do not interact with each other, computation speed increases and large, complex tasks can be processed significantly faster by handling their components separately. This is analogous to how the left and right sides of the brain handle things independently and yet form one brain.
Image source: https://mk0analyticsindf35n9.kinstacdn.com/wpcontent/uploads/2018/01/Modular-neural-network.gif
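To make the 'filtering mechanism' of a convolutional layer concrete, the sketch below slides a small hand-picked difference filter over a 1-D signal in Python with NumPy. The signal and filter values are invented for illustration; the point is only that large responses mark where the filter's pattern (here, a step in value) occurs, which is what the slide means by the location and strength of a detected feature.

    import numpy as np

    signal = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float)   # a toy 1-D input
    kernel = np.array([1.0, -1.0])                              # a simple difference filter

    # Slide the filter across the signal; response[i] = signal[i+1] - signal[i].
    response = np.convolve(signal, kernel, mode="valid")
    print(response)    # +1 marks the upward step, -1 the downward step, 0 elsewhere

In a real convolutional network the filter values are learned rather than hand-picked, and 2-D filters are slid over images instead of a 1-D signal, but the sliding-filter idea is the same.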
Application of Neural Network
Neural Networks in Images:
• Character Recognition
• Image Classification or Labeling
• Object Detection
• Image Generation
Neural Networks in Language:
• Text Classification and Categorization
• Language Generation and Document Summarization
Neural Networks in Signals:
• Speech Recognition

Application of Neural Network
Aerospace − Autopilot aircraft, aircraft fault detection.
Automotive − Automobile guidance systems.
Military − Weapon orientation and steering, target tracking, object discrimination, facial recognition, signal/image identification.
Electronics − Code sequence prediction, IC chip layout, chip failure analysis, machine vision, voice synthesis.
Financial − Real estate appraisal, loan advising, mortgage screening, corporate bond rating, portfolio trading programs, corporate financial analysis, currency value prediction, document readers, credit application evaluators.

Application of Neural Network
Industrial − Manufacturing process control, product design and analysis, quality inspection systems, welding quality analysis, paper quality prediction, chemical product design analysis, dynamic modeling of chemical process systems, machine maintenance analysis, project bidding, planning, and management.
Medical − Cancer cell analysis, EEG and ECG analysis, prosthetic design, transplant time optimization.
Telecommunications − Image and data compression, automated information services, real-time spoken language translation.
Transportation − Truck brake system diagnosis, vehicle scheduling, routing systems.

Application of Neural Network
Software − Pattern recognition in facial recognition, optical character recognition, etc.
Time Series Prediction − ANNs are used to make predictions on stocks and natural calamities.
Signal Processing − Neural networks can be trained to process an audio signal and filter it appropriately in hearing aids.
Control − ANNs are often used to make steering decisions for physical vehicles.
Anomaly Detection − As ANNs are expert at recognizing patterns, they can also be trained to generate an output when something unusual occurs that does not fit the pattern.

References
• EDUCBA, 2020. Data Mining Algorithms. Retrieved from https://www.educba.com/data-mining-algorithms/
• Li, R. Top 10 Data Mining Algorithms. Hacker Bits. Retrieved from https://hackerbits.com/data/naive-bayes-data-mining-algorithm/
• Prabhakaran, S. How Naive Bayes Algorithm Works? (with example and full code). Machine Learning Plus. Retrieved from https://www.machinelearningplus.com/predictive-modeling/how-naivebayes-algorithm-works-with-example-and-full-code/
• Ray, S., 11-Sep-2017. 6 Easy Steps to Learn Naive Bayes Algorithm with codes in Python and R. Analytics Vidhya. Retrieved from https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
• Neural Network. Neural Designer. Retrieved from https://www.neuraldesigner.com/learning/tutorials/neural-network
• A Basic Introduction To Neural Networks. Retrieved from http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
• EDUCBA, 2020. Types of Neural Network. Retrieved from https://www.educba.com/types-of-neural-networks/
• Maladkar, K., 15-Jan-2018. 6 Types of Artificial Neural Networks Currently Being Used in Machine Learning. Retrieved from https://analyticsindiamag.com/6-types-of-artificial-neural-networks-currentlybeing-used-in-todays-technology/
• EDUCBA, 2020. Application of Neural Network. Retrieved from https://www.educba.com/application-of-neural-network/