
M12-PPT

INTENDED LEARNING OUTCOME
• Understand the concept of Decision Tree.
• Learn the different terminologies in Decision Tree.
• Learn the process involved in each step of the Decision Tree algorithm.
What is a Decision Tree?
Image source: https://octaviansima.wordpress.com/2011/03/25/decision-trees-c4-5/
What is a Decision Tree?
• A decision tree is a diagram or chart that people use to determine a
course of action or show a statistical probability.
• Each node represents a feature (attribute)
• Each branch represents a possible decision (rule) or reaction.
• The leaves, the farthest branches on the tree, represent outcomes or end results.
Types of Decision Trees
1. Classification trees (Yes/No types)
The example we have seen above is a classification tree, where the outcome is a categorical variable such as 'fit' or 'unfit'.
2. Regression trees (Continuous data types)
Here the decision or outcome variable is continuous, e.g. a number like 123.
Decision Tree – Sample Problem
An example of a decision tree can be explained using the binary tree shown here. Let's say you want to predict whether a person is fit given information such as age, eating habits, and physical activity.
The decision nodes are questions like 'What is the age?', 'Does he exercise?', 'Does he eat a lot of pizza?', and the leaves are outcomes such as 'fit' or 'unfit'.
In this case, this is a binary classification problem (a yes/no type problem).
Image source: https://www.xoriant.com/blog/wp-content/uploads/2017/08/Decision-Trees-modified-1.png
Decision Tree – Sample Problem
Let's say we have a problem: predict whether a customer will pay his renewal premium with an insurance company (yes/no). We know that customer income is a significant variable, but the insurance company does not have income details for all customers. Since income is an important variable, we can build a decision tree to predict customer income based on occupation, product, and various other variables. In this case, we are predicting values of a continuous variable.
Important Decision Tree Terminologies
Root Node: It represents the entire population or sample and this further gets
divided into two or more homogeneous sets.
Splitting: It is a process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
Important Decision Tree Terminologies
Pruning: When we remove sub-nodes of a decision node, the process is called pruning. It is the opposite of splitting.
Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are the children of the parent node.
Important Decision Tree Terminologies
Image source: https://miro.medium.com/max/1376/1*bcLAJfWN2GpVQNTVOCrrvw.png
Decision Tree Algorithms
ID3 → Iterative Dichotomiser 3
C4.5 → successor of ID3
CART → Classification And Regression Tree
CHAID → Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees
MARS → Multivariate Adaptive Regression Splines
Decision Tree using ID3 Algorithm
Now that we know what a decision tree is, we'll see how it works internally. There are many algorithms that construct decision trees, but one of the best known is the ID3 algorithm.
What is an ID3 Algorithm?
• Stands for Iterative Dichotomiser 3
• An algorithm developed by J. Ross Quinlan.
• A core algorithm for building decision trees.
• A supervised learning algorithm used for classification problems.
• It follows a greedy approach, at each split selecting the attribute that yields the maximum Information Gain (IG), i.e. the minimum entropy (H).
What is Entropy and Information Gain?
Entropy, also called Shannon entropy and denoted H(S) for a finite set S, is a measure of the amount of uncertainty or randomness in the data.
Information Gain IG(A) tells us how much the uncertainty in S was reduced after splitting set S on attribute A.
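Written out (a standard formulation; the formulas themselves are not on the slide), the two quantities are:

```latex
H(S) = -\sum_{c} p(c)\,\log_2 p(c)
\qquad
IG(A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

where p(c) is the proportion of examples in S belonging to class c, and S_v is the subset of S for which attribute A takes the value v.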
Example Data Set:
Create a decision tree that predicts whether Tennis will be played on the day.
Attributes : Outlook, Temperature, Humidity, Wind, Play Tennis
STEPS in Creating the Decision Tree
Important Concepts:
• Calculate entropy (the amount of uncertainty in the dataset, based on the numbers of positive and negative evidences).
** Entropy(S) – the entropy of the entire data set.
** Then compute the entropy for each attribute.
• Calculate the average information.
• Calculate the information gain (the difference in entropy before and after splitting the dataset on attribute A).
STEP 1 : Create a Root Node. How?
• Choose the attribute that best classifies the training data, and use this attribute at the root of the tree.
• How to choose the best attribute?
** From here, the ID3 algorithm begins…
1.) Compute the entropy of the dataset, Entropy(S)
Calculate the number of positive and negative examples/evidences:
P = 9
N = 5
Total = 14
The complete entropy of the dataset is
H(S) = - p(yes) * log2(p(yes)) - p(no) * log2(p(no))
     = - (9/14) * log2(9/14) - (5/14) * log2(5/14)
     = - (-0.41) - (-0.53)
     = 0.94
2.) For every attribute/feature:
a. Calculate the entropy for each of its values, Entropy(A)
Categorical values of Outlook: sunny, overcast and rain
H(Outlook=sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
H(Outlook=rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
H(Outlook=overcast) = -(4/4)*log2(4/4) - 0 = 0
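The hand computation above, plus the information-gain step that follows from it, can be sketched in Python (a minimal sketch; the counts are taken from the slides, and the resulting gain for Outlook matches the standard result for this dataset):

```python
import math

def entropy(pos, neg):
    """Shannon entropy of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                      # 0 * log2(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h

# Entropy of the full Play Tennis dataset: P = 9, N = 5.
h_s = entropy(9, 5)                    # ≈ 0.94

# (positive, negative) counts for each value of Outlook, from the slides.
outlook = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}

# Average information of the split, weighted by the size of each subset.
total = sum(p + n for p, n in outlook.values())
avg_info = sum((p + n) / total * entropy(p, n) for p, n in outlook.values())

# Information gain of splitting on Outlook: H(S) minus the average information.
gain_outlook = h_s - avg_info          # ≈ 0.247
```

ID3 repeats this gain computation for every attribute and places the one with the highest gain at the root; for this classic dataset that attribute is Outlook.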
References
• Segal, T., 1-Sep-2019. Decision Tree. Investopedia. Retrieved from https://www.investopedia.com/terms/d/decision-tree.asp
• Chauhan, N.S., 24-Dec-2019. Decision Tree Algorithm — Explained. Retrieved from https://towardsdatascience.com/decision-tree-algorithm-explained-83beb6e78ef4
• Kulkarni, M., 7-Sep-2017. Decision Trees for Classification: A Machine Learning Algorithm. The Xoriant Blog. Retrieved from https://www.xoriant.com/blog/product-engineering/decision-trees-machine-learning-algorithm.html
• Serengil, S., 13-May-2018. A Step By Step C4.5 Decision Tree Example. Retrieved from https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-tree-example/
References
• Decision Tree Solved | ID3 Algorithm (concept and numerical) | Machine
Learning (2019). Retrieved from
https://www.youtube.com/watch?v=UdTKxGQvYdc
• Mantri, N. Using ID3 Algorithm to build a Decision Tree to predict the
weather. OpenGenus IQ: Learn Computer Science. Retrieved from
https://iq.opengenus.org/id3-algorithm/
INTENDED LEARNING OUTCOME
• Understand the concept and the process involved in each step
of the Naïve Bayes algorithm.
• Understand the concept and the process involved in each step
of the Neural Network algorithm.
Naive Bayes Data Mining Algorithm
• Naive Bayes is a probabilistic machine learning algorithm that
can be used in a wide variety of classification tasks.
• It is a classification technique based on Bayes’ Theorem with an
assumption of independence among predictors. In simple terms,
a Naive Bayes classifier assumes that the presence of a particular
feature in a class is unrelated to the presence of any other feature.
Why is it called naive?
 The assumption that all features of a dataset are independent is precisely
why it’s called naive — it’s generally not the case that all features are
independent.
Why is it called naive?
 For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that the fruit is an apple, and that is why it is known as 'naive'.
What’s Bayes?
 Thomas Bayes was an English statistician after whom Bayes' Theorem is named.
 The theorem allows us to predict the class given a set of features
using probability.
 The simplified equation for classification looks something like
this:
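The equation itself did not survive the slide export; reconstructed from the verbal description that follows, the simplified form for two features is:

```latex
P(\text{Class A} \mid \text{F1}, \text{F2})
  = \frac{P(\text{F1} \mid \text{Class A}) \cdot P(\text{F2} \mid \text{Class A}) \cdot P(\text{Class A})}
         {P(\text{F1}) \cdot P(\text{F2})}
```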
What does the equation mean?
The equation finds the probability of Class A given Features 1 and 2. In
other words, if you see Features 1 and 2, this is the probability the data is
Class A.
The equation reads: The probability of Class A given Features 1 and 2 is
a fraction.
 The fraction’s numerator is the probability of Feature 1 given Class A
multiplied by the probability of Feature 2 given Class A multiplied by
the probability of Class A.
 The fraction’s denominator is the probability of Feature 1 multiplied
by the probability of Feature 2.
Naive Bayes - Example
Here’s the deal:
 We have a training dataset of 1,000 fruits.
 The fruit can be a Banana, Orange or Other (these are the classes).
 The fruit can be Long, Sweet or Yellow (these are the features).
For the sake of computing the probabilities, let’s aggregate the training data to
form a counts table like this.
Naive Bayes - Example
What do you see in this training dataset?
 Out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow.
 Out of 300 oranges, none are long, 150 are sweet and 300 are yellow.
 Out of the remaining 200 fruit, 100 are long, 150 are sweet and 50 are yellow.
If we are given the length, sweetness and color of a fruit (without knowing its
class), we can now calculate the probability of it being a banana, orange or other
fruit.
 Suppose we are told the unknown fruit is long, sweet and yellow.
 Here’s how we calculate all the probabilities in 4 steps:
Naive Bayes Algorithm
(based on the example)
Step 1:
 To calculate the probability the fruit is a banana, let’s first recognize that
this looks familiar. It’s the probability of the class Banana given the
features Long, Sweet and Yellow or more succinctly:
 P(Banana|Long, Sweet, Yellow)
 This is exactly like the equation discussed earlier.
Naive Bayes Algorithm (based on
the example)
Step 2:
 Starting with the numerator, let’s plug everything in.
P(Long|Banana) = 400/500 = 0.8
P(Sweet|Banana) = 350/500 = 0.7
P(Yellow|Banana) = 450/500 = 0.9
P(Banana) = 500 / 1000 = 0.5
 Multiplying everything together (as in the equation), we
get:
0.8 × 0.7 × 0.9 × 0.5 = 0.252
Naive Bayes Algorithm (based on
the example)
Step 3:
 Ignore the denominator, since it’ll be the same for all the
other calculations.
Step 4:
 Do a similar calculation for the other classes:
P(Orange|Long, Sweet, Yellow) = 0
P(Other|Long, Sweet, Yellow) = 0.01875
Since 0.252 is greater than 0.01875, Naive Bayes would classify this long, sweet and yellow fruit as a banana.
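The four steps can be checked with a short Python sketch (variable names are illustrative; the counts table is the one described above):

```python
# Counts from the 1,000-fruit training table: class -> (total, long, sweet, yellow).
counts = {
    "Banana": (500, 400, 350, 450),
    "Orange": (300, 0, 150, 300),
    "Other":  (200, 100, 150, 50),
}
N = 1000  # total number of fruits

scores = {}
for cls, (total, n_long, n_sweet, n_yellow) in counts.items():
    # Numerator of the Naive Bayes equation:
    # P(Long|class) * P(Sweet|class) * P(Yellow|class) * P(class).
    # The denominator P(Long)P(Sweet)P(Yellow) is the same for every class,
    # so it can be ignored (Step 3).
    scores[cls] = (n_long / total) * (n_sweet / total) * (n_yellow / total) * (total / N)

best = max(scores, key=scores.get)  # "Banana": 0.252 beats 0 and 0.01875
```

Classifying an unknown fruit is just these few multiplications per class, followed by picking the class with the highest score.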
Why use Naive Bayes?
Naive Bayes involves simple arithmetic. It’s just tallying up counts,
multiplying and dividing.
 Once the frequency tables are calculated, classifying an unknown
fruit just involves calculating the probabilities for all the classes,
and then choosing the highest probability.
 Despite its simplicity, Naive Bayes can be surprisingly accurate.
For example, it’s been found to be effective for spam filtering.
What is a Neural Network?
The simplest definition of a neural network, more properly referred to as an 'artificial'
neural network (ANN), is provided by the inventor of one of the first neurocomputers,
Dr. Robert Hecht-Nielsen. He defines a neural network as:
"...a computing system made up of a number of simple, highly interconnected
processing elements, which process information by their dynamic state response
to external inputs.”
In "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989
The Basics of Neural Networks
Neural networks are typically organized in layers. Layers are made up of a number of interconnected 'nodes', each containing an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output, as shown in the picture.
The Architecture of Neural Networks
A neural network mainly has 3 kinds of layers:
• Input Layer
• Hidden Layers
• Output Layer
The Architecture of Neural Networks
Input Layer: The input layer contains the neurons for the input features. One bias is also added to the input layer in addition to the features, so if there are n features the input layer contains n+1 neurons.
Hidden Layer: The hidden layers are the intermediate layers between the input and output layers. There can be any number of hidden layers; a network with more than one hidden layer is called a deep neural network. The neurons in the hidden layer get input from the input layer and give output to the output layer.
The Architecture of Neural Networks
Output Layer: The output layer contains a number of neurons based on the number of output classes. For a multi-class classification problem it contains a number of neurons equal to the number of classes; for binary classification, it contains one neuron.
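A minimal forward pass through this input → hidden → output architecture can be sketched in plain Python (the layer sizes, weights and sigmoid activation are illustrative assumptions, not values from the slides):

```python
import math

def sigmoid(z):
    """A common activation function, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> one hidden layer -> one output neuron."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

# 2 input features, 3 hidden neurons, 1 output neuron (binary classification).
y = forward(
    x=[0.5, -1.0],
    w_hidden=[[0.1, 0.4], [-0.3, 0.2], [0.6, -0.1]],  # one weight row per hidden neuron
    b_hidden=[0.0, 0.1, -0.2],
    w_out=[0.5, -0.4, 0.3],
    b_out=0.05,
)
# y lies in (0, 1) and can be read as the probability of the positive class
```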
Types of Neural Network
1. Feed-Forward
2. Radial Basis Function (RBF)
3. Multilayer Perceptron
4. Convolutional
5. Recurrent
6. Modular
Types of Neural Networks
Feed-Forward
This is the most basic kind of neural network. As the name suggests, signals in this network move only forward, until they reach the output node. There is no feedback to adjust the nodes in earlier layers, and not much of a self-learning mechanism. Below is a simple representation of a one-layer neural network.
Types of Neural Networks
Radial Basis Function (RBF)
The main intuition in this type of neural network is the distance of data points with respect to a center. These networks typically have 2 layers (one hidden and one output layer). The hidden layer uses a radial basis function, which helps in reasonable interpolation when fitting the data.
The intuition goes like this: 'The predicted target output of an item will behave similarly to that of other items whose predictor variables closely resemble its own.'
Types of Neural Networks
Multilayer Perceptron
Now we move to neural networks having more than 2 layers, i.e. more than one hidden layer. The main reason for using a Multilayer Perceptron is that the data is not linearly separable.
This neural network is fully connected and can learn by itself, changing the connection weights after each data point is processed according to the amount of error it generates.
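The 'changing the weights after each data point according to its error' idea can be illustrated with the simplest possible case: a single perceptron (not a full multilayer network) learning the logical AND function. This is a toy sketch with an assumed learning rate:

```python
# Training data for logical AND: (inputs, target).
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1   # weights, bias, learning rate

for _ in range(20):                # sweep over the data a few times
    for x, target in data:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = target - pred        # the error drives the weight change
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        b += lr * err

preds = [1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0 for x, _ in data]
# preds now reproduces AND: [0, 0, 0, 1]
```

A Multilayer Perceptron generalizes this idea, propagating the error backward through the hidden layers (backpropagation) instead of updating a single neuron.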
Types of Neural Networks
Convolutional
A Convolutional Neural Network is an advanced version of the Multilayer Perceptron. In this type there are one or more convolutional layers. A convolutional layer is a simple filtering mechanism that enables an activation; when this filtering is repeated over the input, it yields the location and strength of a detected feature. As a result of this ability, these networks are widely used in image processing, natural language processing and recommender systems.
Types of Neural Networks
Recurrent
As the name suggests, in this network something recurs: the output of a particular layer is saved and fed back into the input.
The first layer is a simple feed-forward neural network; subsequently, each node retains information into the next layers. Because of this, if a prediction is wrong the network tries to re-learn until it reaches the right prediction.
This architecture is widely used in text-to-speech conversion. Its main building block, stored memory, improves the prediction of what is coming next.
Types of Neural Networks
Modular
As the name suggests, modularity is the basic foundation of this neural network: different independently functioning networks carry out sub-tasks. Since they do not interact with each other, computation speed increases, and a large complex process works significantly faster by processing its individual components.
Similar to how the left and right sides of the brain handle things independently yet form one brain, a Modular neural network is analogous to this biological situation.
Types of Neural Networks
Modular
Image source: https://mk0analyticsindf35n9.kinstacdn.com/wp-content/uploads/2018/01/Modular-neural-network.gif
Application of Neural Network
Neural Network in Images:
• Character Recognition
• Image Classification or Labeling
• Object Detection
• Image Generation
Neural Network in Language:
• Text Classification and Categorization
• Language Generation and Document Summarization
Neural Network in Signals:
• Speech Recognition
Application of Neural Network
Aerospace − Aircraft autopilots, aircraft fault detection.
Automotive − Automobile guidance systems.
Military − Weapon orientation and steering, target tracking, object
discrimination, facial recognition, signal/image identification.
Electronics − Code sequence prediction, IC chip layout, chip failure analysis,
machine vision, voice synthesis.
Financial − Real estate appraisal, loan advisor, mortgage screening,
corporate bond rating, portfolio trading program, corporate financial
analysis, currency value prediction, document readers, credit application
evaluators.
Application of Neural Network
Industrial − Manufacturing process control, product design and analysis,
quality inspection systems, welding quality analysis, paper quality
prediction, chemical product design analysis, dynamic modeling of chemical
process systems, machine maintenance analysis, project bidding, planning,
and management.
Medical − Cancer cell analysis, EEG and ECG analysis, prosthetic design,
transplant time optimizer.
Telecommunications − Image and data compression, automated
information services, real-time spoken language translation.
Transportation − Truck Brake system diagnosis, vehicle scheduling, routing
systems.
Application of Neural Network
Software − Pattern Recognition in facial recognition, optical character
recognition, etc.
Time Series Prediction − ANNs are used to make predictions on stocks and
natural calamities.
Signal Processing − Neural networks can be trained to process an audio
signal and filter it appropriately in the hearing aids.
Control − ANNs are often used to make steering decisions of physical
vehicles.
Anomaly Detection − As ANNs are expert at recognizing patterns, they can
also be trained to generate an output when something unusual occurs that
misfits the pattern.
References
• EDUCBA, 2020. Data Mining Algorithms. Retrieved from
https://www.educba.com/data-mining-algorithms/
• Li, R.. Top 10 Data Mining Algorithms. Hacker Bits. Retrieved from
https://hackerbits.com/data/naive-bayes-data-mining-algorithm/
• Prabhakaran, S. How Naive Bayes Algorithm Works? (with example and full code). Machine Learning Plus. Retrieved from https://www.machinelearningplus.com/predictive-modeling/how-naive-bayes-algorithm-works-with-example-and-full-code/
• Ray, S., 11-Sep-2017. 6 Easy Steps to Learn Naive Bayes Algorithm with codes
in Python and R. Analytics Vidhya. Retrieved from
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
References
• Neural Network. Neural Designer. Retrieved from
https://www.neuraldesigner.com/learning/tutorials/neural-network
• A Basic Introduction To Neural Networks. Retrieved from
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
• EDUCBA, 2020. Types of Neural Network. Retrieved from
https://www.educba.com/types-of-neural-networks/
• Maladkar, K., 15-Jan-2018. 6 Types of Artificial Neural Networks Currently Being Used in Machine Learning. Retrieved from https://analyticsindiamag.com/6-types-of-artificial-neural-networks-currently-being-used-in-todays-technology/
• EDUCBA, 2020. Application of Neural Network.
https://www.educba.com/application-of-neural-network/