

Data Warehouse & Data Management Assignment - 2
BAYESIAN
CLASSIFICATION
BY: Ishika Hooda (2K16/CO/133)
Jatin Artwani (2K16/CO/137)
Jessjit Singh (2K16/CO/142)
Introduction
❖
In many applications, the relationship between the attribute set
and the class variable is non-deterministic. In other words, the
class label of a test record cannot be predicted with certainty
even though its attribute set is identical to that of some of the
training examples.
❖
These circumstances may arise due to noisy data or the presence
of certain confounding factors that influence classification but
are not included in the analysis.
❖
Bayesian classification is based on Bayes' Theorem.
Bayesian classifiers are statistical classifiers: they can
predict class membership probabilities, such as the probability
that a given tuple belongs to a particular class.
Bayes' Theorem
❖
It is named after Thomas Bayes, who first used
conditional probability to provide an algorithm
that uses evidence to calculate limits on an
unknown parameter.
❖
There are two types of probabilities:
1. Posterior Probability [P(H|X)]
2. Prior Probability [P(H)]
where X is a data tuple and H is some hypothesis.
❖
P(H|X) is the conditional probability of the hypothesis H
given that the data tuple X is observed.
❖
P(X|H) is the conditional probability of observing X
given that the hypothesis H holds.
❖
P(H) and P(X) are the probabilities of observing H and
X independently of each other; each is known as a
marginal probability.
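In this notation, Bayes' theorem relates the four quantities as:

```latex
P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}
```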
Bayesian Interpretation
❖
In the Bayesian interpretation, probability represents a "degree of belief." Bayes' theorem connects the degree of
belief in a hypothesis before and after accounting for evidence. For example, consider tossing a coin: we get either
heads or tails, and the probability of each outcome is 50%. If the coin is flipped a number of times and the outcomes
are observed, the degree of belief may rise, fall, or remain the same depending on those outcomes.
❖
For a proposition X and evidence Y:
❖
P(X), the prior, is the initial degree of belief in X.
❖
P(X|Y), the posterior, is the degree of belief after
having accounted for Y.
❖
The quotient P(Y|X)/P(Y) represents the support Y provides
for X.
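Written out in the same notation:

```latex
P(X \mid Y) = \frac{P(Y \mid X)}{P(Y)} \cdot P(X)
```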
Bayesian Belief Networks
❖
Bayesian Belief Networks specify joint
conditional probability distributions.
They are also known as Belief Networks,
Bayesian Networks, or Probabilistic
Networks.
❖
A Belief Network allows class conditional
independencies to be defined between
subsets of variables.
❖
It provides a graphical model of a causal
relationship on which learning can be
performed.
❖
We can use a trained Bayesian Network
for classification.
❖
There are two components that define a
Bayesian Belief Network:
❖
Directed acyclic graph
❖
A set of conditional probability
tables
An example of a Bayesian Belief Network
Directed Acyclic Graph (DAG)
❖
In computer science and mathematics, a
Directed Acyclic Graph (DAG) is a graph
that is directed and contains no cycles.
This means that, following the direction of
the edges, it is impossible to start at one
node and loop back to it. The edges of the
directed graph only go one way.
❖
Each node in a directed acyclic
graph represents a random variable.
❖
These variables may be discrete or
continuous-valued.
❖
These variables may correspond to
the actual attribute given in the data.
An Example of a Directed Acyclic Graph
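As a rough illustration, the DAG of a small belief network can be represented in Python by listing each node's parents; the node names here are illustrative only, not taken from the figure above.

```python
# A tiny DAG represented as a mapping from each node (random variable)
# to the list of its parents. Names are hypothetical.
dag = {
    "FamilyHistory": [],
    "Smoker": [],
    "LungCancer": ["FamilyHistory", "Smoker"],
    "PositiveXRay": ["LungCancer"],
}

def is_acyclic(parents):
    """Check that the parent structure contains no directed cycles,
    using a depth-first search with a recursion stack."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False  # revisiting a node on the current path -> cycle
        visiting.add(node)
        ok = all(visit(p) for p in parents[node])
        visiting.remove(node)
        done.add(node)
        return ok

    return all(visit(n) for n in parents)

print(is_acyclic(dag))  # True for this structure
```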
Conditional Probability Table (CPT)
In statistics, the conditional probability table (CPT) is defined for a set of discrete and mutually dependent
random variables to display conditional probabilities of a single variable with respect to the others (i.e., the
probability of each possible value of one variable if we know the values taken on by the other variables).
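For instance, a CPT for a binary variable with two binary parents could be stored as a dictionary keyed by the parents' values; the variable names and probability values below are illustrative, not taken from the slide's table.

```python
# Hypothetical CPT: P(LungCancer | FamilyHistory, Smoker).
# Each key is a (family_history, smoker) combination; each value is
# P(LungCancer = True) for that combination; P(False) = 1 - P(True).
cpt_lung_cancer = {
    (True,  True):  0.8,
    (True,  False): 0.5,
    (False, True):  0.7,
    (False, False): 0.1,
}

def p_lung_cancer(value, family_history, smoker):
    p_true = cpt_lung_cancer[(family_history, smoker)]
    return p_true if value else 1.0 - p_true

print(p_lung_cancer(True, family_history=True, smoker=False))  # 0.5
```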
Assumptions
❖
Assumes that all attributes are independent within
each class.
❖
Discrete attributes can take on arbitrary multinomial
distributions, and real-valued attributes are assumed
to be distributed normally.
❖
It is important to point out that we do not assume that the
classification parameters or the number of classes are "random
variables." Rather, we merely assume that they are unknown
quantities about which we wish to perform inference.
❖
Bayesian methods have often been criticized for their use of
prior distributions and for the belief that this makes their
results personalistic and therefore somewhat arbitrary.
An Example of a Conditional Probability Table
Naive Bayes Classifier
Fundamentals
❖
In machine learning, naïve Bayes classifiers are a family of simple "probabilistic
classifiers" based on applying Bayes' theorem with strong (naïve) independence
assumptions between the features. They are among the simplest Bayesian network
models.
❖
Abstractly, naive Bayes is a conditional probability model: given a problem instance to
be classified, represented by a vector x = (x_1, ..., x_n) of n features
(independent variables), it assigns to this instance the probabilities p(C_k | x_1, ..., x_n)
for each of K possible outcomes or classes C_k.
❖
The problem with the above formulation is that if the number of features n is large or if
a feature can take on a large number of values, then basing such a model on probability
tables is infeasible. We therefore reformulate the model to make it more tractable. Using
Bayes' theorem, the conditional probability can be decomposed as
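```latex
p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}
```

In words, the posterior equals the prior times the likelihood, divided by the evidence.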
Naive Bayes Classifier
Fundamentals (contd.)
❖
In practice, only the numerator of this fraction matters, since the
denominator does not depend on the class. The numerator is the joint
probability p(C_k, x_1, ..., x_n), which can be rewritten using the
chain rule for conditional probability as:
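```latex
p(C_k, x_1, \dots, x_n)
  = p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k)
    \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k)
```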
❖
Now the "naive" conditional independence assumptions come into play: assume that all
features of the vector x are mutually independent, conditional on the class C_k. Under
this assumption, each factor simplifies to:
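```latex
p(x_i \mid x_{i+1}, \dots, x_n, C_k) = p(x_i \mid C_k)
```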
❖
Thus, the joint model can be expressed as -
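```latex
p(C_k \mid x_1, \dots, x_n) \;\propto\; p(C_k, x_1, \dots, x_n)
  \;=\; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)
```

where the proportionality holds because the evidence p(x_1, ..., x_n) is constant for a given input.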
Naive Bayes Classifier
Pipeline
The predicted class is the one with the highest posterior
probability, given by the following decision rule -
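```latex
\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}}
  \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)
```

This is the maximum a posteriori (MAP) decision rule.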
Naive Bayes Classifier
Example
We want to classify a Red Domestic SUV. Note that there is
no example of a Red Domestic SUV in our data set. First,
we need to calculate the probabilities P(Red|Yes), P(SUV|Yes), P(Domestic|Yes),
P(Red|No), P(SUV|No), and P(Domestic|No),
and multiply them by P(Yes) and P(No) respectively.
Looking at P(Red | Yes), we have 5 cases where vj = Yes,
and in 3 of those cases ai = Red. So for P(Red | Yes), n = 5
and nc = 3. Note that all attributes are binary (two possible
values). We are assuming no other information, so p = 1 /
(number of attribute values) = 0.5 for all of our attributes.
Our m value is arbitrary (we will use m = 3) but consistent
for all attributes. The smoothed estimates are computed as
shown below.
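The estimate being used for each conditional probability is the m-estimate; plugging in the counts above gives, for example:

```latex
P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m},
\qquad
P(\text{Red} \mid \text{Yes}) = \frac{3 + 3 \times 0.5}{5 + 3} \approx 0.56
```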
Naive Bayes Classifier
Example (contd.)
Further, we calculate the respective probabilities.
We have P(Yes) = .5 and P(No) = .5
For v = Yes, we have
P(Yes) * P(Red | Yes) * P(SUV | Yes) * P(Domestic | Yes) = .5 * .56 * .31 * .43 = .037
For v = No, we have
P(No) * P(Red | No) * P(SUV | No) * P(Domestic | No) = .5 * .43 * .56 * .56 = .069
Since 0.069 > 0.037, our example gets classified as 'No'.
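A minimal Python sketch of this calculation, assuming per-attribute counts (nc, n) consistent with the probabilities quoted above rather than the original data table:

```python
# m-estimate of P(attribute_value | class): (nc + m*p) / (n + m)
def m_estimate(nc, n, m=3, p=0.5):
    return (nc + m * p) / (n + m)

# Assumed counts (nc, n) per class for the values Red, SUV, Domestic.
counts = {
    "Yes": {"Red": (3, 5), "SUV": (1, 5), "Domestic": (2, 5)},
    "No":  {"Red": (2, 5), "SUV": (3, 5), "Domestic": (3, 5)},
}
prior = {"Yes": 0.5, "No": 0.5}

# Score each class: prior times the product of the smoothed likelihoods.
scores = {}
for cls, attrs in counts.items():
    score = prior[cls]
    for nc, n in attrs.values():
        score *= m_estimate(nc, n)
    scores[cls] = score

print(scores)                       # approximately {'Yes': 0.038, 'No': 0.069}
print(max(scores, key=scores.get))  # 'No'
```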
Naive Bayes Classifier
Applications
1. Text Classification - Bayesian classification is used as a
probabilistic learning method (naive Bayes text classification).
Naive Bayes classifiers are among the most successful known
algorithms for learning to classify text documents.
2. Spam Filtering - Spam filtering is the best-known use of Naive
Bayesian text classification. It makes use of a naive Bayes classifier to
identify spam e-mail. Bayesian spam filtering has become a popular
mechanism to distinguish illegitimate spam email from legitimate
email.
3. Recommendation Systems - Recommender systems apply
machine learning and data mining techniques to filter unseen
information and can predict whether a user would like a given
resource. Naive Bayes plays an important role in the algorithms
that help determine the preferences of a user.
Filtering Spam using Naive Bayes
❖
This algorithm classifies each object by looking at all
of its features individually. Bayes' rule shows us how to
calculate the posterior probability for just one feature.
The posterior probability of the object is calculated for
each feature, and these probabilities are then multiplied
together to get a final probability. Whichever class has
the greater probability ultimately determines the class
the object is assigned to, as in the sketch below.
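A minimal, self-contained sketch of this idea in Python, using a hypothetical toy corpus and Laplace smoothing (a rough illustration, not a production filter):

```python
from collections import Counter

# Toy training data (hypothetical messages).
train = [
    ("win money now claim prize",    "spam"),
    ("free prize win win",           "spam"),
    ("meeting schedule for monday",  "ham"),
    ("project report attached",      "ham"),
]

# Count words per class and messages per class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def posterior_score(text, label):
    """Prior times the product of per-word likelihoods (Laplace-smoothed)."""
    score = class_counts[label] / sum(class_counts.values())
    total = sum(word_counts[label].values())
    for word in text.split():
        score *= (word_counts[label][word] + 1) / (total + len(vocab))
    return score

def classify(text):
    # Whichever class has the greater score determines the label.
    return max(("spam", "ham"), key=lambda c: posterior_score(text, c))

print(classify("claim your free prize now"))  # expected: spam
print(classify("monday project meeting"))     # expected: ham
```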