Classification: Naïve Bayes
Haramaya University, College of Computing and Informatics
Department of Software Engineering
Mr. Dita Abdujebar (M.Sc.)

Outline
- What is Naïve Bayes
- Pros and cons of Naïve Bayes
- Probability theory
- Conditional probability
- Naïve Bayes classification

Probability Theory: Naïve Bayes
- With both kNN and decision trees we asked the classifier to make a hard decision: a single, definite answer to the question.
- It is often better to ask for the classifier's best guess about the class, together with a probability for that guess.
- Probability theory forms the basis for many machine learning algorithms, and it can help us classify things.

Probability Theory: Naïve Bayes
- Classifying with Bayesian decision theory:
  - Pros: works with a small amount of data; handles multiple classes.
  - Cons: sensitive to how the input data is prepared.
  - Works with: nominal values.

Probability Theory: Naïve Bayes
- Naïve Bayes is a subset of Bayesian decision theory.
- For this kind of problem a decision tree would not be very successful, and kNN would require a lot of calculation compared with a simple probability calculation.
- Conditional probability: P(gray | bucket B) = P(gray and bucket B) / P(bucket B)

Figure 1: Seven stones in two buckets

Probability Theory: Naïve Bayes
- Conditional probability: the probability of drawing a gray stone, given that the unknown stone comes from bucket B.
- P(gray | bucket B) = 1/3, P(gray | bucket A) = 2/4
- To formalize how to calculate the conditional probability, we can say:
  P(gray | bucket B) = P(gray and bucket B) / P(bucket B)
- P(gray and bucket B) = 1/7 (one gray stone in bucket B out of seven stones in total)
- P(bucket B) = 3/7 (three of the seven stones are in bucket B)
- P(gray | bucket B) = (1/7) / (3/7) = 1/3
- Another useful way to manipulate conditional probabilities is Bayes' rule. If we have P(x | c) but want P(c | x):
  P(c | x) = P(x | c) P(c) / P(x)

Classifying with Conditional Probabilities
- Bayesian decision theory tells us to compare two probabilities:
  - If p1(x, y) > p2(x, y), then the class is 1.
  - If p1(x, y) < p2(x, y), then the class is 2.
- What we really need to compare are p(c1 | x, y) and p(c2 | x, y): given a point (x, y), what is the probability it came from class c1? What is the probability it came from class c2?
- Posterior = (likelihood × prior) / evidence
- With these definitions, we can state the Bayesian classification rule:
  - If P(c1 | x, y) > P(c2 | x, y), the class is c1.
  - If P(c1 | x, y) < P(c2 | x, y), the class is c2.
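A minimal Python sketch of this classification rule, using the stone counts implied above (four stones in bucket A, two of them gray; three stones in bucket B, one of them gray): it applies Bayes' rule to ask which bucket a gray stone most likely came from. The variable names are illustrative only.

    # Bayes' rule on the two-bucket example: which bucket did a gray stone come from?
    # Counts follow from the slide: bucket A holds 4 stones (2 gray), bucket B holds 3 stones (1 gray).
    stones = {"A": {"total": 4, "gray": 2}, "B": {"total": 3, "gray": 1}}
    n_total = sum(b["total"] for b in stones.values())   # 7 stones overall

    posteriors = {}
    for bucket, counts in stones.items():
        prior = counts["total"] / n_total                # P(bucket)
        likelihood = counts["gray"] / counts["total"]    # P(gray | bucket)
        # Proportional to P(bucket | gray); the evidence P(gray) is the same for both buckets.
        posteriors[bucket] = likelihood * prior

    # Bayesian classification rule: pick the bucket with the larger posterior.
    best = max(posteriors, key=posteriors.get)
    print(posteriors)                                    # {'A': 0.2857..., 'B': 0.1428...}
    print("A gray stone most likely came from bucket", best)   # -> A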
Uses of Naïve Bayes Classification
- Applications of Naïve Bayes:
  - Text classification
  - Spam filtering
  - Hybrid recommender systems (collaborative and content-based filtering)
  - Online applications
- Bayesian reasoning is applied to decision making and to inferential statistics that deal with probability inference. It uses knowledge of prior events to predict future events.

Example One
Figure 2: Example training data (14 training tuples; class attribute buys_computer)
- X = (age = youth, income = medium, student = yes, credit_rating = fair)
- Will a person described by tuple X buy a computer?

Example One
- Maximum a posteriori (MAP) hypothesis: P(Ci | X) = P(X | Ci) P(Ci) / P(X)
- Since P(X) is constant for all classes, it is enough to maximize P(X | Ci) P(Ci).

Example One
- P(C1 = yes) = P(buys_computer = yes) = 9/14 = 0.643
- P(C2 = no) = P(buys_computer = no) = 5/14 = 0.357
- P(age = youth | buys_computer = yes) = 2/9 = 0.222
- P(age = youth | buys_computer = no) = 3/5 = 0.600
- P(income = medium | buys_computer = yes) = 4/9 = 0.444
- P(income = medium | buys_computer = no) = 2/5 = 0.400
- P(student = yes | buys_computer = yes) = 6/9 = 0.667
- P(student = yes | buys_computer = no) = 1/5 = 0.200
- P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
- P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

Example One
- P(X | buys_computer = yes) = P(age = youth | yes) × P(income = medium | yes) × P(student = yes | yes) × P(credit_rating = fair | yes)
  = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
- P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019

Example One
- Find the class Ci that maximizes P(X | Ci) × P(Ci):
  - P(X | buys_computer = yes) × P(buys_computer = yes) = 0.044 × 0.643 = 0.028
  - P(X | buys_computer = no) × P(buys_computer = no) = 0.019 × 0.357 = 0.007
- Prediction for tuple X: buys_computer = yes (X will buy a computer).

Example Two
- Consider a set of documents, each of which is related either to Sports (S) or to Informatics (I). Given a training set of 11 documents, we would like to estimate a Naïve Bayes classifier, using the Bernoulli document model, to classify unlabelled documents as S or I.
- We define a vocabulary of eight words, w1 … w8.

Example Two
- Types of Naïve Bayes: Gaussian, multinomial, and Bernoulli. This example uses the Bernoulli document model.

Figure 3: Vocabulary of eight words (w1 … w8)

Example Two
- Each document is therefore represented as an 8-dimensional binary vector.
- The training data form a matrix for each class, in which each row is an 8-dimensional document vector (6 rows for S, 5 rows for I).

Example Two
- Classify the following into Sports or Informatics using a Naïve Bayes classifier:
  - b1 = (1, 0, 0, 1, 1, 1, 0, 1) → S or I?
  - b2 = (0, 1, 1, 0, 1, 0, 1, 0) → S or I?

Example Two
- The total number of documents in the training set is N = 11; NS = 6, NI = 5.
- We can estimate the prior probabilities from the training data as: P(S) = 6/11, P(I) = 5/11.

Example Two
- The word counts in the training data (the number of documents of each class that contain word wt) are:
  - nS(wt) = (3, 1, 2, 3, 3, 4, 4, 4)
  - nI(wt) = (1, 3, 3, 1, 1, 1, 3, 1)

Example Two
- We can estimate the word likelihoods using P(wt | C) = nC(wt) / NC, the fraction of class-C documents that contain word wt:
  - For class S: P(wt | S) = (1/2, 1/6, 1/3, 1/2, 1/2, 2/3, 2/3, 2/3)
  - For class I: P(wt | I) = (1/5, 3/5, 3/5, 1/5, 1/5, 1/5, 3/5, 1/5)

Example Two
- We now compute the posterior probabilities of the two test vectors and hence classify them. In the Bernoulli model, a word that is absent (bt = 0) contributes the factor 1 − P(wt | C).
- b1 = (1, 0, 0, 1, 1, 1, 0, 1)
- P(S | b1) ∝ P(b1 | S) × P(S) = (1/2 × 5/6 × 2/3 × 1/2 × 1/2 × 2/3 × 1/3 × 2/3) × (6/11) = 5/891 ≈ 5.6 × 10⁻³
- P(I | b1) ∝ P(b1 | I) × P(I) = (1/5 × 2/5 × 2/5 × 1/5 × 1/5 × 1/5 × 2/5 × 1/5) × (5/11) = 8/859375 ≈ 9.3 × 10⁻⁶
- Classify this document as S.

Example Two
- b2 = (0, 1, 1, 0, 1, 0, 1, 0)
- P(S | b2) ∝ P(b2 | S) × P(S) = (1/2 × 1/6 × 1/3 × 1/2 × 1/2 × 1/3 × 2/3 × 1/3) × (6/11) = 12/42768 ≈ 2.8 × 10⁻⁴
- P(I | b2) ∝ P(b2 | I) × P(I) = (4/5 × 3/5 × 3/5 × 4/5 × 1/5 × 4/5 × 3/5 × 4/5) × (5/11) = 34560/4296875 ≈ 8.0 × 10⁻³
- Classify this document as I.

Naïve Bayes: Syntax
- Import the class containing the classification method, then fit it to the training data and predict, as in the sketch below.
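A minimal scikit-learn sketch of this pattern, assuming scikit-learn is the intended library; BernoulliNB corresponds to the Bernoulli document model used in Example Two. The small binary matrix X_train and labels y_train below are placeholder data for illustration only, not the actual 11-document training set.

    # Minimal scikit-learn usage sketch (assumes scikit-learn is installed).
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

    # Placeholder binary document vectors and class labels, for illustration only.
    X_train = np.array([[1, 0, 0, 1, 1, 1, 0, 1],
                        [0, 1, 1, 0, 1, 0, 1, 0],
                        [1, 0, 1, 1, 0, 1, 0, 1],
                        [0, 1, 1, 0, 0, 0, 1, 0]])
    y_train = np.array(["S", "I", "S", "I"])

    # alpha is the Laplace smoothing parameter (the hand calculation above uses unsmoothed counts).
    clf = BernoulliNB(alpha=1.0)
    clf.fit(X_train, y_train)

    b1 = np.array([[1, 0, 0, 1, 1, 1, 0, 1]])
    print(clf.predict(b1))          # predicted class label for the test document
    print(clf.predict_proba(b1))    # posterior probabilities P(class | b1)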
Summary
- Using probabilities can sometimes be more effective than using hard rules for classification.
- Bayesian probability and Bayes' rule give us a way to estimate unknown probabilities from known values.
- You can reduce the need for a lot of data by assuming conditional independence among the features in your data. For text, the assumption is that the probability of one word does not depend on any other word in the document.

Summary
- Despite its independence assumption being incorrect in practice, Naïve Bayes is effective at classification.
- Underflow is one problem that can be addressed by using the logarithm of probabilities in your calculations.

Question & Answer

Thank You!!!

Assignment Three
- Predict the outcome for the following tuple:
  x′ = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)
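On the underflow point from the summary: multiplying many small probabilities (as in the worked examples above, or in the assignment) can underflow to zero, so in practice the products are replaced by sums of log-probabilities. A minimal sketch using the Example One numbers; the variable names are illustrative only.

    import math

    # Class-conditional probabilities and priors taken from Example One.
    likelihoods_yes = [0.222, 0.444, 0.667, 0.667]   # P(feature value | buys_computer = yes)
    likelihoods_no  = [0.600, 0.400, 0.200, 0.400]   # P(feature value | buys_computer = no)
    prior_yes, prior_no = 0.643, 0.357

    # Work in log space: log(a*b) = log(a) + log(b), which avoids underflow
    # when many small probabilities are multiplied together.
    log_score_yes = math.log(prior_yes) + sum(math.log(p) for p in likelihoods_yes)
    log_score_no  = math.log(prior_no)  + sum(math.log(p) for p in likelihoods_no)

    print(log_score_yes, log_score_no)               # approx. -3.57 vs. -4.98
    prediction = "yes" if log_score_yes > log_score_no else "no"
    print("buys_computer =", prediction)             # -> yes, matching the hand calculation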