Course: M0614 / Data Mining & OLAP
Year: Feb 2010

Classification and Prediction
Session 08

Learning Outcomes
At the end of this session, students are expected to be able to:
• Apply the analysis techniques of classification by decision tree induction, Bayesian classification, classification by back propagation, and lazy learners in data mining. (C3)

Acknowledgments
These slides have been adapted from Han, J., Kamber, M., & Pei, J., Data Mining: Concepts and Techniques, and Tan, P.-N., Steinbach, M., & Kumar, V., Introduction to Data Mining.

Outline
• Bayesian classification

Bayesian Classification: Why?
• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
• Foundation: based on Bayes' theorem.
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers.
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.
• Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

Bayes' Theorem: Basics
• Let X be a data sample ("evidence") whose class label is unknown.
• Let H be the hypothesis that X belongs to class C.
• Classification is to determine P(H|X), the posterior probability that the hypothesis holds given the observed data sample X.
• P(H) is the prior probability, the initial probability of the hypothesis.
  – E.g., X will buy a computer, regardless of age, income, …
• P(X) is the probability that the sample data is observed.
• P(X|H) is the likelihood, the probability of observing sample X given that the hypothesis holds.
  – E.g., given that X will buy a computer, the probability that X is 31..40 with medium income.

Bayes' Theorem
• Given training data X, the posterior probability of a hypothesis H follows Bayes' theorem:
  P(H|X) = P(X|H) P(H) / P(X)
• Informally: posterior = likelihood × prior / evidence.
• Predict that X belongs to class Ci iff P(Ci|X) is the highest among all P(Ck|X) over the k classes.
• Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost.

Example of Bayes' Theorem
• Given:
  – A doctor knows that meningitis causes a stiff neck 50% of the time.
  – The prior probability of any patient having meningitis is 1/50,000.
  – The prior probability of any patient having a stiff neck is 1/20.
• If a patient has a stiff neck, what is the probability that he or she has meningitis?
  P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
  (A short code sketch of this calculation appears after the next slide.)

Bayesian Classifiers
• Consider each attribute and the class label as random variables.
• Given a record with attributes (A1, A2, …, An):
  – The goal is to predict class C.
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An).
• Can we estimate P(C | A1, A2, …, An) directly from the data?
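A minimal sketch of the meningitis calculation above in plain Python (added for illustration; the variable names are my own, while the probability values are those stated on the slide):

    # Bayes' theorem: P(M|S) = P(S|M) * P(M) / P(S)
    p_s_given_m = 0.5        # P(stiff neck | meningitis)
    p_m = 1.0 / 50000        # prior P(meningitis)
    p_s = 1.0 / 20           # prior P(stiff neck)

    p_m_given_s = p_s_given_m * p_m / p_s
    print("P(meningitis | stiff neck) =", p_m_given_s)   # 0.0002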
Bayesian Classifiers
• Approach:
  – Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem:
    P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)
  – Choose the value of C that maximizes P(C | A1, A2, …, An).
  – This is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C), since the denominator does not depend on C.
• How do we estimate P(A1, A2, …, An | C)?

Naïve Bayes Classifier
• Assume independence among the attributes Ai when the class is given:
  – P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
  – We can then estimate P(Ai | Cj) for every Ai and Cj.
  – A new point is classified as Cj if P(Cj) ∏i P(Ai | Cj) is maximal.

How to Estimate Probabilities from Data?
Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class):

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

• Class priors: P(C) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10
• For discrete attributes: P(Ai | Ck) = |Aik| / Nck
  – where |Aik| is the number of instances with attribute value Ai that belong to class Ck, and Nck is the number of instances in class Ck.
  – Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0

How to Estimate Probabilities from Data?
• For continuous attributes:
  – Discretize the range into bins:
    • one ordinal attribute per bin
    • violates the independence assumption
  – Two-way split: (A < v) or (A > v)
    • choose only one of the two splits as a new attribute
  – Probability density estimation:
    • assume the attribute follows a normal distribution
    • use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    • once the probability distribution is known, use it to estimate the conditional probability P(Ai | c)
(A short code sketch of these estimates follows below.)
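A minimal Python sketch (added for illustration; the variable and function names are my own, not from the slides) of estimating the class priors, a discrete conditional probability, and a normal-density estimate for Taxable Income from the training table above. The printed values match the worked numbers on these and the following slides.

    import math

    # Training records from the table above: (Refund, Marital Status, Income, Evade)
    records = [
        ("Yes", "Single",   125, "No"),  ("No",  "Married",  100, "No"),
        ("No",  "Single",    70, "No"),  ("Yes", "Married",  120, "No"),
        ("No",  "Divorced",  95, "Yes"), ("No",  "Married",   60, "No"),
        ("Yes", "Divorced", 220, "No"),  ("No",  "Single",    85, "Yes"),
        ("No",  "Married",   75, "No"),  ("No",  "Single",    90, "Yes"),
    ]

    # Class prior: P(C) = Nc / N
    def prior(c):
        return sum(r[3] == c for r in records) / len(records)

    # Discrete conditional: P(attribute = value | class) = |Aik| / Nck
    def cond(attr_idx, value, c):
        in_class = [r for r in records if r[3] == c]
        return sum(r[attr_idx] == value for r in in_class) / len(in_class)

    # Continuous conditional via a normal density fitted to the class sample
    def gaussian_cond(x, c):
        vals = [r[2] for r in records if r[3] == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)  # sample variance
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    print(prior("No"))               # 0.7
    print(cond(1, "Married", "No"))  # 4/7 ≈ 0.571
    print(gaussian_cond(120, "No"))  # ≈ 0.0072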
How to Estimate Probabilities from Data?
Using the same training data as above, a continuous attribute is modeled with a normal distribution:
• Normal distribution:
  P(Ai | cj) = (1 / (sqrt(2π) σij)) exp( −(Ai − μij)² / (2 σij²) )
  – one distribution for each (Ai, cj) pair.
• For (Income, Class=No):
  – sample mean = 110, sample variance = 2975 (so σ = 54.54)
  – P(Income = 120 | No) = (1 / (sqrt(2π) × 54.54)) exp( −(120 − 110)² / (2 × 2975) ) = 0.0072

Example of Naïve Bayes Classifier
Given a test record X = (Refund = No, Marital Status = Married, Income = 120K).

Naïve Bayes probabilities estimated from the training data:
  P(Refund=Yes | No) = 3/7              P(Refund=No | No) = 4/7
  P(Refund=Yes | Yes) = 0               P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7   P(Marital Status=Divorced | No) = 1/7
  P(Marital Status=Married | No) = 4/7
  P(Marital Status=Single | Yes) = 2/3  P(Marital Status=Divorced | Yes) = 1/3
  P(Marital Status=Married | Yes) = 0
  Taxable Income: if class = No, sample mean = 110 and sample variance = 2975;
  if class = Yes, sample mean = 90 and sample variance = 25.

• P(X | Class=No) = P(Refund=No | No) × P(Married | No) × P(Income=120K | No)
                  = 4/7 × 4/7 × 0.0072 = 0.0024
• P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes)
                   = 1 × 0 × 1.2×10⁻⁹ = 0
• Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X), so the predicted class is No.

Naïve Bayes Classifier
• If one of the conditional probabilities is zero, the entire product becomes zero.
• Probability estimation alternatives:
  – Original:   P(Ai | C) = Nic / Nc
  – Laplace:    P(Ai | C) = (Nic + 1) / (Nc + c)
  – m-estimate: P(Ai | C) = (Nic + m p) / (Nc + m)
  where c is the number of classes, p is a prior probability, and m is a parameter.

Example of Naïve Bayes Classifier

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

Test record: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?
(A code sketch of this classification follows; the hand calculation is on the next slide.)
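Before the hand calculation on the next slide, here is a minimal Python sketch of the same naïve Bayes classification (added for illustration; the data layout and names are my own). It uses plain frequency estimates over the animal table above and reproduces the values derived on the next slide.

    # Animal training data: (give_birth, can_fly, live_in_water, have_legs, class)
    data = [
        ("yes", "no",  "no",        "yes", "mammals"),
        ("no",  "no",  "no",        "no",  "non-mammals"),
        ("no",  "no",  "yes",       "no",  "non-mammals"),
        ("yes", "no",  "yes",       "no",  "mammals"),
        ("no",  "no",  "sometimes", "yes", "non-mammals"),
        ("no",  "no",  "no",        "yes", "non-mammals"),
        ("yes", "yes", "no",        "yes", "mammals"),
        ("no",  "yes", "no",        "yes", "non-mammals"),
        ("yes", "no",  "no",        "yes", "mammals"),
        ("yes", "no",  "yes",       "no",  "non-mammals"),
        ("no",  "no",  "sometimes", "yes", "non-mammals"),
        ("no",  "no",  "sometimes", "yes", "non-mammals"),
        ("yes", "no",  "no",        "yes", "mammals"),
        ("no",  "no",  "yes",       "no",  "non-mammals"),
        ("no",  "no",  "sometimes", "yes", "non-mammals"),
        ("no",  "no",  "no",        "yes", "non-mammals"),
        ("no",  "no",  "no",        "yes", "mammals"),
        ("no",  "yes", "no",        "yes", "non-mammals"),
        ("yes", "no",  "yes",       "no",  "mammals"),
        ("no",  "yes", "no",        "yes", "non-mammals"),
    ]
    test = ("yes", "no", "yes", "no")  # give birth, can fly, live in water, have legs

    def score(cls):
        rows = [r for r in data if r[4] == cls]
        result = len(rows) / len(data)                 # prior P(class)
        for i, value in enumerate(test):               # × P(Ai | class), frequency estimate
            result *= sum(r[i] == value for r in rows) / len(rows)
        return result

    for cls in ("mammals", "non-mammals"):
        print(cls, score(cls))   # mammals ≈ 0.021, non-mammals ≈ 0.0027
    # The larger score wins, so the record is classified as a mammal.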
Example of Naïve Bayes Classifier (continued)
Let A denote the attributes of the test record, M = mammals, N = non-mammals.
• P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
• P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
• P(A | M) P(M) = 0.06 × 7/20 = 0.021
• P(A | N) P(N) = 0.0042 × 13/20 = 0.0027
• Since P(A | M) P(M) > P(A | N) P(N), the test record is classified as a mammal.

Example Naïve Bayesian Classifier: Training Dataset
Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Example Naïve Bayesian Classifier: Calculation
• Class priors P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357
• Compute P(X | Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
• For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
• P(X | Ci) P(Ci):
  P(X | buys_computer = "yes") P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") P(buys_computer = "no") = 0.007
• Therefore, X belongs to the class buys_computer = "yes".
  (A code sketch of this calculation appears at the end of these notes.)

Naïve Bayes: Summary
• Robust to isolated noise points.
• Handles missing values by ignoring the instance during probability estimation.
• Robust to irrelevant attributes.
• The independence assumption may not hold for some attributes.
  – Use other techniques such as Bayesian Belief Networks (BBN).

Naïve Bayesian Classifier: Comments
• Advantages:
  – Easy to implement.
  – Good results obtained in most cases.
• Disadvantages:
  – Assumes class-conditional independence, which can cost accuracy.
  – In practice, dependencies exist among variables.
    • E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.).
    • Dependencies among these cannot be modeled by the naïve Bayesian classifier.
• How to deal with these dependencies? Bayesian Belief Networks.

Continued in Session 09: Classification and Prediction (cont.)
Bina Nusantara
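As referenced in the buys_computer calculation above, here is a minimal Python sketch of that naïve Bayes computation (added for illustration; the data layout and names are my own, while the records are the training table above). It reproduces the scores 0.028 and 0.007.

    # buys_computer training data: (age, income, student, credit_rating, buys_computer)
    data = [
        ("<=30",   "high",   "no",  "fair",      "no"),
        ("<=30",   "high",   "no",  "excellent", "no"),
        ("31..40", "high",   "no",  "fair",      "yes"),
        (">40",    "medium", "no",  "fair",      "yes"),
        (">40",    "low",    "yes", "fair",      "yes"),
        (">40",    "low",    "yes", "excellent", "no"),
        ("31..40", "low",    "yes", "excellent", "yes"),
        ("<=30",   "medium", "no",  "fair",      "no"),
        ("<=30",   "low",    "yes", "fair",      "yes"),
        (">40",    "medium", "yes", "fair",      "yes"),
        ("<=30",   "medium", "yes", "excellent", "yes"),
        ("31..40", "medium", "no",  "excellent", "yes"),
        ("31..40", "high",   "yes", "fair",      "yes"),
        (">40",    "medium", "no",  "excellent", "no"),
    ]
    x = ("<=30", "medium", "yes", "fair")   # sample to classify

    def posterior_score(cls):
        rows = [r for r in data if r[4] == cls]
        result = len(rows) / len(data)                 # prior P(Ci)
        for i, value in enumerate(x):                  # × P(Ai | Ci), frequency estimate
            result *= sum(r[i] == value for r in rows) / len(rows)
        return result

    for cls in ("yes", "no"):
        print("buys_computer =", cls, ":", posterior_score(cls))   # yes ≈ 0.028, no ≈ 0.007
    # The "yes" score is larger, so X is predicted to buy a computer.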