Classification
A task of induction to find patterns
CSE 591: Data Mining, by H. Liu, 1/21/02

Outline
❚ Data and its format
❚ Problem of Classification
❚ Learning a classifier
❚ Different approaches
❚ Key issues

Data and its format
❚ Data
❙ attribute-value pairs
❙ with/without class
❚ Data type
❙ continuous/discrete
❙ nominal
❚ Data format
❙ flat

Sample data
Outlook   Temp   Humidity   Windy   Class
Sunny     Hot    High       No      Yes
Sunny     Hot    High       Yes     Yes
O'cast    Hot    High       No      No
Rain      Mild   Normal     No      No
Rain      Cool   Normal     No      No
Rain      Cool   Normal     Yes     Yes
O'cast    Cool   Normal     Yes     No
Sunny     Mild   High       No      Yes
Sunny     Cool   Normal     No      No
Rain      Mild   Normal     No      No
Sunny     Mild   Normal     Yes     No
O'cast    Mild   High       Yes     No
O'cast    Hot    Normal     No      No
Rain      Mild   High       Yes     Yes

Induction from databases
❚ Inferring knowledge from data
❚ In contrast, the task of deduction
❙ infer information that is a logical consequence of querying a database
❘ Who conducted this class before?
❘ Which courses are attended by Mary?
❚ Deductive databases: extending the RDBMS

Classification
❚ It is one type of induction
❙ data with class labels
❚ Examples
❙ If weather is rainy then no golf
❙ If …
❙ If …

Different approaches
❚ There exist many techniques
❙ Decision trees
❙ Neural networks
❙ K-nearest neighbors
❙ Naïve Bayesian classifiers
❙ Support vector machines
❙ and many more ...

A decision tree
Outlook?
  sunny    → Humidity?
               high   → NO
               normal → YES
  overcast → YES
  rain     → Wind?
               strong → NO
               weak   → YES

Inducing a decision tree
❚ There are many possible trees
❙ let's try it on the golfing data
❚ How to find the most compact one
❙ that is consistent with the data?
❚ Why the most compact?
❙ Occam's razor principle
❚ Issue of efficiency w.r.t. optimality

Information gain
❚ Entropy: Entropy(S) = − Σi pi log pi, where pi is the proportion of class i in node S and Σi pi = 1
❚ Information gain: the difference in entropy between a node and its children after splitting on attribute A:
  Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) Entropy(Sv), summing over the values v of A

Building a compact tree
❚ The key to building a decision tree: which attribute to choose in order to branch
❚ The heuristic is to choose the attribute with the maximum information gain
❚ Another way to see it: reduce uncertainty as much as possible

Learn a decision tree
❚ Applied to the golfing data, this heuristic induces the decision tree shown earlier: split on Outlook first, then on Humidity under sunny and on Wind under rain

K-Nearest Neighbor
❚ One of the most intuitive classification algorithms
❚ An unseen instance's class is determined by its nearest neighbor
❚ The problem: a single nearest neighbor is sensitive to noise
❚ Instead of using one neighbor, we can use k neighbors and take a majority vote

K-NN
❚ New problems
❙ lazy learning: no model is built until a query arrives
❙ large storage: all training instances must be kept
❚ An example
❚ How good is k-NN?

Naïve Bayes Classifier
❚ This is a direct application of Bayes' rule:
  P(C|X) = P(X|C) P(C) / P(X), where X is a vector (x1, x2, …, xn)
❚ That is the best classifier you can build, provided the probabilities are known exactly
❚ But there are problems: estimating P(X|C) directly is infeasible for many attributes

NBC (2)
❚ Assume conditional independence between the xi's given the class
❚ We then have P(C|X) ∝ P(C) Πi P(xi|C)
❚ An example
❚ How good is it in reality?

Classification via Neural Networks
[Figure: a perceptron: the inputs feed into a weighted sum (∑), which is passed through a squashing function]
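To make the figure concrete, here is a minimal sketch in Python (our choice of language; the slides contain no code) of what a perceptron computes: the ∑ step projects the input onto the weight vector, and the squash step thresholds the result. The weight vector W, input L, and threshold are the values that appear on the Linear threshold unit slide below.

# A minimal sketch of a perceptron's forward computation:
# the ∑ step (weighted sum) followed by the squash step (hard threshold).

def perceptron(inputs, weights, threshold=0.5):
    activation = sum(w * x for w, x in zip(weights, inputs))  # ∑: project input onto W
    return 1 if activation > threshold else 0                 # squash: compare with threshold

# The W, L, and threshold values from the "Linear threshold unit" slide below:
W = [0.11, 0.6]
L = [0.7, 0.7]
print(perceptron(L, W))  # W·L = 0.497, just below the 0.5 threshold, so the output is 0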
What can a perceptron do?
❚ A neuron as a computing device
❚ It can separate linearly separable points
❚ Nice things about a perceptron
❙ distributed representation
❙ local learning
❙ weight adjusting

Linear threshold unit
❚ Basic concepts: projection, thresholding
❚ An input evokes output 1 when its projection onto the weight vector W crosses the threshold
[Figure: W = [.11 .6], L = [.7 .7], threshold .5; vectors on one side of the threshold line evoke 1]

Eg 1: solution region for AND problem
❚ Find a weight vector that satisfies all the constraints
AND problem:
x1  x2 | y
0   0  | 0
0   1  | 0
1   0  | 0
1   1  | 1

Eg 2: solution region for XOR problem?
XOR problem:
x1  x2 | y
0   0  | 0
0   1  | 1
1   0  | 1
1   1  | 0
❚ No weight vector satisfies all four constraints: XOR is not linearly separable, so a single perceptron cannot solve it

Learning by error reduction
❚ Perceptron learning algorithm
❙ If the activation level of the output unit is 1 when it should be 0, reduce the weight on the link to the ith input unit by r*Li, where Li is the ith input value and r a learning rate
❙ If the activation level of the output unit is 0 when it should be 1, increase the weight on the link to the ith input unit by r*Li
❙ Otherwise, do nothing
❙ (a code sketch of this rule, applied to the AND problem of Eg 1, follows at the end)

Multi-layer perceptrons
❚ Using the chain rule, we can backpropagate the errors for a multi-layer perceptron
[Figure: a multi-layer perceptron with an input layer, a hidden layer, and an output layer]
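To close, here is a minimal sketch of the perceptron learning algorithm from the Learning by error reduction slide, trained on the AND problem of Eg 1. The bias weight (an extra input fixed at 1, letting the threshold itself be learned), the learning rate r = 0.1, and the epoch count are assumptions added to make the sketch runnable; the slides specify only the per-weight update rule.

# A minimal sketch of the perceptron learning rule from the
# "Learning by error reduction" slide, trained on the AND problem.
# The bias weight, learning rate, and epoch count are assumptions.

def fire(weights, x):
    # Output 1 iff the weighted sum reaches the (learned) threshold of 0.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def train(examples, r=0.1, epochs=50):
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)          # one weight per input, plus the bias
    for _ in range(epochs):            # 50 epochs is ample for this data
        for inputs, target in examples:
            x = inputs + [1]           # bias input fixed at 1 (assumption)
            out = fire(weights, x)
            if out == 1 and target == 0:
                # Fired when it should not have: reduce each weight by r*Li
                weights = [w - r * xi for w, xi in zip(weights, x)]
            elif out == 0 and target == 1:
                # Stayed silent when it should have fired: increase by r*Li
                weights = [w + r * xi for w, xi in zip(weights, x)]
            # Otherwise the output was correct: do nothing.
    return weights

# The AND problem from Eg 1: only (1, 1) maps to 1.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train(AND)
print([fire(w, x + [1]) for x, _ in AND])   # prints [0, 0, 0, 1]

Running the same loop on the XOR examples of Eg 2 never settles on correct weights, however many epochs it is given: as the XOR slide shows, no single weight vector satisfies all four constraints.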