
Classification
A task of induction to find patterns
CSE 591: Data Mining by H. Liu, 1/21/02
Outline
❚ Data and its format
❚ Problem of Classification
❚ Learning a classifier
❚ Different approaches
❚ Key issues
Data and its format
❚ Data
❙ attribute-value pairs
❙ with/without class
❚ Data type
❙ continuous/discrete
❙ nominal
❚ Data format
❙ flat (a single table of attribute-value records)
Sample data

Outlook  Temp  Humidity  Windy  Class
Sunny    Hot   High      No     Yes
Sunny    Hot   High      Yes    Yes
O’cast   Hot   High      No     No
Rain     Mild  Normal    No     No
Rain     Cool  Normal    No     No
Rain     Cool  Normal    Yes    Yes
O’cast   Cool  Normal    Yes    No
Sunny    Mild  High      No     Yes
Sunny    Cool  Normal    No     No
Rain     Mild  Normal    No     No
Sunny    Mild  Normal    Yes    No
O’cast   Mild  High      Yes    No
O’cast   Hot   Normal    No     No
Rain     Mild  High      Yes    Yes
Induction from databases
❚ Inferring knowledge from data
❚ Contrast this with the task of deduction
❙ infer information that is a logical consequence of what is already in the database, obtained by querying it
❘ Who conducted this class before?
❘ Which courses are attended by Mary?
❚ Deductive databases: extending the RDBMS
Classification
❚ It is one type of induction
❙ data with class labels
❚ Examples
❙ If weather is rainy then no golf
❙ If ...
❙ If ...
Different approaches
❚ There exist many techniques
❙ Decision trees
❙ Neural networks
❙ K-nearest neighbors
❙ Naïve Bayesian classifiers
❙ Support Vector Machines
❙ and many more ...
A decision tree

Outlook
├─ sunny → Humidity
│   ├─ high → NO
│   └─ normal → YES
├─ overcast → YES
└─ rain → Wind
    ├─ strong → NO
    └─ weak → YES
Inducing a decision tree
❚ There are many possible trees
❙ let’s try it on the golfing data
❚ How to find the most compact one
❙ that is consistent with the data?
❚ Why the most compact?
❙ Occam’s razor principle
❚ Issue of efficiency w.r.t. optimality
Information gain
❚ Entropy: H(T) = − Σ_i p_i log p_i, where Σ_i p_i = 1 and p_i is the proportion of examples in class i
❚ Information gain: the difference between the entropy of the node before splitting and the weighted entropy of its children after splitting on an attribute A
❙ IG(A) = H(T) − Σ_i (|T_i|/|T|) H(T_i), where T_i holds the examples taking the i-th value of A
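A small Python sketch of both formulas, run on the golf data from the sample slide (base-2 logarithms are an assumption, and the helper names entropy and info_gain are hypothetical, not from the course):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # H(T) = -sum_i p_i log2 p_i over the class distribution of `labels`
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, labels, attr):
        # entropy before the split minus the weighted entropy of the subsets
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        n = len(labels)
        weighted = sum(len(g) / n * entropy(g) for g in groups.values())
        return entropy(labels) - weighted

    # golf data from the sample slide: (Outlook, Temp, Humidity, Windy) and Class
    rows = [
        ("Sunny", "Hot", "High", "No"),      ("Sunny", "Hot", "High", "Yes"),
        ("O'cast", "Hot", "High", "No"),     ("Rain", "Mild", "Normal", "No"),
        ("Rain", "Cool", "Normal", "No"),    ("Rain", "Cool", "Normal", "Yes"),
        ("O'cast", "Cool", "Normal", "Yes"), ("Sunny", "Mild", "High", "No"),
        ("Sunny", "Cool", "Normal", "No"),   ("Rain", "Mild", "Normal", "No"),
        ("Sunny", "Mild", "Normal", "Yes"),  ("O'cast", "Mild", "High", "Yes"),
        ("O'cast", "Hot", "Normal", "No"),   ("Rain", "Mild", "High", "Yes"),
    ]
    labels = ["Yes", "Yes", "No", "No", "No", "Yes", "No",
              "Yes", "No", "No", "No", "No", "No", "Yes"]

    for i, name in enumerate(["Outlook", "Temp", "Humidity", "Windy"]):
        print(name, round(info_gain(rows, labels, i), 3))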
Building a compact tree
❚ The key to building a decision tree is which attribute to choose in order to branch.
❚ The heuristic is to choose the attribute with the maximum information gain.
❚ Equivalently: reduce the uncertainty about the class as much as possible at each split.
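A minimal recursive sketch of this greedy procedure, in the style of ID3 (the function names and the stopping rules shown are an illustrative outline, not code from the course; attributes are referenced by column index):

    from collections import Counter
    from math import log2

    def entropy(ys):
        n = len(ys)
        return -sum(c / n * log2(c / n) for c in Counter(ys).values())

    def best_attr(rows, ys, attrs):
        # greedy step: the attribute whose split gives the largest information gain
        def gain(a):
            groups = {}
            for r, y in zip(rows, ys):
                groups.setdefault(r[a], []).append(y)
            return entropy(ys) - sum(len(g) / len(ys) * entropy(g)
                                     for g in groups.values())
        return max(attrs, key=gain)

    def id3(rows, ys, attrs):
        if len(set(ys)) == 1:            # pure node: predict its class
            return ys[0]
        if not attrs:                    # no attributes left: majority class
            return Counter(ys).most_common(1)[0][0]
        a = best_attr(rows, ys, attrs)
        branches = {}
        for v in {r[a] for r in rows}:   # one subtree per value of the attribute
            sub = [(r, y) for r, y in zip(rows, ys) if r[a] == v]
            srows, slabels = [s[0] for s in sub], [s[1] for s in sub]
            branches[v] = id3(srows, slabels, [b for b in attrs if b != a])
        return (a, branches)

    # usage with the golf data above: tree = id3(rows, labels, [0, 1, 2, 3])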
Learn a decision tree
❚ Applying this procedure to the golf data yields the decision tree shown earlier: Outlook at the root, Humidity under sunny, Wind under rain.
K-Nearest Neighbor
❚ One of the most intuitive classification algorithms
❚ An unseen instance’s class is determined by its nearest neighbor
❚ The problem is that it is sensitive to noise
❚ Instead of using one neighbor, we can use k neighbors and take the majority class
K-NN
❚ New problems
❙ lazy learning: no model is built; all the work is deferred to prediction time
❙ large storage: every training instance must be kept
❚ An example (see the sketch below)
❚ How good is k-NN?
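The slide’s own example did not survive extraction, so here is a minimal k-NN sketch (Euclidean distance over numeric features and majority voting are assumptions; the lecture may have used a different distance):

    from collections import Counter

    def knn_predict(train, query, k=3):
        # classify `query` by majority vote among its k nearest training points;
        # `train` is a list of (feature_vector, label) pairs
        def dist2(a, b):  # squared Euclidean distance (ranking needs no sqrt)
            return sum((x - y) ** 2 for x, y in zip(a, b))
        nearest = sorted(train, key=lambda pair: dist2(pair[0], query))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    # toy usage
    train = [((0.0, 0.0), "No"), ((0.1, 0.2), "No"),
             ((1.0, 1.1), "Yes"), ((0.9, 1.0), "Yes")]
    print(knn_predict(train, (0.8, 0.9), k=3))   # -> "Yes"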
Naïve Bayes Classifier
❚ This is a direct application of Bayes’ rule
❙ P(C|X) = P(X|C)P(C)/P(X), where X is a vector of attribute values (x1, x2, …, xn)
❚ With the true probabilities, that’s the best classifier you can build
❚ But, there are problems: P(X|C) over whole vectors X cannot be estimated reliably from limited data
NBC (2)
❚ Assume conditional independence between the xi’s given the class
❚ We then have: P(C|X) ∝ P(C) Π_i P(xi|C)
❚ An example (see the sketch below)
❚ How good is it in reality?
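A minimal sketch of the resulting classifier for categorical data (the Laplace smoothing and the toy data are assumptions, added so an unseen attribute value does not zero out the whole product):

    from collections import Counter, defaultdict

    def train_nbc(rows, labels):
        # estimate P(C) and P(xi|C) by counting, with per-attribute vocabularies
        prior = Counter(labels)
        cond = defaultdict(Counter)   # (attr index, class) -> value counts
        vocab = defaultdict(set)      # attr index -> distinct values seen
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                cond[(i, c)][v] += 1
                vocab[i].add(v)
        return prior, cond, vocab

    def predict_nbc(model, row):
        prior, cond, vocab = model
        n = sum(prior.values())
        scores = {}
        for c, nc in prior.items():
            p = nc / n                # P(C)
            for i, v in enumerate(row):
                # Laplace-smoothed P(xi|C): (count + 1) / (nc + |values of xi|)
                p *= (cond[(i, c)][v] + 1) / (nc + len(vocab[i]))
            scores[c] = p             # proportional to P(C|X)
        return max(scores, key=scores.get)

    rows = [("Sunny", "Hot"), ("Sunny", "Mild"), ("Rain", "Cool"), ("Rain", "Mild")]
    labels = ["Yes", "Yes", "No", "No"]
    model = train_nbc(rows, labels)
    print(predict_nbc(model, ("Sunny", "Cool")))   # -> "Yes" for this toy data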
Classification via Neural Networks

[Figure: a perceptron; the weighted inputs are summed (Σ) and the sum is passed through a squashing function to produce the output]
What can a perceptron do?
❚ Neuron as a computing device
❚ To separate linearly separable points
❚ Nice things about a perceptron
❙ distributed representation
❙ local learning
❙ weight adjusting
Linear threshold unit
❚ Basic concepts: projection, thresholding
❙ the unit projects the input onto the weight vector and outputs 1 when the projection exceeds the threshold
[Figure: weight vector W = [.11 .6], input L = [.7 .7], threshold .5; input vectors whose projection onto W exceeds the threshold evoke 1]
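The figure’s numbers, checked in a few lines of Python (outputting 0 exactly at the threshold is an assumed convention):

    # linear threshold unit: output 1 iff the projection W.L exceeds the threshold
    W = [0.11, 0.6]    # weight vector from the figure
    L = [0.7, 0.7]     # input vector from the figure
    theta = 0.5        # threshold from the figure

    projection = sum(w * x for w, x in zip(W, L))
    print(projection)                        # ~0.497
    print(1 if projection > theta else 0)    # 0: just below the threshold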
Eg 1: solution region for AND problem
• Find a weight vector that satisfies all the constraints

AND problem (inputs x1, x2 and output y):
x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1
Eg 2: Solution region for XOR problem?

XOR problem (inputs x1, x2 and output y):
x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

• There is no such weight vector: XOR is not linearly separable
Learning by error reduction
❚ Perceptron learning algorithm (transcribed in the sketch below)
❙ If the activation level of the output unit is 1 when it should be 0, reduce the weight on the link to the ith input unit by r*Li, where Li is the ith input value and r is a learning rate
❙ If the activation level of the output unit is 0 when it should be 1, increase the weight on the link to the ith input unit by r*Li
❙ Otherwise, do nothing
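These three rules transcribe almost directly into Python; below they are run on the AND data from Eg 1 (treating the threshold as a bias weight on a constant 1 input, and the choice r = 0.1, are assumptions):

    def step(x):
        return 1 if x > 0 else 0

    def train_perceptron(data, r=0.1, epochs=50):
        # data: list of (inputs, target); a constant 1 input supplies the bias weight
        w = [0.0] * (len(data[0][0]) + 1)
        for _ in range(epochs):
            for inputs, target in data:
                x = list(inputs) + [1.0]
                out = step(sum(wi * xi for wi, xi in zip(w, x)))
                if out == 1 and target == 0:      # too high: reduce each weight by r*Li
                    w = [wi - r * xi for wi, xi in zip(w, x)]
                elif out == 0 and target == 1:    # too low: increase each weight by r*Li
                    w = [wi + r * xi for wi, xi in zip(w, x)]
                # otherwise: output is correct, do nothing
        return w

    and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = train_perceptron(and_data)
    for inputs, target in and_data:
        x = list(inputs) + [1.0]
        out = step(sum(wi * xi for wi, xi in zip(w, x)))
        print(inputs, out, "target", target)   # the learned unit reproduces AND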
Multi-layer perceptrons
❚ Using the chain rule, we can back-propagate the errors through a multi-layer perceptron (see the sketch below).
[Figure: a multi-layer perceptron with an input layer, a hidden layer, and an output layer]
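A compact sketch of this chain-rule back-propagation for one hidden layer, trained on the XOR data from Eg 2 (the sigmoid squashing function, the layer sizes, the learning rate, and the iteration count are all assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # one hidden layer of 4 units and one output unit (sizes chosen arbitrarily)
    W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
    r = 1.0                                           # learning rate

    for _ in range(5000):
        # forward pass
        h = sigmoid(X @ W1 + b1)                      # hidden activations
        out = sigmoid(h @ W2 + b2)                    # network output
        # backward pass: chain rule through the sigmoid at each layer
        d_out = (out - y) * out * (1 - out)           # squared-error gradient at output
        d_h = (d_out @ W2.T) * h * (1 - h)            # error propagated to hidden layer
        # gradient-descent updates
        W2 -= r * h.T @ d_out;  b2 -= r * d_out.sum(axis=0)
        W1 -= r * X.T @ d_h;    b1 -= r * d_h.sum(axis=0)

    print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]; depends on the seed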