
Chapter 4: Naïve Bayes - 2

Classification using Bayes Theorem
Outlook    Temperature  Humidity  Windy  Class
sunny      hot          high      false  N
sunny      hot          high      true   N
overcast   hot          high      false  P
rain       mild         high      false  P
rain       cool         normal    false  P
rain       cool         normal    true   N
overcast   cool         normal    true   P
sunny      mild         high      false  N
sunny      cool         normal    false  P
rain       mild         normal    false  P
sunny      mild         normal    true   P
overcast   mild         high      true   P
overcast   hot          normal    false  P
rain       mild         high      true   N
Bayesian classification
The classification problem may be formalized using a-posteriori probabilities:
- P(C|X) = probability that the sample tuple X = <x1, …, xk> is of class C.
  E.g. P(class = N | outlook = sunny, windy = true, …)
- Idea: assign to sample X the class label C such that P(C|X) is maximal.
Estimating a-posteriori probabilities
- Bayes theorem: P(C|X) = P(X|C)·P(C) / P(X)
- P(X) is constant for all classes
- P(C) = relative frequency of class-C samples
- C such that P(C|X) is maximum = C such that P(X|C)·P(C) is maximum
- Problem: computing P(X|C) directly is infeasible!
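As a rough sketch (not part of the original slides), this decision rule is a one-line argmax in Python; `prior` and `likelihood` are hypothetical callables standing in for P(C) and P(X|C), the latter being exactly the quantity that is hard to compute directly:

    def map_classify(x, classes, prior, likelihood):
        """Return the class C that maximizes P(X|C) * P(C).

        P(X) is the same for every class, so it can be dropped from the argmax.
        prior(c) plays the role of P(C) and likelihood(x, c) the role of P(X|C);
        estimating the latter is the problem the naive assumption addresses next.
        """
        return max(classes, key=lambda c: likelihood(x, c) * prior(c))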
Naïve Bayesian Classification
Naïve assumption: attribute independence
P(x1, …, xk | C) = P(x1|C) · … · P(xk|C)
- If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi for the i-th attribute in class C.
- If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function.
- Computationally easy in both cases.
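A minimal sketch of both estimators in Python (not from the slides; the helper names are made up). `values` is assumed to hold the i-th attribute values of the training samples belonging to class C:

    import math
    from collections import Counter

    def categorical_likelihood(values, target):
        # P(xi = target | C): relative frequency of target among the class's samples
        return Counter(values)[target] / len(values)

    def gaussian_likelihood(values, target):
        # P(xi = target | C): Gaussian density with the class's sample mean and variance
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        return math.exp(-(target - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    # e.g. humidity within class p (3 high, 6 normal):
    # categorical_likelihood(["high"] * 3 + ["normal"] * 6, "high")  ->  3/9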
Play-tennis example: estimating P(xi|C)
(Training data: the same play-tennis table shown above.)
Class priors:            P(p) = 9/14            P(n) = 5/14

outlook:
  P(sunny|p)    = 2/9    P(sunny|n)    = 3/5
  P(overcast|p) = 4/9    P(overcast|n) = 0
  P(rain|p)     = 3/9    P(rain|n)     = 2/5
temperature:
  P(hot|p)  = 2/9        P(hot|n)  = 2/5
  P(mild|p) = 4/9        P(mild|n) = 2/5
  P(cool|p) = 3/9        P(cool|n) = 1/5
humidity:
  P(high|p)   = 3/9      P(high|n)   = 4/5
  P(normal|p) = 6/9      P(normal|n) = 2/5
windy:
  P(true|p)  = 3/9       P(true|n)  = 3/5
  P(false|p) = 6/9       P(false|n) = 2/5
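These tables can be reproduced by simple counting. The following Python sketch (variable names are my own, not from the slides) builds the priors and the conditional estimates from the 14 training tuples:

    from collections import Counter, defaultdict

    # Play-tennis training set: (outlook, temperature, humidity, windy, class)
    data = [
        ("sunny",    "hot",  "high",   "false", "N"),
        ("sunny",    "hot",  "high",   "true",  "N"),
        ("overcast", "hot",  "high",   "false", "P"),
        ("rain",     "mild", "high",   "false", "P"),
        ("rain",     "cool", "normal", "false", "P"),
        ("rain",     "cool", "normal", "true",  "N"),
        ("overcast", "cool", "normal", "true",  "P"),
        ("sunny",    "mild", "high",   "false", "N"),
        ("sunny",    "cool", "normal", "false", "P"),
        ("rain",     "mild", "normal", "false", "P"),
        ("sunny",    "mild", "normal", "true",  "P"),
        ("overcast", "mild", "high",   "true",  "P"),
        ("overcast", "hot",  "normal", "false", "P"),
        ("rain",     "mild", "high",   "true",  "N"),
    ]
    attributes = ["outlook", "temperature", "humidity", "windy"]

    class_counts = Counter(row[-1] for row in data)               # {"P": 9, "N": 5}
    priors = {c: n / len(data) for c, n in class_counts.items()}  # P(p) = 9/14, P(n) = 5/14

    # cond[(attribute, value, class)] = P(value | class); unseen pairs stay at 0
    cond = defaultdict(float)
    for i, attr in enumerate(attributes):
        for (value, c), n in Counter((row[i], row[-1]) for row in data).items():
            cond[(attr, value, c)] = n / class_counts[c]

    print(priors["P"])                         # 0.642857... = 9/14
    print(cond[("outlook", "sunny", "P")])     # 0.222...    = 2/9
    print(cond[("outlook", "overcast", "N")])  # 0.0 (never observed in class N)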
Play-tennis example: classifying X
An unseen sample X = <rain, hot, high, false>:
- P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
- P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
- Sample X is classified in class n (don't play).
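A quick check of this computation in Python (only the entries needed for this particular X are copied by hand from the tables above):

    import math

    priors = {"p": 9 / 14, "n": 5 / 14}
    cond = {  # P(value | class) for the four attribute values of X
        "p": {"rain": 3 / 9, "hot": 2 / 9, "high": 3 / 9, "false": 6 / 9},
        "n": {"rain": 2 / 5, "hot": 2 / 5, "high": 4 / 5, "false": 2 / 5},
    }

    x = ["rain", "hot", "high", "false"]
    scores = {c: priors[c] * math.prod(cond[c][v] for v in x) for c in priors}
    print(scores)                       # {'p': 0.010582..., 'n': 0.018285...}
    print(max(scores, key=scores.get))  # 'n' -> don't play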
P(X) = ? (it does not depend on the class label, so it was ignored in the comparison)
- P(X) = P(X|p)·P(p) + P(X|n)·P(n)
- P(X) = 0.010582 + 0.018286 = 0.028868
- The actual posterior P(p|X) = 0.010582 / 0.028868 ≈ 0.367
- The actual posterior P(n|X) = 0.018286 / 0.028868 ≈ 0.633
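Continuing the sketch above, normalizing the two scores reproduces these posteriors:

    score_p, score_n = 0.010582, 0.018286  # P(X|p)·P(p) and P(X|n)·P(n)
    p_x = score_p + score_n                # P(X) = 0.028868
    print(score_p / p_x, score_n / p_x)    # ≈ 0.367 and ≈ 0.633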
The independence hypothesis…
- … makes computation possible
- … yields optimal classifiers when satisfied
- … but is seldom satisfied in practice, as attributes (variables) are often correlated.
- Attempts to overcome this limitation:
  – Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  – Decision trees, which reason on one attribute at a time, considering the most important attributes first