XL-Miner Primer - Daniel Yerelian

Classification
Naïve Bayes
Business Intelligence
Naïve Bayes: The concept
• Bayes' Theorem is used to calculate a conditional probability in the presence of some information.
• The conditional probability is typically of the following form:

Pr(C|X1, X2, X3, ...) = Pr(X1, X2, X3, ...|C)·Pr(C) / [Pr(X1, X2, X3, ...|C)·Pr(C) + Pr(X1, X2, X3, ...|C̄)·Pr(C̄)]

where Pr(C|X1, X2, X3, ...) is the probability of event C in the presence of the conditions/information X1, X2, X3, ..., and C̄ is the complement event of C.
Example: Let C denote the event that the 405 will be moving really slowly, with no prior information. From prior experience we might estimate Pr(C) = 30%. Now let X1, X2, and X3 denote the given information that it is raining, there is an accident, and one of the left lanes is closed. Clearly, the probability of C given this new information will change dramatically; Bayes' Theorem provides a precise way to calculate it.
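The traffic example can be sketched numerically. Only the 30% prior comes from the text; the two likelihoods below are assumed values chosen purely to illustrate the arithmetic:

```python
# Hypothetical illustration of Bayes' Theorem for the traffic example.
p_C = 0.30              # prior: the 405 is very slow (from the text)
p_X_given_C = 0.90      # Pr(rain, accident, lane closed | slow) -- assumed
p_X_given_notC = 0.10   # Pr(same conditions | not slow)        -- assumed

# Bayes' Theorem: posterior = likelihood * prior / total probability
posterior = (p_X_given_C * p_C) / (
    p_X_given_C * p_C + p_X_given_notC * (1 - p_C))
print(round(posterior, 3))  # 0.794 -- the 30% prior rises sharply
```

With these (assumed) likelihoods, observing the three conditions lifts the probability of slow traffic from 0.30 to about 0.79.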
Naïve Bayes contd.
• However, with many pieces of conditional information present, calculating the posterior probability with Bayes' Theorem can be very involved.
• We then use a simplified version of Bayes' Theorem that assumes the conditions are independent given the class.
• In this case we use the formula:

P(C|X1, X2, X3, ...) = [P(X1|C)·P(X2|C)·P(X3|C)···P(C)] / [P(X1|C)·P(X2|C)·P(X3|C)···P(C) + P(X1|C̄)·P(X2|C̄)·P(X3|C̄)···P(C̄)]
Naïve Bayes: Example
• We have a list with the following information about the size of a company, its audit status, and whether charges were filed against it.
Charge Filed   Company Size   Status
y              small          truthful
n              small          truthful
n              large          truthful
n              large          truthful
n              small          truthful
n              small          truthful
y              small          fraudulent
y              large          fraudulent
n              large          fraudulent
y              large          fraudulent
Count of Status (Company Size = large)
Status        Charge Filed: n    y    Total
fraudulent                  1    2    3
truthful                    2    -    2
Grand Total                 3    2    5

Count of Status (Company Size = small)
Status        Charge Filed: n    y    Total
fraudulent                  -    1    1
truthful                    3    1    4
Grand Total                 3    2    5
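The pivot tables above can be reproduced directly from the ten records; a minimal sketch using only the standard library (the tuple layout is my own, not XLMiner's):

```python
from collections import Counter

# The ten records from the table above: (charge filed, company size, status).
records = [
    ("y", "small", "truthful"),   ("n", "small", "truthful"),
    ("n", "large", "truthful"),   ("n", "large", "truthful"),
    ("n", "small", "truthful"),   ("n", "small", "truthful"),
    ("y", "small", "fraudulent"), ("y", "large", "fraudulent"),
    ("n", "large", "fraudulent"), ("y", "large", "fraudulent"),
]

# Count of Status by (company size, charge filed) -- the pivot-table cells.
counts = Counter((size, charge, status) for charge, size, status in records)
print(counts[("small", "y", "fraudulent")])  # 1
print(counts[("large", "y", "fraudulent")])  # 2
print(counts[("large", "n", "truthful")])    # 2
```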
Naïve Bayes
• If we want to know the probability that a company will be
fraudulent given it is small in size and there is a charge filed
against it or, P(fraudulent|size = small, charges=y)
• From the crosstab/pivot tables we can see that the above
probability = ½ (there are 2 companies that are small and have
charges filed against them and 1 of them is fraudulent).
• Similarly, P(fraudulent|small, n) = 0/3 = 0
• P(fraudulent|large, y) = 2/2 = 1
• P(fraudulent|large, n) = 1/3 = 0.33
• Using Naïve Bayes we can get the following:
• P(fraudulent|small, y) = P(small|fraudulent)·P(y|fraudulent)·P(fraudulent) / [P(small|fraudulent)·P(y|fraudulent)·P(fraudulent) + P(small|truthful)·P(y|truthful)·P(truthful)]
• = (1/4)·(3/4)·(4/10) / [(1/4)·(3/4)·(4/10) + (4/6)·(1/6)·(6/10)]
• = 0.53, which is very close to the 0.5 value from the exact calculation!
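The Naïve Bayes calculation above can be checked with a few lines of arithmetic, using the probabilities read off the pivot tables:

```python
# Naive Bayes pieces from the pivot tables (4 fraudulent, 6 truthful firms).
p_small_f, p_y_f, p_f = 1/4, 3/4, 4/10   # conditionals and prior: fraudulent
p_small_t, p_y_t, p_t = 4/6, 1/6, 6/10   # conditionals and prior: truthful

num = p_small_f * p_y_f * p_f            # numerator for the fraudulent class
den = num + p_small_t * p_y_t * p_t      # add the truthful-class term
nb_prob = num / den
print(round(nb_prob, 2))  # 0.53, close to the exact value of 0.5
```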
Naïve Bayes: Flight Delay Example
• Let us use XLMiner to create the conditional probabilities (each of the individual probability items) for flight delay.
• We will use only:
– Carrier, Day of the Week, Departure Time (in one-hour blocks), Destination, Origin, and Weather
• Run XLMiner and look at the conditional probabilities for the training set.
• For any record in the validation set, the probability for classification is computed by multiplying the corresponding conditional probabilities and the prior probability of that particular class.
• Let us do two examples.
Examples
• Example 1: Record Details (row 633 in …NNBforlecture.xlsx)

Row Id   Cum. ontime   Predicted Class   Actual Class   Prob. for ontime (success)   CARRIER   DEP_TIME   DEST   DISTANCE   ORIGIN   Weather   DAY_WEEK
626      2             ontime            ontime         0.804686121                  DH        1640       JFK    213        DCA      0         4
• Multiply all the relevant conditional probabilities for ontime to get p1.
• Multiply all the relevant conditional probabilities for delayed to get p2.
• Weigh each with the corresponding prior class probability and add the two numbers (w1·p1 + w2·p2).
• Probability for class i = wi·pi / (w1·p1 + w2·p2)
• Classify as the success class if the above probability exceeds the cut-off.
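The procedure above can be sketched as a small function (the names are mine, not XLMiner's):

```python
from math import prod

def nb_posterior(conds_success, conds_failure, w_success, w_failure):
    """Posterior probability of the success class from Naive Bayes pieces."""
    p1 = prod(conds_success)   # product of conditional probs for ontime
    p2 = prod(conds_failure)   # product of conditional probs for delayed
    return w_success * p1 / (w_success * p1 + w_failure * p2)

def classify(prob_success, cutoff=0.5):
    """Classify as the success class if the posterior exceeds the cut-off."""
    return "ontime" if prob_success > cutoff else "delayed"

# Toy check: equal conditionals reduce the posterior to the prior weight.
print(nb_posterior([0.5], [0.5], 0.8, 0.2))  # 0.8
```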
• Example 2: Record Details (row 610 in …NNBforlecture.xlsx)

Row Id   Cum. ontime   Predicted Class   Actual Class   Prob. for ontime (success)   CARRIER   DEP_TIME   DEST   DISTANCE   ORIGIN   Weather   DAY_WEEK
194                    ontime            delayed        0.846384259                  MQ        1936       LGA    214        DCA      0         7
Details
• Record 1. Let us list the conditions.

Input Variables: CARRIER = DH, DEP_TIME = 1640, DEST = JFK, DISTANCE = 213, ORIGIN = DCA, Weather = 0, DAY_WEEK = 4

• The corresponding conditional probabilities for ontime, extracted from the ontime side (I used a VLOOKUP from the conditional probability tables given by XLMiner):

Title      Condition   Ontime Conditional Prob.
CARRIER    DH          0.243192
DEP_TIME   1640        0.004695
DEST       JFK         0.176526
DISTANCE   213         0.187793
ORIGIN     DCA         0.635681
Weather    0           1.000000
DAY_WEEK   4           0.159624

• Calculate p1 by multiplying the numbers above. For p1·w1, multiply the number below by 0.80620.
p1 = 3.84059E-06
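The p1 calculation for record 1 can be verified by multiplying the seven ontime conditional probabilities; the delayed-side product p2 and the priors below are the values reported in these slides:

```python
from math import prod

# Ontime conditional probabilities for record 1 (CARRIER DH, DEP_TIME 1640,
# DEST JFK, DISTANCE 213, ORIGIN DCA, Weather 0, DAY_WEEK 4).
ontime_conds = [0.243192, 0.004695, 0.176526, 0.187793,
                0.635681, 1.000000, 0.159624]
p1 = prod(ontime_conds)          # ~3.8406e-06, matching p1 above

# Combine with the delayed-side product p2 and the prior class weights.
w1, w2 = 0.806207419, 0.193792581
p2 = 3.87805e-06                 # reported delayed-side product
posterior = w1 * p1 / (w1 * p1 + w2 * p2)
print(posterior)                 # ~0.8047, matching 0.804686 up to rounding
```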
Conditional probabilities from the training set, according to relative occurrences in the training data (DAY_WEEK is not listed to save space; DEP_TIME is partially shown):

Variable   Value   P(value|ontime)   P(value|delayed)
CARRIER    CO      0.0384977         0.06640625
CARRIER    DH      0.2431925         0.33984375
CARRIER    DL      0.2               0.109375
CARRIER    MQ      0.1126761         0.1796875
CARRIER    OH      0.0178404         0.01171875
CARRIER    RU      0.170892          0.21484375
CARRIER    UA      0.0169014         0.0078125
CARRIER    US      0.2               0.0703125
DEP_TIME   548     0.000939          0
DEP_TIME   550     0.000939          0
DEP_TIME   552     0.0018779         0.00390625
DEP_TIME   553     0.0056338         0
DEST       EWR     0.2835681         0.38671875
DEST       JFK     0.1765258         0.1875
DEST       LGA     0.5399061         0.42578125
DISTANCE   169     0.0507042         0.08203125
DISTANCE   184     0.0178404         0.01171875
DISTANCE   199     0.1079812         0.11328125
DISTANCE   213     0.1877934         0.26953125
DISTANCE   214     0.4647887         0.29296875
DISTANCE   228     0.0957746         0.09765625
DISTANCE   229     0.0751174         0.1328125
ORIGIN     BWI     0.0685446         0.09375
ORIGIN     DCA     0.6356808         0.484375
ORIGIN     IAD     0.2957746         0.421875
Weather    0       1                 0.92578125
Weather    1       0                 0.07421875

Prior class probabilities (relative occurrences in training data):
Class     Prob.
ontime    0.806207419 <-- Success Class
delayed   0.193792581
Details
• By following the exact same method and the subsequent calculations for p1·w1 and p2·w2, we can easily get the following results.

Class     pi            pi·wi         Conditional (posterior) probability
Ontime    3.84059E-06   3.09631E-06   0.804686121
Delayed   3.87805E-06   7.51538E-07   0.195313879
sum (w1·p1 + w2·p2) = 3.84785E-06

Cut-off Prob. Val. for Success (updatable): 0.5

Record: Row Id 626, Cum. ontime 2, Predicted ontime, Actual ontime, Prob. for ontime 0.804686121, CARRIER DH, DEP_TIME 1640, DEST JFK, DISTANCE 213, ORIGIN DCA, Weather 0, DAY_WEEK 4

• Verify the results for record 2.
Notes
• Quite simple and useful.
• Better than the exact Bayes approach, because all combinations may not be present in the data (exact Bayes will fail, as there will be no conditional probability for that particular combination).
• However, it is dependent on the data and thus can give erroneous results for small data sets.
• If an association makes sense but is not present in the data, the classification scheme will not work.
– Example: Yacht owners may be targets for high-value life insurance. However, the collected data has no incidence of high-value life insurance!
• Next: Other classification schemes!