INC 551 Artificial Intelligence
Lecture 11: Machine Learning (Continued)

Bayes Classifier and Bayes Rule

Play Tennis Example

John wants to play tennis every day. However, on some days the conditions are not good, so he decides not to play. The following table is his record for the last 14 days.

Outlook    Temperature  Humidity  Wind    PlayTennis
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Normal    Weak    Yes
Rain       Mild         Normal    Weak    Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Overcast   Hot          Normal    Weak    Yes
Rain       Mild         High      Strong  No

Question: Today's condition is <Sunny, Mild temperature, Normal humidity, Strong wind>. Do you think John will play tennis?

We need P(condition | PlayTennis). To factor it, we use the naïve Bayes assumption: the features are treated as independent given the class.

P(sunny, mild, normal, strong | PlayTennis) = P(sunny | PlayTennis) \cdot P(mild | PlayTennis) \cdot P(normal | PlayTennis) \cdot P(strong | PlayTennis)

Now, let's look at each property:

P(Outlook = sunny | PlayTennis = yes)    = 2/9 ≈ 0.22
P(Outlook = sunny | PlayTennis = no)     = 3/5 = 0.6
P(Temp = mild | PlayTennis = yes)        = 4/9 ≈ 0.44
P(Temp = mild | PlayTennis = no)         = 2/5 = 0.4
P(Humidity = normal | PlayTennis = yes)  = 6/9 ≈ 0.67
P(Humidity = normal | PlayTennis = no)   = 1/5 = 0.2
P(Wind = strong | PlayTennis = yes)      = 3/9 ≈ 0.33
P(Wind = strong | PlayTennis = no)       = 3/5 = 0.6

Multiplying the factors for each class:

P(sunny, mild, normal, strong | PlayTennis = yes) = (2/9)(4/9)(6/9)(3/9) ≈ 0.022
P(sunny, mild, normal, strong | PlayTennis = no)  = (3/5)(2/5)(1/5)(3/5) = 0.0288

Using Bayes rule:

P(PlayTennis | condition) = \frac{P(condition | PlayTennis)\, P(PlayTennis)}{P(condition)}

With the priors P(PlayTennis = yes) = 9/14 ≈ 0.643 and P(PlayTennis = no) = 5/14 ≈ 0.357:

P(PlayTennis = yes | condition) = 0.022 × 0.643 / P(condition) = 0.01415 / P(condition)
P(PlayTennis = no | condition)  = 0.0288 × 0.357 / P(condition) = 0.01028 / P(condition)

Since P(condition) is the same in both expressions, we can already conclude that John is more likely to play tennis today; we do not need to compute P(condition) to get the answer. However, if we want the actual probabilities, P(condition) can be obtained by normalization, summing the joint probability over both classes:

P(condition) = P(condition, PlayTennis = yes) + P(condition, PlayTennis = no) = 0.01415 + 0.01028 = 0.02443

P(PlayTennis = yes | condition) = 0.01415 / 0.02443 ≈ 0.58
P(PlayTennis = no | condition)  = 0.01028 / 0.02443 ≈ 0.42

Therefore, John is more likely to play tennis today, with about a 58% chance.
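As a supplement to the notes, here is a minimal Python sketch of the naïve Bayes calculation above. It assumes exactly the 14-day table and the query <Sunny, Mild, Normal, Strong> from the example; the function name naive_bayes_posteriors and the data layout are choices made for this sketch, not part of the lecture.

```python
from collections import Counter

# PlayTennis records from the table: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def naive_bayes_posteriors(data, query):
    """Unnormalized P(class | query) for each class under the naive Bayes assumption."""
    labels = [row[-1] for row in data]
    class_counts = Counter(labels)                  # {"Yes": 9, "No": 5}
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(data)                   # prior, e.g. P(Yes) = 9/14
        for i, value in enumerate(query):
            # likelihood P(feature_i = value | class), counted from the table
            n_match = sum(1 for row in data if row[-1] == cls and row[i] == value)
            score *= n_match / n_cls
        scores[cls] = score
    return scores

query = ("Sunny", "Mild", "Normal", "Strong")
scores = naive_bayes_posteriors(data, query)        # {'Yes': ~0.0141, 'No': ~0.0103}
total = sum(scores.values())                        # P(condition) by normalization
for cls, s in scores.items():
    print(cls, round(s / total, 2))                 # Yes ~0.58, No ~0.42
print("Prediction:", max(scores, key=scores.get))   # Yes
```

Because both classes share the same denominator P(condition), the prediction can be read directly from the unnormalized scores; dividing each score by their sum reproduces the 58% / 42% split computed above.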
Learning and Bayes Classifier

Learning, for a Bayes classifier, means adjusting the probability values used to compute the posterior whenever new data is added.

Classifying Object Example

Suppose we want to classify objects into two classes, A and B. There are two features that we can measure from each object, f1 and f2. We randomly sample four objects to form a database and classify them by hand.

Sample  f1   f2   Class
1       5.2  1.2  B
2       2.3  5.4  A
3       1.5  4.4  A
4       4.5  2.1  B

Now we have another sample with f1 = 3.2 and f2 = 4.2, and we want to know its class. We want to find P(Class | feature). Using Bayes rule:

P(Class | feature) = \frac{P(feature | Class)\, P(Class)}{P(feature)}

From the table, we count the number of samples in each class to get the priors:

P(Class = A) = 2/4 = 0.5
P(Class = B) = 2/4 = 0.5

To find P(feature | Class), we again use the naïve Bayes assumption that the features are independent given the class:

P(f1, f2 | Class) = P(f1 | Class) \cdot P(f2 | Class)

To evaluate P(f1 | Class) we need to assume a probability distribution, because the features take continuous values. The most common choice is the Gaussian (normal) distribution:

P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

It has two parameters: the mean µ and the variance σ². Using the maximum likelihood principle, the mean and the variance can be estimated from the samples in the database.

Class A
  f1: mean = (2.3 + 1.5)/2 = 1.9,   SD = 0.4
  f2: mean = (5.4 + 4.4)/2 = 4.9,   SD = 0.5

Class B
  f1: mean = (5.2 + 4.5)/2 = 4.85,  SD = 0.35
  f2: mean = (1.2 + 2.1)/2 = 1.65,  SD = 0.45

For example, the class-conditional density of f1 for Class A is

P(f1 = x | A) = \frac{1}{\sqrt{2\pi(0.4)^2}} \exp\left(-\frac{(x - 1.9)^2}{2(0.4)^2}\right)

The object that we want to classify has f1 = 3.2 and f2 = 4.2:

P(f1 = 3.2 | A) = \frac{1}{\sqrt{2\pi(0.4)^2}} \exp\left(-\frac{(3.2 - 1.9)^2}{2(0.4)^2}\right) ≈ 0.0051
P(f2 = 4.2 | A) = \frac{1}{\sqrt{2\pi(0.5)^2}} \exp\left(-\frac{(4.2 - 4.9)^2}{2(0.5)^2}\right) ≈ 0.2995
P(f1 = 3.2 | B) = \frac{1}{\sqrt{2\pi(0.35)^2}} \exp\left(-\frac{(3.2 - 4.85)^2}{2(0.35)^2}\right) ≈ 1.7016e-05
P(f2 = 4.2 | B) = \frac{1}{\sqrt{2\pi(0.45)^2}} \exp\left(-\frac{(4.2 - 1.65)^2}{2(0.45)^2}\right) ≈ 9.4375e-08

Therefore,

P(f1, f2 | Class = A) = 0.0051 × 0.2995 ≈ 0.0015
P(f1, f2 | Class = B) = 1.7016e-05 × 9.4375e-08 ≈ 1.6059e-12

From Bayes rule, P(Class | feature) = P(feature | Class) P(Class) / P(feature):

P(A | feature) = 0.0015 × 0.5 / P(feature)
P(B | feature) = 1.6059e-12 × 0.5 / P(feature)

Therefore, we should classify the sample as Class A. (A code sketch of this calculation is given at the end of the section.)

Nearest Neighbor Classification

Nearest neighbor (NN) is considered a "no model" (model-free) classification method: it stores the training samples directly instead of fitting probability distributions.

Nearest Neighbor's Principle

An unknown sample is assigned the same class as the training sample at the closest distance.

[Figure: a two-dimensional feature space (Feature 1 vs. Feature 2); the unknown sample is closest to a circle, so we classify it as a circle.]

Distance between Samples

Samples X and Y have multi-dimensional feature values, for example

X = (2, 3, 1, 0),   Y = (2, 1, 5, 3)

The distance between samples X and Y can be calculated with the Minkowski formula

D(X, Y) = \left( \sum_{i=1}^{N} |x_i - y_i|^k \right)^{1/k}

If k = 1, the distance is called the Manhattan distance.
If k = 2, the distance is called the Euclidean distance.
If k = ∞, the distance is the maximum difference over the features.

Euclidean distance is the best known and the preferred one.

Classifying Object with NN

Sample  f1   f2   Class
1       5.2  1.2  B
2       2.3  5.4  A
3       1.5  4.4  A
4       4.5  2.1  B

Now we have another sample with f1 = 3.2 and f2 = 4.2, and we want to know its class. Compute the Euclidean distance from it to all the other samples:

D(x, s1) = \sqrt{(3.2 - 5.2)^2 + (4.2 - 1.2)^2} = 3.6056
D(x, s2) = \sqrt{(3.2 - 2.3)^2 + (4.2 - 5.4)^2} = 1.5
D(x, s3) = \sqrt{(3.2 - 1.5)^2 + (4.2 - 4.4)^2} = 1.7117
D(x, s4) = \sqrt{(3.2 - 4.5)^2 + (4.2 - 2.1)^2} = 2.4698

The unknown sample is closest to the second sample. Therefore, we classify it as the same class as the second sample, which is Class A.

K-Nearest Neighbor (KNN)

Instead of using only the single closest sample to decide the class, we use the k closest samples and take a majority vote among their classes (see the second sketch at the end of the section).

[Figure: the same feature space with k = 3, where the data is classified as a circle, and with k = 5, where it is classified as a star.]
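Below is the code sketch referenced in the Gaussian naïve Bayes example. It is an illustration under the same assumptions as the notes: the four-sample table, the query (f1, f2) = (3.2, 4.2), and maximum-likelihood (divide-by-n) standard deviations, which reproduce the values 0.4, 0.5, 0.35, and 0.45 above. The helper names gaussian, fit_class_stats, and classify are invented for this sketch.

```python
import math

# Training samples from the table: (f1, f2, class)
samples = [(5.2, 1.2, "B"), (2.3, 5.4, "A"), (1.5, 4.4, "A"), (4.5, 2.1, "B")]

def gaussian(x, mu, sigma):
    """Gaussian (normal) density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def fit_class_stats(samples):
    """Per-class prior plus maximum-likelihood mean and SD of each feature."""
    stats = {}
    for cls in {s[2] for s in samples}:
        rows = [s for s in samples if s[2] == cls]
        prior = len(rows) / len(samples)
        per_feature = []
        for i in range(2):                      # features f1 and f2
            values = [r[i] for r in rows]
            mu = sum(values) / len(values)
            # ML estimate: divide by n, giving SDs 0.4, 0.5, 0.35, 0.45 as in the notes
            sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
            per_feature.append((mu, sigma))
        stats[cls] = (prior, per_feature)
    return stats

def classify(stats, f1, f2):
    """Pick the class with the largest P(f1, f2 | class) * P(class)."""
    scores = {}
    for cls, (prior, per_feature) in stats.items():
        likelihood = gaussian(f1, *per_feature[0]) * gaussian(f2, *per_feature[1])
        scores[cls] = likelihood * prior
    return max(scores, key=scores.get), scores

stats = fit_class_stats(samples)
label, scores = classify(stats, 3.2, 4.2)
print(label, scores)                            # 'A'; score for A ~7.6e-4, for B ~8e-13
```

The unnormalized scores come out to roughly 7.6e-4 for Class A and 8e-13 for Class B, so the sample is assigned to Class A, matching the hand calculation.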
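And here is the sketch referenced in the nearest-neighbor and KNN discussion. It reuses the same four samples and query, implements the Minkowski distance (Manhattan for k = 1, Euclidean for k = 2), and decides the class by a majority vote over the k nearest samples; the helper names minkowski and knn_classify are chosen for this sketch, not part of the lecture.

```python
from collections import Counter

# Training samples from the table: (f1, f2, class)
samples = [(5.2, 1.2, "B"), (2.3, 5.4, "A"), (1.5, 4.4, "A"), (4.5, 2.1, "B")]

def minkowski(x, y, k=2):
    """Minkowski distance between feature vectors; k=1 is Manhattan, k=2 Euclidean."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)

def knn_classify(samples, query, k=1):
    """Classify `query` by majority vote among its k nearest samples (Euclidean distance)."""
    neighbors = sorted(samples, key=lambda s: minkowski(query, s[:2]))[:k]
    votes = Counter(s[2] for s in neighbors)
    return votes.most_common(1)[0][0]

query = (3.2, 4.2)
for s in samples:
    print(s, round(minkowski(query, s[:2]), 4))   # 3.6056, 1.5, 1.7117, 2.4698
print(knn_classify(samples, query, k=1))          # 'A' (nearest neighbor is sample 2)
print(knn_classify(samples, query, k=3))          # 'A' (two of the three nearest are class A)
```

With k = 1 the decision rests on sample 2 alone; with k = 3 the vote is two A's against one B, so the prediction on this particular data is still Class A.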