(2 b) Naïve Bayes classifier: Consider the problem of classifying an object based on the evidence provided by the feature vector x. A Naïve Bayes classifier chooses the class that is most probable given the feature vector x; in other words, it maximizes the a posteriori probability. If the priors are assumed equal, this is equivalent to maximizing the likelihood. For a simple two-class case, if 1 and 2 denote the two classes and x denotes an observation, then
decide class label = 1 if P(1 | x) > P(2 | x)
decide class label = 2 if P(1 | x) < P(2 | x)
In our case, since the features are discrete, the probabilities (priors and likelihoods) can be estimated from the counts of occurrence of each feature value; for a continuous case one could use kernel density estimation techniques. For instance,
P(class = yes) = 9/14 ≈ 0.64
P(class = no) = 5/14 ≈ 0.36
P(yes | sunny, hot, high, false) = 1/14
If the features are assumed to be independent, the joint distribution can be split into a product of marginals. Estimating an N-dimensional histogram is computationally expensive, so the independence assumption is generally used.
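As a concrete illustration of the count-based estimation described above, the following is a minimal Python sketch (an illustration only, not the WEKA implementation used in the experiment). It estimates the priors and per-feature likelihoods by counting over the 14 training samples listed under (1 d) below and scores the query (sunny, hot, high, FALSE) under the independence assumption; the function and variable names are ad hoc.

```python
from collections import Counter, defaultdict

# The 14 training samples from the @data section in (1 d):
# (outlook, temperature, humidity, windy, play)
data = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]

n = len(data)
class_counts = Counter(row[-1] for row in data)      # P(yes) = 9/14, P(no) = 5/14

# Per-feature value counts per class, i.e. the numerators of P(x_i | class).
feature_counts = defaultdict(Counter)                # key: (feature index, class)
for *features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def posterior_scores(x):
    """Unnormalised P(class | x) = prior * product of marginal likelihoods."""
    scores = {}
    for c, nc in class_counts.items():
        p = nc / n                                   # prior from class counts
        for i, value in enumerate(x):
            p *= feature_counts[(i, c)][value] / nc  # P(x_i | class) from counts
        scores[c] = p
    return scores

print(posterior_scores(("sunny", "hot", "high", "FALSE")))
# -> roughly {'no': 0.027, 'yes': 0.007}, so the sample is classified as "no"
```

With these counts the unnormalised scores come out to roughly 0.027 for no and 0.007 for yes, so this sample would be labelled no; a smoothing term would normally be added to guard against zero counts, but it is omitted here to keep the counting explicit.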
The experiment is repeated using a Naïve Bayes classifier; again, the test set is the same as the training set. Of the 14 samples used for testing, 13 are classified correctly and 1 is misclassified. The results indicate that the decision tree performs better than the Bayes classifier; the small number of samples, and the resulting inaccurate density estimates, may be the reason for the lower classification rate. The Bayes classifier is observed to be slightly faster than the decision tree, but the difference can be ignored for all practical purposes. The computational complexity and resource requirements of each algorithm could be assessed more accurately by using a larger dataset.

References:
(1) R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition, ISBN 0-471-05669-3.
(2) http://research.cs.tamu.edu/prism/lectures/pr/pr_l4.pdf

(1 b) The path to be taken at each node is decided by the attribute value; the number at each leaf denotes the number of occurrences of that selection of attribute values. The attribute tested is listed at the top of each node, and all the leaves are named after the classes.

(1 c) The @attribute keyword is followed by the attribute name and the set of values the attribute can take:
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

(1 d) @data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

(1 e) %

(1 f) 5 attributes (4 independent and 1 dependent).

(1 g) Whether to play or not is decided by the attributes outlook, temperature, humidity, and windy, so {yes, no} are the two class labels.

(3 a) (3 b) (3 c)

(3 d) AODE can handle only nominal attributes (weather.nominal.arff); linear regression handles numeric attributes (veteran.arff); wNearest works for sonar.arff; Associations can handle weather.nominal.arff.

(3 e) AODE – sonar.arff.

(3 f) Linear regression cannot handle discrete types (weather.nominal.arff).

(3 g) According to the table, clustering learners can handle everything; however, I am not sure how they would handle missing attributes. WEKA also cannot handle huge datasets, since it loads everything into memory.

(3 h) Associations cannot handle numeric discrete attributes (sonar.arff).

(4 a) 11. The lower bound of 0 is obtained when Num = min; similarly, the upper bound of N is obtained when Num = max. The possible labels are therefore {0, 1, ..., N}, i.e. N+1 bins.

(4 b & h) When M < N, I do not see any potential problem except for the computational overhead. This can be avoided by making the number of bins a function of the number of unique values the attribute can take; nbins uses a log function. One could use any monotonic function that maps unique(i) into the range 1 to unique(i), such as sqrt [Doherty] or inverse tangent. An empirical study conducted by [Doherty] shows that the log function's performance is comparable to Fayyad and Irani's supervised, entropy-based method. The authors mention that the SAS software uses the expression max(1, 2*log(unique(i))); the max function ensures that the number of bins is positive. The same expression can be used in the nbins code as well. The underlying assumption for the method to work is that the data is uniformly distributed; several other methods are proposed in the literature [Doherty] to handle non-uniform distributions. One simple way is to apply a transform (log or tanh) to make the data uniform if it is not. This process is referred to as normalization in the biometrics literature. Strictly speaking, any transformation alters the joint density of the data and so should affect the classification performance; however, experimental evidence does not show any significant influence on the performance of density-based classification schemes, whereas such normalizations greatly affect simple combination rules and rules that rely on distance metrics. In order to understand the effect of the bin size on the classification accuracy, one could conduct a simple experiment in which the error rate is computed while the bin size is varied. The experimental results are as indicated.

(4 c) Initially, the data is read and the max and min values of each attribute are calculated. These values are then used to quantize the data; if bin log is set to one, the adaptive scheme mentioned above is used. The discretized data is then written to a new ARFF file.

(4 d) round and label.

(4 e) Initially, the continuous data is classified using Naïve Bayes; the data is then discretized and classified with the same classifier again.

(4 f) FS is the field separator for the input, so the text between separators is treated as a single field; OFS is the output field separator; IGNORECASE, when set, makes matching ignore case.

(4 g) The sub function in this code essentially substitutes away the comments and other such text, after which the record is skipped with next.

(4 i) Replace log(unique(i)) with int(max(1, 2*log(unique(i)))), as is done in SAS, and define the functions int and max if they are not already available (in awk, int is a built-in but max is not).
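To make the quantization and the adaptive bin-count rule discussed in (4 a), (4 b & h), and (4 i) concrete, here is a small Python sketch (not the actual awk nbins script). It assumes the equal-width mapping implied by (4 a), bin index = round(N*(x - min)/(max - min)), which yields the N+1 labels {0, ..., N}, together with the max(1, 2*log(unique(i))) rule quoted from SAS; the log base and the handling of a constant attribute are assumptions.

```python
import math

def adaptive_bins(unique_count):
    """Bin-count heuristic from (4 b)/(4 i): int(max(1, 2*log(unique values))).
    The natural log is assumed here; the original nbins script may differ."""
    return int(max(1, 2 * math.log(unique_count)))

def discretize(values, n=None):
    """Equal-width quantization as described in (4 a):
    index = round(n * (x - min) / (max - min)), giving the n+1 labels 0..n."""
    lo, hi = min(values), max(values)
    if n is None:
        n = adaptive_bins(len(set(values)))
    if hi == lo:                 # constant attribute: everything falls in bin 0
        return [0] * len(values)
    return [round(n * (v - lo) / (hi - lo)) for v in values]

# Example: a numeric attribute with 10 distinct values.
ages = [18, 21, 25, 30, 34, 41, 47, 52, 60, 66]
print(discretize(ages, n=10))    # fixed N = 10 -> labels drawn from {0, ..., 10}
print(discretize(ages))          # adaptive: int(2 * ln(10)) = 4 -> labels 0..4
```

Varying n in this sketch and re-running the classifier is essentially the bin-size versus error-rate experiment suggested at the end of (4 b & h).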