CAP 4770 Introduction to Data Mining
Project Report #2: Classification
Pedro Exposito
ID: 1826385

1. Introduction and Objectives

The main goal of this project is to choose and evaluate classification mechanisms. For this purpose, classifiers already available in WEKA are tested on three datasets: Iris, Zoo, and Adult, all from the UCI Machine Learning Repository website ( http://archive.ics.uci.edu/ml/datasets.html ). The classification algorithms Decision Tree, Naïve Bayes, KNN, and SVM are applied to each dataset through WEKA’s Explorer, after the appropriate preprocessing steps are taken for each dataset. The four classifiers are then compared based on performance, and the conclusions section briefly discusses when to use each classifier, based on the general results. It is not assumed that the reader has previous knowledge of WEKA, so this project also demonstrates some of WEKA’s classification capabilities for novice users, as well as the steps needed to obtain the results shown.

2. Performance Comparison

Before comparing the performance of each algorithm, this section describes the datasets and the preprocessing steps taken to run the experiment on each one. Different datasets might require different preprocessing methods even if the same experiment is intended for all. Therefore, it is important to understand how each dataset was set up before comparing the classification results.

2.1 Description of the data sets

2.1.1 Iris

The Iris dataset contains three classes of 50 instances each, for a total of 150 instances. Each class refers to a type of Iris plant: Setosa, Versicolor, or Virginica. Each instance in the set has four numeric attributes, which represent sepal length, sepal width, petal length, and petal width, in that order.
The fifth and last attribute for all instances is the nominal attribute class, which is the type of Iris plant.

2.1.2 Zoo

The Zoo dataset contains 101 instances that describe the characteristics of each animal at a particular zoo. Each instance refers to a different animal. All tuples contain a boolean value (0 or 1) for characteristics such as feathers, eggs, milk, and airborne, which tells us whether that animal has that characteristic (1) or not (0). There are 15 such boolean values in every instance. The number of legs is the 14th of the comma-separated attributes in this dataset and it can take a value from the set {0,2,4,6,8}. The animal type, which refers to the animal group each instance belongs to, is the last attribute. There are seven types in total. The following are all the attributes in the Zoo dataset: animal_name, hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic, catsize, and type.

2.1.3 Adult

The Adult dataset is by far the largest of the three. It contains more than 30,000 instances. It is also the only one of the three that contains instances with unknown values for some attributes. Each instance refers to an adult individual and contains attributes that describe their job, education, and some personal information. The 15 attributes for each instance in this dataset are the following: age, workclass, fnlwgt (weight), education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, annual_pay. The Adult dataset is meant to be used to predict, from the previously mentioned attributes, whether an adult’s annual pay is above or below $50,000 per year.

2.2 Data Preparation Methods

2.2.1 File Format Change

All datasets were originally downloaded from http://archive.ics.uci.edu/ml/datasets.html.
The original data files were iris.data, zoo.data, and adult.data; however, these were not in a format compatible with WEKA, so they had to be modified slightly. WEKA’s standard file extension is .arff, so the .data files were converted to .arff. Files with a .csv extension would have worked as well, but the .arff format is preferred because it makes the experiment’s output clearer: for .csv files, WEKA names each attribute after the value seen in the first instance, so attribute names would be incorrect.

To transform each dataset to .arff, it was copied into a Notepad document and saved with the .arff extension (e.g. iris.arff), and the appropriate header information for .arff files was added in the lines above the actual data. For example, the following was added on top of the data from iris.data:

    @RELATION iris

    @ATTRIBUTE sepal_length NUMERIC
    @ATTRIBUTE sepal_width NUMERIC
    @ATTRIBUTE petal_length NUMERIC
    @ATTRIBUTE petal_width NUMERIC
    @ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica}

    @DATA

The top line specifies the relation name. Then each attribute name and its type, or domain, is specified. @DATA specifies that the data instances start below it. Next comes the actual data for each instance, separated by commas. The downloaded .data files already had attributes separated by commas and one instance per row; therefore, these modifications alone were enough to convert them to the correct file format. After these changes were made, the file was saved as iris.arff and was ready to be opened in WEKA.

This procedure was repeated to obtain zoo.arff and adult.arff as well. In zoo.arff the boolean attributes were given a domain of {0,1} and the type attribute was given the domain {1,2,3,4,5,6,7} because there are seven animal groups in the dataset. The attribute animal_name was specified with type STRING. The same procedure was followed for adult.arff, specifying the possible set of values for each nominal attribute in its declaration at the top.
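The manual Notepad procedure described above could also be scripted. The following is a minimal Python sketch, not the method used in this project: the function name to_arff and the two sample rows are illustrative assumptions, while the header lines are the ones listed above for iris.arff.

```python
def to_arff(relation, attributes, data_lines):
    """Build ARFF text from a relation name, (name, type) pairs, and CSV rows."""
    header = [f"@RELATION {relation}", ""]
    for name, arff_type in attributes:
        header.append(f"@ATTRIBUTE {name} {arff_type}")
    header += ["", "@DATA"]
    # Skip blank lines, which a raw .data file may contain (for example at the end).
    rows = [line.strip() for line in data_lines if line.strip()]
    return "\n".join(header + rows) + "\n"


iris_attributes = [
    ("sepal_length", "NUMERIC"),
    ("sepal_width", "NUMERIC"),
    ("petal_length", "NUMERIC"),
    ("petal_width", "NUMERIC"),
    ("class", "{Iris-setosa, Iris-versicolor, Iris-virginica}"),
]

# Two sample rows (plus one blank line) in the comma-separated layout of iris.data.
sample_rows = ["5.1,3.5,1.4,0.2,Iris-setosa", "", "7.0,3.2,4.7,1.4,Iris-versicolor"]

arff_text = to_arff("iris", iris_attributes, sample_rows)
print(arff_text)
```

In practice the rows would be read from the downloaded file, and the returned text written to a file with the .arff extension.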
The annual_pay attribute was represented by “>50K” or “<=50K”, which divides all instances into two categories based on an annual pay of more than, or at most, $50,000 per year.

2.2.1 Iris Data Preprocessing

Open WEKA’s Explorer, then choose “Open file…” from the Preprocess tab and open the iris.arff file. WEKA will display the attributes of the dataset and various statistics in the Preprocess window. Before moving on to the classifiers in the Classify tab, the appropriate preprocessing methods should be run to prepare the data. WEKA provides various filters for data preprocessing. For the Iris dataset, it is convenient to convert the numerical attributes to nominal attributes. To do so, click Choose, then open weka->filters->unsupervised->attribute->Discretize. After selecting the Discretize filter, double-click on its text field to open a window with the filter’s options and chosen parameters. Change the bins parameter to 5 and click OK, then click Apply to run the Discretize filter on the Iris data. This converts the four numerical attributes of Iris (sepal and petal length/width) to nominal attributes and divides their numerical range into five partitions. Five was chosen as the number of partitions because all the decimal values for these four attributes fall roughly between 0 and 8; thus, five partitions is a sensible way to distribute this data.

2.2.2 Zoo Data Preprocessing

The Zoo dataset requires the least preprocessing because during the file format change procedure, covered in section 2.2.1, all the boolean attributes in zoo.arff were created as nominal attributes. The attributes type and legs were also created as nominal, using their few possible values as their domain set. We deal here with the only non-nominal attribute, animal_name, which was created as a STRING.
This attribute varies too much because it has a different animal name for every instance, and it does not provide valuable information like the other attributes. Therefore, we chose to remove it. The type attribute, which tells us what animal group each instance belongs to, is more meaningful for classification (based on animal types) and is enough to identify animal instances by group, even without the animal’s name. To exclude animal_name from the dataset, mark the checkbox to the left of the attribute animal_name, then click Remove. The Zoo dataset is ready for classification after the removal of this attribute.

2.2.3 Adult Data Preprocessing

The Adult dataset was the only one of the three that required special preprocessing methods in order to apply one of the classifiers to it successfully. The Discretize filter was used on it, just as for the Iris dataset, to convert its numerical attributes to nominal attributes. However, before applying Discretize, the filter ReplaceMissingValues (weka->filters->unsupervised->attribute->ReplaceMissingValues) was applied because the Adult data contained instances with unknown values. This filter replaces the unknown values with the modes and means from the dataset’s training data. After replacing the missing values, the Discretize filter was applied with a bins parameter of 10, which was more suitable for the Adult dataset than the 5 used for the Iris dataset. Numerical attributes, such as age and education-num, were converted to nominal attributes of 10 partitions each.

This procedure was enough to prepare the Adult dataset for classification using Decision Tree (J48), Naïve Bayes, and KNN (IBk), but the SVM (SMO in WEKA) classifier required further data preprocessing. Applying SMO to the entire Adult dataset left the program working nonstop for hours without signs of progress. This could be due to the dataset’s large size (more than 30,000 instances).
Therefore, a different approach was taken to apply this classifier. The Adult data was reloaded and the filters ReplaceMissingValues and Discretize were applied again. This time, however, the filter Resample (weka->filters->unsupervised->instance->Resample) was applied afterwards with a sampleSizePercent parameter of 40.0. This took a 40% sample of the Adult dataset and made it the new working dataset, reducing the number of instances by 60%. However, this was not enough. After several hours, WEKA’s SMO run on the Adult dataset resampled to 40% ended in an error instead of a result. Because of this error, the process was repeated with the Adult dataset resampled to 25% of the original, instead of 40%. This time the classifier worked correctly and gave the output results shown in 5.1.4, after approximately six hours.

2.3 Parameters for the Algorithms

The classification algorithms to apply to the datasets are chosen from the Classify tab, just like filters are chosen from the Preprocess tab. The Decision Tree algorithm is selected by opening weka->classifiers->trees->J48. The J48 classifier performs the decision tree algorithm in WEKA. The Naïve Bayes algorithm is selected with weka->classifiers->bayes->NaiveBayes. The k-nearest neighbors algorithm (KNN) is run with weka->classifiers->lazy->IBk. Finally, SVM is run with the SMO classifier from weka->classifiers->functions->SMO.

Some parameters can be changed for WEKA’s classifiers, in the same way as parameters can be changed in the filters used during data preprocessing. In order to compare the results later on, some classification options were kept the same for all datasets and all classifiers. For example, in the Classify tab the Cross-validation folds were kept at 10 for all tests, and only some parameters of the classifiers themselves were changed. If a change is not mentioned, assume that all other parameters kept their standard values.
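To make the 10-fold cross-validation setting concrete, the following Python sketch shows how k folds partition a dataset so that every instance is used for testing exactly once. It is an illustration under simplifying assumptions, not WEKA’s implementation: the function name k_fold_indices and the seed are hypothetical, and the folds here are not stratified by class, whereas the WEKA outputs in section 5 report a stratified cross-validation.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # The training set is every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(150, k=10))  # 150 = number of Iris instances
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 10 135 15
```

Each of the 10 models is trained on 135 Iris instances and tested on the remaining 15; the reported error counts are the sum over all test folds.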
Next, the parameter values used to obtain the outputs of section 5 are listed, along with a discussion of how different values affected the results.

2.3.1 Parameters for Iris

The first classifier applied to the Iris dataset is J48, which uses a decision tree for classification. Double-clicking on the J48 textbox opens the window with the parameters for the classifier. The output of section 5.2.1 used a confidenceFactor parameter (used for pruning the tree) of 0.20 and the standard minNumObj (minimum number of instances per leaf) for J48, which is 2. Other tests with different values for confidenceFactor and minNumObj showed that, in general, raising these parameters did not produce significant changes in the percentage of misclassified instances. On the other hand, lowering the confidenceFactor to values below 0.01 usually gave four or five more misclassified instances than the other tests.

The NaiveBayes classifier does not provide any parameters to modify; thus, it runs with the same preset functionality for all datasets. The NaiveBayes output for the Iris dataset is shown in 5.2.2.

The KNN algorithm is performed by the classifier IBk in WEKA. The IBk output shown in 5.2.3 used KNN = 1, windowSize = 0, and the standard values for all other parameters. In further testing, the values for KNN (number of nearest neighbors used) and windowSize were modified. A windowSize of 0 allows unlimited instances in the training pool; the results showed that with any other value the misclassification percentage rose significantly, so it was left at zero for the printed output. The KNN value did not seem to affect results because using 5 and 10 for KNN gave very similar classification error rates.

The SVM algorithm is run with the SMO classifier in WEKA. The SMO output for the Iris data is in section 5.2.4 and used the standard parameter values for the algorithm.
Clicking on More in the parameters window for SMO shows information on the parameters, which specified that parameters c and toleranceParameter shouldn’t be changed, so those were kept the same throughout all the tests. The other modifiable parameters had appropriate standard values as well. The only modification made, for further testing, was to change the filter type to Standardize Training Data, but the results obtained were very similar to the ones with the standard filter Normalize Training Data.

2.3.2 Parameters for Zoo

The standard J48 parameters were used for the output shown in 5.3.1. A higher value for minNumObj gave smaller trees in the output, but it also raised the classification error percentage above that seen in 5.3.1. However, raising or lowering the value for confidenceFactor did not affect the error percentage by a significant amount.

NaiveBayes has no modifiable parameters. Its standard output for the Zoo dataset is shown in 5.3.2.

IBk was run with KNN=1 to obtain the results shown in 5.3.3. Retrying the algorithm with higher values for KNN did not give a drastic change, but it did seem to show that a higher KNN produced a slightly higher error rate. For example, KNN = 6 gave 11 misclassified instances and KNN = 20 gave 16 misclassified instances.

The standard SMO parameter values were used for the output in 5.3.4.

2.3.3 Parameters for Adult

The output in 5.1.1 was obtained after running J48 with minNumObj equal to 50, confidenceFactor equal to 0.01, and the standard values for other parameters (the scheme line of that output reads -C 0.01 -M 50). This selection of parameter values gave a slightly better performance (a lower misclassification rate) than the others. Raising the value for minNumObj and confidenceFactor gave similar output results, but raising minNumObj decreased the overall run-time.

NaiveBayes’ standard output for the Adult dataset is shown in 5.1.2. This classifier has no modifiable parameters.
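The effect of the KNN parameter discussed in this section can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not IBk itself: the functions hamming and knn_predict, the six training instances, and the query are all hypothetical, and Hamming distance (the number of differing attribute values) stands in for whatever distance function IBk applies.

```python
from collections import Counter


def hamming(a, b):
    """Count the attribute positions where two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=1):
    """Predict the majority class among the k training instances nearest to query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hair, feathers, eggs, milk) -> class label.
train = [
    ((1, 0, 0, 1), "mammal"),
    ((1, 0, 0, 1), "mammal"),
    ((0, 1, 1, 0), "bird"),
    ((0, 1, 1, 0), "bird"),
    ((0, 1, 1, 0), "bird"),
    ((0, 1, 1, 0), "bird"),
]
query = (1, 0, 1, 1)  # hair and milk like a mammal, but lays eggs

print(knn_predict(train, query, k=1))  # prints: mammal
print(knn_predict(train, query, k=6))  # prints: bird
```

With k=1 the single closest instance decides the class. With k=6 all six instances vote and the larger class wins regardless of distance, which is one plausible reason why a large KNN can raise the error rate on a small dataset with small classes, as observed for Zoo.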
The KNN, or IBk, output for the Adult dataset is shown in 5.1.3. The standard values, including KNN=1 and windowSize=0, were used for this output. However, it took a long time to produce the output (25 minutes), so for further testing IBk was applied to a 25% sample of the Adult dataset (the Resample filter was applied to it beforehand) with the same parameter values. The error rate was very close to the initial one with the full dataset. The 25% sample was also run with KNN=40 and KNN=200, and the results were very similar, which means that in this case raising the number of nearest neighbors did not affect the obtained results.

Finally, SMO’s output for the Adult dataset is in section 5.1.4. The standard parameter values were used for this output; however, this is the output produced from the 25% sample of the Adult dataset. Section 2.2.3 described the problems found while attempting to run this algorithm on the full dataset and on a 40% sample. This output should still be a fairly good representation of the general one since it covers one quarter of the original data.

2.4 Platform Information

The results obtained in this project might be slightly different if the same experiment is replicated on another machine with more processing power. In particular, WEKA’s classifiers might run faster on a better computer. The same tests could be performed using different machines and operating systems to compare their speed.
The following specifications are from the machine used to obtain the results shown in section 5 of the report:

    OS Name: Microsoft Windows Vista Home Premium
    OS Version: 6.0.6002 Service Pack 2 Build 6002
    System Model: Dell DM061
    System Type: X86-based PC
    Processor: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz, 1862 Mhz, 2 Core(s), 2 Logical Processors
    Installed Physical Memory (RAM): 3.00 GB

2.5 Classifier Performance Comparisons

Let’s take a look at the performance of all four classifiers with respect to running times and classification error rates. As expected, all classifiers took more time to process the Adult dataset than the Iris or Zoo datasets. This is no surprise because the Adult dataset is by far the largest of the three. In addition, it is interesting to see that the J48 and SMO classifiers had slightly shorter run-times for Iris than for Zoo, despite the fact that Iris has 150 instances and Zoo has 101. This is probably because the Zoo dataset has three times as many attributes and also some nominal attributes with more possible values than those in the Iris dataset. Therefore, instances in Zoo took a little more time to process than instances in Iris, which have just five attributes.

The algorithm that took the longest to produce its output for all datasets was SVM (SMO classifier). In fact, it took so long to produce the output for the Adult dataset that it had to be applied to a 25% sample of the dataset to obtain the results, and it still took hours. SMO only took five seconds to classify the Iris data and 25 seconds to classify the Zoo data, but those times were still more than five times larger than the times other classifiers took to produce the results. Overall, the fastest classifier was NaïveBayes with two seconds as the longest it took to get the classification output, when it was applied to the Adult dataset. The time it took to get the results for Iris and Zoo was less than one second.
In terms of accuracy, the classifier with the lowest misclassification percentages overall was SMO. This is possibly why it was slower than the others. If we add the misclassification error rates that J48, NaïveBayes, and IBk gave for the three datasets, we see that the total is close to 30 percentage points for each algorithm, which leads us to believe that in terms of general accuracy these three are similar. However, if we compare individual results, some are better than others. For example, J48 had a lower misclassification rate for the Iris set than the other classifiers, but it also did worse than all others for the Zoo dataset. This shows that some classifiers are better than others for specific datasets, which is the main topic covered in the conclusions section.

The dataset with the worst accuracy results was the Adult dataset, possibly because it was the most complex one as well. The best accuracy results (lowest misclassification percentages) were obtained for the Zoo dataset using IBk and SMO. These two got a 4% error rate for the Zoo data. Overall, the best classifier in terms of speed was NaïveBayes and the slowest, but most accurate one, was SMO. The next section includes tables that show the results discussed here.

3. Experimental Results Summary

WEKA reports several measures, such as TP Rate, FP Rate, Precision, Recall and F-Measure, to evaluate accuracy. Their results for each classifier are shown in the Reference section’s outputs. They appear near the end of each output, before the confusion matrix. I, however, decided to use a simpler and widely used metric, which is the standard way to compute accuracy: dividing the number of instances that were correctly classified by the total number of instances in the dataset. Alternatively, one could subtract the percentage of misclassified instances from 100 to obtain the same accuracy figure.
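As a worked example of this metric, the short Python sketch below applies both formulations to the J48 result on the Adult dataset (5519 misclassified out of 32561 instances, as reported in section 5.1.1). The function names are illustrative only.

```python
def accuracy_percent(correct, total):
    """Accuracy as correctly classified instances over all instances, in percent."""
    return 100.0 * correct / total


def error_rate_percent(misclassified, total):
    """Error rate as misclassified instances over all instances, in percent."""
    return 100.0 * misclassified / total


# J48 on the Adult dataset, taken from the WEKA summary in section 5.1.1.
total = 32561
misclassified = 5519
correct = total - misclassified  # 27042

acc = accuracy_percent(correct, total)
err = error_rate_percent(misclassified, total)
print(f"accuracy = {acc:.4f} %, error rate = {err:.4f} %")
# prints: accuracy = 83.0503 %, error rate = 16.9497 %
```

Both values match the percentages WEKA prints in its summary, and they add up to 100, which is the alternative computation mentioned above.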
The accuracy results, obtained using this method, are shown in the tables of this section. The following accuracy comparison tables show each algorithm’s results with respect to the misclassification error rate and accuracy for each dataset. These results were briefly discussed in section 2.5:

ACCURACY COMPARISON TABLE 1

                                            Data sets
Classifier     Results                    Iris      Zoo       Adult
J48            Total Instances            150       101       32561
               Misclassified Instances    8         8         5519
               Error Rate                 5.3%      7.9%      16.9%
               Acc.                       94.7%     92.1%     83.1%
Naïve Bayes    Total Instances            150       101       32561
               Misclassified Instances    10        7         5885
               Error Rate                 6.7%      6.9%      18.1%
               Acc.                       93.3%     93.1%     81.9%

Acc. = Accuracy

ACCURACY COMPARISON TABLE 2

                                            Data sets
Classifier     Results                    Iris      Zoo       Adult
IBk            Total Instances            150       101       32561
               Misclassified Instances    12        4         6078
               Error Rate                 8.0%      4.0%      18.7%
               Acc.                       92.0%     96.0%     81.3%
SMO            Total Instances            150       101       8140*
               Misclassified Instances    9         4         1081*
               Error Rate                 6.0%      4.0%      13.3%*
               Acc.                       94.0%     96.0%     86.7%*

Acc. = Accuracy
* = Results based on the 25% sample from Adult dataset

In addition, the running times recorded for the algorithms are shown next. The training time refers to the time it took WEKA to build the initial training model for the dataset. The testing time refers to the time each algorithm took to produce the output results, after building the training model. The results of the following table were already compared in section 2.5.

RUNNING TIMES TABLE

            Training Time (seconds)                  Testing Time (seconds)
Datasets    J48     Naïve Bayes   IBk     SMO        J48     Naïve Bayes   IBk              SMO
Iris        0.01    0             0       0.30       ~0.3    ~0.3          0.5              5
Zoo         0.14    0             0       1.71       0.4     ~0.3          ~0.3             28
Adult       0.45    0.06          0.02    574.64*    15      2             1260 (21 min)    ~21600* (6 hrs)

~ = Approximate time in seconds
* = Results based on the 25% sample from Adult dataset

4. Conclusions

After running several tests and obtaining the results for this project, it becomes clearer that some of the four classifiers used are more suitable for certain datasets and tasks than others.
The decision tree classifier, J48, appears to be more suitable for datasets with few attributes and few values for each attribute. Such sets will produce a small and clear J48 pruned tree, as well as an easy-to-understand tree view with the Classifier Tree Visualizer. The fact that J48 is very good at classifying datasets composed of mostly true/false or yes/no attributes (just two possible values for each attribute) becomes clear if the outputs in 5.1.1, 5.2.1, and 5.3.1 are compared. The J48 pruned tree and the visualized tree for the Zoo dataset are very clear because most of the Zoo attributes are boolean nominal attributes with just 0 and 1 as possible values. On the other hand, the visualized trees for Adult and Iris are cluttered because they have either too many attributes or too many values per attribute. Their J48 pruned trees are not as clear as the one for the Zoo dataset either. The conclusion for J48 is that, despite not being the most accurate classifier of the four, it is the best one for visualizing the classification of datasets composed of attributes with two possible values.

The NaïveBayes classifier is very well suited to get detailed, well-organized, and easy-to-understand statistics for large datasets, such as Adult, with attributes that may have many values. It does all the classification with respect to the “class” attribute, which is the last attribute for each instance. It gives general statistics (fraction of instances for each value of the class attribute) and specific statistics for each possible value, or partition, of every non-class attribute. This classifier is ideal for datasets where you want to see the number of instances that belong to each value of a class attribute based on the values for many other categories. This was the fastest classifier, so it is also very good for large datasets.
Its results and performance make it the best classifier of the four for the Adult dataset, although it also gave very good results for the other two datasets.

The KNN algorithm, performed by the IBk classifier, is good for datasets where the data instances could be classified into well-defined “clusters” of similar instances. In such cases, finding the best KNN value (the number of nearest neighbors to use) would greatly improve accuracy with this classifier. However, this is the least descriptive of the four classifiers (no tables, no visualizers, no specific statistics on the results as in NaïveBayes), so it is not a great choice unless the user is not interested in detailed results and the dataset is expected to work well for IBk. Overall, IBk was the classifier with the least amount of useful information about the data.

From the results of the SMO outputs and the general performance of all algorithms, it seems that SMO’s main advantage is that it had the best accuracy percentages; so, if classifying a dataset as accurately as possible is the main concern, SMO is the best choice of the four classifiers used here. However, if descriptive and easy-to-understand statistics or a fast running time are important, then SMO should be avoided. It took longer than the other classifiers and its classification output is not as useful as that of NaïveBayes or J48.

Overall, the best classifier for the Adult dataset was NaïveBayes. The best one for the Zoo dataset was either J48 or NaïveBayes. J48 provided helpful visual results and NaïveBayes gave useful statistics for Zoo data. The best one for the Iris dataset was NaïveBayes; the other three did not give as much useful output. In general, NaïveBayes was the most useful classifier in terms of output statistics and it was also the fastest. Each of the four classifiers is more suitable for different datasets though, so they are all useful.

5.
Reference

The following are the complete outputs obtained from WEKA’s sample runs for each classifier applied to each of the three datasets:

5.1 ADULT DATASET OUTPUTS

5.1.1 Decision Tree (J48)

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.01 -M 50
Relation:     adult_data-weka.filters.unsupervised.attribute.ReplaceMissingValues-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last
Instances:    32561
Attributes:   15
              age
              workclass
              fnlwgt
              education
              education-num
              marital-status
              occupation
              relationship
              race
              sex
              capital-gain
              capital-loss
              hours-per-week
              native-country
              annual_pay
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

marital-status = Married-civ-spouse
|   education-num = '(-inf-2.5]': <=50K (101.0/6.0)
|   education-num = '(2.5-4]': <=50K (531.0/50.0)
|   education-num = '(4-5.5]': <=50K (230.0/20.0)
|   education-num = '(5.5-7]': <=50K (703.0/99.0)
|   education-num = '(7-8.5]': <=50K (130.0/29.0)
|   education-num = '(8.5-10]'
|   |   capital-gain = '(-inf-9999.9]'
|   |   |   capital-loss = '(-inf-435.6]': <=50K (7139.0/2383.0)
|   |   |   capital-loss = '(435.6-871.2]': <=50K (0.0)
|   |   |   capital-loss = '(871.2-1306.8]': <=50K (1.0)
|   |   |   capital-loss = '(1306.8-1742.4]': <=50K (120.0/22.0)
|   |   |   capital-loss = '(1742.4-2178]': >50K (244.0/47.0)
|   |   |   capital-loss = '(2178-2613.6]': <=50K (37.0/13.0)
|   |   |   capital-loss = '(2613.6-3049.2]': <=50K (0.0)
|   |   |   capital-loss = '(3049.2-3484.8]': <=50K (0.0)
|   |   |   capital-loss = '(3484.8-3920.4]': <=50K (0.0)
|   |   |   capital-loss = '(3920.4-inf)': <=50K (0.0)
|   |   capital-gain = '(9999.9-19999.8]': >50K (82.0/4.0)
|   |   capital-gain = '(19999.8-29999.7]': >50K (12.0/1.0)
|   |   capital-gain = '(29999.7-39999.6]': <=50K (0.0)
|   |   capital-gain = '(39999.6-49999.5]': <=50K (0.0)
|   |   capital-gain = '(49999.5-59999.4]': <=50K (0.0)
|   |   capital-gain = '(59999.4-69999.3]': <=50K (0.0)
|   |   capital-gain = '(69999.3-79999.2]': <=50K (0.0)
|   |   capital-gain = '(79999.2-89999.1]': <=50K (0.0)
|   |   capital-gain = '(89999.1-inf)': >50K (28.0)
|   education-num = '(10-11.5]': <=50K (689.0/316.0)
|   education-num = '(11.5-13]': >50K (3228.0/1147.0)
|   education-num = '(13-14.5]': >50K (1003.0/229.0)
|   education-num = '(14.5-inf)': >50K (698.0/113.0)
marital-status = Divorced
|   capital-gain = '(-inf-9999.9]': <=50K (4359.0/380.0)
|   capital-gain = '(9999.9-19999.8]': >50K (51.0)
|   capital-gain = '(19999.8-29999.7]': >50K (21.0)
|   capital-gain = '(29999.7-39999.6]': <=50K (1.0)
|   capital-gain = '(39999.6-49999.5]': <=50K (0.0)
|   capital-gain = '(49999.5-59999.4]': <=50K (0.0)
|   capital-gain = '(59999.4-69999.3]': <=50K (0.0)
|   capital-gain = '(69999.3-79999.2]': <=50K (0.0)
|   capital-gain = '(79999.2-89999.1]': <=50K (0.0)
|   capital-gain = '(89999.1-inf)': >50K (11.0)
marital-status = Never-married
|   capital-gain = '(-inf-9999.9]': <=50K (10570.0/382.0)
|   capital-gain = '(9999.9-19999.8]': >50K (81.0)
|   capital-gain = '(19999.8-29999.7]': >50K (16.0)
|   capital-gain = '(29999.7-39999.6]': <=50K (4.0)
|   capital-gain = '(39999.6-49999.5]': <=50K (0.0)
|   capital-gain = '(49999.5-59999.4]': <=50K (0.0)
|   capital-gain = '(59999.4-69999.3]': <=50K (0.0)
|   capital-gain = '(69999.3-79999.2]': <=50K (0.0)
|   capital-gain = '(79999.2-89999.1]': <=50K (0.0)
|   capital-gain = '(89999.1-inf)': >50K (12.0)
marital-status = Separated: <=50K (1025.0/66.0)
marital-status = Widowed: <=50K (993.0/85.0)
marital-status = Married-spouse-absent: <=50K (418.0/34.0)
marital-status = Married-AF-spouse: <=50K (23.0/10.0)

Number of Leaves  :     52

Size of the tree :      58

Time taken to build model: 0.45 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances       27042               83.0503 %
Incorrectly Classified Instances      5519               16.9497 %
Kappa statistic                          0.4814
Mean absolute error                      0.2443
Root mean squared error                  0.3496
Relative absolute error                 66.8124 %
Root relative squared error             81.7692 %
Total Number of Instances            32561

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.493     0.062     0.715       0.493    0.583       0.844      >50K
               0.938     0.507     0.854       0.938    0.894       0.844      <=50K
Weighted Avg.  0.831     0.4       0.82        0.831    0.819       0.844

=== Confusion Matrix ===

     a     b   <-- classified as
  3863  3978 |     a = >50K
  1541 23179 |     b = <=50K

[Figure: WEKA’s Classifier Tree Visualizer]

5.1.2 Naïve Bayes

=== Run information ===

Scheme:       weka.classifiers.bayes.NaiveBayes
Relation:     adult_data-weka.filters.unsupervised.attribute.ReplaceMissingValues-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last
Instances:    32561
Attributes:   15
              age
              workclass
              fnlwgt
              education
              education-num
              marital-status
              occupation
              relationship
              race
              sex
              capital-gain
              capital-loss
              hours-per-week
              native-country
              annual_pay
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

                                    Class
Attribute                            >50K     <=50K
                                   (0.24)    (0.76)
===============================================
age
  '(-inf-24.3]'                      62.0    5510.0
  '(24.3-31.6]'                     805.0    5087.0
  '(31.6-38.9]'                    1678.0    4372.0
  '(38.9-46.2]'                    2230.0    3935.0
  '(46.2-53.5]'                    1596.0    2373.0
  '(53.5-60.8]'                     925.0    1668.0
  '(60.8-68.1]'                     418.0    1179.0
  '(68.1-75.4]'                      95.0     403.0
  '(75.4-82.7]'                      30.0     146.0
  '(82.7-inf)'                       12.0      57.0
  [total]                          7851.0   24730.0

workclass
  Private                          5155.0   19379.0
  Self-emp-not-inc                  725.0    1818.0
  Self-emp-inc                      623.0     495.0
  Federal-gov                       372.0     590.0
  Local-gov                         618.0    1477.0
  State-gov                         354.0     946.0
  Without-pay                         1.0      15.0
  Never-worked                        1.0       8.0
  [total]                          7849.0   24728.0

fnlwgt
  '(-inf-159527]'                  3231.0    9888.0
  '(159527-306769]'                3621.0   11708.0
  '(306769-454011]'                 864.0    2636.0
  '(454011-601253]'                 100.0     379.0
  '(601253-748495]'                  22.0      82.0
  '(748495-895737]'                   5.0      17.0
  '(895737-1042979]'                  3.0       9.0
  '(1042979-1190221]'                 2.0       5.0
  '(1190221-1337463]'                 2.0       2.0
  '(1337463-inf)'                     1.0       4.0
  [total]                          7851.0   24730.0

education
  Bachelors                        2222.0    3135.0
  Some-college                     1388.0    5905.0
  11th                               61.0    1116.0
  HS-grad                          1676.0    8827.0
  Prof-school                       424.0     154.0
  Assoc-acdm                        266.0     803.0
  Assoc-voc                         362.0    1022.0
  9th                                28.0     488.0
  7th-8th                            41.0     607.0
  12th                               34.0     401.0
  Masters                           960.0     765.0
  1st-4th                             7.0     163.0
  10th                               63.0     872.0
  Doctorate                         307.0     108.0
  5th-6th                            17.0     318.0
  Preschool                           1.0      52.0
  [total]                          7857.0   24736.0

education-num
  '(-inf-2.5]'                        7.0     214.0
  '(2.5-4]'                          57.0     924.0
  '(4-5.5]'                          28.0     488.0
  '(5.5-7]'                         123.0    1987.0
  '(7-8.5]'                          34.0     401.0
  '(8.5-10]'                       3063.0   14731.0
  '(10-11.5]'                       362.0    1022.0
  '(11.5-13]'                      2487.0    3937.0
  '(13-14.5]'                       960.0     765.0
  '(14.5-inf)'                      730.0     261.0
  [total]                          7851.0   24730.0

marital-status
  Married-civ-spouse               6693.0    8285.0
  Divorced                          464.0    3981.0
  Never-married                     492.0   10193.0
  Separated                          67.0     960.0
  Widowed                            86.0     909.0
  Married-spouse-absent              35.0     385.0
  Married-AF-spouse                  11.0      14.0
  [total]                          7848.0   24727.0

occupation
  Tech-support                      284.0     646.0
  Craft-repair                      930.0    3171.0
  Other-service                     138.0    3159.0
  Sales                             984.0    2668.0
  Exec-managerial                  1969.0    2099.0
  Prof-specialty                   2051.0    3934.0
  Handlers-cleaners                  87.0    1285.0
  Machine-op-inspct                 251.0    1753.0
  Adm-clerical                      508.0    3264.0
  Farming-fishing                   116.0     880.0
  Transport-moving                  321.0    1278.0
  Priv-house-serv                     2.0     149.0
  Protective-serv                   212.0     439.0
  Armed-Forces                        2.0       9.0
  [total]                          7855.0   24734.0

relationship
  Wife                              746.0     824.0
  Own-child                          68.0    5002.0
  Husband                          5919.0    7276.0
  Not-in-family                     857.0    7450.0
  Other-relative                     38.0     945.0
  Unmarried                         219.0    3229.0
  [total]                          7847.0   24726.0

race
  White                            7118.0   20700.0
  Asian-Pac-Islander                277.0     764.0
  Amer-Indian-Eskimo                 37.0     276.0
  Other                              26.0     247.0
  Black                             388.0    2738.0
  [total]                          7846.0   24725.0

sex
  Female                           1180.0    9593.0
  Male                             6663.0   15129.0
  [total]                          7843.0   24722.0

capital-gain
  '(-inf-9999.9]'                  7086.0   24707.0
  '(9999.9-19999.8]'                512.0       7.0
  '(19999.8-29999.7]'                87.0       2.0
  '(29999.7-39999.6]'                 1.0       6.0
  '(39999.6-49999.5]'                 1.0       3.0
  '(49999.5-59999.4]'                 1.0       1.0
  '(59999.4-69999.3]'                 1.0       1.0
  '(69999.3-79999.2]'                 1.0       1.0
  '(79999.2-89999.1]'                 1.0       1.0
  '(89999.1-inf)'                   160.0       1.0
  [total]                          7851.0   24730.0

capital-loss
  '(-inf-435.6]'                   7069.0   23986.0
  '(435.6-871.2]'                     3.0      16.0
  '(871.2-1306.8]'                    1.0      22.0
  '(1306.8-1742.4]'                  57.0     406.0
  '(1742.4-2178]'                   581.0     200.0
  '(2178-2613.6]'                   123.0      86.0
  '(2613.6-3049.2]'                  13.0       3.0
  '(3049.2-3484.8]'                   1.0       1.0
  '(3484.8-3920.4]'                   2.0       6.0
  '(3920.4-inf)'                      1.0       4.0
  [total]                          7851.0   24730.0

hours-per-week
  '(-inf-10.8]'                      66.0     672.0
  '(10.8-20.6]'                     131.0    2063.0
  '(20.6-30.4]'                     159.0    2160.0
  '(30.4-40.2]'                    3633.0   14104.0
  '(40.2-50]'                      2353.0    3587.0
  '(50-59.8]'                       453.0     607.0
  '(59.8-69.6]'                     777.0    1021.0
  '(69.6-79.4]'                     157.0     293.0
  '(79.4-89.2]'                      80.0     124.0
  '(89.2-inf)'                       42.0      99.0
  [total]                          7851.0   24730.0

native-country
  United-States                    7318.0   22437.0
  Cambodia                            8.0      13.0
  England                            31.0      61.0
  Puerto-Rico                        13.0     103.0
  Canada                             40.0      83.0
  Germany                            45.0      94.0
  Outlying-US(Guam-USVI-etc)          1.0      15.0
  India                              41.0      61.0
  Japan                              25.0      39.0
  Greece                              9.0      22.0
  South                              17.0      65.0
  China                              21.0      56.0
  Cuba                               26.0      71.0
  Iran                               19.0      26.0
  Honduras                            2.0      13.0
  Philippines                        62.0     138.0
  Italy                              26.0      49.0
  Poland                             13.0      49.0
  Jamaica                            11.0      72.0
  Vietnam                             6.0      63.0
  Mexico                             34.0     611.0
  Portugal                            5.0      34.0
  Ireland                             6.0      20.0
  France                             13.0      18.0
  Dominican-Republic                  3.0      69.0
  Laos                                3.0      17.0
  Ecuador                             5.0      25.0
  Taiwan                             21.0      32.0
  Haiti                               5.0      41.0
  Columbia                            3.0      58.0
  Hungary                             4.0      11.0
  Guatemala                           4.0      62.0
  Nicaragua                           3.0      33.0
  Scotland                            4.0      10.0
  Thailand                            4.0      16.0
  Yugoslavia                          7.0      11.0
  El-Salvador                        10.0      98.0
  Trinadad&Tobago                     3.0      18.0
  Peru                                3.0      30.0
  Hong                                7.0      15.0
  Holand-Netherlands                  1.0       2.0
  [total]                          7882.0   24761.0

Time taken to build model: 0.06 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances       26676               81.9262 %
Incorrectly Classified Instances      5885               18.0738 %
Kappa statistic                          0.5507
Mean absolute error                      0.1971
Root mean squared error                  0.3631
Relative absolute error                 53.909  %
Root relative squared error             84.9288 %
Total Number of Instances            32561

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.772     0.166     0.596       0.772    0.673       0.896      >50K
               0.834     0.228     0.92        0.834    0.875       0.896      <=50K
Weighted Avg.  0.819     0.213     0.842       0.819    0.826       0.896
=== Confusion Matrix ===

     a     b   <-- classified as
  6050  1791 |     a = >50K
  4094 20626 |     b = <=50K

5.1.3 KNN (IBk)

=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     adult_data-weka.filters.unsupervised.attribute.ReplaceMissingValuesweka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last
Instances:    32561
Attributes:   15
              age
              workclass
              fnlwgt
              education
              education-num
              marital-status
              occupation
              relationship
              race
              sex
              capital-gain
              capital-loss
              hours-per-week
              native-country
              annual_pay
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances       26483               81.3335 %
Incorrectly Classified Instances      6078               18.6665 %
Kappa statistic                          0.4908
Mean absolute error                      0.2106
Root mean squared error                  0.3818
Relative absolute error                 57.5896 %
Root relative squared error             89.2849 %
Total Number of Instances            32561

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.616     0.124      0.612     0.616     0.614      0.832    >50K
                 0.876     0.384      0.878     0.876     0.877      0.832    <=50K
Weighted Avg.    0.813     0.321      0.814     0.813     0.814      0.832
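The per-class figures in a "Detailed Accuracy By Class" table can be checked by hand from the corresponding confusion matrix. The sketch below is not part of the WEKA output. It is plain Python, it takes the IBk confusion matrix for the Adult dataset (the matrix printed right after this table) as input, and the names `per_class_metrics` and `accuracy` are illustrative choices rather than WEKA functions.

```python
# Confusion matrix for IBk on the Adult dataset, copied from the WEKA output.
# Rows are actual classes, columns are predicted classes.
LABELS = [">50K", "<=50K"]
MATRIX = [
    [4832, 3009],
    [3069, 21651],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_metrics(matrix, index):
    """One-vs-rest metrics for the class at position `index`."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[index][index]
    fn = sum(matrix[index]) - tp                # actual class, predicted other
    fp = sum(row[index] for row in matrix) - tp  # other class, predicted this
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


if __name__ == "__main__":
    print(f"accuracy: {accuracy(MATRIX):.4f}")
    for i, label in enumerate(LABELS):
        rounded = {k: round(v, 3) for k, v in per_class_metrics(MATRIX, i).items()}
        print(label, rounded)
```

Running the same functions on any of the other confusion matrices in this appendix only requires replacing `LABELS` and `MATRIX`; the helper works for more than two classes because it treats each class one-vs-rest.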
=== Confusion Matrix === a b <-- classified as 4832 3009 | a = >50K 3069 21651 | b = <=50K 5.1.4 SVM (SMO with 25% sample of Adult dataset) === Run information === Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0" Relation: adult_data-weka.filters.supervised.attribute.Discretize-Rfirst-lastweka.filters.unsupervised.instance.Resample-S1-Z25.0weka.filters.unsupervised.attribute.ReplaceMissingValues Instances: 8140 Attributes: 15 age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country annual_pay Test mode: 10-fold cross-validation === Classifier model (full training set) === SMO Kernel used: Linear Kernel: K(x,y) = <x,y> Classifier for classes: >50K, <=50K BinarySMO Machine linear: showing attribute weights, not support vectors. + + + + + + + + + + 0.1958 * (normalized) age='(-inf-21.5]' 0.6555 * (normalized) age='(21.5-23.5]' 0.4589 * (normalized) age='(23.5-27.5]' -0.0416 * (normalized) age='(27.5-29.5]' -0.1178 * (normalized) age='(29.5-35.5]' -0.3482 * (normalized) age='(35.5-43.5]' -0.4389 * (normalized) age='(43.5-54.5]' -0.4273 * (normalized) age='(54.5-61.5]' 0.0635 * (normalized) age='(61.5-inf)' -0.0508 * (normalized) workclass=Private 0.2203 * (normalized) workclass=Self-emp-not-inc + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + -0.4643 * (normalized) workclass=Self-emp-inc -0.1311 * (normalized) workclass=Federal-gov -0.0108 * (normalized) workclass=Local-gov 0.1789 * (normalized) workclass=State-gov 0.2579 * (normalized) workclass=Never-worked -0.2787 * (normalized) education=Bachelors 0.181 * (normalized) education=Some-college -0.0973 * (normalized) education=11th 0.2712 * (normalized) education=HS-grad -0.411 * (normalized) education=Prof-school 0.199 * (normalized) education=Assoc-acdm -0.0063 * (normalized) 
education=Assoc-voc -0.1871 * (normalized) education=9th 0.1337 * (normalized) education=7th-8th 0 * (normalized) education=12th -0.3161 * (normalized) education=Masters 0.132 * (normalized) education=1st-4th -0.187 * (normalized) education=10th -0.3691 * (normalized) education=Doctorate 0.4092 * (normalized) education=5th-6th 0.5265 * (normalized) education=Preschool 0.7301 * (normalized) education-num='(-inf-8.5]' 0.2712 * (normalized) education-num='(8.5-9.5]' 0.181 * (normalized) education-num='(9.5-10.5]' 0.1926 * (normalized) education-num='(10.5-12.5]' -0.2787 * (normalized) education-num='(12.5-13.5]' -0.3161 * (normalized) education-num='(13.5-14.5]' -0.7801 * (normalized) education-num='(14.5-inf)' -0.4299 * (normalized) marital-status=Married-civ-spouse 0.2246 * (normalized) marital-status=Divorced 0.3066 * (normalized) marital-status=Never-married -0.0862 * (normalized) marital-status=Separated -0.1672 * (normalized) marital-status=Widowed 0.547 * (normalized) marital-status=Married-spouse-absent -0.3948 * (normalized) marital-status=Married-AF-spouse -0.4997 * (normalized) occupation=Tech-support 0.0927 * (normalized) occupation=Craft-repair 0.4868 * (normalized) occupation=Other-service -0.0874 * (normalized) occupation=Sales -0.501 * (normalized) occupation=Exec-managerial -0.1382 * (normalized) occupation=Prof-specialty 0.1826 * (normalized) occupation=Handlers-cleaners 0.1712 * (normalized) occupation=Machine-op-inspct -0.1376 * (normalized) occupation=Adm-clerical 0.3981 * (normalized) occupation=Farming-fishing 0.1706 * (normalized) occupation=Transport-moving + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0.6423 * (normalized) occupation=Priv-house-serv -0.7803 * (normalized) occupation=Protective-serv 0 * (normalized) occupation=Armed-Forces -0.9709 * (normalized) relationship=Wife 0.4985 * (normalized) relationship=Own-child -0.3523 * (normalized) relationship=Husband 0.1661 * (normalized) 
relationship=Not-in-family 0.2024 * (normalized) relationship=Other-relative 0.4562 * (normalized) relationship=Unmarried -0.17 * (normalized) race=White 0.0623 * (normalized) race=Asian-Pac-Islander 0.5142 * (normalized) race=Amer-Indian-Eskimo -0.2474 * (normalized) race=Other -0.1591 * (normalized) race=Black -0.2996 * (normalized) sex 0.8433 * (normalized) capital-gain='(-inf-57]' 1.7478 * (normalized) capital-gain='(57-3048]' -1.2452 * (normalized) capital-gain='(3048-3120]' 2.5481 * (normalized) capital-gain='(3120-4243.5]' -1 * (normalized) capital-gain='(4243.5-4401]' 1.4045 * (normalized) capital-gain='(4401-4668.5]' -2 * (normalized) capital-gain='(4668.5-4826]' 1 * (normalized) capital-gain='(4826-4932.5]' -1.7386 * (normalized) capital-gain='(4932.5-4973.5]' 2.2733 * (normalized) capital-gain='(4973.5-5119]' -1.8125 * (normalized) capital-gain='(5119-5316.5]' 0.7985 * (normalized) capital-gain='(5316.5-5505.5]' -0.3352 * (normalized) capital-gain='(5505.5-6618.5]' 0.4184 * (normalized) capital-gain='(6618.5-7073.5]' -2.9024 * (normalized) capital-gain='(7073.5-inf)' 0.9743 * (normalized) capital-loss='(-inf-1551.5]' -1.1147 * (normalized) capital-loss='(1551.5-1568.5]' 1.8271 * (normalized) capital-loss='(1568.5-1820.5]' -1 * (normalized) capital-loss='(1820.5-1862]' 1.4065 * (normalized) capital-loss='(1862-1881.5]' -1.1672 * (normalized) capital-loss='(1881.5-1923]' 0.1524 * (normalized) capital-loss='(1923-1975.5]' -1 * (normalized) capital-loss='(1975.5-1978.5]' 1.4759 * (normalized) capital-loss='(1978.5-2168.5]' 1 * (normalized) capital-loss='(2176.5-2218.5]' 0 * (normalized) capital-loss='(2218.5-2384.5]' -1.6022 * (normalized) capital-loss='(2384.5-2450.5]' -1.0032 * (normalized) capital-loss='(2450.5-3726.5]' 0.0511 * (normalized) capital-loss='(3726.5-inf)' 0.5095 * (normalized) hours-per-week='(-inf-34.5]' 0.0488 * (normalized) hours-per-week='(34.5-39.5]' + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
0.0491 * (normalized) hours-per-week='(39.5-41.5]' -0.1823 * (normalized) hours-per-week='(41.5-49.5]' -0.2586 * (normalized) hours-per-week='(49.5-65.5]' -0.1666 * (normalized) hours-per-week='(65.5-inf)' -0.0123 * (normalized) native-country=United-States 0 * (normalized) native-country=Cambodia -0.8794 * (normalized) native-country=England 0.0783 * (normalized) native-country=Puerto-Rico -0.2733 * (normalized) native-country=Canada -0.4264 * (normalized) native-country=Germany 0 * (normalized) native-country=Outlying-US(Guam-USVI-etc) -0.1786 * (normalized) native-country=India -0.244 * (normalized) native-country=Japan 0.2454 * (normalized) native-country=Greece 1.2475 * (normalized) native-country=South 0 * (normalized) native-country=China -0.0123 * (normalized) native-country=Cuba 0 * (normalized) native-country=Iran 0 * (normalized) native-country=Honduras -0.6894 * (normalized) native-country=Philippines -0.4222 * (normalized) native-country=Italy 0.2954 * (normalized) native-country=Poland 1 * (normalized) native-country=Jamaica 1 * (normalized) native-country=Vietnam -0.086 * (normalized) native-country=Mexico -0.2418 * (normalized) native-country=Portugal -0.0921 * (normalized) native-country=Ireland 0.7286 * (normalized) native-country=France 0.7175 * (normalized) native-country=Dominican-Republic 0 * (normalized) native-country=Laos 0.0002 * (normalized) native-country=Ecuador 0 * (normalized) native-country=Taiwan 0.0673 * (normalized) native-country=Haiti 0.2742 * (normalized) native-country=Columbia -1 * (normalized) native-country=Hungary 0.2587 * (normalized) native-country=Guatemala 0.3218 * (normalized) native-country=Nicaragua 0 * (normalized) native-country=Scotland -0.2457 * (normalized) native-country=Thailand 0 * (normalized) native-country=El-Salvador -1.6907 * (normalized) native-country=Trinadad&Tobago 0.5544 * (normalized) native-country=Peru -0.295 * (normalized) native-country=Hong 0.1615 Number of kernel evaluations: 79605087 
(49.256% cached)

Time taken to build model: 574.64 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        7059               86.7199 %
Incorrectly Classified Instances      1081               13.2801 %
Kappa statistic                          0.5975
Mean absolute error                      0.1328
Root mean squared error                  0.3644
Relative absolute error                 36.7839 %
Root relative squared error             85.7762 %
Total Number of Instances             8140

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.595     0.048      0.792     0.595     0.679      0.773    >50K
                 0.952     0.405      0.883     0.952     0.916      0.773    <=50K
Weighted Avg.    0.867     0.321      0.862     0.867     0.86       0.773

=== Confusion Matrix ===

    a    b   <-- classified as
 1144  780 |    a = >50K
  301 5915 |    b = <=50K

5.2 IRIS DATASET OUTPUTS

5.2.1 Decision Tree (J48)

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.1 -M 2
Relation:     iris-weka.filters.unsupervised.attribute.Discretize-B5-M-1.0-Rfirst-last
Instances:    150
Attributes:   5
              sepal_length
              sepal_width
              petal_length
              petal_width
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

petal_length = '(-inf-2.18]': Iris-setosa (50.0)
petal_length = '(2.18-3.36]': Iris-versicolor (3.0)
petal_length = '(3.36-4.54]': Iris-versicolor (34.0/1.0)
petal_length = '(4.54-5.72]'
|   petal_width = '(-inf-0.58]': Iris-virginica (0.0)
|   petal_width = '(0.58-1.06]': Iris-virginica (0.0)
|   petal_width = '(1.06-1.54]': Iris-versicolor (13.0/3.0)
|   petal_width = '(1.54-2.02]': Iris-virginica (20.0/4.0)
|   petal_width = '(2.02-inf)': Iris-virginica (14.0)
petal_length = '(5.72-inf)': Iris-virginica (16.0)

Number of Leaves  :     9

Size of the tree :      11

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         142               94.6667 %
Incorrectly Classified Instances         8                5.3333 %
Kappa statistic                          0.92
Mean absolute error                      0.0607
Root mean squared error                  0.1788
Relative absolute error                 13.6478 %
Root relative squared error             37.9371 %
Total Number of Instances              150

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        Iris-setosa
                 0.92      0.04       0.92      0.92      0.92       0.971    Iris-versicolor
                 0.92      0.04       0.92      0.92      0.92       0.966    Iris-virginica
Weighted Avg.    0.947     0.027      0.947     0.947     0.947      0.979

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 46  4 |  b = Iris-versicolor
  0  4 46 |  c = Iris-virginica

WEKA’s Classifier Tree Visualizer

5.2.2 Naïve Bayes

=== Run information ===

Scheme:       weka.classifiers.bayes.NaiveBayes
Relation:     iris-weka.filters.unsupervised.attribute.Discretize-B5-M-1.0-Rfirst-last
Instances:    150
Attributes:   5
              sepal_length
              sepal_width
              petal_length
              petal_width
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

                      Class
Attribute         Iris-setosa  Iris-versicolor  Iris-virginica
                       (0.33)           (0.33)          (0.33)
==================================================================
sepal_length
  '(-inf-5.02]'          29.0              4.0             2.0
  '(5.02-5.74]'          22.0             19.0             3.0
  '(5.74-6.46]'           2.0             21.0            22.0
  '(6.46-7.18]'           1.0             10.0            16.0
  '(7.18-inf)'            1.0              1.0            12.0
  [total]                55.0             55.0            55.0

sepal_width
  '(-inf-2.48]'           2.0             10.0             2.0
  '(2.48-2.96]'           2.0             26.0            21.0
  '(2.96-3.44]'          28.0             17.0            27.0
  '(3.44-3.92]'          18.0              1.0             4.0
  '(3.92-inf)'            5.0              1.0             1.0
  [total]                55.0             55.0            55.0

petal_length
  '(-inf-2.18]'          51.0              1.0             1.0
  '(2.18-3.36]'           1.0              4.0             1.0
  '(3.36-4.54]'           1.0             34.0             2.0
  '(4.54-5.72]'           1.0             15.0            34.0
  '(5.72-inf)'            1.0              1.0            17.0
  [total]                55.0             55.0            55.0

petal_width
  '(-inf-0.58]'          50.0              1.0             1.0
  '(0.58-1.06]'           2.0              8.0             1.0
  '(1.06-1.54]'           1.0             39.0             4.0
  '(1.54-2.02]'           1.0              6.0            25.0
  '(2.02-inf)'            1.0              1.0            24.0
  [total]                55.0             55.0            55.0

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         140               93.3333 %
Incorrectly Classified Instances        10                6.6667 %
Kappa statistic                          0.9
Mean absolute error                      0.0629
Root mean squared error                  0.2036
Relative absolute error                 14.1535 %
Root relative squared error             43.1989 %
Total Number of Instances              150

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        Iris-setosa
                 0.92      0.06       0.885     0.92      0.902      0.974    Iris-versicolor
                 0.88      0.04       0.917     0.88      0.898      0.975    Iris-virginica
Weighted Avg.    0.933     0.033      0.934     0.933     0.933      0.983

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 46  4 |  b = Iris-versicolor
  0  6 44 |  c = Iris-virginica

5.2.3 KNN (IBk)

=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     iris-weka.filters.unsupervised.attribute.Discretize-B5-M-1.0-Rfirst-last
Instances:    150
Attributes:   5
              sepal_length
              sepal_width
              petal_length
              petal_width
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         138               92      %
Incorrectly Classified Instances        12                8      %
Kappa statistic                          0.88
Mean absolute error                      0.0596
Root mean squared error                  0.1851
Relative absolute error                 13.4013 %
Root relative squared error             39.2653 %
Total Number of Instances              150

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        Iris-setosa
                 0.88      0.06       0.88      0.88      0.88       0.971    Iris-versicolor
                 0.88      0.06       0.88      0.88      0.88       0.975    Iris-virginica
Weighted Avg.    0.92      0.04       0.92      0.92      0.92       0.982
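The Kappa statistic in each summary can also be recomputed from the confusion matrix. The following is a minimal sketch in plain Python, assuming the standard definition of Cohen's kappa (observed agreement corrected by the agreement expected from the row and column totals). The input is the IBk confusion matrix for the Iris dataset, which is printed right after this table; `cohen_kappa` is an illustrative name, not a WEKA call.

```python
# Confusion matrix for IBk on the Iris dataset, copied from the WEKA output.
# Rows are actual classes, columns are predicted classes.
IRIS_IBK = [
    [50, 0, 0],   # Iris-setosa
    [0, 44, 6],   # Iris-versicolor
    [0, 6, 44],   # Iris-virginica
]


def cohen_kappa(matrix):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: product of matching row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"kappa: {cohen_kappa(IRIS_IBK):.4f}")
```

With three equally sized classes the chance agreement is one third, so an observed accuracy of 92 % corresponds to the reported kappa of 0.88.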
=== Confusion Matrix === a b c <-- classified as 50 0 0 | a = Iris-setosa 0 44 6 | b = Iris-versicolor 0 6 44 | c = Iris-virginica 5.2.4 SVM (SMO) === Run information === Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0" Relation: iris-weka.filters.unsupervised.attribute.Discretize-B5-M-1.0-Rfirst-last Instances: 150 Attributes: 5 sepal_length sepal_width petal_length petal_width class Test mode: 10-fold cross-validation === Classifier model (full training set) === SMO Kernel used: Linear Kernel: K(x,y) = <x,y> Classifier for classes: Iris-setosa, Iris-versicolor BinarySMO Machine linear: showing attribute weights, not support vectors. + + + + + + + + + + + + + + + + + -0.1785 * (normalized) sepal_length='(-inf-5.02]' -0.0258 * (normalized) sepal_length='(5.02-5.74]' 0.1021 * (normalized) sepal_length='(5.74-6.46]' 0.1022 * (normalized) sepal_length='(6.46-7.18]' 0.2549 * (normalized) sepal_width='(-inf-2.48]' 0.1033 * (normalized) sepal_width='(2.48-2.96]' 0.1011 * (normalized) sepal_width='(2.96-3.44]' -0.4334 * (normalized) sepal_width='(3.44-3.92]' -0.026 * (normalized) sepal_width='(3.92-inf)' -0.9508 * (normalized) petal_length='(-inf-2.18]' 0.3609 * (normalized) petal_length='(2.18-3.36]' 0.3596 * (normalized) petal_length='(3.36-4.54]' 0.2302 * (normalized) petal_length='(4.54-5.72]' -0.5173 * (normalized) petal_width='(-inf-0.58]' 0.1705 * (normalized) petal_width='(0.58-1.06]' 0.1731 * (normalized) petal_width='(1.06-1.54]' 0.1737 * (normalized) petal_width='(1.54-2.02]' 0.3926 Number of kernel evaluations: 1314 (85.461% cached) Classifier for classes: Iris-setosa, Iris-virginica BinarySMO Machine linear: showing attribute weights, not support vectors. 
+ + + + + + + + + + + + + + + + + + + -0.2116 * (normalized) sepal_length='(-inf-5.02]' -0.0979 * (normalized) sepal_length='(5.02-5.74]' 0.0647 * (normalized) sepal_length='(5.74-6.46]' 0.0641 * (normalized) sepal_length='(6.46-7.18]' 0.1806 * (normalized) sepal_length='(7.18-inf)' 0.1241 * (normalized) sepal_width='(-inf-2.48]' 0.1247 * (normalized) sepal_width='(2.48-2.96]' 0.0103 * (normalized) sepal_width='(2.96-3.44]' -0.106 * (normalized) sepal_width='(3.44-3.92]' -0.153 * (normalized) sepal_width='(3.92-inf)' -0.8027 * (normalized) petal_length='(-inf-2.18]' 0.3403 * (normalized) petal_length='(3.36-4.54]' 0.2313 * (normalized) petal_length='(4.54-5.72]' 0.2311 * (normalized) petal_length='(5.72-inf)' -0.5161 * (normalized) petal_width='(-inf-0.58]' -0.2866 * (normalized) petal_width='(0.58-1.06]' 0.1737 * (normalized) petal_width='(1.06-1.54]' 0.3403 * (normalized) petal_width='(1.54-2.02]' 0.2887 * (normalized) petal_width='(2.02-inf)' 0.4062 Number of kernel evaluations: 1677 (82.133% cached) Classifier for classes: Iris-versicolor, Iris-virginica BinarySMO Machine linear: showing attribute weights, not support vectors. 
         0.1874 * (normalized) sepal_length='(-inf-5.02]'
 +       0      * (normalized) sepal_length='(5.02-5.74]'
 +      -0.0937 * (normalized) sepal_length='(5.74-6.46]'
 +      -0.0937 * (normalized) sepal_length='(6.46-7.18]'
 +       0      * (normalized) sepal_length='(7.18-inf)'
 +       0.1874 * (normalized) sepal_width='(-inf-2.48]'
 +      -0.0937 * (normalized) sepal_width='(2.48-2.96]'
 +      -0.0938 * (normalized) sepal_width='(2.96-3.44]'
 +       0      * (normalized) sepal_width='(3.44-3.92]'
 +      -0.4055 * (normalized) petal_length='(2.18-3.36]'
 +      -0.407  * (normalized) petal_length='(3.36-4.54]'
 +       0.4065 * (normalized) petal_length='(4.54-5.72]'
 +       0.406  * (normalized) petal_length='(5.72-inf)'
 +      -0.8126 * (normalized) petal_width='(0.58-1.06]'
 +      -1.0627 * (normalized) petal_width='(1.06-1.54]'
 +       0.9378 * (normalized) petal_width='(1.54-2.02]'
 +       0.9375 * (normalized) petal_width='(2.02-inf)'
 -       0.1561

Number of kernel evaluations: 2285 (82.537% cached)

Time taken to build model: 0.3 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         141               94      %
Incorrectly Classified Instances         9                6      %
Kappa statistic                          0.91
Mean absolute error                      0.2356
Root mean squared error                  0.2956
Relative absolute error                 53      %
Root relative squared error             62.7163 %
Total Number of Instances              150

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        Iris-setosa
                 0.9       0.04       0.918     0.9       0.909      0.934    Iris-versicolor
                 0.92      0.05       0.902     0.92      0.911      0.953    Iris-virginica
Weighted Avg.    0.94      0.03       0.94      0.94      0.94       0.962

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 45  5 |  b = Iris-versicolor
  0  4 46 |  c = Iris-virginica

5.3 ZOO DATASET OUTPUTS

5.3.1 Decision Tree (J48)

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     zoo-weka.filters.unsupervised.attribute.Remove-R1
Instances:    101
Attributes:   17
              hair
              feathers
              eggs
              milk
              airborne
              aquatic
              predator
              toothed
              backbone
              breathes
              venomous
              fins
              legs
              tail
              domestic
              catsize
              type
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

feathers = 0
|   milk = 0
|   |   backbone = 0
|   |   |   airborne = 0
|   |   |   |   predator = 0
|   |   |   |   |   legs = 0: 7 (2.0)
|   |   |   |   |   legs = 2: 6 (0.0)
|   |   |   |   |   legs = 4: 6 (0.0)
|   |   |   |   |   legs = 5: 6 (0.0)
|   |   |   |   |   legs = 6: 6 (2.0)
|   |   |   |   |   legs = 8: 6 (0.0)
|   |   |   |   predator = 1: 7 (8.0)
|   |   |   airborne = 1: 6 (6.0)
|   |   backbone = 1
|   |   |   fins = 0
|   |   |   |   tail = 0: 5 (3.0)
|   |   |   |   tail = 1: 3 (6.0/1.0)
|   |   |   fins = 1: 4 (13.0)
|   milk = 1: 1 (41.0)
feathers = 1: 2 (20.0)

Number of Leaves  :     13

Size of the tree :      21

Time taken to build model: 0.14 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          93               92.0792 %
Incorrectly Classified Instances         8                7.9208 %
Kappa statistic                          0.8955
Mean absolute error                      0.0225
Root mean squared error                  0.1375
Relative absolute error                 10.2478 %
Root relative squared error             41.6673 %
Total Number of Instances              101

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        1
                 1         0          1         1         1          1        2
                 0.6       0.01       0.75      0.6       0.667      0.793    3
                 1         0.011      0.929     1         0.963      0.994    4
                 0.75      0          1         0.75      0.857      0.872    5
                 0.625     0.032      0.625     0.625     0.625      0.923    6
                 0.8       0.033      0.727     0.8       0.762      0.984    7
Weighted Avg.    0.921     0.008      0.922     0.921     0.92       0.976
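The "Weighted Avg." row is consistent with averaging each per-class column using the number of instances in each class as the weight. The sketch below illustrates that reading in plain Python for the J48 results on the Zoo dataset. The per-class values are copied from the table above, the class sizes are the row sums of the confusion matrix that follows, and `weighted_average` is an illustrative helper, not a WEKA function.

```python
# Number of instances per class (types 1..7), taken as the row sums of the
# J48 confusion matrix for the Zoo dataset.
CLASS_SIZES = [41, 20, 5, 13, 4, 8, 10]

# Per-class columns copied from the "Detailed Accuracy By Class" table.
TP_RATE = [1, 1, 0.6, 1, 0.75, 0.625, 0.8]
PRECISION = [1, 1, 0.75, 0.929, 1, 0.625, 0.727]


def weighted_average(values, weights):
    """Average of `values` where each entry counts `weights[i]` times."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    print(f"weighted TP rate:   {weighted_average(TP_RATE, CLASS_SIZES):.3f}")
    print(f"weighted precision: {weighted_average(PRECISION, CLASS_SIZES):.3f}")
```

Because the weights are class sizes, the large classes (types 1 and 2) dominate the average, which is why the weighted figures stay above 0.92 even though types 3 and 6 are classified noticeably worse.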
=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 41  0  0  0  0  0  0 |  a = 1
  0 20  0  0  0  0  0 |  b = 2
  0  0  3  1  0  1  0 |  c = 3
  0  0  0 13  0  0  0 |  d = 4
  0  0  1  0  3  0  0 |  e = 5
  0  0  0  0  0  5  3 |  f = 6
  0  0  0  0  0  2  8 |  g = 7

WEKA’s Classifier Tree Visualizer

5.3.2 Naïve Bayes

=== Run information ===

Scheme:       weka.classifiers.bayes.NaiveBayes
Relation:     zoo-weka.filters.unsupervised.attribute.Remove-R1
Instances:    101
Attributes:   17
              hair
              feathers
              eggs
              milk
              airborne
              aquatic
              predator
              toothed
              backbone
              breathes
              venomous
              fins
              legs
              tail
              domestic
              catsize
              type
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

              Class
Attribute         1      2      3      4      5      6      7
             (0.39) (0.19) (0.06) (0.13) (0.05) (0.08)  (0.1)
================================================================
hair
  0             3.0   21.0    6.0   14.0    5.0    5.0   11.0
  1            40.0    1.0    1.0    1.0    1.0    5.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

feathers
  0            42.0    1.0    6.0   14.0    5.0    9.0   11.0
  1             1.0   21.0    1.0    1.0    1.0    1.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

eggs
  0            41.0    1.0    2.0    1.0    1.0    1.0    2.0
  1             2.0   21.0    5.0   14.0    5.0    9.0   10.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

milk
  0             1.0   21.0    6.0   14.0    5.0    9.0   11.0
  1            42.0    1.0    1.0    1.0    1.0    1.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

airborne
  0            40.0    5.0    6.0   14.0    5.0    3.0   11.0
  1             3.0   17.0    1.0    1.0    1.0    7.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

aquatic
  0            36.0   15.0    5.0    1.0    1.0    9.0    5.0
  1             7.0    7.0    2.0   14.0    5.0    1.0    7.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

predator
  0            20.0   12.0    2.0    5.0    2.0    8.0    3.0
  1            23.0   10.0    5.0   10.0    4.0    2.0    9.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

toothed
  0             2.0   21.0    2.0    1.0    1.0    9.0   11.0
  1            41.0    1.0    5.0   14.0    5.0    1.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

backbone
  0             1.0    1.0    1.0    1.0    1.0    9.0   11.0
  1            42.0   21.0    6.0   14.0    5.0    1.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

breathes
  0             1.0    1.0    2.0   14.0    1.0    1.0    8.0
  1            42.0   21.0    5.0    1.0    5.0    9.0    4.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

venomous
  0            42.0   21.0    4.0   13.0    4.0    7.0    9.0
  1             1.0    1.0    3.0    2.0    2.0    3.0    3.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

fins
  0            38.0   21.0    6.0    1.0    5.0    9.0   11.0
  1             5.0    1.0    1.0   14.0    1.0    1.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

legs
  0             4.0    1.0    4.0   14.0    1.0    1.0    5.0
  2             8.0   21.0    1.0    1.0    1.0    1.0    1.0
  4            32.0    1.0    3.0    1.0    5.0    1.0    2.0
  5             1.0    1.0    1.0    1.0    1.0    1.0    2.0
  6             1.0    1.0    1.0    1.0    1.0    9.0    3.0
  8             1.0    1.0    1.0    1.0    1.0    1.0    3.0
  [total]      47.0   26.0   11.0   19.0   10.0   14.0   16.0

tail
  0             7.0    1.0    1.0    1.0    4.0    9.0   10.0
  1            36.0   21.0    6.0   14.0    2.0    1.0    2.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

domestic
  0            34.0   18.0    6.0   13.0    5.0    8.0   11.0
  1             9.0    4.0    1.0    2.0    1.0    2.0    1.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

catsize
  0            10.0   15.0    5.0   10.0    5.0    9.0   10.0
  1            33.0    7.0    2.0    5.0    1.0    1.0    2.0
  [total]      43.0   22.0    7.0   15.0    6.0   10.0   12.0

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          94               93.0693 %
Incorrectly Classified Instances         7                6.9307 %
Kappa statistic                          0.9089
Mean absolute error                      0.0203
Root mean squared error                  0.1025
Relative absolute error                  9.2616 %
Root relative squared error             31.0791 %
Total Number of Instances              101

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.976     0          1         0.976     0.988      1        1
                 1         0.012      0.952     1         0.976      1        2
                 0.6       0.021      0.6       0.6       0.6        0.983    3
                 1         0.023      0.867     1         0.929      1        4
                 0.75      0          1         0.75      0.857      1        5
                 1         0.022      0.8       1         0.889      1        6
                 0.7       0          1         0.7       0.824      0.998    7
Weighted Avg.    0.931     0.008      0.938     0.931     0.929      0.999
=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 40  0  0  1  0  0  0 |  a = 1
  0 20  0  0  0  0  0 |  b = 2
  0  1  3  1  0  0  0 |  c = 3
  0  0  0 13  0  0  0 |  d = 4
  0  0  1  0  3  0  0 |  e = 5
  0  0  0  0  0  8  0 |  f = 6
  0  0  1  0  0  2  7 |  g = 7

5.3.3 KNN (IBk)

=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     zoo-weka.filters.unsupervised.attribute.Remove-R1
Instances:    101
Attributes:   17
              hair
              feathers
              eggs
              milk
              airborne
              aquatic
              predator
              toothed
              backbone
              breathes
              venomous
              fins
              legs
              tail
              domestic
              catsize
              type
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          97               96.0396 %
Incorrectly Classified Instances         4                3.9604 %
Kappa statistic                          0.9477
Mean absolute error                      0.0195
Root mean squared error                  0.0941
Relative absolute error                  8.894  %
Root relative squared error             28.5252 %
Total Number of Instances              101

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        1
                 1         0.012      0.952     1         0.976      1        2
                 0.6       0.021      0.6       0.6       0.6        0.985    3
                 1         0.011      0.929     1         0.963      1        4
                 0.75      0          1         0.75      0.857      0.997    5
                 1         0          1         1         1          1        6
                 0.9       0          1         0.9       0.947      0.984    7
Weighted Avg.    0.96      0.005      0.962     0.96      0.96       0.998
=== Confusion Matrix === a b c d e f g <-- classified as 41 0 0 0 0 0 0 | a = 1 0 20 0 0 0 0 0 | b = 2 0 1 3 1 0 0 0| c=3 0 0 0 13 0 0 0 | d = 4 0 0 1 0 3 0 0| e=5 0 0 0 0 0 8 0| f=6 0 0 1 0 0 0 9| g=7 5.3.4 SVM (SMO) === Run information === Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0" Relation: zoo-weka.filters.unsupervised.attribute.Remove-R1 Instances: 101 Attributes: 17 hair feathers eggs milk airborne aquatic predator toothed backbone breathes venomous fins legs tail domestic catsize type Test mode: 10-fold cross-validation === Classifier model (full training set) === SMO Kernel used: Linear Kernel: K(x,y) = <x,y> Classifier for classes: 1, 2 BinarySMO Machine linear: showing attribute weights, not support vectors. + + + + + + + + + + + + + + + + - -0.4407 * (normalized) hair 0.5084 * (normalized) feathers 0.2712 * (normalized) eggs -0.5084 * (normalized) milk 0.0011 * (normalized) airborne -0.0008 * (normalized) aquatic 0.0014 * (normalized) predator -0.2712 * (normalized) toothed 0 * (normalized) backbone 0 * (normalized) breathes -0.0678 * (normalized) fins -0.0678 * (normalized) legs=0 0.305 * (normalized) legs=2 -0.2372 * (normalized) legs=4 0.0012 * (normalized) tail 0 * (normalized) domestic 0.0011 * (normalized) catsize 0.0872 Number of kernel evaluations: 807 (78.808% cached) Classifier for classes: 1, 3 BinarySMO Machine linear: showing attribute weights, not support vectors. 
+ + + + + + + + + + + + + + + -0.7034 * (normalized) hair 0.3609 * (normalized) eggs -0.9743 * (normalized) milk -0.2835 * (normalized) aquatic -0.0387 * (normalized) predator -0.1161 * (normalized) toothed 0 * (normalized) backbone -0.3005 * (normalized) breathes 0.3005 * (normalized) venomous -0.271 * (normalized) fins 0.0295 * (normalized) legs=0 -0.0145 * (normalized) legs=2 -0.015 * (normalized) legs=4 0 * (normalized) tail -0.1549 * (normalized) catsize 1.1089 Number of kernel evaluations: 329 (80.839% cached) Classifier for classes: 1, 4 BinarySMO Machine linear: showing attribute weights, not support vectors. + + + + + + + + + + + + + + -0.1175 * (normalized) hair 0.5885 * (normalized) eggs -0.706 * (normalized) milk 0 * (normalized) airborne 0 * (normalized) aquatic 0 * (normalized) predator 0.1175 * (normalized) toothed 0 * (normalized) backbone -0.706 * (normalized) breathes 0.1175 * (normalized) fins 0.1175 * (normalized) legs=0 0 * (normalized) legs=2 -0.1175 * (normalized) legs=4 0 * (normalized) tail -0.0003 * (normalized) catsize + 0.0594 Number of kernel evaluations: 269 (70.729% cached) Classifier for classes: 1, 5 BinarySMO Machine linear: showing attribute weights, not support vectors. + + + + + + + + + + + + + - -0.5713 * (normalized) hair 0.4283 * (normalized) eggs -0.7142 * (normalized) milk 0.2855 * (normalized) aquatic 0.0009 * (normalized) predator 0.2858 * (normalized) toothed 0 * (normalized) backbone 0 * (normalized) breathes -0.1428 * (normalized) fins -0.1428 * (normalized) legs=0 0.1428 * (normalized) legs=4 -0.0014 * (normalized) tail -0.0009 * (normalized) domestic -0.4287 * (normalized) catsize 0.1427 Number of kernel evaluations: 145 (83.726% cached) Classifier for classes: 1, 6 BinarySMO Machine linear: showing attribute weights, not support vectors. 

        -0.1153 * (normalized) hair
 +       0.2588 * (normalized) eggs
 +      -0.3648 * (normalized) milk
 +       0.1167 * (normalized) airborne
 +       -0.106 * (normalized) aquatic
 +       -0.106 * (normalized) predator
 +      -0.2588 * (normalized) toothed
 +      -0.3648 * (normalized) backbone
 +            0 * (normalized) breathes
 +       0.0548 * (normalized) venomous
 +       -0.149 * (normalized) legs=2
 +      -0.2159 * (normalized) legs=4
 +       0.3648 * (normalized) legs=6
 +      -0.2388 * (normalized) tail
 +      -0.0551 * (normalized) domestic
 +      -0.1222 * (normalized) catsize
 +       0.3755

Number of kernel evaluations: 245 (81.523% cached)

Classifier for classes: 1, 7

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.3679 * (normalized) hair
 +       0.1327 * (normalized) eggs
 +      -0.4727 * (normalized) milk
 +      -0.0004 * (normalized) airborne
 +      -0.0982 * (normalized) aquatic
 +       0.0492 * (normalized) predator
 +      -0.2801 * (normalized) toothed
 +      -0.4727 * (normalized) backbone
 +      -0.1993 * (normalized) breathes
 +       0.1474 * (normalized) venomous
 +      -0.1048 * (normalized) fins
 +       0.0213 * (normalized) legs=0
 +      -0.0511 * (normalized) legs=2
 +      -0.1287 * (normalized) legs=4
 +       0.1585 * (normalized) legs=8
 +      -0.2012 * (normalized) tail
 +      -0.1242 * (normalized) domestic
 +      -0.2863 * (normalized) catsize
 +       1.0448

Number of kernel evaluations: 179 (82.046% cached)

Classifier for classes: 2, 3

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.7497 * (normalized) feathers
 +      -0.0002 * (normalized) eggs
 +      -0.0019 * (normalized) airborne
 +       0.0002 * (normalized) aquatic
 +      -0.0019 * (normalized) predator
 +       0.2501 * (normalized) toothed
 +      -0.0002 * (normalized) breathes
 +       0.0002 * (normalized) venomous
 +       0.2501 * (normalized) legs=0
 +      -0.7497 * (normalized) legs=2
 +       0.4997 * (normalized) legs=4
 +      -0.0011 * (normalized) catsize
 +        0.502

Number of kernel evaluations: 126 (89.367% cached)

Classifier for classes: 2, 4

BinarySMO

Machine linear: showing attribute weights, not support vectors.

         -0.333 * (normalized) feathers
 +            0 * (normalized) eggs
 +      -0.0027 * (normalized) airborne
 +       0.0018 * (normalized) aquatic
 +       0.0004 * (normalized) predator
 +        0.333 * (normalized) toothed
 +            0 * (normalized) backbone
 +       -0.333 * (normalized) breathes
 +        0.333 * (normalized) fins
 +        0.333 * (normalized) legs=0
 +       -0.333 * (normalized) legs=2
 +            0 * (normalized) tail
 +      -0.0011 * (normalized) catsize
 -       0.0012

Number of kernel evaluations: 200 (91.431% cached)

Classifier for classes: 2, 5

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.4616 * (normalized) feathers
 +            0 * (normalized) eggs
 +      -0.1546 * (normalized) airborne
 +       0.1535 * (normalized) aquatic
 +            0 * (normalized) predator
 +       0.4616 * (normalized) toothed
 +            0 * (normalized) backbone
 +            0 * (normalized) breathes
 +      -0.4616 * (normalized) legs=2
 +       0.4616 * (normalized) legs=4
 +            0 * (normalized) tail
 +      -0.1535 * (normalized) catsize
 -       0.0764

Number of kernel evaluations: 53 (72.959% cached)

Classifier for classes: 2, 6

BinarySMO

Machine linear: showing attribute weights, not support vectors.

           -0.4 * (normalized) feathers
 +            0 * (normalized) eggs
 +            0 * (normalized) airborne
 +            0 * (normalized) aquatic
 +            0 * (normalized) predator
 +         -0.4 * (normalized) backbone
 +            0 * (normalized) breathes
 +         -0.4 * (normalized) legs=2
 +          0.4 * (normalized) legs=6
 +         -0.4 * (normalized) tail
 +          0.6

Number of kernel evaluations: 104 (62.044% cached)

Classifier for classes: 2, 7

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.4886 * (normalized) feathers
 +      -0.1778 * (normalized) eggs
 +      -0.0891 * (normalized) airborne
 +       0.0435 * (normalized) aquatic
 +      -0.0445 * (normalized) predator
 +      -0.4886 * (normalized) backbone
 +      -0.1338 * (normalized) breathes
 +       0.1778 * (normalized) venomous
 +       0.1771 * (normalized) legs=0
 +      -0.4886 * (normalized) legs=2
 +       0.0446 * (normalized) legs=4
 +       0.0442 * (normalized) legs=5
 +        0.045 * (normalized) legs=6
 +       0.1778 * (normalized) legs=8
 +      -0.3108 * (normalized) tail
 +      -0.0447 * (normalized) catsize
 +       1.1337

Number of kernel evaluations: 231 (82.162% cached)

Classifier for classes: 3, 4

BinarySMO

Machine linear: showing attribute weights, not support vectors.

         0.6359 * (normalized) eggs
 +       0.4552 * (normalized) aquatic
 +       0.0019 * (normalized) predator
 +         0.09 * (normalized) toothed
 +            0 * (normalized) backbone
 +      -0.4552 * (normalized) breathes
 +       -0.273 * (normalized) venomous
 +       1.0911 * (normalized) fins
 +         0.09 * (normalized) legs=0
 +        -0.09 * (normalized) legs=4
 +            0 * (normalized) tail
 +        0.273 * (normalized) catsize
 -       1.3631

Number of kernel evaluations: 66 (84.793% cached)

Classifier for classes: 3, 5

BinarySMO

Machine linear: showing attribute weights, not support vectors.

         0.3685 * (normalized) eggs
 +       1.1579 * (normalized) aquatic
 +        0.001 * (normalized) predator
 +       0.1579 * (normalized) toothed
 +            0 * (normalized) backbone
 +       0.3685 * (normalized) breathes
 +            0 * (normalized) venomous
 +      -0.3685 * (normalized) legs=0
 +       0.3685 * (normalized) legs=4
 +      -0.5264 * (normalized) tail
 +      -0.1579 * (normalized) catsize
 -       1.4219

Number of kernel evaluations: 44 (88.832% cached)

Classifier for classes: 3, 6

BinarySMO

Machine linear: showing attribute weights, not support vectors.

              0 * (normalized) eggs
 +       0.1075 * (normalized) airborne
 +      -0.1086 * (normalized) predator
 +      -0.2161 * (normalized) toothed
 +      -0.4866 * (normalized) backbone
 +            0 * (normalized) breathes
 +      -0.0011 * (normalized) venomous
 +      -0.2161 * (normalized) legs=0
 +      -0.2706 * (normalized) legs=4
 +       0.4866 * (normalized) legs=6
 +      -0.4866 * (normalized) tail
 +      -0.2706 * (normalized) catsize
 +       0.5142

Number of kernel evaluations: 63 (85.246% cached)

Classifier for classes: 3, 7

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.0531 * (normalized) eggs
 +       -0.026 * (normalized) aquatic
 +       0.0694 * (normalized) predator
 +      -0.5156 * (normalized) toothed
 +      -0.9434 * (normalized) backbone
 +        0.026 * (normalized) breathes
 +      -0.0006 * (normalized) venomous
 +      -0.1571 * (normalized) legs=0
 +      -0.1749 * (normalized) legs=4
 +        0.332 * (normalized) legs=8
 +      -0.6114 * (normalized) tail
 +      -0.4278 * (normalized) catsize
 +        1.185

Number of kernel evaluations: 110 (91.941% cached)

Classifier for classes: 4, 5

BinarySMO

Machine linear: showing attribute weights, not support vectors.

              0 * (normalized) eggs
 +            0 * (normalized) aquatic
 +            0 * (normalized) predator
 +            0 * (normalized) toothed
 +            0 * (normalized) backbone
 +          0.5 * (normalized) breathes
 +         -0.5 * (normalized) fins
 +         -0.5 * (normalized) legs=0
 +          0.5 * (normalized) legs=4
 +            0 * (normalized) tail
 +            0

Number of kernel evaluations: 9 (30.769% cached)

Classifier for classes: 4, 6

BinarySMO

Machine linear: showing attribute weights, not support vectors.

              0 * (normalized) eggs
 +       0.0018 * (normalized) airborne
 +      -0.2498 * (normalized) aquatic
 +      -0.0018 * (normalized) predator
 +      -0.2498 * (normalized) toothed
 +      -0.2498 * (normalized) backbone
 +       0.2498 * (normalized) breathes
 +      -0.2498 * (normalized) fins
 +      -0.2498 * (normalized) legs=0
 +       0.2498 * (normalized) legs=6
 +      -0.2498 * (normalized) tail
 +      -0.0005 * (normalized) domestic
 +       0.4997

Number of kernel evaluations: 66 (87.687% cached)

Classifier for classes: 4, 7

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.0401 * (normalized) eggs
 +      -0.1207 * (normalized) aquatic
 +            0 * (normalized) predator
 +        -0.48 * (normalized) toothed
 +        -0.48 * (normalized) backbone
 +       0.0401 * (normalized) breathes
 +       0.1205 * (normalized) venomous
 +        -0.48 * (normalized) fins
 +        -0.12 * (normalized) legs=0
 +       0.0006 * (normalized) legs=6
 +       0.1194 * (normalized) legs=8
 +        -0.44 * (normalized) tail
 +      -0.1196 * (normalized) catsize
 +       1.1605

Number of kernel evaluations: 226 (89.35% cached)

Classifier for classes: 5, 6

BinarySMO

Machine linear: showing attribute weights, not support vectors.

         0.0017 * (normalized) airborne
 +         -0.4 * (normalized) aquatic
 +      -0.0017 * (normalized) predator
 +         -0.4 * (normalized) toothed
 +         -0.4 * (normalized) backbone
 +         -0.4 * (normalized) legs=4
 +          0.4 * (normalized) legs=6
 +       0.6008

Number of kernel evaluations: 22 (92.81% cached)

Classifier for classes: 5, 7

BinarySMO

Machine linear: showing attribute weights, not support vectors.

        -0.0659 * (normalized) eggs
 +      -0.1995 * (normalized) aquatic
 +      -0.0005 * (normalized) predator
 +      -0.7335 * (normalized) toothed
 +      -0.7335 * (normalized) backbone
 +       -0.534 * (normalized) breathes
 +       0.0008 * (normalized) venomous
 +       0.1336 * (normalized) legs=0
 +      -0.1995 * (normalized) legs=4
 +       0.0659 * (normalized) legs=8
 +       0.0015 * (normalized) tail
 +       1.4656

Number of kernel evaluations: 89 (91.127% cached)

Classifier for classes: 6, 7

BinarySMO

Machine linear: showing attribute weights, not support vectors.

              0 * (normalized) hair
 +      -0.1228 * (normalized) eggs
 +      -0.4496 * (normalized) airborne
 +       0.7752 * (normalized) aquatic
 +       0.4484 * (normalized) predator
 +      -0.7752 * (normalized) breathes
 +       0.1228 * (normalized) venomous
 +       0.9378 * (normalized) legs=0
 +      -1.0606 * (normalized) legs=6
 +       0.1228 * (normalized) legs=8
 +       0.1228 * (normalized) tail
 +       0.9593

Number of kernel evaluations: 52 (84.478% cached)

Time taken to build model: 1.71 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          97               96.0396 %
Incorrectly Classified Instances         4                3.9604 %
Kappa statistic                          0.9478
Mean absolute error                      0.2048
Root mean squared error                  0.3018
Relative absolute error                 93.3993 %
Root relative squared error             91.4742 %
Total Number of Instances              101

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        1
                 1         0          1         1         1          1        2
                 0.6       0.021      0.6       0.6       0.6        0.752    3
                 1         0.011      0.929     1         0.963      0.994    4
                 0.75      0.01       0.75      0.75      0.75       0.983    5
                 1         0          1         1         1          1        6
                 0.9       0          1         0.9       0.947      0.996    7
Weighted Avg.    0.96      0.003      0.961     0.96      0.96       0.986

=== Confusion Matrix ===

  a  b  c  d  e  f  g   <-- classified as
 41  0  0  0  0  0  0 |  a = 1
  0 20  0  0  0  0  0 |  b = 2
  0  0  3  1  1  0  0 |  c = 3
  0  0  0 13  0  0  0 |  d = 4
  0  0  1  0  3  0  0 |  e = 5
  0  0  0  0  0  8  0 |  f = 6
  0  0  1  0  0  0  9 |  g = 7
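
The summary figures WEKA reports for SMO can be cross-checked by hand from the confusion matrix alone. The following is a minimal sketch in Python, written for illustration and not part of the WEKA workflow used in this report. The names `confusion` and `summarize` are hypothetical; the matrix values are copied from the SMO output above, with rows taken as actual classes and columns as predicted classes.

```python
# Confusion matrix copied from the SMO output above.
# Rows are the actual classes 1..7, columns are the predicted classes 1..7.
confusion = [
    [41, 0, 0, 0, 0, 0, 0],
    [0, 20, 0, 0, 0, 0, 0],
    [0, 0, 3, 1, 1, 0, 0],
    [0, 0, 0, 13, 0, 0, 0],
    [0, 0, 1, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 8, 0],
    [0, 0, 1, 0, 0, 0, 9],
]


def summarize(matrix):
    """Return accuracy, Cohen's kappa, and per-class precision and recall."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]                        # actual counts
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    total = sum(row_sums)
    correct = sum(matrix[i][i] for i in range(n))

    accuracy = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    recall = [matrix[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    precision = [matrix[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
    }


stats = summarize(confusion)
print(f"accuracy = {stats['accuracy']:.4%}")
print(f"kappa    = {stats['kappa']:.4f}")
for cls, (p, r) in enumerate(zip(stats["precision"], stats["recall"]), start=1):
    print(f"class {cls}: precision = {p:.3f}, recall = {r:.3f}")
```

The accuracy and kappa computed this way should agree with the "Correctly Classified Instances" and "Kappa statistic" lines in the summary, and the per-class values with the Precision and Recall columns of the detailed accuracy table. The same function can be applied to the confusion matrices of the other classifiers.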