CS 595
Knowledge Discovery and Datamining
Assignment # 1
Evaluation Report for WEKA
(Waikato Environment for Knowledge Analysis)
Presented By:
Manoj Wartikar
Sameer Sagade
Date:
14th March, 2000.
Weka Machine Learning Project.
Machine Learning:
An exciting and potentially far-reaching development in contemporary
computer science is the invention and application of methods of Machine
Learning. These enable a computer program to automatically analyze a large
body of data and decide what information is most relevant. This crystallized
information can then be used to help people make decisions faster and more
accurately.
One of the central problems of the information age is dealing with the
enormous explosion in the amount of raw information that is available. Machine
learning (ML) has the potential to sift through this mass of information and
convert it into knowledge that people can use. So far, however, it has been used
mainly on small problems under well-controlled conditions.
The aim of the Weka Project is to bring the technology out of the
laboratory and provide solutions that can make a difference to people. The
overall goal of this research programme is to build a state-of-the-art facility for
the development of ML techniques.
Objectives:
The team at Waikato has incorporated several standard ML techniques
into a software “workbench” called WEKA (Waikato Environment for
Knowledge Analysis). With WEKA, a specialist in a particular field is able to
use ML to derive useful knowledge from databases that are far too large to be
analyzed by hand. The main objectives of WEKA are to:
- Make Machine Learning (ML) techniques generally available;
- Apply them to practical problems such as those in agriculture;
- Develop new machine learning algorithms;
- Design a theoretical framework for the field.
Documented Features:
WEKA presents a collection of algorithms for solving real-world data
mining problems. The software is written in Java 2 and includes a uniform
interface to the standard techniques in machine learning. The following data
mining techniques are implemented in WEKA.
1. Attribute Selection.
2. Clustering.
3. Classifiers (both numeric and non-numeric).
4. Association Rules.
5. Filters.
6. Estimators.
Out of these options, only Classifiers, Association Rules, and Filters are
available as direct executables. All the remaining functions are available as
APIs. The data required by the software must be in the “.arff” format. Sample
databases are also provided with the software.
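
For reference, an ARFF file consists of a relation declaration, a list of attribute declarations, and a data section. The following is a minimal hand-written sketch in the style of the bundled weather data (not a file shipped with the software):

@relation weather.sketch

@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,FALSE,no
overcast,FALSE,yes
rainy,TRUE,no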
Features:
The WEKA package comprises a number of classes related by inheritance.
To execute any technique, we create an instance of the corresponding class. The
functionality of WEKA is organized according to the steps of the machine
learning process.
Classifiers:
The Classifiers class prints out a decision tree classifier for the dataset
given as input. A ten-fold cross-validation estimate of its performance is also
calculated. The Classifiers package implements the most common techniques
separately for categorical and numerical values, as listed below.
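
For those using the classes as an API rather than as executables, the following is a minimal sketch of the same workflow in Java. It assumes the WEKA 3.x API with weka.jar on the classpath; the exact signature of crossValidateModel has varied slightly between releases, and the data path is a placeholder.

import java.io.FileReader;
import java.util.Random;

import weka.core.Instances;
import weka.classifiers.Evaluation;
import weka.classifiers.j48.J48;

public class J48Demo {
    public static void main(String[] args) throws Exception {
        // Load the ARFF data and mark the last attribute as the class.
        Instances data = new Instances(new FileReader("data/iris.arff"));
        data.setClassIndex(data.numAttributes() - 1);

        // Build the decision tree on the full training set and print it.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Estimate performance with ten-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}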
a) Classifiers for categorical prediction:

 1. weka.classifiers.IBk               K-nearest neighbor learner
 2. weka.classifiers.j48.J48           C4.5 decision trees
 3. weka.classifiers.j48.PART          Rule learner
 4. weka.classifiers.NaiveBayes        Naive Bayes with/without kernels
 5. weka.classifiers.OneR              Holte's OneR
 6. weka.classifiers.KernelDensity     Kernel density classifier
 7. weka.classifiers.SMO               Support vector machines
 8. weka.classifiers.Logistic          Logistic regression
 9. weka.classifiers.AdaBoostM1        AdaBoost
10. weka.classifiers.LogitBoost        LogitBoost
11. weka.classifiers.DecisionStump     Decision stumps (for boosting)
Sample Executions of the various categorical CLASSIFIER Algorithms:
K Nearest Neighbour Algorithm:
>java weka.classifiers.IBk -t data/iris.arff
IB1 instance-based classifier
using 1 nearest neighbour(s) for classification
=== Error on training data ===

Correctly Classified Instances        150              100      %
Incorrectly Classified Instances        0                0      %
Mean absolute error                     0.0085
Root mean squared error                 0.0091
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 50 0 | b = Iris-versicolor
0 0 50 | c = Iris-virginica
=== Stratified cross-validation ===

Correctly Classified Instances        144               96      %
Incorrectly Classified Instances        6                4      %
Mean absolute error                     0.0356
Root mean squared error                 0.1618
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 47 3 | b = Iris-versicolor
0 3 47 | c = Iris-virginica
J48 Pruned Tree Algorithm:
>java weka.classifiers.j48.J48 -t data/iris.arff
J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)
Number of Leaves  :     5
Size of the tree  :     9
=== Error on training data ===

Correctly Classified Instances        147               98      %
Incorrectly Classified Instances        3                2      %
Mean absolute error                     0.0233
Root mean squared error                 0.108
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 49 1 | b = Iris-versicolor
0 2 48 | c = Iris-virginica
=== Stratified cross-validation ===

Correctly Classified Instances        143               95.3333 %
Incorrectly Classified Instances        7                4.6667 %
Mean absolute error                     0.0391
Root mean squared error                 0.1707
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
49 1 0 | a = Iris-setosa
0 47 3 | b = Iris-versicolor
0 3 47 | c = Iris-virginica
=== Error on training data ===

Correctly Classified Instances        144               96      %
Incorrectly Classified Instances        6                4      %
Mean absolute error                     0.0324
Root mean squared error                 0.1495

=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 48 2 | b = Iris-versicolor
0 4 46 | c = Iris-virginica
SMO (support vector machines) and logistic regression can handle only
two-class datasets, so they are not evaluated here.
AdaBoostM1, LogitBoost, and DecisionStump are algorithms that boost
the performance of base classifier algorithms. The boosted algorithms are run
inside these booster algorithms, which monitor the execution and apply the
appropriate boosting steps to them.
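
For example, a boosted run can be launched by naming the base classifier with the -W option, in the same way as the RegressionByDiscretization example later in this report (a sketch; option handling may differ slightly between versions):

> java weka.classifiers.AdaBoostM1 -W weka.classifiers.DecisionStump -t data/iris.arff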
b) Classifiers for numerical prediction:

1. weka.classifiers.LinearRegression             Linear regression
2. weka.classifiers.m5.M5Prime                   Model trees
3. weka.classifiers.IBk                          K-nearest neighbor learner
4. weka.classifiers.LWR                          Locally weighted regression
5. weka.classifiers.RegressionByDiscretization   Uses categorical classifiers
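
These numeric classifiers can be driven through the API in the same way as the categorical ones. The following is a minimal sketch under the same assumptions as the earlier Java example; for a numeric class attribute the evaluation summary reports the correlation coefficient and error measures rather than accuracy:

import java.io.FileReader;
import java.util.Random;

import weka.core.Instances;
import weka.classifiers.Evaluation;
import weka.classifiers.LinearRegression;

public class RegressionDemo {
    public static void main(String[] args) throws Exception {
        // Load the CPU data; the numeric class is the last attribute.
        Instances data = new Instances(new FileReader("data/cpu.arff"));
        data.setClassIndex(data.numAttributes() - 1);

        // Cross-validate a linear regression model and print the
        // correlation coefficient and error statistics.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new LinearRegression(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}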
Sample Executions of the various numeric CLASSIFIER Algorithms:
Linear Regression Model:
> java weka.classifiers.LinearRegression -t data/cpu.arff
Linear Regression Model
class =

  -152.7641 * vendor=microdata,formation,prime,harris,dec,wang,perkinelmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
   141.8644 * vendor=formation,prime,harris,dec,wang,perkinelmer,nixdorf,bti,sratus,dg,burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
   -38.2268 * vendor=burroughs,cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
    39.4748 * vendor=cambex,magnuson,honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
   -39.5986 * vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
    21.4119 * vendor=ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl +
   -41.2396 * vendor=gould,siemens,nas,adviser,sperry,amdahl +
    32.0545 * vendor=siemens,nas,adviser,sperry,amdahl +
  -113.6927 * vendor=adviser,sperry,amdahl +
   176.5204 * vendor=sperry,amdahl +
   -51.2583 * vendor=amdahl +
     0.0616 * MYCT +
     0.0171 * MMIN +
     0.0054 * MMAX +
     0.6654 * CACH +
    -1.4159 * CHMIN +
     1.5538 * CHMAX +
   -41.4854
=== Error on training data ===

Correlation coefficient                  0.963
Mean absolute error                     28.4042
Root mean squared error                 41.6084
Relative absolute error                 32.5055 %
Root relative squared error             26.9508 %
Total Number of Instances              209
=== Cross-validation ===

Correlation coefficient                  0.9328
Mean absolute error                     35.014
Root mean squared error                 55.6291
Relative absolute error                 39.9885 %
Root relative squared error             35.9513 %
Total Number of Instances              209
Pruned Training Model Tree:
> java weka.classifiers.m5.M5Prime -t data/cpu.arff
Pruned training model tree:
MMAX <= 14000 : LM1 (141/4.18%)
MMAX > 14000 : LM2 (68/51.8%)
Models at the leaves:
Smoothed (complex):

LM1: class = 4.15
     + 2.05vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl
     + 5.43vendor=adviser,sperry,amdahl - 5.78vendor=amdahl
     + 0.00638MYCT + 0.00158MMIN + 0.00345MMAX + 0.552CACH
     + 1.14CHMIN + 0.0945CHMAX

LM2: class = -113
     + 56.1vendor=honeywell,ipl,ibm,cdc,ncr,basf,gould,siemens,nas,adviser,sperry,amdahl
     + 10.2vendor=adviser,sperry,amdahl - 10.9vendor=amdahl
     + 0.012MYCT + 0.0145MMIN + 0.0089MMAX + 0.808CACH + 1.29CHMAX
Number of Leaves : 2
=== Error on training data ===

Correlation coefficient                  0.9853
Mean absolute error                     13.4072
Root mean squared error                 26.3977
Relative absolute error                 15.3431 %
Root relative squared error             17.0985 %
Total Number of Instances              209
=== Cross-validation ===

Correlation coefficient                  0.9767
Mean absolute error                     13.1239
Root mean squared error                 33.4455
Relative absolute error                 14.9884 %
Root relative squared error             21.6147 %
Total Number of Instances              209

K Nearest Neighbour Classifier Algorithm:
> java weka.classifiers.IBk -t data/cpu.arff
IB1 instance-based classifier
using 1 nearest neighbour(s) for classification
=== Error on training data ===

Correlation coefficient                  1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances              209
=== Cross-validation ===

Correlation coefficient                  0.9475
Mean absolute error                     20.8589
Root mean squared error                 53.8162
Relative absolute error                 23.8223 %
Root relative squared error             34.7797 %
Total Number of Instances              209
Locally Weighted Regression:
> java weka.classifiers.LWR -t data/cpu.arff
Locally weighted regression
===========================
Using linear weighting kernels
Using all neighbours
=== Error on training data ===

Correlation coefficient                  0.9967
Mean absolute error                      8.9683
Root mean squared error                 12.6133
Relative absolute error                 10.2633 %
Root relative squared error              8.1699 %
Total Number of Instances              209
=== Cross-validation ===

Correlation coefficient                  0.9808
Mean absolute error                     14.9006
Root mean squared error                 31.0836
Relative absolute error                 17.0176 %
Root relative squared error             20.0884 %
Total Number of Instances              209
Regression by Discretization:
> java weka.classifiers.RegressionByDiscretization -t data/cpu.arff -W
weka.classifiers.IBk
// The subclassifier is selected from the categorical classifiers
Regression by discretization
Class attribute discretized into 10 values
Subclassifier: weka.classifiers.IBk
IB1 instance-based classifier
using 1 nearest neighbour(s) for classification
=== Error on training data ===

Correlation coefficient                  0.9783
Mean absolute error                     32.0353
Root mean squared error                 35.6977
Relative absolute error                 36.6609 %
Root relative squared error             23.1223 %
Total Number of Instances              209
=== Cross-validation ===

Correlation coefficient                  0.9244
Mean absolute error                     41.5572
Root mean squared error                 64.7253
Relative absolute error                 47.4612 %
Root relative squared error             41.8299 %
Total Number of Instances              209
Association rules:
Association rule mining finds interesting association or correlation
relationships among a large set of data items. With massive amounts of data
continuously being collected and stored in databases, many industries are
becoming interested in mining association rules from their databases. For
example, the discovery of interesting association relationships among huge
amounts of business transaction records can help catalog design, cross
marketing, loss-leader analysis, and other business decision-making processes.
A typical example of association rule mining is market basket analysis.
This process analyzes customers' buying habits by finding associations between
the different items that customers place in their “shopping baskets”. The
discovery of such associations can help retailers develop marketing strategies by
gaining insight into which items are frequently purchased together by customers.
For instance, if customers are buying milk, how likely are they to also buy bread
(and what kind of bread) on the same trip to the supermarket? Such information
can lead to increased sales.
The WEKA software efficiently produces association rules for the given
data set. The Apriori algorithm is used as the foundation of the package. It
outputs all the large itemsets found for the specified minimum support, together
with the rules that meet the specified minimum confidence.
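
For API users, the same run can be driven from Java. The following is a minimal sketch, assuming the WEKA 3.x API (the option setters and exact signatures may differ between versions):

import java.io.FileReader;

import weka.core.Instances;
import weka.associations.Apriori;

public class AprioriDemo {
    public static void main(String[] args) throws Exception {
        // Association rule mining requires nominal attributes only.
        Instances data = new Instances(new FileReader("data/weather.nominal.arff"));

        // Mine rules at the default minimum support and confidence.
        Apriori apriori = new Apriori();
        apriori.buildAssociations(data);

        // Print the large itemsets and the best rules found.
        System.out.println(apriori);
    }
}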
A typical output of the Association package is:
Apriori Algorithm:
> java weka.associations.Apriori -t data/weather.nominal.arff -I yes
Apriori
=======
Minimum support: 0.2
Minimum confidence: 0.9
Number of cycles performed: 17
Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Large Itemsets L(1):
outlook=sunny 5
outlook=overcast 4
outlook=rainy 5
temperature=hot 4
temperature=mild 6
temperature=cool 4
humidity=high 7
humidity=normal 7
windy=TRUE 6
windy=FALSE 8
play=yes 9
play=no 5
Size of set of large itemsets L(2): 47
Large Itemsets L(2):
outlook=sunny temperature=hot 2
outlook=sunny temperature=mild 2
outlook=sunny humidity=high 3
outlook=sunny humidity=normal 2
outlook=sunny windy=TRUE 2
outlook=sunny windy=FALSE 3
outlook=sunny play=yes 2
outlook=sunny play=no 3
outlook=overcast temperature=hot 2
outlook=overcast humidity=high 2
outlook=overcast humidity=normal 2
outlook=overcast windy=TRUE 2
outlook=overcast windy=FALSE 2
outlook=overcast play=yes 4
outlook=rainy temperature=mild 3
outlook=rainy temperature=cool 2
outlook=rainy humidity=high 2
outlook=rainy humidity=normal 3
outlook=rainy windy=TRUE 2
outlook=rainy windy=FALSE 3
outlook=rainy play=yes 3
outlook=rainy play=no 2
temperature=hot humidity=high 3
temperature=hot windy=FALSE 3
temperature=hot play=yes 2
temperature=hot play=no 2
temperature=mild humidity=high 4
temperature=mild humidity=normal 2
temperature=mild windy=TRUE 3
temperature=mild windy=FALSE 3
temperature=mild play=yes 4
temperature=mild play=no 2
temperature=cool humidity=normal 4
temperature=cool windy=TRUE 2
temperature=cool windy=FALSE 2
temperature=cool play=yes 3
humidity=high windy=TRUE 3
humidity=high windy=FALSE 4
humidity=high play=yes 3
humidity=high play=no 4
humidity=normal windy=TRUE 3
humidity=normal windy=FALSE 4
humidity=normal play=yes 6
windy=TRUE play=yes 3
windy=TRUE play=no 3
windy=FALSE play=yes 6
windy=FALSE play=no 2
Size of set of large itemsets L(3): 39
Large Itemsets L(3):
outlook=sunny temperature=hot humidity=high 2
outlook=sunny temperature=hot play=no 2
outlook=sunny humidity=high windy=FALSE 2
outlook=sunny humidity=high play=no 3
outlook=sunny humidity=normal play=yes 2
outlook=sunny windy=FALSE play=no 2
outlook=overcast temperature=hot windy=FALSE 2
outlook=overcast temperature=hot play=yes 2
outlook=overcast humidity=high play=yes 2
outlook=overcast humidity=normal play=yes 2
outlook=overcast windy=TRUE play=yes 2
outlook=overcast windy=FALSE play=yes 2
outlook=rainy temperature=mild humidity=high 2
outlook=rainy temperature=mild windy=FALSE 2
outlook=rainy temperature=mild play=yes 2
outlook=rainy temperature=cool humidity=normal 2
outlook=rainy humidity=normal windy=FALSE 2
outlook=rainy humidity=normal play=yes 2
outlook=rainy windy=TRUE play=no 2
outlook=rainy windy=FALSE play=yes 3
temperature=hot humidity=high windy=FALSE 2
temperature=hot humidity=high play=no 2
temperature=hot windy=FALSE play=yes 2
temperature=mild humidity=high windy=TRUE 2
temperature=mild humidity=high windy=FALSE 2
temperature=mild humidity=high play=yes 2
temperature=mild humidity=high play=no 2
temperature=mild humidity=normal play=yes 2
temperature=mild windy=TRUE play=yes 2
temperature=mild windy=FALSE play=yes 2
temperature=cool humidity=normal windy=TRUE 2
temperature=cool humidity=normal windy=FALSE 2
temperature=cool humidity=normal play=yes 3
temperature=cool windy=FALSE play=yes 2
humidity=high windy=TRUE play=no 2
humidity=high windy=FALSE play=yes 2
humidity=high windy=FALSE play=no 2
humidity=normal windy=TRUE play=yes 2
humidity=normal windy=FALSE play=yes 4
Size of set of large itemsets L(4): 6
Large Itemsets L(4):
outlook=sunny temperature=hot humidity=high play=no 2
outlook=sunny humidity=high windy=FALSE play=no 2
outlook=overcast temperature=hot windy=FALSE play=yes 2
outlook=rainy temperature=mild windy=FALSE play=yes 2
outlook=rainy humidity=normal windy=FALSE play=yes 2
temperature=cool humidity=normal windy=FALSE play=yes 2
Best rules found:
1. humidity=normal windy=FALSE 4 ==> play=yes 4 (1)
2. temperature=cool 4 ==> humidity=normal 4 (1)
3. outlook=overcast 4 ==> play=yes 4 (1)
4. temperature=cool play=yes 3 ==> humidity=normal 3 (1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3 (1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 (1)
7. outlook=sunny humidity=high 3 ==> play=no 3 (1)
8. outlook=sunny play=no 3 ==> humidity=high 3 (1)
9. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2 (1)
10. temperature=cool humidity=normal windy=FALSE 2 ==> play=yes 2 (1)
Advantages, Disadvantages and Future Upgrades:
The WEKA system covers the entire machine learning (knowledge
discovery) process. Although a research project, the WEKA system has been
able to implement and evaluate a number of different algorithms for the
different steps in the machine learning process.
The output and the information provided by the package are sufficient for
an expert in machine learning and related topics. The results displayed by the
system give a detailed description of the flow and the steps involved in the
entire machine learning process. The outputs provided by the different
algorithms are easy to compare, which makes analysis easier.
The ARFF format is one of the most widely used data storage formats for
research databases, which makes the system well suited to research-oriented
projects.
The package provides a number of application program interfaces (APIs)
that help novice data miners build their own systems on top of the “core
WEKA system”.
Since the system provides a number of switches and options, we can
customize the output of the system to suit our needs.
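
For instance, the evaluation options shared by the command-line classifiers include -t for the training file, -T for a separate test file, -x for the number of cross-validation folds, and -c for the position of the class attribute. A sketch of a customized run (5-fold cross-validation, class attribute in position 5):

> java weka.classifiers.j48.J48 -t data/iris.arff -c 5 -x 5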
The first major disadvantage is that the system is Java based and requires
a Java Virtual Machine to be installed for its execution. Since the system is
driven entirely by command-line parameters and switches, it is difficult for an
amateur to use the system efficiently. The textual interface and output make it
all the more difficult to interpret and visualize the results.
Important results such as pruned trees and other hierarchy-based outputs
cannot be displayed graphically, making it difficult to visualize the results.
Although a commonly used format, ARFF is the only data format that the
WEKA system supports.
Although the current version (3.0.1) has some bugs and shortcomings, the
developers are working on a better system and have come up with a new version
which has a graphical user interface, making the system more complete.
Appendix
(Sample executions for other algorithms covered)
PART Decision List Algorithm
>java weka.classifiers.j48.PART -t data/iris.arff
PART decision list
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth <= 1.7 AND
petallength <= 4.9: Iris-versicolor (48.0/1.0)
: Iris-virginica (52.0/3.0)
Number of Rules : 3
=== Error on training data ===

Correctly Classified Instances        146               97.3333 %
Incorrectly Classified Instances        4                2.6667 %
Mean absolute error                     0.0338
Root mean squared error                 0.1301
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 47 3 | b = Iris-versicolor
0 1 49 | c = Iris-virginica
=== Stratified cross-validation ===

Correctly Classified Instances        142               94.6667 %
Incorrectly Classified Instances        8                5.3333 %
Mean absolute error                     0.0454
Root mean squared error                 0.1805
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
49 1 0 | a = Iris-setosa
0 47 3 | b = Iris-versicolor
0 4 46 | c = Iris-virginica
Naïve Bayes Classifier Algorithm:
> java weka.classifiers.NaiveBayes -t data/iris.arff
Naive Bayes Classifier
Class Iris-setosa: Prior probability = 0.33
sepallength: Normal Distribution. Mean = 4.9913 StandardDev = 0.355
WeightSum = 50 Precision = 0.10588235294117648
sepalwidth: Normal Distribution. Mean = 3.4015 StandardDev = 0.3925
WeightSum = 50 Precision = 0.10909090909090911
petallength: Normal Distribution. Mean = 1.4694 StandardDev = 0.1782
WeightSum = 50 Precision = 0.14047619047619048
petalwidth: Normal Distribution. Mean = 0.2743 StandardDev = 0.1096
WeightSum = 50 Precision = 0.11428571428571428
Class Iris-versicolor: Prior probability = 0.33
sepallength: Normal Distribution. Mean = 5.9379 StandardDev = 0.5042
WeightSum = 50 Precision = 0.10588235294117648
sepalwidth: Normal Distribution. Mean = 2.7687 StandardDev = 0.3038
WeightSum = 50 Precision = 0.10909090909090911
petallength: Normal Distribution. Mean = 4.2452 StandardDev = 0.4712
WeightSum = 50 Precision = 0.14047619047619048
petalwidth: Normal Distribution. Mean = 1.3097 StandardDev = 0.1915
WeightSum = 50 Precision = 0.11428571428571428
Class Iris-virginica: Prior probability = 0.33
sepallength: Normal Distribution. Mean = 6.5795 StandardDev = 0.6353
WeightSum = 50 Precision = 0.10588235294117648
sepalwidth: Normal Distribution. Mean = 2.9629 StandardDev = 0.3088
WeightSum = 50 Precision = 0.10909090909090911
petallength: Normal Distribution. Mean = 5.5516 StandardDev = 0.5529
WeightSum = 50 Precision = 0.14047619047619048
petalwidth: Normal Distribution. Mean = 2.0343 StandardDev = 0.2646
WeightSum = 50 Precision = 0.11428571428571428
OneR Classifier Algorithm:
> java weka.classifiers.OneR -t data/iris.arff
petallength:
    < 2.45   -> Iris-setosa
    < 4.75   -> Iris-versicolor
    >= 4.75  -> Iris-virginica
(143/150 instances correct)
=== Error on training data ===

Correctly Classified Instances        143               95.3333 %
Incorrectly Classified Instances        7                4.6667 %
Mean absolute error                     0.0311
Root mean squared error                 0.1764
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 44 6 | b = Iris-versicolor
0 1 49 | c = Iris-virginica
=== Stratified cross-validation ===

Correctly Classified Instances        142               94.6667 %
Incorrectly Classified Instances        8                5.3333 %
Mean absolute error                     0.0356
Root mean squared error                 0.1886
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 44 6 | b = Iris-versicolor
0 2 48 | c = Iris-virginica
Kernel Density Algorithm:
> java weka.classifiers.KernelDensity -t data/iris.arff
Kernel Density Estimator
=== Error on training data ===

Correctly Classified Instances        148               98.6667 %
Incorrectly Classified Instances        2                1.3333 %
Mean absolute error                     0.0313
Root mean squared error                 0.0944
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 49 1 | b = Iris-versicolor
0 1 49 | c = Iris-virginica
=== Stratified cross-validation ===

Correctly Classified Instances        144               96      %
Incorrectly Classified Instances        6                4      %
Mean absolute error                     0.0466
Root mean squared error                 0.1389
Total Number of Instances             150
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 48 2 | b = Iris-versicolor
0 4 46 | c = Iris-virginica