Phase (4)
Evaluation of models and knowledge
Index:
1) Introduction
2) T-Test and confusion matrix
3) Conclusion
3.1) The best technique
3.2) The knowledge representation
1) Introduction:
In this phase we present how we evaluated our work using the T-Test model and how we chose the best of the two models built in phase 2. We then state which tasks are suitable for each dataset (White Wine and Breast Tissue), and finally report the technique that gave us the best knowledge.
2) T-Test and confusion matrix:
2.1) White Wine dataset:
Analysis: we apply a T-Test on the White Wine dataset to compare K-NN and Rule Induction at alpha equal to 0.05; see figure 2.1.1. The result in figure 2.1.2 shows a t value of 0.019, which is less than 2.145, so the difference between the two models is not statistically significant and we can choose either model. See figure 2.1.4 and figure 2.1.5, where the accuracy of Rule Induction is 74.54% and the accuracy of K-NN is 72.85%. As figure 2.1.3 shows, the performance reported by the T-Test is the performance of K-NN, the selected model.
Figure 2.1.1: T-Test model between the K-NN and Rule Induction models on the White Wine dataset
Figure 2.1.2: T-Test significance on the White Wine dataset
Figure 2.1.3: Confusion matrix by T-Test on the White Wine dataset
Figure 2.1.4: Confusion matrix by Rule Induction on the White Wine dataset
Figure 2.1.5: Confusion matrix by K-NN on the White Wine dataset
Figure 2.1.6: Rule model produced by Rule Induction on the White Wine dataset
Figure 2.1.7: K-NN classification model on the White Wine dataset
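To make the comparison concrete, the following is a minimal sketch of a paired T-Test over cross-validation folds in Python, similar in spirit to the setup above. The file name winequality-white.csv is an assumption about the local copy of the dataset, and because scikit-learn has no Rule Induction learner, a DecisionTreeClassifier stands in for it, so the numbers will not match the figures.

# Sketch only: paired t-test over 10-fold cross-validation accuracies, alpha = 0.05.
# File path and the Rule Induction stand-in are assumptions.
import pandas as pd
from scipy.stats import ttest_rel
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("winequality-white.csv", sep=";")   # assumed local copy
X, y = data.drop(columns="quality"), data["quality"]

cv = KFold(n_splits=10, shuffle=True, random_state=42)
knn_scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
rule_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

t_stat, p_value = ttest_rel(knn_scores, rule_scores)   # paired over the same folds
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("The difference is statistically significant: keep the better model.")
else:
    print("No significant difference at alpha = 0.05: either model can be chosen.")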
2.2) Breast Tissue dataset:
Analysis: we apply a T-Test on the Breast Tissue dataset to compare Naïve Bayes and Decision Tree at alpha equal to 0.05; see figure 2.2.1. The result in figure 2.2.2 shows a t value of 0.282, which is less than 2.145, so the difference between the two models is not statistically significant and we can choose either model. See figure 2.2.4 and figure 2.2.5, where the accuracy of Naïve Bayes is 93.45% and the accuracy of Decision Tree is 96.09%. As figure 2.2.3 shows, the performance reported by the T-Test is the performance of Naïve Bayes, the selected model.
Figure 2.2.1: T-Test model between the Naïve Bayes and Decision Tree models on the Breast Tissue dataset
Figure 2.2.2: T-Test significance on the Breast Tissue dataset
Figure 2.2.3: T-Test performance between the two models [Decision Tree and Naïve Bayes] on the Breast Tissue dataset
Figure 2.2.4: Confusion matrix by Naïve Bayes on the Breast Tissue dataset
Figure 2.2.5: Confusion matrix by Decision Tree on the Breast Tissue dataset
Figure 2.2.6: Naïve Bayes model on the Breast Tissue dataset
Figure 2.2.7: Decision Tree model on the Breast Tissue dataset
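As an illustration of how the two Breast Tissue models and their confusion matrices could be reproduced outside the original tool, here is a minimal sketch using scikit-learn. The file name BreastTissue.csv and the label column name Class are assumptions about the local data, so the accuracies will not exactly match figures 2.2.4 and 2.2.5.

# Sketch only: Naive Bayes vs. Decision Tree with accuracy and confusion matrix.
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("BreastTissue.csv")                  # assumed local copy
X, y = data.drop(columns="Class"), data["Class"]        # "Class" column is assumed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("Decision Tree", DecisionTreeClassifier(random_state=42))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, "accuracy:", round(accuracy_score(y_test, pred), 4))
    print(confusion_matrix(y_test, pred))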
3) Conclusion:
3.1) For all tasks and techniques you used in your project, which
one is the best and why?
White wine:
The best technique on the White Wine dataset is classification, such as Rule Induction and K-NN, because it builds suitable models whose accuracy we can measure on the training and testing data. Another useful classification technique is the Decision Tree, because the rules it produces are pure and capture the correlated features.
Breast tissue:
The Decision Tree is the best model on the Breast Tissue dataset because it has high accuracy and also shows the related features and pure rules. The other useful task is LOF outlier detection, because this technique is very helpful for telling correct knowledge apart from noise.
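As a rough illustration of the LOF outlier detection mentioned above, the following sketch flags unusual rows with scikit-learn's LocalOutlierFactor. The file and column names are assumptions, and the flagged rows need not match the ones reported in the figures.

# Sketch only: LOF outlier detection on the Breast Tissue features.
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

data = pd.read_csv("BreastTissue.csv")          # assumed local copy
X = data.drop(columns="Class")                  # "Class" column name is assumed

lof = LocalOutlierFactor(n_neighbors=20)        # default neighbourhood size
labels = lof.fit_predict(X)                     # -1 marks a suspected outlier
print("Rows flagged as outliers:", list(data.index[labels == -1]))
print("LOF scores:", -lof.negative_outlier_factor_[labels == -1])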
Note: classification may be useful on both datasets because we can specify a label, so classification is better than clustering here.
3.2) Which knowledge representation is best suited to your chosen method?
3.2.1) White wine:
We take the best knowledge from the one-dimensional histogram graphs, which show the relation between each column and the quality feature, and from the pie chart, which shows how many rows belong to each class and which class takes the greatest and smallest ratio. For example, see figures 3.2.1.1, 3.2.1.2, 3.2.1.3 and 3.2.1.4. From figure 3.2.1.1 we can see that class 6 has the maximum ratio of residual sugar, followed by class 5, while classes 8 and 9 have the same ratio of residual sugar and class 7 has the minimum. In figure 3.2.1.2, the maximum ratio of alcohol is found in class 9 and the minimum in class 4. Figures 3.2.1.3 and 3.2.1.4 represent the number of instances in each quality class. In figure 3.2.1.5 we use a Decision Tree to represent the relation between the attributes; from it we identify the important attributes and then select only the related attributes to work on. Finally, we use a data table, shown in figure 3.2.1.6, to report the number of instances and attributes, the attribute names, data types, roles, number of missing values and the range of values of each attribute.
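A minimal sketch of how these visual summaries could be reproduced with pandas and matplotlib is given below; the CSV path and the exact column names ("residual sugar", "quality") are assumptions about the local file.

# Sketch only: histogram per quality class, pie chart of class sizes,
# and a data-table style summary of the attributes.
import pandas as pd
import matplotlib.pyplot as plt

wine = pd.read_csv("winequality-white.csv", sep=";")    # assumed local copy

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Histogram: residual sugar distribution, one series per quality class
for quality, group in wine.groupby("quality"):
    axes[0].hist(group["residual sugar"], bins=30, alpha=0.5, label=f"quality {quality}")
axes[0].set_xlabel("residual sugar")
axes[0].set_ylabel("count")
axes[0].legend()

# Pie chart: number of instances in each quality class
counts = wine["quality"].value_counts().sort_index()
axes[1].pie(counts, labels=[str(q) for q in counts.index], autopct="%1.1f%%")
axes[1].set_title("instances per quality class")
plt.show()

# Data-table style summary: types, missing values and value ranges
print(wine.dtypes)
print(wine.isna().sum())
print(wine.describe().loc[["min", "max"]])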
Figure 3.2.1.1: Bar histogram representing the relation between quality and residual sugar
Figure 3.2.1.2: Bar histogram representing the relation between quality and alcohol
Figure 3.2.1.3: Ring graph representing the number of instances in each quality class
Figure 3.2.1.4: Pie chart representing the number of instances in each quality class
Figure 3.2.1.5: Decision tree representing the important attributes and the rules used to classify each class (bad, good, very good and excellent)
Figure 3.2.1.6: Data table showing the number of instances and attributes, the attribute names, data types, roles, number of missing values and the range of values of each attribute
3.2.2) Breast tissue:
The best knowledge in the Breast Tissue dataset comes from the Decision Tree and the rules it produces; see figure 3.2.2.5. Important knowledge also comes from the scatter plot of the LOF outliers, see figure 3.2.2.4, which shows noisy data in row 103. We also use one-dimensional charts such as the pie chart and bar chart to represent the number of instances in each class and which class takes the maximum ratio; see figures 3.2.2.1 and 3.2.2.2. Finally, we use a data table, shown in figure 3.2.2.3, to report the number of instances and attributes, the attribute names, data types, roles, number of missing values and the range of values of each attribute.
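A small sketch of extracting Decision Tree rules programmatically, in the spirit of figure 3.2.2.5, is shown below; the file and column names are assumptions, and the depth limit is chosen only to keep the printed rules short.

# Sketch only: fit a shallow Decision Tree and print its rules as text.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("BreastTissue.csv")                  # assumed local copy
X, y = data.drop(columns="Class"), data["Class"]        # "Class" column is assumed

tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules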
Figure 3.2.2.1: Bar chart representing the number of instances in each class
Figure 3.2.2.2: Pie chart representing the number of instances in each class and the ratio of each class, shown by the area of each colour
Figure 3.2.2.3: Data table showing the number of instances and attributes, the attribute names, data types, roles, number of missing values and the range of values of each attribute
Figure 3.2.2.4: Plot view of the LOF outliers on the Breast Tissue dataset
Figure 3.2.2.5: Decision Tree model on the Breast Tissue dataset and the rules it produces