Phase (4): Evaluation of Models and Knowledge

Index:
1) Introduction
2) T-Test and confusion matrix
3) Conclusion
3.1) The best technique
3.2) The knowledge representation

1) Introduction:
In this phase we show how we evaluate our work using the T-Test and how we choose the best of the two models we built in phase 2. We then state which tasks are suitable for each dataset (White Wine and Breast Tissue), and describe the technique that gives us the best knowledge.

2) T-Test and confusion matrix:

2.1) White Wine dataset:
Analysis: We apply the T-Test to the White Wine dataset to compare K-NN and Rule Induction at alpha = 0.05 (figure 2.1.1). The result in figure 2.1.2 shows a t value of 0.019, which is less than the critical value of 2.145. The difference between the two models is therefore not statistically significant, and we can choose either one. See figures 2.1.4 and 2.1.5, where the accuracy of Rule Induction is 74.54% and the accuracy of K-NN is 72.85%. As figure 2.1.3 shows, the performance reported by the T-Test is that of K-NN, the selected model.

Figure 2.1.1: T-Test model between the K-NN and Rule Induction models on the White Wine dataset
Figure 2.1.2: T-Test significance on the White Wine dataset
Figure 2.1.3: Confusion matrix by T-Test on the White Wine dataset
Figure 2.1.4: Confusion matrix by Rule Induction on the White Wine dataset
Figure 2.1.5: Confusion matrix by K-NN on the White Wine dataset
Figure 2.1.6: Rule model by Rule Induction on the White Wine dataset
Figure 2.1.7: K-NN classification by K-NN on the White Wine dataset

2.2) Breast Tissue dataset:
Analysis: We apply the T-Test to the Breast Tissue dataset to compare Naïve Bayes and Decision Tree at alpha = 0.05 (figure 2.2.1). The result in figure 2.2.2 shows a t value of 0.282, which is less than the critical value of 2.145. The difference between the two models is therefore not statistically significant, and we can choose either one.
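The comparison above can be sketched as a paired t-test over the per-fold accuracies of the two models. This is a minimal sketch assuming 15-fold cross-validation (which matches the critical value 2.145 at df = 14); the accuracy values below are hypothetical, not the project's actual results.

```python
from scipy import stats

# Illustrative per-fold accuracies from 15-fold cross-validation
# (hypothetical numbers, not the project's actual results).
acc_knn  = [0.72, 0.75, 0.71, 0.74, 0.73, 0.76, 0.70, 0.74,
            0.72, 0.75, 0.73, 0.71, 0.74, 0.72, 0.75]
acc_rule = [0.74, 0.72, 0.75, 0.71, 0.76, 0.73, 0.74, 0.72,
            0.75, 0.73, 0.71, 0.76, 0.72, 0.75, 0.73]

# Paired t-test: both models are evaluated on the same folds.
t_stat, p_value = stats.ttest_rel(acc_rule, acc_knn)

# Two-tailed critical value at alpha = 0.05 with df = 14 (about 2.145)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=len(acc_knn) - 1)

if abs(t_stat) < t_crit:
    print("no statistically significant difference; choose either model")
else:
    print("statistically significant difference between the models")
```

Because the t value stays below the critical value, either model may be selected, which mirrors the reasoning used for both datasets in this section.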
See figures 2.2.4 and 2.2.5, where the accuracy of Naïve Bayes is 93.45% and the accuracy of Decision Tree is 96.09%. As figure 2.2.3 shows, the performance reported by the T-Test is that of Naïve Bayes, the selected model.

Figure 2.2.1: T-Test model between the Naïve Bayes and Decision Tree models on the Breast Tissue dataset
Figure 2.2.2: T-Test significance on the Breast Tissue dataset
Figure 2.2.3: T-Test performance between the two models (Decision Tree and Naïve Bayes) on the Breast Tissue dataset
Figure 2.2.4: Confusion matrix by Naïve Bayes on the Breast Tissue dataset
Figure 2.2.5: Confusion matrix by Decision Tree on the Breast Tissue dataset
Figure 2.2.6: Naïve Bayes model on the Breast Tissue dataset
Figure 2.2.7: Decision Tree model on the Breast Tissue dataset

3) Conclusion:

3.1) For all tasks and techniques used in the project, which one is the best and why?
White wine: The best technique on the White Wine dataset is classification, such as Rule Induction and K-NN, which build suitable models judged by their training and testing accuracy. Decision Tree is also valuable because the rules it produces are pure and expose the correlated features.
Breast tissue: Decision Tree is the best model on the Breast Tissue dataset because it has high accuracy and also reveals the related features and pure rules. Another useful task is LOF outlier detection, because this technique reliably separates noise from genuine knowledge.
Note: Classification is useful for both datasets because a label can be specified, so classification is better suited here than clustering.

3.2) Which knowledge presentation is the most suitable for the chosen method?

3.2.1) White wine: We take the best knowledge from the one-dimensional histogram graphs, which show the relation between each column and the quality feature, and from the pie chart, which shows how many rows belong to each class and which classes take the greatest and smallest ratios. For examples, see figures 3.2.1.1, 3.2.1.2, 3.2.1.3 and 3.2.1.4.
In figure 3.2.1.1 we can see that class 6 has the maximum ratio of residual sugar, followed by class 5; classes 8 and 9 have the same ratio of residual sugar, and class 7 has the minimum. In figure 3.2.1.2, the maximum ratio of alcohol is found in class 9 and the minimum in class 4. Figures 3.2.1.3 and 3.2.1.4 represent the number of instances in each quality class. In figure 3.2.1.5 we use a decision tree to represent the relations between the attributes; from it we identify the important attributes and select only the related ones to work on. Finally, we use the data table in figure 3.2.1.6 to see the number of instances and attributes, the attribute names, data types, roles, number of missing values, and the range of values of each attribute.

Figure 3.2.1.1: Bar histogram representing the relation between quality and residual sugar
Figure 3.2.1.2: Bar histogram representing the relation between quality and alcohol
Figure 3.2.1.3: Ring graph representing the number of instances in each quality class
Figure 3.2.1.4: Pie chart representing the number of instances in each quality class
Figure 3.2.1.5: Decision tree representing the important attributes and the rules that classify into each class (bad, good, very good and excellent)
Figure 3.2.1.6: Data table representing the number of instances and attributes, attribute names, data types, roles, number of missing values, and the range of values of each attribute

3.2.2) Breast tissue: The best knowledge on the Breast Tissue dataset comes from the decision tree and the rules derived from it (see figure 3.2.2.5). Important knowledge also comes from the scatter plot of the outliers (see figure 3.2.2.4), which reveals noisy data in row 103. We also use one-dimensional charts, such as the pie chart and bar chart, to represent the number of instances in each class and which class takes the maximum ratio; see figures 3.2.2.1 and 3.2.2.2.
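Reading off a tree's rules and its important attributes, as done above, can be sketched with scikit-learn. This is a minimal sketch on the iris dataset as an illustrative stand-in, not the project's White Wine or Breast Tissue data.

```python
from sklearn.datasets import load_iris   # illustrative stand-in dataset
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree so the rules stay readable, as in the figures.
X, y = load_iris(return_X_y=True, as_frame=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Human-readable if/else rules, like the rule model derived from the tree.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)

# Relative importance of each attribute; a near-zero value means the
# tree never split on it, so it can be dropped from further work.
for name, imp in zip(X.columns, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

The importance scores give exactly the "select only the related attributes" step: attributes the tree never splits on contribute nothing to the rules.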
We also use the data table in figure 3.2.2.3 to see the number of instances and attributes, the attribute names, data types, roles, number of missing values, and the range of values of each attribute.

Figure 3.2.2.1: Bar chart representing the number of instances in each class
Figure 3.2.2.2: Pie chart representing the number of instances in each class, with the ratio of each class shown by the area of its colour
Figure 3.2.2.3: Data table representing the number of instances and attributes, attribute names, data types, roles, number of missing values, and the range of values of each attribute
Figure 3.2.2.4: Plot view of the LOF outliers on the Breast Tissue dataset
Figure 3.2.2.5: Decision Tree model on the Breast Tissue dataset and the rules derived from it
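The LOF outlier detection shown in figure 3.2.2.4 can be sketched with scikit-learn's LocalOutlierFactor. This is a minimal sketch on hypothetical data: a tight cluster plus one injected far-away point standing in for a noisy row, not the actual Breast Tissue measurements.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Illustrative data: 50 points in a tight cluster, with one row
# replaced by an obvious outlier (standing in for a noisy row).
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X[10] = [8.0, 8.0]               # inject one clear outlier at row 10

# LOF compares each point's local density to its neighbours' density.
lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)      # -1 = outlier, 1 = inlier

outlier_rows = np.where(labels == -1)[0]
print("flagged rows:", outlier_rows)
```

Scattering the points coloured by `labels` reproduces the plot view in the figure, with the flagged row standing apart from the cluster, just as row 103 does in the Breast Tissue data.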