FUZZY DECISION TREES (FDT)
DECISION MAKING SUPPORT SYSTEM BASED ON FDT

prof. Ing. Vitaly Levashenko, PhD.
Department of Informatics, University of Žilina (Slovakia)

DECISION SUPPORT SYSTEMS

Decision Support Systems are a specific class of computer-based information systems that support decision-making activities. A decision support system analyzes data and provides interactive information support to professionals during the decision-making process. Decision making implies selecting the best decision from a set of possible options. In some cases, this selection is based on past experience: previous situations are analyzed together with the choices that were made in them.

DECISION MAKING BY ANALYSIS OF PREVIOUS SITUATIONS

Known situations are described by the attributes Outlook, Temperature, Humidity and Windy, together with the decision Play. Our goal is to build a model for the recognition of a new situation:

Outlook: sunny | Temperature: cool | Humidity: high | Windy: true | Play: ?

SEVERAL MODELS FOR SOLVING THIS TASK

- k-nearest neighbors (k-NN)
- regression models
- support vector machines
- naïve Bayes classification models
- neural networks
- decision trees

DECISION TREES (1). INTRODUCTION

A decision tree is a flow-chart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label.

[Figure: example decision tree. The root tests Outlook; the branch sunny leads to a test on Humidity (normal: yes, high: no), overcast leads directly to yes, and rain leads to a test on Windy (false: yes, true: no).]

MVL VS. FUZZY

[Figure: two clusterings of the numeric attribute temperature t, °C. Under multiple-valued logic (MVL) clustering, each temperature belongs to exactly one of the terms cool, mild, hot. Under fuzzy clustering, a temperature can belong to several terms at once, e.g. "cool is 0.7, mild is 0.3".]

Fuzzification of the numeric attribute A2 (Temperature) into the linguistic attribute A2 with terms A2,1 (cool), A2,2 (mild), A2,3 (hot):

t, °C     A2,1 (cool)   A2,2 (mild)   A2,3 (hot)
x1 = 8    0.7           0.3           0.0
x2 = 10   0.5           0.5           0.0
x3 = 15   0.0           1.0           0.0
x4 = 20   0.0           0.5           0.5
x5 = 23   0.0           0.2           0.8
x6 = 25   0.0           0.0           1.0

An object x can belong simultaneously to more than one class, to varying degrees called memberships.

H.-M. Lee, C.-M. Chen, et al., An Efficient Fuzzy Classifier with Feature Selection Based on Fuzzy Entropy, IEEE Trans. on Systems, Man and Cybernetics, Part B, vol. 31(3), 2001, pp. 426-432.

FUZZY DATA*

N    A1,1  A1,2  A1,3 | A2,1  A2,2  A2,3 | A3,1  A3,2 | A4,1  A4,2 | B1   B2   B3
1    0.9   0.1   0.0  | 1.0   0.0   0.0  | 0.8   0.2  | 0.4   0.6  | 0.0  0.8  0.2
2    0.8   0.2   0.0  | 0.6   0.4   0.0  | 0.0   1.0  | 0.0   1.0  | 0.6  0.4  0.0
3    0.0   0.7   0.3  | 0.8   0.2   0.0  | 0.1   0.9  | 0.2   0.8  | 0.3  0.6  0.1
4    0.2   0.7   0.1  | 0.3   0.7   0.0  | 0.2   0.8  | 0.3   0.7  | 0.9  0.1  0.0
5    0.0   0.1   0.9  | 0.7   0.3   0.0  | 0.5   0.5  | 0.5   0.5  | 0.0  0.0  1.0
6    0.0   0.7   0.3  | 0.0   0.3   0.7  | 0.7   0.3  | 0.4   0.6  | 0.2  0.0  0.8
7    0.0   0.3   0.7  | 0.0   0.0   1.0  | 0.0   1.0  | 0.1   0.9  | 0.0  0.0  1.0
8    0.0   1.0   0.0  | 0.0   0.2   0.8  | 0.2   0.8  | 0.0   1.0  | 0.7  0.0  0.3
9    1.0   0.0   0.0  | 1.0   0.0   0.0  | 0.6   0.4  | 0.7   0.3  | 0.2  0.8  0.0
10   0.9   0.1   0.0  | 0.0   0.3   0.7  | 0.0   1.0  | 0.9   0.1  | 0.0  0.3  0.7
11   0.7   0.3   0.0  | 1.0   0.0   0.0  | 1.0   0.0  | 0.2   0.8  | 0.3  0.7  0.0
12   0.2   0.6   0.2  | 0.0   1.0   0.0  | 0.3   0.7  | 0.3   0.7  | 0.7  0.2  0.1
13   0.9   0.1   0.0  | 0.2   0.8   0.0  | 0.1   0.9  | 1.0   0.0  | 0.0  0.0  1.0
14   0.0   0.9   0.1  | 0.0   0.9   0.1  | 0.1   0.9  | 0.7   0.3  | 0.0  0.0  1.0
15   0.0   0.0   1.0  | 0.0   0.0   1.0  | 1.0   0.0  | 0.8   0.2  | 0.0  0.0  1.0
16   1.0   0.0   0.0  | 0.5   0.5   0.0  | 0.0   1.0  | 0.0   1.0  | 0.5  0.5  0.0

with Cost(A1) = 2.5, Cost(A2) = 2.0, Cost(A3) = 1.7, Cost(A4) = 1.8.

*Y. Yuan, M.J. Shaw, Induction of Fuzzy Decision Trees, Fuzzy Sets and Systems, 69, 1995, pp. 125-139.

Measuring the value of each input attribute requires resource costs (money or time): Cost(A1), Cost(A2), Cost(A3), Cost(A4). Our goal is to find a method that transforms the values of the input attributes into the value of the output attribute with minimal resources: Σ Cost(Ai) → min.
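A minimal C++ sketch of the fuzzification step for the temperature attribute above. The piecewise-linear breakpoints 5, 15 and 25 °C are an assumption inferred from the membership table, and the function names are illustrative; with these breakpoints the six sample temperatures reproduce the table exactly.

#include <algorithm>
#include <cstdio>

// Piecewise-linear membership functions for the terms of A2 (Temperature).
// Breakpoints 5, 15, 25 °C are inferred from the table above (an assumption).
double coolDeg(double t) { return std::clamp((15.0 - t) / 10.0, 0.0, 1.0); }
double hotDeg(double t)  { return std::clamp((t - 15.0) / 10.0, 0.0, 1.0); }
double mildDeg(double t) { return 1.0 - coolDeg(t) - hotDeg(t); }  // triangle with peak at 15

int main() {
    const double xs[] = {8.0, 10.0, 15.0, 20.0, 23.0, 25.0};
    for (double t : xs)  // e.g. t = 8 gives cool = 0.7, mild = 0.3, hot = 0.0
        std::printf("t = %4.1f  cool = %.1f  mild = %.1f  hot = %.1f\n",
                    t, coolDeg(t), mildDeg(t), hotDeg(t));
    return 0;
}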
ALGORITHMS ID3 AND C4.5 (BY PROF. ROSS QUINLAN)

We compare the information gain criteria with the classical concepts of information theory (information and entropy). The mathematical expressions are closely related.

Algorithm ID3 uses the absolute criterion

Gain(A) = -\sum_{l=1}^{m_B} \frac{N_{b_l}}{N} \log_2 \frac{N_{b_l}}{N} + \sum_{j=1}^{m_A} \frac{N_{a_j}}{N} \sum_{l=1}^{m_B} \frac{N_{b_l a_j}}{N_{a_j}} \log_2 \frac{N_{b_l a_j}}{N_{a_j}},

which corresponds to the mutual information I(B; A) = H(B) - H(B|A).

Algorithm C4.5 uses the relative criterion

GainRatio(A) = Gain(A) / SplitInfo(A), where SplitInfo(A) = -\sum_{j=1}^{m_A} \frac{N_{a_j}}{N} \log_2 \frac{N_{a_j}}{N},

which corresponds to the normalized estimation s(A|B) = I(B; A) / H(A).

REVIEW OF INFORMATION ESTIMATION

Entropy calculation: H = -\sum_{i=1}^{m} p_i \log_2 p_i.

For fuzzy data the entropy is computed from membership degrees \mu_{i,j}; for example, A. de Luca and S. Termini (1972) define

H = -\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{N} \left[ \mu_{i,j} \log_2 \mu_{i,j} + (1 - \mu_{i,j}) \log_2 (1 - \mu_{i,j}) \right].

Further fuzzy entropies were proposed by Y. Yuan and M. Shaw (1995), H. Ichihashi (1996), X. Wang et al. (2000) and H.-M. Lee (2001). Other entropies: hybrid, Yager, Kaufmann, Kosko, ...

NEW CUMULATIVE INFORMATION ESTIMATIONS

We have proposed new cumulative information estimations:

- personal: information I(Ai1,j1) and entropy H(Ai1);
- joint: I(Ai2,j2, Ai1,j1) and H(Ai2, Ai1);
- conditional: I(Ai2,j2 | Ai1,j1) and H(Ai2 | Ai1);
- mutual: I(Ai2,j2 ; Ai1,j1) and I(Ai2 ; Ai1).

Levashenko V., Zaitseva E., Usage of New Information Estimations for Induction of Fuzzy Decision Trees, Intelligent Data Engineering and Automated Learning, Lecture Notes in Computer Science, 2412, 2002, pp. 493-499.

NEW CRITERIA OF CHOICE OF EXPANDED ATTRIBUTES

- unordered FDT: I(B; Ai1,j1, ..., Aiq-1,jq-1, Aiq) / Cost(Aiq) → max (see the code sketch at the end of this section);
- ordered FDT: I(B; Ai1, ..., Aiq-1, Aiq) / Cost(Aiq) → max;
- stable FDT: I(Aiq; B, Ai1, ..., Aiq-1) / Cost(Aiq) → max;
- etc.

Levashenko V., Zaitseva E., Fuzzy Decision Trees in Medical Decision Making Support System, Proc. of the IEEE Fed. Conf. on Computer Science and Information Systems, 2012, pp. 213-219.

UNORDERED FUZZY DECISION TREES

The expanded attribute is chosen by the criterion I(B; Ai1,j1, ..., Aiq-1,jq-1, Aiq) / Cost(Aiq) → max.

[Figure: unordered FDT induced from the fuzzy data above with thresholds α = 0.16 and β = 0.75. The root entropy is H(B) = 24.684; attribute A2 gives H(B|A2) = 20.932, i.e. I(B; A2) = 3.752, and is selected as the root. The branches A2,1 (H(B|A2,1) = 8.820) and A2,2 (H(B|A2,2) = 8.201) are expanded further by A1 and A4; the branch A2,3 (H(B|A2,3) = 3.911) becomes a leaf with B1 = 16.3%, B2 = 4.9%, B3 = 78.8%. Every node is labelled with its class frequencies B1, B2, B3 and its degree of truth f.]

FUZZY DECISION RULES (1). A PRIORI

A fuzzy decision rule is a path from the root to a leaf:

If (A2 is A2,3) then B (with degrees of truth [0.169, 0.049, 0.788]);
If (A2 is A2,2) and (A4 is A4,1) then B is B3 (with degree of truth 0.754);
...

Input attribute A3 has no influence on the output attribute B (for the given thresholds α = 0.16 and β = 0.75).

FUZZY DECISION RULES (2). A POSTERIORI

A fuzzy decision rule is a path from the root to a leaf, and one example is described by several fuzzy decision rules. Consider a new example with the memberships

A1,1 = 0.9, A1,2 = 0.1, A1,3 = 0.0; A2,1 = 1.0, A2,2 = 0.0, A2,3 = 0.0; A3,1 = 0.8, A3,2 = 0.2; A4,1 = 0.4, A4,2 = 0.6.

Three rules of the tree fire; the weight of each rule is the product of the memberships along its path:

W1 = (A2,1 · A1,1 · A4,1) = (1.0 · 0.9 · 0.4) = 0.36, leaf (B1 = 14.2%, B2 = 67.8%, B3 = 18.0%);
W2 = (A2,1 · A1,1 · A4,2) = (1.0 · 0.9 · 0.6) = 0.54, leaf (B1 = 33.2%, B2 = 62.3%, B3 = 4.5%);
W3 = (A2,1 · A1,2) = (1.0 · 0.1) = 0.10, leaf (B1 = 37.6%, B2 = 50.4%, B3 = 12.0%).

The weighted sum of the leaf distributions gives the decision: B1 = 26.8%, B2 = 63.1%, B3 = 10.1%.
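The a posteriori computation above is simply a weighted sum of leaf distributions. A self-contained C++ sketch that reproduces the slide's numbers:

#include <cstdio>

int main() {
    // Rule weights: products of the memberships along each path,
    // W1 = 1.0*0.9*0.4, W2 = 1.0*0.9*0.6, W3 = 1.0*0.1.
    const double w[3] = {0.36, 0.54, 0.10};
    // Class distributions (B1, B2, B3) in the leaves of the fired rules, in %.
    const double leaf[3][3] = {{14.2, 67.8, 18.0},
                               {33.2, 62.3,  4.5},
                               {37.6, 50.4, 12.0}};
    double b[3] = {0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            b[c] += w[r] * leaf[r][c];
    // Prints B1=26.8%  B2=63.1%  B3=10.1%, as on the slide.
    std::printf("B1=%.1f%%  B2=%.1f%%  B3=%.1f%%\n", b[0], b[1], b[2]);
    return 0;
}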
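The code sketch referenced on the NEW CRITERIA slide: a minimal C++ illustration of the cost-weighted criterion I(B; A) / Cost(A) → max for the first expansion step. This is a simplification, not the authors' exact method: probabilities are estimated from summed membership degrees, whereas the exact cumulative information estimations are defined in the cited 2002 paper. All identifiers are illustrative.

#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = examples, columns = fuzzy values

static double plog2(double p) { return p > 0.0 ? p * std::log2(p) : 0.0; }

// H(B): entropy of the output attribute, estimated from summed memberships.
double entropy(const Matrix& b) {
    std::vector<double> nb(b[0].size(), 0.0);
    double n = 0.0;
    for (const auto& row : b)
        for (size_t l = 0; l < row.size(); ++l) { nb[l] += row[l]; n += row[l]; }
    double h = 0.0;
    for (double s : nb) h -= plog2(s / n);
    return h;
}

// I(B; A) = H(B) - H(B|A); joint counts are products of memberships,
// assuming the membership degrees of B sum to 1 for every example.
double mutualInfo(const Matrix& a, const Matrix& b) {
    const size_t ma = a[0].size(), mb = b[0].size();
    std::vector<double> na(ma, 0.0);
    Matrix nab(ma, std::vector<double>(mb, 0.0));
    for (size_t i = 0; i < a.size(); ++i)
        for (size_t j = 0; j < ma; ++j) {
            na[j] += a[i][j];
            for (size_t l = 0; l < mb; ++l) nab[j][l] += a[i][j] * b[i][l];
        }
    double n = 0.0;
    for (double s : na) n += s;
    double hba = 0.0;  // conditional entropy H(B|A)
    for (size_t j = 0; j < ma; ++j)
        for (size_t l = 0; l < mb; ++l)
            if (na[j] > 0.0) hba -= (na[j] / n) * plog2(nab[j][l] / na[j]);
    return entropy(b) - hba;
}

// The attribute maximizing I(B; A) / Cost(A) is expanded first.
double criterion(const Matrix& a, const Matrix& b, double cost) {
    return mutualInfo(a, b) / cost;
}

For the fuzzy data above, criterion() would be evaluated for A1, A2, A3, A4 with the costs 2.5, 2.0, 1.7 and 1.8, and the maximizing attribute would be expanded first.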
DECISION TABLES

The fuzzy decision tree can be represented in multiple-valued logic (MVL) as a decision table: all combinations of the discretized values of A1, A2, A3, A4 are enumerated together with the corresponding decisions.

[Table: MVL decision table listing the value combinations of the attributes A1, A2, A3, A4 and the resulting decisions.]

The output column of this table forms the truth table vector:

X = [1111 2020 2222 1111 2020 2222 ... 2222 2222 2222]^T

BASICS OF KNOWLEDGE REPRESENTATION

Initial data → Fuzzy Decision Tree → Fuzzy Decision Rules → Multiple-Valued Logic Decision Tables → Truth table vector column → Sensitivity Analysis, Testability Analysis, Reliability Analysis.

FUZZY DECISION MAKING SUPPORT SYSTEM

Numeric variables → fuzzification → fuzzy linguistic variables → analysis with the fuzzy decision rules (FDT, DT) → defuzzification.

SOFTWARE FOR EXPERIMENTAL INVESTIGATIONS

We created the software application Multiprognos in C++ (ver. 5.02). It consists of four blocks:

Block 1. Data preparation: reading and writing of data, file initialization, separation of the data into two parts: learning data (70%) and testing data (30%).

Block 2. Initial data fuzzification: transformation of the numeric values of the input attributes into linguistic values.

Block 3. Design of models: induction of fuzzy decision trees (YS-FDT, nFDT, oFDT, sFDT), induction of decision trees (C4.5, C4.5p, CART, CARTp), statistical methods and algorithms (Bayes, kNN).

Block 4. Analysis of the results: calculation of the decision errors, writing of the data with incorrect decisions, saving of the models.

RESULTS OF EXPERIMENT

The data sets were taken from the Machine Learning Repository, http://archive.ics.uci.edu/ml/. All numeric attributes were normalized into [0; 1]:

x_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}}

RESULTS OF EXPERIMENT (2)

We selected 44 databases from the Machine Learning Repository, http://archive.ics.uci.edu/ml/: blood, breast, bupa, cmc, diagnosis, haberman, heart, ilpd, nursery, parkinsons, pima, thyroid, vertebral2, vertebral3, wdbc, wpbc, etc. We calculated the normalized error of misclassification:

Method's name                               Rating
Fuzzy Decision Trees                        0.1884
Method C4.5                                 0.2135
Method CART                                 0.3065
Fuzzy Decision Trees (Y. Yuan & M. Shaw)    0.4793
Naïve Bayes classification models           0.5106
k-nearest neighbors                         0.6465
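As a companion to the normalization formula above, a minimal C++ sketch of min-max scaling for one numeric attribute (the helper name is illustrative):

#include <algorithm>
#include <vector>

// Min-max normalization of one numeric attribute into [0, 1],
// as applied to the UCI data before the experiments.
void normalize(std::vector<double>& x) {
    if (x.empty()) return;
    auto [mn, mx] = std::minmax_element(x.begin(), x.end());
    const double lo = *mn, hi = *mx;
    if (hi == lo) return;  // constant attribute: leave as-is
    for (double& v : x) v = (v - lo) / (hi - lo);
}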
IMPLEMENTATION INTO PROJECTS

1. Project FP7-ICT-2013-10. Regional Anesthesia Simulator and Assistant (RASimAs), Reg. No. 610425, Nov. 2013-2016.
2. Project APVV SK-PL. Support Systems for Medical Decision Making, Reg. No. SK-PL-0023-12, 2013-2014.
3. Project TEMPUS. Green Computing & Communications (GreenCo), Reg. No. 530270-TEMPUS-1-2012-1-UK-TEMPUS-JPCR, 2012-2015.
4. Project NATO. Intelligent Assistance Systems: Multisensor Processing and Reliability Analysis, NATO Collaborative Linkage Grant, Canada-Slovakia-Czech-Belarus, Reg. No. CBP.EAP.CLG 984, 2011-2012.

THANKS FOR YOUR ATTENTION

Vitaly.Levashenko@fri.uniza.sk
Department of Informatics, FRI, University of Žilina (Slovakia)