EEL 6876 Intelligent Diagnostics
Case Based Reasoning Project Report

Submitted by: Madan Bharadwaj
Instructor: Dr. Avelino Gonzalez

Project Specifications

Problem Definition: To develop a case based diagnostic reasoning system for an oil cooler setup using the case library provided. No other resource is to be used.
Given: A case library of 21 cases and 7 test cases.
Programming Language used: C++
Programming Approach: Object-Oriented

Problem Overview

Case based reasoning belongs to a family of knowledge-based systems used to make intelligent decisions automatically. The paradigm was developed mainly to overcome the drawbacks of rule-based systems, which require the knowledge engineer to know all the rules beforehand. To build a rule-based system, the development team has to gather expert knowledge from subject matter experts and hard code it into the system. Rules added later carry a heavy penalty as well: each new rule has to be tested for compatibility with the old ones, and the whole system has to be overhauled whenever there is an update.

The case based reasoning approach overcomes these weaknesses by using historical cases that have already been prepared for diagnosis. These cases reflect a wealth of information previously seen and documented, so using them saves a great deal of development time and resources. These factors contributed largely to the development of case based reasoning as a viable approach to knowledge engineering applications. In the CBR algorithm described below, a library of historical cases provides the knowledge base and a handful of test cases are used to test its utility.

Approach to the Problem

A purely case based reasoning approach is adopted for this assignment.
Even though some information about the system is known beforehand, such as the need for TOIL to be greater than 120 for the diagnosis to have any significance, this detail is deliberately ignored. This aspect is elaborated in the final section. Adaptation is not used, primarily because of a lack of confidence in the answers the system might generate and the inability to check those answers against expert knowledge.

Though Visual Basic would have been far more comfortable to use than C++, C++ was chosen because of its potential for use in real-time applications. One can visualize an industrial situation with many hundreds of test cases and perhaps thousands of historical cases in the library, where execution time becomes crucial to user acceptance of the system; hence C++ was chosen over VB.

The CBR Algorithm Overview

The CBR (Case Based Reasoning) algorithm was developed in the C++ programming language with an object-oriented approach. The algorithm uses the cases provided in the case library as references against which a test case is compared. A pattern matching code block was developed to serve that purpose. Its result is the degree of match, which gives the user an intuitive degree of belief in the results. For clarity and readability the program is divided into several code blocks that perform specific functions. In the following section every code block is briefly explained; the section after that gives the pseudo code, which offers a programmer a hands-on view of the algorithm.

Code Blocks in the CBR Algorithm

The CBR algorithm is composed of the following code blocks.

1. Case Library Initialization Code Block
This code block initializes the case library to be used by the algorithm.
The case library is ‘hard coded’ into the program to sidestep the complexities of other methods such as databases or text files. This decision was also made in view of the educational purpose of the assignment, which is to implement a case based reasoning system rather than complex data retrieval techniques that add little educational value to this project. The code block simply initializes all the cases with their predefined values.

2. CBR Performance Code Block
This code block requests the test case inputs from the user. The inputs are stored in appropriate variables and then processed by the Pattern Matching code block to find the closest match for the test case in the library. Once a match has been found, the corresponding diagnoses are retrieved from the library and displayed to the user. Since the output also shows the degree of match as a percentage, it gives the user a confidence level that can be attached to the results.

3. Pattern Matching Code Block
This is the centerpiece of the algorithm, providing the match information to the Performance code block. Every attribute of the test case is compared with its counterpart in a library case; the difference is normalized by the test case value, and the six results are averaged to give a single number representing the error between the test and library cases. This is done for every library case, and the library case with the least error is chosen as the solution for the test case.

4. Class Definition Code Block
This code block lays down the structure for addressing the problem. It defines the class from which all cases are instantiated. The six attributes of a case are public variables of that class, and the pattern matching function is also part of the class definition.
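The error measure described for the Pattern Matching code block can be sketched in isolation as follows. This is a minimal illustration, not the project's listing: the function names attribute_error, case_error and match_degree are hypothetical, and the six attributes are assumed to be passed in the order TOIL, TWTR, TWTR-RISE, OIL-PUR, OIL-FLO, WTR-DP.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper: error of one attribute, normalized by the test value.
float attribute_error(int test_value, int library_value) {
    if (test_value != 0) {
        return std::fabs(static_cast<float>(test_value - library_value) /
                         static_cast<float>(test_value));
    }
    // A zero test value matches perfectly only against a zero library value.
    return (library_value == 0) ? 0.0f : 1.0f;
}

// Average of the six attribute errors; 0 means the two cases are identical.
float case_error(const std::array<int, 6>& test_case,
                 const std::array<int, 6>& library_case) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < test_case.size(); ++i) {
        sum += attribute_error(test_case[i], library_case[i]);
    }
    return sum / static_cast<float>(test_case.size());
}

// Degree of match in percent, derived from the averaged error.
float match_degree(float error) {
    return 100.0f - 100.0f * error;
}
```

For example, comparing the test case (123, 42, 21, 85, 75, 20) with library case #2 (125, 40, 20, 85, 70, 22) gives attribute errors of 2/123, 2/42, 1/21, 0, 5/75 and 2/20, an average error of about 0.0464, and therefore a match degree of roughly 95.36 %.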
Pseudo Code for CBR Algorithm

    // Define Class
    class cbrcase
    {
        // Declare attribute variables
        int TOIL, TWTR, TWTR_RISE, OIL_PUR, OIL_FLO, WTR_DP;
        char Diagnosis[3];

        // Define function to set variables
        void setdata()

        // Define Pattern Matching Function
        float compare(library_case, test_case)
        {
            // Compare library case and test case attributes
            // Return Match Value
        }
    }

    // Main function
    void main()
    {
        // Initialize Case Library
        case1.setdata(100,0,0,0,0,0,diag1);
        case2.setdata(125,40,20,85,70,22,diag2);
        .
        .
        .
        case21.setdata(121,56,20,70,80,30,diag21);

        // Input Test Case and Compare with Library cases
        cin>>testcase.TOIL;
        cin>>testcase.TWTR;

        // Compare with case library
        tval[0] = c1.compare(case1,testcase);
        tval[1] = c1.compare(case2,testcase);
        .
        .
        .
        tval[20] = c1.compare(case21,testcase);

        // Find Minimum Error

        // Retrieve Diagnosis
        switch(min_position)
        {
            case 1:
                strcpy(test_diagnosis , case1.Diagnosis);
                break;
            case 2:
                strcpy(test_diagnosis , case2.Diagnosis);
                break;
            .
            .
            .
        }

        // Display to user
        cout<<"\nThe Diagnosis for the test case are :"<<endl;
        cout<<test_diagnosis[0]<<endl;
        cout<<test_diagnosis[1]<<endl;
        cout<<test_diagnosis[2]<<endl;
    }

Results

A few tests were conducted to check the correctness of the system. Since the algorithm looks for similarity between the test case and the library cases, entering a library case itself, or a case very similar to one, should make the system report a match of 100% or close to 100%. This check is done first to establish the credibility of the system. To begin with, a library case is chosen as input. We expect the system to pull up the same library case as the matching case and produce the same diagnosis.

Custom Test Cases to prove System Validity

Chosen Library Case: Case #2

CBR Algorithm implemented in C++
This script implements a case based reasoning system with a case library of 21 cases.
The test cases have to be keyed in when the program requests for input.

Input the attributes of the test Case.

TOIL :125
TWTR :40
TWTR-RISE :20
OIL-PUR :85
OIL-FLO :70
WTR-DP :22

We have a match with a library case with a match degree of : 100 %

The library case number which is the closest match to the test case is : 2

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
B

The results match our expectations: the test case has a 100% match with case #2 in the library. Next, a case very similar to case #2 is tested. We now expect to see a match close to 100% but not exactly 100%.

Input the attributes of the test Case.

TOIL :123
TWTR :42
TWTR-RISE :21
OIL-PUR :85
OIL-FLO :75
WTR-DP :20

We have a match with a library case with a match degree of : 95.3639 %

The library case number which is the closest match to the test case is : 2

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
B

We have a 95% match with case #2. These results support the credibility of the system. We now extend the testing to the test cases provided by the instructor.

Solutions for Test Cases Provided by Instructor

Test #1

This script implements a case based reasoning system with a case library of 21 cases.
The test cases have to be keyed in when the program requests for input.

Input the attributes of the test Case.

TOIL :112
TWTR :0
TWTR-RISE :0
OIL-PUR :0
OIL-FLO :0
WTR-DP :0

We have a match with a library case with a match degree of : 99.7024 %

The library case number which is the closest match to the test case is : 5

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
A

Test #2

Do you want to try another test case? Answer Y or N
Y
Input the attributes of the test Case.

TOIL :128
TWTR :35
TWTR-RISE :31
OIL-PUR :97
OIL-FLO :68
WTR-DP :27

We have a match with a library case with a match degree of : 89.0216 %

The library case number which is the closest match to the test case is : 12

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
B

Test #3

Do you want to try another test case? Answer Y or N
y
Input the attributes of the test Case.

TOIL :131
TWTR :39
TWTR-RISE :33
OIL-PUR :83
OIL-FLO :63
WTR-DP :24

We have a match with a library case with a match degree of : 95.0395 %

The library case number which is the closest match to the test case is : 12

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
B

Test #4

Do you want to try another test case? Answer Y or N
y
Input the attributes of the test Case.

TOIL :127
TWTR :53
TWTR-RISE :31
OIL-PUR :0
OIL-FLO :0
WTR-DP :0

We have a match with a library case with a match degree of : 95.6888 %

The library case number which is the closest match to the test case is : 7

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
F

Test #5

Do you want to try another test case? Answer Y or N
y
Input the attributes of the test Case.

TOIL :139
TWTR :45
TWTR-RISE :-6
OIL-PUR :0
OIL-FLO :0
WTR-DP :0

We have a match with a library case with a match degree of : 83.3 %

The library case number which is the closest match to the test case is : 8

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
G

Test #6

Do you want to try another test case? Answer Y or N
y
Input the attributes of the test Case.

TOIL :141
TWTR :70
TWTR-RISE :48
OIL-PUR :95
OIL-FLO :63
WTR-DP :23

We have a match with a library case with a match degree of : 91.6016 %

The library case number which is the closest match to the test case is : 11

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
D

Test #7

Do you want to try another test case? Answer Y or N
y
Input the attributes of the test Case.

TOIL :136
TWTR :39
TWTR-RISE :31
OIL-PUR :91
OIL-FLO :40
WTR-DP :23

We have a match with a library case with a match degree of : 94.3521 %

The library case number which is the closest match to the test case is : 10

The Notations used for representing the various Diagnosis are...
A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT
E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION

The Diagnosis for the test case are :
E
C

Do you want to try another test case? Answer Y or N
N

Conclusions

1. Purely Case Based Reasoning: The system was implemented with only a case based reasoning paradigm even though some information was available outside the case library.
The information about the oil cooler system provided by the instructor reveals the inner dynamics of the system, but incorporating that knowledge into the case based reasoning system would have meant diluting the case based reasoning paradigm and adopting a hybrid approach to the problem. This would have violated the constraint set by the instructor that the system may not include any knowledge other than the case library. It would also have made it impossible to see how the system performs with the case library alone, since there would be uncertainty as to how much the outside knowledge contributed to its performance. Hence only the case library was used as the source of knowledge, even though further knowledge was available.

2. Performance Evaluation: The performance of the system depends entirely upon the nature of the case library. If the case library has a large number of cases representing the diverse problems encountered and their solutions, the chances of a good diagnosis are good. If, on the other hand, the library is limited and certain problems are significantly over-represented while others are under-represented, the chances of a good diagnosis are considerably weaker. With the given case library, the performance of the system on the test cases seems acceptable. The real test of the system would be to know the exact values of the system thresholds, which could not be extracted completely from the given information, and then to present the system with cases fabricated around those thresholds. If the system's answers matched the ones worked out by hand, we could have a very high level of confidence in it. As explained above, however, this would depend on the quantity and quality of the cases in the library.

3. Simple Design: It can be safely inferred that case based reasoning systems are far simpler than rule based systems.
All that needs to be done to incorporate more knowledge is to add more cases to the case library. This will automatically improve the performance of the CBR algorithm and will not cause any logical exceptions in the system. With a rule based system, however, it is necessary to check for rule compatibility whenever new rules are added. If this is not checked thoroughly, the system may fail critically at run time. This risk can be avoided with case based reasoning systems.

4. Execution Time: The execution time of a case based reasoning system depends directly upon the number of cases in the case library. The more cases there are, the longer the algorithm takes to compare the test case with each library case and produce the closest match. This introduces a new type of problem for the knowledge engineer, who has to be able to eliminate cases that are very similar in order to reduce the execution time of the system. This may sometimes require expert help, since a subtle difference in case attributes may be all that is needed to infer a different diagnosis. Choosing cases judiciously is therefore an important part of building a case based reasoning system.

5. Choice of Programming Language: The first choice for the programming language was Visual Basic, since it provides all the computational facilities needed to run the system and allows far greater user friendliness than other conventional programming languages. However, Visual Basic is not very efficient when its execution times are compared with those of languages like C++. The ideal solution would have been to do all the computation in C++ and display the results using Visual Basic forms, but this would have imposed a heavy programming load that added no educational value to the project.
Since C++, with its code reusability and computational speed, is what one would use to implement a case based reasoning system for real-time purposes, it was chosen for implementing the project.

Appendix – I. User Instructions

1. Run executable file.
2. Enter test case attributes. All attributes are by default integers.
3. View Result
4. Answer ‘Y’ to continue and ‘N’ to stop.

Appendix – II – Source Code

// cbr_1.cpp
// CBR Algorithm implemented in C++
// This script implements a case based reasoning system with a case library of 21 cases.
// The test cases have to be keyed in when the program requests for input.

#include <iostream>
#include <cstring>
#include <cmath>

using namespace std;

//CLASS DEFINITION CODE BLOCK
class cbrcase    //Class defining all the attributes of a case
{
public:
    int TOIL, TWTR, TWTR_RISE, OIL_PUR, OIL_FLO, WTR_DP;
    char Diagnosis[3];    //Up to three diagnosis letters, padded with 0

    void setdata(int toil, int twtr, int twtr_rise, int oil_pur, int oil_flo, int wtr_dp, const char diagnosis[3])
    {
        TOIL = toil;
        TWTR = twtr;
        TWTR_RISE = twtr_rise;
        OIL_PUR = oil_pur;
        OIL_FLO = oil_flo;
        WTR_DP = wtr_dp;
        for(int i=0;i<3;i++){Diagnosis[i] = diagnosis[i];}
    }

    //PATTERN MATCHING CODE BLOCK
    //Error of a single attribute, normalized by the test case value.
    //A test value of zero matches only a library value of zero.
    static float attribute_error(int tvalue, int lvalue)
    {
        if(tvalue != 0)
        {
            return fabs(float(tvalue - lvalue)/float(tvalue));
        }
        else if(lvalue == 0)
        {
            return 0;
        }
        else
        {
            return 1;
        }
    }

    //Pattern Matching Function to find the degree of match between library case and test case
    float compare(const cbrcase& lcase, const cbrcase& tcase) const
    {
        float toil_val     = attribute_error(tcase.TOIL, lcase.TOIL);
        float twtr_val     = attribute_error(tcase.TWTR, lcase.TWTR);
        float twtrrise_val = attribute_error(tcase.TWTR_RISE, lcase.TWTR_RISE);
        float oilpur_val   = attribute_error(tcase.OIL_PUR, lcase.OIL_PUR);
        float oilflo_val   = attribute_error(tcase.OIL_FLO, lcase.OIL_FLO);
        float wtrdp_val    = attribute_error(tcase.WTR_DP, lcase.WTR_DP);

        float val=(toil_val+twtr_val+twtrrise_val+oilpur_val+oilflo_val+wtrdp_val)/6;
        return val;
    }
};

int main()
{
    cout<<"CBR Algorithm implemented in C++"<<endl;
    cout<<"This script implements a case based reasoning system with a case library of 21 cases."<<endl;
    cout<<"The test cases have to be keyed in when the program requests for input.\n"<<endl;

    //CASE LIBRARY INITIALIZATION CODE BLOCK
    //Initializing Case Library
    cbrcase case1,case2,case3,case4,case5,case6,case7,case8,case9,case10,case11,case12,
            case13,case14,case15,case16,case17,case18,case19,case20,case21;

    char diag1[3] = {'A',0,0};       case1.setdata(100,0,0,0,0,0,diag1);
    char diag2[3] = {'B',0,0};       case2.setdata(125,40,20,85,70,22,diag2);
    char diag3[3] = {'G',0,0};       case3.setdata(140,30,-10,0,0,0,diag3);
    char diag4[3] = {'D',0,0};       case4.setdata(130,20,40,90,70,10,diag4);
    char diag5[3] = {'A',0,0};       case5.setdata(110,0,0,0,0,0,diag5);
    char diag6[3] = {'D','B',0};     case6.setdata(123,60,40,90,80,20,diag6);
    char diag7[3] = {'F',0,0};       case7.setdata(127,65,32,0,0,0,diag7);
    char diag8[3] = {'G',0,0};       case8.setdata(131,65,-3,0,0,0,diag8);
    char diag9[3] = {'D',0,0};       case9.setdata(176,30,31,92,60,10,diag9);
    char diag10[3] = {'E','C',0};    case10.setdata(140,32,33,93,40,22,diag10);
    char diag11[3] = {'D',0,0};      case11.setdata(130,62,40,91,62,21,diag11);
    char diag12[3] = {'B',0,0};      case12.setdata(121,42,31,80,60,24,diag12);
    char diag13[3] = {'B',0,0};      case13.setdata(146,38,34,75,61,22,diag13);
    char diag14[3] = {'E','C',0};    case14.setdata(121,39,35,95,43,21,diag14);
    char diag15[3] = {'B',0,0};      case15.setdata(126,46,33,93,80,21,diag15);
    char diag16[3] = {'D',0,0};      case16.setdata(129,49,32,94,51,15,diag16);
    char diag17[3] = {'C',0,0};      case17.setdata(135,48,31,97,63,40,diag17);
    char diag18[3] = {'H',0,0};      case18.setdata(120,46,34,99,70,22,diag18);
    char diag19[3] = {'A',0,0};      case19.setdata(119,0,0,0,0,0,diag19);
    char diag20[3] = {'D','E','B'};  case20.setdata(126,52,42,60,40,12,diag20);
    char diag21[3] = {'B',0,0};      case21.setdata(121,56,20,70,80,30,diag21);

    char ans;
    do
    {
        //CBR PERFORMANCE CODE BLOCK
        //Input Test Case
        cbrcase testcase;
        cout<<"Input the attributes of the test Case.\n\nTOIL :";
        cin>>testcase.TOIL;
        cout<<"\nTWTR :";cin>>testcase.TWTR;
        cout<<"\nTWTR-RISE :";cin>>testcase.TWTR_RISE;
        cout<<"\nOIL-PUR :";cin>>testcase.OIL_PUR;
        cout<<"\nOIL-FLO :";cin>>testcase.OIL_FLO;
        cout<<"\nWTR-DP :";cin>>testcase.WTR_DP;

        //Error between the test case and every library case
        cbrcase c1;
        float tval[21];
        tval[0] = c1.compare(case1,testcase);
        tval[1] = c1.compare(case2,testcase);
        tval[2] = c1.compare(case3,testcase);
        tval[3] = c1.compare(case4,testcase);
        tval[4] = c1.compare(case5,testcase);
        tval[5] = c1.compare(case6,testcase);
        tval[6] = c1.compare(case7,testcase);
        tval[7] = c1.compare(case8,testcase);
        tval[8] = c1.compare(case9,testcase);
        tval[9] = c1.compare(case10,testcase);
        tval[10] = c1.compare(case11,testcase);
        tval[11] = c1.compare(case12,testcase);
        tval[12] = c1.compare(case13,testcase);
        tval[13] = c1.compare(case14,testcase);
        tval[14] = c1.compare(case15,testcase);
        tval[15] = c1.compare(case16,testcase);
        tval[16] = c1.compare(case17,testcase);
        tval[17] = c1.compare(case18,testcase);
        tval[18] = c1.compare(case19,testcase);
        tval[19] = c1.compare(case20,testcase);
        tval[20] = c1.compare(case21,testcase);

        //Find the library case with the minimum error.
        //min_position starts at 1 so that it is defined when case 1 is the best match.
        float temp;
        int min_position = 1;
        temp = tval[0];
        for(int i = 0;i<20;i++)
        {
            //cout<<"\nMatch Values are : "<<tval[i];
            if(tval[i+1]<temp)
            {
                temp = tval[i+1];
                min_position = i+2;
            }
        }
        //cout<<"\nMatch Values are : "<<tval[20];

        cout<<"\nWe have a match with a library case with a match degree of : "<<100-(temp*100)<<" %\n"<<endl;
        cout<<"\nThe library case number which is the closest match to the test case is : "<<min_position<<endl;

        //SUB BLOCK - OUTPUT RETRIEVAL
        // Collecting Diagnosis
        // memcpy copies exactly three characters; Diagnosis is not guaranteed
        // to be null-terminated (see case 20), so strcpy is not safe here.
        char test_diagnosis[3] = {0, 0, 0};
        switch(min_position)
        {
            case 1:  memcpy(test_diagnosis, case1.Diagnosis, 3);  break;
            case 2:  memcpy(test_diagnosis, case2.Diagnosis, 3);  break;
            case 3:  memcpy(test_diagnosis, case3.Diagnosis, 3);  break;
            case 4:  memcpy(test_diagnosis, case4.Diagnosis, 3);  break;
            case 5:  memcpy(test_diagnosis, case5.Diagnosis, 3);  break;
            case 6:  memcpy(test_diagnosis, case6.Diagnosis, 3);  break;
            case 7:  memcpy(test_diagnosis, case7.Diagnosis, 3);  break;
            case 8:  memcpy(test_diagnosis, case8.Diagnosis, 3);  break;
            case 9:  memcpy(test_diagnosis, case9.Diagnosis, 3);  break;
            case 10: memcpy(test_diagnosis, case10.Diagnosis, 3); break;
            case 11: memcpy(test_diagnosis, case11.Diagnosis, 3); break;
            case 12: memcpy(test_diagnosis, case12.Diagnosis, 3); break;
            case 13: memcpy(test_diagnosis, case13.Diagnosis, 3); break;
            case 14: memcpy(test_diagnosis, case14.Diagnosis, 3); break;
            case 15: memcpy(test_diagnosis, case15.Diagnosis, 3); break;
            case 16: memcpy(test_diagnosis, case16.Diagnosis, 3); break;
            case 17: memcpy(test_diagnosis, case17.Diagnosis, 3); break;
            case 18: memcpy(test_diagnosis, case18.Diagnosis, 3); break;
            case 19: memcpy(test_diagnosis, case19.Diagnosis, 3); break;
            case 20: memcpy(test_diagnosis, case20.Diagnosis, 3); break;
            case 21: memcpy(test_diagnosis, case21.Diagnosis, 3); break;
        }

        cout<<"\nThe Notations used for representing the various Diagnosis are..."<<endl;
        cout<<"A - NO-PROBLEM B - LEAK C - REVLEAK D - RUPT"<<endl;
        cout<<"E - BLOCK F - HI- TEMP G - SEN-FAIL H - NO-CONCLUSION"<<endl;
        cout<<"\nThe Diagnosis for the test case are :"<<endl;
        cout<<test_diagnosis[0]<<endl;
        cout<<test_diagnosis[1]<<endl;
        cout<<test_diagnosis[2]<<endl;

        cout<<"\nDo you want to try another test case? Answer Y or N\n";
        cin>>ans;
    }while(ans !='N');

    return 0;
}
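The retrieval step in the listing above (one error value per library case, then a search for the minimum) follows the same pattern for every case. As a hedged sketch only, the library could instead be held in a container so that adding a case, as Conclusion 3 envisages, requires no other change to the code. The names LibraryCase, case_error and closest_case are hypothetical and are not part of the submitted program; the error measure is meant to mirror compare().

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type; the attribute order is assumed to be
// TOIL, TWTR, TWTR_RISE, OIL_PUR, OIL_FLO, WTR_DP.
struct LibraryCase {
    std::array<int, 6> attributes;
    std::string diagnosis;  // one to three diagnosis letters, e.g. "EC"
};

// Per-attribute difference normalized by the test value, averaged over the
// six attributes (intended to mirror compare() in the listing).
float case_error(const std::array<int, 6>& test,
                 const std::array<int, 6>& lib) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < test.size(); ++i) {
        if (test[i] != 0) {
            sum += std::fabs(static_cast<float>(test[i] - lib[i]) /
                             static_cast<float>(test[i]));
        } else if (lib[i] != 0) {
            sum += 1.0f;  // zero test value against a non-zero library value
        }
    }
    return sum / static_cast<float>(test.size());
}

// Index of the library case with the smallest error; the earliest case wins
// a tie. Assumes the library is not empty.
std::size_t closest_case(const std::vector<LibraryCase>& library,
                         const std::array<int, 6>& test) {
    std::size_t best = 0;
    float best_error = case_error(test, library[0].attributes);
    for (std::size_t i = 1; i < library.size(); ++i) {
        const float error = case_error(test, library[i].attributes);
        if (error < best_error) {
            best_error = error;
            best = i;
        }
    }
    return best;
}
```

With this layout the diagnosis is read directly from the selected element, so no per-case switch statement is needed. The run time still grows linearly with the number of library cases, as noted in Conclusion 4.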