A predictive model for cerebrovascular disease using data mining

Presentation by: Swapna Savvana, Graduate Student, Institute of Technology, University of Washington, Tacoma
Authors: Duen-Yian Yeh, Ching-Hsue Cheng, Yen-Wen Chen

Agenda
• Introduction and Motivation
• Cerebrovascular disease Features
• Major diseases related with cerebrovascular disease
• Data set
• Data Mining Classification Techniques
• Data Mining procedure
• Comparison of classification Models
• Diagnosis rules
• Conclusion

Introduction and Motivation
• Cerebrovascular disease is a pathological change in the blood vessels of the brain and a common complication of artery sclerosis
• Cerebrovascular disease ranks second among the top 10 causes of death in Taiwan
• High medical expenses; sometimes costs lives
• Pathogenesis of cerebrovascular disease is complex and variable
• Accurate diagnosis in advance is difficult
• A predictive model enhances diagnosis in preventive medicine

Cerebrovascular disease Features
• High prevalence
• High fatality rate
• High disability rate
• High recurrence rate

Major diseases related with cerebrovascular disease
• Diabetes mellitus (DM)
• Hypertension (hp(B))
• Myocardial infarction (mi(H))
• Cardiogenic shock (car(H))
• Hyperlipaemia (lip(B))
• Arrhythmia (arr(H))
• Ischemic heart disease (hd(H))
• Body mass index (bmi)

Data set
• 493 samples
• Physical exam results
• Blood test results
• Diagnosis data

Data Mining Classification Techniques
• Decision Tree (C4.5 algorithm is used)
• Bayesian Classifier
• Back propagation Neural network (BPNN)

Data Mining procedure
• Data collection and variable screening
• Attribute symbolization (class codes N, CD, BH, DM, SM)
• Input data splitting (T1, T2, T3)
• Classification algorithms used to construct the models
• Classification efficiency analyses and comparison
    • Sensitivity: the probability of a positive test given that the patient is ill (M_ii / M_ri)
    • Accuracy: the percentage of correctly classified instances [(M_11 + M_22 + M_33 + M_44 + M_55) / M]
• Extraction of diagnosis classification rules

Comparison of classification Models

                      T1                      T2                      T3
                      Sensitivity  Accuracy   Sensitivity  Accuracy   Sensitivity  Accuracy
Decision Tree         95.29%       98.01%     94.68%       98.01%     62.81%       66.93%
Bayesian Classifier   87.10%       91.30%     86.30%       91.36%     66.60%       71.83%
BPNN                  94.82%       97.87%     93.80%       98.05%     64.21%       69.32%

Diagnosis Rules

Rule No. | dm(D) | mi(H) | hp(B) | car(H) | lip(B) | arr(H) | hd(H) | bmi | Prediction results
1 Y 2 DM(71), 0.9859 Y BH(6), 0.8333 3 Y BH(13), 1.0 4 Y 5 Y 7 Y 8 Y 9 Y 10 11 BH(2), 1.0 Y SM(51), 1.0 Y SM(7), 1.0 Y SM(2), 1.0 Y Y SM(2), 1.0 Y Y 12 13 BH(23), 1.0 Y 6 SM(5), 1.0 Y Y SM(9), 1.0 Y Y 14 Y SM(2), 1.0 Y SM(11), 1.0 Y SM(3), 1.0
15 Y Y <=26.8 SM(1), 1.0
16 Y Y >26.8 SM(1), 1.0

Conclusion
• Decision Tree has high accuracy and sensitivity
• 16 Diagnosis rules are accurate
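
Supplementary sketch: sensitivity and accuracy

The two measures defined under Data Mining procedure can be computed from a confusion matrix roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes that rows hold the actual class and columns the predicted class, and that M_ri denotes the total of row i; the slides do not state either convention. The function names and the example counts are hypothetical and are not taken from the study.

```python
from typing import List

# Class codes from the attribute symbolization step.
CLASSES = ["N", "CD", "BH", "DM", "SM"]

# Hypothetical 5x5 confusion matrix (NOT data from the study).
# Assumption: matrix[i][j] = number of cases of actual class i predicted as class j.
EXAMPLE = [
    [8, 1, 0, 1, 0],
    [0, 9, 1, 0, 0],
    [1, 0, 7, 0, 2],
    [0, 0, 0, 10, 0],
    [0, 1, 1, 0, 8],
]


def sensitivity(matrix: List[List[int]], i: int) -> float:
    """M_ii / M_ri, with M_ri assumed to be the total of row i."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


def accuracy(matrix: List[List[int]]) -> float:
    """Sum of the diagonal entries divided by the total count M."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    for idx, code in enumerate(CLASSES):
        print(f"sensitivity[{code}] = {sensitivity(EXAMPLE, idx):.2%}")
    print(f"accuracy = {accuracy(EXAMPLE):.2%}")
```

With five classes the accuracy numerator is the sum of the five diagonal entries, which matches the (M_11 + ... + M_55) / M form given in the slides.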