PMFCDUDM_Overview_p - University of Washington

A predictive model for cerebrovascular
disease using data mining
Presentation by:
Swapna Savvana,
Graduate Student,
Institute of Technology,
University of Washington,
Tacoma
Authors:
Duen-Yian Yeh
Ching-Hsue Cheng
Yen-Wen Chen
Agenda

Introduction and Motivation

Cerebrovascular disease Features

Major diseases related to cerebrovascular disease

Data Set, Data Mining Classification Techniques

Data Mining procedure

Comparison of classification Models

Diagnosis rules

Conclusion
Introduction and Motivation
• Cerebrovascular disease is a pathological change in the blood vessels of the brain and a complication of general arteriosclerosis
• Cerebrovascular disease ranks second among the top 10 causes of death in Taiwan
• It incurs high medical expenses and can be fatal
• The pathogenesis of cerebrovascular disease is complex and variable
• Accurate diagnosis in advance is difficult
• A predictive model can support diagnosis in preventive medicine
Cerebrovascular disease Features

High prevalence

High fatality rate

High disability rate

High recurrence rate
Major diseases related to cerebrovascular disease

Diabetes mellitus (DM)

Hypertension (hp(B))

Myocardial infarction (mi(H))

Cardiogenic shock (car(H))

Hyperlipaemia (lip(B))

Arrhythmia (arr(H))

Ischemic heart disease (hd(H))

Body mass index (bmi)
Data set

493 samples

Physical exam results

Blood test results

Diagnosis data
Data Mining Classification Techniques

Decision tree (using the C4.5 algorithm)

Bayesian classifier

Back-propagation neural network (BPNN)
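
To illustrate the decision tree technique named above: C4.5 chooses the attribute to split on by gain ratio (information gain divided by the split information). The following is a minimal sketch of that criterion only, not the authors' implementation. The function names and the toy records are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumption, not study data):
# one Y/N attribute and one class code per record.
hp = ["Y", "Y", "N", "N", "Y", "N"]
diagnosis = ["BH", "BH", "N", "N", "BH", "N"]
print(gain_ratio(hp, diagnosis))
```

In this toy case the attribute separates the two classes perfectly, so the gain ratio is at its maximum; an attribute unrelated to the class would score 0.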
Data Mining procedure

Data collection and variable screening

Attribute symbolization (class codes N, CD, BH, DM, SM)

Input Data Splitting (T1, T2, T3)

Classification algorithms used to construct the models

Classification efficiency analyses and comparison
• Sensitivity: the probability of a positive test given that the patient is ill (Mii/Mri)
• Accuracy: the percentage of correctly classified instances [(M11+M22+M33+M44+M55)/M]

Extraction of diagnosis classification rules
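
The two efficiency measures in the procedure above can be sketched from a 5x5 confusion matrix, one row and column per class code. This is a minimal sketch under stated assumptions: rows are taken to be the actual classes, so Mri is read as the total of row i, and M as the total number of instances. The matrix values are hypothetical and are not results from the study.

```python
def sensitivity(matrix, i):
    """M_ii divided by the total of row i (assumed: rows are actual classes)."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


def accuracy(matrix):
    """Sum of the diagonal (correctly classified) divided by all instances."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical confusion matrix for five classes (assumption, not study data).
matrix = [
    [18, 1, 1, 0, 0],
    [2, 16, 1, 1, 0],
    [0, 1, 19, 0, 0],
    [1, 0, 0, 18, 1],
    [0, 0, 1, 1, 18],
]

for i in range(len(matrix)):
    print(f"class {i + 1}: sensitivity = {sensitivity(matrix, i):.2%}")
print(f"accuracy = {accuracy(matrix):.2%}")
```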
Comparison of classification Models
                      T1                      T2                      T3
                      Sensitivity  Accuracy   Sensitivity  Accuracy   Sensitivity  Accuracy
Decision Tree         95.29%       98.01%     94.68%       98.01%     62.81%       66.93%
Bayesian Classifier   87.10%       91.30%     86.30%       91.36%     66.60%       71.83%
BPNN                  94.82%       97.87%     93.80%       98.05%     64.21%       69.32%
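
As a reading aid for the comparison figures above, the following sketch picks the best-scoring model for each data set and measure. The percentages are copied from the slide; the dictionary layout and the helper name `best_model` are hypothetical choices made for illustration.

```python
# (sensitivity %, accuracy %) per model and data set, as reported on the slide.
results = {
    "Decision Tree": {"T1": (95.29, 98.01), "T2": (94.68, 98.01), "T3": (62.81, 66.93)},
    "Bayesian Classifier": {"T1": (87.10, 91.30), "T2": (86.30, 91.36), "T3": (66.60, 71.83)},
    "BPNN": {"T1": (94.82, 97.87), "T2": (93.80, 98.05), "T3": (64.21, 69.32)},
}


def best_model(data_set, metric):
    """Return the model with the highest value of `metric` on `data_set`."""
    index = 0 if metric == "sensitivity" else 1
    return max(results, key=lambda model: results[model][data_set][index])


for data_set in ("T1", "T2", "T3"):
    for metric in ("sensitivity", "accuracy"):
        print(data_set, metric, best_model(data_set, metric))
```

On these figures the decision tree leads on T1 and on T2 sensitivity, BPNN edges ahead on T2 accuracy, and the Bayesian classifier leads on T3.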
Diagnosis Rules
[Table: 16 diagnosis rules. Columns: Rule No., dm(D), mi(H), hp(B), car(H), lip(B), arr(H), hd(H), bmi, Prediction results. Condition cells are marked Y; the bmi column holds the thresholds <=26.8 (rule 15) and >26.8 (rule 16), each with the prediction result SM(1), 1.0.]
Prediction results listed in the table: DM(71), 0.9859; BH(6), 0.8333; BH(13), 1.0; BH(2), 1.0; SM(51), 1.0; SM(7), 1.0; SM(2), 1.0; SM(2), 1.0; BH(23), 1.0; SM(5), 1.0; SM(9), 1.0; SM(2), 1.0; SM(11), 1.0; SM(3), 1.0; SM(1), 1.0; SM(1), 1.0
Conclusion

The decision tree has high accuracy and sensitivity (98.01% and 95.29% on T1)

The 16 diagnosis rules are accurate