DETECTING CARDIOVASCULAR DISEASE
Christian Castillo 20160066

SUMMARY
• Implement a decision tree classifier
• Objective: classify whether someone has cardiovascular disease
• My recommendation: add more variables
• The other models tested did not perform as well

OUTLINE
• The Data
• The Model
• Comparison with Other Models
• Recommendations

THE DATA
• Taken from Kaggle
• "cardio": binary outcome variable
• 16 features: 5 numerical and 11 dummies
• 70k observations

CLASS BALANCE
• Classes are approximately equally distributed
• Nevertheless, undersampling was applied
• 35,021 vs. 34,979 (a difference of 42)

SYSTOLIC PRESSURE
• One of the most important features according to the feature importance ranking
• Normally distributed

THE MODEL
• Decision tree
• Uses a set of hierarchical decisions to classify
• Purpose in the project:
  • Detect cardiovascular disease
  • Proved to be the best performer: ~70% recall
  • The other models had lower scores

DECISION TREE
• True positive rate (recall) of 69.73%

OTHER MODELS

ADABOOST
• Lower recall: 65.98%

RANDOM FOREST
• Lower recall: 67.83%

RECOMMENDATIONS
• Implement the decision tree classifier
• Objective: classify whether someone has cardiovascular disease
• My recommendation: add more variables
• Caveats:
  • Don't limit the data only to people who are taking a medical examination
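The class-balance and recall figures in this deck rest on two simple operations: balancing the classes by undersampling, and measuring recall (true positive rate = TP / (TP + FN)). The slides do not show how either was done, so the following is only a minimal stdlib-only Python sketch under stated assumptions: labels are 0/1 values, and undersampling means randomly dropping majority-class rows down to the minority-class size (the deck does not name the undersampling method). The helper names `undersample` and `recall` are hypothetical, and which of the two class counts belongs to class 1 is also assumed.

```python
import random
from collections import defaultdict


def undersample(labels, seed=0):
    """Return indices of a class-balanced subset via random undersampling.

    Assumption: every class is randomly reduced (without replacement) to the
    size of the smallest class, so all minority-class rows are kept.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


def recall(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN); returns 0.0 if there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Class counts from the CLASS BALANCE slide (class assignment is assumed).
    labels = [0] * 35_021 + [1] * 34_979
    kept = undersample(labels)
    balanced = [labels[i] for i in kept]
    print(balanced.count(0), balanced.count(1))  # 34979 34979

    # Toy predictions: 7 of 10 actual positives are detected.
    y_true = [1] * 10 + [0] * 5
    y_pred = [1] * 7 + [0] * 8
    print(recall(y_true, y_pred))  # 0.7
```

With counts this close, undersampling discards only 42 rows. Recall is the natural metric for this task because it counts how many people who actually have the disease the model detects.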