DETECTING CARDIOVASCULAR DISEASE
Christian Castillo
20160066
SUMMARY
• Implement decision tree classifier
• Objective: classify if someone has cardiovascular disease
• My recommendation: add more variables
• The other models tested (AdaBoost, random forest) performed worse
OUTLINE
• The Data
• The Model
• Comparison with other models
• Recommendations
THE DATA
• Taken from Kaggle
• Outcome variable: “cardio” (binary)
• 16 features: 5 numerical and 11 dummy variables
• 70k observations
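The 11 "dummies" are 0/1 indicator columns. A categorical variable is typically expanded so that each category gets its own column. The minimal sketch below shows that expansion. The column name `cholesterol` and its category labels are hypothetical, chosen only for illustration. The slides do not list the dataset's actual categorical variables.

```python
# Hypothetical example: expand one categorical column into 0/1 dummy columns.
# The column name and category labels are illustrative, not taken from the dataset.

def one_hot(values, categories=None):
    """Return (column_names, rows) where each row holds one 0/1 indicator per category."""
    if categories is None:
        categories = sorted(set(values))
    names = [f"is_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return names, rows


cholesterol = ["normal", "high", "normal", "very_high"]  # hypothetical values
names, rows = one_hot(cholesterol)
print(names)  # ['is_high', 'is_normal', 'is_very_high']
print(rows)   # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```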
CLASS BALANCE
• Classes are approximately balanced
• Undersampling was nevertheless applied
• Class counts (before undersampling): 35,021 vs. 34,979
(42)
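The slides do not say how the undersampling was performed. A common approach is random undersampling, which drops randomly chosen rows from the larger class until both classes are the same size. The sketch below assumes that approach and uses only the standard library. Which label corresponds to 35,021 and which to 34,979 is not stated, so the labels here are arbitrary. The feature values are placeholders, and the helper `random_undersample` is hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement, so no row is duplicated.
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Class sizes from the slide; which class is which is an arbitrary assumption.
y = [1] * 35021 + [0] * 34979
X = list(range(len(y)))  # placeholder "features": just row indices
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))
```

With classes this close in size, the undersampling removes only 42 rows, so its effect on the model is likely small.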
SYSTOLIC PRESSURE
• One of the most important features, according to the model's feature importances
• Normally distributed
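In a decision tree, impurity-based feature importance is commonly computed by summing, over every split that uses a feature, how much that split lowers impurity (weighted by node size). The slides do not say exactly how importance was computed, so treat this as an assumed method. The sketch computes the weighted Gini decrease for a single threshold split on a numeric feature. The systolic values, labels and the 130 mmHg threshold are hypothetical toy data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p**2 - (1 - p) ** 2


def impurity_decrease(feature, labels, threshold):
    """Parent Gini minus the size-weighted Gini of the two children
    produced by the rule 'feature <= threshold'."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted_children


# Hypothetical toy values: systolic pressure (mmHg) and disease label.
systolic = [110, 115, 120, 125, 140, 150, 160, 170]
cardio = [0, 0, 0, 1, 1, 1, 1, 0]
print(impurity_decrease(systolic, cardio, 130))  # 0.125
```

Here the parent impurity is 0.5 and each child has impurity 0.375, so the split lowers impurity by 0.125. A feature whose splits repeatedly produce large decreases ends up with a high importance score.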
THE MODEL
• Decision tree
• Uses a hierarchy of decision rules to classify observations
• Purpose in the project: detect cardiovascular disease
• Best-performing model tested: ~70% recall
• The other models had lower recall
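The slides do not say which library or hyperparameters were used for the decision tree. The following is a from-scratch sketch, not the author's implementation. It illustrates the idea of "a hierarchy of decision rules": each internal node tests `feature <= threshold`, chosen to minimise weighted Gini impurity, and each leaf stores the majority class. The `max_depth=3` limit and the two-feature toy data are assumptions.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature_index, threshold) with the lowest weighted Gini,
    or None if no split improves on the parent node."""
    best, best_score = None, gini(y)
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Internal nodes are tuples (j, t, left, right); leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    left = [(r, l) for r, l in zip(X, y) if r[j] <= t]
    right = [(r, l) for r, l in zip(X, y) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def predict(tree, row):
    # Follow the rules from the root until a leaf (class label) is reached.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical toy rows: [systolic pressure, age]; labels: 1 = disease.
X = [[110, 40], [120, 50], [130, 45], [145, 60], [150, 55], [160, 65]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print(tree)  # (0, 130, 0, 1): split on feature 0 at 130
```

On this toy data, one split on the first feature already separates the classes, so the tree stops at depth 1.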
DECISION TREE
• True positive rate (recall): 69.73%
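Recall (true positive rate) is the share of actual positive cases that the model labels positive: TP / (TP + FN). For this project, that means the share of people with cardiovascular disease whom the model detects. The minimal sketch below computes recall from label lists. The labels are made-up toy values, not results from the project.

```python
def recall(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy labels: 5 actual positives, 4 of them predicted positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]
print(recall(y_true, y_pred))  # 0.8
```

Note that the false positive at index 5 does not affect recall. Recall only looks at the actual positives.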
OTHER MODELS
ADABOOST
• Lower recall: 65.98%
RANDOM FOREST
• Lower recall: 67.83%
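To make the comparison explicit, the short sketch below ranks the three models by the recall figures reported on the slides. It also prints each model's gap to the best model in percentage points. Only the three numbers come from the slides. The dictionary and variable names are illustrative.

```python
# Recall values as reported on the slides (percent).
reported_recall = {
    "Decision tree": 69.73,
    "AdaBoost": 65.98,
    "Random forest": 67.83,
}

# Sort from highest to lowest recall.
ranking = sorted(reported_recall.items(), key=lambda kv: kv[1], reverse=True)
best_recall = ranking[0][1]
for name, r in ranking:
    print(f"{name:<14} {r:6.2f}%  (gap to best: {best_recall - r:.2f} pp)")
```

By these figures, the decision tree leads random forest by about 1.9 points and AdaBoost by about 3.75 points.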
RECOMMENDATIONS
• Implement the decision tree classifier
• Objective: classify whether someone has cardiovascular disease
• My recommendation: add more variables
• Caveats:
• Don't limit the data to people who are undergoing a medical examination