Introduction

Sleep apnea is a disorder that impairs a person's breathing during sleep, causing breathing to repeatedly stop or become shallow. It is associated with hypertension, diabetes, and other cardiovascular disorders.

Dataset Description

The data consists of 35 patient records. The ECG voltage is sampled every 10 milliseconds, and each 1-minute segment of the ECG signal was labelled as apnea or non-apnea by health experts; the labels do not distinguish stages of apnea. In total, 12 features were extracted: 6 time-domain features and 6 frequency-domain features (refer to the report for details).

Standardization and Scaling

Distance-based algorithms such as KNN and SVM are affected by the scale of the variables, because their performance depends on Euclidean distances, which are dominated by the features with the largest magnitudes. To eliminate this effect, all variables must be brought to the same scale. Standardization is used for this purpose:

Z = (X − μ) / σ

where μ is the mean and σ the standard deviation of the feature. (A brief scikit-learn sketch of this step appears after the Chi Square Test section below.)

https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalizationstandardization/

Tree-based algorithms, on the other hand, are fairly insensitive to feature scale: a decision tree splits a node on a single feature, choosing the split that most increases the homogeneity of the node, and that choice is not influenced by the other features.

Feature Selection

Completely or partially irrelevant features can negatively impact a model's performance and reduce its accuracy. Feature selection methods pick the best subset from the full feature set; the machine learning algorithms are then applied to this subset, and their performance is measured in terms of accuracy, recall, and precision.

Advantages of feature selection:
● Reduced overfitting: less redundant data means fewer decisions based on noise.
● Improved accuracy: data with less noise gives the model a better chance of predicting accurately.
● Reduced training time: fewer features reduce the algorithm's complexity, so it trains faster.

Chi Square Test

The chi-square test is used to select features from the ECG dataset. A chi-square statistic is computed between each feature and the target, and the features with the best chi-square values are selected; this measures the association between two categorical variables (the features are converted to integers). Because the chi-square test applies only to non-negative features, it was used on the extracted ECG features to select the 10 best, via scikit-learn's SelectKBest (as sketched below). The same machine learning algorithms applied before feature selection, namely KNN, SVM, Random Forest, and Naïve Bayes from the sklearn library, were then applied to these 10 features, and their accuracy, recall, and precision were calculated.

Video by Krish Naik: https://www.youtube.com/watch?v=EqLBAmtKMnQ
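To make the standardization step from the Standardization and Scaling section concrete, here is a minimal sketch using scikit-learn's StandardScaler. The feature matrix X, the labels y, and the train/test split are illustrative assumptions rather than the report's actual code; the real features come from the ECG extraction described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the extracted ECG features:
# 12 columns (6 time-domain + 6 frequency-domain), one row per 1-minute segment.
rng = np.random.default_rng(0)
X = rng.random((1000, 12))        # hypothetical feature matrix
y = rng.integers(0, 2, 1000)      # hypothetical apnea (1) / non-apnea (0) labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Z = (X - mu) / sigma, with mu and sigma estimated on the training set only,
# then reused on the test set so no test information leaks into the model.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set alone and reusing its statistics on the test set is the standard way to apply the Z formula without leaking test-set information.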
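Below is a corresponding sketch of the SelectKBest step with the chi-square score function. One caveat worth noting: sklearn's chi2 requires non-negative inputs, so this sketch rescales the features to [0, 1] with MinMaxScaler rather than feeding it the standardized (possibly negative) values. Variable names carry over from the sketch above and are assumptions, not the report's code.

```python
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# chi2 requires non-negative features, so rescale to [0, 1] first;
# standardized features can be negative and would raise an error.
pos_scaler = MinMaxScaler()
X_train_pos = pos_scaler.fit_transform(X_train)
X_test_pos = pos_scaler.transform(X_test)

# Score each feature against the target and keep the 10 best.
selector = SelectKBest(score_func=chi2, k=10)
X_train_sel = selector.fit_transform(X_train_pos, y_train)
X_test_sel = selector.transform(X_test_pos)

print(selector.get_support(indices=True))   # indices of the 10 selected features
print(selector.scores_)                     # chi-square score of every feature
```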
Models

KNN
Advantages: Simple and intuitive; adapts immediately to new data.
Disadvantages: Computational cost is high and grows as the training data grows; affected by outliers and imbalanced data.

SVM
Advantages: Efficient in high-dimensional spaces; relatively memory efficient; works well when we have little prior knowledge of the data.
Disadvantages: Requires long training times for large datasets; choosing the right kernel function is difficult.

Naïve Bayes
Advantages: Simple to implement; very fast, since the conditional probabilities can be computed directly; works well with large datasets.
Disadvantages: Assumes the features are independent.

Logistic Regression
Advantages: Easy to implement and interpret, and very efficient to train. It provides not only a measure of how relevant a predictor is (the coefficient size) but also its direction of association (positive or negative). It is very fast at classifying unknown records.
Disadvantages: Its major limitation is the assumption of linearity between the dependent variable and the independent variables.

Random Forest
Advantages: Resistant to overfitting (low bias and low variance); handles both continuous and categorical variables; requires no feature scaling; robust to outliers.
Disadvantages: Training takes a long time, since many decision trees are combined to determine the class; it is also hard to determine the significance of each individual variable.
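Finally, a minimal sketch of training four of these classifiers and reporting accuracy, recall, and precision on a held-out test set. The specific model choices (for example GaussianNB for Naïve Bayes and an RBF kernel for SVM) and the variable names reuse the illustrative assumptions from the sketches above.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, recall_score, precision_score

# Reuses X_train_scaled / X_test_scaled and y_train / y_test from the
# standardization sketch above (illustrative names, not the report's code).
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.3f}, "
          f"recall={recall_score(y_test, y_pred):.3f}, "
          f"precision={precision_score(y_test, y_pred):.3f}")
```

The same loop can be repeated with the chi-square-selected features (X_train_sel / X_test_sel) to compare performance before and after feature selection, as the report does.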