Machine Learning – RIME 832
Unsupervised Learning & Anomaly Detection
Dr. Sara Ali

Supervised vs Unsupervised
• Supervised setting: training examples come with labels (x, y)
• Unsupervised setting: only unlabeled examples x are given; the algorithm must find structure on its own

Introduction
• Goal: find structure/patterns in the data
• Clustering is the most common type of unsupervised learning algorithm, although many others exist

Clustering Applications
• e.g. market segmentation, social network analysis, organizing compute clusters, astronomical data analysis

K-Means Clustering Algorithm
• The most common and popular clustering algorithm
• Example: cluster the data into 2 groups, one around cluster centroid 1 and one around cluster centroid 2
• Iterate two steps: assign each example to its closest centroid, then move each centroid to the mean of the examples assigned to it
• After many iterations the centroids stop moving and the clusters stabilize
• A runnable sketch of these two steps follows this outline

K-Means Algorithm
• Input: number of clusters K and training set {x^(1), …, x^(m)}
• Randomly initialize K cluster centroids μ_1, …, μ_K
• Repeat until convergence:
  – Cluster assignment: for i = 1, …, m, set c^(i) := index of the centroid closest to x^(i)
  – Move centroids: for k = 1, …, K, set μ_k := mean of the examples currently assigned to cluster k

K-Means for non-separated clusters
• K-means is still useful when the clusters are not well separated, e.g. partitioning a population into 3 sizes: small, medium, and large

K-Means Optimization Objective
• The optimization objective is to minimize the cost (distortion) function
  J(c^(1), …, c^(m), μ_1, …, μ_K) = (1/m) ∑_i ||x^(i) − μ_c^(i)||²
• The cluster-assignment step minimizes J over the c^(i); the move-centroid step minimizes J over the μ_k

K-Means cluster centroid initialization
• Initialize the centroids by picking K distinct training examples at random
• Random initialization can converge to local optima, which affects the optimization objective
• Remedy: multiple random initializations; pick the clustering with the lowest cost
• Multiple initializations are ideal for a small number of clusters (roughly K = 2–10); for a large number of clusters the first run is usually already reasonable

Choosing number of clusters
• Most common: visualize the data and then pick the number of clusters by hand; the right number is often genuinely ambiguous
• Elbow method: plot the cost J against K and pick the "elbow" of the curve; it does not always work, since often there is no clear elbow
• Alternatively, choose K based on how the clusters will be used downstream (e.g. how many sizes to manufacture)

Dimensionality Reduction
• Another common type of unsupervised algorithm
• Applications/motivation:
  – Data compression
  – Visualization

Data Compression
• Redundant/highly correlated features can be merged into fewer features
• With too many features it becomes intractable to hand-pick and retain only the best/most relevant ones

Data Visualization
• Let's suppose we have collected lots of statistical data for many countries: an n-dimensional feature vector for each country
• Can we reduce the dimensions (e.g. to 2) so the data can be plotted and visualized?

Principal Component Analysis
• Find a lower-dimensional surface to project the data onto so as to minimize the squared projection error (the distance from each point to its projection)
• Magenta vs red line: of two candidate directions, the better one is the one with the smaller projection error
• Note that PCA is not linear regression: regression minimizes vertical errors to predict y, while PCA minimizes orthogonal projection errors and treats all features symmetrically

Principal Component Analysis Formulation
• Find k vectors u^(1), …, u^(k) defining a k-dimensional subspace to project the data onto so as to minimize the projection error

Principal Component Analysis Algorithm
• Preprocess: mean normalization and, if needed, feature scaling
• Compute the covariance matrix Sigma = (1/m) ∑_i x^(i) x^(i)ᵀ, an n × n matrix
• Compute its singular value decomposition [U, S, V] = svd(Sigma); U is also n × n
• Finally, pick the first k columns of U, i.e. an n × k matrix U_reduce
• Project each example: z = U_reduceᵀ x
• A NumPy sketch of these steps follows this outline

Principal Component Analysis
• Reconstruction of compressed data: going from the low-dimensional representation back to (an approximation of) the high-dimensional data:
  z = U_reduceᵀ x,  x_approx = U_reduce z ≈ x

Number of Principal Components?
• Choose k so that
  (average squared projection error) / (total variation in the data)
  = [(1/m) ∑_i ||x^(i) − x_approx^(i)||²] / [(1/m) ∑_i ||x^(i)||²] ≤ 0.01
  i.e. "99% of the variance is retained"
• Algorithm: try different values k = 1, 2, … until the ratio is less than 0.01, or whatever threshold you choose
• More efficiently, reuse S from the SVD: S is diagonal with non-zero diagonal entries s_ii, and the condition above is equivalent to
  1 − (∑_{i=1}^k s_ii) / (∑_{i=1}^n s_ii) ≤ 0.01

PCA Applications
• Supervised learning speedup: compress the inputs with PCA (mapping learned on the training set only), then train on the lower-dimensional features
• Caution: PCA should NOT be used to avoid overfitting by reducing the number of features; PCA discards information without looking at the labels, so use regularization instead
• Another caution: do not build PCA into a system by default; first try the original/raw data, and bring in PCA only if it is actually needed

Anomaly Detection
• Given a training set of normal examples, build a model p(x) of what normal data looks like; flag a new example x_test as anomalous if p(x_test) < ε
• Example: monitoring features of aircraft engines (e.g. heat generated, vibration intensity) coming off a production line

Gaussian Distribution
• If x is distributed Gaussian (normal) with mean μ and variance σ², written x ~ N(μ, σ²), then
  p(x; μ, σ²) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))

Parameter Estimation
• Given samples x^(1), …, x^(m):  μ = (1/m) ∑_i x^(i),  σ² = (1/m) ∑_i (x^(i) − μ)²

Density Estimation
• Model each feature with its own Gaussian and treat the features as (approximately) independent:
  p(x) = ∏_{j=1}^n p(x_j; μ_j, σ_j²)

Anomaly Detection Algorithm
• Choose features x_j that might be indicative of anomalous examples
• Fit the parameters μ_1, …, μ_n and σ_1², …, σ_n² on the (normal) training data
• Given a new example x, compute p(x) and flag an anomaly if p(x) < ε
• A NumPy sketch of this algorithm follows this outline

Algorithm Evaluation
• Train p(x) on normal examples only; evaluate on cross-validation and test sets that contain a few labeled anomalies
• Because the classes are highly skewed, use precision/recall or the F1 score rather than accuracy, and pick the threshold ε on the cross-validation set

Aircraft Engine Example
• Split the good engines across training, cross-validation, and test sets; put the known flawed engines into the cross-validation and test sets only

Is Anomaly Detection Supervised Learning?
• Prefer anomaly detection when positive (anomalous) examples are very rare and future anomalies may look unlike anything seen so far
• Prefer supervised learning when there are enough positive examples for the algorithm to learn what they look like

Example Applications
• Fraud detection, manufacturing, monitoring machines in a data center

Non-Gaussian Features?
• If a feature's histogram does not look Gaussian, transform it (e.g. log(x), log(x + c), √x, x^(1/3)) so that it does before fitting
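The sketch below is a minimal NumPy rendering of the two K-means steps above (cluster assignment and move centroids), together with the distortion cost J used to compare runs. The names kmeans, n_clusters, and n_iters are illustrative choices, not from the slides.

```python
import numpy as np

def kmeans(X, n_clusters, n_iters=100, seed=0):
    """One random initialization of K-means on data X (m x n)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as K distinct training examples picked at random.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: index of the closest centroid per example.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move-centroid step: mean of the examples assigned to each centroid
        # (keeping the old centroid if a cluster ends up empty).
        centroids = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                              else centroids[k] for k in range(n_clusters)])
    # Distortion J = average squared distance to the assigned centroid.
    cost = (np.linalg.norm(X - centroids[labels], axis=1) ** 2).mean()
    return labels, centroids, cost
```

To apply the multiple-random-initialization remedy, call kmeans with several different seeds and keep the result with the lowest returned cost.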
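Next, a minimal NumPy sketch of the PCA pipeline: covariance matrix, SVD, picking U_reduce, projection, and reconstruction, plus the variance-retained shortcut via the diagonal of S. The helper names (pca_fit, pca_project, pca_reconstruct) are mine; the formulas follow the algorithm above.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X (m x n); return the mean, U_reduce (n x k), and the
    fraction of variance retained by the first k components."""
    mu = X.mean(axis=0)                    # mean normalization
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / len(X)           # n x n covariance matrix
    U, S, _ = np.linalg.svd(Sigma)         # U is n x n; S holds the s_ii
    U_reduce = U[:, :k]                    # first k columns of U
    retained = S[:k].sum() / S.sum()       # sum_{i<=k} s_ii / sum_{i<=n} s_ii
    return mu, U_reduce, retained

def pca_project(X, mu, U_reduce):
    return (X - mu) @ U_reduce             # z = U_reduce' x   (m x k)

def pca_reconstruct(Z, mu, U_reduce):
    return Z @ U_reduce.T + mu             # x_approx = U_reduce z, un-centered
```

Choosing the smallest k with retained ≥ 0.99 is the same as requiring the ≤ 0.01 ratio condition above.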
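Finally, a minimal sketch of the per-feature Gaussian anomaly detector: parameter estimation, density estimation, and the p(x) < ε test. Working in log space is an addition of this sketch (not on the slides) to avoid numerical underflow when n is large; the threshold epsilon would be tuned on the cross-validation set as described under Algorithm Evaluation.

```python
import numpy as np

def fit_gaussians(X):
    """Parameter estimation: one mean and one variance per feature of X (m x n)."""
    return X.mean(axis=0), X.var(axis=0)

def log_density(X, mu, var):
    # log p(x) = sum_j log N(x_j; mu_j, var_j); summing logs instead of
    # multiplying densities avoids underflow for large n.
    return (-0.5 * np.log(2 * np.pi * var)
            - (X - mu) ** 2 / (2 * var)).sum(axis=1)

def is_anomaly(X, mu, var, epsilon):
    # Flag examples with p(x) < epsilon as anomalies.
    return log_density(X, mu, var) < np.log(epsilon)
```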
Multivariate Gaussian (Normal) Distribution
• For x ∈ ℝⁿ, with mean μ ∈ ℝⁿ and covariance matrix Σ ∈ ℝⁿˣⁿ:
  p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
• Unlike the per-feature model, this models p(x) in one go rather than as a product of one-dimensional densities

Multivariate Gaussian Examples
• Shrinking or growing the diagonal entries of Σ narrows or widens the distribution along the corresponding axes; non-zero off-diagonal entries tilt the contours, capturing positive or negative correlation between features; changing μ shifts the center

Anomaly Detection with the multivariate Gaussian
• Fit the parameters on the training data:
  μ = (1/m) ∑_i x^(i),  Σ = (1/m) ∑_i (x^(i) − μ)(x^(i) − μ)ᵀ
• Given a new example x, compute p(x; μ, Σ) and flag an anomaly if p(x) < ε (a sketch follows below)

Relationship to Original Model
• The original per-feature model is exactly the multivariate Gaussian with a diagonal Σ, i.e. with axis-aligned contours

Original vs Multivariate Gaussian
• Original model: computationally cheap, scales to very large n, and works even with a small training set, but correlations between features must be captured manually by engineering extra features
• Multivariate model: automatically captures correlations between features, but is more expensive and needs m > n (a common rule of thumb is m ≥ 10n) so that Σ is invertible
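A matching sketch for the multivariate model, assuming m > n so that Σ is invertible. scipy.stats.multivariate_normal would do the same job; plain NumPy is used here to mirror the formulas above.

```python
import numpy as np

def fit_multivariate(X):
    """Fit mu (n,) and the n x n covariance Sigma on X (m x n)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / len(X)   # requires m > n for Sigma to be invertible
    return mu, Sigma

def multivariate_log_density(X, mu, Sigma):
    # log p(x) = -0.5 * [n log(2 pi) + log|Sigma| + (x-mu)' Sigma^-1 (x-mu)]
    n = Sigma.shape[0]
    Xc = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    maha = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(Sigma), Xc)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)

def is_anomaly(X, mu, Sigma, epsilon):
    return multivariate_log_density(X, mu, Sigma) < np.log(epsilon)
```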