
Lec 9 - Unsupervised Learning & Anomaly Detection (1)

Machine Learning – RIME 832
Unsupervised Learning
&
Anomaly Detection
Dr. Sara Ali
Supervised vs Unsupervised
• Supervised setting: training examples come with labels (x, y)
• Unsupervised setting: only the inputs x are given, with no labels
Introduction
• Goal: find structure/patterns in the data
• Clustering is the most common type of
unsupervised learning algorithm, although
many others exist
Clustering Applications
K-Means Clustering Algorithm
• The most common and popular clustering algorithm
• Example: cluster the data into 2 groups
[Figure: data points grouped around cluster centroid 1 and cluster centroid 2]
K-Means Clustering Algorithm
• After many iterations, the centroids and cluster assignments stop changing (convergence)
K-Means Algorithm
K-Means for non-separated clusters
• Example: clustering into 3 sizes (small, medium, and large), even when the data shows no clear separation
K-Means Optimization Objective
• The optimization objective is to minimize the distortion cost function
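Written out (this formula is not on the extracted slides, but it is the standard distortion cost for k-means, where c⁽ⁱ⁾ is the index of the cluster assigned to example x⁽ⁱ⁾ and μₖ is the k-th centroid):

```latex
J\bigl(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\bigr)
  = \frac{1}{m}\sum_{i=1}^{m}\bigl\lVert x^{(i)}-\mu_{c^{(i)}}\bigr\rVert^{2}
```

The assignment step minimizes J over the c⁽ⁱ⁾ with the centroids fixed, and the update step minimizes J over the μₖ with the assignments fixed, so J never increases from one iteration to the next.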
K-Means cluster centroid initialization
• Local optima are a problem with random
initialization: a bad initialization can leave the optimization objective stuck at a poor value
K-Means cluster centroid initialization
• Use multiple random initializations
• Pick the clustering with the lowest cost
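As a concrete illustration (not from the slides), here is a minimal NumPy sketch of k-means with multiple random initializations, keeping the run with the lowest distortion cost; the two-blob toy dataset is invented for the example:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=None):
    """One run of Lloyd's algorithm; returns (centroids, labels, cost)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct training examples as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignments and distortion cost J for the converged centroids.
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :],
                            axis=2).argmin(axis=1)
    cost = ((X - centroids[labels]) ** 2).sum(axis=1).mean()
    return centroids, labels, cost

def kmeans_restarts(X, k, n_restarts=10):
    """Run k-means several times; keep the clustering with the lowest cost."""
    runs = [kmeans(X, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda r: r[2])

# Toy data (assumed for the example): two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels, cost = kmeans_restarts(X, k=2)
```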
K-Means Algorithm
• Ideal for
– small number of clusters?
– large number of clusters?
Choosing number of clusters
• Most common is to visualize the data and then pick the
number of clusters manually
Number of Clusters?
Choosing number of clusters
• Elbow method: plot the cost J against the number of clusters K and pick the "elbow" of the curve; it does not always work, since the elbow can be ambiguous
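A sketch of the elbow method on invented toy data, using a bare-bones k-means to compute the cost J for K = 1 through 6:

```python
import numpy as np

def distortion(X, k, n_iters=50, seed=0):
    """Distortion J after running a basic k-means with k clusters."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        c = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                      else c[j] for j in range(k)])
    labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
    return ((X - c[labels]) ** 2).sum(axis=1).mean()

# Toy data (assumed): three well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (40, 2)) for m in (0, 4, 8)])
costs = [distortion(X, k) for k in range(1, 7)]
# J drops sharply up to K = 3 (the true number of blobs) and then
# flattens out; that bend in the curve is the "elbow".
```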
Dimensionality Reduction
• Another common type of unsupervised learning
algorithm
• Applications/motivation
– Data compression
– Visualization
Data Compression
• Data redundancy / highly correlated features
• With too many features it becomes intractable to
choose and retain only the best/most relevant ones
Data Visualization
• Let's suppose we have collected lots of
statistical data for many countries
• This gives an n-dimensional feature vector for each country
Data Visualization
• Can we reduce the dimensionality (e.g., down to 2-D) so that the data can be plotted?
Principal Component Analysis
• Minimize the squared projection error: the distance
from each point to its projection onto the lower-dimensional surface
Principal Component Analysis
[Figure: comparing projections onto the magenta line vs. the red line]
Principal Component Analysis Formulation
• Find a k-dimensional space onto which to project
the data so as to minimize the projection error
Principal Component Analysis Algorithm
• Mean-normalize (and optionally feature-scale) the data
• Compute the covariance matrix Sigma, an n x n matrix
• Compute its singular value decomposition [U, S, V] = svd(Sigma); U is also n x n
• Finally, pick the first k columns of U (U_reduce), i.e.,
an n x k dimensional matrix
Principal Component Analysis
• Reconstruction of compressed data
• Mapping the low-dimensional z back to the high-dimensional x:
z = U_reduceᵀ x        (compression)
x_approx = U_reduce z  (reconstruction)
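The full pipeline (mean-normalization, covariance matrix, SVD, projection, reconstruction) can be sketched in NumPy; the correlated 2-D toy data is an assumption made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 2-D points lying near a 1-D line (highly correlated features).
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t]) + rng.normal(scale=0.05, size=(100, 2))

# 1. Mean-normalize the data.
mu = X.mean(axis=0)
Xc = X - mu

# 2. Covariance matrix (n x n) and its singular value decomposition.
m = len(Xc)
Sigma = (Xc.T @ Xc) / m
U, S, Vt = np.linalg.svd(Sigma)

# 3. Project onto the first k columns of U:  z = U_reduce' x
k = 1
U_reduce = U[:, :k]          # n x k
Z = Xc @ U_reduce            # m x k compressed representation

# 4. Reconstruct:  x_approx = U_reduce z  (adding back the mean we removed)
X_approx = Z @ U_reduce.T + mu

reconstruction_error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
```

Because the two features are almost perfectly correlated, a single principal component captures nearly all of the variation, so the reconstruction error is tiny.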
Number of Principal Components?
• Choose k so that the ratio of the average squared projection error to the total variation in the data is small:

  [(1/m) Σᵢ ‖x⁽ⁱ⁾ − x_approx⁽ⁱ⁾‖²] / [(1/m) Σᵢ ‖x⁽ⁱ⁾‖²]

• A small ratio means most of the variance is retained
Number of Principal Components?
• Algorithm: try k = 1, 2, … and pick the smallest k for which the ratio
is less than 0.01 (or whatever threshold you choose)
Number of Principal Components?
• S is a diagonal matrix whose non-zero entries are the singular values
• Let s_ii denote the diagonal entries; then we
can use the following condition to select the smallest k satisfying

  (Σᵢ₌₁ᵏ s_ii) / (Σᵢ₌₁ⁿ s_ii) ≥ 0.99
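A small sketch of this selection rule; the singular values in the example are invented:

```python
import numpy as np

def choose_k(S, variance_retained=0.99):
    """Smallest k with sum_{i<=k} s_ii / sum_{i<=n} s_ii >= threshold.

    S: 1-D array of singular values of the covariance matrix,
    in descending order as returned by np.linalg.svd.
    """
    ratios = np.cumsum(S) / np.sum(S)
    # First index whose cumulative ratio reaches the threshold (1-based k).
    return int(np.searchsorted(ratios, variance_retained) + 1)

# Example: most of the variance sits in the first three components.
S = np.array([9.0, 5.0, 0.08, 0.05, 0.02])
k = choose_k(S)
```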
PCA Applications
• Supervised Learning Speedup
PCA Applications
• PCA should NOT be used to avoid overfitting
by reducing the number of features; use regularization instead
• Another caution: do not build PCA into a system by default; first try learning on the original data, and bring in PCA only if that proves too slow or memory-hungry
Anomaly Detection
Anomaly Detection Example
Gaussian Distribution
• If x is distributed as a Gaussian with mean μ and
variance σ², we write x ~ N(μ, σ²)
Parameter Estimation
Density Estimation
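These slides rely on the standard formulas for this model (not reproduced in the extracted text): the univariate Gaussian density, the maximum-likelihood parameter estimates per feature, and the density model as a product over features:

```latex
p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)},
\qquad
\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\bigl(x_j^{(i)}-\mu_j\bigr)^2
```

```latex
p(x) = \prod_{j=1}^{n} p\bigl(x_j;\,\mu_j,\,\sigma_j^2\bigr)
```

An example x is flagged as an anomaly when p(x) falls below a threshold ε.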
Anomaly Detection Algorithm
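A minimal sketch of the algorithm under the usual assumptions (independent Gaussian features; the training data and the threshold ε are invented for the example, and ε would normally be tuned on a labeled cross-validation set):

```python
import numpy as np

def fit_gaussians(X):
    """Per-feature maximum-likelihood estimates of mu_j and sigma_j^2."""
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    """p(x): product over features of univariate Gaussian densities."""
    p = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.prod(axis=1)

# Toy "normal operation" data with two features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[10.0, 20.0], scale=[1.0, 2.0], size=(500, 2))
mu, var = fit_gaussians(X_train)

eps = 1e-4  # threshold (an arbitrary choice for this sketch)
x_normal = np.array([[10.2, 19.5]])
x_outlier = np.array([[10.0, 40.0]])   # far outside the training distribution
is_anomaly_normal = density(x_normal, mu, var) < eps
is_anomaly_outlier = density(x_outlier, mu, var) < eps
```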
Anomaly Detection Example
Algorithm Evaluation
Aircraft Engine Example
Is Anomaly Detection Supervised Learning?
Example Applications
Non-Gaussian Features?
Multivariate Gaussian (Normal) Distribution
Multivariate Gaussian Examples
Multivariate Gaussian (Normal) Distribution
Anomaly Detection with the Multivariate Gaussian
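A sketch of anomaly detection with a full-covariance multivariate Gaussian; the correlated toy data is invented to show the case the model is designed for, where one feature tracks another and a point that violates the correlation should score a low density:

```python
import numpy as np

def fit_multivariate(X):
    """Maximum-likelihood estimates of the mean vector and full covariance."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / len(X)
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """p(x; mu, Sigma) for each row of X (n-dimensional Gaussian)."""
    n = len(mu)
    Xc = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), one value per row.
    quad = np.einsum('ij,jk,ik->i', Xc, inv, Xc)
    return norm * np.exp(-0.5 * quad)

rng = np.random.default_rng(0)
# Correlated features: x2 tracks x1 closely.
x1 = rng.normal(0, 1, 500)
x2 = 0.9 * x1 + rng.normal(0, 0.2, 500)
X_train = np.column_stack([x1, x2])
mu, Sigma = fit_multivariate(X_train)

on_diag = np.array([[2.0, 1.8]])    # unusual individually, fits the correlation
off_diag = np.array([[2.0, -1.8]])  # violates the correlation
p_on = multivariate_density(on_diag, mu, Sigma)
p_off = multivariate_density(off_diag, mu, Sigma)
```

A per-feature (diagonal) model would score both test points similarly, since each coordinate alone looks equally unusual; the full covariance matrix lets the model flag only the point that breaks the correlation.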
Relationship to Original Model
Original vs Multivariate Gaussian