DBSCAN: Core Point w.r.t. Radius Eps and MinPts
(figure: Eps = d, MinPts = 3)
- |NEps(p)| = 3, so p is a core point
- |NEps(q)| = 2, so q is not a core point

DBSCAN: Which are Core Points?
(figure: Eps = d, MinPts = 4)
- |NEps(a)| = 3, |NEps(b)| = 5, |NEps(c)| = 4
- b and c are core points, since their Eps-neighborhoods contain at least MinPts = 4 points; a is not

DBSCAN: Directly Density-Reachable (DDR)
(figure: Eps = d, MinPts = 3)
- p1 is a core point
- p2 is a core point
- p2 belongs to NEps(p1), so p2 is directly density-reachable from p1

DBSCAN: Density-Reachable (DR)
(figure: Eps = d, MinPts = 3; |NEps(p3)| = 4)
- p1 and p2 are DDR
- p2 and p3 are DDR
- So p1 and p3 are density-reachable, connected through the chain p1, p2, p3

DBSCAN: Density-Connected (DC)
(figure: Eps = d, MinPts = 4)
- p1 and o are DR
- o and p2 are DR
- So p1 and p2 are density-connected: both are density-reachable from the common point o

DBSCAN Algorithm
1. Randomly select a point o.
2. If o is a non-core point, label it as noise.
3. If o is a core point, create a new cluster Ci.
4. Retrieve all points that are density-reachable from o.
5. Add those points to Ci.
6. Repeat until no more core points are found.
(A minimal code sketch of these steps appears at the end of this section.)

The STING Clustering Method
- Each cell at a high level is partitioned into a number of smaller cells at the next lower level.
- Statistical information for each cell is calculated and stored beforehand and is used to answer queries.
- Parameters of higher-level cells can easily be calculated from the parameters of their lower-level cells:
  - count, mean, standard deviation, min, max
  - type of distribution: normal, uniform, etc.
- Use a top-down approach to answer spatial data queries:
  - Start from a pre-selected layer, typically one with a small number of cells.
  - For each cell in the current level, compute the confidence interval.
  - Remove the irrelevant cells from further consideration.
  - When finished examining the current layer, proceed to the next lower level.
  - Repeat this process until the bottom layer is reached.
(A sketch of aggregating child-cell statistics appears at the end of this section.)

Comments on STING
- Advantages:
  - Query-independent; easy to parallelize; supports incremental update
  - O(K) query cost, where K is the number of grid cells at the lowest level
- Disadvantages:
  - All cluster boundaries are either horizontal or vertical; no diagonal boundary is detected

WaveCluster: Clustering by Wavelet Analysis (1998)
- Sheikholeslami, Chatterjee, and Zhang (VLDB'98)
- A multi-resolution clustering approach that applies a wavelet transform to the feature space
- How the wavelet transform is used to find clusters:
  - Summarize the data by imposing a multidimensional grid structure onto the data space.
  - The multidimensional spatial data objects are represented in an n-dimensional feature space.
  - Apply the wavelet transform to the feature space to find the dense regions.
  - Apply the wavelet transform multiple times, which yields clusters at different scales, from fine to coarse.

Quantization & Transformation
- First quantize the data into an m-dimensional grid structure, then apply the wavelet transform.
  a) scale 1: high resolution
  b) scale 2: medium resolution
  c) scale 3: low resolution
(A sketch of the quantization step appears at the end of this section.)

The EM (Expectation-Maximization) Algorithm
- Initially, randomly assign k cluster centers.
- Iteratively refine the clusters with two steps:
  - Expectation step: assign each data point Xi to cluster Ci with probability
    P(Ci | Xi) = P(Ci) P(Xi | Ci) / P(Xi)
  - Maximization step: re-estimate the model parameters from these assignments.
(A minimal EM sketch appears at the end of this section.)

Self-Organizing Feature Map (SOM)
- Initialize the weights w_ij.
- Distance of the input to unit i:
    d_i = Σ_j (w_ij − x_j)²
- Example (figure: a two-input network with weights w_11, w_12, ...):
    input vector X = [1, 2], W1 = [3, 4]
    d_1 = (1 − 3)² + (2 − 4)² = 8
- Weight update for the winning unit:
    w_ij ← w_ij + η (x_j − w_ij)
(A small numeric sketch appears at the end of this section.)
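
Sketch: the DBSCAN algorithm. The following is a minimal illustration of the six steps listed above, not the original formulation: the names dbscan, region_query, eps and min_pts are illustrative choices, and every neighborhood query scans all points rather than using a spatial index.

```python
import math

def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i] (its Eps-neighborhood)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label every point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)            # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:          # non-core point: tentatively label as noise
            labels[i] = -1
            continue
        labels[i] = cluster_id                # core point: start a new cluster
        seeds = list(neighbors)
        while seeds:                          # collect every point density-reachable from i
            j = seeds.pop()
            if labels[j] == -1:               # border point that was marked noise earlier
                labels[j] = cluster_id
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=3))        # e.g. [0, 0, 0, 1, 1, 1, -1]
```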
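
Sketch: combining STING cell statistics. As an illustration of how a higher-level cell's parameters can be derived from its lower-level cells without rescanning the raw data, the sketch below merges count, mean, min and max from four child cells. The Cell structure and the merge function are assumptions made for this example; the standard deviation and the distribution type, which STING also stores, are omitted.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    count: int
    mean: float
    min: float
    max: float

def merge(children):
    """Combine child-cell statistics into the parent cell without touching the raw data."""
    n = sum(c.count for c in children)
    if n == 0:                                   # empty parent cell
        return Cell(0, 0.0, float("inf"), float("-inf"))
    return Cell(
        count=n,
        mean=sum(c.count * c.mean for c in children) / n,   # count-weighted mean
        min=min(c.min for c in children if c.count),
        max=max(c.max for c in children if c.count),
    )

children = [Cell(10, 2.0, 1.0, 3.0), Cell(5, 4.0, 3.5, 5.0),
            Cell(0, 0.0, float("inf"), float("-inf")), Cell(5, 6.0, 5.0, 7.0)]
print(merge(children))   # Cell(count=20, mean=3.5, min=1.0, max=7.0)
```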
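
Sketch: quantization and multi-scale views (WaveCluster). The sketch below assumes a 2-D feature space and an m x m grid of cell counts. Instead of the full wavelet transform that WaveCluster applies, only a Haar-style 2x2 averaging (the approximation sub-band) is shown, to illustrate how the medium- and low-resolution scales arise from the high-resolution grid.

```python
import numpy as np

def quantize(points, m, lo=0.0, hi=1.0):
    """Count how many points fall into each cell of an m x m grid over [lo, hi)^2."""
    grid = np.zeros((m, m))
    for x, y in points:
        i = int(np.clip((x - lo) / (hi - lo) * m, 0, m - 1))
        j = int(np.clip((y - lo) / (hi - lo) * m, 0, m - 1))
        grid[i, j] += 1
    return grid

def coarsen(grid):
    """Average each 2x2 block of cells: the next, lower-resolution scale of the density grid."""
    m = grid.shape[0] // 2
    return grid.reshape(m, 2, m, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.25, 0.03, (200, 2)),    # one dense region
                 rng.normal(0.75, 0.03, (200, 2))])   # a second dense region
g1 = quantize(pts, m=8)   # scale 1: high resolution
g2 = coarsen(g1)          # scale 2: medium resolution
g3 = coarsen(g2)          # scale 3: low resolution
print(g1.shape, g2.shape, g3.shape)   # (8, 8) (4, 4) (2, 2)
```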
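
Sketch: the EM algorithm. A minimal sketch of the expectation and maximization steps for a one-dimensional Gaussian mixture; the Gaussian model with fixed, equal variances and the variable names (mu, pi, resp) are illustrative assumptions, not part of the slides.

```python
import numpy as np

def em(x, k, iters=50, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)        # randomly assign k cluster centers
    pi = np.full(k, 1.0 / k)                         # mixing proportions P(Ci)
    for _ in range(iters):
        # E-step: responsibility P(Ci | Xn) proportional to P(Ci) * P(Xn | Ci)
        dens = np.exp(-0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate the model parameters from the soft assignments
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1, 300)])
print(em(x, k=2))   # the two estimated means come out close to 0 and 6
```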
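
Sketch: SOM distance and weight update. The following checks the SOM formulas with the numbers from the slide (X = [1, 2], W1 = [3, 4], giving d_1 = 8). The second unit's weights and the learning rate eta = 0.5 are assumptions, and the neighborhood update of a full SOM, which also moves units near the winner on the map, is omitted.

```python
import numpy as np

def winner(W, x):
    """d_i = sum over j of (w_ij - x_j)^2 for every unit i; return the closest unit."""
    d = ((W - x) ** 2).sum(axis=1)
    return int(d.argmin()), d

def update(W, x, i, eta=0.5):
    """Move the winning unit's weights toward the input: w_ij <- w_ij + eta * (x_j - w_ij)."""
    W = W.copy()
    W[i] += eta * (x - W[i])
    return W

x = np.array([1.0, 2.0])                 # input vector X = [1, 2] from the slide
W = np.array([[3.0, 4.0],                # W1 = [3, 4] from the slide
              [0.0, 0.0]])               # a second unit, assumed for illustration
i, d = winner(W, x)
print(d)                                 # [8. 5.]  (d_1 = (1-3)^2 + (2-4)^2 = 8)
print(update(W, x, i))                   # the winning unit moves toward x
```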