Hotspot/cluster detection methods (1)
• Spatial scan statistics: hypothesis testing
  – Input: data
  – Uses the continuous Poisson model
  – Null hypothesis H0: points are randomly distributed (complete spatial randomness, CSR)
  – Alternative hypothesis H1: points are clustered in zone Z
  – Enumerate all candidate zones and find the one that maximizes the likelihood ratio
    • L = p(data|H1) / p(data|H0)
  – Test statistical significance with Monte Carlo simulation
    • Generate 1000 random datasets under H0 and count how many of them yield an L at least as high as the observed one

Hotspot/cluster detection methods (2)
• DBSCAN: Density-Based Spatial Clustering of Applications with Noise
  – Input: data, radius, min_neighbors
  – For each data point P not yet assigned to a cluster:
    • If neighbors < min_neighbors, mark P as noise
    • Else:
      – Add P to a new cluster C
      – Expand C by examining each point P' in the current neighborhood of C
      – If P' is not in any cluster, add P' to C
      – If P' has at least min_neighbors neighbors, add the neighbors of P' to the neighborhood of C

SatScan result
• 1 cluster found, but it is not statistically significant

DBSCAN results: CSR
• DBSCAN output on CSR dataset: min neighbors=3, radius=4: 2 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]

DBSCAN results: CSR
• DBSCAN output on CSR dataset: min neighbors=3, radius=7: 6 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]

DBSCAN results: CSR
• DBSCAN output on CSR dataset: min neighbors=3, radius=10: 7 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]

Results from SatScan and DBSCAN
• A clustered dataset
[Figure: scatter plot, X and Y axes from 0 to 100]

SatScan results
[Figure]

DBSCAN result
• DBSCAN output on clustered dataset: min neighbors=3, radius=1: 5 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]

DBSCAN result
• DBSCAN output on clustered dataset: min neighbors=3, radius=4: 3 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]

DBSCAN result
• DBSCAN output on clustered dataset: min neighbors=3, radius=7: 6 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]

DBSCAN result
• DBSCAN output on clustered dataset: min neighbors=3, radius=10: 6 clusters found
[Figure: scatter plot, X and Y axes from 0 to 100]
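
The spatial scan procedure from the methods slide (enumerate zones, maximize the likelihood ratio, then test significance by Monte Carlo simulation) can be sketched as follows. This is a simplified illustration, not SatScan's implementation. It assumes a 100 x 100 study area (matching the plot axes), a small hypothetical set of circular zones that lie fully inside that area, and a Kulldorff-style Poisson likelihood ratio conditioned on the total number of points. The names `SIZE`, `ZONES`, `log_likelihood_ratio`, `max_llr`, and `monte_carlo_p_value` are made up for this example.

```python
import math
import random

SIZE = 100.0  # assumed side length of the square study area
# Hypothetical candidate zones: circles (cx, cy, r) fully inside the study area.
ZONES = [(cx, cy, r)
         for cx in range(10, 91, 10)
         for cy in range(10, 91, 10)
         for r in (5, 10)]


def log_likelihood_ratio(n, mu, total):
    """Log of the Poisson likelihood ratio for a zone with n observed points
    and mu expected points under H0, out of `total` points overall.
    Zones with no excess of points (n <= mu) score 0."""
    if n <= mu:
        return 0.0
    llr = n * math.log(n / mu)
    if n < total:
        llr += (total - n) * math.log((total - n) / (total - mu))
    return llr


def max_llr(points):
    """Largest log likelihood ratio over all candidate zones."""
    total = len(points)
    best = 0.0
    for cx, cy, r in ZONES:
        n = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= r)
        mu = total * math.pi * r * r / (SIZE * SIZE)  # expected count under CSR
        best = max(best, log_likelihood_ratio(n, mu, total))
    return best


def monte_carlo_p_value(points, n_sim=999, rng=None):
    """Compare the observed statistic with n_sim datasets simulated under CSR."""
    rng = rng or random.Random()
    observed = max_llr(points)
    at_least_as_high = 0
    for _ in range(n_sim):
        simulated = [(rng.uniform(0, SIZE), rng.uniform(0, SIZE))
                     for _ in points]
        if max_llr(simulated) >= observed:
            at_least_as_high += 1
    # Rank-based p-value: the observed dataset counts as one of the replicates.
    return observed, (1 + at_least_as_high) / (n_sim + 1)
```

A small p-value means that few CSR datasets produce a zone as extreme as the best observed one. A large p-value corresponds to the "1 cluster found, but not significant" outcome reported for SatScan.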
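
The DBSCAN steps listed on the methods slide can be sketched in plain Python as shown below. This is a minimal illustration under stated assumptions, not the code that produced the plots. It uses Euclidean distance, counts a point as its own neighbor, and treats a point as a core point when it has at least `min_neighbors` neighbors within `radius`. The helper names `region_query` and `dbscan` are made up for this example.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined


def region_query(points, i, radius):
    """Indices of all points within `radius` of points[i] (including i itself)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= radius]


def dbscan(points, radius, min_neighbors):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, radius)
        if len(neighbors) < min_neighbors:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # P starts a new cluster C
        labels[i] = cluster
        queue = list(neighbors)        # current neighborhood of C
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins C but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster        # P' was in no cluster, so add it to C
            j_neighbors = region_query(points, j, radius)
            if len(j_neighbors) >= min_neighbors:
                queue.extend(j_neighbors)  # P' is dense: grow C's neighborhood
    return labels
```

Counting `len(set(labels) - {NOISE})` for several values of `radius` on the same point set gives the kind of comparison shown in the result slides, where the number of clusters depends strongly on the chosen radius.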