Cluster/hotspot detection

Hotspot/cluster detection methods (1)
• Spatial Scan Statistics: Hypothesis testing
– Input: data
– Uses the continuous Poisson model
• Null hypothesis H0: points follow complete spatial randomness (CSR)
• Alternative hypothesis H1: points are clustered in zone Z
• Enumerate all candidate zones Z (e.g., circles of varying centre and radius) and find the one that maximizes the likelihood ratio
– L(Z) = p(data | H1, Z) / p(data | H0)
• Test statistical significance: Monte Carlo simulation
– Generate 1000 random datasets under H0 and count how many times their maximum L exceeds the observed maximum L (p-value = (count + 1) / (1000 + 1))
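A minimal sketch of the scan-and-test procedure above, in Python with NumPy. It assumes a square study area [0, side] x [0, side], circular zones centred on the data points with a hypothetical list of candidate `radii`, and estimates each circle's share of the area on a regular grid (`grid_size` is also hypothetical). For a zone with n points inside, E = N * |Z| / |A| points expected under H0, and N points overall, it uses the standard Poisson log-likelihood ratio log L(Z) = n log(n/E) + (N - n) log((N - n)/(N - E)) when n > E, and 0 otherwise. It illustrates the idea only; it is not SatScan's implementation, which supports other window shapes and size limits.

```python
import numpy as np


def poisson_llr(n_in, expected, total):
    """Poisson log-likelihood ratio log L(Z) for each zone.

    n_in: observed points inside each zone; expected: points expected there
    under H0 (CSR); total: number of points in the study area.
    Zones without an excess (n_in <= expected) get 0: we only look for hotspots.
    """
    n = np.asarray(n_in, dtype=float)
    e = np.asarray(expected, dtype=float)
    llr = np.zeros_like(n)
    hot = (n > e) & (e > 0)
    n_h, e_h = n[hot], e[hot]
    rest = total - n_h  # points outside the zone; may be 0
    # 0 * log(0) is treated as 0, so `rest` is clamped only inside the log.
    llr[hot] = n_h * np.log(n_h / e_h) + rest * np.log(np.maximum(rest, 1e-300) / (total - e_h))
    return llr


def max_llr(points, radii, grid):
    """Scan circles centred on every data point; return (best_llr, centre_index, radius)."""
    total = len(points)
    d_pts = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d_grid = np.linalg.norm(points[:, None, :] - grid[None, :, :], axis=2)
    best = (0.0, -1, 0.0)
    for r in radii:
        n_in = (d_pts <= r).sum(axis=1)
        # Expected count = total * (share of the study area inside the circle),
        # estimated on a grid so that circles near the border are clipped.
        expected = total * (d_grid <= r).mean(axis=1)
        llr = poisson_llr(n_in, expected, total)
        i = int(np.argmax(llr))
        if llr[i] > best[0]:
            best = (float(llr[i]), i, float(r))
    return best


def scan_test(points, radii, side=100.0, n_sim=999, grid_size=50, seed=0):
    """Find the most likely cluster, then estimate its p-value by Monte Carlo."""
    points = np.asarray(points, dtype=float)
    centres = (np.arange(grid_size) + 0.5) * side / grid_size
    gx, gy = np.meshgrid(centres, centres)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    observed = max_llr(points, radii, grid)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_sim):
        # Under H0 (CSR) the same number of points is spread uniformly over the area.
        sim = rng.uniform(0.0, side, size=points.shape)
        if max_llr(sim, radii, grid)[0] >= observed[0]:
            exceed += 1
    p_value = (exceed + 1) / (n_sim + 1)
    return observed, p_value
```

With the default `n_sim=999`, the observed data plus the simulations give 1000 values, so the smallest attainable p-value is 1/1000; fewer simulations (e.g. 19) give a coarser p-value but run faster.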
Hotspot/cluster detection methods (2)
• DBSCAN: Density-Based Spatial Clustering of Applications with Noise
– Input: data, radius, min_neighbors
– For each data point P not yet visited:
  • If neighbors(P), the number of points within radius of P, is below min_neighbors, mark P as noise
  • else
    – Add P to a new cluster C
    – Expand C by looking at points P’ in the current neighborhood of C
    – If P’ is not in any cluster (or is marked as noise), add P’ to C
    – If neighbors(P’) ≥ min_neighbors, add the neighbors of P’ to C’s neighborhood
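A compact Python/NumPy sketch of the pseudocode above. It assumes the neighborhood of P includes P itself (the same convention as scikit-learn's `min_samples`), labels noise as -1, and lets a point first marked as noise join a cluster later as a border point, as in standard DBSCAN. Distances are computed for all pairs, so it is only meant for small datasets.

```python
from collections import deque

import numpy as np

NOISE = -1
UNVISITED = -2


def dbscan(points, radius, min_neighbors):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # neighbors[i] holds the indices within `radius` of point i, including i itself
    neighbors = [np.flatnonzero(dist[i] <= radius) for i in range(n)]
    labels = np.full(n, UNVISITED)
    cluster = 0
    for p in range(n):
        if labels[p] != UNVISITED:
            continue
        if len(neighbors[p]) < min_neighbors:
            labels[p] = NOISE  # may later be relabeled as a border point
            continue
        labels[p] = cluster  # P is a core point: start a new cluster C
        frontier = deque(neighbors[p])  # current neighborhood of C
        while frontier:
            q = frontier.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins C but is not expanded
            if labels[q] != UNVISITED:
                continue
            labels[q] = cluster
            if len(neighbors[q]) >= min_neighbors:  # q is also a core point
                frontier.extend(neighbors[q])  # grow C's neighborhood
        cluster += 1
    return labels


if __name__ == "__main__":
    # Toy data (not from the slides): two tight groups and one isolated point.
    pts = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [50, 50]]
    print(dbscan(pts, radius=1.5, min_neighbors=3).tolist())  # [0, 0, 0, 0, 1, 1, 1, -1]
```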
SatScan Result
1 cluster found, but it is not statistically significant
DBSCAN results: CSR
[Figure: DBSCAN output on CSR dataset (min neighbors = 3, radius = 4); X and Y axes span 0 to 100; 2 clusters found]
DBSCAN results: CSR
[Figure: DBSCAN output on CSR dataset (min neighbors = 3, radius = 7); X and Y axes span 0 to 100; 6 clusters found]
DBSCAN results: CSR
[Figure: DBSCAN output on CSR dataset (min neighbors = 3, radius = 10); X and Y axes span 0 to 100; 7 clusters found]
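One way to see why DBSCAN reports clusters on purely random (CSR) data: with min_neighbors fixed, a larger radius makes it more likely that a random point has enough neighbors to be a core point. The sketch below computes that probability, assuming N points uniform in a 100 x 100 square, a neighbor count that includes the point itself, and no edge correction (the whole disc is assumed to lie inside the square). The dataset size `n = 100` in the usage block is hypothetical; the slides do not state it.

```python
import math


def core_point_probability(n_points, radius, min_neighbors, side=100.0):
    """P(a given point is a DBSCAN core point) when n_points follow CSR in a
    side x side square. Edge effects are ignored, and the neighbor count is
    assumed to include the point itself."""
    p = math.pi * radius ** 2 / side ** 2  # chance one other point lands in the disc
    others = n_points - 1
    need = min_neighbors - 1  # other points required besides P itself
    # K ~ Binomial(others, p); P is a core point iff K >= need
    p_fewer = sum(math.comb(others, k) * p ** k * (1 - p) ** (others - k) for k in range(need))
    return 1.0 - p_fewer


if __name__ == "__main__":
    n = 100  # hypothetical dataset size
    for r in (4, 7, 10):
        prob = core_point_probability(n, r, min_neighbors=3)
        print(f"radius={r:>2}: P(core)={prob:.3f}, expected core points ~ {n * prob:.1f}")
```

More core points means more seeds from which DBSCAN grows "clusters", even though the data contain no real structure; at much larger radii these groups would eventually merge.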
Results from SatScan and DBSCAN
[Figure: A clustered dataset; X and Y axes span 0 to 100]
SatScan results
DBSCAN result
[Figure: DBSCAN output on clustered dataset (min neighbors = 3, radius = 1); X and Y axes span 0 to 100; 5 clusters found]
DBSCAN result
[Figure: DBSCAN output on clustered dataset (min neighbors = 3, radius = 4); X and Y axes span 0 to 100; 3 clusters found]
DBSCAN result
[Figure: DBSCAN output on clustered dataset (min neighbors = 3, radius = 7); X and Y axes span 0 to 100; 6 clusters found]
DBSCAN result
[Figure: DBSCAN output on clustered dataset (min neighbors = 3, radius = 10); X and Y axes span 0 to 100; 6 clusters found]
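The clustered-dataset results show that the number of clusters DBSCAN reports depends strongly on the radius. A common starting point for choosing it, not used in these slides, is the sorted k-distance graph from the original DBSCAN paper: compute each point's distance to its k-th nearest neighbor, sort the values in decreasing order, and take the radius near the "elbow" of the curve. A minimal NumPy sketch, assuming k = min_neighbors - 1 because the neighbor count includes the point itself:

```python
import numpy as np


def k_distances(points, k):
    """Distance from every point to its k-th nearest other point, sorted from largest to smallest."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    dist.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return np.sort(dist[:, k])[::-1]


if __name__ == "__main__":
    # Toy data (not from the slides): three points on a line.
    print(k_distances([[0, 0], [1, 0], [2, 0]], k=2).tolist())  # [2.0, 2.0, 1.0]
```

Plotting the returned values against their rank gives the k-distance graph; the elbow is read off by eye, so it is a heuristic rather than a guarantee.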