Name: Sujing Wang
Advisor: Dr. Christoph F. Eick
Data Mining & Machine Learning Group
Spatial data mining: the process of analyzing and discovering interesting and useful patterns, associations, or relationships from large spatial datasets.
Spatial object structures:
(<spatial attributes>;<non-spatial attributes>)
Spatial objects: point, trajectory (line), polygon (region)
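The (<spatial attributes>;<non-spatial attributes>) structure and the three object types can be sketched as a small Python data class. This is only an illustration, not the representation used in this work: the class name `SpatialObject`, the field names, the (longitude, latitude) coordinate order, and all values in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Coordinate = Tuple[float, float]  # assumed order: (longitude, latitude)

# Illustrative rule: minimum number of coordinates per object type.
MIN_COORDINATES = {"point": 1, "trajectory": 2, "polygon": 3}


@dataclass
class SpatialObject:
    """A spatial object: spatial attributes plus non-spatial attributes."""
    kind: str                          # "point", "trajectory", or "polygon"
    coordinates: List[Coordinate]      # spatial attributes
    attributes: Dict[str, Any] = field(default_factory=dict)  # non-spatial

    def __post_init__(self) -> None:
        if self.kind not in MIN_COORDINATES:
            raise ValueError(f"unknown spatial object type: {self.kind!r}")
        n = len(self.coordinates)
        if self.kind == "point" and n != 1:
            raise ValueError("a point needs exactly one coordinate")
        if n < MIN_COORDINATES[self.kind]:
            raise ValueError(f"a {self.kind} needs at least "
                             f"{MIN_COORDINATES[self.kind]} coordinates")


# Hypothetical usage: a point and a triangular polygon (made-up values).
station = SpatialObject("point", [(-95.4, 29.8)], {"temperature_f": 80.0})
region = SpatialObject(
    "polygon",
    [(-95.6, 29.6), (-95.2, 29.6), (-95.4, 30.0)],
    {"mean_wind_mph": 5.0},
)
```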
Challenges:
Complexity of spatial data types
Spatial relationships
Spatial autocorrelation
Motivation:
Polygons, especially overlapping polygons, are very important for mining spatial datasets.
Traditional clustering algorithms do not work for spatial polygons.
Research goal:
Develop new distance functions and new spatial clustering algorithms for polygon clustering.
Implement novel post-clustering techniques with plug-in reward functions to capture domain experts' notion of interestingness.
[Figure: framework overview. Geospatial datasets → DCONTOUR → spatial clusters → Poly_SNN → meta-clusters → post-processing → summaries and interesting patterns. Domain experts supply a notion of interestingness, expressed as reward functions, to the post-processing step.]
1. Domain-Driven Final Clustering Generation Methodology
Inputs:
A meta-clustering $M = \{X_1, \ldots, X_k\}$; at most one object will be selected from each meta-cluster $X_i$ ($i = 1, \ldots, k$).
An individual cluster reward function $\mathit{Reward}_U$, provided by the user, whose values are in $[0, \infty)$.
A reward threshold $\theta_U$; clusters with low rewards are not included in the final clustering.
A cluster distance threshold $\theta_d$, which expresses to what extent the user would like to tolerate cluster overlap.
A cluster distance function $\mathit{dist}$.
Find $Z \subseteq X_1 \cup \ldots \cup X_k$ that maximizes:
$$q(Z) = \sum_{c \in Z} \mathit{Reward}_U(c)$$
subject to:
$$\forall x \in Z \; \forall x' \in Z \; \bigl(x \neq x' \Rightarrow \mathit{dist}(x, x') > \theta_d\bigr)$$
$$\forall x \in Z \; \bigl(\mathit{Reward}_U(x) > \theta_U\bigr)$$
$$\forall x \in Z \; \forall x' \in Z \; \bigl((x \in X_i \wedge x' \in X_k \wedge x \neq x') \Rightarrow i \neq k\bigr)$$
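To make the three constraints concrete, the following is a minimal sketch of a greedy heuristic for this selection problem. It is an assumption for illustration, not necessarily the algorithm used in this work, and it does not guarantee a maximal $q(Z)$. The function name `select_final_clustering`, the tuple layout of a cluster, and all numbers in the usage lines are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")  # type of a single cluster


def select_final_clustering(
    meta_clusters: Sequence[Sequence[C]],
    reward: Callable[[C], float],
    dist: Callable[[C, C], float],
    reward_threshold: float,
    distance_threshold: float,
) -> List[C]:
    """Greedily pick clusters in order of decreasing reward.

    Every pick satisfies the three constraints: reward above the reward
    threshold, distance to every earlier pick above the distance
    threshold, and at most one pick per meta-cluster.
    """
    candidates = []
    for index, meta in enumerate(meta_clusters):
        for cluster in meta:
            r = reward(cluster)
            if r > reward_threshold:          # reward constraint
                candidates.append((r, index, cluster))
    candidates.sort(key=lambda item: item[0], reverse=True)

    selected: List[C] = []
    used_meta_clusters = set()
    for _, index, cluster in candidates:
        if index in used_meta_clusters:       # one per meta-cluster
            continue
        if all(dist(cluster, z) > distance_threshold for z in selected):
            selected.append(cluster)          # distance constraint holds
            used_meta_clusters.add(index)
    return selected


# Hypothetical usage: a cluster is (name, center, area); the reward is the
# area and the distance is the gap between centers (made-up values).
meta = [
    [("a1", 0.0, 5.0), ("a2", 0.2, 3.0)],
    [("b1", 0.1, 4.0), ("b2", 2.0, 2.0)],
    [("c1", 5.0, 0.5)],
]
final = select_final_clustering(
    meta,
    reward=lambda c: c[2],
    dist=lambda a, b: abs(a[1] - b[1]),
    reward_threshold=1.0,
    distance_threshold=0.5,
)
total_reward = sum(c[2] for c in final)
```

In this usage, "b1" is skipped because it lies too close to the already selected "a1", and "c1" is dropped by the reward threshold, so the greedy pass ends up with "a1" and "b2".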
2. Finding interesting clusters with respect to a continuous non-spatial variable $v$:
Let
$X_i \in 2^A$ be a cluster in the A-space,
$\sigma^2$ be the variance of $v$ in dataset $D$,
$\sigma^2(X_i)$ be the variance of variable $v$ in a cluster $X_i$,
$mv(X_i)$ be the mean value of variable $v$ in a cluster $X_i$,
$t_1 \geq 0$ be a mean value reward threshold, and $t_2 \geq 1$ be a variance reward threshold.
Interestingness function $\varphi$ for each cluster:
$$\varphi(X_i) = \max\bigl(0, |mv(X_i)| - t_1\bigr) \times \max\bigl(0, \sigma^2 - (\sigma^2(X_i) \times t_2)\bigr)$$
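The interestingness function can be written directly in Python. This is a minimal sketch under two assumptions the slides do not state: the population variance is used for the cluster variance, and the dataset variance is passed in precomputed. The function name `interestingness` and all numbers in the usage lines are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence


def interestingness(
    cluster_values: Sequence[float],
    dataset_variance: float,
    t1: float,
    t2: float,
) -> float:
    """Reward clusters whose mean of v is far from zero and whose
    variance of v is small compared with the whole dataset."""
    if t1 < 0:
        raise ValueError("t1 must be >= 0")
    if t2 < 1:
        raise ValueError("t2 must be >= 1")
    # Mean term: positive only if |mean| exceeds the threshold t1.
    mean_term = max(0.0, abs(fmean(cluster_values)) - t1)
    # Variance term: positive only if t2 times the cluster variance
    # stays below the dataset variance (population variance assumed).
    variance_term = max(0.0, dataset_variance - pvariance(cluster_values) * t2)
    return mean_term * variance_term


# Hypothetical usage with made-up values of v.
tight_high = interestingness([1.0, 1.2, 0.8], dataset_variance=1.0, t1=0.5, t2=2.0)
near_zero = interestingness([-0.1, 0.1], dataset_variance=1.0, t1=0.5, t2=2.0)
```

Because the two terms are multiplied, a cluster receives a positive reward only when both conditions hold; `near_zero` is 0 since its mean does not exceed $t_1$.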
1. Meta-clusters generated from multiple spatial datasets:
[Figure: map of the meta-clusters. Horizontal axis: Longitude, -95.8 to -94.8; vertical axis: 29.0 to 30.4.]
2. Final Clusters with area of polygons as plug-in reward function
[Figure: map of the final clusters, labeled by polygon ID (13, 21, 80, 125, 150). Horizontal axis: Longitude, -95.8 to -94.8; vertical axis: 29.0 to 30.4.]

Polygon ID   Temperature (°F)   Solar Radiation (Langleys per minute)   Wind Speed (miles per hour)   Time of Day
13           79.0               N/A                                     4.50                          6 p.m.
21           86.35              1.33                                    6.10                          1 p.m.
80           89.10              1.17                                    6.20                          2 p.m.
125          84.10              0.13                                    4.90                          2 p.m.
150          88.87              1.10                                    5.39                          12 p.m.
3. Finding interesting meta-clusters with respect to solar radiation:
[Figure: map of the interesting meta-clusters, labeled by cluster ID (5, 15, 21). Horizontal axis: Longitude, -95.8 to -94.8; vertical axis: 29.0 to 30.2.]

Cluster ID   Mean      Variance   Number of Polygons
5            -0.9144   0.1981     5
15           1.1218    0.1334     5
21           1.0184    0.0350     3
Conclusions:
Our framework can effectively cluster overlapping spatial polygons that are similar in size, shape, and location.
Our post-clustering techniques with different plug-in reward functions can guide the knowledge extraction of interesting patterns and generate summaries from large spatial datasets.
Future Work:
Develop novel spatial-temporal clustering techniques and embed them in our framework.
Investigate novel change analysis techniques to identify spatial and temporal changes in spatial data.
Evaluate our framework in challenging case studies.
S. Wang, C.S. Chen, V. Rinsurongkawong, F. Akdag, C.F. Eick, “Polygon-based Methodology for Mining Related Spatial Datasets”, ACM SIGSPATIAL GIS Workshop on Data Mining for Geoinformatics (DMG), in conjunction with ACM SIGSPATIAL GIS 2010, San Jose, CA, Nov. 2010.
NSF Travel Award for ACM GIS 2010
S. Wang, C. Eick, Q. Xu, “A Space-Time Analysis Framework for Mining Geospatial Datasets”, CyberGIS’12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.
NSF Travel Award for CyberGIS 2012
C. Eick, G. Forestier, S. Wang, Z. Cao, S. Goyal, “A Methodology for Finding Uniform Regions in Spatial Data”, CyberGIS’12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.
S. Wang, C.F. Eick, “A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets”, Geoinformatica (under review).