Spatial Clustering

Name: Sujing Wang

Advisor: Dr. Christoph F. Eick

Data Mining & Machine Learning Group

Outline

1. Introduction

2. Framework Architecture

3. Methodology

4. Case Study

5. Conclusion and Future Work


Introduction



Spatial Data Mining (SDM): the process of analyzing and discovering interesting and useful patterns, associations, or relationships from large spatial datasets.

 Spatial object structures:

(<spatial attributes>;<non-spatial attributes>)



Introduction

 Spatial objects: points, trajectories (lines), and polygons (regions)
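The (<spatial attributes>; <non-spatial attributes>) structure and the three object kinds above can be pictured as a single record type. The sketch below is a hypothetical Python rendering: the class name `SpatialObject`, its fields, the (longitude, latitude) coordinate order, and the minimum-vertex rules are all assumptions for illustration, not part of the framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), an assumed convention

# Minimum number of vertices per object kind (an assumption for this sketch).
MIN_VERTICES = {"point": 1, "trajectory": 2, "polygon": 3}


@dataclass
class SpatialObject:
    """(<spatial attributes>; <non-spatial attributes>) as one record."""
    kind: str                # "point", "trajectory", or "polygon"
    geometry: List[Point]    # ordered vertices; a point has exactly one
    attributes: Dict[str, float] = field(default_factory=dict)  # non-spatial part

    def __post_init__(self) -> None:
        # Reject unknown kinds and geometries with too few vertices.
        if self.kind not in MIN_VERTICES:
            raise ValueError(f"unknown spatial object kind: {self.kind!r}")
        if len(self.geometry) < MIN_VERTICES[self.kind]:
            raise ValueError(f"a {self.kind} needs at least {MIN_VERTICES[self.kind]} vertices")
        if self.kind == "point" and len(self.geometry) != 1:
            raise ValueError("a point has exactly one coordinate pair")


# Hypothetical example: a triangular region with one non-spatial attribute.
region = SpatialObject(
    kind="polygon",
    geometry=[(-95.4, 29.7), (-95.3, 29.7), (-95.3, 29.8)],
    attributes={"temperature_f": 86.0},
)
```

Keeping the geometry and the attribute dictionary in one record mirrors the slide's pairing of spatial and non-spatial parts; a real implementation would more likely use a geometry library.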


Introduction

 Challenges:

 Complexity of spatial data types

 Spatial relationships

 Spatial autocorrelation

 Motivation:

 Polygons, especially overlapping polygons, are very important for mining spatial datasets.

 Traditional clustering algorithms are not designed to cluster spatial polygons.

 Research goals:

Develop new distance functions and new spatial clustering algorithms for clustering polygons.

Implement novel post-clustering techniques with plug-in reward functions that capture domain experts' notion of interestingness.
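The slides do not define the new polygon distance functions. As a hypothetical illustration of what a polygon distance can look like, the sketch below computes the Hausdorff distance between the vertex sets of two polygons. It ignores points on edges and in interiors, so it only approximates the Hausdorff distance between the regions, and it is not claimed to be the framework's distance function; the function names are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a vertex of `a` to its nearest vertex of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance between the vertex sets of two polygons."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 2.0, y) for x, y in square]  # same square moved 2 units right
```

Here `hausdorff_distance(square, shifted)` is 2.0: every vertex of one square is at most 2 units from its nearest vertex in the other, and the left vertices of the first square are exactly 2 units away.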


A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets

[Figure: framework architecture. Geospatial datasets are clustered by DCONTOUR into spatial clusters (polygons); Poly_SNN groups these into meta clusters; post-processing, guided by reward functions that encode domain experts' notion of interestingness, produces summaries and interesting patterns.]
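The slides name Poly_SNN but do not spell out its steps. Assuming, as the name suggests, that it is a shared-nearest-neighbor (SNN) clustering applied to a polygon distance matrix, the sketch below shows only the generic SNN similarity step, in the variant that counts shared neighbors only for pairs that are in each other's k-nearest-neighbor lists. The function names, the choice of k, and the toy distance matrix are hypothetical.

```python
from typing import List, Set


def k_nearest(dist: List[List[float]], i: int, k: int) -> Set[int]:
    """Indices of the k objects closest to object i, excluding i itself."""
    others = [j for j in range(len(dist)) if j != i]
    others.sort(key=lambda j: (dist[i][j], j))  # ties broken by index
    return set(others[:k])


def snn_similarity(dist: List[List[float]], k: int) -> List[List[int]]:
    """SNN similarity: number of shared k-nearest neighbors, counted only when
    i and j appear in each other's k-nearest-neighbor lists (0 otherwise)."""
    n = len(dist)
    knn = [k_nearest(dist, i, k) for i in range(n)]
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in knn[i] and i in knn[j]:
                sim[i][j] = sim[j][i] = len(knn[i] & knn[j])
    return sim


# Hypothetical distance matrix for six objects: two groups on a line.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(p - q) for q in positions] for p in positions]
sim = snn_similarity(dist, k=2)
```

Objects in the same group get positive similarity while cross-group pairs get 0, which is the property an SNN clustering step would then use to form groups (here, meta clusters).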

Methodology

1. Domain-Driven Final Clustering Generation Methodology

Inputs:

A meta-clustering $M = \{X_1, \ldots, X_k\}$; at most one object will be selected from each meta-cluster $X_i$ ($i = 1, \ldots, k$).

The user provides an individual cluster reward function $\mathrm{Reward}_U$ whose values are in $[0, \infty)$.

A reward threshold $\theta_U$; clusters whose reward does not exceed it are not included in the final clustering.

A cluster distance threshold $\theta_d$, which expresses to what extent the user is willing to tolerate cluster overlap.

A cluster distance function $\mathrm{dist}$.

Find $Z \subseteq X_1 \cup \ldots \cup X_k$ that maximizes

$$q(Z) = \sum_{c \in Z} \mathrm{Reward}_U(c)$$

subject to:

$$\forall x \in Z\ \forall x' \in Z\ \big(x \neq x' \Rightarrow \mathrm{dist}(x, x') > \theta_d\big)$$

$$\forall x \in Z\ \big(\mathrm{Reward}_U(x) > \theta_U\big)$$

$$\forall x \in Z\ \forall x' \in Z\ \big((x \in X_i \wedge x' \in X_j \wedge x \neq x') \Rightarrow i \neq j\big)$$
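The slide states the optimization problem but not how it is solved. One simple way to approximate it, shown here as a hypothetical greedy heuristic rather than the framework's actual procedure, is to discard clusters whose reward does not exceed the reward threshold θ_U, sort the rest by reward, and accept a cluster only if its meta-cluster is still unused and it is farther than the distance threshold θ_d from every accepted cluster. The function name and the toy clusters (center, reward) are assumptions.

```python
from typing import Callable, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def greedy_final_clustering(
    meta_clusters: Sequence[Sequence[T]],
    reward: Callable[[T], float],
    dist: Callable[[T, T], float],
    theta_u: float,
    theta_d: float,
) -> Tuple[List[T], float]:
    """Greedy heuristic for: maximize q(Z) = sum of rewards, subject to
    reward(x) > theta_u, dist(x, x') > theta_d for distinct x, x' in Z,
    and at most one cluster per meta-cluster. Not guaranteed optimal."""
    # Candidates passing the reward threshold, tagged with their meta-cluster index.
    candidates = [
        (reward(x), i, x)
        for i, meta in enumerate(meta_clusters)
        for x in meta
        if reward(x) > theta_u
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)  # highest reward first

    selected: List[T] = []
    used_meta: Set[int] = set()
    total = 0.0
    for r, i, x in candidates:
        if i in used_meta:
            continue  # already took a cluster from meta-cluster X_i
        if all(dist(x, y) > theta_d for y in selected):
            selected.append(x)
            used_meta.add(i)
            total += r
    return selected, total


# Hypothetical clusters as (center, reward) pairs on a line.
meta = [[(0.0, 5.0), (0.5, 3.0)], [(0.2, 4.0), (10.0, 2.0)], [(20.0, 0.5)]]
Z, q = greedy_final_clustering(
    meta,
    reward=lambda c: c[1],
    dist=lambda a, b: abs(a[0] - b[0]),
    theta_u=1.0,
    theta_d=1.0,
)
```

Here the greedy pass picks (0.0, 5.0) and (10.0, 2.0) for q(Z) = 7.0; (0.2, 4.0) is rejected because it is too close to the first pick. Greedy selection can miss the optimum in general; for a small number of meta-clusters, enumerating every choice of at most one cluster per meta-cluster gives the exact maximum.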


Methodology

2. Finding interesting clusters with respect to a continuous non-spatial variable $v$:

Let

$X_i \in 2^A$ be a cluster in the A-space,

$\sigma_v^2$ be the variance of $v$ in dataset $D$,

$\sigma_v^2(X_i)$ be the variance of $v$ in cluster $X_i$,

$mv(X_i)$ be the mean value of $v$ in cluster $X_i$,

$t_1 \geq 0$ be a mean value reward threshold, and

$t_2 \geq 1$ be a variance reward threshold.

Interestingness function for each cluster:

$$\varphi(X_i) = \max\big(0,\ |mv(X_i)| - t_1\big) \times \max\big(0,\ \sigma_v^2 - \sigma_v^2(X_i) \times t_2\big)$$
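The interestingness function can be evaluated directly from a cluster's values of v. In this hypothetical sketch, the cluster's values are passed as a list, the dataset variance σ_v² is supplied by the caller, and the population variance is used for σ_v²(X_i); these choices are assumptions. Because the first factor uses |mv(X_i)|, v is presumably standardized (mean 0 in D), which is also an assumption.

```python
from statistics import mean, pvariance
from typing import Sequence


def interestingness(values: Sequence[float], dataset_variance: float,
                    t1: float, t2: float) -> float:
    """phi(X_i) = max(0, |mv(X_i)| - t1) * max(0, sigma_v^2 - sigma_v^2(X_i) * t2)."""
    if t1 < 0 or t2 < 1:
        raise ValueError("expected t1 >= 0 and t2 >= 1")
    cluster_mean = mean(values)               # mv(X_i)
    cluster_variance = pvariance(values)      # sigma_v^2(X_i), population variance (assumed)
    mean_term = max(0.0, abs(cluster_mean) - t1)
    variance_term = max(0.0, dataset_variance - cluster_variance * t2)
    return mean_term * variance_term


# Hypothetical cluster values for a standardized variable (dataset variance 1.0).
high = interestingness([1.0, 1.2, 0.8], dataset_variance=1.0, t1=0.5, t2=2.0)
low = interestingness([0.1, 0.2], dataset_variance=1.0, t1=0.5, t2=2.0)
```

A cluster scores above zero only if its mean deviates from 0 by more than t_1 and its variance is below σ_v² / t_2, so the product rewards clusters that are both extreme and homogeneous; in the example, `high` is about 0.4733 and `low` is 0 because its mean does not exceed t_1.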


Case Study

1. Meta-clusters generated from multiple spatial datasets:

[Figure: map of the meta-clusters; x-axis: longitude (-95.8 to -94.8), y-axis: latitude (29.0 to 30.4)]


Case Study

2. Final clusters with polygon area as the plug-in reward function

[Figure: map of the final clusters, labeled by polygon ID (13, 21, 80, 125, 150); x-axis: longitude (-95.8 to -94.8), y-axis: latitude (29.0 to 30.4)]

Polygon ID | Temperature (°F) | Solar Radiation (Langleys per minute) | Wind Speed (miles per hour) | Time of Day
13 | 79.0 | N/A | 4.50 | 6 p.m.
21 | 86.35 | 1.33 | 6.10 | 1 p.m.
80 | 89.10 | 1.17 | 6.20 | 2 p.m.
125 | 84.10 | 0.13 | 4.90 | 2 p.m.
150 | 88.87 | 1.10 | 5.39 | 12 p.m.
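Using polygon area as the plug-in reward function requires an area computation. The sketch below uses the shoelace formula for a simple (non-self-intersecting) polygon given as an ordered vertex list; the function names are hypothetical, and coordinates are assumed to be planar, so longitude/latitude polygons would need to be projected first for the area to be in meaningful units.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula (vertices in order)."""
    n = len(vertices)
    if n < 3:
        return 0.0  # degenerate: no enclosed area
    twice_area = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes the result independent of orientation


def area_reward(vertices: List[Point]) -> float:
    """Plug-in reward: the polygon's area, so values lie in [0, inf)."""
    return polygon_area(vertices)


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
```

With this reward, `area_reward(unit_square)` is 1.0 and `area_reward(triangle)` is 6.0, so larger polygons are preferred when the final clustering maximizes the summed reward.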


Case Study

3. Finding interesting meta-clusters with respect to solar radiation:

[Figure: map of the interesting meta-clusters 5, 15, and 21; x-axis: longitude (-95.8 to -94.8), y-axis: latitude (29.0 to 30.2)]

Cluster ID | Mean | Variance | Number of Polygons
5 | -0.9144 | 0.1981 | 5
15 | 1.1218 | 0.1334 | 5
21 | 1.0184 | 0.0350 | 3


Conclusion & future work

 Conclusions:

 Our framework can effectively cluster overlapping spatial polygons that are similar in size, shape, and location.

 Our post-clustering techniques with different plug-in reward functions can guide the knowledge extraction of interesting patterns and generate summaries from large spatial datasets.

 Future Work:

 Develop novel spatio-temporal clustering techniques and embed them in our framework.

 Investigate novel change-analysis techniques to identify spatial and temporal changes in spatial data.

 Evaluate our framework in challenging case studies.


Publications:

 S. Wang, C.S. Chen, V. Rinsurongkawong, F. Akdag, C.F. Eick, “Polygon-based Methodology for Mining Related Spatial Datasets”, ACM SIGSPATIAL GIS Workshop on Data Mining for Geoinformatics (DMG), in conjunction with ACM SIGSPATIAL GIS 2010, San Jose, CA, Nov. 2010.

NSF Travel Award for ACM GIS 2010

 S. Wang, C. Eick, Q. Xu, “A Space-Time Analysis Framework for Mining Geospatial Datasets”, CyberGIS’12: The First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.

NSF Travel Award for CyberGIS 2012

 C. Eick, G. Forestier, S. Wang, Z. Cao, S. Goyal, “A Methodology for Finding Uniform Regions in Spatial Data”, CyberGIS’12: The First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.

 S. Wang, C.F. Eick, “A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets”, GeoInformatica (under review).


Thank you!

