Co-location pattern mining
(for CSCI 5715)
Charandeep Parisineti,
Bhavtosh Rath
Chapter 7: Spatial Data Mining
[1] Yan Huang, Shashi Shekhar, Hui Xiong. Discovering Co-location Patterns from Spatial
Datasets: A General Approach. IEEE Transactions on Knowledge and Data Engineering, 2004.
[2] Sajib Barua, Jörg Sander. Mining Statistically Significant Co-location and Segregation
Patterns. IEEE Transactions on Knowledge and Data Engineering, 26(5), 2014.
[3] Shashi Shekhar, et al. Trends in Spatial Data Mining.
Where do we find co-location?

In ecology
Symbiotic species: oxpecker and giraffe

Public Safety
To determine possible causes of disease outbreak
(London cholera)

In cities
{‘auto dealers’, ‘auto repair shops’}
{‘department stores’, ‘gift stores’}
Other domains: Earth science, public
safety, transportation, tourism, etc.
What are association rules?
Association rules vs Co-location rules
Class Exercise:
Co-location rule mining approaches
Figure panels: (b) reference feature centric approach, (c) data partition approach, (d) event centric model
• Co-location (A,B) will not be found in (b), since it does not involve the reference feature.
• The data partition approach admits many distinct ways of partitioning the data, each yielding
a distinct set of transactions and hence a different support.
  - In (c), the support of (A,B) differs from one partition to another.
• The event centric model finds the subsets of spatial features likely to occur in a neighborhood
around instances of given subsets of event types.
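The dependence of support on the chosen partition can be illustrated with a small sketch. It assumes a square grid as the partitioning scheme, with each non-empty cell treated as one transaction; the function names, the `offset` parameter, and the four sample points are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def grid_transactions(points, cell, offset=(0.0, 0.0)):
    """Assign each (feature, x, y) instance to a grid cell; each non-empty cell is one transaction."""
    cells = defaultdict(set)
    for feature, x, y in points:
        key = (int((x - offset[0]) // cell), int((y - offset[1]) // cell))
        cells[key].add(feature)
    return list(cells.values())

def support(transactions, pattern):
    """Fraction of transactions that contain every feature in the pattern."""
    pattern = set(pattern)
    return sum(1 for t in transactions if pattern <= t) / len(transactions)

# Illustrative data: two A-B pairs that lie close together along the x axis.
points = [("A", 0.9, 0.5), ("B", 1.1, 0.5), ("A", 1.9, 0.5), ("B", 2.1, 0.5)]

# Same data, same cell size, two different grid origins.
support_aligned = support(grid_transactions(points, cell=1.0), {"A", "B"})
support_shifted = support(grid_transactions(points, cell=1.0, offset=(0.5, 0.0)), {"A", "B"})
print(support_aligned, support_shifted)
```

With the grid origin at 0, one pair is split across a cell boundary and only one of three transactions contains both features; shifting the origin by half a cell puts both pairs inside single cells.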
Event Centric model
Drawbacks of event centric model
• In (a), there are only a few instances of A, but B is abundant.
  - Because many instances of B have no A nearby, B’s participation ratio is small, which
makes the participation index of {A,B} low.
• In (b), A and B are both abundant in a spatial area but randomly distributed.
  - We might see enough instances of {A,B} even without true spatial dependency.
• In (c), features A and B are each spatially auto-correlated, and a cluster of A and a cluster of B
happen to overlap by chance.
  - The resulting instances of {A,B} are enough to falsely report a spatial co-location.
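Case (a) can be reproduced with a minimal sketch of the participation ratio and participation index for a size-2 pattern. It assumes the neighbor relation is a Euclidean distance threshold `d` and that the participation index is the minimum of the two participation ratios; the function name and the sample coordinates are illustrative only.

```python
from math import dist

def participation(a_points, b_points, d):
    """Return (pr_A, pr_B, PI) for the pattern {A,B} under a distance-d neighbor relation."""
    # An instance participates if at least one instance of the other feature lies within d.
    a_in = sum(1 for a in a_points if any(dist(a, b) <= d for b in b_points))
    b_in = sum(1 for b in b_points if any(dist(a, b) <= d for a in a_points))
    pr_a = a_in / len(a_points)
    pr_b = b_in / len(b_points)
    return pr_a, pr_b, min(pr_a, pr_b)

# Few instances of A, many of B: every A has a B nearby, but most Bs have no A.
a_points = [(0.0, 0.0), (5.0, 5.0)]
b_points = [(0.5, 0.0), (5.0, 5.5)] + [(float(i), 20.0) for i in range(8)]

pr_a, pr_b, pi_ab = participation(a_points, b_points, d=1.0)
print(pr_a, pr_b, pi_ab)
```

Here every A participates, yet only 2 of the 10 instances of B do, so the index is pulled down to B’s ratio even though A always occurs next to B.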
Statistical approaches

Similar problems in association rule mining are handled with interest
measures, such as the phi coefficient for a 2x2 contingency table
          Coffee   ~Coffee
Tea         15        5
~Tea        75        5
~Coffee denotes the transactions that
do not contain coffee
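As a worked example, the phi coefficient for the Tea/Coffee counts can be computed directly. This is a sketch using the standard formula for a 2x2 table; the function and argument names are chosen here for illustration.

```python
from math import sqrt

def phi_coefficient(f11, f10, f01, f00):
    """Phi coefficient of a 2x2 contingency table.

    f11: both items present, f10: only the row item, f01: only the column item, f00: neither.
    """
    row1, row0 = f11 + f10, f01 + f00   # row totals (Tea, ~Tea)
    col1, col0 = f11 + f01, f10 + f00   # column totals (Coffee, ~Coffee)
    return (f11 * f00 - f10 * f01) / sqrt(row1 * row0 * col1 * col0)

# Counts from the table: Tea&Coffee=15, Tea&~Coffee=5, ~Tea&Coffee=75, ~Tea&~Coffee=5
phi = phi_coefficient(15, 5, 75, 5)
print(phi)  # -0.25
```

The negative value indicates that tea and coffee are slightly negatively associated in this table, even though 15 of the 20 tea transactions also contain coffee.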

The absence of an item is not well defined in spatial data, because boolean spatial
features are embedded in continuous space

Hence the phi coefficient, odds ratio, etc., used in traditional data mining do not carry over to
spatial data

The basic idea of the spatial statistical approach is to compare the observed participation index (PI)
with the PI expected under no spatial relationship, instead of comparing it with a global threshold

The PI under no spatial relationship is estimated by Monte Carlo simulation: the computation is repeated over several data sets generated under complete spatial randomness (CSR)
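The Monte Carlo comparison can be sketched as follows. This is a simplified illustration, not the procedure of any cited paper: it assumes a rectangular study window, a null model in which both features are placed uniformly at random (CSR), a distance-threshold neighbor relation, and a size-2 pattern. The names `pair_pi`, `csr_points`, `pi_p_value`, and the parameters `runs` and `seed` are hypothetical.

```python
import random
from math import dist

def pair_pi(a_points, b_points, d):
    """Participation index of {A,B}: minimum of the two participation ratios."""
    a_in = sum(1 for a in a_points if any(dist(a, b) <= d for b in b_points))
    b_in = sum(1 for b in b_points if any(dist(a, b) <= d for a in a_points))
    return min(a_in / len(a_points), b_in / len(b_points))

def csr_points(n, width, height, rng):
    """Draw n points uniformly at random in a width x height window (CSR)."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def pi_p_value(a_points, b_points, d, width, height, runs=199, seed=0):
    """Estimate how often a CSR data set of the same size reaches the observed PI."""
    rng = random.Random(seed)
    observed = pair_pi(a_points, b_points, d)
    at_least = 0
    for _ in range(runs):
        sim_a = csr_points(len(a_points), width, height, rng)
        sim_b = csr_points(len(b_points), width, height, rng)
        if pair_pi(sim_a, sim_b, d) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (runs + 1)

# Illustrative data: every B sits 0.5 units to the right of an A.
data_rng = random.Random(42)
a_points = csr_points(20, 100.0, 100.0, data_rng)
b_points = [(x + 0.5, y) for x, y in a_points]

observed_pi, p_value = pi_p_value(a_points, b_points, d=1.0, width=100.0, height=100.0)
print(observed_pi, p_value)
```

A small estimated p-value means the observed PI is rarely matched by data with no spatial relationship, which replaces the single global PI threshold with a per-pattern significance test.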

Computing the cross-K function for all possible co-location
patterns can be computationally expensive
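To give a sense of the cost, the following is a naive sketch of a cross-K estimate for one pair of features, without edge correction. The study-area value, the function name, and the sample points are assumptions for illustration. Each evaluation compares every A with every B, and this must be repeated for every candidate pattern and every distance of interest.

```python
from math import dist, pi

def cross_k(a_points, b_points, h, area):
    """Naive cross-K estimate at distance h (no edge correction).

    Counts A-B pairs within distance h, scaled by area / (n_A * n_B).
    """
    pairs = sum(1 for a in a_points for b in b_points if dist(a, b) <= h)
    return area * pairs / (len(a_points) * len(b_points))

# Illustrative data in an assumed 10 x 10 study area.
a_points = [(1.0, 1.0), (4.0, 4.0)]
b_points = [(1.0, 1.5), (4.0, 4.5), (8.0, 8.0)]

k_ab = cross_k(a_points, b_points, h=1.0, area=100.0)
k_csr = pi * 1.0 ** 2  # value expected under complete spatial randomness: pi * h^2
print(k_ab, k_csr)
```

An estimate well above pi * h^2 suggests that the two features attract each other at that distance.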
Q?