Co-location pattern mining (for CSCI 5715) Charandeep Parisineti

Authors: Charandeep Parisineti, Bhavtosh Rath
Chapter 7: Spatial Data Mining

[1] Yan Huang, Shashi Shekhar, Hui Xiong. Discovering Co-location Patterns from Spatial Datasets: A General Approach. IEEE Transactions on Knowledge and Data Engineering, 2004.
[2] Sajib Barua, Jörg Sander. Mining Statistically Significant Co-location and Segregation Patterns. IEEE Transactions on Knowledge and Data Engineering, 26(5), 2014.
[3] Shashi Shekhar et al. Trends in Spatial Data Mining.

Where do we find co-location?
• Ecology: symbiotic species, e.g. the oxpecker and the giraffe.
• Public safety: determining possible causes of a disease outbreak (e.g. the London cholera outbreak).
• Cities: {'auto dealers', 'auto repair shops'}, {'department stores', 'gift stores'}.
• Other domains: earth science, public safety, transportation, tourism, etc.

What are association rules?

Association rules vs. co-location rules

Class Exercise:

Co-location rule mining approaches
b) Reference feature centric approach
c) Data partition approach
d) Event centric model
• Co-location (A,B) will not be found in (b), since it does not involve the reference feature.
• The data partition approach admits many distinct ways of partitioning the data, each yielding a distinct set of transactions and hence a different support.
  - In (c), the support of (A,B) differs from one partition to another.
• The event centric model finds the subsets of spatial features likely to occur in a neighborhood around instances of given subsets of event types.

Event centric model

Drawbacks of the event centric model
• In (a), there are only a few instances of A, but B is abundant.
  - Because many instances of B have no A nearby, the participation ratio of B is small, which makes the participation index of {A,B} low.
• In (b), A and B are both abundant in a spatial area but randomly distributed.
  - We might see enough instances of {A,B} even without a true spatial dependency.
• In (c), both features A and B are spatially auto-correlated.
  - A cluster of A and a cluster of B happen to overlap by chance.
  - Enough instances of {A,B} will then falsely report a spatial co-location.

Statistical approaches
• Similar problems in association rule mining are handled using interest measures, such as the phi coefficient of a 2x2 contingency table:

               Coffee   ~Coffee
      Tea        15        5
      ~Tea       75        5

  ~Coffee denotes the transactions that do not contain coffee.
• The absence of an item does not make sense in spatial data, because Boolean spatial features are embedded in continuous space.
• Hence the phi coefficient, odds ratio, etc., used in traditional data mining do not work for spatial data.
• The basic idea of the spatial statistical approach is to compare the observed PI value with the PI value expected under no spatial relationship, instead of with a global threshold.
• The process is repeated over many simulated spaces generated under complete spatial randomness (CSR), using Monte Carlo simulation.
• Computing co-location patterns with the cross-K function for all possible co-location patterns can be computationally expensive.

Questions?
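The data partition approach listed under co-location rule mining approaches can be illustrated with a minimal sketch. It assumes a square grid as the partitioning scheme, treats each non-empty grid cell as one transaction, and defines support as the fraction of those transactions that contain every feature of the pattern. The helper names `grid_transactions` and `support`, the cell size, the offsets and the four toy points are all hypothetical, chosen only to show that shifting the same grid changes the support of (A,B).

```python
import math
from collections import defaultdict

# Hypothetical helpers and toy data for illustration; not taken from the cited papers.

def grid_transactions(points, cell, offset=(0.0, 0.0)):
    """Assign each (feature, x, y) point to a square grid cell of side `cell`.

    The grid origin is shifted by `offset`; every non-empty cell becomes one
    transaction, represented as the set of features found in it.
    """
    ox, oy = offset
    cells = defaultdict(set)
    for feature, x, y in points:
        key = (math.floor((x - ox) / cell), math.floor((y - oy) / cell))
        cells[key].add(feature)
    return list(cells.values())

def support(transactions, itemset):
    """Fraction of transactions that contain every feature in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Two A-B pairs lying close together along a line.
toy_points = [("A", 0.9, 0.5), ("B", 1.1, 0.5), ("A", 2.9, 0.5), ("B", 3.1, 0.5)]

for offset in [(0.0, 0.0), (1.0, 0.0)]:
    transactions = grid_transactions(toy_points, cell=2.0, offset=offset)
    print(offset, support(transactions, {"A", "B"}))
```

With the grid aligned at the origin, both A-B pairs fall inside shared cells, so every transaction contains {A,B} and the support is 1.0. Shifting the grid by one unit splits the pairs across three cells and drops the support to 1/3, even though no point moved.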

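The coffee/tea table on the statistical approaches slide can be checked with a short computation of the phi coefficient, using the standard formula phi = (f11·f00 − f10·f01) / sqrt(f1+ · f0+ · f+1 · f+0), where the f terms are the four cell counts and the row and column totals. The function name `phi_coefficient` is a hypothetical helper, and reading Tea as the row variable and Coffee as the column variable is an assumption about how the slide's table is laid out.

```python
import math

def phi_coefficient(f11, f10, f01, f00):
    """Phi coefficient of a 2x2 contingency table (hypothetical helper).

    f11: transactions with both X and Y, f10: X but not Y,
    f01: Y but not X, f00: neither X nor Y.
    """
    f1_ = f11 + f10  # row total for X
    f0_ = f01 + f00  # row total for not-X
    f_1 = f11 + f01  # column total for Y
    f_0 = f10 + f00  # column total for not-Y
    return (f11 * f00 - f10 * f01) / math.sqrt(f1_ * f0_ * f_1 * f_0)

# Table from the slide, assuming X = Tea (rows) and Y = Coffee (columns).
phi = phi_coefficient(f11=15, f10=5, f01=75, f00=5)
confidence = 15 / (15 + 5)   # P(Coffee | Tea)
p_coffee = (15 + 75) / 100   # P(Coffee)
print(phi, confidence, p_coffee)  # prints -0.25 0.75 0.9
```

The rule Tea → Coffee has 75% confidence, yet P(Coffee) is 0.9, so tea drinkers are less likely than average to buy coffee and phi comes out negative. The computation relies on f00, the count of transactions containing neither item; that is the cell with no meaningful spatial counterpart once features are embedded in continuous space.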

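The event centric model's participation index and the Monte Carlo comparison described under statistical approaches can be combined in one minimal sketch. It assumes a distance threshold r as the neighbor relation, defines the participation ratio of a feature as the fraction of its instances with at least one neighbor of the other feature, takes PI as the minimum of the two ratios, and builds the null model by placing the same number of A and B instances uniformly at random in a rectangular window (CSR). The helper names, window size, r, number of simulations and toy coordinates are all hypothetical; the toy data mimics drawback (a), with two instances of A, each next to a B, and many unrelated Bs.

```python
import math
import random

# Hypothetical helpers and toy data for illustration; not the method of the cited papers.

def participation_index(a_pts, b_pts, r):
    """PI of the pattern {A, B} under a distance-r neighbor relation.

    pr(A) = fraction of A instances with at least one B within distance r;
    pr(B) is defined symmetrically; PI = min(pr(A), pr(B)).
    """
    def has_neighbor(p, others):
        return any(math.dist(p, q) <= r for q in others)

    pr_a = sum(has_neighbor(p, b_pts) for p in a_pts) / len(a_pts)
    pr_b = sum(has_neighbor(q, a_pts) for q in b_pts) / len(b_pts)
    return min(pr_a, pr_b)

def csr_test(a_pts, b_pts, r, width, height, n_sim=199, seed=0):
    """Monte Carlo test of the observed PI against complete spatial randomness.

    Each simulation keeps the number of A and B instances but places them
    uniformly at random in a width x height window.
    Returns (observed PI, p-value).
    """
    rng = random.Random(seed)
    observed = participation_index(a_pts, b_pts, r)

    def random_points(n):
        return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

    at_least = sum(
        participation_index(random_points(len(a_pts)), random_points(len(b_pts)), r) >= observed
        for _ in range(n_sim)
    )
    # Count the observed data as one of the samples, so p is never zero.
    return observed, (at_least + 1) / (n_sim + 1)

# Drawback (a): every A has a nearby B, but most Bs are isolated.
a_instances = [(1.0, 1.0), (5.0, 5.0)]
b_instances = [(1.2, 1.0), (5.0, 5.3)] + [(float(x), 9.0) for x in range(8)]

print(participation_index(a_instances, b_instances, r=0.5))  # prints 0.2
print(csr_test(a_instances, b_instances, r=0.5, width=10.0, height=10.0))
```

Every A has a B neighbor, yet PI is only 0.2 because eight of the ten Bs are isolated, which is exactly drawback (a). The p-value asks how often random placement yields a PI at least that large, instead of comparing 0.2 with a global threshold. This sketch draws from plain CSR as the slides describe; because it does not preserve clustering within each feature, it does not on its own guard against the auto-correlation problem in drawback (c).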