Spatio-Temporal Cluster Detection Using AMOEBA
Jimmy Kroon
Pennsylvania State University
Advisor: Dr. Frank Hardisty
This is a parody – Original Art: http://projectswordtoys.blogspot.com/2009/05/project-sword-annual-1967.html

Outline
• Introduction – Clustering and Project Direction
• The Spatial Scan Statistic and SaTScan
• AMOEBA
• Proposed Spatio-Temporal AMOEBA Method
• Software, Data, and Progress

Cluster Detection
Cluster: “a geographically and/or temporally bounded group of occurrences of sufficient size and concentration to be unlikely to have occurred by chance” (Knox, 1989)

Two Typical Uses
• Disease Surveillance (figure: week of 2/7/2010; data: Google Flu Trends; analysis: GeoDa)
• Epidemiological Studies (figure: brain cancer in NM; Kulldorff et al. 1998)

Clusters : SaTScan : AMOEBA : ST AMOEBA : Progress

Time in Spatial Analysis
Time Matters:
• Many geographic phenomena are dynamic.
• The spatial patterns we see probably change over time.
• The American Association of Geographers describes temporal geography as a ‘frontier’ of GIScience.
Spatio-temporal clusters may exhibit behaviors not seen in purely spatial clusters:
• Growth
• Movement
• Splits / Joins

Research Problem
Primary: No method exists for determining the true extent of irregularly shaped clusters in spatio-temporal datasets.
Secondary: Spatial AMOEBA has not been implemented in R.

Project Goals
• A demonstration of spatio-temporal cluster detection based on the AMOEBA procedure.
• R scripts for running spatial and spatio-temporal AMOEBA, to be contributed to the R community.

The Spatial Scan Statistic
• Scan data with a moving ‘window’, calculating local autocorrelation for spatial units that fall within the window.
• Select the window(s) with the highest calculated autocorrelation value as possible cluster(s).
• The spatial scan statistic is by far the most popular cluster detection technique, largely due to the availability of SaTScan software by Martin Kulldorff.

The Spatial Scan Statistic
[Figure]

Drawbacks of the Spatial Scan Statistic
Clusters that are not similar in shape to the scanning window can produce errors.
• False inclusions
• False exclusions
• Identify thin clusters as multiple small clusters
• Cannot detect holes in clusters

The Elliptical Spatial Scan Statistic
• Must choose shapes a priori to avoid pre-selection bias
See Kulldorff et al. 2006

AMOEBA
• Ecotope-Based – Regions of contiguous spatial units that are related in terms of z-value
• Multidirectional – Search in all directions.
• Optimum – Procedure takes place at the finest spatial scale possible and is capable of revealing all spatial association present in the dataset (Aldstadt and Getis, 2006).
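
A minimal Python sketch of the ecotope idea, offered as an illustration and not as the Aldstadt and Getis implementation: starting from a seed, contiguous units are added for as long as the Getis-Ord Gi* statistic of the growing region increases. Several things here are assumptions that do not come from the slides: binary weights (1 inside the region, 0 outside), a search for high-value clusters only, a greedy one-at-a-time test of candidates, and the hypothetical names gi_star, grow_ecotope, values, and neighbors.

```python
import math


def gi_star(values, members):
    """Standardized Getis-Ord Gi* for a region, using binary weights.

    Assumption: weight 1 for every unit in `members` (seed included),
    weight 0 for every other unit.
    """
    n = len(values)
    k = len(members)
    if k == 0 or k >= n:
        return 0.0  # statistic is undefined here; treated as 0 in this sketch
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean * mean
    if variance <= 0:
        return 0.0  # all values identical, nothing to detect
    numerator = sum(values[j] for j in members) - k * mean
    denominator = math.sqrt(variance) * math.sqrt((n * k - k * k) / (n - 1))
    return numerator / denominator


def grow_ecotope(values, neighbors, seed):
    """Greedily grow a high-value ecotope from `seed`.

    `neighbors` maps each unit index to the indices of contiguous units.
    Returns the set of member units and the final Gi* value.
    """
    ecotope = {seed}
    best = gi_star(values, ecotope)
    while True:
        # Units contiguous to the current ecotope but not yet in it.
        frontier = set()
        for unit in ecotope:
            frontier.update(neighbors[unit])
        frontier -= ecotope
        added = False
        # Test candidates from the highest value down; keep those that raise Gi*.
        for candidate in sorted(frontier, key=lambda j: values[j], reverse=True):
            score = gi_star(values, ecotope | {candidate})
            if score > best:
                ecotope.add(candidate)
                best = score
                added = True
        if not added:
            return ecotope, best


# Hypothetical data: six units in a row, with high values in the middle.
values = [1, 1, 9, 10, 9, 1]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
ecotope, score = grow_ecotope(values, neighbors, seed=3)
print(sorted(ecotope), round(score, 3))
```

Seeding from unit 3 in this toy dataset grows the ecotope to the three high-value units and stops, because adding either low-value neighbor would lower Gi*.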

AMOEBA Clusters
[Figure]

AMOEBA: Defining an Ecotope
• Add a seed location (one polygon) to the ecotope
• Calculate Gi* (Getis-Ord local autocorrelation statistic)
• Search in all directions for contiguous polygons
• Those that increase Gi* are added to the growing ecotope for that seed location
• Keep searching for more neighbors, growing the ecotope until Gi* no longer increases
Repeat, creating ecotopes for each polygon in the dataset

The R Neighbor Object
[Figure]

Finding an Ecotope with AMOEBA
[Figure sequence]

AMOEBA: From Ecotopes to Clusters
• Rank ecotopes by final Gi*
• Select the ecotope with the highest Gi* as a cluster
• Eliminate intersecting ecotopes
• Select the ecotope with the next highest Gi* as a second cluster
• Repeat
• Probability of clusters can be tested using Monte Carlo simulation

Incorporating Time into AMOEBA
Remember: spatio-temporal clusters may exhibit behaviors not seen in purely spatial clusters.
• Growth
• Movement
• Splits / Joins
Visualize temporal data as layers of data with time extending vertically through the layers.
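
The layered view can be sketched in Python as a neighbor list keyed by (unit, period) pairs. This is an illustration under stated assumptions, not the R or Python code referred to in these slides: temporal neighbors are taken to be the same unit in the immediately preceding and following periods, and the function name spatio_temporal_neighbors, the variable names, and the unit labels are all hypothetical.

```python
def spatio_temporal_neighbors(spatial_neighbors, n_periods):
    """Build a neighbor list for every (unit, period) pair.

    `spatial_neighbors` maps each unit to its contiguous units within one
    time layer. Assumption: a unit's temporal neighbors are the same unit
    one period earlier and one period later, where those periods exist.
    """
    st_neighbors = {}
    for period in range(n_periods):
        for unit, adjacent in spatial_neighbors.items():
            # Spatial neighbors: contiguous units in the same layer.
            links = [(other, period) for other in adjacent]
            # Temporal neighbors: the same unit in the layers below and above.
            if period > 0:
                links.append((unit, period - 1))
            if period < n_periods - 1:
                links.append((unit, period + 1))
            st_neighbors[(unit, period)] = links
    return st_neighbors


# Hypothetical example: three units in a row, observed over three periods.
spatial = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
st = spatio_temporal_neighbors(spatial, n_periods=3)
for key in [("A", 0), ("B", 1)]:
    print(key, "->", st[key])
```

Because the result has the same shape as a purely spatial neighbor list, an ecotope-growing routine written for spatial units could in principle be run on it unchanged, which is what lets a cluster grow, move, split, or join across layers.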
• Each spatio-temporal unit has spatial neighbors and temporal neighbors

The Spatio-Temporal Scan Statistic
[Figure] See Kulldorff et al. 1998

Spatio-Temporal AMOEBA
[Figure sequence]

Software Environment and Test Data
The R Project
• Free, open source statistical software
• Extendable with user-contributed packages
• www.r-project.org
Google Flu Trends
• Estimates flu incidence levels using aggregated data about user searches for certain keywords
• 90% accurate compared to CDC data
• State-level data, updated daily
• www.google.org/googleflu
SEER (Surveillance, Epidemiology, and End Results)
• National Cancer Institute incidence, survival, and mortality data

AMOEBA ArcToolbox for ArcGIS
Python scripts by Jared Aldstadt and Yeming Fan (Aldstadt, 2010)
[Figure: Google Flu Trends – Feb 1, 2009]

Spatio-Temporal AMOEBA in Python: 2009 Flu Epidemic
[Figure]

Hmmm…

R Programming Progress
Complete …
• Geoprocessing tasks
• Create spatio-temporal neighbor list
• Delineate ecotopes
• Sort and eliminate intersecting ecotopes
• Returns primary cluster PolyIDs that match the Python results
To Do …
• Monte Carlo simulation
• Process results and add to the output shapefile
• Test, test, test

References
Aldstadt, Jared, and Arthur Getis. 2006. Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters. Geographical Analysis 38: 327-343.
Aldstadt, Jared. 2010. Spatial Analysis Tools (ArcGIS). Spatial Analysis Tools. http://www.acsu.buffalo.edu/~geojared/tools.htm.
Bellec, S, D Hémon, J Rudant, A Goubin, and J Clavel. 2006. Spatial and space–time clustering of childhood acute leukaemia in France from 1990 to 2000: a nationwide study. British Journal of Cancer
Duczmal, Luiz, Martin Kulldorff, and Lan Huang. 2006. Evaluation of Spatial Scan Statistics for Irregularly Shaped Clusters. Journal of Computational and Graphical Statistics 15(2): 428-442.
Knox, G. 1989. Detection of Clusters. In Methodology of Enquiries into Disease Clustering, ed. P Elliott, 17-22. London: Small Area Health Statistics Unit.
Kulldorff, Martin, William Athas, Eric Feuer, Barry Miller, and Charles Key. 1998. Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health 88(9): 1377-1380.
Kulldorff, Martin, Lan Huang, Linda Pickle, and Luiz Duczmal. 2006. An elliptic spatial scan statistic. Statistics in Medicine 25(22): 3929.
Kulldorff, Martin. 1999. Geographic Information Systems (GIS) and community health: Some statistical issues. Journal of Public Health Management and Practice 5(2): 100-106.
Original artwork for parody title slide: http://projectswordtoys.blogspot.com/2009/05/project-sword-annual-1967.html