Statistical approaches
for detecting
clusters of disease.
Feb. 26, 2013
Thomas Talbot
New York State Department of Health
Bureau of Environmental and Occupational Epidemiology
Geographic Research and Analysis Section
Cluster
• A number of similar things grouped closely
together
Webster’s Dictionary
• Researchers are often interested in unexplained
concentrations of health events in space and/or
time.
Adverse health events can cluster by:
• Occupation
• Sex, Age
• Socioeconomic class
• Behavior (smoking)
• Race
• Time
• Space
Spatial Autocorrelation
“Everything is related to everything else, but near things are more
related than distant things.”
- Tobler’s first law of
geography
Positive
autocorrelation
Negative autocorrelation
Moran’s I
• A test for spatial autocorrelation in
disease rates.
• Nearby areas tend to have similar rates
of disease. Moran I is greater than 1,
positive spatial autocorrelation.
• When nearby areas are dissimilar
Moran I is less than 1, negative spatial
autocorrelation.
GeoDA Overview
• GeoDA is a tool for exploratory analysis of geographic data.
• Primarily analyzes polygon data, but can also do some things
with point data
• Some useful functions.
– creates spatial weights matrices
– histograms, scatter plots
– calculates and maps local Indices of spatial association (local
Moran’s I).
• Multiple regression full diagnostics for spatial effects
• ArcGIS not required, but requires a shapefile for data input.
• Download site: http://geodacenter.asu.edu/projects/opengeoda
Detecting Clusters
• Consider scale
• Consider zone
• Control for multiple testing
Talbot
Cluster Questions
• Does a disease cluster in space?
• Does a disease cluster in both time and
space?
• Where is the most likely cluster?
• Where is the most likely cluster in both
time and space?
More Cluster Questions
• At what geographic or population scale
do clusters appear?
• Are cases of disease clustered in areas
of high exposure?
Nearest Neighbor Analysis
Cuzick & Edwards Method
• Count the the number of cases whose
nearest neighbors are cases and not
controls.
• When cases are clustered the nearest
neighbor to a case will tend to be
another case, and the test statistic will
be large.
Nearest Neighbor Analyses
Advantages
• Accounts for the geographic variation in
population density
• Accounts for confounders through
judicious selection of controls
• Can detect clustering with many small
clusters
Disadvantages
• Must have spatial locations of cases &
controls
• Doesn’t show location of the clusters
Spatial Scan Statistic
Martin Kulldorff
•Determines the location with elevated
rate that is statistically significant.
•Adjust for multiple testing of the many
possible locations and area sizes of
clusters.
•Uses Monte Carlo testing techniques
The Space-Time
Scan Statistic
• Cylindrical window with a circular
geographic base and a height
corresponding to time.
• Cylindrical window is moved in space
and time.
• P value for each cylinder calculated.
Knox Method
test for space-time interaction
• When space-time interaction is present cases
near in space will be near in time, the test
statistic will be large.
• Test statistic: The number of pairs of cases
that are near in both time and space.
Focal tests for clustering
• Cross sectional or cohort approach: Is there
a higher rate of disease in populations living
in contaminated areas compared to
populations in uncontaminated areas?
(Relative risk)
• Case/control approach: Are there more cases
than controls living in a contaminated area?
(Odds ratio)
Focal Case-Control Design
500 m.
250 m.
Case
Control
Regression Analysis
• Control for know risk factors before analyzing
for spatial clustering
• Analyze for unexplained clusters.
• Follow-up in areas with large regression
residuals with traditional case-control or
cohort studies
• Obtain additional risk factor data to account
for the large residuals.
At what geographic or
population scale do clusters
appear?
Multiresolution mapping.
A cluster of cases in a
neighborhood provides a different
epidemiological meaning then a
cluster of cases across several
adjacent counties.
Results can change dramatically
with the scale of analysis.
1995-1999
Interactive Selections by rate, population and p value
Apparent Spatial Clustering
of Health Events
is Often due to Data Quality Issues
Apparent cluster of low birth weights.
NYSDOH Vital Statistics Data
Remove out-of-state births & cluster disappears.
Rutland Hospital data coded in wrong weight units.
Potential Birth Defect Clusters identified by
Spatial Scan Statistic
Hospital reporting rates presented on a map. Hospitals
with poor reporting represented by blue & yellow circles
Remove NYC from analysis and clusters disappear.
Conclusion: Reporting problems in NYC lead to the clusters
SaTScan
We will be using a beta version of SatScan in the next Lab.
Download SaTScan from Talbot’s website to your Flash Drive.
Launch
Make sure you choose an installation path on your flash drive so you
can run it in class from your flash drive.
Homework
•
Talbot TO, Kulldorff M, Forand SP, and Haley VB. Evaluation of Spatial
Filters to Create Smoothed Maps of Health Data. Statistics in
Medicine. 2000, 19:2451-2467
•
Forand SP, Talbot TO, Druschel C, Cross PK. Data Quality and the
Spatial Analysis of Disease Rates: Congenital Malformations in New
York. 2002. Health and Place. 2002, 8:191-199
•
Kuldorff M, National Cancer Institute. SatScan User Guide
www.satscan.org
•
Cromley and McLafferty. GIS and Public Health, 2012. Chapter 5
The End