Proposal - Spatial Database Group

Modeling of Spatio-Temporal Co-occurrence Patterns:
An Approach to Knowledge Discovery from Spatio-Temporal Databases
A. TECHNICAL RELEVANCE
A.1. Problem: Traditional data mining methods (e.g., association rules, classification, and
clustering) often assume that data observations have the same properties regardless of their
spatial location. This assumption violates Tobler’s First Law of Geography: everything is related to
everything else, but nearby objects in space are more related than distant objects [2,3]. The
same principle also applies to objects near or far in time from one another (as shown in time
series modeling). As a result, the values of attributes of neighboring spatio-temporal (ST) data
objects tend to affect each other. Traditional knowledge discovery techniques (which assume
that data objects are independent and identically distributed with regard to space and time)
perform poorly on ST data. New models, objective functions, and patterns more suited for ST
databases and their unique properties are needed for knowledge discovery from these
databases.
The focus of this proposal is to create and explore new models of ST co-occurrence
patterns. Formally, given a collection of Boolean (binary) ST features and their instances, these
new models will identify the subsets of features frequently located together in space and time.
These models will form a new ST dataset analysis framework to discover and identify
interesting, useful, non-trivial patterns, and to facilitate their uses in descriptive as well as
predictive tasks.
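As a rough illustration of the problem statement above, the following sketch finds pairs of instances of different features that lie close together in both space and time. The tuple layout, the thresholds `max_dist` and `max_dt`, and the sample values are assumptions made for illustration only; the proposal does not fix a neighborhood definition.

```python
from itertools import combinations
from math import hypot

# Each instance is (feature, x, y, t). The values are made up for illustration.
instances = [
    ("IED", 0.0, 0.0, 10.0),
    ("civilian_vehicle", 0.3, 0.1, 10.5),
    ("IED", 5.0, 5.0, 50.0),
    ("civilian_vehicle", 9.0, 9.0, 50.0),
]

def st_neighbors(a, b, max_dist, max_dt):
    """True if two instances are close in both space and time."""
    close_in_space = hypot(a[1] - b[1], a[2] - b[2]) <= max_dist
    close_in_time = abs(a[3] - b[3]) <= max_dt
    return close_in_space and close_in_time

def cooccurring_pairs(instances, max_dist, max_dt):
    """Index pairs of instances of different features that are ST neighbors."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(instances), 2):
        if a[0] != b[0] and st_neighbors(a, b, max_dist, max_dt):
            pairs.append((i, j))
    return pairs

# Hypothetical thresholds: 1 distance unit and 1 time unit.
pairs = cooccurring_pairs(instances, max_dist=1.0, max_dt=1.0)
```

Only the first IED and the first civilian vehicle fall within both thresholds here. A larger pattern would require every pair of its instances to satisfy the same neighbor test.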
As an example, one can examine ST co-occurrence patterns in the context of Improvised
Explosive Devices (IEDs) in Iraq. The proficiency of IED attacks in Iraq has increased in parallel
with the frequency of attacks directed toward interdicting convoys. Most Iraqi highways are
paved roads with 4 to 8 lanes. They have many bridges and overpasses as well as frequent
traffic circles, all of which are potential convoy chokepoints. Built-up or vegetated medians
divide most roadways, and many IEDs have been placed in these medians. Soda cans,
manholes, tunnels burrowed under roads, cement-encased bomb projectiles, and even dead
animal carcasses have been used to conceal IEDs. Some of the IEDs have been remotely
detonated using garage door openers, car alarms, key fobs, door bells, toy car remotes, two-way
radios, cellular telephones and pagers. This implies that target area observation requires
line-of-sight attacker observation points, but the adaptation of radios and cell phones has given
the attackers a greater ability to watch convoys from a distance and not be compromised.
Some Boolean ST features related to IEDs include: time of IED emplacement, time of
day of detonation, day of week of detonation, day of month of detonation, explosion location,
distance to nearest highway, closest highway type, distance to nearest bridge, distance to
nearest traffic circle, distance to nearest buildings, building types of nearest buildings,
vegetation and vegetation height of surrounding area, soil type and slope of surrounding area,
and demographics of surrounding area. Other ST features specifically tied to moving objects
include military vehicles and types, convoy size, and whether a civilian vehicle is following or
leading a military vehicle. A simple ST co-occurrence pattern, for example, would be {IED, road,
time_of_day, civilian_vehicle_front} if frequency is the interest measure, a certain road had a
high incidence of IED attacks, and the attacks were most likely at a certain time of day or
when a civilian vehicle led the convoy. One may also categorize ST co-occurrence patterns by
frequency of occurrence. For example, emerging co-occurrence patterns may be categorized by
a significant increase in frequency, and vanishing patterns may be categorized by a significant
decrease in frequency. This example will be referenced in section C.2.4.
A.2. Technical Barriers: Adapting classical data mining methods to mine ST patterns is far
from trivial. For example, co-occurrence patterns often look similar to association patterns [1],
which identify subsets of item-types that co-occur frequently in a given collection of
transactions, each specifying a subset of item-types. However, instances of ST features are
embedded in continuous space and time, and share a variety of ST relationships. Conceptual
modeling of these patterns is challenging due to the absence of pre-defined transactions in
many datasets and the unique interest measures. Using association rule mining for ST data
requires a transaction database which is not natural [2,4], because the transaction boundaries
may split co-occurrence pattern instances across distinct transactions, for example, those
defined by cells of a rectangular grid.
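The boundary problem described above can be seen in a small sketch. Two points that are only 0.1 units apart fall into different cells of a rectangular grid, so a grid-based transaction definition would place them in different transactions. The cell size and coordinates are hypothetical.

```python
from math import floor, hypot

def grid_cell(x, y, cell_size):
    """Map a point to the (column, row) index of its grid cell."""
    return (floor(x / cell_size), floor(y / cell_size))

# Two nearby instances on either side of the cell boundary at x = 1.0.
a = (0.95, 0.50)
b = (1.05, 0.50)
cell_size = 1.0

distance = hypot(a[0] - b[0], a[1] - b[1])
same_cell = grid_cell(a[0], a[1], cell_size) == grid_cell(b[0], b[1], cell_size)
```

Here `distance` is about 0.1 while `same_cell` is `False`, so the pair would be missed by any method that only looks inside one cell at a time.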
No standard taxonomy for ST co-occurrence patterns exists in the current literature. The
challenge in creating this taxonomy is to find categories of interesting and nontrivial patterns.
The proposed research will explore and define new ST co-occurrence patterns using concepts
such as periodicity, moving objects, emergence and disappearance, time merging, and time
fragmenting.
Another challenge is the incorporation of temporal data. Current co-occurrence patterns
have been defined for spatial data features, but not for ST data types. Also,
these spatial patterns assume stationarity in space and do not account for the
nonstationary features frequently found in ST data.
The most challenging technical barrier is to create and formalize new interest measures
to mine interesting and non-trivial ST co-occurrence patterns. Current scalar interest measures
are not sufficient to mine interesting and non-trivial ST co-occurrence patterns. To the best of
our knowledge, there are no such composite interest measures to handle these kinds of
patterns. These and other new methods will be created and explored to achieve this goal.
There are also no existing methods to mine ST co-occurrence patterns out of massive
ST datasets in a computationally efficient manner. In terms of data size, ST datasets will be
larger than the classical datasets because of the addition of the time dimension. The challenge
will be to explore efficient and scalable methods that can handle these massive datasets.
A.3. Innovation: This project will create new models for ST datasets with a new taxonomy of
ST co-occurrence patterns and interest measures. In contrast to current scalar interest
measures, new composite interest measures will be designed to characterize interesting and
useful ST co-occurrence patterns. These new interest measures such as temporal, spatial and
spatio-temporal probabilities of co-occurrence patterns will be used to characterize where and
when a co-occurrence is prevalent and will be compared with traditional ST statistical measures
(such as the K function) [20] to assess the quality of the patterns. Also, while current interest
measures are scalars, there is a need for new, non-scalar types of patterns and composite
interest measures that are functions of time, location, or other parameters. Figure 1(a) shows
how interest measures change for emerging and vanishing ST co-occurrence patterns. The
x-axis shows time and the y-axis shows the interest measure. Further, the scalar interest measure for a
given feature subset may be periodic or have a trend (e.g., an increase or decrease over time),
defining periodic, emerging, and vanishing co-occurrence patterns. Composite interest
measures, e.g. spatial map of locations of co-occurrence instances, may show complete spatial
randomness [6], hot-spots or regularity. In addition, the hot-spots may be merging (or
fragmenting) over time. These give rise to merging or fragmenting co-occurrence patterns.
Feature sets may include static or moving objects leading to additional classes.
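One simple way to turn an interest measure that varies over time into the emerging and vanishing labels discussed above is to test the sign and size of its trend. The sketch below fits a least-squares slope to a series of interest-measure values. The threshold `min_slope` and the sample series are assumptions for illustration; the proposal does not commit to a particular trend test.

```python
def trend_slope(values):
    """Least-squares slope of values against their index 0..n-1 (needs n >= 2)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

def classify_trend(values, min_slope=0.05):
    """Label a series as emerging, vanishing, or stable (hypothetical threshold)."""
    slope = trend_slope(values)
    if slope >= min_slope:
        return "emerging"
    if slope <= -min_slope:
        return "vanishing"
    return "stable"

rising = classify_trend([0.1, 0.2, 0.4, 0.5, 0.7])
falling = classify_trend([0.7, 0.5, 0.4, 0.2, 0.1])
flat = classify_trend([0.4, 0.4, 0.4, 0.4])
```

A periodic pattern would need a different test (for example, one based on autocorrelation), since a periodic series can have a slope near zero.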
In recent years, many studies have focused on finding spatial co-location [4] patterns,
which are subsets of features whose instances are frequently located together in geographic
space. However, traditional co-location patterns are based purely on geographic proximity and
do not account for temporal relationships. For example, they cannot differentiate between
emerging ST co-occurrence patterns and vanishing ones. They also cannot identify ST
co-occurrence of moving objects, such as a civilian vehicle traveling in front of an Army vehicle to
block traffic and stall the Army vehicle near an IED (Figure 1(b)).
(a) Composite interest measure
(b) Example of ST co-occurrence patterns of moving objects
Figure 1: Composite interest measures and ST co-occurrence patterns
Computationally, novel ST co-occurrence pattern mining methods will be developed to
reduce computational cost. We plan to identify the performance bottleneck tasks and explore
new methods to control their computational cost.
B. CONNECTIONS TO THE BROADER RESEARCH COMMUNITY
B.1. Significance/Potential Impact: The Army generates, accesses, and manages huge
amounts of ST data in a variety of databases. This data is stored in attributes, values and
tables, with important relationships and information obscured by great masses of irrelevant
information. This information is vital to increasing knowledge and understanding of terrain
effects on modern tactical warfare and strategic battlefield planning. The current state of data
mining makes little use of spatial information in the mining process, does not integrate spatial
and temporal information, and as a result is limited in the patterns, models and metrics which
are currently available to discover new knowledge.
Scientific examination of further use of spatial information, ST correlation, patterns and
relationships mined by new ST data mining techniques can greatly increase the discovery of this
new knowledge. For example, tools for summarizing the ST patterns of enemy troop movement
can be invaluable for military commanders. Also, ST techniques such as co-occurrence models
can be used to predict near-future locations of enemy units given current location based on a
sensor network, battlefield terrain, and historic war tactics. This research will explore novel ST
co-occurrence approaches to handle ST datasets. This work will also help improve capabilities of
information processing in many domains, including Earth Science, environmental management,
government services, and transportation.
B.2. Related Work: Most studies on spatial co-location mining have focused on discovering
co-location patterns at a particular time and ignored the temporal aspects of the spatial
co-location patterns. These studies can be collected in two groups: spatial statistics and data
mining [4]. Spatial statistics-based approaches use measures of spatial correlation to
characterize the relationship between different types of spatial features [5,6]. Data mining
approaches are based on spatial proximity of the spatial features and can be classified as
clustering-based map overlay approaches [15] and association rule-based approaches. A
clustering-based map overlay approach treats every spatial attribute as a map layer and
considers spatial clusters (regions) of point-data in each layer as candidates for mining
associations. Association rule-based approaches include transaction-based approaches and
distance-based approaches. Transaction-based approaches [1,7] focus on defining transactions
over space so that an Apriori-like approach can be used. Zhang et al. [10] proposed a
reference-feature centric model by using multi-way spatial join methods to mine spatial co-locations. The
association rules are derived using the Apriori approach. A distance-based approach was
proposed concurrently by Morimoto [8] and Shekhar and Huang [9, 4]. Morimoto defined
distance-based patterns called k-neighboring class sets. In his work, the number of instances
for each pattern is used as the interest measure, which does not possess an anti-monotone
property by nature. Anti-monotonic interest measures [1] can help reduce the computational
cost and search space. However, Morimoto used a non-overlapping-instance constraint to get
the anti-monotone property for this measure. In contrast, Shekhar and Huang developed an
event centric model, which does away with the non-overlapping-instance constraint, as well as
a new interest measure called the participation index (which possesses the desirable
anti-monotone property).
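For readers unfamiliar with the participation index [4], it is the minimum, over the features of a pattern, of the fraction of each feature's instances that take part in at least one instance of the pattern. The sketch below computes it from a small, made-up table of pattern instances; the function and argument names are hypothetical.

```python
def participation_index(pattern, row_instances, feature_counts):
    """Compute the participation index of a pattern.

    pattern: tuple of feature names, e.g. ("A", "B")
    row_instances: list of tuples of instance ids, one id per feature,
                   in the same order as `pattern`
    feature_counts: dict mapping feature name to its total instance count
    """
    ratios = []
    for pos, feature in enumerate(pattern):
        # Distinct instances of this feature that appear in any row.
        participating = {row[pos] for row in row_instances}
        ratios.append(len(participating) / feature_counts[feature])
    return min(ratios)

# Made-up example: feature A has 4 instances and feature B has 2.
rows_ab = [(1, 1), (1, 2), (3, 2)]
pi_ab = participation_index(("A", "B"), rows_ab, {"A": 4, "B": 2})
```

In this example, 2 of the 4 instances of A and both instances of B participate, so the index is min(0.5, 1.0) = 0.5. Adding a feature to a pattern can only keep or shrink each participating set, which is why the measure is anti-monotone.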
Existing literature focuses on finding the spatial co-location patterns and cannot model
ST co-occurrence patterns, such as periodic co-occurrence patterns, co-occurrence patterns of
the moving objects, emerging or vanishing co-occurrence patterns and co-occurrence patterns
merging and fragmenting with time. In contrast, this proposal will identify these unique ST
co-occurrence patterns.
B.3. Leveraging Others’ Work: This proposal is unique and is not leveraging efforts funded
elsewhere.
B.4. Collaborative Activities: The PIs (James Rogers and Dr. James Shine) at TEC are
collaborating with the Spatial Databases Research Group at The University of Minnesota.
C. RESEARCH METHODOLOGY
C.1. Strategy / Rationale: The strategy of this research can be depicted as shown in Figure
2, with the boxes showing the process phases, the arrows representing flow direction, and the
loop outlining the iterative part of the ST co-occurrence pattern mining process. The phases are
described as follows:
[Figure 2 flowchart boxes: Create and Define ST Pattern Taxonomy; Create and Define Models of ST Co-occurrence Patterns; Create and Define Composite Interest Measures; Create and Design Computationally Efficient Methods; Mine Patterns; Validate Patterns]
Figure 2: Phases of the ST co-occurrence pattern mining
The phases of the ST co-occurrence pattern mining process:
1) Create and Define ST Pattern Taxonomy Based on Army Objectives: We will
create and define a taxonomy using Army objectives, requirements, assumptions and
constraints as listed in D.3. From this knowledge, we will outline data mining problems,
collect initial ST data, examine characteristics of this data, verify data quality, and
explore initial hypotheses.
2) Create and Define Conceptual Model of ST Co-occurrence Patterns: We will
create and define a conceptual model of ST co-occurrence patterns and prepare initial
ST raw data as an input for the mining method. The model helps define concepts and
expand the taxonomy for ST co-occurrence patterns.
3) Create and Define Composite Interest Measures: These measures will quantify
new composite definitions such as temporal, spatial and ST probabilities of ST data.
These measures shall capture the characteristics of ST datasets.
4) Create and Design Computationally Efficient Methods: Existing data mining
methods will be examined and new computationally efficient methods for mining ST
co-occurrence patterns will be created and tested.
5) Mine Patterns: Run the new methods to mine the ST co-occurrence patterns, defined
by ST taxonomy, from the ST datasets.
6) Validate Patterns: We will check accuracy and completeness of the newly discovered
ST co-occurrence patterns, via user evaluation.
C.2. Methods / Techniques: The goal of a conceptual model of ST co-occurrence patterns is
to provide a framework to identify interesting and non-trivial ST co-occurrence patterns and to
facilitate their uses in descriptive as well as predictive tasks. Key challenges in exploring a
conceptual model of ST co-occurrence patterns include taxonomy, requirements and interest
measures. In that context research tasks can be listed as:
1. To create/define taxonomy for ST co-occurrence patterns and their use-cases,
2. To create/define new composite interest / confidence measures,
3. To use new composite interest measures to get better computational efficiency,
4. To mine interesting and non-trivial ST co-occurrence patterns,
5. Validation and experimentation for patterns and composite interest measures.
C.2.1 Create/define taxonomy for ST co-occurrence patterns and their use-cases.
Current taxonomies of spatial data deal with patterns and objects that are fixed in
time. These patterns and objects are associated with geometry and position in space. OGIS [21]
defines a taxonomy for spatial data and provides a framework for exchanging spatial data. In
contrast to this, there is no accepted framework for taxonomy of ST data, which deals with the
patterns in space and time and captures relationship between a pattern and space-time such as
periodic co-occurrence patterns, patterns moving, emerging, merging, vanishing and
fragmenting over time.
A taxonomy of ST co-occurrence patterns provides a classification of the patterns listed
in section C.2.4. It is natural to ask if there are other interesting and useful classes of ST
co-occurrence patterns. One approach to address this issue is to examine current application
domains, consult application domain scientists, and create a consensus classification scheme.
Another approach is to use taxonomies for ST data types and their relationships, and study their
implications for classes of ST data [11]; taxonomies of spatial data types may even be extended
for this purpose. Object and field are two common models of spatial data [12]. An object model
is ideal for representing discrete identifiable entities such as lakes, road networks, and cities.
This model may be generalized to ST datasets by categorizing objects into stationary and
mobile objects [11] as well as subclasses such as rigid and deforming. A field model is defined
by a spatial framework (SF) and a set of field functions mapping the SF to attribute value
domains. This model may be generalized to ST datasets by defining an ST framework (STF) and
field functions mapping the STF to attribute domains. STF fields may be categorized as largely static
(e.g., elevation) or dynamic (e.g., temperature) for a given time scale [11].
We plan to use a combination of these approaches by leveraging the work of application
domain scientists in military terrain, ecology, climatology, and Earth science to create a new
taxonomy for ST data [11, 13]. It is also important to create a taxonomy of the common usages
of ST patterns by domain scientists. Common activities include evaluation, explanation, and
prediction, where scientists observe where and when the co-occurrence patterns are valid. As
part of this research we would explore use-cases of how ST co-occurrence patterns will be used
by the Army.
C.2.2 Create/define new composite interest / confidence measures
Interest measures are designed to characterize interesting and useful ST co-occurrence
patterns. Interest measures may be used to specify a subset of patterns in a post-processing
phase to evaluate, interpret, and use these patterns. Alternatively, they may be used to reduce
computational costs for ST data mining methods. One key challenge in the design of interest
measures for ST co-occurrence patterns arises from “where and when” questions posed by the
domain scientist. For example, Earth scientists may be more interested in a co-occurrence
pattern if it occurs in geographic areas of homogeneous forest types, such as shrubland or
grassland, since it may allow an explanation of the co-occurrence rule. Similarly, scientists may
wish to identify co-occurrence rules whose temporal occurrence correlates with a special time
series, for example El Niño index time series, to understand the impact of climate disturbance
events. To support a where and when analysis, we propose to use composite interest measures
which are functions of time, location, or other parameters (e.g., time-lags or distance).
A basic set of disjoint events for defining temporal probability is a collection of time
instances. Temporal probability of a Boolean ST event represents the time dependency of the
probability of the event happening across time. In other words it defines probability or
probability density of a ST event at a given time. For example, the temporal probability (interest
measure) at a location, e.g., Alexandria, VA, in Figure 3(b) is computed from local data about
co-occurrence events in Alexandria for different months over a 17-year series. Similarly, a basic
set of disjoint events for defining spatial probability is a collection of spatial instances. Spatial
probability of a Boolean ST event represents the spatial dependency of the probability of the
event happening in space. It defines probability or probability density of a ST event at a given
location. Figure 3(a) shows an interest measure, which is a function of space and is defined by
aggregating over all time snapshots for each location. A basic set of disjoint events for defining
ST probability is a collection of ST instances. The ST probability of a Boolean ST event
represents the temporal and spatial dependency of the probability of the event happening
across time and in space. It defines probability or probability density of a ST event at a given
time and location.
(a) Spatial variation (gray scale=% of
months supporting the co-occurrence
pattern at a location)
(b) Temporal variation
Figure 3: Visualization of where and when co-occurrence pattern occurred
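The spatial and temporal views of a composite interest measure can be sketched from a Boolean support table. In the illustration below, `support[t][loc]` is `True` when the pattern is observed at location `loc` in time snapshot `t`. The table is made up, and using simple fractions as probability estimates is an assumption for illustration rather than the proposal's definition.

```python
# Rows are time snapshots (e.g. months), columns are locations.
support = [
    [True, False, True],
    [True, False, False],
    [True, True, False],
    [True, False, False],
]

def temporal_profile(support):
    """Fraction of locations supporting the pattern at each time snapshot."""
    return [sum(row) / len(row) for row in support]

def spatial_profile(support):
    """Fraction of time snapshots supporting the pattern at each location."""
    n_times = len(support)
    return [sum(col) / n_times for col in zip(*support)]

by_time = temporal_profile(support)
by_location = spatial_profile(support)
```

`by_location` corresponds to the kind of map in Figure 3(a) (share of snapshots supporting the pattern at each location), and `by_time` to a time series like Figure 3(b).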
Composite interest measures provide useful information to domain scientists for
evaluating and explaining ST co-occurrence patterns. However, there is a risk of information
over-loading, particularly when the mining algorithm produces a large number of co-occurrence
patterns. One way to address this problem is to define ST Boolean constraints on the composite
interest measures, which may be used by either a post-processing method or the co-occurrence
mining method to eliminate many uninteresting patterns. Examples of Boolean constraints
on time series interest measures include many well-known tests for periodicity [14] and stationarity,
as well as binary tests for correlation [6, 17] with a given time series (such as the El Niño index
time series). Many binary time series similarity measures have been identified [16]. Boolean
constraints on spatial maps include well-known unary tests for clusteredness [6]. Binary map
similarity measures are sparse in current literature, and we will explore new and more detailed
use of such measures.
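A binary correlation test of the kind mentioned above can be written as a Boolean constraint on a time series interest measure. The sketch below uses the Pearson correlation coefficient against a reference series; the threshold `min_corr` and both series are hypothetical values chosen for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (both must vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlates_with(measure, reference, min_corr=0.8):
    """Boolean constraint: keep a pattern only if its interest measure
    tracks the reference series closely enough (hypothetical threshold)."""
    return pearson(measure, reference) >= min_corr

measure = [0.1, 0.3, 0.2, 0.6]      # interest measure per time step
reference = [1.0, 3.0, 2.0, 6.0]    # stand-in for an index time series
keep = correlates_with(measure, reference)
drop = correlates_with(measure, [6.0, 2.0, 3.0, 1.0])
```

A pattern passes the constraint when `keep` is `True`; such a filter could run either in post-processing or inside the mining loop, as the text describes.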
The use of composite interest measures raises the issue of whether the ST Boolean
constraints on the composite interest measure are anti-monotonic. We will explore common ST
Boolean constraints on composite interest measures for the anti-monotone property.
C.2.3 Use new composite interest measures to get better computational efficiency
Due to the increasing volume of data and ST co-occurrence patterns, computational
costs are likely to be very high. We will create new algorithms to reduce this computational
cost. In current data mining algorithms, most time is consumed during the preprocessing and
generation of candidate sets. We also plan to explore new methods for generation of
co-occurrence rules using composite interest measures.
The idea behind preprocessing is to partition the ST dataset to generate neighborhood
transactions. In the literature, there are many partition-based preprocessing methods, such as
maximal cliques [18] and max-clique [19]. We will explore these preprocessing methods to
determine the most appropriate one for ST datasets and different ST co-occurrence patterns.
Apriori-based approaches may be used to generate candidate co-occurrences because of
the anti-monotone property of participation index. Size k co-occurrences may be used to
generate size k+1 co-occurrences. Pruning may be done if produced co-occurrences do not
satisfy appropriate interest measures such as the participation index. For large and dense
datasets, and long-length frequent itemsets, the computation cost of this step may be very
high. To overcome this problem we will explore new solutions such as top-down approaches
[1]. A top-down approach starts with the maximum possible itemsets and checks subsets to
find frequent ones. The top-down approach is based on two lemmas: “all subsets of a frequent
itemset are also frequent” and “all supersets of an infrequent itemset are also infrequent” [1].
Whenever such an approach finds a frequent itemset, it does not check its subsets, because
the anti-monotone property guarantees that they are also frequent. For this work, a new composite
interest measure with an anti-monotone property will be created and developed for pruning.
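The bottom-up, Apriori-style step described above (size k co-occurrences generating size k+1 candidates) can be sketched as follows. This is a generic illustration of Apriori candidate generation with subset pruning, not the proposal's eventual algorithm; the feature names are made up.

```python
from itertools import combinations

def generate_candidates(prevalent_k):
    """Build size-(k+1) candidates from prevalent size-k feature sets.

    A candidate is kept only if every size-k subset is itself prevalent,
    which is valid whenever the interest measure is anti-monotone.
    """
    prevalent_k = set(prevalent_k)
    candidates = set()
    for a, b in combinations(prevalent_k, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # the two sets differ in more than one feature
        subsets = combinations(union, len(a))
        if all(frozenset(s) in prevalent_k for s in subsets):
            candidates.add(union)
    return candidates

prevalent_2 = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"B", "D"}),
}
candidates_3 = generate_candidates(prevalent_2)
```

Only {A, B, C} survives: {A, B, D} is pruned because {A, D} is not prevalent, and {B, C, D} is pruned because {C, D} is not prevalent. Each surviving candidate would still need its interest measure computed from the data.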
To generate co-occurrence rules, the conditional probability of a co-occurrence rule
$c_1 \rightarrow c_2$ is used. It can be defined as the fraction of instances of $c_1$ that also
participate in an instance of $c_1 \cup c_2$:
$$\frac{|\pi_{c_1}(\mathrm{table\_instance}(c_1 \cup c_2))|}{|\mathrm{table\_instance}(c_1)|}$$
where $\pi_{c_1}$ denotes projection onto the features of $c_1$. To make this computation
efficient, bitmaps or other data structures will be used.
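The conditional probability of a rule can be computed directly from table instances by projecting the rows of the combined pattern onto the features of the antecedent. The sketch below uses plain Python sets rather than bitmaps, and all names and sample rows are hypothetical.

```python
def conditional_probability(c1, union_pattern, union_rows, c1_rows):
    """Fraction of instances of c1 that also appear in an instance of c1 ∪ c2.

    c1: tuple of antecedent feature names
    union_pattern: tuple of feature names of c1 ∪ c2 (column order of union_rows)
    union_rows: table instance of c1 ∪ c2, one tuple of instance ids per row
    c1_rows: table instance of c1, one tuple of instance ids per row
    """
    positions = [union_pattern.index(f) for f in c1]
    # Projection onto c1, with duplicates removed.
    projected = {tuple(row[p] for p in positions) for row in union_rows}
    return len(projected) / len(set(c1_rows))

# Made-up example for the rule {A} -> {B}.
union_rows = [(1, 1), (1, 2), (3, 2)]        # instances of {A, B}
a_rows = [(1,), (2,), (3,), (4,)]            # instances of {A}
prob = conditional_probability(("A",), ("A", "B"), union_rows, a_rows)
```

Two of the four instances of A (ids 1 and 3) have a neighboring B, so `prob` is 0.5. The duplicate removal in the projection matters: instance 1 of A appears in two rows but is counted once.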
C.2.4 Mine interesting and non-trivial ST co-occurrence patterns
We will mine interesting and non-trivial ST co-occurrence patterns by creating new methods.
New types of ST co-occurrence patterns can be listed in four major categories:
1. Periodic co-occurrence patterns: If a new feature or a new instance of existing features
is introduced to the space or extracted from the space, new co-occurrence patterns can appear
or existing ones may disappear, or a new co-occurrence pattern may reflect periodicity. Assume
that an “insurgent neighborhood” feature is added to the space in the IED example discussed in
Section A.1. The features, {insurgent_neighborhood} and {IED, road, time_of_day}, could form
a new ST co-occurrence pattern, {IED, road, time_of_day, near_insurgent_neighborhood}. In
contrast, extraction of an instance, such as {time_of_day}, may break the {IED, road,
time_of_day, near_insurgent_neighborhood} pattern. Another example can be ST
co-occurrence patterns that occur on Fridays but not on other weekdays.
2. Co-occurrence patterns of moving objects: In this kind of ST co-occurrence pattern, at
least one of the co-occurring objects is a moving object (e.g., IED attacks moving from
one road or area to another). Alternatively, all co-located objects can be moving objects
(e.g., IED attacks, location of suspected insurgents). An example of this pattern can be seen in
Figure 1(b). The x-axis gives the time information and the y-axis gives the location information
(highway mile point). In the figure the tracks represent the movement of the military and
civilian vehicles in space. The civilian vehicle leads the Army vehicle until they reach a specific
location and then stalls the Army vehicle at this location when an IED is detonated. After a
certain time an attack event also occurs on the Army vehicle.
3. Emerging or vanishing co-occurrence patterns: Users may want to know which
co-occurrence patterns have interest measures getting stronger or weaker with time (e.g., road
locations or municipal areas where the incidence of IED attacks increases or decreases over a
period of time).
4. Co-occurrence patterns merging or fragmenting with time: Baath
neighborhoods, Shiite neighborhoods, and neighborhoods harboring non-Iraqi insurgents may
merge into neighborhoods likely to be the source of an IED attack. As the political process
evolves, the Shiite neighborhoods may no longer be sources of these attacks.
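For the moving-object category, a first step is to find when two tracks are close to each other. The sketch below assumes each track is sampled at shared timestamps and stores a one-dimensional position (a highway mile point, as on the y-axis of Figure 1(b)). The track values and the `max_gap` threshold are made up for illustration.

```python
def close_times(track_a, track_b, max_gap):
    """Timestamps present in both tracks where the objects are within max_gap.

    Each track maps a timestamp to a position along the road (mile point).
    """
    shared = sorted(set(track_a) & set(track_b))
    return [t for t in shared if abs(track_a[t] - track_b[t]) <= max_gap]

# Hypothetical tracks: the civilian vehicle leads, then stops at mile 3.0.
civilian = {0: 1.0, 1: 2.0, 2: 3.0, 3: 3.0}
army = {0: 0.2, 1: 1.7, 2: 2.9, 3: 3.0}

together = close_times(civilian, army, max_gap=0.5)
```

Here the vehicles are within half a mile at times 1, 2 and 3. Runs of consecutive timestamps like this could then be treated as candidate instances of a moving-object co-occurrence and checked against nearby events.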
C.2.5. Validation and experimentation for patterns and composite interest measures.
To evaluate the performance of different preprocessing methods and candidate
co-occurrence generation algorithms, new composite interest measures will be defined. We will
conduct numerous experiments by controlling different combinations of parameters. We will
answer questions such as the following: What are the dominance zones (the parameter values
for which a specific algorithm is fastest) among the different preprocessing methods for large
ST datasets? Which method, bottom-up or top-down, is suitable for ST datasets under which
conditions? Top-down approaches may be the most suitable when datasets are dense or
co-occurrence patterns are long. What is the effect of the number of co-occurrence patterns
and/or co-occurrence pattern lengths on the computational cost of different approaches? How
do the composite interest measures affect the solution set or performance of the algorithms?
To evaluate the algorithms, synthetic and real-world datasets (e.g., IED datasets) will be
used. Synthetic datasets can be used to test algorithms from various aspects by controlling
dataset generation parameters. The experimental setup can be seen in Figure 4.
[Figure 4 diagram boxes: Generation Parameters; Neighborhood Parameters; Generate Dataset; Generate Neighborhoods; Synthetic Data; Real-world Data; Preprocessing Methods; Co-occurrence Methods; Interest Measures; ST Co-occurrence Analysis]
Figure 4: Experimental setup and design
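A synthetic dataset generator with controllable parameters might look like the sketch below. It plants clusters in which one instance of every feature is placed close together in space and time, then adds uniformly scattered noise instances. All parameter names and the generation scheme are assumptions for illustration; the proposal does not specify its generator.

```python
import random

def generate_synthetic(features, n_clusters, noise_per_feature,
                       extent, duration, spread, seed=0):
    """Return a list of (feature, x, y, t) instances.

    n_clusters: number of planted co-occurrences (one instance per feature)
    noise_per_feature: extra instances of each feature placed at random
    extent, duration: size of the square study area and of the time window
    spread: maximum offset of a planted instance from its cluster center
    """
    rng = random.Random(seed)  # fixed seed keeps experiments repeatable
    data = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0, extent), rng.uniform(0, extent)
        ct = rng.uniform(0, duration)
        for f in features:
            data.append((f,
                         cx + rng.uniform(-spread, spread),
                         cy + rng.uniform(-spread, spread),
                         ct + rng.uniform(-spread, spread)))
    for f in features:
        for _ in range(noise_per_feature):
            data.append((f, rng.uniform(0, extent), rng.uniform(0, extent),
                         rng.uniform(0, duration)))
    return data

dataset = generate_synthetic(["A", "B", "C"], n_clusters=5,
                             noise_per_feature=10, extent=100.0,
                             duration=50.0, spread=1.0, seed=42)
```

Varying `n_clusters`, `noise_per_feature`, and `spread` changes the density and the strength of the planted patterns, which is one way to probe the dominance zones of different algorithms.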
C.3. Anticipated Results: This research will help define a taxonomy for interesting and nontrivial ST co-occurrence patterns and will give a conceptual framework for these patterns. In
contrast to the limitations of current data mining methods, we expect to create and explore new
methods and composite interest measures overcoming these limitations for different ST
co-occurrence pattern mining problems. The performances of these new methods will be compared
and dominance zones of each of them will be determined. An Iraqi IED dataset will be used for
discovery, test and evaluation.
In the first year we expect to create and define an ST pattern taxonomy, to collect and
test initial data and hypotheses, and to create and define new interest measures for ST
co-occurrence patterns. In the second year we will explore computationally efficient ST
co-occurrence pattern mining methods. In the third year we will carry out validation and
experimentation for patterns and new interest measures.
D. OPERATIONAL RELEVANCE AND TECHNOLOGY TRANSITION
D.1. Opportunities for Transition: This proposed research for modeling ST co-occurrence
patterns will support the Geospatial Information Integration and Generation Tools (GIIGT) and
Distributed Geospatial Intelligence work packages.
D.2. Products of the Research: Papers will be published in conferences and journals
such as the Army Science Conference (ASC), the Association for Computing Machinery
Symposium on Advances in Geographic Information Systems (ACMGIS), the Conference on
Geographic Information Science (GIScience), the Symposium on Spatial and Temporal
Databases (SSTD), and the IEEE Transactions on Knowledge and Data Engineering (TKDE).
D.3. ERDC and Army Relevance
Impacts on the Army and Relevance to the Future Force: The knowledge and models
provided from this research will support geospatial intelligence tasks for Future Combat System
(FCS) as specified in the Situational Awareness section of the FCS Mission Needs Statement
(MNS). The results of this research could be effectively utilized in urban and non-urban
environments. The results will also address Force Operating Capabilities (FOC) described in
TRADOC Pamphlet 525-66. For FOC-01-04, an automated running estimate of the situation
incorporating predictive analysis, the results of this research will provide a capability for more
rapid decision action cycles with much less effort required to understand what is happening;
improved situational understanding; improved information superiority; and timely, relevant and
predictive intelligence. For FOC-02-01, Sensor Fusion, this research will utilize data from
multiple sensors and products of the research will provide streaming ST data correlated and
mined to create information and knowledge. The results will draw relationships, provide
meaning to the information, convert information into actionable information, sense the
environment quickly, and provide automated pattern analysis. For FOC-02-02, Situational
Understanding, the research products will support the goal of understand first, identify a
pattern or critical elements, and develop adaptive reasoning tools that provide knowledge
discovery. For FOC-02-04, Understand the Battlespace Environment, the research will directly
support the goal of understanding of the environment including terrain, weather, infrastructure,
hazards, populations, and their interaction through the ability to both predict and understand, in
real time, the impact of the environment.
ERDC Relevance: The results of this research will identify important relationships that are
critical for understanding ST data and identifying useful, nontrivial patterns in the ST data from
existing databases or from a network of sensors and probes for use in descriptive and predictive
tasks. The results could also be used to predict missing values in ST datasets, and identify
potential errors in ST datasets.
REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann,
ISBN:1558604898, 2000.
[2] S. Shekhar, P. Zhang, Y. Huang, and R. R. Vatsavai, Spatial Data Mining, Book Chapter in
Data Mining: Next Generation Challenges and Future Directions (Ed. H. Kargupta et al),
MIT Press, ISBN:0262612038, 2004.
[3] S. Shekhar and S. Chawla, Spatial Databases: A Tour, Prentice Hall, ISBN:0130174807, 2003.
[4] Y. Huang, S. Shekhar, and H. Xiong, Discovering Co-location Patterns from Spatial
Datasets: A General Approach, IEEE Trans. on Knowledge and Data Eng., 16(12),
pp.1472-1485, Dec. 2004.
[5] Y. Chou. Exploring Spatial Analysis in Geographic Information System, Onward Press,
ISBN:1566901197, 1997.
[6] N.A.C. Cressie, Statistics for Spatial Data, Wiley and Sons, ISBN:0471843369, 1991.
[7] K. Koperski and J. Han, Discovery of Spatial Association Rules in Geographic Information
Database, in Proc. of the 4th Int’l Symp. on Spatial Databases, pp. 47-66, 1995.
[8] Y. Morimoto, Mining Frequent Neighboring Class Sets in Spatial Databases. in Proc. ACM
SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 353-358, 2001.
[9] S. Shekhar and Y. Huang, Co-location Rules Mining: A Summary of Results, In Proc. 7th
Int’l. Symp. on Spatio-temporal Databases, 2001.
[10] X. Zhang, N. Mamoulis, D. W. Cheung, and Y. Shou, Fast Mining of Spatial Collocations, in
Proc. ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 384-393, 2004.
[11] A. Frank, Ontology for Spatio-temporal Databases, in Spatio-Temporal Databases: The
Chorochronos Approach (Ed. M. Koubarakis, T. Sellis et al.), Springer, 2003.
[12] M.F. Worboys, GIS: A Computing Perspective, Taylor and Francis, ISBN:0748400648, 1995.
[13] M.F. Mokbel, W.G. Aref, S.E. Hambrusch, and S. Prabhakar, Towards scalable
location-aware services: requirements and research issues, ACMGIS, pp. 110-117, 2003.
[14] G. Box, G. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, Prentice
Hall, ISBN:0130607746, 1994.
[15] P. Rigaux, M.O. Scholl, and A. Voisard, Spatial Databases: With Application to GIS, Morgan
Kaufmann, ISBN:1558605886, 2001.
[16] D. Gunopulos and G. Das, Time Series Similarity Measures and Time Series Indexing,
SIGMOD Record, 30(2), 2001.
[17] P. Zhang, Y. Huang, S. Shekhar, and V. Kumar, Exploiting Spatial Autocorrelation to
Efficiently Process Correlation-Based Similarity Queries, in Proc. of the 8th Int’l Symp.
on Spatial and Temporal Databases, pp. 179-198, 2003.
[18] C. Berge, Graphs and Hypergraphs, American Elsevier, 1976.
[19] Y. Zhao and G. Karypis, Evaluation of Hierarchical Clustering Algorithms for Document
Datasets, in Proc. of ACM Conference on Information and Knowledge Management
(CIKM), pp. 515-524, 2002.
[20] S. Hwang, Temporal Extensions of K Function, UCGIS Assembly (conjunction with
GIScience 2004), 2004.
[21] OGIS. Open GIS consortium: Open GIS Simple Features Specification for SQL (Revision
1.1) in URL http://www.opengis.org/techno/specs.htm.