Spatial Data Mining Methods and Problems

Cankai Fu
Yunnan Province Map hospital 650032
Abstract
This paper summarizes the characteristics of spatial data mining and the
spatial data mining methods applied in GIS, points out the limitations
of current spatial data mining, analyzes its current problems, and
explores its development trends.
Key words: spatial data; mining; method
1 Introduction
Owing to the rapid development in recent years of earth observation
technology, database technology, network technology and other spatial
information technologies, large volumes of spatial data have been
collected from remote sensing, GIS, GPS, multimedia systems, medical
and satellite imagery, and other applications. The complexity and volume
of these data far exceed the analytical capacity of the human brain.
Although spatial databases can store spatial objects and represent them
through spatial data types and the spatial relationships between
objects, users cannot examine all of the data in detail to extract the
knowledge that interests them. Data mining is an effective tool for this
task, and spatial data mining technology provides an opportunity to
solve the problem.
Research on spatial data mining began later than research on mining
general relational or transaction databases, but it has attracted
widespread attention in recent years. At the international GIS
conference held in Ottawa in 1994, Academician Li Deren first proposed
the concept of discovering knowledge from GIS databases and was the
first to study it, building a theoretical framework for spatial data
mining and knowledge discovery. Murray reviewed clustering techniques
for discovery in exploratory spatial data analysis, analyzing spatial
pattern recognition and knowledge discovery methods based on statistics,
data mining and GIS. Koperski et al. summarized the development of
spatial data mining in terms of spatial data generalization, spatial
data clustering and spatial association rule mining. Ankerst studied
visualization in spatial data mining, analyzing the shape properties of
spatial objects and using three-dimensional histograms for similarity
search and classification in spatial databases. Keim proposed a
pixel-based raster notation for expressing the results of clustering
large spatial data sets. Sun Lianying proposed combining hypergraph
theory, the hypergraph model and object visualization technologies for
spatial data mining. Duan Xiaojun embedded the MATLAB matrix computing
environment in VB software, providing a simple implementation of the
higher-order matrix operations required for correlation analysis in data
mining, together with dynamic mapping functions.
Many universities and research institutes in China have launched studies
of the theory and application of spatial data mining and knowledge
discovery, and the state has also paid great attention to the field.
Among them, the innovative research of Academician Li Deren has been
recognized by international peers. Moreover, Academician Li Deren, in
cooperation with Academician Li Deyi, trained the first doctoral
graduate in the direction of spatial data mining and knowledge
discovery, Dr. Di Kaichang, as early as 1999. After that, Wang Shuliang
did a great deal of intensive work: building on Academician Li Deyi's
cloud theory, he refined the concept of data fields, proposed the
concept and implementation of visual spatial data mining, and applied it
successfully to mining landslide monitoring data with good results.
Dr. Qin Kun did much meaningful work on how to apply spatial data mining
in practice. He systematically studied the theory and methods of image
data mining, mining the content of remote sensing image data, such as
spectral characteristics, texture, shape features and spatial
distribution characteristics, to discover knowledge at a higher level of
abstraction. Dr. Qin Kun developed a framework for a remote sensing
image data mining software prototype system and designed and developed
the prototype RS Image Miner. The successful development of RS Image
Miner marks the gradual move of spatial data mining from theory to
practical application.
2 Spatial Data Mining Overview
2.1 Definition of Spatial Data Mining
Spatial data mining (SDM), also known as knowledge discovery in spatial
databases, is a new branch of data mining (DM). It refers to the
extraction from spatial databases of spatial patterns and features of
interest to the user, of general relationships between spatial and
non-spatial data, and of other general data characteristics implicit in
the database. SDM differs from DM on conventional transactional
databases in that it adds a spatial-scale dimension to the discovery
state space of general database mining.
2.2 Features of Spatial Data Mining
Spatial data mining is the inevitable result of the development of
spatial information technology. It is a particular area of data mining,
different from general transaction or relational data mining. The
content of spatial data mining is much richer than that of general data
mining; the knowledge that can be discovered mainly includes spatial
characteristic rules, spatial discriminant rules, spatial distribution
rules, spatial classification rules, spatial clustering rules, spatial
association rules, spatial evolution rules, object-oriented knowledge
and spatial variation knowledge. Spatial data mining has the following
characteristics:
(1) Data sources are rich, data volumes are huge, information is vague,
and data types and access methods are complex;
(2) Spatial indexing mechanisms are used to organize the data;
(3) The range of applications is wide: any data related to spatial
location can be mined;
(4) There are many mining methods and algorithms, and most of the
algorithms are complex;
(5) Knowledge is expressed in diverse ways, and its understanding and
evaluation depend on a person's awareness of the objective world;
(6) Spatial data are multi-scale, high-dimensional and highly
autocorrelated.
2.3 Main Methods of Spatial Data Mining
Spatial data mining is a new, multidisciplinary field that integrates a
variety of technologies, drawing on artificial intelligence, machine
learning, databases, pattern recognition, statistics, GIS,
knowledge-based systems, visualization and other related areas. The
methods currently in common use are:
(1) Spatial analysis methods: the various spatial analysis models and
spatial operations of GIS are used to further process the data in a
spatial database and produce new information and knowledge. Commonly
used spatial analysis methods include comprehensive attribute data
analysis, topology analysis, buffer analysis, density analysis, distance
analysis, overlay analysis, network analysis, terrain analysis, trend
surface analysis and predictive analysis. They can discover association
rules describing how targets are connected, adjacent or co-located in
space, or knowledge such as the shortest or optimal path between targets
for decision support. Spatial analysis is often used for preprocessing
and feature extraction in conjunction with other data mining methods.
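As an illustration of buffer analysis, the following minimal sketch
selects the points that fall inside a circular buffer around a target.
The function name, the planar coordinates and the sample sites are
hypothetical and serve only to show the idea; a real GIS would also
handle map projections and non-point geometries.

```python
import math

def within_buffer(center, points, radius):
    """Return the points lying within `radius` of `center` (circular buffer)."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

# Hypothetical data: one facility and candidate sites in planar coordinates.
facility = (0.0, 0.0)
sites = [(30.0, 40.0), (100.0, 0.0), (60.0, 80.0), (-20.0, 10.0)]

# Sites inside a 75-unit buffer around the facility.
nearby = within_buffer(facility, sites, radius=75.0)
print(nearby)
```

The selected points could then be passed to another mining method, which
is the preprocessing role described above.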
(2) Statistical analysis methods: statistical methods have long been
used to analyze spatial data, with the analysis focused on the
non-spatial characteristics of spatial objects and phenomena.
Statistical methods have a strong theoretical foundation and a large
body of mature algorithms, including many optimization techniques. When
statistical methods are used for data mining, the spatial nature of the
data is generally not treated as a limiting factor, and neither is the
specific spatial location of the things described by the data. Although
this mode of mining does not differ in essence from general data mining,
its results are described in the form of maps, and their interpretation
necessarily depends on geographic space and must reflect spatial laws.
One shortcoming of statistical methods is that they have difficulty
handling character (non-numeric) data and generally depend on
statistical experts with rich experience. Their biggest drawback is the
assumption that spatially distributed data are statistically
independent, which causes problems in practice because much spatial data
is interrelated. Geostatistics, represented by the variogram and the
Kriging method, is currently the more popular method of statistical
analysis.
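The variogram mentioned above can be illustrated with a small sketch of
the classical empirical semivariogram, gamma(h) = sum of squared value
differences over all pairs at lag h, divided by twice the number of such
pairs. The function name, the distance binning and the sample data are
assumptions made for illustration; fitting a variogram model and Kriging
itself are not shown.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, bin_width):
    """Map each distance-bin centre to the empirical semivariance gamma(h)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            b = int(d // bin_width)          # index of the distance bin
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs)
    return {(b + 0.5) * bin_width: sums[b] / (2 * counts[b])
            for b in sorted(counts)}

# Hypothetical samples along a transect: values diverge as distance grows.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariogram(pts, vals, bin_width=1.0)
print(gamma)
```

A semivariance that increases with distance, as in this sample, is the
kind of spatial dependence that the independence assumption criticized
above ignores.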
(3) Inductive learning methods: inductive learning methods summarize a
large amount of empirical data to extract general rules and patterns;
most of the algorithms come from the field of machine learning. There
are many inductive learning algorithms, of which the most famous is
C5.0, proposed by Quinlan. It is a decision tree algorithm developed
from ID3; it classifies quickly and is suited to learning from large
databases. On the basis of ID3, C5.0 adds the ability to convert a
decision tree into equivalent production rules and can handle learning
from continuous-valued data. Professor Han Jiawei proposed
attribute-oriented induction (AOI), dedicated to discovering knowledge
from databases: data are summarized and synthesized by ascending a
concept tree, inducing high-level patterns and features.
(4) Spatial association rule mining methods: association rule mining was
first proposed by Agrawal et al. The most famous algorithm is Apriori,
whose main idea is to first count how frequently various goods occur
together in purchases and then convert the combinations that co-occur
most frequently into association rules. On this basis, Han et al.
combined attribute-oriented induction with association rule mining and
proposed multi-level association rule mining algorithms such as
ML_T2L1. These first find the frequent patterns (frequent itemsets) at a
high level of generalization, then become progressively more specific,
mining the frequent patterns at lower levels of generalization, and
finally derive association rules from the frequent patterns. In
addition, a commonly used algorithm is the stepwise-refinement method
for spatial association rule mining proposed by K. Koperski.
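The level-wise idea behind Apriori can be sketched as follows. This is a
minimal illustration rather than the multi-level or stepwise-refinement
algorithms cited above; the function name and the sample "transactions"
(sets of spatial predicates that hold for each object) are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-candidates, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = {
            a | b
            for a in level for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent

# Hypothetical data: predicates observed for four towns.
towns = [
    {"town", "near_water", "near_road"},
    {"town", "near_road"},
    {"town", "near_water", "near_road"},
    {"town", "near_mine"},
]
frequent = apriori(towns, min_support=2)

# Confidence of the rule near_water -> near_road.
confidence = (frequent[frozenset({"near_water", "near_road"})]
              / frequent[frozenset({"near_water"})])
print(confidence)
```

Rules are then read off the frequent itemsets by dividing support
counts, as in the last step.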
(5) Clustering and classification approaches: clustering divides the
data, according to a distance or similarity coefficient, into a series
of groups that are distinct from one another. Commonly used classical
clustering methods include K-means, K-medoids and ISODATA, as well as
the CLARANS algorithm for larger datasets.
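As a concrete example of distance-based clustering, here is a minimal
K-means sketch for two-dimensional points. The function name, the fixed
iteration count and the hand-picked initial centroids are assumptions
made so that the example is deterministic; practical implementations
choose the initial centroids more carefully and test for convergence.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain K-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.hypot(p[0] - centroids[i][0],
                                         p[1] - centroids[i][1]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data: two well-separated groups of locations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```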
(6) Neural network approach: a neural network is an adaptive nonlinear
dynamic system formed by a large number of richly interconnected
neurons. It offers distributed storage, associative memory, massively
parallel processing, self-learning, self-organization, adaptation and
other functions. A neural network consists of an input layer, an
intermediate (hidden) layer and an output layer. Through training, the
neurons collectively learn the patterns in the data to be analyzed and
form a nonlinear function describing a complex nonlinear system. Neural
networks are suited to mining classification knowledge from spatial
nonlinear systems with complex environmental information, fuzzy
background knowledge and no explicit inference rules. In spatial data
mining they can be used for classification, clustering, characterization
and other mining operations. The neural networks currently used in
spatial data mining fall into three categories: feedforward networks for
prediction and pattern recognition, such as the back-propagation model,
function networks and fuzzy neural networks; feedback networks for
associative memory and optimization, such as the discrete and continuous
Hopfield models; and self-organizing networks for clustering, such as
ART models and Kohonen models. Neural networks have the distinct
characteristic that "specific problems require specific analysis": their
convergence, stability, local minima and parameter adjustment need more
in-depth research, especially in cases with many input variables, high
system complexity and strong nonlinearity.
(7) Data visualization methods: visualization technology is used for a
variety of purposes, including visualizing the process of analysis and
thinking and using vision to evoke insight and refine concepts; it has
become a research method in its own right. Data visualization presents
large amounts of data in various forms to help people find structures,
characteristics, patterns, trends, anomalies or correlations in the
data. Data visualization is not just a calculation method; more
importantly, it gives people a cognitive tool that greatly enhances
their capacity to process data. It allows the massive amounts of data
being generated to be used effectively, supports the transmission of
information between data and people and between people, lets people
observe hidden information, and provides a powerful tool for discovering
and understanding scientific laws. It also allows computation and
programming to be guided and controlled: conditions can be changed
through interactive tools while the process runs and the effects
observed.
(9) Rough set theory: rough set theory is an intelligent data analysis
tool for decision-making proposed in 1982 by Professor Z. Pawlak of the
University of Warsaw. It has been extensively studied and applied to the
classification analysis of, and knowledge acquisition from, imprecise,
uncertain and incomplete information. In spatial data mining, rough set
theory is used to analyze attribute importance and attribute dependency
in attribute tables, to establish minimal decision rules and to generate
classification algorithms. Combined with other knowledge discovery
methods, rough set theory can obtain more knowledge from spatial
databases under uncertainty. Rough set theory is currently a hot topic
in spatial data mining research.
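The core construction of rough set theory, the lower and upper
approximation of a target set, can be sketched as follows. Objects that
share the same values on the chosen attributes are indiscernible; the
lower approximation collects the classes that lie entirely inside the
target, the upper approximation the classes that overlap it. The
function name and the land-parcel data are hypothetical.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    objects: {name: {attribute: value}}; target: set of object names.
    """
    # Group objects into indiscernibility classes on the chosen attributes.
    classes = defaultdict(set)
    for name, description in objects.items():
        classes[tuple(description[a] for a in attributes)].add(name)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target:   # class certainly belongs to the target
            lower |= eq_class
        if eq_class & target:    # class possibly belongs to the target
            upper |= eq_class
    return lower, upper

# Hypothetical parcels; p1 and p2 cannot be told apart by slope and soil.
parcels = {
    "p1": {"slope": "steep", "soil": "clay"},
    "p2": {"slope": "steep", "soil": "clay"},
    "p3": {"slope": "steep", "soil": "sand"},
    "p4": {"slope": "flat", "soil": "clay"},
}
landslide = {"p1", "p3"}   # parcels where a landslide was observed
lower, upper = approximations(parcels, ["slope", "soil"], landslide)
print(sorted(lower), sorted(upper))
```

The difference between the two approximations is the boundary region,
which is where the available attributes cannot decide membership.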
(10) Decision tree approach: a decision tree classifies data according
to their different characteristics, or represents a specific set of
rules in tree form, in order to discover regularities. In spatial data
mining, a test function is first generated from a training set of
spatial entities; second, the branches of the tree are established
according to its different values, and lower-level nodes and
sub-branches are established repeatedly within each branch, forming the
tree; the decision tree is then pruned and converted into rules for
classifying new entities. The ID3 (Iterative Dichotomiser 3) method
builds a decision tree, or a tree of decision rules, on the principles
of information theory: it calculates the information content of the
fields in the database and takes the field with the maximum information
content as a node of the decision tree. Branches are then established
for the different values of that field, and lower-level nodes and
branches are built repeatedly on the subset of data in each branch until
the leaf nodes contain only positive or only negative examples.
Koperski proposed a two-step method for decision classification of
spatial data. Objects are classified on the basis of aggregate values of
the non-spatial attributes of neighboring objects, and the descriptions
of the classified objects comprise their non-spatial attributes and the
spatial relations, predicates and functions that describe proximity.
After a rough description of the sample objects has been found, the
machine learning algorithm Relief is used to extract the relevant
spatial predicates, and spatial and non-spatial predicates are combined
into classification decision knowledge.
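The attribute selection step of ID3 described above can be illustrated
with a short sketch that computes the information gain of each field and
picks the field with the largest gain as the next tree node. The
function names and the small flood data set are hypothetical; building
the full tree and pruning it are not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, label):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attributes, label):
    """Field with the maximum information gain (the next tree node)."""
    return max(attributes, key=lambda a: information_gain(rows, a, label))

# Hypothetical training set of spatial entities.
rows = [
    {"near_river": "yes", "land_use": "farm", "flooded": "yes"},
    {"near_river": "yes", "land_use": "urban", "flooded": "yes"},
    {"near_river": "no", "land_use": "farm", "flooded": "no"},
    {"near_river": "no", "land_use": "urban", "flooded": "no"},
]
root = best_attribute(rows, ["near_river", "land_use"], "flooded")
print(root)
```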
(11) Other methods: in addition to the methods described above, spatial
data mining methods include spatial characterization and trend
detection, cloud theory, image analysis and pattern recognition,
evidence theory, the Geo-informatic Tupu method (geo-informatic graphic
methodology), computational geometry methods, fuzzy set theory and
others.
3 Spatial Data Mining Architecture and Process
3.1 Architecture of Spatial Data Mining
Spatial data mining can use the fairly general multi-component
architecture proposed by Matheus, shown in Figure 1. The mining process
is carried out mainly by four modules: the SDB interface, focus, model
extraction and evaluation. Here SDB (Spatial Database) is the spatial
database, SDBMS (Spatial Database Management System) is the spatial
database management system and KDB (Knowledge Database) is the knowledge
base. The SDB interface uses spatial index structures (such as R-trees
or R*-trees) to optimize queries and retrieve data from the data source;
the focus module selects the objects and attributes to be extracted; the
model extraction module, working on the output of the focus module, uses
machine learning, neural networks, decision trees and other methods to
find patterns or "knowledge"; the evaluation module assesses the mined
"knowledge" and removes information that is redundant or already known.
The four modules do not operate in one direction only; they interact
through the controller. Spatial data mining based on this architecture
is therefore a process of continuous feedback and adjustment. Finally,
the spatial data mining results are presented to the user.
[Figure 1: block diagram with the components User, Controller, SDBMS,
SDB, SDB interface, Focus, Model extraction, Evaluation, KDB and
Knowledge]
Figure 1 Architecture of Spatial Data Mining
3.2 Spatial Data Mining Process
Spatial data mining is an essential step in the process of spatial KDD.
The data mining step presents interesting patterns to the user or stores
them as new knowledge in the knowledge base. It is the most important
step in the knowledge discovery process, carried out through interaction
with the user or the knowledge base, because it can reveal hidden,
previously unknown patterns. The process consists of the following
steps:
(1) Data cleaning: fill in missing values, smooth noisy data, identify
and remove outliers, and "clean up" inconsistent data;
(2) Data integration: integrate multiple data sources;
(3) Data selection: retrieve from the database the data relevant to the
task;
(4) Data transformation: transform the data, through summary or
aggregation operations, into a form suitable for mining;
(5) Data mining: use intelligent methods to extract data patterns. The
goal of mining and the type of knowledge sought are determined first; an
appropriate mining algorithm is then selected for that type of
knowledge; finally the selected algorithm is used to acquire the
required knowledge from the database;
(6) Pattern evaluation: identify the truly interesting patterns that
represent knowledge, according to some interestingness measure;
(7) Knowledge representation: use visualization and knowledge
representation techniques to present the mined knowledge to the user.
By cycling through the above process continuously, the mined knowledge
can be refined and deepened.
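The preparatory steps of this process can be sketched as a small chain
of functions. The sketch covers only cleaning (step 1, filling missing
values with the mean), selection (step 3) and transformation (step 4,
aggregation); the function names, record layout and sample values are
hypothetical, and the mining and evaluation steps are left out.

```python
from statistics import mean

def clean(records, field):
    """Step (1): fill missing values of `field` with the mean of known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = mean(known)
    return [{**r, field: fill if r[field] is None else r[field]}
            for r in records]

def select(records, region):
    """Step (3): keep only the records relevant to the task."""
    return [r for r in records if r["region"] == region]

def transform(records, field):
    """Step (4): aggregate the selected records into a summary."""
    return {"count": len(records), "mean": mean(r[field] for r in records)}

# Hypothetical rainfall observations with one missing value.
records = [
    {"region": "north", "rainfall": 10.0},
    {"region": "north", "rainfall": None},
    {"region": "south", "rainfall": 30.0},
    {"region": "north", "rainfall": 20.0},
]
summary = transform(select(clean(records, "rainfall"), "north"), "rainfall")
print(summary)
```

Keeping each step as a separate function makes it straightforward to
rerun part of the chain, which matches the cyclic character of the
process.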
4 Spatial Data Mining Applications in GIS
The combination of spatial data mining technology and GIS has a very
broad range of applications. Spatial data mining can be combined with
GIS in three modes. The first is loose coupling, also known as the
external spatial data mining mode: the GIS is essentially viewed as a
spatial database, spatial data mining is carried out outside the GIS
environment by means of other software or programming languages, and the
two are connected through data communication. The second is the embedded
mode, also known as the internal spatial data mining mode, in which
spatial data mining technology is integrated into the spatial analysis
functions of the GIS. The third is the hybrid mode, a combination of the
first two: it uses the functionality provided by the GIS as far as
possible, to minimize the workload and difficulty of the user's own
development, while retaining the flexibility of the external spatial
data mining mode.
Spatial data mining techniques can discover the following main types of
knowledge in spatial databases: general geometric knowledge, spatial
distribution rules, spatial association rules, spatial clustering rules,
spatial characteristic rules, spatial discriminant rules, spatial
evolution rules and object-oriented knowledge. At present this knowledge
is applied in fairly mature form in the military, land, electricity,
telecommunications, oil and gas, urban planning, transportation,
environmental monitoring and protection, 110 and 120 rapid response
systems and urban management. It has also received extensive attention
and application in market analysis, customer relationship management,
banking, insurance, demographics, real estate development, personal
location services and other areas; in fact, it has reached deep into
every aspect of people's work and life.
5 Current Problems in Spatial Data Mining
Spatial data mining has become an important research direction in
databases and information-based decision-making. Despite some progress,
it remains both attractive and challenging, and many issues are still to
be studied:
(1) Most spatial data mining algorithms are carried over from general
data mining algorithms and do not take into account the storage and
processing of spatial data or the spatial characteristics of the data
themselves. Spatial data differ from the data in a relational database:
they are organized with complex, multi-dimensional spatial index
structures and have their own access methods, so traditional data mining
technology often cannot analyze complex spatial phenomena and spatial
objects well.
(2) Spatial data mining algorithms are not efficient, and the patterns
they discover are not refined. In the face of massive database systems,
the mining process is full of uncertainty; the possible pattern errors
and the dimensionality of the problems to be solved are large, which not
only enlarges the search space of the algorithms but also increases the
likelihood of blind search. Domain knowledge must therefore be used to
remove data unrelated to the discovery task, effectively reducing the
dimensionality of the problem, and more effective knowledge discovery
algorithms must be designed.
(3) There is no accepted, standardized spatial data mining query
language. One reason for the rapid development of database technology is
the continuous improvement and development of database query languages;
to continue improving and developing spatial data mining, it is
therefore necessary to develop a spatial data mining query language that
lays the foundation for efficient spatial data mining.
(4) Spatial data mining and knowledge discovery systems are not very
interactive. It is difficult to make full and effective use of domain
expert knowledge in the knowledge discovery process, and users cannot
control the spatial data mining process well.
(5) Spatial data mining is not sufficiently integrated with other
systems, and the role of GIS in the spatial knowledge discovery process
is overlooked. A spatial data mining system with a single method and
function has a restricted scope of application, and the knowledge
discovery systems developed so far are limited to the database field. To
discover knowledge in a wider area, a knowledge discovery system should
integrate databases, knowledge bases, expert systems, decision support
systems, visualization tools, networks and many other technologies.
(6) Spatial data mining methods and tasks are narrow: they are basically
designed for a specific problem, so the knowledge that can be discovered
is limited.
6 Trends in Spatial Data Mining
Because spatial data are massive, nonlinear, multi-scale and fuzzy,
extracting knowledge from spatial databases is more difficult than
extracting knowledge from traditional relational databases, which poses
challenges for spatial data mining research. Many theories and methods
of spatial data mining need further study in the future:
(1) Spatial data mining algorithms and techniques. Spatial association
rule mining algorithms, time series data mining techniques, spatial
co-location algorithms, spatial classification techniques and spatial
outlier mining algorithms are the focus of research; improving the
efficiency of spatial data mining algorithms is also very important.
(2) Preprocessing of multi-source spatial data. Spatial data include DLG
data, image data, digital elevation models and feature attribute data.
Because of their inherent complexity and the difficulty of data
collection, spatial data inevitably contain missing values, noise and
inconsistencies, so the preprocessing of multi-source spatial data is
particularly important.
(3) Other future research directions include spatial data mining in
network environments, visual data mining, integrated mining of raster
and vector spatial data, automatic generation of background concept
trees, spatial data mining under uncertainty (in location, attribute,
time, etc.), incremental data mining, multi-resolution and multi-level
data mining, parallel data mining, mining of remote sensing image
databases, knowledge discovery in multimedia spatial databases, and the
integration of different spatial data mining methods and techniques.
It is foreseeable that spatial data mining will not only promote the
development of space science and computer science, but also enhance
human understanding of the world and the discovery of knowledge, so that
people can better transform the world and serve human society.
References
[1] Nicholas R. Jennings. A Roadmap of Agent Research and Development,
Autonomous Agents and Multi-Agent Systems 1[M]. Boston: Kluwer Academic
Publishers, 1998.
[2] Li Deren, Wang Shuliang, Li Deyi. Theory and methods of spatial data
mining and knowledge discovery[J]. Wuhan University Science Journal
(Information Science Edition), 2002, 27(3): 221-233.
[3] CHEN Y L, CHEN J M, TUNG C W. A data mining approach for retail
knowledge discovery with consideration of the effect of shelf-space
adjacency on sales[J]. Decision Support Systems, 2006, 42(3): 1503-1520.
[4] LEE A J T, HONG R W, KO W M, et al. Mining spatial association rules
in image databases[J]. Information Sciences, 2007, 177(7): 1593-1608.
[5] BEAUBOUEF T, PETRY F E, LADNER R. Spatial data methods and vague
regions: A rough set approach[J]. Applied Soft Computing, 2007, 7(1):
425-440.
[6] WANG C H. Recognition of semiconductor defect patterns using spatial
filtering and spectral clustering[J]. Expert Systems with Applications,
2008, 34(3): 1914-1923.
[7] Wang Xinzhou. Spatial Data Processing and Spatial Data Mining[D].
Wuhan University Science Journal (Information Science Edition), 2006,
31(1).
[8] Cao Jifeng. Mining Research Based on GIS Spatial Data[J]. West Anhui
University, 2010, 4: 43-46.