Spatial Data Mining Methods and Problems
Cankai Fu
Yunnan Provincial Map Institute, 650032

Abstract
Using a summarizing method, this paper describes the characteristics of spatial data mining and of each spatial data mining method applied in GIS, points out the limitations of current spatial data mining, analyzes its current problems, and explores its development trends.
Key words: spatial data; mining; method

1. Introduction
Owing to the rapid development in recent years of earth observation technology, database technology, network technology and other spatial information technologies, a large volume of spatial data has been collected from remote sensing, GIS, GPS, multimedia systems, medical and satellite imagery, and other applications. The complexity and volume of these data far exceed the analytical capacity of the human brain. Although spatial databases can store spatial objects and represent them through spatial data types and the spatial relationships between objects, users cannot examine all of the data in detail to extract the knowledge that interests them. Data mining is an effective tool here, and spatial data mining technology offers a way to solve this problem.
Research on spatial data mining started later than research on mining general relational or transaction databases, but it has attracted widespread attention in recent years. At the international GIS conference held in Ottawa in 1994, Academician Li Deren first proposed the concept of spatial data mining and knowledge discovery, was the first to study knowledge discovery from GIS databases, and built a theoretical framework of spatial data mining and knowledge discovery. Murray reviewed clustering techniques for exploratory spatial data analysis and, on that basis, analyzed methods of spatial pattern recognition and knowledge discovery that draw on statistics, data mining and GIS.
Koperski et al. summarized the development of spatial data mining in terms of spatial data generalization, spatial data clustering and spatial association rule mining. Ankerst analyzed the shape properties of spatial objects in a visual spatial data mining process and used three-dimensional histograms for similarity search and classification in spatial databases. Keim proposed a pixel-based notation for expressing the clustering results of large raster data sets. Sun Lianying proposed fusing hypergraph theory, a hypergraph object model and visualization technology, and applied them to spatial data mining. Duan Xiaojun embedded the matrix computing environment of the MATLAB software in VB, which made it simple to implement the higher-order matrix operations required for correlation analysis in data mining, together with dynamic mapping functions. Many universities and research institutes in China have launched studies on the theory and application of spatial data mining and knowledge discovery, and the state has also given the field a great deal of attention. Among them, the innovative research of Academician Li Deren has been recognized by international peers. Moreover, Academicians Li Deren and Li Deyi cooperated and, as early as 1999, trained the first doctor in the direction of spatial data mining and knowledge discovery, Dr. Di Kaichang. After that, Wang Shuliang did a great deal of intensive work: building on Academician Li Deyi's cloud theory, he perfected the concept of data fields, proposed the concept of visual spatial data mining and its implementation, and successfully applied it to mining landslide monitoring data with good results. Dr. Qin Kun has done much meaningful work on how to put spatial data mining into practice. Building on his study of the theory and methods of image data mining, he mined the content of remote sensing image data, such as spectral characteristics, texture, shape features and spatial distribution characteristics. To mine knowledge at a higher level of abstraction, Dr.
Qin Kun developed a framework for a remote sensing image data mining system, and designed and developed the remote sensing image data mining software prototype RS Image Miner. The successful development of RS Image Miner marks the gradual move of spatial data mining from theory to practical application.

2 Spatial Data Mining Overview
2.1 Definition of Spatial Data Mining
Spatial data mining (SDM) is also known as data mining and knowledge discovery from spatial databases. As a new branch of data mining, it refers to extracting from spatial databases the spatial patterns and characteristics of interest to the user, the general relationships between spatial and non-spatial data, and other general data features implicit in the database. SDM differs from general data mining (DM), that is, from mining conventional transactional databases: it adds the spatial scale dimension, so its discovery state space is larger than that of general database mining.

2.2 Spatial data mining features
Spatial data mining is an inevitable result of the development of spatial information technology. It is a particular area of data mining, different from general transaction or relational data mining. Its content is much richer than that of general data mining; the knowledge that can be discovered mainly includes spatial characteristic rules, spatial discriminant rules, spatial distribution rules, spatial classification rules, spatial clustering rules, spatial association rules, spatial evolution rules, object-oriented knowledge and spatial variation knowledge. Spatial data mining has the following characteristics:
(1) The data sources are rich, the data volume is huge, the information is fuzzy, and the data types and access methods are complex;
(2) Spatial indexing mechanisms are used to organize the data;
(3) The range of applications is wide: any data related to spatial location can be mined;
(4) There are many mining methods and algorithms, and most of the algorithms are complex;
(5) Knowledge is expressed in diverse forms, and its understanding and evaluation depend on
the person's awareness of the objective world;
(6) Spatial data are multi-scale, high-dimensional and highly autocorrelated.

2.3 The main methods of spatial data mining
Spatial data mining is a new field formed by the intersection and integration of many disciplines and technologies. It brings together related techniques from artificial intelligence, machine learning, databases, pattern recognition, statistics, GIS, knowledge-based systems, visualization and other areas. The methods commonly used at present are:
(1) Spatial analysis methods: these use the various spatial analysis models and spatial operations of GIS to further process the data in the database and so produce new information and knowledge. The spatial analysis methods in current use include comprehensive attribute data analysis, topology analysis, buffer analysis, density analysis, distance analysis, overlay analysis, network analysis, terrain analysis, trend surface analysis and predictive analysis. They can discover association rules of connection, adjacency and co-occurrence between spatial objects, or find knowledge such as the shortest path between objects and the optimal path for decision support. Spatial analysis is often used as a preprocessing and feature extraction step in conjunction with other data mining methods.
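As an illustration of the buffer analysis named in item (1), the following is a minimal sketch that selects the point features lying within a given distance of a target location. The function name `buffer_query`, the feature names and the planar coordinates are all hypothetical; a real GIS would use its own geometry types and a spatial index rather than a linear scan.

```python
import math

def buffer_query(target, features, radius):
    """Return the names of point features within `radius` of `target`.

    `target` is an (x, y) pair and `features` maps a name to an (x, y) pair.
    Distances are plain Euclidean distances in planar coordinates.
    """
    tx, ty = target
    return [name for name, (x, y) in features.items()
            if math.hypot(x - tx, y - ty) <= radius]

# Hypothetical point features in planar coordinates (metres).
features = {
    "school": (120.0, 80.0),
    "clinic": (300.0, 400.0),
    "market": (150.0, 130.0),
}
river_gauge = (100.0, 100.0)

# Features inside a 60 m buffer around the gauge.
nearby = buffer_query(river_gauge, features, radius=60.0)
print(nearby)
```

The selected features could then be passed to a later mining step, which matches the role of spatial analysis as preprocessing described above.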
(2) Statistical analysis methods: statistical methods have long been used to analyze spatial data, with the analysis focused on the non-spatial characteristics of spatial objects and phenomena. Statistical methods have a strong theoretical foundation and a large body of mature algorithms, including many optimization techniques. When statistical methods are used for data mining, the spatial nature of the data is generally not treated as a constraint: the specific spatial location of the things the data describe does not restrict the mining. Although this mode of mining does not differ in essence from general data mining, the results are described in the form of maps after mining, their interpretation necessarily relies on geographic space, and the explanation must reflect spatial laws. The shortcomings of statistical methods are that they have difficulty handling character (non-numeric) data and generally require statistical experts with rich experience. Their biggest drawback is the assumption that spatially distributed data are statistically uncorrelated, which causes problems in practice because much spatial data is interrelated. Geostatistics, represented by the variogram and the Kriging method, is currently among the more popular methods of statistical analysis.
(3) Inductive learning methods: inductive learning methods summarize and extract general rules and patterns from a large amount of empirical data; most of the algorithms come from the field of machine learning. There are many inductive learning algorithms, of which the best known is C5.0, proposed by Quinlan.
It is a decision tree algorithm developed from ID3; it is used to select attributes, classifies quickly, and is suitable for learning from large databases. On the basis of ID3, C5.0 adds the function of converting a decision tree into equivalent production rules, and it can handle learning from continuous-valued data. Professor Han Jiawei proposed attribute-oriented induction (Attribute Oriented Induction, AOI), dedicated to discovering knowledge from databases: by climbing the concept tree, data are summarized and synthesized, and high-level patterns and features are induced.
(4) Spatial association rule mining methods: association rule mining was first proposed by Agrawal et al. The best-known algorithm is Apriori, whose main idea is first to count how frequently various goods occur together in purchases, and then to convert the combinations that co-occur most frequently into association rules. On this basis, Han et al. combined attribute-oriented induction (Attribute-Oriented Induction) and proposed multi-level association rule mining algorithms such as ML_T2L1. These first find frequent patterns (frequent itemsets) at a high level of generalization, then become gradually more specific and mine frequent patterns at lower levels of generalization, and finally derive association rules from the frequent patterns. Another commonly used algorithm is the stepwise-refinement spatial association rule mining method proposed by K. Koperski.
(5) Clustering (Clustering Approach) and classification (Classification Approach): clustering divides data into a series of mutually distinct groups according to some distance or similarity coefficient. Commonly used classical clustering methods include K-means, K-medoids and ISODATA, and the CLARANS algorithm for larger data sets.
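To make the distance-based grouping in item (5) concrete, here is a minimal sketch of K-means (Lloyd's algorithm) on two-dimensional points. It is not taken from the paper: the sample coordinates are invented, and using the first k points as initial centroids is a simplifying assumption chosen only to keep the run deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster `points` into `k` groups with Lloyd's algorithm.

    Assumption for this sketch: the first k points serve as the
    initial centroids, so the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical point locations forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

K-medoids differs mainly in restricting each cluster centre to be one of the actual data points, which makes it less sensitive to outliers.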
(6) Neural networks (Neural Network Approach): a neural network is an adaptive nonlinear dynamic system formed by a large number of neurons through extremely rich and complete interconnections. It has functions such as distributed storage, associative memory, massively parallel processing, self-learning, self-organization and self-adaptation. A neural network consists of an input layer, an intermediate layer and an output layer. Through training, the large number of neurons collectively learn the patterns in the data to be analyzed and form a nonlinear function that describes a complex nonlinear system. This suits mining classification knowledge from spatial systems that are nonlinear, whose environmental information is complex, whose background is fuzzy and whose inference rules are not explicit. In spatial data mining, neural networks can be used for classification, clustering, feature mining and other operations. The neural networks currently used in spatial data mining can be divided into three categories: feedforward networks for prediction and pattern recognition, such as the back-propagation model, function networks and fuzzy neural networks; feedback networks for associative memory and optimization, such as the discrete and continuous Hopfield models; and self-organizing networks for clustering, such as the ART model and the Kohonen model. Neural networks have the distinct characteristic that each problem requires its own specific analysis, and their convergence, stability, local minima and parameter adjustment need more in-depth research, especially for cases with many input variables and high system complexity and nonlinearity.
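As a small illustration of the associative memory role attributed to feedback networks in item (6), the sketch below implements a discrete Hopfield network with Hebbian weights and recalls a stored pattern from a corrupted copy. The pattern, the function names and the fixed update order are assumptions made for this example only.

```python
def train_hopfield(patterns):
    """Build a Hebbian weight matrix (zero diagonal) from +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, sweeps=5):
    """Update neurons one at a time, in index order, until nothing changes."""
    s = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(weights[i][j] * s[j] for j in range(len(s)))
            # Keep the current value when the input field is exactly zero.
            new_value = s[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s

# One stored pattern and a copy with a single flipped element (index 2).
stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, -1, -1, 1, -1]
restored = recall(weights, noisy)
print(restored == stored)
```

With several stored patterns the network can settle in spurious states, which is one form of the local-minimum issue mentioned above.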
(7) Data visualization methods: visualization technology is used for a variety of purposes, including visualizing the thinking process of analysis and, as a distinct research method, evoking insight and refining concepts through visual analysis. Data visualization represents large amounts of data in various forms to help people find structures, characteristics, patterns, trends, anomalies or correlations in the data. Data visualization is not just a calculation method; more importantly, it provides people with a cognitive tool that can greatly enhance their data processing capacity. It allows the massive amounts of data generated all the time to be used effectively, lets data and information be transmitted between people, enables people to observe information hidden in the data, and provides a powerful tool for discovering and understanding scientific laws. It also allows computation and programming to be guided and controlled: conditions can be changed through interactive tools while the process runs, and the effects observed.
(9) Rough set theory (Rough Sets Theory): rough set theory is an intelligent data analysis tool for decision-making, proposed in 1982 by Professor Z. Pawlak of the University of Warsaw. It has been extensively studied and applied to the classification analysis and knowledge acquisition of imprecise, uncertain and incomplete information. For spatial data, rough set theory is used for attribute importance analysis, attribute dependency analysis, building minimal decision tables and generating classification algorithms. Combined with other knowledge discovery methods, rough set theory can obtain more knowledge from spatial databases when the data are uncertain. Rough set theory is currently a research hot spot in spatial data mining.
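The way rough set theory handles imprecise classification in item (9) rests on lower and upper approximations of a target set. The following is a minimal sketch under invented data: the land parcels, their attribute values and the `landslide` set are hypothetical, and the function name `approximations` is chosen only for this example.

```python
from collections import defaultdict

def approximations(table, condition_attrs, target):
    """Return the (lower, upper) approximations of the object set `target`.

    `table` maps an object id to a dict of attribute values. Objects with
    identical values on `condition_attrs` are indiscernible and fall into
    the same equivalence class.
    """
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in condition_attrs)].add(obj)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper

# Hypothetical land parcels described by two condition attributes.
parcels = {
    "p1": {"slope": "steep", "cover": "bare"},
    "p2": {"slope": "steep", "cover": "bare"},
    "p3": {"slope": "steep", "cover": "forest"},
    "p4": {"slope": "gentle", "cover": "bare"},
    "p5": {"slope": "gentle", "cover": "forest"},
}
landslide = {"p1", "p3"}  # parcels where a landslide was observed (invented)

lower, upper = approximations(parcels, ["slope", "cover"], landslide)
print(sorted(lower))
print(sorted(upper))
```

Parcels in the lower approximation certainly belong to the target concept given the chosen attributes; those in the upper approximation but not the lower form the boundary region, where the attributes cannot decide membership.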
(10) Decision trees (Decision Tree Approach): a decision tree classifies data according to different features and represents a specific set of rules in tree form, through which laws are discovered. In spatial data mining, first a training set of spatial entities is used to generate a test function; second, branches of the tree are established according to the different values, and lower-level nodes and sub-branches are established recursively within the subset of each branch, forming the tree; then the decision tree is pruned and converted into rules for classifying new entities. The ID3 (Iterative Dichotomiser 3) method builds a decision tree, or decision rule tree, on principles from information theory: it calculates the information content of each field in the database and looks for the field with the maximum information content, which becomes a node of the decision tree. Branches are then established according to the different values of that field, and lower-level nodes and branches are built recursively for the subset in each branch, until the leaf nodes hold positive or negative examples. Spatial objects can be classified on the basis of their non-spatial attributes; of the attributes, predicates and functions that describe the spatial relationships between the classified objects and nearby features; and of aggregate values of the non-spatial attributes of nearby objects. Koperski put forward a two-step decision tree classification method for spatial data: after finding a rough description of the sample objects, the machine learning algorithm Relief is used to extract the relevant spatial predicates, and spatial and non-spatial predicates are then combined to derive classification decision knowledge.
(11) Other methods: in addition to the methods described above, spatial data mining methods also include spatial characterization and trend detection (Characterization and Trend Detection), cloud theory (Cloud Theory), and image analysis and pattern recognition (Image Analysis and Pattern Recognition).
Further methods are evidence theory (Evidence Theory), the Geo-informatic Tupu method (Geo-informatic Graphic Methodology), computational geometry methods (Computer Geometry Methods), fuzzy set theory (Fuzzy Sets Theory) and the like.

3 Spatial data mining architecture and process
3.1 Architecture of Spatial Data Mining
Spatial data mining can use the fairly general multi-component architecture of Matheus, shown in Figure 1. The mining process is completed mainly by four modules: the SDB interface, focus, model extraction and evaluation. Here SDB (Spatial Database) is the spatial database, SDBMS (Spatial Database Management System) is the spatial database management system, and KDB (Knowledge Database) is the knowledge base. The SDB interface uses spatial index structures (such as R-trees or R*-trees) and query optimization to retrieve data from the data source; the focus module selects the objects and attributes to be examined; the model extraction module works on the output of the focus module and uses machine learning, neural networks, decision trees and other methods to find patterns, or "knowledge"; and the evaluation module assesses the mined "knowledge" and removes redundant information and facts that are already known. The four modules do not operate in one direction only; they interact through the controller. Under this architecture, therefore, spatial data mining is a process of continuous feedback and adjustment. At the end of the process, the spatial data mining results are presented to the user.

[Figure 1 Architecture of Spatial Data Mining. Labels in the figure: User, Control, SDBMS, SDB connector, Focus, Model Extraction, Assess, SDB, KDB, Knowledge Areas]

3.2 Spatial data mining process
Spatial data mining is an essential step in the spatial KDD process. In the data mining step, interesting patterns are presented to the user or stored as new knowledge in the knowledge base, and interaction with the user or with the knowledge base may guide the discovery. It is the most important step in the knowledge discovery process because it can reveal hidden patterns.
It consists of the following steps:
(1) Data cleanup: fill in missing values, smooth noisy data, identify and remove outliers, and "clean up" inconsistent data;
(2) Data integration: integrate multiple data sources;
(3) Data selection: retrieve from the database the data relevant to the task;
(4) Data transformation: transform the data, through summary or aggregation operations, into a form suitable for mining;
(5) Data mining: use intelligent methods to extract data patterns. The target and the type of knowledge to be mined are determined first, then a mining algorithm appropriate to that type of knowledge is selected, and finally the selected algorithm is used to acquire the required knowledge from the database;
(6) Pattern assessment: identify the truly interesting patterns of knowledge according to some interestingness measure;
(7) Knowledge representation: present the mined knowledge to the user through visualization and knowledge representation techniques.
By cycling through the above process continuously, the mined knowledge can be refined and deepened.

4 Spatial Data Mining Applications in GIS
The combination of spatial data mining technology and GIS has very broad application prospects. Spatial data mining can be combined with GIS in three modes. The first is the loosely coupled mode, also known as the external spatial data mining model: GIS is essentially treated as a spatial database, spatial data mining is carried out outside the GIS environment by means of other software or programming languages, and the two are linked only by data communication. The second is the embedded mode, also known as the internal spatial data mining model, in which spatial data mining technology is integrated into the spatial analysis functions of GIS. The third is the hybrid mode, a combination of the first two: it uses the functionality provided by GIS as far as possible, to minimize the workload and difficulty of the user's own development, while retaining the flexibility of the external spatial data mining model.
Spatial data mining techniques can discover the following major types of knowledge from spatial databases: general geometric knowledge, spatial distribution rules, spatial association rules, spatial clustering rules, spatial characteristic rules, spatial discriminant rules, spatial evolution rules and object-oriented knowledge. At present, this knowledge has been applied in a fairly mature way in the military, land management, electricity, telecommunications, oil and gas, urban planning, transportation, environmental monitoring and protection, the 110 (police) and 120 (ambulance) rapid response systems, and urban management. It has also received extensive attention and application in market analysis, customer relationship management, banking, insurance, demographics, real estate development, personal location services and other areas. In fact, it has reached deep into every aspect of people's work and life.

5 Current spatial data mining problems
Spatial data mining has become an important research direction in databases, information and decision-making. Despite some progress, it remains both attractive and challenging, and many issues still need to be studied:
(1) Most spatial data mining algorithms are migrated from general data mining algorithms and do not consider the storage and processing of spatial data or the spatial characteristics of the data itself. Spatial data differ from the data in a relational database: they are organized with complex, multi-dimensional spatial index structures and have their own access methods, so traditional data mining techniques often cannot analyze complex spatial phenomena and spatial objects well.
(2) Spatial data mining algorithms are not efficient, and the patterns they discover are not refined.
Faced with massive database systems, the spatial data mining process involves great uncertainty, and the number of possible error patterns and the dimensionality of the problems to be solved are both large. This not only enlarges the search space of the algorithms but also increases the likelihood of blind search. Domain knowledge must therefore be used to remove data unrelated to the discovery task, effectively reducing the dimensionality of the problem, so that more effective knowledge discovery algorithms can be designed.
(3) There is no accepted, standardized spatial data mining query language. One reason for the rapid development of database technology is the continuous improvement and development of database query languages. To keep improving and developing spatial data mining, it is therefore necessary to develop a spatial data mining query language, laying the foundation for efficient spatial data mining.
(4) The interactivity of spatial data mining and knowledge discovery systems is weak. It is difficult to make full and effective use of domain expert knowledge during knowledge discovery, and users cannot control the spatial data mining process well.
(5) Spatial data mining is not sufficiently integrated with other systems, and the role of GIS in the spatial knowledge discovery process is ignored. A spatial data mining system with a single method and a single function is subject to many restrictions on its scope of application. Current knowledge discovery systems are limited to the database field; to discover knowledge in a wider domain, a knowledge discovery system should integrate databases, knowledge bases, expert systems, decision support systems, visualization tools, networks and many other technologies.
(6) Spatial data mining methods and tasks are narrow: they basically target one specific problem, so the knowledge that can be discovered is limited.

6 Trends of spatial data mining
Because spatial data are massive, nonlinear, multi-scale, fuzzy and so on, extracting knowledge from spatial databases is more difficult than extracting it from traditional relational databases, and this poses challenges for spatial data mining research. In the future, many theories and methods of spatial data mining need further study:
(1) Spatial data mining algorithms and techniques. Spatial association rule mining algorithms, time series data mining techniques, spatial co-location algorithms, spatial classification techniques and spatial outlier mining algorithms are the focus of research, and improving the efficiency of spatial data mining algorithms is also very important.
(2) Preprocessing of multi-source spatial data. Spatial data include DLG data, image data, digital elevation models and feature attribute data. Because of the complexity of the data themselves and the difficulty of data collection, spatial data inevitably contain missing values, noise and inconsistencies, so preprocessing of multi-source spatial data is particularly important.
(3) Future research directions also include spatial data mining in network environments, visual data mining, integrated raster-vector spatial data mining, automatic generation of background concept trees, spatial data mining based on uncertainty (in location, attributes, time, etc.), incremental data mining, multi-resolution and multi-level data mining, parallel data mining, mining of remote sensing image databases, knowledge discovery in multimedia spatial databases, and the integration of different spatial data mining methods and techniques.
It is foreseeable that spatial data mining will not only promote the development of space science and computer science but will also enhance human understanding of the world and the discovery of knowledge, so as to better transform the world and serve human society.

References
[1] Nicholas R.
Jennings. A Roadmap of Agent Research and Development. Autonomous Agents and Multi-Agent Systems 1[M]. Boston: Kluwer Academic Publishers, 1998.
[2] Li Deren, Wang Shuliang, Li Deyi. Theory and methods of spatial data mining and knowledge discovery[J]. Wuhan University Science Journal (Information Science Edition), 2002, 27(3): 221-233.
[3] CHEN Y L, CHEN J M, TUNG C W. A data mining approach for retail knowledge discovery with consideration of the effect of shelf-space adjacency on sales[J]. Decision Support Systems, 2006, 42(3): 1503-1520.
[4] LEE A J T, HONG R W, KO W M, et al. Mining spatial association rules in image databases[J]. Information Sciences, 2007, 177(7): 1593-1608.
[5] BEAUBOUEF T, PETRY F E, LADNER R. Spatial data methods and vague regions: A rough set approach[J]. Applied Soft Computing, 2007, 7(1): 425-440.
[6] WANG C H. Recognition of semiconductor defect patterns using spatial filtering and spectral clustering[J]. Expert Systems with Applications, 2008, 34(3): 1914-1923.
[7] Wang Xinzhou. Spatial Data Processing and Spatial Data Mining[D]. Wuhan University Science Journal (Information Science Edition), 2006, 31(1).
[8] Cao Jifeng. Mining Research Based on GIS Spatial Data[J]. West Anhui University, 2010, 4: 43-46.