Distributed Data Mining Solution for Detecting Behavior of Climate Changes

T. L. I. M Hemachandra and M. G. Noel A. S Fernando

Abstract

Mitigation and adaptation address a wide range of economic and environmental topics, including global climate change. Climate mitigation involves permanently eliminating or reducing the long-term risks of climate change to human life, while climate adaptation refers to the ability of a system or of people to prepare for or adjust to climate change in order to minimize overall damage, take advantage of opportunities, or handle the consequences. This paper describes a distributed data mining solution for detecting the behavior of climate changes. Predicting or detecting climate changes by discovering knowledge in ever-growing temporal, spatial and spatio-temporal data is difficult because of the complexity of the analysis required: geographically distributed, decentralized recent data must be integrated at a single location and analyzed together with historical data, and non-linear dependencies and dynamical behaviors must be analyzed after noisy and inconsistent data have been removed and the remainder transformed into appropriate forms. Because of this bottleneck, this survey maps the solutions offered by distributed data mining methods and techniques to the requirements of climate prediction, and analyzes the environmental factors that affect climate change. Finally, drawing on the recent literature, the survey proposes new directions and opportunities for detecting the changing behavior of the climate, enabling people or systems to take preventive action and to implement mitigation and adaptation strategies.

Keywords: data mining, distributed systems, climate changes

1. Introduction

1.1 Approach

Climate changes have a significant impact on social, economic and environmental stability. They can bring negative as well as positive effects.
The people of South Asia and East Africa were struck by a tsunami in 2004 without prior warning from the responsible organizations. Sri Lanka was particularly affected and suffered heavy damage: the tsunami resulted in 31,187 deaths, 4,280 missing people, 23,189 injured people, and the displacement of 545,715 people [1]. After the earthquake, tsunami and nuclear crisis of 2011, Japan began major reconstruction efforts in the affected areas, worked to regain control of the damaged nuclear power plants, and carried out relief efforts, while the rest of the nation progressively returned to normal life [2]. As a developing country, Sri Lanka has struggled over the past few years to recover from the damage caused by a disaster such as the tsunami. There is therefore great demand for a climate prediction system that supports adaptive action.

It is impossible to reach a firm decision or conclusion about climate change by analyzing data from a single area or region alone [3], because many factors can contribute, directly or indirectly, to a single change. It is therefore important to analyze all the relevant areas before drawing a conclusion. A basic approach to predicting climate change is to analyze observations from remote satellites, wireless mobile devices, sensors or radars that detect weather-related information. The output formats of geographically distributed information systems vary from one system to another. The analyst therefore faces many problems: rapidly growing and rapidly changing data; limits on predicting the future state of atmospheric systems because of their chaotic behavior; converting multi-source data into a single type for analysis; and managing resources such as memory and network connections.
The major challenge in analyzing regional changes and sensor data streams is the uncertainty and risk in decision making that arise from nonlinear relationships in the data. Although a rich literature exists on data mining applications and on climate statistics, systematic efforts in climate data mining are lacking, even though data mining for climate change is a vast area. Environmental factors that affect climate include distance from the sea, ocean currents, relief, proximity to the equator and the El Nino phenomenon. Analyzing only regional measurements such as temperature, pressure and rainfall is therefore not sufficient. Human activities must also be analyzed, such as population growth, deforestation, fuel burning, the development of farms, cities and roads, and land use.

According to the IPCC Fourth Assessment Report (Climate Change 2007), the atmosphere is the component that most characterizes the climate. The mean and variability of temperature, rainfall and wind patterns over a period of time are the most important quantities for describing the climate. The climate system is influenced both by its own internal dynamics and by external forcing factors. The report clearly points out that the presence of greenhouse gases is a main cause of increasing global warming. Greenhouse gases trap heat in the atmosphere by absorbing infrared radiation [9]. Because of human activities such as deforestation and the burning of fossil fuels, the formerly stable proportions of greenhouse gases have been changing over the last few decades. When an economy develops rapidly, people tend to use and invent new technologies to improve their industries. Changing population patterns also affect the climate. To predict a climate change, or to reach a conclusion about one, data from geographically distributed resources must be analyzed.
Applying data mining methods and techniques to these kinds of data requires a great deal of preprocessing. The point to emphasize is that the data are on different scales and come from different, distributed regions. The accuracy and consistency of the final outcomes depend directly on the accuracy and consistency of the inputs. The basic data types to be analyzed in relation to climate change are temporal and spatial data, so data generated concurrently at distributed resources must be collected. This is where the importance of a distributed data mining solution arises. This research examines the feasibility of a prediction system that analyzes observations produced by distributed resources, focusing on spatial and temporal data series. It examines general-purpose data mining techniques such as clustering and classification in order to capture the dynamic behavior of the factors and physical processes that affect climate change directly or indirectly, to indicate their future state, and to support adaptation and mitigation decisions.

1.2 Motivation

Today climate changes are happening in unexpected ways [12]. These changes directly affect both society and the environment. Changes in living conditions, loss of human lives and the spread of various diseases have been clearly seen in the last few decades [13] [14], along with heavy damage to natural resources from natural disasters and even changes in the atmosphere, soil and water layers of the earth. Natural disasters are now reported from developed countries as well, such as tornadoes in America and earthquakes in Japan and New Zealand. Because of these deteriorating conditions and their economic cost, all parties are interested in detecting climate changes. There is therefore worldwide demand for a climate prediction system.
In Sri Lanka the infrastructure to monitor and enforce environmental conventions is insufficient [4]. Environmental data for analysis are lacking and resources are limited. These limitations hamper climate prediction and climate impact assessment [4]. However, as technology develops rapidly, new technologies and equipment for detecting climate changes are being introduced. This gives data analysts a new opportunity to obtain information quickly through geographically distributed information systems, in line with the growing trend towards decentralized systems. The core needs are the generation of predictive insights, risk management, and uncertainty characterization, with the ultimate aim of informing adaptation and mitigation decisions.

2. Existing Material and Methods

2.1 Existing Climate Models - Community Climate System Model

The Community Climate System Model (CCSM) is one of the most sophisticated climate modeling tools for simulating the interconnected events that drive the climate on earth. CCSM version 3.0 consists of four basic modules that run concurrently, and it allows researchers and other parties interested in climate change to investigate climate states in the present, the future and even the past. CCSM does not act as a single climate model [8]; it can be considered a framework for building and testing different kinds of climate models. It is a major coupled climate model consisting of four individual component models (atmosphere, ocean, land surface, and sea ice) that together simulate the climate system of the earth. As a computer model, it aims for performance portability, extensibility and modularity. The model is freely available to the climate community, so CCSM must run on a variety of computer platforms and machine architectures.
Performance portability here means maintaining a single source code that achieves good performance across different computer architectures while supporting climate prediction, climate research and the assessment of climate change scenarios, all of which involve computationally intensive tasks [6].

Fig 2.1 - Four separate models in CCSM version 3.0

Good code should always be modular and extensible, so that the system can be extended with new functionality and its users can customize it to their needs. Most of the CCSM component models achieve this by letting users swap, add, or choose between physical parameterizations at the lower levels. Scientists have been writing complex computer programs to answer climate questions since the 1950s [16], because researchers and others interested in climate cannot recreate something like the atmosphere in a test tube and run experiments on it. A model like CCSM is therefore essential to climate science. It greatly assists in predicting likely climate patterns for specific regions or individual states, providing information that can be used to protect investments and save human lives.

When decisions are based on a model like this, its predictions need to be validated, so the model outputs must be compared with real-time observations. Those observations may be obtained from remote sensors, satellites or historical records. These sources rapidly produce massive volumes of data, and the data grow faster than they can be analyzed. In such a situation a data mining approach offers a much better solution [3].

There are many standalone climate research centers all over the world [17]. Each has its own technologies and equipment for collecting and analyzing climate-related data.
However, this distributed knowledge could be shared among the centers. Countries like Sri Lanka could benefit greatly from such a distributed, shared technology model.

2.2 Data Mining Approach for Climate Prediction

Climate data are tied to geographical properties and inherit the relationships between spatial and temporal data sources. In climate prediction, detecting global changes is becoming more significant than detecting regional changes. Handling different types of data on a variety of scales is a major problem. One major purpose of knowledge discovery in climate data is to bridge the gap between the different scales of the available data and to find the knowledge hidden in huge collections of spatial and temporal data. A system is therefore needed that gathers, preprocesses, transforms and integrates huge amounts of data. Existing climate models based on data mining extract patterns from large sets of observations and model outputs, and perform data-guided modeling with simulation outputs as inputs. Many data mining techniques can be used for prediction, such as frequent pattern mining, classification and cluster analysis, drawing on statistical methods, neural networks, decision trees and so on. As technology develops, novel techniques for gathering data are being introduced, such as sensor technologies, satellites and advanced computational paradigms, so data sets keep growing along with the technology. Data mining is a computationally intensive task that typically works on memory-resident data. With huge amounts of data stored in centralized or distributed systems, traditional data mining techniques run into limitations that often lead to inefficiencies.
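One common way around the memory limitation noted above is to let each site reduce its own data to a small summary and send only that summary to a coordinator. The sketch below, in Python, illustrates the idea for the mean and variance of a single measurement. The LocalSummary structure and the function names are hypothetical, introduced only for this illustration, and are not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Sufficient statistics computed at one site (hypothetical structure)."""
    count: int
    total: float
    total_sq: float


def summarize(readings):
    """Run at each site: reduce raw readings to three numbers."""
    return LocalSummary(
        count=len(readings),
        total=sum(readings),
        total_sq=sum(x * x for x in readings),
    )


def merge(summaries):
    """Run at the coordinator: combine site summaries into a global
    mean and population variance without seeing any raw reading."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no readings to merge")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean
    # Guard against tiny negative values caused by rounding.
    return mean, max(variance, 0.0)
```

Each site would call summarize on its own readings and transmit the result; the coordinator calls merge on the collected summaries. The sum-of-squares formula is simple but can lose precision on very large or tightly clustered data, so a real deployment would likely prefer a numerically more stable merge.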
Parallel and distributed computing becomes necessary for large-scale data mining and for addressing the complex needs and scenarios that arise when data sets are increasingly large, heterogeneous and dynamic.

3. Methodology

Figure 3.1 - Model of the proposed system

The model proposes a way to share resources and knowledge that are distributed in nature. To obtain accurate results or predictions about climate change, data from geographically dispersed locations must be analyzed. Determining a relevant spatial scale, with the help of domain experts, should be the first mandatory step. The resources associated with the relevant locations at that scale can then be identified. These distributed resources or centers can be regarded as the front-end client layer, which detects climate observations using reporting tools. The observations are then sent to an online analytical processing (OLAP) server, which is used to access and filter the data. Data arrive from geographically distributed resources such as remote satellites and sensors. These data are massive (terabytes in volume), temporally ordered, fast changing, and potentially infinite [3], and may be on different scales or of different types. Before analysis, they must therefore be brought to a common scale with mapping tools, and the necessary parameters must be extracted by removing unwanted or noisy data. This is the most important task, because the accuracy of the results depends strongly on the processed input. Finally, the processing phase of the model is performed using back-end data mining tools and utilities. Because large volumes of data are retrieved, only the data relevant to the analysis phase should be extracted, transformed and loaded, in an optimized way. The system combines data from multiple resources, which enables a central view across multiple organizations.
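The preprocessing step described above, bringing readings to a common scale and removing noisy data, could be sketched as follows. This is a minimal illustration in Python, not the proposed system's implementation: the record layout (station, value, unit), the unit table, the choice of temperature as the measured quantity, and the median-based outlier rule with a threshold of 3.5 are all assumptions made for the example.

```python
from statistics import median

# Hypothetical converters that map each source unit to degrees Celsius.
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32.0) * 5.0 / 9.0,
    "K": lambda v: v - 273.15,
}


def harmonize(records):
    """Convert (station, value, unit) records to (station, celsius).
    Records with an unknown unit are skipped."""
    out = []
    for station, value, unit in records:
        convert = TO_CELSIUS.get(unit)
        if convert is not None:
            out.append((station, convert(value)))
    return out


def remove_noise(records, max_dev=3.5):
    """Drop readings whose modified z-score, based on the median and the
    median absolute deviation (MAD), exceeds max_dev."""
    values = [v for _, v in records]
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All readings (or most of them) are identical; nothing to scale by.
        return list(records)
    return [
        (s, v) for s, v in records
        if 0.6745 * abs(v - med) / mad <= max_dev
    ]
```

A caller would chain the two steps, for example remove_noise(harmonize(raw_records)), before loading the result for analysis. A median-based rule is used here because a single extreme sensor fault distorts the mean and standard deviation far more than the median; other filters may suit particular data sources better.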
Figure 3.2 - Data mining as a step in the process of knowledge discovery [7]

Before performing a data mining task, the parameters needed for the analysis must be chosen according to the data mining method to be used. Those parameters are then extracted from the data warehouses and the data mining task is performed on them. The data mining step may interact with the user or with a knowledge base. Interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base.

4. Discussion and Conclusion

Because climate patterns change in uncertain ways, climate prediction remains a challenge. Most analyses are still based on traditional measurements such as rainfall and temperature variation. However, many other factors affect climate change indirectly. Population change, greenhouse gas levels and the effects of radiation are some of the factors that can be considered when predicting climate change. When these factors are considered, the contribution of each country and region must be examined so that preventive action can be taken. In future it will therefore be essential to consider gas emissions, the composition of the atmosphere, and variations in population growth in relation to economic and technological development.

In Sri Lanka, the infrastructure to monitor and enforce environmental conventions is insufficient. Environmental data for analysis are lacking and resources are limited. Domain experts in Sri Lanka are migrating to developed countries. These limitations hamper climate prediction and climate impact assessment. However, as technology develops rapidly, new technologies and equipment for detecting climate changes are being introduced. Consultation with foreign experts on the latest technologies is still required.
The growing amount of data, and the need to bring data from different sources onto a single scale for analysis, lead to high computation time and cost. With the integration of distributed systems, traditional offline data mining techniques can be extended to online decision-support and knowledge discovery applications. Climate data sets are mostly spatial or temporal and are collected via satellites, remote sensors and other distributed technologies. An efficient data mining technique is therefore required to extract patterns and knowledge with acceptable speed and high accuracy. Data mining already supports data cleaning, outlier detection, data transformation and so on, but there is demand for a system that improves data subsetting, query processing, visualization, reformatting and filtering of data according to the user or the requirement, and that overcomes the challenges of handling uncertainty and identifying nonlinear dependencies and relationships in spatial and temporal data sets. Because we are dealing with distributed climate data, data security is not treated as a concern here. Improved prediction accuracy and knowledge sharing are the major advantages that can be gained from distributed data, centers and organizations. The four components (atmosphere, sea ice, land and ocean) are related to one another, so basing the analysis on these four models helps to increase the accuracy of climate predictions. A model like CCSM is therefore a major contribution to identifying climate changes. As a future direction, there is demand for a system that can analyze factors such as radiation, CO2 levels and other gas levels together.

5. References

[1] Tull, Matthew. The Psychological Impact of the 2004 Tsunami. July 20, 2009.
[2] Fujisaki, Ichiro. Japan’s Recovery Six Months after the Earthquake, Tsunami and Nuclear Crisis. Cond. Richard C. Bush. 2011.
[3] "Data Mining for Climate Change and Impacts." Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference. Pisa, 2008.
[4] Zubair, Lareef. "Challenges for environmental impact assessment in Sri Lanka." Environmental Impact Assessment Review, 2001: 469-478.
[5] Baboo, S. Santhosh, and I. Kadar Shereef. "Applicability of Data Mining Techniques for Climate Prediction – A Survey Approach." International Journal of Computer Science and Information Security, 2010: 203-206.
[6] Drake, John B., Philip W. Jones, and George R. Carr Jr. "Overview of the Software Design of the Community Climate System Model." The International Journal of High Performance Computing Applications, 2005: 177–186.
[7] Han, Jiawei, and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
[8] CCSM Brochure. March 09, 2004. http://www.ucar.edu/communications/CCSM/.
[9] Global warming - Wikipedia, the free encyclopedia. November 19, 2011. http://en.wikipedia.org/wiki/Global_warming#Greenhouse_gases.
[10] Pidwirny, Michael, and Scott Jones. 7(y) Causes of Climate Change. October 04, 2011. http://www.physicalgeography.net/fundamentals/7y.html.
[11] Deforestation in Sri Lanka - Wikipedia, the free encyclopedia. November 14, 2011. http://en.wikipedia.org/wiki/Deforestation_in_Sri_Lanka.
[12] McGhee, Robert. Climate and People in the Prehistoric Arctic. http://www.carc.org/pubs/v15no5/5.htm.
[13] Hasham, Alyshah. Banner. January 25, 2011. http://www.internationalnewsservices.com/articles/1latest-news/17833-climate-change-spreads-infectious-diseases-worldwide.
[14] Climate Institute. Human Health and Climate Change. July 27, 2011. http://www.climate.org/topics/health.html.
[15] CESM Models: CCSM3.0 Public Release. 2011. http://www.cesm.ucar.edu/models/ccsm3.0/.
[16] CCSM Brochure: History. http://www.ucar.edu/communications/CCSM/history.html.
[17] UVA Climate - Climate Organizations. http://climate.virginia.edu/orgs.htm.