Distributed Data Mining Solution for Detecting Behavior of
Climate Changes
T. L. I. M Hemachandra and M. G. Noel A. S Fernando
Abstract
Mitigation and adaptation address a wide range of economic and environmental topics, including global
climate change. Climate mitigation involves permanently eliminating or reducing the long-term risks of
climate change to human life, while climate adaptation refers to the ability of a system or people to
prepare for or adjust to climate change in order to minimize overall damage, take advantage of
opportunities, or cope with the consequences.
This paper describes a distributed data mining solution for detecting the behavior of climate change.
Predicting or detecting climate change by discovering knowledge from rapidly growing temporal, spatial
and spatio-temporal data is difficult because of the complexity of the analysis that must be performed.
For example, geographically distributed, decentralized recent data must be integrated at a single
location and analysed together with historical data, and non-linear dependencies and dynamical behaviors
must be analysed after noisy and inconsistent data have been removed and the remainder transformed into
appropriate forms. Because of this bottleneck, this survey tries to map the solutions offered by
distributed data mining methods and techniques to the requirements of climate prediction, and to analyse
the environmental factors that affect climate change. Finally, based on the recent literature, the survey
discusses and proposes new directions and opportunities for detecting the changing behavior of the
climate, enabling people or systems to take preventive action and implement mitigation and adaptation
strategies for climate change.
Keywords: data mining, distributed systems, climate changes
1. Introduction
1.1 Approach
Climate change has a significant impact on social, economic and environmental stability, and its effects
can be negative as well as positive. The people of South Asia and East Africa were struck by a tsunami in
2004 without prior warning from the responsible organizations. Sri Lanka was particularly affected and
suffered huge damage: the tsunami resulted in 31,187 deaths, 4,280 missing people, 23,189 injured people,
and the displacement of 545,715 people [1]. After the earthquake, tsunami and nuclear crisis in 2011,
Japan began major reconstruction efforts in the affected areas, worked to regain control of damaged
nuclear power plants, and carried out relief efforts, while the rest of the nation progressively returned
to normal life [2]. As a developing country, Sri Lanka has struggled over the past few years to recover
from the damage caused by a disaster such as the tsunami. There is therefore great demand for a climate
prediction system that supports adaptive action.
It is impossible to reach a firm conclusion about climate change by analyzing data from only one area or
region [3], because many factors may contribute, directly or indirectly, to a single change. It is
therefore very important to analyze all of the relevant areas before drawing a conclusion. A basic
approach to predicting climate change is to analyze observations from remote satellites, wireless mobile
devices, sensors, or radars that detect weather-related information.
The output formats of geographically distributed information systems vary from one system to another.
The analyst therefore faces many problems: rapidly growing and rapidly changing dynamic behavior; limits
on predicting the future state of atmospheric systems because of their chaotic nature; converting
multi-source data into a single format for analysis; and managing resources such as computational memory
and network connections. The major challenge in analyzing regional changes and sensor data streams is the
uncertainty and risk in decision making that arise from nonlinear relationships. Although a rich
literature exists on data mining applications and on climate statistics, systematic efforts in climate
data mining are lacking.
Data mining on climate change is a mature and vast area. Environmental factors such as distance from the
sea, ocean currents, relief, proximity to the equator and the El Niño phenomenon affect climate change,
so analyzing only regional impacts related to temperature, pressure, rainfall and so on is not
sufficient. It is also necessary to analyze human activities such as population growth, deforestation,
fuel burning, the development of farms, cities and roads, and land use. According to the IPCC Fourth
Assessment Report, Climate Change 2007, the atmosphere is the component that most strongly affects the
climate. The mean and variability of temperature, rainfall and wind patterns over a period of time are
the most important terms for describing the climate. The climate system is influenced by its own internal
dynamics as well as by external forcing factors. The report clearly points out that the presence of
greenhouse gases is a main cause of increasing global warming. Greenhouse gases trap heat in the
atmosphere and absorb radiation, including infrared and ultraviolet rays from the sun [9]. Because of
human activities such as deforestation and the burning of fossil fuels, the previously stable proportions
of greenhouse gases have changed over the last few decades. When an economy develops rapidly, people tend
to use and invent new technologies to improve industry. Changing population patterns also affect climate
change.
To predict or reach a conclusion about a climate change, data from geographically distributed resources
must be analyzed. Applying data mining methodologies and techniques to this kind of data requires a great
deal of preprocessing. The point to emphasize is that these data come in different scales and from
different distributed regions. The accuracy and consistency of the final outcomes depend directly on the
accuracy and consistency of the inputs. The basic data types to be analyzed in relation to climate change
are temporal and spatial data, so concurrently generated data must be collected from distributed
resources. This is where the importance of a distributed data mining solution arises.
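As an illustration of the preprocessing needed when variables arrive on different scales, the following
minimal sketch standardizes each variable to z-scores so that they become comparable. It is an
illustrative assumption, not the method of any cited system; the function name standardize and the
sample readings are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores so that variables measured on different scales become comparable."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical readings on very different scales (degrees Celsius vs. millimetres).
temperature_c = [26.0, 27.5, 29.0, 30.5]
rainfall_mm = [120.0, 80.0, 200.0, 40.0]

scaled = {
    "temperature": standardize(temperature_c),
    "rainfall": standardize(rainfall_mm),
}
```

After this step every variable has zero mean and unit variance, so a distance-based technique is not
dominated by whichever variable happens to have the largest numeric range.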
This research examines the feasibility of a prediction system that analyzes observations produced by
distributed resources, focusing on spatial and temporal data series. It examines general-purpose data
mining techniques such as clustering and classification in order to capture the dynamic behavior of the
factors and physical processes that affect climate change directly or indirectly, to indicate their
future state, and to support adaptation and mitigation decisions.
1.2 Motivation
Today climate changes are happening in an unexpected manner [12]. These changes have a direct impact on
both society and the environment. Changes in living conditions, loss of human lives, and the spread of
various diseases have been clearly seen in the last few decades [13] [14], along with huge damage to
natural resources from natural disasters and even changes in the atmosphere, soil and water layers of the
earth. Natural disasters are now reported from many countries, for example tornadoes in America and
earthquakes in Japan and New Zealand. Because of these deteriorating conditions and the economic cost,
all parties are interested in detecting climate change, and there is huge demand for a climate prediction
system all around the world.
In Sri Lanka the infrastructure to monitor and enforce environmental conventions is insufficient [4].
Environmental data for analysis are lacking, and resources are limited. These limitations challenge the
process of climate prediction and climate impact assessment [4]. However, as technology develops rapidly,
new technologies and equipment are being introduced for detecting climate change. This gives data
analysts a new opportunity to obtain information quickly through geographically distributed information
systems, in line with the increasing trend towards decentralized systems. The core needs are the
generation of predictive insights, risk management, and uncertainty characterization, with the ultimate
aim of informing adaptation and mitigation decisions.
2. Existing Material and Methods
2.1 Existing Climate Models - Community Climate System Model
The Community Climate System Model (CCSM) is one of the most sophisticated climate modeling tools for
simulating the interconnected events that drive the climate on earth. Version CCSM3.0 consists of four
basic modules that are simulated concurrently, and it allows researchers and other parties interested in
climate change to investigate climate states in the present, the future and even the past. CCSM does not
act as a single climate model [8]; it can be considered a framework for building and testing different
kinds of climate models.
CCSM is a major coupled climate model consisting of four individual component models (atmosphere, ocean,
land surface, and sea ice) that together simulate the climate system of the earth. As a computer model it
aims to achieve performance portability, extensibility and modularity. The model is freely available to
the climate community, so CCSM must run on a variety of computer platforms and machine architectures.
Performance portability here means maintaining a single source code that achieves good performance across
different computer architectures while supporting climate prediction, climate research and the assessment
of climate change scenarios, all of which involve computationally intensive tasks [6].
Fig 2.1 - Four separate models in CCSM version 3.0
Good code should always be modular and extensible, so that the system can be extended with new
functionality and its users can customize it according to their preferences. Most of the CCSM component
models achieve these capabilities by letting users swap, add, or choose between physical
parameterizations at the lower levels.
Scientists have been creating complex computer programs to answer climate questions since the 1950s [16],
because researchers and other parties interested in climate cannot recreate something like the atmosphere
in a test tube and run experiments on it. A model like CCSM is therefore essential to climate science. It
greatly assists in predicting likely climate patterns for specific regions or individual states by
providing information that can be used to protect investments and save human lives. When decisions are
made using such a model, assurance is needed about its predictions and outputs, so the model outputs need
to be compared with real-time observations. Those observations may be obtained from remote sensors,
satellites or historical records. These sources rapidly produce massive volumes of data, and the rate at
which the data grow exceeds the rate at which they are analyzed. In such a situation the data mining
approach provides a much better solution [3].
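One simple way to check model outputs against observations is to compute the mean bias and the root mean
square error (RMSE) between the two series. The sketch below is a hedged illustration only: the function
name bias_and_rmse is hypothetical, it is not part of CCSM, and it assumes the two series are already
aligned in time and expressed in the same units.

```python
import math


def bias_and_rmse(model, observed):
    """Compare aligned model outputs and observations.

    Returns (mean bias, root mean square error). Assumes both sequences
    cover the same time steps and use the same units.
    """
    if len(model) != len(observed) or not model:
        raise ValueError("series must be non-empty and of equal length")
    errors = [m - o for m, o in zip(model, observed)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


# Hypothetical monthly mean temperatures (degrees Celsius).
simulated = [27.1, 27.9, 28.6, 29.4]
measured = [26.8, 28.0, 28.9, 29.0]
bias, rmse = bias_and_rmse(simulated, measured)
```

A bias close to zero with a large RMSE would suggest that the model is right on average but wrong at
individual time steps, which is one kind of pattern a later analysis step could investigate.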
There are many standalone climate research centers all over the world [17], each with its own
technologies and equipment for collecting and analysing climate-related data. It is possible, however, to
share this distributed knowledge among the centers. Countries like Sri Lanka can benefit greatly from
such a distributed, shared technology model.
2.2 Data Mining Approach for Climate Prediction
Climate data are tied to geographical properties and inherit the relationships between spatial and
temporal data sources. In climate prediction, the detection of global changes is becoming more
significant than the detection of regional changes. Handling different types of data at a variety of
scales is a major problem. One major purpose of knowledge discovery in climate-related data is to bridge
the gap between the different scales of the available data types and to find the knowledge hidden inside
huge volumes of spatial and temporal data. There is therefore a need for a system that gathers,
preprocesses, transforms and integrates huge amounts of data. Existing climate models based on data
mining extract patterns from large sets of observations and model outputs, and perform data-guided
modeling that uses simulation outputs as inputs. Many data mining techniques can be used for prediction,
such as frequent pattern mining, classification and cluster analysis, and they draw on statistical
methods, neural networks, decision trees and so on.
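To make cluster analysis concrete, the following is a minimal k-means sketch that groups stations by two
features. It is an illustration under stated assumptions rather than the technique of any cited work: the
function names are hypothetical, the first k points are used as starting centroids for reproducibility
(practical implementations usually pick them at random), and the features are assumed to be on comparable
scales already.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means: returns (centroids, cluster index of each point)."""
    centroids = [tuple(p) for p in points[:k]]  # simplification: first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical (temperature anomaly, rainfall anomaly) pairs for four stations.
stations = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centers, labels = kmeans(stations, k=2)
```

Stations that end up in the same cluster share a similar behavior in the chosen features, which is the
kind of pattern a domain expert could then interpret.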
As technology develops, novel techniques for gathering data are being introduced, such as sensor
technologies, satellites and advanced computational paradigms, so the size of data sets grows along with
the technology. Data mining is a heavy computing task that traditionally deals with memory-resident data.
With huge amounts of data stored in centralized or distributed systems, traditional data mining
techniques encounter limitations and shortcomings that often lead to inefficiencies. Parallel and
distributed computing therefore becomes a necessity for large-scale data mining and for addressing the
complex needs and scenarios encountered when dealing with data sets that are increasingly large,
heterogeneous and dynamic.
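A basic building block of distributed data mining is to let each site summarize its own data and send
only the summary to a coordinator, instead of shipping the raw observations. The sketch below shows this
for the global mean and variance, merging per-site summaries with the standard pairwise update for counts,
means and sums of squared deviations. The function names and the sample sites are hypothetical, and the
sketch leaves out networking entirely.

```python
def local_summary(values):
    """Computed at each site: (count, mean, sum of squared deviations)."""
    n = len(values)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return (n, mean, m2)


def merge(a, b):
    """Combine two summaries without access to the underlying raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def global_mean_variance(summaries):
    """Computed at the coordinator: global mean and population variance."""
    total = (0, 0.0, 0.0)
    for s in summaries:
        total = merge(total, s)
    n, mean, m2 = total
    return mean, (m2 / n if n else 0.0)


# Hypothetical temperature readings held at two separate centers.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]
g_mean, g_var = global_mean_variance([local_summary(site_a), local_summary(site_b)])
```

Each site transmits three numbers regardless of how many observations it holds, which is the property
that makes this style of computation attractive when data volumes are large and network capacity is
limited.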
3. Methodology
Figure 3.1 - Model of the proposed system
This model proposes a concept for sharing resources and knowledge that are distributed in nature. To
obtain accurate results or predictions related to climate change, data from geographically dispersed
locations must be analysed. Determining a relevant spatial scale, with the help of domain experts'
knowledge, should be the first mandatory step. Subsequently we can identify the resources associated with
the relevant locations at that scale. We can consider these distributed resources or centers as the
front-end client layer, which is used for detecting climate observations with reporting tools. These
observations are then sent to an online analytical processing (OLAP) server, which is used to access and
filter the data. Data arrive from geographically distributed resources such as remote satellites and
sensors. Such data are massive (terabytes in volume), temporally ordered, fast changing, and potentially
infinite [3], and may be in different scales or of different types. To analyze them, it is necessary to
bring them to the same scale with mapping tools and to extract the necessary parameters by removing
unwanted or noisy data. This is the most important task, because the accuracy of the results depends
strongly on the processed input. Finally, the processing phase of the model is performed using back-end
tools and utilities for data mining. Because large volumes of data are retrieved, it is necessary to
extract, transform and load only the data relevant to the analysis phase, in an optimized form. The
system combines data from multiple resources, which enables a central view across multiple organizations.
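The step of bringing multi-source data to a common scale and discarding noisy records could look roughly
like the sketch below. Everything in it is an assumption made for illustration: the record layout (time,
station, value, unit), the function names, and the plausibility limits of -90 to 60 degrees Celsius are
hypothetical and would be set by domain experts in a real system.

```python
def to_celsius(value, unit):
    """Map a temperature reading onto a common scale (degrees Celsius)."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown unit: {unit!r}")


def clean_and_merge(sources, low=-90.0, high=60.0):
    """Extract, transform and merge readings from several sources.

    Drops records with missing values, unknown units, or values outside
    the assumed plausible range [low, high], then orders by timestamp.
    """
    merged = []
    for source in sources:
        for rec in source:
            if rec.get("value") is None:
                continue  # missing reading
            try:
                celsius = to_celsius(rec["value"], rec["unit"])
            except ValueError:
                continue  # unit this sketch cannot map
            if low <= celsius <= high:
                merged.append(
                    {
                        "time": rec["time"],
                        "station": rec["station"],
                        "temp_c": round(celsius, 2),
                    }
                )
    merged.sort(key=lambda r: r["time"])  # ISO-8601 strings sort chronologically
    return merged


# Hypothetical records from two differently formatted sources.
satellite = [{"time": "2011-01-01T00:00", "station": "A", "value": 86.0, "unit": "F"}]
sensors = [
    {"time": "2010-12-31T23:00", "station": "B", "value": 300.15, "unit": "K"},
    {"time": "2011-01-01T01:00", "station": "B", "value": 999.0, "unit": "C"},
    {"time": "2011-01-01T02:00", "station": "C", "value": None, "unit": "C"},
]
unified = clean_and_merge([satellite, sensors])
```

Only the two plausible readings survive, expressed in one unit and ordered in time, which is the form a
later data mining step would expect.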
Figure 3.2 - Data mining as a step in the process of knowledge discovery [7]
Before performing a data mining task we need to decide on the parameters required for the analysis,
according to the data mining method to be used. Those parameters must then be extracted from data
warehouses, and the data mining task performed on them. The data mining step may interact with the user
or a knowledge base. The interesting patterns are presented to the user and may be stored as new
knowledge in the knowledge base.
4. Discussion and Conclusion
Because of the uncertainty of changing climate patterns, climate prediction is still a challenge. Most
analyses are still based on traditional properties such as rainfall measurements and temperature
variations, but many other factors affect climate change indirectly. Population variation, levels of
greenhouse gases and the effects of radiation are some of the factors that can be considered when
predicting climate change. When considering those factors, the contribution of each country and region
must be examined so that preventive action can be taken. In future it will therefore be essential to
consider gas emissions, the composition of the atmosphere, and variations in population growth with
respect to economic and technological development.
In Sri Lanka, the infrastructure to monitor and enforce environmental conventions is insufficient.
Environmental data for analysis are lacking, and resources are limited. Domain experts in Sri Lanka are
migrating to developed countries. These limitations challenge the process of climate prediction and
climate impact assessment. However, as technology develops rapidly, new technologies and equipment are
being introduced for detecting climate change. Consultation with foreign experts on the latest
technologies is still required.
Increasing volumes of data, and the need to bring data from different sources onto a single scale for
analysis, lead to high computational time and cost. Traditional offline data mining techniques can be
extended to online decision-support knowledge discovery applications through integration with distributed
systems. Climate data sets are mostly spatial or temporal and are collected via satellites, remote
sensors and other distributed technologies. An efficient data mining technique is therefore required to
extract patterns and knowledge with acceptable speed and higher accuracy. The data mining approach
already supports data cleaning, outlier detection, data transformation and so on, but there is demand for
a system that enhances data subsetting, query processing, visualization, reformatting and filtering of
data according to the user or the requirement, and that overcomes the challenges of uncertainty handling
and of identifying nonlinear dependencies and relationships among spatial and temporal data sets. Because
the distributed data in question are climate data, data security is not a major concern. Enhanced
prediction accuracy and knowledge sharing are the major advantages that can be gained from distributed
data, centers or organizations.
The four components (atmosphere, ice, land and ocean) are each related to one another, so basing the
analysis on these four models helps to increase the accuracy of climate prediction. A model like CCSM is
therefore an enormous asset for identifying climate change. As a future direction, there is demand for a
system that can analyze factors such as radiation, CO2 levels and other gas levels together.
5. References
[1] Tull, Matthew. The Psychological Impact of the 2004 Tsunami. July 20, 2009.
[2] Fujisaki, Ichiro. Japan’s Recovery Six Months after the Earthquake, Tsunami and Nuclear Crisis.
Cond. Richard C. Bush. 2011.
[3] "Data Mining for Climate Change and Impacts." Data Mining Workshops, 2008. ICDMW '08. IEEE
International Conference. Pisa, 2008.
[4] Zubair, Lareef. "Challenges for environmental impact assessment in Sri Lanka." Environmental
Impact Assessment Review, 2001: 469-478
[5] Baboo, S. Santhosh, and I. Kadar Shereef. "Applicability of Data Mining Techniques for Climate
Prediction – A Survey Approach." International Journal of Computer Science and Information
Security, 2010: 203-206.
[6] John B. Drake, Philip W. Jones, George R. Carr Jr. "Overview of the Software Design of the
Community Climate System Model." The International Journal of High Performance Computing
Applications, 2005: 177–186.
[7] Jiawei Han, Micheline Kamber. Data Mining Concepts and Techniques. Diane Cerra, 2006
[8] CCSM Brochure. March 09, 2004. http://www.ucar.edu/communications/CCSM/.
[9] Global warming - Wikipedia, the free encyclopedia:. November 19, 2011.
http://en.wikipedia.org/wiki/Global_warming#Greenhouse_gases.
[10] Pidwirny, Michael, and Scott Jones. 7(y) Causes of Climate Change. October 04, 2011.
http://www.physicalgeography.net/fundamentals/7y.html.
[11] Deforestation in Sri Lanka - Wikipedia, the free encyclopedia:. November 14, 2011.
http://en.wikipedia.org/wiki/Deforestation_in_Sri_Lanka.
[12] McGhee, Robert. Climate and People in the Prehistoric Arctic.
http://www.carc.org/pubs/v15no5/5.htm.
[13] Hasham, Alyshah. Banner. January 25, 2011.
http://www.internationalnewsservices.com/articles/1latest-news/17833-climate-change-spreads-infectious-diseases-worldwide.
[14] Climate Institute. Human Health and Climate Change. July 27, 2011.
http://www.climate.org/topics/health.html
[15] CESM Models: CCSM3.0 Public Release. 2011. http://www.cesm.ucar.edu/models/ccsm3.0/.
[16] CCSM Brochure:History. http://www.ucar.edu/communications/CCSM/history.html
[17] UVA Climate--Climate Organizations:. http://climate.virginia.edu/orgs.htm.