10 - Ganga River Basin Environment Management Plan

10.1 Preamble
The Ganga river basin management plan is an ambitious and unique proposal. It has
been conceived to understand and rectify the various environmental issues that
have cropped up due to the continuing expansion of human habitat in the basin. In
order to achieve the goals of the project, scientists from different fields need to
work in a synergistic manner. A crucial component of the entire exercise will be an
integrated geo-spatial database management system to be used by all thematic
groups and policy makers. The system will provide data storage, retrieval,
visualization and search capabilities. In addition, it will provide relevant interfaces
that can be used by the different thematic groups for simulation, prediction and
analysis of data.
Improvements in sensor technologies, coupled with the advent of advanced
geographic information systems (GIS), provide wide-ranging opportunities for
applications that assess and evaluate natural resources in a sustainable
manner. However, such systems require robust and large databases in order to be
successful in the long run. Thus, the proposed data centre is a crucial cog on
which the smooth running of the Ganga basin project depends.
Several themes have been outlined for managing the different aspects of the plan. A
major common effort will be to collect many types of data, ranging from climate and
soil conditions to bio-diversity, land usage and socio-economic practices. While the
sources from where these data will be available are different, it will be beneficial
from both a scientific as well as a management point of view to store all the various
types of data in a central repository. The proposed repository or data centre will
provide the additional benefit of linking the data from different themes to get an
overall perspective.
As the data has been (and will be) collected over a period of time across different
spatial sites in the Ganga basin, it will be spatio-temporal in nature. Designing a
geo-spatial database management system is, therefore, crucial to the entire project.
10.2 Objectives
We envision four important aspects of the system:
a) A database (henceforth referred to as a data centre) that can house and interconnect the different types of data,
b) Query, visualization, and retrieval capabilities of the stored data,
c) Interfaces that will make the data available to simulation tools, and
d) Data mining, pattern recognition and knowledge modelling.
The data centre is to be created using open source technologies and tools, with
the aim of migrating fully to such a system eventually. It is expected that in the
initial stages proprietary software and tools may have to be used, since most end
users are familiar with them and will require time and training to migrate.
A unified database that has access to all sorts of data is necessary not only to serve
as a central repository, but also to establish the connections across the different
spatial sites, periods and sources of data. Moreover, this will help the thematic
groups to link their own sources of data to other related data for better
understanding of the problems they work on.
However, since the database is supposed to cover every bit of data collected or
produced by every thematic group, the amount of data will be very large. Such
voluminous and continuously increasing data calls for sophisticated query
processing and indexing techniques. The database must support improved ways of
searching and retrieval in order to be practically useful to the domain scientists.
Since the data can have various attributes, multi-dimensional indexing techniques
along with suitable similarity measures need to be developed. Another important
feature of the data centre is visualization and representation of data in different
forms that are more suitable and amenable to the needs of the domain specialists.
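As one illustration of multi-dimensional indexing, the sketch below buckets observations into a uniform grid over longitude, latitude and time, so that a range query inspects only the cells that overlap the query box. The class name GridIndex, the cell sizes, the approximate coordinates and the record labels are all assumptions made for this example; a production system would more likely rely on the spatial index of whichever database is chosen.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform grid over (longitude, latitude, day) supporting box range queries."""

    def __init__(self, cell_deg=0.5, cell_days=30):
        self.cell_deg = cell_deg      # cell width in degrees (assumed value)
        self.cell_days = cell_days    # cell length in days (assumed value)
        self.cells = defaultdict(list)

    def _key(self, lon, lat, day):
        # Map a point to the integer coordinates of the cell containing it.
        return (
            floor(lon / self.cell_deg),
            floor(lat / self.cell_deg),
            floor(day / self.cell_days),
        )

    def insert(self, lon, lat, day, record):
        self.cells[self._key(lon, lat, day)].append((lon, lat, day, record))

    def range_query(self, lon_min, lon_max, lat_min, lat_max, day_min, day_max):
        """Return records whose point lies inside the closed query box."""
        kx0, ky0, kt0 = self._key(lon_min, lat_min, day_min)
        kx1, ky1, kt1 = self._key(lon_max, lat_max, day_max)
        hits = []
        # Visit only the cells that overlap the box, then filter exactly.
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for kt in range(kt0, kt1 + 1):
                    for lon, lat, day, rec in self.cells.get((kx, ky, kt), ()):
                        if (lon_min <= lon <= lon_max
                                and lat_min <= lat <= lat_max
                                and day_min <= day <= day_max):
                            hits.append(rec)
        return hits


# Illustrative records: approximate site coordinates, days since an arbitrary start.
index = GridIndex()
index.insert(80.33, 26.45, 10, "kanpur-day10")
index.insert(83.00, 25.30, 12, "varanasi-day12")
index.insert(80.33, 26.45, 400, "kanpur-day400")
near_kanpur = index.range_query(80.0, 81.0, 26.0, 27.0, 0, 60)
```

A uniform grid is only the simplest choice; tree-based structures handle unevenly distributed sites better, and a similarity measure over non-spatial attributes would need a separate design.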
It is important that the provenance of each piece of data can be tracked and can be
displayed on a map of the Ganga basin with the exact time period when it was
collected. For a particular site, a time-series of each type of data needs to be
displayed. This will also help in disseminating information about the progress of
the project and the status of the river to the general public.
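The provenance and time-series requirements could be met by a record layout along the following lines. This is a minimal sketch: the Observation fields, the parameter name, the source labels and all values are hypothetical and are not taken from the project's data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    site: str           # name of the spatial site
    parameter: str      # type of data, e.g. a water-quality measure
    collected_on: date  # when the value was collected
    value: float
    source: str         # provenance: who or what supplied the value


def time_series(observations, site, parameter):
    """Return the observations for one site and parameter in date order."""
    rows = [o for o in observations
            if o.site == site and o.parameter == parameter]
    return sorted(rows, key=lambda o: o.collected_on)


# Hypothetical sample data, deliberately out of date order.
observations = [
    Observation("Kanpur", "dissolved_oxygen", date(2011, 3, 1), 6.5, "survey B"),
    Observation("Kanpur", "dissolved_oxygen", date(2011, 1, 1), 6.8, "survey A"),
    Observation("Varanasi", "dissolved_oxygen", date(2011, 2, 1), 7.4, "survey A"),
    Observation("Kanpur", "dissolved_oxygen", date(2011, 2, 1), 7.1, "survey A"),
]
kanpur_series = time_series(observations, "Kanpur", "dissolved_oxygen")
```

Because every record carries its own source and collection date, a map view can show where and when each value originated without consulting a separate log.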
Models will require various abstraction layers on top of the raw data, e.g., a flow
abstraction that projects the flows into and out of a chosen object (such as the main
stem of the Ganga or one of its tributaries) giving point and extended source flows.
The comprehensive data gathering and modelling exercise, both qualitative and
quantitative, will also reveal gaps in the existing data and help guide future data
collection efforts.
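A flow abstraction of the kind described above might look like the following sketch. It assumes that each flow record names the object it enters or leaves, is classed as a point or an extended source, and carries a signed rate (positive into the object, negative out of it). The Flow type, the object names and the rates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    target: str       # object the flow enters or leaves
    source_kind: str  # "point" or "extended"
    rate: float       # positive = into the target, negative = out of it


def flow_summary(flows, target):
    """Total inflow and outflow for one object, split by source kind."""
    summary = {
        "point": {"inflow": 0.0, "outflow": 0.0},
        "extended": {"inflow": 0.0, "outflow": 0.0},
    }
    for f in flows:
        if f.target != target:
            continue
        bucket = summary[f.source_kind]
        if f.rate >= 0:
            bucket["inflow"] += f.rate
        else:
            bucket["outflow"] += -f.rate
    return summary


# Hypothetical flows; units are left unspecified on purpose.
flows = [
    Flow("main stem", "point", 120.0),     # e.g. a tributary confluence
    Flow("main stem", "point", -15.0),     # e.g. a canal offtake
    Flow("main stem", "extended", 30.5),   # e.g. runoff along a reach
    Flow("tributary X", "point", 8.25),
]
main_stem = flow_summary(flows, "main stem")
```

Models would then read the summary rather than the raw records, which keeps them independent of how the underlying data is stored.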
Another very important aspect of the system will be the data mining and pattern
recognition components. Since the amount of data is extremely large, it is not
possible to sift through it manually and find patterns. Specialized machine
learning techniques need to be applied for pattern discovery and trend analysis.
Statistical methods and models can be incorporated to identify data that is
statistically unlikely and that may therefore point to unusual physical phenomena
warranting further exploration. Due to the large area of the Ganga basin, it may
not be possible to collect data from all the spatial sites at all times. Thus, building
an appropriate generative model that describes the different data sources will be a
boon. The model will also help to simulate different situations such as flood,
drought, etc. and predict the future values of various physical parameters. This will
be a valuable resource for policy makers and scientists alike. Since the data will be
from different sources, linking the metadata is important to understand the
relationships among the various types of data. Therefore, the construction of
knowledge models and ontologies is vital as well.
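As a simple example of identifying statistically unlikely data, the sketch below flags values that lie more than a chosen number of standard deviations from the mean of a series. The z-score rule, the threshold of 2.0 and the sample values are assumptions made for illustration; the techniques eventually adopted may well be more sophisticated.

```python
from statistics import mean, pstdev


def flag_unlikely(values, threshold=2.0):
    """Return indices of values whose z-score exceeds the threshold."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values identical, nothing stands out
    return [i for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# Hypothetical readings at one site; the last value is deliberately extreme.
readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 45.0]
suspect = flag_unlikely(readings)
```

A flagged value is only a candidate for further exploration: it may reflect an unusual physical event or simply a faulty measurement.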
Research on pattern recognition and statistical analysis can provide value addition
as well as support research of other thematic groups. This research will be long
term and will evolve over time through interaction with the other thematic groups to
understand their data and identify their requirements. Ideas include, but are not
limited to, the following. Sensitivity tolerance and confidence levels can be
added to the models developed by other thematic groups using statistical analysis.
Pattern recognition on remote sensing data will be an important aspect to develop
maps on surface water, glacier extent (and monitoring), soil composition, land-use,
forest cover (and monitoring), etc. Relevant processed data at different times can
feed models of other thematic groups to make better and/or additional parameter
10.3 Scope
The scope of the project extends to the entire Ganga basin management plan. The
data centre will include all the data requirements of all the thematic groups. It will
also include a portal cum qualitative knowledge map (Gangapedia) that will subserve the communication needs of the project.
10.4 Types of Data
The collection of data is external to the project. It is assumed that the different
thematic groups will feed the data collected or generated by them to this group.
Some of the typical sources of data that are expected from them are:
a) Data from water sources
b) River water levels
c) Pollution levels
d) Rainfall
e) Ground water levels
f) Ground water pollution levels
g) Glacier sizes and melt rates
h) Bio-diversity maps
i) Chemical substance levels
j) Data from land sources
k) Land use maps
l) Pollution levels
m) Bio-diversity maps
n) Remote-sensing data
o) Topographic data
p) Soil composition data
10.5 Methodology
The data objects, attributes, sources, views and interfaces will be identified in close
consultation with representatives of all thematic groups. A consultative group with
representation from each thematic group will be formed to understand the data
requirements of each group and the database group will design and implement the
necessary requirements. It is expected that these requirements will evolve over the
course of the project. The steps below give a more detailed picture of the approach
that will be taken:
a) Identify the objects in the entire system.
b) Identify the attributes for each object - in particular the spatial and temporal
c) Identify the type and structure of data elements and the interfaces needed.
d) Identify the meta data tags for the data elements in the system.
e) Design data mining techniques to access the raw and processed data in different
f) Create a communication portal for within project and external communication
g) Create qualitative knowledge models showing dependencies and nature of
h) Design a security policy for access to data.
i) Identify the hardware and software needs (e.g., servers, database, GIS and
visualization software, network bandwidth for connectivity, etc.).
j) Design and implement a system that meets the requirements from (a) to (h)
k) Design pattern recognition techniques to identify trends and anomalies.
l) Research on other aspects of mapping, modeling, prediction and support to
other thematic groups.
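Step (h) above calls for a security policy for access to data. One minimal way to express such a policy is a table of roles and the data categories each may read, as sketched below. The role names and categories are purely hypothetical; the actual policy would be decided within the project.

```python
# Hypothetical roles and data categories, for illustration only.
READ_POLICY = {
    "public": {"processed"},
    "thematic_group": {"processed", "raw"},
    "database_group": {"processed", "raw", "metadata"},
}


def can_read(role: str, category: str) -> bool:
    """Return True if the role may read the category; unknown roles get no access."""
    return category in READ_POLICY.get(role, set())
```

Denying access to unknown roles by default keeps the policy safe when new roles are added before their permissions have been agreed.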
10.6 Work Plan
Setup of the basic data centre (items (i), (iv) and (viii) with basic/standard data access
Development of specialized interfaces for simulation and modeling; visualization; other abstraction layers; creation of qualitative knowledge models.
Initiate research into data prediction - data mining, pattern recognition, machine learning, ontology creation.
10.7 Deliverables
Data centre with appropriate querying, retrieval, visualization, API interfaces and
data abstraction facilities. The data will be acquired by the individual
thematic groups and given to the database group.
10.8 The Team
Alka Bhushan, IIT Bombay
N L Sarada, IIT Bombay
Smita Sengupta, IIT Bombay
Umesh Bellur, IIT Bombay
A K Gosain, IIT Delhi
A K Mittal, IIT Delhi
Arnab Bhattacharya, IIT Kanpur
Bharat Lohani, IIT Kanpur
Harish Karnick, IIT Kanpur
Krithika Venkataramani, IIT Kanpur
Onkar Dikshit, IIT Kanpur
Purnendu Bose, IIT Kanpur
Rajiv Sinha, IIT Kanpur
T V Prabhakar, IIT Kanpur
Vinod Tare, IIT Kanpur