Outline of an information technology grant application to SSF

A GRID Prototype computer-network in Sweden
DRAFT 8 August 2001/TE
Research Program
In many fields of fundamental and applied research in Sweden there is presently a strong need for
significantly increased capacity to generate, transfer, store and process very large quantities of data. New
possibilities to provide capacity of that kind are now offered by the creation of a worldwide computer
network called the GRID. The hardware part of the GRID consists of large quantities of data storage discs and
robotic tapes and large personal computer clusters, spread out globally and linked together through very
high-speed optical-fiber links. Special system software, so-called middleware (e.g. GLOBUS), is
required for the operation of the GRID and is presently under development. Such middleware will make it
possible for an individual researcher to use his or her personal computer to access the very large and widely
distributed data and processing resources as if, figuratively speaking, these resources were located in the personal
computer itself. In order to make it possible for Swedish researchers to participate in the development of these new
possibilities, resources need to be allocated for the development of the GRID technology in Sweden. The
plan is to set up in Sweden a node, or Tier, of the so-called GRID Prototype, the build-up of which is
organized as an international collaboration and scheduled for the years 2002-2004. This Swedish GRID
Prototype Tier is proposed as a common effort among several different research fields and universities in
Sweden. An advantage of a joint effort is that enhanced international competitiveness can
be attained through coordination of resources. The research fields involved are briefly described
below, in particular with regard to their need for and potential use of GRID technology.
1. Biomedical Sciences
Bioinformatics is a new scientific discipline, at the interface between biology, medicine, mathematics and
computer science. There is currently an enormous quantity of information held in the biological databases
and there is a growing need to consult these databases. The sequence of the first completed genome was
published in 1995 and the human genome sequence was published half a year ago. It is generally agreed that
the major challenge for bioinformatics research in the next few years is to elucidate the relations between
sequence, structure and function. Understanding these relations will have a profound effect on industry and
society as a whole. However, computers, algorithms, software and infrastructure have difficulty keeping
up with the exponentially rising tide of data generated worldwide in biomedical research. Here, GRID
technologies open up new perspectives for bioinformatics in terms of computing resources and data storage.
The following are examples of relevant Swedish biomedical research projects. The GRID can be used to
exploit databases built from gene expression data (DNA chip analysis); in this case the GRID may
ultimately help to generate “virtual cells” in which experimental outcomes can be predicted. The GRID could
also be of value to a project relating human brain characteristics, gene information and psychiatric
disorders (HUBIN). There are a number of other GRID applications in the area between molecular and clinical
science. In structural biology the GRID may be of value in developing new drugs, and in clinical practice in
communicating and evaluating medical images. Digital medical images of different kinds, displaying e.g.
molecular structures, microscopic analyses of tissue sections or clinical images resulting from Computed
Tomography (CT) or Magnetic Resonance Imaging (MRI), all contribute to a large inflow of data. The
automatic analysis of cancer-tissue image-data and the evaluation of this data for radiation treatments
represent an application of high potential value.
2. Earth Sciences
The atmosphere, the oceans, the cryosphere, the land surfaces and the solid earth are presently under intense
study. One of the prime objectives of such studies is to understand how mankind may alter global climatic
conditions. Global climate change has been identified as a key environmental research area and substantial
efforts are directed towards observing and modeling climate change. The observational efforts include
satellite monitoring as well as conventional observing systems such as weather balloons, surface
measurements and paleoclimatological evidence of past climate changes. Estimating possible future climate
change requires physically based modeling of the climate system, and such physical/mathematical modeling
needs substantial computational resources. Collecting and analyzing data from Earth
observing systems is also a computationally demanding task. Huge amounts of data are transferred from
observation platforms, such as space-based satellites, to data analysis centers.
Swedish climate scientists are active in many research areas where supercomputers and high-speed data
networks are necessary research tools. In the area of climate modeling a regional climate modeling system at
the forefront of international climate research has been developed in Sweden. Swedish climate scientists have
contributed substantially to international climate research programs and have played a significant role in the
process leading to the 2001 report from the Intergovernmental Panel on Climate Change (IPCC). This report
has been fundamental as a scientific basis for the recently finalized negotiations on reductions of greenhouse
gas emissions, the so-called Kyoto Protocol. An EU project, PRISM, attempts to build on the computer
software developed in Europe and to motivate the establishment of a European climate supercomputing
facility. Software for exchanging large volumes of data and a considerable need for storage capacity are
requirements common to GRID and PRISM. A Swedish GRID center may thus benefit the Swedish climate
modeling community both through shared software development and, in particular, shared resources for data
storage and transfer.
3. Space and Astro Sciences
New observational techniques have radically advanced the possibilities to explore near outer space, in the
surroundings of the Earth, and far outer space, in our galaxy and beyond. There are both space-borne and
ground-based instruments. Examples of the former are the upcoming ENVISAT Earth observation satellite,
which will provide measurements of the atmosphere, ocean, land and ice over a five-year period; ODIN, the
ongoing atmospheric physics and astrophysics mission; CLUSTER, a space physics mission in which
Swedish science and technology play prominent roles; and the HUBBLE telescope for astronomical
observations. Examples of ground-based instruments in which Swedish researchers play a significant role
are the planned radio observatory LOFAR/LOIS, a geographically distributed, multi-point radio facility for
astrophysics, space physics, atmospheric physics and radio research, and the optical telescopes at ESO, the
European Southern Observatory, for astronomical observations.
When in full operation in 2006, LOFAR/LOIS will consist of 31 000 antennas with sensors and emitters,
organized in hundreds of clusters distributed within a circular geographical region of about 350 km diameter.
The total data rate will be about 25 Tbits/s. This means that LOFAR/LOIS will utilize technology at, or even
beyond, the current leading edge, thus requiring antenna, radio, detector, and data handling technologies to
be advanced beyond their present limits. The new ESO observatory in Chile currently produces about 6
Tbytes of sophisticated data per year. New data analysis techniques are needed to extract
maximum information from these data. Techniques based on GRID technology
may fundamentally change the process of astronomical observation, such that results are expressed in
physical units and the signatures of the telescope and the effects of the Earth's atmosphere are completely
removed from the data.
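As a rough illustration of the scale of these figures, the following Python sketch converts the quoted LOFAR/LOIS and ESO numbers into per-antenna and per-day quantities. It assumes decimal prefixes (1 Tbit = 10^12 bits), continuous operation and an even spread of the data rate over all antennas; none of these assumptions is stated in the text above, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

# Figures quoted in the text (decimal prefixes assumed).
LOFAR_TOTAL_BITS_PER_S = 25e12   # about 25 Tbits/s
LOFAR_ANTENNAS = 31_000          # 31 000 antennas
ESO_BYTES_PER_YEAR = 6e12        # about 6 Tbytes per year


def per_antenna_bits_per_s(total_bits_per_s, n_antennas):
    """Average rate per antenna, assuming the total is spread evenly."""
    return total_bits_per_s / n_antennas


def bytes_per_day(bits_per_s):
    """Data volume in bytes accumulated over one day of continuous operation."""
    return bits_per_s / 8 * SECONDS_PER_DAY


def seconds_to_produce(n_bytes, bits_per_s):
    """Time in seconds needed to generate n_bytes at the given bit rate."""
    return n_bytes * 8 / bits_per_s


if __name__ == "__main__":
    rate = per_antenna_bits_per_s(LOFAR_TOTAL_BITS_PER_S, LOFAR_ANTENNAS)
    print(f"Per antenna: {rate / 1e6:.0f} Mbit/s")
    print(f"Per day: {bytes_per_day(LOFAR_TOTAL_BITS_PER_S) / 1e15:.0f} Pbytes")
    t = seconds_to_produce(ESO_BYTES_PER_YEAR, LOFAR_TOTAL_BITS_PER_S)
    print(f"ESO yearly volume generated in {t:.2f} s")
```

Under these assumptions, each antenna would contribute on the order of 800 Mbit/s, and the array would generate in about two seconds the volume that the ESO observatory currently produces in a year.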
4. High Energy Physics
To reach an understanding of the ultimate constituents of physical matter and their
interactions that is more fundamental than that provided today by the so-called Standard Model of High Energy Physics will require
the study of particle collisions at about ten times higher energies than is presently feasible. The Large
Hadron Collider (LHC) at the European CERN laboratory in Geneva will make such experimental studies
possible from 2006 onwards. The rate of proton collisions at the LHC will be about 100 times the rate at the
present highest-energy particle collider, located near Chicago, USA. Such high collision rates will result in the
production of several Petabytes of experimental data per year, the analysis of which will require a global
computing capacity of several Teraflops. As a preparation for the build-up of a GRID system of such capacity,
CERN and the international High Energy Physics community are now proposing to build, during the years
2002-2004, a GRID Prototype computer network having approximately one tenth of that capacity. This GRID
Prototype will consist of a GRID Prototype Tier in Geneva and other GRID Prototype Tiers in different
regions of the world, connected by 10 Gigabit/second links.
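The relation between the yearly data volume and the link speed can be made concrete with a small calculation. The Python sketch below is only indicative: it takes one Petabyte (assumed to be 10^15 bytes) as the unit, since the exact yearly volume is not specified, and it assumes that a 10 Gigabit/second link can be fully and continuously used, which a real network would not achieve.

```python
PETABYTE = 1e15                      # bytes, decimal prefix assumed
LINK_BITS_PER_S = 10e9               # 10 Gigabit/second link from the text
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def transfer_days(n_bytes, link_bits_per_s=LINK_BITS_PER_S):
    """Days needed to move n_bytes over a fully used link."""
    return n_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


def link_share(bytes_per_year, link_bits_per_s=LINK_BITS_PER_S):
    """Fraction of the link occupied by a steady flow of bytes_per_year."""
    steady_bits_per_s = bytes_per_year * 8 / SECONDS_PER_YEAR
    return steady_bits_per_s / link_bits_per_s


if __name__ == "__main__":
    print(f"1 Pbyte over one link: {transfer_days(PETABYTE):.1f} days")
    print(f"1 Pbyte/year uses {link_share(PETABYTE):.1%} of the link")
```

On these assumptions, moving one Petabyte over a single link takes somewhat more than nine days, while a steady flow of one Petabyte per year would occupy about 2.5 % of the link.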
In order for Swedish researchers to be in a position in the future to compete at the forefront of the searches for
new particles and phenomena at the LHC, the creation of a high-performance GRID Tier in the Nordic region
appears necessary. An intra-Nordic test-bench project, NORDUGRID, has already been created with the
aim of preparing the set-up of such a Nordic GRID Tier. It would be of prime interest to have the Nordic
GRID Tier located in Sweden. In order to make way for this, the build-up in Sweden of a GRID Prototype
Tier would constitute a necessary first step. The European DATAGRID project, coordinated by CERN in
Geneva, prepares the ground for a worldwide data-intensive GRID project and comprises participants from
High Energy Physics, Biomedical Sciences, Earth Sciences, National Computer Centers and Industry. It is
thus natural to propose that the Swedish GRID Prototype Tier, to be connected to CERN in Geneva, will be
made to serve several research fields and have connections also to other future major GRID systems.
5. Information Sciences
Exceptionally large amounts of distributed data and computational resources will be available through the
GRID. Conventional system and database design techniques are not well suited for such a dynamic and
widely distributed environment. This provides research challenges within the areas of distributed data
management, distributed systems, and distributed system development:
-The GRID environment is very different from conventional server-oriented database environments. There
will be a need for new kinds of widely distributed database systems able to manage the very large amounts of
dynamic data that are distributed over many heterogeneous nodes in a high-performance multi-layered global
network.
-The GRID environment will require the integration of both data and computations. Research is needed on
how to effectively combine distributed database and distributed computation techniques.
-Large applications will need new tools to effectively utilize the available GRID resources at any given point
in time. It will be virtually impossible to manually configure tasks in this dynamic environment. New tools
and algorithms will be needed to automatically configure and adapt data and computations based on the
resources available.
-Different programmers and teams at different geographic locations will develop the different component
databases and systems. Research is required on new tools and methodologies for designing and managing
very large distributed and heterogeneous systems.
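The automatic configuration mentioned in the third point can be illustrated by a minimal sketch. The Python code below places tasks on nodes of different speed using a simple greedy rule (largest task first, onto the node that would finish it earliest). The rule, the task and node names and the cost figures are assumptions chosen for illustration; they do not represent an existing GRID middleware interface or the algorithms under investigation.

```python
def assign_tasks(task_costs, node_speeds):
    """Greedy placement of tasks on nodes.

    task_costs:  {task name: amount of work, in arbitrary units}
    node_speeds: {node name: work units processed per unit of time}
    Returns (placement, finish_time), both keyed by node name.
    """
    finish_time = {node: 0.0 for node in node_speeds}
    placement = {node: [] for node in node_speeds}

    # Largest tasks first; ties are broken by name so the result is deterministic.
    ordered = sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, cost in ordered:
        # Choose the node on which this task would be completed earliest.
        best = min(
            node_speeds,
            key=lambda n: (finish_time[n] + cost / node_speeds[n], n),
        )
        placement[best].append(task)
        finish_time[best] += cost / node_speeds[best]
    return placement, finish_time


if __name__ == "__main__":
    # Hypothetical workload and resources.
    tasks = {"align": 8.0, "render": 4.0, "index": 2.0, "merge": 2.0}
    nodes = {"node_a": 2.0, "node_b": 1.0}
    placement, finish_time = assign_tasks(tasks, nodes)
    print(placement)
    print(finish_time)
```

When the set of available nodes changes, the same function can simply be called again with the new resource description, which is the kind of adaptation that cannot realistically be done by hand.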
These problems are already under investigation by Swedish information scientists. The future availability in
Sweden of a GRID Prototype of adequate size will enable important tests of the new ideas on data
management and distributed systems, using the applications in the various research fields mentioned above as
test cases.
6. Computer Centers
The development of the technical aspects of GRID technology is already well advanced at the national and
other computer centers in Sweden. The replacement of the earlier single large mainframe computers by
clusters of many small commodity computers results in large cost savings but requires the computations to be
divided into many parallel streams. Such parallelization, which requires simultaneous access to different
types of computing resources, is more easily achieved in certain applications than in others.
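The division of a computation into parallel streams can be sketched as follows. The Python example splits a list of values into nearly equal chunks, processes each chunk independently and combines the partial results. Threads on a single machine stand in here for the nodes of a cluster, and the sum of squares is a hypothetical workload chosen because its parts are fully independent; applications with strong dependencies between parts are harder to divide in this way.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_streams(items, n_streams):
    """Split items into n_streams contiguous chunks of nearly equal size."""
    base, extra = divmod(len(items), n_streams)
    chunks, start = [], 0
    for i in range(n_streams):
        # The first `extra` chunks receive one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one stream; needs no data from the other streams."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_streams=4):
    """Process each chunk in its own worker and combine the partial results."""
    chunks = split_into_streams(values, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    values = list(range(1000))
    print(parallel_sum_of_squares(values, n_streams=4))
```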
An essential prerequisite for the GRID is the rapid development of the middleware mentioned earlier. Such
development presently represents an important activity at computer centers. There are several different
problems to be solved in order to achieve a reliable and user-friendly GRID. One is that of authentication,
authorization, security and privacy of the operations to be carried out over the GRID network. These questions are
particularly critical for future industrial applications of the GRID technology, as solutions to them are needed to ensure
fair competition between commercial companies. Other problems to solve for the GRID are how
to provide transparent access to different types of computer file systems; how to make possible the
simultaneous use of different operating systems and different visualization systems in various parts of the
network; how to build a GRID in such a way that it can be scaled up to a larger size and capacity without
having to reorganize the basic network structure; and how to deal with the requirements for distributed
database management discussed under Information Sciences.