Construction of a Grid-computing network for life
sciences in China
Shoji Hatano, Yoshihiro Ichiyanagi, Juncai Ma*
Institute of Microbiology, Chinese Academy of Sciences
Beijing 100080
People’s Republic of China
+86-10-62551764
ma@sun.im.ac.cn
Hongyu Shi, Yoshiyuki Kido, Susumu Date, Toshiyuki Okumura, Hideo Matsuda
Graduate School of Information Science and Technology, Osaka University
c/o BioGrid Business Center, Senri Life Science Center 12F, 1-4-2 Shinsenri-higashimachi, Toyonaka, Osaka 560-0082, Japan
+81-6-6873-2116
Shinji Shimojo
Cyber Media Center, Osaka University
c/o BioGrid Business Center, Senri Life Science Center 12F, 1-4-2 Shinsenri-higashimachi, Toyonaka, Osaka 560-0082, Japan
+81-6-6873-2116
shimojo@cmc.osaka-u.ac.jp
date@ais.cmc.osaka-u.ac.jp
ABSTRACT
We have started to construct a Grid-computing network for life sciences in China. As a testbed of its infrastructure, we built a Grid for the BLAST program. Portal software for job management was deployed in China. BLAST jobs were submitted to servers in China and Japan and executed against databases located in China. The results were successfully returned to the portal, demonstrating that the Grid network operated robustly across the national border. China is one of the megadiversity countries; this work could therefore serve as a basis for presenting its unique biodiversity databases to the world.
Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software – Distributed systems.
General Terms
Management, Performance, Design, Experimentation
Keywords
Grid-computing, megadiversity, biodiversity, international cooperation, BLAST.
1. INTRODUCTION
Biodiversity supports human life and provides many kinds of benefits. It is unequally distributed around the world; in fact, countries with rich biodiversity have been important suppliers of genetic resources to other countries. The introduction of global information technology is therefore expected to connect these two classes of countries.
Meanwhile, genome analysis data have expanded rapidly, and huge computing resources are now required to manage them. Grid-computing technology can provide such resources by connecting many computers distributed around the world. Several Grid-computing networks for life sciences are already available [1]. The technology is also effective for managing biodiversity data across borders. In this paper, we present our strategy for constructing a Grid-computing network, together with a preliminary implementation that shares databases, software and computational power between China and Japan.
Copyright is held by the author/owner(s).
Asia Pacific Advanced Network 2003, 25-29 August 2003, Busan, Republic of Korea.
Network Research Workshop 2003, 27 August 2003, Busan, Republic of Korea.
2. STRATEGY FOR CONSTRUCTION OF THE GRID NETWORK
2.1 Megadiversity and information technology
China is one of the megadiversity countries, which together are home to about 70% of the planet's species. China has 30,000 species of higher plants, and its numbers of bird and fish species, 1,244 and 3,862 respectively, rank among the highest of the megadiversity countries [2].
The IT industry in China is growing rapidly, and its technological level is now high enough to export IT products to the world.
China is one of the few countries that have both rich biodiversity and advanced information technology. This means that China is able to develop a total management system for biological resources, from the level of actual, diverse organisms to that of digitized data. China is therefore well placed to contribute to the world's life sciences.
2.2 Application of Grid-computing
The Grid-computing network will be constructed as part of the SDB (Scientific Database) project, which has started to establish databases for the inventory of biological species and specimens in China. SDB is the largest information project of the Chinese Academy of Sciences (CAS). Thirty-two institutes participate in the project, 12 of which work in biology.
The following institutes are involved in the biological section of
the project:
Institute of Microbiology
Institute of Zoology
Institute of Botany
Institute of Hydrobiology
Institute of Virology
Institute of Oceanography
Institute of Kunming Zoology
Institute of Huanan Botany
Institute of Wuhan Botany
National Genome Center
Institute of Biophysics
Shanghai Institute of Bio-Science
Institute of Kunming Botany
* Contact author, to whom correspondence should be addressed.
These institutes maintain their own databases, which are heterogeneous and distributed. Data Grid computing, which is now under development [3], will make them interoperable. Its functions are provided as Grid services based on OGSA (Open Grid Services Architecture), whose programming interface is similar to that of the widely used Web services. It should therefore be easy to move from Web services to Grid services.
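The interoperability argument can be illustrated with the adapter idea behind a data Grid: each institute keeps its own storage format, and a thin wrapper maps it onto one common query interface. The following Python code is a minimal sketch under that assumption. It is not the OGSA-DAI API, and the class names, field names and the `federated_find` helper are hypothetical.

```python
from typing import Iterable, List, Protocol


class SpeciesSource(Protocol):
    """Common query interface a data Grid layer could expose (hypothetical)."""

    def find(self, genus: str) -> Iterable[dict]: ...


class TupleBackend:
    """Hypothetical institute A: rows stored as (genus, species, specimens) tuples."""

    def __init__(self, rows):
        self.rows = rows

    def find(self, genus: str) -> Iterable[dict]:
        for g, s, n in self.rows:
            if g == genus:
                yield {"genus": g, "species": s, "specimens": n}


class DictBackend:
    """Hypothetical institute B: records stored as dicts with its own field names."""

    def __init__(self, records):
        self.records = records

    def find(self, genus: str) -> Iterable[dict]:
        for r in self.records:
            if r["Genus"] == genus:
                # Translate local field names into the shared vocabulary.
                yield {"genus": r["Genus"], "species": r["Epithet"], "specimens": r["Count"]}


def federated_find(sources: List[SpeciesSource], genus: str) -> List[dict]:
    """Query every source through the same interface and merge the answers."""
    results: List[dict] = []
    for source in sources:
        results.extend(source.find(genus))
    return sorted(results, key=lambda r: r["species"])
```

The caller never sees how each institute stores its records; adding a new institute only requires another adapter with a `find` method.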
Because data Grid software is still under development, we started constructing a testbed of the Grid-computing network with a more widely used database and program, namely GenBank and BLAST.
3. IMPLEMENTATION OF A TESTBED
3.1 Materials and methods
3.1.1 Servers
We placed five servers (Nos. 1-5) at the Institute of Microbiology, CAS, China, and one server (No. 6) at Osaka University, Japan. All servers were connected to the Internet through 100BASE-T networks.
3.1.1.1 Server 1
CPU: Pentium 4, 2 GHz, Dual
Memory: 1 GByte
Storage: 2 TB disk array (RAID 5)
OS: RedHat Linux 8.0
Software: Globus 2.0, GSI-SFS [4]
Database: GenBank
3.1.1.2 Server 2
CPU: Pentium 4, 2 GHz, Dual
Memory: 512 MByte
Storage: 600 GB disk array (RAID 5)
OS: RedHat Linux 8.0
Software: Globus 2.0, GSI-SFS
Database: GenBank
3.1.1.3 Server 3
CPU: Pentium 4, 2 GHz, Dual
Memory: 512 MByte
Storage: 600 GB disk array (RAID 5)
OS: RedHat Linux 8.0 on VMware
Software: Globus 2.0, PBS (Master, PC cluster manager) [5], GUDBIRD [6], BLAST
3.1.1.4 Servers 4, 5
CPU: Pentium 4, 2 GHz, Dual
Memory: 512 MByte
Storage: 600 GB disk array (RAID 5)
OS: RedHat Linux 8.0 on VMware
Software: PBS (Slave, PC cluster manager), BLAST
3.1.1.5 Server 6
CPU: Pentium 4, 2 GHz
Memory: 512 MByte
Storage: 40 GB
OS: RedHat Linux 7.3
Software: Globus 2.0, BLAST
Figure 1. Sharing databases.
Servers 3-5 in China connected to the databases on servers 1 and 2 through NFS (Network File System). Server 6 in Japan connected to them through GSI-SFS (Grid Security Infrastructure-Self-certifying File System).
Figure 2. Sharing computational powers.
Servers 3-5 formed a cluster, with server 3 as the master and servers 4 and 5 as slaves. Servers 3 and 6 are both members of the Grid-computing network spanning the border.
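With the master/slave arrangement in Figure 2, a BLAST job arriving at server 3 is handed to PBS, which places it on a member of the cluster. The paper does not show its job scripts. The following is a minimal Python sketch, assuming the legacy NCBI `blastall` command and a standard PBS batch script, of what such a submission could look like. The function `make_pbs_script`, the job name, the query file and the database name are hypothetical.

```python
def make_pbs_script(job_name: str, query_file: str, database: str,
                    program: str = "blastn", output_file: str = "blast.out") -> str:
    """Render a PBS batch script that runs one legacy NCBI BLAST (blastall) job.

    Job name, file names and database name are hypothetical placeholders.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",   # job name shown by qstat
        "#PBS -l nodes=1",       # request one node; PBS picks which cluster member runs it
        "#PBS -j oe",            # merge stderr into the stdout file
        "cd $PBS_O_WORKDIR",     # start in the directory qsub was run from
        f"blastall -p {program} -d {database} -i {query_file} -o {output_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # On the master (server 3) this text would be saved to a file and passed to qsub.
    print(make_pbs_script("blast_q1", "query.fa", "nt"))
```

Keeping the scheduling directives separate from the BLAST command line means the same command could run unchanged on server 6, which has no PBS installed.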
3.1.2 Sharing databases (Figure 1)
The GenBank databases on servers 1 and 2 were made available to the other servers. Servers 3-5, located in China, accessed the databases through NFS (Network File System). Server 6, located in Japan, accessed them through GSI-SFS (Grid Security Infrastructure-Self-certifying File System), which provides a file transfer service over the Grid network [4].
3.1.3 Sharing computational powers (Figure 2)
BLAST was installed on servers 3, 4 and 5 (in China) and on server 6 (in Japan). Servers 3, 4 and 5 formed a cluster managed by PBS [5], with server 3 as the master and servers 4 and 5 as slaves. BLAST jobs were submitted through the Grid network either to server 3, which passed them on to the members of the cluster, or to server 6.
Figure 3. Authentication and job submission by portal software.
The portal software (3') automatically authenticated the user to servers 3 and 6 on the Grid network, and jobs were then submitted to them. Note that the portal function (3') is depicted separately from the BLAST and PBS function (3) of the same server.
Figure 4. The entrance page of the portal software, GUDBIRD.
The user ID and password are prompted for authentication.
3.1.4 User authentication and job management by portal software (Figure 3)
GUDBIRD [6] was installed on server 3. It is portal software for the Grid network that can currently manage BLAST jobs. It uses MyProxy [7] for user authentication: a user deposits his/her credential in a MyProxy server once. Because the credential is then passed to the Grid resources automatically, the user does not need to submit it to each resource separately.
GUDBIRD also provides a job management facility with which users can select the server to which BLAST jobs are submitted. In our case, we could select either server 3 (the cluster master, in China) or server 6 (in Japan).
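The deposit-once, retrieve-per-session pattern described above can be sketched as a toy model. This Python code is an illustration under stated assumptions, not the MyProxy protocol: real MyProxy delegates X.509 proxy certificates over an authenticated TLS connection, whereas here a random token stands in for the proxy. The class `ToyCredentialRepository` and its methods are hypothetical.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ProxyCredential:
    user: str
    token: str         # stands in for a short-lived X.509 proxy certificate
    expires_at: float  # Unix time after which the proxy must not be used


class ToyCredentialRepository:
    """Toy model of a MyProxy-style online credential repository.

    NOT the MyProxy protocol: a random token replaces the delegated proxy
    certificate so the deposit/retrieve flow can be run locally.
    """

    def __init__(self, max_lifetime: float = 12 * 3600):  # arbitrary default cap
        self.max_lifetime = max_lifetime
        self._store: Dict[str, Tuple[bytes, bytes]] = {}  # user -> (salt, digest)

    @staticmethod
    def _digest(passphrase: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)

    def deposit(self, user: str, passphrase: str) -> None:
        """Done once by the user: register a credential under a passphrase."""
        salt = secrets.token_bytes(16)
        self._store[user] = (salt, self._digest(passphrase, salt))

    def retrieve(self, user: str, passphrase: str, lifetime: float) -> ProxyCredential:
        """Done by the portal on the user's behalf after the login page."""
        if user not in self._store:
            raise KeyError(f"no credential deposited for {user!r}")
        salt, expected = self._store[user]
        if not secrets.compare_digest(self._digest(passphrase, salt), expected):
            raise PermissionError("wrong passphrase")
        lifetime = min(lifetime, self.max_lifetime)  # never exceed the cap
        return ProxyCredential(user, secrets.token_hex(16), time.time() + lifetime)


if __name__ == "__main__":
    repo = ToyCredentialRepository()
    repo.deposit("alice", "correct horse")                 # user, once
    proxy = repo.retrieve("alice", "correct horse", 3600)  # portal, per session
    # The same short-lived proxy would then be presented to both server 3 and server 6.
    print(proxy.user, round(proxy.expires_at - time.time()))
```

Capping the lifetime limits the damage if a retrieved proxy leaks, while the long-term credential never leaves the repository.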
3.2 Results
The user was prompted to enter his/her ID and password after accessing the GUDBIRD home page on server 3 (Figure 4). This and all subsequent operations were done through a Web browser.
After authentication, the user entered the page on which job parameters can be set (Figure 5) and selected a Grid server to execute the job. In this case, two servers were available: server 3 and server 6 (the only server located in Japan). The user then entered the appropriate parameters for the BLAST program.
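The parameter page combines two choices: the target server and the BLAST options. The paper does not list GUDBIRD's actual form fields, so the Python code below is only a sketch of how a portal might validate such a request before submission, assuming the legacy `blastall` options `-p`, `-d`, `-i` and `-e`. The server keys, the class `BlastJobRequest` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical keys for the two servers the portal offered in this testbed.
AVAILABLE_SERVERS = {"server3": "cluster master, China", "server6": "Japan"}
# Program names accepted by the legacy NCBI blastall -p option.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


@dataclass
class BlastJobRequest:
    server: str
    program: str
    database: str
    query_file: str
    evalue: float = 10.0  # blastall's default expectation value

    def validate(self) -> None:
        """Reject requests the portal could not run before anything is submitted."""
        if self.server not in AVAILABLE_SERVERS:
            raise ValueError(f"unknown server: {self.server}")
        if self.program not in BLAST_PROGRAMS:
            raise ValueError(f"unsupported BLAST program: {self.program}")
        if self.evalue <= 0:
            raise ValueError("E-value threshold must be positive")

    def command(self) -> List[str]:
        """Argument list that the chosen server would execute."""
        self.validate()
        return ["blastall", "-p", self.program, "-d", self.database,
                "-i", self.query_file, "-e", str(self.evalue)]
```

Validating on the portal side catches mistakes before a job crosses the network to a remote server.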
Figure 5. The page for input of parameters for BLAST jobs.
The server for job execution can also be selected.
Figure 6. The server in Japan (6) executed BLAST jobs by accessing databases in China (1 and 2). The jobs were still managed by the portal in China (3').
The jobs were then submitted and dispatched through the Grid-computing network to server 3 or server 6. Because server 3 was the master of a cluster, it forwarded the jobs to the members of the cluster. The servers located in China (servers 3-5) executed the jobs by accessing the GenBank database through NFS.
Jobs were also submitted to server 6 in Japan, on which we had installed GSI-SFS. Because GSI-SFS provides file sharing over the Grid-computing network, server 6 was able to access the databases in China and execute the BLAST jobs (Figure 6).
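The same BLAST job can therefore run anywhere in the testbed; only the file system through which the shared GenBank files are reached depends on where the server sits. A minimal Python sketch of that rule follows. The mount points and the `database_path` helper are hypothetical, since the paper names the file systems but not the paths.

```python
# Hypothetical mount points: the paper names the file systems
# (NFS inside China, GSI-SFS from Japan) but not the paths used.
MOUNT_POINTS = {"NFS": "/mnt/nfs/genbank", "GSI-SFS": "/gsisfs/genbank"}
SERVER_LOCATION = {3: "China", 4: "China", 5: "China", 6: "Japan"}


def database_path(server: int, db_name: str) -> str:
    """Path under which a BLAST server sees the shared GenBank files."""
    location = SERVER_LOCATION[server]  # KeyError for servers that run no BLAST
    file_system = "NFS" if location == "China" else "GSI-SFS"
    return f"{MOUNT_POINTS[file_system]}/{db_name}"
```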
The status of the jobs was monitored on a page presented after submission (Figure 7). The outputs of completed jobs were returned and stored as Web pages, and were finally displayed as the results of the jobs (Figure 8).
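The monitoring page lists jobs as table rows, with the last row showing whether the latest job is running or completed. The Python code below is a small sketch of that bookkeeping, not GUDBIRD's implementation: the class names are hypothetical, and the initial "submitted" state is an assumption, since the paper shows only "running" and "completed".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    SUBMITTED = "submitted"  # assumed initial state; the paper shows only the two below
    RUNNING = "running"
    COMPLETED = "completed"


# Jobs only move forward: submitted -> running -> completed.
NEXT_STATUS = {Status.SUBMITTED: Status.RUNNING, Status.RUNNING: Status.COMPLETED}


@dataclass
class JobRow:
    job_id: str
    server: str
    status: Status = Status.SUBMITTED


@dataclass
class JobTable:
    """Rows kept in submission order, as on the monitoring page."""
    rows: List[JobRow] = field(default_factory=list)

    def submit(self, job_id: str, server: str) -> None:
        self.rows.append(JobRow(job_id, server))

    def advance(self, job_id: str) -> Status:
        """Move one job to its next state and return the new state."""
        for row in self.rows:
            if row.job_id == job_id:
                if row.status not in NEXT_STATUS:
                    raise ValueError(f"{job_id} has already completed")
                row.status = NEXT_STATUS[row.status]
                return row.status
        raise KeyError(job_id)

    def latest_status(self) -> Status:
        """Status of the most recently submitted job (the last row)."""
        return self.rows[-1].status

    def render(self) -> str:
        return "\n".join(f"{r.job_id}\t{r.server}\t{r.status.value}" for r in self.rows)
```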
3.3 Discussion
We succeeded in sharing a database (GenBank), software (BLAST) and computing power on the Grid-computing network. In particular, the server in Japan mainly provided computing power while the databases were provided by servers in China, and this server was completely controlled and managed through the portal in China. This suggests that the Grid-computing network is durable and robust even when it is used across national borders.
Consequently, China's databases can be connected to and utilized through the Grid. China is a megadiversity country, and its huge bioresources are very important. In this work, China demonstrated its ability to present its own databases to the world through the Grid.
Figure 8. The page showing the result of a BLAST job.
Figure 7. The page for monitoring the status of BLAST jobs.
The last row of the table shows the status (running or completed) of the latest job.
4. ACKNOWLEDGMENTS
We are very grateful for the generous support from the SDB (Scientific Database) project of the CAS (Chinese Academy of Sciences).
5. REFERENCES
[1] S. Shimojo. BioGrid project. http://www.biogrid.jp
[2] WCMC. 1992. Global Biodiversity: Status of the Earth's Living Resources. London: Chapman & Hall.
[3] Open Grid Services Architecture Data Access and Integration (OGSA-DAI). http://www.ogsa-dai.org
[4] S. Takeda, S. Date and S. Shimojo. 2002. GSI-SFS: A grid file system. http://www.biogrid.jp
[5] Portable Batch System. http://pbs.mrj.com
[6] Y. Kido, S. Date and S. Shimojo. 2002. GUDBIRD: A Grid user interface of distributed environment for bioinformatics and biological resource databases. http://www.biogrid.jp
[7] J. Novotny, S. Tuecke and V. Welch. 2001. An Online Credential Repository for the Grid: MyProxy. In Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10), August 2001. IEEE Press.