Grid Activity at CCS
Toshiyuki Amagasa
Center for Computational Sciences, University of Tsukuba
About Myself
Name
  Toshiyuki Amagasa
Affiliation:
  Division of Computational Informatics, Center for Computational Sciences
  Department of Computer Science, Graduate School of Systems and Information Engineering
Area of research
  Data engineering
  Database system
Recent topics
  XML databases
    Parallel XML query processing
    OLAP analysis for XML
    Web information extraction for XML
  Databases in scientific applications
    Faceted navigation for QCDml
    Meteorological database
ILDG-JP Members
Prof. Mitsuhisa Sato (Director, CCS)
Prof. Tomoteru Yoshie (CCS)
Prof. Osamu Tatebe (CCS)
Dr. Naoya Ukita (CCS)
Prof. Toshiyuki Amagasa (CCS)
Talk Outline
Current Status of JLDG
  A Brief History of JLDG
  An Overview of JLDG
Development of a New ILDG Client
  Faceted Navigation of QCDml
Conclusions and Future Work
Current Status of JLDG
A Brief History of JLDG (1/3)
Hepnet-J/sc 2002- (SINET GbE private network)
  Widely-distributed file system
    Network backbone: Super SINET VPN
    Institutes / Universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U.
  Objective and Implementation
    Data sharing among institutes / universities, in which administrative policies are not homogeneous, while attaining security
    Mirroring among FSs attached to SCs with administrative
[Figure: file servers at CCP @Tsukuba, CRC @KEK, RCNP @Osaka, and YITP @Kyoto connected by Hepnet-J/sc; the labels CP-PACS, SR8000, and SX-5 (twice) mark the attached supercomputers.]
A Brief History of JLDG (2/3)
Problems
  Growing cost of managing data locations
    A dataset may be distributed over several disks.
    It is hard for users to remember the locations of data and mirrors.
  No concept of users and user groups
    Hard to support multiple research groups.
Necessary functionalities
  A flat data sharing system that has no space limit (or can be extended at any time)
  User and user group management across several organizations
-> Japan Lattice Data Grid (JLDG)
  Project launched in November 2005
  Operation started in March 2007
A Brief History of JLDG (3/3)
JLDG v1 started operation in May 2008
  Available datasets
    CP-PACS Nf=2 QCD configuration: 8,000 files, 1.5 TBytes
    CP-PACS/JLQCD Nf=2+1 QCD configuration: 21,000 files, 6 TBytes
    PACS-CS Nf=2+1 32^3 x 64 lattice QCD configuration: 2,600 files, 3 TBytes
JLDG v2 started operation in December 2009
  Storing and sharing research data generated in daily research activities
  Data sharing within a research group
An Overview of JLDG
A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics
  Sharing simulation data computed by SCs over several months to several years.
  Data files are distributed. Replicas are created if necessary.
  A user does not need to know file locations. Files can be accessed very quickly if the site has replicas.
  Storage space can be added incrementally during operation.
www.jldg.org
[Figure: the Gfarm file system spanning Tsukuba, KEK, Kyoto, Osaka, Hiroshima, and Kanazawa over the SINET3 network, with a connection to ILDG.]
Software Components
Globus Toolkit V4 (ANL) www.globus.org
  GSI authentication, Proxy user certificate creation
  GridFTP server / client
VOMS (EDG)
  VO management
Naregi-CA (Naregi) www.naregi.org
  User / host certificate creation
Gfarm file system (U. of Tsukuba) datafarm.apgrid.org
  Widely-distributed file system
Uberftp (NCSA) http://dims.ncsa.uiuc.edu/set/uberftp/
  Interactive GridFTP client
Gfarm Distributed File System
  An open-source distributed file system
  A global namespace to unify storage systems
  Scalable I/O performance exploiting data access locality
  Automated replica selection for fault tolerance and load balancing
[Figure: the global namespace /gfarm (directories ggf, aist, jp, gtrc; files file1 to file4) is mapped onto the Gfarm File System, where file replicas are created.]
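
The global-namespace and replica-selection ideas listed above can be illustrated with a toy catalog. This is only a sketch under stated assumptions: the `ReplicaCatalog` class, the host names, the path `/gfarm/jp/file1`, and the "prefer a replica at the client's site, otherwise take the first one" policy are all hypothetical and are not Gfarm's actual API or selection algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Toy mapping from a logical path in a global namespace to replica hosts."""
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def add_replica(self, path: str, host: str) -> None:
        """Record that `host` stores a copy of the file at `path`."""
        hosts = self.replicas.setdefault(path, [])
        if host not in hosts:
            hosts.append(host)

    def select(self, path: str, client_site: str) -> str:
        """Prefer a replica at the client's site; otherwise use the first one."""
        hosts = self.replicas[path]
        for host in hosts:
            if host.endswith(client_site):
                return host
        return hosts[0]


# Hypothetical hosts and path, for illustration only.
catalog = ReplicaCatalog()
catalog.add_replica("/gfarm/jp/file1", "fs1.tsukuba.example")
catalog.add_replica("/gfarm/jp/file1", "fs1.kek.example")

local = catalog.select("/gfarm/jp/file1", "kek.example")     # site has a replica
remote = catalog.select("/gfarm/jp/file1", "osaka.example")  # site has none
```

The user only ever names the logical path; which host serves the file is decided by the catalog, which is the point of the global namespace.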
Summary
JLDG
  A brief history
  An overview
  Used as an infrastructure for daily research activity
  Hands-on meeting on 27 Jan. 2009
  Successfully held with 19 attendees
Development of a New ILDG Client
Int’l Lattice Data Grid (ILDG)
  A data grid for sharing Lattice QCD configurations
File Formats in ILDG
  Configuration binary
    LIME (Lattice QCD Interchange Message Encapsulation)
  Metadata (QCDml)
    ensemble XML
    configuration XML
LFN (Logical File Name)
  Identifier for configuration binary
[Figure: one ensemble XML is linked through markovChainURI to several configuration XML documents, and each configuration XML points through an LFN to its configuration (binary).]
QCDml Ensemble XML
<markovChain xmlns="…">
  <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24B1800K014090C1600</markovChainURI>
  <management>
    <revisions>1</revisions>
    <collaboration>CP-PACS</collaboration>
    <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName>
    <ensembleLabel>B1800</ensembleLabel>
    <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901</reference>
    <archiveHistory>
      <elem>
        <revision>1</revision>
        <revisionAction>add</revisionAction>
        <participant>
          <name>T.Yoshie</name>
          <institution>Center for Computational Sciences, University of Tsukuba</institution>
          …
Typical Use Case of ILDG
  Find desired data by MDC -> LFN (Logical File Name)
  Find nearby site by FC -> SURL (Site URL)
  Access to the site -> TURL (Transfer URL)
  Data transfer
  Authentication: VOMS
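
The chain from metadata search to data transfer can be sketched as three lookups. Everything here is a stand-in: the two dictionaries replace the remote MDC and FC services, the `find_lfns`, `find_surl`, and `to_turl` helpers are hypothetical, and the URIs and the `srm://` to `gsiftp://` rewrite are illustrative assumptions rather than the behaviour of any real ILDG site. Authentication is left out.

```python
# Hypothetical in-memory stand-ins for the two catalogues; the real ILDG
# services are remote and are not modelled here.
METADATA_CATALOGUE = {
    # markovChainURI -> LFNs of its configurations (illustrative values)
    "mc://example/ensembleA": ["lfn://example/ensembleA/conf0001"],
}
FILE_CATALOGUE = {
    # LFN -> SURLs of the sites holding a replica (illustrative values)
    "lfn://example/ensembleA/conf0001": [
        "srm://site-a.example/data/conf0001",
        "srm://site-b.example/data/conf0001",
    ],
}


def find_lfns(ensemble_uri):
    """Step 1: ask the MDC which configurations (LFNs) belong to an ensemble."""
    return METADATA_CATALOGUE.get(ensemble_uri, [])


def find_surl(lfn, preferred_site):
    """Step 2: ask the FC for replica locations; prefer the nearby site."""
    surls = FILE_CATALOGUE[lfn]
    for surl in surls:
        if preferred_site in surl:
            return surl
    return surls[0]


def to_turl(surl):
    """Step 3: the site turns a SURL into a transfer URL (a toy rewrite here)."""
    return surl.replace("srm://", "gsiftp://", 1)


lfn = find_lfns("mc://example/ensembleA")[0]
surl = find_surl(lfn, "site-b.example")
turl = to_turl(surl)
```

Each step narrows a location-independent name into something closer to an actual transfer endpoint, so users never need to hard-code site addresses.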
Difficulties in Finding Desired Configuration
Directly use query language (XQuery / XPath)
  A simple example:
    /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4] and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
  Knowledge of XML, QCDml, and XQuery (XPath) is needed.
  Hard to get the whole picture of available data.
Hierarchical list
  Easy to use.
  Needs a huge screen to show the entire list.
  Still difficult to get the whole picture of the data.
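
To make the XPath example above concrete, the same filter (some `beta` greater than 4 and some `collaboration` equal to 'CSSM') can be written out by hand in Python. This is a sketch, not the query engine used by ILDG: the two sample documents, their namespace URI, and the `local_name` and `matches` helpers are assumptions made for illustration, and text is stripped of whitespace before comparison, which the XPath expression does not do.

```python
import xml.etree.ElementTree as ET

# Illustrative documents only; the namespace URI and values are made up.
DOCS = [
    """<markovChain xmlns="urn:example:qcdml">
         <management><collaboration>CSSM</collaboration></management>
         <physics><action><gluon><beta>8.5</beta></gluon></action></physics>
       </markovChain>""",
    """<markovChain xmlns="urn:example:qcdml">
         <management><collaboration>CP-PACS</collaboration></management>
         <physics><action><gluon><beta>1.8</beta></gluon></action></physics>
       </markovChain>""",
]


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def matches(root, min_beta, collaboration):
    """True if some <beta> exceeds min_beta and some <collaboration> equals the name."""
    beta_ok = False
    collab_ok = False
    for elem in root.iter():
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if name == "beta":
            try:
                beta_ok = beta_ok or float(text) > min_beta
            except ValueError:
                pass  # non-numeric text never satisfies the comparison
        elif name == "collaboration" and text == collaboration:
            collab_ok = True
    return beta_ok and collab_ok


hits = [matches(ET.fromstring(doc), 4, "CSSM") for doc in DOCS]
```

Even this small filter needs knowledge of element names and namespaces, which is the difficulty the slide points out.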
Basic Idea
  Apply a faceted-navigation interface to browse QCDml ensemble XML data.
Faceted-Navigation
What is “faceted-navigation”?
  A scheme for browsing objects with attributes.
  Successfully used in some applications, such as Apple iTunes.
Procedure
  A user selects a value in a facet
    To select a set of objects of interest
  The system updates the list of objects, list of facets, and respective values
  (Repeat)
Example
  The Flamenco Search
  http://flamenco.berkeley.edu/
Faceted-Navigation
Good features
  Users have the freedom to choose a facet
    cf. Hierarchical list
  Gives a big picture of the dataset
    Available values along with their population
  Effective
    Busch’s Law: 4 facets consisting of 10 values are enough to deal with 10,000 objects.
Technical Challenges
  How to define facets?
  How to extract values according to the facets?
  How to achieve quick response from the database for improving user experience?
Choosing the Facets
  Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe.
  Selected elements from QCDml ensemble XML
    Regional grid
    Collaboration
    Project name
    Number of flavors
    Time
    Parameters
      Lattice size
      Gluon action
        Parameters
      Quark action
        Parameters
Extracting Values from a Facet (1/3)
Extract text values
  Collaboration: CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, …
  Project name: 2+1 DWF, 2+1 Dynamical, AsqTAD, Baryon Resonances, Dynamical FLIC Studies, Electromagnetic Form Factors, FLIC Overlap Studies, Flux Tube Test, Gluon Propagator, Long_aqstad_run, Pentaquark Volume Dependence, …
Need substring extraction
  Date: 2000, 2005, 2006, 2007, 2008
    <date>2007-02-26T21:39:33+09:00</date>
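
The substring extraction for the date facet amounts to keeping the year of the timestamp. A minimal sketch, assuming the `<date>` text always starts with a four-digit year as in the example above; the function name `year_facet` is hypothetical.

```python
def year_facet(date_text):
    """Return the four-digit year at the start of an ISO 8601 timestamp."""
    year = date_text.strip()[:4]
    if not year.isdigit():
        raise ValueError(f"unexpected date format: {date_text!r}")
    return year


year = year_facet("2007-02-26T21:39:33+09:00")
```

Reducing timestamps to years keeps the number of distinct facet values small enough to list on screen.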
Extracting Values from a Facet (2/3)
Need text value generation
  Lattice size
  e.g. 12 / 12 / 12 / 24
<physics>
  <size>
    <elem>
      <name>X</name>
      <length>12</length>
    </elem>
    <elem>
      <name>Y</name>
      <length>12</length>
    </elem>
    <elem>
      <name>Z</name>
      <length>12</length>
    </elem>
    <elem>
      <name>T</name>
      <length>24</length>
    </elem>
    …
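
Generating the text value "12 / 12 / 12 / 24" from the `<size>` element could look as follows. This is a sketch under assumptions: the fragment is closed by hand and stripped of its namespace so that it parses on its own, and the helper name `lattice_size_label` and the fixed X, Y, Z, T order are choices made for the example.

```python
import xml.etree.ElementTree as ET

# The <size> fragment from the slide, closed by hand and without a namespace
# so that it parses on its own.
SIZE_XML = """
<size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size>
"""


def lattice_size_label(size_elem, order=("X", "Y", "Z", "T")):
    """Join the lengths of the named directions into one display string."""
    lengths = {
        elem.findtext("name"): elem.findtext("length")
        for elem in size_elem.findall("elem")
    }
    return " / ".join(lengths[axis] for axis in order)


label = lattice_size_label(ET.fromstring(SIZE_XML))
```

Looking lengths up by direction name rather than by position keeps the label stable even if a document lists the directions in a different order.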
Extracting Values from a Facet (3/3)
Gluon action / Quark action
<action>
  <gluon>
    <iwasakiRGGluonAction>
      <glossary>http://www.jldg.org/JLDG/...
<action>
  <gluon>
    <DBW2GluonAction>
      <glossary>www.lqcd.org/ildg/pla...
An element name itself represents a value
-> Extract element name as a value of a facet
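
Using an element name as the facet value can be sketched with the standard library. The sample document below is an assumption: it is closed by hand, carries no namespace, and its `<beta>` child is invented for the example; the helper name `action_name` is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hand-closed sample without a namespace; the <beta> child is illustrative.
ACTION_XML = """
<action>
  <gluon>
    <DBW2GluonAction>
      <beta>8.5</beta>
    </DBW2GluonAction>
  </gluon>
</action>
"""


def action_name(action_elem, kind):
    """Return the tag of the first child under <gluon> or <quark>, if any."""
    parent = action_elem.find(kind)
    if parent is None or len(parent) == 0:
        return None
    # Drop a '{namespace}' prefix in case the document declares one.
    return parent[0].tag.rsplit("}", 1)[-1]


action = ET.fromstring(ACTION_XML)
gluon_action = action_name(action, "gluon")
quark_action = action_name(action, "quark")  # this sample has no <quark>
```

A document with a default namespace would also need the namespace in the `find` call; the sample avoids that to stay short.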
QCDml Faceted Navigation I/F
System Configuration
[Figure: system components]
  ILDG regional grids: JLDG, USQCD, LDG, UKQCD, CSSM
  Downloading Ensemble XML
  XML DB (eXist): QCDml Ensemble (ILDG) & Configuration (JLDG)
  Facet extraction (XQuery)
  Facet Database: RDBMS (MySQL)
  Facet Navigation System (PHP + SQL + XQuery)
  Web Server (Apache)
Database Design (1/2)
  Use RDBMS for quick response
  Use fixed relational schema for extensibility
*************************** 1. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
value: cssm
*************************** 2. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
value: CSSM
*************************** 3. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
value: Dynamical FLIC Studies
*************************** 4. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
value: 2007
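
The fixed (uri, property, value) layout shown above can be tried out with an in-memory database. This sketch swaps in SQLite for MySQL so that it runs without a server; the table name `facet`, the two queries, and the `date` value of the second ensemble are assumptions made for illustration, not the schema or SQL of the actual system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fixed table: a new facet is just a new `property` value, not a new column.
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")
rows = [
    ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "rgrid", "cssm"),
    ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "collaboration", "CSSM"),
    ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "projectName", "Dynamical FLIC Studies"),
    ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "date", "2007"),
    # Second ensemble: URI from the slides, date value assumed for illustration.
    ("mc://cssm/su3b09836s8t16DBW2", "collaboration", "CSSM"),
    ("mc://cssm/su3b09836s8t16DBW2", "date", "2006"),
]
conn.executemany("INSERT INTO facet VALUES (?, ?, ?)", rows)

# Ensembles matching the current selection (collaboration = CSSM, date = 2007).
selected = [row[0] for row in conn.execute(
    """SELECT uri FROM facet
       WHERE (property = 'collaboration' AND value = 'CSSM')
          OR (property = 'date' AND value = '2007')
       GROUP BY uri HAVING COUNT(*) = 2""")]

# Value counts for one facet, as shown next to each value in the interface.
date_counts = dict(conn.execute(
    "SELECT value, COUNT(DISTINCT uri) FROM facet "
    "WHERE property = 'date' GROUP BY value"))
```

Because every facet lives in the same three columns, adding a facet later requires only new rows, which is one way to read the slide's point about extensibility.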
Database Design (2/2)
  Store preformatted text for improving rendering performance
*************************** 1. row ***************************
collaboration: CSSM
size: 12/12/12/24
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
nf: 2
gact: DBW2GluonAction (beta=8.5)
qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.130
*************************** 2. row ***************************
collaboration: CSSM
size: 8/8/8/16
uri: mc://cssm/su3b09836s8t16DBW2
nf:
gact: DBW2GluonAction (beta=9.836)
qact:
A Screenshot of the System
Conclusion and Future Work
Conclusion
  Current Status of JLDG
  Development of a New ILDG Client
Future work
  Exploring more chances to apply data engineering techniques in various e-Science fields.
    Data mining
    Data integration
    …
Thank you very much for your kind attention
Questions should be addressed to amagasa@cs.tsukuba.ac.jp