Grid Activity at CCS
Toshiyuki Amagasa
Center for Computational Sciences, University of Tsukuba

About Myself
- Name: Toshiyuki Amagasa
- Affiliation:
  - Division of Computational Informatics, Center for Computational Sciences
  - Department of Computer Science, Graduate School of Systems and Information Engineering
- Area of research: data engineering, database systems
- Recent topics
  - XML databases
    - Parallel XML query processing
    - OLAP analysis for XML
    - Web information extraction for XML
  - Databases in scientific applications
    - Faceted navigation for QCDml
    - Meteorological database

ILDG-JP Members
- Prof. Mitsuhisa Sato (Director, CCS)
- Prof. Tomoteru Yoshie (CCS)
- Prof. Osamu Tatebe (CCS)
- Dr. Naoya Ukita (CCS)
- Prof. Toshiyuki Amagasa (CCS)

Talk Outline
- Current Status of ILDG
- A Brief History of JLDG
- An Overview of JLDG
- Development of a New ILDG Client
- Faceted Navigation of QCDml
- Conclusions and Future Work

Current Status of JLDG

A Brief History of JLDG (1/3)
- Hepnet-J/sc, 2002- (SINET GbE private network)
  - Widely-distributed file system
  - Network backbone: Super SINET VPN
  - Institutes / universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U.
- Objective and implementation
  - Data sharing among institutes and universities whose administrative policies are not homogeneous, while maintaining security
  - Mirroring among FSs attached to SCs with administrative
[Figure: Hepnet-J/sc connecting file servers at CRC @ KEK, RCNP @ Osaka, CCP @ Tsukuba, and YITP @ Kyoto; supercomputers shown: CP-PACS, SR8000, SX-5]

A Brief History of JLDG (2/3)
- Problems
  - Growing cost of managing data locations
    - A dataset may be distributed across several disks.
    - It is hard for users to remember the locations of data and mirrors.
  - No concept of users and user groups
    - Hard to support multiple research groups.
- Necessary functionalities
  - A flat data-sharing system with no space limit (or one that can be extended at any time)
  - User and user-group management across several organizations
- Japan Lattice Data Grid (JLDG)
  - Project launched in November 2005
  - Operation started in March 2007

A Brief History of JLDG (3/3)
- JLDG v1 started operation in May 2008
  - Available datasets
    - CP-PACS Nf=2 QCD configurations: 8,000 files, 1.5 TBytes
    - CP-PACS/JLQCD Nf=2+1 QCD configurations: 21,000 files, 6 TBytes
    - PACS-CS Nf=2+1 32^3 x 64 lattice QCD configurations: 2,600 files, 3 TBytes
- JLDG v2 started operation in December 2009
  - Storing and sharing research data generated in daily research activities
  - Data sharing within a research group

An Overview of JLDG
- A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics
- Sharing simulation data computed by SCs over several months to several years
  - Data files are distributed.
  - Replicas are created if necessary.
  - Users do not need to know file locations.
  - Files can be accessed very quickly if the site has replicas.
  - Storage space can be added incrementally during operation.
[Figure: JLDG sites (Kanazawa, KEK, Kyoto, Hiroshima, Tsukuba, Osaka) sharing the Gfarm file system over the SINET3 network, with a link to ILDG; www.jldg.org]

Software Components
- Globus Toolkit V4 (ANL), www.globus.org
  - GSI authentication, proxy user certificate creation
  - GridFTP server / client
- VOMS (EDG)
  - VO management
- Naregi-CA (Naregi), www.naregi.org
  - User / host certificate creation
- Gfarm file system (U. of Tsukuba), datafarm.apgrid.org
  - Widely-distributed file system
- Uberftp (NCSA), http://dims.ncsa.uiuc.edu/set/uberftp/
  - Interactive GridFTP client

Gfarm Distributed File System
- An open-source distributed file system
- A global namespace to unify storage systems
- Scalable I/O performance exploiting data access locality
- Automated replica selection for fault tolerance and load balancing
[Figure: global namespace rooted at /gfarm (directories ggf, jp, aist, gtrc; files file1 to file4) mapped onto the Gfarm File System, with replica creation]

Summary
- JLDG
  - A brief history
  - An overview
- Used as an infrastructure for daily research activity
- Hands-on meeting on 27 Jan. 2009
  - Successfully held with 19 attendees

Development of a New ILDG Client

Int’l Lattice Data Grid (ILDG)
- A data grid for sharing lattice QCD configurations
- File formats in ILDG
  - Configuration binary: LIME (Lattice QCD Interchange Message Encapsulation)
  - Metadata (QCDml): ensemble XML, configuration XML
- LFN (Logical File Name)
  - Identifier for a configuration binary
[Figure: an ensemble XML is linked by markovChainURI to its configuration XMLs; each configuration XML is linked by an LFN to its configuration binary]

QCDml Ensemble XML
  <markovChain xmlns="…">
    <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24B1800K014090C1600</markovChainURI>
    <management>
      <revisions>1</revisions>
      <collaboration>CP-PACS</collaboration>
      <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName>
      <ensembleLabel>B1800</ensembleLabel>
      <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid.
D67 (2003) 059901</reference>
      <archiveHistory>
        <elem>
          <revision>1</revision>
          <revisionAction>add</revisionAction>
          <participant>
            <name>T.Yoshie</name>
            <institution>Center fof Computational Sciences, University of Tsukuba</institution>

Typical Use Case of ILDG
- Find the desired data via the MDC (Metadata Catalogue): yields an LFN (Logical File Name)
- Find a nearby site via the FC (File Catalogue): yields a SURL (Site URL)
- Access the site: yields a TURL (Transfer URL)
- Data transfer
- Authentication: VOMS

Difficulties in Finding the Desired Configuration
- Using a query language (XQuery / XPath) directly
  - A simple example:
      /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4]
        and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
  - Knowledge of XML, QCDml, and XQuery (XPath) is needed.
  - Hard to get the whole picture of the available data.
- Hierarchical list
  - Easy to use.
  - Needs a huge screen to show the entire list.
  - Still difficult to get the whole picture of the data.

Basic Idea
- Apply a faceted-navigation interface to browse QCDml ensemble XML data.

Faceted Navigation
- What is “faceted navigation”?
  - A scheme for browsing objects with attributes.
  - Successfully used in some applications, such as Apple iTunes.
- Procedure
  1. The user selects a value in a facet to choose a set of objects of interest.
  2. The system updates the list of objects, the list of facets, and their respective values.
  3. (Repeat)
- Example: The Flamenco Search, http://flamenco.berkeley.edu/

Faceted Navigation
- Good features
  - Users are free to choose a facet.
  - Gives a big picture of the dataset: available values along with their populations.
  - Effective (cf. hierarchical list)
    - Busch’s Law: 4 facets consisting of 10 values are enough to deal with 10,000 objects.

Technical Challenges
- How to define facets?
- How to extract values according to the facets?
- How to achieve quick responses from the database to improve the user experience?
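
The faceted-navigation procedure described earlier (select a value in a facet, then update the object list, the facets, and the value populations) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the system presented in this talk, which uses PHP, SQL, and XQuery: the records in `ENSEMBLES`, their URIs, and the function names `select`, `facet_counts`, and `navigate` are all hypothetical.

```python
from collections import Counter

# Hypothetical ensemble records (illustrative values only, not real ILDG metadata).
ENSEMBLES = [
    {"uri": "mc://example/a", "collaboration": "CSSM", "date": "2007", "nf": "2"},
    {"uri": "mc://example/b", "collaboration": "CSSM", "date": "2006", "nf": "0"},
    {"uri": "mc://example/c", "collaboration": "CP-PACS", "date": "2007", "nf": "2"},
]
FACETS = ["collaboration", "date", "nf"]


def select(objects, facet, value):
    """Keep only the objects whose facet has the chosen value."""
    return [obj for obj in objects if obj.get(facet) == value]


def facet_counts(objects, facets):
    """For each facet, count how many remaining objects carry each value."""
    return {
        facet: dict(Counter(obj[facet] for obj in objects if facet in obj))
        for facet in facets
    }


def navigate(objects, facets, selections):
    """Apply the (facet, value) selections in order, then recompute the counts."""
    for facet, value in selections:
        objects = select(objects, facet, value)
    return objects, facet_counts(objects, facets)


if __name__ == "__main__":
    remaining, counts = navigate(ENSEMBLES, FACETS, [("collaboration", "CSSM")])
    print([obj["uri"] for obj in remaining])
    print(counts)
```

Each call to `navigate` corresponds to one round of the procedure: the returned object list is what the interface would display, and the per-facet counts are the "available values along with their populations" offered for the next selection.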

Choosing the Facets
- Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe.
- Selected elements from QCDml ensemble XML
  - Regional grid
  - Collaboration
  - Project name
  - Number of flavors
  - Time
  - Parameters
  - Lattice size
  - Gluon action
  - Quark action

Extracting Values from a Facet (1/3)
- Extract text values
  - Collaboration: CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, …
  - Project name: 2+1 DWF; 2+1 Dynamical AsqTAD; Baryon Resonances; Dynamical FLIC Studies; Electromagnetic Form Factors; FLIC Overlap Studies; Flux Tube Test; Gluon Propagator; Long_aqstad_run; Pentaquark; Volume Dependence; …
- Need substring extraction
  - Date: 2000, 2005, 2006, 2007, 2008, extracted from elements such as
      <date>2007-02-26T21:39:33+09:00</date>

Extracting Values from a Facet (2/3)
- Need text value generation
  - Lattice size, e.g. 12 / 12 / 12 / 24
      <physics>
        <size>
          <elem> <name>X</name> <length>12</length> </elem>
          <elem> <name>Y</name> <length>12</length> </elem>
          <elem> <name>Z</name> <length>12</length> </elem>
          <elem> <name>T</name> <length>24</length> </elem>
          …

Extracting Values from a Facet (3/3)
- Gluon action / Quark action
      <action>
        <gluon>
          <iwasakiRGGluonAction>
            <glossary>http://www.jldg.org/JLDG/...

      <action>
        <gluon>
          <DBW2GluonAction>
            <glossary>www.lqcd.org/ildg/pla...
  - An element name itself represents a value.
  - Extract the element name as the value of the facet.

QCDml Faceted Navigation I/F: System Configuration
[Figure: ensemble XML is downloaded from the ILDG regional grids (USQCD, JLDG, LDG, UKQCD, CSSM) into an XML DB (eXist) holding QCDml ensemble (ILDG) and configuration (JLDG) XML; facet extraction (XQuery) populates the facet database on an RDBMS (MySQL); the facet navigation system (PHP + SQL + XQuery) runs on a web server (Apache)]

Database Design (1/2)
- Use an RDBMS for quick response
- Use a fixed relational schema for extensibility

*************************** 1. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
   value: cssm
*************************** 2. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
   value: CSSM
*************************** 3. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
   value: Dynamical FLIC Studies
*************************** 4. row ***************************
     uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
   value: 2007

Database Design (2/2)
- Store preformatted text to improve rendering performance

*************************** 1. row ***************************
collaboration: CSSM
         size: 12/12/12/24
          uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
           nf: 2
         gact: DBW2GluonAction (beta=8.5)
         qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.130
*************************** 2. row ***************************
collaboration: CSSM
         size: 8/8/8/16
          uri: mc://cssm/su3b09836s8t16DBW2
           nf:
         gact: DBW2GluonAction (beta=9.836)
         qact:

A Screenshot of the System
[Figure: screenshot of the faceted navigation system]

Conclusion and Future Work
- Conclusion
  - Current Status of ILDG
  - Development of a New ILDG Client
- Future work
  - Exploring more opportunities to apply data engineering techniques in various e-Science fields
    - Data mining
    - Data integration
    - …

Thank you very much for your kind attention.
Questions should be addressed to amagasa@cs.tsukuba.ac.jp