Grid-based Information Architecture for iSERVO, the International Solid Earth Research Virtual Organization
Western Pacific Geophysics Meeting (WPGM), Beijing Convention Center, July 26 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
gcf@indiana.edu  http://www.infomall.org

APEC Cooperation for Earthquake Simulation
ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
• iSERVO is infrastructure to support the work of ACES
• SERVOGrid is a (completed) US Grid that is a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/
• Chartered under APEC – the Asia-Pacific Economic Cooperation of 21 economies

Participating Institutions
• Australia: CSIRO; Monash University; University of Western Australia, Perth; University of Queensland
• Canada: University of Western Ontario; University of British Columbia
• China: China National Grid; Chinese Academy of Sciences; China Earthquake Administration; China Earthquake Network Center
• USA: Brown University; Boston University; Jet Propulsion Laboratory; Cal State Fullerton; San Diego State University; UC Davis; UC Irvine; UC San Diego; University of Southern California; University of Minnesota; Florida State University; US Geological Survey; Pacific Tsunami Warning Center (PTWC), Hawaii
• Taiwan: National Central University (Taiwan Chelungpu-fault Drilling Project)
• Japan: University of Tokyo; Tokyo Institute of Technology (Titech); Sophia University; National Research Institute for Earth Science and Disaster Prevention (NIED); Geographical Survey Institute

Role of Information Technology and Grids in ACES
• Numerical simulations of physical, biological and social systems
• Engineering design
• Economic analysis and planning
• Sensor networks and sensor webs
• High performance computing
• Data mining and pattern analysis
• Distance collaboration
• Distance learning
• Public outreach and education
• Emergency response communication and planning
• Geographic Information Systems
• Resource allocation and management

Grids and Cyberinfrastructure
Grids are the technology, based on Web services, that implements Cyberinfrastructure, i.e. supports eScience or science as a team sport
• Internet-scale managed services that link computers, data repositories, sensors, instruments and people

There is a portal and services in SERVOGrid for
• Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh-generating programs …
• Job management and monitoring web services for running the above codes (sketched below)
• File management web services for moving files between various machines
• Geographical Information System services
• QuakeTables earthquake-specific database
• Sensors as well as databases
• Context (dynamic metadata) and UDDI system long-term metadata services
• Services support streaming real-time data
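
The job management and file services are plain SOAP Web services, so any SOAP client can drive them. Below is a minimal Java sketch, using the standard SAAJ API, of what submitting a code such as GeoFEST to a job management service might look like; the endpoint URL, namespace, and the operation and element names (ExecuteJob, appName, inputFile) are illustrative assumptions, not the actual SERVOGrid WSDL.

import javax.xml.soap.*;
import javax.xml.namespace.QName;

// Minimal sketch of invoking a SERVOGrid-style job management Web service
// over SOAP. Endpoint, namespace, and operation/element names are
// hypothetical illustrations of the pattern, not the real interface.
public class JobSubmitSketch {
    public static void main(String[] args) throws Exception {
        MessageFactory mf = MessageFactory.newInstance();
        SOAPMessage request = mf.createMessage();
        SOAPBody body = request.getSOAPBody();

        // Build <jms:ExecuteJob><jms:appName>GeoFEST</jms:appName>...
        QName op = new QName("http://servogrid.example.org/jobs", "ExecuteJob", "jms");
        SOAPBodyElement exec = body.addBodyElement(op);
        exec.addChildElement("appName", "jms").addTextNode("GeoFEST");
        exec.addChildElement("inputFile", "jms").addTextNode("northridge.grd");
        request.saveChanges();

        SOAPConnection conn = SOAPConnectionFactory.newInstance().createConnection();
        SOAPMessage response = conn.call(request, "http://servogrid.example.org/jobmanager");
        // The service would reply with a job handle for later monitoring calls.
        response.writeTo(System.out);
        conn.close();
    }
}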

[Diagram: Grid of Grids — Research Grid and Education Grid. The Research Grid (SERVOGrid) composes repositories, federated databases, sensor grids, databases of streaming and field-trip data, a compute grid, data filter services, GIS and discovery grid services, and research simulations; customization and analysis/visualization services carry results from research to education through the portal; the Education Grid adds a computer farm.]

SERVOGrid has a portal
The portal is built from portlets – user interface fragments for each service that are composed into the full interface – and uses OGCE technology, as does the planetary science VLab portal with the University of Minnesota.

[Diagram: Semantically Rich Services with a Semantically Rich Distributed Operating Environment — filter services (FS) linked by SOAP message streams refine raw data through data, information, and knowledge to wisdom and decisions, with metadata (MD) attached at every stage; sensor services (SS), database services, and other services feed the streams.]

[Diagram: Grids of Grids Architecture — the services of one Grid exchange raw data with another Grid and another database Grid over SOAP message streams; a sensor service is the same as an outward-facing application service.]

Linking Grids and Services
Linkage of services and Grids requires that messages sent by one Grid/Service can be understood by another. Inside SERVOGrid all messages use
• Web service system standards we like (UDDI, WS-Context, WSDL, SOAP), and
• GML as extended by WFS, so that data sources and simulations all use the same syntax.
All other Web service based Grids use their favorite Web service system standards, but these differ from Grid to Grid.
• Further, there is no agreement on application-specific standards – not all Earth Science Grids use OGC standards
• OGC standards include some capabilities overlapping general Web Services
• Use of WSDL and SOAP is agreed, although there are versioning issues
So there is essentially no service-level interoperability between Grids; rather, interoperation is at diverse levels with shared technology
• SQL for databases, PBS for job scheduling, Condor for job management, GT4 or Unicore for Grids

Grids in Babylon
Tower of Babel (from the web):
• In the Bible, a city (now thought to be Babylon) in Shinar where God confounded a presumptuous attempt to build a tower into heaven by confusing the language of its builders into many mutually incomprehensible languages.
For Grids, everybody likes to do their own thing, and Grids are complex multi-level entities with no obvious points of interoperation
• so one does not need divine intervention to create multiple Grid specifications
• But data in China, tsunami sensors in the Indian Ocean, simulations in the USA etc. will not be linked for better warning and forecasting unless the national efforts can interoperate
Two interoperation strategies:
• Make all Grids use the same specifications (divine harmony)
• Build translation services (filters!) using, say, OGF standards as a common target language (more practical; see the sketch below)
We don't need computers (jobs) to be interoperable (although this would be good), as each country does its own computing
• Rather, we need data and some metadata on each Grid to be accessible from all Grids
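
The second strategy amounts to a thin translation layer per Grid. A minimal Java sketch of such a filter, assuming a hypothetical native record layout and a hypothetical common target vocabulary (a real deployment would map to agreed OGF or OGC schemas such as GML):

import java.util.HashMap;
import java.util.Map;

// Sketch of the "translation service (filter)" interoperation strategy:
// each Grid keeps its native metadata format, and one filter per Grid
// maps it to a shared target vocabulary. All names here are illustrative.
interface MetadataFilter {
    // Translate one native metadata record into the common target form.
    Map<String, String> toCommon(Map<String, String> nativeRecord);
}

// Illustrative filter for a Grid that labels its fields in its own way.
class ExampleGridFilter implements MetadataFilter {
    public Map<String, String> toCommon(Map<String, String> nativeRecord) {
        Map<String, String> common = new HashMap<String, String>();
        common.put("dataset.title", nativeRecord.get("name"));
        common.put("dataset.crs",   nativeRecord.get("projection"));
        common.put("dataset.url",   nativeRecord.get("location"));
        return common;
    }
}

With a common target language, each Grid needs only one filter rather than one translator per partner Grid.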

Interoperability Summary
Need to define common infrastructure and domain-specific standards
• Build interoperable infrastructure gatewayed to existing legacy applications and Grids
Generic middleware
• Grid software including workflow
• Portals/problem solving environments including visualization
• We need to ensure that we can make security, job submission, portal, and data access (sharing) mechanisms in different economies interoperate
Geographic Information Systems (GIS)
• Use services as defined by the Open Geospatial Consortium (Web Map and Feature Services) http://www.crisisgrid.net/
Earthquake/Tsunami science specific
• Satellites, sensors (GPS, seismic)
• Fault, tsunami … characteristics stored in databases need GML extensions – the schema for QuakeTables developed by SERVOGrid can be used internationally (a sample WFS query follows)
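
Because QuakeTables data is exposed through a WFS, any partner Grid can retrieve fault characteristics as GML with a plain HTTP query. A minimal Java sketch using standard OGC WFS 1.0.0 key-value parameters; the host and feature type name ("quaketables:Fault") are hypothetical stand-ins, not the published QuakeTables address or schema.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Minimal sketch of querying an OGC Web Feature Service for fault data
// as GML. The host, type name, and bounding box are illustrative.
public class FaultQuerySketch {
    public static void main(String[] args) throws Exception {
        String request = "http://wfs.example.org/wfs"
            + "?service=WFS&version=1.0.0&request=GetFeature"
            + "&typeName=quaketables:Fault"
            + "&bbox=-125,32,-114,42";   // rough California bounding box

        BufferedReader in = new BufferedReader(
            new InputStreamReader(new URL(request).openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);    // GML feature collection
        }
        in.close();
    }
}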

ACES Components
(per country/economy: data shared as part of the collaboration; earthquake forecast/model; wave motion; infrastructure)
• Australia: seismic data, fault database, GPS; Finley, LSM; PANDAS prototype; ACcESS
• Canada: POLARIS, Radarsat; Pattern Informatics
• P.R. China: seismic, GPS; LURR; CAS, China National Grid
• Japan: GPS, seismic, Daichi (InSAR); GeoFEM; JST-CREST, Earth Simulator, NAREGI
• Chinese Taipei: FORMOSAT-3/COSMIC (F/C)
• U.S.A.: QuakeTables, seismic, InSAR, PBO (GPS); Pattern Informatics, ALLCAL; GeoFEST, PARK, Virtual California; TeraShake; SERVOGrid, GEON, SCECGrid, VLab
• International: IMS, Pacific Rim Universities (APRU), PRAGMA

National Earthquake Grids of Relevance
• APAC – GT2, GT4, gLite; ACcESS has some link to SERVOGrid
• China National Grid – GOS, GT3, GT4
• ChinaGrid – CGSP, built on GT4
• CNGI – China's Next-Generation Internet, which has a significant earthquake data component
• NAREGI – uses GT4 and Unicore with many enhancements
• Japanese Earthquake Simulation Grid – unclear
• K*Grid (Korea) – enhanced SRB, GT2 to GT4
• TIGER (Taiwan Integrated Grid for Education and Research) – unclear technology and unclear earthquake relevance
• SERVOGrid – uses WS-I+ simple Web Services
• TeraGrid – uses GT4 but no clear model except for core job submittal

TeraGrid: Integrating NSF Cyberinfrastructure
[Map: TeraGrid sites — Buffalo, Wisconsin, UC/ANL, Utah, Cornell, Iowa, Purdue, NCAR, IU, NCSA, Caltech, PSC, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC]
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research. Today 100 teraflops; tomorrow a petaflop; Indiana is at 20 teraflops today.

APAC National Grid
[Diagram: core grid services — portal tools (GridSphere, QPSF), info services (JCU: APAC Registry, INCA2?), security (APAC CA, MyProxy, VOMRS); systems — QPSF, APAC National Facility (ANU), IVEC, SAPAC, ac3, VPAC, TPAC, CSIRO, with gateways to partners' systems; network — GrangeNet / AARNet with an APAC private network (AARNet). ACcESS at UQ (ACES partner) sits outside APAC.]

National "Grid Projects" in China
[Diagram: under the China e-Nation Strategy (2006-2020), grid projects span plan/research/develop and production/procure/deploy/operate/manage stages — the Virtual Computing Environment, CAS eScience, the Net-based Research Environment, China National Grid, the Semantic Grid, the Education & Research Grid, and the Next-Generation Network Initiative — funded at the €M to €10M's scale by NSFC, CAS Science and Technology R&D, the State Council, MoE, MoST, and the National Planning Commission.]
Grid activities are still growing.

CNGrid (2006-2010)
• HPC systems – 100 Tflop/s by 2008, Pflop/s by 2010?
• Grid software suite: CNGrid GOS – merge with international efforts – emphasize production
• CNGrid environment – nodes, centers, policies
• Applications – science, resource & environment, manufacturing, services – domain Grids

Cyber Science Infrastructure toward Petascale Computing (planned 2006-2011)
[Diagram: NII's NAREGI core site develops and delivers middleware (CA; R&D versions β, v1.0, v2.0) to a collaborative operation center providing the IT infrastructure for academic research and education, with operation/maintenance of the middleware and of UPKI/CA. Project-oriented VOs, domain-specific VOs (e.g. ITBL), university/national supercomputing VOs (IMS, AIST, KEK, NAO, etc.), a peta-scale system VO, and industrial projects all sit on the Super-SINET networking infrastructure. R&D collaborations include a nano-field proof-of-concept and evaluation joint project with IMS (Institute for Molecular Science) and a bio joint project with Osaka-U, with delivery and feedback through AIST; international collaboration includes EGEE, UNIGRIDS, TeraGrid, GGF etc. VO names are tentative.]

Japanese Earthquake Simulation Grid
[Diagram: an Integrated Observation-Simulation Data Grid over Super-SINET (10 Gbps) links data servers at NIED (48x G5, 15 TB) and GSI (8x Opteron, 20 TB), the Earth Simulator (5,120x SX6), and PC clusters running paraAVS at ERI and EPS (64x Opteron each).]

JST-CREST Integrated Predictive Simulation System
[Diagram: strong motion and tsunami generation — earthquake generation (plate motion, tectonic loading, earthquake rupture) drives wave propagation, tsunami generation, and artificial structure oscillation; crustal movement, seismic activity, and strong motion data analyses (GEONET, Hi-net, K-NET) feed a database for model construction; a platform for integrated simulation supplies data processing, visualization, and linear solvers, with PC clusters for small-to-intermediate problems and the Earth Simulator for large-scale problems; GIS and urban information shape the simulation output.]

Current PTWC Network of Seismic Stations
[Map: seismic stations from the GSN, USNSN, and other contributing networks.]

The NCES/WS-*/GS-* Features/Service Areas I
Service or Feature | WS-* | GS-* | NCES (DoD) | Comments
A: Broad Principles
• FS1: Use SOA (Service Oriented Architecture) | WS1 | — | — | Core service architecture; build Grids on Web services; industry best practice
• FS2: Grid of Grids | — | — | — | Strategy for legacy subsystems: modular architecture
B: Core Services (mainly service infrastructure; W3C/OASIS focus)
• FS3: Service Internet, Messaging | WS2 | — | NCES3 | Core infrastructure including reliability and publish-subscribe messaging, cf. FS13C
• FS4: Notification | WS3 | — | NCES3 | JMS, MQSeries, WS-Eventing, WS-Notification (see the JMS sketch at the end)
• FS5: Workflow | WS4 | — | NCES5 | Grid programming
• FS6: Security | WS5 | — | NCES2 | GridShib, PERMIS, Liberty Alliance ...
• FS7: Discovery | WS6 | — | NCES4 | UDDI and extensions
• FS8: System Metadata & State | WS7 | GS7 | — | Globus MDS, Semantic Grid, WS-Context
• FS9: Management | WS8 | GS6 | NCES1 | CIM
• FS10: Policy | WS9 | — | ECS | —
• FS11: Portals and Users | WS10 | — | NCES7 | Portlets (JSR 168), NCES Capability Interfaces

The NCES/WS-*/GS-* Features/Service Areas II
Service or Feature | WS-* | GS-* | NCES | Comments
B: Core Services (mainly higher level; OGF focus)
• FS12: Computing | — | GS3 | — | Job management; major Grid focus
• FS13A: Data as repositories (files and databases) | — | GS4 | NCES8 | Distributed files, OGSA-DAI; managed data is FS14B
• FS13B: Data as sensors and instruments | — | — | — | OGC SensorML
• FS13C: Data transport | WS2, WS3 | GS4 | NCES3,8 | GridFTP or WS interface to non-SOAP transport
• FS14A: Information as monitoring | — | GS4 | — | Major Grid effort for job status etc.
• FS14B: Information, Knowledge, Wisdom as part of D(ata)IKW | — | GS4 | NCES8 | VOSpace for IVOA, JBI for DoD, WFS for OGC; federation at this layer is a major research area; NCOW data strategy
• FS15: Applications and user services | — | GS2 | NCES9 | Standalone services; proxies for jobs
• FS16: Resources and infrastructure | — | GS5 | — | Ad-hoc networks; network monitoring
• FS17: Collaboration and virtual organizations | — | GS7 | NCES6 | XGSP, shared Web service ports
• FS18: Scheduling and matching of services and resources | — | GS3 | — | Current work only addresses scheduling "batch jobs"; need networks and services
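
Rows FS3 and FS4 identify publish-subscribe messaging and notification (JMS, WS-Eventing/WS-Notification) as core infrastructure. As a concrete illustration, here is a minimal JMS publisher sketch; the JNDI names ("jms/ConnectionFactory", "jms/SensorTopic") are deployment-specific assumptions, and a Grid would more likely expose this capability through a WS-Notification interface.

import javax.jms.*;
import javax.naming.InitialContext;

// Minimal JMS publish sketch for the FS3/FS4 messaging and notification
// services. The JNDI names below are assumed, not standardized; any JMS
// provider (e.g. one backing a WS-Notification gateway) would serve.
public class SensorPublishSketch {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory =
            (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed name
        Topic topic = (Topic) ctx.lookup("jms/SensorTopic");         // assumed name

        Connection conn = factory.createConnection();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(topic);

        // Publish one GPS reading; subscribers (filters, archives, portals)
        // receive it without the publisher knowing who they are.
        TextMessage msg = session.createTextMessage(
            "<gps station=\"USC1\" lat=\"34.02\" lon=\"-118.28\"/>");
        producer.send(msg);

        conn.close();
    }
}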