Cyberinfrastructure and the Role of Grid Computing Or, “Science 2.0” Ian Foster Computation Institute Argonne National Lab & University of Chicago “Web 2.0” Software as services Services as platforms Declan Butler Easy composition of services to create new capabilities (“mashups”)—that themselves may be made accessible as new services Enabled by massive infrastructure buildout Data- & computation-rich network services Google projected to spend $1.5B on computers, networks, and real estate in 2006 Dozens of others are spending substantially Paid for by advertising 2 Science 2.0: E.g., Virtual Observatories User Discovery tools Analysis tools Gateway Data Archives Figure: S. G. Djorgovski3 Science 2.0: E.g., Cancer Bioinformatics Grid Data Service @ uchicago.edu <BPEL <Workflow Workflow Inputs> Doc> link link Researcher Or Client App <Workflow Results> BPEL Engine link link Analytic service @ duke.edu Analytic service @ osu.edu Ravi Madduri et al. 4 The Two Dimensions of Science 2.0 Function Resource Decompose across network Users Discovery tools Clients integrate dynamically Select & compose services Select “best of breed” providers Publish result as new services Analysis tools Data Archives Decouple resource & service providers Fig: S. G. Djorgovski 5 Technology Requirements: Integration & Decomposition Users Service-oriented applications Wrap applications & data as services Compose services into workflows Service-oriented Grid infrastructure Composition Workflows Invocation Appln Service Appln Service Provisioning Provision physical resources to support application workloads “The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005 6 Globus Software Enables Grid Infrastructure Web service interfaces for behaviors relating to integration and decomposition Services: program execution, data movement, data access, … Open source software that implements those interfaces Primitives: resources, state, security In particular, Globus Toolkit (GT4) All standard Web services “Grid is a use case for Web services, focused on resource management” 7 Open Source Grid Software Globus Toolkit v4 www.globus.org Data Replication Credential Mgmt Replica Location Grid Telecontrol Protocol Delegation Data Access & Integration Community Scheduling Framework WebMDS Python Runtime Community Authorization Reliable File Transfer Workspace Management Trigger C Runtime Authentication Authorization GridFTP Grid Resource Allocation & Management Index Java Runtime Security Data Mgmt Execution Mgmt Info Services Common Runtime Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 20058 http://dev.globus.org Guidelines (Apache) Infrastructure (CVS, email, bugzilla, Wiki) Projects Include … 9 Hosted Science Services 1) Integrate services from external sources Virtualize “services” from providers Content Services Capacity Community Services Provider Capacity Provider 2) Coordinate & compose Create new services from existing ones “Service-Oriented Science”, Science, 2005 10 The Globus-Based LIGO Data Grid LIGO Gravitational Wave Observatory Birmingham• Cardiff AEI/Golm Replicating >1 Terabyte/day to 8 sites >40 million replicas so far MTBF = 1 month www.globus.org/solutions 11 Data Replication Service Pull “missing” files to a storage system Data Location Data Movement Reliable File Transfer Service Data Replication List of required Files GridFTP Local Replica Catalog Replica Location Index GridFTP Local Replica Catalog Replica Location Index Data Replication Service “Design and Implementation of a Data Replication Service Based on the Lightweight 12 Data Replicator System,” Chervenak et al., 2005 Example: Biology Public PUMA Knowledge Base Information about proteins analyzed against ~2 million gene sequences Back Office Analysis on Grid Natalia Maltsev et al. http://compbio.mcs.anl.gov/puma2 Involves millions of BLAST, BLOCKS, and other processes 14 Example: Earth System Grid Provide access to large climate simulation data Per-collection control Different user classes Server-side processing Implementation (GT) Portal-based User Registration (PURSE) PKI, SAML assertions GridFTP, GRAM, SRM >2000 users >100 TB downloaded 15 Under the Covers 16 Example: Astro Portal Stacking Service On-demand “stacks” of random locations within ~10TB dataset Challenge + + + + + + + Purpose Rapid access to 1010K “random” files Time-varying load Solution Dynamic acquisition of compute, storage With Ioan Raicu & Alex Szalay = S4 Web page or Web Service Sloan Data 17 Astro Portal Stacking Performance (LAN GPFS) 18 Example: Cybershake Calculate hazard curves by generating synthetic seismograms from estimated rupture forecast Hazard Map Strain Green Tensor Rupture Forecast Synthetic Seismogram Spectral Acceleration Tom Jordan et al., SCEC Hazard Curve 19 Number of jobs per day (23 days), 261,823 jobs total, Number of CPU hours per day, 15,706 hours total (1.8 years) JOBS 100000 HRS 10000 1000 Cybershake on the SCEC VO 100 10 Provenance Catalog 11 /1 0 11 /8 11 /6 11 /4 11 /2 10 /3 1 10 /2 9 10 /2 7 10 /2 5 10 /2 3 10 /2 1 10 /1 9 1 Data Catalog Workflow Scheduler/Engine VO Service Catalog SCEC Storage TeraGrid Storage Deelman, Kesselman, et al., USC/ISI VO Scheduler TeraGrid Compute (20 TB, 1.8 CPU year) 20 Science 1.0 Science 2.0 Gigabytes Terabytes Tarballs Services Journals Wikis Individuals Communities Community codes Science gateways Supercomputer centers TeraGrid, OSG, campus Makefile Workflow Computational science Science as computation Physical sciences All sciences (& humanities) Computational scientists All scientists NSF-funded NSF-funded 21 Science 2.0 Challenges A need for new technologies, skills, & roles A need for substantial software development “30-80% of modern astronomy projects is software”—S. G. Djorgovski A need for more & different infrastructure Creating, publishing, hosting, discovering, composing, archiving, explaining … services Computers & networks to host services Can we leverage commercial spending? To some extent, but not straightforward 22 For More Information Globus Alliance Dev.Globus www.opensciencegrid.org TeraGrid dev.globus.org Open Science Grid www.globus.org www.teragrid.org Background 2nd Edition www.mkp.com/grid2 www.mcs.anl.gov/~foster Thanks for DOE and NSF for research support!! 23