The Grid, Grid Services and the Semantic Web: Technologies and Opportunities

Dr. Carl Kesselman
Director, Center for Grid Technologies
Information Sciences Institute
University of Southern California

Outline
What are Grids?
Grid technology
  - Globus and the Open Grid Services Architecture
Grids and the Semantic Web

How do we solve problems?
Communities committed to common goals
  - Virtual organizations
Teams with heterogeneous members & capabilities
Distributed geographically and politically
  - No location/organization possesses all required skills and resources
Adapt as a function of the situation
  - Adjust membership, reallocate responsibilities, renegotiate resources

The Grid Vision
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
  - On-demand, ubiquitous access to computing, data, and services
  - New capabilities constructed dynamically and transparently from distributed services
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder)

Biomedical Informatics Research Network (BIRN)
Evolving reference set of brains provides essential data for developing therapies for neurological disorders (multiple sclerosis, Alzheimer’s, etc.).
Today
  - One lab, small patient base
  - 4 TB collection
Tomorrow
  - 10s of collaborating labs
  - Larger population sample
  - 400 TB data collection: more brains, higher resolution
  - Multiple scale data integration and analysis

National Virtual Observatory
http://virtualsky.org/ from Caltech CACR, Caltech Astronomy, Microsoft Research
Virtual Sky has 140,000,000 tiles (140 Gbyte)
[Figure: Virtual Sky views of the Coma cluster, with controls to change scale and change theme; themes shown: Optical (DPOSS), X-ray (ROSAT)]

Living in an Exponential World: (1) Computing & Sensors
Moore’s Law: transistor count doubles every 18 months
[Figure: magnetohydrodynamics star formation]

Living in an Exponential World: (2) Storage
Storage density doubles every 12 months
Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
  - 2000 ~0.5 petabyte
  - 2005 ~10 petabytes
  - 2010 ~100 petabytes
  - 2015 ~1000 petabytes?
Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?

An Exponential World: (3) Networks (Or, Coefficients Matter …)
Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = order of magnitude per 5 years
1986 to 2000
  - Computers: x 500
  - Networks: x 340,000
2001 to 2010
  - Computers: x 60
  - Networks: x 4000
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.
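
The "coefficients matter" arithmetic on the networks slide can be checked with a few lines of Python. This is a minimal sketch that assumes pure exponential growth with the doubling times quoted on the slide (18 months for computers, 9 months for networks); the function and constant names are illustrative, not taken from the talk.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_months` months (idealized exponential growth)."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Doubling times as quoted on the slide.
COMPUTER_DOUBLING_MONTHS = 18.0
NETWORK_DOUBLING_MONTHS = 9.0


def network_lead(years: float) -> float:
    """How much faster networks grow than computers over `years`."""
    return (growth_factor(NETWORK_DOUBLING_MONTHS, years)
            / growth_factor(COMPUTER_DOUBLING_MONTHS, years))


if __name__ == "__main__":
    for years in (5, 9, 14):
        computers = growth_factor(COMPUTER_DOUBLING_MONTHS, years)
        networks = growth_factor(NETWORK_DOUBLING_MONTHS, years)
        print(f"{years:>2} years: computers x{computers:,.0f}, "
              f"networks x{networks:,.0f}, lead x{network_lead(years):,.1f}")
```

Under these assumptions, nine years (2001 to 2010) gives factors of 64 and 4096, in line with the slide's rounded x 60 and x 4000, and the five-year gap between networks and computers comes out at roughly a factor of ten. The slide's 1986 to 2000 figures are of the same order as what the formula gives for a 14-year span, though not identical.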
The Grid World: Current Status
Dozens of major Grid projects in scientific & technical computing/research & education
Considerable consensus on key concepts and technologies
  - Open source Globus Toolkit™ a de facto standard for major protocols & services
  - Far from complete or perfect, but out there, evolving rapidly, and large tool/user base
Industrial interest emerging rapidly
Opportunity: convergence of eScience and eBusiness requirements & technologies

The Next Step
Globus leverages standard protocols
  - TLS, LDAP, X.509, HTTP
  - Only TCP in common
Is there a better foundation for Grid functions?
  - More unified protocol stack (common base)
  - Better support for virtualization
  - Leverage commodity infrastructure

“Web Services”
Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
WSDL: Web Services Description Language
  - Interface Definition Language for Web services
SOAP: Simple Object Access Protocol
  - XML-based RPC protocol; common WSDL target
WS-Inspection
  - Conventions for locating service descriptions
UDDI: Universal Description, Discovery, & Integration
  - Directory for Web services

Transient Service Instances
“Web services” address discovery & invocation of persistent services
  - Interface to persistent state of entire enterprise
In Grids, must also support transient service instances, created/destroyed dynamically
  - Interfaces to the states of distributed activities
  - E.g. workflow, video conferencing, distributed data analysis
Significant implications for how services are managed, named, discovered, and used
  - In fact, much of our work is concerned with the management of service instances

OGSA Design Principles
Service orientation to virtualize resources
  - Everything is a service
From Web services
  - Standard interface definition mechanisms: multiple protocol bindings, local/remote transparency
From Grids
  - Service semantics, reliability and security models
  - Lifecycle management, discovery, other services
Multiple “hosting environments”
  - C, J2EE, .NET, …

The Grid Service = Interfaces + Service Data
[Diagram labels: Reliable invocation; Authentication; Service data access; Explicit destruction; Soft-state lifetime; GridService; … other interfaces …; Service data elements; Implementation; Hosting environment/runtime (“C”, J2EE, .NET, …); Notification; Authorization; Service creation; Service registry; Manageability; Concurrency]

Given a set of Services?
How do we do a better job of finding out what services we want to use?
How do we do a better job of configuring services?
How do we do a better job of composing and nesting services?
Answer: Do a better job of representing services

Deeper representation of services
Information is captured via structure
  - X.509 certificates, MDS models, CIM schema, Metadata
Knowledge expresses relationships between entities
  - Concepts and relationships
  - Logical framework for inference over relationships

Vision
“The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration and reuse across various applications.
The Web can reach its full potential if it becomes a place where data can be processed by automated tools as well as people”
From the W3C Semantic Web Activity statement

Resource Description Framework

Ontologies Everywhere
What happens if knowledge permeates the Grid?
  - Data elements
  - Service descriptions (service data elements)
  - Protocols (e.g. policy, provisioning)
More dynamic and general model than Semantic Web
  - OGSA lifetime model
  - OGSA SDE model

Cognitive Grid
Grid Services + Ontologies + Knowledge Driven Services
Examples
  - Knowledge driven matchmaking
  - Agent based service composition
  - High-level planning and resource discovery
  - Knowledge based provisioning
Some people are using the term “semantic grid” to describe Grid Services + Knowledge

SCEC Modeling Environment
[Diagram labels, by section:]
KNOWLEDGE REPRESENTATION & REASONING
  - Knowledge Server: Knowledge base access, Inference
  - Translation Services: Syntactic & semantic translation
  - Knowledge Base
  - Ontologies: Curated taxonomies, Relations & constraints
DIGITAL LIBRARIES
  - Pathway Models: Pathway templates, Models of simulation codes
  - Navigation & Queries: Versioning, Topic maps
KNOWLEDGE ACQUISITION
  - Code; Acquisition Interfaces; Repositories; Dialog planning; FSM; RDM; AWM
  - Mediated Collections: Federated access
  - SRM; Data Collections; Pathway construction strategies
  - Pathway Assembly: Template instantiation, Resource selection, Constraint checking
  - Data & Simulation Products
GRID
  - Pathway Execution: Policy, Data ingest, Repository access
  - Grid Services: Compute & storage management, Security
  - Computing; Pathway Instantiations; Storage
Users

DOCKER: Publishing SHA Code
(Y. Gil, USC/ISI)
[Diagram labels: Web Browser; DOCKER User Interface; Constraint Acquisition; Model Specification; Wrapper Generation (WSDL, PWL); AS97 (docs, types, msg, constrs); AS97 ontology; SCEC ontologies]
User specifies:
  - Types of model parameters
  - Format of input messages
  - Documentation
  - Constraints
Recommends other models
  - Did you know that [Sadigh97] is a good model for dist >80 miles? Yes
Automatically Generates Interface
Automatically Generates KR Description

myGrid Project - bioinformatics
Imminent ‘deluge’ of genomics data
  - Highly heterogeneous, highly complex and inter-related
Convergence of data and literature archives
1. Database access from the Grid
2. Process enactment on the Grid
3. Personalisation services
4. Metadata services
Grid Services + Ontologies
Carole Goble, U. Manchester

Resource selection: Matchmaking
Providers and requesters describe themselves
  - Syntactic description: structured or semi-structured
A Matchmaker matches compatible classads
  - Match based on attribute name, simple prioritization
Semantic matchmaking
  - Inference based matching (e.g. CIM + relations)
  - Automatic classification (e.g. description logic)
  - Leverage domain specific ontologies

Pegasus: Planning for Execution in Grids
Create workflow to create virtual data
  - Domain specific and generic rules
Map workflow onto Grid resources
  - System state via Grid services (MDS, RLS, …)
  - Global and local optimization criteria
[Diagram components: Chimera; Request Manager; Current State Generator; MCS; RLS; MDS; Abstract and Concrete Planner (Abstract DAG reduction, Concrete Planner); Transformation Catalog; VDL Generator; Submit File Generator for Condor-G; DAGMan Submission and Monitoring; Condor-G/DAGMan]
[Diagram flows: (1) Abstract Workflow (DAG); (2) Abstract DAG; (3) Logical File Names (LFNs); (4) Physical File Names (PFNs); (5) Full Abstract DAG; (6) Reduced Abstract DAG; (7) Logical Transformations; (8) Physical Transformations and Execution Environment Information; (9) Concrete DAG; (10) Concrete DAG; (11) DAGMan files; (12) DAGMan files; (13) DAG; (14) Log Files; (15) Monitoring; (18) Results]

Summary
Technology exponentials are changing the shape of scientific investigation & knowledge
  - More computing, even more data, yet more networking
The Grid: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Many potential opportunities for application of semantic web technologies to Grid services
  - OGSA

Partial Acknowledgements
Open Grid Services Architecture design
  - Karl Czajkowski @ USC/ISI
  - Ian Foster, Steve Tuecke @ ANL
  - Jeff Nick, Steve Graham, Jeff Frey @ IBM
Semantic/Cognitive Grid
  - Yolanda Gil, Ewa Deelman, Jim Blythe, Tom Russ, Hans Chalupsky
  - Conversations with Jim Hendler, Carole Goble, David De Roure
Strong links with many EU, UK, US Grid projects
Support from DOE, NASA, NSF, Microsoft

For More Information
Grid Book
  - www.mkp.com/grids
The Globus Project™
  - www.globus.org
OGSA
  - www.globus.org/ogsa
Global Grid Forum
  - www.gridforum.org
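
Supplementary sketch for the "Resource selection: Matchmaking" slide: the contrast between syntactic matching on attribute names and inference-based semantic matching can be illustrated in a few lines of Python. Everything here is a hypothetical toy (the `IS_A` ontology, the attribute names, and both match functions); it does not reproduce Condor ClassAds, CIM, or any Globus interface.

```python
from typing import Dict, Optional

# Hypothetical toy ontology: child concept -> parent concept (assumed for illustration).
IS_A: Dict[str, str] = {
    "LinuxCluster": "ComputeResource",
    "Supercomputer": "ComputeResource",
    "TapeArchive": "StorageResource",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` or a descendant of it in IS_A."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        current = IS_A.get(current)
    return False


def syntactic_match(request: Dict[str, object], offer: Dict[str, object]) -> bool:
    """Match on attribute name and exact value only."""
    return all(offer.get(key) == wanted for key, wanted in request.items())


def semantic_match(request: Dict[str, object], offer: Dict[str, object]) -> bool:
    """Like syntactic_match, but the 'type' attribute is compared by
    subsumption in the toy ontology instead of string equality."""
    for key, wanted in request.items():
        offered = offer.get(key)
        if offered is None:
            return False
        if key == "type":
            if not subsumes(str(wanted), str(offered)):
                return False
        elif offered != wanted:
            return False
    return True


if __name__ == "__main__":
    request = {"type": "ComputeResource", "os": "linux"}
    offer = {"type": "LinuxCluster", "os": "linux", "cpus": 64}
    print("syntactic:", syntactic_match(request, offer))
    print("semantic: ", semantic_match(request, offer))
```

The requester asks for any `ComputeResource`; the provider advertises a `LinuxCluster`. String comparison rejects the pair, while the subsumption check accepts it, which is the kind of gain the slide attributes to inference-based matching over domain ontologies.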