The Web Service Discovery Architecture (WSDA) in Comparison with OGSA
Input for improvements of OGSA
Wolfgang.Hoschek@cern.ch
European DataGrid Data Management Work Package (EDG WP2)
OGSA Workshop, Argonne, May 31, 2002

Objectives of Work
• Define how to…
  – bootstrap, query and publish to a dynamic information space maintained by self-describing network interfaces
• Show how to support…
  – expressive general-purpose queries for service discovery
  – over a view that integrates autonomous dynamic database nodes
  – from a wide range of distributed system topologies

Ph.D and Papers
1. A Unified Peer-to-Peer Database Framework for XQueries over Dynamic Distributed Content and its Application for Scalable Service Discovery. Ph.D Thesis, Tech. University of Vienna (submitted), 2002.
2. A Data Model and Query Language for Service Discovery
3. A Database for Dynamic Distributed Content and its Application for Service and Resource Discovery
4. The Web Service Discovery Architecture
5. A Unified Peer-to-Peer Database Framework and its Application for Scalable Service Discovery
6. A Unified Peer-to-Peer Database Protocol
– See http://cern.ch/grid-data-management/publications.html

World Wide Web Architecture
• T. Berners-Lee designed the WWW as a
  – consistent interface to a flexible and changing heterogeneous information space
  – for use by CERN's staff, the High Energy Physics community, and, of course, the world at large
• WWW architecture rests on four simple and orthogonal pillars:
  – URIs as identifiers
  – HTTP for retrieval of content pointed to by identifiers
  – MIME for flexible content encoding
  – HTML as the primus-inter-pares (MIME) content type

How to Reach the Mentioned Objectives?
• Retain and wrap all four WWW pillars (URI, HTTP, MIME, HTML) “as is”, yet allow for flexible extensions in terms of identification, retrieval and caching of content
• Judiciously combine the…
  – four WWW pillars
  – Dynamic Data Model (DDM)
  – Web Service Discovery Architecture (WSDA)
  – Hyper Registry
  – the Unified Peer-to-Peer Database Framework (UPDF)
  – and its Peer Database Protocol (PDP)

Tuple from Dynamic Data Model
– A WSDA tuple is an annotated multi-purpose soft state data container that may contain a piece of arbitrary MIME content and allows for refresh of that content at any time (default content-type is XML)

  Tuple := Link, Type, Context, Timestamps, Metadata, Content (optional)

  Semantics:
    HTTP GET(tuple.link) --> tuple.content
    type(HTTP GET(tuple.link)) --> tuple.type

Tuple Set from Dynamic Data Model

  <tupleset>
    <tuple link="http://sched001.cern.ch/getServiceDescription"
           type="service" ctx="parent" TS1="10" TC="15" TS2="20" TS3="30">
      <content>
        <service> service description A goes here </service>
      </content>
      <metadata>
        <owner name="http://cms.cern.ch"/>
      </metadata>
    </tuple>
    <tuple link="http://repcat.cern.ch/pub/getServiceDescription?id=4711"
           type="service" ctx="child" TS1="30" TC="0" TS2="40" TS3="50">
    </tuple>
  </tupleset>

Query Support
• Simplest possible query support? --> MinQuery interface
  – “Select all”-style!
  – Return all tuples (including or excluding cached content)
    • XML getTuples()
    • XML getLinks()
• Powerful query support? --> XQuery interface
  – XQuery Language!
  – Everything that can be done in SQL can be done in XQuery. But XQuery is even more powerful (e.g. hierarchical navigation)
    • XML query(XQuery)

Discovery Query (1)
• Find all services that implement a replica catalog service interface, that CMS members are allowed to use, and that have an HTTP binding for the replica catalog operation “XML getPFNs(String LFN)”.

  LET $repcat := "http://gridforum.org/interface/replicaCatalog-1.0"
  FOR $tuple IN /tupleset/tuple[@type="service"]
  WHERE SOME $op IN $tuple/content/service/interface[@type = $repcat]/operation
        SATISFIES ($op/name="XML getPFNs(String LFN)"
                   AND $op/bindhttp/@verb="GET"
                   AND contains($op/allow, "http://cms.cern.ch/everybody"))
  RETURN $tuple

Discovery Query (2)
• Find all CMS replica catalogs and return their physical file names (PFNs) for a given logical file name (LFN); suppress PFNs not starting with "ftp://".

  LET $repcat := "http://gridforum.org/interface/ReplicaCatalog-1.0"
  FOR $tuple IN /tupleset/tuple[@type="service"]
  LET $s := $tuple/content/service
  WHERE SOME $op IN $s/interface[@type = $repcat]/operation
        SATISFIES ($op/name="XML getPFNs(String LFN)"
                   AND $op/bindhttp/@verb="GET"
                   AND contains($op/allow, "http://cms.cern.ch/everybody"))
  RETURN
    FOR $pfn IN invoke($s, $repcat, "XML getPFNs(String LFN)",
                       "http://myhost.cern.ch/myFile")/tupleset/PFN
    WHERE starts-with($pfn, "ftp://")
    RETURN $pfn

WSDA Interfaces

  Interface | Operations | Responsibility
  Presenter | HTTP(S) GET on HTTP(S) URL, or MIME getServiceDescription() | Retrieve service description. Default MIME content-type: XML
  Consumer | (TS4,TS5) publish(XML tupleset) | A content provider can publish a dynamic pointer (content link), which in turn enables the consumer (e.g. hyper registry) to retrieve the current content.
  MinQuery | XML getTuples(), XML getLinks() | Simplest possible query support (“select all”)
  XQuery | XML query(XQuery) | Powerful query over tuple set

Client and WSDA Interfaces
[Figure: Remote Client; interfaces Presenter (HTTP GET or getSrvDesc()), Consumer (publish(...)), MinQuery (getTuples(), getLinks()) and XQuery (query(...)); tuples T1 ... Tn (Tuple 1 ... Tuple N) with content links to Presenter 1 ... Presenter N and Content 1 ... Content N. Legend: Interface, Invocation, Content Link.]

OGSA vs. WSDA (1)

  Concept | WSDA | OGSA
  Interfaces | Presenter, Consumer, MinQuery, XQuery, TriggerXQuery (tbd.) | GridService, HandleMap, Registry, NotificationSink, NotificationSource, Factory, PrimaryKey
  Service identifier | Service link (i.e. content link) = HTTP(S) URL. Need not be unique | Grid Service Handle (GSH) = HTTP(S) URL with restrictions. Must be unique
  Service description | Service description (e.g. WSDL) | Grid Service Reference (GSR) (e.g. WSDL)
  Service description retrieval | via HTTP(S) GET or Presenter.getServiceDescription() | via HTTP(S) GET or HandleMap.findByHandle(GSH)

OGSA vs. WSDA (2)

  Concept | WSDA | OGSA
  Multi-purpose data container | Tuple | Service Data Element
  Set of data containers | Tuple set | Collection of service data elements
  Query capability | MinQuery.getLinks(), MinQuery.getTuples(), XQuery.query(XQuery) | GridService.FindServiceData(XML query)
  Data publication | (TS4,TS5) Consumer.publish(XML tupleset) | Registry.RegisterService(handle), NotificationSink.deliverNotification(sdata)
  Mandatory interfaces | none | GridService

Tuple vs. Service Data Element (1)
• Service Data Element…
  – is a named multi-purpose soft state data container that may contain a piece of arbitrary XML content (value). May contain an arbitrary extensibility element as content.
  – Attributes added in May spec
    • + Global name (i.e. QName), + Type
• WSDA Tuple…
  – is an annotated multi-purpose soft state data container that may contain a piece of arbitrary MIME content and allows for refresh of that content at any time (default content-type is XML)
  – Has as attributes a content link, a type, a context, four soft state time stamps, and
  – (optionally) two arbitrary-shaped extensibility elements, namely metadata and content.

Tuple vs. Service Data Element (2)

  Concept | WSDA Tuple | OGSA Service Data Element
  Dynamic pointer / ID | Content link = dynamic pointer | globalName (+May spec.) = ID
  What? Content-type | Type | Type (+May spec.)
  Why? How? Publication purpose/usage | Context | Name (?)
  When? Lifetime | 4 timestamps | 3 timestamps
  More annotations | Metadata (optional) | Not available
  Embedded data | Content (optional) | Content (optional)

Soft State Time Stamps

  Time Stamp | WSDA | OGSA
  TS1 / goodFrom | Time the content provider last modified the content | Time from which the value of the SDE carried in its extensibility element is said to be valid.
  TC | Time the embedded tuple content was last modified (e.g. by an intermediary) | Not available
  TS2 / goodUntil | Time up to which the current content at the provider is expected to remain valid, at the least | Time until which the value of the SDE in its extensibility elements is said to be valid.
  TS3 / availableUntil | Time up to which the content link at the provider is expected to remain valid (alive), at the least | Time until which this named SDE is expected to be available.

Hyper Registry vs. MDS
[Figure, two panels.
a) Content Provider and Hyperlink Registry: Remote Client, Query, Registry (DB); the Content Provider (Publisher, Presenter, Mediator, Content Source) (re)publishes a content link without content, or with content (push), via HTTP POST; content retrieval (pull) via HTTP GET.
b) Content Provider and GRIS: Remote Client, Query, GRIS (Cache); content retrieval (pull) via execution of a local program; Content Provider (Executable, Content Source).]

Example Content Providers
[Figure: example content providers, each with “publish & refresh” and “retrieve” paths. Labels: cron job, Apache, XML file(s); monitor thread, servlet, (re)compute service description(s); cron job, Perl HTTP, to XML, cat /proc/cpuinfo, uname, netstat; java mon, servlet, to XML, replica catalog service(s), RDBMS or LDAP.]

Soft State Transitions
[Figure: state diagram with states UNKNOWN, NOT CACHED and CACHED. Transition labels: Publish with content (push), Retrieve (pull), currentTime > TS2, TS1 > TC.]

Tuples Partitioned over Registry Nodes - Topology
[Figure: topology of registry nodes with partitioned tuples.]

Unified Peer-to-Peer Database Framework (UPDF)
• Q: Can we devise a unified P2P database framework…
  – for general-purpose query support
  – in large heterogeneous distributed systems
  – spanning many administrative domains?
• Q: Can we devise a framework that allows one to express specific applications for a wide range of…
  – data types (typed or untyped XML, any MIME type)
  – node topologies (e.g. ring, tree, graph)
  – query languages (e.g. XQuery, SQL, LDAP)
  – query response modes (e.g. Routed, Direct and Referral Response)
  – neighbor selection policies (e.g. in the form of an XQuery)
  – pipelining characteristics, timeout and other scope options?
• Answer: Yes --> Unified P2P DB Framework

Query Response Modes
[Figure, six panels:
a) Routed Response (RR)
b) Direct Response without Invitation
c) Direct Response with Invitation (DR)
d) Routed Response with Metadata (RRM)
e) Direct Metadata Response without Invitation
f) Direct Metadata Response with Invitation (DRM)
Legend: Node, Agent Node, Originator, Query, Result set, Invitation, Data Query, Data.]

Response Mode Switches and Shifts
• No need to mandate a single response mode globally
• Response modes can be permuted arbitrarily
• For autonomy, scalability, availability, performance, security, etc.
[Figure, three panels: a) RR --> DR Switch; b) DR --> RR Switch; c) DR --> DR Shift. Legend: Node, Agent Node, Originator, Query, Result set.]

Template Execution Plan
• Any query can be answered by appropriate substitutions into the template
[Figure: operator tree with SEND, A, M, L, U, N and RECEIVE1 ... RECEIVEk.
Legend: A ... Agent Plan; L ... Local Query; M ... Merge Query; N ... Neighbor Query; Q ... User Query; U ... Unionizer Operator.]

Peer Database Protocol (PDP)
• Messaging model and network protocol that supports the UPDF framework and the XQuery interface
• Fully based on the IETF BEEP standard
• Transaction
  – consists of one or more discrete message exchanges related to the same query
• Messages
  – QUERY, RECEIVE, SEND, INVITE, CLOSE

PDP Properties
• Low latency, pipelining, early and/or partial result set retrieval: due to synchronous pull and result set delivery in one or more variable-sized batches.
• Efficient: due to asynchronous push with delivery of multiple results per batch.
• Resource consumption and flow control on a per-query basis: due to the use of a distinct channel per transaction.
• Scalable: due to application multiplexing, which allows for very high query concurrency and very low latency, even in the presence of secure TCP connections.

Permitted Message Exchanges
• MSG_QUERY --> RPY_OK | ERR
• MSG_RECEIVE --> RPY_SEND | (ANS_SEND [0:N], NULL) | ERR
• MSG_INVITE --> RPY_OK | ERR
• MSG_CLOSE --> RPY_OK | ERR
• Supports synchronous (pull) and asynchronous (push)
• Supports batched iterators
  – RECEIVE/SEND batches of at least N and at most M results from the (remainder of the) result set

Node State Transitions
[Figure: state diagram with states UNKNOWN, OPEN and CLOSED. Transition labels: QUERY (trigger action: forward QUERY to neighbors); the numbered conditions below (trigger action: forward CLOSE to dependents); Loop timeout.]
1. CLOSE received
2. SEND exhausts result set
3. INVITE not accepted (Direct Response, non-empty result set)
4. True (Direct Response, empty local result set)
5. Various errors
6. Abort timeout

Summary: WSDA in One Slide
[Figure: same diagram as “Client and WSDA Interfaces”.]

Questions?
• Input for improvements of OGSA
• More information
  – http://cern.ch/grid-data-management/
  – http://www.edg.org
• Contacts
  – wolfgang.hoschek@cern.ch
  – peter.kunszt@cern.ch
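
Sketch: tuple and soft state
The slides “Tuple from Dynamic Data Model”, “Soft State Time Stamps” and “Soft State Transitions” describe a tuple with four time stamps and a cache state. The following is a minimal Python sketch of that idea, not the author's implementation. The names WsdaTuple and cache_state are hypothetical. Two readings are assumptions: that the labels “currentTime > TS2” and “TS1 > TC” mark cached content as stale, and that a tuple becomes UNKNOWN once currentTime > TS3. The sample values come from the tuple set slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WsdaTuple:
    link: str                      # content link (dynamic pointer)
    type: str
    ctx: str
    ts1: int                       # TS1: provider last modified the content
    tc: int                        # TC: embedded content last modified
    ts2: int                       # TS2: content expected valid until
    ts3: int                       # TS3: content link expected alive until
    content: Optional[str] = None  # optional embedded (cached) content
    metadata: Optional[str] = None


def cache_state(t: Optional[WsdaTuple], now: int) -> str:
    """Classify a tuple as seen by a registry at time `now`."""
    # Assumption: an expired content link (now > TS3) drops the tuple.
    if t is None or now > t.ts3:
        return "UNKNOWN"
    # Assumed reading of the slide labels: content is stale if
    # currentTime > TS2 or TS1 > TC; without content nothing is cached.
    if t.content is None or now > t.ts2 or t.ts1 > t.tc:
        return "NOT CACHED"
    return "CACHED"


# The two sample tuples from the tuple set slide.
a = WsdaTuple("http://sched001.cern.ch/getServiceDescription", "service",
              "parent", ts1=10, tc=15, ts2=20, ts3=30,
              content="<service>service description A goes here</service>")
b = WsdaTuple("http://repcat.cern.ch/pub/getServiceDescription?id=4711",
              "service", "child", ts1=30, tc=0, ts2=40, ts3=50)

print(cache_state(a, 18), cache_state(a, 25), cache_state(a, 31))
print(cache_state(b, 35))
```

Under these assumptions a registry would re-fetch content via HTTP GET on the content link whenever a tuple is NOT CACHED, which is the pull path named on the slides.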
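
Sketch: Discovery Query (1) without an XQuery engine
To make the selection logic of “Discovery Query (1)” concrete, the sketch below evaluates the same conditions with Python's standard xml.etree.ElementTree, which stands in for an XQuery engine here. The layout of the sample service description (interface, operation, name, bindhttp, allow) is inferred from the paths used in the query. TUPLESET and find_replica_catalogs are hypothetical names, and the sample data is illustrative only.

```python
import xml.etree.ElementTree as ET

REPCAT = "http://gridforum.org/interface/replicaCatalog-1.0"

# Illustrative tuple set; element layout inferred from the query paths.
TUPLESET = """
<tupleset>
  <tuple link="http://repcat.cern.ch/pub/getServiceDescription?id=4711"
         type="service" ctx="child" TS1="30" TC="0" TS2="40" TS3="50">
    <content>
      <service>
        <interface type="http://gridforum.org/interface/replicaCatalog-1.0">
          <operation>
            <name>XML getPFNs(String LFN)</name>
            <bindhttp verb="GET"/>
            <allow>http://cms.cern.ch/everybody</allow>
          </operation>
        </interface>
      </service>
    </content>
  </tuple>
  <tuple link="http://sched001.cern.ch/getServiceDescription"
         type="service" ctx="parent" TS1="10" TC="15" TS2="20" TS3="30">
    <content><service/></content>
  </tuple>
</tupleset>
"""


def find_replica_catalogs(tupleset_xml):
    """Return the links of tuples satisfying the conditions of query (1)."""
    root = ET.fromstring(tupleset_xml)
    hits = []
    for tup in root.findall("tuple[@type='service']"):
        path = "content/service/interface[@type='%s']/operation" % REPCAT
        for op in tup.findall(path):
            bind = op.find("bindhttp")
            if (op.findtext("name") == "XML getPFNs(String LFN)"
                    and bind is not None and bind.get("verb") == "GET"
                    and "http://cms.cern.ch/everybody"
                    in (op.findtext("allow") or "")):
                hits.append(tup.get("link"))
                break  # SOME ... SATISFIES: one matching operation suffices
    return hits


print(find_replica_catalogs(TUPLESET))
```

The inner break mirrors the existential SOME ... SATISFIES quantifier: a tuple is returned once, as soon as any of its operations matches.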
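
Sketch: checking permitted message exchanges
The rules on the “Permitted Message Exchanges” slide can be read as a small grammar over reply sequences. The sketch below checks a reply sequence against those rules. The function name is_permitted and the representation of replies as a list of strings are assumptions made for illustration; this is not a PDP or BEEP implementation.

```python
def is_permitted(msg, replies):
    """Check a list of reply names against the exchanges listed on the slide."""
    if msg in ("MSG_QUERY", "MSG_INVITE", "MSG_CLOSE"):
        # MSG_QUERY | MSG_INVITE | MSG_CLOSE --> RPY_OK | ERR
        return replies in (["RPY_OK"], ["ERR"])
    if msg == "MSG_RECEIVE":
        # MSG_RECEIVE --> RPY_SEND | (ANS_SEND [0:N], NULL) | ERR
        if replies in (["RPY_SEND"], ["ERR"]):
            return True
        # Zero or more ANS_SEND answers, closed by a single NULL.
        return (len(replies) >= 1 and replies[-1] == "NULL"
                and all(r == "ANS_SEND" for r in replies[:-1]))
    return False  # unknown message type


print(is_permitted("MSG_RECEIVE", ["ANS_SEND", "ANS_SEND", "NULL"]))
print(is_permitted("MSG_RECEIVE", ["ANS_SEND"]))
print(is_permitted("MSG_QUERY", ["RPY_OK"]))
```

The one-reply form corresponds to the synchronous pull case on the slide, while the ANS_SEND sequence ended by NULL corresponds to delivery of a result set in several batches.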
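
Sketch: node state transitions
The “Node State Transitions” slide names three per-query states (UNKNOWN, OPEN, CLOSED), six closing conditions and two trigger actions. The sketch below encodes one plausible reading as a transition function. The direction of each transition is an assumption drawn from the figure labels (QUERY opens, the six conditions close, Loop timeout returns to UNKNOWN), and the event names and the function step are hypothetical.

```python
from enum import Enum


class State(Enum):
    UNKNOWN = "UNKNOWN"
    OPEN = "OPEN"
    CLOSED = "CLOSED"


# Hypothetical event names for the six numbered conditions on the slide.
CLOSE_EVENTS = {
    "CLOSE_RECEIVED",          # 1. CLOSE received
    "RESULT_SET_EXHAUSTED",    # 2. SEND exhausts result set
    "INVITE_NOT_ACCEPTED",     # 3. INVITE not accepted
    "EMPTY_LOCAL_RESULT_SET",  # 4. Direct Response, empty local result set
    "ERROR",                   # 5. Various errors
    "ABORT_TIMEOUT",           # 6. Abort timeout
}


def step(state, event):
    """Return (next_state, trigger_action); the action may be None."""
    # Assumed directions: QUERY opens, close events close, loop timeout forgets.
    if state is State.UNKNOWN and event == "QUERY":
        return State.OPEN, "forward QUERY to neighbors"
    if state is State.OPEN and event in CLOSE_EVENTS:
        return State.CLOSED, "forward CLOSE to dependents"
    if state is State.CLOSED and event == "LOOP_TIMEOUT":
        return State.UNKNOWN, None
    return state, None  # events that do not apply leave the state unchanged


state = State.UNKNOWN
for event in ["QUERY", "ABORT_TIMEOUT", "LOOP_TIMEOUT"]:
    state, action = step(state, event)
    print(event, "->", state.value, "|", action)
```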