Active XML: A data-centric perspective on Web services

Omar Benjelloun, INRIA Futurs
With: Serge Abiteboul, Tova Milo, and many others.
Omar Benjelloun – Active XML, April 30th, 2004

Active XML - Outline
• Introduction
• Active XML
  – Active XML documents
  – Active XML services
• Novel issues
  – Exchanging Active XML data
  – Querying Active XML data
• Active XML Peers
  – The peer as a client
  – The peer as a server
  – Theoretical foundations
• Applications
• Conclusion

Introduction

Distributed data management in P2P
Information is everywhere.
[Figure: XML data and Web services spread across the Internet: data warehouses, databases, Web sites, PCs, PDAs, cell phones, home appliances, cars…]

The golden triangle of distributed data management
• XML: a standard for data representation & exchange
  – Extensible Markup Language
  – Labeled ordered trees
  – Types: XML Schema / tree automata
• Query languages
  – XPath, XQuery
• Web services: standards for distributed computing
  – SOAP, WSDL, UDDI
  – Activation of methods on remote servers
  – Many burgeoning standard proposals (choreography, QoS, user interface, etc.)
[Figure: a triangle with XML, Web services (SOAP, WSDL), and query languages (XPath, XQuery) at its corners.]

What is Active XML (AXML)?
AXML is a declarative language for distributed information management, together with an infrastructure to support this language in a peer-to-peer framework.

Active XML

Active XML documents
XML documents with embedded calls to Web services.
• Intensional
  – Some of the data is given explicitly.
  – Some is given intensionally (i.e., the means to acquire the data when needed are given).
• Dynamic
  – If the external sources change, the same document will provide different information.
  – Reaction to world changes.

Not a new idea in databases, nor on the Web
• Mixing calls with data is an old idea:
  – procedural attributes in relational systems,
  – the basis of object-oriented databases.
• In Web programming:
  – Sun's JSP, PHP+MySQL.
• Calls to Web services inside documents:
  – Macromedia FLEX, Apache Jelly, Microsoft XAML.
What is new is the exploitation of the idea…

Web services in brief
• A number of standards:
  – XML
  – SOAP: exchange of messages between applications
  – WSDL: description of service interfaces (e.g., input/output types)
  – UDDI: advertisement and discovery of services
  – … other proposed standards (choreography, security, etc.)
• For us: a means to provide, invoke, and describe remote functions with XML input/output.
• They make AXML documents universally understandable.

A sample AXML document
    <?xml version="1.0" ?>
    <newspaper>
      <title>Le Monde</title>
      <date>06/10/2003</date>
      <call svc="Yahoo.GetTemp">
        <city>Paris</city>
      </call>
      <call svc="TimeOut.GetEvents">exhibits</call>
    </newspaper>
[Figure: the same document as a tree: newspaper with children title ("Le Monde"), date ("06/10/2003"), GetTemp (parameter city "Paris"), and GetEvents (parameter "Exhibits").]
AXML documents may contain calls:
• to any existing Web services (e-bay.net, google.com…),
• to any AXML Web services (to be defined).

Materialization
    <?xml version="1.0" ?>
    <newspaper>
      <title>Le Monde</title>
      <date>06/10/2003</date>
      <temp>16°C</temp>
      <call svc="Yahoo.GetTemp">
        <city>Paris</city>
      </call>
      <call svc="TimeOut.GetEvents">exhibits</call>
    </newspaper>
[Figure: the tree after a SOAP call to Yahoo!: a temp node ("16°C") now sits next to the GetTemp call.]
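This is a minimal Python sketch of one materialization pass over the sample document, using the standard xml.etree.ElementTree module. The SERVICES registry and the get_temp stub are hypothetical stand-ins for real SOAP invocations. As in the slide, the result is inserted just before the call, and the call itself stays in the document. Calls without a registered service (here TimeOut.GetEvents) stay intensional. Results are not scanned again, so this is only one step of what is, in general, a recursive process.

```python
import xml.etree.ElementTree as ET

# The sample AXML document (straight quotes, so that it parses).
DOC = """<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>"""


def get_temp(call):
    """Hypothetical stub for Yahoo.GetTemp. A real peer would send the
    call's children (here <city>Paris</city>) in a SOAP request."""
    return [ET.fromstring("<temp>16°C</temp>")]


# Hypothetical registry mapping svc names to callables.
SERVICES = {"Yahoo.GetTemp": get_temp}


def materialize(root, services):
    """One pass: for every call with a known service, insert the result
    just before the call element and keep the call itself."""
    for parent in list(root.iter()):  # snapshot: new results are not revisited
        # Walk children right to left so that insertions do not shift the
        # indices of the calls still to be processed.
        for idx, child in reversed(list(enumerate(parent))):
            svc = child.get("svc")
            if child.tag == "call" and svc in services:
                for offset, elem in enumerate(services[svc](child)):
                    parent.insert(idx + offset, elem)
    return root


root = materialize(ET.fromstring(DOC), SERVICES)
print(ET.tostring(root, encoding="unicode"))
```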
We will see later that:
• replacing the call by its result is not the only option;
• calls are not necessarily RPC-style synchronous invocations.

AXML Web services
• Parameters: AXML data
• Result: AXML data
Great flexibility:
• Distributed computations: by sending as parameters data containing service calls, one can delegate some work to other peers.
• Partial computations: by returning data containing service calls, one can give the receiver control of these calls.

Calling an AXML service
    <?xml version="1.0" ?>
    <newspaper>
      <title>Le Monde</title>
      <date>06/10/2003</date>
      <temp>16°C</temp>
      <exhibits>
        <call svc="Yahoo.GetExhibits">
          <call svc="TimeOut.GetEvents">
            <city>Paris</city>
            exhibits
          </call>
        </call>
      </exhibits>
    </newspaper>
[Figure: the corresponding tree, with the GetEvents call nested as a parameter of GetExhibits under exhibits, and a SOAP call (still…).]
• Materialization is a recursive process.
• Termination is an issue.

Organization
• Novel issues raised by the AXML language:
  – exchange of AXML data,
  – querying AXML data.
• Supporting infrastructure:
  – AXML peers: management of persistent AXML data, declarative AXML services.
• Applications

Novel issues

Outline: Novel issues, exchanging Active XML data (SIGMOD 2003)

To call or not to call?
[Figure: the newspaper tree, with its GetTemp and GetEvents calls.]
Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it.

Why control the materialization of calls?
• For added functionality, e.g.:
  – intensional data allows one to get up-to-date information.
• For security reasons or capabilities, e.g.:
  – I don't trust this Web service/domain,
  – I don't have the right credentials to invoke it,
  – it costs money,
  – maybe the receiver doesn't know Active XML!
• For performance reasons, e.g.:
  – a proxy can invoke all the services on behalf of a PDA.
… and many more reasons you can think of!

How to control it? Using types
We extend XML Schema with intensional types: XML Schema_int.
[Figure: a sender and a receiver, each described by its capabilities, ACL, cost, …, exchange data whose calls (f, g, q, r, …) are governed by a data exchange schema.]
Static analysis algorithms use the signatures of services: WSDL_int.

The extended schema language
To simplify, we use a DTD-like syntax here.
Data:
    newspaper = title.date.(GetTemp|temp).(GetEvents|exhibit*)
    title = data
    date = data
    temp = data
    city = data
    exhibit = title.(GetDate|date)
Functions:
    GetTemp(city) -> temp
    GetEvents(data) -> (exhibit|performance)*
    GetDate(title) -> date
[Figure: the newspaper tree, with its GetTemp and GetEvents calls.]
Rewriting: replace call(s) by an arbitrary output of the service.

Rewritings
The goal: given
• an intensional document d,
• a schema s,
can we rewrite d so that it matches s?
• Safe rewriting: one that is guaranteed to lead to s (we know it without making any call).
• Possible rewriting: one that may lead to s (depending on the answers of the services).

Difficulties
• Infinite search space:
  – vertical (calls may return data that contains further calls),
  – horizontal (a call may return arbitrarily many siblings, e.g., exhibit*).
• Main problem:
  – the result of a Web service call is unknown;
  – we only know a signature (input/output types).
• We want a very efficient solution.
• Foundations of the problem:
  – string & tree automata,
  – with existential and universal transitions.
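The safe/possible distinction can be made concrete on a single node. The Python sketch below is our own illustration, not the product-automaton algorithm of the following slides. It makes several assumptions:
• the children of newspaper form a word of labels;
• the target type and each service's output type are small hand-written DFAs;
• rewriting is left-to-right and 1-depth, so returned data is never invoked again.

The check is a game. At each call we either keep it or invoke it, and the decision may depend on what earlier calls returned. For a safe rewriting, every possible answer must still lead to the target; for a possible one, some answer suffices. The names rewritable, TARGET, SIGS, and WORD are hypothetical.

```python
from functools import lru_cache

# A DFA is a dict: start state, set of accepting states, and a partial
# transition map {(state, label): state}. None is the rejecting sink.


def step(dfa, q, label):
    return None if q is None else dfa["delta"].get((q, label))


def reach(target, q, out):
    """Target states reachable from q by reading some word of the service's
    output language (search over target-state x output-state pairs)."""
    start = (q, out["start"])
    seen, todo = {start}, [start]
    while todo:
        tq, s = todo.pop()
        for (src, label), dst in out["delta"].items():
            if src == s:
                nxt = (step(target, tq, label), dst)
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return {tq for tq, s in seen if s in out["accept"]}


def rewritable(word, target, signatures, safe=True):
    """Safe: some strategy reaches the target whatever the services return.
    Possible: some choice of calls and some answers reach the target."""
    quantify = all if safe else any

    @lru_cache(maxsize=None)
    def go(q, i):
        if i == len(word):
            return q in target["accept"]
        label = word[i]
        keep = go(step(target, q, label), i + 1)  # leave node or call as is
        if keep or label not in signatures:
            return keep
        # Invoke the call: its answer is any word of its output type.
        return quantify(go(q2, i + 1) for q2 in reach(target, q, signatures[label]))

    return go(target["start"], 0)


# Target: newspaper = title.date.temp.exhibit*  (no calls allowed)
TARGET = {"start": 0, "accept": {3},
          "delta": {(0, "title"): 1, (1, "date"): 2, (2, "temp"): 3, (3, "exhibit"): 3}}
# Signatures: GetTemp -> temp, GetEvents -> (exhibit|performance)*
SIGS = {
    "GetTemp": {"start": 0, "accept": {1}, "delta": {(0, "temp"): 1}},
    "GetEvents": {"start": 0, "accept": {0},
                  "delta": {(0, "exhibit"): 0, (0, "performance"): 0}},
}
WORD = ("title", "date", "GetTemp", "GetEvents")
print(rewritable(WORD, TARGET, SIGS, safe=True))   # False
print(rewritable(WORD, TARGET, SIGS, safe=False))  # True
```

Invoking GetEvents is not safe here because its signature allows performance elements, which title.date.temp.exhibit* rejects. Adding a performance transition to the target's last state makes the same rewriting safe.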
Results
• The general problem is undecidable [MSS03].
• Restrictions on the considered rewritings:
  – left-to-right: no "going back and forth";
  – k-depth: a bound on the nesting of function calls (the search space is still infinite, but finitely representable).
• Under these restrictions:
  – we have algorithms to find safe/possible rewritings;
  – they are PTIME (for deterministic schemas);
  – we can also do it between schemas.
• Implementation: demo at VLDB 2003 (customizable news syndication).

Safe rewriting algorithm
Sketch:
• Deal with function parameters first.
• Top-down traversal of the tree.
• For each data node:
  – rewrite its children (viewed as a word)
  – to match the target type (a regular expression),
  – using regular automata techniques and smart marking.

Safe rewriting algorithm (2)
Build an FSA A_w^k that accepts all k-depth rewritings of the initial word.
[Figure: automaton with states q0…q7 over the labels title, date, GetTemp, temp, GetEvents, exhibit.]
Build an FSA Ā that recognizes the complement of the target type.
[Figure: automaton with states p0…p6 over the labels title, date, temp, GetEvents, exhibit, performance.]

Safe rewriting algorithm (3)
Compute the intersection of these languages (the product automaton A_w^k × Ā).
[Figure: the product automaton, with state pairs such as (q0,p0), (q1,p1), (q2,p2), (q3,p3), …, (q7,p6).]
A smart marking determines whether a safe rewriting exists. Then run the word on the marked automaton to find an actual rewriting.
Optimization: lazy construction of the automata.

Outline: Novel issues, querying Active XML data (SIGMOD 2004)

Querying AXML data
Given a (tree pattern) query:
    /newspaper[temp > 18°C]/exhibits//exhibit[location="Le Louvre"]
[Figure: the newspaper tree, still containing the calls GetTemp (city "Paris"), GetEvents ("Exhibits"), GetExhibits (City "Paris"), and getDate.]
Materialize the document? Or call only the services that may contribute data to the query answer?
The problem: lazy evaluation of service calls. To call or not to call, this time when evaluating a query.

Lazy evaluation
Difficulties:
• Calls can be found anywhere in the document.
• They may appear dynamically (as a result of previous calls).
• They may become (ir)relevant due to previous invocations.
• The signatures of calls must be taken into consideration.
A possible approach: modify the query processor.
• Top-down evaluation.
• Trigger the calls found on the way.
• Not so great:
  – computation is blocked;
  – optimization opportunities are lost.

Our solution
Given a query to evaluate:
[Figure: the query as a tree pattern: newspaper with children temp (> 18°C) and exhibits, and exhibits//exhibit with location = "Le Louvre".]
derive a set of "node-focused" queries (NFQs) that find the relevant calls when evaluated on the document.
[Figure: one of the derived NFQs, with wildcard (*) nodes, etc.]
NFQs need to be re-evaluated as the document evolves!

Optimizations
• Service calls sequencing:
  – analysis of the relationship between calls (through the NFQs),
  – layering, and parallelization inside each layer.
• Refinement via type analysis:
  – matching the output types of services with the data expected by queries.
• "Pushing" queries to capable services.
• Acceleration:
  – via relaxation: NFQ approximation, yielding a superset of the relevant calls;
  – via a special access structure, similar to a DataGuide, restricted to paths that lead to service calls, which indexes the calls.
• Experimental assessment:
  – 10x speed-up when combining optimizations.

Active XML peers

Distributed data management in P2P
[Figure: the earlier picture of XML data and Web services on the Internet, now with AXML documents and AXML services on the peers.]

What do we need from an AXML system?
• Persistent, manageable, dynamic AXML data.
• Easy ways to define services.
• Control of the exchanged data (parameters & results of service calls).
A peer-to-peer architecture, where each AXML peer is:
• a repository: manages persistent AXML data;
• a client: uses (AXML) Web services;
• a server: provides AXML services.

Global architecture
[Figure: AXML peers S1, S2, and S3 communicate via SOAP. Inside a peer, a query engine and an AXML engine read and update the AXML store, using service descriptions; a SOAP wrapper connects the peer to plain SOAP services and SOAP clients exchanging XML.]

Implementation
• Sun's Java SDK 1.4 (includes an XML parser, an XPath processor, and an XSLT engine)
• Apache Tomcat 4.1 servlet engine
• Apache Axis 1.1 SOAP toolkit
• X-OQL query processor, persistent DOM repository
• JSP-based Web user interface, using the JSTL 1.0 standard tag library
• Also, a lightweight implementation for PDA/phone (J2ME, CLDC profile), used for [ABB03demo].
Outline: Active XML peers, the peer as a client

Managing persistent AXML data
"Our newspaper should have its temperature information refreshed daily. New exhibits should be fetched every week and archived for 6 months."
• Service call results enrich the document (calls can be kept for possible future reuse).
Main issues:
• When to activate a service call?
• What to do with its result?

When to activate a service call?
• Explicit pull mode
  – Daily, weekly, or after some event, e.g., when another call occurs.
  – This aspect of the problem is related to active databases.
• Implicit pull mode
  – Detect which intensional information (the service calls) may contribute to the answer of a query (lazy evaluation).
  – This aspect of the problem is related to deductive databases.
• Push mode
  – Based on a query subscription: the service provider pushes information to the client (e.g., for synchronization purposes).
  – This is related to stream and subscription queries.

Managing service call results
How long does the returned data remain valid?
• Just long enough to answer a query: mediation.
• 1 day, 1 week, … or unbounded: caching / warehousing.
• Various portions of the document may follow different policies: hybrid.
For repeated service call invocations, a merge policy:
• append,
• replace,
• fusion (using XML Schema-like keys),
• specific merge policies can be provided as Web services.

Example: AXML document with control attributes
    <?xml version="1.0" ?>
    <newspaper>
      <title>Le Monde</title>
      <date>06/10/2003</date>
      <call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">
        <city>Paris</city>
      </call>
      <call svc="TimeOut.GetEvents" mode="every Monday morning" valid="6 months" merge="append">exhibits</call>
    </newspaper>

Outline: Active XML peers, the peer as a server

Declarative AXML services
Services can be defined by queries or updates over the AXML documents of the repository (XQuery, XPath, XUpdate):
    let service GetExhibitsByLocation($loc) be
      for $a in document("newspaper.xml")/newspaper/exhibits,
          $b in $a//exhibit
      where $b/@name = $loc
      return <exhibits> {$b} </exhibits>
Which (lazy) service calls may contribute to the answer?
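This rough Python sketch addresses both sides of the question above, using xml.etree.ElementTree rather than an XQuery engine. get_exhibits_by_location answers the query over the data materialized so far. relevant_calls returns the calls whose results could still feed /newspaper/exhibits//exhibit.

The relevance test is our own simplification, in the spirit of the relaxation listed under Optimizations (a superset of the relevant calls), and it is not the NFQ algorithm itself. Specifically, it:
• ignores the @name predicate;
• looks only at the root labels a service may return;
• does not descend into call parameters.

The document content and the outputs mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository content: two materialized exhibits, a pending call
# inside <exhibits>, and an unrelated call directly under <newspaper>.
DOC = """<newspaper>
  <title>Le Monde</title>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <exhibits>
    <exhibit name="Le Louvre"><title>Exhibit 1</title></exhibit>
    <exhibit name="Orsay"><title>Exhibit 2</title></exhibit>
    <call svc="TimeOut.GetEvents">exhibits</call>
  </exhibits>
</newspaper>"""

# The service's path /newspaper/exhibits//exhibit as (axis, label) steps.
QUERY = [("child", "newspaper"), ("child", "exhibits"), ("desc", "exhibit")]


def get_exhibits_by_location(root, loc):
    """Mirror of the XQuery, over the data materialized so far."""
    result = ET.Element("exhibits")
    if root.tag == "newspaper":
        for a in root.findall("exhibits"):
            for b in a.findall(".//exhibit"):
                if b.get("name") == loc:
                    result.append(b)
    return result


def live_states(labels, query):
    """NFA run of the path query over a root-to-node label path;
    state k means the first k steps are matched."""
    states = {0}
    for label in labels:
        nxt = set()
        for k in states:
            if k == len(query):
                continue  # fully matched: this branch cannot advance further
            axis, name = query[k]
            if axis == "desc":
                nxt.add(k)  # '//' may skip this node
            if name == label:
                nxt.add(k + 1)
        states = nxt
    return states


def relevant_calls(root, query, outputs=None):
    """Keep a call if a result inserted under its parent could still extend
    a match. `outputs` optionally maps svc -> root labels it may return."""
    found = []

    def walk(elem, path):
        for child in elem:
            if child.tag != "call":
                walk(child, path + [child.tag])
                continue
            svc = child.get("svc")
            if outputs is None or svc not in outputs:
                # No signature: any label may come back.
                keep = any(k < len(query) for k in live_states(path, query))
            else:
                keep = any(live_states(path + [lab], query) for lab in outputs[svc])
            if keep:
                found.append(svc)

    walk(root, [root.tag])
    return found


root = ET.fromstring(DOC)
print([e.findtext("title") for e in get_exhibits_by_location(root, "Le Louvre")])
# ['Exhibit 1']
print(relevant_calls(root, QUERY))
# ['Yahoo.GetTemp', 'TimeOut.GetEvents']
print(relevant_calls(root, QUERY, {"Yahoo.GetTemp": {"temp"},
                                   "TimeOut.GetEvents": {"exhibit", "performance"}}))
# ['TimeOut.GetEvents']
```

Without signatures, the GetTemp call directly under newspaper is kept, since it could in principle return an exhibits element. With the output types from the schema slide (GetTemp -> temp, GetEvents -> (exhibit|performance)*), only GetEvents remains.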
Other means to define services
• Other programming languages:
  – XSLT transformations (through Apache Xalan),
  – Java classes (through Axis).
• Composition of existing services:
  – BPEL4WS (through IBM's BPEL4J implementation).

Outline: Active XML peers, theoretical foundations (PODS 2004)

Theoretical foundations: Positive AXML
Restricted framework:
• Data model
  – set-based (unordered) AXML trees;
  – call results are accumulated in documents.
• Services
  – monotone;
  – positive: defined by a conjunctive fragment of XQuery.
Results:
• Well-defined (possibly infinite) fixpoint semantics.
• Termination, lazy evaluation…
Connections to:
• regular (infinite) trees, Query-Sub-Query [AM04], …

Applications

Demos
• Peer-to-peer auctions (VLDB 2002 demo)
  – Discovery of new peers/auctions through intensional answers
• RSS news syndication (VLDB 2003 demo 1)
  – Customization of services through schemas + news subscriptions
• Distributed workspaces (VLDB 2003 demo 2)
• Web warehousing (ECDL 2003 demo)
A powerful framework for the fast development of distributed, data-centric applications.
Other applications
• E.dot, a dynamic warehouse on food risk management
  – Uses AXML as the platform for the warehouse definition, construction, and maintenance.
• Network configuration
  – Uses AXML exchange of information to configure hardware/software components.
• Software distribution
  – Uses AXML to customize distributions and keep your view of the software fresh.
• Decentralized user profile/patient data management
  – Uses AXML to coordinate the integration of data and privacy enforcement services in a uniform way.

Conclusion

AXML documents and services
A simple paradigm… …that allows for new, powerful features:
• intensional parameters and results: AXML documents can be exchanged;
• support for continuous services (streams of answers);
• control over the exchange of AXML data.
Issues: control of call activation via typing, lazy evaluation, replication and distribution, security, mobility, termination, implementation, foundations, …

Current/Future work
• Security and privacy (with Bell Labs)
• Editor/browser plug-in for AXML
• Mass storage XML DB (with Xyleme Corp.)
• P2P infrastructure
• …

To know more…
http://purl.org/net/axml
• The implementation is becoming open-source:
  – already available for research,
  – will be released publicly very soon.
Selected publications:
• S. Abiteboul, O. Benjelloun, T. Milo: Positive Active XML, PODS, 2004.
• S. Abiteboul, O. Benjelloun, B. Cautis, I. Manolescu, T. Milo, N. Preda: Lazy Query Evaluation for Active XML, SIGMOD, 2004.
• T. Milo, S. Abiteboul, B. Amann, O. Benjelloun, F. Dang Ngoc: Exchanging Intensional XML Data, SIGMOD, 2003 (full version to appear in TODS).
• S. Abiteboul, O. Benjelloun, I. Manolescu, T. Milo, R. Weber: Active XML: A Data-Centric Perspective on Web Services (book chapter), in Web Dynamics, Springer, 2004.
• S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, T. Milo: Dynamic XML Documents with Distribution and Replication, SIGMOD, 2003.

Merci (Thank you)

Extra slides

Asynchronous/Continuous services
• The client subscribes and is then notified.
• The server decides when to send data,
  – e.g., promotional offers.
• Change control:
  – management of replication [ABCMM03];
  – what to do when a change is detected:
      send the new state of the data,
      or send the delta between the old and new states
      (the dual of merge policies).

Peer-to-peer auctions (VLDB 2002 demo)
• Each peer proposes auctions:
  – document myauctions.xml with the peer's items and their current bids;
  – services offered: getLocalAuctions(), status(auctionId).
• Each peer bids on auctions:
  – document mybids.xml with the peer's bids;
  – services offered: bid(peer, auctionId, amount), bidUpTo(peer, auctionId, increment, limit).
• Each peer knows about other peers' auctions:
  – document allauctions.xml contains calls to other peers that transitively retrieve their known auctions;
  – service offered: getAllAuctions().
• When an auction closes, the winner is notified.
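The slide does not say how bidUpTo behaves. A natural reading is proxy bidding: the service keeps outbidding others by increment until limit is reached. The Python sketch below assumes that reading and models a single auction in memory. The Auction class and its fields are hypothetical; in the demo, these operations are Web services backed by myauctions.xml and mybids.xml.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Auction:
    """Hypothetical in-memory model of one auction from myauctions.xml."""
    price: float = 0
    leader: Optional[str] = None
    is_open: bool = True
    # peer -> (increment, limit), registered through bidUpTo (assumed semantics)
    proxies: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def bid(self, peer: str, amount: float) -> bool:
        """Like bid(peer, auctionId, amount): accepted only if it beats the price."""
        if not self.is_open or amount <= self.price:
            return False
        self.price, self.leader = amount, peer
        self._run_proxies()
        return True

    def bid_up_to(self, peer: str, increment: float, limit: float) -> None:
        """Assumed bidUpTo: outbid others by `increment`, never above `limit`."""
        if increment <= 0:
            raise ValueError("increment must be positive")
        self.proxies[peer] = (increment, limit)
        self._run_proxies()

    def _run_proxies(self) -> None:
        # Every change raises the price by a positive increment and limits are
        # finite, so this loop terminates.
        changed = True
        while changed and self.is_open:
            changed = False
            for peer, (inc, limit) in self.proxies.items():
                if peer != self.leader and self.price + inc <= limit:
                    self.price, self.leader = self.price + inc, peer
                    changed = True

    def close(self) -> Optional[str]:
        """Close the auction and return the winner, i.e. the peer to notify."""
        self.is_open = False
        return self.leader
```

For example, suppose bob bids 10 and alice then calls bid_up_to("alice", 5, 30). Alice leads at 15. A later bid of 20 by bob is answered at 25. A bid of 28 by bob then wins, because answering it would take alice to 33, above her limit.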
News syndication (VLDB 2003 demo)
• News sources:
  – GetStory(id)
  – GetNewsAbout(kwd)
• Aggregators:
  – GetNewsAbout(kwd)
  – …but several versions, more or less intensional
• Clients:
  – PCs, laptops, PDAs

Service customization using schemas
• Customizing the output of services
  – News sources/aggregators provide different versions of GetNewsAbout with different output schemas.
  – The output is automatically transformed into the desired schema.
  – Clients can also specify a desired output schema as a parameter.
• Customizing the input of services
  – Location-aware continuous services for mobile users.
  – The context of the user is given by intensional parameters.
• Distributed logging mechanism
  – Also customizable through the use of schemas.

Call parameters
XML:
    <temp>
      <call svc="GetTemp@weather.com"><city>Denver</city></call>
    </temp>
XPath:
    <temp>
      <call svc="GetTemp@weather.com">../../city</call>
    </temp>
AXML:
    <temp>
      <call svc="GetTemp@weather.com">
        <city>
          <call svc="GetCapital@us.gov">colorado</call>
        </city>
      </call>
    </temp>
To call or not to call (before invoking)?
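The three forms above differ in when and where the parameter value is obtained. This minimal Python sketch (xml.etree.ElementTree) implements one policy, assuming parameters are resolved eagerly before invoking:
• the XPath form is followed from the call element, and only '..' and child steps are supported here;
• a call nested in a parameter is invoked first.

The enclosing <weather> root and both service stubs are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three <temp> fragments from the slide, wrapped in a hypothetical
# <weather> root so that ../../city has something to point to.
DOC = """<weather>
  <city>Denver</city>
  <temp><call svc="GetTemp@weather.com"><city>Denver</city></call></temp>
  <temp><call svc="GetTemp@weather.com">../../city</call></temp>
  <temp>
    <call svc="GetTemp@weather.com">
      <city><call svc="GetCapital@us.gov">colorado</call></city>
    </call>
  </temp>
</weather>"""

# Hypothetical stubs standing in for the remote services.
SERVICES = {
    "GetCapital@us.gov": lambda state: {"colorado": "Denver"}[state],
    "GetTemp@weather.com": lambda city: f"temperature of {city}",
}


def parent_map(root):
    """ElementTree elements have no parent pointers, so build them once."""
    return {child: parent for parent in root.iter() for child in parent}


def evaluate(call, parents, services):
    """Resolve the parameters of a <call>, then invoke it. Calls nested in
    parameters are evaluated first (one answer to 'to call or not to call')."""
    text = (call.text or "").strip()
    if len(call) == 0 and text.startswith(".."):
        # XPath form, restricted here to '..' and child steps.
        node = call
        for step in text.split("/"):
            node = parents[node] if step == ".." else node.find(step)
        args = [node.text]
    elif len(call) == 0:
        args = [text]  # plain string parameter, e.g. "colorado"
    else:
        args = []
        for param in call:
            inner = param.find("call")
            # AXML form: the parameter contains a call, so invoke it first.
            args.append(evaluate(inner, parents, services)
                        if inner is not None else param.text)
    return services[call.get("svc")](*args)


root = ET.fromstring(DOC)
parents = parent_map(root)
for call in root.findall("temp/call"):
    print(evaluate(call, parents, SERVICES))
```

A lazier policy would leave the nested GetCapital call in place and ship it to weather.com as an intensional parameter, which is exactly the choice the slide's question raises.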