Caching XML Web Services to Support Disconnected Operation
Venugopalan Ramasubramanian, Cornell University
Doug Terry, Microsoft Research, Silicon Valley

Web Services
• a method of providing and accessing services on the Internet
  – consumer services
    • Hotmail, Orbitz, MapQuest, eBay, …
  – B-to-B services
    • supply chain management
• request-response paradigm
  – RPCs on the Internet

XML Web Services
• W3C (World Wide Web Consortium) standards
  – Microsoft, IBM, HP, …
  – Microsoft .NET web services (HailStorm)
    • myContacts, myProfile, myFavoriteWebSites
  – TerraServer, CoolRooster
• SOAP (Simple Object Access Protocol)
  – standard representation of web service requests/responses (SOAP-RPC)
• WSDL (Web Services Description Language)
  – description of web services

Availability of Web Services
GOAL: make web services available despite frequent disconnections and limited bandwidth!
• web service clients reside on all kinds of devices
  – desktop, laptop, PDA, smart phone
• network outages (especially wireless)
• bandwidth restrictions

Governing Principles
• cannot modify web services
• cannot modify access protocols
• can perhaps modify the client
  – but must also work with existing, unmodified clients
• can interpose storage and computation
Client-side caching is a solution to improve availability!

XML Standards: SOAP
• SOAP-RPC standard
  – encoding definitions for data types
  – success and failure definitions
• SOAP-Envelope
  – outermost element
• SOAP-Body
  – mandatory
  – request: operation name and parameters
  – response: status (return value or failure)
• SOAP-Header
  – optional; may contain multiple header blocks
  – supplementary information, e.g. a Kerberos ticket
• HTTP binding
  – carried in HTTP request and response messages

Example: SOAP request
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:m="http://schemas.microsoft.com/hs/2001/10/myContacts"
    xmlns:c="http://schemas.microsoft.com/hs/2001/10/core"
    xmlns:mp="http://schemas.microsoft.com/hs/2001/10/myProfile">
  <s:Header>
    <licenses xmlns="http://schemas.xmlsoap.org/soap/security/2000-12">
      <c:identity>
        <c:kerberos>3240</c:kerberos>
      </c:identity>
    </licenses>
    <path xmlns="http://schemas.xmlsoap.org/rp/">
      <action>http://schemas.microsoft.com/hs/2001/10/core#request</action>
      <to>http://terry.microsoft.com</to>
      <fwd><via /></fwd>
      <rev><via /></rev>
      <id>b55528a4-5d63-49f1-87a2-5fab8d76f658</id>
    </path>
    <c:request service="myContacts" document="content" method="insert" genResponse="always">
      <key puid="3240" instance="1" cluster="1" />
    </c:request>
  </s:Header>
  <s:Body>
    <c:insertRequest select="/m:myContacts/m:contact[mp:name/mp:givenName = 'Terry']/mp:emailAddress">
      <mp:email>terry@microsoft.com</mp:email>
    </c:insertRequest>
  </s:Body>
</s:Envelope>

XML Standards: WSDL
• concrete definition of the web service
  – data structures
  – interface offered by the web service
    • operation names and parameters
  – message formats (components of a message)
  – protocol binding (SOAP)
• automatic generation of client-side stubs
  – e.g. Visual Studio .NET

Experiments with a Web Cache
• experimented with existing clients and services (Microsoft .NET web services)
• checked feasibility by building a cache that stores HTTP requests/responses
[Figure: diagram with MyContacts, MyServices, the cache and MyProfile]

Issues in Caching
• web services are active
  – the default HTTP cache directive is no-cache!
• web services are diverse
  – unlike files and databases, web services have custom interfaces
• fundamental questions
  – which requests are cacheable?
  – which operations have permanent side effects?
  – how to understand requests/responses?
• services use different formats for requests/responses (as in the example SOAP request above)

Issues in Caching (contd.)
request 1: query request
  <queryRequest select="myContacts/contact[name='terry']" />
request 2: delete request
  <deleteRequest select="myContacts/contact[name='terry']/phone[@cat='cell']" />
• consistency
  – later requests might invalidate responses cached earlier
• read/write and write/write conflicts
  – how to specify consistency requirements for generic web services?

More Issues…
• user experience
  – user is unaware of the web service cache
  – operations reported as successful could later fail!
• hoarding
  – keeping the cache hot
  – user-controlled hoard requests
• security
  – enforce access control

Our Approach
• annotate the WSDL description of web services to define cache properties
  – published by service providers or third parties
  – no changes to server-side code required
• transparent cache for web services
  – acts as a web proxy on the client machine
  – no modification of the client program necessary
• custom cache managers for each web service
  – generated automatically from the annotated WSDL description

Architecture
[Figure: web clients 1 and 2 talk to a proxy server holding the cache, one custom cache manager per web service (CCM1, CCM2, CCM3) and a write-back queue (WBQ); the proxy reaches web services 1, 2 and 3 over the Internet]

WSDL Annotations: for each Operation
• cacheable: replies to the operation can be cached
• lifetime: the duration for which replies are cached
• playback: the operation has side effects and must be played back when the connection is restored
• defaultResponse: a default response is sent when no connection is available

WSDL Annotations: for each Service
• identify the operation (operationName)
  – XPath (XML query language) expression to extract the name of the operation
• extract the request message (identifier)
  – some portions of the request message (e.g. a date) should be ignored while caching
  – XPath expression to extract the relevant parts of the message for identification

Snippet from the annotated myContacts.wsdl
<binding name="myContactsBinding" type="tns:myContactsPort"
    operationName="substring-before(local-name(/senv:Envelope/senv:Body/*[1]), 'Request')"
    Identifier="/senv:Envelope/senv:Header/s0:licenses | /senv:Envelope/senv:Header/s1:request | /senv:Envelope/senv:Body">
  <s:binding transport="http://schemas.xmlsoap.org/soap/http" style="document" />
  <operation name="insert" cacheable="false" playback="true" defaultResponse="true" cacheHeader="true">
    <s:operation soapAction="http://schemas.microsoft.com/hs/2001/10/core#request" />

Annotations for Consistency
• when does request 2 invalidate the
response of an earlier request 1 in the cache?
  – e.g. an insert could invalidate an earlier query response
• consider requests to be functions with signatures
  req1: $op_1(param_{1,1}, param_{1,2}, \ldots, param_{1,n})$
  req2: $op_2(param_{2,1}, param_{2,2}, \ldots, param_{2,m})$
• the invalidate condition is an expression over req1 and req2:
  $f(op_1, op_2, param_{1,1}, \ldots, param_{2,1}, \ldots)$

Annotations for Consistency: XSL Transformations
• Extensible Stylesheet Language (XSL)
  – transforms XML documents into HTML/text/XML
  – Turing-complete language
• cache transform: transforms a cached response
  – input: request1, reply1, request2, reply2
  – output: transformed reply1 (null if invalidated)
• more powerful than just specifying invalidations
  – can actually transform the old response

Cache Transform Example
request 1: query request
  <queryRequest select="myContacts/contact[name='terry']" />
request 2: delete request
  <deleteRequest select="myContacts/contact[name='terry']/phone[@cat='cell']" />
• a smart cache transform would delete the cell phone number from the cached query response
• note: in the XSL excerpt below the numbering is reversed: $req1 is the newly issued write (insert, delete or replace), while $req2/$rep2 are the cached query and its response, which is returned unchanged when the two operations do not interact
<xsl:template match="/">
  <xsl:variable name="service1" select="$req1/s:Header/c:request/@service"/>
  <xsl:variable name="service2" select="$req2/s:Header/c:request/@service"/>
  <xsl:variable name="opName1" select="substring-before(local-name($req1/s:Body/*[1]), 'Request')"/>
  <xsl:variable name="opName2" select="substring-before(local-name($req2/s:Body/*[1]), 'Request')"/>
  <xsl:choose>
    <xsl:when test="$service1 = $service2">
      <xsl:choose>
        <xsl:when test="$opName2 = 'query' and ($opName1 = 'insert' or $opName1 = 'delete' or $opName1 = 'replace')">
          <xsl:variable name="cleanQuery1">
            <xsl:call-template name="StripSegment">
              <xsl:with-param name="xpQuery" select="substring-after($req1/s:Body/c:*/@select, '/')"/>
            </xsl:call-template>
          </xsl:variable>
          <xsl:variable name="cleanQuery2">
            <xsl:call-template name="StripSegment">
              <xsl:with-param name="xpQuery" select="substring-after($req2/s:Body/c:queryRequest/c:xpQuery/@select, '/')"/>
            </xsl:call-template>
          </xsl:variable>
          <xsl:call-template name="CheckIntersection">
            <xsl:with-param name="xpQuery1" select="$cleanQuery1"/>
            <xsl:with-param name="xpQuery2" select="$cleanQuery2"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="$rep2"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy-of select="$rep2"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Picking Level of Consistency
• user freedom in choosing consistency guarantees
  – multiple consistency transforms
• strong consistency
  – less availability
  – better user experience
• weak consistency
  – user experience could deteriorate
    • operations reported as successful could later fail!
    • optional cache header
  – better availability

More Transforms
• response transform
  – a response from the cache may have to be changed before it is returned to the client
  – e.g. adding timestamps, unique identifiers, etc.
• default response transform
  – generates a default response for a request
  – when disconnected, the default response is returned and the request is queued for playback

Optional Cache Header
• the cache provides information to the client using a cache header
  – whether the response came from the cache or the server
  – age of the cached response
  – whether the request will be played back in the future
• no changes to the WSDL definition
  – does not affect existing clients in any way
• cache-aware clients can provide additional information to the user

Example: default response and cache header
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:hs="http://schemas.microsoft.com/hs/2001/10/core">
  <s:Header>
    <path xmlns="http://schemas.xmlsoap.org/rp/">
      <action>http://schemas.microsoft.com/hs/2001/10/core#response</action>
      <rev><via /></rev>
      <from>http://terry.microsoft.com</from>
      <relatesTo>d978b559-aceb-4e9e-9747-b8a306234bc8</relatesTo>
    </path>
    <response xmlns="http://schemas.microsoft.com/hs/2001/10/core" />
    <cacheHeader defaultResponse="true" toPlayback="true" xmlns="http://localhost/wsdlannotation" />
  </s:Header>
  <s:Body>
    <hs:insertResponse status="success" selectedNodeCount="1" newChangeNumber="0" />
  </s:Body>
</s:Envelope>

Conclusion
• built a prototype web services cache
• experimented with HailStorm web services and clients
• annotated the HailStorm WSDL files
• the prototype demonstrates custom cache managers in action for HailStorm
• couldn’t give a demo

Work for the Future
• WSDL annotations for more web services
  – interesting web services with WSDL descriptions are still hard to find!
• hoarding to enhance availability
  – specify user-controlled hoard queries
  – hoard transform to obtain responses from cached hoard requests
• incorporate security constraints
• tune cache performance
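To make the per-operation annotations (cacheable, lifetime, playback, defaultResponse), the identifier rule and the write-back queue from the architecture concrete, here is a minimal Python sketch of what one custom cache manager might do. It is an illustration under stated assumptions, not the prototype's code: `OpAnnotation`, `CustomCacheManager`, `cache_key`, `operation_name` and `default_response` are hypothetical names, the network is modelled as a `send` callable that raises `ConnectionError` when disconnected, and cache transforms (invalidation) are left out.

```python
import hashlib
import time
import xml.etree.ElementTree as ET
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
CORE = "http://schemas.microsoft.com/hs/2001/10/core"
SECURITY = "http://schemas.xmlsoap.org/soap/security/2000-12"
ANNOTATION_NS = "http://localhost/wsdlannotation"


@dataclass
class OpAnnotation:
    """Hypothetical in-memory form of the per-operation WSDL annotations."""
    cacheable: bool = False
    lifetime: float = 0.0           # seconds a cached reply stays fresh
    playback: bool = False          # side effects: queue for replay if offline
    default_response: bool = False  # answer locally when disconnected


def operation_name(envelope: ET.Element) -> str:
    """Mirror substring-before(local-name(/Envelope/Body/*[1]), 'Request')."""
    first = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    local = first.tag.rsplit("}", 1)[-1]          # drop "{namespace}"
    before, found, _ = local.partition("Request")
    return before if found else ""


def cache_key(envelope: ET.Element) -> str:
    """Mirror the identifier annotation: only the licenses and core request
    header blocks plus the Body identify a request; everything else (e.g. the
    routing <path> with its per-message <id>) is ignored."""
    parts = []
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    if header is not None:
        parts += header.findall(f"{{{SECURITY}}}licenses")
        parts += header.findall(f"{{{CORE}}}request")
    parts += envelope.findall(f"{{{SOAP_ENV}}}Body")
    return hashlib.sha256(b"".join(ET.tostring(p) for p in parts)).hexdigest()


def default_response(op: str, queued: bool) -> bytes:
    """Hypothetical default-response transform: a success reply carrying the
    optional cacheHeader so cache-aware clients know it came from the cache."""
    return (
        f'<s:Envelope xmlns:s="{SOAP_ENV}" xmlns:hs="{CORE}"><s:Header>'
        f'<cacheHeader defaultResponse="true" toPlayback="{str(queued).lower()}"'
        f' xmlns="{ANNOTATION_NS}" /></s:Header><s:Body>'
        f'<hs:{op}Response status="success" /></s:Body></s:Envelope>'
    ).encode()


class CustomCacheManager:
    """Hypothetical per-service cache manager with a write-back queue (WBQ)."""

    def __init__(self, annotations: Dict[str, OpAnnotation],
                 send: Callable[[bytes], bytes]):
        self.annotations = annotations  # operation name -> annotations
        self.send = send                # raises ConnectionError when offline
        self.cache = {}                 # cache key -> (expiry time, reply)
        self.write_back_queue = deque()

    def handle(self, request: bytes, now: Optional[float] = None) -> bytes:
        now = time.time() if now is None else now
        envelope = ET.fromstring(request)
        op = operation_name(envelope)
        ann = self.annotations.get(op, OpAnnotation())
        key = cache_key(envelope)
        if ann.cacheable:
            hit = self.cache.get(key)
            if hit is not None and now < hit[0]:
                return hit[1]                      # fresh cached reply
        try:
            reply = self.send(request)
        except ConnectionError:
            if ann.playback:
                self.write_back_queue.append(request)
            if ann.default_response:
                return default_response(op, queued=ann.playback)
            raise                                  # nothing sensible to return
        if ann.cacheable:
            self.cache[key] = (now + ann.lifetime, reply)
        return reply

    def play_back(self) -> None:
        """Replay queued side-effecting requests in FIFO order on reconnect."""
        while self.write_back_queue:
            self.send(self.write_back_queue[0])    # stays queued if this fails
            self.write_back_queue.popleft()
```

In this sketch, fresh replies of cacheable operations are served locally, and two requests that differ only in their routing `<id>` share a cache entry. When disconnected, an operation annotated with playback is queued and, if it also has defaultResponse, answered with a default response whose cache header says it will be played back. A fuller manager would also run the cache transforms against cached replies whenever a write such as the delete in the cache transform example is handled.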