Caching Web Services

Caching XML Web Services to
Support Disconnected Operation
Venugopalan Ramasubramanian
Cornell University
Doug Terry
Microsoft Research, Silicon Valley
Web Services
• method of providing and accessing
services on the Internet
– consumer services
• Hotmail, Orbitz, MapQuest, eBay, …
– business-to-business (B2B) services
• supply chain management
• request-response paradigm
– RPCs on the Internet
XML Web Services
• W3C (World Wide Web Consortium) standards
– Microsoft, IBM, HP, …
– Microsoft .NET web services (HailStorm)
• mycontacts, myprofile, myfavoritewebsites
– TerraServer, CoolRooster
• SOAP (Simple Object Access Protocol)
– standard representation of web service
requests/responses (SOAP-RPC)
• WSDL (Web Services Description Language)
– description of web services
Availability of Web Services
GOAL
make web services available despite frequent
disconnections and limited bandwidth!
• web service clients reside on all kinds of
devices
– desktop, laptop, PDA, smart phone
• network outages (especially wireless)
• bandwidth restriction
Governing Principles
• cannot modify web services
• cannot modify access protocols
• can perhaps modify client
– must also comply with existing clients
• can interpose storage and computation
client-side caching is a solution to
improve availability!
XML Standards: SOAP
• SOAP-RPC standard
– encoding definitions for data types
– success, failure definitions
• SOAP-Envelope
– outer-most element
• SOAP-Body
– obligatory
– request operation: name, parameters
– response status: return value, failure
• SOAP-Header
– optional; may contain multiple header blocks
– supplementary information, e.g. a Kerberos ticket
• HTTP binding
– HTTP request and response messages
example: soap request
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:m="http://schemas.microsoft.com/hs/2001/10/myContacts"
xmlns:c="http://schemas.microsoft.com/hs/2001/10/core"
xmlns:mp="http://schemas.microsoft.com/hs/2001/10/myProfile" >
<s:Header>
<licenses xmlns="http://schemas.xmlsoap.org/soap/security/2000-12">
<c:identity> <c:kerberos>3240</c:kerberos> </c:identity>
</licenses>
<path xmlns="http://schemas.xmlsoap.org/rp/">
<action>http://schemas.microsoft.com/hs/2001/10/core#request</action>
<to>http://terry.microsoft.com</to>
<fwd><via /></fwd><rev><via /></rev>
<id>b55528a4-5d63-49f1-87a2-5fab8d76f658</id>
</path>
<c:request service="myContacts" document="content" method="insert" genResponse="always" >
<key puid="3240" instance="1" cluster="1" />
</c:request>
</s:Header>
<s:Body>
<c:insertRequest
select="/m:myContacts/m:contact[mp:name/mp:givenName = 'Terry']/mp:emailAddress" >
<mp:email>terry@microsoft.com</mp:email>
</c:insertRequest>
</s:Body>
</s:Envelope>
XML Standards: WSDL
• concrete definition of the web service
– data structures
– interface offered by the web service
• operation names and parameters
– message formats (components of a message)
– protocol binding (SOAP)
• automatic generation of client-side stubs
– Visual Studio .NET
Experiments with Web Cache
• experiment with existing clients and services
(Microsoft .NET web services)
• check feasibility by building a cache to store
HTTP requests/responses
[Figure: experimental setup with a cache and the MyContacts, MyServices, and MyProfile services]
Issues in Caching
• web services are active
– default HTTP cache directive is no-cache!
• web services are diverse
– unlike files and databases, web services have custom
interfaces
• fundamental questions
– which requests are cacheable?
– which operations have permanent side effects?
– how to understand requests/responses?
• services use different formats for requests/responses
Issues in Caching contd.
request 1: query request
<queryRequest select = "myContacts/contact[name='terry']" />
request 2: delete request
<deleteRequest select =
"myContacts/contact[name='terry']/phone[@cat='cell']" />
• consistency
– later requests might invalidate responses cached
earlier.
• read/write, write/write conflicts
– how to specify consistency requirements for generic
web services?
More Issues…
• user experience
– user unaware of web service cache
– operations reportedly successful could fail!
• hoarding
– keeping the cache hot
– user-controlled hoard requests
• security
– enforce access control
Our Approach
• annotate WSDL description of web services to
define cache properties
– published by service providers or third party
– no changes to server side code required 
• transparent cache for web services
– acts as a web proxy on the client machine
– no modifications of the client program necessary 
• custom cache managers for each web service
– generated automatically from the annotated WSDL
description 
Architecture
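[Figure: Web Client 1 and Web Client 2 send requests to the proxy server, which holds the cache, one custom cache manager per web service (CCM 1, CCM 2, CCM 3), and a write-back queue (WBQ); the proxy reaches Web Service 1, Web Service 2, and Web Service 3 over the Internet]
CCM: Custom Cache Manager
WBQ: Write Back Queue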
WSDL Annotations: for each Operation
• cacheable: the operation can be cached
• lifetime: the duration for which replies are
cached
• play-back: the operation has side effects
and must be played back when the connection
is restored
• default-response: a default response is
returned when no connection is available
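The four per-operation annotations above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the prototype's code: `OperationPolicy`, `CacheManager`, the `send` callback, and the `DEFAULT_RESPONSE` placeholder are all hypothetical names, and the choice to raise an error when nothing can be returned is an assumption.

```python
import time
from collections import deque
from dataclasses import dataclass

DEFAULT_RESPONSE = "default-response"  # placeholder; a real cache would build a SOAP reply


@dataclass
class OperationPolicy:
    """Mirrors the per-operation annotations (field names are illustrative)."""
    cacheable: bool = False
    lifetime: float = 0.0          # seconds a cached reply stays valid
    playback: bool = False         # operation has side effects; replay later
    default_response: bool = False


class CacheManager:
    def __init__(self, policies, send, clock=time.monotonic):
        self.policies = policies   # operation name -> OperationPolicy
        self.send = send           # callable(request); raises ConnectionError when offline
        self.clock = clock
        self.cache = {}            # identifier -> (expiry time, response)
        self.write_back_queue = deque()

    def handle(self, op, key, request):
        policy = self.policies[op]
        if policy.cacheable:
            hit = self.cache.get(key)
            if hit and hit[0] > self.clock():
                return hit[1]      # fresh cached reply, no network needed
        try:
            response = self.send(request)
        except ConnectionError:
            if policy.playback:
                self.write_back_queue.append(request)
            if policy.default_response:
                return DEFAULT_RESPONSE
            raise
        if policy.cacheable:
            self.cache[key] = (self.clock() + policy.lifetime, response)
        return response

    def replay(self):
        """Play queued requests back, in order, once the connection returns."""
        while self.write_back_queue:
            self.send(self.write_back_queue[0])
            self.write_back_queue.popleft()
```

A request is removed from the queue only after `send` succeeds, so a connection drop during replay leaves the remaining requests queued.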
WSDL Annotations: for each Service
• identify the operation (operationName)
– XPath (XML query language) expression to
extract the name of the operation
• extract the request message (identifier)
– portions of the request message (e.g. the date)
should be ignored while caching
– XPath expression to extract the parts of
the message relevant for identification
snippet from annotated myContacts.wsdl
<binding name="myContactsBinding" type="tns:myContactsPort"
operationName =
"substring-before(local-name(/senv:Envelope/senv:Body/*[1]), 'Request')"
Identifier = "/senv:Envelope/senv:Header/s0:licenses |
/senv:Envelope/senv:Header/s1:request |
/senv:Envelope/senv:Body">
<s:binding transport="http://schemas.xmlsoap.org/soap/http" style="document" />
<operation name="insert" cacheable="false" playback="true"
defaultResponse="true" cacheHeader="true">
<s:operation soapAction="http://schemas.microsoft.com/hs/2001/10/core#request" />
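To make the two service-level expressions concrete, here is a rough Python sketch of what evaluating them might look like, using only the standard library. It is an approximation under stated assumptions: the `operation_name` and `identifier` functions are hypothetical, the prefixes `s0` and `s1` are assumed to bind to the security and core namespaces, hashing the selected parts with SHA-256 is an illustrative choice, and the serialization is whitespace-sensitive.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumed prefix bindings for the XPath expressions in the annotation.
NS = {
    "senv": "http://schemas.xmlsoap.org/soap/envelope/",
    "s0": "http://schemas.xmlsoap.org/soap/security/2000-12",
    "s1": "http://schemas.microsoft.com/hs/2001/10/core",
}

# Cut-down request; only the <id> in the routing path varies per message.
TEMPLATE = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:c="http://schemas.microsoft.com/hs/2001/10/core">
  <s:Header>
    <licenses xmlns="http://schemas.xmlsoap.org/soap/security/2000-12">
      <c:identity><c:kerberos>3240</c:kerberos></c:identity>
    </licenses>
    <path xmlns="http://schemas.xmlsoap.org/rp/"><id>{msg_id}</id></path>
    <c:request service="myContacts" document="content" method="insert"/>
  </s:Header>
  <s:Body><c:insertRequest select="/m:myContacts"/></s:Body>
</s:Envelope>"""


def operation_name(envelope):
    """substring-before(local-name(Body/*[1]), 'Request')"""
    first = envelope.find("senv:Body", NS)[0]
    local = first.tag.rsplit("}", 1)[-1]
    head, marker, _ = local.partition("Request")
    return head if marker else ""   # XPath yields "" when the marker is absent


def identifier(envelope):
    """Hash only the parts the annotation selects; the routing path is skipped."""
    digest = hashlib.sha256()
    for path in ("senv:Header/s0:licenses", "senv:Header/s1:request", "senv:Body"):
        for node in envelope.findall(path, NS):
            digest.update(ET.tostring(node))
    return digest.hexdigest()


first = ET.fromstring(TEMPLATE.format(msg_id="b55528a4"))
second = ET.fromstring(TEMPLATE.format(msg_id="d978b559"))
print(operation_name(first), identifier(first) == identifier(second))
```

Because the message `<id>` lives in the `path` header, which the identifier expression does not select, two otherwise identical requests map to the same cache entry.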
Annotations for Consistency
• when does request 2 invalidate the response of
an earlier request 1 in the cache?
– an insert could invalidate an earlier query response
• consider requests to be functions with signatures
req1: op1(param_{1,1}, param_{1,2}, …, param_{1,n})
req2: op2(param_{2,1}, param_{2,2}, …, param_{2,m})
• the invalidation condition is an expression over req1 and
req2
f(op1, op2, param_{1,1}, …, param_{2,1}, …)
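One possible shape for such a condition f, sketched in Python for requests whose main parameter is a `select` path. This is an illustrative, deliberately conservative rule and not the one used in the prototype: the function names are hypothetical, predicates are dropped before comparing, nested predicates are not handled, and two paths are treated as overlapping whenever one is a prefix of the other.

```python
import re

UPDATES = {"insert", "delete", "replace"}


def steps(select):
    """Split a select path into step names, ignoring [...] predicates."""
    without_predicates = re.sub(r"\[[^\]]*\]", "", select)
    return without_predicates.strip("/").split("/")


def invalidates(op1, select1, op2, select2):
    """True if later request 2 may invalidate the cached reply of request 1."""
    if op1 != "query" or op2 not in UPDATES:
        return False
    a, b = steps(select1), steps(select2)
    shared = min(len(a), len(b))
    return a[:shared] == b[:shared]   # one path is a prefix of the other


print(invalidates(
    "query", "myContacts/contact[name='terry']",
    "delete", "myContacts/contact[name='terry']/phone[@cat='cell']",
))
```

Ignoring predicates means a delete aimed at a different contact would also invalidate the cached query; that costs cache hits but never returns stale data.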
Annotations for Consistency:
XSL Transformations
• Extensible Stylesheet Language (XSL)
– transforms XML documents into HTML/text/XML
– Turing-complete language
• cache transform: transforms a cached
response
– input: request1, reply1, request2, reply2
– output: transformed reply1 (null if invalidated)
• more powerful than just specifying invalidations
– can actually transform the old response
Cache Transform Example
request 1: query request
<queryRequest select = "myContacts/contact[name='terry']" />
request 2: delete request
<deleteRequest select =
"myContacts/contact[name='terry']/phone[@cat='cell']" />
a smart cache transform would delete the cell
phone number from the cached query response
<xsl:template match="/">
<xsl:variable name="service1" select="$req1/s:Header/c:request/@service"/>
<xsl:variable name="service2" select="$req2/s:Header/c:request/@service"/>
<xsl:variable name="opName1" select="substring-before(local-name($req1/s:Body/*[1]), 'Request')"/>
<xsl:variable name="opName2" select="substring-before(local-name($req2/s:Body/*[1]), 'Request')"/>
<xsl:choose>
<xsl:when test="$service1 = $service2">
<xsl:choose>
<xsl:when test="$opName2 = 'query' and ($opName1 = 'insert' or $opName1 = 'delete' or $opName1 = 'replace')">
<xsl:variable name="cleanQuery1">
<xsl:call-template name="StripSegment">
<xsl:with-param name="xpQuery" select="substring-after($req1/s:Body/c:*/@select, '/')"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="cleanQuery2">
<xsl:call-template name="StripSegment">
<xsl:with-param name="xpQuery" select="substring-after($req2/s:Body/c:queryRequest/c:xpQuery/@select, '/')"/>
</xsl:call-template>
</xsl:variable>
<xsl:call-template name="CheckIntersection">
<xsl:with-param name="xpQuery1" select="$cleanQuery1"/>
<xsl:with-param name="xpQuery2" select="$cleanQuery2"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$rep2"/>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$rep2"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
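For readers less familiar with XSLT, the "smart" transform described for the query/delete example (removing the cell phone number from the cached reply rather than discarding it) can be sketched in a few lines of Python. This is only an illustration of the idea: the shape of the cached reply, the sample phone numbers, and the `apply_delete` helper are all made up, and the delete's select path is assumed to be already split into a parent part and a child part.

```python
import xml.etree.ElementTree as ET

# Hypothetical cached query reply; element layout and numbers are invented.
CACHED_REPLY = (
    "<myContacts>"
    "<contact><name>terry</name>"
    "<phone cat='cell'>555-0100</phone><phone cat='home'>555-0101</phone>"
    "</contact>"
    "<contact><name>doug</name><phone cat='cell'>555-0102</phone></contact>"
    "</myContacts>"
)


def apply_delete(cached_reply, parent_path, child_path):
    """Return the cached reply with the nodes a later delete would remove."""
    root = ET.fromstring(cached_reply)
    for parent in root.findall(parent_path):
        for child in parent.findall(child_path):   # findall returns a list copy
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Delete request: myContacts/contact[name='terry']/phone[@cat='cell']
print(apply_delete(CACHED_REPLY, "contact[name='terry']", "phone[@cat='cell']"))
```

The cached reply stays usable after the delete, which is the advantage a transform has over plain invalidation.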
Picking Level of Consistency
• user freedom in choosing consistency
guarantees
– multiple consistency transforms
• strong consistency
– less availability 
– better user experience 
• weak consistency
– user experience could deteriorate 
• operations reportedly successful could fail!
• optional cache header
– better availability 
More Transforms
• response transform
– response from the cache may have to be
changed before returning to the client.
– adding time-stamp, unique identifiers etc.
• default response transform
– generates a default response for a request.
– default responses are returned when
disconnected but request is queued for playback
Optional Cache Header
• cache provides information to the client
using cache header
– response from cache or server
– age of cached response
– request will be played back in the future
• no changes to the definition of WSDL
– would not affect existing clients in any way.
• cache aware clients can provide additional
information to the user
example: default response and cache header
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:hs="http://schemas.microsoft.com/hs/2001/10/core">
<s:Header>
<path xmlns="http://schemas.xmlsoap.org/rp/">
<action>http://schemas.microsoft.com/hs/2001/10/core#response</action>
<rev />
<from>http://terry.microsoft.com</from>
<relatesTo>d978b559-aceb-4e9e-9747-b8a306234bc8</relatesTo>
</path>
<response xmlns="http://schemas.microsoft.com/hs/2001/10/core" />
<cacheHeader defaultResponse="true" toPlayback="true"
xmlns="http://localhost/wsdlannotation" />
</s:Header>
<s:Body>
<hs:insertResponse status="success" selectedNodeCount="1"
newChangeNumber="0" />
</s:Body>
</s:Envelope>
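Attaching the cache header to an outgoing reply is a small XML edit. The Python sketch below shows one way a proxy might do it with the standard library; the `add_cache_header` function is hypothetical, while the element name, attribute names, and namespace are taken from the example above. Creating a missing `Header` element is an assumption about how the cache would behave.

```python
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
ANNOT = "http://localhost/wsdlannotation"


def add_cache_header(response_xml, default_response, to_playback):
    """Append a cacheHeader block to the SOAP Header of a response."""
    root = ET.fromstring(response_xml)
    header = root.find(f"{{{SOAP}}}Header")
    if header is None:
        # Header is optional in SOAP; create it as the first child if absent.
        header = ET.Element(f"{{{SOAP}}}Header")
        root.insert(0, header)
    ET.SubElement(header, f"{{{ANNOT}}}cacheHeader", {
        "defaultResponse": str(default_response).lower(),
        "toPlayback": str(to_playback).lower(),
    })
    return ET.tostring(root, encoding="unicode")


reply = (
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">'
    "<s:Body/></s:Envelope>"
)
print(add_cache_header(reply, default_response=True, to_playback=True))
```

Clients that do not know about the cache can ignore the extra header block, while cache-aware clients can read it to tell the user that a reply is provisional.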
Conclusion
• built a prototype web services cache
• experimented with HailStorm web services
and clients
• annotated HailStorm WSDL files
• the prototype demonstrates custom cache
managers in action for HailStorm
• couldn’t give a demo 
Work for the Future
• WSDL annotations for more web services
– hard to find interesting web services with
WSDL descriptions yet!
• hoarding to enhance availability
– specify user-controlled hoard queries
– hoard transform to obtain response from
cached hoard requests
• incorporate security constraints
• tune cache performance