Active XML:
A data-centric perspective
on Web services
Omar Benjelloun
INRIA Futurs
With: Serge Abiteboul, Tova Milo, and many others.
Omar Benjelloun – Active XML
April 30th, 2004
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Introduction
Distributed data management in P2P
Information is everywhere
Data warehouses
Databases
Web sites
PC, PDA, cell phones, home appliances, cars…
[Figure: peers spread across the Internet, each holding XML data and offering Web services]
The golden triangle
of distributed data management
XML: a standard for data representation & exchange
• Extensible Markup Language
• Labeled ordered trees
• Types: XML Schema / tree automata
Query languages
• XPath, XQuery
Web services: standards for distributed computing
• SOAP, WSDL, UDDI
• Activation of methods on remote servers
• Many burgeoning standard proposals (choreography, QoS, user interface, etc.)
[Figure: a triangle labeled XML, XPath/XQuery, and SOAP/WSDL]
What is Active XML (AXML)?
AXML is a declarative language
for distributed information management
and
an infrastructure to support this language,
in a peer-to-peer framework.
Active XML
Active XML documents
XML documents with embedded calls to Web services
Intensional
• Some of the data is given explicitly
• Some is given intensionally
(i.e. the means to acquire data when needed are given)
Dynamic
• If the external sources change, the same document will provide different information
• Reaction to world changes
Not a new idea in databases, nor on the Web
Mixing calls with data is an old idea
• Procedural attributes in relational systems
• Basis of Object-oriented Databases
In Web programming
• Sun’s JSP, PHP+MySQL
Calls to Web services inside documents
• Macromedia FLEX, Apache Jelly, Microsoft XAML
What is new is the exploitation of the idea…
Web services in brief
A number of standards
• XML
• SOAP: Exchange of messages between applications
• WSDL: Description of service interfaces (e.g. input/output types)
• UDDI: Advertisement and discovery of services
• … other proposed standards (choreography, security, etc.)
For us: means to provide, invoke and describe
remote functions with XML input/output.
They make AXML documents universally understandable.
A sample AXML document
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the same document drawn as a tree, with the GetTemp and GetEvents calls as nodes]
AXML documents may contain calls:
• to any existing Web services (e-bay.net, google.com…)
• to any AXML Web services (to be defined)
Materialization
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <temp>16°C</temp>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the document tree after a SOAP call to Yahoo (Y!); the result temp = "16°C" appears next to the GetTemp call]
We will see later that:
• Replacing the call by its result is not the only option
• Calls are not necessarily RPC-style synchronous invocations
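The materialization step shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two services are hypothetical local stubs (a real peer would issue SOAP calls), the `<exhibit>` result is made up, and `materialize` is a name introduced here, not part of the AXML system.

```python
import xml.etree.ElementTree as ET

AXML_DOC = """
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>
"""

# Hypothetical stubs standing in for remote SOAP services.
def get_temp(call):
    return [ET.fromstring("<temp>16°C</temp>")]

def get_events(call):
    return [ET.fromstring("<exhibit>sample exhibit</exhibit>")]

SERVICES = {"Yahoo.GetTemp": get_temp, "TimeOut.GetEvents": get_events}

def materialize(root, keep_calls=True):
    """Invoke each call found in the tree once; insert its result just before it."""
    pending = [(parent, child) for parent in root.iter()
               for child in parent if child.tag == "call"]
    for parent, call in pending:
        position = list(parent).index(call)
        for offset, node in enumerate(SERVICES[call.get("svc")](call)):
            parent.insert(position + offset, node)
        if not keep_calls:
            # Alternative policy: the result replaces the call.
            parent.remove(call)
    return root

doc = materialize(ET.fromstring(AXML_DOC))
print(ET.tostring(doc, encoding="unicode"))
```

With `keep_calls=True` the call stays in the document for later reuse, as in the slide; with `keep_calls=False` the result replaces the call.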
AXML Web services
Parameters: AXML data
Result: AXML data
Great flexibility:
Distribute computations: by sending as parameters
data containing service calls, one can delegate some
work to other peers.
Partial computations: by returning data containing
service calls, one can give to the receiver the control
of these calls.
Calling an AXML service
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <temp>16°C</temp>
  <exhibits>
    <call svc="Yahoo.GetExhibits">
      <call svc="TimeOut.GetEvents">
        <city>Paris</city>
        exhibits
      </call>
    </call>
  </exhibits>
</newspaper>
[Figure: the document tree, where GetExhibits takes the GetEvents call as a parameter; a SOAP call to TimeOut (T!) is (still…) needed]
Materialization is a recursive process
Termination is an issue
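A minimal sketch of recursive materialization, assuming innermost calls are invoked first and a fixed invocation budget guards against non-termination. The services are hypothetical stubs, `Loop` is an artificial service that returns a call to itself, and the budget is an illustrative device rather than the mechanism used by the AXML system.

```python
import xml.etree.ElementTree as ET

# Hypothetical stubs; real AXML services would be remote endpoints.
def get_events(call):
    return [ET.fromstring("<event>sample exhibit</event>")]

def get_exhibits(call):
    # Builds its answer from its (already materialized) parameters.
    return [ET.fromstring(f"<exhibit>{e.text}</exhibit>") for e in call.findall("event")]

def loop(call):
    # The answer contains a call to the same service: materialization never ends.
    return [ET.fromstring('<call svc="Loop"/>')]

SERVICES = {"TimeOut.GetEvents": get_events,
            "Yahoo.GetExhibits": get_exhibits,
            "Loop": loop}

def find_innermost_call(root):
    """Return (parent, call) for a call whose parameters contain no call, else None."""
    for parent in root.iter():
        for child in parent:
            if child.tag == "call" and child.find(".//call") is None:
                return parent, child
    return None

def materialize_fully(root, budget=10):
    """Replace calls by their results until none is left or the budget runs out."""
    while budget > 0:
        hit = find_innermost_call(root)
        if hit is None:
            return True
        parent, call = hit
        position = list(parent).index(call)
        parent.remove(call)
        for offset, node in enumerate(SERVICES[call.get("svc")](call)):
            parent.insert(position + offset, node)
        budget -= 1
    return find_innermost_call(root) is None

nested = ET.fromstring(
    '<newspaper><exhibits><call svc="Yahoo.GetExhibits">'
    '<call svc="TimeOut.GetEvents"><city>Paris</city></call>'
    '</call></exhibits></newspaper>')
endless = ET.fromstring('<doc><call svc="Loop"/></doc>')
print(materialize_fully(nested), materialize_fully(endless))
```

The first document needs two invocations; the second exhausts the budget, which is one crude way to surface the termination issue.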
Organization
Novel issues raised by the AXML language
• Exchange of AXML data
• Querying AXML data
Supporting infrastructure
• AXML peers:
  – Management of persistent AXML data
  – Declarative AXML services
Applications
Novel issues
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data (SIGMOD 2003)
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
To call or not to call ?
[Figure: the newspaper document tree with its GetTemp and GetEvents calls, and a possible call to Yahoo (Y!)]
Materialization can be performed
• by the sender, before sending a document…
• or by the receiver, after receiving it.
Why control the materialization of calls?
For added functionality, e.g.
• Intensional data makes it possible to get up-to-date information.
For security reasons or capabilities, e.g.
• I don’t trust this Web service/domain,
• I don’t have the right credentials to invoke it,
• It costs money,
• Maybe the receiver doesn’t know Active XML!
For performance reasons, e.g.
• A proxy can invoke all the services on behalf of a PDA.
… and many more reasons you can think of!
How to control it? Using types
We extend XML Schema with intensional types: XML Schema_int
[Figure: a sender and a receiver, each with its own capabilities, ACL, cost, ...; documents containing calls (f, g, q, r) are rewritten to match the data exchange schema]
Static analysis algorithms use signatures of services: WSDL_int
The extended schema language
To simplify, we use here a DTD-like syntax
Data:
newspaper = title.date.(GetTemp|temp).(GetEvents|exhibit*)
title = data
date = data
temp = data
city = data
exhibit = title.(GetDate|date)
[Figure: the sample newspaper document tree]
Functions:
GetTemp(city) -> temp
GetEvents(data) -> (exhibit|performance)*
GetDate(title) -> date
Rewriting: replace call(s) by an arbitrary output of the service.
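To make the DTD-like notation concrete, here is a small sketch that checks whether a document conforms to the schema above. It assumes each content model is encoded as a Python regular expression over `<name>` tokens and that a call is represented by the last part of its `svc` name; both are conventions of this sketch only, and call parameters are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the slide; elements not listed here are plain 'data' leaves.
SCHEMA = {
    "newspaper": r"<title><date>(<GetTemp>|<temp>)(<GetEvents>|(<exhibit>)*)",
    "exhibit": r"<title>(<GetDate>|<date>)",
}

def symbol(node):
    """A call counts as its function name, any other element as its tag."""
    return node.get("svc").split(".")[-1] if node.tag == "call" else node.tag

def conforms(node):
    if node.tag == "call":
        return True  # parameters are not validated in this sketch
    model = SCHEMA.get(node.tag)
    if model is None:
        return len(node) == 0  # 'data': no element children
    word = "".join(f"<{symbol(child)}>" for child in node)
    return re.fullmatch(model, word) is not None and all(conforms(c) for c in node)

intensional = ET.fromstring(
    '<newspaper><title>Le Monde</title><date>06/10/2003</date>'
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>')
both = ET.fromstring(
    '<newspaper><title>Le Monde</title><date>06/10/2003</date><temp>16°C</temp>'
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>')
print(conforms(intensional), conforms(both))
```

The second document keeps the GetTemp call next to its result, so it no longer matches `(GetTemp|temp)`: under this schema the sender must pick one or the other.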
Rewritings
The Goal:
Given
• an intensional document d
• a schema s,
Can we rewrite d so that it matches s?
Safe rewriting: one that for sure leads to s
(we know without making any call).
Possible rewriting: one that may lead to s
(depending on the answers of services).
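The difference between safe and possible rewritings can be illustrated by brute force on the children of the newspaper element. This sketch makes strong simplifying assumptions: rewritings have depth 1, the infinite output type of GetEvents is truncated to words of length at most 2, and types are Python regular expressions over `<name>` tokens. It is not the PTIME algorithm discussed later, and "safe" here is only relative to the truncated outputs.

```python
import re
from itertools import combinations, product

# Allowed outputs per function, following the signatures on the schema slide.
# GetEvents -> (exhibit|performance)* is truncated to length <= 2 (assumption).
CALL_OUTPUTS = {
    "GetTemp": [("temp",)],
    "GetEvents": [w for n in range(3)
                  for w in product(("exhibit", "performance"), repeat=n)],
}

def encode(word):
    return "".join(f"<{s}>" for s in word)

def rewritings(word, invoked):
    """Yield each word obtained by replacing the invoked calls by one allowed output."""
    options = [CALL_OUTPUTS[s] if i in invoked else [(s,)]
               for i, s in enumerate(word)]
    for combo in product(*options):
        yield tuple(sym for part in combo for sym in part)

def classify(word, target):
    """Return 'safe', 'possible' or 'none' for a target type given as a regex."""
    calls = [i for i, s in enumerate(word) if s in CALL_OUTPUTS]
    possible = False
    for size in range(len(calls) + 1):
        for invoked in combinations(calls, size):
            results = [re.fullmatch(target, encode(w)) is not None
                       for w in rewritings(word, set(invoked))]
            if all(results):
                return "safe"  # every answer of the invoked services fits
            possible = possible or any(results)
    return "possible" if possible else "none"

WORD = ("title", "date", "GetTemp", "GetEvents")
print(classify(WORD, r"<title><date><temp>(<GetEvents>|(<exhibit>)*)"))
print(classify(WORD, r"<title><date><temp>(<exhibit>)*"))
```

For the first target, invoking GetTemp and keeping GetEvents always works. For the second, GetEvents must be invoked and may return a performance, so the rewriting is only possible.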
Difficulties
Infinite search space
• Vertical
• Horizontal
Main problem
• The result of a Web service call is unknown,
• We just know a signature (input/output types)
We want a very efficient solution.
Foundations of the problem
• String & tree automata,
• with existential and universal transitions.
Results
The general problem is undecidable [MSS03]
Restrictions on the considered rewritings
• Left-to-right: No “going back and forth”
• K-depth: bound on the nesting of function calls
(Search space still infinite but finitely representable)
Under these restrictions
• We have algorithms to find safe/possible rewritings.
• They are PTIME (for deterministic schemas).
• We can also do it between schemas.
Implementation
• demo at VLDB 2003 (customizable news syndication)
Safe rewriting algorithm
Sketch
• Deal with function parameters first,
• Top-down traversal of the tree,
• For each data node:
– rewrite its children (viewed as a word),
– to match the target type (a regular expression)
– using regular automata techniques, and smart marking.
Safe rewriting algorithm (2)
Build an FSA that accepts all k-depth rewritings of the initial word.
[Figure: automaton A_w^1 (states q0 to q7) over title, date, GetTemp, temp, GetEvents, exhibit, performance]
Build an FSA that recognizes the complement of the target type.
[Figure: complement automaton Ā (states p0 to p6) over the same symbols]
Safe rewriting algorithm (3)
Compute the intersection of these languages:
[Figure: the product automaton A× = A_w^k × Ā, whose states are pairs (q_i, p_j)]
A smart marking determines whether a safe rewriting exists.
Then run the word on the marked automaton to find an actual rewriting.
Optimization: lazy construction of the automata
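The core automata step, intersecting one language with the complement of another and testing for emptiness, can be sketched as follows. This is a simplified illustration: it uses plain complete DFAs and only answers "does every word of the first automaton match the target?", leaving out the fork transitions and the marking that the actual algorithm uses to choose which calls to invoke. All helper names are introduced here.

```python
from collections import deque

def make_dfa(alphabet, transitions, start, accepting):
    """Build a complete DFA; missing transitions lead to a rejecting 'sink' state."""
    states = {start, "sink"} | set(accepting)
    for (src, _), dst in transitions.items():
        states |= {src, dst}
    delta = {(s, a): transitions.get((s, a), "sink")
             for s in states for a in alphabet}
    return frozenset(states), frozenset(alphabet), delta, start, frozenset(accepting)

def complement(dfa):
    states, alphabet, delta, start, accepting = dfa
    return states, alphabet, delta, start, states - accepting

def intersection_is_empty(a, b):
    """Explore the product automaton lazily, from the pair of start states."""
    _, alphabet, delta_a, start_a, final_a = a
    _, _, delta_b, start_b, final_b = b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p in final_a and q in final_b:
            return False  # some word is accepted by both automata
        for sym in alphabet:
            nxt = (delta_a[p, sym], delta_b[q, sym])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

def always_matches(words, target):
    """True when every word accepted by `words` is accepted by `target`."""
    return intersection_is_empty(words, complement(target))

ALPHABET = {"title", "date", "temp", "exhibit", "performance"}
PREFIX = {("s0", "title"): "s1", ("s1", "date"): "s2", ("s2", "temp"): "s3"}
# title.date.temp.(exhibit|performance)*: both calls invoked
rewritten = make_dfa(ALPHABET, {**PREFIX, ("s3", "exhibit"): "s3",
                                ("s3", "performance"): "s3"}, "s0", {"s3"})
# title.date.temp.exhibit*
exhibits_only = make_dfa(ALPHABET, {**PREFIX, ("s3", "exhibit"): "s3"}, "s0", {"s3"})
print(always_matches(rewritten, exhibits_only))
```

Only reachable state pairs are visited, which is in the spirit of the lazy construction mentioned above.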
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data (SIGMOD 2004)
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Querying AXML Data
Given a (tree pattern) query:
/newspaper[temp > 18°C]/exhibits//exhibit[location="Le Louvre"]
Materialize the document?
Call only the services that may contribute data to the query answer.
[Figure: the newspaper document tree, with temp = "19°C" and pending calls such as GetEvents, GetExhibits, GetTemp and getDate]
The problem: Lazy evaluation of service calls
To call or not to call, this time when evaluating a query
Lazy evaluation
Difficulties:
• Calls can be found everywhere in the document
• May appear dynamically (as a result of previous calls)
• May become (ir)relevant due to previous invocations
• Need to take signatures of calls into consideration
A possible approach: modify the query processor
• Top-down evaluation
• Trigger the calls found on the way
• Not so great:
  – Computation is blocked
  – Optimization opportunities are lost
Our solution
Given a query to evaluate:
[Figure: the tree pattern of the query: newspaper, with temp > 18°C, then exhibits, exhibit, location = "Le Louvre"]
Derive a set of “node-focused” queries (NFQ) that find the relevant calls when evaluated on the document.
[Figure: a derived NFQ: the pattern with wildcard (*) nodes]
Need to be reevaluated, as the document evolves!
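As a rough illustration of finding relevant calls, the sketch below handles only a linear child path (no predicates, no descendant axis) and adds a simple type refinement: a call is kept only if its position lies on the query path and its declared output may contain the next label of the path. The `OUTPUT_LABELS` table, the `Ads.GetBanner` service and the function name are assumptions of this sketch, not the NFQ machinery of the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical output signatures: labels each service may return at top level.
OUTPUT_LABELS = {
    "Yahoo.GetTemp": {"temp"},
    "TimeOut.GetEvents": {"exhibit", "performance"},
    "Ads.GetBanner": {"banner"},
}

def relevant_calls(root, query_path):
    """Return the svc names of calls that may contribute to a linear path query."""
    found = []

    def walk(node, path):
        for child in node:
            if child.tag == "call":
                depth = len(path)
                on_path = depth < len(query_path) and path == query_path[:depth]
                if on_path and query_path[depth] in OUTPUT_LABELS.get(child.get("svc"), set()):
                    found.append(child.get("svc"))
            else:
                walk(child, path + [child.tag])

    walk(root, [root.tag])
    return found

doc = ET.fromstring(
    '<newspaper><title>Le Monde</title>'
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<exhibits><call svc="TimeOut.GetEvents">exhibits</call></exhibits>'
    '<ads><call svc="Ads.GetBanner"/></ads></newspaper>')
print(relevant_calls(doc, ["newspaper", "exhibits", "exhibit"]))
```

Because call results can add new calls, such a search has to be repeated after each invocation, which is the reevaluation the slide points out.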
Optimizations
Service calls sequencing
• Analysis of the relationship between calls (through the NFQs)
• Layering, and parallelization inside each layer
Refinement via type analysis
• Matching output types of services with data expected by queries
“Pushing” queries to capable services
Acceleration:
• Via relaxation:
  – NFQ approximation
  – Superset of the relevant calls
• Via a special access structure, similar to a DataGuide:
  – Restricted to paths that lead to service calls
  – Indexes the calls
Experimental assessment
• 10x speed-up when combining optimizations
Active XML peers
Distributed data management in P2P
[Figure: the same network of peers, now also holding AXML data and offering AXML services alongside XML and Web services]
What do we need from an AXML system ?
Persistent, manageable, dynamic AXML data.
Easy ways to define services.
Control of the exchanged data (parameters & results of service calls).
Peer-to-peer architecture, where each AXML peer acts as:
• Repository: manages persistent AXML data
• Client: uses (AXML) Web services
• Server: provides AXML services
[Figure: AXML peers communicating over SOAP]
Global architecture
[Figure: AXML peers S1, S2 and S3. Components shown: query engine, AXML engine, AXML store (read/update), service descriptions, SOAP wrapper, SOAP client and a SOAP service; peers exchange AXML and XML over SOAP]
Implementation
SUN’s Java SDK 1.4 (includes XML parser, XPath processor, XSLT engine)
Apache Tomcat 4.1 servlet engine
Apache Axis SOAP toolkit 1.1
X-OQL query processor, persistent DOM repository
JSP-based Web user interface, using JSTL 1.0 standard tag library
Also, a lightweight implementation for PDA/phone (J2ME, CLDC profile),
used for [ABB03demo].
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
New issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
• P2P auctions
• News syndication
• Other applications
Conclusion
Managing persistent AXML data
“Our newspaper should have its temperature information
refreshed daily. New exhibits should be fetched every
week and archived for 6 months”
Service call results enrich the document
(calls can be kept for possible future reuse)
Main issues:
• When to activate a service call?
• What to do with its result?
When to activate a service call?
Explicit pull mode
• Daily, weekly, or after some event: e.g., when another call occurs
• This aspect of the problem is related to active databases
Implicit pull mode
• Detect which intensional information (the service calls) may contribute to the answer of a query (lazy evaluation)
• This aspect of the problem is related to deductive databases
Push mode
• Based on a query subscription; the service provider pushes information to the client (e.g., for synchronization purposes)
• This is related to stream and subscription queries
Managing service call results
How long does the returned data remain valid?
• Just long enough to answer a query: Mediation
• 1 day, 1 week, … or unbounded: Caching / Warehousing
• Various portions of the document may follow different policies: Hybrid
For repeated service call invocations: merge policy
• append,
• replace,
• Fusion (using XML Schema-like keys),
• Specific merge policies can be provided as Web services
Example: AXML document with control attributes
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp" mode="lazy"
        valid="1 day"
        merge="replace">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents" mode="every Monday morning"
        valid="6 months"
        merge="append">
    exhibits
  </call>
</newspaper>
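How the `valid` and `merge` attributes might be interpreted can be sketched as follows. The parsing rules are assumptions of this sketch (for instance, a month is approximated as 30 days), and the function names are illustrative; the slides do not specify how the AXML peer implements these policies.

```python
from datetime import datetime, timedelta

# Assumed unit lengths; '1 month = 30 days' is a simplification of this sketch.
UNITS = {"day": timedelta(days=1), "week": timedelta(weeks=1),
         "month": timedelta(days=30)}

def parse_validity(text):
    """Turn a value such as '1 day' or '6 months' into a timedelta."""
    count, unit = text.split()
    return int(count) * UNITS[unit.rstrip("s")]

def merge(stored, fresh, policy):
    """Combine earlier results with the results of a new invocation."""
    if policy == "replace":
        return list(fresh)
    if policy == "append":
        return list(stored) + list(fresh)
    raise ValueError(f"unsupported merge policy: {policy}")

def expire(entries, validity, now):
    """Keep only the (timestamp, value) pairs that are still valid at `now`."""
    return [(stamp, value) for stamp, value in entries if now - stamp <= validity]

now = datetime(2004, 4, 30)
temps = [(datetime(2004, 4, 29, 12), "16°C"), (datetime(2004, 4, 1), "9°C")]
print(expire(temps, parse_validity("1 day"), now))
print(merge(["old exhibit"], ["new exhibit"], "append"))
```

The fusion policy and the `mode` attribute (lazy, periodic) are left out; they would need keys and a scheduler respectively.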
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Declarative AXML services
Services can be defined by queries or updates over the AXML
documents of the repository (XQuery, XPath, XUpdate)
let service GetExhibitsByLocation($loc) be
  for $a in document("newspaper.xml")/newspaper/exhibits,
      $b in $a//exhibit
  where $b/@name = $loc
  return <exhibits> {$b} </exhibits>
Which (lazy) service calls may contribute to the answer?
Other means to define services
Other programming languages:
• XSLT transformations (through Apache Xalan)
• Java classes (through Axis)
Composition of existing services:
• BPEL4WS (through IBM’s BPEL4J implementation)
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
New issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations (PODS 2004)
Applications
Conclusion
Theoretical foundations: Positive AXML
Restricted framework
• Data model
  – Set-based (unordered) AXML trees
  – Call results are accumulated in documents
• Services
  – Monotone
  – Positive: defined by a conjunctive fragment of XQuery
Results
• Well-defined (possibly infinite) fix-point semantics
• Termination, lazy evaluation…
Connections to:
• Regular (infinite) trees, Query-Sub-Query [AM04],…
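The fix-point idea can be illustrated on a toy case where documents are sets and each service simply returns what another peer currently knows, which is monotone. The peers, items and function name below are hypothetical, and the real semantics is defined over trees and positive queries, which this sketch does not model.

```python
# Each peer starts with its own items and holds calls to other peers.
LOCAL = {"A": {"a1"}, "B": {"b1"}, "C": {"c1"}}
CALLS = {"A": ["B"], "B": ["C"], "C": ["A"]}  # cyclic on purpose

def accumulate(local, calls):
    """Re-invoke every call until no document grows any more (least fix-point)."""
    known = {peer: set(items) for peer, items in local.items()}
    changed = True
    while changed:
        changed = False
        for peer, targets in calls.items():
            for target in targets:
                new_items = known[target] - known[peer]
                if new_items:
                    # Results are accumulated, never retracted.
                    known[peer] |= new_items
                    changed = True
    return known

print(accumulate(LOCAL, CALLS))
```

The loop stops here because the sets are finite and only grow; with services that keep producing new data the fix-point may be infinite, as the slide notes.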
Applications
Demos
Peer-to-peer auctions (VLDB 2002 demo)
• Discovery of new peers/auctions through intensional answers
RSS News syndication (VLDB 2003 demo 1)
• Customization of services through schemas + news subscriptions
Distributed workspaces (VLDB 2003 demo 2)
Web warehousing (ECDL 2003 demo)
A powerful framework for the fast development
of distributed, data-centric applications.
Other applications
E.dot, a dynamic warehouse on food risk management
• Use AXML as the platform for the warehouse definition, construction and maintenance
Network configuration
• Use AXML exchange of information to configure hardware/software components
Software distribution
• Use AXML to customize distributions and keep your view of the software fresh
Decentralized user profile/patient data management
• Use AXML to coordinate the integration of data, and privacy enforcement services in a uniform way
Conclusion
AXML documents and services
A simple paradigm…
…that allows for new, powerful features.
• Intensional parameters and results: AXML documents can be exchanged
• Support for continuous services (streams of answers)
• Control over the exchange of AXML data
Issues
Control of call activation via typing, Lazy evaluation, Replication and distribution, Security, Mobility, Termination, Implementation, Foundations, …
Current/Future work
Security and privacy (with Bell Labs)
Editor/browser plug-in for AXML
Mass storage XML DB (with Xyleme Corp.)
P2P infrastructure
…
To know more…
http://purl.org/net/axml
• Implementation becomes open-source
• Already available for research
• Will be released publicly very soon.
Selected publications
• S. Abiteboul, O. Benjelloun, T. Milo: Positive Active XML, PODS, 2004.
• S. Abiteboul, O. Benjelloun, B. Cautis, I. Manolescu, T. Milo, N. Preda: Lazy Query Evaluation for Active XML, SIGMOD, 2004.
• T. Milo, S. Abiteboul, B. Amann, O. Benjelloun, F. Dang Ngoc: Exchanging Intensional XML Data, SIGMOD, 2003 (full version to appear in TODS).
• S. Abiteboul, O. Benjelloun, I. Manolescu, T. Milo, R. Weber: Active XML: A Data-Centric Perspective on Web Services (book chapter), in Web Dynamics, Springer, 2004.
• S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, T. Milo: Dynamic XML Documents with Distribution and Replication, SIGMOD, 2003.
Thank you
Extra slides
Asynchronous/Continuous services
The client subscribes and then is notified
The server decides when to send data
• E.g., promotional offers
Change control:
• Management of replication [ABCMM03]
• What to do when a change is detected
– Send the new state of data
– Send the delta between old and new state
– Dual of merge policies
Peer-to-peer auctions (VLDB 2002 demo)
Each peer proposes auctions:
• Document myauctions.xml with the peer’s items and their current bids
• Services offered:
  – getLocalAuctions(),
  – status(auctionId)
Each peer bids on auctions:
• Document mybids.xml with the peer’s bids
• Services offered:
  – bid(peer, auctionId, amount)
  – bidUpTo(peer, auctionId, increment, limit)
Each peer knows about other peers’ auctions:
• Document allauctions.xml contains calls to other peers that transitively retrieve their known auctions.
• Service offered: getAllAuctions()
When an auction closes, the winner is notified.
News syndication (VLDB 2003 demo)
News sources:
• GetStory(id)
• GetNewsAbout(kwd)
Aggregators:
• GetNewsAbout(kwd)
• …but several versions, more or less intensional
Clients:
• PC, laptops, PDAs
Service customization using schemas
Customizing the output of services
• News sources/aggregators provide different versions of GetNewsAbout with different output schemas
• The output is automatically transformed into the desired schema
• Clients can also specify a desired output schema as a parameter
Customizing the input of services
• Location-aware continuous services for mobile users
• The context of the user is given by intensional parameters
Distributed logging mechanism
• Also customizable through the use of schemas
Call parameters
XML parameter:
<temp>
  <call svc="GetTemp@weather.com"><city>"Denver"</city></call>
</temp>
XPath parameter:
<temp>
  <call svc="GetTemp@weather.com">../../city</call>
</temp>
AXML parameter:
<temp>
  <call svc="GetTemp@weather.com">
    <city>
      <call svc="GetCapital@us.gov">"colorado"</call>
    </city>
  </call>
</temp>
To call or not to call (before invoking)?