Data-driven web services: specification and verification Victor Vianu UC San Diego Web service: service hosted on the Web • Interactive, often data-driven: accesses an underlying database and interacts with users/programs according to explicit or implicit workflow • Complex services: Web service compositions peers communicating asynchronously • Complexity of workflow leads to bugs: see the public database of Web site bugs (Orbitz bug) • Static analysis required -behavior of individual peers -protocols of communication between peers -global properties Focus of this course Automatic Verification Understand when verification is possible, describe techniques and algorithms Our target: data-aware Web services Home page(HP) Namepage(HP) Home passwd Name passwd login login Error message page(MP) Error message page(MP) back back input cancel cancel Customer page(CP) Customer page(CP) Desktop My order Desktop laptop My order laptop Desktop Search(SP) Past Order (POP) Past Order Past Order(POP) Past Order database laptop Search(SP) Desktop search Desktop Search(SP) Desktop search laptop Search(SP) Desktop search Ram: Desktop search Ram: Ram: Hdd: Ram: Hdd: Hdd: Hdd: Display: search update state Display: search search search output query Order status(OSP) Order status(OSP) Order status Order status Cancel confirmation page(CCP) Cancel confirmation page(CCP) Product index page(PIP) Product index page(PIP) Matching products Matching products Product detail page(PP) Product detail page(PP) Product detail Product detail buy buy Confirmation page(CoP) Confirmation Order detailpage(CoP) Order detail Input Input Input Triggers state update and transition to new page Input Home Page(HP) Message Page (MP) NAME: PASSWD: back cancel login Message state update Customer Page(CP) laptop Laptop Search (LSP) desktop DB Desktop Search (DSP) RAM: CPU: SCREEN: RAM: CPU: submit submit Matching products Confirmation Details buy Product Index (PIP) Product Detail (PDP) print Confirmation (CoP) output Compositions of Web services Buyer Login page(BP) Name passwd login cancel Category choice page(CP) My order Desktop laptop Desktop Search(SP) Past Order (POP) Past Order User payment(UPP) laptop Search(SP) Desktop search Ram: Hdd: Desktop search Ram: Hdd: Display: search Credit Verification Payment CC No: Expire date search M submit Order status(OSP) Order status Cancel confirmation page(CCP) Product index page(PIP) Matching products Product detail page(PP) Product detail buy Confirmation page(CoP) Order detail Examples of Desirable Properties • Semantic properties – “no product is delivered before payment in the right amount is received" – “no user can cancel an order that has already been shipped” • Basic soundness of specification – “conditions guarding transition to next Web page are mutually exclusive” • Navigational properties – “the shopping cart page is reachable from any page” Expressed using temporal logics “An order is rejected in the next step only if it has already been ordered but not paid correctly in the current input” always at next step x G [ X reject-order(x) (past-order(x) y (pay(x,y) price(x,y)))] Outline • • • • Review of model checking Finite-state abstractions Data-aware abstractions The XML angle Disclaimer: a lot not covered here! • • • • • Process algebras, pi-calculus Situation calculus Petri nets Transaction logic Use of ontologies, description logics Finite-state abstractions of Web services* • Black box: input/output signature order bill Supplier delivery * curtesy of Rick Hull payment Finite-state abstractions of Web services • White box: internal logic order bill ?o !b delivery ?p !d payment Simplest: finite state Mealy machines Finite-state abstractions of Web services • White box: internal logic order bill ?o delivery payment Simplest: finite state Mealy machines • Interacting Web services: compositions P : finite set of Web services (peers) store authorize ok bank C : finite set of peer-topeer channels M : (finite) set of mesage classes supplier1 supplier2 Combining Peer and Composition Models store !a bank ?k supplier1 ?o1 !b1 ... ... ?a supplier2 ?o2 • Peer fsa’s begin in their start states !k ... ... Executing a Mealy Composition (cont.) store !a ?k supplier1 ?o1 !b1 ... ... a bank ?a supplier2 ?o2 !k ... ... • STORE produces letter a and sends to BANK Executing a Mealy Composition (cont.) store !a bank ?k supplier1 ?o1 !b1 ... ... • BANK consumes letter a ?a supplier2 ?o2 !k ... ... store !a r2 o1 ?k supplier1 ?o1 !b1 ... ... b2 b1 bank ?a o2 o2 r2 !k supplier2 ?o2 ... ... • Important parameters: --bounded or unbounded queues --open or closed system • Execution successful if all queues are empty and fsa’s in final state Verifying Temporal Properties of Mealy Compositions store !a bank ?k ... “line-ofcredit warehouse1 available” ?o1 !b1 ... ?a !k “shipment just made” ... warehouse2 ?o2 ... • Label states with propositions • Express temporal formulas in LTL, e.g., – “shipment just made” only after “line-of-credit avail” Results on Temporal Verification • Long history, see [Clarke et.al. ’00] Example: one fsa and propositional LTL – PSPACE in size of formula + fsa – linear time in size of fsa • Mealy compositions – Bounded queues • Composition can be simulated as Mealy machine • Verification is decidable • Standard techniques to reduce cost – Unbounded queues • In general, undecidable [Brand & Zafiropulo 83] Alternative: temporal property of sequence of exchanged messages warehouse1 payment2 bank bill2 order1 receipt1 store authorize ok warehouse2 “conversation” a k o1 o2 b1 p1 r1 r2 b2 p2 LTL properties: Every authorize followed by some bill? Examples: G(( [?o] [ !b]) X [ !b]) “if an order has been received but a bill not yet sent, then in the next state a bill has been sent” G( [ ?o] F( [ !b] )) “if an order has been received then eventually a bill will be sent” Conversation languages • Conversation: sequence of exchanged messages • Conversation language: set of all conversations between Mealy peers • Bounded queues: regular language • Unbounded queues !a ?b a ?a b !b p2 p1 • Conversation language L is not regular: L a*b* = { anbn | n 0 } • Conversation languages are context sensitive in fact, they are the quasi-realtime languages [Bultan+Fu+Hull+Su 03] Quasi-realtime languages • Accepted by non-deterministic multi-tape TM in linear time [Book+Greibach] • Also, smallest AFL containing the CFGs and closed under intersection AFL: families of languages containing the finite languages and closed under union, concatenation, +, non-erasing homomorphism, inverse homomorphism, intersection with regular languages Synthesis of compositions • Given a set of peers and communication channels with message names, and a constraint Φ on the conversation language, find Mealy automata for peers so that the constraint is satisfied. Bounded queues • Closed systems: PSPACE for LTL PTIME for ω-regular sets given by an automaton • Open systems: “game” against environment synthesis = finding winning strategy undecidable for arbitrary topology hierarchical topology: decidable [Kupferman + Vardi 01] but non-elementary even in linear case [Pnueli+Rosner] [ Unbounded queues • Open systems: undecidable • Closed systems: open Practical alternative: synthesizing hierarchical composition from “library” of services Travel Service Templates Air Travel Templates Airport Transfer Hotel Reservation Customized Travel Service More work on compositions • The Roman model sequence of papers by Berardi, Calvanese, Da Giacomo, Hull, Lenzerini, Mecella also Bultan, Dang, Fu, Hull, Ibara, Su see references. • Use of description logics and logic programming Beyond fsa: workflow+data Home page(HP) Namepage(HP) Home passwd Name passwd login login Error message page(MP) Error message page(MP) back back input cancel cancel Customer page(CP) Customer page(CP) Desktop My order Desktop laptop My order laptop Desktop Search(SP) Past Order (POP) Past Order Past Order(POP) Past Order database laptop Search(SP) Desktop search Desktop Search(SP) Desktop search laptop Search(SP) Desktop search Ram: Desktop search Ram: Ram: Hdd: Ram: Hdd: Hdd: Hdd: Display: search update state Display: search search search output query Order status(OSP) Order status(OSP) Order status Order status Cancel confirmation page(CCP) Cancel confirmation page(CCP) Product index page(PIP) Product index page(PIP) Matching products Matching products Product detail page(PP) Product detail page(PP) Product detail Product detail buy buy Confirmation page(CoP) Confirmation Order detailpage(CoP) Order detail • Abstract model: relational transducer input transducer state control db output Control: (input, state, db) (output, state) History: • Relational transducers [Abiteboul+Vianu+Fordham+Yesha 98] • ASM relational transducer [Spielmann 00] • Extended ASM transducer [Deutsch+Liying+Vianu 04] • Communicating transducers [Deutsch+Liying+Vianu+Zhou 06] History: • Relational transducers [Abiteboul+Vianu+Fordham+Yesha 98] • ASM relational transducer [Spielmann 00] • Extended ASM transducer [Deutsch+Liying+Vianu 04] • Communicating transducers [Deutsch+Liying+Vianu+Zhou 06] Target: “business models” in e-commerce High-level descriptions of protocols of interaction Transducer - SHORT % schema: relational signatures database: price: 2 input: order: 1, pay: 2 state: past-order: 1, past-pay: 2 output: send-bill: 2, deliver: 1 % state rules past-order(X) +:- order(X) past-pay(X,Y) +:- pay(X,Y) % output rules send-bill(X,Y) :- order(X), price(X,Y), past-pay(X,Y) deliver(X) :- past-order(X), price(X,Y), pay(X,Y), past-pay(X,Y) Run of a transducer input/output sequence input1 input2 input3 output1 output2 output3 Price Input Sequence order(Time) order(Newsweek) order(People) order(Le Monde) pay(Time,55) pay(Newsweek,48) send-bill(Time,55) send-bill(Newsweek,45) Output Sequence Newsweek Time Le Monde 45 55 350 pay(Newsweek,45) deliver(Time) deliver(Newsweek) send-bill(Le Monde,350) What to verify: • Goal reachability is it possible to reach a goal (output)? Example: is it possible to receive a tax refund? • Temporal properties of all runs Example: no product can be delivered unless it has been paid • Comparing transducers -- Equivalence: same runs -- Containment: every run of T1 is a run of T2 • Interacting transducers Overall consistency Example: T1 requires payment before delivery T2 requires delivery before payment deadlock situations! • Generally, undecidable Tractable restriction: Spocus Transducer Semi-Positive Output, CUmulative State • State rules: past-R( x ) + R(x) • Output rules: A0 A1 … Ai … An Rinput A0 : output relation R(x) Ai : input, database, or state literals, safe negation % schema database: input: state: output: % state rules past-order(X) past-pay(X,Y) % output rules send-bill(X,Y) deliver(X) Transducer - SHORT price order, pay past-order, past-pay send-bill, deliver +:- order(X) +:- pay(X,Y) :- order(X), price(X,Y), past-pay(X,Y) :- past-order(X), price(X,Y), pay(X,Y), past-pay(X,Y) What can be verified • Goal reachability goal: x ( A1 … An) Ai : () R( y ) where R is an output relation, safe negation Example: is it possible to reach deliver(x)? • Temporal properties of runs class T of sentences: x φ( x ) where φ is a Boolean combination of output, state, and database literals Example: x y [(deliver(x) price(x,y)) past-pay(x,y)] • Complexity: NEXPTIME Proof: Reduction to finite satisfiability of FO sentences in the Bernays-Schönfinkel prefix class ** FO Comparing Spocus transducers • Motivation: customization, negotiation, etc. • Need to distinguish significant events from “syntactic sugaring” • Notion of log: semantically significant input and output relations % schema database: input: state: output: Transducer SHORT price order, pay past-order, past-pay send-bill, deliver % state rules past-order(X) +:- order(X) past-pay(X,Y) +:- pay(X,Y) % output rules send-bill(X,Y) :- order(X), price(X,Y), past-pay(X,Y) deliver(X) :- past-order(X), price(X,Y), pay(X,Y), past-pay(X,Y) Price Input Sequence order(Time) order(Newsweek) order(People) order(Le Monde) pay(Time,55) pay(Newsweek,48) send-bill(Time,55) send-bill(Newsweek,45) Output Sequence Newsweek Time Le Monde 45 55 350 pay(Newsweek,45) deliver(Time) deliver(Newsweek) send-bill(Le Monde,350) % schema database: input: state: output: Transducer Friendly price order, pay, pending-bills past-order, past-pay, past-pending-bills send-bill, deliver, reject-pay, already-paid, re-bill % some additional output rules re-bill(X,Y) :- pending-bills(X), past-order(X), price(X,Y), past-pay(X,Y) already-paid(X) :- pay(X,Y), past-pay(X,Y) reject-pay(X) :- pay(X,Y), past-order(X) % schema database: input: state: output: log: Transducer Friendly price order, pay, pending-bills past-order, past-pay, past-pending-bills send-bill, deliver, reject-pay, already-paid, re-bill order, pay, send-bill, deliver % some additional output rules re-bill(X,Y) :- pending-bills(X), past-order(X), price(X,Y), past-pay(X,Y) already-paid(X) :- pay(X,Y), past-pay(X,Y) reject-pay(X) :- pay(X,Y), past-order(X) • Log of a run: restriction to logged events • Valid log of a transducer: log of a run • Transducer containment relative to a log: every valid log of T1 is also a valid log of T2 Containment of Spocus transducers: undecidable. Proof: reduction of implication of FDs and IDs • Useful special case: customization: “T2 is an extension of T1” -- input(T1) input(T2) -- input(T1) log(T1) = log(T2) • Faithful extension T2 of T1 : T2 contained in T1 Example: Short vs. Friendly • Decidable for Spocus tranducers in NEXPTIME Proof: Bernays-Schönfinkel Controlling Input Sequences • Example: order(x) must be input before pay(x) • Use a distinguished error output: error pay(x), not past-order(x) Valid runs: error free Power of error-free runs • Propositional output transducer: -- propositional outputs -- one proposition output at each step • Generror-free(T): words output by T on finite runs Theorem: A language L equals Generror-free(T) for some propositional-output Spocus transducer T iff L is a prefix-closed r.e. language Temporal properties of error-free runs • A useful class can be enforced class Terror-free of sentences: x [φ(state,db,in)(x) ψ(state,db,in)(x)] -- φ(state,db,in)(x): conjunction of literals with safe negation -- ψ(state,db,in)(x): positive formula Examples • xy [(past-order(x) price(x,y) past-pay(x,y)) (pay(x,y) cancel(x))] • xy [pay(x,y) • x [cancel(x) (price(x,y) past-order(x))] past-order(x)] Theorem: for every formula φ in Terror-free there exists a Spocus transducer T whose error-free runs are precisely those satisfying φ. Verification: • Undecidable if all error-free runs of a Spocus transducer T satisfy given φ in Terror-free • Decidable for transducers T without negative state literals in error-generating rules Similar results for containment: • Undecidable if every error-free run of T1 is also an error-free run of T2 • Decidable if T1 and T2 have same schema and full log, and no negative state literals in error-generating rules Conclusion on Spocus transducers • Reasonable for simple business models • Too restricted for more intricate Web service workflows • Limited class of temporal properties verified • No compositions Target: more sophisticated Web services • Specified using high-level tools Example: WebML • Single peers and compositions Input Input Input Triggers state update and transition to new page Input Home Page(HP) Message Page (MP) NAME: PASSWD: Message back High-level WebML-style statetools update specification cancel login Customer Page(CP) laptop Laptop Search (LSP) desktop DB Desktop Search (DSP) RAM: CPU: SCREEN: RAM: CPU: submit submit Matching products Confirmation Details buy Product Index (PIP) Product Detail (PDP) print Confirmation (CoP) output Model for data-driven Web services WebML-style A web service W is a set of web page schemas as well as – global database D (fixed during a session) – global state relations S (updatable) – input relations I and output relations O – home page W0 – error page We Webpage Schema • Input options (user may pick at most one) Query(DB, State, Prev-input) • Output and Transitions (triggered by user input) Output: Next Webpage: Query(DB, State, Input, Prev-input) Query(DB, State, Input, Prev-input) • State updates (triggered by user input) Insertions/deletions: Query(DB, State, Input, Prev-Input) “Query”: First Order Logic formula (core SQL) Details on Modeling the Input • input options as provided by menus, pull-down lists, buttons, HTTP links: user must choose one • input constants model text input boxes: name, password, etc. • input I at previous step: prev-I can be viewed as special state Home Page(HP) Message Page (MP) NAME: PASSWD: login Message back input constant cancel Customer Page(CP) input laptop constant desktop Home page(HP) Laptop Search (LSP) Desktop Search (DSP) Input: name, password , clickbutton(x) RAM: RAM: CPU: SCREEN: CPU: Input options: clickbutton(x) (x= “login” or x = “cancel”) submit submit State update: error("bad user") not users(name,password) and clickbutton("login") state table DB table Matching productsrules: Page Transition CP users(name,password ) and clickbutton(“login”) Confirmation Details MP not users(name,buy password) and print clickbutton("login") next page Product Index (PIP) Product Detail (PDP) Confirmation (CoP) Home page(HP) Name passwd login cancel Customer page(CP) Error message page(MP) back My order Desktop Search(SP) Past Order (POP) Past Order Desktop laptop laptop Search(SP) Desktop search Desktop search Ram: Ram: Page: Past Order (POP) Hdd: Hdd: Display: search search Input: Input Rules: Order status(OSP) Order status State Rules: Cancel confirmation page(CCP) Clickbutton(x), Opick(oid, pid, price, status) Opick(pid,price,status) Product index page(PIP) order-db(name, pid, price, status); Matching products Clickbutton(x) x="view cart" or x="logout" or x="continue shopping" or x="back" Userorderpick(name,pid,price, status) Opick(pid,price,status) Product detail page(PP) Product detail Target Webpages: OSP, CP, HP,CC buy Target Rules: Confirmation page(CoP) OSP Opick(pid,price,status); HP Clickbutton("logout"); Order detail ........... Home Page(HP) Product (PIP) Message Info Page PageNAME: PASSWD: (MP) Input: pick(pid,price) Message Input options: back cancel login Customer Page(CP) pick(pid,price) ram laptop cpu desktop prev-search(ram,cpu) catalog(pid,ram,cpu,price) Laptop Search (LSP) RAM: previous CPU: SCREEN: Desktop Search (DSP) input RAM: CPU: db table submit submit Matching products Confirmation Details buy Product Index (PIP) Product Detail (PDP) print Confirmation (CoP) Web service run • Fixed database • Sequence of inputs, states and outputs • Input constants are given values throughout the run Inconsistencies error page Home Page(HP) Message Page (MP) NAME: PASSWD: Message back High-level WebML-style state update specification cancel login Customer Page(CP) laptop Laptop Search (LSP) desktop Desktop Search (DSP) RAM: CPU: SCREEN: RAM: CPU: submit submit Matching products Confirmation Details buy Product Index (PIP) Product Detail (PDP) print Confirmation (CoP) Web application code Examples of Desirable Properties • Semantic properties – “no product is delivered before payment in the right amount is received" – “no user can cancel an order that has already been shipped” • Basic soundness of specification – “conditions guarding transition to next Web page are mutually exclusive” • Navigational properties – “the shopping cart page is reachable from any page” Approach to verification • Define an extended relational transducer • Show that every Web service spec and temporal property to be verified can be translated efficiently into an extended relational transducer spec and a corresponding property Main result restrictions leading to PSPACE decidability of LTL properties Several versions: • Relational transducers [Abiteboul+Vianu+Fordham+Yesha 98] • ASM relational transducer [Spielmann 00] • Extended ASM transducer [Deutsch+Liying+Vianu 04] • Communicating transducers [Deutsch+Liying+Vianu+Zhou 06] Extended relational transducer: reactive system with FO control input state control: FO Single peer db output Control: (input, state, db) (output, state) input state control: FO Single peer db output Control: (input, state, db) (output, state) Single peer state FO query db Input options user choice Single peer state FO query db Input options user choice Single peer state FO queries db output Technical point: queries can also refer to k previous inputs Configurations and runs input Configuration state db output Run: infinite sequence of consecutive configurations Language for properties of runs: LTL-FO FO + LTL operators + Boolean operators • Start with FO formulas referring to the states, db, inputs, top and last message of queues in current configuration FO components • Apply Boolean and LTL operators: X, U, F, G, B • All remaining free variables are universally quantified x (x) Example of LTL-FO formula “An order is rejected in the next step only if it has already been ordered but not paid correctly in the current input” x G [ X reject-order(x) (past-order(x) y (pay(x,y) price(x,y)))] FO components p: reject-order(x) q: (past-order(x) y (pay(x,y) price(x,y))) LTL formula over p, q: G[Xp q] Example Property “any shipped product must be previously paid for” pid, uname,price [(pid, uname, price) B Ship(uname, pid)] output Where (pid, uname, price) is the formula input pay(price)picked(uname, pid, price) prod-price(pid, price) state database The Verification Problem Given transducer M and LTL-FO property Decide if every run of M satisfies . If not, exhibit a counterexample run. Challenge: infinite-state system! Typical approaches in Software Verification are unsatisfactory: – Model checking: developed for finite-state systems described by propositional states. More expressive specifications first abstracted to propositional ones. Unsatisfactory: can check that some payment occurred before some shipment, but not that it involved the correct amount and product. – Theorem proving: no completeness guarantees, not autonomous. Prover requires expert guidance . Alternative approach: identify a restricted but reasonably expressive class of transducers that can be verified Main restrictions for decidability guarded quantification guarded quantification: quantified variables must appear in input (or prev-I) atoms “input boundedness” earlier variant: Spielmann pick(pid,price) ram cpu prev-search(ram,cpu) catalog(pid,ram,cpu,price) Input-bounded transducer • State, action, and transition rules use FO conditions with “input-bounded” quantification: x ( input(- x- ) φ( x )) x ( input(- x- ) φ( x )) state atoms have no quantified variables prev-I can also serve as guard • Input options definitions: *FO (db, prev-input, ground state atoms) Home page(HP) Name passwd login cancel Customer page(CP) Error message page(MP) back My order Desktop Search(SP) Home page(HP) Past Order (POP) Past Order Desktop laptop laptop Search(SP) Desktop search Desktop search Ram: Ram: Hdd: Hdd: Input: name, password , clickbutton(x) search Display: search Input options: clickbutton(x) (x= “login” or x = “cancel”) Product index page(PIP) user") not users(name,password) and State update: error("bad Order status(OSP) clickbutton("login") Order status Matching products Transition rules: HP clickbutton(“cancel”) Cancel confirmation page(CCP) Product detail page(PP) CP user( name,password ) and clickbutton(“login”) Product detail MP not users(name, password) and clickbutton("login") buy Confirmation page(CoP) Order detail Home page(HP) Name passwd login cancel Customer page(CP) Error message page(MP) back My order Desktop Search(SP) Past Order (POP) Past Order Desktop laptop laptop Search(SP) Desktop search Desktop search Ram: Ram: Page: Past Order (POP) Hdd: Hdd: Display: search search Input: Input Rules: Order status(OSP) Order status State Rules: Cancel confirmation page(CCP) Clickbutton(x), Opick(oid, pid, price, status) Opick(pid,price,status) Product index page(PIP) order-db(name, pid, price, status); Matching products Clickbutton(x) x="view cart" or x="logout" or x="continue shopping" or x="back" Userorderpick(name,pid,price, status) Opick(pid,price,status) Product detail page(PP) Product detail Target Webpages: OSP, CP, HP,CC buy Target Rules: Confirmation page(CoP) OSP Opick(pid,price,status); HP Clickbutton("logout"); Order detail ........... Home Page(HP) Product (PIP) Message Info Page PageNAME: PASSWD: (MP) Input: pick(pid,price) Message Input options: back cancel login Customer Page(CP) pick(pid,price) ram laptop cpu desktop prev-search(ram,cpu) catalog(pid,ram,cpu,price) Laptop Search (LSP) RAM: previous CPU: SCREEN: Desktop Search (DSP) input RAM: CPU: db table submit submit Matching products Confirmation Details buy Product Index (PIP) Product Detail (PDP) print Confirmation (CoP) Input-bounded LTL-FO property: FO components are input bounded “An order is rejected in the next step only if it has already been ordered but not paid correctly in the current input” x G [ X reject-order(x) (past-order(x) y (pay(x,y) price(x,y)))] Main verification result Theorem: It is decidable whether an inputbounded extended relational transducer satisfies an input-bounded LTL-FO property. Complexity: PSPACE-complete for bounded arity schemas, EXPSPACE otherwise Tightness: even small extensions lead to undecidability • Relaxing the requirement that state atoms must be ground in formula defining the input options. Reduction: Does TM halt on input epsilon? • Lifting the input-bounded requirement by allowing state projection. Reduction: Implication for FDs and IDs • Allowing Prev-I to record all previous input to I rather than the most recent one. Reduction: Trakhtenbrot’s Theorem • Extend the FO-LTL formulas with path quantification. Reduction: validity of **FO formulas Expressivity of input-bounded specs Significant parts of the following Web applications could be modeled: • • • • Dell-like computer shopping website Expedia Barnes&Noble GrandPrix motor sports Web site See demo site http://www.db.ucsd.edu PSPACE verification: outline for single peer To check that M satisfies , verify that there is no run satisfying Recall model checking approach (finite-state): • Build Büchi automaton B( ) for • Build Kripke model K for all runs • Check that there is no counterexample run: emptiness of K B( ) Our case: infinite-state system Same idea: build B( ), then search for counterexample runs accepted by B( ) But: no finite Kripke model K for the runs! Problem in searching for counterexample runs: infinite runs infinitely many underlying databases How to limit the search space? Infinite search space for runs number of underlying DBs .. . ... ... ... ... ... ... ... ... ... length of run Bounding the search for counterexample runs double-exponentially many DBs number of underlying DBs ... ... ... Sufficient to consider only DBs over a fixed domain of cardinality exponential in size of spec + prop ... ... Finite search space yields decidability of verification ... ... ... Periodic runs suffice: counterexample iff periodic one ... doubly-exponential length in size of spec+prop length of run Key insight for PSPACE complexity • No need to explicitly materialize entire configuration: • Instead, at each step construct only those portions of DB, states and outputs which can affect property. • Call them pseudoconfigurations. Pseudoconfigurations C = a set of relevant constants extracted from the spec. and prop. + a fixed number of variables restriction of outputs to constants in C input picked from C input restriction of states to constants in C output S restriction of DB to C Size polynomial in spec + prop Pseudoruns pseudorun ... DB DB DB counterexample run iff counterexample pseudorun Pseudoruns pseudorun ... DB DB DB • Can compute next possible pseudoconfigurations from current one Pseudoruns pseudorun ... DB DB DB • Can compute next possible pseudoconfigurations from current one • Never construct entire DB, just “slide” poly window over it PSPACE verification algorithm “Relevant” constants in C • constants explicitly used in spec • witnesses for existentially quantified variables in Example: = x G [ X reject-order(x) (past-order(x) y (pay(x,y) price(x,y)))] = x G [ X reject-order(x) (past-order(x) y (pay(x,y) price(x,y)))] Replace x by non-deterministically chosen constant c, continue with new formula G [ X reject-order(c) (past-order(c) y (pay(c,y) price(c,y)))] Variables in C Input values that are not “relevant constants” Why don’t we need infinitely many? Variables in C Input values that are not “relevant constants” k Can use just 2k variables because only k values can be passed via prev-input between configurations Implementation so far WAVE: verifier for single Web service peer [SIGMOD’05] • Essentially implements search for a counterexample pseudorun • Many tricks and heuristics to achieve good verification times Some techniques • Dataflow analysis to identify all constants to which a DB attribute may be compared (directly or indirectly). Limits the relevant combinations of constants when constructing partial DBs. Spectacular reduction: for the computer shopping website, from 2^(17,270,412,688) partial DBs to 8 ! • Internal representation of pseudoconfigs to – Efficiently detect loop in periodic run – Efficiently evaluate queries • Early pruning of pseudoruns Experimental Evaluation of WAVE Tool • Online Demo at http://www.db.ucsd.edu/ • Evaluated experimentally on 4 Web applications: – Dell-like computer shopping – Part of Expedia, Barnes&Noble, GrandPrix • Verification times for a battery of properties: all within seconds, below one minute. • Here, report only Dell experiment. All others are similar. Some of the Verified Properties Property type Property name Time (seconds) Sequence pBq P5 (true) P7 (true) 4 2 Session Gpafter Gq Shipment only proper payment Correlation Fp Fq P9 (true) 1 P10 (true) P11 (false) P12 (true) P13 (false) 0.23 0.29 0.6 0.44 Response p Fq P14 (false) 0.19 Reachability Gp or Fq P2 (true) P3 (false) 0.9 0.37 Recurrence G(Fp) P17 (false) 0.15 Strong non-progress F(Gp) P15 (false) 0.26 Weak non-progress G(pXp) P6 (false) 0.49 Guarantee Fp P1 (true) P8 (false) 0.02 0.11 Failure of Classical Tools • SPIN model checker Abstraction is unsatisfactory. Alternative trick: Try to use SPIN to verify pseudoruns. The resulting SPIN input is too large to handle. • PVS theorem prover Not guaranteed to find a counterexample. Gets stuck during search, asks for guidance from expert user. Branching-time temporal properties homepage Current state Need path quantifiers “At any point in a run, there is a way to return to the home page” Verification results for CTL(*) • Propositional Web services: --states and actions are propositional --prev-I atoms are disallowed • CTL* formulas using state, action, input, and Web page symbols interpreted as propositions Examples Navigational properties • “One can reach the home page from any page” AGEF( Home-Page) • “After login, the user can reach a page where he can authorize payment” AG(Home-Page button(“login”) EF(button(“authorize payment”))) • Verification of CTL(*) formulas for propositional Web services: --CO-NEXPTIME for CTL --EXPSPACE for CTL* Proof idea: (i) show that there is a bound on the databases that need to be considered in order to detect a violation; (ii) for a fixed database, reduce checking violation to model checking for a Kripke structure generated from the database. Getting down to PSPACE: • Navigational properties: CTL* formulas using only Web page symbols • Fully propositional Web services: inputs are also propositional Proof technique: highly efficient model-checking technique of Kupferman, Vardi, Wolper using hesitant alternating tree automata (HAA). Reduce to checking non-emptiness of a one-letter word HAA. Alternative restriction: Web services with “input-driven search” • Propositional states and actions • Inputs are monadic, propagated to next Web page as a parameter using prev-I atoms Example: allows conducting an user-driven search going through consecutive stages of refinement (each choice triggers new set of options) For Web services with input-driven search: CTL formulas can be verified in EXPTIME CTL* formulas can be verified in 2-EXPTIME EXPTIME for fixed out-degree of input choice Proof: reduce to satisfiability of CTL(*) formulas by a Kripke structure Compositions of Web services Buyer Login page(BP) Name passwd login cancel Category choice page(CP) My order Desktop laptop Desktop Search(SP) Past Order (POP) Past Order User payment(UPP) laptop Search(SP) Desktop search Ram: Hdd: Desktop search Ram: Hdd: Display: search Credit Verification Payment CC No: Expire date search M submit Order status(OSP) Order status Cancel confirmation page(CCP) Product index page(PIP) Matching products Product detail page(PP) Product detail buy Confirmation page(CoP) Order detail Examples of Composition Properties • “every payment request by a user results eventually in an approval or denial output to the user” • “the answer to every credit check request message for a user is a credit rating message poor, fair, or good, for the same user” • “for every two consecutive credit rating messages for the same user user there exists an intermediate credit request message for that user.” • Communicating peers: composition • channels between peers • message: finite relation (set or singleton) • one FIFO queue at recipient of each channel More on messages • • • • Flat message: single tuple Nested message: finite set of tuples Messages queued at recipient Message contents: !M(x) :- query(db, state, input, in-messages) Flat messages: query may generate several tuples, choose non-deterministically one to be sent Peers with messages input incoming messages Single peer FO control state output db outgoing messages Control: (input, in-messages, state, db) (output, out-messages, state) Configurations and runs incoming message queues input Configuration of a single peer state db output Configuration of a composition: member peer configurations Transitions: one peer at a time Configuration of a composition: member peer configurations Transitions: one peer at a time Configuration of a composition: member peer configurations Run:Transitions: infinite sequence one peer of consecutive at a time configurations Verification of compositions: main restrictions for decidability input boundedness of rules + bounded queues, lossy channels Input-bounded compositions • State, output, and nested message rules use FO formulas with guarded quantification: x ( guard(- x- ) φ( x )) x ( guard(- x- ) φ( x )) where guard is an input or flat message atom and state and nested message atoms in φ have no quantified variables • Input options and flat message definitions: *FO formulas with ground state and nested message atoms Verification of compositions Reduce to single peer verification • Reduction applies to input-bounded compositions with bounded, lossy channels • Flat message queues simulated by inputs • Nested message queues simulated by states • Non-deterministic choice of peer at each transition simulated with additional input • Some tricky timing issues in translation of property PTIME reduction preserving input boundedness Main verification result Theorem: It is decidable whether an inputbounded composition with bounded queues and lossy channels satisfies an input-bounded LTL-FO property. Complexity: PSPACE-complete for bounded arity schemas, EXPSPACE otherwise Examples of extensions leading to undecidability • Allowing perfect flat queues (perfect nested queues are ok) • Disallowing non-deterministic choice for flat messages Reduction: Post Correspondence Problem Additional verification problems • Conversation protocols sequences of messages observed in runs data-agnostic: message parameters ignored data-aware: parameters taken into account • Modular verification specs of some peers not available information limited to input/output behavior Verification of conversation protocols • data-agnostic protocol: Büchi automaton over alphabet of message names • Possible semantics with lossy channels: observer-at-recipient observer-at-source Theorem: Theorem: ItItisisPSPACE-complete undecidable if an input-bounded if an input-bounded composition composition with with bounded, bounded, lossy lossy channels channels satisfies satisfies aadata-agnostic data-agnostic conversation protocol with observer-at-recipient observer-at-source semantics semantics Verification of conversation protocols • Similar results for data-aware protocols: formalized as Büchi automaton whose alphabet is a finite set of FO formulas on message relations G( get-rating(x) B rating(x,y) ) Theorem: It is PSPACE-complete if an input-bounded composition with bounded, lossy channels satisfies a data-aware conversation protocol with observer-at-recipient semantics Modular verification ? ? Black box peers: input-output behavior Modular verification Environment Environment specification: LTL-FO description of input and output messages Properties under given environment Composition C satisfies LTL-FO property under environment specification : every run of C in which messages to/from the environment satisfy and use values from some finite domain, satisfies Verification under given environment Additional restriction needed for decidability LTL-FO property is strictly input-bounded if its FO components have no free variables Example: G ssn [ ?getRating(ssn) (!rating(ssn, “poor”) !rating(ssn, “fair”) !rating(ssn, “good”))] Verification under given environment Theorem: It It is is PSPACE-complete undecidable if an input-bounded Theorem: if an input-bounded composition C C with with bounded bounded queues queues and and lossy lossy channels channels composition satisfies an an input-bounded input-bounded LTL-FO LTL-FO property property under under satisfies an input-bounded but not strictly-input-bounded a strictly-input-bounded environment specification environment specification Putting the pieces together WebML-style spec of Web service composition PTIME peer composition spec PTIME single peer spec The XML angle • Black box: input/output signature order bill Supplier delivery payment The XML angle • Black box: input/output signature XML type order bill XML type Supplier XML type delivery payment XML type • Interacting Web services store supplier1 authorize ok bank supplier2 • Interacting Web services XML types authorize ok store XML types bank XML types supplier1 XML types supplier2 Active XML • Full integration of XML with Web service calls • Service calls embedded in XML documents • Static analysis: typechecking, type casting, optimization, materialization policy • More later! Quick XML Review • XML document (ignoring PC data): labeled, unranked, ordered tree root section section intro section conc intro section intro section conc intro conc intro conc conc • XML type: XML Schema • XML query language: XQuery Tools in static analysis: connections to tree automata and tree transducers Simple XSchema: Document Type Definition (DTD) : alphabet of element names, root set of rules: e element name r regular expression over Documents satisfying a DTD root …. e e e1 …. r ek r Set of trees satisfying DTD d: Sat(d) Example A DTD and a tree satisfying it: root section section*; intro, section*,conclusions; root section section intro section conc intro conc intro section section conc intro conc intro conc Essential enhancement of XML Schema: specialization dealer used new car car model year model car has different structure in different contexts Specialization dealer used carused model year new carnew model car has different structure in different contexts Specialized DTD dealer used new carused carnew used new (carused )* (carnew)* model year model (year | ) dealer dealer used new used new carused carnew car car model year model model year model Tool in static analysis: Powerful connection to tree automata! Tree automata States: p,q,r, … Start state: q0 Final state: qf root a a c b c c b c a a Transitions: b a b p a a b Tree automata States: p,q,r, … Start state: q0 Final state: qf root a a c b c c b c a a Transitions: b a b p a q r Tree automaton computation root a a c b c c b c a a b a b a Tree automaton computation q0 a a c b c c b c a a b a b a Tree automaton computation q0 p a c q c c b c a a b a b a Tree automaton computation q0 p r c q p c b q a a p a b a Tree automaton computation q0 p r Acceptance: there is a computation such that all leaves are labeled qf q p q p qf qf qf qf qf qf qf qf Set of accepted trees: regular tree language Tree automata variants • Top-down nondeterministic • Bottom-up (non)-deterministic • Top-down deterministic (weaker) Examples • regular: set of trees representing Boolean circuits evaluating to true 0 1 1 0 0 1 0 1 • not regular: equal numbers of a and b nodes Regular: definable in Monadic Second-Order Logic Theorem: XMLSchemas define regular languages of unranked trees Benefits: • Algorithms for validation wrt XMLSchemas, testing inclusion of XMLSchemas, etc. • Static analysis Query languages for XML • Standard: XQuery integrates previous paradigms • One paradigm (also XML-QL, Lorel) – extract bindings for variables using patterns – construct answer from bindings • Another paradigm (also XSLT, UnQL) – structural recursion Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) root pattern: p X p,q,r,s: regular path expressions q Y r Z s T Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) root pattern: p X p,q,r,s: regular path expressions q Y r Z=5 s T Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) root pattern: p X r Z=5 = p,q,r,s: regular path expressions q Y s T Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) answer: f(X), g(X,Y) Skolem functions root … f(X) … g(X,Y) Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) answer: f(X), g(X,Y) Skolem functions root … f(X) … Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) answer: f(X), g(X,Y) Skolem functions root f(X1) …f(Xi) …f(Xn) Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) answer: f(X), g(X,Y) Skolem functions root … f(X) … g(X,Y) Basic form of XML-QL query WHERE pattern(X,Y,…) CONSTRUCT answer(X,Y,…) answer: f(X), g(X,Y) Skolem functions root … f(X) … g(X,Y1)…g(X,Yk) Example: art exhibits root exhibit title museum DTD: exhibit* title. museum*. review* artist* address. dates._* WHERE CONSTRUCT root exhibit X1 title museum X2 artist X4 = Vermeer X3 Vermeer’s exhibits title(X2) museum(X2,X3) * X5 X5(X2,X3,X5) Q(X2) Nested query Q(X2): WHERE root exhibit Y1 title review X2 Y2 CONSTRUCT Vermeer’s-reviews(X2) review(X2 ,Y2) A Different Paradigm: Structural Recursion • Example: change all name tags occurring under person nodes to pname a • Notation: a(t1,t2) denotes the tree t1 • Transformation defined by f: f(x(t1,t2)) = if x = person then x (g(t1), g(t2)) else x (f(t1), f(t2)) g(x(t1,t2)) = if x = name then pname (g(t1), g(t2)) else x (g(t1), g(t2)) t2 Connection to tree transducers k-pebble transducers [Milo+Suciu+V.-] • subsume core of XQuery • useful for static analysis k-pebble transducers – States, pebbles, transition rules – Control: alternating k pebbles: 1 2 k k-pebble transducers stack discipline for pebbles i 2 1 k-pebble transducers stack discipline for pebbles 3 pebbles: 1 2 3 k-pebble transducers 1 stack discipline for pebbles 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers 2 stack discipline for pebbles 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 2 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 2 2 1 1 3 pebbles: 1 2 3 k-pebble transducers 3 stack discipline for pebbles 2 3 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 3 2 3 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 23 3 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 2 3 1 2 3 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 23 3 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 3 2 3 2 1 1 3 pebbles: 1 2 3 k-pebble transducers 3 stack discipline for pebbles 2 3 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 2 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 2 2 1 1 3 pebbles: 1 2 3 k-pebble transducers 2 stack discipline for pebbles 2 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers stack discipline for pebbles 1 1 3 pebbles: 1 2 3 k-pebble transducers 1 stack discipline for pebbles 1 3 pebbles: 1 2 3 Output-producing transitions root Output-producing transitions root 2 1 3 2 1 3 Output-producing transitions root 2 2 3 1 1 3 Output-producing transitions root 32 1 2 1 Output-producing transitions root 2 3 1 2 1 Output-producing transitions root 32 1 1 2 Output-producing transitions root a 32 32 1 2 1 1 Output-producing transitions root a b 2 3 1 1 2 Output-producing transitions root a c b 2 1 3 Output-producing transitions root a b c a Output-producing transitions root a c b a b Output-producing transitions root a c b a a b Output-producing transitions root a c b a a b c a Output-producing transitions root a c b c a a b c a a XML Query languages vs. k-pebble transducers k-pebble transducers subsume the tree manipulation core of XQuery Application: typechecking output type β Tree transducer T XML query output type Web service A output type α Web service B output type Web service C Need to check: T(α) β Equivalently: α T-1(β) Theorem: T-1(β) is a regular tree language [Milo+Suciu+V.-] Typechecking: 1. Compute from T and β the tree automaton for T-1(β) 2. Check that α T-1(β) Caveat: no data joins Example: application to matching for service composition Web service A output type α Can output of A be restructured to fit the input type β of B? Web service B input type β Key: describe allowed restructurings by a nondeterministic transducer T • Given output I of service A, check whether T(I) β ≠ by constructing a tree automaton for T(I) • If yes, produce as side effect a minimal restructuring of I that satisfies β, witnessing the nonempty intersection Static version: Can every output of A be restructured so as to satisfy the input type of B? • Key: {I / T(I) β ≠ } is regular if T is k-pebble transducer • Enough to check that α {I / T(I) β ≠ } Going all the way: Active XML newspaper XML+ embedded service calls title GetTemp date city “06/10/2003” “Le Monde” GetEvents “Exhibits” “Paris” Y! Materialization: replacing a service call by its result. It’s a recursive process. [Milo,Abiteboul,Amann,Benjelloun,Ngoc – SIGMOD03] Going all the way: Active XML newspaper XML+ embedded service calls title temp date “06/10/2003” GetEvents “Exhibits” “16°C” “Le Monde” T! Materialization: replacing a service call by its result. It’s a recursive process. [Milo,Abiteboul,Amann,Benjelloun,Ngoc – SIGMOD03] Going all the way: Active XML newspaper XML+ embedded service calls title temp date “06/10/2003” “16°C” exhibits GetExhibits City “Le Monde” “Paris” Materialization: replacing a service call by its result. It’s a recursive process. [Milo,Abiteboul,Amann,Benjelloun,Ngoc – SIGMOD03] • Context: peer-to-peer Web services • Each peer – Repository of intensional (AXML) documents – Server: provides Web services (XQuery) – Client: when invoking the embedded service calls Extended type • Restriction on where data and service calls occur in tree newspaper GetEvents title date GetTemp “Exhibits” city “06/10/2003” “Le Monde” “Paris” • Service call input/output signatures • Service call definitions input parameters: XQueries on client data output: XQuery on parameters and server data Basic typechecking problem Given AXML types and service signatures and definitions for all peers, check if: • all AXML documents resulting from calls among peers are valid • all service inputs and outputs satisfy the signatures Can be checked using transducers (if no data joins) Controlling expansion policy by typing newspaper GetEvents title date GetTemp “Exhibits” city “06/10/2003” “Le Monde” “Paris” Y! Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it. Controlling expansion policy by typing newspaper GetEvents title date temp “06/10/2003” “16°C” “Le Monde” Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it. “Exhibits” Controlling expansion policy by typing newspaper Y! GetEvents title date GetTemp “Exhibits” city “06/10/2003” “Le Monde” “Paris” Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it. Controlling expansion policy by typing newspaper Y! GetEvents title date temp “06/10/2003” “16°C” “Le Monde” Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it. “Exhibits” Why control the materialization of calls? • For added functionality, e.g. – Intensional data allows to get up-to-date information. • For security reasons or capabilities, e.g. – – – – I don’t trust this Web service/domain, I don’t have the right credentials to invoke it, It costs money, Maybe the receiver doesn’t know Active XML! • For performance reasons, e.g. – A proxy can invoke services on behalf of a PDA. Example scenario • Client allows only certain service calls, specified by its type • Can server always force its answer to satisfy the clients schema by appropriate expansions? • Game between server and invoked services Example: word case • Client type: regular language R • Input: word a1 ... an each ai represents a Web service call • Output type of service a: regular language Ra • Game: Bob chooses a in the current word, Alice responds with word in Ra to replace a • Bob wins if resulting word is in R Does Bob have a winning strategy on a1 ... an ? Undecidable [Segoufin,Schwentick,Muscholl – STACS’04] even if limited to “context-free games”: Ra consists of just finitely many words Decidable under restrictions [Segoufin,Schwentick,Muscholl – STACS’04] [Milo,Abiteboul,Amann,Benjelloun,Ngoc – SIGMOD’03] complexity: PSPACE to EXPSPACE Mille Grazie!