Model Checking XML Manipulating Software
Xiang Fu, Tevfik Bultan, Jianwen Su
Department of Computer Science
University of California, Santa Barbara
{fuxiang,bultan,su}@cs.ucsb.edu

Web Services
[Figure: web service standards stack (layers Data, Type, Message, Service, Interaction, Composition; standards XML, XML Schema, SOAP, WSDL, WSCI, BPEL4WS) on top of the implementation platforms Microsoft .Net and Sun J2EE]
• Loosely coupled, interaction through standardized interfaces
• Standardized data transmission via XML
• Asynchronous messaging
• Platform independent (.NET, J2EE)

Outline
• An Example: Stock Analysis Service
• Capturing Global Behaviors
  – Conversations, Conversation Protocols
• Web Service Analysis Tool
• XML Messaging
  – XML data, MSL types, XPath expressions
• Model Checking Conversation Protocols
  – Translation to Promela
• Conclusions and Future Work

An Example: Stock Analysis Service (SAS)
• SAS is a composite web service
  – a finite set of peers: Investor (Inv), Stock Broker (SB), and Research Department (RD)
  – and a finite set of message classes: register, ack, cancel, accept, ...
[Figure: the three peers Investor (Inv), Stock Broker (SB), and Research Dept. (RD), connected by message arrows labeled report; register; ack, cancel; accept, reject, bill; request, terminate]

Communication Model
• We assume that the messages among the peers are exchanged through reliable and asynchronous messaging
  – FIFO and unbounded message queues
[Figure: a message queue holding req messages between Stock Broker (SB) and Research Dept. (RD)]
• This model is similar to industry efforts such as
  – JMS (Java Message Service)
  – MSMQ (Microsoft Message Queuing Service)

Conversations
• A virtual watcher records the messages as they are sent
[Figure: a watcher observes the messages exchanged among Investor (Inv), Stock Broker (SB), and Research Dept. (RD) and records the sequence reg acc req rep ack bil ter]
• A conversation is a sequence of messages the watcher sees during an execution

Conversation Protocols
• Conversation Protocol: an automaton that accepts the desired conversation set
[Figure: SAS conversation protocol, an automaton with states 1 to 12 and transitions labeled register, accept, reject, request, report, ack, cancel, bill, terminate]

Properties of Conversations
• The notion of conversation enables us to reason about temporal properties of composite web services
• The LTL framework extends naturally to conversations
  – LTL temporal operators: X (neXt), U (Until), G (Globally), F (Future)
  – Atomic properties: predicates on message classes (or contents)
  – Example: G ( accept → F bill )
• Model checking problem: given an LTL property, does the conversation set satisfy the property?

Web Service Analysis Tool (WSAT)
[Figure: WSAT architecture, from web services through an intermediate representation to verification languages]
• Front End
  – BPEL (bottom-up): BPEL to GFSA, producing guarded automata
  – Conversation Protocol (top-down): GFSA parser, producing a guarded automaton
• Analysis
  – Synchronizability Analysis, Realizability Analysis (outcomes: success, fail, skip)
• Back End (output: Promela)
  – GFSA to Promela (synchronous communication)
  – GFSA to Promela (bounded queue)
  – GFSA to Promela (single process, no communication)
• Friday 4:00pm, tool presentation at CAV
• Demonstration Saturday (or anytime you find me with my laptop)

SAS Guarded Automata
  Topdown {
    Schema{
      PeerList{ Investor, Broker, ResearchDept },
      TypeList{
        Register ...
        Accept ...
      },
      MessageList{
        register{ Investor -> Broker : Register },
        accept{ Broker -> Investor : Accept },
        ...
      }
    },
    GProtocol{
      States{ s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12 },
      InitialState{ s1 },
      FinalStates{ s4 },
      TransitionRelation{
        t1{ s1 -> s2 : register, Guard{ true } },
        t2{ s2 -> s5 : accept,
            Guard{ true => $accept[//orderID := $register//orderID] } },
        ...
      }
    }
  }

XML (eXtensible Markup Language)
• XML is a markup language like HTML
• As in HTML, XML tags are written as <tag> followed by </tag>
• HTML vs. XML
  – In HTML, tags are used to describe the appearance of the data: <b> </b> <i> </i> ...
  – In XML, tags are used to describe the content of the data rather than the appearance: <date> </date> <address> </address>
• XML documents can be modeled as trees where each internal node corresponds to a tag, and leaf nodes correspond to basic types

An XML Document and Its Tree
  <Register>
    <investorID> VIP01 </investorID>
    <requestList>
      <stockID> 0001 </stockID>
      <stockID> 0002 </stockID>
    </requestList>
    <payment>
      <accountNum> 0425 </accountNum>
    </payment>
  </Register>

  Register
    investorID
      VIP01
    requestList
      stockID
        0001
      stockID
        0002
    payment
      accountNum
        0425

MSL (Model Schema Language)
• MSL is a language for defining XML data types
  – MSL captures core features of XML Schema
• Basic MSL syntax
    g → ε | b | t[g] | g{m,n} | g,g | g&g | g|g
  – g is an XML type (i.e., an MSL type expression)
  – ε is the empty sequence
  – b is a basic type such as string, boolean, int, etc.
  – t is a tag
  – m and n are positive integers
  – [ ] { } & , | are MSL type constructors

MSL Semantics
• t[g] denotes a type with root node labeled t with children of type g
• g{m,n} denotes a sequence of size at least m and at most n where each member is of type g
• g1 , g2 denotes an ordered sequence where the first member is of type g1 and the second member is of type g2
• g1 & g2 denotes an unordered sequence where one member is of type g1 and the other member is of type g2
• g1 | g2 denotes a choice between type g1 and type g2, i.e., either type g1 or type g2, but not both

An MSL Type Declaration and an Instance
  Register[
    investorID[string] ,
    requestList[ stockID[int]{1,3} ] ,
    payment[ creditCardNum[int] | accountNum[int] ]
  ]

  <Register>
    <investorID> VIP01 </investorID>
    <requestList>
      <stockID> 0001 </stockID>
      <stockID> 0002 </stockID>
    </requestList>
    <payment>
      <accountNum> 0425 </accountNum>
    </payment>
  </Register>

Mapping MSL types to Promela
• Restrictions: no unbounded or unordered sequences, no string manipulation
• Basic types
  – integer and boolean types are mapped to the Promela basic types int and bool
  – strings are mapped to an enumerated type (mtype) in Promela; only constant string values are allowed
• Type constructors are handled using
  – structured types (declared using typedef) in Promela
  – or arrays

Example
  Register[
    investorID[string] ,
    requestList[ stockID[int]{1,3} ] ,
    payment[ creditCardNum[int] | accountNum[int] ]
  ]

  typedef t1_investorID{ mtype stringvalue;}
  typedef t2_stockID{int intvalue;}
  typedef t3_requestList{
    t2_stockID stockID [3];
    int stockID_occ;
  }
  typedef t4_accountNum{int intvalue;}
  typedef t5_creditCard{int intvalue;}
  mtype {m_accountNum, m_creditCard}
  typedef t6_payment{
    t4_accountNum accountNum;
    t5_creditCard creditCard;
    mtype choice;
  }
  typedef Register{
    t1_investorID investorID;
    t3_requestList requestList;
    t6_payment payment;
  }

XPath
• In order to write specifications or programs that manipulate XML documents we need
  – an expression language to access values and nodes in XML documents
• XPath is a language for writing expressions (queries) that navigate through XML trees and return a set of answer nodes
• An XPath query defines a function which
  – takes an XML tree and a context node (in the same tree) as input and
  – returns a set of nodes (in the same tree) as output

XPath Syntax
• Basic XPath syntax
    q → . | .. | b | t | * | q / q | q // q | q [ exp ]
  – q is an XPath query
  – exp denotes a predicate on basic types, i.e., on the leaf nodes of the XML tree
  – b denotes a basic type such as string, boolean, int, etc.
  – t denotes a tag

XPath Semantics
XPath expressions are evaluated from left to right
• Given an XML tree and a node n as a context node
  – . returns n
  – .. returns the parent of n
• Given an XML tree and a set of nodes
  – * returns all the nodes
  – b returns the nodes that are of basic type b
  – t returns the nodes which are labeled with tag t

XPath Semantics Contd.
Starting at the context node:
• q1 / q2 returns each node which matches q2 starting at a child of a node which matches q1
• q1 // q2 returns each node which matches q2 starting at a descendant of a node which matches q1 (if q1 is missing, then start at the root)
• q [ exp ] returns the nodes that match q and have children for which exp evaluates to true

Examples
  Register
    investorID
      VIP01
    requestList
      stockID
        0001
      stockID
        0002
    payment
      accountNum
        0425
• //payment/* returns the node labeled accountNum
• /Register/requestList/stockID/int returns the nodes labeled 0001 and 0002
• //stockID[int > 1]/int returns the node labeled 0002

XPath to Promela
• Generate code that evaluates the XPath expression
  – Restrictions: no ancestor axis, no string expressions
• Uses two data structures
  – Type tree: shows the structure of the corresponding MSL type
  – Abstract statements: mapped to Promela code
• Traverse the XPath expression from left to right
  – Statements generated in each step are inserted into the BLANK spaces left in the code from the previous step
  – The type tree is used to keep track of the context of the generated code

  Statement     Promela Code
  IF(c)         if :: c -> BLANK :: else -> skip fi
  FOR(v,l,h)    v = l - 1; do :: v < h -> BLANK; v++ :: else -> break od
  EMPTY         BLANK
  INC(v)        v++
  SET(v,a)      v = a

Type Tree
  Register[
    investorID[string] &
    requestList[ stockID[int]{1,3} ] &
    payment[ creditCardNum[int] | accountNum[int] ]
  ]
[Figure: type tree with nodes numbered 1 to 11: Register at the root; children investorID (string), requestList (stockID with index i1, int), and payment (creditCard int, accountNum int)]
[Figure: abstract statements generated step by step for $register // stockID / [int()>5] / [position() = last()] / int(), built from SET, FOR (i1,1,3), IF, INC, and EMPTY statements and combined by Sequence and Insert operations; cond is v_register.requestlist.stockID[i1] > 5]

$request//stockID=$register//stockID[int()>5][position()=last()]
  /* result of the XPath expression */
  bool bResult = false;
  /* results of the predicates 1, 2, and 1 resp. */
  bool bRes1, bRes2, bRes3;
  /* index, position(), last(), index, position() */
  int i1, i2, i3, i4, i5;

  i2=1;
  /* pre-calculate the value of last(), store in i3 */
  i4=0; i5=1; i3=0;
  do
  :: i4 < v_register.requestList.stockID_occ ->
       /* compute first predicate */
       bRes3 = false;
       if
       :: v_register.requestList.stockID[i4].intvalue>5 -> bRes3 = true
       :: else -> skip
       fi;
       if
       :: bRes3 -> i5++; i3++;
       :: else -> skip
       fi;
       i4++;
  :: else -> break;
  od;

  i1=0;
  do
  :: i1 < v_register.requestList.stockID_occ ->
       bRes1 = false;
       if
       :: v_register.requestList.stockID[i1].intvalue>5 -> bRes1 = true
       :: else -> skip
       fi;
       if
       :: bRes1 ->
            bRes2 = false;
            if
            :: (i2 == i3) -> bRes2 = true;
            :: else -> skip
            fi;
            if
            :: bRes2 ->
                 if
                 :: (v_request.stockID.intvalue == v_register.requestList.stockID[i1].intvalue) -> bResult = true;
                 :: else -> skip
                 fi
            :: else -> skip
            fi;
            i2++;
       :: else -> skip
       fi;
       i1++;
  :: else -> break;
  od;

Model Checking Using Promela
• Error in the SAS conversation protocol
    t14{ s8 -> s12 : bill,
         Guard{ $request//stockID = $register//stockID [position() = last()]
                => $bill[ //orderID := $register//orderID ] } }
• Repeated stockID values will cause an error
• One can only discover these kinds of errors by analyzing the XPath expressions

Related Work
• Verification of web services
  – Simulation, verification, composition of web services using a Petri net model [Narayanan, McIlraith WWW’02]
  – Using MSC to model BPEL web services which are translated to labeled transition systems and verified using model checking [Foster, Uchitel, Magee, Kramer ASE’03]
  – Model checking Web Service Flow Language specifications using SPIN [Nakajima ICWE’04]
  – BPEL verification using a process algebra model and Concurrency Workbench [Koshkina, van Breugel TAVWEB’04]
• Conversation specification
  – IBM Conversation support project http://www.research.ibm.com/convsupport/
  – Conversation support for business process integration [Hanson, Nandi, Kumaran EDOC’02]
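
Sketch (not part of WSAT): the guard of t14 compares stockID values, not positions, so it cannot tell the last stockID from an earlier one with the same value. The Python below illustrates this under one assumption that the slides do not state: the broker sends one request per stockID, in list order. The function names are hypothetical and chosen only for this illustration.

```python
from typing import List


def last_stock_id(register_stock_ids: List[int]) -> int:
    """Value selected by $register//stockID[position() = last()]."""
    return register_stock_ids[-1]


def bill_guard(request_stock_id: int, register_stock_ids: List[int]) -> bool:
    """Guard of t14: $request//stockID = $register//stockID[position() = last()]."""
    return request_stock_id == last_stock_id(register_stock_ids)


def requests_until_guard_holds(register_stock_ids: List[int]) -> int:
    """Count the requests sent before the guard of t14 first becomes true.

    Assumption (not from the slides): one request per stockID, in list order.
    """
    if not register_stock_ids:
        # The MSL type stockID[int]{1,3} requires at least one stockID.
        raise ValueError("the Register type requires at least one stockID")
    for sent, stock_id in enumerate(register_stock_ids, start=1):
        if bill_guard(stock_id, register_stock_ids):
            return sent
    raise AssertionError("unreachable: the last stockID always satisfies the guard")


if __name__ == "__main__":
    print(requests_until_guard_holds([1, 2, 3]))  # distinct stockIDs
    print(requests_until_guard_holds([7, 9, 7]))  # first and last stockID repeat
```

With distinct stockIDs the guard first holds after the third request; with the list 7, 9, 7 it already holds after the first request, while two stockIDs are still unprocessed. This is one plausible reading of the error reported on the slide, not a reproduction of the SPIN counterexample.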

Future Work
• Other input languages in the front end
  – WSCI, OWL-S
• Other verification tools at the back end
  – SMV, Action Language Verifier
• Symbolic representations for XML data
• Abstraction for XML data and XML data manipulation

Current and Future Work
[Figure: extended WSAT architecture, from web service specification languages through an intermediate representation to verification languages]
• Front End: web service specification languages (BPEL, Conversation Protocols, WSCI, ...)
  – Translator for bottom-up specifications, producing guarded automata
  – Translator for top-down specifications, producing a guarded automaton
• Analysis
  – Automated Abstraction
  – Synchronizability Analysis, Realizability Analysis (outcomes: success, fail, skip)
• Back End: verification languages (Promela, Action Language, SMV, ...)
  – Translation with synchronous communication
  – Translation with bounded queue
  – Translation with single process, no communication
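
Backup sketch (not part of WSAT): the property G ( accept → F bill ) from the Properties of Conversations slide can be checked on a single recorded conversation. The Python below assumes a finite-trace reading of the LTL operators, which is a simplification: the tool itself checks the whole conversation set through Promela and SPIN. The function name is hypothetical.

```python
from typing import Sequence


def always_eventually_followed(
    conversation: Sequence[str], trigger: str, response: str
) -> bool:
    """Finite-trace reading of G(trigger -> F response).

    Every occurrence of `trigger` must be followed, at the same or a later
    position, by an occurrence of `response`.
    """
    return all(
        response in conversation[i:]
        for i, message in enumerate(conversation)
        if message == trigger
    )


if __name__ == "__main__":
    # The conversation recorded by the watcher: reg acc req rep ack bil ter.
    recorded = ["register", "accept", "request", "report", "ack", "bill", "terminate"]
    print(always_eventually_followed(recorded, "accept", "bill"))

    # A prefix that stops before any bill message is sent.
    print(always_eventually_followed(recorded[:3], "accept", "bill"))
```

The recorded conversation satisfies the property, while its three-message prefix does not, because accept occurs and no bill follows. A conversation without accept, such as register followed by reject, satisfies the property vacuously.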