The Benefits of Exposing Calls and Returns
Rajeev Alur
University of Pennsylvania
CONCUR/SPIN, August 2005

Software Model Checking
[Diagram] Code and Observables -> Abstractor (predicate abstraction) -> Model: control flow graph + Boolean vars (pushdown automata)
[Diagram] Model and Specification (temporal logics/automata: regular!) -> Verifier (on-the-fly explicit state; symbolic fixpoint evaluation) -> Yes / Counter-example

Regular specifications are not expressive enough
Classical Hoare-style pre/post conditions
  If p holds when procedure A is invoked, q holds upon return
  Total correctness: every invocation of A terminates
  Integral part of the emerging standard JML
Stack inspection properties: security/access control
  If a setuid bit is being set, process root must be in the call stack
Inter-procedural data-flow analysis
  An expression e is very busy at a control point p if, on all paths from p, e is used before any of its variables (possibly local to the current procedure) is modified
These need matching of calls with returns, or finding pending calls, or local paths: context-free properties!

Checking Context-free Specifications
Obstacles
  Context-free languages are not closed under intersection
  Checking context-free properties against context-free models is undecidable
However, many such properties are verifiable
  Existing work in security handles some stack inspection properties [JMT99, JKS03]
  Adding assert statements to the program (with additional local variables, if needed) and then checking regular properties (e.g. reachability) amounts to checking context-free properties
  Inter-procedural data-flow analysis algorithms [RHS95]

Exposing Calls and Returns
What is common to the checkable properties?
Both model and property have their own stack, but the two stacks are synchronized and grow/shrink together!
As a generator, the program exposes its calls and returns; as an acceptor, the property pushes on calls and pops on returns
Formalization of this intuition: Visibly Pushdown Languages
A surprisingly robust class of languages, with properties like those of the regular languages and potentially many applications

Talk Outline
Visibly Pushdown Languages
Temporal Logic CaRet and Model Checking
Ongoing Work
References:
  Visibly pushdown languages. Alur, Madhusudan; STOC'04
  A temporal logic of nested calls and returns. Alur, Etessami, Madhusudan; TACAS'04
  Congruences for visibly pushdown languages. Alur, Kumar, Madhusudan, Viswanathan; ICALP'05

Context-free Languages: Recap (1/2)
Given an alphabet Σ, a language L is a set of finite words over Σ
A pushdown automaton (PDA) has a finite control and a stack; while reading a word, it can push/pop stack symbols while updating its control state
Configuration of a PDA: control state + a string of stack symbols
Acceptance is defined by empty stack or final state
A language L is a context-free language (CFL) if there is a pushdown automaton that accepts it
Sample CFLs
  All regular languages
  Set of words of the form a^n b^n, for some n
  Set of words with equal numbers of a and b symbols
Non-CFL: set of words of the form a^n b^n c^n

Context-free Languages: Recap (2/2)
Alternative characterization: context-free grammars
  Natural and popular for defining syntax
Nondeterministic PDAs are more expressive than deterministic ones
Emptiness of a PDA is solvable in polynomial time
Closed under union, but not under intersection or complementation
Language inclusion and emptiness of intersection are undecidable
Applications: parsing, natural language processing, program analysis…

Exposing Calls and Returns
Pushdown alphabet: partitioned into 3 disjoint sets, Σ = Σ_push ∪ Σ_pop ∪ Σ_local
Pushdown words: finite words over a pushdown alphabet Σ
A visibly pushdown automaton over a pushdown alphabet Σ is a pushdown automaton that
  pushes a symbol onto the stack on a symbol in Σ_push
  pops the stack on a symbol in Σ_pop
  cannot change the stack on a symbol in Σ_local
Key: the stack size at any time is determined by the input word alone, not by the control state or stack content

Visibly pushdown languages (VPL)
A language L is a VPL over a pushdown alphabet Σ if there is a visibly pushdown automaton that accepts it (acceptance by final state)
The language {a^n b^n | for some n}
  is a VPL if a is in Σ_push and b is in Σ_pop
  is not a VPL for other partitions
The language of words with equal numbers of a and b symbols is not a VPL (independent of partition)
Every regular language L is a VPL, independent of partitioning
The Dyck language (words with well-balanced parentheses) is a VPL, provided left/right parentheses are in Σ_push/Σ_pop respectively

VPLs in Program Analysis
Program
  bool P(u: int) {
    global int x;
    local int y;
    …
    a: if Q { x = (x+y) };
    …
  }
  bool Q {
    local int y;
    if { …. y++; return 1; }
    else return P(x)
  }
Analysis
  To decide whether the expression e = (x+y) is very busy at program point a, take
    Σ_push = {call-p, call-q}
    Σ_pop = {ret-p, ret-q}
    Σ_local = {used-e, mod-x, mod-y, skip}
  Executions are pushdown words, e.g. call-q, skip, mod-y, ret-q, used-e, mod-x, skip, ret-p
  The set of executions starting at location a is a VPL: L_a
  The set of executions in which e is very busy is also a VPL: L_e
  e is very busy at a if L_a is included in L_e

VPLs for Document Processing
XML Document
  <conference>
    <name> CONCUR 2005 </name>
    <location> <city> San Francisco </city> <hotel> Stanford Court </hotel> </location>
    <sponsor> CISCO </sponsor>
    <sponsor> Microsoft </sponsor>
    …
  </conference>
Query Processing
  Pushdown alphabet
    Σ_push = {<name>, <location>, …}
    Σ_pop = {</name>, </location>, …}
    Σ_local = {San Francisco, Microsoft, …}
  A document d is a pushdown word
  Sample query: find documents related to conferences sponsored by Microsoft in San Francisco
  Specify the query as a VPL: L
  Analysis: membership question. Does document d satisfy query L?
  Use VPAs instead of tree automata!
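
The definition of a visibly pushdown automaton can be made concrete with a small simulator. The Python sketch below is illustrative only: the class name `VPA`, the layout of its transition tables, the helper `stack_heights`, and the example automaton `ANBN` for a^n b^n (taken here with n ≥ 0, an assumption) are hypothetical and not part of the talk. It shows that the kind of stack operation is dictated by the input symbol, so the stack height depends only on the word and the partition.

```python
BOTTOM = "$"  # stack-bottom marker; it is read but never removed


class VPA:
    """Deterministic visibly pushdown automaton (illustrative sketch)."""

    def __init__(self, push, pop, local, start, finals, d_push, d_pop, d_local):
        self.push, self.pop, self.local = set(push), set(pop), set(local)
        self.start, self.finals = start, set(finals)
        self.d_push = d_push    # (state, symbol) -> (state, stack symbol)
        self.d_pop = d_pop      # (state, symbol, stack top) -> state
        self.d_local = d_local  # (state, symbol) -> state

    def accepts(self, word):
        state, stack = self.start, [BOTTOM]
        for sym in word:
            try:
                if sym in self.push:        # must push exactly one symbol
                    state, gamma = self.d_push[(state, sym)]
                    stack.append(gamma)
                elif sym in self.pop:       # must pop (bottom marker stays)
                    top = stack[-1]
                    state = self.d_pop[(state, sym, top)]
                    if top != BOTTOM:
                        stack.pop()
                elif sym in self.local:     # may not touch the stack
                    state = self.d_local[(state, sym)]
                else:
                    return False            # symbol outside the alphabet
            except KeyError:
                return False                # missing transition: reject
        return state in self.finals         # acceptance by final state


def stack_heights(word, push, pop):
    """Stack height after each symbol, computed from the word alone."""
    heights, h = [], 0
    for sym in word:
        if sym in push:
            h += 1
        elif sym in pop:
            h = max(h - 1, 0)
        heights.append(h)
    return heights


# a^n b^n with a in the push set and b in the pop set.
# "Z" marks the first a, so the automaton can see when the stack empties.
ANBN = VPA(
    push={"a"}, pop={"b"}, local=set(),
    start="q0", finals={"q0", "qf"},
    d_push={("q0", "a"): ("qa", "Z"), ("qa", "a"): ("qa", "A")},
    d_pop={("qa", "b", "A"): "qb", ("qa", "b", "Z"): "qf",
           ("qb", "b", "A"): "qb", ("qb", "b", "Z"): "qf"},
    d_local={},
)

run = "call-q skip mod-y ret-q used-e mod-x skip ret-p".split()
print(ANBN.accepts("aabb"), ANBN.accepts("aab"))
print(stack_heights(run, {"call-p", "call-q"}, {"ret-p", "ret-q"}))
```

The last line applies `stack_heights` to the sample execution from the program-analysis slide; the final `ret-p` is a pending return, so the height stays at zero.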
(Typically there is no recursion, only hierarchy.)

Closure Properties
Note: languages over different partitions cannot be combined
Closed under intersection: given two VPAs A and B, build a product C accepting the intersection of L(A) and L(B)
  State of C: (state of A, state of B)
  Stack symbol of C: (stack symbol of A, stack symbol of B)
  C can simulate the stacks of A and B together
Closed under union
Closed under complementation
Closed under concatenation and Kleene-*
Closed under partition-preserving homomorphisms

Determinization
Given a nondeterministic VPA A, we can construct a deterministic VPA B that accepts the same language and has size exponential in that of A
Potentially useful for building runtime monitors for checking program executions, and online algorithms for XML query processing
VPLs are a subclass of DCFLs (languages defined by deterministic PDAs)
  DCFLs are not closed under union
  The equivalence problem for DCFLs is decidable, but complex

Determinization: Sketch of the construction
Determinization of nondeterministic finite automata uses the subset construction: a state R of B is a set of states of A (the states that A can be in, having read the word w so far)
The subset construction does not apply to the stack
But we can take subsets of summaries: if w is a well-matched word, (q,q') is a summary of A on w if A can go from (q,$) to (q',$), where $ is the stack bottom
More precisely, if w = w_1 c_1 w_2 c_2 … c_n w_{n+1}, where the c_i are calls and the w_i are well-matched words, then after reading w the determinized automaton B has
  Stack: (S_n, R_n, c_n) … (S_1, R_1, c_1) $
  Control state: (S_{n+1}, R_{n+1})
  R_i = set of all states A can be in after w_1 c_1 … w_i
  S_i = set of all summaries of A on the segment w_i

Decision Problems
Emptiness: given a VPA A, is its language empty?
  Same as for PDAs: polynomial-time complete (cubic)
Language inclusion (or equivalence): given VPAs A and B, is the language of A contained in that of B?
  Determinize B, take its complement, take the product with A, and test for emptiness
  Exponential-time complete
Recall: inclusion is PSPACE-complete for (nondeterministic) finite automata, and undecidable for PDAs

VPL Properties Summary
Class     Union  Intersection  Complement  Emptiness  Inclusion
Regular   Yes    Yes           Yes         NLOG       PSPACE
CFL       Yes    No            No          PTIME      Undecidable
DCFL      No     No            Yes         PTIME      Undecidable
VPL       Yes    Yes           Yes         PTIME      EXPTIME

Pushdown Words as Binary Trees
Let w = i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3
[Figure: the stack tree corresponding to w]

VPL: Connection to tree languages
Tree-language characterization: let L be a set of pushdown words and let ST(L) be the set of stack trees that correspond to L
Theorem: L is a VPL iff ST(L) is a regular tree language
Note: it is well known that the set of parse trees corresponding to a context-free grammar is a regular tree language

Finite word automata that can jump
Let w = i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3
Summary Automata
  Finite-state automaton that reads a pushdown word
  While reading a call, it can send a copy to the matching return
  δ(q,a) is a set of pairs of states if a is in Σ_push
Nondeterministic summary automata are expressively equivalent to VPAs
Deterministic VPA (= VPL) > Deterministic summary automata > Deterministic tree automata (on stack trees)

Robustness: Alternative Characterizations
Monadic second-order logic with a matching predicate
  m(x,y) means x is a call and y is the matching return
  Sample formula: forall x. if p(x) then exists y,z. ( q(y) and x<y<z and m(x,z) )
Thm: MSO + matching predicate, interpreted over pushdown words, is expressively equivalent to VPLs
Thm: Every CFL is a homomorphic image of a VPL
Context-free grammar based characterization
  Two types of non-terminals: V0 (matched words) and V1
  All productions are of the form
    X → a: if X is in V0 then a must be local
    X → aYbZ: a is a call, b is a return, Y is in V0; if X is in V0 then Z must be in V0

“Regular-like” properties continue…
Congruences and minimization (Myhill-Nerode theorem) are central to the theory of regular languages
Given a language L, for well-matched words u and v, define u ~L v iff for all words x and y, xuy is in L iff xvy is in L
Theorem: a language L of well-matched words is a VPL iff the congruence ~L is of finite index
Minimization
  No unique minimal deterministic VPA in general, but…
  Minimization of (single-entry) RSMs (i.e. procedural Boolean programs) is possible
  Partitioning into k procedures/modules is adequate to get canonicity!

ω-VPL: Extension to Infinite Words
A Büchi VPA: a VPA over infinite pushdown words
A word is accepted if, along a run, the set F is seen infinitely often
ω-VPL: the class of languages accepted by Büchi VPAs
ω-VPL is closed under all Boolean operations
The characterizations using regular trees and MSO still hold
However, ω-VPLs are not determinizable!
  Let L be the set of all words such that the stack is repeatedly bounded, i.e. for some n, the stack depth is n infinitely often
  L is an ω-VPL, but there is no deterministic (Muller) VPA for it
Language inclusion and equivalence are still decidable

Software Model Checking
[Diagram] Code and Observables -> Abstractor (predicate abstraction) -> Model: control flow graph + Boolean vars (pushdown automata)
[Diagram] Model and Specification (CaRet/VPAs) -> Verifier -> Yes / Counter-example

Abstracting Software
Program
  int x, y;
  if x>0 { ……. y=x+1 .…… }
  else { …… y=x+1 …… }
Predicates
  bx: x>0
  by: y>0
Boolean Program
  bool bx, by;
  if bx { ……… by=true ……… }
  else { ………… by={true,false} ………. }

Abstracting Modular Programs
Program
  main() { bool y; … x = P(y); … z = P(x); … }
  bool P(u: bool) { … return Q(u); }
  bool Q(w: bool) { if … else return P(~w) }
Recursive State Machine (RSM) / Pushdown automaton
  [Figure: modules A1, A2, A3 with entries (inputs), exits (outputs), and boxes (function calls)]

Linear-time Propositional Temporal Logic
Q ::= p | not Q | Q or Q' | Next Q | Always Q | Eventually Q | Q Until Q'
Interpreted over (infinite) sequences
The models of an LTL formula form an ω-regular language
Useful for stating sequencing properties
  If req happens, then req holds until it is granted: Always ( req → (req Until grant) )
  An exception is never raised: Always ( not Exception )

CARET
CARET: A temporal logic for Calls and Returns
Expresses context-free properties
[Figure: a run through modules A, B, C, showing the global path, the local path, and the caller path]
Caller path gives the stack content!
Global successor: used by LTL
Abstract (local) successor: jump from calls to returns; otherwise the global successor at the same level
Caller modality: jump to the caller of the current module; defined for every position except top-level ones

CARET Definition
Syntax:
  Q ::= p | not Q | Q or Q'
      | Next Q | Always Q | Eventually Q | Q Until Q'
      | Local-Next Q | Local-Always Q | Local-Eventually Q | Q Local-Until Q'
      | Caller Q | CallerPath-Always Q | CallerPath-Eventually Q | Q CallerPath-Until Q'
Local- and Caller- versions of all temporal operators
All these operators can be nested

Expressing properties in Caret
Pre-post conditions: if P holds when A is called, then Q must hold when the call returns
  Always ( (P and call-to-A) → Local-Next Q )
Integrating Manna/Pnueli-style reasoning for reactive computations with Hoare-style reasoning for structured programs

Expressing properties in Caret
If A is called with low priority, then it cannot access the file
  Always ( (call-to-A and low-priority) → Local-Always ( not access-file ) )
[Figure labels: A low-priority, A high-priority, access-file]

Expressing properties in Caret
Stack inspection properties
  If a variable x is accessed, then A must be on the call stack
  Always ( access-to-x → CallerPath-Eventually call-to-A )

Model checking CARET
Given: a (Boolean) recursive state machine / visibly pushdown automaton M, and a CARET formula Q
Model checking: do all runs of M satisfy the specification Q?
CARET can be model-checked in time polynomial in M and exponential in Q: |M|^3 · 2^O(|Q|)
The complexity class is the same as that for LTL!
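
The three successor notions behind the CARET operators (global, abstract/local, and caller) can be computed for a finite prefix of a run in one pass with a stack. The sketch below is an illustration under stated assumptions: the function name `successors`, the tags "call"/"ret"/"int", and the use of `None` for an undefined successor are hypothetical choices, and the abstract successor of a position directly before a return is treated as undefined.

```python
def successors(tags):
    """Return (global, abstract, caller) successor lists for a tagged word.

    tags[i] is "call", "ret", or "int" (internal). None means "undefined".
    """
    n = len(tags)
    match = [None] * n    # matching return position of each call
    caller = [None] * n   # innermost pending call enclosing position i
    open_calls = []       # positions of calls not yet returned
    for i, tag in enumerate(tags):
        if tag == "ret" and open_calls:
            match[open_calls.pop()] = i
        # A call or a return belongs to the level of its caller.
        caller[i] = open_calls[-1] if open_calls else None
        if tag == "call":
            open_calls.append(i)

    glob = [i + 1 if i + 1 < n else None for i in range(n)]

    abstract = []
    for i, tag in enumerate(tags):
        if tag == "call":
            abstract.append(match[i])      # jump to the matching return
        elif i + 1 < n and tags[i + 1] != "ret":
            abstract.append(i + 1)         # same level: global successor
        else:
            abstract.append(None)          # the current module ends here
    return glob, abstract, caller


# The sample word from the talk; the first letter of each symbol gives its kind.
word = "i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3".split()
kind = {"i": "int", "c": "call", "r": "ret"}
tags = [kind[s[0]] for s in word]
glob, abstract, caller = successors(tags)

# Follow the local path from position 0: it skips over the whole call to c1.
path, pos = [], 0
while pos is not None:
    path.append(pos)
    pos = abstract[pos]
print(path)
```

Following `caller` repeatedly from a position yields the pending calls, which is the stack content at that position.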
The algorithm is a generalization of the Vardi-Wolper construction

Model-checking CARET: Intuition
The specification matches calls and returns of the program, so the runs of the program and the models of the formula are both visibly pushdown languages
Given M and a formula Q, build a Büchi pushdown automaton that accepts the words exhibited by M that satisfy (not Q)
Check this pushdown automaton for emptiness
The construction builds on the classical tableaux for LTL
[Figure: handling Local-Next Q1 in state s. At the call, push s and Q1; at the matching return, pop s and Q1, then check Q1]

Conclusions and Ongoing Work
Exposing calls and returns lets you hide the stack!
VPLs seem robust and adequate to model software analysis problems
VPL-triggered research
  Dynamic logic with VPL (Löding, Serre)
  Visibly pushdown games (Löding, Madhusudan, Serre)
  XML query processing (Pitcher)
  Third-order Algol with iteration (Murawski, Walukiewicz)
Active area of current research
  DTDs, XML, and query languages
  Branching-time logics, fixpoint calculus, and visibly pushdown tree automata (Alur, Chaudhuri, Madhusudan)
  Expressive completeness of temporal operators
  Implementing a model checker for VPL monitors
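
As a small illustration of what a VPL monitor could look like, the stack-inspection property from the CARET examples (if x is accessed, then A must be on the call stack) can be checked at run time by a monitor that pushes on calls and pops on returns. This is a hedged sketch, not the model checker mentioned above: the event encoding `("call", name)`, `("ret", name)`, `("access", var)` and the function name `check_stack_inspection` are assumptions made for this example.

```python
def check_stack_inspection(events, guarded_var="x", required="A"):
    """Return the index of the first violating event, or None if there is none.

    A violation is an access to `guarded_var` while no activation of
    `required` is on the call stack.
    """
    stack = []          # names of the currently active procedures
    active = 0          # how many activations of `required` are on the stack
    for i, (kind, name) in enumerate(events):
        if kind == "call":              # push on calls
            stack.append(name)
            if name == required:
                active += 1
        elif kind == "ret":             # pop on returns
            if stack and stack.pop() == required:
                active -= 1
        elif kind == "access":          # local event: stack is untouched
            if name == guarded_var and active == 0:
                return i
    return None


ok_run = [("call", "A"), ("call", "B"), ("access", "x"),
          ("ret", "B"), ("ret", "A")]
bad_run = ok_run + [("access", "x")]   # x is accessed after A has returned

print(check_stack_inspection(ok_run))   # no violation
print(check_stack_inspection(bad_run))  # index of the offending access
```

Because the monitor's stack moves in lockstep with the program's calls and returns, it never needs to guess when to push or pop, which is the synchronization the talk relies on.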