The Benefits of Exposing Calls and Returns Rajeev Alur University of Pennsylvania

CONCUR/SPIN, August 2005
Software Model Checking
[Diagram: Code and Observables go into an Abstractor (predicate abstraction), which produces a Model: a control flow graph + Boolean vars (pushdown automata). A Verifier (on-the-fly explicit state or symbolic fixpoint evaluation) checks the Model against a Specification given as temporal logics/automata (regular!) and answers Yes or returns a Counter-example.]
Regular specifications not expressive enough
• Classical Hoare-style pre/post conditions
  - If p holds when procedure A is invoked, q holds upon return
  - Total correctness: every invocation of A terminates
  - Integral part of emerging standard JML
• Stack inspection properties: security/access control
  - If a setuid bit is being set, process root must be in the call stack
• Inter-procedural data-flow analysis
  - An expression e is very busy at a control point p if on all paths from p, e will be used before any of its variables (possibly local in current procedure) are modified
These need matching of calls with returns, finding pending calls, or local paths: context-free properties!
Checking Context-free Specifications
• Obstacles
  - Context-free languages are not closed under intersection
  - Checking context-free properties against context-free models is undecidable
• However, many such properties are verifiable
  - Existing work in security that handles some stack inspection properties [JMT99,JKS03]
  - Adding assert statements in the program (with additional local variables, if needed), and then checking regular properties (e.g. reachability) amounts to checking context-free properties
  - Inter-procedural data-flow analysis algorithms [RHS95]
Exposing Calls and Returns
• What’s common to the checkable properties?
  - Both model and property have their own stack, but the two stacks are synchronized and grow/shrink together!
• As a generator, the program exposes its calls and returns, and as an acceptor, the property pushes on calls and pops on returns
Formalization of this intuition: Visibly Pushdown Languages
A surprisingly robust class of languages, with properties like those of the regular languages and potentially many applications
Talk Outline
• Visibly Pushdown Languages
• Temporal Logic CaRet and Model Checking
• Ongoing Work
References:
• Visibly pushdown languages. Alur, Madhusudan; STOC’04
• A temporal logic of nested calls and returns. Alur, Etessami, Madhusudan; TACAS’04
• Congruences for visibly pushdown languages. Alur, Kumar, Madhusudan, Viswanathan; ICALP’05
Context-free Languages: Recap (1/2)
• Given an alphabet Σ, a language L is a set of finite words over Σ
• A pushdown automaton (PDA) has a finite control and a stack, and while reading a word, it can push/pop stack symbols while updating control state
• Configuration of a PDA: control state + a string of stack symbols
• Acceptance defined by empty stack or final state
• A language L is a context-free language (CFL) if there is a pushdown automaton that accepts it
• Sample CFLs
  - All regular languages
  - Set of words of the form a^n b^n, for some n
  - Set of words with equal number of a and b symbols
• Non-CFL: Set of words of the form a^n b^n c^n
Context-free Languages: Recap (2/2)
• Alternative characterization: Context-free grammars
  - Natural and popular for defining syntax
• Nondeterministic PDAs are more expressive than deterministic ones
• Emptiness of a PDA solvable in polynomial-time
• Closed under union, but not closed under intersection or complementation
• Language inclusion, emptiness of intersection undecidable
• Applications: Parsing, Natural language processing, Program analysis…
Exposing Calls and Returns
• Pushdown alphabet: partitioned into 3 disjoint sets
  Σ = Σ_push ∪ Σ_pop ∪ Σ_local
• Pushdown words: finite words over a pushdown alphabet Σ
• A visibly pushdown automaton over a pushdown alphabet Σ is a pushdown automaton that
  - pushes a symbol onto the stack on a symbol in Σ_push
  - pops the stack on a symbol in Σ_pop
  - cannot change the stack on a symbol in Σ_local
Key: The stack size at any time is determined by the input word alone, not by the control state or stack content
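The definition above can be made concrete with a small simulator. This is a minimal sketch, not the construction from the talk: the dictionary encoding, the state names (`s`, `p`, `r`, `f`) and the stack symbols (`Z`, `A`) are assumptions made for illustration, and for simplicity a return read on an empty stack is rejected. The example automaton accepts a^n b^n with a in Σ_push and b in Σ_pop.

```python
def run_vpa(vpa, word):
    """Run a deterministic VPA (hypothetical dict encoding) on a word.

    The kind of each symbol (push / pop / local) alone decides what
    happens to the stack, which is the 'visibly' in visibly pushdown.
    """
    state, stack = vpa["start"], []
    for sym in word:
        if sym in vpa["push"]:
            step = vpa["call"].get((state, sym))
            if step is None:
                return False
            state, gamma = step
            stack.append(gamma)            # always exactly one push
        elif sym in vpa["pop"]:
            if not stack:                  # simplification: reject unmatched returns
                return False
            state = vpa["ret"].get((state, sym, stack.pop()))
            if state is None:
                return False
        else:
            state = vpa["local"].get((state, sym))  # stack untouched
            if state is None:
                return False
    return state in vpa["finals"]


# a^n b^n: 'Z' marks the first pushed symbol so that the automaton
# knows (by final state) when the stack has just become empty.
ANBN = {
    "push": {"a"}, "pop": {"b"},
    "start": "s", "finals": {"s", "f"},
    "call": {("s", "a"): ("p", "Z"), ("p", "a"): ("p", "A")},
    "ret": {("p", "b", "A"): "r", ("p", "b", "Z"): "f",
            ("r", "b", "A"): "r", ("r", "b", "Z"): "f"},
    "local": {},
}

if __name__ == "__main__":
    for w in ["", "ab", "aabb", "aab", "abb", "ba"]:
        print(repr(w), run_vpa(ANBN, w))
```

Because the stack height after each prefix depends only on how many push and pop symbols were read, two automata over the same partition always have stacks of equal height, which is what later makes product constructions possible.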
Visibly pushdown languages (VPL)
• A language L is a VPL over a pushdown alphabet Σ if there is a visibly pushdown automaton that accepts it (acceptance by final state)
• The language {a^n b^n | n ≥ 0}
  - VPL if a is in Σ_push and b is in Σ_pop
  - Not a VPL for other partitions
• The language of words with equal number of a and b symbols is not a VPL (independent of partition)
• Every regular language L is a VPL, independent of partitioning
• Dyck language (words with well-balanced parentheses) is a VPL provided left/right parentheses are in Σ_push/Σ_pop resp.
VPLs in Program Analysis
Program
bool P(u:int) {
  global int x;
  local int y;
  …
  a: if Q
    { x = (x+y) };
  …
}
bool Q {
  local int y;
  if { ….
    y++;
    return 1;}
  else return P(x)
}
Analysis
To figure out whether the expression e=(x+y) is very busy at program point a:
Σ_push = {call-p, call-q}
Σ_pop = {ret-p, ret-q}
Σ_local = {used-e, mod-x, mod-y, skip}
Executions are pushdown words, e.g.
call-q, skip, mod-y, ret-q, used-e, mod-x, skip, ret-p
Set of executions starting at location a is a VPL: L_a
Set of executions in which e is very busy is also a VPL: L_e
e is very busy at a if L_a is included in L_e
VPLs for Document Processing
XML Document
<conference>
<name>
CONCUR 2005
</name>
<location>
<city>
San Francisco
</city>
<hotel>
Stanford Court
</hotel>
</location>
<sponsor>
CISCO
</sponsor>
<sponsor>
Microsoft
</sponsor>
…
</conference>
Query Processing
Pushdown alphabet
Σ_push = {<name>, <location>, …}
Σ_pop = {</name>, </location>, …}
Σ_local = {San Francisco, Microsoft, …}
A document d is a pushdown word
Sample Query: Find documents related to conferences sponsored by Microsoft in San Francisco
Specify query as a VPL: L
Analysis: Membership question
Does document d satisfy query L?
Use VPAs instead of tree automata! (typically, no recursion, but only hierarchy)
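To illustrate the membership question, here is a rough sketch that turns a document into a pushdown word and evaluates the sample query in one streaming pass. The tokenizer regex, the function names and the exact reading of the query (a `city` element containing "San Francisco" and a `sponsor` element containing "Microsoft") are assumptions for this example, not something fixed by the slides. It handles only attribute-free tags.

```python
import re

def to_pushdown_word(doc):
    """Tokenize an attribute-free XML fragment into (kind, label) pairs."""
    word = []
    for close, tag, text in re.findall(r"<(/?)([^<>/]+)>|([^<>]+)", doc):
        if tag:
            word.append(("pop" if close else "push", tag))
        elif text.strip():                      # skip whitespace between tags
            word.append(("local", text.strip()))
    return word

def matches_query(word):
    """Streaming check: push on open tags, pop on close tags."""
    stack = []
    city_ok = sponsor_ok = False                # finite control: two bits
    for kind, label in word:
        if kind == "push":
            stack.append(label)
        elif kind == "pop":
            if not stack or stack.pop() != label:
                return False                    # document is not well-matched
        else:
            if stack and stack[-1] == "city" and label == "San Francisco":
                city_ok = True
            if stack and stack[-1] == "sponsor" and label == "Microsoft":
                sponsor_ok = True
    return not stack and city_ok and sponsor_ok

# Trimmed copy of the sample document (the elided part is left out).
DOC = """<conference><name>CONCUR 2005</name>
<location><city>San Francisco</city><hotel>Stanford Court</hotel></location>
<sponsor>CISCO</sponsor><sponsor>Microsoft</sponsor></conference>"""

if __name__ == "__main__":
    print(matches_query(to_pushdown_word(DOC)))
```

The evaluator never builds a tree: it needs memory proportional to the nesting depth of the document, which is the practical appeal of reading documents as pushdown words.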
Closure Properties
Note: can’t combine languages with respect to different partitions
• Closed under intersection: Given two VPAs A and B, build a product C accepting the intersection of L(A) and L(B)
  - State of C: (state of A, state of B)
  - Stack symbol of C: (stack symbol of A, stack symbol of B)
  - C can simulate the stacks of A and B together
• Closed under union
• Closed under complementation
• Closed under concatenation and Kleene-*
• Closed under partition-preserving homomorphisms
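The product construction for intersection can be sketched as follows. The dict encoding of deterministic VPAs, the helper names and the two example automata (A: even number of local symbols `x`; B: `x` occurs only at nesting depth 0) are illustrative assumptions. Unmatched returns are rejected for simplicity.

```python
def product(A, B):
    """Product VPA: states and stack symbols are pairs."""
    return {
        "start": (A["start"], B["start"]),
        "finals": {(p, q) for p in A["finals"] for q in B["finals"]},
        "call": {((p, q), a): ((p2, q2), (g, h))
                 for (p, a), (p2, g) in A["call"].items()
                 for (q, b), (q2, h) in B["call"].items() if a == b},
        "ret": {((p, q), a, (g, h)): (p2, q2)
                for (p, a, g), p2 in A["ret"].items()
                for (q, b, h), q2 in B["ret"].items() if a == b},
        "local": {((p, q), a): (p2, q2)
                  for (p, a), p2 in A["local"].items()
                  for (q, b), q2 in B["local"].items() if a == b},
    }

def accepts(M, word, push=frozenset("("), pop=frozenset(")")):
    state, stack = M["start"], []
    for a in word:
        if a in push:
            step = M["call"].get((state, a))
            if step is None:
                return False
            state, g = step
            stack.append(g)
        elif a in pop:
            if not stack:
                return False
            state = M["ret"].get((state, a, stack.pop()))
        else:
            state = M["local"].get((state, a))
        if state is None:
            return False
    return state in M["finals"]

# A: even number of x. The stack is pushed and popped but carries no information.
A = {"start": "e", "finals": {"e"},
     "call": {("e", "("): ("e", "_"), ("o", "("): ("o", "_")},
     "ret": {("e", ")", "_"): "e", ("o", ")", "_"): "o"},
     "local": {("e", "x"): "o", ("o", "x"): "e"}}
# B: x is allowed only outside all parentheses.
B = {"start": "top", "finals": {"top", "in"},
     "call": {("top", "("): ("in", "T"), ("in", "("): ("in", "I")},
     "ret": {("in", ")", "T"): "top", ("in", ")", "I"): "in"},
     "local": {("top", "x"): "top"}}
C = product(A, B)

if __name__ == "__main__":
    for w in ["xx", "x", "(x)x", "x()x", "("]:
        print(w, accepts(A, w), accepts(B, w), accepts(C, w))
```

The pairing of stack symbols works only because both automata push and pop on exactly the same input positions. That is why the note about a shared partition matters.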
Determinization
• Given a nondeterministic VPA A, we can construct a deterministic VPA B that accepts the same language and has size exponential in the size of A
• Potentially useful for building runtime monitors for checking program executions, and online algorithms for XML query processing
• VPLs are a subclass of DCFLs (languages defined by deterministic PDAs)
  - DCFLs not closed under union
  - Equivalence problem for DCFLs decidable, but complex
Determinization: Sketch of the construction
• Determinization of nondeterministic finite automata uses subset construction: a state R of B is a set of states of A (the states that A can be in, having read the word w so far)
• Subset construction does not apply to the stack
• But we can do subsets of summaries: if w is a well-matched word, (q,q’) is a summary of A on w if A can go from (q,$) to (q’,$), where $ is stack bottom
• More precisely, if w = w_1 c_1 w_2 c_2 … c_n w_{n+1}, where the c_i’s are calls and the w_i’s are well-matched words, then after reading w, the determinized automaton B has
  - Stack (S_n, R_n, c_n) … (S_1, R_1, c_1) $
  - Control state (S_{n+1}, R_{n+1})
  - R_i = set of all states A can be in after w_1 c_1 … w_i
  - S_i = set of all summaries of A on the segment w_i
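The summary part of this construction can be sketched in isolation. The function below computes the set S of summaries of a nondeterministic VPA on a well-matched word, saving the current set at each call and combining it with the inner set at the matching return. It covers only the S components (not the R sets or pending calls), and the tuple encoding of transitions is an assumption for illustration.

```python
def summaries(states, call, ret, local, push, pop, word):
    """Summaries {(q, q')} of a nondeterministic VPA on a well-matched word.

    Hypothetical encoding:
      call  : set of (q, a, q2, gamma)   push gamma, go to q2
      ret   : set of (q, a, gamma, q2)   pop gamma, go to q2
      local : set of (q, a, q2)
    Raises IndexError if the word has an unmatched return.
    """
    identity = {(q, q) for q in states}
    current = set(identity)            # summary of the empty word
    saved = []                         # (summary before the call, call symbol)
    for a in word:
        if a in push:
            saved.append((current, a))
            current = set(identity)    # start summarizing the callee segment
        elif a in pop:
            before, c = saved.pop()
            current = {
                (q, q4)
                for (q, q1) in before
                for (p1, c1, q2, g) in call if p1 == q1 and c1 == c
                for (p2, q3) in current if p2 == q2
                for (p3, r, g2, q4) in ret if p3 == q3 and r == a and g2 == g
            }
        else:
            current = {(q, q2)
                       for (q, q1) in current
                       for (p, b, q2) in local if p == q1 and b == a}
    return current


# Small nondeterministic example: the call from state 0 guesses X or Y.
STATES = {0, 1}
CALL = {(0, "c", 0, "X"), (0, "c", 1, "Y")}
LOCAL = {(0, "i", 0), (1, "i", 0)}
RET = {(0, "r", "X", 0), (0, "r", "Y", 1)}

if __name__ == "__main__":
    print(summaries(STATES, CALL, RET, LOCAL, {"c"}, {"r"}, "cir"))
```

The point of the sketch is that the stack symbol guessed at the call is resolved at the return, so the deterministic automaton never has to store sets of stacks, only sets of state pairs.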
Decision Problems
• Emptiness: Given a VPA A, is its language empty?
  - Same as for PDAs: PTIME-complete (solvable in cubic time)
• Language inclusion (or equivalence): Given VPAs A and B, is the language of A contained in that of B?
  - Determinize B, take its complement, take product with A, and test for emptiness
  - EXPTIME-complete
  - Recall: Inclusion is PSPACE-complete for (nondeterministic) finite automata, and undecidable for PDAs
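As an illustration of the emptiness check, the sketch below saturates a "well-matched reachability" relation and then follows pending calls from the start state. It is a naive polynomial-time fixpoint, not a tuned cubic algorithm, and it assumes (as a simplification) that unmatched returns are rejected. The tuple encoding of transitions is hypothetical.

```python
def is_empty(start, finals, call, ret, local):
    """Emptiness of a VPA with acceptance by final state.

    call : set of (q, a, q2, gamma); ret : set of (q, a, gamma, q2);
    local: set of (q, a, q2).
    """
    states = {start} | set(finals)
    states |= {t[0] for t in call | ret | local}
    states |= {t[2] for t in call | local} | {t[3] for t in ret}

    # well: pairs (p, q) such that some well-matched word leads from p to q
    well = {(q, q) for q in states}
    changed = True
    while changed:
        new = set()
        for (p, q) in well:
            for (s, _a, s2) in local:
                if s == q:
                    new.add((p, s2))
            for (s, _a, s1, g) in call:
                if s != q:
                    continue
                for (t1, t2) in well:
                    if t1 != s1:
                        continue
                    for (u, _b, h, u2) in ret:
                        if u == t2 and h == g:
                            new.add((p, u2))
        changed = not (new <= well)
        well |= new

    # States reachable from start, allowing calls that never return.
    reach = {q for (p, q) in well if p == start}
    frontier = list(reach)
    while frontier:
        q = frontier.pop()
        for (s, _a, s1, _g) in call:
            if s == q:
                for (t1, t2) in well:
                    if t1 == s1 and t2 not in reach:
                        reach.add(t2)
                        frontier.append(t2)
    return not (reach & set(finals))


CALLS = {("s", "a", "p", "Z"), ("p", "a", "p", "A")}
RETS = {("p", "b", "A", "r"), ("p", "b", "Z", "f"),
        ("r", "b", "A", "r"), ("r", "b", "Z", "f")}

if __name__ == "__main__":
    print(is_empty("s", {"f"}, CALLS, RETS, set()))
```

Inclusion would add determinization and complementation of B before this step, which is where the exponential cost comes from.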
VPL Properties Summary
          Closed under                         Decision problems
          Union   Intersection   Complement    Emptiness   Inclusion
Regular   Yes     Yes            Yes           NLOG        PSPACE
CFL       Yes     No             No            PTIME       Undecidable
DCFL      No      No             Yes           PTIME       Undecidable
VPL       Yes     Yes            Yes           PTIME       EXPTIME
Pushdown Words as Binary Trees
Let w = i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3
Stack trees
[Figure: the stack tree corresponding to w.]
VPL: Connection to tree languages
Tree-language characterization:
Let L be a set of pushdown words and let ST(L) be the set of stack trees that correspond to L.
Theorem: L is a VPL iff ST(L) is a regular tree language
Note: It is well-known that the set of parse trees corresponding to a context-free grammar is a regular tree language
Finite word automata that can jump
Let w = i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3
• Summary Automata
  - Finite-state automaton that reads a pushdown word
  - While reading a call, can send a copy to the matching return
  - δ(q,a) is a set of pairs of states if a is in Σ_push
• Nondeterministic summary automata are expressively equivalent to VPAs
• Deterministic VPA (= VPL) > Deterministic summary automata > Deterministic tree automata (on stack trees)
Robustness: Alternative Characterizations
• Monadic second order logic with matching predicate
  - m(x,y) means x is a call and y is the matching return
  - Sample formula: forall x. if p(x) then exists y,z. ( q(y) and x<y<z and m(x,z) )
  - Thm: MSO + matching predicate interpreted over pushdown words is expressively equivalent to VPLs
• Thm: Every CFL is a homomorphic image of a VPL
• Context-free grammar based characterization
  - Two types of non-terminals: V_0 (matched words) and V_1
  - All productions are of the form
    X → a         if X is in V_0 then a must be local
    X → a Y b Z   a is a call, b is a return, Y is in V_0; if X is in V_0 then Z must be in V_0
“Regular-like” properties continue…
• Congruences and minimization (Myhill-Nerode Theorem) central to theory of regular languages
• Given a language L, for well-matched words u and v, define u ~L v iff for all words x and y, xuy in L iff xvy in L
• Theorem: A language L of well-matched words is a VPL iff the congruence ~L is of finite index
• Minimization
  - No unique minimal deterministic VPA in general, but…
  - Minimization of (single-entry) RSMs (i.e. procedural boolean programs) possible. Partitioning into k procedures/modules is adequate to get canonicity!
ω-VPL - Extension to Infinite Words
• A Büchi VPA:
  - VPA over infinite pushdown words
  - A word is accepted if along some run, a state in the set F is seen infinitely often
• ω-VPL: class of languages accepted by Büchi VPAs
• ω-VPL is closed under all Boolean operations
• The characterizations using regular trees and MSO still hold
• However, ω-VPLs are not determinizable!
  - Let L be the set of all words such that the stack is repeatedly bounded, i.e. for some n, the stack depth is n infinitely often
  - L is an ω-VPL but there is no deterministic (Muller) VPA for it
• Language inclusion and equivalence are still decidable
Talk Outline
• Visibly Pushdown Languages
• Temporal Logic CaRet and Model Checking
• Ongoing Work
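Software Model Checking
[Diagram: Code and Observables go into an Abstractor (predicate abstraction), which produces a Model: a control flow graph + Boolean vars (pushdown automata). A Verifier checks the Model against a Specification, now given as CaRet/VPAs, and answers Yes or returns a Counter-example.]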
Abstracting Software
Predicates
bx: x>0
by: y>0
Program
int x, y;
if x>0 {
  …….
  y=x+1
  .……
}
else {
  ……
  y=x+1
  ……
}
Boolean Program
bool bx, by;
if bx {
  ………
  by=true
  ………
}
else {
  …………
  by={true,false}
  ……….
}
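The abstract assignments in the Boolean program can be sanity-checked by brute force. The sketch below enumerates a bounded range of integers (an assumption made only for this illustration; the function name and range are hypothetical) and collects the values the predicate by: y>0 can take after y=x+1, for each value of bx: x>0. An actual abstractor would typically discharge such questions with a decision procedure rather than enumeration.

```python
def abstract_assign(bx, sample_range=range(-50, 51)):
    """Possible values of by (y > 0) after y = x + 1, given bx (x > 0).

    Bounded enumeration: evidence for the abstraction, not a proof.
    """
    return {(x + 1) > 0 for x in sample_range if (x > 0) == bx}


if __name__ == "__main__":
    print("bx = true  ->", abstract_assign(True))    # then-branch
    print("bx = false ->", abstract_assign(False))   # else-branch
```

When x>0 holds, y=x+1 is always positive, so by becomes true. When it does not hold, x=0 gives y=1 and x=-1 gives y=0, so both values are possible and the Boolean program must choose nondeterministically.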
Abstracting Modular Programs
Program
main() {
  bool y;
  …
  x = P(y);
  …
  z = P(x);
  …
}
bool P(u: bool) {
  …
  return Q(u);
}
bool Q(w: bool) {
  if …
  else return P(~w)
}
Recursive State Machine (RSM)/ Pushdown automaton
[Figure: an RSM with components A1, A2, A3. Each component has entry nodes (inputs) and exit nodes (outputs), and contains boxes (function-calls) that refer to other components.]
Linear-time Propositional Temporal Logic
Q ::= p | not Q | Q or Q’ | Next Q | Always Q | Eventually Q | Q Until Q’
Interpreted over (infinite) sequences.
The models of an LTL formula form an ω-regular language.
Useful for stating sequencing properties:
• If req happens, then req holds until it is granted:
  Always ( req → (req Until grant) )
• An exception is never raised:
  Always ( not Exception )
CARET
CARET: A temporal logic for Calls and Returns
Expresses context-free properties
[Figure: an execution through nested calls of modules A, B, C, showing the global successor, the local path, and the caller path.]
• Global successor: the successor used by LTL
• Local (abstract) successor:
  - Jump from calls to returns
  - Otherwise global successor at the same level
• Caller modality:
  - Jump to the caller of the current module
  - Defined for every position except top-level ones
• Caller path gives the stack content!
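The local successor and the caller can be computed for a finite pushdown word with a single stack pass. This is a sketch under two assumptions: the word is finite (the logic itself is interpreted over infinite words), and a position whose next position is a return has no local successor, since the current module ends there. The function and variable names are illustrative.

```python
def caret_successors(kinds):
    """kinds[i] is 'call', 'ret' or 'int' for position i.

    Returns (local_succ, caller); entries are positions or None.
    """
    n = len(kinds)
    match = {}                  # call position -> matching return position
    caller = [None] * n
    stack = []                  # positions of pending calls
    for i, k in enumerate(kinds):
        if k == "ret" and stack:
            match[stack.pop()] = i          # a return sits at its call's level
        caller[i] = stack[-1] if stack else None
        if k == "call":
            stack.append(i)

    local_succ = [None] * n
    for i, k in enumerate(kinds):
        if k == "call":
            local_succ[i] = match.get(i)    # jump to the matching return
        elif i + 1 < n and kinds[i + 1] != "ret":
            local_succ[i] = i + 1           # same level: global successor
    return local_succ, caller


if __name__ == "__main__":
    word = ["int", "call", "int", "call", "int", "ret", "ret", "int"]
    local_succ, caller = caret_successors(word)
    print(local_succ)
    print(caller)
```

Following `caller` repeatedly from any position lists the pending calls from innermost to outermost, which is the sense in which the caller path gives the stack content.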
CARET Definition
Syntax:
Q ::= p | not Q | Q or Q’
    | Next Q | Always Q | Eventually Q | Q Until Q’
    | Local-Next Q | Local-Always Q | Local-Eventually Q | Q Local-Until Q’
    | Caller Q | CallerPath-Always Q | CallerPath-Eventually Q | Q CallerPath-Until Q’
• Local- and Caller- versions of all temporal operators
• All these operators can be nested
Expressing properties in Caret
Pre-post conditions:
If P holds when A is called, then Q must hold when the call returns
Always ( (P and call-to-A) → Local-Next Q )
[Figure: a call to A, with P holding at the call and Q at the matching return.]
Integrating Manna/Pnueli-style reasoning for reactive computations with Hoare-style reasoning for structured programs
Expressing properties in Caret
If A is called with low priority, then it cannot access the file
Always ( (call-to-A and low-priority) → Local-Always ( not access-file ) )
[Figure: calls to A labeled low-priority and high-priority, with an access-file event.]
Expressing properties in Caret
Stack inspection properties
If a variable x is accessed, then A must be on the call stack
Always ( access-to-x → CallerPath-Eventually call-to-A )
[Figure: an access-to-x position whose caller path reaches a call to A.]
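A stack inspection property of this kind can be monitored on a finite execution trace by a VPA-style pass. In this sketch the monitor pushes one bit per call, recording whether the module is somewhere on the call stack. The trace encoding as (kind, label) pairs, the function name and its defaults are assumptions made for illustration.

```python
def satisfies_stack_inspection(trace, module="A", event="access-to-x"):
    """Check: whenever `event` occurs, `module` is on the call stack.

    trace: list of (kind, label), kind in {'call', 'ret', 'int'};
    for calls and returns the label is the module name.
    """
    stack = []   # one bit per pending call: is `module` on the stack here?
    for kind, label in trace:
        if kind == "call":
            stack.append(label == module or (bool(stack) and stack[-1]))
        elif kind == "ret":
            if stack:
                stack.pop()
        elif label == event and not (stack and stack[-1]):
            return False        # access with no pending call to `module`
    return True


if __name__ == "__main__":
    ok = [("call", "A"), ("call", "B"), ("int", "access-to-x"),
          ("ret", "B"), ("ret", "A")]
    bad = [("call", "B"), ("int", "access-to-x"), ("ret", "B")]
    print(satisfies_stack_inspection(ok), satisfies_stack_inspection(bad))
```

The monitor pushes on calls and pops on returns in lockstep with the program, so its stack never needs to be inspected below the top element.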
Model checking CARET
• Given:
  - A (boolean) recursive state machine/visibly pushdown automaton M
  - A CARET formula Q
• Model-checking: Do all runs of M satisfy the specification Q?
• CARET can be model-checked in time that is polynomial in M and exponential in Q: |M|^3 · 2^{O(|Q|)}
• Complexity class same as that for LTL!
• Generalization of Vardi-Wolper construction
Model-checking CARET: Intuition
• The specification matches calls and returns of the program, so the runs of the program and the models of the formula are both visibly pushdown languages
• Given M and formula Q,
  - Build a Büchi pushdown automaton that accepts words exhibited by M that satisfy (not Q)
  - Check this pushdown automaton for emptiness
• Construction builds on the classical tableaux for LTL
[Figure: handling Local-Next Q1 at a call from state s: push s and Q1 at the call; at the matching return, pop s and Q1 and check Q1.]
Conclusions and Ongoing Work
• Exposing calls and returns lets you hide the stack!
• VPLs seem robust and adequate to model software analysis problems
• VPL-triggered research
  - Dynamic logic with VPL (Löding, Serre)
  - Visibly pushdown games (Löding, Madhusudan, Serre)
  - XML query processing (Pitcher)
  - Third-order Algol with iteration (Murawski, Walukiewicz)
• Active area of current research
  - DTDs, XML, and query languages
  - Branching-time logics, Fixpoint calculus, and visibly pushdown tree automata (Alur, Chaudhuri, Madhusudan)
  - Expressive completeness of temporal operators
  - Implementing a model checker for VPL monitors