Games for Formal Design and Verification of Reactive Systems
Rajeev Alur
University of Pennsylvania
http://www.cis.upenn.edu/~alur/
ATVA, Taipei, November 2004

System Reliability
- Software bugs are pervasive
  - Bugs can be expensive
  - Bugs can cost lives
  - Bulk of development cost is in validation, testing, bug fixes
- Old problem that just won’t go away
- Many approaches and decades of research
  - Systematic testing
  - Programming languages technology (e.g. types)
  - Formal methods (specification and verification)
- Grand challenge for computer science: Tools for designing “correct” software

[Figure: a model and a temporal property are input to the Model Checker, which answers yes or returns an error trace]
- Advantages
  - Automated formal verification, effective debugging tool
- Growing industrial success
  - In-house groups: Intel, Microsoft, Lucent, Motorola…
  - Commercial model checkers
- Opportunities for research
  - Scalability is still a problem
  - Effective use requires great expertise

Models for Formal Analysis
- Model is usually a composition of models for components
- Component models are primarily of two types
  1. Manual or automatic abstraction of code that implements the components (e.g. TCP client)
  2. Capturing the environment that the components are reacting to (e.g. network connecting the clients)
- Nondeterminism / choice is essential in modeling
  1. Abstraction loses information (e.g. control-flow graph keeps both branches of a conditional test)
  2. Environment supplies inputs and/or has unpredictable events (e.g. a network may or may not lose a message)
- Verifying a model typically amounts to checking all possible executions of the model

From Code to Models via Abstraction

    int x, y;
    if x>0 { ………… y:=x+1 ……….}
    else   { ………… y:=x+1 ……….}

- Predicate Abstraction with predicates bx: x>0; by: y>0

    bool bx, by;
    if bx { ………… by:=true ……….}
    else  { ………… by:={true,false} ……….}

- Contemporary tools for software verification (SLAM, BLAST…) use automated predicate abstraction and symbolic model checking

Classical Model Checking
[Figure: Processor1, Cache Controller1, Cache Controller2 and Processor2 connected by a Bus]
- Model is viewed as a state transition graph (no distinction among choices of various components)
- Requirements
  - No reachable state has two caches in write-exclusive states (Safety/Reachability)
  - Every read/write request is eventually completed (Liveness/Linear Temporal Logic)
  - From every reachable state, there exists a path leading to a quiescent state (Branching-time/CTL)

Game-based Analysis
[Figure: Processor1, Cache Controller1, Cache Controller2 and Processor2 connected by a Bus]
- Model is viewed as a game graph
  - Different components viewed as separate players
  - Each move belongs to one of the players
  - Strategy for a player is to choose the next move based on the execution so far
- Requirements: Processor1 and Controller1 have a strategy to successfully write no matter how other components behave (adversarial/collaborative groups)
- Beyond model checking: Compute the most general model for Processor that satisfies requirements (Synthesis)

Talk Outline
- Motivation
- Introduction to Theory of Games
- Games in Requirements (MOCHA)
- Interface Synthesis using Games (JIST)
- Conclusions

Formal Definition of Games
- A game graph G consists of
  - A set V of vertices and a set E of edges
  - A labeling of edges with moves in a set M
- When the game is at a vertex v, player0 chooses a move m, player1 chooses an edge (v, u) labeled m, and the game proceeds to u
- A strategy f for player0 is a function from V^+ to M
- For a vertex v and strategy f, Plays(v,f) contains the paths v_0 v_1 v_2 … in graph G such that v_0 = v and, for each i, there is an edge (v_i, v_{i+1}) labeled with f(v_0 v_1 … v_i)
- A winning condition W is a set of infinite paths over V
  - Reachability: a path is in W if it contains a vertex in target set F
  - Safety: a path is in W if all its vertices are in safe set S
- A strategy f is winning for player0 in initial state v if every path in Plays(v,f) is in W
- Game problem: Given G, v, and W, decide if player0 has a winning strategy starting in v (and compute the winning strategy)

Reachability Game
[Figure: example game graph with vertices 1 to 5 and edges labeled by moves a and b]
- Can win from every vertex except 4
- Suffices to consider memoryless strategies
- Winning vertices can be computed in linear time (the problem is PTIME-complete)

Partial Information Games
- Player0 does not know the game position precisely
- Every vertex has an observation
- A strategy maps a sequence of observations to a move
[Figure: example game graph with vertices 1 to 5 and edges labeled a and b; color shows observation]
- Cannot win from any non-target vertex
- Solving a partial information game requires a subset construction
- Problem is Exponential-time complete!
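The reachability game defined above can be solved by a backward fixpoint over the game graph. The following is a minimal sketch, not the linear-time algorithm mentioned on the slide: it repeatedly rescans all vertices, which is simpler but quadratic. The function name `solve_reachability` and the example graph are hypothetical (the graph in the slide's figure is not recoverable from the text), and the sketch assumes that a move is available at a vertex only if at least one edge carries that label.

```python
from collections import defaultdict

def solve_reachability(edges, target):
    """Return (win, strategy) for player0 in a reachability game.

    edges: iterable of (v, move, u); player0 picks `move` at v,
           player1 then picks any edge (v, u) carrying that label.
    target: set of vertices player0 wants to reach.
    """
    succ = defaultdict(lambda: defaultdict(set))  # succ[v][m] = possible u
    vertices = set(target)
    for v, m, u in edges:
        succ[v][m].add(u)
        vertices.update((v, u))

    win = set(target)
    strategy = {}          # memoryless: one move per winning vertex
    changed = True
    while changed:
        changed = False
        for v in vertices - win:
            for m, outcomes in succ[v].items():
                # Move m is good if every choice of player1 lands in win.
                if outcomes <= win:
                    win.add(v)
                    strategy[v] = m
                    changed = True
                    break
    return win, strategy

# Hypothetical 5-vertex game; vertex 5 is the target, vertex 4 is a trap.
edges = [
    (1, "a", 2), (1, "b", 3),
    (2, "a", 5), (2, "b", 4),
    (3, "a", 4), (3, "a", 5), (3, "b", 1),
    (4, "a", 4), (4, "b", 4),
]
win, strategy = solve_reachability(edges, {5})
print(sorted(win), strategy)
```

In this made-up instance player0 wins from every vertex except 4, and the computed strategy is memoryless, which matches the slide's remark that memoryless strategies suffice for reachability.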
Safety Games for Assumption Generation
[Figure: Env (generating inputs) composed with System model M (nondeterministic)]
- Goal: System should stay within a given safe set S
- Let E be the environment that generates all possible inputs
  - Not all paths in E || M need stay within S
- Game: In E||M, check whether Env has a winning strategy to keep the system within S
- Winning strategies in safety games are closed under union
- Most general winning strategy A for the safety game: from a vertex v, allow move m if some winning strategy picks m
- Strategy A is the most permissive assumption on the environment so that A||M is safe

Results on Games
- Many variations of game graphs studied
  - Synchronous multi-player games: Each player chooses a move independently, and the next vertex is determined by the moves selected by all the players
  - Asynchronous interleaving games: A (fair) scheduler picks which player gets to move, and the selected player chooses the next state
- Complexity of infinitary winning conditions
  - W specified by LTL: doubly exponential time
  - W specified by parity condition: in NP ∩ coNP (open whether in P)
- Games with time, probabilities, costs, rewards, pushdown models…
- Connections to tree automata, and mu-calculus model checking
- Our Focus:
  - Can games be useful in practice?
  - How to solve games efficiently (state explosion problem)?
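The most general winning strategy for a safety game, as described on the Safety Games for Assumption Generation slide, can be sketched as a greatest fixpoint followed by a filter on moves. This is an illustrative sketch under stated assumptions: the function name `most_permissive_safety`, the toy request/idle model and its state names are all hypothetical, and a vertex with no available move is treated as losing because plays are infinite.

```python
def most_permissive_safety(vertices, edges, safe):
    """Return (win, allowed) for the player choosing moves (Env on the slide).

    edges: iterable of (v, move, u); the system model resolves the
           nondeterminism among edges with the same label.
    allowed[v] is the set of moves that some winning strategy picks at v.
    """
    succ = {}
    for v, m, u in edges:
        succ.setdefault(v, {}).setdefault(m, set()).add(u)

    # Greatest fixpoint: start from all safe vertices and discard those
    # without a move whose outcomes all stay in the current set.
    win = set(safe) & set(vertices)
    while True:
        keep = {v for v in win
                if any(outs <= win for outs in succ.get(v, {}).values())}
        if keep == win:
            break
        win = keep

    # A move is permitted exactly when it cannot leave the winning region.
    allowed = {v: {m for m, outs in succ.get(v, {}).items() if outs <= win}
               for v in win}
    return win, allowed

# Hypothetical system: a request while busy may lead to an error.
vertices = {"ok", "busy", "err"}
edges = [
    ("ok", "idle", "ok"), ("ok", "req", "busy"),
    ("busy", "idle", "ok"), ("busy", "req", "busy"), ("busy", "req", "err"),
    ("err", "idle", "err"), ("err", "req", "err"),
]
win, allowed = most_permissive_safety(vertices, edges, safe={"ok", "busy"})
print(win, allowed)
```

Here the extracted assumption on the environment reads "do not send req while the system is busy", which is the kind of most permissive assumption A the slide refers to.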
Talk Outline
- Motivation
- Introduction to Theory of Games
- Games in Requirements (MOCHA)
  - Alternating Temporal Logic (ATL)
  - Symbolic model checking for ATL
  - Non-repudiation for security protocols
- Interface Synthesis using Games (JIST)
- Conclusions

Overview of MOCHA
- Key features
  - Compositional modeling language: Reactive Modules
  - Game-based requirements of open systems: ATL
  - Refinement checking by assume-guarantee rules
- Joint project with UC Berkeley
- See http://www.cis.upenn.edu/~mocha/
- Alternating-time temporal logic [Alur, Henzinger, Kupferman, JACM 2002]
- Game-based verification of non-repudiation protocols [Kremer and Raskin, Journal of Computer Security, 2003]

Alternating Temporal Logic
- Suitable for requirements of multiagent systems
- Interpreted over game graphs
- Suppose Sys chooses move (dashed/solid), and Env chooses next state
[Figure: example game graphs illustrating EF p, AG p and <<sys>>F p]

Alternating Temporal Logic
- Interpreted over game graphs where the set of players is P
- Syntax: phi := p | ~ phi | phi & phi | <<Q>> Next phi | <<Q>> phi Until phi
  where p is a proposition and Q is a subset of P
- <<Q>> phi holds at a state v iff the players in Q have a winning strategy in the game starting at v, where phi gives the winning condition
- Sample property <<A,B>> G p
  - Can agents A and B collaborate to maintain invariant p?
  - Existential over choices of A & B, universal over others
- Can specify games and controllability
- More expressive than CTL

Symbolic Representation of Games
- Typically, model/game specified implicitly
- Model variables X
  - Each var is of finite type, say, boolean
- Move variables M (e.g. inputs)
- Update: T(X,M,X’)
  - How new vars X’ are related to old vars X as a result of executing one step, when Player0 chooses values of M
- Reachability Game: Target specified by predicate p(X)
- Computational problem: Compute the states from which Player0 can reach p
  - Model checking the ATL formula <<Player0>>F p
- Building the game graph explicitly not feasible!

Symbolic Solution

    R := p(X)
    repeat
        APre(R(X)) := Exists M. Forall X’. T(X,M,X’) -> R(X’)
        if R contains APre(R) return R
        else R := R union APre(R)

- APre(R): Set of states from which Player0 can force the game to reach R in one step
- Similar to standard CTL model checking, except that preimage computation involves quantifier alternation
- Mocha implements this symbolic solution using OBDDs as a symbolic representation (CUDD package)
- Performance: Models with 50-60 variables analyzed easily

Analysis of Security Protocols
- Authentication Protocols
  - Goal: Establish secure communication between Alice and Bob so that a malicious third party cannot talk to Alice pretending to be Bob
- Many formal methods and model checkers used to analyze and find bugs in authentication protocols (e.g. Lowe used the FDR model checker to find a bug in Needham-Schroeder public key authentication)
- Analysis involves modeling of the adversary, and checking that all executions satisfy correctness requirements (the only sources of nondeterminism are the communication medium and the adversary)

Non-repudiation Protocols
- Repudiation means that Bob can pretend not to have participated in the protocol (after receiving what Bob really wanted from Alice)
- Non-repudiation protocols allow Alice/Bob to have evidence of messages sent/received, typically using a Trusted Third Party (TTP), so that the other person cannot cheat
- Game-based modeling and ATL model checking using Mocha has been shown to be the most effective technique for analysis (KR01,KR03..)
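Referring back to the Symbolic Solution slide, the fixpoint with the controllable predecessor APre can be illustrated on a tiny implicit model. This sketch is not how Mocha does it: explicit enumeration of boolean valuations stands in for OBDDs, so it only scales to toy examples. The names `apre` and `can_force_reach` and the two-variable model (armed, done) are hypothetical, and the transition relation is assumed to be total, since otherwise the "Forall X’" part holds vacuously.

```python
from itertools import product

def apre(states, moves, T, R):
    """States x with: Exists m. Forall y. T(x, m, y) -> y in R."""
    return {x for x in states
            if any(all(y in R for y in states if T(x, m, y)) for m in moves)}

def can_force_reach(states, moves, T, p):
    """States from which Player0 can force reaching p (<<Player0>> F p)."""
    R = {x for x in states if p(x)}
    while True:
        pre = apre(states, moves, T, R)
        if pre <= R:          # R contains APre(R): fixpoint reached
            return R
        R |= pre

# Hypothetical model with X = (armed, done) and one move variable m = fire.
def T(x, m, y):
    armed, done = x
    armed2, done2 = y
    # done latches once Player0 fires while armed.
    if done2 != (done or (armed and m)):
        return False
    # Firing disarms; otherwise the environment sets armed freely.
    return (not armed2) if m else True

states = list(product([False, True], repeat=2))
moves = [False, True]
result = can_force_reach(states, moves, T, p=lambda x: x[1])
print(sorted(result))
```

From the state (armed=False, done=False) the environment can keep the system disarmed forever, so that state is outside the computed set, while the armed state is added in the first APre step.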
Modeling Non-repudiation Protocols
[Figure: Alice (Honest / Cheat) and Bob (Honest / Cheat), Communication Channels (nondeterministic model), Trusted Third Party (deterministic)]

Analysis using Mocha
- Model described in guarded command language
- Players: Alice (A), Bob (B), Communication channels (Com), TTP
- NRR: Alice gets non-repudiation of receipt evidence
- NRO: Bob gets non-repudiation of origin evidence
- Requirements in ATL (and not expressible in CTL)
  - Viability: Alice and Bob can cooperate to be fair to each other
    <<A,B>> F (NRR & NRO)
  - Fairness to Alice: Bob and Com cannot cooperate to reach a state where Bob has his evidence, but Alice can no longer get hers
    ~ <<B,Com>> F (NRO & ~ <<A>> F NRR)
- Many published protocols formally analyzed by Mocha
  - Asokan-Shoup-Waidner certified mail protocol (previously known violations of fairness found)
  - Zhou-Gollmann non-repudiation protocol (way to cheat Alice found for certain types of channels)

Talk Outline
- Motivation
- Introduction to Theory of Games
- Games in Requirements (MOCHA)
- Interface Synthesis using Games (JIST)
  - Behavioral interfaces for Java classes
  - Learning automata representing strategies
  - Implementation and results
- Conclusions

Static Interfaces for Java Classes

    package java.security;
    …
    public abstract class Signature extends java.security.SignatureSpi {
      <<variable declarations>>
      protected int state = UNINITIALIZED;
      public final void initVerify (PublicKey publicKey) {…}
      public final byte[] sign () throws SignatureException { ….}
      public final boolean verify (byte[] signature) throws SignatureException { ….}
      public final void update (byte b) throws SignatureException {…}
      ..
    }

Behavioral Interface
- Methods: initVerify (IV), verify (V), initSign (IS), sign (S), update (U)
- Constraints on invocation of methods so that the exception SignatureException is not thrown
  - initVerify (initSign) must be called just before verify (sign), but update can be called in between
  - update cannot be called at the beginning
[Figure: interface automaton for Signature; edge labels: IS; IV; S, U, IS; V, U, IV]

AbstractList.ListItr

    public Object next() { … lastRet = cursor++; …}
    public Object prev() { … lastRet = cursor; …}
    public void remove() {
      if (lastRet==-1) throw new IllegalExc();
      … lastRet = -1; …}
    public void add(Object o) { … lastRet = -1; …}

Behavioral Interface
[Figure: interface automaton for ListItr with states Start, Safe and Unsafe; edge labels: next; add; next,prev; remove,add; add; next,prev]

Interfaces for Java classes
- Given a Java class C with methods M and return values R, an interface I is a function from (M x R)* to 2^M
  - Interface specifies which methods can be called after a given history
- Given a safety requirement S over class variables, interface I is safe for S if calling methods according to I keeps C within S
- Given C and S, there exists a most permissive interface that is safe wrt S
- Interfaces can be useful for many purposes
  - Documentation
  - Modular software verification (check client conforms to interface)
  - Version consistency checks

JIST: Automatic extraction of finite-state interfaces
- Phase 1: Abstract Java class into a Boolean class using predicate abstraction
- Phase 2: Generate interface as a solution to game in abstract class

Game in Abstracted Class
[Figure: game graph of the abstracted class with black and purple states; edge labels include next and prev]
- From black states, Player0 gets to choose the input method call
- From purple states, Player1 gets to choose a path in the abstract class till the call returns
- Objective for Player0: Ensure error states (from which an exception can be raised) are avoided
- Winning strategy: Correct method sequence calls
- Most general winning strategy: Most permissive safe interface
- Game is partial information!
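The two JIST phases can be illustrated on the ListItr example with a single abstraction predicate. The sketch below is a hand-written abstraction, not output of the tool: the predicate b stands for (lastRet == -1), the function names `step`, `synthesize_interface` and `accepts` are hypothetical, and the iterator is assumed to start with lastRet == -1. Because this abstract class happens to be deterministic, the game is trivial here and a plain exploration yields the interface; the general case needs the partial information game.

```python
METHODS = ("next", "prev", "remove", "add")

def step(b, method):
    """Abstract ListItr step over the predicate b == (lastRet == -1).

    Returns (new_b, ok); ok is False when the call would throw.
    """
    if method in ("next", "prev"):
        return False, True            # lastRet now points at an element
    if method == "add":
        return True, True             # lastRet reset to -1
    if method == "remove":
        return (b, False) if b else (True, True)
    raise ValueError(method)

def synthesize_interface(initial):
    """Explore abstract states, keeping only calls that cannot throw."""
    interface, frontier = {}, [initial]
    while frontier:
        b = frontier.pop()
        if b in interface:
            continue
        interface[b] = {}
        for m in METHODS:
            new_b, ok = step(b, m)
            if ok:
                interface[b][m] = new_b
                frontier.append(new_b)
    return interface

def accepts(interface, initial, calls):
    """Check a client call sequence against the interface automaton."""
    state = initial
    for m in calls:
        if m not in interface[state]:
            return False
        state = interface[state][m]
    return True

interface = synthesize_interface(initial=True)   # True ~ "Unsafe" state
print(interface)
print(accepts(interface, True, ["next", "remove", "add", "prev", "remove"]))
print(accepts(interface, True, ["next", "remove", "remove"]))
```

The resulting two-state automaton has the same shape as the Safe/Unsafe interface on the slide: remove is permitted only right after next or prev.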
Interface Synthesis
- Most permissive safe interface can be captured by a finite automaton (as a regular language over M x R)
- For partial information games, the standard way (subset construction) to generate the interface is exponential in the number of states of the abstract class
- Number of states of the abstract class is exponential in the number of predicates used for abstraction
  - Use of symbolic methods (e.g. OBDDs) desired
- Novel approach: Use algorithms for learning a regular language to learn the interface
  - Angluin’s L* algorithm
  - Works well if we expect the final interface to have a small representation as a minimized DFA

L* Algorithm for Learning DFAs
- Infers the structure of an unknown DFA by
  - membership queries
  - equivalence queries
- Observation table (S,E,T)
  - T: (S U S•Σ)•E → {0, 1}
- Constructs a minimal DFA using a polynomial number of queries
  - O(|Σ|n^2 + n log m) membership queries
  - at most n-1 equivalence queries

    S := {ε};  // states of DFA
    E := {ε};  // distinguishing expts
    repeat:
        Update T;  // member tests for (S U S•Σ)•E
        MakeTClosed(S,E,T);
        C := MakeConjecture(S,E,T);
        if !(c=IsEquiv(C)) then return C;
        else {
            e = FindSuffix(c);
            Add e to E;
        }

Implementing L*
- Transform abstract class into a model M in NuSMV (a state-of-the-art BDD-based model checker)
- Membership Query: Is a string s in the desired language?
  - Are all runs of M on s safe?
  - Construct an environment Es that invokes methods according to s, and check that M||Es is safe using NuSMV
- Equivalence Query: Is the current conjecture interface C equivalent to the final answer I? If not, return a string in the difference
  - Subset check: Is C contained in I? Are all strings allowed by C safe?
    Check if C||M is safe using NuSMV
  - Superset check: Does C contain I? Is C most permissive?

Superset Query
- Is C maximal, that is, does it contain all safe method sequences?
- Problem is NP-hard, and does not directly lend itself to a model checking question
- Approximate it using two tests
  - A sequence s is weakly safe if some run of M on s stays safe.
    We can check if C includes all weakly safe sequences using a CTL model checking query over C||M.
  - We can locally check if allowing one more method in a state of C keeps it safe
- Summary: Our implementation of L* computes interface I as a minimal DFA
  - Guaranteed to be safe
  - Algorithm either reports that I is most permissive, or that it does not know (in that case, the most permissive interface has more states than I as a minimal DFA)

JIST: Java Interface Synthesis Tool
[Figure: JIST tool flow; components shown: Java, Java Byte Code, Soot, Jimple, Predicate Abstractor, Boolean Jimple, BJP2SMV, NuSMV Language, Interface Synthesis, Interface Automaton]

Signature Class
- 3 global variable predicates used for abstraction
- 24 boolean variables in abstract model
- 83 membership, 3 subset, 3 superset queries
- time: 10 seconds
- JIST synthesized the most permissive interface
[Figure: the Signature class listing from the Static Interfaces slide, shown with the synthesized interface automaton; edge labels: IV; IS; S, U, IS; V, U, IV]

JIST Project
- Tool is able to construct useful interfaces for sample Java classes in Java2SDK accurately and efficiently
- Work in progress, many challenges remain
  - How to choose predicates for abstraction?
  - How to refine abstractions?
  - Features of Java (e.g. class hierarchy)
  - Robustness of the tool
- Reference: Synthesis of Interface specifications for Java classes, ACMN, POPL 2005
- Joint work with Pavol Cerny, P. Madhusudan, Wonhong Nam
- See http://www.cis.upenn.edu/jist/

Conclusions
- Games provide a modeling paradigm for multi-agent systems to highlight the distinction among choices/nondeterminism of different components
- Alternating temporal logic (ATL) as a specification language for game-based requirements
  - Main application: Security protocols
- Synthesis of most general winning strategies
  - Automatic extraction of assumptions
  - Interfaces for software components
- Coping with state-space explosion raises new challenges
  - Learning-based strategy extraction seems promising
  - Not much research on solving games efficiently
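As a closing illustration of the membership query and the weakly safe test from the Implementing L* and Superset Query slides, the two checks can be written directly over a small nondeterministic abstract model. This is a sketch under assumptions: JIST discharges these questions with NuSMV, whereas here the set of reachable abstract states is tracked explicitly. The model (states q0, q1, err and methods open, use), the transition table `delta` and the function names are hypothetical, and `delta` is assumed to be total.

```python
def successors(delta, current, method):
    """All abstract states reachable from `current` by one call."""
    return {u for v in current for u in delta[(v, method)]}

def is_safe(delta, init, error, s):
    """Membership query: do ALL runs of the model on s avoid error states?"""
    current = {init}
    if current & error:
        return False
    for m in s:
        current = successors(delta, current, m)
        if current & error:
            return False
    return True

def is_weakly_safe(delta, init, error, s):
    """Does SOME run of the model on s avoid error states throughout?"""
    current = {init} - error
    for m in s:
        # Keep only runs that are still safe after this call.
        current = successors(delta, current, m) - error
    return bool(current)

# Hypothetical abstraction: after `use`, the abstraction no longer knows
# whether the resource is still open (q1) or has been closed (q0).
delta = {
    ("q0", "open"): {"q1"}, ("q0", "use"): {"err"},
    ("q1", "open"): {"q1"}, ("q1", "use"): {"q1", "q0"},
    ("err", "open"): {"err"}, ("err", "use"): {"err"},
}
error = {"err"}

print(is_safe(delta, "q0", error, ["open", "use"]))
print(is_safe(delta, "q0", error, ["open", "use", "use"]))
print(is_weakly_safe(delta, "q0", error, ["open", "use", "use"]))
print(is_weakly_safe(delta, "q0", error, ["use"]))
```

The sequence open, use, use is weakly safe but not safe in this toy model, which is exactly the gap between the two notions that makes the superset check only approximate.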