Constraint-Based Methods: Adding Algebraic Properties to Symbolic Models Vitaly Shmatikov SRI International One-Slide Summary “Constraint solving” is a symbolic analysis method for cryptographic protocols Decidable without finite bounds on the attacker Big win over finite-state checking (FDR, Mur, etc.) Only need to specify behavior of honest participants Can be extended with algebraic theories for XOR, modular multiplication, Diffie-Hellman Push-button procedure for finding both Dolev-Yao and algebraic attacks (e.g., Pereira-Quisquater) Works only for a finite number of sessions “Attack template” must be expressed as a symbolic execution trace Protocol Analysis Techniques Protocol Analysis Techniques Formal Models Modal Logics Finite-state Decidable (no probabilities) Process Calculi Computational Models Inductive Proofs … Probabilistic poly-time Random oracle … Infinite message space, finite sessions Free attacker algebra Attacker algebra with equational theory Protocol Analysis Meets Algebra Dolev-Yao model uses “black-box” cryptography Many crypto primitives are not black boxes Attacker can and will exploit algebraic properties XOR: ab = ba; aa = 0 Modular exponentiation: xy = yx; (xy)x-1 = y Ryan-Schneider attack on Bull’s recursive authentication protocol Pereira-Quisquater attack on A-GDH.2 protocol Goal: fully automated analysis of protocols with relevant algebraic theories GDOI, group key management protocols, … A-GDH.2 Protocol [Ateniese, Steiner, Tsudik ’00] Parties start with pairwise keys Kaz,Kbz,Kcz The goal is to establish common session key rarbrcrz A ,ra B ra,rb,rarb p is prime q is prime divisor of p-1 is generator of cyclic subgroup of Z*p of order q C rbrc,rarc,rarb,rarbrc rbrcrzKaz, rarcrzKbz, rarbrzKcz Computes session key rarbrcrz -1 as (rarcrzKbz)Kbz rb Z Is This Protocol Secure? Suppose two sessions are run concurrently, and malicious C wants to learn the session key of the session from which he is excluded A B ,ra rbrcrzKaz, rarcrzKbz, rarbrzKcz ra,rb, rarb Z C A B qa, qb, qaqb ,qa rbrc, rarc, rarb, rarbrc C qbqzKaz, qaqzKbz Z Can the attacker who controls the network and participates in the 1st session learn the session key of the 2nd session? Model Checking Approach Two sources of infinite behavior Multiple protocol sessions, multiple participant roles Message space or data space may be infinite Finite approximation Assume finite number of participants Example: 2 clients, 2 servers Assume finite message space This restriction is necessary (or the problem is undecidable) Represent random numbers by r1, r2, r3, … Do not allow encrypt(encrypt(encrypt(…))) This is restriction is not necessary for fully automated analysis! Infinite-State Protocol Model [Amadio and Lugiez ‘00] Finite number of processes [Millen and Shmatikov ‘01] Variables represent data under attacker’s control Attacker capabilities modeled by a term algebra [Boreale ‘01] Messages modeled as terms with variables Each process models a protocol role [Rusinowitch and Turuani ‘01] No artificial bounds on attacker computations Generates an infinite space of possible attacker messages Protocol analysis problem reduces to a decidable symbolic constraint solving problem Easy-to-use, practical software for protocol analysis Roles in A-GDH.2 Protocol A ,ra B ra,rb,rarb C rbrc,rarc,rarb,rarbrc B role ,X1 B B X1,rb,X1rb B Y1,Y2Kbz,Y3 rbrcrzKaz, rarcrzKbz, rarbrzKcz Z Z role Z Z1,Z2,Z3,Z4 Z Z1rzKaz,Z2rzKbz,Z3rzKcz Variables represent terms unknown to the party who plays the role Attacker can instantiate a variable with any value, but instantiation must be consistent in all terms where it occurs Symbolic Execution Trace Suppose two sessions are run concurrently, and malicious C wants to learn the session key of the session from which he is excluded A B ,ra rbrcrzKaz, rarcrzKbz, rarbrzKcz ra,rb, rarb Z C A B ,qa rbrc, rarc, rarb, rarbrc B and Z from 1st session Partial execution trace (there are finitely many) qa, qb, qaqb C B ,X1 qbqzKaz, X r X r 1 b 1 b B , , qaqzKbz Z Z1,Z2,Z3,Z4 Z Z r K Z r K Z r K Z 1 z az, 2 z bz, 3 z cz B from 2nd session B ,V1 B V1,qb,V1qb B W1,W2Kbz,W3 Is There A Feasible Attack? B ,X1 B X1,rb,X1rb Z Z1,Z2,Z3,Z4 Z Z1rzKaz,Z2rzKbz,Z3rzKcz B ,V1 B V1,qb,V1qb B will use this value as session key. If attacker can learn (and announce) it, B W1,W2Kbz,W3 W2qb the protocol is broken. This attack is feasible if and only if the attacker can consistently instantiate all variables in the trace so that he can produce every message received by B and Z Symbolic Attack Traces Attack is modeled as a symbolic execution trace Adequate for trace-based security properties A trace is a sequence of message send and receive events Attack trace ends in a violation (e.g., attacker learns the secret) Messages contain variables, modeling data controlled by attacker Secrecy, authentication, some forms of fairness… A symbolic trace may or may not have a feasible concrete instantiation Finding whether such an instantiation exists is the main goal of symbolic (infinite-state) protocol analysis From Attack Traces to Constraints For each message sent by the attacker in the attack trace, create a symbolic constraint mi from t1, …, tn mi is the message attacker needs to send t1,…,tn are the messages observed by attacker up to this point Attack is feasible if and only if all constraints are satisfiable simultaneously There exists an instantiation such that i mi can be derived from t1, …, tn in attacker’s term algebra Constraint Generation for A-GDH.2 B ,X1 B X1,rb,X1rb Z Z1,Z2,Z3,Z4 Z Z1rzKaz,Z2rzKbz,Z3rzKcz B ,V1 B V1,qb,V1qb B W1,W2Kbz,W3 W2qb ,X 1 from ,kcz (attacker’s initial knowledge) Z1,Z2, X1,rb,X1rb from ,k , cz Z Z 3, 4 ,V1 W1,W2Kbz, W3 from ,kcz,X1,rb,X1rb, Z1rzKaz,Z2rzKbz,Z3rzKcz from ,kcz,X1,rb,X1rb, Z1rzKaz,Z2rzKbz,Z3rzKcz, V1,qb,V1qb W2qb from ,kcz,X1,rb,X1rb, Z1rzKaz,Z2rzKbz,Z3rzKcz, V1,qb,V1qb Dolev-Yao Term Algebra Attacker’s term algebra is a set of derivation rules vT if u=v for some Tu T[u,v] Tu T[u,v] Tv Tu Tv T[u,v] Tu Tv Tcryptu[v] Tcryptu[v] Tu Tv Symbolic constraint m from t1, …, tn is satisfiable if and only if there is a substitution such that t1, …, tn m is derivable using these rules Properties of Term Algebra No restriction on structural size of terms Untyped The closure of any term set under derivation rules is infinite There is no a priori bound on attacker computations Attacker doesn’t have to comply with the protocol specification Attacker may substitute a ciphertext for a random number, a key for an output of a hash function, etc. Symmetric encryption with non-atomic keys Can add an equational theory to model algebraic properties of cryptographic functions XOR, modular exponentiation, blinded signatures, … Solving Symbolic Constraints [Millen and Shmatikov CCS ’01] Constraint reduction rules Replace each mi from Ti with one or more simpler constraints Preserve essential properties of the constraint sequence Nondeterministic reduction procedure Structure-driven, but several rules may apply in any state Exponential in the worst case (the problem is NP-complete) The procedure is terminating and complete If T m is derivable in attacker’s term algebra, 1. There exists reduction rule r=r() which is applicable to m from T and produces some m’ from T’ such that 2. T’ m’ is derivable in attacker’s term algebra Reduction Rules [m1,m2] from T m1 from T m2 from T (pair) m from T, v (elim) m from T m from [u,v], T (split) m from u, v, T cryptk[m] from T (enc) m from T k from T m from t, T (un) ___ add mgu(t,m) to m from cryptu[v], T (dec) u from cryptuv, T m from cryptu[v], v, T Reduction Procedure Initial constraint sequence apply every possible reduction rule to first m from T where m is not a variable ••• ••• No rule is applicable or v1 from T1 • • • vN from TN If reduction tree has at least one such sequence as a leaf, there is a solution, and attack trace is feasible Symbolic Analysis Summary specified by the analyst fully automated Formal specification of protocol roles attacker is implicit! variables model attacker’s input Attack (violating execution trace) may not have a feasible instantiation Sequence of symbolic constraints satisfiable if and only if there exists a feasible instantiation of attack trace Decidable constraint solving procedure Let’s Add Algebraic Properties Verification of trace-based security properties … is decidable for protocols with XOR Comon-Lundh and Shmatikov (LICS ’03) Chevalier, Kϋsters, Rusinowitch, Turuani (LICS ’03) … reduces to a system of quadratic Diophantine equations for protocols with Abelian groups Millen and Shmatikov (CSFW ’03) … is decidable for a restricted class of protocols with modular exponentiation Chevalier, Kϋsters, Rusinowitch, Turuani (FST/TCS ’03) … is decidable for any well-defined protocol with products and modular exponentiation Shmatikov (ESOP ’04) Attacker Term Algebra Dolev-Yao vT Tu T[u,v] Tu Tu Tv T[u,v] Attacker can’t take discrete logs or solve DiffieHellman problem T[u,v] Tv Tu Tv Tcryptu[v] Tuv Tv Tuv Tuw Tuvw Tcryptu[v] Tu Tv Tu Tv Tuv Tu Tu Tv Tuv Tu-1 Associative: (x y) z = x (y z) Commutative: x y = y x Normalization xx-1 1 x1 x rules: (x-1)-1 x (xy)-1 y-1x-1 x1 x (xy)z xyz Key Insights For Decidability In a well-defined protocol, honest participants don’t need to guess values of attacker inputs Leads to a syntactic condition on usage of variables If attacker can derive u from T, then there is a derivation which uses only subterms of T and u If constraints are satisfiable, then there is an attack in which every variable is instantiated by a product of subterms drawn from a finite set Origination Stability Variable origination condition If C is a constraint sequence generated from an execution trace, then there exists a linear ordering < on Vars(C) such that if x appears for the first time in mi from Ti C, then x Vars(mi) and y Vars(Ti) y < x This condition must be satisfied by C after any partial substitution Rules out only ill-defined protocols AB BA XY X Requires B to split a product of two unknown values Normal Derivations t1T t2T tnT Tt1 Tt2 Ttn … … Tv Tv1 ANALYSIS stage: All intermediate terms are products of subterms of T … Tvk … Tu SYNTHESIS stage: Only pairing, encryption, multiplication, inverse & exponentiation used Lemma: if Tu is derivable, then there is a normal derivation Conservative Solutions Conservative solution only uses subterms from the original, uninstantiated constraint sequence x Subterms(x) Subterms(C) closed under , inverse and exponentiation All subterms used in the conservative solution are drawn from a finite set which is known before any variables are instantiated Lemma: if C has a solution, then C has a conservative solution This lemma allows to derive a bound on the size of the attack Symbolic Decision Procedure { u1 from T1 , …, un from Tn } 1. 3. 4. Monotonic: T1 … Tn Satisfy the variable stability condition Guess all equalities between subterms 2. symbolic constraints generated from protocol Finite number of possible unifiers modulo AG Guess the order in which subterms are derived Replace exponentiation by and inverse Reduce to a decidable system of Solvable iff a linear quadratic Diophantine equations subsystem is solvable Back to A-GDH.2 X1 Z1 Z2 Z3 Z4 V1 W2Kbz W2qb from ,kcz from -”-,X1,rb,X1rb from -”from -”from -”from -”-, Z1rzKaz,Z2rzKbz, Z3rzKcz from -”-, V1,qb,V1qb from -”- X1 rb-1Z1 rb-1Z2 rb-1Z3 rb-1Z4 from from from from from kcz kcz kcz kcz kcz Only and inverse used in derivation. Reduces to system of Diophantine equations. Z3-1rz-1kcz-1V1 from kcz Z2-1rz-1W2 from kcz V1-1W2 from kcz Key insight: under the Diffie-Hellman assumption, attacker can produce -1 x from y if and only if he can produce y-1x (x=(y)y x) Decidable Quadratic Equations u1X11…X1k1 from t11, …, tm1 u2X21…X2k2 from t21, …, tm2 … unXn1…Xnkn from tn1, …, tmn Only and inverse used in each derivation Convert each constraint into a Diophantine equation uiXi1…Xik from ti1, …, tim becomes uiXi1…Xik = ti1z1 …timzm for integer zj If some tij is a variable, equation becomes quadratic, for example a2X = (ab)z1 a6 = (ab)z2(bX)z3 Equations associated with execution traces have special structure If a variable occurs on the right, it must previously occur on the left All terms used to construct the variable where it first occurred are available in every subsequent constraint Intuition Behind Decidability a2X = (ab)z1 a6 = (ab)z2(bX)z3 substitute X a6 = (ab)z2(ba-2(ab)z1)z3 group (ab) terms together a6 = (ab)z’(ba-2)z3 z’ = z2 + z1z3 Quadratic part always has a solution because z2 is unconstrained Is There A Feasible Attack? Yes! B ,X1 B X1,rb,X1rb Z Z1,Z2,Z3,Z4 Z Z1rzKaz,Z2rzKbz,Z3rzKcz B ,V1 B V1,qb,V1qb Attacker can learn this value by B W1,W2Kbz,W3 W2qb clever variable instantiation Attack on A-GDH.2 Suppose two sessions are run concurrently, and malicious C wants to learn the session key of the session from which he is excluded A B ,ra rbrcrzKaz, rarcrzKbz, rarbrzKcz 1. Replace with 1 A ra,rb, rarb Z B ,qb rbrc, rarc, rarb, rarbrc qbqzKaz, qaqzKbz 3. Replace with rbrzkcz qb, qb, qaqb Attacks of this type can be found automatically from protocol specification Z 4. Replace with rbrzkbz 2. Replace with rb,rb,rb,rb Attack: B will use r r q as session key, which attacker can compute as (r r k q )k b z b b z cz b -1 cz Decision Procedures Free (“black-box”) algebra: decidable XOR: decidable All integer variables are equal to 0 or 1 (Group) Diffie-Hellman: decidable Implemented as an easy-to-use analysis tool System of quadratic Diophantine equations, which is solvable if and only if a linear subsystem is solvable Some restrictions (no products in exponentiation base) Blind signatures, super-exponentiation, ... Axiomatic models of various cryptographic primitives Current research