Towards Satisfiability of Separation Logic with Integer Arithmetic

ABSTRACT
Decision procedures for satisfiability are important for determining whether a formula is infeasible (to support entailment proving) or has a feasible instance (to support failure tracing via counterexamples). Recently, a decision procedure was proposed by Brotherston et al. at LICS 2014 for a fragment of separation logic with shape-only inductive predicates. To support automated verification, we often need a more expressive fragment of separation logic with other kinds of pure properties, such as size or set. In this work, we consider the satisfiability problem for a separation logic fragment comprising inductive predicates with Presburger linear arithmetic. This extended logic, called SLA1, can handle richer data structures with sortedness and size properties. We start by proving that the satisfiability problem in the SLA1 fragment is undecidable. We then identify a decidable fragment, named SLA2, where each inductive predicate defines an eventually periodic set. We prove that satisfiability is decidable in this subsystem by giving a decision algorithm. We also propose a practical decision procedure for satisfiability in this fragment. The essence of our procedure is a mechanism to infer, for each inductive predicate, a precise invariant that is equi-satisfiable to its recursive predicate for the SLA1 fragment. Our procedure is based on abstract interpretation, which may compute an over-approximated invariant for predicates that do not belong to the SLA2 fragment. We use projection to first derive a precise shape-only invariant, before attempting an over-approximated invariant for the numeric properties of the SLA1 fragment. For invariants that are computed from the non-SLA2 fragment, we can still use an under-approximation check to determine whether an invariant is in fact precise. We prove the soundness of both procedures, and provide a prototype implementation to illustrate their feasibility.
Keywords
Combining Decision Procedure, Satisfiability, Separation Logic, User-Defined Predicates.

1. INTRODUCTION
Separation logic is an extension of Hoare logic to model states of heap-manipulating programs. Its strength comes from the new separation conjunction operator ∗. The formula κ1 ∗ κ2 specifies that the heap can be split into two disjoint regions in which κ1 and κ2 hold respectively. Using this operator, we can define recursive predicates that succinctly describe fairly complex shapes of data structures. With these predicates, we can then specify and verify correctness properties for heap-manipulating programs.

However, the expressiveness of separation logic comes with a price, as this logic is undecidable in general [1]. To have decidability, one has to restrict the fragment of separation logic under investigation. For example, most works so far have only considered a fixed set of predicates based on the linked-list structure with only pointer (dis)equality.

In this paper, we consider a more expressive fragment of separation logic for satisfiability. Our fragment includes the empty assertion, points-to predicates and arbitrary user-defined predicates to model data structures. Moreover, we can use Presburger arithmetic to describe useful properties of our data structures, such as the size of a list, the height of a tree and even sortedness. An example is illustrated below for a non-empty sorted linked-list predicate, with m as its smallest value and n as its length.
data node { int val; node next; } // data type declaration
pred sortll(root,n,m) ≡ root↦node(m, null) ∧ n=1
  ∨ ∃ q, m1 · root↦node(m, q) ∗ sortll(q, n−1, m1) ∧ m≤m1

Due to the presence of inductive predicates and an infinite integer domain, this fragment is actually undecidable (see Sec 2.5). However, we have discovered a significant sub-fragment whose satisfiability is decidable. To support this fragment, we introduce a new three-stage method to compute over-approximated predicate invariants. Like [6], the first stage computes a precise shape-only predicate invariant. The second stage computes a numeric predicate invariant that is either over-approximated or precise. The third stage computes a combined predicate invariant, which is precise if the numeric portion is precise. We show that such a precise predicate invariant in the combined domain is equi-satisfiable to its inductive predicate.

We summarize our key contributions as follows:
• We first show that satisfiability of separation logic (with inductive predicates) and Presburger arithmetic is undecidable.
• By restricting the logic to a fragment where each inductive predicate defines an eventually periodic set, we give a constructive proof that satisfiability is decidable.
• To support a working implementation of our decision procedure, we present a three-stage algorithm to compute precise predicate invariants that are equi-satisfiable to their inductive predicates.
• As a practical proof of concept, we have implemented this satisfiability decision procedure for this fragment of separation logic within the existing HIP/SLEEK [8, 16] verification infrastructure.
• In the Appendix, we also describe an alternative way to infer precise invariants. This utilizes our inference mechanism for over-approximation, and then uses a novel under-approximation check to confirm whether an over-approximating invariant is also precise. This check must ensure that each non-false under-approximation is also non-empty (i.e.
has at least one satisfiable instance).

2. SEPARATION LOGIC WITH ARITHMETIC

2.1 Syntax
We start with a fragment of separation logic with Presburger arithmetic. We call this fragment SLA1. It assumes a finite collection of type constructors Ptr, a set of predicate names P, a set of (program and logical) variables Var, a set Loc of distinct heap locations, and a set of non-address values Val, with null ∈ Val and Val ∩ Loc = ∅.

Predicates        Pred ::= pred P1(v̄)≡Ψ1 ; · · · ; pred Pn(v̄)≡Ψn
Disj. SL formula  Ψ    ::= ∃v̄· (κ∧α∧φ) | Ψ1 ∨ Ψ2
Heap formula      κ    ::= emp | x↦c(v̄) | P(v̄) | ψ | κ1∗κ2
BAGA formula      ψ    ::= false | (B, α, φ) | ψ1∨ψ2
Ptr (Dis)Eq.      α    ::= true | v1=v2 | v=null | v1≠v2 | v≠null | α1∧α2
Presburger arith. φ    ::= true | i | ∃v· φ | ∀v· φ | ¬φ | φ1∧φ2 | φ1∨φ2
Linear arith.     i    ::= a1=a2 | a1≤a2
                  a    ::= k^int | v | k^int×a | a1+a2 | −a | max(a1,a2) | min(a1,a2)
v, vi, x, y ∈ Var    k^int ∈ Int    Pi, P ∈ P    c ∈ Ptr    v̄ ≡ v1, . . ., vn

This logic fragment is quite expressive, as it can use inductive predicates to describe complex data structures, such as the nearly balanced AVL tree (omitting sortedness):

data c2 { int val; c2 left; c2 right; }
pred avl(root,s,h) ≡ emp ∧ root=null ∧ s=0 ∧ h=0
  ∨ ∃l, r, s1, s2, h1, h2 · root↦c2(_, l, r) ∗ avl(l, s1, h1) ∗ avl(r, s2, h2)
    ∧ s=s1+s2+1 ∧ h=1+max(h1, h2) ∧ −1≤h1−h2≤1

We use a special formula, called BAGA, to explicitly capture a bag of addresses to denote the (heap and numeric) abstraction of each inductive predicate. This BAGA form is itself expressed without any user-defined predicates. The semantic denotation of this formula is given in the next sub-section, while Sec 2.2 elaborates on how it may be inferred from predicate definitions of separation logic.

2.2 BAGA Formula
To support the (un-)satisfiability problem for separation logic with Presburger arithmetic, we will use a special formula, called BAGA, to explicitly capture a BAG of Addresses to denote the (heap and numeric) invariant of each inductive predicate.
The syntax of BAGA and the corresponding abstractions for separation logic formulas and predicates are:

BAGA inv.     ψ  ::= false | (B, α, φ) | ψ1∨ψ2
Disj abstr.   Ψ# ::= ∃v̄· ψ# | Ψ#1 ∨ Ψ#2
BAGA abstr.   ψ# ::= (B, α, φ) | ψ1# ∗ ψ2# | P#(v̄)

Each basic component of BAGA is a triple with a multi-set of pointer variables B, a pointer constraint α and an arithmetic constraint φ. Compared to [6], we have now added an extra arithmetic constraint. To support satisfiability, we will need to derive precise invariants for our inductive predicates, but this is not always possible with Presburger arithmetic. For example, we can derive ({root}, true, n>0) as the precise invariant that is equi-satisfiable to sortll(root, n, m). However, for avl(root, s, h), we could derive ({}, root=null, h=0∧s=0) ∨ ({root}, true, s≥h>0), which is an over-approximated invariant lacking a relation between s and h. Such an over-approximation cannot decide satisfiability, but is still helpful for some scenarios of unsatisfiability.

A BAGA formula is an abstraction of a separation logic formula. We shall show how to compute an over-approximation for inductive predicates and how to determine when it is precise. BAGA formulas can be normalized via the following BAGA-specific rules.

({x} ⊎ B, α, φ) ∧ (α→x=null)          ⇔ false
({x, y} ⊎ B, α, φ) ∧ (α→x=y)          ⇔ false
(B, α, φ) ∧ (α∧φ → false)             ⇔ false
(B1, α1, φ1) ∗ (B2, α2, φ2)           ⇔ (B1⊎B2, α1∧α2, φ1∧φ2)
(B, α, φ1) ∨ (B, α, φ2)               ⇔ (B, α, φ1∨φ2)
∃x∪V · (B, α∧x=y, φ)                  ⇔ ∃V · ([y/x]B, [y/x]α, φ)
∃x∪V · (B, α, φ) ∧ typeof(x)∈Ptr      ⇔ ∃V · (B−{x}, ∃x·α, φ)
∃x∪V · (B, α, φ) ∧ typeof(x)=Int      ⇔ ∃V · (B, α, ∃x·φ)

2.3 Semantics
The semantic relation for our logic relies on the following two functions:

Heaps  =def Loc ⇀fin (Ptr, Val ∪ Loc)
Stacks =def Var → Val ∪ Loc

The semantic relation s,h |= Ψ requires the stack s and heap h to satisfy the constraint Ψ where h ∈ Heaps, s ∈ Stacks, and Ψ is a separation logic formula.
As the semantic relation for pure formulas is standard, we define only the heap-related components, as follows:

s, h |= ({v̄}, α, φ)       iff  s |= α∧φ and {s(v̄)} = dom(h)
s, h |= emp                iff  h = {}
s, h |= v↦c(v̄)            iff  l = s(v), h = {l → (c, s(v̄))}
s, h |= P(v̄)              iff  s, h |= Ψ, where P(v̄) ≡ Ψ
s, h |= κ1 ∗ κ2            iff  ∃h1, h2 · h1#h2 and h = h1·h2 and s, hi |= κi for i=1, 2
s, h |= ∃v̄·(κ∧α∧φ)        iff  ∃ν̄ · s[v̄↦ν̄], h |= κ and s[v̄↦ν̄] |= α∧φ
s, h |= Ψ1 ∨ Ψ2            iff  s, h |= Ψ1 or s, h |= Ψ2

Note that dom(h) returns the domain of the heap h; {} is the empty heap that is undefined everywhere; h1#h2 denotes that heaps h1 and h2 are disjoint, i.e. dom(h1) ∩ dom(h2) = ∅; h1·h2 denotes the union of two disjoint heaps; s[v1↦ν1, .., vn↦νn] denotes the stack defined as s except that s[v1↦ν1, .., vn↦νn](vi) = νi, for 1≤i≤n. Using this relation, we can semantically classify predicate invariants, as follows:

DEFINITION 1 (PREDICATE INVARIANTS). Given a predicate P(v̄) ≡ Ψ and a BAGA formula ψ:
• ψ is an over-approximated invariant of P if, for all s and h, s,h |= Ψ implies ∃h′ · dom(h′)⊆dom(h) ∧ s,h′ |= ψ.
• ψ is a precise invariant of P if ψ is an over-approximated invariant of P and, for all s and h, s,h |= ψ implies ∃h′ · dom(h)⊆dom(h′) ∧ s,h′ |= Ψ.

The first definition essentially states that a formula ψ is an over-approximation if every satisfiable instance s,h |= Ψ is also an instance of ψ, up to a sub-heap h′ with dom(h′)⊆dom(h) and s,h′ |= ψ. The BAGA counterpart ψ may refer to the same or fewer heap locations than the original formula Ψ. The second definition states that an over-approximated formula ψ is precise when it is also a valid under-approximation. A formula ψ is an under-approximation of Ψ if, for every instance s,h that belongs to ψ, we are guaranteed an instance ∃h′ · dom(h)⊆dom(h′) ∧ s,h′ |= Ψ that satisfies the original predicate definition Ψ. In other words, a formula ψ is a precise invariant if it is equi-satisfiable with Ψ.
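The heap-related clauses above can be exercised on small concrete models. The following is a minimal sketch, not the authors' implementation: it assumes stacks and heaps are Python dictionaries, models null as `None` under the hypothetical variable name `nil`, and covers only emp, points-to, ∗ and a single BAGA triple. The names `Emp`, `PointsTo`, `Star`, `splits`, `sat`, `baga_sat` and `has_sub_heap_model` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Emp:
    """The empty-heap assertion emp."""


@dataclass(frozen=True)
class PointsTo:
    """x |-> c(args): one cell at s(x) holding constructor c and fields s(args)."""
    var: str
    ctor: str
    args: tuple


@dataclass(frozen=True)
class Star:
    """kappa1 * kappa2: both hold on disjoint parts of the heap."""
    left: object
    right: object


def splits(h):
    """Yield every pair (h1, h2) of disjoint heaps whose union is h."""
    locs = list(h)
    for r in range(len(locs) + 1):
        for chosen in combinations(locs, r):
            h1 = {l: h[l] for l in chosen}
            h2 = {l: h[l] for l in locs if l not in chosen}
            yield h1, h2


def sat(s, h, kappa):
    """Decide s,h |= kappa for the three heap connectives modelled here."""
    if isinstance(kappa, Emp):
        return h == {}
    if isinstance(kappa, PointsTo):
        cell = (kappa.ctor, tuple(s[a] for a in kappa.args))
        return h == {s[kappa.var]: cell}
    if isinstance(kappa, Star):
        return any(sat(s, h1, kappa.left) and sat(s, h2, kappa.right)
                   for h1, h2 in splits(h))
    raise TypeError("unsupported formula")


def baga_sat(s, h, bag, pure):
    """s,h |= (B, alpha, phi): the pure part holds and s(B) is exactly dom(h)."""
    return pure(s) and {s[v] for v in bag} == set(h)


def has_sub_heap_model(s, h, bag, pure):
    """Over-approximation side of Definition 1 for one model (s, h):
    some h' with dom(h') a subset of dom(h) satisfies the BAGA triple."""
    return any(baga_sat(s, h1, bag, pure) for h1, _ in splits(h))


# A two-node sorted list: root |-> node(3, q) * q |-> node(5, null).
s = {"root": 1, "q": 2, "m": 3, "m1": 5, "n": 2, "nil": None}
h = {1: ("node", (3, 2)), 2: ("node", (5, None))}
two_nodes = Star(PointsTo("root", "node", ("m", "q")),
                 PointsTo("q", "node", ("m1", "nil")))

print(sat(s, h, two_nodes))
print(has_sub_heap_model(s, h, ["root"], lambda st: st["n"] > 0))
```

In this sketch the two-node model satisfies the ∗-formula, and the triple ({root}, true, n>0) is satisfied on the sub-heap containing only the cell at root, which is the situation Definition 1 permits for an over-approximation. Enumerating all heap splits is exponential and is meant only for tiny hand-built models.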
2.4 From Separation Logic to BAGA
To derive the invariant for each predicate P(v̄)≡Ψ, we shall build its BAGA abstraction using P#(v̄)≡A[Ψ], where A is defined as follows:

A[emp]       =df ({ }, true, true)
A[v↦c(_)]    =df ({v}, true, true)
A[κ1∗κ2]     =df A[κ1] ∗ A[κ2]
A[κ∧α∧φ]     =df A[κ] ∗ ({ }, α, φ)
A[⋁ ∃v̄ · ∆]  =df ⋁ ∃v̄ · A[∆]
A[P(v̄)]      =df P#(v̄)

Using the sortll predicate as our running example, we obtain:

sortll#(root,n,m) ≡ ({root}, true, n=1) ∨ ∃ q, m1 · ({root}, true, m≤m1) ∗ sortll#(q, n−1, m1)

Indeed, A is an equi-satisfiability reduction.

LEMMA 2.1. Given a base formula ∆ and ψ ≡ A[∆]. ∆ is satisfiable iff there is a disjunct (B, α, φ) of ψ such that both α and φ are satisfiable.

PROOF. Given a base formula ∆ and (B, α, φ)≡A[∆], we prove (∀s, ∃h· s,h |= ∆ iff s |= α ∧ s |= φ) by induction on the number of ∗ in κ, i.e. size(κ).
Base case. size(κ)=0. Two cases:
• ∆≡emp∧α∧φ and A[∆]≡({}, α, φ). Trivial.
• ∆≡x↦c(v̄)∧α∧φ and A[∆]≡({x}, α, φ). For all s such that s(x)=l and l≠h(null), we have s |= x≠null. Thus, s,h |= x↦c(v̄)∧α∧φ ⇔ s |= α∧φ (a). From (a) and under the assumption that heap and integer domains are disjoint, s |= x≠null∧π ⇔ s |= α ∧ s |= φ.
Induction case. Assume that Lemma 2.1 is valid for all heaps κ with size(κ)=k≥0; we now prove that Lemma 2.1 is also valid for heap size k′ where k′=k+1. Since k′≥1, there exist κ1 and κ2 such that size(κ1)≤k, size(κ2)≤k and κ≡κ1∗κ2. Thus, we have
s,h |= κ1∗κ2∧π iff ∃h1, h2 · size(h1)≤k, size(h2)≤k and h1#h2 and h=h1·h2 and s,h1 |= κ1∧π and s,h2 |= κ2∧π   (1)
h1#h2 ⇔ s |= ⋀{v1≠v2 | v1↦_(_) ∈ κ1 and v2↦_(_) ∈ κ2}   (2)
s,h1 |= κ1∧π ⇔ s |= A[κ1] (by induction hypothesis)   (3)
s,h2 |= κ2∧π ⇔ s |= A[κ2] (by induction hypothesis)   (4)
From (2), (3), (4) and the semantics of ∧, we obtain
s |= ⋀{v1≠v2 | v1↦_(_) ∈ κ1 and v2↦_(_) ∈ κ2} ∧ A[κ1] ∧ A[κ2] ⇔ s |= A[κ1∗κ2] (definition of A over ∗ reduction)   (5)
From (1), (5) and the semantics of ∧, we obtain
s |= A[κ1∗κ2]∧π ⇔ s |= A[κ1∗κ2∧π] (definition of A over κ∧π) ⇔ s |= A[κ∧π].
Therefore, Lemma 2.1 is proven for heap size k+1.

LEMMA 2.2. Given a predicate P(v̄)≡Ψ and ψ ≡ A[Ψ]. P(t̄) is satisfiable iff there is a disjunct (B, α, φ) of ψ[v̄ := t̄] such that both α and φ are satisfiable.

THEOREM 2.3. Given a formula Ψ and ψ ≡ A[Ψ]. Ψ is satisfiable iff there is a disjunct (B, α, φ) of ψ such that both α and φ are satisfiable.

2.5 Undecidability
Without any restrictions on the shape and Presburger arithmetic of inductive definitions for user-defined predicates, satisfiability is undecidable in SLA1.

THEOREM 2.4. The satisfiability of a formula is undecidable in SLA1.

Proof. Define

pred PW(x, y) ≡ emp ∧ (x = 0 ∧ y = 1) ∨ ∃x1, y1.(emp ∗ PW(x1, y1) ∧ x = x1 + 1 ∧ y = 2 × y1)
pred NPW(x, y) ≡ ∃z.(PW(x, z) ∧ y ≠ z)

Then the satisfiability of PW(n, m) is equivalent to m = 2^n and the satisfiability of NPW(n, m) is equivalent to m ≠ 2^n for natural numbers m, n.
Presburger arithmetic with the predicate y = 2^x is known to be equivalent to Peano arithmetic. Hence we have some Σ^0_1 formula F(x) in Presburger arithmetic with the predicate y = 2^x such that F(n) is true iff the n-th Turing machine terminates, for every natural number n.
We can assume F(x) is in prenex normal form and its quantifier-free part is in disjunctive normal form. Construct F1(x) from F(x) by replacing every occurrence of the predicate ¬(y = 2^x) by NPW(x, y). Then construct F2(x) from F1(x) by replacing every occurrence of the predicate y = 2^x by PW(x, y). Since F2(x) has negations only in front of arithmetical atomic formulas and it does not have any universal quantifiers, we can transform F2(x) to an equivalent formula Φ(x) in SLA1. Then the satisfiability of Φ(x) in SLA1 is equivalent to the truth of F(x) in Presburger arithmetic with the predicate y = 2^x. Therefore Φ(n) is satisfiable in SLA1 iff the n-th Turing machine terminates, for every natural number n. Consequently the satisfiability in SLA1 is undecidable. ✷

3. DECIDABLE FRAGMENT OF SEPARATION LOGIC & ARITHMETIC
This section presents a decidable subsystem of SLA1, and gives an algorithm for its satisfiability. We call this subsystem SLA2. SLA2 covers our sorted-list predicate sortll. We will prove decidability by reducing it to the decidability of the arithmetical part, using the decision algorithm for the non-arithmetical part given in [6]. First we will present a decidable fragment of Presburger arithmetic and inductive definitions. We call this fragment DPI. Then we reduce the satisfiability in SLA2 to the decidability in DPI.

3.1 Decidable fragment DPI of Presburger arithmetic with inductive definitions
We define a fragment DPI of Presburger arithmetic with inductive definitions. The idea is that we impose some restrictions on the inductive definitions so that the inductive predicate defines an eventually periodic set. Since the decidability proof of Presburger arithmetic relies on the fact that a definable set is exactly an eventually periodic set, this restriction enables us to use the same proof idea for its extension with inductive definitions.
Linear arithmetic terms a are the same as those in SLA1. We extend linear arithmetic atomic formulas i in SLA1 by adding inductive predicates P(v̄). Presburger conjunctive formulas φ are similarly defined from these atomic formulas. Presburger disjunctive formulas Φ are defined by Φ ::= ∃v̄.φ | Φ ∨ Φ. The inductive definition is pred P(v̄) ≡ Φ. For simplicity, we assume only one inductive predicate P, but this decidability result can be straightforwardly extended to more than one inductive predicate.
Let Φ be ⋁_j φj. We call φj a base case when P does not appear in it, and we call it an induction case when P appears in it. Let the arity of P be m.
Restriction 1.
Our first condition is that the definition body of P has only one induction case. Let the induction case of the inductive definition pred P(x̄) be

∃z̄. i ∧ ⋀_{1≤l≤L} P(ā_l).

Note that P(x̄) denotes P(x1, . . . , xm) and P(ā_l) denotes P(a_l1, . . . , a_lm).
Restriction 2. Our second condition is that the above i of the induction case is ⋀_{1≤j≤m} i_j such that i_j is one of the following:
(1) xj = f(z̄j) + c,
(2) xj ≥ f(z̄j) + c,
(3) xj ≤ f(z̄j) + c,
(4) a conjunction of the forms n xj = f(z̄j), n xj ≥ f(z̄j), and n xj ≤ f(z̄j),
where c, n are some integer constants, n > 0, and f(z̄j) is a combination of z_j1, . . . , z_jL with max, min, defined by

f(z̄j) ::= z_jl | max(f(z̄j), f(z̄j)) | min(f(z̄j), f(z̄j)).

Then we can let the inductive definition of P be

pred P(x̄) ≡ ⋁_k ∃z̄k. i_k ∨ ∃z̄. ⋀_{1≤j≤m} i_j ∧ ⋀_l P(z̄^l).

Example. The arithmetical part P of sortll is inductively defined by

pred P(n, m) ≡ n = 1 ∨ ∃q, m1.(P(n − 1, m1) ∧ m ≤ m1).

This satisfies the above conditions. Note that P(n − 1, m1) is an abbreviation of ∃z(z = n − 1 ∧ P(z, m1)).

THEOREM 3.1. The truth of a given disjunctive formula Φ is decidable in DPI.

Proof. Define a set S of integers to be eventually periodic if there are some M ≥ 0, p1, p2 > 0 such that n ∈ S iff n + p1 ∈ S for all n ≥ M, and n ∈ S iff n − p2 ∈ S for all n ≤ −M. Then we call the set (M, p1, p2)-periodic.
Let S = {(x1, . . . , xm) | P(x1, . . . , xm)} and Q = {(x1, . . . , xm) | Φ0(x1, . . . , xm)}, where Φ0 is the disjunction of all the base cases of the definition of P. We write Sj for {xj | P(x1, . . . , xm)}, and Qj for {xj | Φ0(x1, . . . , xm)}. We have S = S1 × . . . × Sm and Q = Q1 × . . . × Qm, since the j-th value xj depends only on the previous j-th value zj in the definition of P.
It is known that a set definable in Presburger arithmetic is exactly an eventually periodic set. Hence each Qj is eventually periodic. Let Qj be (M, p1, p2)-periodic.
In case (1), we can show that Sj is (0, |c|, |c|)-periodic if c ≠ 0, and (M, p1, p2)-periodic if c = 0.
In case (2), we can show that Sj is Qj ∪ {x | x > (min Qj) + c} if Qj is downward finite and c ≥ 0, and Sj is Z otherwise.
In case (3), we can show that Sj is Qj ∪ {x | x < (max Qj) + c} if Qj is upward finite and c ≤ 0, and Sj is Z otherwise.
In case (4), we can show that Sj is also (M, p1, p2)-periodic.
Since the truth of P(n1, . . . , nm) is equivalent to ⋀_{1≤j≤m} (nj ∈ Sj) and each Sj is eventually periodic, we can decide the truth of a given formula that contains P. ✷

The decision procedure for DPI is obtained by computing the above M, p1, p2 according to the decidability proof.

3.2 Decidable fragment SLA2
We define a subsystem SLA2 of SLA1 as follows. The restrictions are essentially the same as those imposed on DPI. The idea of the decision procedure is that, since the arithmetical part and the non-arithmetical part are independent, we can split the satisfiability of a given formula into the satisfiability of its arithmetical part and of its non-arithmetical part. However, these two parts need to synchronize. Since the unfolding of the non-arithmetical part terminates within some k steps according to [6], we consider two cases: one case when the number of unfoldings of the whole formula is less than k, and the other case when it is equal to or greater than k. The first case can be handled by a finite number of unfoldings. In the second case, we can use the inductive predicate for the arithmetical part. In order to realize this idea, we will take k to be the maximum number of possible (B, α)'s instead of the number of iterations.
Our condition is that for the inductive definition pred P ≡ Φ, the inductive definition pred P#_N ≡ PN(A[Φ]) satisfies the condition of DPI.

THEOREM 3.2. The satisfiability of a given disjunctive formula Φ is decidable in SLA2.

Proof.
We will discuss only the case where P is an inductive predicate, since the general case can be proved by straightforwardly extending the proof of this case. For simplicity, we assume P takes one pointer variable and one numeric variable. For simplicity, we also assume P appears exactly once in the induction case. The general case can be proved by straightforwardly extending the proof of this case.
Let the inductive definition be

pred P(x, y) ≡ Φ0(x, y) ∨ Φ1(x, y)

where Φ0(x, y) is the disjunction of all the base cases and Φ1(x, y) is the induction case.
We define P#(k)(x, y) by

P#(0)(x, y) ≡ A[Φ0(x, y)],
P#(k+1)(x, y) ≡ A[Φ1(x, y)][P#(u, v) := P#(k)(u, v)].

P#(k)(x, y) is the k-times iteration of A[Φ1(x, y)] to A[Φ0(x, y)].
We define P#(k)_S(x) as PS[P#(k)(x, y)] and P#(k)_N(y) as PN[P#(k)(x, y)]. We also define P#(≤k)_S(x) as ⋁_{0≤j≤k} P#(j)_S(x).
We define P#(>k)_N(y) by

pred P#(>k)_N(y) ≡ PN[P#(k+1)(x, y)] ∨ PN[A[Φ1(x, y)]][P#_N := P#(>k)_N].

Suppose we are deciding the satisfiability of P(t, a) where t is a pointer variable or null and a is an arithmetical term. Let k be 2^n n(n + 1) where n is the number of pointer variable arguments. It is the maximum number of possible (B, α)'s. In the current simplified case, k = 4, since n = 1.
The decision algorithm is as follows: Return the satisfiability of

⋁_{0≤j≤k} P#(j)(t, a) ∨ P#(k)_S(t) ∧ P#(>k)_N(a).

The algorithm for the non-arithmetical part is based on the same idea as that in [6]. The satisfiability of P#(>k)_N(a) is decided by Theorem 3.1.
We show its correctness as follows. Assume P(t, a) is satisfiable. We will show the algorithm returns yes. Then we have some n such that P#(n)(t, a) is satisfiable. If n ≤ k, then the algorithm returns yes by the first disjunct.
Assume n > k. Then both P#(n)_S(t) and P#(n)_N(a) are satisfiable. Since n > k, P#(k)_S(t) is satisfiable. Since P#(n)_N(a) implies P#(>k)_N(a), the algorithm returns yes.
For showing the other direction, assume the algorithm returns yes for the input t, a. We will show P(t, a) is satisfiable.
We consider cases according to ⋁_{0≤j≤k} P#(j)(t, a) ∨ P#(k)_S(t) ∧ P#(>k)_N(a).
If P#(j)(t, a) is satisfiable for some 0 ≤ j ≤ k, then P(t, a) is satisfiable.
Assume P#(k)_S(t) ∧ P#(>k)_N(a) is satisfiable. Then we have some n > k such that P#(n)_N(a) is satisfiable.
Since P#(k)_S(t) is satisfiable and k is the maximum number of possible (B, α)'s, we have some q < k such that P#(q)_S(t) and P#(q+1)_S(t) have some common (B, α) and (B, α) is satisfiable. Then P#(m)_S(t) is satisfiable for all m ≥ q. Then P#(n)_S(t) is satisfiable. Therefore P#(n)(t, a) is satisfiable, and hence P(t, a) is satisfiable. ✷

Example. The algorithm proceeds for sortll as follows. We have k = 4 since the number of pointer variable arguments is 1.

sortll#(0)(root, n, m) = ({root}, true, n = 1),
sortll#(1)(root, n, m) = ∃q, m1.({root}, true, m ≤ m1) ∗ ({q}, true, n − 1 = 1).

Since

sortll#(0)_S(root) = ({root}, true),
sortll#(1)_S(root) = ∃q, m1.({root}, true) ∗ ({q}, true)

and they are equivalent, sortll#(4)_S(root) is equivalent to ({root}, true). Hence the satisfiability of sortll(root, n, m) is decided by checking the satisfiability of

⋁_{0≤j≤4} sortll#(j)(root, n, m) ∨ sortll#(4)_S(root) ∧ sortll#(>4)_N(n, m)

where

pred sortll#(>4)_N(n, m) ≡ n = 6 ∨ ∃q, m1.(sortll#(>4)_N(n − 1, m1) ∧ m ≤ m1).

4. INFERRING BAGA INVARIANT
In the previous section, we have identified an arithmetic fragment with inductive predicates over Presburger formulas, named DPI, for which it is always possible to constructively build a precise (or equi-satisfiable) numeric invariant. In this section, we propose a practical algorithm for inferring precise invariants. Our algorithm is based on abstract interpretation, which attempts to infer an over-approximation for each inductive predicate. We use this approach since it is a standard and practical way of deriving numeric invariants, which are typically over-approximations of the inductive predicates themselves. Leveraging the periodic set property of the DPI class of formulas, we can further confirm that equi-satisfiable numeric invariants can always be inferred for this class of inductive numeric predicates. Through two projections, followed by a combination of shape and numeric domains, we show that it is always possible for each heap-based inductive predicate with the DPI property to have a precise BAGA invariant. This outcome leads to a decidable satisfiability algorithm for this class of formulas. Nevertheless, we also hope to go beyond this decidable fragment by proposing an under-approximation check (see Sec A) which could confirm when some over-approximation is also an under-approximation. This check provides a novel alternative to confirm the presence of a precise (or equi-satisfiable) predicate invariant. As it is not strictly required by our current proposal on a decidable satisfiability SLA2 fragment, we have placed it in the Appendix for discussion and future consideration. The next subsection provides details on how we infer over-approximating numeric predicate invariants which are provably precise for the DPI fragment.

4.1 Computing Over-Approximation
With the BAGA abstraction, we shall infer its invariant in three stages. First, we compute the shape-only invariant, by removing (via projection) non-pointer variables from its constraints and solving its shape-only abstraction using the SLSAT algorithm [6]. Second, we infer the numeric-only invariant by removing pointer variables from the set of constraints and then inferring its numerical invariant using a fix-point computation based on a disjunctive abstract interpretation (using FixCalc [25]). In the third stage, we combine via conjunction the two fixed points obtained from the prior two steps as an initial invariant for the combined domains. This combined invariant may be imprecise but can be refined by unfolding its predicate's BAGA abstraction a small number of times. We illustrate these three stages using the sortll predicate.

Stage 1. We first compute the shape-only abstraction by a projection process PS:

PS[(B, α, φ)]  =df (B, α)
PS[P#(v̄)]      =df P#_S(v̄_S)
PS[ψ1# ∗ ψ2#]  =df PS[ψ1#] ∗ PS[ψ2#]
PS[⋁ ∃v̄·ψ#]    =df ⋁ ∃v̄_S · PS[ψ#]

This essentially drops numeric constraints and parameters from the BAGA abstraction. For each abstract predicate P#(v̄), we obtain the shape-only predicate P#_S(v̄_S) such that v̄_S keeps exactly the variables v of v̄ with typeof(v)∈Ptr. We then infer the invariant over the shape-only domain for P#_S(v̄_S) using the algorithm in [6]. For our running example, we derive:

sortll#_S(root) =df ({root}, true).

Stage 2. We next compute the numeric-only abstraction by another projection PN:

PN[(B, α, φ)]  =df φ
PN[P#(v̄)]      =df P#_N(v̄_N)
PN[ψ1# ∗ ψ2#]  =df PN[ψ1#] ∧ PN[ψ2#]
PN[⋁ ∃v̄·ψ#]    =df ⋁ ∃v̄_N · PN[ψ#]

Using the sortll predicate as our running example, we obtain:

sortll#_N(n,m) ≡ n=1 ∨ ∃ m1 · m≤m1 ∧ sortll#_N(n−1, m1)

This numerical abstraction is subjected to a fix-point computation yielding a numerical invariant: sortll#_N(fix)(n,m) ≡ n>0.

Stage 3. We combine the two invariants before applying unfolding to obtain a more precise fix-point. In essence, this step eliminates any model that is satisfiable in the separate domains but is not in the combined domain. The initial combined invariant is inv0(v̄) ≡ ⋁{(B, α, P#_N(v̄_N)) | (B, α) ∈ P#_S(v̄_S)}. Steps for each (k-th) unfolding:
• Unfold the BAGA abstraction of the predicate P# by substituting recursive predicate instances with the prior invariant invk−1(v̄);
• Normalize and combine the branches. As the invariant over the numerical domain covers all predicate branches, this step only refines the third component of each disjunct of invk(v̄). Hence, the number of disjuncts of every invk(v̄) is not changed.
• We check if the prior invariant is as strong as the new one: invk−1(v̄) =⇒ invk(v̄). If this is so, we have reached a fixed point for the combined domain. Implication over BAGA formulas is checked pairwise, namely: invk−1(v̄) =⇒ invk(v̄) iff ∀(B, α, φi)∈invk−1(v̄) · ∃(B, α, φj)∈invk(v̄) · φi =⇒ φj.

For our running example, we derive

inv0(root,n,m) ≡ ({root}, true, n>0)

Unfolding gives:

inv1(root,n,m) ≡ ({root}, true, n=1) ∨ ({root}, true, n>1)
               ≡ ({root}, true, n=1 ∨ n>1)

Since inv0 =⇒ inv1, we detect inv0 as its combined fixed point. Note that we only apply repeated unfolding for precise combined fix-points. For imprecise fix-points, we only unfold once, as Stage 3 may loop otherwise.

LEMMA 4.1 (COMBINED INVARIANT). Split P#(v̄) = Ψ#[P#(w̄)] into two disjoint abstractions: P#_S(v̄) = Ψ#_S[P#_S(w̄)] and P#_N(v̄) = Ψ#_N[P#_N(w̄)], for which fix-points fixS(v̄) ⇐⇒ Ψ#_S[fixS(w̄)] and Ψ#_N[fixN(w̄)] =⇒ fixN(v̄) exist. The conjunction of fix-points, namely fixC0(v̄) = fixS(v̄)∧fixN(v̄), is an over-approximated fix-point Ψ#[fixC0(w̄)] =⇒ fixC0(v̄) of the combined domain. Furthermore, repeated unfolding of the form fixCn(v̄) = Ψ#[fixCn−1(w̄)] would derive an over-approximated invariant for the combined domain.

The following soundness theorem and its lemma are proved in a similar way to the proof of Theorem 3.2 given in Sec 3.

LEMMA 4.2. Given an inductive predicate P(v̄), let fix(w̄) be a fixed point computed by the three stages above. If s,h |= P(v̄), there exist a disjunct BAGA formula (B, α, φ) of fix(w̄) and s′ such that s ⊆ s′, s′ |= α, s′ |= φ and s(B)⊆dom(h).

THEOREM 4.3 (SOUNDNESS). Given an inductive predicate P(v̄), let fix(w̄) be a fixed point computed by the three stages above. If fix(w̄)≡false, then P(v̄) is unsatisfiable.

If the pure fix-point in Stage 2 is precise, we eventually infer a precise fix-point.

THEOREM 4.4 (PRECISE INVARIANT).
Split P#(v̄) = Ψ#[P#(w̄)] into two disjoint abstractions: P#_S(v̄) = Ψ#_S[P#_S(w̄)] and P#_N(v̄) = Ψ#_N[P#_N(w̄)], for which precise fix-points fixS(v̄) ⇐⇒ Ψ#_S[fixS(w̄)] and Ψ#_N[fixN(w̄)] ⇐⇒ fixN(v̄) exist. The conjunction of fix-points, namely fixC0(v̄) = fixS(v̄)∧fixN(v̄), is a precise fix-point Ψ#[fixC0(w̄)] ⇐⇒ fixC0(v̄) of the combined domain. Furthermore, repeated unfolding of the form fixCn(v̄) = Ψ#[fixCn−1(w̄)] would eventually derive a precise invariant for the combined domain.

Proof Sketch. We rely on the premise that the heap and integer abstract domains are disjoint, with precise fix-points computed independently. For the precise heap invariant, we have a finite set of heap configurations that is preserved by unfolding. For the numeric domain, we can use disjunction to combine pure formulas of identical heap configurations without loss of information. The proof of termination of unfolding relies on the fact that the set of heap configurations is finite, and that we have a precise numeric fix-point.

Our over-approximating fix-point computation would yield a precise numeric invariant for the DPI fragment. For more complex formulas, we may still derive an over-approximating invariant. To confirm whether these invariants are possibly precise, we propose a procedure for an under-approximation check in App A. To be sound, this procedure also checks that every non-false invariant has at least one satisfiable instance.

5. EXPERIMENTAL ASSESSMENT
We have implemented and integrated our proposed procedures in Sec. 4 and Appendix A into HIP/SLEEK [8, 17, 16] for verifying heap-based programs. We made use of Omega Calculator [26] to eliminate existential quantifiers, Z3 [12] as a back-end SMT solver, and FixCalc [25] to compute fixed points for the pure domain. The URL for our demo web-site is available on request for double blind reasons. In the rest of this section, we will show the capability of our invariant inference and its application in program verification.
The experiments were performed on a machine with an Intel i7-960 (3.2GHz) processor and 16 GB of RAM.

5.1 Invariant Inference
Using our proposed procedure, we have inferred precise invariants for a broad range of data structures. Some examples include cyclic linked-list, list segment, linked-list with even size, binary tree, and tree with linked leaves. In all these predicates, the size n is the pure property. The inferred invariants are shown in Table 1. While the precise invariants for these common predicates are fairly straightforward, we can also infer precise invariants that are non-trivial. An example is the bndll predicate below, which describes a singly linked-list with lower bound l and upper bound u on the value of every node, and with no two consecutive nodes having the same value.

  pred bndll(root,p,l,u,n) ≡ root=null ∧ n=0
      ∨ ∃v, q· root↦node(v, q) ∗ bndll(q,v,l,u,n−1) ∧ l≤v≤u ∧ v≠p

Our inference system automatically derives the following precise invariant for it.

  ({}, root=null, n=0)
  ∨ ({root}, true, n=1 ∧ l≤u ∧ (l≤p−1 ∨ p<u))
  ∨ ({root}, true, n>1 ∧ l+1≤u ∧ (l≤p−1 ∨ p<u))

There are also examples where we have inferred over-approximated rather than precise invariants. These typically occur when the predicates are outside of the SLA2 fragment. In addition to AVL trees, we cannot infer precise invariants for heap trees, complete trees, or red-black trees.

5.2 Compositional Program Verification
We have integrated our proposed decision procedure into HIP/SLEEK [8, 17, 16] for deciding unsat queries. Our satisfiability solver was used to prune infeasible program states, discharge verification conditions with an empty heap in the RHS, and generate counterexamples to highlight detected real errors. The specification system in HIP/SLEEK is based on separation logic with user-defined predicates [8] and second-order predicates [16]. The Hoare-style forward reasoning engine computes a set of states in separation logic form.
During this process, it prunes infeasible disjunctive program paths with unsat queries at branch statements (e.g. if, while statements). Additionally, this component generates entailment obligations to ensure the absence of memory errors (no null dereference, no double free and no memory leak), the validity of function calls and loops via compositional pre-/post-conditions, and that postconditions hold. As a compositional verification system, the correctness of a program is reduced to the validity of the verification conditions generated. For each entailment check (i.e. ∆a ⊢ ∆c), the entailment procedure SLEEK discharges the verification condition by searching for a matching on the heap such that the heap of the RHS is subsumed by the heap of the antecedent. Concretely, it manipulates the heaps of the antecedent and the consequent of the verification condition (with folding and matching) until the heap of the consequent is empty, i.e. ∆ ⊢ emp∧πr. To support both safety proving and bug finding, SLEEK uses an error calculus [17] to classify an entailment with an empty heap in the consequent into a lattice value: unreachability ⊥ (when ∆ is unsatisfiable), safety √ (when ∆∧¬πr is unsatisfiable), must error ✵must (when ∆∧πr is unsatisfiable), and may error ✵may (otherwise).

Since SLEEK verifies programs with only sound abstraction, it cannot confirm the satisfiability of ∆a; consequently, it cannot distinguish safety, must errors (and may errors) from unreachability. Must errors may be unreachable, in which case they are false alarms. To solve this problem, we employ the proposed decision procedure as a new satisfiability procedure in the error calculus. When a must error is detected, we perform an additional satisfiability check on ∆a to determine its reachability. If it is satisfiable, we proceed to invoke error explanation to identify a set of code statements relating to the error [17]. With our new satisfiability procedure, we can confirm more true bugs and give better support for fixing program bugs.

| Data Structure | Predicate Invariant Inferred | Type |
|---|---|---|
| Singly llist | ({ }, root=null, n=0) ∨ ({root}, true, n>0) | Precise |
| Even llist | ({ }, root=null, n=0) ∨ ({root}, true, ∃i· i>0 ∧ n=2∗i) | Precise |
| Sorted llist | ({ }, root=null, n=0 ∧ sm<=lg) ∨ ({root}, true, n>0 ∧ sm<=lg) | Precise |
| Doubly llist | ({ }, root=null, n=0) ∨ ({root}, true, n>0) | Precise |
| CompleteT | ({ }, root=null, n=0) ∨ ({root}, true, n<=2∗nmin ∧ nmin+1<=n) ∨ ({root}, true, n<=2∗nmin−1 ∧ nmin<=n) | Over |
| Heap trees | ({ }, root=null, n=0 ∧ mx=0) ∨ ({root}, true, n>0 ∧ mx>=0) | Over |
| AVL | ({ }, root=null, m=0 ∧ n=0 ∧ bal=1) ∨ ({root}, true, m>=1 ∧ bal<=2 ∧ bal<=n ∧ bal>=2−n ∧ bal>=0) | Over |
| BST | ({ }, root=null, n=0) ∨ ({root}, true, n>0) | Precise |
| RBT | ({ }, root=null, n=0 ∧ bh=1 ∧ cl=0) ∨ ({root}, true, n>0 ∧ bh>0 ∧ cl=1) ∨ ({root}, true, n>0 ∧ bh>1 ∧ cl=0) | Over |
| rose-tree | ({ }, root=null, true) ∨ ({root}, true, true) | Precise |
| TLL | ({root}, root=ll, true) ∨ ({root, ll}, true, true) | Precise |
| Bubble | ({ }, root=null, n=0) ∨ ({root}, true, n>0 ∧ sm<bg) | Precise |
| Quick sort | ({ }, root=null, n=0) ∨ ({root}, true, n>0 ∧ sm<bg) | Precise |

Table 1: Invariants inferred for a benchmark of data structures

| Data Structure (pure props) | #queries | #unsat | #sat | #unknown | Time (s) |
|---|---|---|---|---|---|
| Singly llist (size) | 535 | 66 | 469 | 0 | 0.68 |
| Even llist (size) | 126 | 122 | 1 | 3 | 0.71 |
| Sorted llist (size, sorted) | 168 | 18 | 150 | 0 | 1.31 |
| Doubly llist (size) | 381 | 44 | 337 | 0 | 0.77 |
| CompleteT (size, minheight) | 321 | 25 | 0 | 296 | 3.22 |
| Heap trees (size, maxelem) | 386 | 38 | 0 | 348 | 3.23 |
| AVL (height, size, bal) | 539 | 25 | 0 | 514 | 6.19 |
| BST (height, size, sorted) | 260 | 29 | 231 | 0 | 1.85 |
| RBT (size, blackheight, color) | 1451 | 126 | 0 | 1325 | 10.81 |
| rose-tree | 55 | 6 | 49 | 0 | 0.11 |
| TLL | 107 | 13 | 94 | 0 | 0.28 |
| Bubble (size, sorted) | 208 | 19 | 189 | 0 | 0.61 |
| Quick sort (size, sorted) | 181 | 21 | 160 | 0 | 0.72 |

Table 2: Experimental Results on a Wide Range of Complex Data Structures

The experimental results are summarized in Table 2. The first column lists data structures and their pure properties.
Rose-trees are trees whose nodes may have a variable number of children, stored as doubly-linked lists. TLL denotes binary trees whose nodes point to their parent and whose leaf nodes are all linked as a singly-linked list. The second column lists the total number of satisfiability queries sent to the decision procedure. The third, fourth and fifth columns show the number of unsat, sat and unknown queries, respectively. The last column gives the processing time (in seconds) for the queries of each data structure. As expected, since the Heap tree, Complete tree, AVL and RBT predicates are beyond the decidable fragment, our system can only answer their unsat queries. For Even llist, although we can generate precise invariants, queries with complex quantifier structure (generated from quantified specifications) may fall outside the decidable fragment and are discharged with unknown outputs.

6. RELATED WORK AND CONCLUSION
Due to complications of negation and the need for frame inference in resource-oriented separation logic, there are two separate decision problems of interest, namely entailment and satisfiability. Most works have focused on the first problem. Calcagno et al. [7] first presented foundational computability and complexity results for the basic separation logic fragment without recursive predicates. Since then, there have been initial proposals that introduced decision procedures for satisfiability in a fragment of separation logic with hardwired linked lists and (dis)equality between pointer variables. Berdine et al. proposed the foundation and the first proof theory for such a decidable fragment [2, 3]. Subsequently, [9] and [22] used more efficient algorithms to prune infeasible branches, based on graph techniques [9] and superposition calculus [22]. [23] presents an efficient SMT-based decision procedure for separation logic with acyclic list segments with length, by combining an entailment checking algorithm for separation logic with decidable SMT theories.
To support a more expressive fragment, [24] proposed a logic of graph reachability and stratified sets to capture the semantics of heap structures. For a fragment with general inductive predicates, [15] showed decidability (of both the satisfiability and entailment problems) for bounded-width tree-like data structures. On satisfiability, Brotherston et al. [6] recently presented a decision procedure for satisfiability over the fragment that includes (arbitrary) recursive predicates with pointer (dis)equality. Our work is inspired by their proposal, as we also use fixed point computations to produce a set of models that are equi-satisfiable with the original formula. We have now extended such a decision procedure to include predicates with Presburger invariants. Our work shows that precise invariants are key to deciding satisfiability and that we can use projection to derive precise invariants separately for the shape and integer domains.

In terms of combining abstract domains, we list three related works. [14] proposed a logical product combination of abstract interpretations (over pure domains) to gain better precision than the reduced product combination [11, 10], with the same conditions imposed on individual theories as in the Nelson-Oppen combination for decision procedures [21]. [32] presented a framework for the reduced product combination of memory abstractions (over shape domains). Lastly, [13] combines heap with numeric abstract domains. Unlike these past works, our fix-points are based on disjoint shape and pure domains, which are then combined by an unfolding stage to derive precise combined invariants for satisfiability.

There have also been a number of related works concentrating on decision procedures for data structures. [31] proposes a multi-phase decision procedure for the quantifier-free theory of algebraic data types with different fold functions, which can be used to prove various properties of functional data structures.
[29] employs a fragment of separation logic with recursive definitions but no explicit quantification, to specify general properties of structure, data and separation, which are converted into classical logic and then handled by natural proof mechanisms with the help of decidable SMT solvers. [18] defines a new logic allowing existential and universal quantifications over nodes and complex combination of data and structural constraints and identifies decidable fragments of the logic where the decision procedure can be implemented by combining an MSO decision procedure over trees and an SMT solver for integer constraints. Other works, such as [4, 16, 19, 20, 27, 28], propose abstract interpretation based analyses to infer invariants over heap and data for heap-manipulating programs. These analyses either make use of fixed/pre-built shape predicates (e.g. [4, 19, 20]) or support user-defined predicates (e.g. [27, 28]) or infer arbitrary (second-order) shape predicates (e.g. [16]). They are also based mostly on over-approximation analyses, and do not provide any decision procedure on satisfiability. Conclusion. We have considered an expressive fragment of separation logic with Presburger arithmetic. Though the satisfiability of this fragment is not decidable, we show that precise invariant can be guaranteed for the SLA2 fragment, leading to decidable satisfiability. 7. REFERENCES [1] T. Antonopoulos, N. Gorogiannis, C. Haase, M. Kanovich, and J. Ouaknine. Foundations for decision problems in separation logic with general inductive predicates. In FoSSaCS, pages 411–425, 2014. [2] J. Berdine, C. Calcagno, and P. W. O’Hearn. A Decidable Fragment of Separation Logic. In FSTTCS. Springer-Verlag, December 2004. [3] J. Berdine, C. Calcagno, and P. W. O’Hearn. Symbolic Execution with Separation Logic. In APLAS, volume 3780, pages 52–68, November 2005. [4] A. Bouajjani, C. Dragoi, C. Enea, A. Rezine, and Mihaela Sighireanu. 
Invariant synthesis for programs manipulating lists with unbounded data. In CAV, pages 72–88, 2010. [5] J. Brotherston, D. Distefano, and R. L. Petersen. Automated cyclic entailment proofs in separation logic. In CADE, pages 131–146, 2011. [6] J. Brotherston, C. Fuhs, J. A. Navarro Pérez, and N. Gorogiannis. A decision procedure for satisfiability in separation logic with inductive predicates. In CSL-LICS. ACM, 2014. [7] C. Calcagno, H. Yang, and P. W. O’Hearn. Computability and complexity results for a spatial assertion language for data structures. In FSTTCS, pages 108–119, 2001. [8] W.N. Chin, C. David, H.H. Nguyen, and S. Qin. Automated verification of shape, size and bag properties via user-defined predicates in separation logic. SCP, 77(9):1006–1036, 2012. [9] B. Cook, C. Haase, J. Ouaknine, M. Parkinson, and J. Worrell. Tractable reasoning in a fragment of separation logic. In CONCUR, volume 6901, pages 235–249. 2011. [10] P. Cousot. Lecture Notes on Abstract Interpretation. 2005. http://web.mit.edu/16.399/www/. [11] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In ACM POPL, San Antonio, Texas, 1979. [12] L. M. de Moura and N. Bjørner. Z3: An Efficient SMT Solver. In TACAS, 2008. [13] P. Ferrara. Generic combination of heap and value analyses in abstract interpretation. In VMCAI, volume 8318, pages 302–321. 2014. [14] S. Gulwani and A. Tiwari. Combining abstract interpreters. In ACM PLDI, pages 376–386, 2006. [15] R. Iosif, A. Rogalewicz, and J. Simácek. The tree width of separation logic with recursive definitions. In CADE, pages 21–38, 2013. [16] QL. Le, C. Gherghina, S. Qin, and W.-N. Chin. Shape analysis via second-order bi-abduction. In CAV, 2014. [17] QL. Le, A. Sharma, F. Craciun, and W.-N. Chin. Towards complete specifications with an error calculus. In NASA Formal Methods, volume 7871, pages 291–306. Springer Berlin Heidelberg, 2013. [18] P. Madhusudan, Gennaro Parlato, and Xiaokang Qiu. 
Decidable logics combining heap structures and data. In ACM POPL, pages 611–622, New York, NY, USA, 2011. ACM. [19] S. Magill, J. Berdine, E. M. Clarke, and B. Cook. Arithmetic strengthening for shape analysis. In SAS, pages 419–436, 2007. [20] B. McCloskey, T. Reps, and M. Sagiv. Statically inferring complex heap, array, and numeric invariants. In SAS, pages 71–99, Berlin, Heidelberg, 2010. Springer-Verlag. [21] G. Nelson and D. Oppen. Simplification by cooperating decision procedures. ACM Trans. Program. Lang. Syst., 1(2):245–257, October 1979. [22] J. A. N. Pérez and A. Rybalchenko. Separation logic + superposition calculus = heap theorem prover. In ACM PLDI, pages 556–566. ACM, 2011. [23] J. A. N. Pérez and A. Rybalchenko. Separation logic modulo theories. In APLAS 2013, pages 90–106, 2013. [24] R. Piskac, T. Wies, and D. Zufferey. Grasshopper: Complete heap verification with mixed specifications. TACAS’14, 2014. [25] C. Popeea and W.-N. Chin. Inferring disjunctive postconditions. In ASIAN, pages 331–345, 2006. [26] W. Pugh. The Omega Test: A fast practical integer programming algorithm for dependence analysis. Communications of the ACM, 8:102–114, 1992. [27] S. Qin, G. He, C. Luo, W.-N. Chin, and X. Chen. Loop invariant synthesis in a combined abstract domain. J. Symb. Comput., 50:386–408, 2013. [28] S. Qin, G. He, C. Luo, W.-N. Chin, and H. Yang. Automatically refining partial specifications for heap-manipulating programs. Sci. Comput. Program., 82:56–76, 2014. [29] X. Qiu, P. Garg, A. Ştefănescu, and P. Madhusudan. Natural proofs for structure, data, and separation. In PLDI, pages 231–242, New York, NY, USA, 2013. ACM. [30] G. Rosu and A. Stefanescu. Checking reachability using matching logic. In ACM OOPSLA, pages 555–574, New York, NY, USA, 2012. ACM. [31] P. Suter, M. Dotta, and V. Kuncak. Decision procedures for algebraic data types with abstractions. In ACM POPL, 2010. [32] A. Toubhans, B.-Y. E. Chang, and X. Rival.
Reduced product combination of abstract domains for shapes. In VMCAI, pages 375–395, 2013. [33] M.-T. Trinh, QL. Le, C. David, and W.-N. Chin. Bi-abduction with pure properties for specification inference. In APLAS, pages 107–123. 2013.

APPENDIX
A. UNDER-APPROXIMATION CHECK
Given a pure formula π and a user-defined predicate P#_N(v), we define the under-approximated invariant problem as that of verifying whether π is an under-approximated invariant of P#_N, i.e.

  ∀h, s. s,h |= π ⟹ ∃h′. s,h◦h′ |= P#_N(v)

The challenge is that both π and P#_N(t) are sets of disjoint, unfolded and possibly infinite disjuncts. In essence, the problem requires a systematic procedure to (i) match a disjunct π_i in the LHS with its partner ∆#_Ni in the RHS such that π_i is an under-approximation of ∆#_Ni, (ii) strengthen the LHS by excluding the middle, and (iii) strengthen the RHS by unfolding the predicate instances. The cyclic technique [5, 30] is a promising approach for inductive reasoning over inductive predicates; it looks up an induction hypothesis in historical proofs. As our under-approximated invariant check is quite special, deploying cyclic proofs directly is not efficient. In this section, we propose a verification procedure for the under-approximated invariant problem. Our verification provides inductive reasoning and a case-split mechanism to strengthen the verification process.

The verification of an under-approximated invariant is formalized as (b, V): π ≪ P#_N(v), where (b, V) is a reset table data structure called the context of the proof search: b is the boolean refuted, which can be set any time the context is detected to be inconsistent; and V is a set of variables defining the under-approximation. The verification process is performed by systematically applying the rules in Fig. 1. Starting from the goal (false, { }): π ≪ P#_N(v), we initially unfold the RHS (using the [UV−U] rule) and add π ≪ P#_N(v) as the induction hypothesis. The unfolding of P#_N(v) enables inductive reasoning on the problem, i.e. the induction hypothesis can be applied to any predicate instance P#_N in the RHS. Inductive reasoning is implemented as follows. First, we encode the induction hypothesis above by setting π as the unique active under-approximated invariant of predicate P#_N. After that, the induction hypothesis is applied as presented in the [UV−I] rule. The soundness of inductive reasoning requires that a proof including the induction application [UV−I] is valid only if there exists a valid proof of its corresponding base case. In the [UV−I] rule, the auxiliary procedure eXPure(P#_N(v)) substitutes the inductive predicate P#_N(v) by its active under-approximated invariant.

Rule [UV−U] strengthens the RHS by unfolding one user-defined predicate assertion in the RHS via the procedure unfold(P(t), ∆). This procedure unfolds once the user-defined predicate instance P(t) of the formula ∆. The steps are formalized as follows:

  P(v) ≡ ⋁_{i=1..n} (∃w_i · π_i)     fresh w_i′     ρ_i = [w_i′/w_i]
  π_i′ = π_i[ρ_i]     ρ_0 = [t/v]     π_i′′ = π_i′[ρ_0]
  -----------------------------------------------------------------
  unfold(∃w_0 · P(t)∧π_0, i) ❀ ⋁_{i=1..n} (∃w_0 ∪ w_i′ · π_0∧π_i′′)

First, the function looks up the definition of P and refreshes the existential quantifiers. Second, the formal parameters are substituted by the actual parameters. Finally, the substituted definition is combined with the residual formula as in the RHS of ❀.

The [UV−D] rule is a terminal rule. It handles (b, V): πa ≪ πc such that both πa and πc are in conjunctive form and do not include any inductive predicates. It checks the implication over the pure theory, i.e. integer arithmetic T. If the implication is not valid, the boolean refuted is set to true. Such a refuted context causes the proof search to backtrack and consider a different case. A naive approach to backtracking is to go down until reaching [UV−RO] and try another disjunct in the RHS. In the rest, we present a systematic backtracking approach based on an entailment and inference procedure.
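The unfolding steps described above can be sketched in Python. This is only an illustrative sketch, not the HIP/SLEEK implementation: formulas are kept as plain strings, and the names `Pred`, `Branch`, `substitute`, the explicit `counter` argument and the `w_k` fresh-naming scheme are all assumptions introduced here for illustration. The sample definition follows the numeric projection of the sorted-list predicate used in the examples of this appendix (n=1 ∨ ∃m1· sortll(n−1, m1) ∧ m≤m1).

```python
import itertools
import re
from dataclasses import dataclass

# Whole identifiers only, so renaming "m" never touches "m1".
IDENT = re.compile(r"[A-Za-z_]\w*")


@dataclass(frozen=True)
class Branch:
    exists: tuple   # existentially bound variables w_i (hypothetical encoding)
    body: str       # pure formula pi_i as plain text


@dataclass(frozen=True)
class Pred:
    name: str
    params: tuple   # formal parameters v
    branches: tuple # disjuncts of the definition


def substitute(formula, mapping):
    """Simultaneously replace identifiers of `formula` according to `mapping`."""
    return IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), formula)


def unfold(pred, args, outer_exists, residual, counter):
    """Unfold one instance pred(args) inside `exists outer_exists. pred(args) and residual`.

    Returns one (bound variables, formula) pair per disjunct of the definition.
    """
    def as_term(t):
        # Parenthesise compound actual arguments such as "n - 1".
        return t if IDENT.fullmatch(t) else f"({t})"

    rho0 = {v: as_term(t) for v, t in zip(pred.params, args)}   # rho_0 = [t/v]
    result = []
    for br in pred.branches:
        # rho_i = [w_i'/w_i]: refresh the existential variables of this disjunct.
        rho_i = {w: f"{w}_{next(counter)}" for w in br.exists}
        body = substitute(substitute(br.body, rho_i), rho0)
        bound = tuple(outer_exists) + tuple(rho_i.values())
        result.append((bound, f"{residual} and {body}"))
    return result


# Numeric projection of the sorted-list predicate (assumed textual encoding).
SORTLL = Pred(
    "sortll",
    ("n", "m"),
    (Branch((), "n = 1"),
     Branch(("m1",), "sortll(n - 1, m1) and m <= m1")),
)

if __name__ == "__main__":
    for bound, formula in unfold(SORTLL, ("k", "lo"), (), "k > 0", itertools.count(1)):
        print(bound, formula)
```

Keeping the renaming of bound variables (ρ_i) separate from the parameter substitution (ρ_0) mirrors the two substitution steps of the rule; a real implementation would operate on formula syntax trees rather than strings.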
In more detail, we implement an entailment procedure, called SeA, as the implication check in the theory T: [V] πa∧? ⊢ πc. In particular, this procedure is able to perform logical abduction over the selected variables V [33]. Selective abduction is the problem of finding missing hypotheses, i.e. ?, over a set of variables in a logical inference task. Concretely, if πa does not logically imply πc in the above implication, our selective abduction infers the simplest and most general explanation πI over variables V such that πa∧πI ⟹ πc and πa∧πI ⇏ false. The explanation is used to guide the case split of the backtracking search strategy.

  [UV−CS]
    (b, V): (πa∧πI) ∨ (πa∧¬πI) ≪ πc
    ----------------------------------
    (b, V): πa ≪ πc

  [UV−D]
    [V] πa∧? ⊢ πc
    ----------------------------------
    (b, V): πa ≪ πc

  [UV−U]
    ⋁_{i=1..n} ∆_i ≡ unfold(P(t), P(t)∧π)     (b, V): ψ ≪ ⋁_{i=1..n} ∆_i
    ----------------------------------------------------------------------
    (b, V): ψ ≪ P(t)∧π

  [UV−I]
    (b, V∪v): πa ≪ eXPure(P#_N(t))∧π
    ----------------------------------
    (b, V): πa ≪ P#_N(t)∧π

  [UV−LO]
    (b, V): π1 ≪ π     (b, V): π2 ≪ π
    ----------------------------------
    (b, V): (π1 ∨ π2) ≪ π

  [UV−RO]
    (b, V): π ≪ πi     i ∈ {1, 2}
    ----------------------------------
    (b, V): π ≪ (π1 ∨ π2)

Figure 1: Inference Rules for Under-Approximation Check.

Search Strategy. We focus on the two rules, unfolding and induction application, that may be applied to inductive predicates. Unfolding a predicate provides more precise disjuncts, but may explode the proof search. Applying induction helps to soundly decide an infinite set of disjuncts, but may under-approximate too coarsely and produce false alarms. Our strategy gives higher priority to applying induction over predicate instances, in the hope of finding a proof as early as possible. When the boolean refuted is set, we first backtrack on the under-approximation and then strengthen the verification by unfolding the RHS or case splitting the LHS, in the hope of reducing the false alarms as much as possible. In the following, we describe the strategy with conflict analysis for unfolding and explanation for case split.

Let ⟹_T denote the implication check in the background theory T. Given an antecedent πa and a consequent πc in the theory T, and a set of variables V, the procedure SeA, i.e. [V] πa∧? ⊢ πc, is formalized by the following rules:

  [REFUTED]  πa ⟹_T false
  [VALID]    πa ⟹_T πc
  [CONFL]    πa ⟹_T ¬πc
  [EXPL]     infer [FV(πa)] πa ⟹_T πc

In the first two rules, [REFUTED] first checks the inconsistency of the LHS and sets the boolean refuted to true, while [VALID] proves the validity and completes the proof search. In the last two rules, the implication is not proven. Instead of giving up by setting the boolean refuted to true, we show how to strengthen the verification with conflict analysis or explanation inference.

Conflict Analysis. In the [CONFL] rule, SeA detects a conflict between the LHS and the RHS and analyzes the conflict to find whether the variables defining the under-approximation contribute to it. The set of such variables is computed by V_f = V ∩ FV(πc). Our proof system now backtracks until reaching the earliest under-approximation of a variable in V_f, rolls back the induction application on the predicate instance, and unfolds the predicate instance to obtain a more precise RHS.

Explanation Inference. In the [EXPL] rule, if neither πa ⟹_T πc nor πa ⟹_T ¬πc holds, the procedure can neither discharge nor validate the potential inconsistency. This is the case when either πa is too weak to be an under-approximation of the given πc, or πc has not been sufficiently unfolded. To strengthen πa, we use abduction to infer a right cut over the free variables of πa to exclude the middle. This inference is performed over the free variables of πa, and produces an explanation πI. Our proof system now backtracks until reaching the earliest under-approximation of a variable in FV(πI), i.e. (b, V′): πa ≪ P#_N(v)∧π with v∩FV(πI) ≠ { }, and applies a case split on the LHS as: (b, V′): (πa∧πI) ∨ (πa∧¬πI) ≪ P#_N(v)∧π (as presented in the [UV−CS] rule).

THEOREM A.1 (CORRECTNESS). Given an inductive predicate P#_N(v) and π such that (false, { }): π ≪ P#_N(v), s,h |= π implies s,h |= P#_N(v).

Example 1. We verify that n>0 is an under-approximated invariant of the sortll#_N predicate. The proof is as follows. (Rules applied for proof search are annotated without the prefix UV−.)

        [n] n>0∧? ⊢ n−1>0
        --------------------------------------------- D
        n>0 ≪ ∃m1· n−1>0
        --------------------------------------------- I
    (a) n>0 ≪ ∃m1· sortll#_N(n−1, m1)
        --------------------------------------------- RO2
        n>0 ≪ (n=0) ∨ (∃m1· sortll#_N(n−1, m1))
        --------------------------------------------- U
        n>0 ≪ sortll#_N(n, m)

In the above proof, since the LHS of the implication at the top implies neither the RHS nor ¬RHS, our SeA system applies the [EXPL] rule to infer the explanation πI = n>1, then backtracks to (a) and does a case split on the LHS as in the following.

        ...                          ...
        ------------------- I        ----------------------- I
    (b) n>0∧n>1 ≪ πc             (c) n>0∧¬(n>1) ≪ πc
        ------------------------------------------------------- LO
        (n>0∧n>1) ∨ (n>0∧¬(n>1)) ≪ πc
        ------------------------------------------------------- CS
    (a) n>0 ≪ ∃m1· sortll#_N(n−1, m1)

In the above proof, πc ≡ ∃m1· sortll#_N(n−1, m1). The steps of the proof search for (b) are similar to those for (a) above. The following is the proof search for (c).

        [n] n>0∧¬(n>1)∧? ⊢ n−1>0
        --------------------------------------------- D
        n>0∧¬(n>1) ≪ ∃m1· n−1>0
        --------------------------------------------- I
    (c) n>0∧¬(n>1) ≪ ∃m1· sortll#_N(n−1, m1)

At the top of the above proof, we apply the [CONFL] rule to analyse the conflict over variable n, then backtrack to (c) and unfold the sortll#_N instance. These steps are presented as follows.

        [n] r≠null∧n>0∧¬(n>1)∧? ⊢ ∧n=1
        --------------------------------------------- D
        n>0∧¬(n>1) ≪ n=1
        --------------------------------------------- RO1
    (c) n>0∧¬(n>1) ≪ n=1 ∨ ∃m1· sortll#_N(n−1, m1)

Therefore, n>0 is an under-approximated invariant of the predicate sortll#_N.

Example 2. To illustrate that inductive reasoning requires a base case, we consider the following check: true ≪ sortll#_N(n, m). Initially, the predicate in the RHS is unfolded as:

        true ≪ n=1 ∨ ∃m1· sortll#_N(n−1, m1)∧m≤m1

The proof for the base case is as follows.

        [n] true∧? ⊢ n=1
        --------------------------------------------- D
        true ≪ n=1
        --------------------------------------------- RO1
        true ≪ n=1 ∨ ∃m1· sortll#_N(n−1, m1)∧m≤m1

SeA introduces an abduction, backtracking and a case split (CS) as follows.

   (1a) n=1 ≪ sortll#_N(n, m)        (1b) ¬(n=1) ≪ sortll#_N(n, m)
        ---------------------------------------------------------------
        n=1 ∨ ¬(n=1) ≪ sortll#_N(n, m)

The check of the left premise (1a) is trivial. The check of the right premise (1b) is as follows.

        ...
        [n] ¬(n=1)∧? ⊢ ¬(n−1=1)
        --------------------------------------------- D
        ¬(n=1) ≪ ¬(n−1=1)
        --------------------------------------------- I
        ¬(n=1) ≪ ∃m1· sortll#_N(n−1, m1)∧m≤m1
        --------------------------------------------- RO2
        ¬(n=1) ≪ n=1 ∨ ∃m1· sortll#_N(n−1, m1)∧m≤m1

SeA recommends another abduction process, and this process may be non-terminating. This illustrates that our inference system is sound: it never concludes that true is an under-approximated invariant of sortll#_N(n, m).
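The four SeA outcomes used in this appendix ([REFUTED], [VALID], [CONFL], [EXPL]) can be illustrated with a small Python sketch. It is a toy stand-in under loud assumptions: the implication ⟹_T is approximated by brute-force enumeration over a bounded integer box rather than by a Presburger solver such as the Omega Calculator or Z3, no abduction is performed, and the function name `classify`, its parameters and the default `bound` are hypothetical.

```python
from itertools import product


def classify(pa, pc, variables, bound=8):
    """Classify the check `pa ==> pc` into one of the four SeA outcomes.

    `pa` and `pc` are Python predicates over a dict of integer values.
    ASSUMPTION: validity is only tested on the box [-bound, bound]^k,
    so the answer is an approximation of the real check in the theory T.
    """
    models = [dict(zip(variables, vals))
              for vals in product(range(-bound, bound + 1), repeat=len(variables))]
    lhs = [m for m in models if pa(m)]
    if not lhs:
        return "REFUTED"   # the antecedent is inconsistent (within the box)
    if all(pc(m) for m in lhs):
        return "VALID"     # every model of pa satisfies pc
    if not any(pc(m) for m in lhs):
        return "CONFL"     # pa implies the negation of pc
    return "EXPL"          # neither holds: an explanation (case split) is needed


if __name__ == "__main__":
    # Top of proof (a) in Example 1: n>0 versus n-1>0.
    print(classify(lambda s: s["n"] > 0, lambda s: s["n"] - 1 > 0, ["n"]))
    # Top of proof (c) in Example 1: n>0 and not(n>1) versus n-1>0.
    print(classify(lambda s: s["n"] > 0 and not s["n"] > 1,
                   lambda s: s["n"] - 1 > 0, ["n"]))
```

On the checks of Example 1, the first call falls into the [EXPL] case (n=1 falsifies the consequent while n=2 satisfies it) and the second into the [CONFL] case, matching the rules invoked in the proofs above.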