272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 9: Test Generation from Models Test Data Generation from Models • Object-Role Modeling (ORM): A modeling language for specifying relational data models – Kind of like UML class diagrams or entity-relationship diagrams – Has formal semantics • It allows the developer specify a data model with various constraints • Used in industry • Problem: – Satisfiability: Given an ORM diagram, are there any instances that satisfy the constraints specified in the diagram • Test generation: Given a satisfiable ORM diagram, generate instances that satisfy the constraints specified in the diagram ORM diagrams • Ovals show entities or values (basic types) – These are sets of objects or values • Rectangles show predicates – These represent the relations among the objects – They correspond to the tables in a relational database ORM constraints • Uniqueness constraint: Occurrence of a value in any column of a predicate is unique • Mandatory constraint: Every object of an entity is implicitly assumed to participate in some of the predicates in which the entity has a role • Frequency constraint: Specifies how many times a tuple can occur in a predicate • Subset, equality constraints: It specifies that the tuples in the roles of the first predicate have to be among the tuples in the roles of the second • Subtype constraints: Used to model subtyping among entities • Value and cardinality constraints: Specify a range of values for primitive type or a cardinality (which can be a number or a range) for an entity • Exclusion constraints: specifies that a particulate value can occur online in one table • Ring constraints: Constraints on predicates (symmetric, acyclic, transitive, intransitive, reflexive, irreflexive, asymmetric) Results • For the most general case where constraints on the diagrams are specified using arbitrary queries – Satisfiability is undecidable • For a more restricted class of diagrams, satisfiability checking is NPhard • Brute-force checking of satisfiability (bounded-exhaustive testing) does not scale • For a class of diagrams that the authors define called ORM-, the satisfiability can be checked in polynomial time NP-Hardness of ORM satisfiability • NP-hardness proof shows that well-known NP-complete boolean satisfiability problem, 3SAT, can be reduced to an instance of ORM model satisfiability • A 3-CNF formula has the following form: (c1,1 c1,2 c1,3) (c2,1 c2,2 c2,3) … where each ci,j represents expressions of the form qi or qi over a set of boolean variables q1, q2, …, qn. We call (ci,1 ci,2 ci,3) clause ci. • Determining satisfiability of a 3-CNF formula is an NP-complete problem • A 3-CNF formula can be translated to a ORM diagram, such that, the 3-CNF formula is satisfiable if and only if the generated ORM diagram is satisfiable NP-Hardness of ORM satisfiability 3-CNF formula to ORM diagram translation • Introduce an object type for each qi • Introduce an object type for each clause ci • Introduce a value constraint for each of the object types above where their value is restricted to a single value qi or ci • For each variable qi let n be the number of times it occurs in the clauses. If n > 0, create a (n+1)-ary predicate qi-true • For each expression qi let m be the number of times it occurs in the clauses. If m > 0, create a (m+1)-ary predicate qi-false • Connect type qi to predicates qi-true and qi-false • Impose and exclusion constraint on the roles played by qi in qi-true and qi-false (meaning that qi is either true or false, but not both) • Connect each clause ci to the predicate qi-true if it contains qi or qifalse if it contains qi Brute-force approach • As it is possible to translate satisfiability of boolean formulas to satisfiability of ORM diagrams, it is also possible to translate satisfiaibility of ORM diagrams, satisfiability of boolean formulas • The authors tried this approach • They translated ORM diagrams to input languages of CoBaSa and Alloy Analyzer – These are tools that can check satisfiability queries about bounded models – They use SAT solvers as back-end solvers • Experiments with this approach shows that ORM satisfiability checking using a SAT solver is not efficient and does not scale to large diagrams ORMGiven the theoretical difficulty of the general ORM satisfiability problem and the practical inefficiency of the brute-force approach – the authors define a subset of the ORM language for which satisfiability can be determined efficiently. ORM- allows only a subset of ORM constraints • Uniqueness constraints cannot overlap • Mandatory constraints are allowed on a single role • Frequency constraints cannot overlap • Value and cardinality constraints are allowed • Subtype constraints are allowed According the authors majority of constraints used in practice are covered by this subset. Note that, subset constraints, exclusion constraints and ring constraints are not allowed. Testing Satisfiability of ORM• Using a transformation from the ORM- to integer constraints the authors show that the satisfiability of the ORM- can be checked by checking satisfiability of the resulting integer constraints • They also show that the satisfiability of the resulting integer constraints can be checked in polynomial time Converting to integer constraints • For each type A introduce a variable a representing its cardinality. For each predicate R, introduce a variable r denoting its cardinality. For each role in R played by A introduce a variable ra representing the number of unique elements from A in this role • Produce inequalities: ra ≤ a and ra ≤ r • For each cardinality constraint for a type A such as min … max produce inequalities: min ≤ a and a ≤ max – Do the same for the predicates. The default min value is 1 and the default max value is M (which could be set to MaxInt) • For each value constraint let ds denote the domain size for the value domain, generate the inequality: a ≤ ds • For each mandatory constraint for a role played by type A in predicate R, generate the inequality: a ≤ ra Converting to integer constraints • For each frequency constraint with frequency with range fmin, … , fmax on the roles played in predicate R by types A, B, …, K, introduce the variable rab…k representing the number of unique tuples in these roles and produce the following inequalities: r ≤ fmax . rab…k fmin . rab…k ≤ r rab…k ≤ ra . rb . … . rk ra ≤ rab…k , rb ≤ rab…k , … , rk ≤ rab…k • For each subtype constraint between types A and S produce the inquality: a ≤ s Converting to integer constraints • For each predicate R with roles played by types A, B, … , N express the implicit uniqueness constraint over all roles by producing the inequality: r ≤ rab…k . rlm…p . … . rn where right hand side are variables that correspond to ranges of roles that participate in some frequency constraints, followed by the variables corresponding to roles that are under no frequency constraint • For each non-independent type A that plays roles in predicates R, S, …., V introduce the constraint: a ≤ ra + sa + … + va which captures the implicit disjunctive mandatory constraint that each object of a non-independent type needs to appear in some predicate Properties of this conversion • The set of constraints generation by this conversion are satisfiable if and only if the ORM- diagram itself is satisfiable – Hence this approach is both sound and complete • This means that we reduced the satisfiability of the ORM- diagrams to satisfiability of a set of integer constraints • Satisfiability of integer constraints is a difficult problem itself. • However, the integer constraints generated with this conversion are of a specific type for which the satisfiability can be checked efficiently as the authors show Checking satisfiability of constraints • Rewrite all inequalities of the form c . x ≤ y1 . y2 . … . yn as form x ≤ (y1 . y2 . … . yn) / c • For each variable maintain its current upper bound. In every step substitute the current upper bounds for all variables on the rhs and after arithmetic manipulation obtain upper bound candidates for each variable and choose the smallest one • Repeat until either an upper bound crosses a lower bound or all upper bounds remain unchanged The algorithm runs in polynomial time because if there are n variables in the system, each inequality can yield a different upper bound at most n times. Test Generation • Any set of integer values satisfying the generated integer constraints can be used to construct an instance of the ORM- data model • These instances can then be used as test cases for testing a database • So the overall approach would be: – Given a database, specify an ORM- diagram that characterizes a set of test cases for testing the database – First check the satisfiability of the ORM- diagram by generating integer constraints as described above and then check the satisfiability of the generated integer constraints – If the integer constraints are satisfiable, find integer values that satisfy the constraints and convert those integer values the instances of the ORM- model