Pattern-Constrained Test Case Generation

Martin Atzmueller, Joachim Baumeister, and Frank Puppe
University of Würzburg, Department of Computer Science VI
Am Hubland, 97074 Würzburg, Germany
{atzmueller, baumeister, puppe}@informatik.uni-wuerzburg.de

Abstract

In this paper we present a novel approach for pattern-constrained test case generation. Generating test cases with known characteristics is usually a non-trivial task. In contrast, the proposed method allows for a transparent and intuitive modeling of the relations contained in the test data. For the presented approach we utilize a general-purpose data generator: It relies on easy-to-understand subgroup patterns that are mapped to a Bayesian network representing the data generation model. The data generation phase is embedded into an incremental process for quality control and adaptation of the generated test cases. We provide a case study in the biological domain exemplifying the presented approach.

Introduction

The use of test cases for empirical testing is probably the most natural method for the validation of intelligent systems (Preece 1998). In this black-box testing method the intelligent system derives new solutions for previously solved test cases and compares the solutions stored in the cases with the derived solutions. As a result, appropriate measures like precision and recall can be used to evaluate the validity of the system. Among others, empirical testing is useful for the following reasons:

• Removing communication errors: Defining test cases before the implementation of the knowledge base can be very helpful for removing communication errors between the developer and the domain specialist.
• Detecting side effects: When extending the knowledge base, the definition of (new) test cases helps to identify unwanted interference between old and new problem-solving knowledge.
• The actual validation of knowledge using a suite of test cases covering the broad spectrum of the system's use.

However, the manual creation of test cases is a complex task and is therefore often neglected or carried out with reduced attention due to the primary complexity of building the knowledge base itself. Therefore, a number of methods have been proposed to decrease the acquisition costs of the test knowledge by automatically generating test cases, e.g., (Knauf, Gonzalez, & Abel 2002; Gupta & Biegel 1990): These approaches rely on an existing knowledge base, and they generate test cases based on the available set of derivation knowledge. In contrast, this paper introduces a novel method for the generation of test cases that is independent of the knowledge representation actually used. Here, test cases are generated in a separate phase, ignoring any derivation knowledge that may already exist. Therefore, the method is especially suited for test-first methodologies, e.g., the agile methodology for developing knowledge systems (Baumeister, Seipel, & Puppe 2004). Such process models intend to define the test knowledge before the applied problem-solving knowledge; among other things, this is motivated by the evolutionary way the system is developed. In the context of this paper, we focus on the domain of diagnostic/classification systems, i.e., for a given input of findings the system derives a (set of) solution(s) explaining the given findings.

The rest of the paper is organized as follows: We first introduce subgroup patterns and Bayesian networks, and describe the issues and challenges of generating test cases for empirical testing. After that, we introduce the test data generation process in detail. Next, we provide a case study in which we generate test cases for a knowledge base from the biological domain. Finally, we conclude the paper with a discussion of the presented approach and show promising directions for future work.
Background

In this section we first introduce the knowledge representation used and describe the basics of subgroup patterns. After that, we give an informal introduction to Bayesian networks. Finally, we introduce the principles of empirical testing and their challenges.

General Definitions

Let ΩA be the set of all attributes. For each attribute a ∈ ΩA a range dom(a) of (nominal) values is defined. Furthermore, we assume VA to be the (universal) set of attribute values of the form (a = v), where a ∈ ΩA is an attribute and v ∈ dom(a) is an assignable value. A case c = {a1 = v1, ..., ak = vk}, (ai = vi) ∈ VA, is a set of individual attribute values. These attribute values consist of two parts: First, we distinguish the problem description of the case, i.e., the input to the knowledge system containing the findings of the case. Second, the attribute values contain at least one solution for the case, denoting the output of the intelligent system. Let CB be the case base containing all available cases.

Subgroup Patterns

Subgroup patterns (Klösgen 2002; Atzmueller, Puppe, & Buscher 2005), often provided as conjunctive rules, describe 'interesting' subgroups of cases, e.g., "the subgroup of 16-25 year old men that own a sports car are more likely to pay high insurance rates than the people in the reference population." Subgroups are described by relations between independent (explaining) variables (Sex=male, Age=16-25, Car type=Sportscar) and a dependent (target) variable (Insurance rate). The independent variables are modeled by selection expressions on sets of attribute values. In the context of this paper, the target variables usually correspond to the solutions of a case, and the independent variables refer to specific findings. A subgroup pattern relies on the following properties: the subgroup description language, the subgroup size, the target share of the subgroup, and the target variable. In the context of this work we focus on binary target variables.

Definition 1 (Subgroup Description) A subgroup description sd = {e1, ..., en} consists of a set of selection expressions (selectors) that are selections on domains of attributes, i.e., ei = (ai, Vi), where ai ∈ ΩA, Vi ⊆ dom(ai). A subgroup description is defined by the conjunction of its contained selection expressions.

The description language specifies the individuals belonging to a subgroup s = {c ∈ CB | sd(c) = true}, i.e., the set of cases c that fulfill the subgroup description sd and are thus contained in the respective subgroup s. The parameters subgroup size n and target share p are used to measure the interestingness of a subgroup: n = |{c | c ∈ s}| is the number of cases contained in the subgroup s, and the target share p = 1/n · |{c | c ∈ s ∧ t ∈ c}| specifies the share of subgroup cases that contain the (dependent) target variable t ∈ VA.
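As a minimal illustration of these definitions, the following Python sketch computes the subgroup size n and the target share p of a subgroup description over a small case base. The data representation (cases as dictionaries, selectors as attribute/value-set pairs) and the function names are assumptions made for this sketch, not part of the described system; the example data is purely illustrative.

```python
# Minimal sketch (not the authors' implementation): cases are dicts mapping
# attributes to values; a subgroup description is a list of selectors, each
# allowing a set of values for one attribute; the target is an attribute value.

def matches(case, description):
    """True iff the case fulfills every selector of the description."""
    return all(case.get(attr) in allowed for attr, allowed in description)

def subgroup_parameters(case_base, description, target):
    """Return (subgroup size n, target share p) for the given description."""
    subgroup = [c for c in case_base if matches(c, description)]
    n = len(subgroup)
    t_attr, t_value = target
    hits = sum(1 for c in subgroup if c.get(t_attr) == t_value)
    return n, (hits / n if n > 0 else 0.0)

# Example with the insurance subgroup from the text (illustrative data only):
case_base = [
    {"Sex": "male", "Age": "16-25", "Car type": "Sportscar", "Insurance rate": "high"},
    {"Sex": "male", "Age": "16-25", "Car type": "Sportscar", "Insurance rate": "normal"},
    {"Sex": "female", "Age": "26-40", "Car type": "Sedan", "Insurance rate": "normal"},
]
description = [("Sex", {"male"}), ("Age", {"16-25"}), ("Car type", {"Sportscar"})]
print(subgroup_parameters(case_base, description, ("Insurance rate", "high")))
# -> (2, 0.5): subgroup size n = 2, target share p = 0.5
```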
Bayesian Networks

A Bayesian network consists of a set of attributes and a set of directed edges connecting the attributes, e.g., (Jensen 1996). For each attribute a the range dom(a) has to contain a finite set of distinct values. The attributes and the set of edges define a directed acyclic graph in which the edges induce dependency relations between pairs of attributes. For each attribute a and its parents pa(a), induced by the edges, a conditional probability table (CPT) is attached. For an attribute with no parents an unconditional prior probability is used.

Empirical Testing and Case Coverage

The building block of empirical testing is a suite of test cases, i.e., a collection of representative cases with correct solutions covering the (expected) application of the intelligent system.

Empirical Testing During the development of the system the test suite is used to check the validity of the created or modified knowledge. Here, the test cases are subsequently given to the system as input, and the derived solution is compared with the solution already stored in the test case. Measures like precision, recall, and the E-measure are commonly used for the comparison of the solutions.

Coverage of Generated Cases An important property of the test suite is its coverage with respect to the tested knowledge. For rule-based knowledge representations appropriate methods have been investigated thoroughly in the past, cf. (Barr 1999; Knauf, Gonzalez, & Abel 2002). Since we propose a generation method that is independent of the inspected knowledge base, we cannot guarantee a suitable coverage of the knowledge to be tested. For this reason, we propose a post-processing step to be applied after the test case generation: We run the test suite and create simple statistics on the knowledge elements used, e.g., for rule-based systems we count the rules that were actually fired. If the coverage of the test suite is found to be unsatisfactory, then we propose a collection of input values that should also be included in the test cases; they are extracted from the unused knowledge elements. These new input values can serve as input for a second (parametrized) generation phase.
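To make the comparison of derived and stored solutions concrete, the following sketch computes precision and recall for a single test case, treating both the stored and the derived result as sets of solutions. This is only an assumed formulation of the standard measures mentioned above, not code from the described system, and the solution names are illustrative.

```python
# Minimal sketch (assumed formulation): stored and derived solutions are sets.

def precision_recall(stored, derived):
    """Precision/recall of the derived solution set against the stored one."""
    overlap = len(stored & derived)
    precision = overlap / len(derived) if derived else 0.0
    recall = overlap / len(stored) if stored else 0.0
    return precision, recall

# A test case passes if the derived solutions match the stored ones exactly:
stored = {"Camomile"}
derived = {"Camomile", "Daisy"}
p, r = precision_recall(stored, derived)
print(p, r)               # 0.5 1.0
print(derived == stored)  # False -> this test case would be reported as failed
```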
The Test Case Generation Process

In the following we define the incremental process model for generating test case data; it is discussed in more detail in the following subsections. We apply a generation model given by a Bayesian network as the underlying knowledge representation: Using the network, we are able to define the dependency relations between the individual attributes and the solutions, capturing the specific data characteristics. These data characteristics correspond to the goal characteristics of the test cases. Using the generation model, we can generate the output data quite easily, e.g., by applying sampling algorithms for Bayesian networks, e.g., (Jensen 1996). The difficult part is the construction of the Bayesian network itself, which is usually non-trivial: The nodes of the network with the attached conditional probability tables are only easy to model at the local (node) level. In contrast, entering all the conditional probabilities is often a difficult problem, e.g., if relations between nodes need to be considered that are not directly connected. Therefore, we aim to help the user in an incremental process in which the data characteristics can be described from abstract to more specific ones. Parts of the generation model can be described using subgroup patterns, which are in turn structurally mapped to the defined Bayesian network. Alternatively, the Bayesian network can be defined or modified directly in an interactive adaptation step using the given (subgroup pattern) constraints as test knowledge.

Figure 1: Process model for the case generation.

The process consists of the following steps, which are also depicted in Figure 1 (read from left to right):

1. Define Domain Ontology: We first define the domain ontology, i.e., the set of attributes (inputs and solutions) and attribute values used for the test case generation.

2. Specify Data Generation Model: The Bayesian network or fragments of the network can either be specified manually, or they can be generated automatically by using subgroup patterns. More specifically, for test case generation we need to model sets of attribute values that occur together with a specific solution (target variable), i.e., association test-patterns, and attribute values that never occur with a certain solution, i.e., exclusion test-patterns. Both association and exclusion test-patterns are modeled with respect to a specific solution (target variable) in relation to a set of attribute values (independent variables); a minimal data-structure sketch of such test-patterns is given after this list. The defined patterns and network fragments are then structurally merged into the generation model for the output generation. The relations between the variables in the network are represented by connections of the individual nodes. Using the subgroup parameters, constraints are derived in order to check these relations. Such constraints can also be supplied by the user directly.

3. Optimize Model: Initially, the conditional probability tables of the nodes contained in the Bayesian network, i.e., the generation model, are initialized with default values. Since the strengths of the modeled relations depend on these, the parameters of the relations are usually not correct initially. Therefore, the model is tested against the available set of constraints, and an optimization step is applied in order to adjust the conditional probability tables contained in the network. If the model fits the constraints, then the process is finished, and the data generation model is ready for use. As an alternative to the optimization step, the user can adapt the patterns/constraints, modify the network structure, or edit the conditional probability tables by hand.

4. Generate Data: After the generation model has been fit to the specification of the user, the final generation step is performed using a sampling algorithm. Given the network, we apply the prior-sample algorithm (Jensen 1996): In a top-down pass, a value is computed for every node according to the values of its parent nodes.

5. Case Replay and Test: After the prototypical test cases have been generated, they are used in an initial empirical testing phase.

6. Adapt Specification: Based on the results of the previous steps, the modeled patterns usually need to be adapted when there are errors with respect to the generation model or the knowledge base: We apply certain adaptation operators in order to handle incorrectly solved cases. Sometimes the knowledge base needs to be adapted, or a generated case needs to be removed completely. We will discuss the applicable set of operators in detail below.

7. Test Case Selection: After the final set of test cases has been obtained, we apply a selection technique in order to arrive at a more diverse set of cases, i.e., cases with a diverse coverage of the problem space. Then, the resulting set of cases is inserted into the test suite.
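As referenced in step 2, association and exclusion test-patterns can be written down as simple records: a solution, a set of constraining attribute values, a target share, and a relative subgroup size. The dataclass below is a hypothetical format assumed for this sketch (it is not the pattern format of the actual tool); encoding an exclusion pattern as target share 0.0 follows the description in step 2 that such findings never occur together with the solution.

```python
from dataclasses import dataclass, field

@dataclass
class TestPattern:
    """Association or exclusion test-pattern for one solution (hypothetical format)."""
    solution: str                      # target variable, e.g., a plant name
    selectors: dict                    # attribute -> required value
    target_share: float                # 1.0 = categorical association, 0.0 = strict exclusion
    rel_size: float = 0.1              # relative subgroup size (support in the generated data)

# Association pattern: white blossoms occur together with the solution Camomile.
assoc = TestPattern("Camomile", {"Blossom Color": "White"}, target_share=1.0)
# Exclusion pattern: spiny leaf edges never occur with the solution Camomile.
excl = TestPattern("Camomile", {"Leaf edge": "With spines"}, target_share=0.0)
print(assoc)
print(excl)
```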
In the following sections we first give an overview of the approximation of the generation model, and then describe the particular phases of the process model in more detail.

Overview: Approximating the Generation Model

A collection of subgroup patterns describes dependencies between a target solution and a set of findings. Using such subgroup patterns, a two-layered network can be constructed automatically: The target variable can be designated either as the parent of the explaining variables or as their child. For each target variable a constraint is generated using the specified total target share with respect to the entire population, i.e., the prior probability of the target variable p(t). Constraints for the parameters of the subgroup patterns are based on the contained target variable t and the set of selectors in the subgroup description sd = {e1, e2, ..., en}. Two constraints are generated for each pattern: the subgroup size, equivalent to the joint probability p(e1, e2, ..., en), ei ∈ sd, and the target share of the subgroup, equivalent to the conditional probability p(t | e1, e2, ..., en), ei ∈ sd.

Figure 2: Possible network structures for modeling the dependency relations between the target (dependent) variable TV and the selectors (independent variables) Si: (a) if target TV, then selectors Si; (b) if selectors Si, then target TV.

Using the subgroup patterns defined for a target variable, the user can select from two basic network structures that are generated automatically if the respective nodes contained in the subgroup description are not already contained in the network: Either the target variable is the parent of the subgroup selectors, cf. Figure 2(a), or the target variable is the child of the subgroup selectors, cf. Figure 2(b). The first structure depicts the relation 'IF target TV THEN selectors Si'; the latter models the inverse relation 'IF selectors Si THEN target TV'. Both structures have certain advantages and drawbacks concerning the optimization step: Option (a) allows for a simple definition of the prior probability of the target variable, and using the CPTs of the child selectors it is very easy to adapt the subgroup size parameter. On the other hand, option (b) allows for an easier adaptation of the subgroup target shares (contained as values in the CPT of the target variable). Furthermore, option (b) typically results in larger CPTs, which gives the optimization algorithm more room for adaptation. However, the size of the CPT of the target is exponential in the number of parent selectors.
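To illustrate how the constraints can be read off a subgroup pattern, the sketch below turns a pattern with given relative size and target share into the three probability constraints named above: the prior p(t), the joint p(e1, ..., en), and the conditional p(t | e1, ..., en). Representing constraints as plain (expression, expected value) tuples is an assumption of this sketch, not the format used by the described tool, and the numeric values in the example are made up.

```python
# Hypothetical sketch: derive checkable probability constraints from one
# subgroup pattern; constraints are (description string, expected probability).

def pattern_constraints(target, prior, selectors, rel_size, target_share):
    """target       -- name of the target (solution) variable t
       prior        -- specified prior probability p(t) over the whole population
       selectors    -- list of selector names e1, ..., en
       rel_size     -- relative subgroup size, i.e., p(e1, ..., en)
       target_share -- target share of the subgroup, i.e., p(t | e1, ..., en)"""
    joint = "p(" + ", ".join(selectors) + ")"
    conditional = "p(" + target + " | " + ", ".join(selectors) + ")"
    return [
        ("p(" + target + ")", prior),   # prior probability of the target variable
        (joint, rel_size),              # subgroup size constraint
        (conditional, target_share),    # target share constraint
    ]

# Example for the insurance pattern from the Background section (made-up numbers):
for expr, value in pattern_constraints(
        "Insurance rate=high", prior=0.1,
        selectors=["Sex=male", "Age=16-25", "Car type=Sportscar"],
        rel_size=0.05, target_share=0.8):
    print(expr, "=", value)
```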
Modeling Test Patterns (Phase 2a)

As outlined above, a test pattern, i.e., either an association or an exclusion pattern, is modeled with respect to a specific solution (target variable) by relating a set of attribute values (independent variables) to it. For example, if a specific solution should be derived categorically from a set of input findings, then a target share p = 1 is assigned to the corresponding subgroup pattern, i.e., the solution (target concept) is always established. Otherwise, for a probabilistic relation the target share is given by p ∈ [0, 1]. The second important parameter of a subgroup pattern is the (relative) subgroup size, which controls the generality of the pattern (rule): It corresponds to the frequency (support) with which the pattern will be observed in the generated test data set. If the solutions are independent of each other, then we can build a model for each solution following a generative programming approach, and afterwards merge the resulting sets of test cases into the final suite of test cases. Such a divide-and-conquer approach can significantly reduce the complexity of the data generation models and therefore often increases the practical applicability and tractability.

Modeling of a Bayesian Network (Phase 2b)

In addition to the specification of a set of subgroup patterns, the user can also define the Bayesian network directly by connecting the nodes/attributes. Additionally, the conditional probability tables of the nodes in the network can be adapted. If additional nodes are entered manually, then the entries of the conditional probability tables need to be specified. However, this step is usually the most difficult one. In an advanced step, the network structure can be enriched using hidden nodes that enable further possibilities for data generation: Hidden nodes are used to add constraining relations between the active nodes that are used for case generation. The hidden nodes are then used during the case generation step, but they are not visible in the generated test cases.

Constraint-based Model Optimization (Phase 3)

For the details of the constraint-based optimization method, we refer to (Atzmueller et al. 2006). In summary, we need to compute arbitrary joint and conditional probabilities in the network, check the available set of constraints, and apply a hill-climbing constraint satisfaction problem solver that tries to fit the model to the constraints by minimizing a global error function. After the CSP solver has been applied, the resulting state of the model can be checked by the user interactively: The deviations between the defined patterns and the patterns (implicitly) included in the network are compared. Then, the model is tuned if necessary: The automatic optimization process is either restarted, or the user can intervene manually by adapting the conditional probability tables of the network or by modifying the structure of the Bayesian network.

Test Case Generation (Phase 4)

In the case generation phase we use the prior-sample algorithm (Jensen 1996), also known as forward sampling: In a top-down pass, a value is computed for every node according to the values of its parent nodes. In this way, we generate a set of test cases of arbitrary size corresponding to the data characteristics modeled in the Bayesian network.
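The prior-sample (forward-sampling) step can be sketched as follows: nodes are visited in topological order, and each node's value is drawn from the CPT row selected by the already sampled parent values. The network encoding used here (dictionaries of parents and CPTs) and the example probabilities are assumptions of the sketch, not the representation or parameters of the actual generator.

```python
import random

# Minimal forward-sampling sketch. The network format is assumed:
#   parents: node -> list of parent nodes
#   cpt:     node -> {tuple of parent values: {value: probability}}

def prior_sample(nodes, parents, cpt, rng=random):
    """Sample one case by visiting the nodes in topological order."""
    case = {}
    for node in nodes:  # 'nodes' must be given in topological order
        key = tuple(case[p] for p in parents[node])
        dist = cpt[node][key]
        values, probs = zip(*dist.items())
        case[node] = rng.choices(values, weights=probs, k=1)[0]
    return case

# Illustrative two-node network: a solution TV and one finding S1, using
# option (a), i.e., the target is the parent of the selector; made-up numbers.
nodes = ["Camomile", "Blossom Color"]
parents = {"Camomile": [], "Blossom Color": ["Camomile"]}
cpt = {
    "Camomile": {(): {"yes": 0.2, "no": 0.8}},
    "Blossom Color": {("yes",): {"White": 0.9, "Yellow": 0.1},
                      ("no",): {"White": 0.3, "Yellow": 0.7}},
}
print([prior_sample(nodes, parents, cpt) for _ in range(3)])
```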
Case Replay and Test (Phase 5)

The empirical testing phase was implemented with an automated test tool (Baumeister, Seipel, & Puppe 2004). Essentially, we provide all generated cases to the tool, and the tests are then processed automatically. Visual feedback is given by a colored status bar, which turns green if all tests have passed successfully. An error is reported by a red status bar, i.e., if for some of the tests a different set of solutions was derived compared to the solutions stored in the respective generated case. This metaphor of a colored status bar was adopted from automated testing techniques known in software engineering. Furthermore, a more detailed report of each test result is presented. Figure 3 shows an example of the automated test tool with an incorrectly processed test suite; we give more details in the case study.

Figure 3: The d3web.KnowME testing tool.

Adaptation and Refinement (Phase 6)

In general, it is likely that some of the generated test cases fail during the empirical testing phase. An automatically generated test case can fail for the following reasons:

1. Deficient test generation knowledge: The problem description of a case is incorrect due to incorrectly or incompletely defined test generation knowledge, i.e., the subgroup patterns and/or parts of the network are incorrect or incomplete. The developer can choose from a set of possible operators:
• Extend pattern base: Add missing relations to the pattern base. This action is often applicable if there are at least two factors with a meaningful missing dependency relation, and if this relation is necessary for the model.
• Modify patterns: Alternatively, it may be sufficient to modify existing patterns, i.e., adapt the subgroup parameters.
• Modify data generation model: The developer can modify the Bayesian network directly if some dependencies can be handled more easily by modifying the structure of the network. Commonly, a hidden node is then introduced in order to add an exclusion relation between two solutions. Consequently, edges between nodes can be added or removed, or hidden nodes can be introduced. Additionally, (simple) changes of the conditional probability tables can often be performed manually.
• Remove case: Sometimes the modification of the test knowledge as described above is too costly for the convenient creation of test cases. As a trade-off between the (potentially) increased complexity of the generation model and the generation of successful test cases, we also propose to simply delete cases that do not fit the expectations of the developer.

2. Deficient knowledge base: The failed test case uncovered an incorrect or unexpected behavior of the knowledge base. In contrast to classical test case generation methods, this method is able to uncover missing relations in the knowledge base (whereas the classical methods can only detect incorrect knowledge).
• Refinement: Extend or modify the knowledge base with respect to the missing or incorrect relations. Sometimes it might also be useful to adapt the test knowledge patterns in order to take care of the uncovered relations.

Test Case Selection (Phase 7)

In order to select a representative but diverse set of test cases, we can apply a post-selection step on the total set of generated test cases. By applying this step we can ensure that the problem descriptions of a selected subset of the generated test cases are not too similar to each other. We use a technique adapted from case-based reasoning (CBR) for measuring the diversity of a set of cases, cf. (Smyth & McClave 2001). For assessing the similarity of two cases c and c′, we use the well-known matching-features similarity function

sim(c, c′) = |{a ∈ ΩA : πa(c) = πa(c′)}| / |ΩA|,   (1)

where πa(c) returns the value of attribute a in case c. The diversity of a set of test cases TC = {c1, ..., ck} of size k is then measured by

diversity(TC) = ( Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} (1 − sim(ci, cj)) ) / ( k · (k − 1) / 2 ).   (2)

We select the k most diverse test cases from the generated set of n test cases, with k ≤ n.
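A selection step based on Equations (1) and (2) could be realized greedily as sketched below: starting from one case, repeatedly add the remaining case that maximizes the diversity of the selected set. The greedy strategy follows the spirit of (Smyth & McClave 2001) but is an assumption here, since the paper does not spell out the selection algorithm; the case data is illustrative.

```python
# Sketch of diversity-based selection. Cases are dicts over the same attribute
# set; the greedy strategy is an assumed instantiation of the post-selection step.

def sim(c1, c2, attributes):
    """Matching-features similarity, Equation (1)."""
    return sum(c1.get(a) == c2.get(a) for a in attributes) / len(attributes)

def diversity(cases, attributes):
    """Average pairwise dissimilarity, Equation (2)."""
    k = len(cases)
    if k < 2:
        return 0.0
    total = sum(1 - sim(cases[i], cases[j], attributes)
                for i in range(k - 1) for j in range(i + 1, k))
    return total / (k * (k - 1) / 2)

def select_diverse(cases, attributes, k):
    """Greedily pick k cases that keep the selected set as diverse as possible."""
    selected = [cases[0]]
    remaining = list(cases[1:])
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda c: diversity(selected + [c], attributes))
        selected.append(best)
        remaining.remove(best)
    return selected

attributes = ["Blossom Color", "Leaf edge"]
cases = [{"Blossom Color": "White", "Leaf edge": "Serrated"},
         {"Blossom Color": "White", "Leaf edge": "Serrated"},
         {"Blossom Color": "Yellow", "Leaf edge": "With spines"}]
print(select_diverse(cases, attributes, 2))  # keeps the two dissimilar cases
```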
Case Study

The applicability of the presented approach was evaluated using a rule base from the biological domain: The plant system (Puppe 1998) is a consultation system for the identification of the most common flowering plants growing in Franconia, Germany. For the classification of a given plant, the user enters findings concerning the flowers, leaves, and trunks of the particular plant. The plant knowledge base defines 39 solutions and 40 inputs; each input has a discrete range of possible finding values. We generated test data for several selected solutions. The case study was implemented using a special data generator plugin of the VIKAMINE system (Atzmueller & Puppe 2005). In the following we exemplify the modeling issues for one solution of the plant knowledge base, the plant Camomile (Matricaria Inodora). Figure 4 shows the structure of the (generated) Bayesian network for the plant Camomile.

Figure 4: Generation model for the solution Camomile (Matricaria Inodora), in German.

In the case study we can make an independence assumption between the particular solutions, i.e., plants. This significantly simplifies the modeling effort for each solution, and also increases the understandability and maintainability of the constructed model. We only consider single findings that count positively towards deriving the solution (association patterns), and exclusion patterns made up of findings that speak against inferring the solution. Therefore, we opted for the dependency structure in which the target is the parent of the subgroup selectors, i.e., the findings, since the combinations of these findings do not need to be modeled explicitly in the network. In Tables 1 and 2, we show examples of the positive and negative factors for deriving the solution Camomile (Matricaria Inodora), respectively. We denote the strength of the relations by the symbols 'P1', 'P2', 'P3', which denote positive categories in ascending order, and equivalently 'N1', 'N2', 'N3' for the negative categories:

Table 1: Single factors that are positive for deriving the solution Camomile.
Input Finding                        Strength
Blossom Type: Daisy family           P3
Blossom Color: White                 P3
Blossom leaf edge: Not serrated      P2
Coarse Blossom Type: Star-shaped     P2
Chalice Color: Yellow                P1

Table 2: Single factors that are negative for deriving the solution Camomile.
Input Finding                        Strength
Leaf Cirrus: Exist                   N3
Stipe Spine: Exist                   N3
Leaf edge: With spines               N2
Leaf edge: Not kibbled               N2
Blossom Type: Not dandelion          N1

For this solution we first generated 100 test cases including only the positive factors. This resulted in 54 incorrectly solved cases, since the inhibiting factors were missing. After that, exclusion relations were included, resulting in 27 incorrectly solved cases. By further optimizing the pattern constraints we obtained 100 test cases with only 9 incorrect cases, which were removed after a manual inspection. After that, the model was not further optimized by including additional patterns, in order to preserve its concise and clear semantics; this is usually important for further extensions and maintenance. Figure 5 shows a screenshot of generated cases.

Figure 5: Three generated cases for the solution Camomile.

Conclusion

In this paper we presented a novel process model for test case generation based on modeling simple subgroup patterns. The approach makes no use of the underlying knowledge representation and is especially suited for integration into test-first development methodologies. We successfully demonstrated the method in a case study from the biological domain. In the future, we plan to enhance the post-processing of generated cases by automatically proposing appropriate adaptation operators for handling cases that were incorrectly solved in the initial empirical testing phase.
References

Atzmueller, M., and Puppe, F. 2005. Semi-Automatic Visual Subgroup Mining using VIKAMINE. Journal of Universal Computer Science (JUCS), Special Issue on Visual Data Mining 11(11):1752–1765.

Atzmueller, M.; Baumeister, J.; Goller, M.; and Puppe, F. 2006. A Datagenerator for Evaluating Machine Learning Methods. Künstliche Intelligenz 03/06:57–63.

Atzmueller, M.; Puppe, F.; and Buscher, H.-P. 2005. Exploiting Background Knowledge for Knowledge-Intensive Subgroup Discovery. In Proc. 19th Intl. Joint Conference on Artificial Intelligence (IJCAI-05), 647–652.

Barr, V. 1999. Applications of Rule-Base Coverage Measures to Expert System Evaluation. Knowledge-Based Systems 12:27–35.

Baumeister, J.; Seipel, D.; and Puppe, F. 2004. Using Automated Tests and Restructuring Methods for an Agile Development of Diagnostic Knowledge Systems. In Proc. 17th FLAIRS, 319–324. AAAI Press.

Gupta, U. G., and Biegel, J. 1990. A Rule-Based Intelligent Test Case Generator. In Proc. AAAI-90 Workshop on Knowledge-Based System Verification, Validation and Testing. AAAI Press.

Jensen, F. V. 1996. An Introduction to Bayesian Networks. London, England: UCL Press.

Klösgen, W. 2002. Subgroup Discovery. In Handbook of Data Mining and Knowledge Discovery, chapter 16.3. New York: Oxford University Press.

Knauf, R.; Gonzalez, A. J.; and Abel, T. 2002. A Framework for Validation of Rule-Based Systems. IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics 32(3):281–295.

Preece, A. 1998. Building the Right System Right. In Proc. KAW'98, 11th Workshop on Knowledge Acquisition, Modeling and Management.

Puppe, F. 1998. Knowledge Reuse among Diagnostic Problem-Solving Methods in the Shell-Kit D3. Intl. Journal of Human-Computer Studies 49:627–649.

Smyth, B., and McClave, P. 2001. Similarity vs. Diversity. In Proc. 4th Intl. Conference on Case-Based Reasoning (ICCBR 01), 347–361. Berlin: Springer Verlag.