Test Prioritization Using System Models
Bogdan Korel
Computer Science Department
Illinois Institute of Technology
Chicago, IL 60616, USA
korel@iit.edu

Luay H. Tahat
Lucent Technologies, Bell Labs Innovations
Naperville, IL 60566, USA
ltahat@lucent.com

Mark Harman
King's College London
Strand, London WC2R 2LS, UK
Mark@dcs.kcl.ac.uk
Abstract
During regression testing, a modified system is retested
using the existing test suite. Because the size of the test
suite may be very large, testers are interested in detecting
faults in the system as early as possible during the retesting
process. Test prioritization tries to order test cases for
execution so the chances of early detection of faults during
retesting are increased. The existing prioritization methods
are based on the code of the system. In this paper, we
present test prioritization that is based on the system
model. System modeling is a widely used technique to
model state-based systems. In this paper, we present
methods of test prioritization based on state-based models
after changes to the model and the system. The model is
executed for the test suite and information about model
execution is used to prioritize tests. Execution of a model is
inexpensive compared to execution of the system; therefore,
the overhead associated with test prioritization is
relatively small. In addition, we present an analytical
framework for evaluation of test prioritization methods
whereas the existing evaluation framework is based on
experimentation (observation). We have performed an
experimental study in which we compared different test
prioritization methods. The results of the experimental
study suggest that system models may improve the
effectiveness of test prioritization with respect to early fault
detection.
1. Introduction
During maintenance of evolving software systems, their
specification and implementation are changed to fix faults,
to add new functionality, to change the existing
functionality, etc. Regression testing is the process of
validating that the changes introduced in a system are
correct and do not adversely affect the unchanged portion
of the system. During regression testing, new test cases are
frequently generated, and previously developed test
cases are also deployed to revalidate the modified system.
Regression testing tends to consume a large amount of time
and computing resources, especially for large software
systems.
There has been a significant amount of research on
regression testing. There exist two types of regression
testing: code-based and specification-based. Most
regression testing methods are code-based, e.g., [3, 6, 7, 10,
11, 14, 17]. Specification-based regression testing methods
[1, 8, 9, 15] use system models to select and generate test
cases related to the modification. The goal of these methods
is to test the modification of the system.
During regression testing, after testing the modified part
of the system, the modified system needs to be retested
using the existing test suite, to have confidence that the
system does not have faults. Because of the large size of a
test suite, system retesting may be very expensive; it may
last hours, days, or even weeks. Test prioritization orders
the tests in the test suite for "execution" in such a way that
faults in the system are uncovered early during the retesting
process. Test prioritization methods [12, 13, 18] order tests
in the test suite according to some criterion, e.g., so that code
coverage is achieved at the fastest rate. Test cases are then
executed in this prioritized order: test cases with higher
priority, based on the prioritization criterion, are executed
first, whereas test cases with lower priority are executed
later. The existing test prioritization techniques use mainly
the source code to prioritize tests. These methods may
require re-execution of the system for the whole or partial
test suite to collect information about the system behavior
that is used in test prioritization.
System modeling is an emerging technology. System
models are created to capture different aspects of the
system behavior. Several modeling languages have been
developed to model state-based software systems, e.g.,
State Charts, Extended Finite State Machine (EFSM) [2],
and the Specification and Description Language (SDL) [5]. System
modeling is very popular for modeling state-based systems,
e.g., computer communications systems, industrial control
systems, etc. System models are used in the development
process, e.g., in partial code generation, or in the testing
process to design test cases. In recent years, several model-based test generation [2, 4, 16] and test suite reduction [8]
techniques have been developed.
In this paper, we present model-based test prioritization
in which the original and modified system models together
with information collected during execution of the modified
model on the test suite are used to prioritize tests for
retesting of the modified software system. The goal of
model-based test prioritization is for early fault detection in
the modified system, where the faults of interest are faults
in models and faults in implementations of model changes
in the system. Execution of the model is very fast compared
to the execution of the actual system. Therefore, execution
of the model for the whole test suite is relatively
inexpensive, whereas execution of the system for the whole
test suite may be very expensive (both resource-wise and
time-wise). In this paper, we present two model-based test
prioritization methods: selective test prioritization and
model dependence-based test prioritization. In addition, we
present a framework for comparison of test prioritization
methods with respect to the effectiveness of early fault
detection. The existing approach to evaluating test
prioritization methods is based on observation
(experimentation). In this paper, we present an analytical
approach to evaluating test prioritization methods. An
experimental study was performed to compare effectiveness
of the presented test prioritization methods. The results of
the experimental study suggest that model-based test
prioritization may be a good complement to the existing
code-based test prioritization techniques.
The paper is organized as follows: Section 2 provides an
overview of system modeling, Section 3 presents the
problem of test prioritization, and Section 4 presents model-based
test prioritization methods. In Section 5, an analytical
framework for comparison of test prioritization methods is
discussed. In Section 6, the experimental study is presented.
In the Conclusions, future research is discussed.
2. System Modeling
In this paper, we concentrate on EFSM system models.
However, our approach can be extended to other modeling
languages, e.g., SDL [5]. EFSM [2] is very popular for
modeling state-based systems. An EFSM consists of states
and transitions between states. The following elements are
associated with each transition: an event, a condition, and a
sequence of actions. A transition is triggered when an event
occurs and a condition (a Boolean predicate) associated
with the transition evaluates to true. When a transition is
triggered, an action(s) associated with the transition is
executed. EFSM models are diagrammatically represented
as graphs where states are represented as nodes and
transitions as directed edges between states. A simplified
EFSM model of an ATM system is shown in Figure 1. In
this model, for example, transition T4 is triggered
when the system is in state S1, event PIN(p) is received, and
the value of parameter p equals the value of variable pin. When the
transition is triggered, the "Display menu" action is executed.
[Figure 1 is an EFSM diagram: states Start, S1, S2, S3, and Exit are nodes, and transitions T1-T4 and T6-T11 are directed edges labeled with their events (Card, PIN, Deposit, Withdrawal, Balance, Continue, Exit), guard conditions, and actions.]
Figure 1. A sample model
In this paper, we assume that system models are
executable, i.e., enough detail is provided so the model may
be executed. In order to support model execution, some
actions may not be implemented (they represent “empty”
actions). However, all actions are implemented during the
development of the system. A model input (a test) consists
of a sequence of events with input values. The following is
a sample test for the model of Figure 1:
t: Card(1234,100), PIN(1234), Deposit(20), Continue(),
Withdrawal(50), Continue(), Exit()
When the model is executed on t, the following sequence of
transitions is executed: T1, T4, T6, T7, T11, T7, T8.
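To make the notion of an executable model concrete, the following is a minimal Python sketch of an EFSM interpreter; it is not the authors' tool. It encodes only the transitions exercised by the sample test, and the source and target states assigned to each transition are assumptions inferred from the execution trace above.

class EFSM:
    # transitions: list of (name, source, event, guard, action, target)
    def __init__(self, transitions, initial):
        self.transitions = transitions
        self.state = initial
        self.vars = {}

    def fire(self, event, args):
        for name, src, ev, guard, action, dst in self.transitions:
            if self.state == src and ev == event and guard(self.vars, args):
                action(self.vars, args)
                self.state = dst
                return name          # name of the triggered transition
        return None                  # no transition is enabled for this event

atm = EFSM([
    ("T1",  "Start", "Card",       lambda v, a: True,
            lambda v, a: v.update(pin=a[0], b=a[1], attempts=0), "S1"),
    ("T4",  "S1",    "PIN",        lambda v, a: a[0] == v["pin"],
            lambda v, a: None,                                    "S2"),
    ("T6",  "S2",    "Deposit",    lambda v, a: True,
            lambda v, a: v.update(b=v["b"] + a[0]),               "S3"),
    ("T7",  "S3",    "Continue",   lambda v, a: True,
            lambda v, a: None,                                    "S2"),
    ("T11", "S2",    "Withdrawal", lambda v, a: a[0] <= v["b"],
            lambda v, a: v.update(b=v["b"] - a[0]),               "S3"),
    ("T8",  "S2",    "Exit",       lambda v, a: True,
            lambda v, a: None,                                    "Exit"),
], "Start")

t = [("Card", (1234, 100)), ("PIN", (1234,)), ("Deposit", (20,)),
     ("Continue", ()), ("Withdrawal", (50,)), ("Continue", ()), ("Exit", ())]
print([atm.fire(event, args) for event, args in t])
# ['T1', 'T4', 'T6', 'T7', 'T11', 'T7', 'T8']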
3. Test Prioritization
In this paper, we consider test prioritization with respect
to early fault detection [13]. The goal is to increase the
likelihood of revealing faults earlier during execution of the
prioritized test suite. Let TS = {t1, …, tN} be a test suite of
size N, where ti is a test case. Let D(TS) = {d1, …, dL} be the
set of faults in the system that are detected by test suite TS.
Let TS(d) ⊆ TS be the set of tests that fail because of fault d.
Let S = <ti1, ti2, …, tik, …, tiN> be a prioritized sequence of
tests of test suite TS, where the subscript indicates the
position of a test in the sequence, e.g., test ti1 is in position
1, test ti2 is in position 2, etc. Let tik ∈ TS(d) be the first
failed test in sequence S caused by fault d, i.e., all tests
ti1, …, tik-1 in S between positions 1 and k-1 do not fail
because of d. Let pS(d) = k be the position of tik, i.e., the
first position of a failed test in S caused by fault d. Let
rpS(d) be the first relative position of the failed test in S
caused by fault d, where rpS(d) is computed as follows:

rp_S(d) = \frac{p_S(d)}{N}    (3.1)

Notice that rpS(d) represents the test suite fraction at which d
is detected; its values range between 0 < rpS(d) ≤ 1.
The rate of fault detection [13] is a measure of how
rapidly a prioritized test sequence detects faults. This
measure is a function of the percentage of faults detected in
terms of the test suite fraction, i.e., a relative position in the
test suite. More formally, let P(S)=<rpS(d1),…,rpS(dL)> be a
list of relative positions of first failed tests for all faults in
D(TS). Notice that at the same position in S more than one
fault may be detected, therefore, some positions in P(S)
may have the same value. Let F(S)=<rp1,…,rpq>, q ≤ L, be
an ordered (in ascending order) sequence of all unique first
relative positions from P(S), where rpi represents the test
suite fraction at which at least one fault is detected in S.
F(S) represents an order in which faults are uncovered by
test sequence S. The rate of fault detection RFD(S) can be
defined as a sequence of pairs (rpi, fdi),
RFD(S) = <(rp1, fd1), …, (rpq, fdq)>, where rpi is an element of
F(S), and fdi is the cumulative percentage of faults detected
at position rpi in F(S), computed as follows:

fd_i = \frac{\sum_{j=1}^{i} nd_S(rp_j)}{|D(TS)|} \times 100\%    (3.2)

where ndS(rpj) is the number of faults detected at the
relative position rpj in S.
For example, suppose test suite TS = {t1, t2, t3, t4, t5, t6, t7,
t8, t9, t10} consists of 10 tests that detect four faults
D(TS) = {d1, d2, d3, d4} in a system. The following tests fail
because of the individual faults: TS(d1) = {t5, t7}, TS(d2) = {t3, t7},
TS(d3) = {t5}, and TS(d4) = {t3, t9}. Let S1 = <t1, t2, t3, t4, t5, t6,
t7, t8, t9, t10> and S2 = <t10, t9, t8, t3, t5, t7, t4, t6, t2, t1> be two
prioritized test sequences. The rates of fault detection for S1
and S2 can be represented by the following table:

S1:  rp: Test suite fraction    0.3    0.5
     fd: % of detected faults   50%    100%

S2:  rp: Test suite fraction    0.2    0.4    0.5
     fd: % of detected faults   25%    50%    100%
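The table above can be reproduced with the following short Python sketch, which restates the definitions of P(S), F(S), and the cumulative detection percentage fd; it is illustrative only, not the authors' implementation.

def rfd(sequence, fault_tests):
    """Rate of fault detection RFD(S) as a list of (rp, fd) pairs."""
    N, L = len(sequence), len(fault_tests)
    # first relative position of a failed test for each fault -- the list P(S)
    rps = [min(sequence.index(t) + 1 for t in tests) / N
           for tests in fault_tests.values()]
    pairs = []
    for rp in sorted(set(rps)):                      # the sequence F(S)
        detected = sum(1 for x in rps if x <= rp)    # faults found by fraction rp
        pairs.append((rp, 100.0 * detected / L))
    return pairs

faults = {"d1": {"t5", "t7"}, "d2": {"t3", "t7"}, "d3": {"t5"}, "d4": {"t3", "t9"}}
S1 = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]
S2 = ["t10", "t9", "t8", "t3", "t5", "t7", "t4", "t6", "t2", "t1"]
print(rfd(S1, faults))   # [(0.3, 50.0), (0.5, 100.0)]
print(rfd(S2, faults))   # [(0.2, 25.0), (0.4, 50.0), (0.5, 100.0)]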
The goal of test prioritization is to order tests of TS for
execution so that the likelihood of improving the rate of
fault detection of faults in D(TS) is increased [13]. In order
to measure how rapidly a prioritized test sequence detects
faults during the execution of sequence S, a weighted
average of the percentage of faults detected, APFD(S), was
introduced [13]. For a given rate of fault detection RFD(S)
= <(rp1, fd1), …, (rpq, fdq)>, APFD(S) is computed as:

APFD(S) = \frac{\sum_{i=0}^{q} (fd_{i+1} - fd_i)(2 - rp_{i+1} - rp_i)}{2}    (3.3)
where (rp0,fd0)=(0,0) and (rpq+1,fdq+1)=(1,100).
The values of APFD(S) range from 0 to 100, where
higher APFD(S) value means faster (better) fault detection
rate. For two sequences S1 and S2 presented earlier,
APFD(S1)=72.5% and APFD(S2)=67.5%. Sequence S1
leads to a higher rate of fault detection than S2.
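Formula 3.3 translates directly into Python; the sketch below reproduces the APFD values quoted for S1 and S2 (illustrative only).

def apfd(rfd_pairs):
    """Weighted average of the percentage of faults detected (Formula 3.3)."""
    pts = [(0.0, 0.0)] + list(rfd_pairs) + [(1.0, 100.0)]
    return sum((fd2 - fd1) * (2 - rp2 - rp1) / 2
               for (rp1, fd1), (rp2, fd2) in zip(pts, pts[1:]))

print(round(apfd([(0.3, 50.0), (0.5, 100.0)]), 1))               # 72.5 -> APFD(S1)
print(round(apfd([(0.2, 25.0), (0.4, 50.0), (0.5, 100.0)]), 1))  # 67.5 -> APFD(S2)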
The simplest test prioritization method is random test
prioritization where test cases are ordered randomly. For a
test suite of size N, there are N! possible test sequences.
Random prioritization selects randomly one of these
sequences. Random test prioritization may be viewed as a
“no test prioritization” approach and may be treated as a
base-line for comparison with other test prioritization
methods.
4. Model-based test prioritization
Changes in specifications frequently lead to changes in
EFSM system models. The idea of model-based test
prioritization is to use the original model and the modified
model to identify a difference between these models. The
modified model is executed for the whole test suite to
collect information related to the difference. The collected
information is then used to prioritize the test suite. The
goal of model-based test prioritization is for early fault
detection in the modified system, where the faults of
interest are faults in models and faults in implementations
of model changes in the system. Notice that system models
frequently do not produce any observable outputs (or only
partial outputs), therefore, detecting model faults by
executing models on a test suite may not be appropriate.
Model checking methods are typically used to detect model
faults, but these methods detect only a limited class of
model faults.
Model-based test prioritization presented in this paper
can be used for any modification of the EFSM system
model. The approach uses the original model Mo and the
modified model Mm and automatically identifies the
difference [8] between these models, where the difference is
represented as a set of elementary model modifications. There are two
types of elementary modifications: a transition addition and
a transition deletion. As a result, a difference between
models Mo and Mm is represented by a set Ra of added
transitions and a set Rd of deleted transitions. When
elementary modifications of sets Ra and Rd are applied to
the original model Mo, the resulting model is the modified
model Mm. Any complex modification to the model can be
represented by these two sets. Notice that an addition of a
new state or a deletion of an existing state is not considered
as an elementary modification because an addition or a
deletion of a state is always associated with an addition or a
deletion of a transition, respectively.
For example, a difference between the original model of
Figure 1 and the modified model of Figure 2 is: deletion of
transition T11 and addition of transition T12, i.e., Ra = {T12}
and Rd = {T11}. Transition T11 no longer exists in the
modified model, and it is shown in Figure 2 as a dashed line
only for presentation purposes.
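For illustration, the difference can be expressed as two set differences; the sketch below is a Python illustration only (it identifies transitions by their labels, a simplification; the actual difference is computed on the model structure [8]).

# Transitions of the original model (Figure 1) and the modified model (Figure 2),
# identified here by their labels only.
original_transitions = {"T1", "T2", "T3", "T4", "T6", "T7", "T8", "T9", "T10", "T11"}
modified_transitions = {"T1", "T2", "T3", "T4", "T6", "T7", "T8", "T9", "T10", "T12"}

Ra = modified_transitions - original_transitions   # added transitions: {'T12'}
Rd = original_transitions - modified_transitions   # deleted transitions: {'T11'}
print(Ra, Rd)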
[Figure 2 is the modified EFSM diagram: transition T11 (Withdrawal(w)[w<=b]/b=b-w) is deleted and drawn as a dashed edge, transition T12 (Withdrawal(w)[w<b]/b=b-w) is added, and the remaining states and transitions are as in Figure 1.]
Figure 2. A modified model of Figure 1
After the difference between the original and the
modified model is identified, the modified model is
executed for test suite TS to collect different types of
information that is used to prioritize tests for retesting of the
modified system. Depending on information being used for
test prioritization, different prioritization methods may be
developed. In this paper, we present two model-based test
prioritization methods: selective test prioritization and
model dependence-based test prioritization.
4.1. Selective test prioritization
The idea of selective test prioritization is to assign a high
priority to tests that execute modified transitions in the
modified model. Low priority is assigned to tests that do
not execute any modified transition. Let TSH be a set of high
priority tests and TSL be a set of low priority tests. Sets TSH
and TSL are disjoint and TS = TSH ∪ TSL. Notice that
information about executed added/deleted transitions may
be also used in regression test selection, but in this paper
we concentrate only on using this information for test suite
prioritization. We present two versions of the selective test
prioritization in order to investigate their effectiveness in
early detection of faults.
Version I: In this version, modified transitions of Mm are
represented only by added transitions of Ra. Since deleted
transitions of Rd do not exist in the modified model, they
are ignored. Every transition T ∈ Ra is selected for
monitoring during execution of Mm on test suite TS. Let t be
a test and T(t) = <Ti1, ..., Tin> be the sequence of transitions
traversed during execution of the model on t. If, during
execution of the modified model Mm on test t, a transition T ∈ Ra
is executed, a high priority is assigned to t, i.e., t ∈ TSH.
Otherwise, a low priority is assigned to t, i.e., t ∈ TSL. For
example, consider the following three tests and the
corresponding sequences of transitions traversed during
execution of the modified model of Figure 2:
t1: Card(12,10), PIN(12), Withdrawal(5), Continue(), Exit()
t2: Card(12, 10), PIN(12), Withdraw(15), Continue(), Exit()
t3: Card(12, 10), PIN(12), Withdraw(10), Continue(), Exit()
T(t1) = <T1,T4,T12,T7,T8>, T(t2) = <T1,T4,T10,T7,T8>, T(t3) =
<T1,T4,T10,T7,T8>.
A set of added transitions for the modified model of
Figure 2 is Ra={T12}. Based on the execution of these tests,
the following high and low priority tests are identified:
TSH={t1} and TSL={t2, t3}. Since T12 is executed on test t1, a
high priority is assigned to this test.
Version II: In this version, modified transitions of Mm are
represented by both added and deleted transitions, i.e.,
transitions in Ra and Rd. These transitions are selected for
monitoring. If, during execution of the modified model Mm on
test t, a transition T that is in Ra or Rd is executed, a high
priority is assigned to test t, i.e., t ∈ TSH. Otherwise, a low
priority is assigned to test t, i.e., t ∈ TSL. For example,
when the modified model of Figure 2 is executed on tests t1,
t2, and t3 presented in Version I, sequences of transitions for
t1 and t2 are the same, but for t3 transition sequence
T(t3)=<T1, T4, T10, (T11), T7, T8> is different, i.e., deleted
transition T11 is executed, indicated in parentheses, together
with transition T10. As a result, a high priority is assigned to
t3, i.e., TSH={t1, t3}, TSL={t2}.
During system retesting that is based on Version I or
Version II, tests with high priority are executed first
followed by execution of low priority tests. High priority
tests and low priority tests are ordered using random
ordering. The algorithm for selective test prioritization is
shown in Figure 3. In the first step, high priority tests are
ordered randomly (lines 1-4), then low priority test are
ordered randomly (lines 5-8) in prioritized test sequence S.
Input:
  A set of high priority tests: TSH
  A set of low priority tests: TSL
Output: Prioritized test sequence: S

1  for p = 1 to |TSH| do
2     Select randomly and remove test t from TSH
3     Insert t into S at position p
4  endfor
5  for p = 1 to |TSL| do
6     Select randomly and remove test t from TSL
7     Insert t into S at position p + |TSH|
8  endfor
9  Output S

Figure 3. Selective test prioritization algorithm
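The algorithm of Figure 3 translates directly into Python; the sketch below is illustrative and assumes that the sets TSH and TSL have already been computed by executing the modified model.

import random

def selective_prioritization(ts_high, ts_low, rng=random):
    """Figure 3: high priority tests first, each group in random order."""
    high, low = list(ts_high), list(ts_low)
    rng.shuffle(high)        # lines 1-4: order TSH randomly
    rng.shuffle(low)         # lines 5-8: order TSL randomly, after TSH
    return high + low

# e.g., with TSH = {t1, t3} and TSL = {t2} from the Version II example:
print(selective_prioritization(["t1", "t3"], ["t2"]))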
Notice that in Version II additional instrumentation of
the model is required to capture the execution of deleted
transitions because deleted transitions do not exist in the
modified model. When the model, during its execution, is in
a state from which the deleted transition was outgoing, it is
possible to capture traversal of the deleted transition when
the event associated with the deleted transition is generated
and the enabling condition of the deleted transition
evaluates to true.
4.2. Model dependence-based test prioritization
In this section, we present an approach in which high
priority tests TSH are prioritized using model dependence
analysis. We concentrate on high priority tests TSH
identified by the Version II of selective prioritization. The
idea of model dependence-based test prioritization is to use
model dependence analysis [8] to identify different ways in
which added and deleted transitions interact with the
remaining parts of the model and use this information to
prioritize high priority tests. In the model dependence
analysis there are two types of dependences that may exist
in the model: data dependence and control dependence.
These model dependences are between transitions and
represent potential “interactions” between them.
A data dependence captures the notion that one
transition defines a value to a variable and another
transition may potentially use this value. There exists data
dependence between transitions Ti and Tk [8] if transition Ti
modifies the value of a variable v, transition Tk uses v, and there
exists a path (transition sequence) in the model from Ti to
Tk along which v is not modified. For example, there exists
a data dependence between transitions T1 and T11 in the
model of Figure 1 because transition T1 assigns a value to
variable b, transition T11 uses b, and there exists a path (T1,
T4, T11) from T1 to T11 along which b is not modified.
A control dependence captures the notion that one
transition may affect traversal of another transition, and it is
defined formally in [8]. For example, transition T4 has
a control dependence on transition T11 in the model of Figure
1 because execution of T11 depends on execution of T4.
Notice that if T4 is not executed, i.e., transition T8 is
executed instead, then T11 is also not executed.
Data and control dependences in the model can be
represented graphically by a graph where nodes represent
transitions and directed edges represent data and control
dependences. Figure 4 shows a dependence sub-graph of
the model of Figure 1. Data dependences are shown as solid
edges and control dependences are shown as dashed edges.
In order to prioritize tests, we are interested in data and
control dependences that are present during model
execution on each test t in test suite TS. We refer to these
dependences as dynamic dependences. Let t be a test and
T(t) = <Ti1, ..., Tin> be the sequence of transitions traversed
during execution of the model on t. There exists a dynamic
data dependence [8] between transitions Tim and Tik in T(t),
m < k, if transition Tim modifies the value of a variable v,
transition Tik uses v, and v is not modified between
positions m and k in T(t). There exists a dynamic control
dependence in T(t) between transitions Tim and Tik, m < k, if
there exists a control dependence between Tim and Tik, and
for all j, m < j < k, there is no control dependence between
Tij and Tik. For example, consider the following
test t for the model of Figure 2:
t: Card(5,6), PIN(5), Deposit(1), Continue(), Withdrawal(2),
Continue(), Withdrawal(90), Continue(), Exit()
On test t, the following sequence of transitions T(t) =
<T1, T4, T6, T7, T12, T7, T10, T7, T8> is executed. In T(t)
there exists a dynamic data dependence between T6 and T12
with respect to variable b, and also a dynamic control
dependence between T4 and T6. Note that for each dynamic
dependence in T(t) there exists a corresponding dependence
(edge) in the model dependence graph.
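The sketch below illustrates, in Python, how dynamic data dependences can be read off a transition trace T(t). The def/use sets are assumptions taken from the transition actions quoted in the text (only variables pin and b are tracked); this is not the authors' instrumentation.

# def/use sets per transition (assumed from the transition labels of Figure 2)
defs = {"T1": {"pin", "b"}, "T6": {"b"}, "T12": {"b"}}
uses = {"T4": {"pin"}, "T6": {"b"}, "T7": {"b"}, "T10": {"b"}, "T12": {"b"}}

def dynamic_data_deps(trace, defs, uses):
    deps, last_def = [], {}          # last_def: variable -> position of latest definition
    for k, t in enumerate(trace):
        for v in uses.get(t, ()):    # uses are resolved against the previous definition,
            if v in last_def:        # so T12 (b = b - w) uses b before redefining it
                deps.append((trace[last_def[v]], t, v))
        for v in defs.get(t, ()):
            last_def[v] = k
    return deps

trace = ["T1", "T4", "T6", "T7", "T12", "T7", "T10", "T7", "T8"]
print(dynamic_data_deps(trace, defs, uses))
# includes ('T6', 'T12', 'b'), the dynamic data dependence noted above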
[Figure 4 is a dependence sub-graph of the model of Figure 1: nodes are transitions T1-T4 and T6-T11, solid edges denote data dependences, and dashed edges denote control dependences.]
Figure 4. Model dependence sub-graph
The goal of model dependence-based test prioritization
is to identify unique patterns of interactions between model
transitions and added/deleted transitions that are present
during execution of the modified model on tests of TS. We
identify three types of interaction patterns related to a
modification (an added/deleted transition): an affecting
interaction pattern, an affected interaction pattern, and a
side-effect interaction pattern. Interaction patterns are
represented as model dependence sub-graphs with respect
to added and deleted transitions. Notice that interaction
patterns were introduced in [8]. In this paper, because of
space limitations, we present interaction patterns
informally. A detailed description may be found in [8].
During execution of modified model Mm on test t,
dynamic data and control dependences are identified in
transition sequence T(t). The corresponding dependences
are marked in the model dependence graph. Unmarked
dependences are removed from the dependence graph. The
resulting dependence sub-graph G contains only
dependences that are present during execution of Mm on t.
Affecting Interaction Pattern: The goal is to identify
transitions that affect an added or deleted transition during
execution of the modified model on test t. These transitions
are identified by traversing backwards in G starting from
the added/deleted transition. Dependences that are not
traversed during the backward traversal are removed from
graph G. The resulting dependence sub-graph is referred to
as an Affecting Interaction Pattern.
Affected Interaction Pattern: The goal is to identify
transitions that are “affected” by the added or deleted
transition. They are identified by traversing forward in G
starting from the added/deleted transition through
dependence edges. Dependences that are not traversed
during the forward traversal are removed from G. The
resulting dependence sub-graph is referred to as an Affected
Interaction Pattern.
Side-Effect Interaction Pattern: The goal is to identify
“side-effects” that are caused by an added or deleted
transition, where by a side-effect we mean an introduction
of a new dependence or a removal of a dependence.
Clearly, an addition or deletion of a transition may
introduce in the modified model new dependences that do
not exist in the original model, or it may cause a removal of
some dependences that do exist in the original model.
During execution of the modified model on a test, new or
removed data and control dependences that are present
during model execution are identified. These dependences
are referred to as a Side-Effect Interaction Pattern.
Consider the following two tests for the modified model
of Figure 2:
t1: Card(5,6), PIN(5), Deposit(1), Continue(), Withdrawal(2),
Continue(), Withdrawal(90), Continue(), Exit()
t2: Card(5,6), PIN(5), Deposit(11), Continue(), Withdrawal(7),
Continue(), Withdrawal(9), Continue(), Exit()
On these tests the following sequences of transitions are
executed: T(t1)=<T1, T4, T6, T7, T12, T7, T10, T7, T8>,
T(t2)=<T1, T4, T6, T7, T10, (T11), T7, T10, T7, T8> where
added transition T12 is executed in T(t1) and deleted
transition T11 is executed in T(t2). Affecting and affected
interaction patterns for added transition T12 for t1 are shown
in Figure 5. Affecting and affected interaction patterns for
deleted transition T11 for t2 are shown in Figure 6.
Suppose that during execution of the modified model Mm
on test suite TS, the following interaction patterns are
computed: IP1,…, IPq. Let TS(IPi) be a set of tests t such
that (1) an added or deleted transition T is executed in Mm
on test t, and (2) interaction pattern IPi is computed with
respect to T in T(t). We refer to all TS(IP1),…,TS(IPq) as an
interaction pattern test distribution. Notice that each
TS(IPi) is a subset of set TSH of high priority tests
determined in the Version II of selective prioritization. In
addition, each test t  TSH belongs to at least one TS(IPi),
and the same test may belong to different TS(IPi) sets.
[Figure 5 shows two dependence sub-graphs for test t1: the affecting interaction pattern for added transition T12 (involving T1, T4, and T6) and the affected interaction pattern for T12 (involving T7 and T10).]
Figure 5. Interaction patterns for test t1
[Figure 6 shows the corresponding sub-graphs for test t2: the affecting interaction pattern for deleted transition T11 (involving T1, T4, and T10) and the affected interaction pattern for T11 (involving T7).]
Figure 6. Interaction patterns for test t2
Input:
  Test suite: TS
  Interaction pattern test distribution: TS(IP1), …, TS(IPq)
  A set of high priority tests: TSH
  A set of low priority tests: TSL
Output: Prioritized test sequence: S

1   p = 0
2   while true do
3      for i = 1 to q do
4         if TS(IPi) ≠ ∅ then
5            p = p + 1
6            Select randomly test t from TS(IPi)
7            Remove t from every TS(IP) to which t belongs
8            Insert t into S at position p
9            if p = |TSH| then exit while loop
10        endif
11     endfor
12  endwhile
13  for p = 1 to |TSL| do
14     Select randomly and remove test t from TSL
15     Insert t into S at position p + |TSH|
16  endfor
17  Output S

Figure 7. Model dependence-based test prioritization algorithm
The algorithm that computes a prioritized test sequence
using interaction patterns is shown in Figure 7. The
algorithm in the first step (lines 1-12) prioritizes tests that
are associated with interaction patterns, by iteratively
selecting (lines 3-11) one test from each interaction pattern
TS(IPi) and inserting them into the prioritized sequence.
After selecting one test from each interaction pattern, the
algorithm repeats this process (lines 2-12) until all tests in
all TS(IPi)s are selected. In the next step (lines 13-16), the
algorithm continues the prioritization with low priority
tests by ordering them randomly. Notice that the algorithm
selects tests randomly from each TS(IPi). In addition, no
assumption is made about the order in which interaction
patterns are processed, i.e., interaction patterns are
randomly ordered for test prioritization.
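For illustration, the algorithm of Figure 7 can be sketched in Python as follows; it assumes, as stated above, that every high priority test belongs to at least one TS(IPi).

import random

def dependence_based_prioritization(ip_tests, ts_high, ts_low, rng=random):
    """Figure 7: draw tests round-robin from the interaction pattern sets."""
    pools = [list(tests) for tests in ip_tests]              # TS(IP1), ..., TS(IPq)
    ordered, placed = [], set()
    while len(placed) < len(ts_high):                        # lines 2-12
        for pool in pools:
            pool[:] = [t for t in pool if t not in placed]   # line 7 (lazy removal)
            if pool and len(placed) < len(ts_high):
                t = rng.choice(pool)                         # line 6
                ordered.append(t)                            # line 8
                placed.add(t)
    low = list(ts_low)                                       # lines 13-16
    rng.shuffle(low)
    return ordered + low

# e.g., using the interaction pattern test distribution of Section 5.3:
print(dependence_based_prioritization(
    [{"t4", "t5"}, {"t1", "t4", "t7"}, {"t5", "t7"}],
    {"t1", "t4", "t5", "t7"}, {"t2", "t3", "t6"}))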
The presented model-based test prioritization is only one
way tests can be prioritized based on interaction patterns.
One may develop other algorithms to prioritize tests based
on interaction patterns, e.g., tests that "cover" a larger
number of IPs are assigned a higher priority. This is a
research issue that we are planning to investigate in the
future.
5. Measuring effectiveness of early fault detection
In order to compare different test prioritization methods,
an experimental study needs to be performed with different
systems that contain known faults. In this paper, the rate of
fault detection [13] is used as a measure of the effectiveness
of early fault detection. This measure can be used to
evaluate the effectiveness of test prioritization methods for
a given system(s) with known fault(s). Notice that the rate
of fault detection is not used during the process of
prioritizing tests by test prioritization methods, but it is
used only during an experimental study to measure the
effectiveness of individual test prioritization methods. The
experimental study is presented in Section 6.
Test prioritization methods may generate many different
solutions (prioritized test sequences) for a given test suite.
For example, for test suite TS of size N, a random
prioritization generates a prioritized test sequence out of N!
possible test sequences (all possible permutations of tests in
TS). A factor that may influence the resulting prioritized
test sequence is, for example, an order in which tests are
processed during the prioritization process. As a result, a
given prioritization method may generate different
prioritized test sequences with different rates of fault
detection.
Let TS={t1,…, tN} be a test suite of size N and let D(TS)
={d1,…, dL} be a set of faults in the system that are
detected by test suite TS. Let TS(d) be a set of failed tests
caused by fault d  D(TS). Notice that for every fault d in
D(TS), TS(d) can be determined by executing test suite TS
for the system. Let S = <ti1, ..., tiN> be a prioritized
sequence of tests of test suite TS, and let P(S) = <rpS(d1), …,
rpS(dL)> be a list of relative positions of the first failed tests
for all faults in D(TS) for test sequence S. The rate of fault
detection for S can be determined based on P(S) as
discussed in Section 3.
In order to compare different test prioritization methods,
we introduce the concept of the most likely rate of fault
detection that captures an average rate of fault detection
over all possible prioritized sequences that may be
generated by a test prioritization method for a given system
and a test suite. Since the rate of fault detection is based on
the concept of a relative position of the first failed test, we
introduce the concept of the most likely relative position,
RP(d), of the first failed test that detects fault d. Notice that
rpS(d) represents a relative position of the first failed test
that detects fault d in test sequence S, whereas RP(d)
represents an average (most likely) relative position of the
first failed test that detects d for a test prioritization method.
In the next sub-sections we concentrate on determining
analytically RP(d) for test prioritization methods discussed
in this paper. In Section 5.4, we discuss how the most likely
rate of fault detection is computed from values of RP(d).
Let M be the number of all possible prioritized test
sequences that may be generated by a given test
prioritization method for test suite TS. For each fault d in
D(TS) and for each prioritized test sequence S, the position
of the first failed test pS(d) caused by fault d in S can be
determined. Let R(i,d) be the number of prioritized test
sequences that may be generated by a given test
prioritization method for which pS(d) = i, i.e., the first failed
test t ∈ TS(d) caused by fault d is in the i-th position. Let
MLP(d) be the most likely (average) position of the first
failed test that detects fault d over all possible prioritized
test sequences that may be generated by a test prioritization
method. The following formula is used to compute MLP(d):

MLP(d) = \frac{\sum_{i=1}^{N} i \cdot R(i,d)}{M}    (5.1)

RP(d), the most likely relative position of the first failed
test that detects d, is computed from MLP(d) as follows:

RP(d) = \frac{MLP(d)}{N}    (5.2)
For many test prioritization methods, M may be very large.
Therefore, determining RP(d) precisely by experimentation
may be very expensive or even prohibitive.
In this paper, we discuss how RP(d) can be determined
analytically, rather than by observation (experimentally),
for the test prioritization methods presented above. The
analytical approach may significantly reduce the cost of
evaluation of test prioritization methods as opposed to
evaluation by observation. The analytical evaluation
methods are probably most appropriate for test
prioritization methods for which a high degree of
randomness is present or M is large. On the other hand,
evaluation methods based on observation may be more
appropriate for test prioritization methods where the degree
of randomness is small or M is small.
5.1. Random prioritization
In random test prioritization, tests are ordered in random
order. For a test suite of size N, there are N! possible test
sequences. The most likely position MLPR(d) for the
random prioritization can be precisely computed by the
following formula:
N m 1
m
MLP
R
(d ) 

i 1
 N  m
 (i  1)! ( N  i) !
 i 1 
i
(5.3)
N!
Notice that the summation is from position 1 to N-m+1,
where m = |TS(d)|. The expression inside of the summation,
except i, represents the number of random test sequences
for which the first failed test caused by d is in position i.
RPR(d), the most likely relative position of the first failed
test that detects d, is computed as shown in Formula 5.2.
For example, suppose test suite TS={t1, t2, t3, t4, t5, t6, t7}
consists of 7 tests that detect two faults D(TS)={d1, d2} in a
system. The following tests fail because of individual faults:
TS(d1)={t5} and TS(d2)={t5, t7}. RPR(d1)=0.57 and
RPR(d2)=0.38 are the most likely relative positions for the
random prioritization for faults d1 and d2.
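Formula 5.3 can be evaluated directly; the short Python sketch below reproduces the two values just quoted (illustrative only).

from math import comb, factorial

def mlp_random(N, m):
    """Most likely position of the first failed test, Formula 5.3.
    N = test suite size, m = |TS(d)| = number of tests failing due to d."""
    return sum(i * m * comb(N - m, i - 1) * factorial(i - 1) * factorial(N - i)
               for i in range(1, N - m + 2)) / factorial(N)

def rp_random(N, m):
    return mlp_random(N, m) / N       # Formula 5.2

print(round(rp_random(7, 1), 2))      # 0.57  (d1, TS(d1) = {t5})
print(round(rp_random(7, 2), 2))      # 0.38  (d2, TS(d2) = {t5, t7})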
5.2. Selective prioritization
In selective test prioritization, tests are divided into two
categories: high priority tests and low priority tests. In test
prioritization all high priority tests are first selected for
execution followed by low priority tests. High priority tests
are ordered using random prioritization. Similarly, low
priority tests are ordered using random prioritization. The
effectiveness of selective test prioritization depends on
whether failed tests are high priority tests or not. More
formally, let TSH be a set of high priority tests and TSL be a
set of low priority tests. Let p, p ≤ m, be the number of failed
tests in TSH caused by fault d, where m = |TS(d)|. Let
MLPR(d,Q,q) be the most likely test position for the random
test prioritization for a test suite of size Q that contains q
failed tests caused by fault d (Formula 5.3).
The most likely position MLPs(d) for the selective
prioritization is computed as follows:
Case I (p ≥ 1): MLPs(d) = MLPR(d, K, p)
Case II (p = 0): MLPs(d) = K + MLPR(d, N-K, m)
where K = |TSH|.
In the first case, it is assumed that TSH contains at least
one failed test caused by fault d. The most likely position
for the selective methods is equivalent to the most likely
position of the random test prioritization for test suite TSH
with p failed tests, i.e., MLPR(d, K, p). In the second case, it
is assumed that TSH does not contain any failed test caused
by defect d, i.e., TSL contains all, m, failed tests. Executing
all high priority tests (K tests) does not uncover fault d.
Only when low priority tests are executed, fault d is
detected. The most likely position in the second case is
equivalent to the most likely position of the random test
prioritization for test suite TSL with m failed tests after all K
high priority tests are executed, i.e., K+ MLPR(d, N-K, m).
RPs(d), the most likely relative position of the first failed
test that detects d, is computed by Formula 5.2.
For example, consider the example of Section 5.1.
Suppose the following high and low priority tests are
determined for TS: TSH ={t1, t4, t5, t7} and TSL ={t2, t3, t6}.
RPs(d1)=0.36 and RPs(d2)=0.24 are the most likely relative
positions for the selective prioritization for faults d1 and d2.
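The two cases can be checked with the following Python sketch (mlp_random restates Formula 5.3 so the sketch is self-contained; illustrative only).

from math import comb, factorial

def mlp_random(Q, q):
    """Formula 5.3 for a test suite of size Q containing q failed tests."""
    return sum(i * q * comb(Q - q, i - 1) * factorial(i - 1) * factorial(Q - i)
               for i in range(1, Q - q + 2)) / factorial(Q)

def rp_selective(N, K, m, p):
    """K = |TSH|, m = |TS(d)|, p = number of failed tests in TSH."""
    if p >= 1:                                    # Case I
        mlp = mlp_random(K, p)
    else:                                         # Case II
        mlp = K + mlp_random(N - K, m)
    return mlp / N                                # Formula 5.2

print(round(rp_selective(7, 4, 1, 1), 2))         # 0.36 for d1 (t5 in TSH)
print(round(rp_selective(7, 4, 2, 2), 2))         # 0.24 for d2 (t5, t7 in TSH)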
5.3. Model dependence-based prioritization
For the model dependence-based prioritization we were
not able to identify a precise formula for RP(d), the most
likely relative position of the first failed test that detects
fault d. Therefore, we have implemented a randomized
approach for estimating RP(d). This estimation accepts as
input a set of tests associated with each interaction
pattern and a set of failed tests. This information is
collected (computed) during execution of the modified
model on the test suite as presented in Section 4.2. The
estimation randomly generates prioritized test sequences
according to the model dependence-based prioritization of
Figure 7. For each test sequence, the position of the first
failed test for each fault is determined. After a large number
of test sequences is generated, RP(d) for each fault is
computed using Formulas 5.1 and 5.2.
Consider the example of Section 5.1. The following high
priority selective tests are identified TSH ={t1, t4, t5, t7}, and
three interaction patterns are computed with the following
distribution of tests among them: IP1={t4, t5}, IP2={t1, t4,
t7}, IP3={t5, t7}. RPs(d1)=0.31 and RPs(d2)=0.22 are the
most likely relative positions for the model dependence-based
prioritization computed by the randomized estimation.
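A sketch of such a randomized estimation in Python (illustrative only): it draws sample sequences from any prioritization procedure, e.g., the Figure 7 sketch given earlier, and averages the relative position of the first failed test, following Formulas 5.1 and 5.2.

import random
from statistics import mean

def estimate_rp(prioritize, failed_tests, N, samples=50000):
    """Monte Carlo estimate of RP(d).
    prioritize() must return one prioritized test sequence; failed_tests = TS(d)."""
    positions = [next(k for k, t in enumerate(prioritize(), start=1)
                      if t in failed_tests)
                 for _ in range(samples)]
    return mean(positions) / N

# e.g., estimating RP(d1) for random prioritization of the Section 5.1 example:
TS = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
print(estimate_rp(lambda: random.sample(TS, len(TS)), {"t5"}, 7))   # ≈ 0.57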
5.4. Most likely rate of fault detection
In Section 3, the rate of fault detection RFD(S) was
discussed for a prioritized test sequence S. Computation of
RFD(S) depends on a list P(S)=<rpS(d1),…, rpS(dL)> of
positions of first failed tests in S for all faults in D(TS). In
this section, we introduce the most likely rate of fault
detection MLRFD for a test prioritization method. The most
likely rate of fault detection is based on the most likely
relative positions RP(d). More formally, let P=<RP(d1),…,
RP(dL)> be a list of the most likely relative positions of first
failed tests determined for a test prioritization method for
all faults in D(TS). Let F=<RP1,…,RPq> be an ordered (in
ascending order) sequence of all unique most likely relative
positions from P, where q ≤ L. The most likely rate of fault
detection MLRFD for the test prioritization method is
defined as a sequence of pairs (RPi,fdi), MLRFD =
<(RP1,fd1), …,(RPq,fdq)>, where RPi is an element of F, and
fdi represents the cumulative percentage of faults detected
at position RPi (as discussed in Section 3).
For example, suppose test suite TS={t1, t2, t3, t4, t5, t6, t7,
t8, t9, t10} consists of 10 tests that detect four faults
D(TS)={d1, d2, d3, d4} in a system. The following tests fail
because of individual faults: TS(d1)={t5, t7}, TS(d2)={t3, t7,
t9}, TS(d3)={t6} and TS(d4)={t3, t9}. RPR(d1)=0.37,
RPR(d2)=0.28, RPR(d3)=0.55 and RPR(d4)=0.37 are the most
likely relative positions for the random prioritization.
Suppose that during model execution on TS, the following
high priority tests are identified for the selective
prioritization: TSH ={t1, t3, t4, t6, t7, t9}. RPs(d1)=0.35,
RPs(d2)=0.18, RPs(d3)=0.35 and RPs(d4)=0.23 are the most
likely relative positions for the selective prioritization. The
most likely rates of fault detection for the random
prioritization and the selective prioritization are shown in
the table below:
Random:     RP: Test suite fraction    0.28   0.37   0.55
            fd: % of detected faults   25%    75%    100%

Selective:  RP: Test suite fraction    0.18   0.23   0.35
            fd: % of detected faults   25%    50%    100%
In order to compare most likely rates of fault detection
for different test prioritization methods, we may use a
weighted average of the percentage of faults detected,
APFD, as discussed in Section 3 (Formula 3.3). For two
most likely rates of fault detection shown in the table,
APFDR=68.8% and APFDs=78.1%. In this example, the
selective prioritization leads to a higher most likely rate of
fault detection than the random prioritization.
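For illustration, the most likely rate of fault detection and its APFD score can be computed with the following Python sketch (apfd restates Formula 3.3 so the sketch is self-contained); it reproduces the values quoted above.

from collections import Counter

def mlrfd(rp_values):
    """Build MLRFD = <(RP1, fd1), ..., (RPq, fdq)> from the per-fault RP(d) values."""
    counts, total = Counter(rp_values), len(rp_values)
    pairs, cum = [], 0
    for rp in sorted(counts):
        cum += counts[rp]
        pairs.append((rp, 100.0 * cum / total))
    return pairs

def apfd(pairs):                                   # Formula 3.3
    pts = [(0.0, 0.0)] + list(pairs) + [(1.0, 100.0)]
    return sum((fd2 - fd1) * (2 - rp2 - rp1) / 2
               for (rp1, fd1), (rp2, fd2) in zip(pts, pts[1:]))

random_rps    = [0.37, 0.28, 0.55, 0.37]           # RP_R(d1), ..., RP_R(d4)
selective_rps = [0.35, 0.18, 0.35, 0.23]           # RP_s(d1), ..., RP_s(d4)
print(apfd(mlrfd(random_rps)))                     # ≈ 68.8 (APFD for random)
print(apfd(mlrfd(selective_rps)))                  # ≈ 78.1 (APFD for selective)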
6. Experimental Study

The goal of the experimental study is to compare the effectiveness of early fault detection of the test prioritization methods presented in this paper: random prioritization, selective prioritization (Versions I and II), and model dependence-based prioritization. We used RP(d), the most likely relative position of the first failed test that detects fault d, as the measure of the effectiveness of early fault detection. In the experimental study, we concentrated on model faults.

For the experiment, we created three system models: an ATM model, a cruise control model, and a fuel pump model. The sizes of the models range from 7 to 13 states and 20 to 28 transitions. For each model, the corresponding system was implemented in the C language. The sizes of these implementations range from 600 to 800 lines of source code. For each implementation, we created test suites using specification-based testing methods, i.e., equivalence class partitioning and boundary-value analysis, and model-based testing, i.e., transition coverage and partial path coverage. Each test suite also contains a small number of randomly generated test cases. The sizes of the test suites range from 790 to 980 test cases. Each implementation was tested and debugged against its test suite until all tests passed.

In order to measure the effectiveness of early fault detection of the different test prioritization methods, we created incorrect models. We seeded faults into the models and then made the appropriate changes to the corresponding systems (implementations). In the experiment, we seeded only one fault into the model at a time. We were interested only in faults that cause a small number of tests to fail; therefore, we selected only those faults for which the number of failed tests ranges from 1 to 10. For each model, we identified nine seeded faults. For each model with a seeded fault and its corresponding implementation, we measured the most likely relative position of the first failed test, RP, for each test prioritization method under study. Notice that in model-based test prioritization, the correct model was considered the original model and the faulty model was considered the modified model.

[Figure 8 contains four boxplot panels (ATM, Cruise Control, Fuel Pump, and All models) of the RP values for each prioritization method. Legend: R: random prioritization; S1: selective prioritization, Version I; S2: selective prioritization, Version II; IP: model dependence-based prioritization.]
Figure 8. RP boxplots for the experimental study
The results of the experiment are shown in Figure 8, which
presents boxplots of the RP values for the four test
prioritization methods for the three models and for all models
combined. The presented results indicate that model-based test
prioritization may improve the effectiveness of test
prioritization for the Version II of selective prioritization
and the model dependence-based test prioritization.
However, the results for the Version I of selective
prioritization are mixed. In several cases, this test
prioritization method performs much worse than the
random prioritization. This is caused by the fact that
monitoring only “modified” transitions in the modified
model may not be sufficient for effective test prioritization.
On the other hand, the Version II of selective prioritization
also monitors the execution of "deleted" transitions, which results
in a significant improvement in the effectiveness of test
prioritization. The model dependence-based test
prioritization, although a little more expensive compared to
the Version II of selective prioritization, may lead to
improvement in the effectiveness of test prioritization. This
may be attributed to the fact that more information about
the model behavior is collected.
7. Conclusions
In this paper, we have presented model-based test
prioritization methods in which the information about the
system model and its behavior is used to prioritize the test
suite for system retesting. In addition, we presented an
analytical framework for comparison of test prioritization
methods with respect to the effectiveness of early fault
detection. In the experimental study, we investigated the
presented test prioritization methods with respect to their
effectiveness of early fault detection. The results from the
experiment are promising and suggest that system models
may improve the effectiveness of test prioritization. Model-based
test prioritization may be a good complement to the
existing code-based test prioritization methods [13].
The experimental study presented in this paper was
relatively small. In future research, we plan to perform
an experimental study on larger models and systems to gain a
better understanding of the advantages and limitations of
model-based test prioritization. In addition, we plan to
perform an experimental study in which we will investigate
effectiveness of model-based test prioritization for faults in
implementations of model changes in the system (these are
code-based faults related to implementation of model
changes). We also plan to investigate a possible synergy of
code-based and model-based test prioritization methods.
8. References
[1] S. Beydeda, V. Gruhn, “An Integrated Testing Technique for
Component-Based Software,” Proc. IEEE Computer Systems and
Applications International Conference, pp. 328 –334, 2001.
[2] K. Cheng, A. Krishnakumar, “Automatic Functional Test
Generation Using The Extended Finite State Machine Model,”
Proc. ACM/IEEE Design Automation Conf., pp. 86-91, 1993.
[3] Y. Chen, D. Rosenblum, K. Vo, “Testtube: A System for
Selective Regression Testing,” Proc. IEEE International
Conference on Software Engineering, pp. 211-220, 1994.
[4] J. Dick, A. Faivre, “Automating the Generation and
Sequencing of Test Case from Model-Based Specification,” Proc.
International Symposium on Formal Methods, pp. 268-284, 1992.
[5] R. Dssouli, K. Saleh, E. Aboulhamid, A. En-Nouaary, C.
Bourhfir, “Test Development For Communication Protocols:
Towards Automation,” Computer Networks, 31, pp.1835-1872,
1999.
[6] R. Gupta, M. Harrold, M. Soffa, “An Approach to Regression
Testing Using Slices,” Proc. IEEE International Conference on
Software Maintenance, pp. 299-308, 1992.
[7] B. Korel, A. Al-Yami, “Automated Regression Test
Generation,” Proc. ACM International Symposium on Software
Testing and Analysis, pp. 143-152, 1998.
[8] B. Korel, L. Tahat, B. Vaysburg, “Model Based Regression
Test Reduction Using Dependence Analysis,” Proc. IEEE
International Conference on Software Maintenance, pp. 214-223,
2002.
[9] J. Loyall, S. Mathisen, P. Hurley, J. Williamson, “Automated
Maintenance of Avionics Software”, Proc. IEEE Aerospace and
Electronics Conference, pp.508-514, 1993.
[10] G. Rothermel, M. Harrold, “Selecting Tests and Identifying
Test Coverage Requirements for Modified Software,” Proc. IEEE
International Conference on Software Maintenance, pp. 358-367,
1994.
[11] G. Rothermel, M. Harrold, “A Safe, Efficient Regression
Test Selection Technique,” ACM Transactions on Software
Engineering & Methodology, 6(2), pp. 173-210, 1997.
[12] G. Rothermel, R. Untch, C. Chu, M. Harrold, “Test Case
Prioritization: An Empirical Study,” Proc. IEEE International
Conference on Software Maintenance, pp. 179-188, 1999.
[13] G. Rothermel, R. Untch, M. Harrold, “Prioritizing Test Cases
For Regression Testing,” IEEE Transactions on Software
Engineering, vol. 27, No. 10, pp. 929-948, 2001.
[14] B. Sherlund, B. Korel, “Modification Oriented Software
Testing,” Proc. Quality Week, pp. 1-17, 1991.
[15] W. Tsai, X. Bai, R. Paul, L. Yu, “Scenario-Based Functional
Regression Testing,” Proc. IEEE International Computer Software
and Applications Conference, pp. 496-501, 2001.
[16] B. Vaysburg, L. Tahat, B. Korel, “Dependence Analysis in
Reduction of Requirement Based Test Suites,” Proc. ACM
International Symposium on Software Testing and Analysis, pp.
107-111, 2002.
[17] L. White, “Test Manager: A Regression Testing Tool,” Proc.
IEEE International Conference on Software Maintenance, pp.
338-347, 1993.
[18] W. Wong, J. Horgan, S. London, H. Agrawal, “A Study of
Effective Regression Testing in Practice,” Proc. International
Symposium on Software Reliability Eng., pp. 230-238, 1997.