Test Prioritization Using System Models
Bogdan Korel
Computer Science Department
Illinois Institute of Technology
Chicago, IL 60616, USA
korel@iit.edu

Luay H. Tahat
Lucent Technologies, Bell Labs Innovations
Naperville, IL 60566, USA
ltahat@lucent.com

Mark Harman
King's College London
Strand, London WC2R 2LS, UK
Mark@dcs.kcl.ac.uk
Abstract
During regression testing, a modified system is retested
using the existing test suite. Because the size of the test
suite may be very large, testers are interested in detecting
faults in the system as early as possible during the retesting
process. Test prioritization tries to order test cases for
execution so the chances of early detection of faults during
retesting are increased. The existing prioritization methods
are based on the code of the system. In this paper, we
present test prioritization that is based on the system
model. System modeling is a widely used technique to
model state-based systems. In this paper, we present
methods of test prioritization based on state-based models
after changes to the model and the system. The model is
executed for the test suite and information about model
execution is used to prioritize tests. Execution of a model is
inexpensive compared to execution of the system; therefore,
the overhead associated with test prioritization is
relatively small. In addition, we present an analytical
framework for evaluation of test prioritization methods
whereas the existing evaluation framework is based on
experimentation (observation). We have performed an
experimental study in which we compared different test
prioritization methods. The results of the experimental
study suggest that system models may improve the
effectiveness of test prioritization with respect to early fault
detection.
1. Introduction
During maintenance of evolving software systems, their
specification and implementation are changed to fix faults,
to add new functionality, to change the existing
functionality, etc. Regression testing is the process of
validating that the changes introduced in a system are
correct and do not adversely affect the unchanged portion
of the system. During regression testing, new test cases are
frequently generated, and previously developed test
cases are also deployed to revalidate the modified system.
Regression testing tends to consume a large amount of time
and computing resources, especially for large software
systems.
There has been a significant amount of research on
regression testing. There exist two types of regression
testing: code-based and specification-based. Most
regression testing methods are code-based, e.g., [3, 6, 7, 10,
11, 14, 17]. Specification-based regression testing methods
[1, 8, 9, 15] use system models to select and generate test
cases related to the modification. The goal of these methods
is to test the modification of the system.
During regression testing, after testing the modified part
of the system, the modified system needs to be retested
using the existing test suite, to have confidence that the
system does not have faults. Because of the large size of a
test suite, system retesting may be very expensive; it may
last hours, days, or even weeks. Test prioritization orders
the tests in the test suite for "execution" in such a way that
faults in the system are uncovered early during the retesting
process. Test prioritization methods [12, 13, 18] order tests
in the test suite according to some criterion, e.g., so that code
coverage is achieved at the fastest rate. Test cases are then
executed in this prioritized order: test cases with higher
priority, based on the prioritization criterion, are executed
first, whereas test cases with lower priority are executed
later. The existing test prioritization techniques use mainly
the source code to prioritize tests. These methods may
require re-execution of the system for the whole or partial
test suite to collect information about the system behavior
that is used in test prioritization.
System modeling is an emerging technology. System
models are created to capture different aspects of the
system behavior. Several modeling languages have been
developed to model state-based software systems, e.g.,
State Charts, Extended Finite State Machine (EFSM) [2],
and the Specification and Description Language (SDL) [5]. System
modeling is very popular for modeling state-based systems,
e.g., computer communications systems, industrial control
systems, etc. System models are used in the development
process, e.g., in partial code generation, or in the testing
process to design test cases. In recent years, several model-based test generation [2, 4, 16] and test suite reduction [8]
techniques have been developed.
In this paper, we present model-based test prioritization
in which the original and modified system models together
with information collected during execution of the modified
model on the test suite are used to prioritize tests for
retesting of the modified software system. The goal of
model-based test prioritization is for early fault detection in
the modified system, where the faults of interest are faults
in models and faults in implementations of model changes
in the system. Execution of the model is very fast compared
to the execution of the actual system. Therefore, execution
of the model for the whole test suite is relatively
inexpensive, whereas execution of the system for the whole
test suite may be very expensive (both resource-wise and
time-wise). In this paper, we present two model-based test
prioritization methods: selective test prioritization and
model dependence-based test prioritization. In addition, we
present a framework for comparison of test prioritization
methods with respect to the effectiveness of early fault
detection. The existing approach to evaluating test
prioritization methods is based on observation
(experimentation). In this paper, we present an analytical
approach to evaluating test prioritization methods. An
experimental study was performed to compare effectiveness
of the presented test prioritization methods. The results of
the experimental study suggest that model-based test
prioritization may be a good complement to the existing
code-based test prioritization techniques.
The paper is organized as follows: Section 2 provides an
overview of system modeling, Section 3 presents the
problem of test prioritization, and Section 4 presents model-based
test prioritization methods. In Section 5, an analytical
framework for comparison of test prioritization methods is
discussed. In Section 6, the experimental study is presented.
In the Conclusions, future research is discussed.
2. System Modeling
In this paper, we concentrate on EFSM system models.
However, our approach can be extended to other modeling
languages, e.g., SDL [5]. EFSM [2] is very popular for
modeling state-based systems. An EFSM consists of states
and transitions between states. The following elements are
associated with each transition: an event, a condition, and a
sequence of actions. A transition is triggered when an event
occurs and a condition (a Boolean predicate) associated
with the transition evaluates to true. When a transition is
triggered, an action(s) associated with the transition is
executed. EFSM models are diagrammatically represented
as graphs where states are represented as nodes and
transitions as directed edges between states. A simplified
EFSM model of an ATM system is shown in Figure 1. In
this model, for example, transition T4 is triggered
when the system is in state S1, event PIN(p) is received, and
the value of parameter p equals the value of variable pin. When the
transition is triggered, the "Display menu" action is executed.
[Figure 1 is an EFSM diagram: states Start, S1, S2, S3, and Exit are nodes, and transitions T1-T4 and T6-T11 are directed edges labeled with their events (Card, PIN, Deposit, Withdrawal, Balance, Continue, Exit), guard conditions, and actions.]
Figure 1. A sample model
In this paper, we assume that system models are
executable, i.e., enough detail is provided so the model may
be executed. In order to support model execution, some
actions may not be implemented (they represent “empty”
actions). However, all actions are implemented during the
development of the system. A model input (a test) consists
of a sequence of events with input values. The following is
a sample test for the model of Figure 1:
t: Card(1234,100), PIN(1234), Deposit(20), Continue(),
Withdrawal(50), Continue(), Exit()
When the model is executed on t, the following sequence of
transitions is executed: T1, T4, T6, T7, T11, T7, T8.
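To make the notion of an executable model concrete, the following is a minimal Python sketch of an EFSM interpreter; it is not the authors' tool. It encodes only the transitions exercised by the sample test, and the source and target states assigned to each transition are assumptions inferred from the execution trace above.

class EFSM:
    # transitions: list of (name, source, event, guard, action, target)
    def __init__(self, transitions, initial):
        self.transitions = transitions
        self.state = initial
        self.vars = {}

    def fire(self, event, args):
        for name, src, ev, guard, action, dst in self.transitions:
            if self.state == src and ev == event and guard(self.vars, args):
                action(self.vars, args)
                self.state = dst
                return name          # name of the triggered transition
        return None                  # no transition is enabled for this event

atm = EFSM([
    ("T1",  "Start", "Card",       lambda v, a: True,
            lambda v, a: v.update(pin=a[0], b=a[1], attempts=0), "S1"),
    ("T4",  "S1",    "PIN",        lambda v, a: a[0] == v["pin"],
            lambda v, a: None,                                    "S2"),
    ("T6",  "S2",    "Deposit",    lambda v, a: True,
            lambda v, a: v.update(b=v["b"] + a[0]),               "S3"),
    ("T7",  "S3",    "Continue",   lambda v, a: True,
            lambda v, a: None,                                    "S2"),
    ("T11", "S2",    "Withdrawal", lambda v, a: a[0] <= v["b"],
            lambda v, a: v.update(b=v["b"] - a[0]),               "S3"),
    ("T8",  "S2",    "Exit",       lambda v, a: True,
            lambda v, a: None,                                    "Exit"),
], "Start")

t = [("Card", (1234, 100)), ("PIN", (1234,)), ("Deposit", (20,)),
     ("Continue", ()), ("Withdrawal", (50,)), ("Continue", ()), ("Exit", ())]
print([atm.fire(event, args) for event, args in t])
# ['T1', 'T4', 'T6', 'T7', 'T11', 'T7', 'T8']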
3. Test Prioritization
In this paper, we consider test prioritization with respect
to early fault detection [13]. The goal is to increase the
likelihood of revealing faults earlier during execution of the
prioritized test suite. Let TS = {t1, …, tN} be a test suite of
size N, where ti is a test case. Let D(TS) = {d1, …, dL} be the
set of faults in the system that are detected by test suite TS.
Let TS(d) ⊆ TS be the set of tests that fail because of fault d.
Let S = <ti1, ti2, …, tik, …, tiN> be a prioritized sequence of
tests of test suite TS, where the subscript indicates the
position of a test in the sequence, e.g., test ti1 is in position
1, test ti2 is in position 2, etc. Let tik ∈ TS(d) be the first
failed test in sequence S caused by fault d, i.e., all tests
ti1, …, tik-1 in S between positions 1 and k-1 do not fail
because of d. Let pS(d) = k be the position of tik, i.e., the
first position of a failed test in S caused by fault d. Let
rpS(d) be the first relative position of the failed test in S
caused by fault d, where rpS(d) is computed as follows:

rp_S(d) = \frac{p_S(d)}{N}    (3.1)

Notice that rpS(d) represents the test suite fraction at which d
is detected; its values range between 0 < rpS(d) ≤ 1.
The rate of fault detection [13] is a measure of how
rapidly a prioritized test sequence detects faults. This
measure is a function of the percentage of faults detected in
terms of the test suite fraction, i.e., a relative position in the
test suite. More formally, let P(S)=<rpS(d1),…,rpS(dL)> be a
list of relative positions of first failed tests for all faults in
D(TS). Notice that at the same position in S more than one
fault may be detected, therefore, some positions in P(S)
may have the same value. Let F(S)=<rp1,…,rpq>, q ≤ L, be
an ordered (in ascending order) sequence of all unique first
relative positions from P(S), where rpi represents the test
suite fraction at which at least one fault is detected in S.
F(S) represents an order in which faults are uncovered by
test sequence S. The rate of fault detection RFD(S) can be
defined as a sequence of pairs (rpi, fdi),
RFD(S) = <(rp1, fd1), …, (rpq, fdq)>, where rpi is an element of
F(S), and fdi is the cumulative percentage of faults detected
at position rpi in F(S), computed as follows:

fd_i = \frac{\sum_{j=1}^{i} nd_S(rp_j)}{|D(TS)|} \times 100\%    (3.2)

where ndS(rpj) is the number of faults detected at the
relative position rpj in S.
For example, suppose test suite TS = {t1, t2, t3, t4, t5, t6, t7,
t8, t9, t10} consists of 10 tests that detect four faults
D(TS) = {d1, d2, d3, d4} in a system. The following tests fail
because of the individual faults: TS(d1) = {t5, t7}, TS(d2) = {t3, t7},
TS(d3) = {t5}, and TS(d4) = {t3, t9}. Let S1 = <t1, t2, t3, t4, t5, t6,
t7, t8, t9, t10> and S2 = <t10, t9, t8, t3, t5, t7, t4, t6, t2, t1> be two
prioritized test sequences. The rates of fault detection for S1
and S2 can be represented by the following table:

S1:  rp: Test suite fraction    0.3    0.5
     fd: % of detected faults   50%    100%

S2:  rp: Test suite fraction    0.2    0.4    0.5
     fd: % of detected faults   25%    50%    100%
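The table above can be reproduced with the following short Python sketch, which restates the definitions of P(S), F(S), and the cumulative detection percentage fd; it is illustrative only, not the authors' implementation.

def rfd(sequence, fault_tests):
    """Rate of fault detection RFD(S) as a list of (rp, fd) pairs."""
    N, L = len(sequence), len(fault_tests)
    # first relative position of a failed test for each fault -- the list P(S)
    rps = [min(sequence.index(t) + 1 for t in tests) / N
           for tests in fault_tests.values()]
    pairs = []
    for rp in sorted(set(rps)):                      # the sequence F(S)
        detected = sum(1 for x in rps if x <= rp)    # faults found by fraction rp
        pairs.append((rp, 100.0 * detected / L))
    return pairs

faults = {"d1": {"t5", "t7"}, "d2": {"t3", "t7"}, "d3": {"t5"}, "d4": {"t3", "t9"}}
S1 = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]
S2 = ["t10", "t9", "t8", "t3", "t5", "t7", "t4", "t6", "t2", "t1"]
print(rfd(S1, faults))   # [(0.3, 50.0), (0.5, 100.0)]
print(rfd(S2, faults))   # [(0.2, 25.0), (0.4, 50.0), (0.5, 100.0)]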
The goal of test prioritization is to order tests of TS for
execution so that the likelihood of improving the rate of
fault detection of faults in D(TS) is increased [13]. In order
to measure how rapidly a prioritized test sequence detects
faults during the execution of sequence S, a weighted
average of the percentage of faults detected, APFD(S), was
introduced [13]. For a given rate of fault detection RFD(S)
= <(rp1, fd1), …, (rpq, fdq)>, APFD(S) is computed as:

APFD(S) = \frac{\sum_{i=0}^{q} (fd_{i+1} - fd_i)(2 - rp_{i+1} - rp_i)}{2}    (3.3)
where (rp0,fd0)=(0,0) and (rpq+1,fdq+1)=(1,100).
The values of APFD(S) range from 0 to 100, where
higher APFD(S) value means faster (better) fault detection
rate. For two sequences S1 and S2 presented earlier,
APFD(S1)=72.5% and APFD(S2)=67.5%. Sequence S1
leads to a higher rate of fault detection than S2.
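Formula 3.3 translates directly into Python; the sketch below reproduces the APFD values quoted for S1 and S2 (illustrative only).

def apfd(rfd_pairs):
    """Weighted average of the percentage of faults detected (Formula 3.3)."""
    pts = [(0.0, 0.0)] + list(rfd_pairs) + [(1.0, 100.0)]
    return sum((fd2 - fd1) * (2 - rp2 - rp1) / 2
               for (rp1, fd1), (rp2, fd2) in zip(pts, pts[1:]))

print(round(apfd([(0.3, 50.0), (0.5, 100.0)]), 1))               # 72.5 -> APFD(S1)
print(round(apfd([(0.2, 25.0), (0.4, 50.0), (0.5, 100.0)]), 1))  # 67.5 -> APFD(S2)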
The simplest test prioritization method is random test
prioritization where test cases are ordered randomly. For a
test suite of size N, there are N! possible test sequences.
Random prioritization selects randomly one of these
sequences. Random test prioritization may be viewed as a
“no test prioritization” approach and may be treated as a
base-line for comparison with other test prioritization
methods.
4. Model-based test prioritization
Changes in specifications frequently lead to changes in
EFSM system models. The idea of model-based test
prioritization is to use the original model and the modified
model to identify a difference between these models. The
modified model is executed for the whole test suite to
collect information related to the difference. The collected
information is then used to prioritize the test suite. The
goal of model-based test prioritization is for early fault
detection in the modified system, where the faults of
interest are faults in models and faults in implementations
of model changes in the system. Notice that system models
frequently do not produce any observable outputs (or only
partial outputs), therefore, detecting model faults by
executing models on a test suite may not be appropriate.
Model checking methods are typically used to detect model
faults, but these methods detect only a limited class of
model faults.
Model-based test prioritization presented in this paper
can be used for any modification of the EFSM system
model. The approach uses the original model Mo and the
modified model Mm and automatically identifies the
difference [8] between these models, where the difference is
represented as a set of elementary model modifications. There are two
types of elementary modifications: a transition addition and
a transition deletion. As a result, a difference between
models Mo and Mm is represented by a set Ra of added
transitions and a set Rd of deleted transitions. When
elementary modifications of sets Ra and Rd are applied to
the original model Mo, the resulting model is the modified
model Mm. Any complex modification to the model can be
represented by these two sets. Notice that an addition of a
new state or a deletion of an existing state is not considered
as an elementary modification because an addition or a
deletion of a state is always associated with an addition or a
deletion of a transition, respectively.
For example, a difference between the original model of
Figure 1 and the modified model of Figure 2 is: deletion of
transition T11 and addition of transition T12, i.e., Ra = {T12}
and Rd = {T11}. Transition T11 no longer exists in the
modified model, and it is shown in Figure 2 as a dashed line
only for presentation purposes.
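For illustration, the difference can be expressed as two set differences; the sketch below is a Python illustration only (it identifies transitions by their labels, a simplification; the actual difference is computed on the model structure [8]).

# Transitions of the original model (Figure 1) and the modified model (Figure 2),
# identified here by their labels only.
original_transitions = {"T1", "T2", "T3", "T4", "T6", "T7", "T8", "T9", "T10", "T11"}
modified_transitions = {"T1", "T2", "T3", "T4", "T6", "T7", "T8", "T9", "T10", "T12"}

Ra = modified_transitions - original_transitions   # added transitions: {'T12'}
Rd = original_transitions - modified_transitions   # deleted transitions: {'T11'}
print(Ra, Rd)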
[Figure 2 is the modified EFSM diagram: transition T11 (Withdrawal(w)[w<=b]/b=b-w) is deleted and drawn as a dashed edge, transition T12 (Withdrawal(w)[w<b]/b=b-w) is added, and the remaining states and transitions are as in Figure 1.]
Figure 2. A modified model of Figure 1
After the difference between the original and the
modified model is identified, the modified model is
executed for test suite TS to collect different types of
information that is used to prioritize tests for retesting of the
modified system. Depending on information being used for
test prioritization, different prioritization methods may be
developed. In this paper, we present two model-based test
prioritization methods: selective test prioritization and
model dependence-based test prioritization.
4.1. Selective test prioritization
The idea of selective test prioritization is to assign a high
priority to tests that execute modified transitions in the
modified model. Low priority is assigned to tests that do
not execute any modified transition. Let TSH be a set of high
priority tests and TSL be a set of low priority tests. Sets TSH
and TSL are disjoint and TS = TSH ∪ TSL. Notice that
information about executed added/deleted transitions may
be also used in regression test selection, but in this paper
we concentrate only on using this information for test suite
prioritization. We present two versions of the selective test
prioritization in order to investigate their effectiveness in
early detection of faults.
Version I: In this version, modified transitions of Mm are
represented only by added transitions of Ra. Since deleted
transitions of Rd do not exist in the modified model, they
are ignored. Every transition T ∈ Ra is selected for
monitoring during execution of Mm on test suite TS. Let t be
a test and T(t) = <Ti1, ..., Tin> be the sequence of transitions
traversed during execution of the model on t. If, during
execution of the modified model Mm on test t, a transition T ∈ Ra
is executed, a high priority is assigned to t, i.e., t ∈ TSH.
Otherwise, a low priority is assigned to t, i.e., t ∈ TSL. For
example, consider the following three tests and the
corresponding sequences of transitions traversed during
execution of the modified model of Figure 2:
t1: Card(12,10), PIN(12), Withdrawal(5), Continue(), Exit()
t2: Card(12, 10), PIN(12), Withdraw(15), Continue(), Exit()
t3: Card(12, 10), PIN(12), Withdraw(10), Continue(), Exit()
T(t1) = <T1,T4,T12,T7,T8>, T(t2) = <T1,T4,T10,T7,T8>, T(t3) =
<T1,T4,T10,T7,T8>.
A set of added transitions for the modified model of
Figure 2 is Ra={T12}. Based on the execution of these tests,
the following high and low priority tests are identified:
TSH={t1} and TSL={t2, t3}. Since T12 is executed on test t1, a
high priority is assigned to this test.
Version II: In this version, modified transitions of Mm are
represented by both added and deleted transitions, i.e.,
transitions in Ra and Rd. These transitions are selected for
monitoring. If, during execution of the modified model Mm on
test t, a transition T that is in Ra or Rd is executed, a high
priority is assigned to test t, i.e., t ∈ TSH. Otherwise, a low
priority is assigned to test t, i.e., t ∈ TSL. For example,
when the modified model of Figure 2 is executed on tests t1,
t2, and t3 presented in Version I, sequences of transitions for
t1 and t2 are the same, but for t3 transition sequence
T(t3)=<T1, T4, T10, (T11), T7, T8> is different, i.e., deleted
transition T11 is executed, indicated in parentheses, together
with transition T10. As a result, a high priority is assigned to
t3, i.e., TSH={t1, t3}, TSL={t2}.
During system retesting that is based on Version I or
Version II, tests with high priority are executed first
followed by execution of low priority tests. High priority
tests and low priority tests are ordered using random
ordering. The algorithm for selective test prioritization is
shown in Figure 3. In the first step, high priority tests are
ordered randomly (lines 1-4), then low priority test are
ordered randomly (lines 5-8) in prioritized test sequence S.
Input:
  A set of high priority tests: TSH
  A set of low priority tests: TSL
Output: Prioritized test sequence: S

1  for p = 1 to |TSH| do
2     Select randomly and remove test t from TSH
3     Insert t into S at position p
4  endfor
5  for p = 1 to |TSL| do
6     Select randomly and remove test t from TSL
7     Insert t into S at position p + |TSH|
8  endfor
9  Output S

Figure 3. Selective test prioritization algorithm
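The algorithm of Figure 3 translates directly into Python; the sketch below is illustrative and assumes that the sets TSH and TSL have already been computed by executing the modified model.

import random

def selective_prioritization(ts_high, ts_low, rng=random):
    """Figure 3: high priority tests first, each group in random order."""
    high, low = list(ts_high), list(ts_low)
    rng.shuffle(high)        # lines 1-4: order TSH randomly
    rng.shuffle(low)         # lines 5-8: order TSL randomly, after TSH
    return high + low

# e.g., with TSH = {t1, t3} and TSL = {t2} from the Version II example:
print(selective_prioritization(["t1", "t3"], ["t2"]))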
Notice that in Version II additional instrumentation of
the model is required to capture the execution of deleted
transitions because deleted transitions do not exist in the
modified model. When the model, during its execution, is in
a state from which the deleted transition was outgoing, it is
possible to capture traversal of the deleted transition when
the event associated with the deleted transition is generated
and the enabling condition of the deleted transition
evaluates to true.
4.2. Model dependence-based test prioritization
In this section, we present an approach in which high
priority tests TSH are prioritized using model dependence
analysis. We concentrate on high priority tests TSH
identified by the Version II of selective prioritization. The
idea of model dependence-based test prioritization is to use
model dependence analysis [8] to identify different ways in
which added and deleted transitions interact with the
remaining parts of the model and use this information to
prioritize high priority tests. In the model dependence
analysis there are two types of dependences that may exist
in the model: data dependence and control dependence.
These model dependences are between transitions and
represent potential “interactions” between them.
A data dependence captures the notion that one
transition defines a value to a variable and another
transition may potentially use this value. There exists data
dependence between transitions Ti and Tk [8] if transition Ti
modifies the value of a variable v, transition Tk uses v, and there
exists a path (transition sequence) in the model from Ti to
Tk along which v is not modified. For example, there exists
a data dependence between transitions T1 and T11 in the
model of Figure 1 because transition T1 assigns a value to
variable b, transition T11 uses b, and there exists a path (T1,
T4, T11) from T1 to T11 along which b is not modified.
A control dependence captures the notion that one
transition may affect traversal of another transition, and it is
defined formally in [8]. For example, transition T4 has
a control dependence on transition T11 in the model of Figure
1 because execution of T11 depends on execution of T4.
Notice that if T4 is not executed, i.e., transition T8 is
executed instead, then T11 is also not executed.
Data and control dependences in the model can be
represented graphically by a graph where nodes represent
transitions and directed edges represent data and control
dependences. Figure 4 shows a dependence sub-graph of
the model of Figure 1. Data dependences are shown as solid
edges and control dependences are shown as dashed edges.
In order to prioritize tests, we are interested in data and
control dependences that are present during model
execution on each test t in test suite TS. We refer to these
dependences as dynamic dependences. Let t be a test and
T(t) = <Ti1, ..., Tin> be the sequence of transitions traversed
during execution of the model on t. There exists a dynamic
data dependence [8] between transitions Tim and Tik in T(t),
m < k, if transition Tim modifies the value of a variable v,
transition Tik uses v, and v is not modified between
positions m and k in T(t). There exists a dynamic control
dependence in T(t) between transitions Tim and Tik, m < k, if
there exists a control dependence between Tim and Tik, and
for all j, m < j < k, there is no control dependence between
Tij and Tik. For example, consider the following
test t for the model of Figure 2:
t: Card(5,6), PIN(5), Deposit(1), Continue(), Withdrawal(2),
Continue(), Withdrawal(90), Continue(), Exit()
On test t, the following sequence of transitions T(t) =
<T1, T4, T6, T7, T12, T7, T10, T7, T8> is executed. In T(t)
there exists a dynamic data dependence between T6 and T12
with respect to variable b, and also a dynamic control
dependence between T4 and T6. Note that for each dynamic
dependence in T(t) there exists a corresponding dependence
(edge) in the model dependence graph.
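The sketch below illustrates, in Python, how dynamic data dependences can be read off a transition trace T(t). The def/use sets are assumptions taken from the transition actions quoted in the text (only variables pin and b are tracked); this is not the authors' instrumentation.

# def/use sets per transition (assumed from the transition labels of Figure 2)
defs = {"T1": {"pin", "b"}, "T6": {"b"}, "T12": {"b"}}
uses = {"T4": {"pin"}, "T6": {"b"}, "T7": {"b"}, "T10": {"b"}, "T12": {"b"}}

def dynamic_data_deps(trace, defs, uses):
    deps, last_def = [], {}          # last_def: variable -> position of latest definition
    for k, t in enumerate(trace):
        for v in uses.get(t, ()):    # uses are resolved against the previous definition,
            if v in last_def:        # so T12 (b = b - w) uses b before redefining it
                deps.append((trace[last_def[v]], t, v))
        for v in defs.get(t, ()):
            last_def[v] = k
    return deps

trace = ["T1", "T4", "T6", "T7", "T12", "T7", "T10", "T7", "T8"]
print(dynamic_data_deps(trace, defs, uses))
# includes ('T6', 'T12', 'b'), the dynamic data dependence noted above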
[Figure 4 is a dependence sub-graph of the model of Figure 1: nodes are transitions T1-T4 and T6-T11, solid edges denote data dependences, and dashed edges denote control dependences.]
Figure 4. Model dependence sub-graph
The goal of model dependence-based test prioritization
is to identify unique patterns of interactions between model
transitions and added/deleted transitions that are present
during execution of the modified model on tests of TS. We
identify three types of interaction patterns related to a
modification (an added/deleted transition): an affecting
interaction pattern, an affected interaction pattern, and a
side-effect interaction pattern. Interaction patterns are
represented as model dependence sub-graphs with respect
to added and deleted transitions. Notice that interaction
patterns were introduced in [8]. In this paper, because of
space limitations, we present interaction patterns
informally. A detailed description may be found in [8].
During execution of modified model Mm on test t,
dynamic data and control dependences are identified in
transition sequence T(t). The corresponding dependences
are marked in the model dependence graph. Unmarked
dependences are removed from the dependence graph. The
resulting dependence sub-graph G contains only
dependences that are present during execution of Mm on t.
Affecting Interaction Pattern: The goal is to identify
transitions that affect an added or deleted transition during
execution of the modified model on test t. These transitions
are identified by traversing backwards in G starting from
the added/deleted transition. Dependences that are not
traversed during the backward traversal are removed from
graph G. The resulting dependence sub-graph is referred to
as an Affecting Interaction Pattern.
Affected Interaction Pattern: The goal is to identify
transitions that are “affected” by the added or deleted
transition. They are identified by traversing forward in G
starting from the added/deleted transition through
dependence edges. Dependences that are not traversed
during the forward traversal are removed from G. The
resulting dependence sub-graph is referred to as an Affected
Interaction Pattern.
Side-Effect Interaction Pattern: The goal is to identify
“side-effects” that are caused by an added or deleted
transition, where by a side-effect we mean an introduction
of a new dependence or a removal of a dependence.
Clearly, an addition or deletion of a transition may
introduce in the modified model new dependences that do
not exist in the original model, or it may cause a removal of
some dependences that do exist in the original model.
During execution of the modified model on a test, new or
removed data and control dependences that are present
during model execution are identified. These dependences
are referred to as a Side-Effect Interaction Pattern.
Consider the following two tests for the modified model
of Figure 2:
t1: Card(5,6), PIN(5), Deposit(1), Continue(), Withdrawal(2),
Continue(), Withdrawal(90), Continue(), Exit()
t2: Card(5,6), PIN(5), Deposit(11), Continue(), Withdrawal(7),
Continue(), Withdrawal(9), Continue(), Exit()
On these tests the following sequences of transitions are
executed: T(t1)=<T1, T4, T6, T7, T12, T7, T10, T7, T8>,
T(t2)=<T1, T4, T6, T7, T10, (T11), T7, T10, T7, T8> where
added transition T12 is executed in T(t1) and deleted
transition T11 is executed in T(t2). Affecting and affected
interaction patterns for added transition T12 for t1 are shown
in Figure 5. Affecting and affected interaction patterns for
deleted transition T11 for t2 are shown in Figure 6.
Suppose that during execution of the modified model Mm
on test suite TS, the following interaction patterns are
computed: IP1,…, IPq. Let TS(IPi) be a set of tests t such
that (1) an added or deleted transition T is executed in Mm
on test t, and (2) interaction pattern IPi is computed with
respect to T in T(t). We refer to all TS(IP1),…,TS(IPq) as an
interaction pattern test distribution. Notice that each
TS(IPi) is a subset of set TSH of high priority tests
determined in the Version II of selective prioritization. In
addition, each test t  TSH belongs to at least one TS(IPi),
and the same test may belong to different TS(IPi) sets.
[Figure 5 shows two dependence sub-graphs for test t1: the affecting interaction pattern for added transition T12 (involving T1, T4, and T6) and the affected interaction pattern for T12 (involving T7 and T10).]
Figure 5. Interaction patterns for test t1
[Figure 6 shows the corresponding sub-graphs for test t2: the affecting interaction pattern for deleted transition T11 (involving T1, T4, and T10) and the affected interaction pattern for T11 (involving T7).]
Figure 6. Interaction patterns for test t2
Input:
  Test suite: TS
  Interaction pattern test distribution: TS(IP1), …, TS(IPq)
  A set of high priority tests: TSH
  A set of low priority tests: TSL
Output: Prioritized test sequence: S

1   p = 0
2   while true do
3      for i = 1 to q do
4         if TS(IPi) ≠ ∅ then
5            p = p + 1
6            Select randomly test t from TS(IPi)
7            Remove t from every TS(IP) to which t belongs
8            Insert t into S at position p
9            if p = |TSH| then exit while loop
10        endif
11     endfor
12  endwhile
13  for p = 1 to |TSL| do
14     Select randomly and remove test t from TSL
15     Insert t into S at position p + |TSH|
16  endfor
17  Output S

Figure 7. Model dependence-based test prioritization algorithm
The algorithm that computes a prioritized test sequence
using interaction patterns is shown in Figure 7. The
algorithm in the first step (lines 1-12) prioritizes tests that
are associated with interaction patterns, by iteratively
selecting (lines 3-11) one test from each interaction pattern
TS(IPi) and inserting them into the prioritized sequence.
After selecting one test from each interaction pattern, the
algorithm repeats this process (lines 2-12) until all tests in
all TS(IPi)s are selected. In the next step (lines 13-16), the
algorithm continues the prioritization with low priority
tests by ordering them randomly. Notice that the algorithm
selects tests randomly from each TS(IPi). In addition, no
assumption is made about the order in which interaction
patterns are processed, i.e., interaction patterns are
randomly ordered for test prioritization.
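For illustration, the algorithm of Figure 7 can be sketched in Python as follows; it assumes, as stated above, that every high priority test belongs to at least one TS(IPi).

import random

def dependence_based_prioritization(ip_tests, ts_high, ts_low, rng=random):
    """Figure 7: draw tests round-robin from the interaction pattern sets."""
    pools = [list(tests) for tests in ip_tests]              # TS(IP1), ..., TS(IPq)
    ordered, placed = [], set()
    while len(placed) < len(ts_high):                        # lines 2-12
        for pool in pools:
            pool[:] = [t for t in pool if t not in placed]   # line 7 (lazy removal)
            if pool and len(placed) < len(ts_high):
                t = rng.choice(pool)                         # line 6
                ordered.append(t)                            # line 8
                placed.add(t)
    low = list(ts_low)                                       # lines 13-16
    rng.shuffle(low)
    return ordered + low

# e.g., using the interaction pattern test distribution of Section 5.3:
print(dependence_based_prioritization(
    [{"t4", "t5"}, {"t1", "t4", "t7"}, {"t5", "t7"}],
    {"t1", "t4", "t5", "t7"}, {"t2", "t3", "t6"}))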
The presented model-based test prioritization is only one
way tests can be prioritized based on interaction patterns.
One may develop other algorithms to prioritize tests based
on interaction patterns, e.g., tests that "cover" a larger
number of IPs are assigned a higher priority. This is a
research issue that we are planning to investigate in the
future.
5. Measuring effectiveness of early fault detection
In order to compare different test prioritization methods,
an experimental study needs to be performed with different
systems that contain known faults. In this paper, the rate of
fault detection [13] is used as a measure of the effectiveness
of early fault detection. This measure can be used to
evaluate the effectiveness of test prioritization methods for
a given system(s) with known fault(s). Notice that the rate
of fault detection is not used during the process of
prioritizing tests by test prioritization methods, but it is
used only during an experimental study to measure the
effectiveness of individual test prioritization methods. The
experimental study is presented in Section 6.
Test prioritization methods may generate many different
solutions (prioritized test sequences) for a given test suite.
For example, for test suite TS of size N, a random
prioritization generates a prioritized test sequence out of N!
possible test sequences (all possible permutations of tests in
TS). A factor that may influence the resulting prioritized
test sequence is, for example, an order in which tests are
processed during the prioritization process. As a result, a
given prioritization method may generate different
prioritized test sequences with different rates of fault
detection.
Let TS={t1,…, tN} be a test suite of size N and let D(TS)
={d1,…, dL} be a set of faults in the system that are
detected by test suite TS. Let TS(d) be a set of failed tests
caused by fault d  D(TS). Notice that for every fault d in
D(TS), TS(d) can be determined by executing test suite TS
for the system. Let S = <ti1, ..., tiN> be a prioritized
sequence of tests of test suite TS, and let P(S) = <rpS(d1), …,
rpS(dL)> be a list of relative positions of the first failed tests
for all faults in D(TS) for test sequence S. The rate of fault
detection for S can be determined based on P(S) as
discussed in Section 3.
In order to compare different test prioritization methods,
we introduce the concept of the most likely rate of fault
detection that captures an average rate of fault detection
over all possible prioritized sequences that may be
generated by a test prioritization method for a given system
and a test suite. Since the rate of fault detection is based on
the concept of a relative position of the first failed test, we
introduce the concept of the most likely relative position,
RP(d), of the first failed test that detects fault d. Notice that
rpS(d) represents a relative position of the first failed test
that detects fault d in test sequence S, whereas RP(d)
represents an average (most likely) relative position of the
first failed test that detects d for a test prioritization method.
In the next sub-sections we concentrate on determining
analytically RP(d) for test prioritization methods discussed
in this paper. In Section 5.4, we discuss how the most likely
rate of fault detection is computed from values of RP(d).
Let M be the number of all possible prioritized test
sequences that may be generated by a given test
prioritization method for test suite TS. For each fault d in
D(TS) and for each prioritized test sequence S, the position
of the first failed test pS(d) caused by fault d in S can be
determined. Let R(i,d) be the number of prioritized test
sequences that may be generated by a given test
prioritization method for which pS(d) = i, i.e., the first failed
test t ∈ TS(d) caused by fault d is in the i-th position. Let
MLP(d) be the most likely (average) position of the first
failed test that detects fault d over all possible prioritized
test sequences that may be generated by a test prioritization
method. The following formula is used to compute MLP(d):

MLP(d) = \frac{\sum_{i=1}^{N} i \cdot R(i,d)}{M}    (5.1)

RP(d), the most likely relative position of the first failed
test that detects d, is computed from MLP(d) as follows:

RP(d) = \frac{MLP(d)}{N}    (5.2)
For many test prioritization methods, M may be very large.
Therefore, determining RP(d) precisely by experimentation
may be very expensive or even prohibitive.
In this paper, we discuss how RP(d) can be determined
analytically, rather than by observation (experimentally),
for the test prioritization methods presented above. The
analytical approach may significantly reduce the cost of
evaluation of test prioritization methods as opposed to
evaluation by observation. The analytical evaluation
methods are probably most appropriate for test
prioritization methods for which a high degree of
randomness is present or M is large. On the other hand,
evaluation methods based on observation may be more
appropriate for test prioritization methods where the degree
of randomness is small or M is small.
5.1. Random prioritization
In random test prioritization, tests are ordered in random
order. For a test suite of size N, there are N! possible test
sequences. The most likely position MLPR(d) for the
random prioritization can be precisely computed by the
following formula:
N m 1
m
MLP
R
(d ) 

i 1
 N  m
 (i  1)! ( N  i) !
 i 1 
i
(5.3)
N!
Notice that the summation is from position 1 to N-m+1,
where m = |TS(d)|. The expression inside of the summation,
except i, represents the number of random test sequences
for which the first failed test caused by d is in position i.
RPR(d), the most likely relative position of the first failed
test that detects d, is computed as shown in Formula 5.2.
For example, suppose test suite TS={t1, t2, t3, t4, t5, t6, t7}
consists of 7 tests that detect two faults D(TS)={d1, d2} in a
system. The following tests fail because of individual faults:
TS(d1)={t5} and TS(d2)={t5, t7}. RPR(d1)=0.57 and
RPR(d2)=0.38 are the most likely relative positions for the
random prioritization for faults d1 and d2.
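Formula 5.3 can be evaluated directly; the short Python sketch below reproduces the two values just quoted (illustrative only).

from math import comb, factorial

def mlp_random(N, m):
    """Most likely position of the first failed test, Formula 5.3.
    N = test suite size, m = |TS(d)| = number of tests failing due to d."""
    return sum(i * m * comb(N - m, i - 1) * factorial(i - 1) * factorial(N - i)
               for i in range(1, N - m + 2)) / factorial(N)

def rp_random(N, m):
    return mlp_random(N, m) / N       # Formula 5.2

print(round(rp_random(7, 1), 2))      # 0.57  (d1, TS(d1) = {t5})
print(round(rp_random(7, 2), 2))      # 0.38  (d2, TS(d2) = {t5, t7})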
5.2. Selective prioritization
In selective test prioritization, tests are divided into two
categories: high priority tests and low priority tests. In test
prioritization all high priority tests are first selected for
execution followed by low priority tests. High priority tests
are ordered using random prioritization. Similarly, low
priority tests are ordered using random prioritization. The
effectiveness of selective test prioritization depends on
whether failed tests are high priority tests or not. More
formally, let TSH be a set of high priority tests and TSL be a
set of low priority tests. Let p, p ≤ m, be the number of failed
tests in TSH caused by fault d, where m = |TS(d)|. Let
MLPR(d,Q,q) be the most likely test position for the random
test prioritization for a test suite of size Q that contains q
failed tests caused by fault d (Formula 5.3).
The most likely position MLPs(d) for the selective
prioritization is computed as follows:
Case I (p ≥ 1): MLPs(d) = MLPR(d, K, p)
Case II (p = 0): MLPs(d) = K + MLPR(d, N-K, m)
where K = |TSH|.
In the first case, it is assumed that TSH contains at least
one failed test caused by fault d. The most likely position
for the selective methods is equivalent to the most likely
position of the random test prioritization for test suite TSH
with p failed tests, i.e., MLPR(d, K, p). In the second case, it
is assumed that TSH does not contain any failed test caused
by defect d, i.e., TSL contains all, m, failed tests. Executing
all high priority tests (K tests) does not uncover fault d.
Only when low priority tests are executed, fault d is
detected. The most likely position in the second case is
equivalent to the most likely position of the random test
prioritization for test suite TSL with m failed tests after all K
high priority tests are executed, i.e., K+ MLPR(d, N-K, m).
RPs(d), the most likely relative position of the first failed
test that detects d, is computed by Formula 5.2.
For example, consider the example of Section 5.1.
Suppose the following high and low priority tests are
determined for TS: TSH ={t1, t4, t5, t7} and TSL ={t2, t3, t6}.
RPs(d1)=0.36 and RPs(d2)=0.24 are the most likely relative
positions for the selective prioritization for faults d1 and d2.
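The two cases can be checked with the following Python sketch (mlp_random restates Formula 5.3 so the sketch is self-contained; illustrative only).

from math import comb, factorial

def mlp_random(Q, q):
    """Formula 5.3 for a test suite of size Q containing q failed tests."""
    return sum(i * q * comb(Q - q, i - 1) * factorial(i - 1) * factorial(Q - i)
               for i in range(1, Q - q + 2)) / factorial(Q)

def rp_selective(N, K, m, p):
    """K = |TSH|, m = |TS(d)|, p = number of failed tests in TSH."""
    if p >= 1:                                    # Case I
        mlp = mlp_random(K, p)
    else:                                         # Case II
        mlp = K + mlp_random(N - K, m)
    return mlp / N                                # Formula 5.2

print(round(rp_selective(7, 4, 1, 1), 2))         # 0.36 for d1 (t5 in TSH)
print(round(rp_selective(7, 4, 2, 2), 2))         # 0.24 for d2 (t5, t7 in TSH)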
5.3. Model dependence-based prioritization
For the model dependence-based prioritization we were
not able to identify a precise formula for RP(d), the most
likely relative position of the first failed test that detects
fault d. Therefore, we have implemented a randomized
approach for estimating RP(d). This estimation accepts as
input a set of tests associated with each interaction
pattern and a set of failed tests. This information is
collected (computed) during execution of the modified
model on the test suite as presented in Section 4.2. The
estimation randomly generates prioritized test sequences
according to the model dependence-based prioritization of
Figure 7. For each test sequence, the position of the first
failed test for each fault is determined. After a large number
of test sequences is generated, RP(d) for each fault is
computed using Formulas 5.1 and 5.2.
Consider the example of Section 5.1. The following high
priority selective tests are identified TSH ={t1, t4, t5, t7}, and
three interaction patterns are computed with the following
distribution of tests among them: IP1={t4, t5}, IP2={t1, t4,
t7}, IP3={t5, t7}. RPs(d1)=0.31 and RPs(d2)=0.22 are the
most likely relative positions for the model dependence-based
prioritization computed by the randomized estimation.
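A sketch of such a randomized estimation in Python (illustrative only): it draws sample sequences from any prioritization procedure, e.g., the Figure 7 sketch given earlier, and averages the relative position of the first failed test, following Formulas 5.1 and 5.2.

import random
from statistics import mean

def estimate_rp(prioritize, failed_tests, N, samples=50000):
    """Monte Carlo estimate of RP(d).
    prioritize() must return one prioritized test sequence; failed_tests = TS(d)."""
    positions = [next(k for k, t in enumerate(prioritize(), start=1)
                      if t in failed_tests)
                 for _ in range(samples)]
    return mean(positions) / N

# e.g., estimating RP(d1) for random prioritization of the Section 5.1 example:
TS = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
print(estimate_rp(lambda: random.sample(TS, len(TS)), {"t5"}, 7))   # ≈ 0.57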
5.4. Most likely rate of fault detection
In Section 3, the rate of fault detection RFD(S) was
discussed for a prioritized test sequence S. Computation of
RFD(S) depends on a list P(S)=<rpS(d1),…, rpS(dL)> of
positions of first failed tests in S for all faults in D(TS). In
this section, we introduce the most likely rate of fault
detection MLRFD for a test prioritization method. The most
likely rate of fault detection is based on the most likely
relative positions RP(d). More formally, let P=<RP(d1),…,
RP(dL)> be a list of the most likely relative positions of first
failed tests determined for a test prioritization method for
all faults in D(TS). Let F=<RP1,…,RPq> be an ordered (in
ascending order) sequence of all unique most likely relative
positions from P, where q ≤ L. The most likely rate of fault
detection MLRFD for the test prioritization method is
defined as a sequence of pairs (RPi,fdi), MLRFD =
<(RP1,fd1), …,(RPq,fdq)>, where RPi is an element of F, and
fdi represents the cumulative percentage of faults detected
at position RPi (as discussed in Section 3).
For example, suppose test suite TS={t1, t2, t3, t4, t5, t6, t7,
t8, t9, t10} consists of 10 tests that detect four faults
D(TS)={d1, d2, d3, d4} in a system. The following tests fail
because of individual faults: TS(d1)={t5, t7}, TS(d2)={t3, t7,
t9}, TS(d3)={t6} and TS(d4)={t3, t9}. RPR(d1)=0.37,
RPR(d2)=0.28, RPR(d3)=0.55 and RPR(d4)=0.37 are the most
likely relative positions for the random prioritization.
Suppose that during model execution on TS, the following
high priority tests are identified for the selective
prioritization: TSH ={t1, t3, t4, t6, t7, t9}. RPs(d1)=0.35,
RPs(d2)=0.18, RPs(d3)=0.35 and RPs(d4)=0.23 are the most
likely relative positions for the selective prioritization. The
most likely rates of fault detection for the random
prioritization and the selective prioritization are shown in
the table below:
Random:     RP: Test suite fraction    0.28   0.37   0.55
            fd: % of detected faults   25%    75%    100%

Selective:  RP: Test suite fraction    0.18   0.23   0.35
            fd: % of detected faults   25%    50%    100%
In order to compare most likely rates of fault detection
for different test prioritization methods, we may use a
weighted average of the percentage of faults detected,
APFD, as discussed in Section 3 (Formula 3.3). For two
most likely rates of fault detection shown in the table,
APFDR=68.8% and APFDs=78.1%. In this example, the
selective prioritization leads to a higher most likely rate of
fault detection than the random prioritization.
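For illustration, the most likely rate of fault detection and its APFD score can be computed with the following Python sketch (apfd restates Formula 3.3 so the sketch is self-contained); it reproduces the values quoted above.

from collections import Counter

def mlrfd(rp_values):
    """Build MLRFD = <(RP1, fd1), ..., (RPq, fdq)> from the per-fault RP(d) values."""
    counts, total = Counter(rp_values), len(rp_values)
    pairs, cum = [], 0
    for rp in sorted(counts):
        cum += counts[rp]
        pairs.append((rp, 100.0 * cum / total))
    return pairs

def apfd(pairs):                                   # Formula 3.3
    pts = [(0.0, 0.0)] + list(pairs) + [(1.0, 100.0)]
    return sum((fd2 - fd1) * (2 - rp2 - rp1) / 2
               for (rp1, fd1), (rp2, fd2) in zip(pts, pts[1:]))

random_rps    = [0.37, 0.28, 0.55, 0.37]           # RP_R(d1), ..., RP_R(d4)
selective_rps = [0.35, 0.18, 0.35, 0.23]           # RP_s(d1), ..., RP_s(d4)
print(apfd(mlrfd(random_rps)))                     # ≈ 68.8 (APFD for random)
print(apfd(mlrfd(selective_rps)))                  # ≈ 78.1 (APFD for selective)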
6. Experimental Study

The goal of the experimental study is to compare the effectiveness of early fault detection of the test prioritization methods presented in this paper: random prioritization, selective prioritization (Versions I and II), and model dependence-based prioritization. We used RP(d), the most likely relative position of the first failed test that detects fault d, as the measure of the effectiveness of early fault detection. In the experimental study, we concentrated on model faults.

For the experiment, we created three system models: an ATM model, a cruise control model, and a fuel pump model. The sizes of the models range from 7 to 13 states and 20 to 28 transitions. For each model, the corresponding system was implemented in the C language. The sizes of these implementations range from 600 to 800 lines of source code. For each implementation, we created test suites using specification-based testing methods, i.e., equivalence class partitioning and boundary-value analysis, and model-based testing, i.e., transition coverage and partial path coverage. Each test suite also contains a small number of randomly generated test cases. The sizes of the test suites range from 790 to 980 test cases. Each implementation was tested and debugged against its test suite until all tests passed.

In order to measure the effectiveness of early fault detection of the different test prioritization methods, we created incorrect models. We seeded faults into the models and then made the appropriate changes to the corresponding systems (implementations). In the experiment, we seeded only one fault into the model at a time. We were interested only in faults that cause a small number of tests to fail; therefore, we selected only those faults for which the number of failed tests ranges from 1 to 10. For each model, we identified nine seeded faults. For each model with a seeded fault and its corresponding implementation, we measured the most likely relative position of the first failed test, RP, for each test prioritization method under study. Notice that in model-based test prioritization, the correct model was considered the original model and the faulty model was considered the modified model.

[Figure 8 contains four boxplot panels (ATM, Cruise Control, Fuel Pump, and All models) of the RP values for each prioritization method. Legend: R: random prioritization; S1: selective prioritization, Version I; S2: selective prioritization, Version II; IP: model dependence-based prioritization.]
Figure 8. RP boxplots for the experimental study
The results of the experiment are shown in Figure 8, which
presents boxplots of the RP values for the four test
prioritization methods for the three models and for all models
combined. The presented results indicate that model-based test
prioritization may improve the effectiveness of test
prioritization for the Version II of selective prioritization
and the model dependence-based test prioritization.
However, the results for the Version I of selective
prioritization are mixed. In several cases, this test
prioritization method performs much worse than the
random prioritization. This is caused by the fact that
monitoring only “modified” transitions in the modified
model may not be sufficient for effective test prioritization.
On the other hand, the Version II of selective prioritization
also monitors the execution of "deleted" transitions, which results
in a significant improvement in the effectiveness of test
prioritization. The model dependence-based test
prioritization, although a little more expensive compared to
the Version II of selective prioritization, may lead to
improvement in the effectiveness of test prioritization. This
may be attributed to the fact that more information about
the model behavior is collected.
7. Conclusions
In this paper, we have presented model-based test
prioritization methods in which the information about the
system model and its behavior is used to prioritize the test
suite for system retesting. In addition, we presented an
analytical framework for comparison of test prioritization
methods with respect to the effectiveness of early fault
detection. In the experimental study, we investigated the
presented test prioritization methods with respect to their
effectiveness of early fault detection. The results from the
experiment are promising and suggest that system models
may improve the effectiveness of test prioritization. Model-based
test prioritization may be a good complement to the
existing code-based test prioritization methods [13].
The experimental study presented in this paper was
relatively small. In future research, we plan to perform
an experimental study on larger models and systems to gain a
better understanding of the advantages and limitations of
model-based test prioritization. In addition, we plan to
perform an experimental study in which we will investigate
effectiveness of model-based test prioritization for faults in
implementations of model changes in the system (these are
code-based faults related to implementation of model
changes). We also plan to investigate a possible synergy of
code-based and model-based test prioritization methods.
8. References
[1] S. Beydeda, V. Gruhn, “An Integrated Testing Technique for
Component-Based Software,” Proc. IEEE Computer Systems and
Applications International Conference, pp. 328 –334, 2001.
[2] K. Cheng, A. Krishnakumar, “Automatic Functional Test
Generation Using The Extended Finite State Machine Model,”
Proc. ACM/IEEE Design Automation Conf., pp. 86-91, 1993.
[3] Y. Chen, D. Rosenblum, K. Vo, “Testtube: A System for
Selective Regression Testing,” Proc. IEEE International
Conference on Software Engineering, pp. 211-220, 1994.
[4] J. Dick, A. Faivre, “Automating the Generation and
Sequencing of Test Case from Model-Based Specification,” Proc.
International Symposium on Formal Methods, pp. 268-284, 1992.
[5] R. Dssouli, K. Saleh, E. Aboulhamid, A. En-Nouaary, C.
Bourhfir, “Test Development For Communication Protocols:
Towards Automation,” Computer Networks, 31, pp.1835-1872,
1999.
[6] R. Gupta, M. Harrold, M. Soffa, “An Approach to Regression
Testing Using Slices,” Proc. IEEE International Conference on
Software Maintenance, pp. 299-308, 1992.
[7] B. Korel, A. Al-Yami, “Automated Regression Test
Generation,” Proc. ACM International Symposium on Software
Testing and Analysis, pp. 143-152, 1998.
[8] B. Korel, L. Tahat, B. Vaysburg, “Model Based Regression
Test Reduction Using Dependence Analysis,” Proc. IEEE
International Conference on Software Maintenance, pp. 214-223,
2002.
[9] J. Loyall, S. Mathisen, P. Hurley, J. Williamson, “Automated
Maintenance of Avionics Software”, Proc. IEEE Aerospace and
Electronics Conference, pp.508-514, 1993.
[10] G. Rothermel, M. Harrold, “Selecting Tests and Identifying
Test Coverage Requirements for Modified Software,” Proc. IEEE
International Conference on Software Maintenance, pp. 358-367,
1994.
[11] G. Rothermel, M. Harrold, “A Safe, Efficient Regression
Test Selection Technique,” ACM Transactions on Software
Engineering & Methodology, 6(2), pp. 173-210, 1997.
[12] G. Rothermel, R. Untch, C. Chu, M. Harrold, “Test Case
Prioritization: An Empirical Study,” Proc. IEEE International
Conference on Software Maintenance, pp. 179-188, 1999.
[13] G. Rothermel, R. Untch, M. Harrold, “Prioritizing Test Cases
For Regression Testing,” IEEE Transactions on Software
Engineering, vol. 27, No. 10, pp. 929-948, 2001.
[14] B. Sherlund, B. Korel, “Modification Oriented Software
Testing,” Proc. Quality Week, pp. 1-17, 1991.
[15] W. Tsai, X. Bai, R. Paul, L. Yu, “Scenario-Based Functional
Regression Testing,” Proc. IEEE International Computer Software
and Applications Conference, pp. 496-501, 2001.
[16] B. Vaysburg, L. Tahat, B. Korel, “Dependence Analysis in
Reduction of Requirement Based Test Suites,” Proc. ACM
International Symposium on Software Testing and Analysis, pp.
107-111, 2002.
[17] L. White, “Test Manager: A Regression Testing Tool,” Proc.
IEEE International Conference on Software Maintenance, pp.
338-347, 1993.
[18] W. Wong, J. Horgan, S. London, H. Agrawal, “A Study of
Effective Regression Testing in Practice,” Proc. International
Symposium on Software Reliability Eng., pp. 230-238, 1997.