presentation - Department of Computer Science and Engineering

Character String Predicate Based Automatic
Software Test Data Generation
Ruilian Zhao
Computer Science Dept.
Beijing University of Chemical Technology
Rlzhao@mail.buct.edu.cn
Michael R. Lyu
Computer Science Dept.
Chinese University of Hong Kong
Lyu@cse.cuhk.edu.hk
Outline
Introduction
An overview of related work
Test data generation based on character string predicate
Experimental results
Conclusion
Introduction
Software testing is usually difficult, expensive and time-consuming.
If test data could be automatically generated,
the cost of software testing would be significantly reduced.
Many automatic test data generation approaches exist.
However, little attention has been paid to
test data generation for programs
whose predicates contain character string variables.
Character strings are an important element in programming,
so generating character string test data
is a problem that needs further research.
Here, we present an approach that
automatically generates test data for program paths
that include character string predicates, and
we develop a corresponding test data generator.
The effectiveness of the test data generator
is examined on a number of programs.
The experimental results show that
the generator is effective.
An overview of related work
1. Predicate-based testing
Predicate testing is a common approach to
software testing that requires each predicate
in the program under test to be checked.
Many predicate testing strategies exist.
However, they require the predicates in the tested program
to be numerical predicates.
2. Test data generation
There are many automatic test data generation approaches.
For example,
Random test data generation
Symbolic execution-based test data generation
Dynamic test data generation
Some systems have been developed that use these techniques
to generate test data of integer, real or float type.
However, they do not generate character string test data.
Test data generation based on
character string predicate
The goal of test data generation is
to find a program input on which
a chosen program path will be traversed.
This problem can be reduced to a sequence of subgoals
where each subgoal is solved by
performing function minimization using gradient descent.
We focus on how to automatically generate test data
for program paths that include character string predicates.
A character string predicate is a predicate
that contains at least one character string variable
and one character string comparison function.
As with a numerical predicate,
we can construct a branch function for
a character string predicate that does not take the required
branch, such that its value is positive for the initial input x0.
For example, for strcmp(str1,str2) > 0:
let F(x) = str1 - str2 if str1 - str2 is positive for the initial input x0;
otherwise let F(x) = str2 - str1.
The current values of str1 and str2 in this predicate can be
obtained through program instrumentation.
The program input is adjusted gradually
until F(x) becomes negative.
At that point the required input has been found;
that is, the predicate takes the required branch.
Two problems must be resolved:
how to compare two character strings, and
how to evaluate the branch function F(x).
So, we first define a function $N(str)$:

$$N(str) = \sum_{i=0}^{L-1} str[i] \times w^{L-i-1}$$

where $str$ is a character string, $L$ is its length,
$w^{L-i-1}$ is a positive weighting factor
imposed on each character element
of the string, and $w$ is equal to 128.
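The weighted sum can be sketched in C as follows. This is only an illustration under stated assumptions: the names `string_value`, `W` and `MAX_LEN` are hypothetical, characters are taken to be 7-bit ASCII, and strings are limited to 9 characters so that the result fits in a 64-bit unsigned integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 128u      /* weighting base w from the definition above */
#define MAX_LEN 9   /* assumption: 9 seven-bit characters fit in 63 bits */

/* Weighted value of str: sum of str[i] * W^(L-i-1) over its L characters. */
static uint64_t string_value(const char *str)
{
    size_t len = strlen(str);
    uint64_t value = 0;

    assert(len <= MAX_LEN);  /* longer strings would overflow uint64_t */
    for (size_t i = 0; i < len; i++) {
        /* Horner's rule: each multiplication by W raises the weight of
           every earlier character by one power. */
        value = value * W + (unsigned char)str[i];
    }
    return value;
}
```

For ASCII input, `string_value("ab")` evaluates to 97 * 128 + 98 = 12514, while `string_value("ba")` gives a different integer. Longer strings would need an arbitrary-precision integer type.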
Theorem: Suppose $S$ is a set of character strings and
$\mathbb{N}$ is the set of nonnegative integers.
Let $N(str)$ be defined as above.
Then $N(str)$ is a one-to-one function from $S$ to $\mathbb{N}$.
By the theorem, a character string can be
transformed into a unique nonnegative integer.
The distance between two strings is defined as:

$$dis(str_1, str_2) = \left| \sum_{i=0}^{L_1-1} str_1[i] \times w^{L-i-1} - \sum_{i=0}^{L_2-1} str_2[i] \times w^{L-i-1} \right|$$

where $L_1$ and $L_2$ are the lengths of the strings $str_1$ and $str_2$, and
$L = \max(L_1, L_2)$.
The distance $dis(str_1, str_2)$ is a nonnegative integer
and can be used to evaluate the branch function F(x)
with regard to a character string predicate.
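A minimal C sketch of this distance is given below. The helper names `padded_value` and `string_distance` are hypothetical, and the sketch assumes 7-bit ASCII strings of at most 9 characters so that every intermediate value fits in 64 bits. Using the common exponent base L = max(L1, L2) amounts to padding the shorter string on the right with zero-valued characters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 128u      /* weighting base w */
#define MAX_LEN 9   /* assumption: keeps every value below 2^63 */

/* Value of the first `len` characters of str, weighted as if the string
   were `width` characters long (missing positions count as 0). */
static uint64_t padded_value(const char *str, size_t len, size_t width)
{
    uint64_t value = 0;
    for (size_t i = 0; i < width; i++) {
        unsigned digit = (i < len) ? (unsigned char)str[i] : 0u;
        value = value * W + digit;
    }
    return value;
}

/* dis(str1, str2): absolute difference of the two weighted values. */
static uint64_t string_distance(const char *str1, const char *str2)
{
    size_t len1 = strlen(str1), len2 = strlen(str2);
    size_t width = len1 > len2 ? len1 : len2;   /* L = max(L1, L2) */

    assert(width <= MAX_LEN);
    uint64_t v1 = padded_value(str1, len1, width);
    uint64_t v2 = padded_value(str2, len2, width);
    return v1 > v2 ? v1 - v2 : v2 - v1;
}
```

Under these assumptions the distance is 0 exactly when the two strings are equal, and the sign of the underlying difference agrees with the lexicographic order used by `strcmp`.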
By the proof of the theorem above, it is easy to see that,
when the 0th characters of the two strings differ,

$$|str_1[0] - str_2[0]| \times w^{L-1} > \sum_{i=1}^{L-1} \max(str_1[i], str_2[i]) \times w^{L-i-1}$$

so the 0th character dominates the distance.
We therefore search for an appropriate adjustment direction for
the 0th character of an input variable,
and adjust that character by gradient descent until $F_0 < 0$.
As a result, we can find an input that makes
the string predicate take the required branch.
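The search on the 0th character can be sketched in C for the required branch strcmp(input, other) > 0. This is a simplified illustration, not the generator described in these slides: it probes the two neighbouring character codes and moves one code at a time in whichever direction lowers the branch function. The names `padded_value`, `branch_value` and `adjust_first_char` are hypothetical, and the sketch assumes 7-bit ASCII strings of at most 9 characters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 128       /* weighting base w */
#define MAX_LEN 9   /* assumption: keeps every value below 2^63 */

/* Value of str, right-padded with zero characters to `width`. */
static int64_t padded_value(const char *str, size_t width)
{
    size_t len = strlen(str);
    int64_t value = 0;
    for (size_t i = 0; i < width; i++)
        value = value * W + (i < len ? (unsigned char)str[i] : 0);
    return value;
}

/* Branch function for the required branch strcmp(input, other) > 0:
   negative exactly when input compares greater than other. */
static int64_t branch_value(const char *input, const char *other)
{
    size_t l1 = strlen(input), l2 = strlen(other);
    size_t width = l1 > l2 ? l1 : l2;
    return padded_value(other, width) - padded_value(input, width);
}

/* Adjust input[0] until the branch function is negative.
   Returns 1 on success, 0 if no change of input[0] alone can succeed. */
static int adjust_first_char(char *input, const char *other)
{
    if (input[0] == '\0' || strlen(input) > MAX_LEN || strlen(other) > MAX_LEN)
        return 0;

    int64_t current = branch_value(input, other);
    while (current >= 0) {
        int code = (unsigned char)input[0];
        int best_code = -1;
        int64_t best_value = current;

        /* Exploratory moves: try the character codes just below and above. */
        for (int step = -1; step <= 1; step += 2) {
            int candidate = code + step;
            if (candidate < 1 || candidate > 127)
                continue;               /* stay within 7-bit, non-NUL codes */
            input[0] = (char)candidate;
            int64_t value = branch_value(input, other);
            if (value < best_value) {
                best_value = value;
                best_code = candidate;
            }
        }
        if (best_code < 0) {
            input[0] = (char)code;      /* no improving move: restore, give up */
            return 0;
        }
        input[0] = (char)best_code;     /* accept the improving move */
        current = best_value;
    }
    return 1;
}
```

Starting from "apple" with the fixed operand "melon", the sketch raises the 0th character until the input becomes "mpple", which compares greater than "melon". A practical generator would grow the step size after each successful move instead of always stepping by one code.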
For an equality (=) or non-equality (≠) predicate,
we need to construct branch functions
for every unequal character such that $F_i > 0$,
where $i \in [0, L]$ and $L = \max(L_1, L_2)$.
Then, we search for an adjustment direction that
improves the branch function until $F_i \le 0$.
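For the equality case, the per-character adjustment can be sketched in C as below. The sketch makes several assumptions that are not in the slides: both strings are held in zero-padded buffers of a hypothetical fixed capacity `CAP`, the branch function for position i is taken to be the absolute difference of the two characters, and each step probes upwards first and reverses if that does not help. The names `char_branch_value` and `make_equal` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CAP 16   /* hypothetical buffer capacity, including the terminator */

/* Assumed branch function for position i: distance between the characters. */
static int char_branch_value(const unsigned char *input,
                             const unsigned char *target, size_t i)
{
    return abs((int)input[i] - (int)target[i]);
}

/* Drive every per-character branch function to 0 so that input equals
   target. Both buffers must be zero-padded up to CAP bytes.
   Returns the number of branch function evaluations performed. */
static unsigned make_equal(unsigned char input[CAP],
                           const unsigned char target[CAP])
{
    unsigned evaluations = 0;

    assert(input[CAP - 1] == 0 && target[CAP - 1] == 0);
    for (size_t i = 0; i + 1 < CAP; i++) {
        int f = char_branch_value(input, target, i);
        evaluations++;
        while (f > 0) {
            input[i]++;                       /* exploratory move upwards */
            int g = char_branch_value(input, target, i);
            evaluations++;
            if (g >= f) {                     /* no improvement: go down */
                input[i] -= 2;
                g = char_branch_value(input, target, i);
                evaluations++;
            }
            f = g;                            /* each pass lowers F_i by 1 */
        }
    }
    return evaluations;
}
```

With `input` holding "cat" and `target` holding "dog!", the loop rewrites the input to "dog!", extending it by one character because the padding position is adjusted like any other. For a non-equality predicate, changing a single character so that it differs would already be enough.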
Experimental results
The specification:
Max prints the lexicographic maximum of its command-line arguments.
There is one option, -ceiling, which provides a ceiling:
if the maximum would be larger than the specified ceiling, the ceiling is printed as the maximum.

/* ceiling, result and BUFSIZE are not shown on the slide; they are assumed to be declared elsewhere. */
int max(int argc, char **argv)
{
    argc--;
    argv++;
    /* Optional "-ceiling <value>" option. */
    if ((argc > 0) && ('-' == **argv))
    {
        if (!strcmp(argv[0], "-ceiling"))
        {
            strncpy(ceiling, argv[1], BUFSIZE);
            argv++; argv++;
            argc--; argc--;
        }
        else
        {
            fprintf(stderr, "Illegal option %s.\n", argv[0]);
            return (2);
        }
    }
    if (argc == 0)
    {
        fprintf(stderr, "Max requires at least one argument.\n");
        return (2);
    }
    /* Keep the lexicographically largest argument in result. */
    for (; argc > 0; argc--, argv++)
    {
        if (strcmp(argv[0], result) > 0)
            strncpy(result, argv[0], BUFSIZE);
    }
    if (strcmp(ceiling, result) <= 0)
        printf("\n max:%s", ceiling);
    else
        printf("\n max:%s", result);
    return (0);
}
Instrumented version of Max: a record() call precedes each predicate; unchanged statements are elided as "…;".

int max(int argc, char **argv)
{
    argc--;
    argv++;
    record(argc, 0, '>', "&&");
    record('-', **argv, '=');
    if ((argc > 0) && ('-' == **argv))
    {
        record(argv[0], "-ceiling", '!');
        if (!strcmp(argv[0], "-ceiling"))
            …;
    }
    record(argc, 0, '=', "");
    if (argc == 0)
        …;
    record(argc, 0, '>', "");
    for (; argc > 0; argc--, argv++)
    {
        record(argv[0], result, '>', "");
        if (strcmp(argv[0], result) > 0)
            …;
        record(argc, 0, '>', "");
    }
    record(ceiling, result, '-', "");
    if (strcmp(ceiling, result) <= 0)
        …;
}
Considering executions of the for loop zero, one
and two times, the Max program has 31 paths.
We generate 50 program inputs at random
and use them as the initial inputs to the test data generator.
As a result,
the test data generator produces 16 test inputs.
We measure the coverage of generated test data
using the ATAC coverage testing tool.
[Figure: coverage of the generated test data by path (x-axis: Path, 1 to 15; y-axis: Coverage, 0 to 120), with series for block, decision, C-use and P-use coverage.]
We compare the number of branch function evaluations
needed by the gradient descent, the gradual descent and
the random-number test data generators
under the same coverage.
[Figure: number of branch function evaluations by path (x-axis: Path, 1 to 16, plus Average; y-axis: Evaluation Number, 0 to 2000), with series for gradient descent, gradual descent and random-number generation.]
The gradient descent test data generator
is more economical, in terms of branch function evaluations,
than the gradual descent and the random-number generators.
Conclusion
To our knowledge,
this is the first automatic test data generation approach
based on character string predicates.
The preliminary experimental results show
that the methodology is effective.