Pir Mehr Ali Shah
Synopsis for MS Degree in Computer Science
Title: APPLICATION OF SEARCH ALGORITHMS FOR MODEL BASED
REGRESSION TESTING
Name of the Student: Sidra Noureen
Registration Number: 06-arid-310
Date of Admission: 27th September, 2010
Date of Initiation: 22nd September, 2011
Probable Duration: One year
SUPERVISORY COMMITTEE
i) Supervisor _______________ Dr. Sohail Asghar
ii) Member ________________ Mr. Yasir Hafeez
iii) Member ________________ Mr. Nasir Mehmood Minhas
Director, University Institute of Information Technology
Director, Advanced Studies
Testing is an important phase of quality control in software development.
Software testing is necessary to produce highly reliable systems. Using a model to describe the behavior of a system is a proven and major advantage in testing. The term model-based testing refers to the derivation of test cases from a model representing software behavior. Model-based testing (MBT) has recently gained attention with the popularization of models (including UML) in software design and development.
Much work has been done on model-based regression testing using search algorithms. MBT makes it possible to generate test cases automatically. However, when MBT is applied to large industrial systems, it is difficult to execute the huge number of generated test cases, so a subset must be selected from the entire test suite. We will try to design a multiobjective genetic algorithm based test case selection technique to select such a subset.
The Unified Modeling Language (UML) is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. UML models are considered a de facto standard in the software engineering industry.
UML emerged as a solution when the software industry felt a pressing need for the standardization and use of design methodologies. With UML models, the testing process becomes easier and more efficient than testing thousands of lines of code.
The testing process is used to ensure that the script written actually does what it is supposed to do. Testing done using UML models is known as model based testing.
Model Based Testing is software testing in which test cases are generated in whole or in part from a model that describes some (usually functional) aspects of the system under test (SUT). Some potential benefits of MBT are: (i) testing can be performed successfully with its help, and (ii) because a model conveys the same meaning to everyone, models are shareable and reusable and so provide a unifying point of reference. There are many types of testing, including white box, black box, integration, unit, alpha, beta, load, acceptance and regression testing. Our focus is regression testing. In regression testing, any change made to the software is identified; the impact of the change is then analyzed to ensure that it does not affect the original functionality of the system. Not all of the test cases are rerun in regression testing; only some are executed again. Regression testing is advantageous because it can be performed selectively. Whenever software is modified, regression testing is performed to check that it still performs its functionality correctly after the modification.
Many researchers have worked on model based testing, and search algorithms have also been employed.
The reason is that testing accounts for almost fifty percent of the overall cost of software development (B. Beizer, 1990). To reduce this cost, the process needs to be automated. One strategy for automation is the application of metaheuristic search (MHS) algorithms (I. H. Osman, 1996). MHS algorithms are a set of generic algorithms used to find optimal or near-optimal solutions to problems that have large, complex search spaces (J. Clarke, 2003). Local search algorithms and metaheuristic search algorithms are subcategories of search algorithms. Local search algorithms include hill climbing and tabu search, while metaheuristic search algorithms include simulated annealing and the genetic algorithm. In hill climbing, we repeatedly move to the neighbor with the highest value in order to reach an optimal solution, which may be only a local one. In simulated annealing, we escape some local maxima by allowing some bad moves while gradually decreasing their size and frequency. Tabu search uses a local or neighborhood search procedure to move iteratively from a solution x to a solution x' in the neighborhood of x until some stopping criterion is satisfied (P. McMinn, 2011). The genetic algorithm (GA) is the most widely used search algorithm. It is preferred because of its simplicity and good results, and it is less likely to get trapped in local optima.
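The difference between hill climbing and simulated annealing described above can be illustrated with a minimal Python sketch. The landscape values, the helper names (score, neighbors) and the parameters temp, cooling, steps and seed are assumptions chosen only for illustration; they are not taken from the cited works.

```python
import math
import random

# Hypothetical one-dimensional landscape: index = candidate solution, value = fitness.
LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 7]


def score(i):
    """Fitness of the candidate at index i."""
    return LANDSCAPE[i]


def neighbors(i):
    """Left and right neighbors that lie inside the landscape."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


def hill_climb(f, start, neighbors):
    """Move to the best neighbor until no neighbor improves on the current point."""
    current = start
    while True:
        best = max(neighbors(current), key=f)
        if f(best) <= f(current):
            return current  # local (possibly not global) maximum
        current = best


def simulated_annealing(f, start, neighbors, temp=10.0, cooling=0.95,
                        steps=500, seed=0):
    """Accept worse moves with a probability that shrinks as temp decreases."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        delta = f(candidate) - f(current)
        # Improvements are always accepted; bad moves only occasionally.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
        if f(current) > f(best):
            best = current
        temp = max(temp * cooling, 1e-9)  # cool down, avoiding division by zero
    return best
```

Starting from index 0, hill_climb stops at index 2 (value 5), a local maximum, because both of its neighbors are worse. simulated_annealing may accept a worse neighbor early on and so can move past that point; whether it reaches the global maximum depends on the seed and the cooling schedule.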
Many search algorithms, such as hill climbing and simulated annealing, have been used earlier (S. Ali, 2008). The problem with these algorithms is that they can get trapped in a local optimum (D. A. Cooley, 1999). Although search algorithms have been used for white box testing, their use in model based testing is limited. Simple search strategies may not be sufficient when dealing with complexities such as loops. To deal with this problem, metaheuristic search algorithms are used, as they implement global search and are less likely to be trapped in local optima (S. Ali, 2008). Among search algorithms, the genetic algorithm is widely used; it can search for a solution and also evolve a new generation of candidate solutions.
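How a genetic algorithm evolves a new generation can be sketched as follows. This is a minimal illustration and not the technique proposed in this synopsis: the bit-string encoding, tournament selection, one-point crossover, elitism and all parameter values are assumptions.

```python
import random


def evolve(fitness, length, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Return the fittest bit string found after the given number of generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]

    def tournament():
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: the best individual survives unchanged.
        elite = max(population, key=fitness)
        children = [elite[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, length - 1)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with a small probability (mutation).
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)
```

For example, evolve(sum, 12) searches for a 12-bit string with as many ones as possible. In a test selection setting, each bit could instead indicate whether a test case is kept in the subset, with a fitness function suited to that problem.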
Researchers have been using model based testing for test case generation, along with search algorithms for optimization.
Search algorithms have been applied to different models. Hadi Hemmati has worked on selecting test cases on the basis of similarity. However, search algorithms have not yet been used in the situation where the modified part of the model must be dealt with.
The purpose of this work is to apply a suitable search algorithm to model based regression testing and to optimize those test cases that are relevant to the modified part.
The literature studied shows that much work has already been done on generating test cases using different UML models. According to Habib Youssef et al. (2001), an evolutionary algorithm is one whose design is inspired by the mechanisms of evolution that take place in nature. The authors give a comparative analysis of different evolutionary algorithms: Tabu Search, Simulated Annealing and the Genetic Algorithm. These algorithms have many similarities, but they also differ in some of their strategies for searching for the optimum solution. The results show that, of the three heuristics, Tabu Search performs best, GA comes close to it, and Simulated Annealing comes last. Further experiments are needed to evaluate the difference in performance between these algorithms.
According to Bogdan Korel (2002), model based testing is a system testing technique used to test software systems. The authors use an EFSM (extended finite state machine) model. When the specifications change, the model of the system changes as well. The approach described in the paper automatically detects the difference between the original and modified models as a set of elementary model modifications (EMs). For every elementary modification, interaction patterns are used on the basis of EFSM dependence analysis, which reduces the size of the regression test suite. Any change in a transition is represented by a pair of EMs, namely deletion of the existing transition and addition of the replacing transition. However, unless the starting or terminating state of a transition needs to be changed, expressing any other change in a transition as a deletion-addition pair becomes costly.
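The idea of expressing a model difference as elementary modifications can be sketched in Python. This is a simplified illustration, not the algorithm of the cited paper: a transition is assumed to be a (source, event, target) tuple, and guards, actions and dependence analysis are left out.

```python
def elementary_modifications(original, modified):
    """Describe the change from one transition set to another as
    'delete' and 'add' elementary modifications."""
    deleted = sorted(original - modified)
    added = sorted(modified - original)
    return [("delete", t) for t in deleted] + [("add", t) for t in added]


# Hypothetical models: the 'stop' transition is redirected and a new one appears.
ORIGINAL = {
    ("Idle", "start", "Running"),
    ("Running", "stop", "Idle"),
}
MODIFIED = {
    ("Idle", "start", "Running"),
    ("Running", "stop", "Paused"),
    ("Paused", "resume", "Running"),
}

MODS = elementary_modifications(ORIGINAL, MODIFIED)
```

Here the redirected stop transition shows up as one deletion plus one addition, which mirrors the pairwise representation described above. The same representation would also be produced for a change that leaves both end states untouched, which is the case the text identifies as costly.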
Miller et al. (2006) worked on test case generation using genetic algorithms and program dependence graphs. The results show that: 1) in simple programs there is little difference in the results (branch coverage) between random search (RS) and their proposed GA approach (TDGen); 2) the difference appears in larger programs, where a much smaller number of generations is required to achieve 100% branch coverage; and 3) for some SUTs, TDGen can achieve 100% branch coverage where RS and GADGET cannot.
Xiao et al. (2007) have empirically evaluated different MHS algorithms and random search (RS) for test data generation. Their results show that GA performs better than all the other algorithms, including random search. After GA, SA performed best in terms of both cost (number of SUT executions) and effectiveness (condition decision coverage). The results show that evolutionary testing (ET) is consistently the best performer. However, the study was conducted on a small scale, with test objects of limited complexity.
Harman et al. (2007) investigated the relationship between the size of the search space (consisting of test inputs) and the performance of search algorithms, measured as the number of fitness evaluations needed to cover a branch. The results show that: 1) there is no relationship between search space reduction and reduction in cost for random search; 2) there is a significant reduction in cost for both hill climbing and the genetic algorithm; 3) the reduction in cost is greater for the genetic algorithm than for hill climbing; and 4) there is no relationship between search space reduction and search effectiveness in terms of coverage for any of the search algorithms.
Atifah Ali et al. (2007) have presented a methodology for regression testing of UML models. Much of the previous work addressed code based regression testing, but testing based on models gives more efficient results. From the UML class diagram and sequence diagram, the authors generate an extended concurrent control flow graph (ECCFG) for the original version of the models. They also take modified versions of the models, compare them with the originals, and select the test cases from the original test suite that need to be rerun. In this approach, for each event in the sequence diagram, the attributes of the object that receives the message are captured from the class diagram, and the pre- and postconditions of that object are checked against the operation contracts to verify whether there is any change, since such a change will affect the object. Based on this observation, a graph is constructed and compared with the original graph in order to detect changes.
S. Ali et al. (2008) provide an empirical investigation of search based test case generation. The paper presents the results of a systematic, comprehensive review that aims to characterize how empirical studies have been designed to investigate the cost-effectiveness of search-based software testing (SBST). It also reports what empirical evidence is available in the literature regarding SBST cost-effectiveness and scalability. The results show that the number of papers containing well-designed and well-reported empirical studies of test case generation using SBST is very small. There is therefore a limited body of credible evidence that demonstrates the usefulness of SBST techniques for test case generation. This evidence, however, consistently shows that genetic algorithms outperform random search in terms of structural coverage.
Leila Naslavsky et al. (2009) have presented an approach and a prototype that help trace model elements to test cases. The traceability relationships established between model elements and test cases support model based regression test selection.
L. C. Briand et al. (2009) have presented a methodology and a supporting tool in which test cases are selected for regression testing on the basis of change analysis. They classify regression test cases as obsolete, retestable and reusable. UML sequence diagrams are employed, and a formal mapping between changes in the design and this classification is proposed. The technique is efficient in test selection and supports performing regression testing as early as possible. However, it does not accommodate changes to external entities such as databases and configuration files.
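The three-way classification can be sketched in Python under simplifying assumptions. Here a test case is assumed to be described by the set of model elements it exercises; it is treated as obsolete if it touches a deleted element, retestable if it touches a changed element, and reusable otherwise. These rules and all names are illustrative and much coarser than the formal mapping of the cited work.

```python
def classify(test_elements, deleted, changed):
    """Classify one test case from the model elements it exercises."""
    if test_elements & deleted:
        return "obsolete"      # exercises something that no longer exists
    if test_elements & changed:
        return "retestable"    # still valid, but must be rerun
    return "reusable"          # valid and unaffected by the change


# Hypothetical test suite and change sets.
SUITE = {
    "tc1": {"login", "viewCart"},
    "tc2": {"login", "checkout"},
    "tc3": {"search", "guestCheckout"},
}
DELETED = {"guestCheckout"}
CHANGED = {"checkout"}

LABELS = {name: classify(elems, DELETED, CHANGED)
          for name, elems in SUITE.items()}
```

With these inputs, tc1 is reusable, tc2 is retestable and tc3 is obsolete, so only tc2 would be rerun.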
Qurat-ul-ann et al. (2010) have presented a state based testing approach in order to test system behavior. A tool named START was developed for regression testing of state-based systems. Changes between the original and modified versions of the class diagram and the state machine are detected with a class diagram comparator and a state diagram comparator respectively. If any attribute in a class changes, its effect is reflected in the state machine, and the affected test cases are then identified with the help of a regression test selector. The tool consists of a parser that reads the XMI of the class diagram and state machine, a comparator (which compares the original and modified versions of the class diagram and state machine) and a test suite analyzer. However, regression test selection (RTS) techniques based on UML designs are not as precise as techniques based on detailed code analysis.
Nan Ye et al. (2011) have presented an automatic regression test selection approach based on activity diagrams. The approach generates test cases automatically. It combines a technique that uses activity diagrams for regression testing with feedback-directed test case generation for new behaviors in the changed software. The technique is efficient because it generates test cases automatically, whereas in previous techniques the generated test cases were represented as sequences of actions in abstract models. The experimental results show that the approach is efficient in generating test cases and in reducing the cost of the testing effort.
According to Hemmati et al. (2011), test cases are selected on the basis of similarity. The authors use state machine diagrams for test selection, and a genetic algorithm is applied to select the optimal test cases. The approach has been applied at the industrial level, and the results show that it is effective in detecting faults. The results show that the approach reduced the cost of detecting faults by up to 73% in one second of time. However, the results rely on a single industrial case study, so the study should be replicated as many times as possible.
Our intended research work starts with a complete survey exploring model based testing, regression testing and search algorithms.
Model based testing, when applied to large industrial systems, generates a large set of test cases that is difficult to execute within the available time and budget. We propose to design a suitable similarity based test case selection technique for selecting a subset of test cases instead of executing the whole test suite. After a careful investigation of the previous work, we will focus on selecting test cases for the changed part of the model and then optimizing the results using an appropriate search algorithm.
Many techniques for regression testing exist, such as the data flow technique, the control flow technique and safe regression test selection techniques. An appropriate technique will be used for regression testing. We will then study other domains and approaches that can help in understanding our work. We will examine how test cases are generated from UML models and how this process can be automated.
Finally, we will propose improvements for the deficiencies of existing models and present work that establishes standard guidelines for effective model based regression testing.
After study and careful analysis of the selected papers, the outcomes show where model based testing approaches have been applied, along with their characteristics and limitations.
Finally, results will be presented that can help novice researchers judge the value of this work.
We will try to design a multiobjective genetic algorithm based test case selection technique to select a subset of test cases.
The first step is to generate test paths. A subset of test paths will then be selected on the basis of similarity. In addition, the multiobjective approach of the genetic algorithm will be applied.
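The two ingredients named above, a similarity measure between test paths and a multiobjective comparison of candidate subsets, can be sketched in Python. Everything in the sketch is an assumption made for illustration: test paths are lists of transition names, similarity is the Jaccard index, the two objectives are low average similarity and high coverage of modified transitions, and the candidate subsets are enumerated exhaustively, where a genetic algorithm would search them instead.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two test paths, treated as sets of transitions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def objectives(subset, paths, modified):
    """Return (average pairwise similarity, modified transitions covered)."""
    chosen = [paths[i] for i in subset]
    pairs = list(combinations(chosen, 2))
    avg_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    covered = len(modified & set().union(*chosen)) if chosen else 0
    return avg_sim, covered


def dominates(p, q):
    """p dominates q if it is no worse on both objectives and differs from q."""
    return p[0] <= q[0] and p[1] >= q[1] and p != q


def pareto_front(candidates, paths, modified):
    """Keep the candidate subsets that no other candidate dominates."""
    scored = {c: objectives(c, paths, modified) for c in candidates}
    return [c for c in candidates
            if not any(dominates(scored[o], scored[c]) for o in candidates)]


# Hypothetical test paths and modified transitions.
PATHS = [["t1", "t2", "t3"], ["t1", "t2", "t4"], ["t5", "t6"]]
MODIFIED = {"t4", "t6"}
CANDIDATES = list(combinations(range(len(PATHS)), 2))  # all subsets of size 2
FRONT = pareto_front(CANDIDATES, PATHS, MODIFIED)
```

With this data the only non-dominated subset is (1, 2): its two paths share no transitions and together cover both modified transitions. On realistic test suites the number of subsets is far too large to enumerate, which is the reason for using a search algorithm.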
Ali, S., L. Briand, H. Hemmati and R. Panesar-Walawege. 2008. A Systematic Review of the Application and Empirical Investigation of Search-based Test-Case Generation, TSESI.
Ali, A., A. Nadeem, Z. Iqbal and M. Usman. 2007. Regression Testing based on UML Design Models, 13th IEEE International Symposium on Pacific Rim Dependable Computing.
Briand, L., Y. Labiche and S. He. 2008. Automatic regression test selection based on UML design, Elsevier.
Cooley, D. 1999. An introduction to genetic algorithm for scientists and engineers.
Farooq, Q., Z. Iqbal, Z. Malik and M. Riebisch. 2010. A Model-Based Regression Testing Approach for Evolving Software Systems with Flexible Tool Support, 17th IEEE International Conference and Workshops on Engineering of Computer-Based Systems.
Youssef, H., S. Sait and H. Adiche. 2001. Evolutionary algorithms, simulated annealing and tabu search: a comparative study, Engineering Applications of Artificial Intelligence, pp. 167-181.
Harman, M. and P. McMinn. 2007. A theoretical and empirical analysis of evolutionary testing and hill climbing for structural test data generation, International Symposium on Software Testing and Analysis (ISSTA '07), London, United Kingdom, ACM.
Hemmati, H., L. Briand, A. Arcuri and S. Ali. 2011. An enhanced test case selection approach for model based testing: an industrial case study.
Korel, B., L. Tahat and B. Vaysburg. 2002. Model Based Regression Test Reduction Using Dependence Analysis, IEEE ICSM.
Miller, J., M. Reformat and H. Zhang. 2006. Automatic test data generation using genetic algorithm and program dependence graphs, Information and Software Technology, Vol. 48, pp. 586-605.
Naslavsky, L., H. Ziv and D. Richardson. 2009. A Model-Based Regression Test Selection Technique, IEEE ICSM.
Xiao, M., M. El-Attar, M. Reformat and J. Miller. 2007. Empirical evaluation of optimization algorithms when used in goal-oriented automated test data generation techniques, Empirical Software Engineering, Vol. 12, pp. 183-239.
Ye, N., X. Chen, P. Jiang, W. Ding and X. Li. 2011. Automatic Regression Test Selection based on Activity Diagrams, Fifth International Conference on Secure Software Integration and Reliability Improvement Companion.